Project

Compliance Lab

Multi-agent system that validates synthetic IT systems against NIST 800-53 controls. Per-agent identity, inter-agent authorization, hash-chained audit, human-in-the-loop containment.

First ship complete · April 2026

What it is

A two-agent system that takes a synthetic IT target and a set of NIST 800-53 controls, runs the controls against the target's configuration, and produces a findings report. When something fails, the system proposes a containment action and waits for a signed human decision before any change runs. Every action — agent-to-agent message, retrieval, containment proposal, human approval — is cryptographically signed and recorded in a tamper-evident audit log.

Why this exists

Agentic AI systems are being deployed faster than the security frameworks that govern them. The standard pattern — trust the model, gate the inputs, log the outputs — doesn't survive contact with multi-agent systems where one agent's output is another agent's instruction. Identity has to live between agents, not just inside them.

Compliance Lab is the personal-time prototype where I work that out. It's deliberately scoped: small control corpus, fabricated targets, two agents. The goal isn't to build a product — it's to put real engineering pressure on the architectural questions that show up in NIST RFI responses, practitioner guides, and zero trust overlays. I write the policy and build the systems; this is one of the systems.

Strict IP firewall: public artifacts only, no firm material, separate naming, personal hardware, personal accounts. The problem space is public; this take on it is mine.

Architecture

Seven components. Two agents do the work; everything else exists to make the agent actions safe and auditable.

Compliance Lab architecture: validator and reporter agents communicate through a Policy Decision Point that gates access to the NIST 800-53 corpus, synthetic targets, audit log, and human approver.

Validator Agent

Runs control checks against the synthetic target's configuration. Reads control text from the NIST corpus via PDP-gated retrieval.

Reporter Agent

Summarizes validator findings into a human-readable report. When a check fails, proposes a containment action.

Policy Decision Point

Every agent action passes through here. Agents are scoped to specific actions; a denied request never reaches its target.

NIST 800-53 Corpus

20 paraphrased controls indexed in Qdrant via LlamaIndex. Validator queries by control ID; retrieval is RAG-grounded, not hardcoded.

Synthetic Targets

Fabricated system configurations with deliberate pass/fail conditions. No real assets — full IP firewall safety.

Audit Log

Hash-chained append-only log. Every entry is signed by the actor; tampering breaks the chain visibly.

Human Approver

Containment actions require an Ed25519-signed human decision. Approve or deny; the decision itself is logged.
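The scoping idea behind the Policy Decision Point can be sketched in a few lines. This is a hedged illustration only — the agent, action, and resource names below are hypothetical, and the real system's policy model may differ:

```python
# Minimal Policy Decision Point sketch: each agent is scoped to an
# explicit allowlist of (action, resource) pairs. A denied request
# returns a decision without ever reaching the target resource.
# All identifiers here are illustrative, not the project's own.

POLICY = {
    "validator": {("retrieve", "nist_corpus"), ("read", "synthetic_target")},
    "reporter":  {("read", "findings"), ("propose", "containment")},
}

def pdp_decide(agent: str, action: str, resource: str) -> bool:
    """Allow only actions the agent is explicitly scoped to."""
    return (action, resource) in POLICY.get(agent, set())

# The reporter may propose containment but cannot query the corpus:
assert pdp_decide("reporter", "propose", "containment")
assert not pdp_decide("reporter", "retrieve", "nist_corpus")
```

The design point is that denial happens at the layer, not in the prompt: an unauthorized request is rejected before any resource sees it.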

How it works

PASS path — control check succeeds

PASS path sequence diagram: validator checks control, retrieves authoritative text from NIST corpus, evaluates target, hands off to reporter, reporter summarizes — no containment triggered.

The validator pulls control text from the NIST corpus via PDP-gated retrieval, evaluates the target's configuration, and produces a finding. The reporter summarizes. Every message is signed and logged. No containment is needed; no human prompt fires.
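The signed hand-off between agents can be sketched as follows. The real system uses per-agent Ed25519 keypairs; HMAC-SHA256 stands in here so the sketch runs on the standard library alone, and the keys and field names are illustrative:

```python
import hashlib
import hmac
import json

# Sketch of a signed inter-agent message. Each agent signs its own
# messages with its own key; the receiver verifies before trusting
# the content. HMAC is a stand-in for the project's Ed25519 signing.

KEYS = {"validator": b"validator-demo-key", "reporter": b"reporter-demo-key"}

def sign_message(sender: str, body: dict) -> dict:
    payload = json.dumps(body, sort_keys=True).encode()
    sig = hmac.new(KEYS[sender], payload, hashlib.sha256).hexdigest()
    return {"sender": sender, "body": body, "sig": sig}

def verify_message(msg: dict) -> bool:
    payload = json.dumps(msg["body"], sort_keys=True).encode()
    expected = hmac.new(KEYS[msg["sender"]], payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, msg["sig"])

finding = sign_message("validator", {"control": "AC-2", "result": "PASS"})
assert verify_message(finding)          # authentic message verifies
finding["body"]["result"] = "FAIL"      # tampering breaks verification
assert not verify_message(finding)
```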

Containment path — control check fails

Containment path sequence diagram: validator flags failure, reporter proposes containment, human is prompted to approve or deny, signed decision is logged, action only runs if approved.

When a check fails, the reporter doesn't act on its own. It proposes a specific containment action and waits. The human prompt fires; the human's signed approval (or denial) is recorded; only an approved action runs. The model proposes, the human authorizes, the audit log proves both.
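The gate itself reduces to a small state check: no decision means no action, and only an approval executes. A minimal sketch, with hypothetical field names (the real decision carries an Ed25519 signature that would also be verified here):

```python
from typing import Optional

# Human-in-the-loop containment gate: a proposed action runs only
# once an explicit human approval exists. Field names are
# illustrative; signature verification is omitted for brevity.

def run_containment(proposal: dict, decision: Optional[dict]) -> str:
    if decision is None:
        return "pending: awaiting human decision"
    if not decision.get("approved"):
        return "denied: no action taken"
    return "executed: " + proposal["action"]

proposal = {"action": "disable_account", "target": "svc-backup"}
assert run_containment(proposal, None).startswith("pending")
assert run_containment(proposal, {"approved": False}).startswith("denied")
assert run_containment(proposal, {"approved": True}).startswith("executed")
```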

What this demonstrates

Per-agent cryptographic identity

Each agent has its own Ed25519 keypair. Every action it takes carries its signature. No agent can act as another.

Inter-agent authorization

An explicit Policy Decision Point sits between agents and resources. Authorization is a layer, not a prompt instruction.

RAG-grounded controls

NIST 800-53 control text is retrieved at check time via LlamaIndex + Qdrant — never hardcoded, never assumed from training data.

Human-in-the-loop containment

High-consequence actions require a signed human decision before execution. The model proposes; the human authorizes.

Tamper-evident audit

Hash-chained append-only log with per-entry signatures. The log is the source of truth for what happened — and tampering with it leaves visible evidence.
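The hash-chaining property can be sketched in a few lines: each entry commits to the hash of the previous one, so editing any earlier entry invalidates everything after it. This is an illustrative sketch only — per-entry signatures and the project's actual log schema are omitted:

```python
import hashlib
import json

# Hash-chained append-only log sketch. Each record stores the
# previous record's hash, so tampering anywhere breaks the chain.

GENESIS = "0" * 64

def append_entry(log: list, entry: dict) -> None:
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = json.dumps(entry, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    log.append({"entry": entry, "prev": prev_hash, "hash": entry_hash})

def verify_chain(log: list) -> bool:
    prev = GENESIS
    for rec in log:
        body = json.dumps(rec["entry"], sort_keys=True)
        expected = hashlib.sha256((prev + body).encode()).hexdigest()
        if rec["prev"] != prev or rec["hash"] != expected:
            return False
        prev = rec["hash"]
    return True

log = []
append_entry(log, {"actor": "validator", "event": "check AC-2"})
append_entry(log, {"actor": "reporter", "event": "summarize findings"})
assert verify_chain(log)
log[0]["entry"]["event"] = "check AC-3"   # tampering is visible
assert not verify_chain(log)
```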

Try the demo

An interactive replay of the workflow. Pick a path, step through manually or auto-play, approve or deny the containment proposal. Runs entirely in the browser — no backend, no install.

Compliance Lab dashboard showing workflow DAG, agent bar, phase detail, and audit trail.

Open the Demo →

Or run it locally

Full live mode with an actual local LLM (Ollama):

ollama serve
ollama pull llama3.2:3b
ollama pull nomic-embed-text
uv sync --group dev
uv run python scripts/demo.py

Source on GitHub.

Stack & scale

Stack

LangGraph · AutoGen · LlamaIndex · Qdrant · Ollama · FastAPI · Ed25519

Scale

76 tests · 20 controls · 4 slices · 2 agents

What's not here

This is a research prototype, not a product. The control corpus is 20 paraphrased entries, not all 800+. The targets are fabricated synthetic configurations, not real production systems. The crew is two agents, not the larger ensemble that a production deployment would need. The IP firewall is the reason: public artifacts only, full separation from any client engagement. The architecture decisions (identity, authorization, audit, human-in-the-loop) generalize; the specific scope is intentionally small.