Docs for briefcase-ai v3.3.0. See what’s new.

Why Briefcase

AI systems don’t just produce text — they make decisions that trigger actions: routing a support ticket, approving a request, choosing a tool, escalating to a human. When one of those decisions is wrong, “the model did it” is not an answer anyone can act on.

Briefcase is infrastructure for governing those decisions. It sits around the decision points in your application and gives you three things that are otherwise impossible to reconstruct after the fact:

Controls before action

Evaluate whether an action is allowed before it runs — deny-by-default, composable, and side-effect-free.

Full context, captured

Every decision is recorded with its inputs, outputs, model parameters, evidence, and the data it depended on.

A record you can verify

Replay decisions, reconstruct exactly what was known at the time, and seal it into a tamper-evident bundle.
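
To make "deny-by-default, composable, and side-effect-free" concrete, here is a minimal sketch in plain Python. It does not use the Briefcase API: `Action`, `Rule`, `evaluate`, and both rule functions are hypothetical names invented for illustration, and the 0.6 confidence threshold is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical types for illustration only; not part of the Briefcase API.
@dataclass(frozen=True)
class Action:
    kind: str          # e.g. "route_ticket"
    queue: str         # target queue for the ticket
    confidence: float  # classifier confidence behind the action


# A rule returns True (allow), False (deny), or None (no opinion).
Rule = Callable[[Action], Optional[bool]]


def allow_known_queues(action: Action) -> Optional[bool]:
    if action.kind == "route_ticket" and action.queue in {"billing", "technical"}:
        return True
    return None


def deny_low_confidence(action: Action) -> Optional[bool]:
    # Assumed threshold; pick one that fits your own system.
    if action.confidence < 0.6:
        return False
    return None


def evaluate(action: Action, rules: list[Rule]) -> bool:
    """Pure check: reads the action, performs no I/O, mutates nothing."""
    verdicts = [rule(action) for rule in rules]
    if any(v is False for v in verdicts):
        return False  # any explicit deny wins
    # Deny by default: without at least one explicit allow, the action is refused.
    return any(v is True for v in verdicts)


RULES = [allow_known_queues, deny_low_confidence]
```

Because each rule is an independent function, composing controls means adding to the list, and because `evaluate` has no side effects it can run before the action without changing anything if the answer is no.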

The questions Briefcase lets you answer

When a decision is challenged — by a teammate, an incident review, or a customer — you need to answer, precisely and after the fact:

  • What did the system decide, and what did it see? The inputs, outputs, and confidence behind the call.
  • What rule governed it? The exact policy version that was in effect at the decision’s moment — not today’s policy.
  • Did the controls run first? Proof that a guardrail evaluated the action before anything happened.
  • What did we know at the time? The evidence and external data as they were then — corrections appended, never overwritten.
  • Can we reproduce it? A deterministic replay that compares the original output against a fresh run.
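
Two of these questions (which policy version was in effect, and what was known at the time) come down to an as-of lookup over an append-only history. The sketch below shows that idea in plain Python under stated assumptions: `AppendOnlyLog`, `Entry`, and the policy labels are hypothetical and say nothing about how Briefcase stores data internally.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


# Hypothetical structures for illustration; not Briefcase's storage model.
@dataclass(frozen=True)
class Entry:
    recorded_at: datetime
    value: str


class AppendOnlyLog:
    """History that only grows: corrections are new entries, never edits."""

    def __init__(self) -> None:
        self._entries: list[Entry] = []

    def append(self, recorded_at: datetime, value: str) -> None:
        if self._entries and recorded_at < self._entries[-1].recorded_at:
            raise ValueError("entries must be appended in time order")
        self._entries.append(Entry(recorded_at, value))

    def as_of(self, moment: datetime) -> Optional[str]:
        """Return the latest value recorded at or before `moment`, if any."""
        times = [e.recorded_at for e in self._entries]
        index = bisect_right(times, moment)
        return self._entries[index - 1].value if index else None


def utc(day: int) -> datetime:
    # Helper for the example: a day in January 2024, in UTC.
    return datetime(2024, 1, day, tzinfo=timezone.utc)


policy = AppendOnlyLog()
policy.append(utc(1), "routing-policy v1")
policy.append(utc(10), "routing-policy v2")

# A decision made on 5 January was governed by v1, even though v2 exists today.
policy_at_decision = policy.as_of(utc(5))
```

Nothing is overwritten when v2 is appended, so the lookup for 5 January keeps returning the same answer no matter how many later versions are added.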

How it works: five acts

Briefcase organizes around the lifecycle of a single decision. The rest of these docs follow the same five acts, and a single running example threads through all of them: a support-ticket triage agent. Each ticket it handles produces two decisions you’ll see throughout — it classifies the ticket (the classify_ticket call in most examples) and routes it to a queue. Both are decisions Briefcase captures, governs, and can replay.

graph LR
    A["Capture<br/>record inputs, outputs,<br/>context, evidence"] --> B["Control<br/>enforce guardrails &<br/>versioned policy"]
    B --> C["Store & Query<br/>durable, append-only,<br/>queryable trail"]
    C --> D["Replay & Verify<br/>re-run, compare,<br/>detect drift"]
    D --> E["Prove<br/>reconstruct as-of &<br/>seal an audit bundle"]
| Act | What you do | Key building blocks |
| --- | --- | --- |
| Capture | Record every decision with full context | @capture, DecisionSnapshot, exporters, PII sanitization |
| Control | Enforce controls before the action runs | Guardrails, routing, versioned routing policy, validation |
| Store & Query | Keep a durable, queryable, append-only trail | Storage adapters, bitemporal storage, external data, RAG versioning |
| Replay & Verify | Re-run and check decisions hold up | Deterministic replay, drift detection, audit bundles |
| Prove | Reconstruct and verify after the fact | As-of reconstruction, ExaminerBundle |
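
As a rough illustration of how the Capture, Replay & Verify, and Prove acts relate, the sketch below records a toy `classify_ticket` call, replays it, and computes a digest over the record. Everything here is a stand-in written in plain Python: `capture_sketch`, `Snapshot`, `replay`, and `seal` are hypothetical names, not Briefcase's `@capture`, `DecisionSnapshot`, or `ExaminerBundle`, and the keyword classifier takes the place of a real model call.

```python
import functools
import hashlib
import json
from dataclasses import asdict, dataclass


# Hypothetical record type; not Briefcase's DecisionSnapshot.
@dataclass(frozen=True)
class Snapshot:
    function: str
    inputs: dict
    output: str


SNAPSHOTS: list[Snapshot] = []


def capture_sketch(fn):
    """Hypothetical stand-in decorator: record inputs and output of each call."""
    @functools.wraps(fn)
    def wrapper(**kwargs):
        output = fn(**kwargs)
        SNAPSHOTS.append(Snapshot(fn.__name__, dict(kwargs), output))
        return output
    return wrapper


@capture_sketch
def classify_ticket(text: str) -> str:
    # Toy deterministic classifier standing in for a model call.
    return "billing" if "invoice" in text.lower() else "technical"


def replay(snapshot: Snapshot, fn) -> bool:
    """Re-run `fn` on the recorded inputs and compare with the recorded output."""
    return fn(**snapshot.inputs) == snapshot.output


def seal(snapshots: list[Snapshot]) -> str:
    """SHA-256 over a canonical JSON encoding of the records."""
    payload = json.dumps([asdict(s) for s in snapshots], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


classify_ticket(text="My invoice is wrong")
original = SNAPSHOTS[0]
# Replay against the undecorated function so the replay itself is not recorded.
matches = replay(original, classify_ticket.__wrapped__)
digest = seal(SNAPSHOTS)
```

A bare digest only makes tampering evident if it is kept somewhere the records' editor cannot reach, or is signed; this sketch leaves that part out.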

Who Briefcase is for

Where it runs

Briefcase is an open-source Python SDK (with a Rust core) that wraps the decision points in code you already have. It is independent of model, vendor, and framework: bring your own LLM calls and storage. The base package is pip install briefcase-ai; optional capabilities are installed as extras.

Next steps