Docs for briefcase-ai v3.3.0 (see what’s new).

Observe AI in Production

Requires: base, drift, otel, events. For: engineers & operators

Once the triage agent is live, you need to see what it is doing — cheaply and continuously. This guide layers Briefcase’s observability tools onto the same classify_ticket function: emit records, track spend, detect drift, trace, and alert.

pip install "briefcase-ai[drift,otel,events]"
  1. Emit every decision in one line

    observe() wires up an exporter so captured decisions actually go somewhere. Use "console" in development, a .jsonl path for log shipping, or "memory" in tests.

    import briefcase
    briefcase.observe("decisions.jsonl")  # append-only, thread-safe
    @briefcase.capture(decision_type="ticket-classification")
    def classify_ticket(text: str) -> str:
        # call your model here
        return "billing"
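Because the exporter appends to a plain .jsonl file, a quick sanity check is to read the file back and count records. The sketch below uses only the standard library. The helper name count_by_field and the "decision_type" field are assumptions for illustration, not part of Briefcase; inspect a real record to see which keys your version actually writes.

```python
import json
from collections import Counter
from pathlib import Path


def count_by_field(path: str, field: str = "decision_type") -> Counter:
    """Count JSONL records by the value of one top-level field.

    The default field name is an assumption; check a real record first.
    """
    counts: Counter = Counter()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # e.g. a last line that is still being written
                counts["<unparseable>"] += 1
                continue
            if not isinstance(record, dict):
                counts["<unparseable>"] += 1
                continue
            counts[str(record.get(field, "<missing>"))] += 1
    return counts
```

Calling count_by_field("decisions.jsonl") after a few classify_ticket calls should show whether records are landing in the file at all, which is usually the first thing to verify before shipping the log anywhere.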
  2. Watch cost against a budget

    CostCalculator estimates per-call cost from token counts and checks spend against a limit. The cost types ship in the base package.

    from briefcase.cost import CostCalculator
    calc = CostCalculator()
    estimate = calc.estimate_cost("gpt-4o-mini", input_tokens=1000, output_tokens=200)
    print(estimate.total_cost, estimate.currency)
    budget = calc.check_budget(current_spend=85.0, budget_limit=100.0)
    print(budget.status, budget.alert_message) # e.g. "warning", "..."
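To make the arithmetic concrete, here is a minimal standalone sketch of token-based cost estimation and a budget check. It is not Briefcase's implementation: the price table uses placeholder numbers rather than real vendor pricing, and the model name, the 80% warning threshold, and the status strings are assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Placeholder prices in USD per 1M tokens; NOT real vendor pricing.
PRICES_PER_MILLION = {"example-model": {"input": 1.00, "output": 2.00}}


@dataclass(frozen=True)
class BudgetCheck:
    status: str          # "ok", "warning", or "exceeded"
    percent_used: float  # spend as a percentage of the limit


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost = tokens * price-per-token, summed over input and output."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def check_budget(current_spend: float, budget_limit: float,
                 warn_at: float = 0.8) -> BudgetCheck:
    """Classify spend against a limit; warn_at is an assumed threshold."""
    if budget_limit <= 0:
        raise ValueError("budget_limit must be positive")
    used = current_spend / budget_limit
    if used >= 1.0:
        status = "exceeded"
    elif used >= warn_at:
        status = "warning"
    else:
        status = "ok"
    return BudgetCheck(status=status, percent_used=used * 100)
```

With these placeholder prices, 1000 input tokens and 200 output tokens cost (1000 × 1.00 + 200 × 2.00) / 1,000,000 USD, and a spend of 85 against a limit of 100 falls in the warning band.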
  3. Measure drift across repeated runs

    Sample the same decision over time and ask how consistent it stays. A falling consistency_score is your signal that behavior is shifting.

    from briefcase.drift import DriftCalculator
    calc = DriftCalculator().with_similarity_threshold(0.9)
    metrics = calc.calculate_drift(["billing", "billing", "account", "billing"])
    print(metrics.consistency_score, metrics.agreement_rate)
    print(metrics.consensus_output, metrics.outliers)
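For intuition about what an agreement-style metric captures, the sketch below computes a consensus label, an agreement rate, and the outliers using exact string matching. It is an illustration only: DriftCalculator may score similarity differently (the threshold above suggests fuzzy matching), and the names Agreement and summarize_agreement are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Agreement(NamedTuple):
    consensus: str         # most frequent output
    agreement_rate: float  # share of runs matching the consensus
    outliers: list[str]    # outputs that differ from the consensus


def summarize_agreement(outputs: list[str]) -> Agreement:
    """Summarize repeated runs with exact matching (an assumed simplification)."""
    if not outputs:
        raise ValueError("need at least one output")
    counts = Counter(outputs)
    consensus, hits = counts.most_common(1)[0]
    outliers = [o for o in outputs if o != consensus]
    return Agreement(consensus, hits / len(outputs), outliers)
```

For the four samples used above, three of four runs agree on "billing", so the exact-match agreement rate is 0.75 and "account" is the single outlier.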
  4. Trace alongside your existing telemetry

    get_tracer() returns a standard OpenTelemetry tracer. Spans describe the timeline of work; decision records carry the governance context — they are complementary and both flow to your collectors.

    from briefcase.otel import get_tracer
    tracer = get_tracer("briefcase")
    with tracer.start_as_current_span("classify_ticket"):
        classify_ticket("My invoice is wrong")
  5. Fire events when something looks off

    Turn signals into action. The emit helpers are coroutines — await them inside an async context — and are ideal for low-confidence outputs or detected drift.

    import asyncio
    from briefcase.events import emit_low_confidence, emit_drift_detected
    async def main():
        await emit_low_confidence({"id": "dec-1"}, confidence=0.4, threshold=0.7)
        await emit_drift_detected({"id": "dec-1"}, {"drift_score": 0.3})
    asyncio.run(main())
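One way to keep alerting logic testable is to put the threshold checks in a small coroutine and pass the emitter in as a parameter. The sketch below is a hypothetical pattern, not a Briefcase API: alert_if_needed, the Emitter type, the reason strings, and the default thresholds are all assumptions. In production the injected emitter could wrap the emit helpers shown above.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical emitter signature: (decision, reason) -> awaitable
Emitter = Callable[[dict, str], Awaitable[None]]


async def alert_if_needed(
    decision: dict,
    confidence: float,
    drift_score: float,
    emit: Emitter,
    *,
    min_confidence: float = 0.7,  # assumed threshold
    max_drift: float = 0.2,       # assumed threshold
) -> list[str]:
    """Emit one event per triggered condition and return the reasons."""
    reasons: list[str] = []
    if confidence < min_confidence:
        reasons.append("low_confidence")
    if drift_score > max_drift:
        reasons.append("drift_detected")
    if reasons:
        # Send all triggered events concurrently.
        await asyncio.gather(*(emit(decision, reason) for reason in reasons))
    return reasons
```

Injecting the emitter means a test can pass a fake coroutine that records calls, so the thresholds can be exercised without sending real events.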

Where this fits