Observe AI in Production

For: engineers & operators

Once the triage agent is live, you need to see what it is doing, cheaply and continuously. This guide layers Briefcase’s observability tools onto the same classify_ticket function: emit records, track spend, detect drift, trace, and alert.

```
pip install "briefcase-ai[drift,otel,events]"
```

Emit every decision in one line

observe() wires up an exporter so captured decisions actually go somewhere. Use "console" in development, a .jsonl path for log shipping, or "memory" in tests.

```
import briefcase

briefcase.observe("decisions.jsonl")   # append-only, thread-safe

@briefcase.capture(decision_type="ticket-classification")
def classify_ticket(text: str) -> str:
    # call your model here; "billing" is a placeholder label
    return "billing"
```
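
To make the capture-and-export flow concrete, here is a stdlib-only toy that mimics it: a decorator records each call's inputs, output, and latency as a dict and hands it to a sink, either an in-memory list (the role "memory" plays in tests) or an append-only JSON Lines file guarded by a lock (the role a .jsonl path plays). Everything here (`capture`, `MemorySink`, `JsonlSink`, and the record fields) is hypothetical and is not Briefcase's implementation or record shape; the Exporters page documents the real one.

```python
import functools
import json
import threading
import time
from typing import Any, Callable


class MemorySink:
    """Hypothetical sink that keeps records in a list (handy in tests)."""

    def __init__(self) -> None:
        self.records: list[dict[str, Any]] = []

    def write(self, record: dict[str, Any]) -> None:
        self.records.append(record)


class JsonlSink:
    """Hypothetical sink: one JSON object per line, appended under a lock."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._lock = threading.Lock()

    def write(self, record: dict[str, Any]) -> None:
        line = json.dumps(record)
        # The lock serializes writer threads in this process; "a" mode only appends.
        with self._lock, open(self.path, "a", encoding="utf-8") as f:
            f.write(line + "\n")


def capture(decision_type: str, sink) -> Callable:
    """Hypothetical decorator: record inputs, output, and latency of each call."""

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            output = fn(*args, **kwargs)
            sink.write({
                "decision_type": decision_type,
                "function": fn.__name__,
                "inputs": {"args": list(args), "kwargs": kwargs},
                "output": output,
                "latency_ms": (time.perf_counter() - start) * 1000,
            })
            return output
        return wrapper
    return decorator


sink = MemorySink()

@capture(decision_type="ticket-classification", sink=sink)
def classify_ticket(text: str) -> str:
    return "billing"  # placeholder for a model call

classify_ticket("My invoice is wrong")
print(sink.records[0]["output"])  # billing
```

The lock only serializes threads inside one process; if several processes append to the same file, giving each its own file is the simpler option.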

Watch cost against a budget

CostCalculator estimates per-call cost from token counts and checks spend against a limit. The cost types ship in the base package.

```
from briefcase.cost import CostCalculator

calc = CostCalculator()
estimate = calc.estimate_cost("gpt-4o-mini", input_tokens=1000, output_tokens=200)
print(estimate.total_cost, estimate.currency)

budget = calc.check_budget(current_spend=85.0, budget_limit=100.0)
print(budget.status, budget.alert_message)   # e.g. "warning", "..."
```
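
The arithmetic behind a per-call estimate is token counts times per-token prices, and a budget check compares spend to a limit. The sketch below shows both with `Decimal` to avoid float rounding on money. The price table and the 80% warning threshold are placeholders, not Briefcase's pricing data or its actual thresholds; substitute your provider's current rates.

```python
from decimal import Decimal

# Placeholder prices in USD per 1M tokens (hypothetical; check your provider's current rates).
PRICES_PER_MTOK = {
    "gpt-4o-mini": {"input": Decimal("0.15"), "output": Decimal("0.60")},
}
MTOK = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """(input_tokens * input_price + output_tokens * output_price) / 1M."""
    price = PRICES_PER_MTOK[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / MTOK


def budget_status(current_spend: Decimal, budget_limit: Decimal,
                  warn_at: Decimal = Decimal("0.8")) -> str:
    """Hypothetical tiers: 'exceeded' at >= 100% of the limit, 'warning' at >= warn_at, else 'ok'."""
    ratio = current_spend / budget_limit
    if ratio >= 1:
        return "exceeded"
    if ratio >= warn_at:
        return "warning"
    return "ok"


print(estimate_cost("gpt-4o-mini", input_tokens=1000, output_tokens=200))
print(budget_status(Decimal("85.0"), Decimal("100.0")))  # warning
```

With these placeholder prices the call above works out to (1000 × 0.15 + 200 × 0.60) / 1,000,000 = 0.00027 USD, and 85 of 100 spent is 85% of the limit, which lands in the warning tier.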

Measure drift across repeated runs

Sample the same decision over time and ask how consistent it stays. A falling consistency_score is your signal that behavior is shifting.

```
from briefcase.drift import DriftCalculator

calc = DriftCalculator().with_similarity_threshold(0.9)
metrics = calc.calculate_drift(["billing", "billing", "account", "billing"])

print(metrics.consistency_score, metrics.agreement_rate)
print(metrics.consensus_output, metrics.outliers)
```
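
One plausible way to compute numbers like these for short categorical outputs: treat two outputs as agreeing when their string similarity clears the threshold, take the most frequent output as the consensus, and flag anything that does not agree with it as an outlier. This stdlib sketch uses `difflib.SequenceMatcher` as the similarity measure and hypothetical definitions for both scores; Briefcase's own definitions of `consistency_score` and `agreement_rate` may differ, so check Drift Detection for the real semantics.

```python
from collections import Counter
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass
class DriftSketch:
    consistency_score: float  # fraction of output pairs that agree
    agreement_rate: float     # fraction of outputs that agree with the consensus
    consensus_output: str
    outliers: list


def similar(a: str, b: str, threshold: float) -> bool:
    """Two outputs 'agree' if their similarity ratio meets the threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def drift_sketch(outputs: list, threshold: float = 0.9) -> DriftSketch:
    if not outputs:
        raise ValueError("need at least one output")
    pairs = list(combinations(outputs, 2))
    agreeing_pairs = sum(similar(a, b, threshold) for a, b in pairs)
    consistency = agreeing_pairs / len(pairs) if pairs else 1.0

    # Consensus = most frequent exact output; outliers = outputs that don't agree with it.
    consensus = Counter(outputs).most_common(1)[0][0]
    agree = [o for o in outputs if similar(o, consensus, threshold)]
    outliers = [o for o in outputs if not similar(o, consensus, threshold)]
    return DriftSketch(consistency, len(agree) / len(outputs), consensus, outliers)


m = drift_sketch(["billing", "billing", "account", "billing"])
print(m.consistency_score, m.agreement_rate)   # 0.5 0.75
print(m.consensus_output, m.outliers)          # billing ['account']
```

The pairwise score captures spread even when there is no clear majority, while the agreement rate is easier to read. Note that at 0.9 even "billing" vs "Billing" (ratio about 0.86) counts as disagreement, so normalize outputs first if case should not matter.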

Trace alongside your existing telemetry

get_tracer() returns a standard OpenTelemetry tracer. Spans describe the timeline of work, while decision records carry the governance context. The two are complementary, and both flow to your collectors.

```
from briefcase.otel import get_tracer

tracer = get_tracer("briefcase")
with tracer.start_as_current_span("classify_ticket"):
    classify_ticket("My invoice is wrong")
```

Fire events when something looks off

Turn signals into action. The emit helpers are coroutines, so await them inside an async function (here driven by asyncio.run). They are a natural fit for low-confidence outputs and detected drift.

```
import asyncio
from briefcase.events import emit_low_confidence, emit_drift_detected

async def main():
    await emit_low_confidence({"id": "dec-1"}, confidence=0.4, threshold=0.7)
    await emit_drift_detected({"id": "dec-1"}, {"drift_score": 0.3})

asyncio.run(main())
```
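
In practice you decide when to fire each event from values you already have. The sketch below routes a decision to zero, one, or both events and emits them concurrently with `asyncio.gather`. The emitters are stand-ins shaped like the calls above that append to a list, so it runs without Briefcase; `route_alerts`, the stubs, and the 0.2 drift threshold are hypothetical. In real code you would call the briefcase.events helpers instead.

```python
import asyncio

emitted: list = []  # collects what the stub emitters "sent"


# Stand-ins shaped like the calls above (hypothetical; swap in briefcase.events helpers).
async def emit_low_confidence(decision: dict, confidence: float, threshold: float) -> None:
    emitted.append(("low_confidence", decision["id"], confidence))


async def emit_drift_detected(decision: dict, metrics: dict) -> None:
    emitted.append(("drift_detected", decision["id"], metrics["drift_score"]))


async def route_alerts(decision: dict, confidence: float, drift_score: float,
                       confidence_threshold: float = 0.7,
                       drift_threshold: float = 0.2) -> None:
    """Fire only the events whose condition holds, concurrently."""
    pending = []
    if confidence < confidence_threshold:
        pending.append(emit_low_confidence(decision, confidence=confidence,
                                           threshold=confidence_threshold))
    if drift_score > drift_threshold:
        pending.append(emit_drift_detected(decision, {"drift_score": drift_score}))
    await asyncio.gather(*pending)


asyncio.run(route_alerts({"id": "dec-1"}, confidence=0.4, drift_score=0.3))
asyncio.run(route_alerts({"id": "dec-2"}, confidence=0.9, drift_score=0.05))
print(emitted)
```

Gathering the coroutines means a slow sink for one event type does not delay the other; awaiting them one after another, as in the example above, is simpler when ordering matters.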

Where this fits

Exporters: stock and custom exporters, and the record shape they emit.

Cost Tracking: estimates, model comparison, projections, and budgets.

Drift Detection: what the drift metrics mean and how to tune them.

Multi-Agent & Events: correlate decisions across a workflow and emit events.