Reproducible RAG
rag external validate For: RAG & reproducibility
When the triage agent answers from a knowledge base, the answer is only as reproducible as the context behind it: which documents, which embedding model, which version of an upstream source. This guide makes a retrieval-augmented decision reconstructable — so “why did it say that?” has an answer.
pip install briefcase-ai[rag,external,validate]-
Version the embedding index
A
VersionedEmbeddingPipelinerecords which documents and model produced an index in an atomic manifest, so you can detect when it goes stale.from briefcase.rag import VersionedEmbeddingPipeline, Documentclass EmbeddingModel:def embed(self, texts):return [[0.1, 0.2, 0.3] for _ in texts]pipeline = VersionedEmbeddingPipeline(embedding_model=EmbeddingModel())documents = [Document(id="kb-1", content="Reset your password from settings.", metadata={"topic": "account"}),]batch = pipeline.create_embedding_batch(documents)manifest = pipeline.create_manifest("faq-index", [batch])print(manifest.index_name) -
Detect when the index is stale
When documents or the model change,
check_invalidationreports it — your cue to rebuild before serving.report = pipeline.check_invalidation("faq-index", documents)print(report.is_valid) # False once a document's content hash changes -
Snapshot the external data a decision read
Agents also read sources you do not control.
ExternalDataTrackerhashes each fetch, detects drift against the last snapshot, and appends corrections without mutating history.from briefcase.external import ExternalDataTracker, SnapshotPolicy, SnapshotFrequencytracker = ExternalDataTracker(default_policy=SnapshotPolicy(frequency=SnapshotFrequency.ON_CHANGE, retention_days=30),)result = tracker.track_api_call(api_name="product-catalog",endpoint="/products",method="GET",response_data={"items": [1, 2, 3]},record_count=3,)print(result["snapshot_id"], result["drift_detected"]) -
Validate references before the model runs
Before a prompt reaches a model, confirm its references actually resolve against a versioned knowledge base. You supply an extractor (finds references) and a resolver (checks them); the engine records the commit it validated against.
import refrom briefcase.validation import PromptValidationEnginefrom briefcase.validation.errors import ValidationError, ValidationErrorCodeclass RegexExtractor:_REF = re.compile(r"[\w/]+\.md")def extract(self, prompt: str) -> list:return self._REF.findall(prompt)class AllowlistResolver:def __init__(self, known): self._known = knowndef resolve_all(self, references):return [ValidationError(code=ValidationErrorCode.REFERENCE_NOT_FOUND,message=f"Reference not found: {ref}",reference=ref, severity="error", layer="resolution",remediation="Add the document to the knowledge base.",)for ref in references if ref not in self._known]class DemoLakeFS:def get_commit(self, repository: str, branch: str) -> str:return "demo0000000000000000000000000000000000000"engine = PromptValidationEngine(extractor=RegexExtractor(),resolver=AllowlistResolver({"kb/faq.md"}),lakefs_client=DemoLakeFS(),repository="knowledge-base",branch="main",mode="strict",)report = engine.validate("See kb/faq.md and kb/missing.md")print(report.status, report.references_checked, report.has_errors)