Supervisor → worker graph
                      ┌──────────────┐
       user message → │  supervisor  │ ← hops counter (cap: 5)
                      └──┬────┬───┬──┘
                         │    │   │   guideline-trigger token?
    attachment?          │    │   │   (and no evidence yet)
         ┌───────────────┘    │   └───────────────┐
         ▼                    │                   ▼
  ┌──────────────┐            │         ┌────────────────────┐
  │ intake_      │            │         │ evidence_retriever │
  │ extractor    │            │         │ (BM25+dense+rerank)│
  │ (vision)     │            │         └─────────┬──────────┘
  └──────┬───────┘            │                   │
         │                    │                   │
         └────────→ supervisor (re-route) ←───────┘
                              │
                              ▼ default
                      ┌──────────────┐
                      │ answer node  │ ← W1 orchestrator (verifier + retry)
                      └──────────────┘
Routing rules (deterministic, in evaluation order)
| Rule | Test | Decision | Rationale |
|---|---|---|---|
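The rule rows are not shown here, so the following is only a minimal sketch of how a deterministic, ordered router consistent with the graph above might look. The `TurnState` fields, the `attachment_extracted` flag, the rule order, and the choice to fall through to the answer node once the hop cap is reached are all assumptions made for illustration, not the agent's actual rules.

```python
from dataclasses import dataclass

MAX_HOPS = 5  # hop cap shown on the supervisor in the graph


@dataclass
class TurnState:
    """Hypothetical per-turn state; field names are illustrative only."""
    has_attachment: bool = False
    attachment_extracted: bool = False  # assumed flag to avoid re-extracting
    has_guideline_trigger: bool = False
    has_evidence: bool = False
    hops: int = 0


def route(state: TurnState) -> str:
    """Return the next node name; rules are tested in a fixed order."""
    # Assumption: once the hop cap is hit, stop re-routing and answer.
    if state.hops >= MAX_HOPS:
        return "answer"
    # Attachment present and not yet processed: send to the vision worker.
    if state.has_attachment and not state.attachment_extracted:
        return "intake_extractor"
    # Guideline-trigger token seen and no evidence yet: retrieve first.
    if state.has_guideline_trigger and not state.has_evidence:
        return "evidence_retriever"
    # Default branch.
    return "answer"
```

Because each rule is a plain boolean test evaluated in order, the same state always yields the same decision, which is what makes the decision log reproducible.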
Recent supervisor decisions
In-memory ring buffer (last 20). Cleared on agent restart.
| When | Decision | Hops | Has attachment | Has evidence | Message preview |
|---|---|---|---|---|---|
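A fixed-size in-memory ring buffer like the one described can be sketched with `collections.deque` and `maxlen=20`. The `Decision` record, the `record` helper, and the 80-character preview length are hypothetical names and values chosen to mirror the table columns; the real agent may store decisions differently.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Decision:
    """Hypothetical log row mirroring the table columns."""
    when: datetime
    decision: str
    hops: int
    has_attachment: bool
    has_evidence: bool
    message_preview: str


# A deque with maxlen drops the oldest entry automatically when full.
# It lives in process memory only, so a restart starts from empty.
RECENT: deque = deque(maxlen=20)


def record(decision: str, hops: int, has_attachment: bool,
           has_evidence: bool, message: str, preview_len: int = 80) -> None:
    """Append one supervisor decision, truncating the message to a preview."""
    RECENT.append(Decision(
        when=datetime.now(timezone.utc),
        decision=decision,
        hops=hops,
        has_attachment=has_attachment,
        has_evidence=has_evidence,
        message_preview=message[:preview_len],
    ))
```

After 25 calls to `record`, the buffer holds only the 20 most recent decisions.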
Live retrieval inspector
Type any clinical question and see what the hybrid retriever returns BEFORE the LLM sees it. Same retriever the agent uses on every chat turn.
Architecture
- BM25 over the corpus text — keyword recall layer, dependency-free, in-memory.
- Voyage voyage-3 for dense embeddings — semantic recall layer. Optional; degrades gracefully to BM25-only if VOYAGE_API_KEY is unset.
- Cohere Rerank 3 over the BM25 ∪ dense union — final precision layer. Optional; if missing, the retriever fuses the upstream rankings via reciprocal rank fusion (RRF).
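The reciprocal-rank fallback mentioned in the last bullet can be illustrated with a small sketch. It assumes each upstream layer returns an ordered list of chunk IDs, and it uses the commonly cited constant `k = 60`; the function name, the tie-breaking rule, and the example IDs are hypothetical, and the retriever's actual fusion code may differ.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists that contain it,
    with ranks starting at 1. k=60 is a conventional default (an assumption).
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical chunk IDs from the two recall layers.
bm25_hits = ["a", "b", "c"]
dense_hits = ["b", "d", "a"]
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))  # ['b', 'a', 'd', 'c']
```

Only ranks enter the formula, so BM25 scores and embedding similarities never need to be put on a common scale. In a BM25-only setup there is a single list, and the fused order is simply that list.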
Per-category coverage
| Category | Cases | Target | Baseline |
|---|---|---|---|
Per-rubric baseline rate
| Rubric | Pass rate |
|---|---|
Clinical guideline corpus
Hand-curated chunks across USPSTF, ADA, ACIP, ACC/AHA, and CDC.
| Source | Year | Title | Chunk ID |
|---|---|---|---|
Selected chunk
Click a row above to inspect the full text.