AgentForge — System Visibility

Supervisor → worker graph

                  ┌──────────────┐
 user message  →  │  supervisor  │ ← hops counter (cap: 5)
                  └──┬────┬───┬──┘
        attachment?  │    │   │  guideline-trigger token? (and no evidence yet)
         ┌───────────┘    │   └─────────────┐
         ▼                │                 ▼
  ┌──────────────┐        │       ┌────────────────────┐
  │   intake_    │        │       │ evidence_retriever │
  │  extractor   │        │       │ (BM25+dense+rerank)│
  │   (vision)   │        │       └─────────┬──────────┘
  └──────┬───────┘        │                 │
         ▼                │                 ▼
   supervisor (re-route)  │       supervisor (re-route)
                          │
                          ▼ default
                  ┌──────────────┐
                  │  answer node │ ← W1 orchestrator (verifier + retry)
                  └──────────────┘
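A minimal Python sketch of deterministic supervisor routing, assuming the rules are checked in the order the graph suggests: hop cap, then attachment, then guideline trigger with no evidence yet, then the default answer route. The trigger vocabulary, the `attachment_extracted` flag, and forcing the answer node at the hop cap are all hypothetical. The worker nodes are stubs that only flip the flags the supervisor checks when it re-routes.

```python
from dataclasses import dataclass

# Hypothetical trigger vocabulary; the real guideline-trigger tokens are not documented here.
GUIDELINE_TRIGGERS = {"guideline", "screening", "recommend", "uspstf", "acip"}
HOP_CAP = 5  # the "hops counter (cap: 5)" from the graph


@dataclass
class TurnState:
    message: str
    hops: int = 0
    has_attachment: bool = False
    attachment_extracted: bool = False  # assumption: set once intake_extractor has run
    has_evidence: bool = False


def route(state: TurnState) -> str:
    """Pick the next node; rules are evaluated top to bottom, first match wins."""
    if state.hops >= HOP_CAP:
        return "answer"  # assumption: reaching the cap forces the answer node
    if state.has_attachment and not state.attachment_extracted:
        return "intake_extractor"
    tokens = {t.strip(".,;:?!").lower() for t in state.message.split()}
    if tokens & GUIDELINE_TRIGGERS and not state.has_evidence:
        return "evidence_retriever"
    return "answer"  # default route


def run_turn(state: TurnState) -> list:
    """Loop through the supervisor until it routes to the answer node."""
    trace = []
    while True:
        decision = route(state)
        trace.append(decision)
        if decision == "answer":
            return trace
        state.hops += 1
        # Stub workers: each only sets the flag the supervisor checks on re-route.
        if decision == "intake_extractor":
            state.attachment_extracted = True
        else:
            state.has_evidence = True


trace = run_turn(TurnState("What does USPSTF recommend for screening?", has_attachment=True))
print(trace)  # ['intake_extractor', 'evidence_retriever', 'answer']
```

Checking the hop cap first means no combination of flags can keep the loop running past five hops, even if a worker fails to set its flag.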

Routing rules (deterministic, in evaluation order)

Rule	Test	Decision	Rationale

Recent supervisor decisions

Held in an in-memory ring buffer (the last 20 decisions); cleared when the agent restarts.
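A bounded `collections.deque` is one simple way to get this behavior in Python. The sketch below assumes one record per supervisor decision with fields that mirror the table columns. The 60-character preview length and the newest-first ordering are assumptions. Because nothing is persisted, a process restart naturally starts with an empty log.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Decision:
    # Fields mirror the decision table's columns
    when: datetime
    decision: str
    hops: int
    has_attachment: bool
    has_evidence: bool
    message_preview: str


class DecisionLog:
    """Process-local log; nothing is persisted, so a restart starts it empty."""

    def __init__(self, capacity: int = 20, preview_chars: int = 60) -> None:
        # deque(maxlen=...) silently evicts the oldest entry once full
        self._buf: deque[Decision] = deque(maxlen=capacity)
        self._preview_chars = preview_chars

    def record(self, decision: str, hops: int, has_attachment: bool,
               has_evidence: bool, message: str) -> None:
        self._buf.append(Decision(
            when=datetime.now(timezone.utc),
            decision=decision,
            hops=hops,
            has_attachment=has_attachment,
            has_evidence=has_evidence,
            message_preview=message[: self._preview_chars],
        ))

    def recent(self) -> list[Decision]:
        """Newest first."""
        return list(reversed(self._buf))


log = DecisionLog()
for i in range(25):
    log.record("answer", 0, False, False, f"question {i}")
print(len(log.recent()), log.recent()[0].message_preview)  # 20 question 24
```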

When	Decision	Hops	Has attachment	Has evidence	Message preview

Live retrieval inspector

Type any clinical question to see what the hybrid retriever returns BEFORE the LLM sees it. This is the same retriever the agent calls on every chat turn.

Architecture

BM25 over the corpus text: the keyword-recall layer. Dependency-free and in-memory.
Voyage voyage-3 dense embeddings: the semantic-recall layer. Optional; if VOYAGE_API_KEY is unset, retrieval degrades gracefully to BM25 only.
Cohere Rerank 3 over the union of BM25 and dense candidates: the final precision layer. Optional; without it, the retriever fuses the upstream rankings with reciprocal rank fusion.
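The three layers and their fallbacks can be sketched in plain Python. This is a minimal illustration, not the agent's retriever. It uses Okapi BM25 with Lucene-style IDF and the conventional RRF constant k = 60; whether the real system uses those parameters is an assumption. The dense ranker and reranker are hypothetical callables standing in for Voyage and Cohere clients, whose actual APIs are not shown. The documents are toy strings, not corpus chunks.

```python
import math
import os
import re
from collections import Counter
from typing import Callable, Optional

# Hypothetical plug-in signatures: a dense ranker returns doc indices best-first,
# a reranker reorders a candidate pool. Real Voyage/Cohere client calls are not shown.
DenseRanker = Callable[[str, int], list]
Reranker = Callable[[str, list], list]


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 over an in-memory list of chunk texts (no external dependencies)."""

    def __init__(self, docs: list, k1: float = 1.5, b: float = 0.75) -> None:
        self.docs = [tokenize(d) for d in docs]
        self.tfs = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / max(len(self.docs), 1)
        self.k1, self.b = k1, b
        n = len(self.docs)
        df = Counter(t for d in self.docs for t in set(d))
        # Lucene-style IDF; the "1 +" keeps weights positive for very common terms
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], len(self.docs[i])
        total = 0.0
        for t in tokenize(query):
            f = tf.get(t, 0)
            if f:
                norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf[t] * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query: str, k: int) -> list:
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        scored.sort(key=lambda pair: -pair[0])  # stable: ties keep corpus order
        return [i for s, i in scored if s > 0][:k]


def rrf(rankings: list, k: int = 60) -> list:
    """Reciprocal rank fusion: each doc scores sum(1 / (k + rank)) over the rankings."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc: -scores[doc])


def dense_available() -> bool:
    # Mirrors the documented fallback: no VOYAGE_API_KEY means BM25-only recall.
    return bool(os.environ.get("VOYAGE_API_KEY"))


def hybrid_search(query: str, bm25: BM25, dense: Optional[DenseRanker] = None,
                  rerank: Optional[Reranker] = None, k: int = 5) -> list:
    rankings = [bm25.search(query, 50)]
    if dense is not None:
        rankings.append(dense(query, 50))
    if rerank is not None:
        pool = sorted({doc for ranking in rankings for doc in ranking})  # BM25 ∪ dense
        return rerank(query, pool)[:k]
    return rrf(rankings)[:k]  # no reranker: fuse the upstream rankings instead


docs = [
    "Diabetes screening in adults with overweight or obesity.",
    "Influenza vaccination schedule for adults and children.",
    "Statin use for primary prevention of cardiovascular disease.",
]
index = BM25(docs)
dense = None  # a Voyage-backed ranker would be plugged in here when dense_available()
print(hybrid_search("diabetes screening", index, dense=dense))  # [0]
```

RRF fuses ranks rather than raw scores, which is why it works as a fallback: BM25 scores and cosine similarities live on different scales, but rank positions are directly comparable.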

Per-category coverage

Category	Cases	Target	Baseline

Per-rubric baseline rate

Rubric	Pass rate
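The page does not say how a rubric's pass rate is computed. A natural reading is passes divided by the number of cases scored on that rubric. Here is a minimal sketch under that assumption, with hypothetical rubric names.

```python
from collections import defaultdict


def pass_rates(results: list) -> dict:
    """Per-rubric pass rate from eval results.

    results: one dict per eval case, mapping rubric name -> True (pass) / False (fail).
    A rubric that does not apply to a case is simply absent from that case's dict,
    so it does not count toward that rubric's denominator.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for case in results:
        for rubric, ok in case.items():
            total[rubric] += 1
            passed[rubric] += int(ok)
    return {rubric: passed[rubric] / total[rubric] for rubric in sorted(total)}


# Hypothetical rubric names, for illustration only.
results = [
    {"cites_source": True, "safe_dosing": True},
    {"cites_source": False, "safe_dosing": True},
    {"cites_source": True},
]
print(pass_rates(results))  # {'cites_source': 0.6666666666666666, 'safe_dosing': 1.0}
```

Excluding non-applicable cases from the denominator keeps a rubric that only fires on a few case types from looking artificially strong or weak.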

Clinical guideline corpus

Hand-curated. chunks across USPSTF, ADA, ACIP, ACC/AHA, CDC.

Source	Year	Title	Chunk ID

Selected chunk

Click a row above to inspect the full text.
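One way to back the corpus table and the selected-chunk panel is a flat record per chunk plus a lookup by ID. The sketch below assumes that shape. The `Chunk` fields follow the table columns plus the full text, and the IDs and titles are hypothetical placeholders, not real corpus entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    source: str  # issuing body, e.g. "USPSTF", "ADA", "ACIP", "ACC/AHA", "CDC"
    year: int
    title: str
    text: str  # full chunk text, shown only for the selected row


def index_by_id(chunks: list) -> dict:
    """Map chunk_id -> Chunk, refusing duplicate IDs so a row click is unambiguous."""
    index = {}
    for chunk in chunks:
        if chunk.chunk_id in index:
            raise ValueError(f"duplicate chunk id: {chunk.chunk_id}")
        index[chunk.chunk_id] = chunk
    return index


def table_rows(chunks: list) -> list:
    """One (Source, Year, Title, Chunk ID) tuple per chunk, in column order."""
    return [(c.source, c.year, c.title, c.chunk_id) for c in chunks]


# Toy records with hypothetical IDs and titles.
corpus = [
    Chunk("demo-001", "CDC", 2020, "Example title A", "Full text of chunk A."),
    Chunk("demo-002", "ADA", 2024, "Example title B", "Full text of chunk B."),
]
by_id = index_by_id(corpus)
print(by_id["demo-002"].text)  # Full text of chunk B.
```

Keeping the full text out of the table rows and fetching it by ID only on selection keeps the listing light even when chunks are long.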