AgentForge — Adversarial Platform

Live state of the W3 multi-agent adversarial AI security platform. Read-only operator view of coverage, the vuln pipeline, and recent campaigns. Companion to /visibility (the W2 Co-Pilot operator view).

Coverage by attack category

Attempts and verdict distribution per category across every recorded campaign. The Orchestrator reads this exact data to prioritize the next campaign.
Category Attempts Verdicts Verdict mix Partial rate
loading…
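
The per-category columns above could be computed roughly as follows. This is a minimal sketch, assuming each recorded attempt is a dict with hypothetical `category` and `verdict` keys and that `"partial"` is one of the verdict labels; the platform's real record schema and verdict names are not shown on this page.

```python
from collections import Counter, defaultdict


def coverage_by_category(attempts):
    """Aggregate attempts into per-category counts and a partial rate.

    `attempts` is an iterable of dicts with the assumed keys
    "category" and "verdict" (illustrative schema only).
    """
    verdicts = defaultdict(Counter)
    for attempt in attempts:
        verdicts[attempt["category"]][attempt["verdict"]] += 1

    rows = {}
    for category, mix in verdicts.items():
        total = sum(mix.values())
        rows[category] = {
            "attempts": total,
            "verdict_mix": dict(mix),
            # Share of attempts judged "partial" (assumed verdict label).
            "partial_rate": mix["partial"] / total,
        }
    return rows
```

A consumer such as the Orchestrator could then rank categories by `attempts` or `partial_rate` when choosing the next campaign; the ranking rule itself is not specified here.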

Live findings

Confirmed vulnerabilities with no critical/high-severity flag — auto-promoted by the Documentation Agent or human-authored. The W2 eval gate enforces a regression case for each via the loaded JSON sidecar.
ID Severity Category Title Status
loading…

Pending human approval

Critical/high-severity findings auto-gated by the Documentation Agent. NOT enforced by the W2 eval gate until a human reviewer moves them out of _pending/. The vuln-pipeline trust gate caught these — that's the gate doing its job.
ID Severity Category Title Status
loading…
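
The split between live and pending findings can be pictured as a severity check. The sketch below assumes lowercase severity labels and uses `"live"` as a placeholder name for the enforced location; only `_pending/` is named on this page, and `route_finding` is a hypothetical helper, not the Documentation Agent's actual code.

```python
# Severities that require human approval before enforcement,
# per the description above.
GATED_SEVERITIES = {"critical", "high"}


def route_finding(severity: str) -> str:
    """Return "_pending" for gated findings, else "live" (placeholder name)."""
    normalized = severity.strip().lower()
    return "_pending" if normalized in GATED_SEVERITIES else "live"
```

For example, `route_finding("High")` yields `"_pending"`, while a medium-severity finding would be routed straight to the enforced set.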

Recent campaigns

Newest first. Each row is one Red Team campaign + the Orchestrator's rationale for choosing it (when LLM-planned). Verdict counts show how the target's defenses fared.
Ran at Category Attempts Verdicts Orchestrator rationale
loading…

Attempts per day (last 14)

Bar height = total attempts that day. Color split = verdict distribution. Quick sanity-check that the platform is actually running campaigns over time.
Day Attempts Verdicts Bar
loading…

Orchestrator decisions

Each Orchestrator decision is itself a signal. The used_live_signals column shows whether the decision consulted the in-flight verdict-distribution shifts (the live monitor's recent_shifts()) rather than only a static disk snapshot. The shifts the Orchestrator actually saw are listed in shifts_consulted.
When Category Used live signals? Shifts consulted Rationale
no signal stream yet — run python -m redteam.run_campaign --orchestrate N to populate
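
One way to picture the decision record described above is sketched here. Only `recent_shifts()`, `used_live_signals` and `shifts_consulted` come from this page; the `OrchestratorDecision` class, the `StubMonitor` stand-in, the `decide` helper, and the rule that the flag is true whenever at least one shift was seen are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class OrchestratorDecision:
    category: str
    rationale: str
    shifts_consulted: list = field(default_factory=list)

    @property
    def used_live_signals(self) -> bool:
        # Assumption: the flag is true when at least one live shift was seen.
        return bool(self.shifts_consulted)


class StubMonitor:
    """Stand-in for the live monitor; only recent_shifts() is named in the text."""

    def __init__(self, shifts):
        self._shifts = list(shifts)

    def recent_shifts(self):
        return list(self._shifts)


def decide(monitor, category, rationale):
    """Record a decision together with the shifts the monitor reported."""
    return OrchestratorDecision(category, rationale, monitor.recent_shifts())
```

Storing the consulted shifts alongside the decision keeps the record auditable: a reader can check the flag against the evidence rather than trusting it.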

Judge drift

Mean LLM-Judge confidence per category. Deterministic-shortcut verdicts at confidence 1.0 are excluded, since they would mask actual Haiku calibration drift. The drift signals list below shows any category whose mean confidence moved past the threshold across an Orchestrator round; an empty list means Haiku's calibration held steady within the run.
Category LLM-decided samples Mean confidence Verdict mix
no LLM verdicts yet
Detected at Category Direction Detail
no drift signals in this run
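
The drift check described above can be sketched as two steps: average only the LLM-decided confidences, then compare rounds. The verdict keys (`category`, `confidence`, `deterministic`), the function names, and the `0.1` default threshold are placeholders; the platform's actual threshold and schema are not stated on this page.

```python
from statistics import mean


def mean_llm_confidence(verdicts):
    """Mean confidence per category over LLM-decided verdicts only.

    Each verdict is a dict with the assumed keys "category",
    "confidence" and "deterministic" (illustrative schema).
    """
    by_category = {}
    for verdict in verdicts:
        if verdict["deterministic"]:
            # Deterministic-shortcut verdicts (confidence 1.0) are skipped.
            continue
        by_category.setdefault(verdict["category"], []).append(verdict["confidence"])
    return {cat: mean(values) for cat, values in by_category.items()}


def drift_signals(before, after, threshold=0.1):
    """List categories whose mean confidence moved past the threshold."""
    signals = []
    for category in sorted(before.keys() & after.keys()):
        delta = after[category] - before[category]
        if abs(delta) > threshold:
            signals.append({
                "category": category,
                "direction": "up" if delta > 0 else "down",
                "delta": delta,
            })
    return signals
```

Comparing only categories present in both rounds avoids flagging a category simply because it had no LLM-decided samples in one of them.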

Replay-on-deploy

A deploy.fired signal triggers the ReplaySubscriber, which fires every confirmed regression case (from evals/w2/adversarial_findings/) against the just-deployed target and reports per-case pass/fail. Today the trigger is the POST /adversarial/admin/replay endpoint; Railway webhook integration lands once the W4 v4 wiring is in place. An empty state below means no replay has been invoked yet on the current build.
Trigger Target Cases Result Elapsed
no replay has been invoked yet
Case Vuln Status Passed Rubrics
no per-case results

Auto-promotions

Findings the AutoPromoteSubscriber moved from _pending/ to live in response to a green replay (failed_count + error_count == 0). Critical-severity findings always stay pending to preserve the trust gate's human-review requirement. Skipped rows below show what was considered but bypassed and why.
Promoted at Vuln Severity Sidecar move Markdown move
no auto-promotions in this run
Skipped at Vuln Severity Reason
no skipped promotions
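
The promotion rule described above reduces to two checks. The sketch below takes the green-replay condition (`failed_count + error_count == 0`) and the critical-severity exception from the description; the function name and the reason strings are hypothetical and do not reproduce the AutoPromoteSubscriber's actual output.

```python
def promotion_decision(severity, failed_count, error_count):
    """Return (promote, reason) for a pending finding after a replay."""
    if failed_count + error_count != 0:
        # The replay was not green, so nothing is promoted.
        return False, "replay not green"
    if severity.strip().lower() == "critical":
        # Critical findings always wait for a human reviewer.
        return False, "critical severity requires human review"
    return True, "green replay"
```

A skipped row would correspond to a `False` result, with the reason recorded alongside it.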

Event timeline (tail of most-recent run)

Live stream of every event the campaign runner publishes to the SignalBus: campaign.started, attempt.recorded, verdict.delivered, orchestrator.decided, campaign.ended. Persisted to signals.jsonl in the run directory; this view tails the most recent run. Auto-refreshes every 8s.
When Event Category Details
no events yet
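
Reading the tail of signals.jsonl outside this view could look like the sketch below. It assumes the file holds one JSON object per line; `tail_events` is a hypothetical helper, and the event field names are not specified on this page.

```python
import json
from collections import deque
from pathlib import Path


def tail_events(path, n=20):
    """Return the last n events from a JSON Lines file, oldest first."""
    with Path(path).open(encoding="utf-8") as handle:
        # A bounded deque keeps only the most recent n non-blank lines.
        lines = deque((line for line in handle if line.strip()), maxlen=n)
    return [json.loads(line) for line in lines]
```

Calling this on a timer would approximate the auto-refresh behaviour without loading the whole file into memory.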