Coverage by attack category
Attempts and verdict distribution per category across every
recorded campaign. The Orchestrator reads this exact data to
prioritize the next campaign.
| Category | Attempts | Verdicts | Verdict mix | Partial rate |
|---|---|---|---|---|
| loading… | ||||
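
The per-category roll-up described above could be computed along these lines. This is a minimal sketch, not the platform's actual code: the record keys `category` and `verdict`, and the verdict label `partial`, are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def coverage_by_category(attempts):
    """Aggregate attempt records into per-category counts and verdict mix.

    `attempts` is an iterable of dicts with the hypothetical keys
    "category" and "verdict"; the real record schema may differ.
    """
    verdicts = defaultdict(Counter)
    for attempt in attempts:
        verdicts[attempt["category"]][attempt["verdict"]] += 1

    rows = []
    for category, counts in sorted(verdicts.items()):
        total = sum(counts.values())
        rows.append({
            "category": category,
            "attempts": total,
            "verdicts": dict(counts),
            # Share of attempts judged "partial" (assumed verdict label).
            "partial_rate": counts["partial"] / total,
        })
    return rows
```

Sorting by category keeps the output stable between refreshes, which makes changes in the verdict mix easier to spot.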
Live findings
Confirmed vulnerabilities with no critical/high-severity
flag — auto-promoted by the Documentation Agent or
human-authored. The W2 eval gate enforces a regression case
for each via the loaded JSON sidecar.
| ID | Severity | Category | Title | Status |
|---|---|---|---|---|
| loading… | ||||
Pending human approval
Critical/high-severity findings auto-gated by the
Documentation Agent. NOT enforced by the W2 eval gate
until a human reviewer moves them out of
_pending/.
The vuln-pipeline trust gate caught these — that's the
gate doing its job.
| ID | Severity | Category | Title | Status |
|---|---|---|---|---|
| loading… | ||||
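
The severity gate described above can be pictured as a routing decision on where a new finding is written. The sketch below assumes one JSON sidecar per finding, named after its ID; that file naming and the `destination_for` helper are hypothetical. Only the `_pending/` directory and the critical/high rule come from the description.

```python
from pathlib import Path

# Severities that are held for human approval, per the description above.
GATED_SEVERITIES = {"critical", "high"}


def destination_for(finding_id, severity, findings_dir):
    """Return the path where a new finding's sidecar would be written.

    The "<id>.json" naming is an assumption for illustration.
    """
    base = Path(findings_dir)
    if severity.lower() in GATED_SEVERITIES:
        # Gated findings wait in _pending/ until a reviewer moves them out.
        return base / "_pending" / f"{finding_id}.json"
    return base / f"{finding_id}.json"
```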
Recent campaigns
Newest first. Each row is one Red Team campaign plus the
Orchestrator's rationale for choosing it (when LLM-planned).
Verdict counts show how the target's defenses fared.
| Ran at | Category | Attempts | Verdicts | Orchestrator rationale |
|---|---|---|---|---|
| loading… | ||||
Attempts per day (last 14)
Bar height = total attempts that day. Color split = verdict
distribution. Quick sanity-check that the platform is
actually running campaigns over time.
| Day | Attempts | Verdicts | Bar |
|---|---|---|---|
| loading… | |||
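
A sketch of how the 14-day window might be bucketed, assuming each attempt has already been reduced to a `datetime.date`. The function name and signature are illustrative only.

```python
from collections import Counter
from datetime import date, timedelta


def attempts_per_day(attempt_dates, today, days=14):
    """Return (day, count) pairs for the last `days` days, oldest first.

    Days with no attempts are kept with a count of 0, so gaps in
    campaign activity stay visible.
    """
    counts = Counter(attempt_dates)
    window = [today - timedelta(days=offset) for offset in range(days - 1, -1, -1)]
    return [(day, counts[day]) for day in window]
```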
Orchestrator decisions
Each Orchestrator decision is itself a signal. The
used_live_signals column shows whether the decision
consulted the in-flight verdict-distribution shifts
(the live monitor's recent_shifts()) rather than only
a static disk snapshot. The shifts the Orchestrator
actually saw are listed in shifts_consulted.
| When | Category | Used live signals? | Shifts consulted | Rationale |
|---|---|---|---|---|
| no signal stream yet: run python -m redteam.run_campaign --orchestrate N to populate | ||||
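
One way to picture how a decision record could carry the two fields named above. This is a sketch under assumptions: the `Decision` class and `record_decision` helper are hypothetical, and `recent_shifts()` is assumed to take no arguments and return an iterable.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    category: str
    rationale: str
    used_live_signals: bool = False
    shifts_consulted: list = field(default_factory=list)


def record_decision(category, rationale, monitor=None):
    """Build a decision record, consulting the live monitor if one is given.

    `monitor` is any object with a recent_shifts() method (assumed
    signature). Without a monitor the decision is marked as based on
    static data only.
    """
    if monitor is None:
        return Decision(category, rationale)
    # Consulting the monitor counts as using live signals even when it
    # reports no shifts.
    return Decision(category, rationale, True, list(monitor.recent_shifts()))
```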
Judge drift
Mean LLM-Judge confidence per category. Deterministic-shortcut
verdicts at confidence 1.0 are excluded because they
would mask actual Haiku calibration drift. The drift
signals list below shows any category whose mean
confidence moved past the threshold across an
Orchestrator round; an empty list means Haiku's
calibration held steady within the run.
| Category | LLM-decided samples | Mean confidence | Verdict mix |
|---|---|---|---|
| no LLM verdicts yet | |||
| Detected at | Category | Direction | Detail |
|---|---|---|---|
| no drift signals in this run | |||
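
The drift check described above can be sketched as two steps: average the LLM-decided confidences per category, then compare the averages across a round. The threshold value, the record keys, and the choice to identify deterministic shortcuts by `confidence == 1.0` are all assumptions here; the real pipeline may mark them explicitly.

```python
from statistics import mean

DRIFT_THRESHOLD = 0.10  # hypothetical value; the real threshold is not stated


def mean_llm_confidence(verdicts):
    """Mean confidence per category, skipping deterministic shortcuts."""
    by_category = {}
    for verdict in verdicts:
        if verdict["confidence"] == 1.0:
            # Assumed marker for a deterministic-shortcut verdict.
            continue
        by_category.setdefault(verdict["category"], []).append(verdict["confidence"])
    return {category: mean(values) for category, values in by_category.items()}


def drift_signals(before, after, threshold=DRIFT_THRESHOLD):
    """List categories whose mean confidence moved by more than `threshold`."""
    signals = []
    for category in sorted(before.keys() & after.keys()):
        delta = after[category] - before[category]
        if abs(delta) > threshold:
            signals.append({
                "category": category,
                "direction": "up" if delta > 0 else "down",
                "delta": delta,
            })
    return signals
```

Only categories present on both sides of the round are compared, so a category with no LLM-decided samples in one round produces no signal rather than a spurious one.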
Replay-on-deploy
A deploy.fired signal triggers the ReplaySubscriber,
which fires every confirmed regression case
(from evals/w2/adversarial_findings/) against the
just-deployed target and reports per-case pass/fail.
Today the trigger is the POST /adversarial/admin/replay
endpoint; Railway webhook integration lands once the
W4 v4 wiring is in place. An empty state below means no
replay has been invoked yet on the current build.
| Trigger | Target | Cases | Result | Elapsed |
|---|---|---|---|---|
| no replay has been invoked yet | ||||
| Case | Vuln | Status | Passed | Rubrics |
|---|---|---|---|---|
| no per-case results | ||||
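
A sketch of how per-case results might be rolled up into the replay summary row. The `status` key and its values (`passed`, `failed`, `error`) are assumed labels; `failed_count` and `error_count` mirror the names used in the auto-promotion rule on this page.

```python
from collections import Counter


def summarize_replay(case_results):
    """Roll per-case results up into counts and a green/red flag.

    `case_results` is a list of dicts with a hypothetical "status" key.
    """
    counts = Counter(result["status"] for result in case_results)
    failed_count = counts["failed"]
    error_count = counts["error"]
    return {
        "cases": len(case_results),
        "passed": counts["passed"],
        "failed_count": failed_count,
        "error_count": error_count,
        # Green means no case failed and none errored.
        "green": failed_count + error_count == 0,
    }
```

Note that under this definition a replay with zero cases also counts as green; a real implementation may want to treat that as a separate state.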
Auto-promotions
Findings the AutoPromoteSubscriber moved from
_pending/ to live in response to a green
replay (failed_count + error_count == 0). Critical-severity
findings always stay pending to preserve the trust gate's
human-review requirement. Skipped rows below show what was
considered but bypassed and why.
| Promoted at | Vuln | Severity | Sidecar move | Markdown move |
|---|---|---|---|---|
| no auto-promotions in this run | ||||
| Skipped at | Vuln | Severity | Reason |
|---|---|---|---|
| no skipped promotions | |||
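
The promotion rule above reduces to two checks per pending finding. A minimal sketch, with a hypothetical function name and illustrative reason strings:

```python
def promotion_decision(severity, failed_count, error_count):
    """Return (promote, reason) for one pending finding after a replay."""
    if failed_count + error_count != 0:
        # Only a fully green replay qualifies.
        return False, "replay not green"
    if severity.lower() == "critical":
        # Critical findings always wait for a human reviewer.
        return False, "critical severity requires human review"
    return True, "green replay"
```

Returning the reason alongside the decision is what would let a skipped row explain why a finding was bypassed.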
Event timeline: tail of most recent run
Live stream of every event the campaign runner publishes
to the SignalBus —
campaign.started →
attempt.recorded →
verdict.delivered →
orchestrator.decided →
campaign.ended. Persisted to
signals.jsonl in the run directory; this
view tails the most recent. Auto-refreshes every 8s.
| When | Event | Category | Details |
|---|---|---|---|
| no events yet | |||
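
Tailing signals.jsonl can be sketched as reading a JSON Lines file and keeping only the last few records. The `limit` default and the tolerance for a half-written final line are assumptions about how the view might behave, not a description of the actual implementation.

```python
import json
from collections import deque
from pathlib import Path


def tail_signals(path, limit=50):
    """Return the last `limit` events from a JSON Lines file, oldest first."""
    events = deque(maxlen=limit)
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                # The runner may still be appending; skip a partial line.
                continue
    return list(events)
```

A bounded `deque` keeps memory use constant regardless of how long the run directory's log has grown.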