Multi-Step Evidence

This is the contract behind Mantis's claim of "evidence, not alerts": three adversarial rounds, and a finding ships only if every round produced a fresh, reproducible PoC.

Why three rounds, not one

A single verifier has a bias. Set it to "be skeptical" and it kills real findings. Set it to "be permissive" and hallucinated findings get through. Three rounds with deliberately different postures triangulate the truth:

Round 1 (brutalist) over-corrects toward false negatives. It rejects everything it can.
Round 2 (balanced) looks specifically at what the brutalist killed, and resurrects findings the brutalist over-rejected.
Round 3 (final) does fresh HTTP requests with fresh context on only the survivors. Same evidence on a fresh canvas = ship it.

The contract

A finding earns SUBMIT in the grader only if:

The final-round verification result has reportable: true.
The PoC field includes both request and response evidence.
The severity from the final round matches or exceeds the severity originally recorded.

The grader has a pre-grading evidence gate that caps proof_quality at 5 if any of these conditions is not met. With proof_quality capped that low, reaching a total of 40 or more (the SUBMIT threshold) is practically impossible without real evidence.
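The three conditions and the evidence gate can be sketched as below. This is a minimal illustration, not the grader's actual code: the field names poc, request, response, and severity, the severity ordering, and both function names are assumptions; only reportable, proof_quality, the cap of 5, and the threshold of 40 come from the text above.

```python
from typing import Any

# Assumed severity ordering; the source does not list the levels.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]

SUBMIT_THRESHOLD = 40   # from the text: total >= 40 earns SUBMIT
PROOF_QUALITY_CAP = 5   # from the text: the gate holds proof_quality at <= 5


def missing_evidence(finding: dict[str, Any], final: dict[str, Any]) -> list[str]:
    """Return the contract conditions this finding fails (empty list = passes)."""
    problems = []
    if final.get("reportable") is not True:
        problems.append("final round is not reportable")
    poc = final.get("poc") or {}  # hypothetical field layout
    if not poc.get("request") or not poc.get("response"):
        problems.append("PoC lacks request or response evidence")
    original = SEVERITY_ORDER.index(finding["severity"])
    confirmed = SEVERITY_ORDER.index(final["severity"])
    if confirmed < original:
        problems.append("final severity is below the original recording")
    return problems


def gate_proof_quality(proof_quality: int, problems: list[str]) -> int:
    """Cap proof_quality when any contract condition is missing."""
    return min(proof_quality, PROOF_QUALITY_CAP) if problems else proof_quality
```

Returning the list of failed conditions, rather than a bare boolean, keeps the reason for a capped score visible to whoever inspects the grade.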

The three rounds in detail

Round 1: brutalist

Posture: "This isn't real, prove me wrong."

The brutalist re-runs every PoC with fresh HTTP requests. It validates:

Does the exact request produce the exact response the hunter recorded?
Is the response evidence specific enough to constitute proof?
Could this be a false positive (CDN cache, debug page, public data)?

Output: brutalist.json with { finding_id, disposition, reportable, reasoning } for each finding.
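As an illustration of that record shape, here is a sketch of one entry serialized to JSON. The four field names come from the text above; the class name, the ID format, the disposition value "rejected", and the choice of a top-level list are assumptions made for the example.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class RoundVerdict:
    finding_id: str
    disposition: str   # the real set of allowed values is not documented here
    reportable: bool
    reasoning: str


verdict = RoundVerdict(
    finding_id="F-001",       # hypothetical ID format
    disposition="rejected",   # hypothetical value
    reportable=False,
    reasoning="Response is a generic 500 page, not a database error.",
)

# Assumes the file holds a list of entries, one per finding.
print(json.dumps([asdict(verdict)], indent=2))
```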

Round 2: balanced

Posture: "Did the brutalist over-reject anything real?"

Reads the brutalist's output, looks at the original finding, and decides whether to resurrect over-rejected findings. Catches the systematic-skepticism failure mode of round 1.

The balanced verifier must include every finding the brutalist evaluated, even if just to pass through unchanged. The MCP mantis_write_verification_round tool enforces this server-side: a round-2 write that drops findings from round 1 hard-fails.
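The coverage rule amounts to a set comparison on finding IDs. The sketch below shows the idea only; it is not the server's implementation, and the function names and the list-of-dicts input shape are assumptions.

```python
def dropped_findings(round1: list[dict], round2: list[dict]) -> set[str]:
    """IDs the brutalist evaluated that are absent from the balanced round."""
    seen_r1 = {entry["finding_id"] for entry in round1}
    seen_r2 = {entry["finding_id"] for entry in round2}
    return seen_r1 - seen_r2


def check_round2_coverage(round1: list[dict], round2: list[dict]) -> None:
    """Raise if round 2 omits any finding that round 1 evaluated."""
    missing = dropped_findings(round1, round2)
    if missing:
        raise ValueError(f"round 2 dropped findings from round 1: {sorted(missing)}")
```

A pass-through entry in round 2 satisfies the check, so an unchanged verdict still has to be written explicitly.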

Round 3: final

Posture: "Fresh context, fresh requests. Same answer?"

Only re-runs the findings that survived round 2 with reportable: true. Fresh HTTP requests, fresh context window. If round 3 confirms, the finding is eligible for grading.
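Selecting the round-3 workload is a filter over the round-2 output. A minimal sketch, assuming the balanced round is available as a list of entries with the finding_id and reportable fields described earlier; the function name is hypothetical.

```python
def round3_targets(balanced: list[dict]) -> list[str]:
    """Finding IDs that round 2 left with reportable: true."""
    return [
        entry["finding_id"]
        for entry in balanced
        if entry.get("reportable") is True
    ]


balanced = [
    {"finding_id": "F-001", "reportable": False},
    {"finding_id": "F-002", "reportable": True},
]
targets = round3_targets(balanced)  # only F-002 is re-run
```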

How evidence is stored

Every round writes to its own JSON in the session directory:

~/mantis-sessions/<domain>/
├── brutalist.json
├── balanced.json
└── verified-final.json

Plus markdown mirrors (brutalist.md, etc.) for human inspection. The MCP server owns the JSONs; markdown is debug-only.
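For read-only inspection, the three JSON files can be loaded with the standard library. This sketch assumes the layout shown above; the helper names are hypothetical, and since the MCP server owns the JSONs, the code reads them and never writes.

```python
import json
from pathlib import Path

# File names taken from the directory listing above.
ROUND_FILES = {
    "brutalist": "brutalist.json",
    "balanced": "balanced.json",
    "final": "verified-final.json",
}


def session_dir(domain: str, root: Path = Path.home() / "mantis-sessions") -> Path:
    """Path of the session directory for one target domain."""
    return root / domain


def load_rounds(directory: Path) -> dict[str, object]:
    """Load whichever round files exist; rounds not yet written map to None."""
    rounds: dict[str, object] = {}
    for name, filename in ROUND_FILES.items():
        path = directory / filename
        rounds[name] = json.loads(path.read_text()) if path.exists() else None
    return rounds
```

Mapping a missing file to None lets a caller tell how far a session has progressed without treating an unfinished run as an error.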

What this catches

Hallucinated findings: the hunter "found" SQLi but the response was actually a generic 500 page. Brutalist kills it.
Severity inflation: the hunter claimed critical for an info leak that's actually public data. Balanced corrects the severity.
Flaky timing: the response that worked at HUNT time may have changed since. The final round re-runs the request; the finding ships only if it still works.
WAF-cached responses: a cached 200 looks like a successful injection. Fresh request reveals the real 403.