Multi-Step Evidence
The contract behind Mantis's "evidence, not alerts" promise. There are three adversarial rounds, and a finding ships only if it survives all three and the final round reproduces its PoC with fresh requests.
Why three rounds, not one
A single verifier has a bias. Set it to "be skeptical" and it kills real findings. Set it to "be permissive" and hallucinated findings get through. Three rounds with deliberately different postures triangulate the truth:
- Round 1 (brutalist) deliberately errs toward false negatives: it rejects everything it can.
- Round 2 (balanced) looks specifically at what the brutalist killed and resurrects findings that were over-rejected.
- Round 3 (final) re-runs only the survivors, with fresh HTTP requests and a fresh context. If the same evidence holds up on a fresh canvas, ship it.
The contract
A finding earns SUBMIT in the grader only if:
- The final-round verification result has reportable: true.
- The PoC field includes both request and response evidence.
- The severity from the final round matches or exceeds the originally recorded severity.
The grader has a pre-grading evidence gate that caps proof_quality at 5 if any of these conditions is not met. With proof_quality capped, reaching the SUBMIT threshold (a total of at least 40) is practically impossible without real evidence.
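The contract and the gate can be sketched as two small functions. This is a minimal sketch under stated assumptions: only `proof_quality`, the cap of 5, and the threshold of 40 come from the text. The five-level severity scale, the extra score dimensions (`impact`, `clarity`), and the names `FinalResult`, `passes_evidence_gate`, and `earns_submit` are hypothetical, not the grader's actual API.

```python
from dataclasses import dataclass

# Assumed severity scale; the real grader's labels may differ.
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}

SUBMIT_THRESHOLD = 40   # from the text: a total of at least 40 earns SUBMIT
PROOF_QUALITY_CAP = 5   # from the text: the gate drops proof_quality to at most 5


@dataclass
class FinalResult:
    """Hypothetical shape of a final-round verification result."""
    reportable: bool
    poc_request: str
    poc_response: str
    severity: str


def passes_evidence_gate(result: FinalResult, original_severity: str) -> bool:
    """True only if all three contract conditions hold."""
    has_evidence = bool(result.poc_request.strip()) and bool(result.poc_response.strip())
    severity_held = SEVERITY_RANK[result.severity] >= SEVERITY_RANK[original_severity]
    return result.reportable and has_evidence and severity_held


def earns_submit(scores: dict[str, int], result: FinalResult, original_severity: str) -> bool:
    """Apply the pre-grading gate, then compare the total to the SUBMIT threshold."""
    gated = dict(scores)  # copy so the caller's scores are not mutated
    if not passes_evidence_gate(result, original_severity):
        gated["proof_quality"] = min(gated.get("proof_quality", 0), PROOF_QUALITY_CAP)
    return sum(gated.values()) >= SUBMIT_THRESHOLD
```

With made-up scores of proof_quality 20, impact 15, and clarity 10, a finding that passes the gate totals 45 and earns SUBMIT; the same finding with reportable: false is capped to 5 + 15 + 10 = 30 and falls short.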
The three rounds in detail
Round 1: brutalist
Posture: "This isn't real, prove me wrong."
The brutalist re-runs every PoC with fresh HTTP requests and checks:
- Does the exact request produce the exact response the hunter recorded?
- Is the response evidence specific enough to constitute proof?
- Could this be a false positive (CDN cache, debug page, public data)?
Output: brutalist.json, with one { finding_id, disposition, reportable, reasoning } record per finding.
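The shape of the brutalist's checks can be sketched as a comparison between the hunter's recorded response and a fresh replay, producing one brutalist.json record. This is a hypothetical sketch, not Mantis's implementation: the real brutalist is a verifier with judgment, not a string matcher. `Response`, `looks_cached`, `brutalist_review`, the `marker` parameter, and the "confirmed"/"rejected" disposition values are all assumptions. The cache heuristic covers only the CDN-cache case; debug pages and public data are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    """Minimal stand-in for an HTTP response (hypothetical type)."""
    status: int
    body: str
    headers: dict[str, str] = field(default_factory=dict)


def looks_cached(resp: Response) -> bool:
    """Heuristic: a nonzero Age header or a cache HIT marker suggests a cached response."""
    h = {k.lower(): v.strip().lower() for k, v in resp.headers.items()}
    if h.get("age", "0") not in ("", "0"):
        return True
    return "hit" in h.get("x-cache", "") or h.get("cf-cache-status", "") == "hit"


def _record(finding_id: str, disposition: str, reasoning: str) -> dict:
    # One brutalist.json entry; "confirmed"/"rejected" are assumed disposition values.
    return {
        "finding_id": finding_id,
        "disposition": disposition,
        "reportable": disposition == "confirmed",
        "reasoning": reasoning,
    }


def brutalist_review(finding_id: str, recorded: Response, fresh: Response, marker: str) -> dict:
    """Compare a fresh replay against the hunter's recording.

    `marker` is the specific string the hunter says proves the bug,
    e.g. a database error message."""
    # Does the exact request still produce the recorded response?
    if fresh.status != recorded.status:
        return _record(finding_id, "rejected", f"status changed: {recorded.status} -> {fresh.status}")
    # Is the specific proof still present in the fresh response?
    if marker not in fresh.body:
        return _record(finding_id, "rejected", "proof marker missing from fresh response")
    # Could either response have come from a cache rather than the origin?
    if looks_cached(recorded) or looks_cached(fresh):
        return _record(finding_id, "rejected", "response may have been served from a cache")
    return _record(finding_id, "confirmed", "fresh request reproduced the recorded evidence")
```

A "SQLi" whose fresh response is a generic 500 page without the claimed error string fails the marker check, and a recorded cached 200 that now returns 403 fails the status check.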
Round 2: balanced
Posture: "Did the brutalist over-reject anything real?"
The balanced verifier reads the brutalist's output, compares it with the original finding, and decides whether to resurrect anything that was over-rejected. This catches round 1's systematic-skepticism failure mode.
The balanced verifier must include every finding the brutalist evaluated, even if only to pass it through unchanged. The MCP tool mantis_write_verification_round enforces this server-side: a round-2 write that drops findings from round 1 hard-fails.
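The server-side rule can be sketched as a set-difference check on finding IDs. This is a hypothetical sketch, not the actual mantis_write_verification_round code: it assumes each round is a list of records keyed by finding_id, and `RoundWriteError` and `check_round2_covers_round1` are illustrative names.

```python
class RoundWriteError(ValueError):
    """Raised when a verification-round write would lose findings (hypothetical)."""


def check_round2_covers_round1(round1: list[dict], round2: list[dict]) -> None:
    """Reject a round-2 write that drops any finding evaluated in round 1.

    Every brutalist finding must be carried forward, even if the
    balanced verifier only passes it through unchanged."""
    r1_ids = {rec["finding_id"] for rec in round1}
    r2_ids = {rec["finding_id"] for rec in round2}
    missing = sorted(r1_ids - r2_ids)
    if missing:
        # Hard-fail instead of silently accepting a partial round.
        raise RoundWriteError(f"round 2 is missing round-1 findings: {', '.join(missing)}")
```

Failing loudly here means a balanced verifier that quietly skips findings cannot make them disappear from the record.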
Round 3: final
Posture: "Fresh context, fresh requests. Same answer?"
Re-runs only the findings that round 2 left with reportable: true, using fresh HTTP requests and a fresh context window. If round 3 confirms a finding, it becomes eligible for grading.
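The flow of findings through the three rounds can be sketched as a small pipeline with the verifiers passed in as callables. This is a minimal sketch assuming each verifier returns records with finding_id and reportable; `run_rounds` and the callable signatures are hypothetical. The "fresh context" rule is modeled by giving round 3 only the surviving findings, never the earlier verdicts.

```python
from typing import Callable

Finding = dict   # assumed to carry at least a "finding_id" key
Record = dict    # {"finding_id", "disposition", "reportable", "reasoning"}


def run_rounds(
    findings: list[Finding],
    brutalist: Callable[[list[Finding]], list[Record]],
    balanced: Callable[[list[Finding], list[Record]], list[Record]],
    final: Callable[[list[Finding]], list[Record]],
) -> list[str]:
    """Return the ids of findings that are eligible for grading."""
    # Round 1: every finding, skeptical posture.
    round1 = brutalist(findings)
    # Round 2: sees the originals plus round-1 verdicts and may resurrect findings.
    round2 = balanced(findings, round1)
    survivors = {r["finding_id"] for r in round2 if r["reportable"]}
    # Round 3: only survivors, and no earlier verdicts are passed in,
    # which models the "fresh context" rule.
    round3 = final([f for f in findings if f["finding_id"] in survivors])
    return [r["finding_id"] for r in round3 if r["reportable"]]
```

For example, if the brutalist keeps only A, the balanced round resurrects B, and the final round fails to reproduce B, then the final verifier sees A and B, and only A becomes eligible for grading.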
How evidence is stored
Each round writes its own JSON file in the session directory:
~/mantis-sessions/<domain>/
├── brutalist.json
├── balanced.json
└── verified-final.json
Each round also gets a Markdown mirror (brutalist.md, etc.) for human inspection. The MCP server owns the JSON files; the Markdown mirrors are debug-only.
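A writer for this layout might look like the following. It is a hypothetical sketch of what a server-side writer could do, not the MCP server's code: `write_round`, `ROUND_STEMS`, the optional `root` override, and the Markdown mirror format are assumptions. Only the directory and the three JSON file names come from the tree above.

```python
import json
from pathlib import Path
from typing import Optional

# File stems per round, matching the session layout described above.
ROUND_STEMS = {1: "brutalist", 2: "balanced", 3: "verified-final"}


def write_round(domain: str, round_no: int, records: list[dict], root: Optional[Path] = None) -> Path:
    """Write one round's JSON (the owned artifact) plus a debug-only Markdown mirror."""
    session = (root or Path.home() / "mantis-sessions") / domain
    session.mkdir(parents=True, exist_ok=True)
    stem = ROUND_STEMS[round_no]

    json_path = session / f"{stem}.json"
    json_path.write_text(json.dumps(records, indent=2))

    # Human-readable mirror; nothing should read this back.
    lines = [f"# {stem}", ""]
    lines += [f"- {r['finding_id']}: reportable={r.get('reportable')}" for r in records]
    (session / f"{stem}.md").write_text("\n".join(lines) + "\n")
    return json_path
```

Keeping the JSON as the only file anything reads back means a hand-edited Markdown mirror cannot change a verdict.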
What this catches
- Hallucinated findings: the hunter "found" SQLi, but the response was actually a generic 500 page. The brutalist kills it.
- Severity inflation: the hunter claimed critical for an info leak that is actually public data. The balanced round corrects the severity.
- Flaky timing: the response that worked at HUNT time has since changed. The final round re-runs the PoC; if it still works, the finding ships.
- WAF-cached responses: a cached 200 looks like a successful injection. A fresh request reveals the real 403.