Multi-Step Evidence

The contract that makes Mantis say "evidence, not alerts". Three adversarial rounds; a finding ships only if every round produced a fresh, reproducible PoC.

STRIKE phase

Why three rounds, not one

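A single verifier has a bias. Set it to "be skeptical" and it kills real findings. Set it to "be permissive" and hallucinated findings get through. Three rounds with deliberately different postures triangulate the truth: a skeptical brutalist round, a balanced round that reviews the brutalist's rejections, and a final round that re-tests the survivors with fresh context.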

The contract

A finding earns SUBMIT in the grader only if:

  1. The final-round verification result has reportable: true.
  2. The PoC field includes both request and response evidence.
  3. The final-round severity matches or exceeds the severity originally recorded.

The grader runs a pre-grading evidence gate that caps proof_quality at 5 if any of these conditions fails. With that cap, reaching total ≥ 40 (the SUBMIT threshold) is practically impossible without real evidence.
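The three conditions and the cap can be sketched as a small check. This is a minimal illustration, not the grader's actual code: the field names `poc`, `request`, `response`, and `severity`, the severity scale in `SEVERITY_RANK`, and both function names are assumptions made for the example. Only `reportable`, `proof_quality`, the cap of 5, and the threshold of 40 come from the text above.

```python
from typing import Any

# Hypothetical severity ordering; the real scale is not specified here.
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}

PROOF_QUALITY_CAP = 5   # ceiling applied when the evidence gate fails
SUBMIT_THRESHOLD = 40   # total score a finding needs to earn SUBMIT


def passes_evidence_gate(original: dict[str, Any], final: dict[str, Any]) -> bool:
    """Return True only if all three contract conditions hold."""
    # Condition 1: the final round marked the finding reportable.
    reportable = final.get("reportable") is True

    # Condition 2: the PoC carries both request and response evidence.
    poc = final.get("poc") or {}
    has_evidence = bool(poc.get("request")) and bool(poc.get("response"))

    # Condition 3: final severity is at least the originally recorded one.
    # An unknown final severity ranks below everything and fails the check.
    final_rank = SEVERITY_RANK.get(final.get("severity", ""), -1)
    original_rank = SEVERITY_RANK.get(original.get("severity", ""), 0)
    severity_ok = final_rank >= original_rank

    return reportable and has_evidence and severity_ok


def gated_proof_quality(raw: int, original: dict[str, Any], final: dict[str, Any]) -> int:
    """Cap proof_quality at PROOF_QUALITY_CAP when the gate fails."""
    if passes_evidence_gate(original, final):
        return raw
    return min(raw, PROOF_QUALITY_CAP)
```

A finding whose PoC lacks a response would have its proof_quality capped at 5 however the other scoring dimensions come out, which is what keeps it under the SUBMIT threshold.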

The three rounds in detail

Round 1: brutalist

Posture: "This isn't real, prove me wrong."

The brutalist re-runs every PoC with fresh HTTP requests and validates each finding against the results.

Output: brutalist.json with { finding_id, disposition, reportable, reasoning } for each finding.
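To make the record shape concrete, here is a small sketch that builds one entry with the four fields listed above and serializes it. The `RoundEntry` class, the `serialize_round` helper, the example values (including the disposition string `"rejected"`), and the choice of a top-level JSON list are all assumptions for illustration; the document only specifies the four field names.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RoundEntry:
    finding_id: str
    disposition: str  # hypothetical value; the allowed set is not listed here
    reportable: bool
    reasoning: str


def serialize_round(entries: list[RoundEntry]) -> str:
    """Serialize a round's entries as a JSON list (assumed top-level shape)."""
    return json.dumps([asdict(entry) for entry in entries], indent=2)


entries = [
    RoundEntry(
        finding_id="F-001",
        disposition="rejected",
        reportable=False,
        reasoning="Replayed PoC request no longer reproduces the behavior.",
    )
]
print(serialize_round(entries))
```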

Round 2: balanced

Posture: "Did the brutalist over-reject anything real?"

The balanced verifier reads the brutalist's output, compares it against the original finding, and decides whether to resurrect findings that were over-rejected. This catches the systematic-skepticism failure mode of round 1.

The balanced verifier must include every finding the brutalist evaluated, even if only to pass it through unchanged. The MCP tool mantis_write_verification_round enforces this server-side: a round-2 write that drops any round-1 finding hard-fails.
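The completeness rule amounts to a set comparison on finding IDs. The sketch below shows one way such a check could look; it is not the server's implementation, and the function names and the assumption that each round is a list of records keyed by `finding_id` are illustrative.

```python
from typing import Any


def missing_from_round2(
    round1: list[dict[str, Any]], round2: list[dict[str, Any]]
) -> set[str]:
    """Return the finding IDs evaluated in round 1 but absent from round 2."""
    round1_ids = {entry["finding_id"] for entry in round1}
    round2_ids = {entry["finding_id"] for entry in round2}
    return round1_ids - round2_ids


def check_round2_complete(
    round1: list[dict[str, Any]], round2: list[dict[str, Any]]
) -> None:
    """Raise if round 2 dropped any finding that round 1 evaluated."""
    missing = missing_from_round2(round1, round2)
    if missing:
        raise ValueError(f"round 2 dropped findings: {sorted(missing)}")
```

Round 2 may change a finding's disposition, but under this rule it may not omit a finding, so a rejection in round 1 always leaves a visible trail in round 2.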

Round 3: final

Posture: "Fresh context, fresh requests. Same answer?"

Round 3 re-runs only the findings that survived round 2 with reportable: true, using fresh HTTP requests and a fresh context window. If round 3 confirms a finding, it is eligible for grading.
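Selecting the round-3 inputs is a filter over the round-2 output. A minimal sketch, assuming round 2 is a list of records with `finding_id` and `reportable` fields; the function name `round3_candidates` is hypothetical.

```python
from typing import Any


def round3_candidates(round2: list[dict[str, Any]]) -> list[str]:
    """Return IDs of findings that left round 2 with reportable set to true."""
    return [
        entry["finding_id"]
        for entry in round2
        # Strict identity check: a missing or non-boolean value does not qualify.
        if entry.get("reportable") is True
    ]
```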

How evidence is stored

Every round writes to its own JSON in the session directory:

~/mantis-sessions/<domain>/
├── brutalist.json
├── balanced.json
└── verified-final.json

Each round also gets a markdown mirror (brutalist.md, etc.) for human inspection. The MCP server owns the JSON files; the markdown is debug-only.
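For read-only inspection, the three round files can be loaded straight from a session directory. This is a sketch under assumptions: the `load_rounds` helper is hypothetical, and it treats a round file that does not exist yet as `None` rather than an error. Since the MCP server owns these files, the sketch only reads them.

```python
import json
from pathlib import Path
from typing import Any

# File names taken from the session layout above, in round order.
ROUND_FILES = ("brutalist.json", "balanced.json", "verified-final.json")


def load_rounds(session_dir: Path) -> dict[str, Any]:
    """Read each round's JSON; rounds that have not run yet map to None."""
    rounds: dict[str, Any] = {}
    for name in ROUND_FILES:
        path = session_dir / name
        if path.exists():
            rounds[name] = json.loads(path.read_text(encoding="utf-8"))
        else:
            rounds[name] = None
    return rounds
```

A call might look like `load_rounds(Path.home() / "mantis-sessions" / "example.com")`, where `example.com` stands in for the `<domain>` directory.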

What this catches