Specialist agents
Twelve agents (eleven specialists plus the orchestrator), each with one job, a narrow tool whitelist, and a short role prompt. This separation is the single most important anti-hallucination lever in the system.
Every agent lives as a markdown file in .claude/agents/ with YAML frontmatter declaring its model, tools, and color. Agents are spawned per task with injected context: they see only what the task needs, never the full conversation.
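To make the file layout concrete, here is a minimal Python sketch that reads one agent definition. The file contents are hypothetical: the model and tools values mirror the triage-agent row of the catalog, while the name, color, and role prompt are illustrative. The parser also assumes flat, single-line `key: value` frontmatter rather than full YAML.

```python
from dataclasses import dataclass

# Hypothetical contents of .claude/agents/triage-agent.md. Model and tools
# follow the catalog; name, color, and the prompt text are illustrative.
EXAMPLE_AGENT_FILE = """\
---
name: triage-agent
model: haiku
tools: Read, Write, Glob, Grep
color: yellow
---
You score discovered surfaces and answer promote, defer, or kill.
"""


@dataclass
class AgentSpec:
    name: str
    model: str
    tools: list[str]
    color: str
    role_prompt: str


def parse_agent_file(text: str) -> AgentSpec:
    """Split flat `key: value` frontmatter from the role prompt body.

    Only single-line scalar values are handled; a real loader would use a
    proper YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("agent file must start with a '---' frontmatter fence")
    try:
        end = lines.index("---", 1)  # closing fence of the frontmatter
    except ValueError:
        raise ValueError("unterminated frontmatter") from None
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"malformed frontmatter line: {line!r}")
        meta[key.strip()] = value.strip()
    return AgentSpec(
        name=meta["name"],
        model=meta["model"],
        tools=[t.strip() for t in meta.get("tools", "").split(",") if t.strip()],
        color=meta.get("color", ""),
        role_prompt="\n".join(lines[end + 1:]).strip(),
    )
```

Everything after the closing fence becomes the short role prompt, which is what the spawned agent receives alongside its injected task context.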
The catalog
| Agent | Mantis role | Default model | Tools |
|---|---|---|---|
| recon-agent | DISCOVER · asset discovery, fingerprinting, JS extraction, nuclei | Sonnet | Bash, Read, Write, Glob, Grep |
| triage-agent | DISCOVER · Haiku-grade surface scorer (promote / defer / kill) | Haiku | Read, Write, Glob, Grep |
| hunter-agent | REASON + TEST · specialist hunter (webapp / api / identity / network per brief) | Opus | Bash, Read, Grep, Glob, MCP |
| chain-builder | REASON · A→B kill-chain analysis | Opus | Read, Write, Bash, MCP |
| brutalist-verifier | TEST round 1 · maximum skepticism | Opus | Bash, Read, MCP |
| balanced-verifier | TEST round 2 · catch false negatives | Opus | Bash, Read, MCP |
| final-verifier | TEST round 3 · fresh PoC confirmation | Opus | Bash, MCP |
| grader | LEARN · 5-axis scoring + SUBMIT/HOLD/SKIP | Sonnet | MCP |
| report-writer | LEARN · submission-ready report under 600 words | Sonnet | Write, MCP |
| patch-writer | LEARN · suggested code-level fix per finding (advisory) | Sonnet | Read, Write, MCP |
| disclosure-sender | LEARN · gated email send to verified security contact | Sonnet | Read, Write, Bash, Gmail MCP |
| (orchestrator) | FSM driver, never tests itself | Opus | Agent spawning, MCP, Bash (whitelisted) |
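The Tools column reads as a deny-by-default policy. A minimal sketch of how it could be enforced, assuming every tool call passes through a single gate: `TOOL_WHITELIST`, `ToolDenied`, and `authorize()` are hypothetical names, not the project's code. The orchestrator is left out because its Bash access carries its own whitelist.

```python
# Whitelists transcribed from the catalog table. The registry shape and the
# authorize() gate are hypothetical illustrations of deny-by-default checks.
TOOL_WHITELIST: dict[str, frozenset[str]] = {
    "recon-agent": frozenset({"Bash", "Read", "Write", "Glob", "Grep"}),
    "triage-agent": frozenset({"Read", "Write", "Glob", "Grep"}),
    "hunter-agent": frozenset({"Bash", "Read", "Grep", "Glob", "MCP"}),
    "chain-builder": frozenset({"Read", "Write", "Bash", "MCP"}),
    "brutalist-verifier": frozenset({"Bash", "Read", "MCP"}),
    "balanced-verifier": frozenset({"Bash", "Read", "MCP"}),
    "final-verifier": frozenset({"Bash", "MCP"}),
    "grader": frozenset({"MCP"}),
    "report-writer": frozenset({"Write", "MCP"}),
    "patch-writer": frozenset({"Read", "Write", "MCP"}),
    "disclosure-sender": frozenset({"Read", "Write", "Bash", "Gmail MCP"}),
}


class ToolDenied(PermissionError):
    """Raised when an agent requests a tool outside its whitelist."""


def authorize(agent: str, tool: str) -> None:
    # Unknown agents get an empty whitelist, so they are denied everything.
    allowed = TOOL_WHITELIST.get(agent, frozenset())
    if tool not in allowed:
        raise ToolDenied(f"{agent} may not use {tool}")
```

In this sketch the table and the gate are the same data structure, so the documented whitelist and the enforced one cannot drift apart.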
The hunter's four specialist modes
The hunter-agent reads tech_stack from its brief and adopts the matching specialist persona before testing.
| Specialist | Triggers (in tech_stack) | Focus |
|---|---|---|
| webapp (default) | next, react, vue, angular, wordpress, rails, django, laravel | OWASP Top 10, IDOR, SQLi, XSS, SSRF, business logic, file upload, auth flaws |
| api | graphql, rest-api, grpc, swagger, openapi, websocket | GraphQL introspection chains, REST IDOR/auth, gRPC reflection, WebSocket origin checks |
| identity | oauth, oidc, saml, sso, jwt, auth0, okta, keycloak | SAML XSW, OAuth flow flaws, JWT alg confusion, OIDC state reuse, SSO bypass |
| network | nmap, raw IPs (rare in HTTP scope) | Service enum, CVE correlation, only when nmap is in-scope |
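Mode selection amounts to a keyword lookup over tech_stack. The sketch below makes several assumptions the table does not specify: matching is case-insensitive, overlapping triggers are resolved by a fixed precedence (identity, then api, then network), and a hypothetical `nmap_in_scope` flag gates network mode. The webapp triggers need no explicit check because webapp is the fallback.

```python
import ipaddress

# Trigger keywords copied from the specialist table. The precedence order and
# the nmap_in_scope flag are assumptions made for this sketch.
TRIGGERS: dict[str, frozenset[str]] = {
    "identity": frozenset({"oauth", "oidc", "saml", "sso", "jwt", "auth0", "okta", "keycloak"}),
    "api": frozenset({"graphql", "rest-api", "grpc", "swagger", "openapi", "websocket"}),
    "network": frozenset({"nmap"}),
}
PRECEDENCE = ("identity", "api", "network")


def _is_raw_ip(token: str) -> bool:
    # True for literal IPv4/IPv6 addresses such as "203.0.113.7".
    try:
        ipaddress.ip_address(token)
    except ValueError:
        return False
    return True


def pick_specialist(tech_stack: list[str], nmap_in_scope: bool = False) -> str:
    tokens = {t.strip().lower() for t in tech_stack}
    for mode in PRECEDENCE:
        if mode == "network":
            # Network mode applies only when nmap is in scope.
            if nmap_in_scope and (tokens & TRIGGERS["network"] or any(map(_is_raw_ip, tokens))):
                return "network"
            continue
        if tokens & TRIGGERS[mode]:
            return mode
    return "webapp"  # default persona
```

For example, a brief listing `["next", "auth0"]` would select identity under this precedence, even though next alone would point to webapp.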
Why the separation matters
A single long-running agent will happily invent findings, inflate severity, and forget what it already tested. Splitting the work across narrow agents with tool whitelists is the single highest-leverage anti-hallucination lever in the system:
- The hunter cannot write findings to disk directly; it must call mantis_record_finding, which validates the schema and assigns a canonical ID.
- The grader cannot make HTTP requests; it only reads structured verification artifacts.
- The report-writer cannot grade; it only renders the verdict the grader already issued.
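The first bullet describes a validate-then-persist gate. A minimal sketch of that pattern follows. The field names, severity levels, `F-` ID prefix, and the `Finding` and `FindingStore` classes are hypothetical stand-ins, not the actual mantis_record_finding schema. Deriving the ID from normalized identity fields means a hunter that re-reports the same bug gets back the existing record instead of creating a duplicate.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical schema; the real tool defines its own fields and ID format.
REQUIRED_FIELDS = ("target", "vuln_class", "endpoint", "severity", "evidence")
SEVERITIES = frozenset({"info", "low", "medium", "high", "critical"})


@dataclass(frozen=True)
class Finding:
    finding_id: str
    target: str
    vuln_class: str
    endpoint: str
    severity: str
    evidence: str


class FindingStore:
    """In-memory stand-in for whatever persists findings behind the tool."""

    def __init__(self) -> None:
        self._findings: dict[str, Finding] = {}

    def record_finding(self, payload: dict[str, str]) -> Finding:
        # Reject incomplete payloads before anything is written.
        missing = [f for f in REQUIRED_FIELDS if not (payload.get(f) or "").strip()]
        if missing:
            raise ValueError(f"missing required fields: {missing}")
        severity = payload["severity"].strip().lower()
        if severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {payload['severity']!r}")
        # Canonical ID: hash of normalized identity fields. Host and class are
        # case-folded; the endpoint is only trimmed because paths are case-sensitive.
        key = "|".join((
            payload["target"].strip().lower(),
            payload["vuln_class"].strip().lower(),
            payload["endpoint"].strip(),
        ))
        finding_id = "F-" + hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]
        # Re-reporting the same bug returns the existing record.
        if finding_id in self._findings:
            return self._findings[finding_id]
        finding = Finding(
            finding_id=finding_id,
            target=payload["target"].strip(),
            vuln_class=payload["vuln_class"].strip(),
            endpoint=payload["endpoint"].strip(),
            severity=severity,
            evidence=payload["evidence"],
        )
        self._findings[finding_id] = finding
        return finding

    def __len__(self) -> int:
        return len(self._findings)
```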