Specialist agents

Twelve agents, each with one job, a narrow tool whitelist, and a short role prompt. This separation is the single most important anti-hallucination lever in the system.

Every agent lives as a Markdown file in `.claude/agents/` with YAML frontmatter declaring its model, tools, and color. Agents are spawned per task with injected context; they don't see the full conversation, only what they need.
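To make the file layout concrete, here is a minimal sketch of what one agent file might look like and how a loader could split it into frontmatter and role prompt. The `name` field, the field values, the prompt text, and the `parse_agent_file` helper are all illustrative assumptions; only the `model`, `tools`, and `color` keys come from the description above. The parser handles flat `key: value` lines only, so a real loader would likely use a proper YAML library.

```python
# Hypothetical agent file. Values are illustrative, not copied from a real repo.
AGENT_FILE = """\
---
name: triage-agent
model: haiku
tools: Read, Write, Glob, Grep
color: yellow
---
You score discovered surfaces as promote, defer, or kill.
"""


def parse_agent_file(text: str) -> tuple[dict[str, str], str]:
    """Split an agent file into (frontmatter dict, role prompt).

    Only flat `key: value` lines are supported (an assumption of this sketch).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening frontmatter delimiter")
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        raise ValueError("missing closing frontmatter delimiter") from None

    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed frontmatter line: {line!r}")
        meta[key.strip()] = value.strip()

    # Everything after the frontmatter is the role prompt.
    prompt = "\n".join(lines[end + 1:]).strip()
    return meta, prompt


meta, prompt = parse_agent_file(AGENT_FILE)
tools = [t.strip() for t in meta["tools"].split(",")]
```

Keeping the tool list in the frontmatter means the whitelist travels with the role prompt, so an agent's permissions can be reviewed in the same file that defines its job.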

The catalog

Agent	Mantis role	Default model	Tools
`recon-agent`	DISCOVER · asset discovery, fingerprinting, JS extraction, nuclei	Sonnet	Bash, Read, Write, Glob, Grep
`triage-agent`	DISCOVER · Haiku-grade surface scorer (promote / defer / kill)	Haiku	Read, Write, Glob, Grep
`hunter-agent`	REASON + TEST · specialist hunter (webapp / api / identity / network per brief)	Opus	Bash, Read, Grep, Glob, MCP
`chain-builder`	REASON · A→B kill-chain analysis	Opus	Read, Write, Bash, MCP
`brutalist-verifier`	TEST round 1 · maximum skepticism	Opus	Bash, Read, MCP
`balanced-verifier`	TEST round 2 · catch false negatives	Opus	Bash, Read, MCP
`final-verifier`	TEST round 3 · fresh PoC confirmation	Opus	Bash, MCP
`grader`	LEARN · 5-axis scoring + SUBMIT/HOLD/SKIP	Sonnet	MCP
`report-writer`	LEARN · submission-ready report under 600 words	Sonnet	Write, MCP
`patch-writer`	LEARN · suggested code-level fix per finding (advisory)	Sonnet	Read, Write, MCP
`disclosure-sender`	LEARN · gated email send to verified security contact	Sonnet	Read, Write, Bash, Gmail MCP
`(orchestrator)`	FSM driver; never runs tests itself	Opus	Agent spawning, MCP, Bash (whitelisted)
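The whitelist column can be read as a simple allow-list lookup. The sketch below copies a few rows from the catalog; the `TOOL_WHITELIST` structure, the `ToolNotAllowed` exception, and the `check_tool` function are hypothetical names used for illustration, not the system's actual enforcement mechanism.

```python
# Tool sets copied from the catalog table; the enforcement code is a hypothetical sketch.
TOOL_WHITELIST: dict[str, set[str]] = {
    "recon-agent": {"Bash", "Read", "Write", "Glob", "Grep"},
    "triage-agent": {"Read", "Write", "Glob", "Grep"},
    "hunter-agent": {"Bash", "Read", "Grep", "Glob", "MCP"},
    "grader": {"MCP"},
    "report-writer": {"Write", "MCP"},
}


class ToolNotAllowed(Exception):
    """Raised when an agent requests a tool outside its whitelist."""


def check_tool(agent: str, tool: str) -> None:
    """Raise ToolNotAllowed unless `tool` is whitelisted for `agent`.

    Unknown agents get an empty whitelist, so they are denied by default.
    """
    allowed = TOOL_WHITELIST.get(agent, set())
    if tool not in allowed:
        raise ToolNotAllowed(f"{agent} may not use {tool}")
```

Under this sketch, `check_tool("hunter-agent", "Bash")` returns silently, while `check_tool("grader", "Bash")` raises, which mirrors the grader row listing MCP as its only tool.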

The hunter's four specialist modes

The `hunter-agent` reads `tech_stack` from its brief and adopts the matching specialist persona before testing.

Specialist	Triggers (in `tech_stack`)	Focus
`webapp` (default)	next, react, vue, angular, wordpress, rails, django, laravel	OWASP Top 10, IDOR, SQLi, XSS, SSRF, business logic, file upload, auth flaws
`api`	graphql, rest-api, grpc, swagger, openapi, websocket	GraphQL introspection chains, REST IDOR/auth, gRPC reflection, WebSocket origin checks
`identity`	oauth, oidc, saml, sso, jwt, auth0, okta, keycloak	SAML XSW, OAuth flow flaws, JWT alg confusion, OIDC state reuse, SSO bypass
`network`	nmap, raw IPs (rare in HTTP scope)	Service enumeration, CVE correlation; used only when nmap is in scope
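The trigger table amounts to a lookup from `tech_stack` entries to a persona. The following sketch shows one way that selection could work. It assumes `tech_stack` is a list of strings and that the first matching specialist in a fixed priority order wins. Neither assumption is stated in the source, and `pick_specialist` is a hypothetical name. The `webapp` triggers are not listed because `webapp` is the default for anything unmatched.

```python
import ipaddress

# Trigger sets copied from the table. The evaluation order (identity, api,
# network) is an assumption of this sketch; the source does not define one.
SPECIALIST_TRIGGERS: list[tuple[str, set[str]]] = [
    ("identity", {"oauth", "oidc", "saml", "sso", "jwt", "auth0", "okta", "keycloak"}),
    ("api", {"graphql", "rest-api", "grpc", "swagger", "openapi", "websocket"}),
    ("network", {"nmap"}),
]


def _is_raw_ip(value: str) -> bool:
    """Return True if `value` parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def pick_specialist(tech_stack: list[str]) -> str:
    """Choose a specialist persona, falling back to the default `webapp`."""
    stack = {item.strip().lower() for item in tech_stack}
    for name, triggers in SPECIALIST_TRIGGERS:
        if stack & triggers:
            return name
    # Raw IPs also trigger the network specialist.
    if any(_is_raw_ip(item) for item in stack):
        return "network"
    return "webapp"
```

With this ordering, a brief listing both `react` and `graphql` selects `api`, and an empty `tech_stack` selects `webapp`. A real implementation might instead score matches or let the brief name the specialist explicitly.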

Why the separation matters

A single long-running agent will happily invent findings, inflate severity, and forget what it already tested. Splitting the work across narrow agents with tool whitelists counters all three failure modes, because each agent is structurally unable to do another agent's job:

The hunter cannot write findings to disk directly; it must call `mantis_record_finding`, which validates the schema and assigns a canonical ID.
The grader cannot make HTTP requests; it only reads structured verification artifacts.
The report-writer cannot grade; it only renders the verdict the grader already issued.