Audit MAP Methodology

The Audit MAP Execution Patterns document contains operational knowledge from running 4 real source code audits across 4 risk domains (health, children’s education, financial, cryptographic). It fills the gap between designing an audit and executing one well.

Every audit surfaced execution problems that the design methodology didn’t anticipate:

| Problem discovered | Which audit | Fix |
| --- | --- | --- |
| 22+ file paths wrong in pass 1 | Crypto protocol audit | Recon-first session architecture |
| Context window exhausted on 8 lenses | Crypto protocol audit | Multi-pass execution |
| Validation treated as checkbox theater | Financial audit | Validation must quote evidence |
| Inherited findings couldn’t be re-verified | Crypto + Financial | Carry-forward confidence penalties |
| Combined findings worse than individual | Financial audit | Attack chain analysis |
| Context compaction lost earlier reasoning | Crypto + Financial | Compaction recovery protocol |
| All confidence levels clustered at HIGH | Multiple audits | Confidence calibration field rules |
| Severity assumed public deployment | Financial audit | Deployment context modifiers |

Before any audit pass executes, Session 0 (recon) reads the actual codebase and updates the pass MAPs in place with verified information.

Session 0: RECON
├── Discover available tools (MCP servers, codebase access)
├── Read actual file structure
├── Get real dependency versions from lockfiles
├── Verify each file path in pass MAPs
├── UPDATE pass MAPs in place with correct details
└── Output: pass MAPs are verified, not templates

Recon does NOT execute lenses, file findings, or make audit judgments. It’s infrastructure.

Why separate: If recon and Pass 1 share a session, path discovery consumes context that Pass 1 needs for deep analysis.
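
A minimal sketch of the path-verification step, assuming a pass MAP lists the files each lens will read. The `PassMap` shape and `verifyPassMap` helper are illustrative, not part of the methodology:

```ts
import { existsSync } from "node:fs";
import { join } from "node:path";

// Hypothetical shape of a pass MAP entry: each lens lists the files it will read.
interface PassMap {
  pass: number;
  lenses: { name: string; files: string[] }[];
}

// Recon step: verify every path before any audit pass runs.
// Returns the broken paths so the MAP can be corrected in place.
function verifyPassMap(map: PassMap, repoRoot: string): string[] {
  const broken: string[] = [];
  for (const lens of map.lenses) {
    for (const file of lens.files) {
      if (!existsSync(join(repoRoot, file))) {
        broken.push(`${lens.name}: ${file}`);
      }
    }
  }
  return broken;
}
```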

Whether an audit needs multiple passes depends on lens count:

| Lens count | Decision |
| --- | --- |
| 1-5 | Single pass |
| 6 | Consider multi-pass |
| 7+ | Multi-pass recommended |
| 8+ | Multi-pass mandatory |

Each pass is self-contained with its own pre-flight, Finding Contract, validation, and checkpoint. The agent executing Pass 2 optionally reads the Pass 1 checkpoint; it never reads the Pass 1 MAP.

Session 0: Recon (verify paths, discover tools)
Session 1: Pass 1 (execute lenses, write checkpoint)
Session 2: Pass 2 (execute lenses, write checkpoint)
Session N: Synthesis (read all checkpoints, produce final report)

When grouping lenses into passes:

  • Group tightly-coupled lenses (crypto + zero-knowledge share source files)
  • Separate distinct attack surfaces
  • Put regression checks in the last pass
  • 2-3 lenses per pass is ideal; 4 is the maximum
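
For illustration, a pass plan for an 8-lens audit following these heuristics might look like the sketch below (the lens names are invented):

```ts
// Hypothetical pass plan for an 8-lens audit.
// Tightly-coupled lenses share a pass; regression checks go last.
const passPlan = [
  { pass: 1, lenses: ["crypto-primitives", "zero-knowledge"] }, // share source files
  { pass: 2, lenses: ["auth", "session-management", "input-validation"] },
  { pass: 3, lenses: ["key-storage", "logging"] },
  { pass: 4, lenses: ["regression-checks"] },                   // always the last pass
];

// Sanity check against the sizing heuristic: 2-3 lenses per pass, 4 maximum.
for (const p of passPlan) {
  if (p.lenses.length > 4) throw new Error(`Pass ${p.pass} exceeds the 4-lens maximum`);
}
```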

Every finding must include these 10 fields:

| Field | Requirement |
| --- | --- |
| id | Unique (e.g., SEC-001) |
| lens | Which lens produced it |
| decision | What is wrong |
| severity | CRITICAL / HIGH / MEDIUM / LOW |
| confidence | HIGH / MEDIUM / LOW |
| reasoning | Minimum 2 points |
| alternatives_rejected | Minimum 1 |
| weaknesses_acknowledged | Minimum 1 |
| evidence | File, line, code snippet |
| remediation_hint | Direction for fix |

See also: JSON Sidecar Pattern for machine-readable output of findings.
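
A minimal sketch of the contract as a TypeScript type, assuming the sidecar uses the field names from the table verbatim (the exact schema may differ):

```ts
// Sketch of the 10-field Finding Contract; field names follow the table above.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";
type Confidence = "HIGH" | "MEDIUM" | "LOW";

interface Finding {
  id: string;                        // unique, e.g. "SEC-001"
  lens: string;                      // which lens produced it
  decision: string;                  // what is wrong
  severity: Severity;
  confidence: Confidence;
  reasoning: string[];               // minimum 2 points
  alternatives_rejected: string[];   // minimum 1
  weaknesses_acknowledged: string[]; // minimum 1
  evidence: { file: string; line: number; snippet: string };
  remediation_hint: string;          // direction for fix
}
```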

When a later pass references findings from an earlier pass:

| Original confidence | Inherited without re-verification |
| --- | --- |
| HIGH | MEDIUM |
| MEDIUM | LOW |
| LOW | stays LOW |

Tag inherited findings: “Inherited from Pass N — not re-verified in this session.”

Never copy finding text as evidence. If carrying forward, the evidence must be freshly gathered.
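
The penalty table reduces to a one-level downgrade with a LOW floor; a minimal sketch:

```ts
type Confidence = "HIGH" | "MEDIUM" | "LOW";

// Carry-forward penalty: one level down per inheritance, never below LOW.
function inheritConfidence(original: Confidence): Confidence {
  switch (original) {
    case "HIGH":   return "MEDIUM";
    case "MEDIUM": return "LOW";
    case "LOW":    return "LOW"; // stays LOW
  }
}
```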

Validation must quote evidence, not tick checkboxes.

Bad (theater):

All findings have >=2 reasoning points — PASS

Good (actual gate):

All findings have >=2 reasoning points — PASS
Spot-checked: SEC-003 has 3 reasoning points.
VAULT-001 has 2 (minimum). CHAIN-002 has 4.
Lowest count: 2. Contract met.
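
One way to make this gate mechanical rather than a checkbox is to have validation compute and quote the actual counts. A sketch, using a hypothetical helper over findings shaped like the contract above:

```ts
// Evidence-quoting validation gate: report actual counts, not a bare PASS.
interface FindingLike { id: string; reasoning: string[]; }

function validateReasoning(findings: FindingLike[]): string {
  const counts = findings.map(f => ({ id: f.id, n: f.reasoning.length }));
  const lowest = Math.min(...counts.map(c => c.n));
  const detail = counts.map(c => `${c.id} has ${c.n} reasoning points`).join(". ");
  const verdict = lowest >= 2 ? "Contract met." : "Contract VIOLATED.";
  return `${detail}. Lowest count: ${lowest}. ${verdict}`;
}
```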

Calibration red flags to check during validation:

| Condition | Action |
| --- | --- |
| All findings same confidence level | STOP — recalibrate |
| HIGH confidence > 50% | Suspicious — re-examine each |
| Zero positive observations | Suspicious — look for what’s done right too |
| Dynamic tests available but not run | Quality gap — must note |
The confidence calibration field rules:

  1. Domain knowledge gaps cap at MEDIUM — if you lack domain knowledge for a technology, findings against it cap at MEDIUM
  2. Depth budget scales with priority — CRITICAL lenses must exhaust verification paths before settling on MEDIUM
  3. Dual verification for HIGH — requires both code evidence AND API/behaviour verification
  4. Dynamic tests upgrade confidence — grep, cargo test, and npm audit results can upgrade specific findings
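
The two distribution red flags from the table above are easy to check mechanically; a sketch:

```ts
// Red-flag checks on the confidence distribution of a finished pass.
interface Calibrated { confidence: "HIGH" | "MEDIUM" | "LOW"; }

function calibrationWarnings(findings: Calibrated[]): string[] {
  const warnings: string[] = [];
  if (new Set(findings.map(f => f.confidence)).size === 1) {
    warnings.push("All findings share one confidence level: STOP and recalibrate.");
  }
  const high = findings.filter(f => f.confidence === "HIGH").length;
  if (high > findings.length / 2) {
    warnings.push("Over 50% HIGH confidence: re-examine each HIGH finding.");
  }
  return warnings;
}
```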
Deployment context modifiers (applied to severity):

| Deployment | Modifier |
| --- | --- |
| Public internet, production | No modifier (baseline) |
| Private network | Consider -1 for network-exposure findings |
| Preprod / staging | Consider -1 for operational findings |
| Air-gapped | -1 for all network-exposure findings |
| Multi-tenant production | Consider +1 for isolation findings |

Never apply downward modifiers to: auth bypass, crypto correctness, data integrity, design flaws.
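
A sketch of the guard rule, assuming a modifier is a one-level shift on the severity ladder (the category names are illustrative):

```ts
type Severity = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";
const LADDER: Severity[] = ["LOW", "MEDIUM", "HIGH", "CRITICAL"];

// Categories that must never be downgraded, whatever the deployment.
const NEVER_DOWNGRADE = new Set([
  "auth-bypass", "crypto-correctness", "data-integrity", "design-flaw",
]);

// Apply a deployment modifier (-1, 0, or +1 severity steps) with the guard.
function applyModifier(severity: Severity, category: string, modifier: -1 | 0 | 1): Severity {
  if (modifier < 0 && NEVER_DOWNGRADE.has(category)) return severity; // guard holds
  const i = LADDER.indexOf(severity) + modifier;
  return LADDER[Math.min(LADDER.length - 1, Math.max(0, i))];
}
```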

After all lenses complete, scan for combinations:

| Pattern | Example |
| --- | --- |
| Auth bypass + privileged action | Anyone can trigger destructive operations |
| Input validation gap + dangerous sink | Injection vulnerability |
| Key exposure + encrypted data | Plaintext recovery |
| State inconsistency + financial action | Double-spend |

Attack chains go in synthesis. They reference findings by ID — individual findings don’t change.
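
A sketch of how a chain might be recorded in synthesis, referencing findings by ID only (the shape and IDs are assumptions, not from the methodology):

```ts
// Attack chains live in synthesis and point at findings by ID;
// the referenced findings themselves are never modified.
interface AttackChain {
  id: string;               // e.g. "CHAIN-001"
  findings: string[];       // IDs of the combined findings
  combined_severity: string;
  narrative: string;        // how the pieces compose into an attack
}

const doubleSpend: AttackChain = {
  id: "CHAIN-001",
  findings: ["STATE-004", "PAY-002"], // illustrative IDs
  combined_severity: "CRITICAL",
  narrative: "State inconsistency plus a financial action enables a double-spend.",
};
```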

The methodology includes severity uplift tables for:

  • Children’s systems — COPPA, parental consent, audio recording
  • Financial systems — Payment integrity, blockchain irrecoverability, webhook trust
  • Cryptographic protocols — Nonce reuse, key material, timing side-channels, human cryptographer required
  • Health systems — GDPR special category, consent, biometric data

The complete execution patterns document (800+ lines with all tables, examples, and the quality checklist for MAP designers) is at audit-map-execution-patterns.md.