
External Research Methodology

Using one AI system for everything creates blind spots. Different AI architectures have different training data, different strengths, and different failure modes. The external research methodology uses this to your advantage: research with one system, review with another, implement with a third.

A single AI system reviewing its own work is a conflict of interest. The same biases that shaped the initial output also shape the review. Blind spots are invisible precisely because they are blind spots.

| Single-system failure | Multi-system mitigation |
| --- | --- |
| Confirms its own assumptions | A different system challenges assumptions from a different angle |
| Misses the same edge cases consistently | Different training data surfaces different edge cases |
| Optimises for its own strengths | The reviewer has different strengths |
| Produces internally consistent but externally weak specs | External review breaks the consistency bubble |
| No adversarial pressure on reasoning | The reviewer’s job is to challenge, not agree |
```
Research AI → Draft Documentation → Review AI → Strengthened Docs → Implementation AI
     ↑                                  |
     └──────────────────────────────────┘
            (iterate until approved)
```
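Expressed as pseudocode, the loop is small. The sketch below is illustrative only: `draft`, `review`, and `revise` are hypothetical callables standing in for whichever research and review systems you use.

```python
from typing import Callable, NamedTuple

class Review(NamedTuple):
    verdict: str            # e.g. "APPROVED", "REQUIRES REVISION"
    advisories: list[str]   # issues the research AI must address

def produce_reviewed_docs(
    requirements: str,
    draft: Callable[[str], str],              # research AI: requirements -> draft docs
    review: Callable[[str], Review],          # review AI: docs -> structured verdict
    revise: Callable[[str, list[str]], str],  # research AI: docs + advisories -> revision
    max_iterations: int = 5,                  # guard against an endless revision loop
) -> str:
    """Iterate research and review until the documentation is approved."""
    docs = draft(requirements)
    for _ in range(max_iterations):
        result = review(docs)
        if result.verdict in ("APPROVED", "APPROVED WITH ADVISORIES"):
            return docs  # only reviewed documentation reaches the implementation AI
        docs = revise(docs, result.advisories)
    raise RuntimeError("Documentation did not pass review; reconsider the approach")
```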

A conversational AI system handles deep thinking, strategy, and documentation production. This is where executive summaries, specifications, implementation plans, and research reports are created.

What the research phase produces:

| Output | Purpose |
| --- | --- |
| Executive Summary | Architecture, key decisions, trade-offs |
| Specification | Interfaces, data structures, constraints |
| Implementation Plan | Test cases, ordered steps, validation criteria |
| Research Report | Findings, comparisons, recommendations |

The research AI is chosen for depth of reasoning and quality of structured output. It operates in a conversational mode — the human guides the research, asks probing questions, and refines the output iteratively.

A different AI system reviews the research output. The reviewer’s role is explicitly adversarial — its job is to find weaknesses, not to agree.

What the review phase checks:

| Check | What it catches |
| --- | --- |
| Weak reasoning | Conclusions that don’t follow from evidence |
| Missing edge cases | Scenarios the research AI didn’t consider |
| Untested assumptions | Claims presented as facts without validation |
| Gaps in specification | Interfaces that are underspecified or ambiguous |
| Internal contradictions | Requirements that conflict with each other |
| Over-engineering | Complexity that isn’t justified by requirements |
| Under-specification | Areas where “it depends” needs to become a concrete decision |

The reviewer uses a different AI architecture. Different training data means different blind spots. What one system misses, another is more likely to catch.

A CLI-based AI coding tool implements from the reviewed documentation using Jimmy’s Workflow. The implementation AI never sees the original requirements directly — it works from the specification and implementation plan that survived review.

This separation is deliberate. The implementation AI is not influenced by the back-and-forth of the research phase. It receives clean, reviewed documentation and translates it into code.

Different training data produces different blind spots. An AI system trained primarily on one corpus will have systematic gaps. A second system trained on a different corpus fills different gaps. The overlap of two imperfect systems is more complete than either alone.

The reviewer’s job is to challenge, not agree. This is explicitly stated in the review prompt. The reviewer is not asked “does this look good?” — it is asked “what is wrong with this?” The adversarial framing produces substantively different output.

Specs that survive review are specs worth building. If a specification has weak reasoning, missing edge cases, or untested assumptions, it is far cheaper to catch these in the documentation phase than in the implementation phase. A review that sends the spec back for revision saves hours of implementation rework.

It is a feedback loop, not a pipeline. Research and review iterate. The research AI revises based on review feedback. The reviewer re-evaluates the revision. This continues until the documentation meets the standard. Only then does implementation begin.

The reviewer produces a structured verdict:

```md
## Review: [Document Title]
**Verdict**: APPROVED | APPROVED WITH ADVISORIES | REQUIRES REVISION | REJECTED
### Structural Strengths
- [What the document does well]
- [Sound reasoning or thorough coverage]
### Critical Advisories
- [Issues that must be addressed before implementation]
- [Missing edge cases or weak reasoning]
### Recommended Changes
1. [Specific, actionable change]
2. [Specific, actionable change]
### Minor Observations
- [Non-blocking suggestions for improvement]
```

Verdict definitions:

| Verdict | Meaning | Action |
| --- | --- | --- |
| APPROVED | Document is ready for implementation | Proceed to implementation phase |
| APPROVED WITH ADVISORIES | Document is usable but has minor gaps | Address advisories during implementation |
| REQUIRES REVISION | Significant issues found | Return to research phase, revise, re-submit for review |
| REJECTED | Fundamental problems with approach | Return to research phase, reconsider approach |
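Because the verdict line is machine-readable, a small helper can route the outcome if you script any part of the loop. The sketch below assumes the reviewer’s reply follows the template above; the helper and mapping are illustrative, not part of any tool.

```python
import re

# Maps each verdict from the template to the next step in the workflow.
NEXT_ACTION = {
    "APPROVED": "proceed to implementation",
    "APPROVED WITH ADVISORIES": "implement, addressing advisories along the way",
    "REQUIRES REVISION": "return to research, revise, re-submit for review",
    "REJECTED": "return to research and reconsider the approach",
}

def extract_verdict(review_text: str) -> str:
    """Pull the verdict out of a review that follows the structured format above."""
    match = re.search(r"\*\*Verdict\*\*:\s*([A-Z ]+)", review_text)
    if not match:
        raise ValueError("Review does not contain a **Verdict** line")
    verdict = match.group(1).strip()
    if verdict not in NEXT_ACTION:
        raise ValueError(f"Unrecognised verdict: {verdict!r}")
    return verdict
```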

How this connects to documentation-first workflow


The external research methodology is the engine that produces documentation for the documentation-first workflow:

External Research → Reviewed Documentation → Documentation-First Implementation
| Stage | Who | Produces |
| --- | --- | --- |
| Research | Conversational AI | Draft executive summaries, specs, plans |
| Review | Different AI | Verdict, advisories, required changes |
| Revision | Conversational AI | Strengthened documentation |
| Implementation | CLI AI tool | Working code from reviewed docs |

The documentation that enters the implementation phase has been through at least two AI systems and human review. It is substantially stronger than documentation produced by a single system in a single pass.

The full two-pronged approach has overhead. Not every task justifies it.

| Task type | Research methodology | Reasoning |
| --- | --- | --- |
| New system or major feature | Full: research + review + implementation | High-stakes decisions benefit from adversarial review |
| Significant refactoring | Full or research + implementation | Depends on risk level |
| New feature following existing patterns | Research + implementation (skip review) | Pattern is already validated |
| Bug fix with clear root cause | Implementation only | No research needed |
| Configuration change | Implementation only | No design decisions |
| Exploratory prototyping | Research only (no review) | Speed matters more than correctness |

The threshold: if the task involves architectural decisions, new patterns, or significant trade-offs, use the full two-pronged approach. If the task follows established patterns with no new decisions, skip the review.
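The threshold can also be written down as a rough decision rule. The attributes in the sketch below are illustrative, not an exhaustive taxonomy; adjust them to your own risk tolerance.

```python
def choose_workflow(
    architectural_decisions: bool,
    new_patterns: bool,
    significant_trade_offs: bool,
    exploratory_prototype: bool = False,
) -> str:
    """Rough encoding of the threshold above, not a hard rule."""
    if exploratory_prototype:
        return "research only (no review)"  # speed matters more than correctness
    if architectural_decisions or new_patterns or significant_trade_offs:
        return "research + review + implementation"
    return "research + implementation, or implementation only"
```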

The Guides overview diagram names specific tools — here’s the reasoning behind those choices and how to adapt them to your own setup.

| Role | Our choice | Why | Alternatives |
| --- | --- | --- | --- |
| Research | Claude (claude.ai) | Strong at structured output, long-form reasoning, specification writing | GPT-4, Gemini — any conversational AI with strong reasoning |
| Review | Gemini Pro | Different training data = different blind spots. Good at adversarial critique. | GPT-4, Claude (different model), DeepSeek — the key is different architecture |
| Implementation | Claude Code (CLI) | Executes from documentation, follows Jimmy’s Workflow gates | Cursor, GitHub Copilot, Aider — any AI coding tool that reads markdown |

The principle matters more than the specific tools. The critical insight: the system that researches should not be the same system that reviews. Use whatever models you have access to — the value comes from architectural diversity, not from any specific brand.

Setting up the review step: Paste your research output (executive summary, spec, or plan) into a different AI system with this framing: “Review this specification for weak reasoning, missing edge cases, and untested assumptions. Your job is to find problems, not to agree.” That’s it — no special setup needed. The adversarial framing does the work.
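If you prefer to script the hand-off instead of pasting manually, the shape is simply framing plus document. In the sketch below, `send_to_reviewer` and the file path are hypothetical placeholders for whichever second system you use.

```python
from pathlib import Path

# The adversarial framing from the guide, verbatim.
REVIEW_FRAMING = (
    "Review this specification for weak reasoning, missing edge cases, "
    "and untested assumptions. Your job is to find problems, not to agree."
)

def build_review_prompt(document: str) -> str:
    """Prepend the adversarial framing to the research output."""
    return f"{REVIEW_FRAMING}\n\n---\n\n{document}"

# Usage (hypothetical): send_to_reviewer wraps whatever second AI system you use.
# spec = Path("docs/auth-spec.md").read_text()
# review = send_to_reviewer(build_review_prompt(spec))
```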

Keep research sessions focused. One topic per research session. “Design the authentication system” is a good session. “Design the authentication system and also figure out the database schema and the deployment strategy” produces weaker output in all three areas.

Give the reviewer explicit instructions. Do not simply paste the document and ask “what do you think?” Instead: “Review this specification for weak reasoning, missing edge cases, and untested assumptions. Your job is to find problems, not to agree.”

Track revisions. When the reviewer sends a document back for revision, note what changed and why. This creates a decision trail that the implementation AI can reference if questions arise.

Do not cherry-pick review feedback. If the reviewer raises an issue, address it — even if you disagree. Either revise the document or explicitly document why the feedback was considered and rejected. Ignoring review feedback defeats the purpose.