
External Research Methodology

Using one AI system for everything creates blind spots. Different AI architectures have different training data, different strengths, and different failure modes. The external research methodology uses this to your advantage: research with one system, review with another, implement with a third.

A single AI system reviewing its own work is a conflict of interest. The same biases that shaped the initial output also shape the review. Blind spots are invisible precisely because they are blind spots.

| Single-system failure | Multi-system mitigation |
| --- | --- |
| Confirms its own assumptions | A different system challenges assumptions from a different angle |
| Misses the same edge cases consistently | Different training data surfaces different edge cases |
| Optimises for its own strengths | The reviewer has different strengths |
| Produces internally consistent but externally weak specs | External review breaks the consistency bubble |
| No adversarial pressure on reasoning | The reviewer’s job is to challenge, not agree |
```
Research AI → Draft Documentation → Review AI → Strengthened Docs → Implementation AI
     ↑                                  |
     └──────────────────────────────────┘
            (iterate until approved)
```
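Expressed as pseudocode, the loop is small. The sketch below is illustrative only: `draft`, `review`, and `revise` are hypothetical callables standing in for whichever research and review systems you use.

```python
from typing import Callable, NamedTuple

class Review(NamedTuple):
    verdict: str            # e.g. "APPROVED", "REQUIRES REVISION"
    advisories: list[str]   # issues the research AI must address

def produce_reviewed_docs(
    requirements: str,
    draft: Callable[[str], str],              # research AI: requirements -> draft docs
    review: Callable[[str], Review],          # review AI: docs -> structured verdict
    revise: Callable[[str, list[str]], str],  # research AI: docs + advisories -> revision
    max_iterations: int = 5,                  # guard against an endless revision loop
) -> str:
    """Iterate research and review until the documentation is approved."""
    docs = draft(requirements)
    for _ in range(max_iterations):
        result = review(docs)
        if result.verdict in ("APPROVED", "APPROVED WITH ADVISORIES"):
            return docs  # only reviewed documentation reaches the implementation AI
        docs = revise(docs, result.advisories)
    raise RuntimeError("Documentation did not pass review; reconsider the approach")
```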

A conversational AI system handles deep thinking, strategy, and documentation production. This is where executive summaries, specifications, implementation plans, and research reports are created.

What the research phase produces:

| Output | Purpose |
| --- | --- |
| Executive Summary | Architecture, key decisions, trade-offs |
| Specification | Interfaces, data structures, constraints |
| Implementation Plan | Test cases, ordered steps, validation criteria |
| Research Report | Findings, comparisons, recommendations |

The research AI is chosen for depth of reasoning and quality of structured output. It operates in a conversational mode — the human guides the research, asks probing questions, and refines the output iteratively.

A different AI system reviews the research output. The reviewer’s role is explicitly adversarial — its job is to find weaknesses, not to agree.

What the review phase checks:

| Check | What it catches |
| --- | --- |
| Weak reasoning | Conclusions that don’t follow from evidence |
| Missing edge cases | Scenarios the research AI didn’t consider |
| Untested assumptions | Claims presented as facts without validation |
| Gaps in specification | Interfaces that are underspecified or ambiguous |
| Internal contradictions | Requirements that conflict with each other |
| Over-engineering | Complexity that isn’t justified by requirements |
| Under-specification | Areas where “it depends” needs to become a concrete decision |

The reviewer uses a different AI architecture. Different training data means different blind spots. What one system misses, another is more likely to catch.

A CLI-based AI coding tool implements from the reviewed documentation using Jimmy’s Workflow. The implementation AI never sees the original requirements directly — it works from the specification and implementation plan that survived review.

This separation is deliberate. The implementation AI is not influenced by the back-and-forth of the research phase. It receives clean, reviewed documentation and translates it into code.

Different training data produces different blind spots. An AI system trained primarily on one corpus will have systematic gaps. A second system trained on a different corpus fills different gaps. The overlap of two imperfect systems is more complete than either alone.

The reviewer’s job is to challenge, not agree. This is explicitly stated in the review prompt. The reviewer is not asked “does this look good?” — it is asked “what is wrong with this?” The adversarial framing produces substantively different output.

Specs that survive review are specs worth building. If a specification has weak reasoning, missing edge cases, or untested assumptions, it is far cheaper to catch these in the documentation phase than in the implementation phase. A review that sends the spec back for revision saves hours of implementation rework.

It is a feedback loop, not a pipeline. Research and review iterate. The research AI revises based on review feedback. The reviewer re-evaluates the revision. This continues until the documentation meets the standard. Only then does implementation begin.

The reviewer produces a structured verdict:

```md
## Review: [Document Title]
**Verdict**: APPROVED | APPROVED WITH ADVISORIES | REQUIRES REVISION | REJECTED
### Structural Strengths
- [What the document does well]
- [Sound reasoning or thorough coverage]
### Critical Advisories
- [Issues that must be addressed before implementation]
- [Missing edge cases or weak reasoning]
### Recommended Changes
1. [Specific, actionable change]
2. [Specific, actionable change]
### Minor Observations
- [Non-blocking suggestions for improvement]
```

Verdict definitions:

| Verdict | Meaning | Action |
| --- | --- | --- |
| APPROVED | Document is ready for implementation | Proceed to implementation phase |
| APPROVED WITH ADVISORIES | Document is usable but has minor gaps | Address advisories during implementation |
| REQUIRES REVISION | Significant issues found | Return to research phase, revise, re-submit for review |
| REJECTED | Fundamental problems with approach | Return to research phase, reconsider approach |
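Because the verdict line is machine-readable, a small helper can route the outcome if you script any part of the loop. The sketch below assumes the reviewer’s reply follows the template above; the helper and mapping are illustrative, not part of any tool.

```python
import re

# Maps each verdict from the template to the next step in the workflow.
NEXT_ACTION = {
    "APPROVED": "proceed to implementation",
    "APPROVED WITH ADVISORIES": "implement, addressing advisories along the way",
    "REQUIRES REVISION": "return to research, revise, re-submit for review",
    "REJECTED": "return to research and reconsider the approach",
}

def extract_verdict(review_text: str) -> str:
    """Pull the verdict out of a review that follows the structured format above."""
    match = re.search(r"\*\*Verdict\*\*:\s*([A-Z ]+)", review_text)
    if not match:
        raise ValueError("Review does not contain a **Verdict** line")
    verdict = match.group(1).strip()
    if verdict not in NEXT_ACTION:
        raise ValueError(f"Unrecognised verdict: {verdict!r}")
    return verdict
```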

How this connects to documentation-first workflow


The external research methodology is the engine that produces documentation for the documentation-first workflow:

External Research → Reviewed Documentation → Documentation-First Implementation
| Stage | Who | Produces |
| --- | --- | --- |
| Research | Conversational AI | Draft executive summaries, specs, plans |
| Review | Different AI | Verdict, advisories, required changes |
| Revision | Conversational AI | Strengthened documentation |
| Implementation | CLI AI tool | Working code from reviewed docs |

The documentation that enters the implementation phase has been through at least two AI systems and human review. It is substantially stronger than documentation produced by a single system in a single pass.

The full two-pronged approach has overhead. Not every task justifies it.

| Task type | Research methodology | Reasoning |
| --- | --- | --- |
| New system or major feature | Full: research + review + implementation | High-stakes decisions benefit from adversarial review |
| Significant refactoring | Full or research + implementation | Depends on risk level |
| New feature following existing patterns | Research + implementation (skip review) | Pattern is already validated |
| Bug fix with clear root cause | Implementation only | No research needed |
| Configuration change | Implementation only | No design decisions |
| Exploratory prototyping | Research only (no review) | Speed matters more than correctness |

The threshold: if the task involves architectural decisions, new patterns, or significant trade-offs, use the full two-pronged approach. If the task follows established patterns with no new decisions, skip the review.
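The threshold can also be written down as a rough decision rule. The attributes in the sketch below are illustrative, not an exhaustive taxonomy; adjust them to your own risk tolerance.

```python
def choose_workflow(
    architectural_decisions: bool,
    new_patterns: bool,
    significant_trade_offs: bool,
    exploratory_prototype: bool = False,
) -> str:
    """Rough encoding of the threshold above, not a hard rule."""
    if exploratory_prototype:
        return "research only (no review)"  # speed matters more than correctness
    if architectural_decisions or new_patterns or significant_trade_offs:
        return "research + review + implementation"
    return "research + implementation, or implementation only"
```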

The Guides overview diagram names specific tools — here’s the reasoning behind those choices and how to adapt them to your own setup.

| Role | Our choice | Why | Alternatives |
| --- | --- | --- | --- |
| Research | Claude (claude.ai) | Strong at structured output, long-form reasoning, specification writing | GPT-4, Gemini — any conversational AI with strong reasoning |
| Review | Gemini Pro | Different training data = different blind spots. Good at adversarial critique. | GPT-4, Claude (different model), DeepSeek — the key is different architecture |
| Implementation | Claude Code (CLI) | Executes from documentation, follows Jimmy’s Workflow gates | Cursor, GitHub Copilot, Aider — any AI coding tool that reads markdown |

The principle matters more than the specific tools. The critical insight: the system that researches should not be the same system that reviews. Use whatever models you have access to — the value comes from architectural diversity, not from any specific brand.

Setting up the review step: Paste your research output (executive summary, spec, or plan) into a different AI system with this framing: “Review this specification for weak reasoning, missing edge cases, and untested assumptions. Your job is to find problems, not to agree.” That’s it — no special setup needed. The adversarial framing does the work.
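If you prefer to script the hand-off instead of pasting manually, the shape is simply framing plus document. In the sketch below, `send_to_reviewer` and the file path are hypothetical placeholders for whichever second system you use.

```python
from pathlib import Path

# The adversarial framing from the guide, verbatim.
REVIEW_FRAMING = (
    "Review this specification for weak reasoning, missing edge cases, "
    "and untested assumptions. Your job is to find problems, not to agree."
)

def build_review_prompt(document: str) -> str:
    """Prepend the adversarial framing to the research output."""
    return f"{REVIEW_FRAMING}\n\n---\n\n{document}"

# Usage (hypothetical): send_to_reviewer wraps whatever second AI system you use.
# spec = Path("docs/auth-spec.md").read_text()
# review = send_to_reviewer(build_review_prompt(spec))
```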

Keep research sessions focused. One topic per research session. “Design the authentication system” is a good session. “Design the authentication system and also figure out the database schema and the deployment strategy” produces weaker output in all three areas.

Give the reviewer explicit instructions. Do not simply paste the document and ask “what do you think?” Instead: “Review this specification for weak reasoning, missing edge cases, and untested assumptions. Your job is to find problems, not to agree.”

Track revisions. When the reviewer sends a document back for revision, note what changed and why. This creates a decision trail that the implementation AI can reference if questions arise.

Do not cherry-pick review feedback. If the reviewer raises an issue, address it — even if you disagree. Either revise the document or explicitly document why the feedback was considered and rejected. Ignoring review feedback defeats the purpose.