External Research Methodology
Using one AI system for everything creates blind spots. Different AI architectures have different training data, different strengths, and different failure modes. The external research methodology uses this to your advantage: research with one system, review with another, implement with a third.
The problem it solves
A single AI system reviewing its own work is a conflict of interest. The same biases that shaped the initial output also shape the review. Blind spots are invisible precisely because they are blind spots.
| Single-system failure | Multi-system mitigation |
|---|---|
| Confirms its own assumptions | A different system challenges assumptions from a different angle |
| Misses the same edge cases consistently | Different training data surfaces different edge cases |
| Optimises for its own strengths | The reviewer has different strengths |
| Produces internally consistent but externally weak specs | External review breaks the consistency bubble |
| No adversarial pressure on reasoning | The reviewer’s job is to challenge, not agree |
The two-pronged approach
Research AI → Draft Documentation → Review AI → Strengthened Docs → Implementation AI
     ↑                                               |
     └───────────────────────────────────────────────┘
                (iterate until approved)

Phase 1: Research
A conversational AI system handles deep thinking, strategy, and documentation production. This is where executive summaries, specifications, implementation plans, and research reports are created.
What the research phase produces:
| Output | Purpose |
|---|---|
| Executive Summary | Architecture, key decisions, trade-offs |
| Specification | Interfaces, data structures, constraints |
| Implementation Plan | Test cases, ordered steps, validation criteria |
| Research Report | Findings, comparisons, recommendations |
The research AI is chosen for depth of reasoning and quality of structured output. It operates in a conversational mode — the human guides the research, asks probing questions, and refines the output iteratively.
Phase 2: Review
A different AI system reviews the research output. The reviewer’s role is explicitly adversarial — its job is to find weaknesses, not to agree.
What the review phase checks:
| Check | What it catches |
|---|---|
| Weak reasoning | Conclusions that don’t follow from evidence |
| Missing edge cases | Scenarios the research AI didn’t consider |
| Untested assumptions | Claims presented as facts without validation |
| Gaps in specification | Interfaces that are underspecified or ambiguous |
| Internal contradictions | Requirements that conflict with each other |
| Over-engineering | Complexity that isn’t justified by requirements |
| Under-specification | Areas where “it depends” needs to become a concrete decision |
The reviewer uses a different AI architecture. Different training data means different blind spots. What one system misses, another is more likely to catch.
Phase 3: Implementation
A CLI-based AI coding tool implements from the reviewed documentation using Jimmy’s Workflow. The implementation AI never sees the original requirements directly — it works from the specification and implementation plan that survived review.
This separation is deliberate. The implementation AI is not influenced by the back-and-forth of the research phase. It receives clean, reviewed documentation and translates it into code.
Why this works
Different training data produces different blind spots. An AI system trained primarily on one corpus will have systematic gaps. A second system trained on a different corpus fills different gaps. The overlap of two imperfect systems is more complete than either alone.
The reviewer’s job is to challenge, not agree. This is explicitly stated in the review prompt. The reviewer is not asked “does this look good?” — it is asked “what is wrong with this?” The adversarial framing produces substantively different output.
Specs that survive review are specs worth building. If a specification has weak reasoning, missing edge cases, or untested assumptions, it is far cheaper to catch these in the documentation phase than in the implementation phase. A review that sends the spec back for revision saves hours of implementation rework.
It is a feedback loop, not a pipeline. Research and review iterate. The research AI revises based on review feedback. The reviewer re-evaluates the revision. This continues until the documentation meets the standard. Only then does implementation begin.
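The feedback loop above can be sketched in a few lines. `research_ai` and `review_ai` are hypothetical callables standing in for two different AI systems; nothing here is a real API:

```python
def produce_reviewed_docs(requirements, research_ai, review_ai, max_rounds=5):
    """Iterate research and review until the reviewer approves the docs."""
    draft = research_ai(f"Draft a specification for: {requirements}")
    for _ in range(max_rounds):
        # Adversarial framing: the reviewer is asked to find problems.
        review = review_ai(
            "Review this specification for weak reasoning, missing edge "
            "cases, and untested assumptions. Your job is to find problems, "
            "not to agree.\n\n" + draft
        )
        if review["verdict"] in ("APPROVED", "APPROVED WITH ADVISORIES"):
            return draft, review
        # REQUIRES REVISION or REJECTED: send the feedback back to research.
        draft = research_ai(
            "Revise this specification to address the review feedback.\n\n"
            f"Specification:\n{draft}\n\nFeedback:\n{review['feedback']}"
        )
    raise RuntimeError("Documentation did not pass review within max_rounds")
```

Implementation begins only when this function returns, which is exactly the "iterate until approved" arrow in the diagram.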
Review format
The reviewer produces a structured verdict:
## Review: [Document Title]

**Verdict**: APPROVED | APPROVED WITH ADVISORIES | REQUIRES REVISION | REJECTED

### Structural Strengths
- [What the document does well]
- [Sound reasoning or thorough coverage]

### Critical Advisories
- [Issues that must be addressed before implementation]
- [Missing edge cases or weak reasoning]

### Recommended Changes
1. [Specific, actionable change]
2. [Specific, actionable change]

### Minor Observations
- [Non-blocking suggestions for improvement]

Verdict definitions:
| Verdict | Meaning | Action |
|---|---|---|
| APPROVED | Document is ready for implementation | Proceed to implementation phase |
| APPROVED WITH ADVISORIES | Document is usable but has minor gaps | Address advisories during implementation |
| REQUIRES REVISION | Significant issues found | Return to research phase, revise, re-submit for review |
| REJECTED | Fundamental problems with approach | Return to research phase, reconsider approach |
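If you script the loop, the verdict can be extracted mechanically. A minimal sketch, assuming the reviewer emits the `**Verdict**:` line exactly as in the format above:

```python
import re

# Maps each verdict from the table above to its next action.
VERDICT_ACTIONS = {
    "APPROVED": "proceed to implementation",
    "APPROVED WITH ADVISORIES": "address advisories during implementation",
    "REQUIRES REVISION": "return to research, revise, re-submit",
    "REJECTED": "return to research, reconsider approach",
}

def next_action(review_markdown):
    """Find the **Verdict**: line in a review and map it to an action."""
    match = re.search(r"\*\*Verdict\*\*:\s*([A-Z ]+?)\s*$",
                      review_markdown, re.MULTILINE)
    if match is None:
        raise ValueError("no verdict line found in review")
    verdict = match.group(1).strip()
    if verdict not in VERDICT_ACTIONS:
        raise ValueError(f"unknown verdict: {verdict}")
    return VERDICT_ACTIONS[verdict]
```

Raising on a missing or unknown verdict is deliberate: a review that cannot be classified should stop the pipeline rather than silently proceed.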
How this connects to documentation-first workflow
The external research methodology is the engine that produces documentation for the documentation-first workflow:
External Research → Reviewed Documentation → Documentation-First Implementation

| Stage | Who | Produces |
|---|---|---|
| Research | Conversational AI | Draft executive summaries, specs, plans |
| Review | Different AI | Verdict, advisories, required changes |
| Revision | Conversational AI | Strengthened documentation |
| Implementation | CLI AI tool | Working code from reviewed docs |
The documentation that enters the implementation phase has been through at least two AI systems and human review. It is substantially stronger than documentation produced by a single system in a single pass.
When NOT to use this
The full two-pronged approach has overhead. Not every task justifies it.
| Task type | Research methodology | Reasoning |
|---|---|---|
| New system or major feature | Full: research + review + implementation | High-stakes decisions benefit from adversarial review |
| Significant refactoring | Full or research + implementation | Depends on risk level |
| New feature following existing patterns | Research + implementation (skip review) | Pattern is already validated |
| Bug fix with clear root cause | Implementation only | No research needed |
| Configuration change | Implementation only | No design decisions |
| Exploratory prototyping | Research only (no review) | Speed matters more than correctness |
The threshold: if the task involves architectural decisions, new patterns, or significant trade-offs, use the full two-pronged approach. If the task follows established patterns with no new decisions, skip the review.
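The threshold can be encoded as a small predicate. The flags are illustrative, not part of any real tooling:

```python
def choose_workflow(architectural_decisions, new_patterns, significant_tradeoffs):
    """Any high-stakes attribute triggers the full two-pronged approach."""
    if architectural_decisions or new_patterns or significant_tradeoffs:
        return "research + review + implementation"
    # Established patterns, no new decisions: skip the adversarial review.
    return "research + implementation (skip review)"
```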
Which AI for which role?
The Guides overview diagram names specific tools — here’s the reasoning behind those choices and how to adapt them to your own setup.
| Role | Our choice | Why | Alternatives |
|---|---|---|---|
| Research | Claude (claude.ai) | Strong at structured output, long-form reasoning, specification writing | GPT-4, Gemini — any conversational AI with strong reasoning |
| Review | Gemini Pro | Different training data = different blind spots. Good at adversarial critique. | GPT-4, Claude (different model), DeepSeek — the key is different architecture |
| Implementation | Claude Code (CLI) | Executes from documentation, follows Jimmy’s Workflow gates | Cursor, GitHub Copilot, Aider — any AI coding tool that reads markdown |
The principle matters more than the specific tools: the system that researches should not be the same system that reviews. Use whatever models you have access to; the value comes from architectural diversity, not from any specific brand.
Setting up the review step: Paste your research output (executive summary, spec, or plan) into a different AI system with this framing: “Review this specification for weak reasoning, missing edge cases, and untested assumptions. Your job is to find problems, not to agree.” That’s it — no special setup needed. The adversarial framing does the work.
Practical tips
Keep research sessions focused. One topic per research session. “Design the authentication system” is a good session. “Design the authentication system and also figure out the database schema and the deployment strategy” produces weaker output in all three areas.
Give the reviewer explicit instructions. Do not simply paste the document and ask “what do you think?” Instead: “Review this specification for weak reasoning, missing edge cases, and untested assumptions. Your job is to find problems, not to agree.”
Track revisions. When the reviewer sends a document back for revision, note what changed and why. This creates a decision trail that the implementation AI can reference if questions arise.
Do not cherry-pick review feedback. If the reviewer raises an issue, address it — even if you disagree. Either revise the document or explicitly document why the feedback was considered and rejected. Ignoring review feedback defeats the purpose.
Further reading
- Documentation-First Development — The workflow that consumes research output
- Implementation Plans — How to structure the plans that research produces
- Jimmy’s Workflow v2.1 — The validation system used during implementation
- 11 Core Principles — The principles that govern all phases