
Orchestrator Pattern

A capable model orchestrates while cheaper models execute structured subtasks. Combined with mandatory workflow structure on every specialist, this pattern roughly halves cost and delivers a 2-3x speed improvement with no measurable quality loss.

| Approach | Cost (input/output per 1M tokens) | Latency | Quality |
|---|---|---|---|
| All-capable (every call uses premium model) | $15 / $75 (5 premium calls) | ~60s sequential | Baseline |
| Orchestrator + specialists | $7 / $35 (1 premium + 4 cheap) | ~19s parallel | Equal or better |
| Savings | 53% cost reduction | 68% time reduction | No loss |

The key requirement: every specialist must have structured workflow built in. Without it, cheaper models are unreliable. With it, they match or exceed premium model quality on well-defined tasks.

Orchestrator + Specialist pattern — capable orchestrator delegates to parallel cheaper specialists

```
Orchestrator (capable model)
│    Complex reasoning, task decomposition, synthesis
├── Specialist 1 (cheaper model + structured workflow)
│    Structured subtask with validation gates
├── Specialist 2 (cheaper model + structured workflow)
│    Structured subtask with validation gates
├── Specialist 3 (cheaper model + structured workflow)
│    Structured subtask with validation gates
└── Specialist 4 (cheaper model + structured workflow)
     Structured subtask with validation gates
```

Orchestrator (capable model — e.g. Claude Sonnet 4.5, Gemini Pro)

  • Receives the high-level task
  • Decomposes it into well-defined subtasks
  • Assigns subtasks to specialists with explicit instructions
  • Synthesises specialist outputs into a coherent result
  • Handles edge cases and ambiguity that require reasoning

Specialists (cheaper model — e.g. Claude Haiku 4.5)

  • Receive a single, well-defined subtask
  • Execute with structured workflow (PRE-FLIGHT, IMPLEMENT, VALIDATE, CHECKPOINT)
  • Return structured output with confidence levels
  • Flag uncertainties back to the orchestrator

The orchestrator does the thinking. The specialists do the doing.
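The division of labour above can be sketched in a few lines of Python. This is an illustrative stub, not a specific SDK: `call_model` stands in for a real LLM API call, and the briefs are placeholder examples.

```python
def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real LLM API call; returns a canned response."""
    return f"[{model}] response to: {prompt[:40]}"

# Briefs are illustrative; in practice the orchestrator writes them per request.
SPECIALIST_BRIEFS = {
    "entities": "Extract named entities as JSON with type, name, and context.",
    "topics": "Classify the article against a fixed topic taxonomy.",
    "summary": "Write a three-sentence summary of the article.",
    "quality": "Check the summary against the source for factual drift.",
}

def orchestrate(article: str) -> str:
    # 1. The orchestrator (capable model) decomposes the high-level task.
    plan = call_model("sonnet-4.5", f"Decompose analysis of: {article}")

    # 2. Each specialist (cheaper model) executes one well-defined brief.
    results = {
        name: call_model("haiku-4.5", f"{brief}\n\nArticle: {article}")
        for name, brief in SPECIALIST_BRIEFS.items()
    }

    # 3. The orchestrator synthesises specialist outputs into one result.
    return call_model("sonnet-4.5", f"Plan: {plan}\nSynthesise: {results}")

report = orchestrate("Example article text.")
```

In a real system, step 2 is where the parallelism and the cost savings live; the stub runs sequentially for clarity.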

Scenario: 5-step content analysis pipeline


All-capable approach (Claude Sonnet 4.5 for every step):

| Step | Model | Est. tokens (in/out) | Cost |
|---|---|---|---|
| 1. Analyse content | Sonnet 4.5 | 3,000 / 1,500 | $0.032 |
| 2. Extract entities | Sonnet 4.5 | 2,500 / 1,000 | $0.023 |
| 3. Classify topics | Sonnet 4.5 | 2,000 / 500 | $0.014 |
| 4. Generate summary | Sonnet 4.5 | 3,000 / 800 | $0.021 |
| 5. Quality check | Sonnet 4.5 | 2,000 / 600 | $0.015 |
| Total | | | $0.105 |

Execution: sequential, ~60s.

Orchestrator + specialists (1 Sonnet + 4 Haiku):

| Step | Model | Est. tokens (in/out) | Cost |
|---|---|---|---|
| 1. Decompose + synthesise | Sonnet 4.5 | 3,000 / 1,500 | $0.032 |
| 2. Extract entities | Haiku 4.5 | 2,500 / 1,000 | $0.008 |
| 3. Classify topics | Haiku 4.5 | 2,000 / 500 | $0.005 |
| 4. Generate summary | Haiku 4.5 | 3,000 / 800 | $0.007 |
| 5. Quality check | Haiku 4.5 | 2,000 / 600 | $0.005 |
| Total | | | $0.057 |

Execution: steps 2-5 in parallel, ~19s.

| Metric | All-capable | Orchestrator + specialists | Improvement |
|---|---|---|---|
| Cost per request | $0.105 | $0.057 | 46% cheaper |
| Latency | ~60s | ~19s | 68% faster |
| Quality | Baseline | Equal (with workflow) | No loss |

At scale, these margins compound.
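The per-request figures can be reproduced from the per-token rates the tables imply (Sonnet 4.5 at $3/$15 per 1M tokens in/out, Haiku 4.5 at $1/$5). A quick check:

```python
# Per-token rates implied by the cost tables above: $/1M tokens (input, output).
RATES = {"sonnet-4.5": (3.0, 15.0), "haiku-4.5": (1.0, 5.0)}

def call_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    rate_in, rate_out = RATES[model]
    return (tokens_in * rate_in + tokens_out * rate_out) / 1_000_000

# (tokens in, tokens out) for the five pipeline steps.
steps = [(3000, 1500), (2500, 1000), (2000, 500), (3000, 800), (2000, 600)]

all_capable = sum(call_cost("sonnet-4.5", i, o) for i, o in steps)
orchestrated = call_cost("sonnet-4.5", *steps[0]) + sum(
    call_cost("haiku-4.5", i, o) for i, o in steps[1:]
)
# all_capable = $0.1035 and orchestrated = $0.0555; the tables round each
# step before summing, which is where $0.105 and $0.057 come from.
```

The saving works out to about 46% for this token mix, slightly below the 53% rate-level figure because the single premium call carries the largest token budget.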

Specialists need well-defined tasks with unambiguous inputs and expected outputs. If you cannot write a clear one-paragraph brief for a subtask, it is too complex for a specialist — keep it at the orchestrator level.

Good specialist task: “Extract all named entities (people, organisations, locations) from this article. Return as JSON with entity type, name, and sentence context.”

Bad specialist task: “Analyse this article and figure out what’s important.” This requires judgment that a cheap model without deep context cannot reliably provide.
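A brief like the good one above gives the orchestrator something mechanical to check. A sketch of validating a specialist's entity output before accepting it (the field names and allowed types here are illustrative, not a fixed schema):

```python
import json

# Output contract implied by the brief: JSON entities with these fields/types.
REQUIRED_FIELDS = {"type", "name", "context"}
ALLOWED_TYPES = {"person", "organisation", "location"}

def validate_entities(raw: str) -> list:
    """Parse specialist output and reject anything off-contract."""
    entities = json.loads(raw)
    for entity in entities:
        missing = REQUIRED_FIELDS - entity.keys()
        if missing:
            raise ValueError(f"missing fields: {missing}")
        if entity["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unknown entity type: {entity['type']}")
    return entities

specialist_output = (
    '[{"type": "person", "name": "Ada Lovelace",'
    ' "context": "Ada Lovelace wrote the notes."}]'
)
entities = validate_entities(specialist_output)
```

The bad brief admits no such check, which is exactly why it belongs at the orchestrator level.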

Every specialist must operate with structured workflow — PRE-FLIGHT checks, validation gates, confidence levels. This is not optional. The Haiku 4.5 findings demonstrate that structured workflow is the mechanism that makes cheaper models production-viable. Without it, error rates are unacceptable.
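In practice that means a specialist prompt that forces the stages rather than hoping for them. A sketch, where the stage names come from the workflow above but the exact wording is an assumption:

```python
# Illustrative specialist prompt template embedding the structured workflow.
SPECIALIST_TEMPLATE = """\
You will complete one subtask. Work through these stages in order.

PRE-FLIGHT: Restate the subtask and list the inputs you were given.
If a required input is missing, stop and report it.

IMPLEMENT: Execute the subtask exactly as briefed.

VALIDATE: Check your output against the success criteria: {criteria}

CHECKPOINT: Report a confidence level (high/medium/low) and flag any
remaining uncertainty back to the orchestrator.

Subtask: {brief}
Input: {payload}
"""

prompt = SPECIALIST_TEMPLATE.format(
    criteria="valid JSON; every entity type drawn from the allowed set",
    brief="Extract all named entities from the article as JSON.",
    payload="<article text>",
)
```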

Specialists that do not depend on each other’s output should run simultaneously. This is where the latency improvement comes from. Sequential specialist execution saves cost but not time. Parallel execution saves both.
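A minimal sketch of that fan-out using Python's standard thread pool; the sleep stands in for API latency, so total time tracks the slowest call rather than the sum:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def specialist_call(brief: str) -> str:
    time.sleep(0.1)  # stand-in for one API round trip
    return f"done: {brief}"

briefs = ["entities", "topics", "summary", "quality"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(briefs)) as pool:
    results = list(pool.map(specialist_call, briefs))
elapsed = time.perf_counter() - start
# elapsed tracks one call (~0.1s), not four in sequence (~0.4s)
```

With real API calls an async client works just as well; the point is that independent specialists share the wall-clock cost of a single call.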

The pattern is only as good as the orchestrator’s ability to:

  • Decompose tasks into independent, well-defined subtasks
  • Write clear specialist briefs with explicit success criteria
  • Synthesise specialist outputs, resolving conflicts and filling gaps
  • Recognise when a specialist result is unreliable and escalate

Skimping on the orchestrator model to save money undermines the entire pattern. The orchestrator is the one place where model capability matters most.

| Use when | Do not use when |
|---|---|
| Task decomposes into 3+ independent subtasks | Task is a single indivisible operation |
| Cost or speed matters at scale | Running fewer than 50 requests per day |
| Subtasks can be defined with clear inputs/outputs | Subtasks are deeply interconnected (output of each feeds the next) |
| Can parallelise at least some specialist calls | Cannot define clear task boundaries |
| Quality requirements can be validated per-subtask | Quality can only be assessed holistically on the final output |

Not all specialists need to be the cheapest model. Some subtasks may require an intermediate model while others work fine with the cheapest.

```
Orchestrator (Sonnet 4.5)
├── Specialist A (Haiku 4.5) -- entity extraction
├── Specialist B (Haiku 4.5) -- classification
├── Specialist C (Sonnet 4.5) -- nuanced summary
└── Specialist D (Haiku 4.5) -- format validation
```

The decision is per-subtask: can this task be defined explicitly enough for a cheaper model with workflow? If yes, use cheap. If no, use capable.
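That decision rule can be made explicit as a routing function. The criteria fields here are illustrative; real routing might also weigh token volume or error tolerance:

```python
def choose_model(subtask: dict) -> str:
    """Cheap model when the brief is explicit enough for workflow to carry it;
    capable model otherwise."""
    well_defined = subtask["clear_brief"] and subtask["output_schema_known"]
    return "haiku-4.5" if well_defined else "sonnet-4.5"

routing = {
    "entity extraction": choose_model(
        {"clear_brief": True, "output_schema_known": True}
    ),
    "nuanced summary": choose_model(
        {"clear_brief": True, "output_schema_known": False}
    ),
}
```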

When the decomposition is trivial (e.g. always the same 4 steps, no conditional logic), even the orchestrator can be a cheaper model. This works only when the pipeline is fixed and well-tested.

For complex pipelines, specialists can themselves become orchestrators of sub-specialists.

```
Orchestrator (capable model)
├── Specialist-Orchestrator A (mid-tier model)
│    ├── Sub-specialist A1 (cheap model)
│    └── Sub-specialist A2 (cheap model)
└── Specialist B (cheap model)
```

This adds complexity and should only be used when the cost savings at scale justify the engineering overhead.

Based on the 5-step pipeline example above at 1,000 requests per day:

| Metric | All-capable | Orchestrator + specialists | Difference |
|---|---|---|---|
| Daily cost | $105.00 | $57.00 | -$48.00/day |
| Monthly cost | $3,150 | $1,710 | -$1,440/month |
| Annual cost | $38,325 | $20,805 | -$17,520/year |
| Daily compute time | 16.7 hours | 5.3 hours | -11.4 hours |
| Avg. latency per request | 60s | 19s | -41s |

At 10,000 requests per day, annual savings exceed $175,000.
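The extrapolation is straightforward arithmetic on the per-request saving from the comparison table:

```python
saving_per_request = 0.105 - 0.057  # $ per request, from the comparison table

daily_1k = saving_per_request * 1_000            # $48/day at 1,000 req/day
annual_1k = daily_1k * 365                       # $17,520/year
annual_10k = saving_per_request * 10_000 * 365   # $175,200/year
```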

Structured workflow is what makes cheap specialists viable for production.

Without structured workflow, cheaper models produce inconsistent results. Error rates climb, retry costs eat into savings, and quality becomes unpredictable. The theoretical cost saving evaporates in practice.

With structured workflow — explicit thinking templates, mandatory validation gates, confidence levels, self-correction loops — cheaper models achieve reliability rates that match or exceed premium models on well-defined tasks. The Haiku 4.5 findings document an 86% vs 71% success rate in favour of the cheaper model with workflow.

The Orchestrator + Specialist pattern is not just “use cheap models for easy stuff.” It is a deliberate architecture where the orchestrator provides the reasoning and the workflow provides the reliability, allowing specialists to operate at a quality level they could not reach alone.

  • Tested on content processing tasks only. Other domains (code generation, creative writing, data analysis) may behave differently.
  • Specific model versions. Claude Haiku 4.5 and Claude Sonnet 4.5 as available in October 2025.
  • Engineering overhead. The pattern requires building task decomposition, parallel execution, and result synthesis. For simple applications, this overhead may not justify the savings.
  • Latency assumptions. Parallel execution assumes your infrastructure supports concurrent API calls. Rate limits may constrain parallelism.
  • Quality validation per-subtask. If quality can only be measured on the final assembled output, individual specialist validation gates are less effective.