
Evolution: From Copy-Paste to Multi-Agent Team

This is the story of how a solo developer’s AI workflow evolved through five stages over approximately eighteen months. It started with copy-pasting ChatGPT output into VS Code and ended with a coordinated multi-agent team across four machines. Each version solved real problems discovered in daily production use. The pattern: each iteration added structure that made AI output more reliable.

v0: Copy-Paste Era (late 2024 – early 2025)


ChatGPT in a browser tab, VS Code open beside it. Ask ChatGPT a question, read the answer, manually copy code snippets into the editor. No integration, no context sharing, no automation. The AI could not see the codebase — you described it in the chat and hoped the description was accurate enough.

  • Zero setup — anyone with a browser could start immediately
  • Good for learning and exploration — “how do I do X in Rust?”
  • Forced the human to understand every line (you had to read it to paste it)
| Problem | Impact |
| --- | --- |
| No codebase awareness | AI gave generic answers, not project-specific ones |
| Context lost every session | Had to re-explain the project each time |
| Manual copy-paste errors | Wrong indentation, missing imports, partial snippets |
| No validation | AI said it worked, you trusted it, it didn’t |
| Conversational drift | Long chats wandered from the original task |
| Tab-switching fatigue | Constant context-switching between browser and editor |

AI-generated code is useful but the delivery mechanism matters. Copy-pasting between disconnected tools is error-prone and exhausting. The AI needs to see the code, and the code needs to see the AI’s output. Integration isn’t a luxury — it’s a prerequisite for reliability.

v1: IDE-Embedded AI

AI assistance embedded directly in an IDE (such as Cursor). Project-specific rules lived in IDE configuration files (e.g., .cursorrules). The AI had access to the codebase through the editor.

  • Low friction — AI assistance available immediately in the editor
  • Rules files could encode project-specific conventions
  • Good for single-file and single-feature tasks
| Problem | Impact |
| --- | --- |
| IDE lock-in | Rules tied to one editor, not portable |
| No cross-machine coordination | Could only work on one machine |
| Rules scattered across projects | No standardisation, each project had different conventions |
| No validation gates | AI would claim “done” without verification |
| Session isolation | Each session started blank with no memory of prior work |

IDE-embedded AI is a good starting point, but the rules need to be portable and the workflow needs validation gates. Without structure, AI assistance is helpful but unreliable.

v2: Structured CLI Workflow

A migration from IDE-embedded AI to CLI-based tools (such as Claude Code), with the introduction of structured project files:

  • AGENTS.md — comprehensive project context for any AI tool
  • CLAUDE.md — quick-reference companion for Claude Code specifically
  • Jimmy’s Workflow — four-phase validation system (PRE-FLIGHT, IMPLEMENT, VALIDATE, CHECKPOINT)
  • 11 Core Principles — mandatory rules that persist across sessions

A template system provided standardised starting points for new projects.
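
The four-phase workflow can be sketched as sequential gates, where a failure in any phase blocks progression to the next. This is a minimal sketch, not the project's actual implementation: the phase names come from the workflow above, while the check functions and task fields are hypothetical stand-ins.

```python
# Hypothetical sketch of Jimmy's Workflow: each phase must pass
# before the next one runs, so the AI cannot claim "done" unverified.

PHASES = ["PRE-FLIGHT", "IMPLEMENT", "VALIDATE", "CHECKPOINT"]

def run_workflow(task, checks):
    """Run each phase's check in order; stop at the first failure."""
    completed = []
    for phase in PHASES:
        if not checks[phase](task):
            return {"status": "blocked", "phase": phase, "completed": completed}
        completed.append(phase)
    return {"status": "done", "completed": completed}

# Illustrative checks: a task whose tests fail never reaches CHECKPOINT.
checks = {
    "PRE-FLIGHT": lambda t: t["spec_read"],
    "IMPLEMENT": lambda t: t["code_written"],
    "VALIDATE": lambda t: t["tests_pass"],
    "CHECKPOINT": lambda t: t["human_signed_off"],
}
result = run_workflow(
    {"spec_read": True, "code_written": True,
     "tests_pass": False, "human_signed_off": False},
    checks,
)
```

The point of the gate structure is that "VALIDATE" failing leaves an explicit blocked state rather than a silent "it works, trust me".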

| Innovation | What it solved |
| --- | --- |
| AGENTS.md | Portable project context — works with any AI tool |
| Jimmy’s Workflow | Validation gates prevent “it works, trust me” |
| 11 Core Principles | Consistency across sessions, survives context compression |
| Template versioning | Projects can check if their templates are current |
| Confidence levels | HIGH/MEDIUM/LOW system tells the human when to intervene |

| Problem | Impact |
| --- | --- |
| Single machine | Could not delegate heavy compute to more powerful hardware |
| Context limits | Large codebases exceeded context windows |
| No specialisation | One instance tried to do everything |
| No delegation pattern | No way to say “this task needs a bigger machine” |

Structured context and validation gates dramatically improve reliability. But a single machine, no matter how well-configured, cannot handle every type of task. Development needs precision. Compute needs power. Monitoring needs always-on availability.

v3: Multi-Machine Specialisation

Three machines with distinct roles, connected via mesh VPN:

| Machine type | Role | Why this hardware |
| --- | --- | --- |
| Development workstation | Code, testing, prototyping | Good tooling, fast iteration |
| High-memory server | Compute, containers, builds | 64GB+ RAM for heavy tasks |
| Low-power device | Monitoring, security scanning | Always-on, low power draw |

Each machine had its own AI instance with role-appropriate configuration. The code repository served as the coordination hub — all machines pulled from and pushed to the same repo.
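
The delegation idea can be sketched as a simple lookup that matches each task to the machine whose role fits it. The machine roles come from the table above; the machine names and the task-to-role mapping are illustrative assumptions.

```python
# Hypothetical sketch of role-based delegation across the machines.
# Machine names and TASK_ROLES are assumptions for illustration.

MACHINES = {
    "workstation": "development",   # code, testing, prototyping
    "big-server": "compute",        # containers, builds, heavy tasks
    "edge-box": "monitoring",       # always-on scanning
}

TASK_ROLES = {
    "prototype_feature": "development",
    "build_containers": "compute",
    "security_scan": "monitoring",
}

def delegate(task: str) -> str:
    """Return the machine whose role matches the task's needs."""
    role = TASK_ROLES[task]
    for machine, machine_role in MACHINES.items():
        if machine_role == role:
            return machine
    raise LookupError(f"no machine for role {role!r}")
```

With this mapping, a container build lands on the high-memory server instead of the workstation trying to do everything.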

| Innovation | What it solved |
| --- | --- |
| Machine specialisation | Right task on right hardware |
| Resource delegation | Development machine stops trying to run inference |
| Mesh networking | Direct machine-to-machine communication without public exposure |
| Per-machine configuration | Each instance configured for its specific role |

| Problem | Impact |
| --- | --- |
| No team identity | Instances did not know about each other |
| No handoff protocol | Work transferred ad-hoc with inconsistent formatting |
| No personality constraints | Each instance defaulted to generic helpful mode |
| External research disconnected | Research done in web-based AI had no structured path to internal team |

Specialising machines by role is a significant improvement. But machines alone are not enough — the AI instances on those machines need to know they are part of a team, who their colleagues are, and how to communicate.

v4: Multi-Agent Team (early 2026 – current)


Four machines plus external AI services, operating as a coordinated team with defined personalities, relationships, and communication protocols.

EXTERNAL
────────────────────────────────
External AI Service A (Research)
External AI Service B (Review)
────────────────────────────────
BOUNDARY (file-based dead drop)
────────────────────────────────
INTERNAL
├── Coordinator (gateway machine)
├── Developer (workstation)
├── Compute (high-memory server)
└── Monitor (always-on device)
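
The dead drop at the boundary can be sketched as a shared directory: the external side writes research artefacts in, the internal side polls and consumes them, and neither side needs a direct connection to the other. This is a minimal sketch under assumptions: the JSON payloads, file naming, and function names are hypothetical, not the project's actual protocol.

```python
# Hypothetical sketch of the file-based dead drop at the boundary.
import json
from pathlib import Path

def drop(boundary: Path, name: str, payload: dict) -> Path:
    """External side: atomically place a research artefact in the drop."""
    tmp = boundary / f".{name}.tmp"
    tmp.write_text(json.dumps(payload))
    final = boundary / f"{name}.json"
    tmp.rename(final)  # rename is atomic on POSIX filesystems
    return final

def collect(boundary: Path) -> list[dict]:
    """Internal side: pick up and remove all completed artefacts."""
    items = []
    for f in sorted(boundary.glob("*.json")):
        items.append(json.loads(f.read_text()))
        f.unlink()
    return items

# Demo: exchange one artefact through a temporary boundary directory.
import tempfile
boundary = Path(tempfile.mkdtemp())
drop(boundary, "research-001", {"topic": "mesh VPN", "status": "done"})
received = collect(boundary)
```

Writing to a temporary name and renaming means the internal side never sees a half-written file, which keeps the boundary clean without any locking.
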
| Innovation | What it solved |
| --- | --- |
| Team structure with personalities | Instances produce work that fits together, not isolated outputs |
| Role cards | Each instance has defined identity, responsibilities, and relationships |
| Dead drop protocol | Clean boundary between external research and internal implementation |
| Two external AI services | Different AI architectures provide different perspectives on research and review |
| Communication norms | Standardised how instances give and receive work |
| Shared values | Consistency across all instances regardless of role |
| Relationship tables | Every instance knows who it works with and in what direction |
| Escalation protocol | Problems reach the right level (P1/P2/P3) |
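
Escalation routing can be sketched as a severity triage. The P1/P2/P3 levels come from the source, but the criteria and destinations below are illustrative assumptions, not the team's actual rules.

```python
# Hypothetical sketch of the escalation protocol. The P1/P2/P3 levels
# come from the source; the criteria and destinations are assumptions.

def escalate(problem: dict) -> str:
    """Return the escalation level for a reported problem."""
    if problem["blocks_team"]:
        return "P1"  # whole-team blocker: surface to the human immediately
    if problem["blocks_instance"]:
        return "P2"  # one instance blocked: route to the coordinator
    return "P3"      # minor issue: log for the next review cycle
```

The point is that severity is decided by a shared rule, not by whichever instance happens to notice the problem.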

The single biggest improvement in v4 was not technical. It was treating AI instances as colleagues rather than tools. Giving each instance a name, personality traits, and awareness of the team produced measurably better output:

  • Handoffs became structured and complete
  • Instances considered downstream consumers when formatting output
  • Quality became consistent across the team (shared values)
  • Resource delegation happened naturally (role awareness)
  • Failure modes from v3 (restart from scratch, ignore prior work) disappeared

This insight is documented in detail in the Team Orchestration guide.

| Transition | What was added | What it fixed |
| --- | --- | --- |
| v0 to v1 | IDE integration, codebase access, rules files | Copy-paste errors, no project awareness, tab-switching |
| v1 to v2 | Structured files, validation gates, principles | IDE lock-in, no verification, inconsistency |
| v2 to v3 | Multiple machines, specialisation, mesh networking | Single-machine limits, no delegation |
| v3 to v4 | Team identity, personalities, communication norms | Disconnected instances, ad-hoc handoffs, no team awareness |

Each version followed the same arc:

  1. Use the current setup in production until its limitations become clear
  2. Identify the specific failure mode that causes the most friction
  3. Add the minimum structure needed to address that failure mode
  4. Validate in daily use before adding more complexity

The progression was always from less structure to more structure, driven by observed problems rather than theoretical concerns. Nothing was added speculatively — every layer of structure exists because its absence caused measurable problems.

The v4 setup has been in daily production use since early 2026. The key metrics:

| Metric | Observation |
| --- | --- |
| Handoff quality | Structured, complete, no clarification needed |
| Cross-machine coordination | Smooth, protocol-driven, minimal friction |
| Quality consistency | Shared values produce uniform quality bar |
| Resource utilisation | Right work on right hardware, no thrashing |
| Failure recovery | Any instance can be replaced without losing team context |

The system is not finished. v5 will likely emerge when current limitations become clear through continued use. The methodology — observe problems, add minimum structure, validate in production — will remain the same.