Evolution: From Copy-Paste to Multi-Agent Team
This is the story of how a solo developer’s AI workflow evolved through five stages over approximately eighteen months. It started with copy-pasting ChatGPT output into VS Code and ended with a coordinated multi-agent team across four machines. Each version solved real problems discovered in daily production use. The pattern: each iteration added structure that made AI output more reliable.
v0: Copy-Paste Era (late 2024 – early 2025)
ChatGPT in a browser tab, VS Code open beside it. Ask ChatGPT a question, read the answer, manually copy code snippets into the editor. No integration, no context sharing, no automation. The AI could not see the codebase — you described it in the chat and hoped the description was accurate enough.
What worked
- Zero setup — anyone with a browser could start immediately
- Good for learning and exploration — “how do I do X in Rust?”
- Forced the human to understand every line (you had to read it to paste it)
What broke
| Problem | Impact |
|---|---|
| No codebase awareness | AI gave generic answers, not project-specific ones |
| Context lost every session | Had to re-explain the project each time |
| Manual copy-paste errors | Wrong indentation, missing imports, partial snippets |
| No validation | AI said it worked, you trusted it, it didn’t |
| Conversational drift | Long chats wandered from the original task |
| Tab-switching fatigue | Constant context-switching between browser and editor |
Lesson learned
AI-generated code is useful, but the delivery mechanism matters. Copy-pasting between disconnected tools is error-prone and exhausting. The AI needs to see the code, and the code needs to see the AI’s output. Integration isn’t a luxury — it’s a prerequisite for reliability.
v1: IDE Era (mid-2025)
AI assistance embedded directly in an IDE (such as Cursor). Project-specific rules lived in IDE configuration files (e.g., .cursorrules). The AI had access to the codebase through the editor.
What worked
- Low friction — AI assistance available immediately in the editor
- Rules files could encode project-specific conventions
- Good for single-file and single-feature tasks
What broke
| Problem | Impact |
|---|---|
| IDE lock-in | Rules tied to one editor, not portable |
| No cross-machine coordination | Could only work on one machine |
| Rules scattered across projects | No standardisation, each project had different conventions |
| No validation gates | AI would claim “done” without verification |
| Session isolation | Each session started blank with no memory of prior work |
Lesson learned
IDE-embedded AI is a good starting point, but the rules need to be portable and the workflow needs validation gates. Without structure, AI assistance is helpful but unreliable.
v2: CLI Era (late 2025)
Migration from IDE-embedded AI to CLI-based tools (such as Claude Code). Introduction of structured project files:
- AGENTS.md — comprehensive project context for any AI tool
- CLAUDE.md — quick-reference companion for Claude Code specifically
- Jimmy’s Workflow — four-phase validation system (PRE-FLIGHT, IMPLEMENT, VALIDATE, CHECKPOINT)
- 11 Core Principles — mandatory rules that persist across sessions
A template system provided standardised starting points for new projects.
Key innovations
| Innovation | What it solved |
|---|---|
| AGENTS.md | Portable project context — works with any AI tool |
| Jimmy’s Workflow | Validation gates prevent “it works, trust me” |
| 11 Core Principles | Consistency across sessions, survives context compression |
| Template versioning | Projects can check if their templates are current |
| Confidence levels | HIGH/MEDIUM/LOW system tells the human when to intervene |
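The confidence-level convention amounts to a small lookup from the AI's self-reported tag to the required human involvement. A minimal sketch, where the REVIEW_POLICY mapping and the action names are invented for illustration:

```python
# Hypothetical mapping from the HIGH/MEDIUM/LOW tags mentioned above to
# the human involvement they trigger; the action names are illustrative.
REVIEW_POLICY = {
    "HIGH": "auto-proceed",     # AI is confident; validation gates still apply
    "MEDIUM": "human-review",   # human skims the diff before it lands
    "LOW": "human-intervene",   # human takes over the task
}

def action_for(confidence: str) -> str:
    """Map a confidence tag to the required human involvement."""
    return REVIEW_POLICY[confidence.upper()]
```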
What broke
| Problem | Impact |
|---|---|
| Single machine | Could not delegate heavy compute to more powerful hardware |
| Context limits | Large codebases exceeded context windows |
| No specialisation | One instance tried to do everything |
| No delegation pattern | No way to say “this task needs a bigger machine” |
Lesson learned
Structured context and validation gates dramatically improve reliability. But a single machine, no matter how well-configured, cannot handle every type of task. Development needs precision. Compute needs power. Monitoring needs always-on availability.
v3: Multi-Machine (early 2026)
Three machines with distinct roles connected via mesh VPN:
| Machine type | Role | Why this hardware |
|---|---|---|
| Development workstation | Code, testing, prototyping | Good tooling, fast iteration |
| High-memory server | Compute, containers, builds | 64GB+ RAM for heavy tasks |
| Low-power device | Monitoring, security scanning | Always-on, low power draw |
Each machine had its own AI instance with role-appropriate configuration. The code repository served as the coordination hub — all machines pulled from and pushed to the same repo.
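One way to implement per-machine configuration is to key role configs off the hostname, so each instance loads only the tasks appropriate to its hardware and refuses to run anywhere else. A hypothetical sketch; the hostnames, role names, and role_for helper are all invented:

```python
# Invented hostnames and roles mirroring the three-machine table above.
ROLES: dict[str, dict] = {
    "dev-workstation": {"tasks": ["code", "test", "prototype"]},
    "compute-server": {"tasks": ["build", "containers", "heavy-compute"]},
    "monitor-node": {"tasks": ["monitoring", "security-scans"]},
}

def role_for(hostname: str) -> dict:
    """Look up this machine's role config, or refuse to run without one."""
    try:
        return ROLES[hostname]
    except KeyError:
        # Failing loudly beats silently running the wrong workload.
        raise SystemExit(f"no role defined for {hostname}: refusing to run")
```

In practice the lookup would be called as role_for(socket.gethostname()) at instance start-up, with the repo as the shared source of truth for the ROLES table.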
Key innovations
| Innovation | What it solved |
|---|---|
| Machine specialisation | Right task on right hardware |
| Resource delegation | Development machine stops trying to run inference |
| Mesh networking | Direct machine-to-machine communication without public exposure |
| Per-machine configuration | Each instance configured for its specific role |
What broke
| Problem | Impact |
|---|---|
| No team identity | Instances did not know about each other |
| No handoff protocol | Work transferred ad-hoc with inconsistent formatting |
| No personality constraints | Each instance defaulted to generic helpful mode |
| External research disconnected | Research done in web-based AI had no structured path to internal team |
Lesson learned
Multiple machines with specialised roles are a significant improvement. But machines alone are not enough — the AI instances on those machines need to know they are part of a team, who their colleagues are, and how to communicate.
v4: Multi-Agent Team (early 2026 – current)
Four machines plus external AI services, operating as a coordinated team with defined personalities, relationships, and communication protocols.
```
EXTERNAL
────────────────────────────────
External AI Service A (Research)
External AI Service B (Review)
────────────────────────────────
BOUNDARY (file-based dead drop)
────────────────────────────────
INTERNAL
├── Coordinator (gateway machine)
├── Developer (workstation)
├── Compute (high-memory server)
└── Monitor (always-on device)
```

Key innovations
| Innovation | What it solved |
|---|---|
| Team structure with personalities | Instances produce work that fits together, not isolated outputs |
| Role cards | Each instance has defined identity, responsibilities, and relationships |
| Dead drop protocol | Clean boundary between external research and internal implementation |
| Two external AI services | Different AI architectures provide different perspectives on research and review |
| Communication norms | Standardised how instances give and receive work |
| Shared values | Consistency across all instances regardless of role |
| Relationship tables | Every instance knows who it works with and in what direction |
| Escalation protocol | Problems reach the right level (P1/P2/P3) |
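A file-based dead drop can be as simple as a shared directory with atomic writes on one side and consume-and-delete on the other. The sketch below assumes that shape; the directory layout, message fields, and function names (leave, collect) are illustrative, not the documented protocol:

```python
import json
from pathlib import Path

def leave(drop: Path, name: str, msg: dict) -> Path:
    """External side: write a message, then rename it into place."""
    drop.mkdir(parents=True, exist_ok=True)
    tmp = drop / f".{name}.tmp"          # hidden name while incomplete
    tmp.write_text(json.dumps(msg))
    return tmp.rename(drop / f"{name}.json")

def collect(drop: Path) -> list[dict]:
    """Internal side: pick up every completed message, then delete it."""
    messages = []
    for f in sorted(drop.glob("*.json")):
        messages.append(json.loads(f.read_text()))
        f.unlink()
    return messages
```

Writing to a temporary name and renaming means the internal side never sees a half-written file, since rename is atomic on POSIX filesystems; that is what keeps the boundary clean without any live connection between external and internal instances.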
The critical insight
The single biggest improvement in v4 was not technical. It was treating AI instances as colleagues rather than tools. Giving each instance a name, personality traits, and awareness of the team produced measurably better output:
- Handoffs became structured and complete
- Instances considered downstream consumers when formatting output
- Quality became consistent across the team (shared values)
- Resource delegation happened naturally (role awareness)
- Failure modes from v3 (restart from scratch, ignore prior work) disappeared
This insight is documented in detail in the Team Orchestration guide.
What changed at each transition
| Transition | What was added | What it fixed |
|---|---|---|
| v0 to v1 | IDE integration, codebase access, rules files | Copy-paste errors, no project awareness, tab-switching |
| v1 to v2 | Structured files, validation gates, principles | IDE lock-in, no verification, inconsistency |
| v2 to v3 | Multiple machines, specialisation, mesh networking | Single-machine limits, no delegation |
| v3 to v4 | Team identity, personalities, communication norms | Disconnected instances, ad-hoc handoffs, no team awareness |
The pattern
Each version followed the same arc:
- Use the current setup in production until its limitations become clear
- Identify the specific failure mode that causes the most friction
- Add the minimum structure needed to address that failure mode
- Validate in daily use before adding more complexity
The progression was always from less structure to more structure, driven by observed problems rather than theoretical concerns. Nothing was added speculatively — every layer of structure exists because its absence caused measurable problems.
Current state
The v4 setup has been in daily production use since early 2026. The key metrics:
| Metric | Observation |
|---|---|
| Handoff quality | Structured, complete, no clarification needed |
| Cross-machine coordination | Smooth, protocol-driven, minimal friction |
| Quality consistency | Shared values produce uniform quality bar |
| Resource utilisation | Right work on right hardware, no thrashing |
| Failure recovery | Any instance can be replaced without losing team context |
The system is not finished. v5 will likely emerge when current limitations become clear through continued use. The methodology — observe problems, add minimum structure, validate in production — will remain the same.
Further reading
- Team Orchestration — The core insight from v4: colleague framing
- Writing Role Cards — How to define instance identities
- Multi-Agent Setup — Architecture patterns for multi-machine teams
- Handoff Protocol — Coordination patterns developed through v3 and v4
- Philosophy — The design principles underlying all four versions