
Evolution: From Copy-Paste to Multi-Agent Team

This is the story of how a solo developer’s AI workflow evolved through five stages over approximately eighteen months. It started with copy-pasting ChatGPT output into VS Code and ended with a coordinated multi-agent team across four machines. Each version solved real problems discovered in daily production use. The pattern: each iteration added structure that made AI output more reliable.

v0: Copy-Paste Era (late 2024 – early 2025)


ChatGPT in a browser tab, VS Code open beside it. Ask ChatGPT a question, read the answer, manually copy code snippets into the editor. No integration, no context sharing, no automation. The AI could not see the codebase — you described it in the chat and hoped the description was accurate enough.

  • Zero setup — anyone with a browser could start immediately
  • Good for learning and exploration — “how do I do X in Rust?”
  • Forced the human to understand every line (you had to read it to paste it)
| Problem | Impact |
| --- | --- |
| No codebase awareness | AI gave generic answers, not project-specific ones |
| Context lost every session | Had to re-explain the project each time |
| Manual copy-paste errors | Wrong indentation, missing imports, partial snippets |
| No validation | AI said it worked, you trusted it, it didn’t |
| Conversational drift | Long chats wandered from the original task |
| Tab-switching fatigue | Constant context-switching between browser and editor |

AI-generated code is useful but the delivery mechanism matters. Copy-pasting between disconnected tools is error-prone and exhausting. The AI needs to see the code, and the code needs to see the AI’s output. Integration isn’t a luxury — it’s a prerequisite for reliability.

v1: IDE-Embedded AI

AI assistance embedded directly in an IDE (such as Cursor). Project-specific rules lived in IDE configuration files (e.g., .cursorrules). The AI had access to the codebase through the editor.

  • Low friction — AI assistance available immediately in the editor
  • Rules files could encode project-specific conventions
  • Good for single-file and single-feature tasks
| Problem | Impact |
| --- | --- |
| IDE lock-in | Rules tied to one editor, not portable |
| No cross-machine coordination | Could only work on one machine |
| Rules scattered across projects | No standardisation, each project had different conventions |
| No validation gates | AI would claim “done” without verification |
| Session isolation | Each session started blank with no memory of prior work |

IDE-embedded AI is a good starting point, but the rules need to be portable and the workflow needs validation gates. Without structure, AI assistance is helpful but unreliable.

v2: Structured CLI Workflow

A migration from IDE-embedded AI to CLI-based tools (such as Claude Code), with the introduction of structured project files:

  • AGENTS.md — comprehensive project context for any AI tool
  • CLAUDE.md — quick-reference companion for Claude Code specifically
  • Jimmy’s Workflow — four-phase validation system (PRE-FLIGHT, IMPLEMENT, VALIDATE, CHECKPOINT)
  • 11 Core Principles — mandatory rules that persist across sessions

A template system provided standardised starting points for new projects.
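
The four-phase workflow can be sketched as sequential gates, where a failure in any phase blocks progression to the next. This is a minimal sketch, not the project's actual implementation: the phase names come from the workflow above, while the check functions and task fields are hypothetical stand-ins.

```python
# Hypothetical sketch of Jimmy's Workflow: each phase must pass
# before the next one runs, so the AI cannot claim "done" unverified.

PHASES = ["PRE-FLIGHT", "IMPLEMENT", "VALIDATE", "CHECKPOINT"]

def run_workflow(task, checks):
    """Run each phase's check in order; stop at the first failure."""
    completed = []
    for phase in PHASES:
        if not checks[phase](task):
            return {"status": "blocked", "phase": phase, "completed": completed}
        completed.append(phase)
    return {"status": "done", "completed": completed}

# Illustrative checks: a task whose tests fail never reaches CHECKPOINT.
checks = {
    "PRE-FLIGHT": lambda t: t["spec_read"],
    "IMPLEMENT": lambda t: t["code_written"],
    "VALIDATE": lambda t: t["tests_pass"],
    "CHECKPOINT": lambda t: t["human_signed_off"],
}
result = run_workflow(
    {"spec_read": True, "code_written": True,
     "tests_pass": False, "human_signed_off": False},
    checks,
)
```

The point of the gate structure is that "VALIDATE" failing leaves an explicit blocked state rather than a silent "it works, trust me".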

| Innovation | What it solved |
| --- | --- |
| AGENTS.md | Portable project context — works with any AI tool |
| Jimmy’s Workflow | Validation gates prevent “it works, trust me” |
| 11 Core Principles | Consistency across sessions, survives context compression |
| Template versioning | Projects can check if their templates are current |
| Confidence levels | HIGH/MEDIUM/LOW system tells the human when to intervene |

| Problem | Impact |
| --- | --- |
| Single machine | Could not delegate heavy compute to more powerful hardware |
| Context limits | Large codebases exceeded context windows |
| No specialisation | One instance tried to do everything |
| No delegation pattern | No way to say “this task needs a bigger machine” |

Structured context and validation gates dramatically improve reliability. But a single machine, no matter how well-configured, cannot handle every type of task. Development needs precision. Compute needs power. Monitoring needs always-on availability.

v3: Multi-Machine Specialisation

Three machines with distinct roles, connected via mesh VPN:

| Machine type | Role | Why this hardware |
| --- | --- | --- |
| Development workstation | Code, testing, prototyping | Good tooling, fast iteration |
| High-memory server | Compute, containers, builds | 64GB+ RAM for heavy tasks |
| Low-power device | Monitoring, security scanning | Always-on, low power draw |

Each machine had its own AI instance with role-appropriate configuration. The code repository served as the coordination hub — all machines pulled from and pushed to the same repo.
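
The delegation idea can be sketched as a simple lookup that matches each task to the machine whose role fits it. The machine roles come from the table above; the machine names and the task-to-role mapping are illustrative assumptions.

```python
# Hypothetical sketch of role-based delegation across the machines.
# Machine names and TASK_ROLES are assumptions for illustration.

MACHINES = {
    "workstation": "development",   # code, testing, prototyping
    "big-server": "compute",        # containers, builds, heavy tasks
    "edge-box": "monitoring",       # always-on scanning
}

TASK_ROLES = {
    "prototype_feature": "development",
    "build_containers": "compute",
    "security_scan": "monitoring",
}

def delegate(task: str) -> str:
    """Return the machine whose role matches the task's needs."""
    role = TASK_ROLES[task]
    for machine, machine_role in MACHINES.items():
        if machine_role == role:
            return machine
    raise LookupError(f"no machine for role {role!r}")
```

With this mapping, a container build lands on the high-memory server instead of the workstation trying to do everything.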

| Innovation | What it solved |
| --- | --- |
| Machine specialisation | Right task on right hardware |
| Resource delegation | Development machine stops trying to run inference |
| Mesh networking | Direct machine-to-machine communication without public exposure |
| Per-machine configuration | Each instance configured for its specific role |

| Problem | Impact |
| --- | --- |
| No team identity | Instances did not know about each other |
| No handoff protocol | Work transferred ad-hoc with inconsistent formatting |
| No personality constraints | Each instance defaulted to generic helpful mode |
| External research disconnected | Research done in web-based AI had no structured path to internal team |

Specialising machines by role is a significant improvement. But machines alone are not enough — the AI instances on those machines need to know they are part of a team, who their colleagues are, and how to communicate.

v4: Multi-Agent Team (early 2026 – current)


Four machines plus external AI services, operating as a coordinated team with defined personalities, relationships, and communication protocols.

EXTERNAL
────────────────────────────────
External AI Service A (Research)
External AI Service B (Review)
────────────────────────────────
BOUNDARY (file-based dead drop)
────────────────────────────────
INTERNAL
├── Coordinator (gateway machine)
├── Developer (workstation)
├── Compute (high-memory server)
└── Monitor (always-on device)
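
The dead drop at the boundary can be sketched as a shared directory: the external side writes research artefacts in, the internal side polls and consumes them, and neither side needs a direct connection to the other. This is a minimal sketch under assumptions: the JSON payloads, file naming, and function names are hypothetical, not the project's actual protocol.

```python
# Hypothetical sketch of the file-based dead drop at the boundary.
import json
from pathlib import Path

def drop(boundary: Path, name: str, payload: dict) -> Path:
    """External side: atomically place a research artefact in the drop."""
    tmp = boundary / f".{name}.tmp"
    tmp.write_text(json.dumps(payload))
    final = boundary / f"{name}.json"
    tmp.rename(final)  # rename is atomic on POSIX filesystems
    return final

def collect(boundary: Path) -> list[dict]:
    """Internal side: pick up and remove all completed artefacts."""
    items = []
    for f in sorted(boundary.glob("*.json")):
        items.append(json.loads(f.read_text()))
        f.unlink()
    return items

# Demo: exchange one artefact through a temporary boundary directory.
import tempfile
boundary = Path(tempfile.mkdtemp())
drop(boundary, "research-001", {"topic": "mesh VPN", "status": "done"})
received = collect(boundary)
```

Writing to a temporary name and renaming means the internal side never sees a half-written file, which keeps the boundary clean without any locking.
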
| Innovation | What it solved |
| --- | --- |
| Team structure with personalities | Instances produce work that fits together, not isolated outputs |
| Role cards | Each instance has defined identity, responsibilities, and relationships |
| Dead drop protocol | Clean boundary between external research and internal implementation |
| Two external AI services | Different AI architectures provide different perspectives on research and review |
| Communication norms | Standardised how instances give and receive work |
| Shared values | Consistency across all instances regardless of role |
| Relationship tables | Every instance knows who it works with and in what direction |
| Escalation protocol | Problems reach the right level (P1/P2/P3) |
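
Escalation routing can be sketched as a severity triage. The P1/P2/P3 levels come from the source, but the criteria and destinations below are illustrative assumptions, not the team's actual rules.

```python
# Hypothetical sketch of the escalation protocol. The P1/P2/P3 levels
# come from the source; the criteria and destinations are assumptions.

def escalate(problem: dict) -> str:
    """Return the escalation level for a reported problem."""
    if problem["blocks_team"]:
        return "P1"  # whole-team blocker: surface to the human immediately
    if problem["blocks_instance"]:
        return "P2"  # one instance blocked: route to the coordinator
    return "P3"      # minor issue: log for the next review cycle
```

The point is that severity is decided by a shared rule, not by whichever instance happens to notice the problem.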

The single biggest improvement in v4 was not technical. It was treating AI instances as colleagues rather than tools. Giving each instance a name, personality traits, and awareness of the team produced measurably better output:

  • Handoffs became structured and complete
  • Instances considered downstream consumers when formatting output
  • Quality became consistent across the team (shared values)
  • Resource delegation happened naturally (role awareness)
  • Failure modes from v3 (restart from scratch, ignore prior work) disappeared

This insight is documented in detail in the Team Orchestration guide.

| Transition | What was added | What it fixed |
| --- | --- | --- |
| v0 to v1 | IDE integration, codebase access, rules files | Copy-paste errors, no project awareness, tab-switching |
| v1 to v2 | Structured files, validation gates, principles | IDE lock-in, no verification, inconsistency |
| v2 to v3 | Multiple machines, specialisation, mesh networking | Single-machine limits, no delegation |
| v3 to v4 | Team identity, personalities, communication norms | Disconnected instances, ad-hoc handoffs, no team awareness |

Each version followed the same arc:

  1. Use the current setup in production until its limitations become clear
  2. Identify the specific failure mode that causes the most friction
  3. Add the minimum structure needed to address that failure mode
  4. Validate in daily use before adding more complexity

The progression was always from less structure to more structure, driven by observed problems rather than theoretical concerns. Nothing was added speculatively — every layer of structure exists because its absence caused measurable problems.

The v4 setup has been in daily production use since early 2026. The key metrics:

| Metric | Observation |
| --- | --- |
| Handoff quality | Structured, complete, no clarification needed |
| Cross-machine coordination | Smooth, protocol-driven, minimal friction |
| Quality consistency | Shared values produce uniform quality bar |
| Resource utilisation | Right work on right hardware, no thrashing |
| Failure recovery | Any instance can be replaced without losing team context |

The system is not finished. v5 will likely emerge when current limitations become clear through continued use. The methodology — observe problems, add minimum structure, validate in production — will remain the same.