
# Test-Driven Implementation Plans

Implementation plans are natural language documents, structured for AI consumption, that define what to build and how to verify it. Test cases come first. There is no pseudocode. Every step is written in plain English with explicit dependencies and success criteria.

AI models parse natural language better than pseudocode.

Pseudocode sits in an uncanny valley — it looks like code but isn’t code. When an AI reads pseudocode, it faces an ambiguity: is this meant to be implemented literally, or is it a conceptual sketch? Different models interpret this differently. The same pseudocode produces different implementations across sessions.

Natural language with clear structure removes this ambiguity entirely. “Create a function that accepts a user ID and returns the user’s profile, or throws a NotFoundError if the user does not exist” is unambiguous. The AI knows exactly what to build. The implementation details (naming, error types, return types) are decided by the AI based on the project’s existing patterns.
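For illustration, here is one implementation an AI might produce from that sentence. Every name here (`getProfile`, `NotFoundError`, the in-memory `users` map) is hypothetical, chosen by the implementer to match the project's patterns rather than dictated by the plan:

```typescript
// Hypothetical error type and data source; a real project would use its own.
class NotFoundError extends Error {}

interface UserProfile {
  id: string;
  name: string;
}

const users = new Map<string, UserProfile>([
  ["user-123", { id: "user-123", name: "Test User" }],
]);

// "Accepts a user ID and returns the user's profile,
// or throws a NotFoundError if the user does not exist."
function getProfile(userId: string): UserProfile {
  const profile = users.get(userId);
  if (!profile) throw new NotFoundError(`No user with id ${userId}`);
  return profile;
}
```

The natural language spec fixed the contract; the code-level decisions stayed with the implementer.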

| Pseudocode problem | Natural language solution |
| --- | --- |
| Ambiguous syntax: is `getData()` literal or conceptual? | "Fetch the data from the database" is always conceptual |
| Implies implementation: `for i in range(n)` suggests a loop pattern | "Process each item" lets the AI choose the right pattern |
| Language-specific: pseudocode leans toward one language | Natural language is language-agnostic |
| False precision: looks exact but isn't | Natural language is explicitly approximate |
| Stale quickly: pseudocode is harder to update than prose | Prose is easy to revise |

The implementation plan lists test cases before implementation steps. This is deliberate:

  1. Tests define the contract. Before writing any code, the plan establishes what “correct” means.
  2. AI writes better code when tests exist first. The implementation targets the test cases, not the AI’s interpretation of requirements.
  3. Tests are verifiable. “The function returns the user profile” is a requirement. “Calling getProfile('user-123') returns { id: 'user-123', name: 'Test User' }” is a test case.

The goal states what is being built and why, in one to three sentences maximum.

```markdown
## Goal
Add rate limiting to the API gateway. Requests exceeding 100 per minute
per API key should receive a 429 response with a Retry-After header.
This prevents abuse and protects downstream services from overload.
```

Success criteria are measurable, testable conditions. They are the acceptance criteria: when all are met, the task is done.

```markdown
## Success Criteria
- [ ] Requests within rate limit succeed normally (200)
- [ ] Request 101 within a 60-second window returns 429
- [ ] 429 response includes Retry-After header with seconds remaining
- [ ] Rate limit state persists across server restarts
- [ ] Rate limit is per API key, not per IP address
- [ ] Existing tests continue to pass
```

Test cases are written before the implementation steps. Each test case has a name, setup conditions, an action, and an expected result.

```markdown
## Test Cases

### TC-1: Request within limit succeeds
- Setup: Clean rate limit state
- Action: Send 1 request with valid API key
- Expected: 200 response, normal body

### TC-2: Request at exact limit succeeds
- Setup: Clean rate limit state
- Action: Send 100 requests with same API key within 60 seconds
- Expected: All return 200

### TC-3: Request exceeding limit is rejected
- Setup: Clean rate limit state
- Action: Send 101 requests with same API key within 60 seconds
- Expected: First 100 return 200, request 101 returns 429

### TC-4: Retry-After header is accurate
- Setup: Exceed rate limit at T+30s of the window
- Action: Read Retry-After header from 429 response
- Expected: Value is approximately 30 (seconds remaining in window)

### TC-5: Different API keys have independent limits
- Setup: Clean rate limit state
- Action: Send 100 requests with key-A, then 1 request with key-B
- Expected: All requests succeed (key-B has its own counter)

### TC-6: Rate limit resets after window expires
- Setup: Exceed rate limit
- Action: Wait 60 seconds, send another request
- Expected: 200 response (new window)

### TC-7: State survives server restart
- Setup: Send 50 requests, restart server
- Action: Send 51 more requests
- Expected: Request 101 overall returns 429
```

Implementation steps are natural language, ordered, with dependencies noted. No pseudocode. Each step describes what to do, not how to do it at the code level.

```markdown
## Implementation Steps

### Step 1: Add rate limit storage
Create a persistent store for rate limit counters. Each entry tracks
an API key, a window start timestamp, and a request count.
Depends on: nothing (new component)

### Step 2: Create rate limit middleware
Create middleware that runs before route handlers. It should:
- Extract the API key from the request header
- Look up or create the rate limit entry for that key
- Increment the counter
- If the counter exceeds 100 and the window has not expired, return 429
  with a Retry-After header
- If the window has expired, reset the counter and start a new window
Depends on: Step 1

### Step 3: Register middleware on the API gateway
Add the rate limit middleware to the gateway's middleware chain,
before authentication middleware (rate limiting should apply even
to requests with invalid keys).
Depends on: Step 2

### Step 4: Write and run tests
Implement the test cases from the Test Cases section above.
Run all tests including existing test suite.
Depends on: Steps 1-3

### Step 5: Update API documentation
Document the rate limiting behaviour, including the 429 response
format and Retry-After header, in the API reference.
Depends on: Step 4 (confirms final behaviour)
```
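Note what the plan leaves open. Step 2 says what the middleware decides, not how. As a sketch of the core logic an implementer might arrive at (the fixed-window strategy, the `Map` standing in for the persistent store of Step 1, and all names here are assumptions, not part of the plan):

```typescript
// Fixed-window counter per API key. In Step 1 this would live in a
// persistent store; an in-memory Map stands in for illustration.
interface WindowEntry {
  windowStart: number; // epoch seconds
  count: number;
}

const LIMIT = 100;
const WINDOW_SECONDS = 60;
const store = new Map<string, WindowEntry>();

// Returns { allowed: true }, or { allowed: false, retryAfter } where
// retryAfter is the seconds remaining in the current window (TC-4).
function checkRateLimit(
  apiKey: string,
  now: number, // epoch seconds
): { allowed: true } | { allowed: false; retryAfter: number } {
  let entry = store.get(apiKey);
  // If no entry exists or the window has expired, start a new window.
  if (!entry || now - entry.windowStart >= WINDOW_SECONDS) {
    entry = { windowStart: now, count: 0 };
    store.set(apiKey, entry);
  }
  entry.count += 1;
  if (entry.count > LIMIT) {
    return {
      allowed: false,
      retryAfter: entry.windowStart + WINDOW_SECONDS - now,
    };
  }
  return { allowed: true };
}
```

A real middleware would read the API key from the request header and translate `allowed: false` into a 429 response with the `Retry-After` header; persistence (Step 1) and registration order (Step 3) live elsewhere. The plan constrains the observable behaviour, and the test cases verify it.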

Validation describes how to know the task is complete. This maps to the VALIDATE phase of Jimmy’s Workflow.

```markdown
## Validation
- All 7 test cases pass
- Existing test suite passes with no regressions
- Manual test: send 101 rapid requests via curl, confirm 429 on the last
- Rate limit persists after restarting the server
- Confidence: HIGH if all above pass
```

The implementation plan is a document — typically a markdown file in the project. When the coding session begins, the AI reads the plan and executes it using Jimmy’s Workflow:

“Read the implementation plan at ./docs/plans/rate-limiting.md and execute it using Jimmy’s Workflow”

The AI then:

  1. PRE-FLIGHT — Reads the plan, checks that all dependencies are available, confirms requirements are clear
  2. IMPLEMENT — Follows the implementation steps in order, writing tests first (from the Test Cases section), then implementation code
  3. VALIDATE — Runs the test suite, checks against the success criteria
  4. CHECKPOINT — Reports confidence level and completion status

The plan is the single source of truth. The AI does not invent requirements. It does not skip steps. It implements what the plan says.

Implementation plans structured this way map directly to Principle 5.5 (AI-Optimized Documentation):

| Principle 5.5 sub-principle | How implementation plans apply it |
| --- | --- |
| Structured data over prose | Sections with clear headings, lists, explicit labels |
| Explicit context | Goal statement explains what and why; dependencies are noted |
| Cause-effect relationships | "If counter exceeds 100 and window has not expired, return 429" |
| Machine-readable formats | Consistent structure: Goal, Success Criteria, Test Cases, Steps, Validation |
| Searchable content | Test case IDs (TC-1, TC-2), step numbers, clear heading hierarchy |
| Version-stamped | Plans include dates when relevant |
| Cross-referenced | Dependencies between steps are explicit |

The plan is structured data that happens to be readable by humans. It is designed to be consumed by an AI implementation agent — and it works because AI excels at following explicit, structured instructions.

| Mistake | Why it fails | Fix |
| --- | --- | --- |
| Writing implementation steps before test cases | AI implements to the steps, not to the tests; no verification contract | Always write test cases first |
| Using pseudocode in steps | Ambiguous: AI doesn't know if it's literal or conceptual | Use natural language exclusively |
| Vague success criteria ("it should work") | AI declares success prematurely | Measurable, testable conditions only |
| Skipping dependency notes | AI implements steps out of order or misses prerequisites | Note dependencies on every step |
| Over-specifying implementation details | Constrains the AI from using project-appropriate patterns | Describe what, not how |
| Combining multiple features in one plan | Harder to validate, harder to checkpoint | One feature per plan |