Prompt Testing
Prompt testing applies the same rigour to prompts that software testing applies to code. This guide covers testing strategies and implementations for the CAP (Composable Agentic Prompt) workflow across Python, TypeScript, and Rust.
The prompt testing pyramid
Section titled “The prompt testing pyramid” /\ / \ E2E (5%) — Full prompt runs /----\ / \ Integration (15%) — Multi-component /--------\ / \ Component (30%) — Single lens /------------\ / \ Unit (50%) — Schema, parsing/________________\What to test
Section titled “What to test”| Dimension | What | Why |
|---|---|---|
| Correctness | Output matches expected | Core functionality |
| Schema compliance | Output structure valid | Integration reliability |
| Robustness | Handles edge cases | Production stability |
| Performance | Latency and token usage | Cost management |
| Determinism | Consistent results | Reproducibility |
| Safety | No harmful outputs | Security and compliance |
Python implementation
Section titled “Python implementation”from pydantic import BaseModelfrom typing import Literal, Optional
class SecurityFinding(BaseModel): type: str severity: Literal['CRITICAL', 'HIGH', 'MEDIUM', 'LOW'] file: str line: int evidence: str cwe: Optional[str]
def test_security_lens_output_schema(): result = run_security_lens(test_input) finding = SecurityFinding(**result) assert finding.severity in ('CRITICAL', 'HIGH', 'MEDIUM', 'LOW')
def test_sql_injection_detection(): code = "query = 'SELECT * FROM users WHERE id=' + userId" result = run_security_lens(code) assert any(f['type'] == 'sql_injection' for f in result['findings'])TypeScript implementation
Section titled “TypeScript implementation”import { z } from 'zod';
const FindingSchema = z.object({ type: z.string(), severity: z.enum(['CRITICAL', 'HIGH', 'MEDIUM', 'LOW']), file: z.string(), line: z.number(), evidence: z.string(),});
test('Security lens detects SQL injection', async () => { const code = "query = 'SELECT * FROM users WHERE id=' + userId"; const result = await runLens(securityLens, code);
expect(result.findings).toContainEqual( expect.objectContaining({ type: 'sql_injection', severity: 'CRITICAL' }) );});
test('Output matches schema', async () => { const result = await runLens(securityLens, testInput); const parsed = FindingSchema.safeParse(result.findings[0]); expect(parsed.success).toBe(true);});Rust implementation
Section titled “Rust implementation”use serde::Deserialize;
#[derive(Deserialize, Debug)]struct SecurityFinding { finding_type: String, severity: Severity, file: String, line: u32, evidence: String,}
#[derive(Deserialize, Debug, PartialEq)]enum Severity { CRITICAL, HIGH, MEDIUM, LOW,}
#[test]fn test_output_schema_valid() { let result = run_security_lens(test_input()); let finding: SecurityFinding = serde_json::from_str(&result) .expect("Output should deserialise to SecurityFinding"); assert!(matches!(finding.severity, Severity::CRITICAL | Severity::HIGH));}Testing patterns
Section titled “Testing patterns”Boundary exclusion testing
Section titled “Boundary exclusion testing”Verify that components respect their boundaries:
def test_security_lens_ignores_performance(): """Security lens should NOT flag performance issues""" slow_code = "for i in range(1000000): data.append(fetch(i))" result = run_security_lens(slow_code) assert not any(f['type'].startswith('perf') for f in result['findings'])Determinism testing
Section titled “Determinism testing”Run the same prompt multiple times and check consistency:
def test_determinism(): results = [run_security_lens(test_input) for _ in range(5)] types = [set(f['type'] for f in r['findings']) for r in results] assert all(t == types[0] for t in types), "Results should be consistent"Token budget testing
Section titled “Token budget testing”Track and control cost:
def test_token_budget(): result = run_lens_with_metrics(security_lens, large_codebase) assert result.input_tokens < 50000, "Should stay under budget" assert result.output_tokens < 5000, "Output should be concise"Full reference
Section titled “Full reference”The complete prompt testing guide (1200+ lines with CI/CD integration, cross-language patterns, and advanced testing strategies) is at prompt-testing-implementation-guide.md.