Haiku 4.5 Findings

Claude Haiku 4.5, when paired with Jimmy’s Workflow, delivered 1.8x faster execution, 67% lower cost, and 5% higher quality scores than Claude Sonnet 4.5 operating without structured workflow. This was the opposite of the expected result.

| Metric | Larger model (Sonnet 4.5) | Smaller model + workflow (Haiku 4.5) | Difference |
| --- | --- | --- | --- |
| Average speed | 12.9s per query | 7.6s per query | 1.8x faster |
| Cost (input/output per 1M tokens) | $3 / $15 | $1 / $5 | 67% cheaper |
| Quality score | 4.4 / 5 | 4.6 / 5 | +5% (Haiku higher) |
| Success rate | 71% | 86% | +15 percentage points |

Testing was conducted on a content processing platform with a production database of 3,833 articles across 29 RSS sources.

The central finding was not that Haiku is “good enough” — it is that Jimmy’s Workflow eliminates Haiku’s primary weakness while amplifying its primary strength.

Haiku’s weakness: Less implicit inference. Without explicit structure, Haiku makes more assumptions and misses nuance that larger models handle intuitively.

Haiku’s strength: Fast, precise pattern execution. When told exactly what to do and how to validate it, Haiku follows instructions with higher fidelity than Sonnet.

Jimmy’s Workflow provides: Explicit thinking templates, mandatory validation gates, progressive disclosure patterns, and structured output requirements.

The result is a synergy — the workflow compensates for the model’s limitation and the model excels at the workflow’s demands. Haiku does not merely tolerate the structured workflow. It thrives on it.
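
As a rough illustration of those four elements, a structured-workflow prompt can be assembled mechanically. This is a hypothetical sketch, not Jimmy's Workflow's actual format; the section names and wording are invented for the example.

```python
def build_workflow_prompt(task: str, schema: str) -> str:
    """Assemble an explicit prompt so the model never has to infer structure."""
    return "\n".join([
        "## Task",
        task,
        "",
        "## Thinking template (complete every step)",
        "1. Restate the task in your own words.",
        "2. List the inputs you will use.",
        "3. Outline the transformation step by step.",
        "",
        "## Validation gate (check before answering)",
        "- Does the output match the schema below?",
        "- Is every assumption stated explicitly?",
        "",
        "## Output format",
        schema,
    ])

prompt = build_workflow_prompt(
    "Summarise the article in three bullet points.",
    '{"summary": ["string", "string", "string"]}',
)
```

The point of the sketch is that every expectation the model would otherwise have to infer — reasoning steps, checks, output shape — is spelled out in the prompt itself.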

| Metric | Sonnet 4.5 | Haiku 4.5 |
| --- | --- | --- |
| Average response time | 12.9s | 7.6s |
| Consistency | Variable (high variance) | Consistent (low variance) |
| Speed advantage | Baseline | 1.8x faster |

Haiku was not only faster on average but more predictable. The lower variance matters for production systems where consistent response times affect user experience.

| Metric | Sonnet 4.5 | Haiku 4.5 |
| --- | --- | --- |
| Input cost (per 1M tokens) | $3.00 | $1.00 |
| Output cost (per 1M tokens) | $15.00 | $5.00 |
| Cost ratio | Baseline | 67% cheaper |

The 67% cost reduction reflects token pricing alone. Combined with the speed improvement and higher success rate, effective savings are larger still once retries and timeouts are accounted for.

| Metric | Sonnet 4.5 | Haiku 4.5 + workflow |
| --- | --- | --- |
| Quality score | 4.4 / 5 | 4.6 / 5 |
| Success rate | 71% | 86% |

This was the unexpected result. The smaller model with structured workflow scored higher on quality than the larger model. The explanation lies in the behavioural differences documented below.

What works excellently with smaller models + structured workflow

Haiku follows thinking templates more thoroughly than Sonnet. Where Sonnet sometimes abbreviates its reasoning (treating steps as obvious), Haiku executes every step of the template with full detail. The result is more transparent, more auditable reasoning.

Haiku never skipped a validation gate during testing. Not once. Sonnet occasionally treated validation as a formality, producing abbreviated checks. Haiku treated every gate as a genuine checkpoint, running full validation each time.

When given multi-step tasks with progressive complexity, Haiku executed the progression flawlessly. Each step built cleanly on the previous one, with no shortcuts or collapsed steps.

Haiku was more cautious and more detailed in its safety checks than Sonnet. It flagged more edge cases, asked more clarifying questions, and documented more assumptions. For production systems, this is a feature, not a limitation.

When Haiku’s initial output contained an error, the validation gate triggered self-correction. The structured workflow creates a feedback loop that catches and fixes mistakes before they reach the output. This self-healing property is critical for reliability.
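
That feedback loop can be sketched as a validate-then-retry wrapper. The `model_call` and `validate` functions here are hypothetical stand-ins, not the platform's actual API; the stub model simulates an error caught and corrected on the second attempt.

```python
def run_with_validation(model_call, validate, max_attempts=3):
    """Call the model, validate the output, and feed failures back as corrections."""
    feedback = None
    for _ in range(max_attempts):
        output = model_call(feedback)
        ok, feedback = validate(output)
        if ok:
            return output
    raise RuntimeError("output failed validation after retries")

# Stub model: returns a bad answer first, then corrects itself from feedback.
def stub_model(feedback):
    return "42" if feedback else "forty-two"

def must_be_numeric(output):
    return (output.isdigit(), "answer must be numeric")

result = run_with_validation(stub_model, must_be_numeric)  # → "42"
```

The error never reaches the caller: the gate converts a wrong first answer into structured feedback for the next attempt.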

Without explicit structure, Haiku’s quality drops measurably. Tasks that rely on the model “knowing what you mean” without being told are where larger models have a genuine advantage. Structured workflow eliminates this gap by making all expectations explicit.

Both Haiku and Sonnet make incorrect assumptions about data schemas when not given explicit schema definitions. This is not a Haiku-specific weakness — it is a universal AI limitation that structured workflow mitigates equally for both model tiers.
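
One way to apply that mitigation is to embed the schema verbatim in the prompt. The field names below are illustrative, not the platform's actual columns.

```python
import json

# Hypothetical schema for the article records described above.
ARTICLE_SCHEMA = {
    "type": "object",
    "required": ["title", "source", "published_at"],
    "properties": {
        "title": {"type": "string"},
        "source": {"type": "string"},        # one of the RSS source names
        "published_at": {"type": "string"},  # ISO 8601 timestamp
    },
}

prompt = (
    "Extract the article metadata. Return JSON that matches this schema "
    "exactly; do not invent fields:\n" + json.dumps(ARTICLE_SCHEMA, indent=2)
)
```

With the schema spelled out, neither model tier has to guess field names or types, which is exactly the assumption-making failure mode described above.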

| Aspect | Larger model (Sonnet 4.5) | Smaller model + workflow (Haiku 4.5) |
| --- | --- | --- |
| Thinking detail | Moderate (sometimes abbreviates) | High (executes every template step) |
| Validation steps per task | 3-4 | 4-5 (more thorough) |
| Workflow compliance | Good (occasional shortcuts) | Perfect (100% compliance) |
| Edge case flagging | Moderate | High (more cautious) |
| Self-correction rate | Lower (fewer errors caught) | Higher (validation gates trigger fixes) |
| Reasoning transparency | Implicit leaps | Explicit step-by-step |

Structure matters more than raw model capability for well-defined tasks.

This is the central finding. When the task is well-defined and the workflow is explicit, the model’s “intelligence ceiling” is less important than its ability to follow structured instructions reliably. Haiku’s perfect workflow compliance outweighs Sonnet’s superior implicit reasoning for any task where the workflow can be made explicit.

This does not mean smaller models are universally better. It means that for structured, well-defined tasks — which is the majority of production workloads — investing in workflow design pays higher returns than investing in larger models.

  1. Use smaller models for structured, well-defined tasks. If you can define the task explicitly with clear inputs, expected outputs, and validation criteria, a smaller model with structured workflow will likely match or beat a larger model.

  2. Always provide explicit workflow structure. Never rely on the model “figuring out” what you want. Make thinking steps, validation criteria, and output format explicit.

  3. Validation gates are critical. They are not overhead — they are the mechanism that makes smaller models viable. Removing validation gates to save tokens will cost more in retries and quality failures.

  4. Cost savings scale linearly with query volume. The 67% per-query saving multiplies directly with the number of queries served.

Assuming average token usage of 2,000 input tokens and 1,000 output tokens per query:

| Daily queries | Sonnet 4.5 daily cost | Haiku 4.5 daily cost | Daily savings | Monthly savings |
| --- | --- | --- | --- | --- |
| 100 | $2.10 | $0.70 | $1.40 | $42 |
| 1,000 | $21.00 | $7.00 | $14.00 | $420 |
| 10,000 | $210.00 | $70.00 | $140.00 | $4,200 |
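
The table's figures follow directly from the per-million-token prices, assuming the stated 2,000 input and 1,000 output tokens per query and a 30-day month:

```python
# ($/1M input tokens, $/1M output tokens) from the pricing table above.
PRICES = {"sonnet-4.5": (3.00, 15.00), "haiku-4.5": (1.00, 5.00)}

def daily_cost(model: str, queries: int,
               in_tok: int = 2_000, out_tok: int = 1_000) -> float:
    """Token cost in dollars for `queries` calls at the assumed token usage."""
    in_price, out_price = PRICES[model]
    return queries * (in_tok * in_price + out_tok * out_price) / 1_000_000

for q in (100, 1_000, 10_000):
    s, h = daily_cost("sonnet-4.5", q), daily_cost("haiku-4.5", q)
    print(f"{q:>6,} queries/day: Sonnet ${s:,.2f}, Haiku ${h:,.2f}, "
          f"saves ${s - h:,.2f}/day (${(s - h) * 30:,.0f}/month)")
```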

These figures reflect token cost only. When factoring in the higher success rate (86% vs 71%), the effective savings increase further because fewer retries are needed.
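
To put a rough number on that, here is one simplified model of retry cost: assume a failed query is retried until it succeeds and attempts are independent, so expected attempts per successful answer are 1 / success_rate. This is an assumption for illustration, not a measured figure from the test suite.

```python
def effective_cost(per_query_cost: float, success_rate: float) -> float:
    """Expected cost per successful answer, assuming independent retries
    until success (expected attempts = 1 / success_rate)."""
    return per_query_cost / success_rate

sonnet = effective_cost(0.021, 0.71)  # $0.021/query: 2k in @ $3/M + 1k out @ $15/M
haiku = effective_cost(0.007, 0.86)   # $0.007/query: 2k in @ $1/M + 1k out @ $5/M
savings = 1 - haiku / sonnet
print(f"effective savings with retries: {savings:.0%}")  # → 72%
```

Under this model the gap widens from 67% on raw token pricing to roughly 72% per successful answer, because Sonnet's lower success rate buys it more retries.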

  • Sample size: 7-query test suite. Directionally strong but not statistically robust.
  • Single domain: Content processing tasks only. Code generation, creative writing, and other domains were not tested.
  • Model versions: Claude Haiku 4.5 and Claude Sonnet 4.5 as available in October 2025. Model updates may change the dynamics.
  • Quality assessment: Scores assigned by a single evaluator on a subjective 1-5 scale.
  • Workflow dependency: Results are specific to Jimmy’s Workflow v2.1. Other structured workflow systems may produce different outcomes.

These findings inform architecture decisions. They are not a substitute for testing with your own workloads and quality criteria.