Course → Module 10: Batch Processing & Scale
Session 4 of 8

Know the Bill Before You Press Enter

API calls cost money. Not much per call, but costs compound in batch processing. A batch of 100 articles that costs $15 is affordable. A batch of 100 articles where every item fails and regenerates three times, with an over-long prompt that triples token usage, costs $135. The difference between those two numbers is the gap between estimating and guessing.

Cost estimation before execution means: calculating expected token counts, multiplying by per-token rates, adding a failure margin, and knowing the number before you commit. This is not optional at scale. It is how you prevent budget surprises.

Token Counting Fundamentals

API costs are measured in tokens. A token is roughly 0.75 words in English (or about 4 characters). A 1,000-word article is approximately 1,333 tokens of output. Your prompt (system message + user message + context) might be 2,000 to 5,000 tokens of input.
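Those rules of thumb translate directly into a quick estimate. A minimal sketch:

```python
# Rough token estimation for English text, using the rules of thumb
# above: a token is ~0.75 words, or ~4 characters.
def estimate_tokens_from_words(word_count: int) -> int:
    return round(word_count / 0.75)

def estimate_tokens_from_chars(char_count: int) -> int:
    return round(char_count / 4)

print(estimate_tokens_from_words(1000))  # ~1333 tokens for a 1,000-word article
```

These are estimates, not exact counts; actual tokenization varies by model and by text (code and non-English text tokenize less efficiently).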

Costs are charged separately for input tokens and output tokens. Output tokens are typically 3 to 5 times more expensive than input tokens.

| Provider / Model | Input (per 1M tokens) | Output (per 1M tokens) | 1,000-word article cost* |
|---|---|---|---|
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.03 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.01 |
| GPT-5.2 | $1.75 | $14.00 | $0.02 |
| Gemini 2.5 Pro | $1.25 | $10.00 | $0.02 |
| Gemini 2.0 Flash | $0.30 | $2.50 | $0.005 |

* Estimated for a single generation call with ~3,000 input tokens and ~1,333 output tokens. Multi-agent chains multiply this by the number of agents.

The Cost Estimation Formula

For a batch of N items, each processed by an agent chain with A agents:

```mermaid
flowchart LR
    A["Per-item cost"] --> B["× Number of items (N)"]
    B --> C["× Failure multiplier"]
    C --> D["= Total batch cost"]
```

Per-item cost = sum of (input_tokens * input_rate + output_tokens * output_rate) for each agent in the chain.

Failure multiplier = 1 + (expected_failure_rate * average_retries). If 20% of items fail and each gets 1 retry, the multiplier is 1.2. If 10% fail with 2 retries each, the multiplier is 1.2 as well.
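The formula above can be sketched in a few lines (rates in dollars per million tokens):

```python
def per_item_cost(agents, input_rate, output_rate):
    # agents: list of (input_tokens, output_tokens), one pair per agent
    # in the chain; rates are dollars per 1M tokens
    return sum(
        (inp * input_rate + out * output_rate) / 1_000_000
        for inp, out in agents
    )

def failure_multiplier(failure_rate, avg_retries):
    return 1 + failure_rate * avg_retries

def total_batch_cost(n_items, item_cost, failure_rate, avg_retries=1):
    return n_items * item_cost * failure_multiplier(failure_rate, avg_retries)

# Example: 100 blog posts at $0.09 per item, 15% failure rate, 1 retry
print(round(total_batch_cost(100, 0.09, 0.15), 2))  # 10.35
```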

| Scenario | Items | Per-item cost | Failure rate | Total estimated cost |
|---|---|---|---|---|
| Blog posts, 3-agent chain, Sonnet | 10 | $0.09 | 10% | $0.99 |
| Blog posts, 3-agent chain, Sonnet | 100 | $0.09 | 15% | $10.35 |
| Product descriptions, 4-agent, Haiku | 500 | $0.03 | 10% | $16.50 |
| Book chapters, 3-agent, Sonnet (long) | 25 | $0.35 | 20% | $10.50 |

Totals assume one retry per failed item.

Building a Cost Estimator

A cost estimator is a spreadsheet or script that takes your batch manifest and calculates the total cost before you execute. Inputs:

  - Number of items in the batch
  - Input and output token counts per agent, measured from test runs
  - Per-token rates for each model in the chain
  - Expected failure rate and average retries per failure

Your AI coding assistant can build this in under 5 minutes. The spreadsheet version is a single formula row. Either way, run it before every batch.
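A minimal script version might look like this; the models, token counts, and rates below are illustrative assumptions, not measurements:

```python
# Estimator sketch over a hypothetical 3-agent chain. Replace the token
# counts with averages from your own test runs and the rates with
# current pricing for your chosen models.
RATES = {  # dollars per 1M tokens: (input, output)
    "haiku": (1.00, 5.00),
    "sonnet": (3.00, 15.00),
}

AGENT_CHAIN = [
    # (model, avg input tokens, avg output tokens), one entry per agent
    ("haiku", 2_000, 800),     # research agent
    ("sonnet", 3_000, 1_400),  # drafting agent
    ("sonnet", 4_500, 1_400),  # editing agent
]

def estimate(n_items: int, failure_rate: float = 0.15, avg_retries: int = 1):
    item_cost = sum(
        (inp * RATES[model][0] + out * RATES[model][1]) / 1_000_000
        for model, inp, out in AGENT_CHAIN
    )
    total = n_items * item_cost * (1 + failure_rate * avg_retries)
    return item_cost, total

item_cost, total = estimate(100)
print(f"per item: ${item_cost:.4f}, batch of 100: ${total:.2f}")
```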

Cost Optimization Strategies

Four strategies reduce batch costs without reducing quality:

| Strategy | How It Saves | Typical Savings |
|---|---|---|
| Use smaller models for appropriate tasks | Research and formatting agents can use Haiku/Flash instead of Sonnet/Pro | 40-70% per agent |
| Trim prompt length | Remove redundant instructions, reduce context to essentials | 10-30% on input costs |
| Prompt caching | Repeated system prompts are cached at a 90% discount on most providers | Up to 90% on system prompt tokens |
| Batch API | Submit jobs for async processing (not real-time) at a 50% discount | 50% across all tokens |

Prompt caching and batch API discounts are significant. If your system prompt is 2,000 tokens and you run 100 items, that is 200,000 input tokens billed at 10% of the normal rate instead of full price. For the batch API, the savings justify the slower turnaround, since jobs complete asynchronously rather than in real time.
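Worked through, the caching arithmetic on that scenario (input rate assumes Sonnet-level pricing from the table earlier):

```python
# Savings from prompt caching: a 2,000-token system prompt reused across
# 100 items, with cache reads billed at 10% of the normal input rate.
system_tokens = 2_000
items = 100
input_rate = 3.00  # $/1M tokens (Sonnet input)

full_price = system_tokens * items * input_rate / 1_000_000
cached_price = full_price * 0.10
print(f"uncached: ${full_price:.2f}, cached: ${cached_price:.2f}")
# uncached: $0.60, cached: $0.06
```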

The cost of producing AI content is not zero. It is low enough to be dangerous. Low costs encourage waste: over-long prompts, unnecessary retries, premium models for simple tasks. Estimate costs before every batch. Track actual costs after. The discipline prevents waste from compounding.

Assignment

Build a cost estimator for your batch pipeline:

  1. Measure actual token counts from your test runs: input tokens per agent, output tokens per agent.
  2. Look up current per-token pricing for your chosen model.
  3. Calculate per-item cost across your full agent chain.
  4. Apply your failure rate from error logs (or estimate 15% if you do not have data yet).
  5. Run the estimator on your 10-item manifest from Session 10.2. What is the predicted cost?

After running the batch, compare estimated cost to actual cost. How close was your estimate? Adjust the estimator based on actual data.
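One way to quantify that comparison (the two cost figures below are placeholders, not real data):

```python
# Compare estimated cost to actual spend after the batch run, and
# express the gap as a percentage to fold back into the estimator.
estimated = 1.04  # predicted cost for the 10-item batch (placeholder)
actual = 1.31     # from the provider's usage dashboard (placeholder)

error_pct = (actual - estimated) / estimated * 100
print(f"estimate off by {error_pct:+.1f}%")  # estimate off by +26.0%
```

If the error is consistently in one direction, adjust the token counts or the failure multiplier rather than padding the total with a fudge factor.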