Course → Module 10: Batch Processing & Scale
Session 4 of 8

Know the Bill Before You Press Enter

API calls cost money. Not much per call, but costs compound in batch processing. A batch of 100 articles that costs $15 is affordable. A batch of 100 articles where every item fails and regenerates three times, with an over-long prompt that triples token usage, costs $135. The difference between those two numbers is the gap between estimating and guessing.

Cost estimation before execution means: calculating expected token counts, multiplying by per-token rates, adding a failure margin, and knowing the number before you commit. This is not optional at scale. It is how you prevent budget surprises.

Token Counting Fundamentals

API costs are measured in tokens. A token is roughly 0.75 words in English (or about 4 characters). A 1,000-word article is approximately 1,333 tokens of output. Your prompt (system message + user message + context) might be 2,000 to 5,000 tokens of input.
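Those rules of thumb translate directly into a quick estimate. A minimal sketch:

```python
# Rough token estimation for English text, using the rules of thumb
# above: a token is ~0.75 words, or ~4 characters.
def estimate_tokens_from_words(word_count: int) -> int:
    return round(word_count / 0.75)

def estimate_tokens_from_chars(char_count: int) -> int:
    return round(char_count / 4)

print(estimate_tokens_from_words(1000))  # ~1333 tokens for a 1,000-word article
```

These are estimates, not exact counts; actual tokenization varies by model and by text (code and non-English text tokenize less efficiently).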

Costs are charged separately for input tokens and output tokens. Output tokens are typically 3 to 5 times more expensive than input tokens.

| Provider / Model | Input (per 1M tokens) | Output (per 1M tokens) | 1,000-word article cost* |
|---|---|---|---|
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.03 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.01 |
| GPT-5.2 | $1.75 | $14.00 | $0.02 |
| Gemini 2.5 Pro | $1.25 | $10.00 | $0.02 |
| Gemini 2.0 Flash | $0.30 | $2.50 | $0.005 |

* Estimated for a single generation call with ~3,000 input tokens and ~1,333 output tokens. Multi-agent chains multiply this by the number of agents.

The Cost Estimation Formula

For a batch of N items, each processed by an agent chain with A agents:

```mermaid
flowchart LR
    A["Per-item cost"] --> B["× Number of items (N)"]
    B --> C["× Failure multiplier"]
    C --> D["= Total batch cost"]
```

Per-item cost = sum of (input_tokens * input_rate + output_tokens * output_rate) for each agent in the chain.

Failure multiplier = 1 + (expected_failure_rate * average_retries). If 20% of items fail and each gets 1 retry, the multiplier is 1.2. If 10% fail with 2 retries each, the multiplier is 1.2 as well.
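The formula above can be sketched in a few lines (rates in dollars per million tokens):

```python
def per_item_cost(agents, input_rate, output_rate):
    # agents: list of (input_tokens, output_tokens), one pair per agent
    # in the chain; rates are dollars per 1M tokens
    return sum(
        (inp * input_rate + out * output_rate) / 1_000_000
        for inp, out in agents
    )

def failure_multiplier(failure_rate, avg_retries):
    return 1 + failure_rate * avg_retries

def total_batch_cost(n_items, item_cost, failure_rate, avg_retries=1):
    return n_items * item_cost * failure_multiplier(failure_rate, avg_retries)

# Example: 100 blog posts at $0.09 per item, 15% failure rate, 1 retry
print(round(total_batch_cost(100, 0.09, 0.15), 2))  # 10.35
```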

| Scenario | Items | Per-item cost | Failure rate | Total estimated cost |
|---|---|---|---|---|
| Blog posts, 3-agent chain, Sonnet | 10 | $0.09 | 10% | $0.99 |
| Blog posts, 3-agent chain, Sonnet | 100 | $0.09 | 15% | $10.35 |
| Product descriptions, 4-agent, Haiku | 500 | $0.03 | 10% | $16.50 |
| Book chapters, 3-agent, Sonnet (long) | 25 | $0.35 | 20% | $10.50 |

Totals assume one retry per failed item.

Building a Cost Estimator

A cost estimator is a spreadsheet or script that takes your batch manifest and calculates the total cost before you execute. Inputs:

  - Number of items in the batch
  - Input and output token counts per agent, measured from test runs
  - Per-token rates for each model in the chain
  - Expected failure rate and average retries per failure

Your AI coding assistant can build this in under 5 minutes. The spreadsheet version is a single formula row. Either way, run it before every batch.
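A minimal script version might look like this; the models, token counts, and rates below are illustrative assumptions, not measurements:

```python
# Estimator sketch over a hypothetical 3-agent chain. Replace the token
# counts with averages from your own test runs and the rates with
# current pricing for your chosen models.
RATES = {  # dollars per 1M tokens: (input, output)
    "haiku": (1.00, 5.00),
    "sonnet": (3.00, 15.00),
}

AGENT_CHAIN = [
    # (model, avg input tokens, avg output tokens), one entry per agent
    ("haiku", 2_000, 800),     # research agent
    ("sonnet", 3_000, 1_400),  # drafting agent
    ("sonnet", 4_500, 1_400),  # editing agent
]

def estimate(n_items: int, failure_rate: float = 0.15, avg_retries: int = 1):
    item_cost = sum(
        (inp * RATES[model][0] + out * RATES[model][1]) / 1_000_000
        for model, inp, out in AGENT_CHAIN
    )
    total = n_items * item_cost * (1 + failure_rate * avg_retries)
    return item_cost, total

item_cost, total = estimate(100)
print(f"per item: ${item_cost:.4f}, batch of 100: ${total:.2f}")
```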

Cost Optimization Strategies

Four strategies reduce batch costs without reducing quality:

| Strategy | How It Saves | Typical Savings |
|---|---|---|
| Use smaller models for appropriate tasks | Research and formatting agents can use Haiku/Flash instead of Sonnet/Pro | 40-70% per agent |
| Trim prompt length | Remove redundant instructions, reduce context to essentials | 10-30% on input costs |
| Prompt caching | Repeated system prompts are cached at a 90% discount on most providers | Up to 90% on system prompt tokens |
| Batch API | Submit jobs for async processing (not real-time) at a 50% discount | 50% across all tokens |

Prompt caching and batch API discounts are significant. If your system prompt is 2,000 tokens and you run 100 items, that is 200,000 input tokens billed at 10% of the normal rate instead of full price. For the batch API, the savings justify the slower turnaround, since jobs complete asynchronously rather than in real time.
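Worked through, the caching arithmetic on that scenario (input rate assumes Sonnet-level pricing from the table earlier):

```python
# Savings from prompt caching: a 2,000-token system prompt reused across
# 100 items, with cache reads billed at 10% of the normal input rate.
system_tokens = 2_000
items = 100
input_rate = 3.00  # $/1M tokens (Sonnet input)

full_price = system_tokens * items * input_rate / 1_000_000
cached_price = full_price * 0.10
print(f"uncached: ${full_price:.2f}, cached: ${cached_price:.2f}")
# uncached: $0.60, cached: $0.06
```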

The cost of producing AI content is not zero. It is low enough to be dangerous. Low costs encourage waste: over-long prompts, unnecessary retries, premium models for simple tasks. Estimate costs before every batch. Track actual costs after. The discipline prevents waste from compounding.

Assignment

Build a cost estimator for your batch pipeline:

  1. Measure actual token counts from your test runs: input tokens per agent, output tokens per agent.
  2. Look up current per-token pricing for your chosen model.
  3. Calculate per-item cost across your full agent chain.
  4. Apply your failure rate from error logs (or estimate 15% if you do not have data yet).
  5. Run the estimator on your 10-item manifest from Session 10.2. What is the predicted cost?

After running the batch, compare estimated cost to actual cost. How close was your estimate? Adjust the estimator based on actual data.
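One way to quantify that comparison (the two cost figures below are placeholders, not real data):

```python
# Compare estimated cost to actual spend after the batch run, and
# express the gap as a percentage to fold back into the estimator.
estimated = 1.04  # predicted cost for the 10-item batch (placeholder)
actual = 1.31     # from the provider's usage dashboard (placeholder)

error_pct = (actual - estimated) / estimated * 100
print(f"estimate off by {error_pct:+.1f}%")  # estimate off by +26.0%
```

If the error is consistently in one direction, adjust the token counts or the failure multiplier rather than padding the total with a fudge factor.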