Session 2.5: The Math: What AI Infrastructure Actually Costs

Session 5 of 5

"AI is cheap" is a statement made by people who have never tracked their costs. The raw API pricing looks trivial. A few dollars per million tokens. But raw pricing is not total cost. Total cost includes every failed generation you paid for, every hour of human review, every iteration of a prompt that did not work, and every tool subscription that makes the pipeline run.

This session breaks down what AI infrastructure actually costs in production. Not in theory. In practice.

The Visible Costs

API pricing is the easiest number to find. As of early 2026, the major providers charge per million tokens, with significant variation between models and tiers.

Provider / Model	Input (per 1M tokens)	Output (per 1M tokens)	Best For
Gemini 2.0 Flash-Lite	$0.075	$0.30	High-volume, low-complexity tasks
Gemini 2.5 Flash	$0.30	$2.50	Balanced speed and quality
Claude Sonnet	$3.00	$15.00	Complex writing, voice matching
GPT-5.2	$1.75	$14.00	General-purpose flagship
Claude Opus	$5.00	$25.00	Highest-quality reasoning
DeepSeek V3.2	$0.28	$0.42	Budget-friendly production

These numbers look small. A 1,000-word article uses roughly 1,500 tokens of output and maybe 3,000 tokens of input (prompt + system prompt + context). At Claude Sonnet rates, that is about $0.03 per article. Cheap, right?
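The per-article figure is easy to check. Here is a minimal sketch, assuming the token counts from the paragraph above and list prices copied from the table. The function name `api_cost` and the `PRICES` dictionary are illustrative, not part of any provider's SDK.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "Claude Sonnet": (3.00, 15.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "DeepSeek V3.2": (0.28, 0.42),
}


def api_cost(model, input_tokens, output_tokens):
    """Return the raw API cost in USD for one call at list price."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# One 1,000-word article: roughly 3,000 input tokens and 1,500 output tokens.
for model in PRICES:
    print(f"{model}: ${api_cost(model, 3_000, 1_500):.4f} per article")
```

At Claude Sonnet rates this works out to a little over three cents. The cheaper models come in well under one cent.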

The Hidden Costs

The API call is the smallest line item. Here is where the real money goes.

graph TD
    A["API Token Cost<br/>~5% of total"] --> T["Total Production Cost"]
    B["Failed Generations<br/>~15% of total"] --> T
    C["Human Review Time<br/>~40% of total"] --> T
    D["Prompt Development<br/>~20% of total"] --> T
    E["Tool Subscriptions<br/>~10% of total"] --> T
    F["Rework Cycles<br/>~10% of total"] --> T
    style A fill:#2a2a28,stroke:#6b8f71,color:#ede9e3
    style C fill:#2a2a28,stroke:#c8a882,color:#ede9e3
    style D fill:#2a2a28,stroke:#c8a882,color:#ede9e3

API token costs are typically 5% or less of total production cost. The expensive part is human time: reviewing output, developing prompts, and managing rework cycles.

Failed generations

Not every API call produces usable output. In a well-tuned pipeline, roughly 70-80% of generations pass quality checks on the first try. The rest get regenerated. You pay for the failures too. At scale, a 25% failure rate means each usable output takes 1 / 0.75 ≈ 1.33 generations on average, so your effective API cost is 33% higher than the raw per-token price.
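The failure-rate multiplier can be sketched in a few lines. This assumes every retry fails independently at the same rate and that you regenerate until one passes, which is a simplification. The function name is illustrative.

```python
def attempts_per_usable_output(failure_rate):
    """Expected generations per accepted output, assuming independent retries.

    With failure probability p, the expected number of attempts until the
    first success is 1 / (1 - p).
    """
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1 / (1 - failure_rate)


for rate in (0.20, 0.25, 0.30):
    multiplier = attempts_per_usable_output(rate)
    print(f"{rate:.0%} failure rate -> {multiplier:.2f}x the raw per-token price")
```

Multiply your raw per-piece API cost by this number to get the effective cost.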

Human review hours

If you value your review time at $50/hour and a 1,000-word article takes 15 minutes to review thoroughly, that is $12.50 per article in review cost alone. Compare that to the $0.03 in API costs. The review is roughly 400 times more expensive than the generation.

Prompt development

A good system prompt takes 5-15 iterations to develop. Each iteration requires generation, evaluation, and adjustment. The time investment for a new prompt template can easily reach 2-4 hours. Amortized over hundreds of uses, the per-piece cost drops. But the upfront investment is real.
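Amortization is simple division, but it helps to see how fast the number falls. Here is a rough sketch, assuming the 2-4 hour range above and a $50/hour rate borrowed from the review example. The rate and the piece counts are assumptions for illustration.

```python
def prompt_dev_cost_per_piece(dev_hours, hourly_rate, pieces):
    """Spread a one-time prompt development cost across the pieces that reuse it."""
    return dev_hours * hourly_rate / pieces


HOURLY_RATE = 50  # assumed, same rate as the review example

for pieces in (10, 100, 500):
    low = prompt_dev_cost_per_piece(2, HOURLY_RATE, pieces)
    high = prompt_dev_cost_per_piece(4, HOURLY_RATE, pieces)
    print(f"{pieces} pieces: ${low:.2f}-${high:.2f} per piece")
```

A template used ten times is expensive per piece. A template used hundreds of times is not.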

Tool subscriptions

VS Code is free. Python is free. But the Tavily search API, cloud hosting for scripts, version control platforms, grammar checking tools, and specialized formatting tools add up. A typical production setup runs $50-150/month in tool costs before a single piece of content is generated.

Real Cost Breakdown: A Book-Length Project

Consider a practical example: producing a 50,000-word book using an AI-assisted pipeline.

Cost Category	Estimate	Notes
API calls (drafting)	$15-40	~75K output tokens + context, multiple passes
API calls (research)	$10-25	Search API calls, source extraction
Failed generations	$5-15	~20-30% regen rate
Human review (40 hrs @ $50)	$2,000	Reading, checking facts, voice editing
Prompt development (8 hrs)	$400	System prompts, templates, testing
Tool subscriptions (1 month)	$100	Search APIs, formatting tools
Total	$2,530-2,580	API is ~2% of total cost

That same book, written entirely by a human at $50/hour, might take 200-400 hours: $10,000-20,000. The AI-assisted pipeline is roughly 74-87% cheaper. But it is not free, and the savings come from reduced drafting time, not from eliminating human involvement.
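The totals and the savings range can be reproduced from the table. This is a sketch using the table's own estimates. The dictionary and function names are illustrative.

```python
# (low, high) estimates in USD, copied from the cost table above.
BOOK_COSTS = {
    "API calls (drafting)": (15, 40),
    "API calls (research)": (10, 25),
    "Failed generations": (5, 15),
    "Human review (40 hrs @ $50)": (2_000, 2_000),
    "Prompt development (8 hrs)": (400, 400),
    "Tool subscriptions (1 month)": (100, 100),
}


def total_range(costs):
    """Sum the low and high estimates separately."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


def savings_fraction(ai_cost, human_cost):
    """Fraction saved by the AI-assisted pipeline relative to the human-only cost."""
    return 1 - ai_cost / human_cost


low, high = total_range(BOOK_COSTS)
print(f"AI-assisted total: ${low:,}-${high:,}")

# Least favorable case: high AI total against 200 human hours at $50.
# Most favorable case: low AI total against 400 human hours at $50.
print(f"Savings: {savings_fraction(high, 200 * 50):.0%} to {savings_fraction(low, 400 * 50):.0%}")
```

Swap in your own hourly rate and hour estimates. The human line items dominate either way.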

Cost Reduction Strategies

Two techniques can significantly reduce API costs at scale. Prompt caching stores frequently used context (like system prompts and voice fingerprints) so you do not pay to resend them with every request. Depending on the provider, this saves 50-90% on repeated context. Batch APIs allow you to submit jobs asynchronously and receive results later, typically at 50% of the real-time price. If you do not need instant output, batch processing cuts your API bill in half.
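Here is a rough sketch of how the two discounts change the math. It assumes a flat discount on cached input tokens and a flat 50% batch discount, and it ignores provider-specific details such as cache write surcharges or minimum cacheable lengths. The 2,000 cached tokens and the 90% discount are illustrative assumptions, and the function names are hypothetical.

```python
def input_cost_with_cache(input_tokens, cached_tokens, price_per_million, cache_discount):
    """Input cost in USD when cached_tokens are billed at a discounted rate."""
    fresh_tokens = input_tokens - cached_tokens
    billed_tokens = fresh_tokens + cached_tokens * (1 - cache_discount)
    return billed_tokens * price_per_million / 1_000_000


def batch_cost(realtime_cost, batch_discount=0.50):
    """Cost of the same job submitted through a batch API at a flat discount."""
    return realtime_cost * (1 - batch_discount)


# Assumed scenario: 3,000 input tokens per request at $3.00 per 1M tokens,
# of which 2,000 are a repeated system prompt served from cache at 90% off.
uncached = input_cost_with_cache(3_000, 0, 3.00, 0.90)
cached = input_cost_with_cache(3_000, 2_000, 3.00, 0.90)
print(f"Input cost without caching: ${uncached:.4f}")
print(f"Input cost with caching:    ${cached:.4f}")
print(f"Batch price for a $0.0315 real-time call: ${batch_cost(0.0315):.5f}")
```

Check your provider's pricing page for the actual cache and batch terms before budgeting on these numbers.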

The Comparison That Matters

The relevant comparison is not "AI vs. free." It is "AI-assisted production vs. the alternative." If the alternative is hiring writers at $0.10-0.50 per word, a 50,000-word book costs $5,000-25,000 in writing fees alone, before editing. If the alternative is doing everything yourself, the cost is your time at your hourly rate.

AI infrastructure does not eliminate cost. It shifts cost from content generation (where AI is fast and cheap) to quality control (where humans are slow and expensive). Understanding this shift is how you budget accurately and avoid the trap of thinking AI content is "basically free."

Assignment

Calculate the cost of producing one piece of your typical content using AI. Include: API costs (estimate token counts using a token counter), your time for review and editing (at your hourly rate), and any tool subscriptions you use.
Compare this to your pre-AI cost for the same output. If you wrote everything yourself, calculate your time at your hourly rate. If you hired writers, use their rates.
Build a simple cost-per-piece calculator in a spreadsheet with columns for: API input tokens, API output tokens, price per token, failure rate multiplier, review time (hours), hourly rate, and tool cost allocation. Calculate total cost per piece and compare to the non-AI alternative.
