The Production Manifest

Batch processing starts with a structured input file. Not a list of topics in a text document. Not a folder of notes. A proper manifest: a CSV or spreadsheet where each row is one piece of content to produce, and each column is a parameter your pipeline needs.

The manifest is your production order. It defines everything that gets built, how it gets built, and what constraints apply. Your pipeline script reads it row by row, runs the agent chain for each row, and saves the output in a structured folder. Human involvement drops to reviewing outputs, not configuring each piece.

Manifest Structure

The columns in your manifest correspond to the inputs your pipeline requires. At minimum:

| Column | Purpose | Example Value |
|---|---|---|
| id | Unique identifier for tracking | blog-042 |
| topic | Content topic or title | Why remote onboarding fails |
| audience | Target reader | HR directors at mid-size SaaS companies |
| angle | Specific thesis or perspective | The problem is not the tools; it is the absence of informal trust-building |
| word_count | Target length | 1200 |
| voice_variant | Which voice profile to use | professional |
| research_questions | Semicolon-separated questions | What studies exist on remote onboarding?; What is the attrition rate... |
| required_elements | What the piece must include | At least 2 data points; one case study |
| forbidden_elements | What the piece must not include | No bullet lists; no rhetorical questions in headings |
| status | Pipeline tracking | pending / researched / drafted / reviewed / published |
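
As one way to make these columns concrete, the sketch below loads a manifest with Python's standard csv module and turns each row into a typed record. The ManifestRow class, the split_questions and load_manifest helpers, and the sample row (including its second research question) are illustrative assumptions, not part of any particular pipeline.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ManifestRow:
    """One manifest row, with the columns from the table above."""
    id: str
    topic: str
    audience: str
    angle: str
    word_count: int
    voice_variant: str
    research_questions: list[str]
    required_elements: str
    forbidden_elements: str
    status: str


def split_questions(raw: str) -> list[str]:
    """Split a semicolon-separated cell into trimmed, non-empty questions."""
    return [part.strip() for part in raw.split(";") if part.strip()]


def load_manifest(stream) -> list[ManifestRow]:
    """Read CSV text from an open file or file-like object."""
    rows = []
    for record in csv.DictReader(stream):
        rows.append(ManifestRow(
            id=record["id"],
            topic=record["topic"],
            audience=record["audience"],
            angle=record["angle"],
            word_count=int(record["word_count"]),
            voice_variant=record["voice_variant"],
            research_questions=split_questions(record["research_questions"]),
            required_elements=record["required_elements"],
            forbidden_elements=record["forbidden_elements"],
            status=record["status"],
        ))
    return rows


# Illustrative sample data; the second research question is made up.
SAMPLE_CSV = (
    "id,topic,audience,angle,word_count,voice_variant,"
    "research_questions,required_elements,forbidden_elements,status\n"
    "blog-042,Why remote onboarding fails,"
    "HR directors at mid-size SaaS companies,"
    "The problem is not the tools; it is the absence of informal trust-building,"
    "1200,professional,"
    "What studies exist on remote onboarding?; Which onboarding steps get skipped?,"
    "At least 2 data points; one case study,"
    "No bullet lists; no rhetorical questions in headings,"
    "pending\n"
)

if __name__ == "__main__":
    for row in load_manifest(io.StringIO(SAMPLE_CSV)):
        print(row.id, row.word_count, row.research_questions)
```

Converting word_count to an integer at load time means a non-numeric value fails immediately instead of halfway through a batch. If a cell can contain commas, quote it in the CSV; the csv module handles quoted fields.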

The Processing Flow

flowchart TD
    A["CSV Manifest"] --> B["Script reads row"]
    B --> C["Build pipeline inputs from row columns"]
    C --> D["Run agent chain"]
    D --> E["Save output to structured folder"]
    E --> F["Update status column"]
    F --> G{"More rows?"}
    G -- Yes --> B
    G -- No --> H["Batch complete"]

The script skips any row whose status is no longer pending, so if you stop the batch and restart it, it picks up where it left off. This is idempotent processing: running the script again does not re-process completed items.
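
The skip-and-resume behavior can be sketched as follows. This is a minimal illustration, assuming the manifest is a CSV file with id and status columns. The run_chain argument is a hypothetical stand-in for your agent chain, and marking finished rows as reviewed is an assumption about where your chain ends.

```python
import csv
from pathlib import Path
from typing import Callable


def read_rows(path: Path) -> list[dict]:
    """Load every manifest row as a dict keyed by column name."""
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def write_rows(path: Path, rows: list[dict]) -> None:
    """Rewrite the manifest file with the current row states."""
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def run_batch(
    path: Path,
    run_chain: Callable[[dict], object],
    done_status: str = "reviewed",
) -> list[str]:
    """Process pending rows in order and return the ids handled in this run."""
    rows = read_rows(path)
    processed = []
    for row in rows:
        if row["status"] != "pending":
            continue  # finished in an earlier run, so skip it
        run_chain(row)  # if this raises, the row stays pending
        row["status"] = done_status
        write_rows(path, rows)  # persist after every row so a restart resumes here
        processed.append(row["id"])
    return processed
```

Writing the manifest back after each row, rather than once at the end, is what makes an interrupted run resumable: a crash loses at most the row that was in progress.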

Output Folder Structure

Each manifest row produces outputs across multiple pipeline stages. Organize them consistently:

output/
├── blog-042/
│   ├── research-brief.json
│   ├── outline.md
│   ├── draft-v1.md
│   ├── review.json
│   ├── draft-final.md
│   ├── output.html
│   ├── output.pdf
│   └── metadata.json
├── blog-043/
│   └── ...
└── batch-log.csv

Every intermediate artifact is preserved. If the final output has a problem, you can trace it back to the specific pipeline stage where the problem originated. The batch log records timestamps, token usage, costs, and error counts per item.
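
A minimal sketch of this layout, assuming each stage hands back its artifact as text: save_artifact writes a file into the item's folder, and append_log adds one line to batch-log.csv. The helper names and the log column names (tokens, cost_usd, errors) are assumptions chosen for illustration.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Assumed log columns, based on what the batch log is described as recording.
LOG_FIELDS = ["id", "timestamp", "tokens", "cost_usd", "errors"]


def item_dir(output_root: Path, item_id: str) -> Path:
    """Return output/<id>/, creating it if needed."""
    folder = output_root / item_id
    folder.mkdir(parents=True, exist_ok=True)
    return folder


def save_artifact(output_root: Path, item_id: str, filename: str, text: str) -> Path:
    """Write one pipeline artifact (for example outline.md) into the item's folder."""
    path = item_dir(output_root, item_id) / filename
    path.write_text(text, encoding="utf-8")
    return path


def append_log(output_root: Path, item_id: str, tokens: int,
               cost_usd: float, errors: int) -> None:
    """Append one row to batch-log.csv, writing the header on first use."""
    output_root.mkdir(parents=True, exist_ok=True)
    log_path = output_root / "batch-log.csv"
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "id": item_id,
            "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "tokens": tokens,
            "cost_usd": cost_usd,
            "errors": errors,
        })
```

Because every stage writes to a fixed filename inside the item's folder, a missing file tells you which stage an item failed at.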

Batch Validation

Before running a batch, validate the manifest itself:

| Check | What It Catches |
|---|---|
| No empty required columns | Missing topics, missing audiences |
| No duplicate IDs | Two rows that would overwrite each other's output |
| Word counts within range | Typos (120 instead of 1200) or unrealistic targets |
| Voice variant exists | References to voice profiles that have not been created |
| Research questions parseable | Malformed semicolon-separated lists |

Run validation before processing. A manifest error on row 47 that crashes the script after rows 1 through 46 have completed is a waste of 46 rows of processing time.
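
The checks above could be implemented along these lines. This is a sketch under stated assumptions: the set of required columns, the 300 to 5000 word range, and the rule that a research-question list is malformed when it contains an empty entry between semicolons are all illustrative choices to adapt to your own pipeline.

```python
# Assumed set of required columns; adjust to match your manifest.
REQUIRED_COLUMNS = ["id", "topic", "audience", "angle", "word_count", "voice_variant"]


def _cell(row: dict, name: str) -> str:
    """Return a cell as stripped text, treating a missing cell as empty."""
    return (row.get(name) or "").strip()


def validate_manifest(rows: list[dict], known_voices: set[str],
                      min_words: int = 300, max_words: int = 5000) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    seen_ids = set()
    # Start at 2 so messages match spreadsheet row numbers (row 1 is the header).
    for line_no, row in enumerate(rows, start=2):
        for col in REQUIRED_COLUMNS:
            if not _cell(row, col):
                errors.append(f"row {line_no}: empty required column '{col}'")

        item_id = _cell(row, "id")
        if item_id:
            if item_id in seen_ids:
                errors.append(f"row {line_no}: duplicate id '{item_id}'")
            seen_ids.add(item_id)

        raw_count = _cell(row, "word_count")
        if raw_count:
            try:
                count = int(raw_count)
            except ValueError:
                errors.append(f"row {line_no}: word_count '{raw_count}' is not a number")
            else:
                if not min_words <= count <= max_words:
                    errors.append(f"row {line_no}: word_count {count} out of range")

        voice = _cell(row, "voice_variant")
        if voice and voice not in known_voices:
            errors.append(f"row {line_no}: unknown voice_variant '{voice}'")

        # A single trailing semicolon is tolerated; empty entries elsewhere are not.
        questions = _cell(row, "research_questions").rstrip(";")
        if questions and any(not part.strip() for part in questions.split(";")):
            errors.append(f"row {line_no}: research_questions has an empty entry")
    return errors
```

Collecting every problem into a list, instead of stopping at the first one, lets you fix the whole manifest in one pass before starting the batch.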

Incremental Batching

You do not have to process the entire manifest at once. Incremental batching means processing 5 to 10 rows, reviewing the outputs, adjusting prompts or manifest entries if needed, and then processing the next 5 to 10. This catches systematic issues early instead of discovering after 100 rows that the voice variant was wrong.
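
One simple way to batch incrementally is to select only the next few pending rows on each run. The next_chunk helper below is a hypothetical sketch, assuming rows are dicts with a status column as in the manifest above.

```python
from itertools import islice


def next_chunk(rows: list[dict], size: int = 5) -> list[dict]:
    """Return up to `size` rows that are still pending, in manifest order."""
    pending = (row for row in rows if row["status"] == "pending")
    return list(islice(pending, size))
```

Run the pipeline on the returned rows, review the outputs, adjust prompts or manifest entries if needed, then call it again for the next chunk. Rows already processed are skipped because their status is no longer pending.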

The manifest is the single source of truth for what your pipeline produces. If it is not in the manifest, it does not get built. If it is in the manifest with incorrect parameters, it gets built incorrectly. Invest time in the manifest before pressing Enter on the batch.

Assignment

Create a production manifest for a batch of 10 pieces of content:

  1. Define columns for every parameter your pipeline needs.
  2. Fill in all 10 rows with real content specifications (not placeholders).
  3. Run manifest validation: check for empty fields, duplicate IDs, and parseable research questions.
  4. Process the first 2 rows through your pipeline. Review the outputs.

If the first 2 outputs meet your quality standards, process the remaining 8. If not, identify the issue, fix the manifest or pipeline, and re-run the first 2 before proceeding. This is your first real batch run.