Batch Architecture: CSV as Input, Structured Output
Session 10.2 · ~5 min read
The Production Manifest
Batch processing starts with a structured input file. Not a list of topics in a text document. Not a folder of notes. A proper manifest: a CSV or spreadsheet where each row is one piece of content to produce, and each column is a parameter your pipeline needs.
The manifest is your production order. It defines everything that gets built, how it gets built, and what constraints apply. Your pipeline script reads it row by row, runs the agent chain for each row, and saves the output in a structured folder. Human involvement shrinks to reviewing outputs rather than configuring each piece.
Manifest Structure
The columns in your manifest correspond to the inputs your pipeline requires. At minimum:
| Column | Purpose | Example Value |
|---|---|---|
| id | Unique identifier for tracking | blog-042 |
| topic | Content topic or title | Why remote onboarding fails |
| audience | Target reader | HR directors at mid-size SaaS companies |
| angle | Specific thesis or perspective | The problem is not the tools; it is the absence of informal trust-building |
| word_count | Target length | 1200 |
| voice_variant | Which voice profile to use | professional |
| research_questions | Semicolon-separated questions | What studies exist on remote onboarding?; What is the attrition rate... |
| required_elements | What the piece must include | At least 2 data points; one case study |
| forbidden_elements | What the piece must not include | No bullet lists; no rhetorical questions in headings |
| status | Pipeline tracking | pending / researched / drafted / reviewed / published |
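As a minimal sketch of how a script might read such a manifest, the snippet below uses Python's standard csv module. The function name load_manifest, the inlined sample row, and the second research question are illustrative assumptions, not part of any particular pipeline.

```python
import csv
import io

# Hypothetical manifest excerpt, inlined so the sketch runs without a file.
# The second research question is an invented placeholder.
SAMPLE = """id,topic,word_count,research_questions,status
blog-042,Why remote onboarding fails,1200,What studies exist on remote onboarding?; How long does onboarding usually take?,pending
"""

def load_manifest(handle):
    """Read manifest rows into dicts, converting the fields that need a type."""
    rows = []
    for row in csv.DictReader(handle):
        row["word_count"] = int(row["word_count"])
        # Split the semicolon-separated list and drop empty fragments.
        row["research_questions"] = [
            q.strip() for q in row["research_questions"].split(";") if q.strip()
        ]
        rows.append(row)
    return rows

rows = load_manifest(io.StringIO(SAMPLE))
print(rows[0]["id"], rows[0]["word_count"], len(rows[0]["research_questions"]))
```

With a real file, the same function would take the handle from open("manifest.csv", newline="", encoding="utf-8"), where the filename is again an assumption.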
The Processing Flow
[Flowchart; the opening steps were lost in extraction: ... from row columns → Run agent chain → Save output to structured folder → Update status column → More rows? (Yes: loop back to an earlier step; No: Batch complete)]
The script skips any row whose status is no longer pending, so if you stop the batch and restart it, it picks up where it left off. This makes the run idempotent: running the script again does not re-process completed items.
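The skip-if-not-pending behavior can be sketched in a few lines. This is an illustration under assumptions: run_batch and process_row are hypothetical names, the rows are plain dicts, and "drafted" stands in for whichever status your pipeline sets when a row finishes.

```python
def run_batch(rows, process_row):
    """Process only rows still marked pending; update each row's status when done."""
    processed = []
    for row in rows:
        if row["status"] != "pending":
            continue  # handled in an earlier run, so leave it alone
        process_row(row)           # stand-in for the agent chain
        row["status"] = "drafted"  # assumed status value after processing
        processed.append(row["id"])
    return processed

rows = [
    {"id": "blog-042", "status": "drafted"},
    {"id": "blog-043", "status": "pending"},
]

first_run = run_batch(rows, process_row=lambda row: None)
second_run = run_batch(rows, process_row=lambda row: None)
print(first_run, second_run)
```

The second call finds nothing left to do. For this to survive a real restart, the updated status would also have to be written back to the manifest file after each row, which the sketch leaves out.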
Output Folder Structure
Each manifest row produces outputs across multiple pipeline stages. Organize them consistently:
output/
├── blog-042/
│   ├── research-brief.json
│   ├── outline.md
│   ├── draft-v1.md
│   ├── review.json
│   ├── draft-final.md
│   ├── output.html
│   ├── output.pdf
│   └── metadata.json
├── blog-043/
│   └── ...
└── batch-log.csv
Every intermediate artifact is preserved. If the final output has a problem, you can trace it back to the specific pipeline stage where the problem originated. The batch log records timestamps, token usage, costs, and error counts per item.
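One way to keep this layout consistent is to route every write through two small helpers, sketched below. The helper names, the log column names, and the sample token and cost figures are assumptions chosen for illustration; the text only specifies that the log holds timestamps, token usage, costs, and error counts.

```python
import csv
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Assumed column names for batch-log.csv.
LOG_FIELDS = ["id", "timestamp", "tokens", "cost_usd", "errors"]

def item_dir(output_root, item_id):
    """Create (if needed) and return the folder for one manifest row."""
    path = Path(output_root) / item_id
    path.mkdir(parents=True, exist_ok=True)
    return path

def log_item(output_root, item_id, tokens, cost_usd, errors):
    """Append one line to batch-log.csv, writing the header on first use."""
    root = Path(output_root)
    root.mkdir(parents=True, exist_ok=True)
    log_path = root / "batch-log.csv"
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "id": item_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "tokens": tokens,
            "cost_usd": cost_usd,
            "errors": errors,
        })

# Demo in a temporary directory; the numbers are placeholders, not measurements.
with tempfile.TemporaryDirectory() as root:
    folder = item_dir(root, "blog-042")
    (folder / "outline.md").write_text("# Outline\n", encoding="utf-8")
    log_item(root, "blog-042", tokens=1850, cost_usd=0.04, errors=0)
    created = sorted(p.name for p in Path(root).iterdir())
    log_lines = (Path(root) / "batch-log.csv").read_text(encoding="utf-8").splitlines()

print(created)
```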
Batch Validation
Before running a batch, validate the manifest itself:
| Check | What It Catches |
|---|---|
| No empty required columns | Missing topics, missing audiences |
| No duplicate IDs | Two rows that would overwrite each other's output |
| Word counts within range | Typos (120 instead of 1200) or unrealistic targets |
| Voice variant exists | References to voice profiles that have not been created |
| Research questions parseable | Malformed semicolon-separated lists |
Run validation before processing. A manifest error on row 47 that crashes the script after rows 1 through 46 have completed is a waste of 46 rows of processing time.
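The checks in the table could be sketched as a single function that collects every problem instead of stopping at the first. The required-column list, the set of known voice profiles, and the 300 to 5000 word range are assumptions to adapt; validate_manifest is a hypothetical name, and rows are assumed to be dicts of strings as csv.DictReader would produce.

```python
REQUIRED = ["id", "topic", "audience", "angle", "word_count", "voice_variant"]
KNOWN_VOICES = {"professional"}  # assumption: the voice profiles that exist
WORD_RANGE = (300, 5000)         # assumption: acceptable target lengths

def cell(row, column):
    """Return a trimmed string for a cell, treating a missing value as empty."""
    return (row.get(column) or "").strip()

def validate_manifest(rows):
    """Return a list of problems; an empty list means the manifest passed."""
    problems = []
    seen_ids = set()
    for number, row in enumerate(rows, start=1):
        for column in REQUIRED:
            if not cell(row, column):
                problems.append(f"row {number}: empty {column}")
        item_id = cell(row, "id")
        if item_id and item_id in seen_ids:
            problems.append(f"row {number}: duplicate id {item_id}")
        seen_ids.add(item_id)
        count = cell(row, "word_count")
        if count and not (count.isdigit()
                          and WORD_RANGE[0] <= int(count) <= WORD_RANGE[1]):
            problems.append(f"row {number}: word_count {count} outside range")
        voice = cell(row, "voice_variant")
        if voice and voice not in KNOWN_VOICES:
            problems.append(f"row {number}: unknown voice_variant {voice}")
        questions = cell(row, "research_questions")
        # An empty fragment between semicolons counts as a malformed list.
        if questions and any(not part.strip() for part in questions.split(";")):
            problems.append(f"row {number}: malformed research_questions")
    return problems

good = {
    "id": "blog-042",
    "topic": "Why remote onboarding fails",
    "audience": "HR directors at mid-size SaaS companies",
    "angle": "The problem is not the tools",
    "word_count": "1200",
    "voice_variant": "professional",
    "research_questions": "What studies exist on remote onboarding?",
}
# Same id, a word-count typo, and a voice profile that does not exist.
bad = dict(good, word_count="120", voice_variant="casual")

problems = validate_manifest([good, bad])
for line in problems:
    print(line)
```

Returning the full list rather than raising on the first error lets you fix the whole manifest in one pass before any processing starts.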
Incremental Batching
You do not have to process the entire manifest at once. Incremental batching means processing 5 to 10 rows, reviewing the outputs, adjusting prompts or manifest entries if needed, and then processing the next 5 to 10. This catches systematic issues early instead of discovering after 100 rows that the voice variant was wrong.
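Selecting the next slice of work can be sketched as follows. The function name next_batch and the generated row IDs are illustrative assumptions; "drafted" again stands in for whatever status your pipeline assigns after processing.

```python
from itertools import islice

def next_batch(rows, size=5):
    """Return up to `size` rows that are still pending, in manifest order."""
    pending = (row for row in rows if row["status"] == "pending")
    return list(islice(pending, size))

# Twelve hypothetical pending rows.
rows = [{"id": f"blog-{n:03d}", "status": "pending"} for n in range(1, 13)]

first = next_batch(rows, size=5)
for row in first:
    row["status"] = "drafted"  # pretend these were processed and reviewed

second = next_batch(rows, size=5)
print([row["id"] for row in second])
```

Because the selection depends only on the status column, pausing to adjust prompts or manifest entries between slices needs no extra bookkeeping.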
The manifest is the single source of truth for what your pipeline produces. If it is not in the manifest, it does not get built. If it is in the manifest with incorrect parameters, it gets built incorrectly. Invest time in the manifest before pressing Enter on the batch.
Further Reading
- Building an AI Production Pipeline That Scales, Joyspace
- Scalable Content Production Process, Heinz Marketing
- Content Workflow Guide, Planable
Assignment
Create a production manifest for a batch of 10 pieces of content:
- Define columns for every parameter your pipeline needs.
- Fill in all 10 rows with real content specifications (not placeholders).
- Run manifest validation: check for empty fields, duplicate IDs, and malformed research questions.
- Process the first 2 rows through your pipeline. Review the outputs.
If the first 2 outputs meet your quality standards, process the remaining 8. If not, identify the issue, fix the manifest or pipeline, and re-run the first 2 before proceeding. This is your first real batch run.