Stage 6: Formatting and Export
Session 8.7 · ~5 min read
Formatting Is Automation's Best Use Case
You have an approved piece of content. It passed review. It passed editing. Now it needs to exist in every format your distribution channels require. HTML for the website. PDF for downloads. EPUB for e-readers. WordPress XML for import. Markdown for archives.
This is mechanical work. It requires no creativity, no judgment, no taste. It requires exact, repeatable conversion from one format to another. That makes it the perfect job for automation.
The Single-Source Principle
The cardinal rule of formatting: one source file produces all output formats. You do not maintain separate versions for web, print, and email. You maintain one canonical version (typically Markdown or clean HTML) and convert it automatically.
[Diagram: one source file (Markdown) feeds six outputs: HTML, PDF, EPUB, WordPress XML, Email HTML, and Plain Text.]
If you edit the PDF version separately from the HTML version, they will diverge. After a few rounds of edits, you have two different pieces of content that are supposed to be identical. This is how errors multiply. One source, many outputs. Always.
Pandoc: The Universal Converter
Pandoc is a free, open-source document converter that handles over 40 formats. It converts Markdown to HTML, HTML to PDF (via LaTeX), Markdown to EPUB, Markdown to DOCX, and many other combinations (a few formats, such as PDF, are output-only). It runs from the command line, which means it can be scripted and automated.
| Conversion | Command | Notes |
|---|---|---|
| Markdown to HTML | pandoc input.md -o output.html | Add --standalone for complete HTML with head/body |
| Markdown to PDF | pandoc input.md -o output.pdf | Requires LaTeX (install TeX Live or MiKTeX) |
| Markdown to EPUB | pandoc input.md -o output.epub | Add metadata with --metadata title="Title" |
| Markdown to DOCX | pandoc input.md -o output.docx | Use --reference-doc for branded templates |
| HTML to Markdown | pandoc input.html -o output.md | Useful for importing legacy content into your pipeline |
Your AI coding assistant can write a batch conversion script in minutes. The script reads every file in your "approved" folder, converts each to all required formats, and saves the outputs in format-specific subfolders. Run it once after every editing pass.
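As a rough illustration of such a batch script, here is a minimal Python sketch. The folder names (approved, output), the choice of three formats, and the function names are assumptions made for this example, not part of any standard layout. It only wraps the Pandoc commands from the table above, so Pandoc (and LaTeX for PDF) must already be installed.

```python
import subprocess
from pathlib import Path

# Hypothetical format list: extension -> extra pandoc options for that format.
FORMATS = {"html": ["--standalone"], "pdf": [], "epub": []}


def build_command(source: Path, out_dir: Path, ext: str) -> list[str]:
    """Return the pandoc argument list for one source file and one format."""
    # Outputs go to a format-specific subfolder, e.g. output/html/post.html
    target = out_dir / ext / f"{source.stem}.{ext}"
    return ["pandoc", str(source), *FORMATS[ext], "-o", str(target)]


def convert_all(approved: Path, out_dir: Path) -> list[Path]:
    """Convert every Markdown file in `approved` to every format in FORMATS."""
    written = []
    for source in sorted(approved.glob("*.md")):
        for ext in FORMATS:
            cmd = build_command(source, out_dir, ext)
            target = Path(cmd[-1])
            target.parent.mkdir(parents=True, exist_ok=True)
            # check=True raises CalledProcessError if pandoc exits non-zero,
            # so a failed conversion stops the batch instead of passing silently.
            subprocess.run(cmd, check=True)
            written.append(target)
    return written
```

Calling convert_all(Path("approved"), Path("output")) after an editing pass would regenerate every output from the single source. Keeping the command construction in its own function makes it easy to inspect what will run before anything is converted.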
Metadata Injection
Formatting is not just about the content body. Every output format needs metadata: titles, descriptions, author names, publication dates, keywords, and Open Graph tags for social sharing.
Store metadata in a structured file (YAML front matter in your Markdown source, or a separate JSON file per piece). Your conversion script reads the metadata and injects it into the correct location for each format:
- HTML: <title>, <meta> tags, Open Graph properties
- PDF: document properties (title, author, subject)
- EPUB: OPF metadata (dc:title, dc:creator, dc:description)
- WordPress: post title, excerpt, categories, tags
Manual metadata entry is a common source of errors. Automate it. The metadata exists in one place and propagates to all formats automatically.
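To make the "one place, many formats" idea concrete, here is a small Python sketch of the HTML case. It assumes flat key: value front matter with the hypothetical fields title, author, and description. The parser is deliberately minimal and is not a real YAML parser; a production pipeline would more likely use a YAML library, and Pandoc itself reads YAML front matter for fields such as title and author.

```python
from html import escape


def read_front_matter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between the opening and closing `---` lines.

    Deliberately minimal: no nesting, lists, or multi-line values.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    return meta


def html_head_tags(meta: dict[str, str]) -> list[str]:
    """Render the metadata as <title>, <meta>, and Open Graph tags."""
    # escape() keeps characters like & and " from breaking the markup.
    title = escape(meta.get("title", ""))
    description = escape(meta.get("description", ""))
    return [
        f"<title>{title}</title>",
        f'<meta name="description" content="{description}">',
        f'<meta name="author" content="{escape(meta.get("author", ""))}">',
        f'<meta property="og:title" content="{title}">',
        f'<meta property="og:description" content="{description}">',
    ]
```

The same dictionary could feed a second function for EPUB or WordPress fields, which is the point: the values are typed once and every format reads from that one record.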
Visual Consistency Across Formats
Each output format has its own rendering engine. HTML renders in browsers. PDF renders via LaTeX or a PDF engine. EPUB renders in e-reader software. The same content can look different in each format, and "different" sometimes means "broken."
Build a format test checklist:
| Check | HTML | PDF | EPUB |
|---|---|---|---|
| Headings render correctly | Verify in browser | Verify in PDF reader | Verify in Calibre or e-reader |
| Tables are legible | Check responsive behavior | Check column widths | Tables may not render; use alternatives |
| Images display | Check paths | Check embedding | Check file inclusion |
| Links work | Click each link | Verify clickable | Verify clickable |
| Code blocks formatted | Check syntax highlighting | Check monospace font | Check line wrapping |
Run this checklist on your first batch. Once your conversion pipeline is stable, spot-check rather than full-check. But the first time, verify everything.
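Part of the checklist can be prepared by a script. The sketch below, using only Python's standard library, lists every link and image path in a generated HTML file so that nothing is missed during the "Links work" and "Images display" checks. The class and function names are illustrative, and the sketch only collects targets; it does not test whether they resolve.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags and src values from <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Skip anchors without href (named anchors) and images without src.
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])


def collect_targets(html_text: str) -> tuple[list[str], list[str]]:
    """Return (links, images) found in the given HTML text, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links, collector.images
```

Feeding it the text of an HTML output gives two lists to work through by hand or to pass to a separate checker.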
The quality gate for Stage 6: all target formats generated without errors, metadata correct in every format, and visual spot-checks pass. This is the last automated stage before publishing.
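The first part of that gate, confirming that every target format was actually produced, can be checked mechanically. The following Python sketch assumes a hypothetical layout in which each output lives at <out_dir>/<format>/<name>.<format>; adjust the path logic to match your own pipeline. It treats a missing or zero-byte file as a failure and says nothing about metadata or visual quality, which still need their own checks.

```python
from pathlib import Path


def missing_outputs(out_dir: Path, stem: str, formats: list[str]) -> list[str]:
    """Return the formats whose output file is absent or empty."""
    problems = []
    for ext in formats:
        # Assumed layout: out_dir/html/post.html, out_dir/pdf/post.pdf, ...
        target = out_dir / ext / f"{stem}.{ext}"
        if not target.is_file() or target.stat().st_size == 0:
            problems.append(ext)
    return problems
```

An empty list from missing_outputs(Path("output"), "post", ["html", "pdf", "epub"]) means the files exist; anything else names the formats to regenerate before moving on to publishing.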
Further Reading
- Pandoc User's Guide, John MacFarlane
- Pandoc on GitHub
- Using Pandoc to Format a Dissertation, Terence Eden
- Docker Pandoc: Automate Documentation Pipeline, Boundev
Assignment
Take your finished piece from Session 8.6 and convert it to at least 3 different formats:
- Install Pandoc if you have not already (pandoc.org/installing.html).
- Save your approved content as a Markdown file with YAML front matter for metadata.
- Convert to HTML, PDF, and one additional format of your choice.
- Run the format test checklist on each output.
If you are comfortable scripting: ask your AI coding assistant to create a batch conversion script that takes a Markdown file as input and produces all three formats. One command, three outputs.