Quality Gates: Pass, Fail, Rework
Session 8.10 · ~5 min read
Three Outcomes, Not Two
Most people think of quality checks as binary: pass or fail. That is insufficient for a production pipeline. You need three outcomes at every gate: pass (advance to the next stage), fail (reject entirely and regenerate from scratch), and rework (return to the previous stage with specific corrections).
The difference between fail and rework is important. A draft that misses the point entirely (wrong topic, wrong audience, structural incoherence) should be regenerated. A draft that has the right bones but needs voice corrections and fact-checking should be reworked. Treating both the same wastes either time (reworking something beyond repair) or money (regenerating something that just needed editing).
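The fail-versus-rework distinction fits in a few lines of triage logic. A minimal Python sketch; the outcome names and the three boolean inputs are illustrative, not from any particular tool:

```python
from enum import Enum

class GateOutcome(Enum):
    PASS = "advance to the next stage"
    REWORK = "return to the previous stage with corrections"
    FAIL = "regenerate from scratch"

def triage(right_topic: bool, sound_structure: bool, clean_execution: bool) -> GateOutcome:
    """Fail when the bones are wrong (topic, structure);
    rework when only surface execution needs fixing."""
    if not (right_topic and sound_structure):
        return GateOutcome.FAIL   # beyond repair: regenerate
    if not clean_execution:
        return GateOutcome.REWORK # right bones, needs editing
    return GateOutcome.PASS
```

The point of the sketch is the ordering: structural problems are checked first, so a draft never gets sent back for voice fixes when it should have been rejected outright.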
Defining Criteria Before Production
Quality criteria must be defined before you start producing, not decided in the moment. When you are tired at 11 PM reviewing your tenth draft, "good enough" becomes very tempting. Pre-defined criteria remove that temptation. The criteria say what passes and what does not. Your mood is irrelevant.
```mermaid
flowchart TD
    A["Content arrives at gate"] --> B["Score against rubric"]
    B -- "Score 40 or above" --> C["PASS
    Advance to next stage"]
    B -- "Score 30-39" --> D["REWORK
    Return with corrections"]
    B -- "Score below 30" --> E["FAIL
    Regenerate from scratch"]
    D --> F["Previous stage applies fixes"]
    F --> A
    E --> G["Start pipeline from Stage 3"]
    G --> A
    style A fill:#222221,stroke:#c8a882,color:#ede9e3
    style B fill:#222221,stroke:#8a8478,color:#ede9e3
    style C fill:#222221,stroke:#6b8f71,color:#ede9e3
    style D fill:#222221,stroke:#c47a5a,color:#ede9e3
    style E fill:#222221,stroke:#c47a5a,color:#ede9e3
    style F fill:#222221,stroke:#8a8478,color:#ede9e3
    style G fill:#222221,stroke:#8a8478,color:#ede9e3
```
The Scoring Rubric
A rubric converts a subjective "this feels right" into an objective score. Five dimensions, each scored 0 to 10; the total, out of 50, determines the outcome.
| Dimension | Score 10 | Score 5 | Score 0 |
|---|---|---|---|
| Factual accuracy | Every claim verified, all sources cited, no hallucinations | Most claims accurate, 1-2 unverified assertions | Multiple hallucinated facts, no source alignment |
| Voice consistency | Indistinguishable from hand-written content | Mostly on-voice, occasional AI patterns visible | Generic AI voice throughout, no personality |
| Structural clarity | Perfect outline compliance, clear argument flow | Mostly follows outline, one section out of place | Ignores outline, no discernible argument |
| Originality of insight | Contains unique perspective, practitioner knowledge, or original data | Generic but competent treatment | Could have been written about any topic by any AI |
| AI artifact absence | Zero detectable artifacts in 1000 words | 3-5 minor artifacts (hedging, filler) | Reads like unedited AI output |
Gate Placement
Not every stage needs a full-rubric quality gate. Some stages need lightweight checks. The key is matching the gate intensity to the risk at that stage.
| Stage | Gate Type | What Gets Checked | Who Checks |
|---|---|---|---|
| 1. Research | Completeness check | Does the brief answer all research questions? Are sources rated? | Human (quick scan) |
| 2. Outline | Logic check | Do all three foundational questions have answers? Does argument flow? | Human |
| 3. Draft | Structural check | Does it follow the outline? Within word count? Voice approximation? | Automated + human |
| 4. Review | Full rubric | All 5 dimensions scored | Human |
| 5. Edit | Issue resolution check | All review annotations addressed? No new issues introduced? | Human |
| 6. Format | Technical check | All formats generated? Metadata correct? Visual spot-check? | Automated |
| 7. Publish | Pre-publish checklist | Links, images, metadata, canonical URLs, analytics | Automated + human |
Tracking Gate Performance
Every gate produces data. Track it.
- Pass rate per gate: What percentage of content passes each gate on the first attempt? A gate with a 30% first-pass rate tells you the upstream stage is broken.
- Rework rate: How often does content get sent back? High rework rates mean your specifications or inputs are insufficient.
- Fail rate: How often does content get rejected entirely? High fail rates mean your drafting prompts need fundamental revision.
- Average score per dimension: Which dimension consistently scores lowest? That is where to invest improvement effort.
This data turns your pipeline from a process into a learning system. Each production run generates information about where the pipeline is strong and where it is weak. Use that information. Adjust inputs. Refine prompts. Tighten specifications. The pipeline improves over time, but only if you measure it.
A quality gate without defined criteria is just an opinion. A quality gate with defined criteria, consistent scoring, and tracked metrics is a system. Systems improve. Opinions drift.
Further Reading
- 8 Steps To Create a Successful Content Production Process, SEOBoost
- Content Workflow Guide for 2026, Planable
- Building a Scalable Content Production Process, Heinz Marketing
Assignment
Create a quality rubric for your pipeline:
- Define 5 scoring dimensions relevant to your content type.
- For each dimension, describe what a 10, a 5, and a 0 look like.
- Set thresholds: what total score means pass, rework, or fail?
- Test the rubric by scoring 3 pieces of content (one you wrote, one decent AI output, one obvious slop). Do the scores differentiate them correctly?
Format the rubric as a one-page printable reference. Include scoring thresholds and gate placement for your pipeline stages. This document becomes the operational standard for everything your pipeline produces.