Quality Gates: Pass, Fail, Rework
Session 8.10 · ~5 min read
Three Outcomes, Not Two
Most people think of quality checks as binary: pass or fail. That is insufficient for a production pipeline. You need three outcomes at every gate: pass (advance to the next stage), fail (reject entirely and regenerate from scratch), and rework (return to the previous stage with specific corrections).
The difference between fail and rework is important. A draft that misses the point entirely (wrong topic, wrong audience, structural incoherence) should be regenerated. A draft that has the right bones but needs voice corrections and fact-checking should be reworked. Treating both the same wastes either time (reworking something beyond repair) or money (regenerating something that just needed editing).
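The fail-versus-rework distinction fits in a few lines of triage logic. A minimal Python sketch; the outcome names and the three boolean inputs are illustrative, not from any particular tool:

```python
from enum import Enum

class GateOutcome(Enum):
    PASS = "advance to the next stage"
    REWORK = "return to the previous stage with corrections"
    FAIL = "regenerate from scratch"

def triage(right_topic: bool, sound_structure: bool, clean_execution: bool) -> GateOutcome:
    """Fail when the bones are wrong (topic, structure);
    rework when only surface execution needs fixing."""
    if not (right_topic and sound_structure):
        return GateOutcome.FAIL   # beyond repair: regenerate
    if not clean_execution:
        return GateOutcome.REWORK # right bones, needs editing
    return GateOutcome.PASS
```

The point of the sketch is the ordering: structural problems are checked first, so a draft never gets sent back for voice fixes when it should have been rejected outright.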
Defining Criteria Before Production
Quality criteria must be defined before you start producing, not decided in the moment. When you are tired at 11 PM reviewing your tenth draft, "good enough" becomes very tempting. Pre-defined criteria remove that temptation. The criteria say what passes and what does not. Your mood is irrelevant.
```mermaid
flowchart TD
    A["Content arrives at gate"] --> B["Score against rubric"]
    B -- "Score 40 or above" --> C["PASS
    Advance to next stage"]
    B -- "Score 30-39" --> D["REWORK
    Return with corrections"]
    B -- "Score below 30" --> E["FAIL
    Regenerate from scratch"]
    D --> F["Previous stage applies fixes"]
    F --> A
    E --> G["Start pipeline from Stage 3"]
    G --> A
    style A fill:#222221,stroke:#c8a882,color:#ede9e3
    style B fill:#222221,stroke:#8a8478,color:#ede9e3
    style C fill:#222221,stroke:#6b8f71,color:#ede9e3
    style D fill:#222221,stroke:#c47a5a,color:#ede9e3
    style E fill:#222221,stroke:#c47a5a,color:#ede9e3
    style F fill:#222221,stroke:#8a8478,color:#ede9e3
    style G fill:#222221,stroke:#8a8478,color:#ede9e3
```
The Scoring Rubric
A rubric converts a subjective "this feels right" into an objective score. Five dimensions, each scored 0 to 10; the total, out of 50, determines the outcome.
| Dimension | Score 10 | Score 5 | Score 0 |
|---|---|---|---|
| Factual accuracy | Every claim verified, all sources cited, no hallucinations | Most claims accurate, 1-2 unverified assertions | Multiple hallucinated facts, no source alignment |
| Voice consistency | Indistinguishable from hand-written content | Mostly on-voice, occasional AI patterns visible | Generic AI voice throughout, no personality |
| Structural clarity | Perfect outline compliance, clear argument flow | Mostly follows outline, one section out of place | Ignores outline, no discernible argument |
| Originality of insight | Contains unique perspective, practitioner knowledge, or original data | Generic but competent treatment | Could have been written about any topic by any AI |
| AI artifact absence | Zero detectable artifacts in 1000 words | 3-5 minor artifacts (hedging, filler) | Reads like unedited AI output |
Gate Placement
Not every stage needs a full-rubric quality gate. Some stages need lightweight checks. The key is matching the gate intensity to the risk at that stage.
| Stage | Gate Type | What Gets Checked | Who Checks |
|---|---|---|---|
| 1. Research | Completeness check | Does the brief answer all research questions? Are sources rated? | Human (quick scan) |
| 2. Outline | Logic check | Do all three foundational questions have answers? Does argument flow? | Human |
| 3. Draft | Structural check | Does it follow the outline? Within word count? Voice approximation? | Automated + human |
| 4. Review | Full rubric | All 5 dimensions scored | Human |
| 5. Edit | Issue resolution check | All review annotations addressed? No new issues introduced? | Human |
| 6. Format | Technical check | All formats generated? Metadata correct? Visual spot-check? | Automated |
| 7. Publish | Pre-publish checklist | Links, images, metadata, canonical URLs, analytics | Automated + human |
Tracking Gate Performance
Every gate produces data. Track it.
- Pass rate per gate: What percentage of content passes each gate on the first attempt? A gate with a 30% first-pass rate tells you the upstream stage is broken.
- Rework rate: How often does content get sent back? High rework rates mean your specifications or inputs are insufficient.
- Fail rate: How often does content get rejected entirely? High fail rates mean your drafting prompts need fundamental revision.
- Average score per dimension: Which dimension consistently scores lowest? That is where to invest improvement effort.
This data turns your pipeline from a process into a learning system. Each production run generates information about where the pipeline is strong and where it is weak. Use that information. Adjust inputs. Refine prompts. Tighten specifications. The pipeline improves over time, but only if you measure it.
A quality gate without defined criteria is just an opinion. A quality gate with defined criteria, consistent scoring, and tracked metrics is a system. Systems improve. Opinions drift.
Further Reading
- 8 Steps To Create a Successful Content Production Process, SEOBoost
- Content Workflow Guide for 2026, Planable
- Building a Scalable Content Production Process, Heinz Marketing
Assignment
Create a quality rubric for your pipeline:
- Define 5 scoring dimensions relevant to your content type.
- For each dimension, describe what a 10, a 5, and a 0 look like.
- Set thresholds: what total score means pass, rework, or fail?
- Test the rubric by scoring 3 pieces of content (one you wrote, one decent AI output, one obvious slop). Do the scores differentiate them correctly?
Format the rubric as a one-page printable reference. Include scoring thresholds and gate placement for your pipeline stages. This document becomes the operational standard for everything your pipeline produces.