# The Agent-as-Colleague Model
Session 9.5 · ~5 min read
## Neither Workers Nor Oracles
Two failure modes plague multi-agent workflows. The first is abdication: trusting agents to make decisions they should not make, rubber-stamping their output, and publishing whatever comes out. The second is micromanagement: reviewing every intermediate output, rewriting agent results by hand, and defeating the purpose of automation entirely.
The correct mental model sits between these extremes. Think of each agent as a colleague with specific expertise. You respect their competence within their domain. You do not ask the research assistant to make editorial decisions. You do not ask the copy editor to choose topics. And you review their work at defined checkpoints, not constantly.
## The Colleague Framework
```mermaid
flowchart TD
    A["You
    (Editor-in-Chief)"] --> B["Research Assistant
    Agent 1"]
    A --> C["Ghostwriter
    Agent 2"]
    A --> D["Copy Editor
    Agent 3"]
    B -- "Delivers research brief" --> A
    C -- "Delivers draft" --> A
    D -- "Delivers review" --> A
    A -- "Approves/rejects at each gate" --> E["Published Content"]
    style A fill:#222221,stroke:#c8a882,color:#ede9e3
    style B fill:#222221,stroke:#6b8f71,color:#ede9e3
    style C fill:#222221,stroke:#8a8478,color:#ede9e3
    style D fill:#222221,stroke:#c47a5a,color:#ede9e3
    style E fill:#222221,stroke:#c8a882,color:#ede9e3
```
You are the editor-in-chief. You do not do everything, but everything goes through you. You set the direction (topic, audience, angle). You review the deliverables. You make the final call. The agents execute within the boundaries you define.
## Agent Job Descriptions
A job description for each agent clarifies its role, its boundaries, and its handoff responsibilities. This prevents scope creep, where agents start doing things outside their role and producing unpredictable results.
| Agent | Role | Expertise | Limitations | Decisions Allowed |
|---|---|---|---|---|
| Research Assistant | Information gathering | Search, filtering, source evaluation | Cannot judge relevance to audience; cannot assess strategic fit | Which sources to include; how to structure the brief |
| Ghostwriter | Prose generation | Voice matching, narrative structure, word economy | Cannot decide what to write about; cannot verify facts | Sentence-level phrasing; paragraph-level structure within outline |
| Copy Editor | Quality assessment | Pattern detection, rubric scoring, artifact identification | Cannot make editorial judgments about content direction | What to flag; severity scoring |
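Job descriptions like these can be written down as data in the orchestration script itself, so the boundaries are enforceable rather than implied. A minimal sketch (the class name and fields are illustrative, not a prescribed API):

```python
from dataclasses import dataclass

@dataclass
class AgentJobDescription:
    """One agent's role, boundaries, and allowed decisions (illustrative)."""
    name: str
    role: str
    expertise: list
    limitations: list
    autonomous_decisions: list

research_assistant = AgentJobDescription(
    name="Research Assistant",
    role="Information gathering",
    expertise=["search", "filtering", "source evaluation"],
    limitations=["cannot judge relevance to audience",
                 "cannot assess strategic fit"],
    autonomous_decisions=["which sources to include",
                          "how to structure the brief"],
)
```

Encoding the table this way means a later orchestration step can check, before dispatching a task, whether it falls inside the agent's declared decisions.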
## Delegation Levels
Not all tasks within an agent's domain deserve the same level of trust. Some tasks the agent handles autonomously. Others require your approval before the chain continues.
| Level | Description | Example |
|---|---|---|
| Autonomous | Agent executes without review | Research Agent formats output as JSON; Writer uses paragraph transitions |
| Review on exception | Agent executes; you review only flagged items | Editor flags issues; you review only items scored below 5 |
| Review always | Agent executes; you review every output | Writer produces draft; you read every word before it moves forward |
| Human only | Agent is not involved | Topic selection, publication approval, ethical review |
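The four levels reduce to a small routing decision at each gate. A sketch, assuming a boolean "flagged" signal from the reviewing agent (the enum names and the flagging mechanism are assumptions):

```python
from enum import Enum

class Level(Enum):
    AUTONOMOUS = "autonomous"
    REVIEW_ON_EXCEPTION = "review_on_exception"
    REVIEW_ALWAYS = "review_always"
    HUMAN_ONLY = "human_only"

def needs_human_review(level: Level, flagged: bool = False) -> bool:
    """Decide whether a deliverable stops at the human gate."""
    if level is Level.AUTONOMOUS:
        return False
    if level is Level.REVIEW_ON_EXCEPTION:
        return flagged  # e.g. the Editor scored this item below 5
    return True         # review_always and human_only both stop here

# An item the Editor flagged waits for you; autonomous work flows through.
assert needs_human_review(Level.REVIEW_ON_EXCEPTION, flagged=True)
assert not needs_human_review(Level.AUTONOMOUS)
```

Shifting a task from "review always" to "review on exception" is then a one-line change to its assigned level, made only after the performance data supports it.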
As your agents prove reliable, you can shift tasks from "review always" to "review on exception." This is earned trust, not blind trust. It comes from tracking agent performance over many runs and seeing consistent quality.
## The Feedback Loop
When an agent underperforms, the fix is not to discard the agent. The fix is to improve its instructions. If the Writing Agent consistently produces voice breaks in opening paragraphs, the fix is a more specific opening-paragraph instruction in its system prompt, not a return to manual writing.
Track agent performance per dimension:
- Research Agent: source quality rate, completeness of brief, schema compliance rate
- Writing Agent: voice consistency score (from Editor), outline compliance rate, artifact count
- Editing Agent: accuracy of flags (how often do you agree with the Editor's assessment?), false positive rate, false negative rate
These metrics tell you where to invest system prompt improvements. A Writing Agent with a 6/10 voice score needs voice fingerprint refinement. An Editing Agent with a 40% false positive rate needs calibration.
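Tracking can be as simple as a running tally per agent and dimension; nothing more elaborate is needed to spot a sustained 6/10 voice score. A sketch with made-up numbers:

```python
from collections import defaultdict

class AgentScorecard:
    """Accumulates per-dimension scores across runs for one agent."""
    def __init__(self):
        self.scores = defaultdict(list)

    def record(self, dimension, value):
        self.scores[dimension].append(value)

    def average(self, dimension):
        values = self.scores[dimension]
        return sum(values) / len(values) if values else None

writer = AgentScorecard()
for run_score in [6, 7, 5, 6]:  # hypothetical voice scores from the Editor
    writer.record("voice_consistency", run_score)

# A sustained average near 6/10 signals the voice fingerprint needs work.
print(writer.average("voice_consistency"))  # 6.0
```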
The goal is not to remove yourself from the pipeline. The goal is to position yourself where human judgment adds the most value: at decision points and quality gates. Everything else can be delegated to agents whose performance you track and whose instructions you refine.
## Further Reading
- AI Agent Workflows: Everything You Need to Know, GoodData
- How Agentic AI Revolutionizes Content Workflows, Global Publicist
- AI Agent Content Writing System, Sight AI
## Assignment
Write a "job description" for each agent in your chain. Include:
- Role (one sentence)
- Expertise (what it is good at)
- Limitations (what it cannot do)
- Decisions it can make autonomously
- Decisions that require your approval
Then assign a delegation level (autonomous, review on exception, review always, human only) to each task in your pipeline. Be honest about where you trust the agents and where you do not. This framework evolves as you collect performance data.