# The Agent-as-Colleague Model
Session 9.5 · ~5 min read
## Neither Workers Nor Oracles
Two failure modes plague multi-agent workflows. The first is abdication: trusting agents to make decisions they should not make, rubber-stamping their output, and publishing whatever comes out. The second is micromanagement: reviewing every intermediate output, rewriting agent results by hand, and defeating the purpose of automation entirely.
The correct mental model sits between these extremes. Think of each agent as a colleague with specific expertise. You respect their competence within their domain. You do not ask the research assistant to make editorial decisions. You do not ask the copy editor to choose topics. And you review their work at defined checkpoints, not constantly.
## The Colleague Framework
```mermaid
flowchart TD
    A["You
    (Editor-in-Chief)"] --> B["Research Assistant
    Agent 1"]
    A --> C["Ghostwriter
    Agent 2"]
    A --> D["Copy Editor
    Agent 3"]
    B -- "Delivers research brief" --> A
    C -- "Delivers draft" --> A
    D -- "Delivers review" --> A
    A -- "Approves/rejects at each gate" --> E["Published Content"]
    style A fill:#222221,stroke:#c8a882,color:#ede9e3
    style B fill:#222221,stroke:#6b8f71,color:#ede9e3
    style C fill:#222221,stroke:#8a8478,color:#ede9e3
    style D fill:#222221,stroke:#c47a5a,color:#ede9e3
    style E fill:#222221,stroke:#c8a882,color:#ede9e3
```
You are the editor-in-chief. You do not do everything, but everything goes through you. You set the direction (topic, audience, angle). You review the deliverables. You make the final call. The agents execute within the boundaries you define.
## Agent Job Descriptions
A job description for each agent clarifies its role, its boundaries, and its handoff responsibilities. This prevents scope creep, where agents start doing things outside their role and producing unpredictable results.
| Agent | Role | Expertise | Limitations | Decisions Allowed |
|---|---|---|---|---|
| Research Assistant | Information gathering | Search, filtering, source evaluation | Cannot judge relevance to audience; cannot assess strategic fit | Which sources to include; how to structure the brief |
| Ghostwriter | Prose generation | Voice matching, narrative structure, word economy | Cannot decide what to write about; cannot verify facts | Sentence-level phrasing; paragraph-level structure within outline |
| Copy Editor | Quality assessment | Pattern detection, rubric scoring, artifact identification | Cannot make editorial judgments about content direction | What to flag; severity scoring |
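Job descriptions like these can be written down as data in the orchestration script itself, so the boundaries are enforceable rather than implied. A minimal sketch (the class name and fields are illustrative, not a prescribed API):

```python
from dataclasses import dataclass

@dataclass
class AgentJobDescription:
    """One agent's role, boundaries, and allowed decisions (illustrative)."""
    name: str
    role: str
    expertise: list
    limitations: list
    autonomous_decisions: list

research_assistant = AgentJobDescription(
    name="Research Assistant",
    role="Information gathering",
    expertise=["search", "filtering", "source evaluation"],
    limitations=["cannot judge relevance to audience",
                 "cannot assess strategic fit"],
    autonomous_decisions=["which sources to include",
                          "how to structure the brief"],
)
```

Encoding the table this way means a later orchestration step can check, before dispatching a task, whether it falls inside the agent's declared decisions.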
## Delegation Levels
Not all tasks within an agent's domain deserve the same level of trust. Some tasks the agent handles autonomously. Others require your approval before the chain continues.
| Level | Description | Example |
|---|---|---|
| Autonomous | Agent executes without review | Research Agent formats output as JSON; Writer uses paragraph transitions |
| Review on exception | Agent executes; you review only flagged items | Editor flags issues; you review only items scored below 5 |
| Review always | Agent executes; you review every output | Writer produces draft; you read every word before it moves forward |
| Human only | Agent is not involved | Topic selection, publication approval, ethical review |
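The four levels reduce to a small routing decision at each gate. A sketch, assuming a boolean "flagged" signal from the reviewing agent (the enum names and the flagging mechanism are assumptions):

```python
from enum import Enum

class Level(Enum):
    AUTONOMOUS = "autonomous"
    REVIEW_ON_EXCEPTION = "review_on_exception"
    REVIEW_ALWAYS = "review_always"
    HUMAN_ONLY = "human_only"

def needs_human_review(level: Level, flagged: bool = False) -> bool:
    """Decide whether a deliverable stops at the human gate."""
    if level is Level.AUTONOMOUS:
        return False
    if level is Level.REVIEW_ON_EXCEPTION:
        return flagged  # e.g. the Editor scored this item below 5
    return True         # review_always and human_only both stop here

# An item the Editor flagged waits for you; autonomous work flows through.
assert needs_human_review(Level.REVIEW_ON_EXCEPTION, flagged=True)
assert not needs_human_review(Level.AUTONOMOUS)
```

Shifting a task from "review always" to "review on exception" is then a one-line change to its assigned level, made only after the performance data supports it.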
As your agents prove reliable, you can shift tasks from "review always" to "review on exception." This is earned trust, not blind trust. It comes from tracking agent performance over many runs and seeing consistent quality.
## The Feedback Loop
When an agent underperforms, the fix is not to discard the agent. The fix is to improve its instructions. If the Writing Agent consistently produces voice breaks in opening paragraphs, the fix is a more specific opening-paragraph instruction in its system prompt, not a return to manual writing.
Track agent performance per dimension:
- Research Agent: source quality rate, completeness of brief, schema compliance rate
- Writing Agent: voice consistency score (from Editor), outline compliance rate, artifact count
- Editing Agent: accuracy of flags (how often do you agree with the Editor's assessment?), false positive rate, false negative rate
These metrics tell you where to invest system prompt improvements. A Writing Agent with a 6/10 voice score needs voice fingerprint refinement. An Editing Agent with a 40% false positive rate needs calibration.
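Tracking can be as simple as a running tally per agent and dimension; nothing more elaborate is needed to spot a sustained 6/10 voice score. A sketch with made-up numbers:

```python
from collections import defaultdict

class AgentScorecard:
    """Accumulates per-dimension scores across runs for one agent."""
    def __init__(self):
        self.scores = defaultdict(list)

    def record(self, dimension, value):
        self.scores[dimension].append(value)

    def average(self, dimension):
        values = self.scores[dimension]
        return sum(values) / len(values) if values else None

writer = AgentScorecard()
for run_score in [6, 7, 5, 6]:  # hypothetical voice scores from the Editor
    writer.record("voice_consistency", run_score)

# A sustained average near 6/10 signals the voice fingerprint needs work.
print(writer.average("voice_consistency"))  # 6.0
```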
The goal is not to remove yourself from the pipeline. The goal is to position yourself where human judgment adds the most value: at decision points and quality gates. Everything else can be delegated to agents whose performance you track and whose instructions you refine.
## Further Reading
- AI Agent Workflows: Everything You Need to Know, GoodData
- How Agentic AI Revolutionizes Content Workflows, Global Publicist
- AI Agent Content Writing System, Sight AI
## Assignment
Write a "job description" for each agent in your chain. Include:
- Role (one sentence)
- Expertise (what it is good at)
- Limitations (what it cannot do)
- Decisions it can make autonomously
- Decisions that require your approval
Then assign a delegation level (autonomous, review on exception, review always, human only) to each task in your pipeline. Be honest about where you trust the agents and where you do not. This framework evolves as you collect performance data.