Colosseum·Intelligence
Research · 2026-05-03 · 14 min

On the eighteen agents — why we did not write one big model

One big model could do this work. Eighteen smaller agents do it visibly, correctably, and within a budget.

The simplest possible architecture for a content system is one model. The model reads a brief, drafts a post, decides whether to publish it, and publishes it. We tried this. It is wrong. This post makes the case for separating that work into eighteen agents, explains why each one has a name and a job, and shows what the audit log looks like when that separation is enforced.

Why one big model is tempting

It is one component to deploy, one set of weights to update, one prompt to design. The latency is low. The infrastructure is small. The team is small.

It is also opaque. When the model publishes a post that violates a platform’s policy, the team has no name to put on the failure — no agent that can be debugged in isolation, no rule that can be audited and tightened. The blast radius of every decision is the same as the blast radius of the entire system.

We watched this happen for six months. The model was good. The audit story was bad. We rebuilt.

The eighteen agents

Each of the eighteen has one job. We gave each a name so the team and the operator dashboard can refer to them. We constrained each agent’s scope so that a failure in one cannot cascade into a failure in another without a row in the audit log marking the cascade.

The full list is at /intelligence. The shape:

  • Listening agents read the social graph: Hypothesis, Strategy, Niche Resonance.
  • Decision agents score and stop: Feasibility, Decision, Pattern Abstraction, Legal Compliance.
  • Production agents make the artefact: Script Gen, Asset Gen, Video Gen.
  • Action agents publish and read back: Publishing, Feedback, Community.
  • Operations agents keep the system honest: Channel Setup, Revenue Model, Autonomous Ops.
  • Editorial agents maintain the standards: Pillar, Foundation.

Each is a small component. None has the authority to publish without the others. The audit log records which agent made which call, with which inputs.
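The "no single agent can publish" rule can be sketched as a pipeline in which every agent gets a veto and every call leaves a log row. This is an illustrative sketch, not the actual codebase: the class names, field names, and the `(verdict, reason)` calling convention are our assumptions; only the agent names come from the list above.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AuditRow:
    """One row per agent call: who decided, on what inputs, and why."""
    agent: str
    inputs: dict
    verdict: bool
    reason: str

@dataclass
class Pipeline:
    """No agent has the authority to publish alone: every agent in order
    must approve, and every call appends an audit row."""
    # List of (name, fn) pairs; fn(inputs) -> (verdict, reason).
    agents: list
    log: list = field(default_factory=list)

    def run(self, inputs: dict) -> bool:
        for name, fn in self.agents:
            verdict, reason = fn(inputs)
            # The row is written whether the agent approves or rejects,
            # so a cascade between agents is always visible in the log.
            self.log.append(AuditRow(name, dict(inputs), verdict, reason))
            if not verdict:
                return False  # a single rejection stops publication
        return True
```

A single rejection halts the run, and the log shows exactly which agents were consulted before the stop:

```python
approve = lambda inputs: (True, "ok")
reject = lambda inputs: (False, "below threshold")
p = Pipeline(agents=[("Feasibility", approve),
                     ("Decision", reject),
                     ("Publishing", approve)])
p.run({"proposal": 87})          # False: Decision vetoed
[r.agent for r in p.log]         # ["Feasibility", "Decision"]
```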

The cost of separation

Every separation has a cost. Eighteen agents talk to each other; the orchestration layer is the most carefully designed surface in the codebase. The latency is higher than a single-model architecture — but the latency target is “fast enough to publish on a platform schedule,” not “fast enough to feel real-time.”

The operational cost is also higher. Eighteen agents run on language-model inference; the bills are visible in the operator dashboard’s monthly cost line. We have decided this is the right cost to pay.

The benefits we have measured

After a year of running both architectures side by side on partial workloads:

  1. Failures are localised. When something publishes that should not have, the audit log identifies which agent failed. We tighten that agent. The fix takes hours; in the one-big-model architecture it took weeks of fine-tuning.

  2. The team understands the system. A new engineer can read the description of one agent and contribute to it within a day. In the one-big-model architecture, contributing required reading the whole prompt and a six-month context document.

  3. The audit log is genuinely informative. A row that says “Decision rejected proposal #87 because the niche threshold was 0.91 and the proposal scored 0.73” is a row a human can act on. A row that says “the model declined” is not.

What we did not separate

A separation that adds no clarity is just overhead. We did not split:

  • The brand-voice layer from the editorial layer (Foundation owns both — they are the same standard).
  • The script writer from the asset prompter (Script Gen drafts both; Asset Gen renders).
  • The operator’s intent from the platform’s policy (operator intent is an input to every agent; platform policy is its own agent because policies change without notice).

The line we drew is: separate when separation buys you a name, an audit-log column, and a debuggable surface. Otherwise, do not.

The trade-off, stated honestly

The one-big-model architecture is faster to ship. Ours is faster to fix. We pay the upfront cost to ship the eighteen agents because we expect to spend years fixing failures, and a system that takes weeks to fix is a system that has already lost.

This is the case. The audit log is the proof.


The full agent list is at /intelligence. Comments via [email protected].