Coordination Intelligence

failure patterns

18 claims from 5 organizations

Documented things that go wrong and how to prevent them. Failure pattern claims are among the most valuable in any OOS because they encode lessons learned the hard way. Other organizations can learn from these without experiencing the failures themselves.

Acme Digital Agency Founding gold
C011 HIGH OBSERVED ONCE 5x efficiency

Gave analytics agent write access to campaigns. It optimized for wrong metrics.

Why: Lacked client context.

Failure mode: Decreased spend on strategic brand campaign.

C012 HIGH MEASURED RESULT 10x efficiency

Single shared state file became bottleneck and corruption source.

Why: Concurrent writes caused data races.

Failure mode: Two agents update simultaneously. One update lost.

McFadyen Digital Founding silver
C016 HIGH OBSERVED ONCE 5x efficiency

AI agents must never autonomously communicate with clients, sellers, or external stakeholders. All external communication flows through a named human.

Why: A test deployment of the Marketplace Analyst accidentally sent a seller health alert directly to a marketplace operator's Slack channel (misconfigured webhook). The client saw raw internal scoring data including "churn risk: high" for three of their top sellers. The account team spent two weeks in damage control.

Failure mode: Client trust erosion, internal data exposure, potential contract violation. The incident led to our blanket rule: AI agents have zero external communication authority.
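A zero-external-communication rule is easiest to keep when agents structurally lack a send path. A minimal sketch of that idea, not McFadyen's actual implementation (class names, the recipient address, and the `jane.doe` reviewer are all illustrative): every outbound message lands in a review queue, and only a named human can release it.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class OutboundMessage:
    recipient: str
    body: str
    approved_by: Optional[str] = None  # stays None until a human reviews it

class HumanReviewQueue:
    """All external communication lands here; nothing sends automatically."""
    def __init__(self) -> None:
        self.pending: List[OutboundMessage] = []
        self.sent: List[OutboundMessage] = []

    def submit(self, msg: OutboundMessage) -> None:
        # Agents may only submit; they have no direct send path.
        self.pending.append(msg)

    def approve_and_send(self, msg: OutboundMessage, reviewer: str) -> None:
        # Only a named human releases a message to the outside world.
        msg.approved_by = reviewer
        self.pending.remove(msg)
        self.sent.append(msg)

queue = HumanReviewQueue()
alert = OutboundMessage(recipient="seller-ops@example.com",
                        body="Seller health alert: review required.")
queue.submit(alert)                        # the agent's only capability
queue.approve_and_send(alert, "jane.doe")  # human in the loop
```

The misconfigured-webhook incident becomes impossible by construction: there is no code path from an agent to an external channel that does not pass through `approve_and_send`.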

C017 HIGH OBSERVED REPEATEDLY 7x efficiency

Platform-specific AI models must be retrained or re-validated within 30 days of any major platform release (Adobe Commerce, commercetools, VTEX, Mirakl).

Why: Our Code Review Assistant was trained on Adobe Commerce 2.4.5 patterns. When 2.4.6 shipped with breaking changes to the checkout API, the assistant continued approving code written against the old patterns. Three PRs shipped to staging with deprecated method calls before a senior dev flagged it.

Failure mode: Stale AI models approve code against deprecated platform APIs, introducing technical debt and potential runtime failures in client environments.
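The 30-day rule can be enforced mechanically instead of by memory. A hedged sketch, assuming release and validation dates are tracked per model (the dates in the test are illustrative):

```python
from datetime import date, timedelta

REVALIDATION_WINDOW = timedelta(days=30)  # per the 30-day rule above

def model_is_stale(platform_release: date, last_validated: date,
                   today: date) -> bool:
    """Stale means: a major platform release postdates the model's last
    validation AND the 30-day re-validation window has already closed."""
    if last_validated >= platform_release:
        return False  # already re-validated against this release
    return today - platform_release > REVALIDATION_WINDOW
```

A scheduled job could run this daily per platform (Adobe Commerce, commercetools, VTEX, Mirakl) and open a ticket the moment a model goes stale, rather than waiting for a senior dev to spot deprecated calls in a PR.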

C018 MEDIUM HUMAN DEFINED RULE 3x efficiency

When two AI agents produce conflicting recommendations (e.g., Sales Agent scores a lead as high-priority while the Proposal Engine flags scope concerns), the conflict must surface to a human decision-maker within 4 hours. Neither agent may override the other.

Why: Early in our rollout, the Sales Agent pushed a lead through as "high-fit" while the Proposal Engine flagged the engagement as requiring capabilities we had never delivered (custom blockchain-based marketplace settlement). The agents operated in parallel without conflict detection. The SA discovered the mismatch only after spending a day on a proposal we should have declined.

Failure mode: Wasted senior consultant time, potential over-commitment to engagements outside our capability, and reputational risk if we win work we cannot deliver.
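The no-override rule lends itself to a simple merge step between the two agents. A sketch under that assumption (the verdict strings and record shape are illustrative, not the actual agent outputs): any disagreement produces an escalation record with a hard 4-hour deadline instead of letting either verdict win.

```python
from datetime import datetime, timedelta

ESCALATION_SLA = timedelta(hours=4)  # per the 4-hour rule above

def resolve(sales_verdict: str, proposal_verdict: str,
            detected_at: datetime) -> dict:
    """Pass an agreed verdict through; on conflict, emit an escalation
    record for a human. Neither agent's verdict overrides the other."""
    if sales_verdict == proposal_verdict:
        return {"status": "agreed", "verdict": sales_verdict}
    return {"status": "escalate",
            "deadline": detected_at + ESCALATION_SLA,
            "verdicts": {"sales": sales_verdict,
                         "proposal": proposal_verdict}}
```

The key design choice is that conflict detection runs where both outputs meet, so agents operating in parallel can no longer silently diverge.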

C014 MEDIUM OBSERVED ONCE 3x efficiency

All three agents activated from day one. Only Protocol Steward had meaningful work. Others generated noise.

Why: Agents without data produce low-value output.

Failure mode: Founder reads noise. Loses trust. Stops reading agent outputs.

C015 LOW SPECULATION 0.5x efficiency

Daily agent review consumed build time. Weekly batching loses nothing.

Why: Daily reviews felt productive but were not.

Failure mode: 20-35% of OTP time spent on review instead of building.

C016 LOW MEASURED RESULT 3x efficiency

Designed 14-agent architecture before shipping code. Only 3 needed now. Planning addiction.

Why: Designing agents is enjoyable. Building platform is hard.

Failure mode: 170 vault files. Zero production code.

Sneeze It Founding gold
C017 HIGH OBSERVED ONCE 5x efficiency

We gave the performance analyst write access to campaign settings. It optimized for metrics the client did not care about.

Why: The analyst lacked client context. Its optimization targets were technically correct but strategically wrong.

Failure mode: Analyst decreased ad spend on a campaign the client considered strategic (brand building, not performance). Client was frustrated by the unrequested change.
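One structural fix for this class of failure is to make the analyst recommend-only. A minimal sketch under that assumption (field names and the approval flow are illustrative, not the agency's actual tooling): the analyst can read settings and emit proposals, but only a named human can apply one.

```python
class Campaign:
    def __init__(self, name: str, daily_budget: float):
        self.name = name
        self.daily_budget = daily_budget

def propose_budget_change(campaign: Campaign, new_budget: float,
                          rationale: str) -> dict:
    """The analyst's only output is a proposal record; it has no code
    path that mutates campaign settings."""
    return {"campaign": campaign.name,
            "current": campaign.daily_budget,
            "proposed": new_budget,
            "rationale": rationale}

def apply_proposal(campaign: Campaign, proposal: dict,
                   approved_by: str) -> None:
    # Applying a change requires a named human approver.
    if not approved_by:
        raise PermissionError("proposal requires a human approver")
    campaign.daily_budget = proposal["proposed"]
```

The human reviewing the rationale is the point where client context (strategic vs. performance campaign) re-enters the loop.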

C018 HIGH MEASURED RESULT 10x efficiency

We used a single shared state file for all agents. It became a bottleneck and a source of merge conflicts within the first week.

Why: A single file means every agent update blocks every other agent. Concurrent writes caused data corruption.

Failure mode: Two agents wrote to the shared file simultaneously. One update was lost. State became inconsistent. Required manual cleanup.
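One common remedy, sketched here as an assumption rather than what was ultimately shipped: give each agent its own state file and make every update an atomic write-then-rename, so concurrent writers can neither block nor clobber each other.

```python
import json
import os
import tempfile

def write_agent_state(state_dir: str, agent_name: str, state: dict) -> None:
    """Each agent owns its own file, so writers never collide; writing to
    a temp file then renaming makes each update atomic on POSIX systems."""
    os.makedirs(state_dir, exist_ok=True)
    final_path = os.path.join(state_dir, agent_name + ".json")
    fd, tmp_path = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, final_path)  # readers see old or new, never half

def read_all_states(state_dir: str) -> dict:
    # Readers merge the per-agent files instead of sharing one mutable file.
    states = {}
    for name in os.listdir(state_dir):
        if name.endswith(".json"):
            with open(os.path.join(state_dir, name)) as f:
                states[name[:-len(".json")]] = json.load(f)
    return states
```

With one writer per file, the lost-update race described above cannot occur, and a crashed write leaves the previous state intact rather than a corrupted file.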

C019 HIGH MEASURED RESULT 10x efficiency

We built coordination infrastructure (message bus, task queue) without embedding triggers in agent workflows. Result: zero transactions for 2 weeks despite live infrastructure. The bus only activated after we explicitly wired 3 agent workflows to read from and write to their inboxes.

Why: The protocol described how agents should communicate. No agent's workflow actually included a step to read or write to the message bus. Infrastructure without workflow integration is dead plumbing.

Failure mode: 13 inbox files empty for 14 days. Zero transactions. Triggers must be in execution path, not just spec.
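"Triggers in the execution path" can be as literal as making inbox reads and writes explicit steps in the agent loop. A hedged sketch assuming a file-per-message inbox layout (the directory names and message shapes are illustrative):

```python
import json
from pathlib import Path

def send(outbox: Path, msg: dict) -> None:
    # Writing a message is an explicit workflow step, not just a spec.
    outbox.mkdir(parents=True, exist_ok=True)
    seq = len(list(outbox.glob("*.json")))
    (outbox / f"{seq:06d}.json").write_text(json.dumps(msg))

def drain_inbox(inbox: Path) -> list:
    # Step 1 of every agent cycle: read (and clear) the inbox before working.
    paths = sorted(inbox.glob("*.json")) if inbox.exists() else []
    messages = [json.loads(p.read_text()) for p in paths]
    for p in paths:
        p.unlink()
    return messages

def run_cycle(inbox: Path, peer_inbox: Path, do_work) -> None:
    """The bus only carries traffic because these two calls sit inside
    the agent's execution path."""
    result = do_work(drain_inbox(inbox))
    if result is not None:
        send(peer_inbox, result)

# Demo: the analyst receives a request and replies into the steward's inbox.
import tempfile
base = Path(tempfile.mkdtemp())
send(base / "analyst", {"type": "report_request"})
run_cycle(base / "analyst", base / "steward",
          do_work=lambda msgs: {"type": "report", "in_reply_to": len(msgs)})
```

Remove the `drain_inbox`/`send` calls from `run_cycle` and you reproduce the failure exactly: the inbox directories exist, the protocol is documented, and nothing ever moves.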

C020 HIGH MEASURED RESULT 10x efficiency

We specified an escalation action in the protocol. The agent detected the trigger. The agent reported the action was overdue. The agent never executed the action. For 17 days.

Why: Spec treated as documentation, not executable logic. Agent could describe what should happen without tools or branching logic to do it.

Failure mode: Critical ad overspend detected. Escalation specified. Agent reports "escalation overdue" for 17 days. No DM sent. No escalation executed. Specification-execution gap.
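Closing the specification-execution gap means putting the action in the same code path as the detection. A minimal sketch of that principle (`send_dm` is an injected callable standing in for whatever messaging tool the agent actually has):

```python
def check_overspend(spend: float, budget: float, send_dm) -> str:
    """Detection and action share one code path: when the trigger fires,
    the escalation executes immediately instead of being reported as
    'overdue'."""
    if spend > budget:
        send_dm(f"OVERSPEND: {spend:.2f} against budget {budget:.2f}"
                " - acknowledge today")
        return "escalated"
    return "ok"
```

An agent that can only describe the escalation fails here at import time, not 17 days later: if no `send_dm` capability exists, the check cannot even run.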

C021 MEDIUM MEASURED RESULT 6x efficiency

Negative constraints (banned phrases, guardrails) improve AI-drafted message quality. Structural requirements (frameworks, examples, forced elements) degrade it.

Why: Telling an AI what NOT to do produces natural variation. Telling it exactly what TO do produces formulaic output that humans detect and distrust.

Failure mode: Added example messages to coaching prompts. Quality score dropped from 8.4 to 8.2. Reverted. Added zero-tolerance accountability rules instead. Score rose to 8.8.
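A negative constraint of this kind reduces to a simple post-draft check: name what must not appear and leave everything else to the model. A sketch under that assumption (the banned phrases below are hypothetical stand-ins, not the actual guardrail list):

```python
# Hypothetical banned phrases; a real list would come from coaching feedback.
BANNED_PHRASES = ["i hope this finds you well", "as an ai", "synergies"]

def guardrail_violations(draft: str) -> list:
    """Flag banned phrases in an AI-drafted message. Unlike a template or
    forced structure, this constrains nothing about what the draft SHOULD
    say, preserving natural variation."""
    lowered = draft.lower()
    return [p for p in BANNED_PHRASES if p in lowered]
```

A draft with violations goes back for a redraft; a clean draft passes through untouched, which is what keeps the output from becoming formulaic.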

Sneeze It Digital Agency Founding platinum
C017 HIGH OBSERVED ONCE 5x efficiency

Gave analyst write access to campaigns. Optimized for wrong metrics.

Why: Lacked client context. Technically correct, strategically wrong.

Failure mode: Decreased spend on client's strategic brand campaign. Client frustrated.

C018 HIGH MEASURED RESULT 10x efficiency

Single shared state file caused 12 corruption events in month one.

Why: Concurrent writes from multiple agents created data races.

Failure mode: Two agents wrote simultaneously. One update lost. Manual cleanup required.

C019 HIGH MEASURED RESULT 10x efficiency

Built message bus without embedding triggers in workflows. Zero transactions.

Why: Protocol described communication. No workflow included steps to use it.

Failure mode: 13 inbox files deployed. All empty. Dead infrastructure.

C020 HIGH MEASURED RESULT 10x efficiency

Agent reported escalation overdue for 17 days without executing it.

Why: Spec treated as documentation, not executable logic.

Failure mode: Critical overspend detected. Escalation specified. Never executed. 17 days.

C021 MEDIUM MEASURED RESULT 6x efficiency

Negative constraints improve AI message quality. Structural requirements degrade it.

Why: Telling AI what NOT to do produces natural variation.

Failure mode: Added example messages: quality dropped 8.4 to 8.2. Added ban rules: quality rose to 8.8.