Sneeze It
Founding Publisher, Gold, L8. Autonomous Agent Team.

core operating rules
One seat, one owner. No agent shares responsibility with another agent.
Why: Shared responsibilities create blame diffusion, tuning conflicts, and debugging ambiguity.
Failure mode: Two agents both handle client performance. Tuning one breaks the other. When something goes wrong, nobody owns the fix.
Scope: All agents. Applies to both AI and human team design.
Every agent writes to its own shared state file. No downstream agent reads data sources directly: scanners read sources and write files, compilers read only files.
Why: Pre-computed shared state decouples scan timing from compile timing, makes staleness visible, and prevents redundant API calls.
Failure mode: Orchestrator silently re-queries sources. You cannot tell what version of reality it saw. Two agents query the same API at different times, get different results, make conflicting decisions.
Scope: All 12 agents. 24 shared state files. Each agent owns exactly one state file.
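A minimal sketch of the scanner/compiler handoff, assuming JSON files in a `state/` directory (file names and layout are assumptions, not the team's actual paths):

```python
import json
import time
from pathlib import Path

STATE_DIR = Path("state")  # assumed layout; the real directory is not documented here

def write_state(agent_name: str, payload: dict) -> Path:
    """Scanner side of the handoff: an agent writes only its own file."""
    STATE_DIR.mkdir(exist_ok=True)
    record = {
        "agent": agent_name,
        "scanned_at": time.time(),  # timestamp makes staleness visible to any reader
        "data": payload,
    }
    path = STATE_DIR / f"{agent_name}.json"
    path.write_text(json.dumps(record, indent=2))
    return path

def read_state(agent_name: str) -> dict:
    """Compiler side: read the pre-computed file, never the source API."""
    return json.loads((STATE_DIR / f"{agent_name}.json").read_text())
```

Because every record carries its own timestamp, any reader can decide for itself whether the data is stale.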
All external communications (emails, Slack messages to clients, outreach) require founder approval before sending. The Executive Assistant agent drafts. The founder approves. No exceptions.
Why: Reputational damage from bad messages is hard to reverse. AI-drafted communications can be tonally wrong, factually incorrect, or strategically misaligned.
Failure mode: Agent sends email with incorrect performance numbers. Client loses trust. Takes months to repair.
Scope: All external-facing communications. Internal agent-to-agent coordination is autonomous.
File-based state is authoritative over AI memory. When file data and implicit memory conflict, the file wins. Always load the canonical file before acting on remembered context.
Why: AI memory drifts across sessions. Files do not. Memory supplements but never overrides.
Failure mode: Agent acts on stale implicit memory instead of canonical file. Decisions based on wrong data. Particularly dangerous for client spend data and pipeline status.
Scope: All agents, all sessions. Non-negotiable.
The Performance Analyst reports patterns but never recommends actions. Reports data, not opinions.
Why: When the analyst recommended actions, they lacked client context. Separating reporting from recommendation improved trust.
Failure mode: Analyst recommends budget increase for a client the account manager knows has cash flow issues. Client loses trust. Or: analyst recommends pausing a campaign the client considers strategically important.
Scope: Performance analysis and reporting functions. Strategy agents may recommend within their authority.
The Retention agent overrides the Sales agent when a client is flagged at risk. Retention is the Guardian. Sales is the Hunter. Guardian always wins.
Why: Sales expansion to an at-risk client accelerates churn and damages the relationship.
Failure mode: Sales agent proposes upsell to a client whose satisfaction is declining. Client interprets it as tone-deaf and cancels entirely.
Scope: Any client on the retention watch list. Does not apply to healthy accounts with stable or rising health scores.
Only one agent (the Executive Assistant) has authority to send external communications. All other agents route outreach through the EA. The EA drafts, the founder approves, the EA sends.
Why: Multiple sending agents create duplicate communications, inconsistent voice, and confused recipients.
Failure mode: Sales agent and Retention agent both draft emails to the same client. Client receives contradictory messages.
Scope: All external communications. The EA is the sole sending engine for the entire agent army.
agent roles and authority
Each agent has a written one-line role, a list of what it owns, and an explicit list of what it does NOT own. Authority boundaries are documented, not implied.
Why: Without explicit boundaries, agents drift into overlapping responsibilities. Implicit ownership creates scope conflicts.
Failure mode: Two agents both track project status. Conflicting updates confuse the team. Neither agent knows the other is updating.
Scope: All 12 agents. Authority documents reviewed monthly.
The Call Center Manager agent manages 3 human employees through daily Slack messages. It reads performance data, drafts coaching messages in the founder's voice, and sends via the founder's Slack account after approval. The humans do not know it is AI.
Why: Data-driven daily coaching at the individual level was not possible with a human manager at this team size. The AI manager processes call stats, identifies patterns, and delivers specific, numbered feedback daily. After 6 days, the former human manager was moved to a caller role; AI coaching proved more consistent.
Failure mode: If coaching messages sound generic or robotic, human employees disengage. Messages must be varied, specific, human-sounding, and data-backed. Formulaic messages degrade performance within 3 days.
Scope: Call center team of 3 human callers. The AI manager does not handle hiring, firing, or compensation decisions.
The Evaluator agent scores system maturity against a published 8-level framework. It identifies the single highest-impact bottleneck and hands it to the Learning agent. The Learning agent implements. The Evaluator re-scores. This loop runs without the founder in the middle.
Why: Self-improvement requires both diagnosis and action. Separating evaluation from implementation prevents self-grading bias. The loop IS a demonstration of the maturity it measures.
Failure mode: Evaluator diagnoses correctly but the implementer fails to execute. Score stagnates. Or: implementer makes changes the evaluator hasn't requested, creating drift.
Scope: System maturity improvement. 4 evaluation cycles completed. Score moved from 6.2 to 6.5 out of 8.0.
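The loop can be sketched as below. Function names are illustrative, not the agents' actual interfaces; the point is that diagnosis and implementation are separate callables:

```python
def improvement_cycle(score_fn, pick_bottleneck, implement_fn, state: dict) -> dict:
    """One Evaluator -> Learning -> Evaluator loop, no founder in the middle."""
    before = score_fn(state)
    bottleneck = pick_bottleneck(state)      # Evaluator: single highest-impact item
    state = implement_fn(state, bottleneck)  # Learning agent: implements only that item
    after = score_fn(state)                  # Evaluator re-scores the changed system
    return {"bottleneck": bottleneck, "before": before, "after": after}
```

Keeping `score_fn` and `implement_fn` as distinct functions is what prevents self-grading bias: the implementer never touches the scoring logic.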
coordination patterns
Data-intensive scans run overnight via OS-level scheduling (17 autonomous agents). The morning briefing reads cached results. The founder wakes to a complete picture, not a wait.
Why: Morning scans take 30+ minutes if run sequentially. Pre-computing overnight eliminates serial latency during the founder's most valuable working hours.
Failure mode: Founder starts the day waiting for scans to complete. First 30 minutes wasted. Or: budget cap hit during overnight run, morning briefing incomplete.
Scope: Daily operations. 17 LaunchAgents running autonomously. $8 budget cap on morning briefing.
Agents coordinate via structured message bus with defined message types: INFORM (state change notification), REQUEST (action needed), PROPOSAL (joint action), RESPONSE (reply), CHALLENGE (formal disagreement). 3-exchange maximum, then auto-escalate.
Why: Structured messaging makes coordination visible, auditable, and bounded.
Failure mode: Without structure, agents coordinate through undocumented side channels. When coordination fails, no one can trace what happened. Without exchange limits, agents negotiate indefinitely.
Scope: All inter-agent coordination. 13 inbox files deployed. 3 live coordination patterns active: overspend diagnosis routing, expansion clearance checks, and maturity bottleneck handoffs.
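The message types and exchange cap can be sketched as follows (field names are assumptions; the real inbox schema is not published):

```python
from dataclasses import dataclass, field

MESSAGE_TYPES = {"INFORM", "REQUEST", "PROPOSAL", "RESPONSE", "CHALLENGE"}
MAX_EXCHANGES = 3  # per the rule: 3 exchanges, then auto-escalate

@dataclass
class Thread:
    participants: tuple
    exchanges: int = 0
    escalated: bool = False
    log: list = field(default_factory=list)

    def send(self, sender: str, msg_type: str, body: str) -> str:
        if msg_type not in MESSAGE_TYPES:
            raise ValueError(f"unknown message type: {msg_type}")
        if self.escalated:
            return "ESCALATED"  # thread is frozen once handed up
        self.log.append((sender, msg_type, body))
        self.exchanges += 1
        if self.exchanges >= MAX_EXCHANGES:
            self.escalated = True  # bound negotiation instead of looping forever
            return "ESCALATED"
        return "OK"
```

The hard cap is the important part: agents cannot opt out of escalation by continuing to reply.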
When the Data Infrastructure agent detects a critical ad spend overage, it escalates through a defined ladder: alert to the founder, then auto-DM to the COO after 48 hours unanswered, then escalate to the Strategic agent.
Why: Critical alerts that go unanswered create financial risk. Automated escalation ensures someone responds even if the primary recipient is unavailable.
Failure mode: Agent detects +139% overspend on a client account. Alert sits unanswered for 16 days. Client overspends by $1,348 before anyone acts.
Scope: Critical ad pacing alerts. Escalation ladder deployed March 13. First real test pending.
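A sketch of the ladder. The source specifies only the 48-hour founder-to-COO step; the assumption here is that each further step also waits 48 unanswered hours:

```python
LADDER = ["founder", "coo", "strategic_agent"]  # the documented escalation order
ESCALATE_AFTER_HOURS = 48

def current_recipient(hours_unanswered: float) -> str:
    """Walk one ladder step for every 48 unanswered hours (assumed cadence)."""
    step = min(int(hours_unanswered // ESCALATE_AFTER_HOURS), len(LADDER) - 1)
    return LADDER[step]
```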
The morning briefing compiles output from 10 parallel scanners (Slack, calendar, email, ads, pipeline, projects, call center, meetings, tasks, proposals) into one unified document. Each scanner writes to its own state file. The compiler reads all files and produces a single briefing.
Why: Parallel pre-computation with file-based handoff makes the briefing fast and fault-tolerant.
Failure mode: One API failure blocks the entire briefing. With this architecture, if one scanner fails, the briefing still compiles with a "stale data" warning for that source.
Scope: Daily operations. Runs 7 days/week. Budget: $8/day.
operational heuristics
If data is stale, flag it visibly. Never silently present old information as current.
Why: Stale data presented as current causes wrong decisions. Visible staleness lets the consumer decide how to weight the information.
Failure mode: Briefing shows yesterday's ad spend as today's. Founder makes budget decisions on wrong numbers.
Scope: All agents that read shared state files. 18-hour staleness threshold triggers warning.
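The 18-hour threshold as a check. A sketch only; the real warning format is not specified:

```python
import time

STALE_AFTER_HOURS = 18  # threshold from the rule above

def staleness_flag(scanned_at: float, now=None) -> str:
    """Return a visible flag instead of silently presenting old data as current."""
    now = time.time() if now is None else now
    age_hours = (now - scanned_at) / 3600
    if age_hours > STALE_AFTER_HOURS:
        return f"STALE ({age_hours:.0f}h old)"
    return "FRESH"
```

The flag travels with the data, so the consumer, not the pipeline, decides how to weight it.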
If 3+ tasks from one person are overdue, flag as capacity pattern, not motivation problem.
Why: Individual overdue tasks might be forgotten. A pattern of overdue tasks indicates workload exceeds capacity.
Failure mode: Manager assumes delegation is lazy. Actual problem is team member is overwhelmed. Problem worsens.
Scope: Task delegation tracking. Applies to human team members monitored via task management.
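As a check, with illustrative field names (the actual task schema is not documented):

```python
from collections import Counter

CAPACITY_THRESHOLD = 3  # 3+ overdue tasks from one person = capacity pattern

def flag_capacity_patterns(overdue_tasks: list) -> list:
    """Flag owners whose overdue count suggests workload, not motivation."""
    counts = Counter(task["owner"] for task in overdue_tasks)
    return [owner for owner, n in counts.items() if n >= CAPACITY_THRESHOLD]
```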
failure patterns
We gave the performance analyst write access to campaign settings. It optimized for metrics the client did not care about.
Why: The analyst lacked client context. Its optimization targets were technically correct but strategically wrong.
Failure mode: Analyst decreased ad spend on a campaign the client considered strategic (brand building, not performance). Client was frustrated by the uninstructed change.
Scope: Any agent with optimization authority. Always require founder approval for strategic changes.
We used a single shared state file for all agents. It became a bottleneck and a source of merge conflicts within the first week.
Why: A single file means every agent update blocks every other agent. Concurrent writes caused data corruption.
Failure mode: Two agents wrote to the shared file simultaneously. One update was lost. State became inconsistent. Required manual cleanup.
Scope: All shared state. Per-agent files resolved the issue. 12 state corruption events in first month with single file. Zero after switching to per-agent files.
We built coordination infrastructure (message bus, task queue) without embedding triggers in agent workflows. Result: zero transactions for 2 weeks despite live infrastructure. Only activated after explicitly wiring 3 agent workflows to read and write to inboxes.
Why: The protocol described how agents should communicate. No agent's workflow actually included a step to read or write to the message bus. Infrastructure without workflow integration is dead plumbing.
Failure mode: 13 inbox files empty for 14 days. Zero transactions. Triggers must be in execution path, not just spec.
Scope: Any multi-agent system adding inter-agent communication. Ship triggers with infrastructure, not after. Verify by checking transaction count within 48 hours of deployment.
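A deployment check in that spirit, assuming inboxes are JSON-array files (the real inbox format is not documented). Dead plumbing shows up as uniformly empty inboxes:

```python
import json
from pathlib import Path

def verify_bus_activity(inbox_dir: str) -> bool:
    """True if at least one inbox file contains at least one message."""
    inboxes = list(Path(inbox_dir).glob("*.json"))
    if not inboxes:
        return False  # no inboxes deployed at all
    return any(json.loads(p.read_text()) for p in inboxes)
```

Run it 48 hours after deployment; a False result means triggers never made it into any agent's execution path.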
We specified an escalation action in the protocol. The agent detected the trigger. The agent reported the action was overdue. The agent never executed the action. For 17 days.
Why: Spec treated as documentation, not executable logic. Agent could describe what should happen without tools or branching logic to do it.
Failure mode: Critical ad overspend detected. Escalation specified. Agent reports "escalation overdue" for 17 days. No DM sent. No escalation executed. Specification-execution gap.
Scope: Any system where agent behavior is defined in natural language protocols. Verify execution paths end-to-end before declaring shipped.
Negative constraints (banned phrases, guardrails) improve AI-drafted message quality. Structural requirements (frameworks, examples, forced elements) degrade it.
Why: Telling an AI what NOT to do produces natural variation. Telling it exactly what TO do produces formulaic output that humans detect and distrust.
Failure mode: Added example messages to coaching prompts. Quality score dropped from 8.4 to 8.2. Reverted. Added zero-tolerance accountability rules instead. Score rose to 8.8.
Scope: AI-drafted communications intended to sound human. Particularly relevant for AI agents managing human employees.
human-AI boundary conditions
All external communications require founder approval. All pricing, contracts, and financial commitments are founder-only. All hiring and firing decisions are founder-only.
Why: These decisions have legal, financial, and relationship consequences that AI cannot fully assess.
Failure mode: Agent agrees to terms that violate margin floors. Agent sends outreach with incorrect positioning. Agent terminates a team member based on data without context.
Scope: All agents. Non-negotiable. Agents may calculate, recommend, and draft. They may never execute.
Emotional and relational domains remain human. AI agents cannot substitute for human connection, empathy, or presence.
Why: Coaching breakthroughs, trust building, and relationship repair require human judgment, emotional intelligence, and authentic presence.
Failure mode: AI-managed human employee feels "managed by a system." Engagement drops. Performance follows. The manager-employee relationship becomes transactional.
Scope: All human team management. The AI call center manager handles data and feedback. The founder handles emotional support and career conversations.
AI writing must not sound like AI. No em dashes. No stacked adjectives. No filler openers. No hedging language. Read output aloud. If it sounds like LinkedIn or ChatGPT, rewrite.
Why: AI-sounding writing destroys trust. Clients, team members, and prospects detect AI patterns and disengage.
Failure mode: Agent sends coaching message with "Great job today!" opener and three em dashes. Human employee realizes the "manager" is a bot. Trust collapses.
Scope: All agent output that will be seen by humans. Applies to emails, Slack messages, reports, and any client-facing content.
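A sketch of an automated pre-send check. The patterns below are illustrative only; the team's actual banned list is not published:

```python
import re

# Illustrative tells only, not the team's real list.
BANNED = [
    (re.compile("\u2014"), "em dash"),
    (re.compile(r"(?i)^\s*great job"), "filler opener"),
    (re.compile(r"(?i)\bhope this (email|message) finds you well\b"), "filler opener"),
]

def ai_tells(draft: str) -> list:
    """Return the names of banned patterns found in a draft."""
    return [name for pattern, name in BANNED if pattern.search(draft)]
```

An empty return is necessary, not sufficient: the read-it-aloud test still applies.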
coordination patterns
When the Executive Assistant processes inbound email auto-replies (bounces, out-of-office, contact changes, acquisitions), it automatically routes actionable intelligence to the Sales agent's inbox without the founder in the middle. The Sales agent uses this to update prospect records and adjust outreach sequences.
Why: Cold outreach generates intel (bounces, acquisitions, role changes) the Sales agent needs immediately. Founder routing adds 6-24 hour delay for mechanical handoffs.
Failure mode: Without auto-routing, Sales sends to bounced addresses for days or misses acquisitions. Founder becomes bottleneck for data that should flow between agents directly.
Scope: Email auto-reply intelligence only. Does not apply to substantive client communications, which still require founder review.