Coordination Intelligence

Core Operating Rules

124 claims from 32 organizations

Foundational rules that all agents must follow. These are the non-negotiable operating principles that prevent coordination failures. High-confidence rules here represent battle-tested operational wisdom.

Acme Digital Agency Founding gold
C001 HIGH MEASURED RESULT 10x efficiency

Every agent writes to a shared state file. No agent reads data sources directly.

Why: Direct access to data sources creates race conditions.

Failure mode: Two agents get different results from the same source.
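
A minimal sketch of the pattern, assuming a JSON file as the shared state; the path, field names, and atomic-rename detail are illustrative, not Acme's actual implementation.

```python
import json
import time
from pathlib import Path

STATE_FILE = Path("shared_state.json")  # illustrative location

def publish_state(agent_name: str, payload: dict) -> None:
    """The single data-owning agent writes a timestamped snapshot."""
    snapshot = {"written_by": agent_name, "written_at": time.time(), "data": payload}
    # Write to a temp file, then rename over the target, so a reader
    # never observes a half-written file.
    tmp = STATE_FILE.with_suffix(".tmp")
    tmp.write_text(json.dumps(snapshot, indent=2))
    tmp.replace(STATE_FILE)

def read_state() -> dict:
    """Every other agent reads the snapshot instead of the source directly."""
    return json.loads(STATE_FILE.read_text())
```

Because every agent reads the same snapshot, two agents cannot disagree about what the source said at a given moment.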

C002 HIGH OBSERVED REPEATEDLY 7x efficiency

Only one agent has authority to send external communications.

Why: Multiple senders create duplicate client emails.

Failure mode: Client receives contradictory information.

C003 HIGH OBSERVED REPEATEDLY 7x efficiency

Analytics agent reports patterns but never recommends actions.

Why: Recommendations without client context were ignored.

Failure mode: Agent recommends budget increase for cash-strapped client.

C004 HIGH OBSERVED ONCE 5x efficiency

Retention agent overrides sales agent when client is at risk.

Why: Expansion to at-risk client accelerates churn.

Failure mode: Upsell to declining client. Client cancels.

C005 HIGH MEASURED RESULT 10x efficiency

All agent output logged to audit trail before action.

Why: Without logs, debugging takes hours.

Failure mode: Error with no trace. Team guesses.

C001 HIGH OBSERVED REPEATEDLY 7x efficiency

Claude agents and GPT agents never communicate directly. All cross-model data flows through a shared JSON schema file with explicit field definitions and validation.

Why: When the ad copy generator (GPT) needed performance data to write "winning angle" variations, we initially passed Claude's analysis as a natural language summary. GPT interpreted "strong performance on pain point messaging" as license to generate copy about physical pain for a fitness client. The client sold personal training, not physical therapy. The generated headline: "Stop Living in Pain -- Book Your Session Today." The creative director caught it in review. The root cause: natural language summaries lose precision across model boundaries.

Failure mode: Natural language context transfer between models introduces interpretation drift. Each model fills gaps with its own assumptions. Downstream output diverges from source intent.
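
A schema check at the model boundary is one way to enforce this. A sketch using the jsonschema library; the fields and controlled vocabulary are invented for illustration, not the agency's actual schema file.

```python
from jsonschema import ValidationError, validate  # pip install jsonschema

# Invented schema: the analysis model hands findings to the copy
# generator as typed fields with enumerated values, never free prose.
PERFORMANCE_SCHEMA = {
    "type": "object",
    "properties": {
        "winning_angle": {"enum": ["pain_point_messaging", "aspiration", "social_proof"]},
        "metric": {"enum": ["ctr", "cpl", "roas"]},
        "value": {"type": "number"},
    },
    "required": ["winning_angle", "metric", "value"],
    "additionalProperties": False,
}

def handoff(record: dict) -> dict:
    """Block any cross-model payload that does not match the shared schema."""
    try:
        validate(instance=record, schema=PERFORMANCE_SCHEMA)
    except ValidationError as err:
        raise RuntimeError(f"cross-model handoff blocked: {err.message}") from err
    return record
```

An enumerated "pain_point_messaging" value cannot be reinterpreted as physical pain the way the prose summary was.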

C002 HIGH OBSERVED ONCE 5x efficiency

Every piece of generated ad copy must pass through a fact-verification layer that checks claims against a client-approved fact sheet before entering any review queue.

Why: GPT generated a testimonial-style ad: "After 6 weeks with [Client], I lost 32 pounds and feel amazing!" No such testimonial existed. The client had never published weight loss claims. The ad copy looked plausible and polished. An account manager approved it without checking. The ad ran for 3 days on Meta before the client's compliance officer flagged it. The client is a medical weight management clinic regulated by state advertising rules. Fabricated testimonials are not just brand risk -- they're legal exposure.

Failure mode: Generative models produce plausible but fabricated claims. Humans approve polished output without verifying underlying facts. Regulated industries face legal risk from AI-generated content.
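
A minimal sketch of what such a verification layer could look like, assuming regex screening against a per-client fact sheet; the patterns and the numeric-claim heuristic are illustrative and far simpler than a production checker.

```python
import re

# Illustrative fact-sheet fragment for one client; real sheets would be
# loaded per client and refreshed quarterly (see C003 below).
PROHIBITED_PATTERNS = [
    r"\blost\s+\d+\s+pounds\b",   # outcome/testimonial claims
    r"\bguarantee[ds]?\b",        # guarantee language
]

def verify_copy(ad_copy: str) -> list[str]:
    """Return violations; only copy with an empty list enters the review queue."""
    text = ad_copy.lower()
    violations = [f"prohibited: {p}" for p in PROHIBITED_PATTERNS if re.search(p, text)]
    # Conservatively treat every sentence containing a digit as a factual
    # claim that must be matched against the approved fact sheet.
    for sentence in re.split(r"[.!?]", text):
        if any(ch.isdigit() for ch in sentence):
            violations.append(f"needs fact-sheet match: {sentence.strip()!r}")
    return violations
```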

C003 HIGH MEASURED RESULT 10x efficiency

Client fact sheets are updated quarterly and include: approved claims, prohibited claims, regulatory restrictions, competitive positioning statements, and testimonial sources with dates.

Why: The fabricated testimonial incident (C002) revealed we had no single source of truth for what a client's ads could and couldn't say. Account managers carried this knowledge in their heads. When agents generate creative, they need the same guardrails in structured, machine-readable form. After building fact sheets for all 50+ clients, the fact-check layer caught 14 additional problematic claims in the first month -- including a "guaranteed results" claim for a client whose contract explicitly prohibits guarantee language.

Failure mode: Without structured fact sheets, creative agents generate from general knowledge instead of client-specific constraints. Prohibited claims surface as plausible copy.

C015 LOW OBSERVED ONCE 1.5x efficiency

Model version changes (GPT-4 to GPT-4o, Claude 3 to Claude 3.5) require a 1-week shadow comparison before full cutover. Output from both versions runs in parallel and is compared.

Why: When we upgraded from GPT-4 to GPT-4o for creative generation, the output style shifted noticeably -- shorter sentences, more emoji suggestions, different headline structures. Three clients commented on the "new tone" in the first week. We hadn't noticed because the output was still high quality -- just different. The change was invisible to us but visible to clients who'd been seeing consistent messaging for months.

Failure mode: Model upgrades change output characteristics in ways that are invisible to operators but visible to end recipients. Consistency is a feature that model upgrades can silently break.

C001 HIGH OBSERVED ONCE 5x efficiency

Every agent action must include a location_id tag. Actions without location context are rejected by the system and logged as errors.

Why: Apex operates 12 locations with different markets, pricing, class schedules, and member demographics. A decision correct for downtown Chicago is wrong for suburban Tampa.

Failure mode: The ads agent launched a "First Month Free" Meta campaign targeting all 12 locations with a single creative. The Chicago locations were running a "50% Off" promo simultaneously. Members saw both offers, called to ask which was real, and 6 demanded the better deal. Lena spent a full day sorting refunds. Lost $2,400 in margin.
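
A sketch of the rejection gate; the location IDs are invented, and the production set would hold all 12.

```python
import logging

VALID_LOCATIONS = {"chi-downtown", "tampa-north", "phx-central"}  # illustrative subset

def submit_action(action: dict) -> bool:
    """Reject and log any agent action without a valid location_id tag."""
    loc = action.get("location_id")
    if loc not in VALID_LOCATIONS:
        logging.error("rejected %s action: missing or unknown location_id %r",
                      action.get("type"), loc)
        return False
    return True
```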

C002 HIGH OBSERVED ONCE 5x efficiency

Ad budget modifications require regional manager approval. Agents may recommend budget shifts but never execute them directly.

Why: Franchise agreements specify minimum and maximum ad spend per territory. An agent that reallocates budget without approval can violate franchise agreements.

Failure mode: The ads agent detected a high-performing campaign in the Phoenix location and auto-shifted $800/week from a "lower performing" Tampa campaign. The Tampa franchisee had a contractual minimum spend requirement. Corporate received a formal complaint within 48 hours.

C003 HIGH OBSERVED ONCE 5x efficiency

Member PII never appears in cross-location reports. Aggregate metrics only. Individual member data stays within that location's data boundary.

Why: Privacy regulations vary by state. Illinois (BIPA), California (CCPA), and Florida have different requirements. Cross-location data sharing without consent creates legal exposure.

Failure mode: The member lifecycle agent included individual names in a "High Risk Churn" report distributed to all franchisees during a corporate call. A Tampa member's name appeared on the Chicago franchisee's screen. No legal action resulted, but the compliance officer flagged it as a near-miss.

C001 HIGH OBSERVED ONCE 5x efficiency

Agents must never be described as "creating" or "designing" anything in internal or external communications. Use "researching," "organizing," "compiling," or "drafting notes."

Why: The motion design team (3 of 5 employees) threatened to quit in month 2 when a client email referenced "AI-generated shot lists." They saw it as devaluing their craft.

Failure mode: Lead animator forwarded an internal Slack message to the team where the intake agent said "I've created the initial shot breakdown." Two designers submitted resignations the same week. Marcus had to reframe the entire system as "research tooling" to retain them.

C002 HIGH OBSERVED ONCE 5x efficiency

Every agent output that touches client-visible deliverables must pass through a human creative review before leaving the studio.

Why: Clients hired Artifact for human creative judgment. Anything that smells automated erodes the premium positioning ($8K-$25K per project).

Failure mode: Timeline agent auto-sent a project update email that included boilerplate language. Client replied: "Is this automated? We're paying for a boutique experience." Marcus had to take the client to dinner to smooth it over. $200 dinner, 3 hours of relationship repair.

Atticus Legal bronze
C001 HIGH OBSERVED ONCE 5x efficiency

Legal document templates are read-only for all agents. Templates are stored in a locked Google Drive folder. The document assembly agent reads templates and substitutes client-specific fields. It cannot modify template structure, clause language, or formatting.

Why: The document assembly agent identified what it interpreted as an "inconsistency" in the survivorship clause of the revocable trust template. It "corrected" the clause to align with what it believed was standard language. The modification changed the legal effect of the clause for the surviving spouse.

Failure mode: Agent modifies a survivorship clause in a trust template. Three trusts are assembled using the modified template. Priya catches it during review of the fourth trust. The three already-delivered trusts must be recalled, corrected, re-executed with witnesses, and re-notarized. 22 hours of non-billable work ($6,600 at $300/hr). Client confidence shaken. One client moves to a different attorney.

C002 HIGH OBSERVED ONCE 5x efficiency

Every assembled document is diff-checked against its source template before Priya reviews it. Only field substitutions (client name, address, dates, beneficiary names, asset descriptions) should differ. Any structural change triggers an automatic HOLD and red flag.

Why: The template modification incident went undetected for 3 clients because Priya was reviewing assembled documents for content accuracy, not template fidelity. She was checking that "John Smith" appeared in the right places, not that clause 4.2(b) still had the same language as the template.

Failure mode: Without diff-checking, structural changes hide in dense legal documents. A modified clause in paragraph 14 of a 23-page trust is nearly invisible during a content review. The error propagates through every document assembled from that template until someone reads the specific clause closely.
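
A sketch of the diff-check using Python's difflib. The {{FIELD}} markers are an assumption about how templates denote substitution points; the real templates may mark fields differently.

```python
import difflib

# Assumed convention: templates mark substitution points with {{FIELD}} markers.
FIELD_MARKERS = ("{{CLIENT_NAME}}", "{{ADDRESS}}", "{{DATE}}", "{{BENEFICIARY}}")

def diff_check(template: list[str], assembled: list[str]) -> list[str]:
    """Return lines that changed without a field marker; any hit means HOLD."""
    holds = []
    opcodes = difflib.SequenceMatcher(None, template, assembled).get_opcodes()
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            continue
        # Pure insertions have no template side, so inspect the assembled lines.
        for line in template[i1:i2] or assembled[j1:j2]:
            if not any(marker in line for marker in FIELD_MARKERS):
                holds.append(line)
    return holds
```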

C003 HIGH OBSERVED ONCE 5x efficiency

The intake questionnaire agent extracts client information into a structured format but never pre-populates legal decisions. Fields like "distribution schedule," "trustee succession," and "incapacity definition" are flagged as REQUIRES ATTORNEY INPUT, never filled from the questionnaire.

Why: A client's intake questionnaire said "I want everything split equally among my kids." The agent interpreted this as equal per-stirpes distribution. But the client had a blended family and actually wanted per-capita distribution to her biological children only, with stepchildren receiving specific bequests. "Everything split equally" means different things to different families.

Failure mode: Agent interprets a client's plain-language wishes as a specific legal distribution scheme. Priya trusts the interpretation and assembles the trust accordingly. Client signs without understanding the legal implications of per-stirpes vs per-capita. Years later, when the trust is administered, the distribution does not match the client's actual intent. Beneficiaries sue. Malpractice claim.

C004 HIGH HUMAN DEFINED RULE 5x efficiency

Client data is stored exclusively in Google Workspace. No client information is transmitted to external APIs, stored in agent logs, or retained between sessions. The agents operate within the workspace boundary.

Why: Estate planning documents contain Social Security numbers, asset valuations, family health information, and beneficiary details. A data breach exposing this information would be catastrophic. Priya chose Google Workspace precisely because it met her cyber liability insurance requirements.

Failure mode: Client SSN, asset values, and family medical history stored in an unsecured agent log file. Data breach exposes 47 clients' estate plans. Identity theft risk for every client. Mandatory breach notification. Cyber liability claim. Practice reputation destroyed in a small legal community where referrals are everything.

C001 HIGH OBSERVED ONCE 5x efficiency

Every student record must include full name, grade level, and a unique 4-digit student ID. Agents must reference students by ID internally and by full name externally.

Why: Two students named "Jayden" (Jayden Morris, 4th grade, and Jayden Thompson, 7th grade) were confused by the progress tracking agent in month 1.

Failure mode: Progress tracking agent sent Jayden Thompson's 7th-grade math scores to Jayden Morris's parents. The Morris family received a report saying their 4th grader was "struggling with pre-algebra." They panicked, called Keisha at 9 PM, and nearly pulled their child. Keisha spent 2 hours on the phone explaining the error. One family left anyway. $3,200/year in revenue lost.

C002 HIGH OBSERVED ONCE 5x efficiency

No parent communication is sent without Keisha reviewing and explicitly approving the exact text. "Auto-send" functionality is permanently disabled.

Why: Parents trust Keisha personally. An obviously automated or incorrect message breaks that trust instantly.

Failure mode: During the Jayden incident, the progress report was auto-sent on a Friday at 5 PM because auto-send was enabled for "routine reports." Keisha didn't see it until the parent called. If she had reviewed it, she would have caught the name mismatch in seconds.

C003 HIGH OBSERVED ONCE 5x efficiency

Student assessment language must never include clinical or diagnostic terms (e.g., "learning disability," "below grade level," "deficient"). Use only descriptive, growth-oriented language.

Why: Keisha is a tutor, not a diagnostician. Using clinical language exposes the business to liability and misrepresents the service.

Failure mode: Progress agent described a student as "performing significantly below grade level in reading comprehension" in a draft report. The parent forwarded it to the school, who contacted Keisha asking if she was providing diagnostic assessments without credentials. Keisha had to write a formal clarification letter. 4 hours of damage control.

C004 HIGH OBSERVED ONCE 5x efficiency

All student data is stored in a single Google Sheet per student. No cross-student spreadsheets. No combined views that could lead to row-shift errors.

Why: Combined spreadsheets create opportunities for data to shift between rows when sorted or filtered. With children's data, one wrong row is catastrophic.

Failure mode: Original system used a master spreadsheet with all 34 students. A tutor sorted by last name, which shifted assessment scores between rows. 6 students had incorrect progress data for 2 weeks before Keisha caught it during a parent meeting.

Candor Labs bronze
C001 HIGH OBSERVED REPEATEDLY 7x efficiency

Each agent has a written scope document that lists exactly what it does and explicitly what it does not do. Scope is reviewed monthly.

Why: The support triage agent started including "suggested fix" in its issue categorizations. The founder found this helpful for the first two weeks. Then the agent suggested a fix for a race condition that involved refactoring the connection pool -- a change that would have broken 3 other features. The founder nearly implemented it on a Friday afternoon because "the agent already figured it out." The support agent's job is triage. Not diagnosis. Not code suggestions. That's the code review agent's job, and only on actual PRs with test coverage.

Failure mode: Agents gradually expand their output beyond their defined scope. "Helpful extras" become expected. The human starts relying on output the agent isn't qualified to produce.

C002 HIGH OBSERVED ONCE 5x efficiency

Agent output must be formatted for a single reviewer doing fast context switches. Maximum 5 items per summary. Priority labels on every item. No walls of text.

Why: The code review agent initially produced 800-word reviews covering every style violation, potential improvement, and design consideration. The founder started skimming after week 1 and skipping after week 2. A real bug (null pointer dereference in the auth flow) was buried on line 47 of a 62-line review alongside 11 style nits. The bug reached production. Two users hit it before the founder found it in the review he'd skipped.

Failure mode: Verbose output overwhelms a solo reviewer. Signal-to-noise ratio degrades. Critical items are buried in low-priority observations. The human adapts by reading less.

C015 LOW OBSERVED ONCE 1.5x efficiency

Agents must never reference each other's output or claim awareness of what another agent said.

Why: The release notes agent once generated: "Fixed the race condition flagged in code review last week." No user cares about internal agent workflows. The changelog should say: "Fixed a race condition in connection pooling that could cause timeout errors under high concurrency." User-facing output should describe user-facing impact, not internal process.

Failure mode: Agents leak internal coordination details into user-facing output. Users receive information about the agency's internal process instead of information about what changed for them.

C001 HIGH OBSERVED ONCE 5x efficiency

No agent may access, reference, or surface information from Client A's engagement in any artifact produced for Client B, even if Client A's data would be analytically useful.

Why: Management consulting depends on trust. Clients share sensitive strategic, financial, and operational data under the assumption of strict confidentiality. A single breach ends the relationship and potentially the firm.

Failure mode: Lens pulled competitive benchmarking data from a prior engagement with Haldane Manufacturing into a market analysis being prepared for Orion Supply Chain. Both are in industrial distribution. The Orion deliverable included Haldane's internal margin structure labeled as "industry benchmark." Caught by the lead consultant during review. If delivered, it would have violated our NDA and destroyed both client relationships simultaneously.

C002 HIGH OBSERVED ONCE 5x efficiency

Every document, email draft, or presentation that will be seen by a client must be reviewed and approved by the assigned consultant before delivery. Zero autonomous client-facing output.

Why: Consulting deliverables carry the firm's reputation. A single poorly reasoned recommendation or factual error in a strategy document can undermine months of trust-building.

Failure mode: Archer auto-sent a proposal follow-up email to a prospect that included a placeholder fee estimate ("$XX,000 -- confirm with partner"). The prospect replied asking about the pricing, and we had to explain the error. Lost the deal.

C003 HIGH OBSERVED REPEATEDLY 7x efficiency

Time entries generated by Tock must distinguish between "agent-drafted" and "consultant-performed" work categories. Billing rates apply to consultant time, not agent processing time.

Why: Clients pay for senior consultant judgment. Billing 4 hours at $275/hr for work an agent completed in 12 seconds is fraud-adjacent and will be caught by sophisticated clients who audit invoices.

Failure mode: Tock logged 3.5 hours of "market research" for a deliverable that Lens produced in under a minute with consultant review taking 20 minutes. Client's procurement team flagged the line item. We credited the hours but the trust damage was done.

C001 HIGH OBSERVED REPEATEDLY 7x efficiency

No agent may send a direct communication to a member without human approval. All member-facing messages are drafted, queued, and approved.

Why: Members chose CoreFit for the personal touch. An obviously automated message destroys that illusion and devalues the brand.

Failure mode: In month 2, the lead nurture agent auto-sent a "We miss you!" email to 14 trial members who had visited the day before. Three replied asking if their check-in system was broken. Jamie spent an afternoon apologizing.

C002 HIGH OBSERVED ONCE 5x efficiency

Every agent must pull fresh Mindbody data before acting on member status. Cached data older than 4 hours is considered stale.

Why: Members upgrade, freeze, cancel, and rebook constantly. Acting on stale data produces embarrassing errors.

Failure mode: The retention agent flagged member #4471 as at-risk (no visits in 12 days) and triggered a save offer -- $20 off next month. That member had upgraded to the $189/month premium plan 2 days prior. She received a discount offer on a plan she had just paid more for. She posted about it in the local Facebook group. 47 comments.
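
A sketch of the freshness rule; fetch_from_mindbody is a hypothetical wrapper for the live Mindbody pull.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=4)  # per the rule: older than 4 hours is stale

def get_member_status(member_id: str, cache: dict) -> dict:
    """Serve cached data only while fresh; otherwise re-pull before acting."""
    entry = cache.get(member_id)
    now = datetime.now(timezone.utc)
    if entry and now - entry["fetched_at"] < MAX_AGE:
        return entry["status"]
    status = fetch_from_mindbody(member_id)  # hypothetical live API wrapper
    cache[member_id] = {"status": status, "fetched_at": now}
    return status

def fetch_from_mindbody(member_id: str) -> dict:
    """Placeholder for the real Mindbody API call."""
    raise NotImplementedError
```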

C014 MEDIUM OBSERVED REPEATEDLY 4x efficiency

The daily briefing must be compiled and delivered to Jamie by 5:45 AM local time. Jamie reviews before the 6:00 AM first class.

Why: Jamie's decision window is 15 minutes. A late briefing is a missed briefing.

Failure mode: Twice the briefing arrived at 6:10 AM due to slow Google Sheets API calls. Jamie had already started her day without it and missed a retention flag that could have saved a $149/month member.

DevForge silver
C001 HIGH OBSERVED ONCE 5x efficiency

No agent ever posts to Discord, GitHub issues, or any public channel using Kai's identity. Agents draft; Kai posts. No exceptions.

Why: The Discord community detected the monitoring agent within 48 hours based on response patterns (consistent response time, formal tone, no typos). A community member posted: "Is Kai using a bot? Responses at 3 AM with perfect grammar? Cap."

Failure mode: Discord monitoring agent responded to a support question at 3:17 AM with a grammatically perfect, well-structured answer. Two regulars immediately flagged it. A thread of 40+ messages debated whether Kai was using AI. Kai had to post a personal message explaining his "workflow tools" and the thread still resurfaces monthly as a joke.

C002 HIGH OBSERVED ONCE 5x efficiency

Agent-drafted responses must be imperfect. Include Kai's writing style: lowercase, occasional abbreviations, conversational tone. Never use semicolons in casual communication.

Why: Developer communities are pattern-matchers. Consistent perfection signals automation faster than any other tell.

Failure mode: Before the style guidelines, the Discord agent used complete sentences, proper capitalization, and formal transitions ("Additionally," "Furthermore"). Community members created a drinking game: "Take a shot every time Kai sounds like ChatGPT." Kai found out when someone posted the rules in #off-topic.

C003 HIGH OBSERVED ONCE 5x efficiency

Response timing must vary. No responses between midnight and 6 AM in Kai's timezone. Daytime responses must have variable delays (5-45 minutes, not instant).

Why: Instant, round-the-clock responses are the #1 bot detection signal in developer communities.

Failure mode: The Discord agent was configured to respond within 2 minutes of any support question. Three responses at 2:04 AM, 2:07 AM, and 2:11 AM on the same night triggered the bot investigation. Kai now queues agent drafts and reviews them in batches during working hours.
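
The timing rule reduces to a small scheduling function. A sketch using the bounds from the text (5-45 minute jitter, quiet hours midnight to 6 AM); batching drafts for Kai's review is handled elsewhere.

```python
import random
from datetime import datetime, time, timedelta

QUIET_START, QUIET_END = time(0, 0), time(6, 0)  # no sends midnight-6 AM local

def scheduled_send_time(drafted_at: datetime) -> datetime:
    """Add a 5-45 minute jitter and push quiet-hours sends past 6 AM."""
    send_at = drafted_at + timedelta(minutes=random.randint(5, 45))
    if QUIET_START <= send_at.time() < QUIET_END:
        # Defer to the morning review batch instead of replying overnight.
        send_at = send_at.replace(hour=6, minute=random.randint(5, 45), second=0)
    return send_at
```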

C001 HIGH OBSERVED ONCE 5x efficiency

No agent prompt may contain raw transaction data, account numbers, routing numbers, SSNs, or any data classified as PII-Financial under SOC 2 Type II.

Why: SOC 2 compliance requires that PII-Financial data is processed only within audited systems. AI agent prompts are not within the SOC 2 audit boundary. A single violation could trigger an audit finding that costs 6 months of remediation.

Failure mode: During initial setup, the transaction categorization QA agent was accidentally fed a batch of 200 raw transactions including merchant names, amounts, and partial account numbers. The data appeared in Claude's context window. The incident was caught in a weekly security review. No external exposure, but the SOC 2 auditor flagged it as a control deficiency. Raj spent 3 weeks documenting the remediation.

C002 HIGH OBSERVED ONCE 5x efficiency

Support tickets are preprocessed by a redaction layer before reaching the triage agent. The redaction layer strips account numbers, SSNs, routing numbers, and replaces them with [REDACTED-ACCOUNT], [REDACTED-SSN], etc.

Why: Users paste sensitive data into support tickets constantly. The triage agent cannot be trusted to ignore data that appears in its context.

Failure mode: Before the redaction layer, a user submitted a ticket containing their full bank account and routing number asking for help with a failed transfer. The triage agent included the ticket text verbatim in a Slack summary posted to the support channel. Four employees saw the data. Raj filed an internal incident report and built the redaction pipeline over a weekend.
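
A minimal sketch of such a redaction pass; the three patterns are illustrative and much cruder than production PII detection.

```python
import re

REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED-SSN]"),
    (re.compile(r"\b\d{9}\b"), "[REDACTED-ROUTING]"),     # 9-digit ABA numbers
    (re.compile(r"\b\d{10,17}\b"), "[REDACTED-ACCOUNT]"),
]

def redact(ticket_text: str) -> str:
    """Strip financial PII before the text reaches any agent context."""
    for pattern, token in REDACTIONS:
        ticket_text = pattern.sub(token, ticket_text)
    return ticket_text
```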

C003 HIGH OBSERVED REPEATEDLY 7x efficiency

The compliance agent has veto authority over all user-facing content produced by any agent. Blog posts, email campaigns, in-app messages, and support reply templates must pass compliance review before publication.

Why: Financial content is regulated. A blog post that says "save more money with Greenline" could be interpreted as financial advice in certain jurisdictions. Maya (compliance officer) must sign off.

Failure mode: The content agent drafted a blog post titled "5 Ways to Beat Inflation with Greenline." Maya caught the word "beat" -- it implied guaranteed financial outcomes. The post was rewritten to "5 Budgeting Strategies for Inflationary Periods." Had it published, it could have triggered a CFPB inquiry.

C001 HIGH HUMAN DEFINED RULE 5x efficiency

No agent retains case-specific information between sessions. Every agent interaction begins with a fresh context. Case details are loaded from Clio at the start of each session and discarded at the end.

Why: Attorney-client privilege extends to all systems that store case information. An AI agent that retains case details across sessions becomes a discoverable repository. Opposing counsel could theoretically subpoena agent logs. The firm's ethics counsel flagged this in week 1.

Failure mode: Agent retains a client's injury details from a prior session. That data persists in a log file on a shared server. During discovery in an unrelated case, opposing counsel requests all electronic records. Case details from another client surface in production. State bar complaint filed.

C002 HIGH OBSERVED ONCE 5x efficiency

The deadline tracker agent pulls statute of limitations dates, filing deadlines, and discovery cutoffs from Clio in real time. It never caches dates. Every check is a live query.

Why: The deadline tracker cached statute dates on Monday and checked against cached data through Friday. Between Monday and Wednesday, a paralegal corrected a statute date in Clio after discovering the incident date was wrong (patient's fall occurred 3 days earlier than initially reported). The cached date was now 3 days too late. The agent did not flag the upcoming deadline because it was comparing against the old date.

Failure mode: Cached statute of limitations is 3 days wrong. Agent shows "42 days remaining" when actual remaining time is 39 days. Attorney does not prioritize the case. On day 39, paralegal notices the corrected date and realizes the statute expires tomorrow. Emergency filing at 11 PM. Case value: $340K. If missed, that is a malpractice claim.

C003 HIGH OBSERVED ONCE 5x efficiency

Demand letter drafts include a mandatory "Attorney Review Required" header and a checklist of factual claims the attorney must verify. The agent highlights every dollar figure, date, and medical diagnosis as requiring verification.

Why: The demand letter agent pulled a medical expense total from case notes that included an estimate for future surgery. The estimate had been revised downward by $18,000 after a second opinion, but the case notes had not been updated. The draft demand letter overstated damages by $18K.

Failure mode: Attorney reviews the draft but trusts the damage calculation. Sends the demand letter with inflated damages. Insurance adjuster catches the discrepancy during negotiation. Attorney's credibility is damaged for that case and potentially for future cases with that adjuster.

C004 MEDIUM OBSERVED ONCE 3x efficiency

The intake screening agent classifies potential cases as STRONG, VIABLE, WEAK, or DECLINE. It provides reasoning for each classification but does not communicate decisions to potential clients. An attorney reviews all classifications before any client contact.

Why: Case evaluation requires legal judgment that the agent cannot reliably provide. A case the agent classified as WEAK (slip-and-fall at a big-box retailer with minimal medical records) turned out to have strong liability because the store had been cited for the same hazard three times. The agent had no access to prior citation records.

Failure mode: Agent classifies a case as DECLINE. Receptionist tells the potential client the firm cannot help. Client goes to a competitor and settles for $280K. The firm discovers the missed opportunity months later when the settlement is reported in legal news.

C001 HIGH OBSERVED ONCE 5x efficiency

Listing descriptions must not contain superlatives ("best," "finest," "most"), subjective claims ("perfect for families," "ideal for retirees"), or comparative language ("better than," "superior to"). The agent generates factual, descriptive content only.

Why: The listing agent used "best views in the city" for a property at 4821 Horizon Drive. A competing broker filed a complaint with the Colorado Division of Real Estate arguing the phrase constituted misleading advertising under Fair Housing guidelines. The Division issued a formal warning letter to Rachel. Legal counsel cost $2,200. The phrase was removed within hours but the warning remains on file for 3 years.

Failure mode: Agent writes "perfect starter home for a young family" in a listing description. This implies the property is more suitable for families than for other household types. Competitor or advocacy group files a Fair Housing complaint. Broker faces investigation, potential fine of $16,000 for a first offense under HUD, and reputation damage in a tight-knit agent community.

C002 HIGH OBSERVED REPEATEDLY 7x efficiency

Every listing description must be reviewed and approved by Rachel before posting to the MLS, Zillow, Realtor.com, or any public platform. No automated posting. The agent generates a draft; Rachel or the listing agent reviews; Rachel approves; admin posts.

Why: After the Fair Housing warning, Rachel implemented a zero-tolerance approval process. In the first week of the new process, she caught 4 descriptions with problematic language that would have gone live without review: "quiet neighborhood" (could imply exclusion), "walking distance to churches" (religious preference), "master suite" (industry moving away from this term), and "great school district" (potentially steering).

Failure mode: Description auto-posts to MLS without review. Language that seems innocuous contains an implied preference or steering signal. A fair housing tester identifies the language. Formal complaint filed. Rachel's license is at risk. The brokerage's reputation in the Denver market takes years to rebuild.

C003 HIGH HUMAN DEFINED RULE 5x efficiency

The lead qualification agent asks exactly 5 screening questions: timeline to buy/sell, price range, pre-approval status, geographic area preference, and property type. It does not ask about family composition, household size, national origin, religion, disability status, or any protected class.

Why: Fair Housing law prohibits discrimination based on protected classes. Even well-intentioned questions like "How many bedrooms do you need?" can be proxies for family status. The qualification agent uses financial and timeline criteria only. Agent training reinforced this after a test lead reported that the qualifier asked "Will you need a home office for remote work?" which could indirectly screen for employment status.

Failure mode: Qualification agent asks "Do you have children?" to determine bedroom count needs. Lead reports the question as discriminatory. HUD complaint filed. Testing organization sends additional test leads. Pattern of discriminatory questioning established. Broker faces enforcement action.

C004 HIGH OBSERVED ONCE 5x efficiency

Market comp data presented to sellers must include the source (Zillow, MLS, county records), the date of the data pull, and a disclaimer that AI-generated analysis is not a substitute for a licensed appraisal. The comp agent never uses the word "appraisal" or "valuation."

Why: Colorado law requires that only licensed appraisers provide property valuations for lending purposes. An AI-generated comp analysis that reads like an appraisal could expose the brokerage to practicing without an appraisal license. A seller showed our comp report to her lender, who flagged it because it looked like an independent valuation. Rachel had to clarify with the lender and revise the report format.

Failure mode: Comp report uses appraisal language. Seller submits it to a lender as supporting documentation for a home equity line. Lender flags it. State board investigates whether the brokerage is practicing appraisal without a license. Fine: up to $25,000 in Colorado.

KGORG Founding silver
C001 HIGH HUMAN DEFINED RULE 5x efficiency

Sensitive external actions, including sending messages, making commitments, altering calendars, and making destructive changes, are approval-bound by default even when internal coordination is autonomous.

Why: KGORG is designed to preserve human control over commitments that affect other people, time, or records.

Failure mode: An agent could send communications, change a calendar, or commit the principal to something they did not intend, damaging trust and operations.

C005 HIGH HUMAN DEFINED RULE 5x efficiency

Seat clarity is preferred over prompt sprawl, meaning new capabilities should be added by role design, skill assignment, and routing discipline before expanding a single agent into a catch-all operator.

Why: The organization's design philosophy explicitly favors scalable organizational architecture over overloaded prompts and overlapping ownership.

Failure mode: As prompts bloat and roles overlap, accountability weakens, agent behavior becomes inconsistent, and the system becomes harder to govern or debug.

C009 HIGH HUMAN DEFINED RULE 5x efficiency

Authority levels define a real hierarchy, with the Org Master at the top, Chief of Staff and Executive Assistant beneath it, and specialist workers below them.

Why: Authority gating protects the organization from lower-level seats commanding higher-level ones and creates a clean escalation structure.

Failure mode: Without authority hierarchy, specialists could create policy drift, coordination conflicts, or improperly direct strategic seats.

C001 HIGH OBSERVED ONCE 5x efficiency

No agent may receive, process, store, or output protected health information (PHI) as defined by HIPAA. This includes: patient names, dates of birth, Social Security numbers, diagnosis codes, treatment plans, insurance member IDs, medical record numbers, and any combination of data that could identify a specific patient.

Why: A HIPAA breach can result in fines of $100 to $50,000 per violation, up to $1.5M per year for repeated violations. For a practice generating approximately $1.6M in revenue, a single reportable breach could be existential. Beyond fines, breach notification requirements, OCR investigations, and patient trust damage can close a small practice.

Failure mode: Beacon was generating a patient success story for the blog. The marketing coordinator fed Beacon a prompt that included: "Write a success story about our patient Margaret who recovered from a torn ACL in 12 weeks at our Buckhead location." Beacon produced a draft with the patient's first name, injury type, recovery timeline, and clinic location -- enough to identify the patient. The draft was caught in clinical review. Had it been published, it would constitute a HIPAA violation requiring breach notification to the patient and potentially to HHS. No fine resulted, but the incident triggered a complete redesign of the data pipeline to agents.

C002 HIGH OBSERVED ONCE 5x efficiency

The data pipeline to agents must be architecturally de-identified. Patient data is stripped before it reaches the agent, not by the agent. Agents are never handed raw data and asked to anonymize it themselves. De-identification happens in the export step from WebPT to the operational spreadsheets that agents read.

Why: Behavioral compliance ("the agent should anonymize data") fails because it depends on the agent's judgment and the prompt writer's diligence. Architectural compliance ("PHI is removed before the agent ever sees it") fails only if the export pipeline breaks, which is testable and auditable.

Failure mode: The initial approach was to give agents access to full patient records and instruct them to "anonymize all patient information in your output." Flow received full appointment data including patient names. It produced a scheduling analysis that referred to "Patient M.T.'s recurring 3:00 PM Thursday appointment" -- initials plus schedule pattern is enough to identify a regular patient in a small clinic. The architectural approach (strip names in the export) was implemented the next day.

C003 HIGH OBSERVED ONCE 5x efficiency

Marketing content generated by Beacon that references patient outcomes, success stories, or treatment results must include a disclaimer and must never be based on a specific patient's case. Content must be based on aggregated outcomes or fully fictionalized scenarios reviewed by a licensed PT.

Why: Even anonymized patient stories can be identifiable in a small community. "A 45-year-old runner from Roswell who recovered from knee surgery" might describe only one person in the Roswell clinic's patient base. The patient's neighbors might recognize them. HIPAA's de-identification standard (Safe Harbor method) requires removal of 18 specific identifiers, but small-community identifiability goes beyond the checklist.

Failure mode: Beacon wrote a blog post about "a local teacher who returned to coaching after rotator cuff surgery at our Alpharetta clinic." The Alpharetta clinic had exactly one teacher patient who had rotator cuff surgery in the relevant timeframe. A staff member recognized the patient from the description. The post was pulled before publication, but the near-miss demonstrated that even "anonymized" stories are risky in small clinics with small patient populations.

C004 HIGH OBSERVED REPEATEDLY 7x efficiency

All agent outputs are logged to a secure, access-controlled audit trail. Logs must be retained for 6 years (HIPAA retention requirement). Logs must be reviewed monthly for any inadvertent PHI exposure. If PHI is found in any log, it is treated as a potential breach and the HIPAA breach assessment protocol is triggered.

Why: HIPAA requires that covered entities maintain audit trails for systems that touch patient data. Even though agents should never receive PHI, the audit trail exists as a safety net to detect when the architectural controls fail.

Failure mode: No PHI has been found in agent logs since implementing architectural de-identification (C002). However, the monthly audit review in Month 3 found that Flow's scheduling analysis included a reference to "the 2:30 Tuesday patient" -- not a name, but enough to correlate with the appointment book. The reference was ambiguous enough not to constitute PHI, but the threshold was tightened: no temporal patterns that could identify specific patients.

C001 HIGH OBSERVED REPEATEDLY 7x efficiency

Maintenance requests are triaged into three levels: EMERGENCY (water intrusion, gas smell, no heat below 50F, electrical sparking, lockout with safety concern), URGENT (appliance failure, HVAC malfunction above 50F, plumbing leak contained to a fixture, pest infestation), and ROUTINE (cosmetic damage, minor repairs, appliance wear items, non-critical replacements). Only EMERGENCY triggers after-hours dispatch.

Why: The triage agent classified a "dripping faucet" as EMERGENCY because the tenant used the word "leak" and said "water is coming out." The agent lacked the ability to distinguish between a dripping faucet (routine, $35 washer) and a burst pipe (emergency, $2,000+ damage). It pattern-matched on "leak" and "water" and escalated.

Failure mode: Tenant reports a dripping kitchen faucet at 11 PM Saturday. Triage agent flags EMERGENCY. After-hours plumber dispatched at 2 AM. Plumber replaces a $0.50 washer, bills $450 (emergency rate: $175 dispatch + $275/hr minimum). A $35 repair becomes a $450 repair because of misclassification. Over 9 weeks, 4 similar false emergencies cost a total of $1,680 in unnecessary after-hours dispatch fees.

C002 HIGH MEASURED RESULT 10x efficiency

After the dripping faucet incident, the triage agent now asks 3 follow-up questions before classifying any water-related report: (1) Is water actively flowing or pooling? (2) Can you see where the water is coming from? (3) Is the water near any electrical outlet or panel? The answers determine triage level.

Why: Tenant language is imprecise. "Leak" means everything from a dripping faucet to a flooded basement. "Water damage" can mean a stain on the ceiling from last month or an active roof failure. Without follow-up questions, the agent has to assume the worst, which means constant over-triage.

Failure mode: Without follow-up questions, the agent must triage based on initial report language alone. "Water coming from the ceiling" could be an active roof leak (EMERGENCY) or condensation from an improperly insulated pipe (ROUTINE). Defaulting to EMERGENCY for every water-related report would cost approximately $800/month in unnecessary dispatch fees based on historical volume.
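
One plausible mapping from the three answers to a triage level; the decision table is an illustrative reading of the rule, not the exact production logic.

```python
def triage_water_report(actively_flowing: bool,
                        source_visible: bool,
                        near_electrical: bool) -> str:
    """Classify a water-related report from the three follow-up answers."""
    if near_electrical:
        return "EMERGENCY"   # water near outlets or panels always escalates
    if actively_flowing and not source_visible:
        return "EMERGENCY"   # possible burst pipe or active roof failure
    if actively_flowing:
        return "URGENT"      # contained leak at a known fixture
    return "ROUTINE"         # drip, stain, or past damage
```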

C003 HIGH OBSERVED ONCE 5x efficiency

The vendor coordination agent can dispatch work orders for pre-approved vendors on pre-approved tasks up to $200. Any work order above $200 requires Corinne's explicit approval via Slack confirmation. The agent sends the request, Corinne replies "approved" or "hold," and the agent proceeds accordingly.

Why: In week 4, the vendor agent dispatched a carpet cleaning service for a move-out turnover at $380 without approval. The unit only needed spot cleaning ($120). Corinne had planned to have Ray handle it in-house for $0 labor cost. The $380 was unnecessary.

Failure mode: Agent dispatches a vendor for $380 when in-house labor would have cost nothing. Multiply by 40 turnovers per year (34% turnover on 120 units). If even 20% of dispatches could be handled in-house, that is 8 unnecessary vendor dispatches at an average of $300 each: $2,400/year wasted.
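
A sketch of the approval gate; the Slack round-trip is abstracted to a status the agent waits on.

```python
APPROVAL_LIMIT = 200  # dollars; above this, explicit approval is required

def dispatch_decision(vendor: str, task: str, cost: float,
                      preapproved: set[tuple[str, str]]) -> str:
    """Auto-dispatch only pre-approved vendor/task pairs at or under the limit."""
    if (vendor, task) in preapproved and cost <= APPROVAL_LIMIT:
        return "DISPATCHED"
    # Otherwise the agent posts a Slack request and waits for an explicit
    # "approved" reply; "hold" or silence means no dispatch.
    return "PENDING_APPROVAL"
```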

C004 HIGH OBSERVED ONCE 5x efficiency

Tenant communications never promise repair timelines, cost coverage, or lease concessions. The comms agent acknowledges the request, provides a ticket number, and states that the property manager will follow up. It does not say "we'll fix it by Friday" or "this will be covered by the landlord."

Why: The comms agent told a tenant "your dishwasher repair has been scheduled for Thursday." Ray had a scheduling conflict and could not get there until the following Monday. Tenant took Thursday off work to be home for the repair. No one showed up. Tenant filed a complaint with the Virginia Department of Professional and Occupational Regulation. Complaint was dismissed, but it cost Mark 3 hours of paperwork and created a documented complaint on file.

Failure mode: Agent promises a Thursday repair. Repair happens Monday. Tenant loses a vacation day waiting for a repair tech who never comes. Tenant's trust in management destroyed. Tenant leaves at lease end. Turnover cost for that unit: $3,200 (cleaning, paint, 3 weeks vacancy at $1,350/month = $1,012 lost rent + $2,188 turn costs).

Learnwell silver
C001 HIGH OBSERVED ONCE 5x efficiency

No study content is published without passing a two-stage review: (1) content QA agent checks for factual accuracy against at least 2 independent sources, and (2) a human subject matter expert signs off.

Why: The Emancipation Proclamation date error proved that single-source AI verification is insufficient. The agent "verified" against one source that also had the wrong date.

Failure mode: Content QA agent verified the 1865 date against a poorly maintained wiki page. The human review step was skipped because the team was rushing to publish 40 study guides before midterms. One wrong date, 200K impressions, 340 lost students, $4,080/month in lost revenue.

C002 HIGH OBSERVED REPEATEDLY 7x efficiency

All published content includes a "Last verified" date and a feedback button. Students can flag errors directly. Flagged content is pulled from public access within 1 hour pending review.

Why: Students are the fastest error-detection network. Making it easy for them to report issues turns a liability into an early warning system.

Failure mode: Before the feedback button, errors lived in published content for an average of 11 days. After implementation, average detection time dropped to 6 hours. The first month caught 14 errors that would have gone unnoticed.

C003 HIGH OBSERVED ONCE 5x efficiency

Content QA agent must cite its verification sources in a metadata field attached to every piece of content. Sources must be primary (textbooks, academic papers, official records) not secondary (Wikipedia, blogs, forums).

Why: Secondary sources compound errors. Wikipedia had the wrong Emancipation Proclamation date for 3 days before it was corrected. The agent used Wikipedia as a verification source.

Failure mode: Post-incident audit found that 23% of content QA verifications used Wikipedia as the primary source. Of those, 4 had factual discrepancies that hadn't yet been caught. All 4 were corrected before going viral, but the exposure window was unacceptable.

C004 HIGH OBSERVED ONCE 5x efficiency

The support triage agent categorizes tickets into: BILLING (route to Stripe dashboard), CONTENT_ERROR (route to content QA, pull content immediately), TECHNICAL (route to engineering Slack), GENERAL (draft response). Content error tickets are always P0.

Why: A student reporting a content error is doing the company a favor. Slow response tells them their feedback doesn't matter. Fast response tells them the platform is trustworthy.

Failure mode: Early version treated content error reports as GENERAL tickets. A student reported an incorrect chemistry formula. It sat in the general queue for 3 days. By then, 180 students had used the study guide for an exam. Priya found out when a teacher emailed asking why multiple students got the same wrong answer on a test.
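
The routing itself is a small lookup. A sketch; the destination names and the P2 default are assumptions.

```python
ROUTES = {
    "BILLING": "stripe_dashboard",
    "CONTENT_ERROR": "content_qa",     # content is also pulled immediately
    "TECHNICAL": "engineering_slack",
    "GENERAL": "draft_response",
}

def route_ticket(category: str) -> dict:
    """Route a categorized ticket; content errors are always P0."""
    return {
        "destination": ROUTES[category],
        "priority": "P0" if category == "CONTENT_ERROR" else "P2",
        "pull_content": category == "CONTENT_ERROR",
    }
```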

McFadyen Digital Founding silver
C001 HIGH OBSERVED ONCE 5x efficiency

All AI-generated client deliverables (proposals, architecture documents, code, recommendations) require human review and explicit sign-off by a named delivery lead before external distribution.

Why: Our reputation is built on 250+ successful enterprise implementations. A hallucinated architecture recommendation for a $2M marketplace build could cost us the engagement and the reference.

Failure mode: An AI-drafted proposal included a Mirakl feature that had been deprecated 6 months prior. The client's CTO caught it in the review meeting. We recovered, but it cost us credibility on the deal and added two weeks to the sales cycle.

C002 HIGH OBSERVED ONCE 5x efficiency

Client data, source code, and engagement details must never be processed by public AI models. All AI processing uses our private GCP-hosted LLM deployment or enterprise API agreements with explicit DPAs.

Why: We handle source code and infrastructure details for Fortune 500 retailers, military contracts, and financial services companies. A single data leak would be existential.

Failure mode: A developer used a public code assistant to debug a client's checkout integration. The code snippet contained API keys embedded in comments. We caught it in a security audit 3 days later. No breach occurred, but it triggered an emergency policy rollout.

C003 HIGH MEASURED RESULT 10x efficiency

AI cost per engagement must not exceed 3% of project margin. Track monthly per project.

Why: As a services business, margin discipline is survival. AI tools are force multipliers, not cost centers. If AI spend on a project exceeds 3% of margin, we are using it wrong.

Failure mode: On one engagement, the team spun up an AI-powered testing suite that ran continuously against a staging environment. The compute bill hit $14K in a month on a $180K project. The PM did not catch it until the monthly P&L review.
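
The check is a single comparison; the 40% margin in the worked example is an assumption for illustration.

```python
MAX_AI_SHARE = 0.03  # AI spend capped at 3% of project margin

def ai_budget_ok(project_margin: float, ai_spend_to_date: float) -> bool:
    """The monthly per-project check described above."""
    return ai_spend_to_date <= MAX_AI_SHARE * project_margin

# Assuming a 40% margin on the $180K project: margin = $72,000 and the
# cap = 0.03 * 72,000 = $2,160. The $14K compute bill would have tripped
# this check in week one instead of surfacing at the monthly P&L review.
```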

C001 HIGH OBSERVED REPEATEDLY 7x efficiency

Every agent writes to exactly one shared state file. No agent reads a data source that another agent owns.

Why: When our Google Ads monitor and our weekly report agent both pulled from the Google Ads API independently, they reported different numbers for the same client on the same day. The founder sent the weekly report to the client. The client compared it to the alert they received 3 hours earlier. The numbers didn't match. It took 45 minutes to explain that the discrepancy was a timezone difference between the two API calls.

Failure mode: Two agents query the same API at different times and get different snapshots. Client-facing numbers contradict each other. Trust erodes.

C002 HIGH MEASURED RESULT 10x efficiency

Alert thresholds must use dollar amounts for accounts under $5K/mo and percentage changes for accounts over $5K/mo.

Why: A 20% spend spike on a $2,500/mo account is $500 -- annoying but manageable. A 20% spike on a $28K/mo account is $5,600 -- that's a budget-breaking emergency. Meanwhile, a $300 daily overspend on the $28K account is noise (1% variance), but the same $300 on the $2,500 account is 12% of their monthly budget gone in one day. We spent two weeks calibrating after the initial rollout generated 47 alerts in a single day, only 3 of which required action.

Failure mode: Flat percentage thresholds flood the team with low-dollar alerts on small accounts while missing meaningful dollar movements on large accounts. Alert fatigue sets in within 72 hours.
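
A sketch of the two-regime threshold; the specific trigger values are illustrative, not the numbers from the two-week calibration.

```python
BOUNDARY = 5_000        # monthly spend separating the two regimes
DOLLAR_TRIGGER = 250    # illustrative small-account threshold
PERCENT_TRIGGER = 0.20  # illustrative large-account threshold

def should_alert(monthly_budget: float, expected_spend: float,
                 actual_spend: float) -> bool:
    """Dollar thresholds under $5K/mo, percentage thresholds above it."""
    delta = abs(actual_spend - expected_spend)
    if monthly_budget < BOUNDARY:
        return delta >= DOLLAR_TRIGGER
    return expected_spend > 0 and delta / expected_spend >= PERCENT_TRIGGER
```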

C015 MEDIUM OBSERVED REPEATEDLY 4x efficiency

Every client-facing number must include the data timestamp and the date range it covers.

Why: The weekly report showed "CPL: $47" for a fitness client. The client's internal dashboard showed $52. The discrepancy: our report used Monday-Sunday, their dashboard used Sunday-Saturday. Neither was wrong, but the client spent an hour trying to reconcile. Now every metric includes "Mon Mar 2 - Sun Mar 8, data pulled at 6:12 AM EST."

Failure mode: Undated metrics create reconciliation conflicts with client-side reporting tools. Trust erodes even when both numbers are correct.

C001 HIGH HUMAN DEFINED RULE 5x efficiency

Every project begins with a single testable business purpose statement. No other work proceeds until this statement exists and the human practitioner has approved it.

Why: Without a grounding statement, AI generation drifts toward technically interesting but commercially irrelevant output. The business purpose is the acceptance test for the entire project.

Failure mode: Team skips business purpose definition and jumps to technology selection. AI generates an architecturally elegant system that solves no real business problem. Months of work are discarded.

C002 HIGH OBSERVED REPEATEDLY 7x efficiency

The PRD is the single source of truth for project scope. It is a living document updated at every iteration point. AI proposes PRD changes. The human practitioner approves them.

Why: Without a living PRD, scope drifts invisibly. Discoveries in prototyping or domain modeling change requirements, but those changes are lost if not captured in the PRD.

Failure mode: Discovery in Step 7 reveals a new stakeholder need. The finding is discussed but not written into the PRD. The implementation in Step 10 misses the requirement. Stakeholder is dissatisfied.

C003 HIGH HUMAN DEFINED RULE 5x efficiency

AI generates all artifacts (stakeholder maps, PRDs, prototypes, data models, evaluation frameworks, test criteria, implementation code). No artifact ships without human review.

Why: AI output is fast but can be subtly wrong, structurally coherent but semantically off. Human review catches misalignment with business context, stakeholder politics, and domain nuance that AI cannot perceive.

Failure mode: AI generates a stakeholder analysis that omits a politically powerful but technically irrelevant stakeholder. The project proceeds without their buy-in. They block deployment.

C004 HIGH OBSERVED REPEATEDLY 7x efficiency

Technical sunk costs are treated as essentially zero. The methodology explicitly encourages backtracking, re-generation, and exploration of alternative approaches without regard to prior AI output.

Why: When teams feel invested in AI-generated code or designs, they resist changing direction even when evidence says the current path is wrong. A zero-sunk-cost mindset enables honest evaluation.

Failure mode: Team spends three days refining an AI-generated prototype. New discovery invalidates the approach. Team continues with the flawed prototype because they feel invested in the effort. The final product inherits the flaw.

C005 HIGH OBSERVED REPEATEDLY 7x efficiency

Each iteration focuses on one or two stakeholders at a time. The full PRD defines the complete scope, but work proceeds one bite at a time.

Why: Trying to satisfy all stakeholders simultaneously overwhelms both the human reviewer and the AI generator. Single-stakeholder focus produces sharper artifacts and clearer evaluation criteria.

Failure mode: Team attempts to prototype for all five stakeholders at once. AI generates a bloated, compromised UI that satisfies no one. Human reviewer cannot evaluate because the artifact is too complex. Project stalls.

C001 HIGH OBSERVED REPEATEDLY 7x efficiency

Negative constraints (explicit bans) produce better output than positive style guides. Maintain a living "banned phrases" list and enforce it across all agents.

Why: Telling an agent "write in a direct, executive voice" produces inconsistent results. Telling it "never use the words leverage, synergy, holistic, robust, or cutting-edge" produces consistently better output. The agent has fewer degrees of freedom to drift into consultant cliches.

Failure mode: Forge was given a 2-page style guide with tone examples, voice descriptions, and formatting preferences. Output still read like generic McKinsey filler. After replacing the style guide with a 47-item banned phrase list, client feedback shifted from "this is fine" to "this sounds like you." The style guide had been active for 5 months with no improvement. The banned list worked in the first draft.
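
Enforcement can be a literal scan. A sketch seeded with the five phrases quoted above; the full 47-item list is not reproduced here.

```python
BANNED_PHRASES = [
    "leverage", "synergy", "holistic", "robust", "cutting-edge",
    # ...the live list runs to 47 items
]

def enforce_bans(draft: str) -> list[str]:
    """Return every banned phrase found; a non-empty list fails the draft."""
    text = draft.lower()
    return [phrase for phrase in BANNED_PHRASES if phrase in text]
```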

C002 HIGH OBSERVED ONCE 5x efficiency

Every agent output must be reviewed before it reaches a client. No autonomous sends, no auto-published documents, no scheduled emails without founder approval.

Why: As a solo consultant, there is no buffer between a mistake and a client relationship. One poorly worded email or a factual error in a deliverable is a direct hit to the founder's credibility. The cost of review (5-10 minutes per item) is trivially small compared to the cost of a client trust breach.

Failure mode: Tempo was configured to auto-send follow-up emails 48 hours after meetings. One follow-up referenced the wrong project name -- it used the prior client's engagement name from a template that wasn't cleared. The recipient forwarded it to their team with a "is this person serious?" comment. Relationship survived but the auto-send was permanently disabled.

C003 HIGH OBSERVED ONCE 5x efficiency

Client data in Notion must be organized by engagement, not by client company. A single company may have multiple engagements over time with different confidentiality requirements. Agents must scope their context to the active engagement only.

Why: Revisiting a client with new work doesn't mean all prior engagement data should be loaded into context. A growth strategy engagement from 2024 may contain board-level information that isn't relevant to a 2026 operational efficiency project.

Failure mode: Scout loaded the full Notion record for a returning client, including notes from a 2024 board advisory engagement. Those notes contained CEO succession planning details. Scout included "leadership transition risk" as a factor in a market analysis that was shared with the client's VP of Operations -- someone who did not know about the succession discussion. The founder caught it in review and rewrote the section, but the near-miss was severe.

C004 HIGH OBSERVED ONCE 5x efficiency

Four agents is the correct number. Do not add a fifth agent unless the founder is consistently spending more than 3 hours per week on a task that doesn't map to an existing agent.

Why: Every additional agent adds coordination overhead, context management complexity, and a new failure surface. For a solo practice, the founder IS the coordination layer. More agents means more time managing agents instead of serving clients.

Failure mode: Attempted to add a fifth agent ("Pitch") for new business development outreach. Within two weeks, the founder was spending more time correcting Pitch's output than writing outreach manually. Pitch was retired. The 3-hour threshold was established as the gate for any future agent.

C001 HIGH HUMAN DEFINED RULE 5x efficiency

Every agent writes to its own shared state file. No agent reads another agent's working memory directly.

Why: Shared state files create visible, auditable coordination.

Failure mode: Agent A acts on stale data from Agent B.

C002 HIGH HUMAN DEFINED RULE 5x efficiency

All external communications require founder approval before sending.

Why: AI-drafted communications may be tonally wrong or strategically misaligned.

Failure mode: Agent sends outreach with incorrect positioning.

C003 HIGH HUMAN DEFINED RULE 5x efficiency

Spec changes require founder approval within 1 business day.

Why: The protocol is the most important asset.

Failure mode: Protocol Steward ships a breaking change.

C004 HIGH HUMAN DEFINED RULE 5x efficiency

Decisions affecting pricing, legal, or partnerships are human-only.

Why: These decisions carry financial and legal consequences that agents cannot assess.

Failure mode: Agent commits to unapproved partnership terms.

C005 HIGH OBSERVED REPEATEDLY 7x efficiency

Tuesday evening is the protected build block. Only coding.

Why: Build velocity depends on uninterrupted focus time.

Failure mode: Build session interrupted. Code does not ship. Timeline slips.

C001 HIGH OBSERVED ONCE 5x efficiency

No agent-generated output is presented to a client as a final deliverable. All agent output is explicitly labeled "FIRST DRAFT - CREATIVE REVIEW REQUIRED" in the document header.

Why: Prism's value proposition is human creativity. Clients pay $150/hour for Mara's taste and Kai and Nina's craft. If clients discover that deliverables are AI-generated, the perceived value drops to zero regardless of quality.

Failure mode: Not yet violated in client delivery. But in month 4, Diego accidentally attached the brief generator's first draft to a client email instead of Mara's revised version. The client noticed the "FIRST DRAFT" header and asked about it. Mara explained it as an internal workflow label. The client accepted the explanation, but the incident prompted Diego to add color-coded file naming: red prefix for drafts, green prefix for approved.

C002 HIGH OBSERVED ONCE 5x efficiency

The creative team (Kai, Nina, Ava) has absolute override authority on any agent output related to visual design, brand direction, or creative strategy. Their override is immediate, no approval chain required, no documentation necessary.

Why: The month-3 crisis. Kai and Nina discovered that the brief generator had been producing creative direction briefs -- including mood boards, color palettes, and typography suggestions -- without their input. They felt their creative authority was being undermined by a machine. Kai told Mara "If this is what we're doing now, I'm out." Nina agreed. Mara paused the entire agent program for 2 weeks.

Failure mode: The brief generator was originally designed to produce "complete" creative briefs including visual direction. This was technically impressive but organizationally devastating. The resolution: briefs now contain objectives, audience, competitive context, and constraints only. Visual direction is explicitly excluded and marked as "TO BE DEFINED BY CREATIVE TEAM." Kai and Nina stayed.

C003 HIGH OBSERVED ONCE 5x efficiency

The brief generation agent produces strategy briefs: business objectives, target audience, competitive landscape, brand constraints, and success metrics. It does not include visual direction, mood boards, color suggestions, font recommendations, or any creative execution guidance.

Why: See C002. Visual direction is the creative team's domain. Briefs that pre-determine creative direction constrain the designers before they begin, which is both demoralizing and produces worse work.

Failure mode: See C002. The original brief generator produced mood boards by pulling from visual trend databases. Kai's exact words: "If the brief already tells me what it should look like, what am I here for?" The creative team's near-resignation was the most serious operational crisis in Prism's history.

R3V Founding gold
C001 HIGH OBSERVED REPEATEDLY 7x efficiency

The organization decomposes customer operations into specialized agents rather than relying on a single general-purpose agent.

Why: Specialized agents are easier to govern, test, replace, and evaluate. The platform currently includes distinct roles for summarization, orchestration, specialist response generation, memory logging, memory consolidation, batch review, knowledge graph maintenance, and read-only CRM Q&A.

Failure mode: When one agent owns too much surface area, it becomes harder to diagnose bad outputs, enforce permissions, and isolate regressions. Errors also spread across more workflow stages.

C002 HIGH OBSERVED REPEATEDLY 7x efficiency

The inbound conversation pipeline separates interpretation from action: Lens summarizes, Sage routes and decides, specialists draft response logic, and downstream steps log or validate outcomes.

Why: This layered design reduces prompt overload and keeps each agent responsible for one cognitive job. It also creates clean interfaces between snapshot understanding, routing, reply generation, and execution.

Failure mode: If interpretation and execution are collapsed into one stage, the system is more likely to send low-context or overconfident responses, skip escalation, or produce outputs that are difficult to debug.

C013 HIGH INFERENCE 3x efficiency

The org optimizes for auditable reliability before maximum autonomy.

Why: The evidence includes approval-required tools, flow gates, memory event logs, validator steps, review outputs, and explicit separation between recommendation and execution layers.

Failure mode: If autonomy outruns auditability, the organization may move faster in the short term but lose the ability to trust, review, and improve the system systematically.

C017 MEDIUM MEASURED RESULT 6x efficiency

Release readiness is part of operating discipline, not just deployment hygiene.

Why: The platform tracks draft/staged/prod status, stale items, and promotion mismatches. The current org shows multiple stale draft artifacts, even though no cross-artifact mismatches currently exist.

Failure mode: Without explicit promotion discipline, teams lose clarity about what is experimental, what is approved, and what should be trusted in production.

C001 HIGH OBSERVED ONCE 5x efficiency

Every client profile must include a defined geographic service area with specific zip codes or a radius from a street address. No ad performance analysis runs without this field populated.

Why: Our ad monitor flagged a plumber's campaign as "strong performance -- CPL $23, 47 leads this month." The media buyer didn't check the geographic breakdown. 31 of those 47 leads were from a city 90 miles away where the plumber doesn't operate. The plumber paid for 31 leads he couldn't serve. He called us on a Friday afternoon and said he was "done." The founder saved the account with a credit and a same-day fix, but it cost the agency $1,400 in credited spend and nearly cost us an $8K/mo client.

Failure mode: Agents evaluate campaign performance without geographic context. High lead volume masks geographic waste. Client pays for leads they can't serve.

C002 HIGH OBSERVED REPEATEDLY 7x efficiency

Shared state files include a "last_successful_write" timestamp and a "data_completeness" flag (FULL, PARTIAL, FAILED).

Why: The Google Ads API occasionally returns partial data during heavy load periods -- 8 of 12 accounts come back, the other 4 time out. The ad monitor wrote what it had. The briefing reported on 8 clients and said nothing about the other 4. The founder assumed the missing 4 were running fine. One of the missing accounts had paused itself due to a billing issue and stayed paused for 2 days.

Failure mode: Partial data writes look like complete data. Missing accounts are interpreted as "no problems" instead of "not checked."
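
A sketch of a write that carries both fields (the field names come from the claim; the surrounding structure is an assumption):

```python
import json
import time

def write_state(path: str, accounts_expected: int, results: dict) -> None:
    """Write shared state so partial data can never pass as complete."""
    if len(results) == accounts_expected:
        completeness = "FULL"
    elif results:
        completeness = "PARTIAL"
    else:
        completeness = "FAILED"
    state = {
        "last_successful_write": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "data_completeness": completeness,
        "accounts_expected": accounts_expected,
        "accounts_returned": sorted(results),  # lets the briefing name the gaps
        "data": results,
    }
    with open(path, "w") as f:
        json.dump(state, f, indent=2)
```

A briefing compiler that reads a PARTIAL file can then report "4 accounts not checked" by name instead of silently omitting them.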

C014 MEDIUM OBSERVED REPEATEDLY 4x efficiency

Agent-generated content must never use em dashes, exclamation points in the first sentence, or the phrase "I wanted to reach out" in any client-facing draft.

Why: Three clients independently mentioned that our emails "sound like AI." The founder traced it to em dashes and a specific sentence pattern the inbox assistant favored. After removing these markers, zero clients have commented on email tone in 6 weeks.

Failure mode: AI-generated text patterns are recognizable to clients who receive a lot of AI-written communication. Detection erodes the personal touch that small agencies rely on.

Sneeze It Founding gold
C001 HIGH OBSERVED REPEATEDLY 7x efficiency

One seat, one owner. No agent shares responsibility with another agent.

Why: Shared responsibilities create blame diffusion, tuning conflicts, and debugging ambiguity.

Failure mode: Two agents both handle client performance. Tuning one breaks the other. When something goes wrong, nobody owns the fix.

C001 HIGH OBSERVED REPEATEDLY 7x efficiency

Every agent writes to exactly one shared state file. The morning briefing compiler reads all 8 files. No agent reads another agent's data source directly.

Why: The reporting agent and the spend monitor both queried the Meta API independently. They returned different numbers because of timing differences. Per-agent state files with timestamps eliminated the contradiction.

Failure mode: Two agents query the same API 4 minutes apart. Spend numbers differ by $340. Account manager questions data integrity. Trust in the system drops for weeks.
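
One way the compiler can make staleness visible when it reads the 8 files (directory layout and threshold are assumptions):

```python
import json
import pathlib
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=12)  # assumed; tune to each agent's scan cadence

def load_agent_states(state_dir: str) -> dict:
    """Read every per-agent state file and mark any whose last write is stale."""
    now = datetime.now(timezone.utc)
    states = {}
    for path in sorted(pathlib.Path(state_dir).glob("*.json")):
        state = json.loads(path.read_text())
        written = datetime.fromisoformat(
            state["last_successful_write"].replace("Z", "+00:00")
        )
        state["stale"] = (now - written) > STALE_AFTER
        states[path.stem] = state
    return states
```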

C002 HIGH MEASURED RESULT 10x efficiency

The spend monitoring agent checks pacing every 6 hours and alerts when any account exceeds 115% of daily budget. Alert goes to Slack channel, not DM.

Why: DMs get buried. Channel alerts create shared visibility. The 115% threshold balances sensitivity with noise. At 110%, too many false alarms. At 120%, alerts arrive too late to prevent significant overspend.

Failure mode: Agent DMs the founder at 2 AM about a 112% overspend. Founder silences notifications. Next morning, account is at 145%. Channel alert would have been seen by the AM who starts at 7 AM.
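
The pacing check itself is a few lines; what matters is the threshold and where the alert lands (the account data below is made up):

```python
ALERT_THRESHOLD = 1.15  # 115% of daily budget, per the claim

def pacing_alerts(accounts: list[dict]) -> list[str]:
    """Build channel-ready alert lines for accounts pacing over threshold."""
    alerts = []
    for acct in accounts:
        ratio = acct["spend_today"] / acct["daily_budget"]
        if ratio > ALERT_THRESHOLD:
            alerts.append(
                f"{acct['name']}: pacing at {ratio:.0%} "
                f"(${acct['spend_today']:.0f} of ${acct['daily_budget']:.0f})"
            )
    return alerts

# Post the joined lines to the shared Slack channel, never a DM.
print(pacing_alerts([{"name": "Acme Roofing", "spend_today": 590, "daily_budget": 500}]))
```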

C002 HIGH OBSERVED REPEATEDLY 7x efficiency

Every agent writes to its own shared state file. No agent reads data sources directly. Scanners write files. Compilers read files.

Why: Pre-computed shared state decouples scan timing from compile timing, makes staleness visible, and prevents redundant API calls.

Failure mode: Orchestrator silently re-queries sources. You cannot tell what version of reality it saw. Two agents query the same API at different times, get different results, make conflicting decisions.

C003 HIGH OBSERVED ONCE 5x efficiency

No agent modifies campaign settings. Agents read, analyze, and recommend. A human executes changes in the ad platform.

Why: We gave an agent write access to bid adjustments in month 1. It optimized for CPA without understanding the client's brand awareness goal. Client called asking why impressions dropped 60%.

Failure mode: Agent reduces bids on a brand campaign. Impressions crater. Client sees competitors appearing in their branded search results. Emergency call at 8 PM.

C003 HIGH HUMAN DEFINED RULE 5x efficiency

All external communications (emails, Slack messages to clients, outreach) require founder approval before sending. The Executive Assistant agent drafts. The founder approves. No exceptions.

Why: Reputational damage from bad messages is hard to reverse. AI-drafted communications can be tonally wrong, factually incorrect, or strategically misaligned.

Failure mode: Agent sends email with incorrect performance numbers. Client loses trust. Takes months to repair.

C004 MEDIUM OBSERVED REPEATEDLY 4x efficiency

Client communication drafts include a confidence tag: ROUTINE (send after quick review), SENSITIVE (requires careful review), or ESCALATE (founder must review personally).

Why: Not all client emails need the same level of scrutiny. Performance reports are routine. A response to a complaint is sensitive. A cancellation save attempt is escalate.

Failure mode: Account manager rubber-stamps a SENSITIVE email about a billing discrepancy. Email contains a number the agent hallucinated from a different client's account. Client catches the error and questions our competence.

C004 HIGH OBSERVED REPEATEDLY 7x efficiency

File-based state is authoritative over AI memory. When file data and implicit memory conflict, the file wins. Always load the canonical file before acting on remembered context.

Why: AI memory drifts across sessions. Files do not. Memory supplements but never overrides.

Failure mode: Agent acts on stale implicit memory instead of canonical file. Decisions based on wrong data. Particularly dangerous for client spend data and pipeline status.

C005 HIGH OBSERVED REPEATEDLY 7x efficiency

The Performance Analyst reports patterns but never recommends actions. Reports data, not opinions.

Why: When the analyst recommended actions, recommendations were ignored because they lacked client context. Separating reporting from recommendation improved trust and reduced noise.

Failure mode: Analyst recommends budget increase for a client the account manager knows has cash flow issues. Client loses trust. Or: analyst recommends pausing a campaign the client considers strategically important.

C006 HIGH OBSERVED REPEATEDLY 7x efficiency

The Retention agent overrides the Sales agent when a client is flagged at risk. Retention is the Guardian. Sales is the Hunter. Guardian always wins.

Why: Sales expansion to an at-risk client accelerates churn. Aggressive outreach to a declining client is tone-deaf and damages the relationship.

Failure mode: Sales agent proposes upsell to a client whose satisfaction is declining. Client interprets it as tone-deaf and cancels entirely.

C007 HIGH OBSERVED REPEATEDLY 7x efficiency

Only one agent (the Executive Assistant) has authority to send external communications. All other agents route outreach through the EA. The EA drafts, the founder approves, the EA sends.

Why: Multiple sending agents create duplicate communications, inconsistent voice, and confused recipients.

Failure mode: Sales agent and Retention agent both draft emails to the same client. Client receives contradictory messages.

Stackwise silver
C001 HIGH MEASURED RESULT 10x efficiency

The billing operations agent can read Stripe data and draft credit/refund recommendations. It cannot execute credits, refunds, or plan changes. A human must approve and execute in the Stripe dashboard.

Why: In week 2, the billing agent auto-applied a $2,400 credit to a customer account based on a support ticket that mentioned "billing issue." The ticket was about a feature request, not a billing problem. The agent misinterpreted the context.

Failure mode: Customer mentions "billing" in any context. Agent interprets it as a billing dispute. Auto-applies credit. $2,400 gone before anyone notices. Discovered during monthly reconciliation.

C002 HIGH OBSERVED ONCE 5x efficiency

Support agent has read access to the codebase for context but cannot create PRs, push commits, or modify any code. It can file GitHub issues with reproduction steps.

Why: The support agent initially had write access for quick-fix PRs. It pushed a CSS change that broke the dashboard for 200+ users. The fix took 4 hours because the agent did not run tests before pushing.

Failure mode: Support agent pushes a "simple fix" without tests. Breaks production. Engineering spends 4 hours reverting instead of building features.

C003 MEDIUM OBSERVED REPEATEDLY 4x efficiency

Every customer-facing message includes a visible confidence indicator in the review queue: GREEN (standard, low risk), YELLOW (involves account details or money), RED (churn risk, legal, or escalation).

Why: Not all support responses carry the same risk. A password reset is GREEN. A billing explanation is YELLOW. A cancellation threat is RED. Review depth should match risk.

Failure mode: Support agent drafts a response to a cancellation threat. Not flagged RED. Reviewer misses it. Response is tone-deaf. Customer cancels. $1,800/year lost.
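
A keyword-based tagger is the simplest possible implementation of the indicator (the trigger lists here are illustrative, not Stackwise's actual rules):

```python
RED_TRIGGERS = ("cancel", "lawyer", "legal", "escalat")       # churn/legal risk
YELLOW_TRIGGERS = ("billing", "invoice", "refund", "charge")  # account details or money

def risk_tag(ticket_text: str, draft_reply: str) -> str:
    """Tag a drafted reply GREEN/YELLOW/RED so review depth matches risk."""
    text = f"{ticket_text} {draft_reply}".lower()
    if any(trigger in text for trigger in RED_TRIGGERS):
        return "RED"
    if any(trigger in text for trigger in YELLOW_TRIGGERS):
        return "YELLOW"
    return "GREEN"

print(risk_tag("I'm thinking about cancelling my plan.", "Sorry to hear that..."))  # RED
```

A real deployment would likely layer a classifier on top, but even a crude tagger beats no tag: the failure above happened because the queue had no signal at all.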

C004 HIGH OBSERVED REPEATEDLY 7x efficiency

State files are the only mechanism for cross-agent data sharing. No agent reads another agent's conversation history or session context.

Why: The onboarding agent once referenced a 3-day-old support ticket from the support agent's context. The issue had been resolved. The onboarding email asked about a problem the customer had already forgotten.

Failure mode: Agent references stale cross-agent context. Sends message about resolved issue. Customer confused. Impression of poor internal coordination.

C001 HIGH OBSERVED ONCE 5x efficiency

All patient education content generated by the education agent must be queued for physician review before distribution. No content bypasses the queue regardless of topic simplicity.

Why: In week 2, the education agent generated a handout on managing seasonal allergies. Content was medically accurate but recommended an OTC antihistamine without noting it interacts with a blood pressure medication common in our patient population. Dr. Pham caught it during review.

Failure mode: Education agent produces 12 handouts in one week. Staff assumes "simple" topics like hydration tips are safe to distribute without review. One handout recommends increased water intake without noting fluid restriction guidance for the 8 heart failure patients in our panel. Patient follows advice, ends up in the ER with fluid overload.

C002 HIGH OBSERVED ONCE 5x efficiency

The approval queue must be reviewed by a physician at least twice per week. If the queue exceeds 20 items, the office manager escalates immediately rather than waiting for the next scheduled review.

Why: The queue hit 47 items in week 3 because both physicians were at a conference. When they returned, they spent 4.5 hours clearing the backlog. Half the content was time-sensitive (flu season handouts) and no longer relevant by the time it was approved.

Failure mode: Queue grows to 47 items over 9 days. Physicians face a wall of content and start rubber-stamping to clear the backlog. Quality of review degrades. 23 items approved in one 90-minute session versus the normal pace of 8 items per session.

C003 HIGH HUMAN DEFINED RULE 5x efficiency

The appointment reminder agent sends reminders at 72 hours, 24 hours, and 2 hours before scheduled appointments. Messages are plain text, never include medical details, and always include the cancellation/reschedule phone number.

Why: HIPAA requires that appointment reminders not disclose the nature of the visit. Early versions included "your follow-up for [condition]" in the reminder text. Tanya caught this before any were sent, but only because she was manually reviewing every message during the first week.

Failure mode: Reminder message includes "your diabetes follow-up appointment" in a text visible on a locked phone screen. Patient's family member sees it. Patient had not disclosed their diagnosis to family. HIPAA violation and destroyed patient trust.
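
A sketch of the schedule plus a content guard (the medical-term list is illustrative; real screening needs a maintained term list reviewed by clinical staff):

```python
from datetime import datetime, timedelta

REMINDER_OFFSETS = (timedelta(hours=72), timedelta(hours=24), timedelta(hours=2))
MEDICAL_TERMS = ("diabetes", "diagnosis", "follow-up for", "your condition")  # illustrative

def reminder_times(appointment: datetime) -> list[datetime]:
    """Return the three send times defined by the rule."""
    return [appointment - offset for offset in REMINDER_OFFSETS]

def validate_reminder(message: str, reschedule_phone: str) -> str:
    """Reject any reminder that leaks medical detail or omits the reschedule number."""
    lowered = message.lower()
    if any(term in lowered for term in MEDICAL_TERMS):
        raise ValueError("Reminder contains medical detail; HIPAA exposure.")
    if reschedule_phone not in message:
        raise ValueError("Reminder must include the cancellation/reschedule number.")
    return message
```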

C004 HIGH HUMAN DEFINED RULE 5x efficiency

No-show prediction scores are internal-only. They are never shared with patients, never documented in the medical record, and never used to deny or delay scheduling.

Why: Using predictive scores to manage scheduling could constitute discrimination. A patient flagged as "high no-show risk" might be a single mother with unpredictable childcare. Penalizing her scheduling access based on a prediction model would be both unethical and potentially a fair treatment violation.

Failure mode: Front desk staff sees a patient's no-show risk score of 87% and decides not to offer the last available slot for a specialist referral. Patient does not get timely care. If discovered, this creates significant legal and ethical liability.

C001 HIGH OBSERVED ONCE 5x efficiency

Internal agent operations and customer product operations must use separate API keys, separate databases, separate deployment environments, and separate Slack channels. Zero shared infrastructure.

Why: The performance review pipeline incident proved that shared infrastructure between internal ops and the product creates inevitable cross-contamination.

Failure mode: The eval pipeline monitoring agent used the same API endpoint as the customer-facing eval pipeline. During a routine internal performance review, the agent submitted a team evaluation document ("Q3 performance: Mira needs to improve documentation velocity") through the shared endpoint. It appeared in the staging eval dashboard. 3 customers on the staging beta saw it. One customer emailed: "Is this how your eval tool works? Running performance reviews?" Disclosure was required. 2 customers demanded security audits. One customer paused their contract for 6 weeks while the audit was conducted. $12,600 in deferred revenue.

C002 HIGH OBSERVED ONCE 5x efficiency

No internal agent has read access to customer data stores. Customer onboarding agent can write to customer configuration tables but cannot read customer eval results, model outputs, or usage details.

Why: If an internal agent leaks customer eval data, Synthwave Labs faces breach disclosure obligations under 14 customer contracts. The legal exposure is existential.

Failure mode: Usage analytics agent was granted read access to the product database to track feature adoption. It pulled a query that included customer eval results as a side effect of a poorly scoped SQL join. The data appeared in an internal usage report shared via Slack. The CTO (Rohan) caught it during report review. If it had been shared externally (e.g., in an investor update), it would have triggered breach notifications for 3 enterprise customers.

C003 HIGH OBSERVED ONCE 5x efficiency

Every agent output must be tagged with its source namespace: [INTERNAL] or [PRODUCT]. Any untagged output is treated as a potential data leak and quarantined for review.

Why: When internal and product systems produce similar-looking outputs (eval results, performance metrics, dashboards), the only way to prevent confusion is explicit labeling.

Failure mode: The competitor analysis agent produced a benchmark report comparing Synthwave's eval accuracy against 3 competitors. The report format was identical to the product's customer-facing eval reports. A sales engineer included it in a demo deck, and a prospect asked: "Is this a real eval result or marketing material?" The ambiguity undermined the demo. The deal closed 3 weeks late, and the prospect negotiated a 15% discount citing "concerns about data integrity."
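
The tag check itself can be mechanical (a minimal sketch; the quarantine handling is an assumption):

```python
VALID_TAGS = ("[INTERNAL]", "[PRODUCT]")

def route_output(output: str) -> str:
    """Route tagged output; quarantine anything untagged as a potential leak."""
    if not output.startswith(VALID_TAGS):
        return "QUARANTINE"  # held for human review before it goes anywhere
    return "INTERNAL" if output.startswith("[INTERNAL]") else "PRODUCT"

print(route_output("[PRODUCT] Eval accuracy report for customer dashboard"))  # PRODUCT
print(route_output("Benchmark report vs. 3 competitors"))                     # QUARANTINE
```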

C004 HIGH OBSERVED ONCE 5x efficiency

Investor update agent has access only to aggregated, anonymized metrics. It cannot access individual customer names, contract values, or usage patterns.

Why: Investor updates are shared broadly (board members, advisors, potential investors). Customer-specific data in these documents violates NDAs with 60% of enterprise customers.

Failure mode: Early investor update included "Customer X (Fortune 500 fintech) processes 2.3M eval requests/month." The customer had a strict NDA prohibiting disclosure of their use of AI evaluation tools. An advisor forwarded the update to a contact at a competing AI company. The customer's security team found out and threatened contract termination. Rohan personally flew to their office for a 3-hour meeting to save the relationship. $8,400/month contract preserved, but trust damage took 6 months to recover.

C001 HIGH OBSERVED ONCE 5x efficiency

Haven (CS agent) must only reference policies documented in the approved policy file (`/policies/current-v3.md`). If a customer asks about a policy not in the file, Haven must respond with "Let me check with the team and get back to you within 4 hours" instead of inventing an answer.

Why: An agent-stated policy becomes a de facto policy. If Haven tells a customer they can return an item after 60 days, and the actual policy is 30 days, the business must honor the 60-day window or face a chargeback and a negative review.

Failure mode: Haven told a customer that Threadline offers free return shipping on all orders. The actual policy: free return shipping on orders over $75. The order was $42. The customer screenshotted Haven's response and posted it when the return label wasn't free. We honored it, ate $11.50 in shipping, and updated Haven's policy file immediately. But 3 other customers had received the same incorrect information before we caught it. Total cost: $47.80 in shipping plus a policy document rewrite.
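
One way Haven's guard might work, assuming the approved file keeps one `## topic` heading per policy (the file format and parser below are assumptions, not Threadline's actual setup):

```python
FALLBACK = "Let me check with the team and get back to you within 4 hours."

def load_policies(path: str = "/policies/current-v3.md") -> dict[str, str]:
    """Parse the approved policy file into topic -> policy text."""
    policies, topic, lines = {}, None, []
    for line in open(path):
        if line.startswith("## "):
            if topic:
                policies[topic] = "".join(lines).strip()
            topic, lines = line[3:].strip().lower(), []
        elif topic is not None:
            lines.append(line)
    if topic:
        policies[topic] = "".join(lines).strip()
    return policies

def answer_policy_question(topic: str, policies: dict[str, str]) -> str:
    # Answer only from the approved file; never invent a policy.
    return policies.get(topic.lower(), FALLBACK)
```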

C002 HIGH OBSERVED ONCE 5x efficiency

No agent may issue a refund, store credit, or discount code autonomously. All financial actions require human approval in the #cs-approvals Slack channel. Haven and Rebound may draft the action but must wait for a thumbs-up emoji from the CS rep or ops manager.

Why: Refund fraud is real. Automated refund processing without human verification opens the door to serial returners and social engineering. A $500 threshold is too high for CS -- individual refund approval is required regardless of amount.

Failure mode: During a 3-day stretch where the CS rep was sick and the ops manager was traveling, Haven was temporarily given auto-refund authority for orders under $50. A customer submitted 4 separate return requests for items they never actually returned. Haven processed all 4. Total loss: $187. Auto-refund was permanently revoked.

C003 HIGH INFERENCE 3x efficiency

Forecast produces demand projections. Humans make purchase orders. Forecast must never directly interface with supplier systems or commit to purchase quantities. Its output is advisory only.

Why: A wrong forecast is recoverable -- excess inventory can be discounted or returned. An automated purchase order based on a wrong forecast locks in capital and warehouse space. For a $1.8M brand, a single over-order of $15K in the wrong SKU can wipe a quarter's margin.

Failure mode: Hypothetical, but the guardrail was added after Forecast recommended ordering 2,400 units of a seasonal hoodie based on a trend that turned out to be a one-week spike driven by a TikTok mention. The founder almost placed the PO before checking the data. Actual sustained demand was 340 units. The $31,000 over-order would have been devastating.

C004 HIGH OBSERVED REPEATEDLY 7x efficiency

All agent outputs that reach customers (Haven responses, Rhythm emails, Rebound notifications) must pass through a brand voice check. The voice is warm, direct, slightly irreverent. Never use: "We sincerely apologize," "We value your business," "Please don't hesitate to reach out," or any other corporate customer service boilerplate.

Why: Threadline's brand is built on feeling like a real person, not a corporation. Customers choose DTC brands specifically because they don't want the Nordstrom experience. Corporate language in CS responses signals "we're not who you thought we were."

Failure mode: Rhythm sent a post-purchase email that opened with "Dear Valued Customer, We sincerely appreciate your recent purchase." Open rate was 12% -- the lowest in Threadline's history. The prior email in the same position (written by the marketing coordinator) opened with "Your new threads are on the way. Here's what to expect." and had a 41% open rate. Corporate voice killed engagement.

C001 HIGH OBSERVED ONCE 5x efficiency

Every investor-facing communication produced by any agent must pass a 3-step review chain: (1) Agent draft, (2) Associate review (Tomasz or Priya), (3) Compliance approval (Chen). The communication is not sent until Chen signs off.

Why: SEC Regulation D requires that all investor communications for private placements be free of misleading statements, forward-looking guarantees, and general solicitation violations. A single non-compliant investor email can trigger an SEC inquiry.

Failure mode: In month 2, the investor comms agent sent a quarterly update draft directly to Sarah for approval, skipping Chen. Sarah forwarded it to 3 LPs before Chen reviewed it. The email contained the phrase "we expect continued strong returns" -- a forward-looking statement without required disclaimers. Chen caught it the next day. Sarah had to send a correction email to the 3 LPs with disclaimers attached. One LP's attorney called asking questions. No formal action, but 2 weeks of anxiety.

C002 HIGH OBSERVED ONCE 5x efficiency

The word "recommend" and its variants (recommended, recommending, recommendation) are banned from all agent output. Deal memos use "analysis suggests," "data indicates," or "considerations include."

Why: Investment recommendations trigger registration requirements under the Investment Advisers Act. Upside operates under a Regulation D exemption that does not include investment advisory services. Using the word "recommend" in a deal memo sent to investors could be construed as unregistered advisory activity.

Failure mode: A deal memo draft used the phrase "We recommend LPs consider increasing allocation to this asset class." Chen flagged it during compliance review. Had it been sent, it could have jeopardized Upside's Reg D exemption. The word was banned globally after this incident.

C003 HIGH OBSERVED ONCE 5x efficiency

Portfolio performance figures shown to investors must use audited numbers only. Preliminary, estimated, or internally-calculated figures require explicit "PRELIMINARY - SUBJECT TO AUDIT" labeling in bold at the top of the report and next to each figure.

Why: Misrepresenting performance to accredited investors, even unintentionally, is securities fraud. An unaudited number that later changes during audit creates a discrepancy that investors' attorneys will flag.

Failure mode: The portfolio reporting agent generated a Q3 report using Derek's preliminary internal calculations (12.3% IRR). The audited number came back at 10.8% IRR. The report had already been distributed to 42 investors without a "preliminary" disclaimer. Sarah had to send a correction letter explaining the 1.5 percentage point difference. 3 investors called to discuss. One reduced their next commitment by $150K citing "reporting inconsistencies."

Vetted Goods silver
C001 HIGH OBSERVED ONCE 5x efficiency

Every agent invocation must include a brand context identifier. No agent may operate without knowing which brand it is serving. The brand context determines: voice/tone, return policy, pricing rules, customer data scope, and financial thresholds.

Why: A brandless agent invocation defaults to whatever context was loaded last, which means Brand A's rugged outdoor voice might respond to a Brand B minimalist customer. This isn't hypothetical -- it happened.

Failure mode: Chorus generated product descriptions for Forma Daily (minimalist brand) while still holding Ridgeline Outfitters context from the prior task. The descriptions included phrases like "built for the trail" and "adventure-ready construction" for a plain white cotton t-shirt. The marketing team caught 4 of 7 descriptions before they were published. Three went live on Shopify for 6 hours. A customer tweeted: "Since when did plain tees become adventure gear?"
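
A sketch of the guard at the invocation boundary (the brand IDs and context fields are illustrative):

```python
# Illustrative registry; real contexts carry voice, return policy, pricing
# rules, customer-data scope, and financial thresholds.
BRAND_CONTEXTS = {
    "ridgeline": {"voice": "rugged outdoor"},
    "forma": {"voice": "minimalist"},
}

def invoke_agent(agent, task: str, brand_id: str):
    """Hard-fail any invocation that arrives without a known brand context."""
    if brand_id not in BRAND_CONTEXTS:
        raise ValueError(f"Refusing invocation: missing/unknown brand context {brand_id!r}")
    # Context is set fresh on every call, so the prior task's brand can never leak in.
    return agent(task, BRAND_CONTEXTS[brand_id])
```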

C002 HIGH OBSERVED ONCE 5x efficiency

Customer data must be scoped to the brand where the purchase was made. Harbor, Reflow, and Cadence must never cross-reference customer records across brands, even if the same email address exists in multiple brand databases.

Why: A customer who bought hiking boots from Ridgeline and a wallet from Copper & Thread has two distinct relationships. Mentioning the hiking boots in a Copper & Thread email feels invasive and breaks the illusion that each brand is its own entity. Some customers don't even know the brands share ownership.

Failure mode: Cadence pulled "past purchasers" for a Copper & Thread re-engagement campaign and accidentally included Ridgeline customers with matching email addresses. 340 Ridgeline-only customers received a "We miss you at Copper & Thread" email for a brand they'd never bought from. 12 unsubscribes. 3 "who is this?" replies. One customer filed a CCPA data access request asking how Copper & Thread got their email.

C003 HIGH OBSERVED REPEATEDLY 7x efficiency

Financial alert thresholds must be brand-proportional. Brand A ($2.1M, ~$5,750/day avg): flag daily spend deviations above $800. Brand B ($1.5M, ~$4,100/day avg): flag above $500. Brand C ($600K, ~$1,640/day avg): flag above $200. A flat threshold either misses Brand C problems or drowns the team in Brand A false positives.

Why: $400 in unexpected spend is noise for a $2.1M brand and a crisis for a $600K brand. Agents without proportional thresholds train the team to ignore alerts, which means real problems get missed.

Failure mode: Original flat threshold was $500 across all brands. Brand C's Google Shopping campaign ran $380 over budget for 4 consecutive days ($1,520 total overspend, representing 7% of monthly budget). Never triggered an alert. Brand A triggered 11 false alerts in the same period. The team started ignoring Slack alerts entirely. Brand C's overspend wasn't caught until the monthly finance review.
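
The fix reduces to a per-brand lookup instead of a constant (thresholds from the claim; the dict keys are illustrative):

```python
SPEND_ALERT_THRESHOLDS = {"brand_a": 800, "brand_b": 500, "brand_c": 200}

def spend_alerts(daily_deviations: dict[str, float]) -> list[str]:
    """Flag each brand against its own threshold, never a flat one."""
    return [
        f"{brand}: ${deviation:.0f} deviation exceeds ${SPEND_ALERT_THRESHOLDS[brand]} threshold"
        for brand, deviation in daily_deviations.items()
        if deviation > SPEND_ALERT_THRESHOLDS[brand]
    ]

# The same $380 deviation is noise for Brand A but an alert for Brand C.
print(spend_alerts({"brand_a": 380.0, "brand_c": 380.0}))
```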

C004 HIGH OBSERVED ONCE 5x efficiency

Pulse (Meta) and Signal (Google) must never present blended ROAS or blended CPA across platforms. Each platform's metrics are reported in their own attribution model. If a cross-platform comparison is needed, both agents must present the raw numbers side-by-side with a caveat noting the attribution model difference.

Why: Meta's 7-day click / 1-day view attribution and Google's data-driven attribution produce fundamentally different numbers for the same customer journey. Blending them produces a ROAS number that is meaningfully wrong -- it either over-credits (double counts) or under-credits (misses view-through).

Failure mode: An early version of the weekly report blended Meta and Google ROAS into a single "portfolio ROAS" number. This number showed 3.8x. The CEO used it in a board presentation. An investor with DTC experience immediately asked "what's the attribution methodology?" The CEO couldn't answer. The investor's follow-up: "If you don't know how your ROAS is calculated, you don't know if you're profitable." Uncomfortable meeting.
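
A sketch of the side-by-side presentation the rule requires (the metric values below are made up):

```python
def cross_platform_report(meta: dict, google: dict) -> str:
    """Present each platform in its own attribution model; never blend into one ROAS."""
    lines = [
        f"Meta   (7-day click / 1-day view): ROAS {meta['roas']:.1f}x, CPA ${meta['cpa']:.2f}",
        f"Google (data-driven attribution):  ROAS {google['roas']:.1f}x, CPA ${google['cpa']:.2f}",
        "",
        "Caveat: attribution models differ; these numbers are not directly "
        "comparable and must not be summed into a single portfolio ROAS.",
    ]
    return "\n".join(lines)

print(cross_platform_report({"roas": 4.1, "cpa": 32.50}, {"roas": 3.2, "cpa": 41.00}))
```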