Coordination Intelligence

Human-AI Boundary Conditions

79 claims from 32 organizations

Where human oversight is required and where agents have full autonomy. These claims define the trust boundary between human and AI decision-making. Getting this wrong either bottlenecks the organization (too much human oversight) or creates risk (too little).

Acme Digital Agency Founding gold
C013 HIGH HUMAN DEFINED RULE 5x efficiency

All external communications require human approval.

Why: AI-drafted communications can contain factual errors.

Failure mode: Email with incorrect numbers. Client loses trust.

C014 LOW SPECULATION 0.5x efficiency

Pricing and financial commitments require a human decision.

Why: Legal and relationship implications.

Failure mode: Agent applies discount violating margin floors.

C013 MEDIUM OBSERVED REPEATEDLY 4x efficiency

The creative director has veto power over any GPT-generated ad copy, regardless of account manager approval. No creative ships without CD review for accounts over $10K/mo.

Why: Account managers optimize for speed and client satisfaction. The creative director optimizes for brand integrity and long-term positioning. When AMs could approve and ship creative without CD review, three clients ended up with nearly identical ad copy structures (GPT's favored patterns). A client noticed their competitor (also our client) had similar-sounding ads. The creative director now reviews all high-spend accounts.

Failure mode: Without creative authority oversight, generative AI produces converging output across clients. Clients sharing a market notice the similarity. Agency credibility suffers.

C014 MEDIUM HUMAN DEFINED RULE 3x efficiency

No agent may create, modify, or delete HubSpot deal records. Agents read HubSpot. Humans write to HubSpot.

Why: Founding rule based on the VP of Media's experience at a previous agency where an automation moved 8 deals to "Closed Lost" based on a date-based rule that didn't account for extended negotiation timelines. Two of those deals were actively in conversation. The sales rep discovered the status change 3 days later and had to re-open them manually. One prospect noticed the "Closed" status in a shared HubSpot view and asked if they should look elsewhere.

Failure mode: Automated CRM writes based on rules or heuristics override active human judgment. Prospects and clients see status changes that don't reflect reality.
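
A minimal sketch of how a read-only boundary like this might be enforced in an agent's tool layer, assuming a hypothetical Python registry (Tool, AgentToolRegistry, and get_deal are illustrative names, not an actual HubSpot client). Write-capable tools are rejected at registration time, so no agent run can ever hold a write capability:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Tool:
    name: str
    fn: Callable
    writes: bool  # True if calling the tool would mutate CRM state

class AgentToolRegistry:
    """Agents only ever see tools that cannot write to the CRM."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        # The boundary is enforced at registration time, not call time:
        # a write tool can never end up in an agent's toolbox.
        if tool.writes:
            raise ValueError(f"{tool.name}: write tools are human-only")
        self._tools[tool.name] = tool

registry = AgentToolRegistry()
registry.register(Tool("get_deal", lambda deal_id: {"id": deal_id}, writes=False))
# registry.register(Tool("update_deal_stage", ..., writes=True))  # would raise
```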

C017 LOW OBSERVED ONCE 1.5x efficiency

Any communication to a client that includes performance data must be reviewed by the media buyer who manages that account, not just the account manager.

Why: An account manager sent a GPT-drafted email that cited "your CPA dropped 15% this month." The media buyer managing that account knew the CPA drop was because they'd shifted budget from prospecting to retargeting -- the CPA looked better but new customer acquisition had actually declined. The client replied asking to scale the "winning strategy," which would have meant cutting all prospecting. The media buyer intervened and reframed the conversation.

Failure mode: Accurate data presented without strategic context creates false narratives. Clients make budget decisions based on metrics that look good in isolation but mask underlying trade-offs.

C016 MEDIUM OBSERVED ONCE 3x efficiency

Franchisees may override any agent recommendation for their location. The override is logged but not challenged. Patterns of overrides (3+ on the same recommendation type) trigger a review with the regional manager.

Why: Franchisees know their local market. But repeated overrides of data-driven recommendations may indicate either a bad model or a struggling franchisee.

Failure mode: One franchisee overrode lead scoring 8 times in a month, insisting his "gut" was better. His location's close rate dropped from 28% to 14% over that period. The pattern trigger caught it and Lena intervened with coaching.
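
The override rule is concrete enough to sketch. A hypothetical Python log (all names illustrative) that records overrides without challenging them and flags when the same recommendation type has been overridden three or more times:

```python
from collections import Counter

REVIEW_THRESHOLD = 3  # 3+ overrides of the same recommendation type

class OverrideLog:
    """Overrides are logged, never challenged; patterns trigger review."""

    def __init__(self) -> None:
        self._events: list[tuple[str, str]] = []  # (franchisee, rec_type)

    def record(self, franchisee: str, rec_type: str) -> bool:
        """Log the override and return True when a regional-manager
        review should be scheduled."""
        self._events.append((franchisee, rec_type))
        counts = Counter(self._events)
        return counts[(franchisee, rec_type)] >= REVIEW_THRESHOLD

log = OverrideLog()
for _ in range(3):
    flagged = log.record("franchisee-042", "lead_scoring")
print(flagged)  # True -> schedule a review with the regional manager
```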

C017 LOW SPECULATION 0.5x efficiency

The corporate marketing director (Lena) is the single approval gate for all network-wide communications, budget reallocations, and new campaign launches. No agent bypasses Lena.

Why: Franchise networks require a human who understands both the brand and the individual franchisee relationships. Agents optimize for metrics; Lena optimizes for the franchise ecosystem.

Failure mode: Not yet violated. But the system was designed after a pre-agent incident where an intern launched a network-wide email campaign without Lena's review. The email contained a pricing error that generated $8,000 in honor-the-price requests.

C018 LOW SPECULATION 0.5x efficiency

Trainer scheduling decisions that affect trainer pay (shift additions, cancellations, or reassignments) require the location manager's approval within 2 hours. If no approval is received, the change does not go through.

Why: Trainers at Apex are paid per class. A cancelled class means lost income. Agent-driven cancellations without manager awareness create labor disputes.

Failure mode: Hypothesized based on the class occupancy incident (C008). When a class is cancelled due to low occupancy, the trainer loses income. If the cancellation decision was agent-driven without manager input, the trainer has no one to appeal to.
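
One way the two-hour default-deny window might look in code, with all names hypothetical. The essential property is that silence means the change is dropped, never applied:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

APPROVAL_WINDOW = timedelta(hours=2)

@dataclass
class ShiftChange:
    trainer: str
    action: str                             # "add" | "cancel" | "reassign"
    proposed_at: datetime
    approved_at: Optional[datetime] = None  # set by the location manager

def should_commit(change: ShiftChange) -> bool:
    """Default-deny: with no timely approval, the change does not go through."""
    if change.approved_at is None:
        return False
    return change.approved_at - change.proposed_at <= APPROVAL_WINDOW
```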

C015 HIGH OBSERVED ONCE 5x efficiency

Creative feedback on deliverables (color, composition, pacing, music) is exclusively human territory. Agents never comment on creative quality.

Why: The moment an agent says "this looks good" or "consider adjusting the pacing," the creative team feels judged by a machine. It's a psychological boundary, not a technical one.

Failure mode: An early version of the shot list agent included a note: "The pacing of Scene 3 feels slow compared to the brand's typical energy." Designer responded: "Who asked the spreadsheet for creative notes?" The feature was removed within the hour.

C016 MEDIUM OBSERVED ONCE 3x efficiency

Marcus is the only person who can modify agent behavior or prompts. The creative team can request changes but cannot directly edit agent configurations.

Why: If everyone tweaks the agents, behavior becomes unpredictable. Single owner means single accountability.

Failure mode: Designer changed the shot list agent's reference sources to include a competitor's work. Next output included shots clearly inspired by a competitor's recent campaign. Marcus caught it before the client saw it, but it could have been a legal issue.

C017 MEDIUM INFERENCE 2x efficiency

When a client asks "do you use AI?" the answer is always honest: "We use AI tools for project management and research. All creative work is done by our team."

Why: Clients will find out eventually. Being caught lying is worse than any stigma around AI tooling.

Failure mode: No direct failure yet, but Marcus rehearsed this answer after a peer agency was called out on Twitter for using AI-generated thumbnails and claiming they were hand-designed. The backlash cost that agency two clients publicly.

Atticus Legal bronze
C016 HIGH HUMAN DEFINED RULE 5x efficiency

Priya makes all legal decisions: distribution schemes, trustee selections, incapacity definitions, executor appointments, guardian nominations. The agent assembles. Priya decides.

Why: Estate planning involves irreversible decisions that affect families for generations. A trust that distributes assets per stirpes instead of per capita changes who inherits and how much. These decisions require understanding family dynamics, client values, tax implications, and state law nuances that no AI system can reliably navigate.

Failure mode: Agent selects a distribution scheme based on the most common pattern in its training data. Client's situation is non-standard but not flagged as COMPLEX. Trust is assembled with the wrong distribution scheme. Discovered after the client has passed. Beneficiaries litigate for two years. Priya's malpractice insurance covers the claim but her premium doubles.

C017 LOW HUMAN DEFINED RULE 1.5x efficiency

Client meetings, consultations, and signing ceremonies are always conducted by Priya in person. The agent prepares materials beforehand and processes notes afterward. No AI presence in client interactions.

Why: Estate planning conversations involve grief, family conflict, end-of-life wishes, and deeply personal financial disclosures. Clients need to feel that a human being is listening to them and understanding their family's unique situation. The trust built in those meetings is what generates referrals.

Failure mode: Client mentions during signing that she has been estranged from her son and might want to reconsider the trust terms. This is a pivotal moment that requires human empathy and legal judgment in real time. No AI system can navigate a conversation where a mother is reconsidering disinheriting her child.

C016 LOW INFERENCE 0.8x efficiency

Keisha personally handles all intake conversations with new families. No agent involvement until the student is enrolled and has a unique ID.

Why: The intake conversation is where trust is built. Parents are evaluating whether to trust a stranger with their child's education. Automation at this stage is a deal-killer.

Failure mode: No direct failure -- Keisha established this boundary from day 1 after observing that 90% of her competitors use automated intake forms. She attributes her 85% intake-to-enrollment rate (vs. industry average of 40%) to the personal touch.

C017 LOW INFERENCE 0.8x efficiency

Any communication involving a student's behavioral issues, parent concerns, or sensitive topics is drafted entirely by Keisha. Agents have no role in sensitive communications.

Why: A tutor once reported that a student was "consistently distracted and possibly dealing with something at home." This requires human empathy and careful word choice that no agent can replicate.

Failure mode: No direct failure -- Keisha pre-empted this after the progress agent included the tutor's raw note about a student seeming "sad and distracted" in a draft report. Keisha caught it in review and realized the agent had no judgment about what raw notes are appropriate to surface to parents. She immediately excluded all behavioral/emotional observations from agent processing.

Candor Labs bronze
C012 MEDIUM OBSERVED ONCE 3x efficiency

The founder reviews and approves every support response before it reaches a user. No auto-responses, even for FAQ-type questions.

Why: At $12K MRR with 340 users, every support interaction is a retention opportunity. The triage agent drafted a response to a user question about API rate limits: "Rate limits are documented at docs.candor.dev/limits." Technically correct but cold. The user was an $89/mo enterprise plan customer who was evaluating whether to expand to their team. The founder rewrote it as a personal message, offered a 15-minute call, and closed a $267/mo team plan upgrade. The auto-response would have answered the question and missed the expansion signal.

Failure mode: Efficient auto-responses miss relationship signals embedded in support interactions. Users with expansion intent receive transactional answers. Revenue opportunity is invisible to the agent.

C013 MEDIUM OBSERVED REPEATEDLY 4x efficiency

No agent may label an issue as "won't fix" or "not a bug." Only the founder assigns terminal states.

Why: The triage agent categorized a user report as "not a bug -- expected behavior" based on the documentation. The user's point was that the expected behavior was confusing and should be redesigned. The founder agreed with the user. The agent's categorization would have closed a valid UX improvement because the agent couldn't distinguish between "works as documented" and "documentation reflects a poor design choice."

Failure mode: Agents apply terminal labels based on current system behavior. They cannot evaluate whether the current behavior should change. Valid improvement requests are closed as non-issues.

C016 HIGH OBSERVED ONCE 5x efficiency

Strategic recommendations in deliverables must be written by the consultant, not the agent. Agents provide data, structure, and draft analysis. The "so what" and "we recommend" sections are human-only.

Why: Clients are paying for experienced human judgment. Agent-generated recommendations lack the nuanced understanding of client politics, organizational readiness, and implementation feasibility that comes from years of consulting experience.

Failure mode: Lens drafted a recommendation to "consolidate the client's 3 distribution centers into 1 regional hub." Analytically sound based on the data. Politically catastrophic because the CEO's brother manages one of the centers slated for closure. The consultant caught it, but only because she knew the client's internal dynamics.

C017 MEDIUM OBSERVED ONCE 3x efficiency

Pricing decisions, discount approvals, and scope change negotiations are exclusively human decisions. No agent may suggest, imply, or commit to pricing in any client-facing communication.

Why: Pricing in consulting is relationship-dependent, not formula-dependent. The same scope of work might be priced at $75K for a new client and $55K for a long-term relationship with expansion potential. Agents cannot evaluate these trade-offs.

Failure mode: Archer included a "preferred client discount of 10%" in a renewal proposal because the client had been with us for 3 years. The partner had intended to increase rates by 5% given the expanded scope. The client saw the discount language and anchored to it. We lost $18,000 in annual margin.

C012 HIGH SPECULATION 1.5x efficiency

Jamie has final override on all agent recommendations. No agent escalates past Jamie to staff or members directly.

Why: Jamie is the brand. Members know Jamie. An agent that bypasses Jamie breaks the chain of trust that makes CoreFit work.

Failure mode: Not yet violated, but the boundary was tested when the retention agent flagged a high-value member and the draft "save" email sat in Jamie's queue for 3 days. The member cancelled before Jamie reviewed it. Now Jamie has a 24-hour SLA on retention flags.

C013 MEDIUM SPECULATION 1x efficiency

Trainers are not told which recommendations come from agents vs. Jamie's own analysis. The agent layer is invisible to staff.

Why: Trainers who know "a computer" is evaluating them behave differently -- they game metrics instead of improving. Jamie learned this from a gym owner friend who made the mistake of being transparent about it.

Failure mode: Hypothesized based on industry peer experience. One gym that revealed AI performance tracking saw trainers artificially inflating class check-ins by having members scan twice.

C018 LOW SPECULATION 0.5x efficiency

Members who explicitly ask "Is this automated?" or "Am I talking to a bot?" must receive an honest answer routed through Jamie. No agent is authorized to deny being AI.

Why: Honesty is a brand value. Getting caught lying about automation is worse than admitting it.

Failure mode: Hypothesized. Not yet asked directly, but Jamie has a templated response ready: "We use AI tools to help us stay on top of things, but every message is reviewed by a real person before it's sent to you."

DevForge silver
C017 LOW INFERENCE 0.8x efficiency

Kai personally writes all responses to first-time contributors. Their first interaction with the project sets the tone for their entire contribution arc.

Why: Open-source projects live and die by contributor retention. A first-time contributor who feels welcomed submits 5x more PRs over the following year than one who gets a generic response.

Failure mode: No direct failure -- Kai established this rule after tracking contributor retention rates. Contributors whose first PR got a personal, detailed review had a 6-month retention rate of 45%. Contributors who got a brief "LGTM, merged" had a 6-month retention rate of 12%. The personal touch is the single biggest lever for community growth.

C018 LOW INFERENCE 0.8x efficiency

Security vulnerability reports are handled exclusively by Kai. No agent reads, processes, or drafts responses to security reports.

Why: Security reports contain exploit details. Routing them through any automated system increases the attack surface and the risk of accidental disclosure.

Failure mode: No direct failure, but Kai established this rule after reading about an open-source project where an AI assistant summarized a security report and included the exploit details in a public issue comment. The vulnerability was exploited within hours of the comment being posted. Kai will not risk this with DevForge's 2,400-star community.

C016 MEDIUM SPECULATION 1x efficiency

Maya (compliance officer) has absolute veto on any agent output that could be interpreted as financial advice, investment guidance, or credit-related recommendations. Her veto cannot be overridden by Raj or any other team member.

Why: Regulatory penalties for unlicensed financial advice start at $10,000 per incident. Maya's authority is the last line of defense.

Failure mode: Not yet violated. Raj once pushed back on a content veto ("it's just a blog post"), but Maya held firm and Raj backed down after she cited the CFPB enforcement action against a competitor for similar language. The near-miss reinforced Maya's authority.

C017 LOW SPECULATION 0.5x efficiency

The engineering team reviews all API monitoring alert thresholds quarterly. The monitoring agent proposes threshold adjustments based on observed data, but the engineering lead approves changes.

Why: Thresholds that are too sensitive create alert fatigue. Thresholds that are too loose miss real incidents. The engineering team understands the infrastructure nuances that the monitoring agent does not.

Failure mode: Hypothesized. The current thresholds were set based on 6 months of data and have not yet needed adjustment. The quarterly review exists to prevent threshold drift as the user base grows and API patterns change.

C018 LOW SPECULATION 0.5x efficiency

No agent may access or reference a user's actual financial position (total balance, spending patterns, debt levels) when generating any external communication. Users are addressed as "users" not as segments defined by their financial status.

Why: Segmenting users by wealth or spending level in communications ("We noticed you spend a lot on dining out") feels invasive and can violate fair-lending-adjacent regulations depending on how the data is used.

Failure mode: Hypothesized based on a competitor incident. A competing app sent an email to "high spenders" with personalized savings tips. Users who received it felt profiled. The app's subreddit had 200+ comments in a thread titled "How does [app] know how much I spend?"

C017 HIGH HUMAN DEFINED RULE 5x efficiency

No agent provides legal advice, case assessments, or settlement recommendations to clients. Agents process data and generate drafts. Attorneys exercise legal judgment.

Why: The unauthorized practice of law is a criminal offense in North Carolina. An AI system that provides legal advice to a client without attorney supervision exposes the firm to criminal liability, bar discipline, and malpractice claims simultaneously.

Failure mode: Client emails asking "Should I accept their offer of $85K?" Comms agent responds with "Based on similar cases, $85K appears below the expected range." Client relies on this and rejects the offer. Settlement falls through. Client sues the firm arguing the AI provided bad legal advice.

C018 LOW HUMAN DEFINED RULE 1.5x efficiency

Agents generate work product that is reviewed under attorney supervision. This preserves work product privilege. If an agent generates a document without attorney involvement in the process, the privilege may not attach, making the document discoverable.

Why: The firm's ethics counsel determined that AI-generated drafts are protected under work product doctrine only if an attorney directed the creation and reviewed the output. Documents generated autonomously by an agent without attorney direction may not qualify for privilege protection.

Failure mode: Demand agent autonomously generates a case evaluation memo without attorney direction. Opposing counsel argues the memo is not privileged because no attorney supervised its creation. Judge agrees. Internal case strategy is disclosed to the opposing side. Settlement leverage destroyed.

C017 HIGH HUMAN DEFINED RULE 5x efficiency

Pricing strategy, listing presentations, and buyer consultations are human-only. AI handles data preparation. The human agent presents the strategy, reads the client, adjusts in real time, and builds the relationship that generates referrals.

Why: Real estate is a relationship business. The top-producing agent at Keystone (Maria, $420K GCI) attributes 70% of her business to referrals. Those referrals come from personal connections built during high-stakes negotiations, not from listing descriptions or comp reports. The AI makes Maria faster. Maria makes the clients trust the brokerage.

Failure mode: Over-reliance on AI-generated materials causes human agents to under-prepare for client interactions. An agent shows up to a listing presentation with the AI's comp report but has not walked the neighborhood or researched the seller's motivation. Seller chooses a competitor who did the legwork.

C018 LOW HUMAN DEFINED RULE 1.5x efficiency

Negotiation is exclusively human. The AI does not draft counteroffers, suggest negotiation tactics, or communicate with the other party's agent. Negotiation requires reading emotional cues, understanding motivations, and making judgment calls that an AI cannot reliably make.

Why: A buyer's agent at another brokerage reportedly used AI to draft a counteroffer response. The response was technically sound but tonally aggressive. The seller's agent took offense. What should have been a routine $5K negotiation became adversarial. Deal nearly fell through over tone, not terms.

Failure mode: AI drafts a counteroffer response that is legally correct but emotionally tone-deaf. Seller's agent perceives disrespect. Relationship deteriorates. Negotiation that should close in 2 rounds extends to 5. Deal falls through. Buyer loses the home. Agent loses the commission. Everyone blames the tone of a single email.

KGORG Founding silver
C008 HIGH HUMAN DEFINED RULE 5x efficiency

Seats with approval_required or supervised execution modes indicate that autonomy is intentionally limited in areas tied to personal operations, communications, or medium-risk work.

Why: Governance is being used as an explicit boundary mechanism, not just as documentation.

Failure mode: If execution mode is ignored, agents may begin acting beyond the risk tolerance set for a solo personal operating system.

C018 HIGH HUMAN DEFINED RULE 5x efficiency

Personal-development and creative seats are configured with low authority and approval-heavy execution, signaling that introspection, voice, and personal expression remain human-led domains.

Why: These areas affect identity, judgment, and public representation, so the system keeps AI in a support role rather than granting broad autonomy.

Failure mode: If these seats act too freely, generated reflections, coaching guidance, or public writing may drift from the principal's authentic voice or values.

C016 HIGH OBSERVED ONCE 5x efficiency

Clinical judgment is exclusively human. Agents may present data (utilization rates, no-show patterns, payer statistics), but clinical decisions -- treatment plans, discharge timing, referral decisions, and patient communication about medical topics -- are exclusively the domain of licensed clinicians.

Why: Agents are not licensed healthcare providers. Any agent output that could be interpreted as clinical guidance exposes the practice to malpractice liability. The boundary isn't about capability -- it's about licensure and liability.

Failure mode: Flow's scheduling analysis recommended "reducing average appointment duration from 45 minutes to 30 minutes for follow-up visits based on utilization data." This is a clinical decision disguised as a scheduling recommendation. The clinical director rejected it because 45-minute follow-ups are the clinical standard of care for the types of injuries treated. Shorter appointments would compromise treatment quality. Flow was updated to propose scheduling changes that don't alter appointment durations without clinical director approval.

C017 LOW INFERENCE 0.8x efficiency

Patient-facing communication -- appointment reminders, recall messages, satisfaction surveys, and any text or email that a patient receives -- must be reviewed by the practice manager before sending. Agents may draft but never send.

Why: Patients associate all communications from the practice with their healthcare provider. An impersonal or tone-deaf automated message can damage the therapeutic relationship. Worse, a message that inadvertently references treatment details is a HIPAA violation.

Failure mode: Beacon drafted an appointment reminder template that included: "Looking forward to seeing you for your [treatment_type] session!" The template variable was designed to pull from the de-identified appointment type field (e.g., "physical therapy"). But for specialized appointments, the field contained more specific values like "post-surgical rehabilitation" -- which is a clinical detail that constitutes PHI when linked to the patient receiving the reminder. The practice manager caught it and replaced the variable with a generic "your upcoming appointment."
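
The fix the practice manager's catch implies can be sketched as an allowlist on template variables: only generic, pre-approved labels may be interpolated into patient-facing text. The labels and function names below are illustrative assumptions, not the practice's actual system:

```python
# Generic, non-clinical labels that are safe in patient-facing messages.
SAFE_LABELS = {"physical therapy", "follow-up", "evaluation"}

def reminder_text(appointment_type: str) -> str:
    # Anything outside the allowlist (e.g. "post-surgical rehabilitation")
    # falls back to a generic phrase, so clinical detail never reaches the patient.
    if appointment_type in SAFE_LABELS:
        return f"Looking forward to seeing you for your {appointment_type} session!"
    return "Looking forward to seeing you at your upcoming appointment!"

assert "rehabilitation" not in reminder_text("post-surgical rehabilitation")
```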

C016 HIGH HUMAN DEFINED RULE 5x efficiency

Lease negotiations, rent increase conversations, and non-renewal notices are always delivered by Corinne in person or by phone. The AI prepares data packets (market comps, payment history, maintenance costs for the unit) but Corinne delivers the message.

Why: A rent increase conversation requires reading the tenant's reaction, understanding their financial situation, and sometimes making a judgment call that preserves a good tenant at slightly below-market rent. The 2% increase for Unit 4B (C006) saved a $3,200 turnover. That judgment cannot be automated.

Failure mode: AI delivers a rent increase notification by email. Tenant sees "your rent is increasing 6%" with no human context. Feels like a corporation, not a landlord who knows them. Tenant does not negotiate. Tenant leaves. Corinne finds out when she sees the 30-day notice.

C017 LOW HUMAN DEFINED RULE 1.5x efficiency

Tenant disputes, habitability complaints, and any communication that references legal action or "my rights" are immediately routed to Corinne with no AI response. The comms agent sends only: "Thank you for contacting us. Your property manager Corinne will be in touch within 24 hours." No acknowledgment of the substance of the complaint.

Why: Virginia landlord-tenant law is specific and adversarial. A poorly worded AI response to a habitability complaint could constitute an admission. A tenant wrote "the mold in my bathroom is a health hazard and I know my rights." The comms agent drafted a response that included "we understand your concern about the mold." Corinne intercepted it. The word "mold" in a management communication could trigger remediation obligations under Virginia code regardless of whether actual mold was confirmed.

Failure mode: AI acknowledges "mold" in a response. Tenant screenshots the message. Files a habitability complaint citing management's acknowledgment. Virginia code triggers mandatory remediation timeline once landlord has "notice." AI response becomes evidence of notice. Remediation cost: $4,000-$12,000 depending on scope. All because the AI used the tenant's own word back to them.
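
A sketch of the routing rule, assuming hypothetical trigger phrases beyond the ones the source names ("my rights", the mold incident). The key property is that the fixed acknowledgment never echoes the tenant's own wording, so nothing in it can serve as evidence of notice:

```python
import re

# Phrases that force immediate human routing with a fixed acknowledgment.
ESCALATION_PATTERNS = [
    r"\bmy rights\b", r"\blegal\b", r"\battorney\b",
    r"\bhabitab", r"\bmold\b", r"\bhealth hazard\b",
]
FIXED_REPLY = ("Thank you for contacting us. Your property manager "
               "Corinne will be in touch within 24 hours.")

def draft_routine_reply(message: str) -> str:
    return "Thanks for reaching out. We'll follow up shortly."  # placeholder path

def triage(message: str) -> tuple[str, bool]:
    """Return (reply, escalated); escalated messages get only the fixed reply."""
    if any(re.search(p, message, re.IGNORECASE) for p in ESCALATION_PATTERNS):
        return FIXED_REPLY, True
    return draft_routine_reply(message), False

reply, escalated = triage("The mold in my bathroom is a health hazard.")
assert escalated and "mold" not in reply
```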

Learnwell silver
C017 LOW INFERENCE 0.8x efficiency

Priya personally handles all teacher onboarding calls. No agent involvement during live conversations with teachers. Agents prepare briefing docs before calls and log notes after.

Why: Teachers evaluate EdTech tools based on whether the people behind them understand teaching. A knowledgeable human on an onboarding call converts at 72%. An email-only onboarding converts at 18%.

Failure mode: No direct failure -- Priya tested skipping onboarding calls for a cohort of 10 teachers and relying on automated email sequences. 2 of 10 adopted the platform (20%) vs. the typical 7 of 10 (70%). The 8 who didn't adopt cited "didn't feel like they understood my classroom needs" in exit surveys.

C018 LOW INFERENCE 0.8x efficiency

Student data is never used for marketing, case studies, or testimonials without explicit written consent from the student (and parent if under 18). Agents cannot access student data for any purpose outside of direct service delivery.

Why: EdTech companies using student data for marketing have been targeted by advocacy groups and state attorneys general. The legal and reputational risk is existential for a company Learnwell's size.

Failure mode: Engagement metrics agent compiled a "success story" dataset showing students who improved test scores by 20%+ while using the platform. The teacher outreach agent nearly included specific improvement percentages in outreach emails. Sam caught it in review. If those numbers had been sent without student consent, it would have violated FERPA guidelines and potentially triggered a state investigation. One complaint could have shut down the company.

McFadyen Digital Founding silver
C019 HIGH HUMAN DEFINED RULE 5x efficiency

Pricing decisions, discount approvals, and engagement scoping are human-only. AI agents may suggest pricing based on historical data, but the CRO or a named VP must approve all commercial terms.

Why: Our pricing varies dramatically based on platform complexity, client maturity, offshore/onshore mix, and strategic account value. The Sales Agent once suggested a $280K price for an engagement that the CRO priced at $450K because of strategic upsell potential the AI could not see. The human context on relationship dynamics and long-term account value is irreplaceable.

Failure mode: Under-pricing erodes margin. Over-pricing loses deals. Both damage the business differently, and AI cannot weigh the tradeoffs with sufficient context on relationship history and strategic intent.

C020 MEDIUM HUMAN DEFINED RULE 3x efficiency

Hiring decisions, performance evaluations, and team composition for client engagements are human-only. AI may surface utilization data, skills matching, and availability, but the staffing decision is made by the delivery lead.

Why: Staffing a $1M marketplace build is not a skills-matching problem. It requires understanding team dynamics, growth opportunities for junior staff, client personality fit, and timezone overlap preferences. AI sees the spreadsheet. Humans see the team.

Failure mode: An early experiment with AI-recommended staffing suggested putting two senior developers with known collaboration friction on the same engagement because their skills were complementary on paper. The delivery lead overrode it. We formalized the boundary.

C011 HIGH OBSERVED REPEATEDLY 7x efficiency

No agent may send any communication to a client -- email, Slack, SMS -- without explicit human approval for the first 30 days.

Why: The Saturday morning auto-send incident (see C007). Twelve clients received decontextualized performance alerts. One client escalated to their CMO, who called our founder to "discuss the relationship." The client didn't churn, but the founder spent 4 hours that weekend in damage control. The agent was technically correct. The communication was still harmful.

Failure mode: Technically accurate but contextually tone-deaf messages damage client relationships.

C012 HIGH OBSERVED REPEATEDLY 7x efficiency

The founder's override is final. If the founder says "ignore this alert," the agent marks it suppressed with a reason and timestamp, and does not resurface it for 7 days.

Why: The ad monitor flagged a $200/day overspend on a client running a 2-week promotional push. The founder dismissed it. The agent re-flagged it the next day. And the next. For 8 consecutive days. The founder started ignoring all alerts from that agent. He missed a real $1,800/day overspend on a different client 3 days later because he'd trained himself to skip the alerts.

Failure mode: Persistent re-alerting on dismissed items causes the human to tune out the entire alert channel. Real emergencies get buried.
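
A minimal sketch of the suppression mechanic described above, with hypothetical class and key names: a dismissal records a reason and timestamp and keeps that alert key silent for seven days:

```python
from datetime import datetime, timedelta, timezone

SUPPRESSION_WINDOW = timedelta(days=7)

class AlertSuppressor:
    """A founder dismissal is final: the alert stays silent for 7 days."""

    def __init__(self) -> None:
        self._suppressed: dict[str, tuple[str, datetime]] = {}

    def dismiss(self, alert_key: str, reason: str) -> None:
        # Reason and timestamp are recorded alongside the suppression.
        self._suppressed[alert_key] = (reason, datetime.now(timezone.utc))

    def should_fire(self, alert_key: str) -> bool:
        entry = self._suppressed.get(alert_key)
        if entry is None:
            return True
        _, suppressed_at = entry
        if datetime.now(timezone.utc) - suppressed_at >= SUPPRESSION_WINDOW:
            del self._suppressed[alert_key]  # window over; eligible to resurface
            return True
        return False
```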

C018 LOW INFERENCE 0.8x efficiency

Budget change recommendations require human approval regardless of agent confidence. No agent may increase or decrease ad spend autonomously.

Why: Non-negotiable from day one. The founder watched another agency's automation tool increase a client's daily budget from $150 to $450 during a conversion spike that turned out to be bot traffic. The client was billed $2,700 in wasted spend before a human caught it. Meridian's agents flag pacing anomalies. Humans decide the response.

Failure mode: Autonomous budget changes amplify bad signals. Bot traffic, attribution glitches, and seasonal spikes all look like "good performance" to an algorithm.

C006 HIGH HUMAN DEFINED RULE 5x efficiency

Five mandatory human review gates exist at Steps 3, 4, 6, 8, and 9. AI-generated output at these steps cannot proceed without explicit human approval.

Why: These gates protect against the five highest-risk failure modes: scope drift (Step 3), technology lock-in (Step 4), data model errors (Step 6), assumption blindness (Step 8), and false validation (Step 9).

Failure mode: Team skips the Step 8 review gate ("What Are We Missing?"). AI-generated evaluation confirms the project is on track. A lifecycle implication (regulatory compliance, data migration, support staffing) is missed. It surfaces in production.
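
The gate list is concrete enough to express directly. A sketch with a hypothetical advance() helper; the step numbers and risk labels come from the claim itself:

```python
REVIEW_GATES = {
    3: "scope drift",
    4: "technology lock-in",
    6: "data model errors",
    8: "assumption blindness",
    9: "false validation",
}

def advance(step: int, human_approved: bool = False) -> None:
    """AI output at a gated step cannot proceed without explicit approval."""
    if step in REVIEW_GATES and not human_approved:
        raise PermissionError(
            f"Step {step} is a mandatory human gate ({REVIEW_GATES[step]})"
        )

advance(5)                       # ungated step proceeds
advance(8, human_approved=True)  # gated step proceeds only with approval
```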

C007 MEDIUM OBSERVED REPEATEDLY 4x efficiency

Technology selection (Step 4) requires human evaluation of AI-recommended options. The standard is "happy with" not "optimal." AI presents options with tradeoffs. The human decides.

Why: AI optimizes for technical criteria. Humans weigh organizational readiness, team skill gaps, vendor relationships, and risk tolerance -- factors AI cannot fully assess.

Failure mode: AI recommends a technically superior framework. The team has no experience with it. Learning curve delays the project by two months. A familiar, adequate framework would have shipped in three weeks.

C008 HIGH HUMAN DEFINED RULE 5x efficiency

Business purpose validation (Step 9) requires the human practitioner to define testable hypotheses with explicit pass/fail criteria before AI generates validation artifacts.

Why: If AI generates both the hypotheses and the validation criteria, confirmation bias is structurally embedded. The human must define what "pass" looks like before AI checks.

Failure mode: AI generates hypotheses and their validation tests. The tests are subtly aligned to confirm the hypotheses. The project "passes" validation but the business purpose is not actually met. Failure surfaces post-launch.

C009 MEDIUM OBSERVED ONCE 3x efficiency

Data structure and domain modeling (Step 6) requires human review of AI-generated models against real-world domain knowledge. AI models from requirements. Humans validate against operational reality.

Why: AI domain models are structurally correct but can miss implicit domain constraints, edge cases, and business rules that exist only in practitioners' heads.

Failure mode: AI generates a clean data model for a scheduling system. The model does not account for a business rule that certain resource types cannot overlap. The constraint is discovered in production when double-bookings occur.

C010 MEDIUM INFERENCE 2x efficiency

Step 8 (Comprehensive Evaluation) is a mandatory adversarial review. The human practitioner asks "What are we missing?" and challenges every assumption. AI generates the evaluation. The human decides what to act on.

Why: AI is poor at identifying its own blind spots. A structured adversarial review forces examination of lifecycle implications, deployment concerns, support models, and edge cases that AI-generated artifacts tend to omit.

Failure mode: Team treats Step 8 as a rubber stamp. AI reports "no issues found." The team proceeds. A data migration requirement surfaces during deployment that adds four weeks to the timeline.

C015 LOW INFERENCE 0.8x efficiency

The founder must personally write the first and last paragraphs of every deliverable. The opening sets the strategic frame. The closing commits to next steps. Both must carry the founder's authentic voice and judgment. Forge handles the middle.

Why: Clients read the opening and closing most carefully. These are the sections that convey whether the consultant truly understands their situation and is committed to the outcome. Agent-written bookends, no matter how polished, lack the specificity and conviction that comes from genuine understanding.

Failure mode: The founder let Forge write an entire board memo end-to-end, only making minor edits. The board chair later told the founder privately: "The analysis was solid but your memo didn't sound like you. It sounded like it could have come from anyone." The relationship was strong enough to survive the comment, but it was a warning.

C016 LOW SPECULATION 0.5x efficiency

Relationship decisions are exclusively human. Which clients to pursue, which to fire, how to handle a difficult conversation, when to push back on scope creep, when to discount -- these decisions involve judgment that agents cannot replicate. Agents provide data. The founder decides.

Why: A solo consultant's practice IS the founder's relationships. An agent optimizing for revenue might suggest pursuing a client that the founder knows is culturally toxic. An agent optimizing for efficiency might suggest firing a low-revenue client who provides strategic referrals.

Failure mode: Tempo flagged a client as "low engagement, low revenue, recommend deprioritizing" based on meeting frequency and invoice amounts. The client was the founder's first customer, a consistent referral source, and a personal mentor to the founder. The recommendation, while data-rational, was relationship-blind. The founder ignored it, but the incident highlighted the need for an explicit boundary.

C017 HIGH HUMAN DEFINED RULE 5x efficiency

Founder has unlimited override authority over all agents.

Why: A human must always be able to stop any AI action.

Failure mode: Agent publishes unapproved spec change. No way to reverse.

C018 LOW SPECULATION 0.5x efficiency

IP strategist has kill authority on the entire venture.

Why: External kill authority prevents sunk-cost fallacy.

Failure mode: Market thesis invalidated but founder keeps building.

C016 HIGH OBSERVED ONCE 5x efficiency

The creative team can see and modify any agent output at any time. There is no "agent-only" workflow. Every agent output file is in a shared Google Drive folder that the entire team can access.

Why: Transparency is what saved the agent program after the month-3 crisis. Kai's specific complaint was "I didn't know this existed until I saw a brief I didn't write." Making everything visible rebuilt trust. Hidden workflows breed suspicion.

Failure mode: The month-3 crisis. The brief generator had been running for 6 weeks before the creative team discovered it. The secrecy (unintentional -- Diego and Mara just hadn't announced it) made it feel like a replacement rather than a tool.

C017 LOW SPECULATION 0.5x efficiency

Mara positions agents to the team as "first draft tools" -- they handle the blank page problem so the creative team starts from something rather than nothing. This framing was negotiated with Kai and Nina during the post-crisis redesign.

Why: The psychological difference between "AI does the creative work" and "AI handles the blank page so you can start further along" is enormous. The first threatens identity. The second saves time.

Failure mode: The original positioning was implicit -- Mara never explained what the agents were for. Kai and Nina assumed the worst: "They're trying to replace us." The explicit "first draft tool" framing resolved the identity threat.

C018 LOW SPECULATION 0.5x efficiency

Clients are not informed that agents are involved in any part of the process unless they ask directly. If asked, Mara's response is: "We use AI tools for research and first drafts. Everything you see has been created and reviewed by our team."

Why: The boutique agency premium depends on perceived human craft. Proactively disclosing AI involvement invites clients to question whether they're getting human-quality work, even when they are.

Failure mode: Hypothesized. No client has asked directly. But the team agreed on the disclosure language to prevent improvised responses that might say too much or too little. The policy is: honest if asked, quiet if not.

R3V Founding gold
C004 HIGH OBSERVED REPEATEDLY 7x efficiency

AI is allowed to read broadly across operational systems, but write actions with customer or operational consequences are constrained by approval settings, sensitive-tool controls, or explicit orchestration.

Why: The tool registry shows a meaningful split between low-sensitivity read tools and higher-sensitivity write tools requiring approval, especially in Switchboard and email/task operations.

Failure mode: Without this boundary, AI can create tasks, send communications, modify records, or close work items prematurely, causing customer confusion and internal cleanup work.
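
A sketch of the read/write split as a sensitivity table. The dotted tool names are hypothetical stand-ins for the Switchboard, task, and Gmail operations the registry describes:

```python
from enum import Enum

class Mode(Enum):
    AUTONOMOUS = "autonomous"        # low-sensitivity reads
    APPROVAL = "approval_required"   # sensitive writes

TOOL_MODES = {
    "switchboard.read_queue": Mode.AUTONOMOUS,
    "gmail.search":           Mode.AUTONOMOUS,
    "tasks.create":           Mode.APPROVAL,
    "tasks.complete":         Mode.APPROVAL,
    "workitems.transfer":     Mode.APPROVAL,
    "gmail.send":             Mode.APPROVAL,
}

def execute(tool: str, approved: bool = False) -> None:
    if TOOL_MODES[tool] is Mode.APPROVAL and not approved:
        raise PermissionError(f"{tool} requires human approval before execution")
    # ...dispatch to the real tool implementation here
```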

C014 MEDIUM INFERENCE 2x efficiency

Human operators remain the final authority for exceptions, irreversible changes, and policy edge cases even when AI handles most preparation work.

Why: Sensitive tools such as task creation, task completion, work-item transfers, and outbound Gmail sending are gated or approval-sensitive, indicating that AI prepares and recommends more broadly than it independently finalizes.

Failure mode: If edge-case judgment is handed to automation without oversight, the org risks operational mistakes that are technically valid but contextually wrong.

C018 LOW SPECULATION 0.5x efficiency

AI should be autonomous for retrieval, summarization, classification, and routine coordination when guardrails are present, but not for unconstrained cross-system decision making.

Why: Current platform health shows strong recent run reliability, while the architecture still preserves boundaries around sensitive writes and escalation. This suggests the org's autonomy model is conditional, not absolute.

Failure mode: If the org either over-restricts or over-trusts AI, it will either leave efficiency on the table or create brittle automation with poor human confidence.

C011 MEDIUM OBSERVED ONCE 3x efficiency

The founder reviews every client-facing report before delivery. No report goes out on auto-send, regardless of agent accuracy track record.

Why: With 12 clients and a small team, every relationship is critical. One bad report costs more to repair than the 15 minutes of daily review. We tried auto-sending for 3 "stable" clients for 2 weeks. In week 2, one report included a negative ROAS figure due to a conversion tracking gap. The client screenshotted it and sent it to a competitor agency for a second opinion. We kept the client, but the competitor now had our reporting format and our client's data.

Failure mode: Auto-sent reports with errors create competitive vulnerability. Clients share bad reports with competitors. Agency loses information asymmetry.

C012 MEDIUM OBSERVED ONCE 3x efficiency

Budget recommendations require the media buyer's sign-off, not just the founder's.

Why: The founder is a strategist, not a tactician. He approved a budget increase recommendation from the audit agent without consulting the media buyer. The media buyer knew the account had just changed bidding strategies and needed 2 weeks of learning period data before increasing spend. The budget increase during the learning period inflated CPL by 55% for 10 days.

Failure mode: Strategic decision-maker approves tactical changes without practitioner input. Agent recommendation + founder authority skips the person who understands the operational context.

C016 LOW HUMAN DEFINED RULE 1.5x efficiency

Any agent action that touches a client's ad account (pause, enable, budget change, bid adjustment) requires two-human approval: the media buyer AND the founder.

Why: This has never been violated because it was a founding rule. The founder worked at an agency where an automated rule paused a client's best-performing campaign on a Friday evening. The campaign was off for the entire weekend -- $3,800 in lost leads for a personal injury lawyer whose weekend calls are highest-value. Two-human approval is slow. Sightline accepts the latency.

Failure mode: Single-approval automation acts on stale or narrow context. Weekend and evening actions go unreviewed. High-value campaigns pause during peak periods.
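
The two-human rule reduces to a set comparison. A sketch with hypothetical role names:

```python
REQUIRED_APPROVERS = {"media_buyer", "founder"}

def can_touch_ad_account(action: str, approvals: set[str]) -> bool:
    """Pause/enable/budget/bid actions need BOTH humans, every time."""
    return REQUIRED_APPROVERS <= approvals

assert not can_touch_ad_account("pause_campaign", {"media_buyer"})
assert can_touch_ad_account("budget_change", {"media_buyer", "founder"})
```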

Sneeze It Founding gold
C017 MEDIUM HUMAN DEFINED RULE 3x efficiency

Strategy calls with clients are always human-only. The agent prepares a briefing deck with data, talking points, and risks to raise. The human runs the call. The agent processes meeting notes afterward.

Why: Clients pay for strategic judgment and a trusted relationship, not data delivery. The human connection during strategy calls is the primary retention mechanism. AI handles the preparation so the human shows up fully informed.

Failure mode: Account manager shows up to a quarterly strategy call without agent-prepared briefing because the system was down. Client asks about a performance trend the AM has not reviewed. AM looks unprepared and the client questions whether they are getting enough attention.

C022 HIGH HUMAN DEFINED RULE 5x efficiency

All external communications require founder approval. All pricing, contracts, and financial commitments are founder-only. All hiring and firing decisions are founder-only.

Why: These decisions have legal, financial, and relationship consequences that AI cannot fully assess.

Failure mode: Agent agrees to terms that violate margin floors. Agent sends outreach with incorrect positioning. Agent terminates a team member based on data without context.

C023 MEDIUM INFERENCE 2x efficiency

Emotional and relational domains remain human. AI agents cannot substitute for human connection, empathy, or presence.

Why: Coaching breakthroughs, trust building, and relationship repair require human judgment, emotional intelligence, and authentic presence.

Failure mode: AI-managed human employee feels "managed by a system." Engagement drops. Performance follows. The manager-employee relationship becomes transactional.

C024 MEDIUM OBSERVED REPEATEDLY 4x efficiency

AI writing must not sound like AI. No em dashes. No stacked adjectives. No filler openers. No hedging language. Read output aloud. If it sounds like LinkedIn or ChatGPT, rewrite.

Why: AI-sounding writing destroys trust. Clients, team members, and prospects detect AI patterns and disengage.

Failure mode: Agent sends coaching message with "Great job today!" opener and three em dashes. Human employee realizes the "manager" is a bot. Trust collapses.
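
The checkable parts of the rule can run as a pre-send lint. A sketch whose pattern lists are illustrative assumptions, not Sneeze It's actual filters:

```python
import re

# Surface patterns from the rule: em dashes, filler openers, hedging.
BANNED = [
    (re.compile("\u2014"), "em dash"),
    (re.compile(r"^(great job|i hope this|just checking in)", re.I | re.M),
     "filler opener"),
    (re.compile(r"\b(arguably|perhaps|it could be said)\b", re.I),
     "hedging language"),
]

def lint(draft: str) -> list[str]:
    """Return violation labels; an empty list means the draft may ship."""
    return [label for pattern, label in BANNED if pattern.search(draft)]

print(lint("Great job today\u2014see you tomorrow!"))
# ['em dash', 'filler opener'] -> rewrite before sending
```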

Stackwise silver
C015 HIGH OBSERVED ONCE 5x efficiency

Pricing conversations are human-only. Support can link to the pricing page but cannot discuss custom pricing, discounts, or annual deals.

Why: Support agent offered a 20% discount to retain a customer who mentioned evaluating competitors. Unauthorized. Set a precedent that took months to unwind.

Failure mode: Customer mentions competitor. Agent offers unauthorized discount. Finance discovers during invoicing. Customer expects lower rate permanently.

C016 MEDIUM OBSERVED REPEATEDLY 4x efficiency

Product roadmap questions get a standard redirect: "Let me connect you with our product team." Support does not speculate about features, timelines, or priorities.

Why: Agent told a customer a feature was "planned for Q2" based on a GitHub milestone. Milestone was aspirational. Customer planned around it. Feature slipped to Q3.

Failure mode: Agent references internal milestone as commitment. Customer plans around it. Feature slips. Trust in product team damaged.

C016 LOW HUMAN DEFINED RULE 1.5x efficiency

AI agents never interact directly with patients. All patient communication goes through staff. The appointment agent generates reminder text; Tanya's team sends it through the practice's existing patient communication system.

Why: Patients trust their doctor's office, not an AI system. A message "from your care team" carries weight. A message from an AI carries suspicion. This is healthcare, where trust is the foundation of the therapeutic relationship.

Failure mode: Patient receives a reminder that feels automated or impersonal. Calls the office to confirm it is real. Front desk spends 3 minutes per call reassuring patients. With 140 patients per week, that is potentially 7 hours of wasted staff time.

C017 LOW HUMAN DEFINED RULE 1.5x efficiency

Clinical decision-making remains exclusively human. Agents provide data, patterns, and prepared materials. They never recommend diagnoses, suggest treatments, or interpret lab results. The no-show prediction agent predicts behavior, not clinical outcomes.

Why: The education agent once included a line in a handout that read "If your A1C is above 7, your doctor may recommend increasing your metformin dose." Dr. Pham flagged it immediately. That sentence crosses from education into clinical recommendation, which an AI system is not licensed or qualified to make.

Failure mode: Agent-generated content includes a treatment suggestion. Patient reads it as physician advice. Adjusts their own medication based on the suggestion. Adverse event occurs. Practice faces malpractice exposure for content it generated but did not intend as clinical guidance.

C019 LOW INFERENCE 0.8x efficiency

Lina (CEO) personally handles all enterprise contract negotiations and renewals. No agent drafts pricing proposals or contract amendments.

Why: Enterprise pricing is relationship-based. The same product at the same scale might be priced differently based on strategic value, expansion potential, or competitive dynamics that no agent can assess.

Failure mode: No direct failure, but the sales demo prep agent once included a "suggested pricing" slide in a demo deck based on the prospect's company size. The suggested price was 30% below what Lina would have proposed based on the prospect's competitive situation (they were migrating from a more expensive competitor and had no alternatives). If the prospect had seen the slide, Synthwave would have left $2,500/month on the table.

C020 LOW INFERENCE 0.8x efficiency

All customer-facing product decisions (feature prioritization, deprecation, API changes) are made by the engineering team. Agents can surface data (usage patterns, support ticket themes, churn correlations) but never recommend product changes.

Why: An AI company whose product decisions are made by AI agents creates a recursive trust problem. Customers need to know that humans are making judgment calls about the tools they depend on.

Failure mode: Usage analytics agent recommended deprecating a feature used by only 3 customers. Those 3 customers were the top 3 by contract value ($14K/month combined). The "low usage" metric missed that these were power users who had built their entire workflow around the feature. Rohan caught it in review and vetoed the recommendation. If the deprecation had shipped, it would have been a $168K/year revenue loss.

C016 LOW INFERENCE 0.8x efficiency

Brand partnerships, influencer collaborations, and co-marketing decisions are human-only. Shade and Rhythm may identify opportunities (e.g., "Competitor X partnered with Influencer Y, consider a similar approach"), but the founder makes all relationship decisions.

Why: Brand partnerships define who you are by association. An agent cannot evaluate whether a potential partner's values, audience overlap, and public reputation align with Threadline's brand. One bad partnership can undo years of careful positioning.

Failure mode: Rhythm flagged a micro-influencer with 45K followers in the target demographic as a partnership candidate based on engagement metrics. The founder researched the influencer and found recent posts promoting a fast-fashion brand that Threadline had publicly criticized. The partnership would have been a brand contradiction. Data-driven recommendation, values-blind outcome.

C017 LOW INFERENCE 0.8x efficiency

Product design, fabric selection, and collection planning are exclusively human creative decisions. Agents may surface data (what's selling, what competitors are launching, what customers are requesting), but the creative direction of the brand is the founder's domain.

Why: DTC apparel brands live and die by creative vision. An agent optimizing for sales data would produce safe, derivative products. The founder's taste, instinct, and willingness to take creative risks are what differentiate Threadline from Amazon basics.

Failure mode: Forecast recommended discontinuing the Drift Cardigan (lowest velocity in the line, 2.3 units/week). The founder kept it because it's the piece that gets photographed, that stylists pull for editorial, and that signals "this brand has taste." It drives zero direct revenue and immeasurable indirect value. Data said kill it. Brand instinct said keep it. Instinct was right.

C016 MEDIUM SPECULATION 1x efficiency

Chen (compliance officer) has final authority on all investor-facing output. Sarah (managing partner) cannot override Chen on compliance matters. Chen may escalate to outside counsel if Sarah pushes back.

Why: Compliance authority must be independent of commercial pressure. A managing partner eager to close a deal may pressure compliance to approve borderline language. This dynamic has ended other firms.

Failure mode: Not yet formally violated. Sarah and Chen disagreed once on whether "projected returns" required a specific disclaimer. Sarah wanted to keep the language clean. Chen insisted on the disclaimer. Chen won. The rule exists to codify Chen's authority for future disagreements.

C017 LOW SPECULATION 0.5x efficiency

No agent has direct contact with any investor. All communications are sent from Sarah's or Priya's email address, after human review and approval. Agents draft; humans send.

Why: Investors in alternative assets expect personal relationships with the fund managers. Discovering that communications are AI-drafted (even with human review) would undermine the trust-based relationship that drives AUM growth.

Failure mode: Hypothesized. No investor has asked about AI involvement. Sarah's policy is that if asked directly, she will disclose "We use AI tools to help prepare materials, and every communication is reviewed by our team before it reaches you." But the disclosure is reactive, not proactive.

C018 LOW SPECULATION 0.5x efficiency

Accredited investor verification is never automated by any agent. Verification requires Priya to collect documentation (tax returns, W-2s, or third-party verification letters) and Chen to review and approve. The compliance agent may generate the verification request letter, but Priya and Chen execute the process.

Why: Accepting a non-accredited investor into a Reg D offering violates securities law and can invalidate the entire offering for all investors. This is the highest-stakes compliance function at the firm.

Failure mode: Hypothesized. The compliance agent once suggested "auto-verifying" investors who self-certified as accredited on the online intake form. Chen vetoed it immediately. Self-certification is not sufficient under Reg D Rule 506(c). The suggestion was a useful reminder that agents optimize for efficiency, not compliance.

Vetted Goods silver
C017 LOW INFERENCE 0.8x efficiency

Brand strategy decisions -- whether to launch a new brand, discontinue a brand, reposition a brand, or merge brand operations -- are exclusively human decisions. Agents provide data (brand performance, market trends, customer overlap analysis) but the strategic direction of each brand is the founder's domain.

Why: Brand strategy involves qualitative judgment about market positioning, emotional resonance, and long-term vision that data alone cannot capture. An agent analyzing Brand C's declining margins might recommend discontinuation. The founder knows that Brand C is a Trojan horse for wholesale relationships that feed Brand A.

Failure mode: Ledger's margin analysis flagged Brand C as "underperforming, recommend evaluation for discontinuation" three months in a row. A new VP of Operations, relying on agent recommendations, prepared a discontinuation proposal for the board. The founder rejected it because Brand C's wholesale relationships generated $340K in Brand A revenue that wasn't visible in Brand C's standalone P&L. The agent recommendation was data-correct and strategically catastrophic.

C018 LOW SPECULATION 0.5x efficiency

Hiring, firing, and organizational structure decisions are exclusively human. Agents may flag capacity constraints ("Harbor is handling 2.3x the ticket volume of 6 months ago") but must never recommend specific staffing actions.

Why: Staffing decisions involve budget constraints, team dynamics, growth projections, and cultural fit that agents cannot evaluate. A recommendation to "hire a second CS rep for Brand B" doesn't account for the fact that the founder is planning to merge Brand B and Brand C CS operations in Q3.

Failure mode: Harbor recommended hiring a dedicated Brand C CS rep based on ticket volume trends. The founder was about to negotiate a shared services agreement with a third-party CS provider that would handle all three brands for less than one additional hire. The recommendation wasn't wrong based on available data, but it lacked context about the founder's strategic plans. No damage, but it highlighted the boundary.