Stackwise
core operating rules
The billing operations agent can read Stripe data and draft credit/refund recommendations. It cannot execute credits, refunds, or plan changes. A human must approve and execute in the Stripe dashboard.
Why: In week 2, the billing agent auto-applied a $2,400 credit to a customer account based on a support ticket that mentioned "billing issue." The ticket was about a feature request, not a billing problem. The agent misinterpreted the context.
Failure mode: Customer mentions "billing" in any context. Agent interprets it as a billing dispute. Auto-applies credit. $2,400 gone before anyone notices. Discovered during monthly reconciliation.
Scope: All billing and financial operations. Zero exceptions.
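One way to enforce the draft-only rule is an allowlist checked before every action. The sketch below is illustrative only, not Stackwise's implementation: the action names, the Recommendation fields, and the pending_human_approval status are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical action names; the rule only says the agent may read and draft.
ALLOWED_BILLING_ACTIONS = {"read_stripe_data", "draft_recommendation"}


@dataclass
class Recommendation:
    customer_id: str
    kind: str            # "credit" or "refund"
    amount_usd: float
    rationale: str
    # The agent never sets any other status; a human approves and executes.
    status: str = "pending_human_approval"


def authorize(action: str) -> None:
    """Raise unless the action is on the billing agent's allowlist."""
    if action not in ALLOWED_BILLING_ACTIONS:
        raise PermissionError(f"billing agent may not perform: {action}")


def draft_credit(customer_id: str, amount_usd: float, rationale: str) -> Recommendation:
    """Produce a recommendation for human review; nothing is executed."""
    authorize("draft_recommendation")
    return Recommendation(customer_id, "credit", amount_usd, rationale)
```

With this shape, a call such as authorize("apply_credit") fails loudly instead of moving money.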
Support agent has read access to the codebase for context but cannot create PRs, push commits, or modify any code. It can file GitHub issues with reproduction steps.
Why: The support agent initially had write access for quick fix PRs. It pushed a CSS change that broke the dashboard for 200+ users. The fix took 4 hours because the agent did not run tests before pushing.
Failure mode: Support agent pushes a "simple fix" without tests. Breaks production. Engineering spends 4 hours reverting instead of building features.
Scope: All code repositories. Support agent is read-only.
Every customer-facing message includes a visible confidence indicator in the review queue: GREEN (standard, low risk), YELLOW (involves account details or money), RED (churn risk, legal, or escalation).
Why: Not all support responses carry the same risk. A password reset is GREEN. A billing explanation is YELLOW. A cancellation threat is RED. Review depth should match risk.
Failure mode: Support agent drafts a response to a cancellation threat. Not flagged RED. Reviewer misses it. Response is tone-deaf. Customer cancels. $1,800/year lost.
Scope: All customer-facing communications.
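The three levels can be sketched as a highest-match-wins classifier. This is a minimal illustration, assuming an upstream step has already tagged each draft with topic labels; the label names are hypothetical.

```python
from enum import IntEnum


class Risk(IntEnum):
    GREEN = 0   # standard, low risk
    YELLOW = 1  # involves account details or money
    RED = 2     # churn risk, legal, or escalation


# Hypothetical topic labels, assumed to be assigned before classification.
RED_TOPICS = {"churn_risk", "legal", "escalation"}
YELLOW_TOPICS = {"account_details", "money"}


def classify(topics: set[str]) -> Risk:
    """Return the highest risk level triggered by any topic on the draft."""
    if topics & RED_TOPICS:
        return Risk.RED
    if topics & YELLOW_TOPICS:
        return Risk.YELLOW
    return Risk.GREEN
```

Checking RED first means a draft that touches both money and churn risk is reviewed at the stricter level.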
State files are the only mechanism for cross-agent data sharing. No agent reads another agent's conversation history or session context.
Why: The onboarding agent once referenced a 3-day-old support ticket from the support agent's context. The issue had been resolved. The onboarding email asked about a problem the customer had already forgotten.
Failure mode: Agent references stale cross-agent context. Sends message about resolved issue. Customer confused. Impression of poor internal coordination.
Scope: All 6 agents.
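A state file can be as simple as one JSON document per agent. The sketch below is an assumption about how such files might look: the file layout, the field names, and the write_state and read_state helpers are hypothetical, not taken from the rule.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_state(state_dir: Path, agent: str, data: dict) -> Path:
    """Publish an agent's current state, replacing the previous file."""
    path = state_dir / f"{agent}.json"
    payload = {
        "agent": agent,
        "updated_at": datetime.now(timezone.utc).isoformat(),
        "data": data,
    }
    # Write to a temporary file first, then rename, so readers never see
    # a half-written state file.
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_text(json.dumps(payload))
    tmp.replace(path)
    return path


def read_state(state_dir: Path, agent: str) -> dict:
    """Read another agent's published state; the only cross-agent channel."""
    return json.loads((state_dir / f"{agent}.json").read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_state(Path(d), "support", {"open_tickets": []})
        print(read_state(Path(d), "support")["data"])
```

Because each write replaces the whole file, a reader sees only what the owning agent currently considers true, not a days-old conversation.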
agent roles and authority
The Support Agent owns issue resolution. The Onboarding Agent owns the first 30 days. During onboarding, support tickets route to Onboarding, not Support.
Why: New customers who hit issues in the first 30 days need different urgency than established customers. Support's standard 4-hour response time is too slow for someone deciding whether to stay.
Failure mode: Day 3 customer submits ticket. Support responds in 4 hours with standard template. Customer expected white-glove onboarding. Churns before day 7.
Scope: All customers in first 30 days.
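The routing rule reduces to a date comparison. A minimal sketch, assuming the signup date is available and that "first 30 days" means days 0 through 29 after signup; the function and queue names are hypothetical.

```python
from datetime import date

ONBOARDING_WINDOW_DAYS = 30


def route_ticket(signup_date: date, ticket_date: date) -> str:
    """Return the owning agent's queue for a ticket."""
    days_since_signup = (ticket_date - signup_date).days
    # Assumption: day 0 through day 29 count as the first 30 days.
    if days_since_signup < ONBOARDING_WINDOW_DAYS:
        return "onboarding"
    return "support"
```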
Engineering Alerts agent monitors production and pages on-call. It reports symptoms only: what broke, when, how many affected. It does not diagnose or suggest fixes.
Why: Early version included "probable cause" in pages. Wrong 60% of the time. Engineers spent first 20 minutes chasing the agent's incorrect diagnosis instead of investigating actual symptoms.
Failure mode: Agent pages: "Database pool exhausted, probable cause: recent migration." Engineer rolls back migration. Actual cause: leaked connection in new feature. Rollback useless. 40 minutes wasted.
Scope: All production alerting.
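One way to keep diagnoses out of pages is to give the page payload no field for them. The sketch below is illustrative; the Page type and its field names are assumptions.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Page:
    what_broke: str       # the observed symptom
    started_at: datetime  # when it started
    affected_count: int   # how many users or requests are affected
    # Deliberately no "probable_cause" or "suggested_fix" field.


def build_page(what_broke: str, started_at: datetime, affected_count: int) -> dict:
    """Build the symptom-only payload sent to on-call."""
    return asdict(Page(what_broke, started_at, affected_count))
```

Passing a probable_cause argument to Page raises a TypeError, so a diagnosis cannot slip into a page by accident.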
Content Agent drafts posts, changelogs, and social media copy. It does not publish directly. The founder reviews for voice; engineering reviews for technical accuracy.
Why: Content agent claimed a feature "uses machine learning" when it was rule-based. Technical user called it out on Hacker News. Embarrassing correction needed.
Failure mode: Agent overstates technical capabilities. Published without technical review. Community catches it. Credibility damaged.
Scope: All published content.
coordination patterns
The weekly report agent compiles data from the other 5 agents' state files every Sunday at 8 PM. The founder reviews it Monday morning before distribution.
Why: First version auto-distributed to team. Report included a support satisfaction score temporarily low from one angry customer. Team panicked. Now founder adds context first.
Failure mode: Raw metrics without context create panic. One bad data point triggers unnecessary emergency meetings.
Scope: All weekly and monthly reporting.
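The review gate can be sketched as a status check in front of distribution. This is a minimal illustration under assumed names: the status values, the founder_notes field, and the three functions are hypothetical.

```python
from typing import Callable


def compile_report(states: dict[str, dict]) -> dict:
    """Combine each agent's state into one draft that is held for review."""
    return {
        "sections": {agent: states[agent] for agent in sorted(states)},
        "status": "awaiting_founder_review",
        "founder_notes": None,
    }


def approve(report: dict, notes: str) -> dict:
    """Return an approved copy with the founder's context attached."""
    return {**report, "status": "approved", "founder_notes": notes}


def distribute(report: dict, send: Callable[[dict], None]) -> None:
    """Send the report to the team, but only after founder approval."""
    if report["status"] != "approved":
        raise RuntimeError("report has not been reviewed by the founder")
    send(report)
```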
When support detects 3+ similar tickets in 48 hours, it writes a consolidated bug report to the engineering alerts state file. Engineering picks it up on its next scan.
Why: Support was filing individual GitHub issues for each ticket. Engineering saw 7 separate issues that were the same bug. Pattern detection saves engineering triage time.
Failure mode: 7 tickets about the same API timeout filed as 7 issues. Engineering triages each individually. Wastes 2 hours before someone connects them.
Scope: Support-to-engineering escalation.
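The 3-in-48-hours rule can be sketched as grouping recent tickets by a similarity key. How "similar" is decided is not specified above, so this example assumes each ticket already carries a hypothetical signature string assigned upstream.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(hours=48)
THRESHOLD = 3


def find_clusters(
    tickets: list[tuple[str, str, datetime]], now: datetime
) -> dict[str, list[str]]:
    """Group tickets from the last 48 hours by signature.

    Each ticket is (ticket_id, signature, created_at). Returns only the
    signatures with at least THRESHOLD recent tickets.
    """
    recent = defaultdict(list)
    for ticket_id, signature, created_at in tickets:
        if now - created_at <= WINDOW:
            recent[signature].append(ticket_id)
    return {sig: ids for sig, ids in recent.items() if len(ids) >= THRESHOLD}
```

Each returned cluster would become one consolidated bug report rather than one GitHub issue per ticket.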
Onboarding agent checks Stripe subscription status before every step. If customer cancelled or downgraded, sequence pauses and alerts founder.
Why: Onboarding sent "Welcome to Pro!" to a customer who had downgraded to Free 6 hours earlier. The customer was confused and annoyed.
Failure mode: Onboarding continues on autopilot after plan change. Messages reference wrong plan. Customer loses confidence.
Scope: All automated onboarding sequences.
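The pre-step check can be sketched as a guard around each send. Everything here is an assumption for illustration: get_subscription is a hypothetical wrapper that returns a dict with status and plan keys, not the Stripe API itself, and the plan names and their ordering are invented.

```python
from typing import Callable

# Hypothetical plan names and ordering, used only to detect a downgrade.
PLAN_RANK = {"free": 0, "pro": 1}


def run_step(
    customer_id: str,
    expected_plan: str,
    get_subscription: Callable[[str], dict],
    send_step: Callable[[str], None],
    alert_founder: Callable[[str, dict], None],
) -> str:
    """Send the next onboarding step only if the subscription still matches."""
    sub = get_subscription(customer_id)  # fetched fresh before every step
    cancelled = sub["status"] == "canceled"
    downgraded = PLAN_RANK[sub["plan"]] < PLAN_RANK[expected_plan]
    if cancelled or downgraded:
        alert_founder(customer_id, sub)
        return "paused"
    send_step(customer_id)
    return "sent"
```

The subscription is looked up on every step rather than cached at the start of the sequence, since a 6-hour-old plan change was enough to cause the incident described above.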
operational heuristics
Support responses for annual plan customers auto-flagged YELLOW. Annual customers represent 4x monthly revenue.
Why: Sloppy response to monthly customer costs $150/year if they churn. Sloppy response to annual customer costs $1,800. Review depth should match revenue at risk.
Failure mode: Annual customer billing question gets generic response. Feels undervalued. Does not renew. $1,800 lost.
Scope: All responses to annual subscribers.
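The annual-plan rule can be read as a floor on the risk level. A minimal sketch, assuming that a draft already flagged RED stays RED and that the billing interval is available as a hypothetical plan_interval string.

```python
RISK_ORDER = ["GREEN", "YELLOW", "RED"]


def apply_annual_floor(level: str, plan_interval: str) -> str:
    """Raise GREEN to YELLOW for annual subscribers; never lower a level."""
    below_yellow = RISK_ORDER.index(level) < RISK_ORDER.index("YELLOW")
    if plan_interval == "annual" and below_yellow:
        return "YELLOW"
    return level
```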
The engineering alerts agent suppresses repeat pages for the same issue within 30 minutes. The first alert pages. Subsequent alerts update the existing incident thread.
Why: A database slow query triggered 14 pages in 8 minutes. On-call engineer overwhelmed. Missed the actual resolution signal buried in noise.
Failure mode: Same issue generates 14 pages. Each interrupts the engineer. Noise drowns signal. Resolution delayed 20 minutes.
Scope: All production alerting.
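The suppression rule can be sketched as a small deduplicator keyed by issue. Two assumptions are made here: alerts for the same issue share a hypothetical issue_key, and the 30-minute window is measured from the last page that was actually sent.

```python
from datetime import datetime, timedelta

SUPPRESSION_WINDOW = timedelta(minutes=30)


class PageDeduper:
    def __init__(self) -> None:
        self._last_page: dict[str, datetime] = {}  # issue_key -> last page time

    def handle(self, issue_key: str, at: datetime) -> str:
        """Return "page" for a new alert, "update_thread" for a repeat."""
        last = self._last_page.get(issue_key)
        if last is not None and at - last < SUPPRESSION_WINDOW:
            return "update_thread"
        self._last_page[issue_key] = at
        return "page"
```

Under this sketch, the 14 alerts in 8 minutes from the slow-query incident would produce one page and 13 thread updates.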
failure patterns
Billing agent auto-applied $2,400 credit from misinterpreted ticket. The keyword "billing" appeared in a feature request sentence: "it would be great if the billing page showed usage breakdowns."
Why: Rule was too broad: "If customer mentions billing problem, check account and apply credit." Feature request contained the word "billing." Not a complaint.
Failure mode: Agent reads "billing" keyword. Triggers credit workflow. Auto-applies credit without context check. Discovered 3 weeks later.
Scope: All automated financial actions.
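The failure is easy to reproduce with a simplified stand-in for the old rule. The agent's actual matching logic is not documented here, so the function below is an assumed reconstruction of a bare keyword trigger, run against the sentence quoted from the ticket.

```python
def naive_trigger(ticket_text: str) -> bool:
    """Overly broad rule: any mention of "billing" starts the credit workflow."""
    return "billing" in ticket_text.lower()


ticket = "it would be great if the billing page showed usage breakdowns."

# The feature request fires the trigger even though nothing is being disputed.
fired = naive_trigger(ticket)
```

A keyword match says nothing about intent, which is why a trigger like this should at most produce a draft for a human to review.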
Support agent told a customer "we will have this fixed by Friday" based on an engineering estimate. Engineering shipped the following Tuesday. Customer followed up expecting Friday delivery.
Why: Agent read "targeting Friday" in a GitHub issue as a commitment. Estimates are not commitments. Agent should never communicate timelines without approval.
Failure mode: Agent promises delivery based on internal estimate. Engineering misses estimate. Customer expects fix. Trust eroded. Three follow-up emails.
Scope: All customer communications about timelines.
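A pre-send check can catch some timeline language before it reaches a customer. The sketch below is a rough heuristic under stated assumptions: the pattern list is invented for illustration, it will miss many phrasings, and a match only means the draft is held for approval.

```python
import re

# Hypothetical patterns: weekday deadlines, quarters, and "next week" style phrases.
TIMELINE_PATTERN = re.compile(
    r"\b(?:by|on|before)\s+"
    r"(?:monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b"
    r"|\bq[1-4]\b"
    r"|\b(?:next|this)\s+(?:week|month|quarter)\b",
    re.IGNORECASE,
)


def needs_timeline_approval(draft: str) -> bool:
    """Return True if the draft appears to state or imply a delivery date."""
    return TIMELINE_PATTERN.search(draft) is not None
```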
human-AI boundary conditions
Pricing conversations are human-only. Support can link to the pricing page. It cannot discuss custom pricing, discounts, or annual deals.
Why: Support agent offered a 20% discount to retain a customer who mentioned evaluating competitors. Unauthorized. Set a precedent that took months to unwind.
Failure mode: Customer mentions competitor. Agent offers unauthorized discount. Finance discovers during invoicing. Customer expects lower rate permanently.
Scope: All pricing and negotiation conversations.
Product roadmap questions get a standard redirect: "Let me connect you with our product team." Support does not speculate about features, timelines, or priorities.
Why: Agent told a customer a feature was "planned for Q2" based on a GitHub milestone. Milestone was aspirational. Customer planned around it. Feature slipped to Q3.
Failure mode: Agent references internal milestone as commitment. Customer plans around it. Feature slips. Trust in product team damaged.
Scope: All communications about product direction.