Synthwave Labs
Autonomous agent team operating rules
Internal agent operations and customer product operations must use separate API keys, separate databases, separate deployment environments, and separate Slack channels. Zero shared infrastructure.
Why: The performance review pipeline incident proved that shared infrastructure between internal ops and the product creates inevitable cross-contamination.
Failure mode: The eval pipeline monitoring agent used the same API endpoint as the customer-facing eval pipeline. During a routine internal performance review, the agent submitted a team evaluation document ("Q3 performance: Mira needs to improve documentation velocity") through the shared endpoint. It appeared in the staging eval dashboard. 3 customers on the staging beta saw it. One customer emailed: "Is this how your eval tool works? Running performance reviews?" Disclosure was required. 2 customers demanded security audits. One customer paused their contract for 6 weeks while the audit was conducted. $12,600 in deferred revenue.
Scope: All agents, all infrastructure
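A minimal sketch of how the zero-shared-infrastructure rule could be enforced at startup. All names and values here are illustrative, not Synthwave's actual configuration; the idea is simply to fail fast before any agent runs if the two namespaces overlap:

```python
# Hypothetical configs: internal ops and product must share no values at all.
INTERNAL_CONFIG = {
    "api_key": "int-key-aaaa",
    "database_url": "postgres://internal-db/ops",
    "slack_channel": "#agent-ops-internal",
}
PRODUCT_CONFIG = {
    "api_key": "prod-key-bbbb",
    "database_url": "postgres://product-db/evals",
    "slack_channel": "#product-alerts",
}

def assert_no_shared_infrastructure(internal: dict, product: dict) -> None:
    """Raise at boot if any infrastructure value appears in both namespaces."""
    shared = set(internal.values()) & set(product.values())
    if shared:
        raise RuntimeError(f"Shared infrastructure detected: {shared}")

assert_no_shared_infrastructure(INTERNAL_CONFIG, PRODUCT_CONFIG)
```

A check like this would have caught the shared API endpoint from the performance review pipeline incident before deployment rather than after customer exposure.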
No internal agent has read access to customer data stores. Customer onboarding agent can write to customer configuration tables but cannot read customer eval results, model outputs, or usage details.
Why: If an internal agent leaks customer eval data, Synthwave Labs faces breach disclosure obligations under 14 customer contracts. The legal exposure is existential.
Failure mode: Usage analytics agent was granted read access to the product database to track feature adoption. It pulled a query that included customer eval results as a side effect of a poorly scoped SQL join. The data appeared in an internal usage report shared via Slack. The CTO (Rohan) caught it during report review. If it had been shared externally (e.g., in an investor update), it would have triggered breach notifications for 3 enterprise customers.
Scope: All agents, all data access
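One way to express this rule is a deny-by-default allow-list: each agent carries an explicit set of (table, action) grants, and anything absent is refused. The agent and table names below are illustrative assumptions, not the actual schema:

```python
# Deny-by-default: an agent may only perform actions explicitly listed here.
AGENT_PERMISSIONS = {
    "customer_onboarding": {("customer_config", "write")},
    "usage_analytics": {("feature_adoption", "read")},
    # Internal agents get no entries for customer eval tables at all.
}

def is_allowed(agent: str, table: str, action: str) -> bool:
    """True only if the (table, action) pair is explicitly granted."""
    return (table, action) in AGENT_PERMISSIONS.get(agent, set())
```

The poorly scoped SQL join in the usage analytics incident would have failed this check: `("customer_eval_results", "read")` was never granted, so the query should never have reached the database.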
Every agent output must be tagged with its source namespace: [INTERNAL] or [PRODUCT]. Any untagged output is treated as a potential data leak and quarantined for review.
Why: When internal and product systems produce similar-looking outputs (eval results, performance metrics, dashboards), the only way to prevent confusion is explicit labeling.
Failure mode: The competitor analysis agent produced a benchmark report comparing Synthwave's eval accuracy against 3 competitors. The report format was identical to the product's customer-facing eval reports. A sales engineer included it in a demo deck, and a prospect asked: "Is this a real eval result or marketing material?" The ambiguity undermined the demo. The deal closed 3 weeks late, and the prospect negotiated a 15% discount citing "concerns about data integrity."
Scope: All agents, all outputs
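The tag-or-quarantine rule is mechanical enough to sketch directly. This is an assumed implementation shape, not the production router:

```python
VALID_TAGS = ("[INTERNAL]", "[PRODUCT]")

def route_output(output: str):
    """Return ('ok', tag) for tagged output; quarantine anything untagged."""
    for tag in VALID_TAGS:
        if output.startswith(tag):
            return ("ok", tag)
    return ("quarantine", None)
```

Because the check runs on every output, an untagged benchmark report like the one that reached the sales demo deck would land in quarantine instead of a prospect's hands.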
Investor update agent has access only to aggregated, anonymized metrics. It cannot access individual customer names, contract values, or usage patterns.
Why: Investor updates are shared broadly (board members, advisors, potential investors). Customer-specific data in these documents violates NDAs with 60% of enterprise customers.
Failure mode: Early investor update included "Customer X (Fortune 500 fintech) processes 2.3M eval requests/month." The customer had a strict NDA prohibiting disclosure of their use of AI evaluation tools. An advisor forwarded the update to a contact at a competing AI company. The customer's security team found out and threatened contract termination. Rohan personally flew to their office for a 3-hour meeting to save the relationship. $8,400/month contract preserved, but trust damage took 6 months to recover.
Scope: Investor update agent
agent roles and authority
Customer onboarding agent configures new customer environments (API keys, rate limits, model access) but cannot modify existing customer configurations. Changes to live customers require a human engineer.
Why: Misconfigured rate limits or model access on a live customer can either expose them to overcharges or lock them out of their own eval pipeline.
Failure mode: Onboarding agent was given modify access to "fix" a new customer's rate limit. It updated the wrong customer's config (adjacent row in the customer table). The existing customer's rate limit dropped from 10K/min to 100/min. Their eval pipeline queued 9,900 requests and started timing out. Their monitoring system fired alerts to their SRE team at 2 AM. They filed a P1 incident against Synthwave. Resolution took 4 hours. The customer demanded a 1-month service credit ($4,200).
Scope: Customer onboarding agent
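The create-only boundary can be enforced at the write path rather than by trusting the agent's intent. A minimal sketch, assuming a dict-shaped config store (the real store is presumably a database table):

```python
def apply_config(store: dict, customer_id: str, config: dict) -> None:
    """Create-only write: refuse to touch rows for existing customers."""
    if customer_id in store:
        raise PermissionError(
            f"{customer_id} already exists; live-customer changes require a human engineer"
        )
    store[customer_id] = config
```

With this guard, the "adjacent row" mistake becomes impossible: the write to the existing customer's row is rejected before the rate limit ever changes.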
Eval pipeline monitoring agent watches system health metrics (latency, error rates, queue depth) but never triggers automated remediation. It alerts the on-call engineer via PagerDuty.
Why: Automated remediation in an AI eval pipeline can mask deeper issues. A restart that clears a queue also destroys in-flight eval results that customers are waiting on.
Failure mode: Before the no-remediation rule, the monitoring agent auto-restarted a stalled eval worker. The restart cleared 340 in-flight eval jobs. 12 customers saw "eval failed" errors for jobs they'd submitted. 4 customers re-submitted, creating duplicate results. 2 customers used the duplicate results in production decisions before realizing the duplication. The engineering team spent 3 days deduplicating and verifying results across all affected customers.
Scope: Eval pipeline monitoring agent
Incident response agent drafts customer-facing status page updates and direct communications but all messaging requires approval from both the CTO (Rohan) and the CEO (Lina) before publication.
Why: Incident communications set legal and contractual precedents. Saying "data was not compromised" when it was creates liability. Saying "data may have been compromised" when it wasn't creates unnecessary panic.
Failure mode: Agent drafted a status update that said "no customer data was affected" during the performance review pipeline incident. In reality, internal performance data was exposed to 3 customers: not "customer data" in the traditional sense, but the customers disagreed with that characterization. If the statement had been published, it would have been contradicted by the customers' own screenshots. Lina rewrote the update to acknowledge that "internal operational data was briefly visible in a shared staging environment."
Scope: Incident response agent
Sales demo prep agent builds demo environments with synthetic data only. Never use real customer data, even anonymized, in sales demonstrations.
Why: "Anonymized" data has been de-anonymized by prospects in at least one competitor's sales demo. Enterprise security teams look for patterns in demo data to identify existing customers.
Failure mode: Demo agent used anonymized customer eval data in a demo. A prospect's security engineer recognized the model architecture patterns and said: "This looks like [specific customer]'s eval setup." The salesperson denied it, but the prospect told the customer. The customer called Rohan within 2 hours. The relationship survived only because the data was genuinely anonymized and the customer's security team verified it. But the 3-hour investigation cost was real.
Scope: Sales demo prep agent
Competitor analysis agent scrapes only public information (pricing pages, documentation, blog posts, changelog). It never accesses competitor products using credentials, free trials, or any method that could be construed as unauthorized access.
Why: Synthwave's customers include companies that also sell AI tools. Unauthorized access to a competitor could become a breach of trust with mutual customers.
Failure mode: No direct failure, but the competitor analysis agent once signed up for a competitor's free trial using a disposable email. The competitor's sales team traced the signup to Synthwave's IP range and called Rohan: "Your team is reverse-engineering our product." Rohan hadn't known about the signup. The call was awkward but the competitor accepted the explanation. Free trial access was permanently revoked from the agent's capabilities.
Scope: Competitor analysis agent
coordination patterns
Eval pipeline monitoring feeds into incident response. When monitoring detects anomalies exceeding thresholds (>5% error rate, >2x latency, queue depth >10K), incident response agent automatically drafts a status update and customer communication. No external sending without dual approval.
Why: Speed of communication during incidents is critical. Customers who learn about outages from their own monitoring before Synthwave communicates lose trust instantly.
Failure mode: A 45-minute outage in the eval pipeline was detected by 8 customers before Synthwave posted a status update. 3 customers tweeted about it. The status page update went live at minute 38, after customers had already started a thread in the #synthwave-users Slack community. One customer posted: "We're seeing errors. Synthwave's status page still says all green. What's going on?"
Scope: Eval pipeline monitoring agent to incident response agent
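The anomaly thresholds above translate directly into a trigger predicate. A sketch using the stated values (error rate >5%, latency >2x baseline, queue depth >10K); the function name and signature are assumptions:

```python
def needs_incident_draft(error_rate: float, latency_ratio: float, queue_depth: int) -> bool:
    """Mirror the stated thresholds: any one breach triggers a draft.

    error_rate:    fraction of failed requests (0.05 == 5%)
    latency_ratio: current latency divided by baseline
    queue_depth:   pending jobs in the eval queue
    """
    return error_rate > 0.05 or latency_ratio > 2.0 or queue_depth > 10_000
```

Note the strict inequalities: sitting exactly at a threshold does not trigger a draft, which matches the ">" wording of the rule.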
Usage analytics feeds into customer onboarding. When a new customer's first 72 hours show low API call volume (<10% of allocated rate limit), onboarding agent flags it as a potential integration issue and drafts a proactive check-in email.
Why: Customers who don't integrate successfully in the first week have a 70% churn rate at month 3.
Failure mode: 4 new customers in a single quarter failed to integrate within the first week. None were contacted until their monthly check-in. By then, 3 of 4 had decided the product was "too complex" and were evaluating alternatives. 2 churned. $8,400/month in lost revenue that proactive outreach could have prevented.
Scope: Usage analytics agent to customer onboarding agent
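The low-volume flag is a simple ratio check. Since "volume" and "rate limit" have different units, this sketch assumes the comparison is between the customer's observed average calls per minute and their allocated per-minute limit:

```python
def flag_integration_risk(avg_calls_per_min: float, rate_limit_per_min: int) -> bool:
    """Flag customers whose first-72h average volume is under 10% of allocation."""
    return avg_calls_per_min < 0.10 * rate_limit_per_min
```

Any customer flagged here gets a drafted proactive check-in email, which is the outreach the 4 churn-quarter customers never received.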
Docs maintenance agent monitors the changelog (from release notes) and customer support tickets (from incident response). Any feature shipped without docs is flagged as a blocker in Linear. Any support ticket caused by missing docs is tagged as a docs failure.
Why: In developer tooling, undocumented features don't exist. Customers who can't find documentation assume the feature doesn't work.
Failure mode: A new eval metric type was shipped in v3.2 without documentation. 6 customers tried to use it, got confused by the API response format, and filed support tickets. The engineering team spent 8 hours answering the same question 6 times. The docs would have taken 30 minutes to write.
Scope: Docs maintenance agent, release pipeline
operational heuristics
Incident severity is determined by customer impact radius, not system impact. A database hiccup affecting 1 internal dashboard is P3. A 200ms latency increase affecting all customer eval jobs is P1.
Why: Internal systems failing is inconvenient. Customer-facing systems degrading is revenue-threatening.
Failure mode: Monitoring agent classified a latency spike as P3 because the internal system health dashboard showed green (it only measured error rates, not latency). 85 customers experienced 3x slower eval results for 2 hours. 15 filed support tickets. 3 enterprise customers included the incident in their quarterly vendor review. Two of those reviews resulted in "conditional renewal" status.
Scope: Eval pipeline monitoring agent, incident response agent
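The severity rule can be reduced to a function of customer impact radius alone. The P2 middle tier below is an assumption for illustration; the rule itself only specifies the P1 and P3 endpoints:

```python
def incident_severity(customers_affected: int) -> str:
    """Severity from customer impact radius, not system internals."""
    if customers_affected == 0:
        return "P3"  # internal-only: inconvenient, not revenue-threatening
    if customers_affected < 10:
        return "P2"  # assumed middle tier, not stated in the rule
    return "P1"      # broad customer-facing degradation
```

The misclassified latency spike (85 customers affected, dashboard green) comes out as P1 here because the input is customer count, not internal dashboard state.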
Investor updates are published monthly, on the 5th, regardless of whether the numbers are good. Skipping a month signals that something is wrong.
Why: Investors pattern-match on communication cadence. A missed update generates more anxiety than a bad update.
Failure mode: Lina skipped the March investor update because MRR had dipped 4% (3 customers delayed renewals). Two board members texted within a week asking "everything okay?" The April update included the March data and the dip explanation, but the trust damage from the missed communication took the entire board meeting to repair.
Scope: Investor update agent
Sales demo prep agent refreshes demo environments weekly. Stale demo data that references outdated features or deprecated APIs undermines credibility during live demos.
Why: Enterprise prospects evaluate attention to detail. A demo that shows a deprecated feature signals that the product moves faster than the company can manage.
Failure mode: A demo environment showed an eval metric type that had been deprecated 2 months earlier. The prospect asked about it. The sales engineer said "oh, that's been removed." The prospect replied: "So your demo doesn't reflect your actual product? What else is out of date?" The deal took an additional 3 weeks to close and included a requirement for a "current state" audit before signing.
Scope: Sales demo prep agent
failure patterns
Any incident involving customer data exposure (real or perceived) triggers a mandatory 72-hour response protocol: (1) containment, (2) investigation, (3) customer disclosure, (4) post-mortem, (5) control implementation. No shortcuts.
Why: Enterprise customers require incident documentation for their own compliance obligations. Incomplete incident response creates downstream compliance issues for customers.
Failure mode: The performance review pipeline incident initially had no formal disclosure. Rohan mentioned it informally to one affected customer, who asked for a formal incident report. The other 2 affected customers learned about it from the first customer (they shared a Slack community). Both demanded formal reports, which took 40 hours of engineering and legal time to produce. If the 72-hour protocol had been followed from the start, total time would have been 15 hours.
Scope: All agents, all customer data operations
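The five-step protocol can be tracked as an ordered, no-skip checklist. A hypothetical sketch; the step names come from the rule, the enforcement logic is assumed:

```python
# The mandatory 72-hour response protocol, in order, with no shortcuts.
RESPONSE_STEPS = [
    "containment",
    "investigation",
    "customer_disclosure",
    "post_mortem",
    "control_implementation",
]

def next_step(completed: list) -> str:
    """Return the next mandatory step; raise if steps were done out of order."""
    for i, step in enumerate(RESPONSE_STEPS):
        if step not in completed:
            # No later step may be marked done before this one.
            if any(later in completed for later in RESPONSE_STEPS[i + 1:]):
                raise ValueError("protocol steps completed out of order")
            return step
    return "done"
```

Under this check, the pipeline incident's informal disclosure would have been rejected: customer_disclosure cannot be marked done before containment and investigation.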
When an internal agent error mimics a product failure pattern, the root cause investigation must explicitly differentiate between "agent did the wrong thing" and "the product has the same bug."
Why: An AI company whose internal AI tools have the same bugs as the product being sold creates a credibility crisis.
Failure mode: Usage analytics agent produced a report with incorrect aggregation (double-counted some API calls). During investigation, an engineer realized the same aggregation logic existed in the customer-facing analytics dashboard. The internal agent bug revealed a product bug affecting 85 customers. The product bug had been shipping incorrect usage reports for 6 weeks. 23 customers had been overbilled by a combined $3,200. Refunds and apology emails took a full week.
Scope: All agents, product development
Post-incident, every affected agent is audited for similar access patterns that could cause the same failure class. Fix the pattern, not just the instance.
Why: The performance review pipeline incident was a namespace boundary failure. Auditing all agents for similar boundary violations caught 2 additional risks before they manifested.
Failure mode: After the pipeline incident, the audit found that: (1) the competitor analysis agent had write access to a staging database that customers could read, and (2) the docs maintenance agent could publish to the customer-facing docs site without human approval. Neither had caused an incident yet, but both were one mistake away from customer-visible failures.
Scope: All agents
human-AI boundary conditions
Lina (CEO) personally handles all enterprise contract negotiations and renewals. No agent drafts pricing proposals or contract amendments.
Why: Enterprise pricing is relationship-based. The same product at the same scale might be priced differently based on strategic value, expansion potential, or competitive dynamics that no agent can assess.
Failure mode: No direct failure, but the sales demo prep agent once included a "suggested pricing" slide in a demo deck based on the prospect's company size. The suggested price was 30% below what Lina would have proposed based on the prospect's competitive situation (they were migrating from a more expensive competitor and had no alternatives). If the prospect had seen the slide, Synthwave would have left $2,500/month on the table.
Scope: Sales demo prep agent, investor update agent
All customer-facing product decisions (feature prioritization, deprecation, API changes) are made by the engineering team. Agents can surface data (usage patterns, support ticket themes, churn correlations) but never recommend product changes.
Why: An AI company whose product decisions are made by AI agents creates a recursive trust problem. Customers need to know that humans are making judgment calls about the tools they depend on.
Failure mode: Usage analytics agent recommended deprecating a feature used by only 3 customers. Those 3 customers were the top 3 by contract value ($14K/month combined). The "low usage" metric missed that these were power users who had built their entire workflow around the feature. Rohan caught it in review and vetoed the recommendation. If the deprecation had shipped, it would have been a $168K/year revenue loss.
Scope: All agents, product decisions