
SaaS AI Coordination Playbook

Coordination practices for AI agent teams managing SaaS companies: deployment pipelines, customer success, subscription billing, support operations, onboarding, product development, and security compliance. Built for the specific dynamics of recurring revenue, continuous deployment, and customer retention.

4 practices 7 categories

Deployment

Observed

Database Migration Coordination Protocol

Before any schema migration, the deploy agent notifies: the performance agent (expected query impact), the backup agent (take a snapshot), the support agent (potential maintenance window), and the billing agent (pause retry logic during migration). Migrations that affect billing tables require the billing agent to verify data integrity post-migration before any charges process.
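A minimal Python sketch of this protocol, assuming the deploy agent can send structured notices and that billing tables are known by name. `Migration`, `Notice`, `pre_migration_notices`, `ChargeGate`, and the `BILLING_TABLES` set are hypothetical names for illustration, not part of any real framework.

```python
from dataclasses import dataclass, field

# Hypothetical set of tables owned by the billing agent.
BILLING_TABLES = {"subscriptions", "invoices", "charges"}


@dataclass
class Migration:
    name: str
    tables: set[str]


@dataclass
class Notice:
    recipient: str
    action: str


def pre_migration_notices(m: Migration) -> list[Notice]:
    """Notices the deploy agent sends before running a schema migration."""
    return [
        Notice("performance", f"estimate expected query impact of {m.name}"),
        Notice("backup", f"take a snapshot before {m.name}"),
        Notice("support", f"prepare maintenance-window messaging for {m.name}"),
        Notice("billing", f"pause retry logic during {m.name}"),
    ]


@dataclass
class ChargeGate:
    """Blocks charges after a billing-table migration until integrity is verified."""
    blocked_by: set[str] = field(default_factory=set)

    def after_migration(self, m: Migration) -> None:
        # Only migrations that touch billing tables block charges.
        if m.tables & BILLING_TABLES:
            self.blocked_by.add(m.name)

    def verify_integrity(self, migration_name: str, ok: bool) -> None:
        # Only a passing check by the billing agent lifts the block.
        if ok:
            self.blocked_by.discard(migration_name)

    def charges_allowed(self) -> bool:
        return not self.blocked_by
```

The gate fails closed: a billing-table migration leaves charges blocked until the billing agent explicitly reports a passing integrity check, so a skipped or failed verification cannot let charges through.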

What goes wrong without this

A migration adds an index to a 500M-row table. It locks writes for 8 minutes. During that window, 200 subscription renewals fail silently. The billing agent retries them an hour later, but 15 customers are double-charged because the retry logic does not check for partial writes. The migration itself worked perfectly. The coordination failure caused the outage.

Observed

Deploy Agent Coordinates Rollback with Support Agent

When the deploy agent rolls out a new version, it notifies the support agent with: what changed, which customers are affected, and known issues. If error rates spike post-deploy, the deploy agent initiates rollback and the support agent switches to a canned response template acknowledging the issue. Both agents must agree on the rollback trigger threshold before every deploy.
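One way to keep both agents consistent is to have them read from a single shared deploy record, so the support template cannot lag behind the rollback state. The sketch below assumes that design; `DeployRecord`, `Status`, `TEMPLATES`, and the helper functions are hypothetical, and the error rate is treated as a fraction of requests.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    DEPLOYED = "deployed"
    ROLLING_BACK = "rolling_back"
    ROLLED_BACK = "rolled_back"


# Hypothetical canned replies, keyed by the shared deploy status.
TEMPLATES = {
    Status.DEPLOYED: "No known issues with the latest release.",
    Status.ROLLING_BACK: "We found an issue in today's release and are rolling it back.",
    Status.ROLLED_BACK: "The issue is resolved; the previous version has been restored.",
}


@dataclass
class DeployRecord:
    version: str
    changes: list[str]
    affected_customers: list[str]
    known_issues: list[str]
    error_rate_threshold: float  # agreed by both agents before the deploy
    status: Status = Status.DEPLOYED


def agree_threshold(deploy_proposal: float, support_proposal: float) -> float:
    """Block the deploy until both agents propose the same rollback threshold."""
    if deploy_proposal != support_proposal:
        raise ValueError("rollback threshold not agreed; deploy blocked")
    return deploy_proposal


def on_error_rate(record: DeployRecord, error_rate: float) -> None:
    """Deploy agent: start a rollback once the agreed threshold is breached."""
    if record.status is Status.DEPLOYED and error_rate > record.error_rate_threshold:
        record.status = Status.ROLLING_BACK


def on_rollback_complete(record: DeployRecord) -> None:
    if record.status is Status.ROLLING_BACK:
        record.status = Status.ROLLED_BACK


def support_reply(record: DeployRecord) -> str:
    """Support agent: always derive the reply from the shared record."""
    return TEMPLATES[record.status]
```

Because `support_reply` reads `record.status` instead of keeping its own copy, the support agent switches templates at the same moment the deploy agent changes state.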

What goes wrong without this

A deploy goes out at 2 PM. Error rates double. The deploy agent auto-rolls back at 2:15 PM. But the support agent has already sent 30 "we are investigating" replies with no context. When the rollback completes, the support agent keeps sending the old template. Customers get conflicting messages.

Observed

Feature Flag Lifecycle Ownership

Every feature flag has an owning agent and an expiry date. The product agent creates flags with a maximum lifespan (default: 90 days). The cleanup agent audits all flags weekly. Flags past their expiry trigger an alert to the product agent: remove the flag and make the feature permanent, or extend with justification. Feature flags are technical debt with a timer.
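A minimal sketch of the flag record and the cleanup agent's weekly audit, assuming each flag is stored with a creation date and an owning agent. `FeatureFlag`, `weekly_audit`, and the alert message format are hypothetical; only the 90-day default lifespan comes from the practice above.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

DEFAULT_LIFESPAN = timedelta(days=90)


@dataclass
class FeatureFlag:
    name: str
    owner: str  # owning agent, e.g. "product"
    created: date
    lifespan: timedelta = DEFAULT_LIFESPAN
    justifications: list[str] = field(default_factory=list)

    @property
    def expires(self) -> date:
        return self.created + self.lifespan

    def extend(self, days: int, justification: str) -> None:
        """Extensions are accepted only with a written justification."""
        if not justification.strip():
            raise ValueError("extension requires a justification")
        self.lifespan += timedelta(days=days)
        self.justifications.append(justification)


def weekly_audit(flags: list[FeatureFlag], today: date) -> list[tuple[str, str]]:
    """Cleanup agent: return an (owner, message) alert for every expired flag."""
    alerts = []
    for flag in flags:
        if today >= flag.expires:
            alerts.append((
                flag.owner,
                f"{flag.name} expired {flag.expires.isoformat()}: remove the flag "
                "and make the feature permanent, or extend with justification",
            ))
    return alerts
```

Storing the justifications on the flag leaves an audit trail, so a flag that keeps getting extended is easy to spot in later reviews.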

What goes wrong without this

The codebase has 347 feature flags. 200 of them are for features shipped 8+ months ago. Nobody knows which are safe to remove. New developers are afraid to touch them. The testing matrix is exponentially complex. A junior dev accidentally toggles an ancient flag in production and breaks billing for 10% of customers.

Measured

Staged Rollout with Automatic Canary Analysis

The deploy agent rolls changes to 5% of traffic first. The monitoring agent watches error rates, latency, and key business metrics (signups, checkouts, API calls) for 15 minutes. If metrics stay within 2 standard deviations of baseline, the deploy agent proceeds to 25%, then 100%. If any metric breaches the threshold, the deploy agent halts and pages the on-call engineer. No human manually watches dashboards during deploys.
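A sketch of the canary decision logic, assuming each metric has a list of baseline samples and an `observe(percent)` callback that returns metric values after the 15-minute watch window. Checking again at 25% before widening to 100% is an assumption; the practice only spells out the check at 5%. `breached_metrics`, `staged_rollout`, and both callbacks are hypothetical names.

```python
import statistics
from collections.abc import Callable

CANARY_STAGES = (5, 25)  # percent of traffic watched before widening further
FULL_ROLLOUT = 100
SIGMA_LIMIT = 2.0        # allowed distance from baseline, in standard deviations


def breached_metrics(baseline: dict[str, list[float]],
                     observed: dict[str, float]) -> list[str]:
    """Metrics whose observed value is more than SIGMA_LIMIT std devs from baseline."""
    breaches = []
    for metric, history in baseline.items():
        mean = statistics.mean(history)
        stdev = statistics.stdev(history)  # needs at least two samples
        if abs(observed[metric] - mean) > SIGMA_LIMIT * stdev:
            breaches.append(metric)
    return breaches


def staged_rollout(baseline: dict[str, list[float]],
                   observe: Callable[[int], dict[str, float]],
                   page_on_call: Callable[[int, list[str]], None]) -> int:
    """Return the traffic share the deploy ended at (100 on success)."""
    for percent in CANARY_STAGES:
        breaches = breached_metrics(baseline, observe(percent))
        if breaches:
            page_on_call(percent, breaches)
            return percent  # halt: do not widen the rollout
    return FULL_ROLLOUT
```

Tracking signups, checkouts, and API calls per plan tier as separate baseline metrics is one way to make a tier-specific failure visible, since a problem confined to one tier can stay inside the limits of an aggregate metric.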

What goes wrong without this

A deploy goes to 100% immediately. A subtle bug causes checkout failures for customers on a specific plan tier. It takes 45 minutes to detect because the overall error rate rises by only 3%, even though 100% of Enterprise checkout attempts fail. A canary on 5% of traffic would have caught it in 2 minutes, exposing only the customers in the 5% canary slice instead of every customer.
