Vetted Goods
silver L6 Harness Engineering
core operating rules
Every agent invocation must include a brand context identifier. No agent may operate without knowing which brand it is serving. The brand context determines: voice/tone, return policy, pricing rules, customer data scope, and financial thresholds.
Why: A brandless agent invocation defaults to whatever context was loaded last, which means Brand A's rugged outdoor voice might respond to a Brand B minimalist customer. This isn't hypothetical -- it happened.
Failure mode: Chorus generated product descriptions for Forma Daily (minimalist brand) while still holding Ridgeline Outfitters context from the prior task. The descriptions included phrases like "built for the trail" and "adventure-ready construction" for a plain white cotton t-shirt. The marketing team caught 4 of 7 descriptions before they were published. Three went live on Shopify for 6 hours. A customer tweeted: "Since when did plain tees become adventure gear?"
Scope: All agents.
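A minimal sketch of this guard, in Python with hypothetical names (`invoke_agent`, `KNOWN_BRANDS` are illustrative, not an existing API): an invocation without a recognized brand context fails hard rather than inheriting whatever context was loaded last.

```python
# Hypothetical brand keys; the real registry would live in configuration.
KNOWN_BRANDS = {"ridgeline", "forma", "copper"}

def invoke_agent(agent, task, brand=None):
    """Refuse to run unless a recognized brand context identifier is supplied."""
    if brand not in KNOWN_BRANDS:
        raise ValueError(f"{agent}: invocation requires a brand context, got {brand!r}")
    # Downstream, the brand context would select voice/tone, return policy,
    # pricing rules, customer data scope, and financial thresholds.
    return {"agent": agent, "task": task, "brand": brand}

result = invoke_agent("Chorus", "product-description", brand="forma")
```

The point of failing loudly is that a brandless invocation is never "probably fine"; it defaults to stale context.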
Customer data must be scoped to the brand where the purchase was made. Harbor, Reflow, and Cadence must never cross-reference customer records across brands, even if the same email address exists in multiple brand databases.
Why: A customer who bought hiking boots from Ridgeline and a wallet from Copper & Thread has two distinct relationships. Mentioning the hiking boots in a Copper & Thread email feels invasive and breaks the illusion that each brand is its own entity. Some customers don't even know the brands share ownership.
Failure mode: Cadence pulled "past purchasers" for a Copper & Thread re-engagement campaign and accidentally included Ridgeline customers with matching email addresses. 340 Ridgeline-only customers received a "We miss you at Copper & Thread" email for a brand they'd never bought from. 12 unsubscribes. 3 "who is this?" replies. One customer filed a CCPA data access request asking how Copper & Thread got their email.
Scope: Harbor, Reflow, Cadence.
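A sketch of hard brand isolation for customer lists, with an in-memory stand-in for the per-brand databases (records and the `customers_for` helper are hypothetical). The same email under two brands is two distinct records, and queries never join on email.

```python
# Same person, separate records: never cross-referenced on email.
CUSTOMERS = [
    {"email": "a@example.com", "brand": "ridgeline"},
    {"email": "a@example.com", "brand": "copper"},
    {"email": "b@example.com", "brand": "ridgeline"},
]

def customers_for(brand):
    """Return only records created under this brand's relationship."""
    return [c for c in CUSTOMERS if c["brand"] == brand]

copper_list = customers_for("copper")
```

A "past purchasers" pull for a Copper & Thread campaign built this way cannot include Ridgeline-only customers, matching email or not.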
Financial alert thresholds must be brand-proportional. Brand A ($2.1M, ~$5,750/day avg): flag daily spend deviations above $800. Brand B ($1.5M, ~$4,100/day avg): flag above $500. Brand C ($600K, ~$1,640/day avg): flag above $200. A flat threshold either misses Brand C problems or drowns the team in Brand A false positives.
Why: $400 in unexpected spend is noise for a $2.1M brand and a crisis for a $600K brand. Agents without proportional thresholds train the team to ignore alerts, which means real problems get missed.
Failure mode: Original flat threshold was $500 across all brands. Brand C's Google Shopping campaign ran $380 over budget for 4 consecutive days ($1,520 total overspend, representing 7% of monthly budget). Never triggered an alert. Brand A triggered 11 false alerts in the same period. The team started ignoring Slack alerts entirely. Brand C's overspend wasn't caught until the monthly finance review.
Scope: Pulse, Signal, Ledger.
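The proportional thresholds above can be encoded directly (brand keys are hypothetical; dollar figures are the ones stated in the rule):

```python
# Flag deviation thresholds and daily averages from the rule above.
ALERT_THRESHOLDS = {"brand_a": 800, "brand_b": 500, "brand_c": 200}
DAILY_AVG = {"brand_a": 5750, "brand_b": 4100, "brand_c": 1640}

def should_flag(brand, actual_daily_spend):
    """Flag when spend deviates from the brand's daily average by more than its threshold."""
    return abs(actual_daily_spend - DAILY_AVG[brand]) > ALERT_THRESHOLDS[brand]

# The same $380 overspend: noise for Brand A, an alert for Brand C.
flag_a = should_flag("brand_a", 5750 + 380)
flag_c = should_flag("brand_c", 1640 + 380)
```

Under the old flat $500 threshold, the Brand C case above would have been silent, exactly the Google Shopping failure described.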
Pulse (Meta) and Signal (Google) must never present blended ROAS or blended CPA across platforms. Each platform's metrics are reported in their own attribution model. If a cross-platform comparison is needed, both agents must present the raw numbers side-by-side with a caveat noting the attribution model difference.
Why: Meta's 7-day click / 1-day view attribution and Google's data-driven attribution produce fundamentally different numbers for the same customer journey. Blending them produces a ROAS number that is meaningfully wrong -- it either over-credits (double counts) or under-credits (misses view-through).
Failure mode: An early version of the weekly report blended Meta and Google ROAS into a single "portfolio ROAS" number. This number showed 3.8x. The CEO used it in a board presentation. An investor with DTC experience immediately asked "what's the attribution methodology?" The CEO couldn't answer. The investor's follow-up: "If you don't know how your ROAS is calculated, you don't know if you're profitable." Uncomfortable meeting.
Scope: Pulse, Signal, Ledger.
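A sketch of the required side-by-side presentation (function and field names hypothetical): the two ROAS figures are reported separately, each under its own attribution model, with the caveat attached and no blended number anywhere in the output.

```python
def roas_report(meta_revenue, meta_spend, google_revenue, google_spend):
    """Side-by-side ROAS; never a single 'portfolio ROAS'."""
    return {
        "meta_roas": round(meta_revenue / meta_spend, 2),        # 7-day click / 1-day view
        "google_roas": round(google_revenue / google_spend, 2),  # data-driven attribution
        "caveat": "Attribution models differ; numbers are not directly comparable.",
    }

report = roas_report(42000, 10000, 27000, 9000)
```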
agent roles and authority
Chorus (content) must load the brand-specific voice guide before every generation task. Voice guides are stored per brand: `/voice/ridgeline-v4.md`, `/voice/forma-v2.md`, `/voice/copper-v3.md`. Chorus must confirm which voice guide is loaded in its output header.
Why: Voice bleed is the most common multi-brand failure. It's subtle -- a single adjective that feels "off brand" is easy to miss in review. The voice guide load confirmation forces both the agent and the reviewer to verify brand alignment.
Failure mode: Beyond the adventure-tee incident (C001), Chorus generated Forma Daily Instagram captions using Ridgeline's voice guide 3 separate times over 2 months. Each time, the captions were "fine" but didn't sound like Forma. The marketing coordinator approved them because the content wasn't wrong, just slightly off. Customer engagement on those posts was 30-40% below Forma's average. The correlation wasn't identified until a quarterly content audit.
Scope: Chorus, Cadence.
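The load-and-confirm step can be sketched like this, using the voice guide paths from the rule (the actual file loading is stubbed; `generation_header` is a hypothetical name):

```python
VOICE_GUIDES = {
    "ridgeline": "/voice/ridgeline-v4.md",
    "forma": "/voice/forma-v2.md",
    "copper": "/voice/copper-v3.md",
}

def generation_header(brand):
    """Every Chorus output opens by confirming which voice guide is loaded."""
    path = VOICE_GUIDES[brand]  # unknown brand = hard stop, not a silent fallback
    return f"[voice guide loaded: {path}]"

header = generation_header("forma")
```

The header exists for the reviewer as much as the agent: a Forma caption under a `ridgeline-v4.md` header is catchable in a one-second scan, where "slightly off" adjectives are not.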
Harbor (CS) must respond using the specific brand's CS template set. Ridgeline's CS voice is friendly/outdoorsy ("Happy to help! Let's get this sorted."). Forma's is minimal/efficient ("Here's what we can do."). Copper & Thread's is warm/premium ("We want to make this right for you."). Harbor must never use one brand's tone for another.
Why: CS interactions are the highest-touch brand experience. A Copper & Thread customer paying $180 for a leather wallet expects a premium service experience, not a casual "no worries, dude" response borrowed from Ridgeline's template.
Failure mode: Harbor responded to a Copper & Thread complaint about a defective zipper with Ridgeline's casual tone: "Bummer about the zipper! We'll get you a new one ASAP." The customer replied: "I paid $180 for this bag. 'Bummer' is not the response I expected." The customer posted the exchange on Instagram (1,200 views). The ops manager rewrote the response and personally followed up.
Scope: Harbor.
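A sketch of per-brand template selection using the example openers above (the single-opener model is a simplification; real template sets would cover full response flows):

```python
# One opener per brand; no shared or default tone.
CS_OPENERS = {
    "ridgeline": "Happy to help! Let's get this sorted.",
    "forma": "Here's what we can do.",
    "copper": "We want to make this right for you.",
}

def cs_reply(brand, body):
    """Compose a reply strictly from the active brand's template set."""
    return f"{CS_OPENERS[brand]} {body}"

reply = cs_reply("copper", "A replacement ships today.")
```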
Atlas (inventory) must maintain separate demand models per brand and per sales channel (Shopify direct, Amazon, wholesale). Cross-brand inventory sharing (Brand A slow-mover restocked as Brand B product) requires founder approval.
Why: Each brand's demand patterns are driven by different factors. Ridgeline is seasonal (Q4 heavy, summer slow). Forma is steady year-round. Copper & Thread spikes around holidays. A model that averages across brands produces forecasts that are wrong for all three.
Failure mode: Atlas used a pooled demand model during its first month. The model predicted steady demand for Ridgeline in July (because Forma's steady demand pulled the average up). Ridgeline actually dropped 35% in summer. The team ordered based on Atlas's projection and ended up with $22,000 in excess Ridgeline summer inventory that had to be marked down 40% in August.
Scope: Atlas.
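A toy sketch of per-brand (never pooled) forecasting. The seasonal factors are illustrative numbers loosely matching the patterns described above, not real model parameters:

```python
# Illustrative monthly factors: Ridgeline summer slump / Q4 heavy,
# Forma steady, Copper & Thread holiday spike.
SEASONALITY = {
    "ridgeline": {7: 0.65, 11: 1.8},
    "forma": {7: 1.0, 11: 1.0},
    "copper": {7: 0.9, 11: 1.6},
}

def forecast(brand, channel, baseline_units, month):
    """Forecast one (brand, channel) pair; no cross-brand averaging."""
    factor = SEASONALITY[brand].get(month, 1.0)
    return round(baseline_units * factor)

july_ridgeline = forecast("ridgeline", "shopify", baseline_units=1000, month=7)
```

A pooled model would have averaged Forma's flat July into Ridgeline's forecast, which is exactly the $22,000 excess-inventory failure above.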
Prism (competitive intel) must maintain separate competitor sets per brand. Ridgeline competes with Patagonia, REI, Cotopaxi. Forma competes with Everlane, Uniqlo, COS. Copper & Thread competes with Bellroy, Shinola, Ghurka. Prism must never surface Ridgeline competitor data in a Forma context.
Why: Competitive intelligence only has value when it's contextually relevant. A price drop from Everlane is critical for Forma's pricing strategy and completely irrelevant to Ridgeline. Cross-brand competitor noise makes the team ignore the signal.
Failure mode: Prism reported Cotopaxi's new product launch in a Forma competitive update. The Forma marketing coordinator spent 45 minutes evaluating whether to respond before realizing Cotopaxi is outdoor wear, not minimalist basics. The error wasn't damaging, but it eroded trust in Prism's relevance. The marketing coordinator started skipping Prism reports entirely for 3 weeks.
Scope: Prism.
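The per-brand competitor sets can be enforced with a simple gate on what Prism surfaces (the update record shape is hypothetical; competitor sets are the ones listed above):

```python
COMPETITOR_SETS = {
    "ridgeline": {"Patagonia", "REI", "Cotopaxi"},
    "forma": {"Everlane", "Uniqlo", "COS"},
    "copper": {"Bellroy", "Shinola", "Ghurka"},
}

def relevant_updates(brand, updates):
    """Drop any update whose competitor is not in this brand's set."""
    return [u for u in updates if u["competitor"] in COMPETITOR_SETS[brand]]

updates = [
    {"competitor": "Cotopaxi", "event": "new product launch"},
    {"competitor": "Everlane", "event": "price drop"},
]
forma_feed = relevant_updates("forma", updates)
```

With this gate, the Cotopaxi launch never reaches a Forma report in the first place.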
coordination patterns
Atlas must feed inventory signals to both Pulse and Signal. If a SKU is within 14 days of stockout, advertising for that SKU must be paused or reduced. The pause request goes to #ad-ops in Slack with the SKU, brand, estimated days to stockout, and recommended action.
Why: Advertising a product you can't ship burns ad spend and creates customer disappointment. In a multi-brand operation, the advertising team may not have visibility into which brand's inventory is running low because they manage all three brands' ads simultaneously.
Failure mode: Ridgeline's best-selling trail jacket hit stockout on a Wednesday. Pulse continued running the hero Meta campaign (the jacket was the lead creative) through the weekend. $2,300 in ad spend drove 67 clicks to an out-of-stock product page. 12 customers added to cart and received a "sold out" notification at checkout. 3 left negative reviews mentioning the "misleading advertisement."
Scope: Atlas, Pulse, Signal.
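A sketch of the Atlas-to-#ad-ops signal. Slack posting is stubbed as a payload dict; the SKU, field names, and velocity inputs are hypothetical:

```python
def stockout_signal(sku, brand, units_on_hand, daily_velocity, horizon_days=14):
    """Build a pause request when a SKU is within the stockout horizon."""
    days_left = units_on_hand / daily_velocity if daily_velocity else float("inf")
    if days_left >= horizon_days:
        return None  # healthy inventory: no signal
    return {
        "channel": "#ad-ops",
        "sku": sku,
        "brand": brand,
        "days_to_stockout": round(days_left, 1),
        "recommended_action": "pause or reduce ads for this SKU",
    }

signal = stockout_signal("RL-JKT-01", "ridgeline", units_on_hand=80, daily_velocity=10)
```

The payload carries brand explicitly because the ad team manages all three brands at once and cannot infer it from the SKU alone.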
Harbor must escalate any CS interaction that references multiple brands to a human agent immediately. If a customer says "I bought from Ridgeline AND Copper & Thread," Harbor must not respond as either brand -- it must route to the CS rep who can handle the multi-brand context appropriately.
Why: Multi-brand customers represent both the highest value and the highest risk. They know the brands are connected (which some customers don't). A scripted single-brand response feels dishonest. A human can acknowledge the cross-brand relationship authentically.
Failure mode: A customer emailed Copper & Thread saying "I love your bags, but my Ridgeline jacket had a zipper issue -- can you help?" Harbor, scoped to Copper & Thread, responded: "Thank you for reaching out! Unfortunately, I can only help with Copper & Thread orders." The customer replied: "You're literally the same company." The ops manager had to step in with a unified response. The customer was right, and the robotic brand-wall response made the company look silly.
Scope: Harbor.
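A naive sketch of the escalation check: a keyword scan for cross-brand mentions (real detection would need to be richer than substring matching, and `route_ticket` is a hypothetical name):

```python
BRAND_NAMES = ("ridgeline", "forma", "copper & thread")

def route_ticket(current_brand, message):
    """Escalate to a human whenever a brand other than the active one is mentioned."""
    mentioned = {b for b in BRAND_NAMES if b in message.lower()}
    if mentioned - {current_brand}:
        return "human-cs"
    return "harbor"

route = route_ticket("copper & thread",
                     "I love your bags, but my Ridgeline jacket had a zipper issue")
```

The agent never attempts a single-brand scripted answer once a second brand is in play; routing, not responding, is the correct behavior.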
Cadence must coordinate promotional calendars across all three brands. No two brands may run conflicting promotions simultaneously (e.g., Brand A 40% off while Brand C runs full price on comparable items). Cadence must present the combined promotional calendar weekly in #marketing-ops for human approval.
Why: Customers who follow multiple brands will notice conflicting promotions. A 40% off sale at Ridgeline while Copper & Thread is full price creates a perception that Copper & Thread is overpriced, not that Ridgeline is on sale.
Failure mode: Cadence scheduled a Ridgeline end-of-season sale (30% off) the same week as a Copper & Thread new collection launch (premium pricing). Customers who received both emails perceived a disconnect. One customer emailed: "Why is everything on sale at Ridgeline but Copper & Thread wants $220 for a bag?" The promotional calendars weren't aligned because each brand's Cadence instance operated independently.
Scope: Cadence.
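A sketch of the conflict check Cadence could run before presenting the weekly calendar (record shape and the 30% "deep discount" cutoff are illustrative assumptions):

```python
def promo_conflicts(calendar, deep_discount=0.30):
    """Flag weeks where one brand runs a deep discount while another is full price.

    calendar: list of {"week": int, "brand": str, "discount": float}.
    """
    by_week = {}
    for promo in calendar:
        by_week.setdefault(promo["week"], []).append(promo)
    conflicts = []
    for week, promos in by_week.items():
        discounted = [p for p in promos if p["discount"] >= deep_discount]
        full_price = [p for p in promos if p["discount"] == 0]
        if discounted and full_price:
            conflicts.append(week)
    return conflicts

calendar = [
    {"week": 32, "brand": "ridgeline", "discount": 0.30},  # end-of-season sale
    {"week": 32, "brand": "copper", "discount": 0.0},      # premium launch
]
flagged = promo_conflicts(calendar)
```

The Ridgeline-sale-versus-Copper-launch collision described above is exactly what this flags; the flagged weeks go to #marketing-ops for human resolution.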
Ledger must produce per-brand P&L reports weekly, never a consolidated view unless explicitly requested. The default is always brand-level detail because Brand C's thin margins can be hidden by Brand A's strong margins in a consolidated report.
Why: Multi-brand companies fail when a losing brand hides behind a winning brand's numbers. Brand C lost money for 2 months before anyone noticed because the consolidated report showed healthy margins.
Failure mode: Ledger's default was consolidated reporting. Brand C's COGS increased 12% due to a supplier price hike. Brand C ran at a -4% margin for 8 weeks. The consolidated company margin stayed at 22% because Brand A's 31% margin absorbed the loss. The founder didn't see the Brand C problem until quarterly per-brand reports were run. $14,400 in margin loss that could have been addressed in week 2 with a pricing adjustment.
Scope: Ledger.
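A sketch of the reporting default (figures illustrative, not actuals): per-brand detail unless consolidation is explicitly requested, so a negative-margin brand is visible by default rather than absorbed into the rollup.

```python
def pnl_report(brand_pnls, consolidated=False):
    """brand_pnls: {brand: {"revenue": dollars, "margin": fraction}}."""
    if not consolidated:
        return brand_pnls  # default: brand-level detail, nothing hidden
    revenue = sum(p["revenue"] for p in brand_pnls.values())
    profit = sum(p["revenue"] * p["margin"] for p in brand_pnls.values())
    return {"revenue": revenue, "margin": round(profit / revenue, 3)}

pnls = {
    "brand_a": {"revenue": 40000, "margin": 0.31},
    "brand_c": {"revenue": 11000, "margin": -0.04},  # losing money
}
default_view = pnl_report(pnls)
rollup = pnl_report(pnls, consolidated=True)
```

In the consolidated view, Brand C's -4% margin disappears into a healthy-looking blended number, which is the 8-week failure described above.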
operational heuristics
When using GPT for creative agents (Cadence, Chorus) and Claude for analytical agents (Pulse, Signal, Atlas, Ledger, Prism), maintain separate evaluation criteria. Creative agents are evaluated on brand voice consistency and engagement metrics. Analytical agents are evaluated on accuracy and signal-to-noise ratio. Never evaluate a creative agent on precision or an analytical agent on tone.
Why: The two platforms were chosen for different strengths. Evaluating both with the same rubric incentivizes the wrong behaviors -- GPT agents get over-optimized for accuracy (killing creativity) and Claude agents get prompted for engaging tone (introducing imprecision).
Failure mode: The team applied a single "quality score" rubric to all agents. Chorus (GPT, creative) scored low on "factual accuracy" because product descriptions included aspirational language. The team tried to make Chorus more precise, which killed the brand voice. Meanwhile, Pulse (Claude, analytical) scored low on "engaging presentation." The team added formatting requirements that made Pulse's alerts harder to scan quickly. Both agents got worse by being evaluated on the wrong criteria.
Scope: All agents.
Agent context switches between brands must include a "brand flush" step: clear the prior brand's context, load the new brand's configuration file, and confirm the brand identity in the output header. No "carry-over" operations where an agent finishes Brand A work and immediately starts Brand B work without context clearing.
Why: Context carry-over is the root cause of voice bleed, data leakage, and policy confusion. The 30-second cost of a brand flush is negligible compared to the cost of any cross-brand contamination incident.
Failure mode: The adventure-tee incident (C001), the email cross-contamination (C002), and the CS tone mismatch (C006) all traced back to context carry-over. Implementing mandatory brand flush reduced cross-brand incidents from 4-6 per month to 0-1 per month within the first 30 days.
Scope: All agents.
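The flush-then-load sequence can be sketched as a tiny state machine (class and method names hypothetical): a brand switch without an intervening flush is an error, never a silent carry-over.

```python
class AgentContext:
    def __init__(self):
        self.brand = None
        self.config = {}

    def flush(self):
        """Step 1: clear the prior brand's context entirely."""
        self.brand = None
        self.config = {}

    def load_brand(self, brand, config):
        """Steps 2-3: load the new brand's configuration, confirm identity."""
        if self.brand is not None:
            raise RuntimeError("flush required before switching brands")
        self.brand = brand
        self.config = config
        return f"[brand context: {brand}]"

ctx = AgentContext()
ctx.load_brand("ridgeline", {"voice": "/voice/ridgeline-v4.md"})
ctx.flush()
header = ctx.load_brand("forma", {"voice": "/voice/forma-v2.md"})
```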
failure patterns
Cross-brand contamination incidents must be classified by type: VOICE (wrong tone/language), DATA (wrong customer/product information), POLICY (wrong return/shipping/pricing rules), or FINANCIAL (wrong thresholds or budget allocations). Each type has a different root cause and a different fix.
Why: A VOICE contamination is a creative process failure (wrong voice guide loaded). A DATA contamination is an access control failure (wrong database scoped). Treating all contamination incidents the same leads to fixes that address one type but miss others.
Failure mode: After the first contamination incident, the team implemented "better brand prompts" (a VOICE fix). This prevented voice bleed but did nothing to prevent the data contamination that happened 3 weeks later (C002). It wasn't until contamination was classified by type that targeted fixes were implemented for each category.
Scope: All agents.
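The classification-to-fix mapping can be made explicit so triage is mechanical (fix descriptions are illustrative summaries of the categories above):

```python
# One targeted fix path per contamination type; no generic "better prompts".
FIXES = {
    "VOICE": "verify voice-guide load step",
    "DATA": "tighten brand-scoped access control",
    "POLICY": "reload brand policy configuration",
    "FINANCIAL": "review brand-proportional thresholds",
}

def triage(incident_type):
    """Map a classified incident to its category's fix path."""
    if incident_type not in FIXES:
        raise ValueError(f"unknown contamination type: {incident_type}")
    return FIXES[incident_type]

fix = triage("DATA")
```

Forcing classification first is what prevents the pattern below: a VOICE fix shipped after a VOICE incident, leaving the DATA failure path wide open.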
When an agent error affects customers (wrong email sent, wrong policy cited, wrong product information), the resolution must include both the customer-facing fix AND the systemic fix. Fixing the customer without fixing the system guarantees a repeat.
Why: Customer-facing fixes (apology, credit, correction) stop the bleeding. Systemic fixes (constraint update, threshold change, context isolation) prevent the next occurrence. Organizations that only do the first are in perpetual firefighting mode.
Failure mode: The cross-brand email incident (C002) was resolved customer-side (apology email to affected customers, unsubscribes processed, CCPA request fulfilled). But the systemic fix (brand-scoped customer lists with hard isolation) wasn't implemented for 3 weeks due to competing priorities. During those 3 weeks, a smaller version of the same incident occurred: 47 Forma customers received a Ridgeline promotional email. Same root cause, same failure, smaller scale.
Scope: All agents.
human-ai boundary conditions
Brand strategy decisions -- whether to launch a new brand, discontinue a brand, reposition a brand, or merge brand operations -- are exclusively human decisions. Agents provide data (brand performance, market trends, customer overlap analysis) but the strategic direction of each brand is the founder's domain.
Why: Brand strategy involves qualitative judgment about market positioning, emotional resonance, and long-term vision that data alone cannot capture. An agent analyzing Brand C's declining margins might recommend discontinuation. The founder knows that Brand C is a Trojan horse for wholesale relationships that feed Brand A.
Failure mode: Ledger's margin analysis flagged Brand C as "underperforming, recommend evaluation for discontinuation" three months in a row. A new VP of Operations, relying on agent recommendations, prepared a discontinuation proposal for the board. The founder rejected it because Brand C's wholesale relationships generated $340K in Brand A revenue that wasn't visible in Brand C's standalone P&L. The agent recommendation was data-correct and strategically catastrophic.
Scope: Ledger, Prism.
Hiring, firing, and organizational structure decisions are exclusively human. Agents may flag capacity constraints ("Harbor is handling 2.3x the ticket volume of 6 months ago") but must never recommend specific staffing actions.
Why: Staffing decisions involve budget constraints, team dynamics, growth projections, and cultural fit that agents cannot evaluate. A recommendation to "hire a second CS rep for Brand B" doesn't account for the fact that the founder is planning to merge Brand B and Brand C CS operations in Q3.
Failure mode: Harbor recommended hiring a dedicated Brand C CS rep based on ticket volume trends. The founder was about to negotiate a shared services agreement with a third-party CS provider that would handle all three brands for less than one additional hire. The recommendation wasn't wrong based on available data, but it lacked context about the founder's strategic plans. No damage, but it highlighted the boundary.
Scope: All agents.