Greenline Financial

silver · L6 · Harness Engineering
fintech · small · agent army template · v1
18 claims
Confidence: 10 High · 6 Medium · 2 Low
Words: 2,758
Published: 4/5/2026
Token Efficiency Index
4.2x Moderate Efficiency
Every token invested in this OOS is estimated to save 4.2 tokens in prevented failures, retries, and coordination collisions.
Token Cost: 3,181
Est. Savings: 13,448
Net: +10,267 tokens

core operating rules

C001 HIGH OBSERVED ONCE 5x High · 204t

No agent prompt may contain raw transaction data, account numbers, routing numbers, SSNs, or any data classified as PII-Financial under Greenline's SOC 2 Type II control framework.

Why: SOC 2 compliance requires that PII-Financial data be processed only within audited systems. AI agent prompts are not within the SOC 2 audit boundary. A single violation could trigger an audit finding that costs 6 months of remediation.

Failure mode: During initial setup, the transaction categorization QA agent was accidentally fed a batch of 200 raw transactions including merchant names, amounts, and partial account numbers. The data appeared in Claude's context window. The incident was caught in a weekly security review. No external exposure, but the SOC 2 auditor flagged it as a control deficiency. Raj spent 3 weeks documenting the remediation.

Scope: All agents, all prompts.

C002 HIGH OBSERVED ONCE 5x High · 186t

Support tickets are preprocessed by a redaction layer before reaching the triage agent. The redaction layer strips account numbers, SSNs, and routing numbers, replacing them with placeholders such as [REDACTED-ACCOUNT] and [REDACTED-SSN].

Why: Users paste sensitive data into support tickets constantly. The triage agent cannot be trusted to ignore data that appears in its context.

Failure mode: Before the redaction layer, a user submitted a ticket containing their full account and routing numbers, asking for help with a failed transfer. The triage agent included the ticket text verbatim in a Slack summary posted to the support channel. Four employees saw the data. Raj filed an internal incident report and built the redaction pipeline over a weekend.

Scope: Support triage agent.
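
A minimal sketch of the kind of pattern-based pass the redaction layer performs. The regexes and the [REDACTED-ROUTING] placeholder are assumptions for illustration; the claim only specifies [REDACTED-ACCOUNT] and [REDACTED-SSN].

```python
import re

# Illustrative patterns only. Production redaction needs broader coverage
# (card numbers with Luhn checks, SSNs written without dashes, IBANs, etc.).
REDACTION_RULES = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED-SSN]"),  # SSN with dashes
    (re.compile(r"\b\d{9}\b"), "[REDACTED-ROUTING]"),          # 9-digit ABA routing number
    (re.compile(r"\b\d{10,17}\b"), "[REDACTED-ACCOUNT]"),      # account number
]

def redact(ticket_text: str) -> str:
    """Strip PII-Financial from a ticket before the triage agent ever sees it."""
    for pattern, placeholder in REDACTION_RULES:
        ticket_text = pattern.sub(placeholder, ticket_text)
    return ticket_text

# redact("Transfer failed. Acct 12345678901, routing 021000021, SSN 123-45-6789.")
# -> "Transfer failed. Acct [REDACTED-ACCOUNT], routing [REDACTED-ROUTING], SSN [REDACTED-SSN]."
```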

C003 HIGH OBSERVED REPEATEDLY 7x High · 189t

The compliance agent has veto authority over all user-facing content produced by any agent. Blog posts, email campaigns, in-app messages, and support reply templates must pass compliance review before publication.

Why: Financial content is regulated. A blog post that says "save more money with Greenline" could be interpreted as financial advice in certain jurisdictions. Maya (compliance officer) must sign off.

Failure mode: The content agent drafted a blog post titled "5 Ways to Beat Inflation with Greenline." Maya caught the word "beat" -- it implied guaranteed financial outcomes. The post was rewritten to "5 Budgeting Strategies for Inflationary Periods." Had it published, it could have triggered a CFPB inquiry.

Scope: Content agent, support triage agent (reply templates).

agent roles and authority

C004 HIGH OBSERVED ONCE 5x High · 166t

The transaction categorization QA agent works with anonymized category distributions only. It receives "Category X had 340 transactions this week, 12 flagged as miscategorized by users" -- never the transactions themselves.

Why: The QA agent's job is to detect systematic categorization drift, not to re-categorize individual transactions. Aggregate signals are sufficient and SOC 2 compliant.

Failure mode: An early version requested sample transactions to "understand the miscategorization pattern." The request was caught by the data pipeline team before execution. Had it proceeded, it would have pulled raw merchant data into the agent prompt.

Scope: Transaction categorization QA agent.
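
The C004 contract is easiest to hold if the pipeline hands the agent a typed aggregate and nothing else. A sketch with assumed field names, not Greenline's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CategoryDriftSignal:
    """Weekly aggregate per category: no transaction rows, merchants, or amounts."""
    category: str            # e.g. "Groceries"
    transaction_count: int   # e.g. 340 transactions this week
    user_flagged_count: int  # e.g. 12 flagged as miscategorized by users

    @property
    def flagged_rate(self) -> float:
        return self.user_flagged_count / max(self.transaction_count, 1)
```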

C005 HIGH OBSERVED ONCE 5x High · 210t

The churn prediction agent produces risk scores at the cohort level (e.g., "Users who connected 1 account and have not logged in for 14+ days: 62% churn probability"). Individual user-level predictions are generated but stored only in the internal analytics database, never surfaced in agent output.

Why: Acting on individual predictions without human review creates false positive outreach. A user flagged as "churning" who is actually just on vacation receives a tone-deaf "We miss you" email.

Failure mode: The churn agent identified 45 users with a >70% churn probability and recommended an immediate re-engagement email. Raj approved the batch. 8 users replied angrily -- they were active users who had simply switched to the mobile app (which the churn model didn't track at the time). Greenline lost 2 users who felt surveilled.

Scope: Churn prediction agent.
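
One way to enforce the cohort-only boundary is to aggregate before anything enters the agent's context. A sketch assuming per-user scores live only in the analytics store; all names are illustrative:

```python
from collections import defaultdict
from statistics import mean

def cohort_summaries(user_predictions: dict[str, tuple[str, float]]) -> list[str]:
    """Collapse user-level churn scores into cohort-level strings.

    user_predictions maps user_id -> (cohort_label, churn_probability).
    Only the returned strings reach agent output; per-user scores stay
    in the internal analytics database.
    """
    by_cohort: dict[str, list[float]] = defaultdict(list)
    for cohort, probability in user_predictions.values():
        by_cohort[cohort].append(probability)
    return [
        f"{cohort}: {len(probs)} users, {mean(probs):.0%} mean churn probability"
        for cohort, probs in sorted(by_cohort.items())
    ]
```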

C006 HIGH OBSERVED ONCE 5x High · 182t

The API monitoring agent has autonomous authority to post P1 alerts (Plaid down, Stripe down, database connection failures) to the #engineering-alerts Slack channel without human approval. P2 and P3 alerts require engineering lead confirmation.

Why: Plaid outages directly impact user experience -- bank connections fail, balances don't update, transactions are missing. Every minute of delay in alerting the engineering team extends user impact.

Failure mode: Before autonomous P1 alerts, a Plaid outage at 11:47 PM went unnoticed for 3 hours because the on-call engineer was asleep and the agent waited for confirmation. 412 users experienced failed bank syncs. 23 submitted support tickets. NPS dropped 4 points that month.

Scope: API monitoring agent.
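
The authority split reduces to a severity gate. In the sketch below, the #engineering-alerts channel is from the claim; the Slack and confirmation helpers are hypothetical stand-ins.

```python
def post_to_slack(channel: str, message: str) -> None:
    """Hypothetical stand-in for the real Slack integration."""
    print(f"[{channel}] {message}")

def queue_for_confirmation(approver: str, severity: str, message: str) -> None:
    """Hypothetical stand-in: holds the alert until a human approves it."""
    print(f"awaiting {approver} approval ({severity}): {message}")

def route_alert(severity: str, message: str) -> None:
    """P1 posts autonomously; P2 and P3 wait for the engineering lead."""
    if severity == "P1":
        post_to_slack("#engineering-alerts", message)  # no human in the loop
    elif severity in ("P2", "P3"):
        queue_for_confirmation("engineering lead", severity, message)
    else:
        raise ValueError(f"unknown severity: {severity!r}")
```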

coordination patterns

C007 HIGH OBSERVED REPEATEDLY 7x High · 184t

When the API monitoring agent detects a Plaid outage, it simultaneously notifies the support triage agent. The triage agent auto-tags all incoming tickets mentioning "bank connection," "sync," or "balance" as "Known Issue - Plaid Outage" and responds with a templated status message.

Why: During Plaid outages, support ticket volume spikes 8-12x. Without auto-tagging, the support team spends hours triaging tickets that all have the same root cause.

Failure mode: During the 3-hour undetected Plaid outage (see C006), 23 tickets came in. The support team spent the next morning individually diagnosing each one before realizing they were all the same issue. Total wasted time: 4.5 hours across 2 support agents.

Scope: API monitoring agent, support triage agent.
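
While an outage flag is active, the auto-tagging is a keyword gate. The keywords and tag text come from the claim; the function shape is an assumption.

```python
PLAID_OUTAGE_KEYWORDS = ("bank connection", "sync", "balance")

def outage_tag(ticket_text: str, plaid_outage_active: bool) -> str | None:
    """Tag tickets that match an active Plaid outage so support skips triage."""
    text = ticket_text.lower()
    if plaid_outage_active and any(kw in text for kw in PLAID_OUTAGE_KEYWORDS):
        return "Known Issue - Plaid Outage"
    return None  # normal triage path
```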

C008 MEDIUM OBSERVED ONCE 3x Moderate · 199t

The weekly metrics agent and churn prediction agent share a data pipeline. Churn predictions feed into the weekly metrics report as a "Retention Risk" section. The metrics agent contextualizes churn predictions against actual retention numbers, preventing alarm fatigue.

Why: Churn predictions in isolation create panic. Churn predictions alongside actual retention data create informed decisions.

Failure mode: Before integration, the churn agent reported "287 users at high risk" in the same week the metrics agent reported "98.2% 30-day retention." Raj panicked about the 287 number. In context, 287 of 8,000 users (3.6%) flagged as "high risk" against 98.2% actual retention (1.8% churn) meant the model was overpredicting roughly twofold. The false alarm cost Raj a weekend of unnecessary strategy sessions.

Scope: Weekly metrics agent, churn prediction agent.
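
The contextualization step is mostly arithmetic: restate the raw count as a share of the user base and compare it to observed churn. A sketch using the incident's numbers; the function and its wording are illustrative:

```python
def retention_risk_context(high_risk_users: int, total_users: int,
                           retention_30d: float) -> str:
    """Frame a churn prediction against actual retention for the weekly report."""
    predicted_share = high_risk_users / total_users  # 287 / 8,000 = 3.6%
    actual_churn = 1 - retention_30d                 # 1 - 0.982 = 1.8%
    verdict = ("model may be overpredicting"
               if predicted_share > actual_churn else "prediction consistent")
    return (f"{high_risk_users} users ({predicted_share:.1%}) flagged high-risk "
            f"vs {actual_churn:.1%} actual 30-day churn: {verdict}")

# retention_risk_context(287, 8000, 0.982)
# -> "287 users (3.6%) flagged high-risk vs 1.8% actual 30-day churn: model may be overpredicting"
```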

C009 MEDIUM OBSERVED ONCE 3x Moderate · 173t

The content agent reads the compliance agent's current regulatory watchlist before drafting any content. Topics on the watchlist (currently: crypto, investment advice, credit scoring, BNPL) require Maya's pre-approval of the topic itself before any drafting begins.

Why: Some topics are regulatory minefields. Drafting a full blog post only to have compliance reject the topic wastes everyone's time.

Failure mode: The content agent drafted a 1,500-word guide titled "Using Greenline to Track Your Crypto Portfolio." Maya rejected it outright -- Greenline is not licensed to provide crypto-related financial services in 3 of its operating states. 6 hours of content work discarded.

Scope: Content agent, compliance agent.
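
The pre-drafting gate is a set-membership check. The watchlist contents come from the claim; the tag-matching mechanics are assumed.

```python
# Current regulatory watchlist per C009; Maya owns the authoritative list.
WATCHLIST = {"crypto", "investment advice", "credit scoring", "bnpl"}

def drafting_allowed(topic_tags: set[str], maya_preapproved: bool) -> bool:
    """Watchlisted topics require Maya's pre-approval before any drafting."""
    touches_watchlist = bool({tag.lower() for tag in topic_tags} & WATCHLIST)
    return maya_preapproved or not touches_watchlist

# drafting_allowed({"Crypto", "portfolio tracking"}, maya_preapproved=False) -> False
```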

operational heuristics

C010 HIGH OBSERVED REPEATEDLY 7x High · 179t

Support ticket priority is determined by financial impact. Tickets mentioning failed payments, incorrect balances, or unauthorized transactions are auto-escalated to P1 regardless of the user's tone or language.

Why: A polite user reporting a $500 balance discrepancy is more urgent than an angry user complaining about the UI. Financial impact trumps sentiment.

Failure mode: The triage agent initially used sentiment analysis for priority. An angry user complaining about a font change was prioritized over a calm user reporting that $1,200 appeared to be missing from their account. The calm user waited 18 hours for a response. The "missing" money was a categorization display bug, but the delay eroded trust.

Scope: Support triage agent.
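
A sketch of impact-first prioritization. The three trigger categories are from the claim; the exact phrase list and sentiment cutoff are illustrative.

```python
FINANCIAL_IMPACT_PHRASES = (
    "failed payment", "payment failed",
    "incorrect balance", "wrong balance",
    "unauthorized transaction", "missing from my account",
)

def ticket_priority(ticket_text: str, sentiment_score: float) -> str:
    """Financial impact trumps sentiment: matching tickets always escalate to P1."""
    text = ticket_text.lower()
    if any(phrase in text for phrase in FINANCIAL_IMPACT_PHRASES):
        return "P1"
    # Sentiment only differentiates tickets with no financial impact.
    return "P2" if sentiment_score < -0.5 else "P3"
```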

C011 HIGH OBSERVED REPEATEDLY 7x High · 166t

API health checks run every 30 seconds for Plaid and Stripe. Degraded performance (response time >2x baseline) triggers a P2 alert. Full outage (no response for 3 consecutive checks) triggers P1.

Why: Degraded performance often precedes full outages. Early warning gives the engineering team time to activate failover or notify users before the situation becomes critical.

Failure mode: Before degradation detection, a Stripe partial outage caused payment processing to slow from 200ms to 8 seconds. Users experienced "spinning" payment screens. 14 users abandoned mid-payment. The monitoring agent alerted only when Stripe went fully down 40 minutes later.

Scope: API monitoring agent.
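
The escalation rules map to a small per-provider check run every 30 seconds; the 2x-baseline and three-consecutive-failure thresholds are straight from the claim, the class shape is an assumption.

```python
from collections import deque

class ProviderHealthCheck:
    """One instance per provider (Plaid, Stripe), evaluated every 30 seconds."""

    def __init__(self, baseline_ms: float):
        self.baseline_ms = baseline_ms
        self.recent_failures: deque[bool] = deque(maxlen=3)

    def evaluate(self, response_ms: float | None) -> str | None:
        """response_ms is None when the check got no response at all."""
        self.recent_failures.append(response_ms is None)
        if len(self.recent_failures) == 3 and all(self.recent_failures):
            return "P1"  # full outage: 3 consecutive failed checks
        if response_ms is not None and response_ms > 2 * self.baseline_ms:
            return "P2"  # degraded: response time above 2x baseline
        return None
```

Against the Stripe incident above, a 200ms baseline and an observed 8-second response would have returned P2 forty minutes before the full outage.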

C012 MEDIUM OBSERVED ONCE 3x Moderate · 166t

The categorization QA agent flags systematic drift when any category's miscategorization rate exceeds 5% over a 7-day window. Drift below 5% is logged but not alerted.

Why: Individual miscategorizations are normal (merchants change names, new merchants appear). Systematic drift indicates a model problem that affects many users simultaneously.

Failure mode: A merchant data provider changed their taxonomy, causing "Groceries" to be classified as "General Merchandise" for 380 users. The drift wasn't flagged for 12 days because the old threshold was 10%. Users noticed before Greenline did. 7 support tickets in one day about "wrong categories."

Scope: Transaction categorization QA agent.
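
The drift rule is a rolling-rate comparison. A sketch assuming the pipeline supplies daily (transactions, user-reported miscategorizations) pairs for the trailing 7 days:

```python
def drift_alert(daily_counts: list[tuple[int, int]], threshold: float = 0.05) -> bool:
    """True when the 7-day miscategorization rate crosses the alert threshold.

    daily_counts holds (transactions, user_reported_miscategorizations)
    per day for the trailing window; sub-threshold drift is logged upstream.
    """
    total = sum(transactions for transactions, _ in daily_counts)
    flagged = sum(flags for _, flags in daily_counts)
    return total > 0 and flagged / total > threshold
```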

failure patterns

C013 MEDIUM OBSERVED ONCE 3x Moderate · 146t

Any SOC 2 control deficiency caused by an agent triggers an immediate 72-hour remediation window. The agent is suspended from production until the fix is verified by Maya and the engineering lead.

Why: SOC 2 audit findings compound. One unresolved finding makes auditors scrutinize everything else more aggressively. Fast remediation keeps the audit clean.

Failure mode: The C001 raw transaction incident took 3 weeks to remediate because it wasn't treated as urgent. The auditor noted both the original incident AND the slow remediation as separate findings. Two findings from one incident.

Scope: All agents.

C014 MEDIUM OBSERVED ONCE 3x Moderate · 149t

False positive churn predictions that result in user complaints are tracked as a separate metric. If false positive rate exceeds 15% of actioned predictions, the churn model is retrained before any further outreach.

Why: Users who are told "We noticed you haven't been active" when they are active feel surveilled. Each false positive costs more trust than a true positive gains.

Failure mode: See C005. The 8 angry replies from 45 actioned predictions (18% false positive rate) triggered a model retrain. The retrained model incorporated mobile app activity and reduced false positives to 4%.

Scope: Churn prediction agent.
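
The retrain trigger is a ratio check over actioned predictions. Applied to the C005 incident, 8 complaints out of 45 actioned (17.8%) trips it:

```python
def retrain_required(actioned: int, complaint_false_positives: int,
                     max_fp_rate: float = 0.15) -> bool:
    """Halt outreach and retrain once complaint-backed false positives
    exceed 15% of actioned churn predictions."""
    return actioned > 0 and complaint_false_positives / actioned > max_fp_rate

# retrain_required(45, 8) -> True (8 / 45 = 17.8% > 15%)
```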

C015 HIGH OBSERVED ONCE 5x High · 169t

When a support ticket auto-response is wrong (user replies saying the automated response didn't help or was incorrect), the ticket is immediately re-routed to a human agent and the auto-response template is flagged for review.

Why: A wrong automated response followed by another wrong automated response makes the user feel trapped in a system that doesn't work.

Failure mode: A user reported a failed Stripe payment. The triage agent auto-responded with "Try reconnecting your bank account via Plaid." The issue was Stripe, not Plaid. The user replied "That's not the problem." The agent sent the same template again. The user tweeted about the experience. 340 impressions.

Scope: Support triage agent.
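
The re-route rule is a guard on the reply handler. The Ticket methods and the failure-detection heuristic below are hypothetical; in practice detection would likely be a classifier, not keyword matching.

```python
NEGATIVE_MARKERS = ("not the problem", "didn't help", "did not help", "still broken")

def auto_response_failed(reply_text: str) -> bool:
    """Hypothetical heuristic for 'the automated response was wrong'."""
    text = reply_text.lower()
    return any(marker in text for marker in NEGATIVE_MARKERS)

class Ticket:
    """Minimal stand-in for the real ticket object."""
    def route_to_human(self) -> None: ...
    def flag_template_for_review(self) -> None: ...
    def continue_automated_flow(self) -> None: ...

def handle_user_reply(ticket: Ticket, reply_text: str) -> None:
    """Never send the same template twice: re-route and flag on failure."""
    if auto_response_failed(reply_text):
        ticket.route_to_human()
        ticket.flag_template_for_review()
    else:
        ticket.continue_automated_flow()
```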

human ai boundary conditions

C016 MEDIUM SPECULATION 1x Low · 163t

Maya (compliance officer) has absolute veto on any agent output that could be interpreted as financial advice, investment guidance, or credit-related recommendations. Her veto cannot be overridden by Raj or any other team member.

Why: Regulatory penalties for unlicensed financial advice start at $10,000 per incident. Maya's authority is the last line of defense.

Failure mode: Not yet violated. Raj once pushed back on a content veto ("it's just a blog post"), but Maya held firm and Raj backed down after she cited the CFPB enforcement action against a competitor for similar language. The near-miss reinforced Maya's authority.

Scope: All agents producing user-facing content.

C017 LOW SPECULATION 0.5x Negative · 157t

The engineering team reviews all API monitoring alert thresholds quarterly. The monitoring agent proposes threshold adjustments based on observed data, but only the engineering lead can approve a change.

Why: Thresholds that are too sensitive create alert fatigue. Thresholds that are too loose miss real incidents. The engineering team understands the infrastructure nuances that the monitoring agent does not.

Failure mode: Hypothesized. The current thresholds were set based on 6 months of data and have not yet needed adjustment. The quarterly review exists to prevent threshold drift as the user base grows and API patterns change.

Scope: API monitoring agent.

C018 LOW SPECULATION 0.5x Negative · 193t

No agent may access or reference a user's actual financial position (total balance, spending patterns, debt levels) when generating any external communication. Users are addressed as "users" not as segments defined by their financial status.

Why: Segmenting users by wealth or spending level in communications ("We noticed you spend a lot on dining out") feels invasive and can violate fair-lending-adjacent regulations, depending on how the data is used.

Failure mode: Hypothesized based on a competitor incident. A competing app sent an email to "high spenders" with personalized savings tips. Users who received it felt profiled. The app's subreddit had 200+ comments in a thread titled "How does [app] know how much I spend?"

Scope: Content agent, churn prediction agent, support triage agent.