Rules of thumb learned from practice. These claims may not be provable in all cases, but they have been observed to work reliably. Heuristics evolve into rules as evidence accumulates.
Unresolvable errors: log and stop. No automatic retry.
Why: Retrying an error that cannot succeed produces log noise and consumes rate limits without any chance of a different outcome.
Failure mode: Agent retries 100 times, consumes rate limits.
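The rule above can be sketched as a retry policy that distinguishes unresolvable from transient errors. This is a minimal illustration, not a real framework: the `UnresolvableError` class, `MAX_RETRIES` value, and `run_with_policy` name are all assumptions.

```python
import logging

logger = logging.getLogger("agent")

class UnresolvableError(Exception):
    """Errors retrying cannot fix (bad credentials, 4xx, schema mismatch). Hypothetical name."""

MAX_RETRIES = 3  # assumed cap for transient errors only

def run_with_policy(task):
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return task()
        except UnresolvableError as e:
            # Log and stop: no retry, so the agent cannot burn rate limits on a lost cause.
            logger.error("unresolvable: %s -- stopping, no retry", e)
            raise
        except Exception as e:
            # Transient errors may succeed on retry, up to the cap.
            logger.warning("transient (attempt %d/%d): %s", attempt, MAX_RETRIES, e)
    raise RuntimeError("exhausted retries")
```

The key design point is that the unresolvable branch re-raises on the first attempt; only genuinely transient failures enter the retry loop.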
When the Delivery Monitor flags a project as "at risk" (velocity drop >20% for 2 consecutive sprints), the PM must acknowledge within 24 hours with a remediation plan. If no acknowledgment in 24 hours, it escalates to the SVP of Global Delivery automatically.
Why: Silent project degradation was our biggest delivery risk. PMs naturally want to "fix it internally" before escalating. The forced acknowledgment window prevents hiding.
Failure mode: A PM acknowledged the alert but submitted a boilerplate remediation plan ("will add resources next sprint") without actually investigating the root cause. The project continued to degrade. We now require the PM to cite specific Jira tickets and the root cause in the acknowledgment.
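The at-risk trigger above (velocity drop >20% for 2 consecutive sprints) can be sketched as a simple check. The thresholds come from the rule; the choice of the prior sprint as the baseline and the list shape are assumptions.

```python
def at_risk(velocities, drop=0.20, consecutive=2):
    """Flag a project when velocity fell more than `drop` versus the
    previous sprint for `consecutive` sprints in a row.
    `velocities` is assumed ordered oldest-to-newest."""
    streak = 0
    for prev, cur in zip(velocities, velocities[1:]):
        if prev > 0 and (prev - cur) / prev > drop:
            streak += 1
            if streak >= consecutive:
                return True
        else:
            streak = 0  # one recovered sprint resets the streak
    return False
```

A single bad sprint does not fire the alert; only a sustained decline does, which keeps the 24-hour acknowledgment window from being triggered by noise.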
AI code review comments that go unaddressed for 48 hours are auto-escalated to the tech lead. This prevents PR queues from stalling because developers dismiss AI feedback.
Why: Early adoption showed developers ignoring AI comments at a 40% rate, assuming they were false positives. Some were. Many were not. The escalation creates accountability.
Failure mode: The escalation initially went to the PM instead of the tech lead. PMs did not have the technical context to evaluate whether the AI comment was valid. We rerouted to tech leads who can make the call in 5 minutes.
The Knowledge Navigator's staleness scoring triggers automatic review requests to documentation owners when a page has not been updated in 6 months and has been cited more than 10 times.
Why: High-citation, stale documentation is the most dangerous kind -- it is trusted precisely because it is frequently referenced, but the information may be outdated.
Failure mode: Documentation owners were overwhelmed with review requests during the initial rollout (we had 3+ years of Confluence debt). We added a priority queue based on citation frequency x staleness to focus on the most dangerous pages first.
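The priority queue above (citation frequency x staleness) can be sketched as follows. The 6-month and 10-citation thresholds come from the rule; the page dict fields are assumptions about how the Knowledge Navigator stores metadata.

```python
from datetime import datetime

def review_queue(pages, now, max_age_days=180, min_citations=10):
    """Return pages due for review, most dangerous first.
    Danger score = citation count x days since last update."""
    due = [p for p in pages
           if (now - p["updated"]).days > max_age_days
           and p["citations"] > min_citations]
    return sorted(due,
                  key=lambda p: p["citations"] * (now - p["updated"]).days,
                  reverse=True)
```

Sorting by the product rather than either factor alone is what focuses owners on trusted-but-old pages first instead of flooding them with every stale page at once.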
If founder has fewer than 3 OTP hours in a week, defer all non-build work.
Why: In a low-availability week, build time must be protected above everything else.
Failure mode: Low-availability week spent on outreach delays timeline by 2 weeks.
If data is stale, flag it visibly. Never silently present old information as current.
Why: Stale data presented as current causes wrong decisions. Visible staleness lets the consumer decide how to weight the information.
Failure mode: Briefing shows yesterday's ad spend as today's. Founder makes budget decisions on wrong numbers.
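Visible staleness flagging can be sketched as a render helper. The 24-hour freshness window and the label format are assumptions; the rule only requires that old data never appear current.

```python
from datetime import datetime, timedelta

def with_staleness_flag(value, as_of, now, max_age=timedelta(hours=24)):
    """Render a metric, appending a visible staleness label when the data
    is older than max_age. Never silently present old data as current."""
    age = now - as_of
    if age > max_age:
        return f"{value} (STALE: as of {as_of:%Y-%m-%d}, {age.days}d old)"
    return str(value)
```

The consumer still sees the number either way; the flag just lets them decide how much weight it deserves.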
If 3+ tasks from one person are overdue, flag as capacity pattern, not motivation problem.
Why: Individual overdue tasks might be forgotten. A pattern of overdue tasks indicates workload exceeds capacity.
Failure mode: Manager assumes delegation is lazy. Actual problem is team member is overwhelmed. Problem worsens.
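The capacity-pattern check can be sketched as a count per assignee. The threshold of 3 comes from the rule; the task dict shape is an assumption.

```python
from collections import Counter

def capacity_flags(overdue_tasks, threshold=3):
    """Return assignees whose overdue count suggests a capacity problem
    rather than a motivation problem."""
    counts = Counter(t["assignee"] for t in overdue_tasks)
    return {person for person, n in counts.items() if n >= threshold}
```

One or two overdue tasks stay below the threshold and are treated as ordinary slippage; only the repeated pattern gets flagged.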