A post-mortem, also called an incident review, is how a team learns from an outage, defect, or failure so that it does not happen again. The single most important rule is that it is blameless. People act rationally given the information and tools they had at the time, so a blameless incident review examines systems, decisions, and conditions, never individuals. When people fear blame, they hide facts, and the team learns nothing.
Run a post-mortem after any significant incident: an outage, a data issue, a missed launch, or a near-miss worth learning from. The best reviews happen soon after resolution, while memory is fresh but the immediate fire is out.
Invite the people who detected the incident, responded to it, and own the affected systems, plus a neutral facilitator: roughly four to twelve people in total. Include those closest to the incident, and keep leadership in listening mode so the room stays honest.
Begin by stating the blameless rule out loud and meaning it. Build a factual timeline of detection, response, and resolution before any analysis. Quantify the impact so severity is shared. Then dig for root causes by asking why repeatedly, moving past symptoms to the conditions that allowed the incident. Agree on preventive actions with owners and dates, and commit to writing and sharing the post-mortem so the whole organization learns, not just the room.
Want your incidents to drive real reliability gains? Run your post-mortem in OrgTP to capture the timeline, assign owners, and track every preventive action to done.
75 minutes total · 6 sections