A Lightweight Post Mortem Process
We discussed the need to perform post mortems, or After Action Reviews (AARs), in our post entitled “After Action Reviews”. Our new book includes a description of how these meetings should be run, but given the amount of time we spend teaching companies our lightweight post mortem process, we thought it useful to describe it in a blog post as well.
First, please understand that we think onerous processes result in the death of an organization. We’ve often said that the point at which a company begins to hire “process engineers” is the point at which processes have gone a bit too far. Startups need light, adaptable processes that can grow as their needs grow over time. The post mortem process described here is one such process.
Ideally, everyone will be gathered in a single room with whiteboards that can be used during the process. Attendees should include everyone who was involved with the issue or crisis and who can contribute either to a complete and accurate timeline or to the issues identified within it. Managers who might be assigned action items, be they process, organizational or technical, should also attend the post mortem. A single person should be identified as the post mortem process facilitator.
Our post mortem process consists of three phases:
- Phase 1 focuses on generating a timeline of the events leading up to the issue or crisis. Nothing is discussed other than the timeline during this first phase. The phase is complete once everyone in the room agrees that there are no more items to be added to the timeline. We typically find that even after we’ve completed the timeline phase, people will continue to remember or identify timeline-worthy events in the next phase of the post mortem.
- Phase 2 of the post mortem consists of issue identification. The process facilitator walks through the timeline and works with the team to identify issues. Was it OK that the first monitor identified customer failures at 8 AM but that no one responded until noon? Why didn’t the auto-failover of the database occur as expected? Why did we believe that dropping the user_authorization table would allow the application to start running again? Each and every issue is identified from the timeline, but no corrections or actions are allowed to be made until the team is done identifying issues. Invariably, team members will start to suggest actions but it is the responsibility of the process facilitator to focus the team on issue identification during Phase 2.
- Phase 3 of the post mortem focuses on actions. Each issue should have at least one action associated with it. The process facilitator walks down the list of issues and works with the team to identify an action, an owner, an expected result and a time by which it should be completed. Using the SMART principles, each action should be specific, measurable, attainable, realistic and timely. A single owner should be identified, even though the action may take a group or team to accomplish.
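For teams that prefer to capture the output of the three phases in something more durable than a whiteboard, the records could be sketched roughly as follows. This is only an illustration, not a tool we use or prescribe: the class names, field names, dates, owner and action text are all hypothetical, chosen to mirror the timeline, issue and action items described above.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import List


@dataclass
class TimelineEvent:
    """Phase 1: something that happened, and when."""
    when: datetime
    description: str


@dataclass
class Action:
    """Phase 3: one action with a single owner, an expected result and a due date."""
    description: str
    owner: str  # a single person, even if a team does the work
    expected_result: str
    due: date


@dataclass
class Issue:
    """Phase 2: a problem identified while walking the timeline."""
    description: str
    actions: List[Action] = field(default_factory=list)


@dataclass
class PostMortem:
    facilitator: str
    timeline: List[TimelineEvent] = field(default_factory=list)
    issues: List[Issue] = field(default_factory=list)

    def add_event(self, when: datetime, description: str) -> None:
        # Events often surface out of order, so keep the timeline sorted.
        self.timeline.append(TimelineEvent(when, description))
        self.timeline.sort(key=lambda event: event.when)

    def add_issue(self, description: str) -> Issue:
        issue = Issue(description)
        self.issues.append(issue)
        return issue

    def issues_without_actions(self) -> List[Issue]:
        # Every issue should leave Phase 3 with at least one action.
        return [issue for issue in self.issues if not issue.actions]


# Hypothetical example data, loosely based on the questions in Phase 2.
pm = PostMortem(facilitator="Facilitator")
pm.add_event(datetime(2024, 1, 15, 12, 0), "First response to the alert")
pm.add_event(datetime(2024, 1, 15, 8, 0), "Monitor identifies customer failures")

slow_response = pm.add_issue("No one responded to the 8 AM alert until noon")
pm.add_issue("Database auto-failover did not occur as expected")

slow_response.actions.append(
    Action(
        description="Route monitor alerts to the on-call engineer",
        owner="Operations manager",
        expected_result="Alerts are acknowledged shortly after they fire",
        due=date(2024, 2, 1),
    )
)

# Issues the facilitator still needs to walk through in Phase 3.
unresolved = [issue.description for issue in pm.issues_without_actions()]
print(unresolved)
```

The one check worth keeping from a sketch like this is `issues_without_actions`: it gives the facilitator a simple way to confirm that no issue leaves the room without an action and an owner.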