First Thing – Runbooks and Low Noise Outage Notifications
Institutional knowledge is the experience and understanding that staff have about a company. It spans the long-standing engineers who wrote the code that runs the day-to-day business, the network engineers familiar with the topology of the data center, and even the customer service representatives who know how to leverage internal company processes to resolve customer issues. This understanding takes years to build and can be difficult – if not impossible – to replace. Runbooks capture institutional knowledge about applications and systems in a tangible document that aids in troubleshooting and resolving issues.
Runbooks may embody the knowledge needed to resolve issues, but we must combine them with alerts that tell us when an outage occurs. When I think of alerts, I’m always reminded of the boy who cried wolf – the story of a boy who is bored at night tending sheep...