Red Hat Enterprise Linux Troubleshooting Guide

By: Benjamin Cane

Overview of this book

Red Hat Enterprise Linux is an operating system that allows you to modernize your infrastructure, boost efficiency through virtualization, and prepare your data center for an open, hybrid cloud IT architecture. It provides the stability to take on today's challenges and the flexibility to adapt to tomorrow's demands. In this book, you will begin with simple troubleshooting best practices and get an overview of the Linux commands used for troubleshooting. The book then covers troubleshooting methods for web applications and services such as Apache and MySQL. Next, you will learn to identify system performance bottlenecks and troubleshoot network issues, all while learning about vital troubleshooting steps such as understanding the problem statement, establishing a hypothesis, and understanding trial, error, and documentation. The book will then show you how to capture and analyze network traffic, use advanced system troubleshooting tools such as strace, tcpdump, and dmesg, and discover common issues with system defaults. Finally, the book will take you through a detailed root cause analysis of an unexpected reboot, where you will learn to recover a downed system.

Chapter 12. Root Cause Analysis of an Unexpected Reboot

In this last chapter, we will put the troubleshooting methods and skills that you learned in previous chapters to the test. We will perform a root cause analysis of one of the most difficult real-world scenarios: an unexpected reboot.

As we discussed in Chapter 1, Troubleshooting Best Practices, a root cause analysis is a bit more involved than simply troubleshooting and resolving an issue. In enterprise environments, you will find that every issue that causes a significant impact will require a root cause analysis (RCA). This is because enterprise environments often have well-established processes for how incidents are supposed to be handled.

In general, when a significant incident occurs, the organization impacted by it wants to prevent it from happening again. You can see this in many industries, even outside of technical environments.

As we discussed in Chapter 1, Troubleshooting Best Practices, a useful RCA has the following...