Red Hat Enterprise Linux Troubleshooting Guide

By: Benjamin Cane

Overview of this book

Red Hat Enterprise Linux is an operating system that allows you to modernize your infrastructure, boost efficiency through virtualization, and prepare your data center for an open, hybrid cloud IT architecture. It provides the stability to take on today's challenges and the flexibility to adapt to tomorrow's demands. In this book, you begin with simple troubleshooting best practices and get an overview of the Linux commands used for troubleshooting. The book then covers troubleshooting methods for web applications and services such as Apache and MySQL. Next, you will learn to identify system performance bottlenecks and troubleshoot network issues, all while learning about vital troubleshooting steps such as understanding the problem statement, establishing a hypothesis, and understanding trial, error, and documentation. The book will also show you how to capture and analyze network traffic, use advanced system troubleshooting tools such as strace, tcpdump, and dmesg, and discover common issues with system defaults. Finally, the book will take you through a detailed root cause analysis of an unexpected reboot, where you will learn to recover a downed system.

A sample Root Cause Analysis

Now that we have all of the information we need, let's create a root cause analysis report. This report can be in any format, really, but I've found that something along the following lines works well.

Problem summary

At approximately 1:50 A.M. on July 5, 2015, the server unexpectedly rebooted. The watchdog process initiated the reboot because of a high load average on the server.

After investigation, the high load average appears to have been caused by a custom e-mail application, which was left in a running state even though it had been migrated to another server.

From the data available, it seems the application consumed 100 percent of the root filesystem.

While I was unable to obtain process states from before the reboot, it appears the high load average might also have been due to the same application being unable to write to the disk.
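
The two conditions named in this summary, a high load average and a full root filesystem, are both easy to check from a script. The following is a minimal sketch in Python rather than anything taken from the investigation itself; the thresholds `LOAD_THRESHOLD` and `DISK_THRESHOLD_PCT` and the helper names are hypothetical and chosen only for illustration.

```python
import os
import shutil

# Hypothetical thresholds for illustration; they are not taken from the
# incident or from any watchdog configuration.
LOAD_THRESHOLD = 25.0
DISK_THRESHOLD_PCT = 95.0


def disk_used_percent(path="/"):
    """Return the percentage of the filesystem at `path` that is in use.

    This is used/total as reported by shutil.disk_usage, so it can differ
    slightly from the Use% column of df, which accounts for reserved blocks.
    """
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total * 100.0


def check_host(load_1min, used_pct,
               load_threshold=LOAD_THRESHOLD,
               disk_threshold_pct=DISK_THRESHOLD_PCT):
    """Return a list of warnings for the given load and disk readings."""
    warnings = []
    if load_1min >= load_threshold:
        warnings.append(
            "1-minute load average %.2f is at or above %.2f"
            % (load_1min, load_threshold)
        )
    if used_pct >= disk_threshold_pct:
        warnings.append(
            "root filesystem is %.1f%% full (threshold %.1f%%)"
            % (used_pct, disk_threshold_pct)
        )
    return warnings


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5, and 15 minute load averages.
    load_1min, _, _ = os.getloadavg()
    for warning in check_host(load_1min, disk_used_percent("/")):
        print(warning)
```

Run periodically (for example from cron), a check along these lines could flag a filling root filesystem before the load average climbs high enough to trigger a reboot. Sensible thresholds depend on the number of CPUs and on the workload, so the values above should be treated as placeholders.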

Problem details

The time at which the incident was reported: 07/05/2015 at 01:52

The timeline of the incident would...