Red Hat Enterprise Linux Troubleshooting Guide

By: Benjamin Cane

Overview of this book

Red Hat Enterprise Linux is an operating system that allows you to modernize your infrastructure, boost efficiency through virtualization, and finally prepare your data center for an open, hybrid cloud IT architecture. It provides the stability to take on today's challenges and the flexibility to adapt to tomorrow's demands. In this book, you begin with simple troubleshooting best practices and get an overview of the Linux commands used for troubleshooting. The book then covers troubleshooting methods for web applications and services such as Apache and MySQL. Next, you will learn to identify system performance bottlenecks and troubleshoot network issues, all while learning vital troubleshooting steps such as understanding the problem statement, establishing a hypothesis, working through trial and error, and documenting your work. The book will also show you how to capture and analyze network traffic, use advanced system troubleshooting tools such as strace, tcpdump, and dmesg, and discover common issues with system defaults. Finally, the book will take you through a detailed root cause analysis of an unexpected reboot, where you will learn to recover a downed system.
Table of Contents (19 chapters)
Red Hat Enterprise Linux Troubleshooting Guide
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

A late night alert


As we have progressed through the chapters and solved many issues for our new employer, we have also been gaining their confidence in our abilities. Recently, we were even placed on the on-call rotation, which means that if issues occur after hours, an alert will be sent to our phone by SMS.

Of course, on our first night on call, we get an alert, and it is not a good one.

ALERT: blog.example.com is no longer responding to ICMP Pings

When we were added to the on-call rotation, our team lead informed us that any major incident that occurs after hours must also have a Root Cause Analysis (RCA) performed. This is so that others in our group can learn and understand what we did to resolve the issue and how to prevent it from happening again.

As we discussed earlier, one of the key components of a useful RCA is a record of when things happened. A major event in our timeline is when we received the alert; based on our SMS message, we can see that we received the alert on July...