Linux Kernel Debugging

By: Kaiwan N. Billimoria

Overview of this book

The Linux kernel is at the very core of arguably the world’s best production-quality OS. Debugging it, though, can be a complex endeavor. Linux Kernel Debugging is a comprehensive guide to learning all about advanced kernel debugging. This book covers many areas in depth, such as instrumentation-based debugging techniques (printk and the dynamic debug framework), and shows you how to use Kprobes. Memory-related bugs tend to be a nightmare – two chapters are packed with tools and techniques devoted to debugging them. When the kernel gifts you an Oops, how exactly do you interpret it to be able to debug the underlying issue? We’ve got you covered. Concurrency tends to be an inherently complex topic, so a chapter on lock debugging will help you to learn precisely what data races are, including using KCSAN to detect them. Some thorny issues, both debug- and performance-wise, require detailed kernel-level tracing; you’ll learn to wield the impressive power of Ftrace and its frontends. You’ll also discover how to handle kernel lockups, hangs, and the dreaded kernel panic, as well as leverage the venerable GDB tool within the kernel (KGDB), along with much more. By the end of this book, you will have at your disposal a wide range of powerful kernel debugging tools and techniques, along with a keen sense of when to use which.
Table of Contents (17 chapters)
Part 1: A General Introduction and Approaches to Kernel Debugging
Part 2: Kernel and Driver Debugging Tools and Techniques
Part 3: Additional Kernel Debugging Tools and Techniques

A kernel Oops and what it signifies

Here are a few quick things to realize regarding a kernel Oops.

First off, an Oops is not the same as a segfault – a segmentation fault... It might, as a side effect, cause a segfault to occur, and thus the process context might receive the fatal SIGSEGV signal. This, of course, has the poor process caught in the crossfire.

Next, an Oops is not the same thing as a full-fledged kernel panic. A panic implies the system is in an unusable state. An Oops might lead to a panic, especially on production systems (we cover kernel panic in Chapter 10, Kernel Panic, Lockups and Hangs). Note, though, that the kernel provides several sysctl tunables (editable by root, of course) that control which circumstances lead to the kernel panicking. We can check them out – on my x86_64 Ubuntu 20.04 guest running our custom production kernel, here they are:

$ cd /proc/sys/kernel/
$ ls panic_on_*
panic_on_io_nmi  panic_on_oops  panic_on_rcu_stall...
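
Listing the tunables shows only their names. To see the current setting of each one, a small loop can print every file alongside its value. The following is a minimal sketch, not taken from the book: the function name show_panic_tunables and its optional directory argument are hypothetical, introduced purely for illustration (the argument makes it easy to try the function against a scratch directory).

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not from the book): print each
# panic_on_* tunable together with its current value.
# Assumption: the tunables live under /proc/sys/kernel, as in the listing
# above; an alternate directory can be passed as the first argument.
show_panic_tunables() {
    dir="${1:-/proc/sys/kernel}"
    for f in "$dir"/panic_on_*; do
        # If nothing matched, the glob stays unexpanded; skip it
        [ -r "$f" ] || continue
        # ${f##*/} strips the leading directory, leaving the tunable name
        printf '%s = %s\n' "${f##*/}" "$(cat "$f")"
    done
}

show_panic_tunables
```

Reading these files does not require root; changing them does. As documented for the kernel sysctls, a value of 0 in panic_on_oops means the kernel tries to continue after an Oops, while 1 makes it panic. The set of files printed depends on the kernel version and configuration.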