
Linux Kernel Debugging

By: Kaiwan N. Billimoria

Overview of this book

The Linux kernel is at the very core of arguably the world’s best production-quality OS. Debugging it, though, can be a complex endeavor. Linux Kernel Debugging is a comprehensive guide to learning all about advanced kernel debugging. This book covers many areas in-depth, such as instrumentation-based debugging techniques (printk and the dynamic debug framework), and shows you how to use Kprobes. Memory-related bugs tend to be a nightmare – two chapters are packed with tools and techniques devoted to debugging them. When the kernel gifts you an Oops, how exactly do you interpret it to be able to debug the underlying issue? We’ve got you covered. Concurrency tends to be an inherently complex topic, so a chapter on lock debugging will help you to learn precisely what data races are, including using KCSAN to detect them. Some thorny issues, both debug- and performance-wise, require detailed kernel-level tracing; you’ll learn to wield the impressive power of Ftrace and its frontends. You’ll also discover how to handle kernel lockups, hangs, and the dreaded kernel panic, as well as leverage the venerable GDB tool within the kernel (KGDB), along with much more. By the end of this book, you will have at your disposal a wide range of powerful kernel debugging tools and techniques, along with a keen sense of when to use which.
Table of Contents (17 chapters)
Part 1: A General Introduction and Approaches to Kernel Debugging
Part 2: Kernel and Driver Debugging Tools and Techniques
Part 3: Additional Kernel Debugging Tools and Techniques

Debugging – a few quick tips

I'll start off by saying this: debugging is both a science and an art, refined by experience – the mundane hands-on slogging through to reproduce and identify a bug and its root cause, and (possibly) fix it. I'm of the opinion that the following few debug tips are really nothing new; that said, we do tend to get caught up in the moment and often miss the obvious. The hope is that you'll find these tips useful and return to them time and again!

  • Assumptions – just say NO!

Churchill famously said, "Never, never, never, give up". We say "Never, never, never, make assumptions".

Assumptions are, very often, the root cause behind many, many bugs and defects. Think back, re-read the Software bugs – a few actual cases section!

In fact (hey, I am partially joking here), just look at the word assume: it just begs saying, "Don't make an ASS out of U and ME"!

Using assertions in your code is a great way to catch assumptions. The userspace way is to use the assert() macro; it's well documented in its man page. (We cover more on using assertion-style macros within the kernel in Chapter 12, A Few More Kernel Debugging Approaches, in the Assertions, warnings and BUG() macros section.)

  • Don't lose the forest for the trees!

At times, we do get lost in the twisted mazes of complex code paths. In these circumstances, it's really easy to lose sight of the big idea, the objective of the code. Try and zoom out and think of the bigger picture. It often helps spot the faulty assumption(s) that led to the error(s). Well-written documentation can be a lifesaver.

  • Think small

When faced with a difficult bug, try this: build (or configure, or otherwise obtain) the smallest possible version of your program that still causes the issue or bug you're facing to surface. This often helps you track down the root cause of the problem. In fact, very often (in my own experience), the mere act of doing this – or even just jotting down the problem you face in detail – makes the actual issue, and its solution, pop into your mind!

  • "It requires twice the brainpower to debug a piece of code as to write it"

This paraphrases Brian Kernighan in the book The Elements of Programming Style. So, should we not use our full brainpower while writing code? Ha, of course you should... but debugging is typically harder than writing code. The real point is this: take the trouble to do your groundwork carefully first: write a brief, very high-level design document describing what you expect the code to do, at a high level of abstraction. Then move on to the specifics (with a so-called low-level design doc). Good documentation will save you one day (and blessings shall be showered upon you!).

That reminds me of another quote: An ounce of design is worth a pound of refactoring – Karl Wiegers.

  • Employ "Zen Mind, Beginner's Mind"

Sometimes, the code can become too complex (spaghetti-like; it just smells). In many cases, just giving up and starting from scratch again, if viable, is perhaps the best thing to do.

This Zen-Beginner's Mind state also implies that we at least temporarily stop our (perhaps over-egotistical) thought patterns (I wrote this so well, how can it be wrong!?) and look at the situation from the point of view of somebody completely new to it. It is, in fact, one key reason why a colleague reviewing your code can spot bugs you'd never see! Plus, a good night's rest can do wonders.

  • Variable naming, comments

I recall a Q&A on Quora revealing that the hardest thing a programmer does is name variables well! This is truer than it might appear at first glance. Variable names stick; choose yours carefully. Don't go overboard, though: for a local loop index, int i is just fine (int theloopindex is just painful). The same goes for comments: they're there to explain the rationale, the design behind the code – what it's designed and implemented to achieve – not how the code works. Any competent programmer can figure that out.

  • Ignore logs at your peril!

It's self-evident perhaps, but we can often miss the obvious when under pressure... Carefully checking the kernel (and even app) logs often reveals the source of the issue you're facing. Logs can usually be displayed in reverse chronological order, giving you a view of what actually occurred; systemd's journalctl(1) utility is powerful – learn how to leverage it!

  • Testing can reveal the presence of errors but not their absence

A truism, unfortunately. Still, testing and QA are simply among the most critical parts of the software process; ignore them at your peril! The time and trouble taken to write exhaustive test cases – both positive and negative – pays large dividends in the long run, helping make the product or project a grand success. Negative test cases and fuzzing are critical for exposing (and subsequently fixing) security vulnerabilities in the code base. Then again, runtime testing only exercises the portions of code actually executed, so take the trouble to perform code coverage analysis; 100% code coverage – with runtime testing of that covered code – is the objective! (Again, we cover more on these key points in Chapter 12, A Few More Kernel Debugging Approaches, in the An introduction to kernel code coverage tools and testing frameworks section.)

  • Incurring technical debt

Every now and then, you realize deep down that though what you've coded works, it's not been done well enough (perhaps there still exist corner cases that will trigger bugs or undefined behavior); that nagging feeling that perhaps this design and implementation simply isn't the best. The temptation to quickly check it in and hope for the best can be high, especially as deadlines loom! Please don't; there is really a thing called technical debt. It will come and get you.

  • Silly mistakes

If I had a penny for each really silly mistake I've made when developing code, I'd be a rich man! For instance, I once spent nearly half a day racking my brains over why my C program simply refused to work correctly, until I realized I was editing the correct code but compiling an old version of it – performing the build in the wrong directory! (I am certain you've faced your share of such pesky frustrations.) Often, a break, or a good night's sleep, can do wonders.

  • Empirical model

The word empirical refers to validating something (anything) by actual, direct observation or experience rather than by relying on theory.

Figure 1.6 – Be empirical!

So, don't believe the book (this one is an exception of course!), don't believe the tutorial, the article, blog, tutor, or author: be empirical – try it out and see for yourself!

Years (decades, actually) back, on my very first day of work at a company I joined, a colleague emailed me a document that I still hold dear: The Ten Commandments for C Programmers, by Henry Spencer (https://www.electronicsweekly.com/open-source-engineering/linux/the-ten-commandments-for-c-programmers-2009-04/). Do check it out. In a similar, albeit clumsier, manner, I present a quick checklist for you.

A programmer's checklist – seven rules

Very important! Did you remember to do the following?

  • Check all APIs for their failure case.
  • Compile with warnings on (definitely with -Wall and possibly -Wextra or even -Werror; yes, treating warnings as errors is going to make its way into the kernel!); eliminate all warnings as far as is possible.
  • Never trust (user) input; validate it.
  • Eliminate unused (or dead) code from the code base immediately.
  • Test thoroughly; 100% code coverage is the objective. Take the time and trouble to learn how to use powerful tools: memory checkers, static and dynamic analyzers, security checkers (checksec, lynis, and several others), fuzzers, code coverage tools, fault injection frameworks, and so on. Don't ignore security!
  • With regard to kernels and especially drivers, after eliminating software issues, be aware that (peripheral) hardware issues could be the root cause of the bug. Don't discount it out of hand! (You'll learn this the hard way.)
  • Do not assume anything (assume: make an ASS out of U and ME); using assertions helps catch assumptions, and thus bugs.

We shall elaborate on several of these points in the coming material.