Book Image

Mastering Linux Kernel Development

By : CH Raghav Maruthi
Book Image

Mastering Linux Kernel Development

By: CH Raghav Maruthi

Overview of this book

Mastering Linux Kernel Development looks at the Linux kernel, its internal arrangement and design, and various core subsystems, helping you to gain significant understanding of this open source marvel. You will look at how the Linux kernel, which possesses a kind of collective intelligence thanks to its scores of contributors, remains so elegant owing to its great design. This book also looks at all the key kernel code, core data structures, functions, and macros, giving you a comprehensive foundation of the implementation details of the kernel’s core services and mechanisms. You will also look at the Linux kernel as well-designed software, which gives us insights into software design in general that are easily scalable yet fundamentally strong and safe. By the end of this book, you will have considerable understanding of and appreciation for the Linux kernel.
Table of Contents (19 chapters)
Title Page
About the Author
About the Reviewer
Customer Feedback

Kernel stack

With current-generation computing platforms powered by multi-core hardware capable of running simultaneous applications, the possibility of multiple processes concurrently initiating kernel mode switch when requesting for the same process is built in. To be able to handle such situations, kernel services are designed to be re-entrant, allowing multiple processes to step in and engage the required services. This mandated the requesting process to maintain its own private kernel stack to keep track of the kernel function call sequence, store local data of the kernel functions, and so on.

The kernel stack is directly mapped to the physical memory, mandating the arrangement to be physically in a contiguous region. The kernel stack by default is 8kb for x86-32 and most other 32-bit systems (with an option of 4k kernel stack to be configured during kernel build), and 16kb on an x86-64 system.

When kernel services are invoked in the current process context, they need to validate the process’s prerogative before it commits to any relevant operations. To perform such validations, the kernel services must gain access to the task structure of the current process and look through the relevant fields. Similarly, kernel routines might need to have access to the current task structure for modifying various resource structures such as signal handler tables, looking for pending signals, file descriptor table, and memory descriptor among others. To enable accessing the task structure at runtime, the address of the current task structure is loaded into a processor register (register chosen is architecture specific) and made available through a kernel global macro called current (defined in architecture-specific kernel header asm/current.h ):

  /* arch/ia64/include/asm/current.h */
  #ifndef _ASM_IA64_CURRENT_H
  #define _ASM_IA64_CURRENT_H
  * Modified 1998-2000
  *      David Mosberger-Tang <[email protected]>, Hewlett-Packard Co
  #include <asm/intrinsics.h>
  * In kernel mode, thread pointer (r13) is used to point to the 
    current task
  * structure.
 #define current ((struct task_struct *) ia64_getreg(_IA64_REG_TP))
 #endif /* _ASM_IA64_CURRENT_H */
 /* arch/powerpc/include/asm/current.h */
 #ifdef __KERNEL__
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version
 * 2 of the License, or (at your option) any later version.
 struct task_struct;
 #ifdef __powerpc64__
 #include <linux/stddef.h>
 #include <asm/paca.h>
 static inline struct task_struct *get_current(void)
       struct task_struct *task;

       __asm__ __volatile__("ld %0,%1(13)"
       : "=r" (task)
       : "i" (offsetof(struct paca_struct, __current)));
       return task;
 #define current get_current()
 * We keep `current' in r2 for speed.
register struct task_struct *current asm ("r2");
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_CURRENT_H */

However, in register-constricted architectures, where there are few registers to spare, reserving a register to hold the address of the current task structure is not viable. On such platforms, the task structure of the current process is directly made available at the top of the kernel stack that it owns. This approach renders a significant advantage with respect to locating the task structure, by just masking the least significant bits of the stack pointer.

With the evolution of the kernel, the task structure grew and became too large to be contained in the kernel stack, which is already restricted in physical memory (8Kb). As a result, the task structure was moved out of the kernel stack, barring a few key fields that define the process's CPU state and other low-level processor-specific information. These fields were then wrapped in a newly created structure called struct thread_info. This structure is contained on top of the kernel stack and provides a pointer that refers to the current task structure, which can be used by kernel services.

The following code snippet shows struct thread_info for x86 architecture (kernel 3.10):

/* linux-3.10/arch/x86/include/asm/thread_info.h */

struct thread_info {
struct task_struct *task; /* main task structure */
 struct exec_domain *exec_domain; /* execution domain */
 __u32 flags; /* low level flags */
 __u32 status; /* thread synchronous flags */
 __u32 cpu; /* current CPU */
 int preempt_count; /* 0 => preemptable, <0 => BUG */
 mm_segment_t addr_limit;
 struct restart_block restart_block;
 void __user *sysenter_return;
 #ifdef CONFIG_X86_32
 unsigned long previous_esp; /* ESP of the previous stack in case of   
 nested (IRQ) stacks */
 __u8 supervisor_stack[0];
 unsigned int sig_on_uaccess_error:1;
 unsigned int uaccess_err:1; /* uaccess failed */

With thread_info containing process-related information, apart from task structure, the kernel has multiple viewpoints to the current process structure: struct task_struct, an architecture-independent information block, and thread_info, an architecture-specific one. The following figure depicts thread_info and task_struct:

For architectures that engage thread_info, the current macro's implementation is modified to look into the top of kernel stack to obtain a reference to the current thread_info and through it the current task structure. The following code snippet shows the implementation of current for an x86-64 platform:

  #include <linux/thread_info.h>

#define get_current() (current_thread_info()->task)
#define current get_current()

  #endif /* __ASM_GENERIC_CURRENT_H */
  * how to get the current stack pointer in C
  register unsigned long current_stack_pointer asm ("sp");

   * how to get the thread information struct from C
  static inline struct thread_info *current_thread_info(void)  

  static inline struct thread_info *current_thread_info(void)
 return (struct thread_info *)
                (current_stack_pointer & ~(THREAD_SIZE - 1));

As use of PER_CPU variables has increased in recent times, the process scheduler is tuned to cache crucial current process-related information in the PER_CPU area. This change enables quick access to current process data over looking up the kernel stack. The following code snippet shows the implementation of the current macro to fetch the current task data through the PER_CPU variable:

  #ifndef _ASM_X86_CURRENT_H
  #define _ASM_X86_CURRENT_H

  #include <linux/compiler.h>
  #include <asm/percpu.h>

  #ifndef __ASSEMBLY__
  struct task_struct;

DECLARE_PER_CPU(struct task_struct *, current_task);

  static __always_inline struct task_struct *get_current(void)
return this_cpu_read_stable(current_task);

#define current get_current()

  #endif /* __ASSEMBLY__ */

  #endif /* _ASM_X86_CURRENT_H */

The use of PER_CPU data led to a gradual reduction of information in thread_info. With thread_info shrinking in size, kernel developers are considering getting rid of thread_info altogether by moving it into the task structure. As this involves changes to low-level architecture code, it has only been implemented for the x86-64 architecture, with other architectures planned to follow. The following code snippet shows the current state of the thread_info structure with just one element:

/* linux-4.9.10/arch/x86/include/asm/thread_info.h */
struct thread_info {
 unsigned long flags; /* low level flags */