In this chapter, you will learn about the Linux kernel in general, as well as some of its specifics. The chapter starts with a quick presentation of the history of Linux and its role, and then continues with an explanation of its various features. The steps used to interact with the Linux kernel sources will not be omitted: you will be presented not only with the steps necessary to obtain a Linux kernel image from source code, but also with information about what porting to a new ARM machine implies, and with some of the methods used to debug the various problems that can appear when working with the Linux kernel sources in general. In the end, the context will switch to the Yocto Project to show how the Linux kernel can be built for a given machine, and how an external module can be integrated and used later from a root filesystem image.
This chapter will give you an idea of the Linux kernel and the Linux operating system, and this presentation would not be complete without its historical component. Linux and UNIX are usually placed in the same historical context: although the Linux kernel appeared in 1991 and the Linux operating system quickly became an alternative to the UNIX operating system, the two are members of the same family. Taking this into consideration, the history of Linux could not have started anywhere other than with the UNIX operating system. This means that we need to go back in time more than 40 years, to be more precise, about 45 years, to 1969, when Dennis Ritchie and Ken Thompson started the development of UNIX.
The predecessor of UNIX was Multiplexed Information and Computing Service (Multics), a multiuser operating system project that was not in its best shape at the time. Since Multics had become a nonviable solution for the Bell Laboratories Computer Sciences Research Center, in the summer of 1969 a new filesystem design was born, and it later became what is known today as UNIX. Over time, it was ported to multiple machines due to its design and the fact that the source code was distributed alongside it. The most prolific contributor to UNIX was the University of California, Berkeley, which also developed its own UNIX version, called the Berkeley Software Distribution (BSD), first released in 1977. Until the 1990s, multiple companies developed and offered their own distributions of UNIX, their main inspiration being Berkeley or AT&T. All of them helped UNIX become a stable, robust, and powerful operating system. Among the features that made UNIX strong as an operating system, the following can be mentioned:
- UNIX is simple. The number of system calls it uses is reduced to only a couple of hundred, and their design is basic.
- Everything is regarded as a file in UNIX, which makes the manipulation of data and devices simpler and minimizes the system calls used for interaction.
- Fast process creation through the fork() system call.
- The UNIX kernel and utilities are written in the C language, a property that makes UNIX easily portable and accessible.
- Simple and robust interprocess communication (IPC) primitives help in the creation of fast and simple programs that accomplish only one thing in the best available manner.
Linux started as an alternative solution to a UNIX variant called Minix, an operating system created for teaching purposes that lacked easy interaction with its system source code. Any changes made to the source code were not easily integrated and distributed because of Minix's license. Linus Torvalds first started working on a terminal emulator to connect to other UNIX systems at his university. Within the same academic year, the emulator evolved into a full-fledged UNIX-like kernel, which he released for everyone to use in 1991.
One of the most attractive features of Linux is that it is an open source operating system whose source code is available under the GNU GPL license. When writing the Linux kernel, Linus Torvalds used the best design choices and features from the available variations of the UNIX operating system kernel as a source of inspiration. Its license is what has propelled it into becoming the powerhouse it is today: it has engaged a large number of developers who have helped with code enhancements, bug fixing, and much more.
Linux has become a truly collaborative project developed by a huge community over the Internet. Although a great number of changes have been made inside the project, Linus has remained its creator and the maintainer of the mainline kernel, while Greg Kroah-Hartman maintains the stable branch. In the period when only Linus was in charge, the Linux kernel could seem like a loose-knit community of developers, perhaps because of Linus' harsh comments, which are known worldwide. Since Greg took over the stable releases, this image has started fading gradually. I am looking forward to the years to come.
With an impressive number of lines of code, the Linux kernel is one of the most prominent open source projects and, at the same time, one of the largest available. The Linux kernel is a piece of software that helps with interfacing the hardware, being the lowest-level code that runs in everyone's Linux operating system. It is used as an interface for user space applications, as described in the following diagram:
The main roles of the Linux kernel are as follows:
- It provides a set of portable hardware and architecture APIs that offer user space applications the possibility to use the necessary hardware resources
- It helps with the management of hardware resources, such as the CPU, input/output peripherals, and memory
- It is used for the management of concurrent accesses to, and the usage of, the necessary hardware resources by different applications
To make sure that the preceding roles are well understood, an example will be very useful. Let's consider that in a given Linux operating system, a number of applications need access to the same resource, such as a network interface or a device. For these elements, the kernel needs to multiplex the resource in order to make sure that all applications have access to it.
This section will introduce a number of features available inside the Linux kernel, covering information about each of them: how they are used, what they represent, and any other relevant details regarding each specific functionality. This presentation will familiarize you with the main role of some of the features available inside the Linux kernel, as well as with the Linux kernel and its source code in general.
On a more general note, some of the most valuable features that the Linux kernel has are as follows:
The preceding features do not constitute actual functionalities, but they have helped the project along its development process and are still helping it today. Having said this, there are a lot of features that are implemented, such as the fast user space mutex (futex), netfilter, the Simplified Mandatory Access Control Kernel (Smack), and so on. A complete list of these can be accessed and studied at http://en.wikipedia.org/wiki/Category:Linux_kernel_features.
When discussing memory in Linux, we can refer to it as physical and virtual memory. Compartments of the RAM are used to contain the Linux kernel variables and data structures, and the rest of the memory is used for dynamic allocations, as described here:
- _count: This represents the page counter. When it reaches the value 0, the page is added to the free pages list.
- virtual: This represents the virtual address associated with a physical page. The ZONE_DMA and ZONE_NORMAL pages are always mapped, while the ZONE_HIGHMEM pages are not always mapped.
- flags: This represents a set of flags that describe the attributes of the page.
The zones of the physical memory have been presented previously. The physical memory is split up into multiple nodes that have a common physical address space and fast local memory access. The smallest of the zones is ZONE_DMA, between 0 and 16 MB. The next is ZONE_NORMAL, which is the LowMem area between 16 MB and 896 MB, and the largest one is ZONE_HIGHMEM, which is between 896 MB and 4 GB/64 GB. This information is visible in both the preceding and the following images:
The virtual memory is used both in the user space and the kernel space. The allocation of a memory zone implies the allocation of a physical page as well as the allocation of an address space area; this is recorded both in the page table and in the internal structures available inside the operating system. The usage of the page table differs from one architecture type to another. On a complex instruction set computing (CISC) architecture, the page table is walked by the processor itself, but on a reduced instruction set computing (RISC) architecture, the page table is used by the kernel for page lookup and translation lookaside buffer (TLB) add operations. Each zone descriptor is used for zone mapping; it specifies whether the zone is mapped for usage by a file, and whether the zone is read-only, copy-on-write, and so on. The address space descriptor is used by the operating system to maintain high-level information.
The methods used by the kernel for memory handling are the first subject that will be discussed here. This is done to make sure that you understand the methods used by the kernel to obtain memory. Although the smallest addressable unit of a processor is the byte, for the Memory Management Unit (MMU), the unit responsible for virtual-to-physical translation, the smallest addressable unit is the page. A page's size varies from one architecture to another: most 32-bit architectures use 4 KB pages, whereas the 64-bit ones usually have 8 KB pages. The kernel is responsible for maintaining the system's page tables. For the Atmel SAMA5D3 Xplained board, the definition of the struct page structure is as follows:
The flags field is one of the most important fields of the page structure; it represents the status of the page and holds information such as whether the page is dirty, locked, or in another valid state. The values associated with this flag are defined inside the include/linux/page-flags-layout.h header file. The virtual field represents the virtual address associated with the page, and _count represents the usage count of the page, which is usually accessed indirectly through the page_count() function. All the other fields can be found inside the include/linux/mm_types.h header file.
There are allocations that require interaction with more than one zone. One such example is a normal allocation that is able to use either ZONE_DMA or ZONE_NORMAL. ZONE_NORMAL is preferred because it does not interfere with direct memory accesses, though when memory is at full usage, the kernel might use other available zones besides the ones it uses in normal scenarios. The kernel makes available a struct zone structure that holds each zone's relevant information. For the Atmel SAMA5D3 Xplained board, this structure is as shown here:
As you can see, the structure that defines a zone is an impressive one. Some of the most interesting fields are the watermark variable, which contains the high, medium, and low watermarks for the defined zone; the present_pages attribute, which represents the available pages within the zone; the name field, which holds the name of the zone; and others, such as the lock field, a spin lock that shields the zone structure from simultaneous access. All the other fields can be identified inside the corresponding include/linux/mmzone.h header file for the Atmel SAMA5D3 Xplained board.
This function is used to get the logical address for a corresponding memory page:
The preceding function does what its name suggests: it returns a page filled with zero values. The difference between this function and the __get_free_page() function is that the returned page is filled with zero values:

The preceding functions are used for freeing the given allocated pages. The pages should be passed with care because the kernel is not able to check the information it is provided with.
Usually, the disk is slower than the physical memory, which is one of the reasons why memory is preferred over disk storage. The same applies to the processor's cache levels: the closer a cache resides to the processor, the faster the I/O access. The process that moves data from the disk into the physical memory is called page caching. The inverse process is defined as page writeback. These two notions will be presented in this subsection, mainly in the kernel context.
The first time the kernel serves a read() system call, it verifies whether the data is present in the page cache. The process of finding the page inside the RAM is called a cache hit. If the data is not available there, then it needs to be read from the disk, and this process is called a cache miss.
When the kernel serves the write() system call, there are multiple possibilities for cache interaction. The easiest one is to not cache the write operations and only keep the data on the disk; this scenario is called no-write cache. When the write operation updates the physical memory and the disk data at the same time, the operation is called write-through cache. The third option is the write-back cache, where the page is marked as dirty: it is added to the dirty list and, over time, it is written to the disk and marked as not dirty. A better name for the dirty keyword here would be unsynchronized, since a dirty page is one whose memory copy is ahead of its disk copy.
Besides its own physical memory, the kernel is also responsible for user space processes and their memory management. The memory allocated for each user space process is called the process address space, and it contains the virtual memory addressable by a given process, together with the related addresses used by the process in its interaction with the virtual memory.
Usually, a process receives a flat 32- or 64-bit address space, its size being dependent on the architecture type. However, there are operating systems that allocate a segmented address space. Threads are offered the possibility of sharing an address space between them. Although a process can access a large memory space, it usually has permission to access only an interval of memory, called a memory area, which means that a process can only access a memory address situated inside a valid memory area. If it somehow tries to access a memory address outside of its valid memory areas, the kernel will kill the process with a Segmentation fault notification.
A memory area contains the following:
- The text section, which maps the program's executable code
- The data section, which maps initialized global variables
- The bss section, which maps uninitialized global variables
- The zero page section, which is used for the process' user space stack
- The text, bss, and data sections of each shared library
- Mapped files
- Anonymous memory mappings, which are usually linked with functions such as malloc()
- Shared memory segments
A process address space is defined inside the Linux kernel sources through a memory descriptor. This structure is called struct mm_struct; it is defined inside the include/linux/mm_types.h header file and contains information relevant to a process address space, such as the number of processes that use the address space, a list of memory areas, the last memory area that was used, the number of available memory areas, and the start and end addresses for the code, data, heap, and stack sections.
A process, as presented previously, is a fundamental unit in a Linux operating system and, at the same time, a form of abstraction. It is, in fact, a program in execution, but a program by itself is not a process: it needs to be in an active state and have associated resources. A process is able to become a parent by using the fork() function, which spawns a child process. Both the parent and the child processes reside in separate address spaces, but both of them initially have the same content. The exec() family of functions is the one that is able to execute a different program, creating an address space and loading that program inside it.
When fork() is called, the following steps are performed:
- The dup_task_struct() function is called to create a new kernel stack; the task_struct and thread_info structures are created for the new process.
- A check is made that the child does not go beyond the limits of the memory area.
- The child process distinguishes itself from its parent.
- The child is set to the TASK_UNINTERRUPTIBLE state to make sure it does not run yet.
- The flags are updated.
- A PID is associated with the child process.
- The flags that are already set are inspected, and proper action is performed with respect to their values.
- The cleanup process is performed at the end, when the child process pointer is obtained.
At the end of its execution, the process needs to be terminated so that its resources can be freed, and the parent of the executing process needs to be notified about this. The method most commonly used to terminate a process is calling the exit() system call. A number of steps are performed in this process:
- The PF_EXITING flag is set.
- The del_timer_sync() function is called to remove the kernel timers.
- The acct_update_integrals() function is called to write accounting and logging information.
- exit_mm() is called to release the mm_struct structure of the process.
- exit_sem() is called to dequeue the process from the IPC semaphore.
- The exit_files() and exit_fs() functions are called to remove the links to various file descriptors.
- The task exit code is set.
- exit_notify() is called to notify the parent and set the task exit state to EXIT_ZOMBIE.
- schedule() is called to switch to a new process.
The process scheduler decides which resources are allocated to a runnable process. It is a piece of software that is responsible for multitasking and for resource allocation to various processes, and it decides how to best distribute the resources and processor time. It also decides which process should run next.
The first design of the Linux scheduler was very simplistic. It was not able to scale properly when the number of processes increased, so starting with the 2.5 kernel version, a new scheduler was developed: the O(1) scheduler, which offers a constant-time algorithm for time slice calculation and a run queue that is defined on a per-processor basis. Although it is suitable for large servers, it is not the best solution for a normal desktop system. From the 2.6 kernel version, improvements were made to the O(1) scheduler, such as the fair scheduling concept that later materialized, in kernel version 2.6.23, into the Completely Fair Scheduler (CFS), which became the de facto scheduler.
The CFS has a simple idea behind it. It behaves as if we had a perfect multitasking processor where each process gets a 1/n slice of the processor's time, and this time slice is incredibly small; the n value represents the number of running processes. Con Kolivas is the Australian programmer who contributed the fair scheduling implementation known as the Rotating Staircase Deadline Scheduler (RSDL). Its implementation requires a self-balancing red-black tree that orders processes by priority, and also a time slice that is calculated at the nanosecond level. Similarly to the O(1) scheduler, CFS applies the notion of weight, which implies that some processes wait more than others; this is based on the weighted fair queuing algorithm.
For processes to interact with the system, an interface should be provided to give user space applications the possibility of interacting with the hardware and with other processes. This interface is made of system calls, which are used as an interface between the hardware and the user space. They are also used to ensure stability, security, and abstraction in general. These are common layers that constitute an entry point into the kernel, alongside traps and exceptions, as described here:
The interaction with most of the system calls available inside a Linux system is done using the C library. The system calls are able to take a number of arguments and return a value that reveals whether they were successful or not. A return value of zero usually means that the execution ended successfully, and if an error appears, an error code will be available inside the errno variable. When a system call is made, the following steps are followed:
- The switch into kernel mode is made.
- Any restrictions to the kernel space access are eliminated.
- The stack from the user space is passed into the kernel space.
- Any arguments from the user space are checked and copied into the kernel space.
- The associated routine for the system call is identified and run.
- The switch to the user space is made and the execution of the application continues.
The Linux operating system is able to support a large variety of filesystem options. This is possible due to the existence of the Virtual File System (VFS), which is able to provide a common interface for a large number of filesystem types and handle the system calls relevant to them.
The filesystem types supported by the VFS can be put in these three categories:
- Disk-based filesystems: These manage the memory on a local disk or on devices used for disk emulation. Some of the most well known ones are:
- Linux filesystems, such as the Second Extended Filesystem (Ext2), the Third Extended Filesystem (Ext3), and the Fourth Extended Filesystem (Ext4)
- UNIX filesystems, such as the sysv filesystem, UFS, the Minix filesystem, and so on
- Microsoft filesystems, such as MS-DOS, NTFS (available since Windows NT), and VFAT (available since Windows 95)
- The ISO 9660 CD-ROM filesystem and the Universal Disk Format (UDF) DVD filesystem
- Proprietary filesystems, such as the ones from Apple, IBM, and other companies
- Network filesystems: These allow access to various filesystem types over a network, on other computers. One of the most well known ones is NFS. Of course, there are others, but they are not as well known; these include the Andrew File System (AFS), Novell's NetWare Core Protocol (NCP), Constant Data Availability (Coda), and so on.
- Special filesystems: The /proc filesystem is the perfect example for this category. Such filesystems enable an easier way for system applications to interrogate the kernel's data structures and implement various features.
The virtual filesystem system call implementation is very well summarized in this image:
In the preceding image, it can be seen how easily a copy is handled from one filesystem type to another. It only uses the basic open(), close(), read(), and write() functions available for all filesystem interaction; however, each filesystem implements the specific functionality underneath. For example, the open() function calls the sys_open() system call, which takes the same arguments as open() and returns the same result. The difference between sys_open() and open() is that sys_open() is a more permissive function.
The CFC has a simple idea behind. It behaves as if we have a perfect multitasking processor where each process gets 1/n
slice of the processor's time and this time slice is an incredibly small. The n
value represents the number of running processes. Con Kolivas is the Australian programmer that contributed to the fair scheduling implementation, also known as Rotating Staircase Deadline Scheduler (RSDL). Its implementation required a red-black tree for the priorities of self-balancing and also a time slice that is calculated at the nanosecond level. Similarly to the O(1) scheduler, CFS applies the notion of weight, which implies that some processes wait more than others. This is based on the weighed fair queuing algorithm.
For processes to interact with a system, an interface should be provided to give the user space application the possibility of interacting with hardware and other processes.System
calls. These are used as an interface between the hardware and the user space. They are also used to ensure stability, security, and abstraction, in general. These are common layers that constitute an entry point into the kernel alongside traps and exceptions, as described here:
The interaction with most of the system calls that are available inside the Linux system is done using the C library. They are able to define a number of arguments and return a value that reveals whether they were successful or not. A value of zero
usually means that the execution ended with success, and in case errors appear, an error code will be available inside the errno
variable. When a system call is done, the following steps are followed:
- The switch into kernel mode is made.
- Any restrictions to the kernel space access are eliminated.
- The stack from the user space is passed into the kernel space.
- Any arguments from the user space are checked and copied into the kernel space.
- The associated routine for the system call is identified and run.
- The switch to the user space is made and the execution of the application continues.
The Linux operating system is able to support a large variety of filesystem options. This is done due to the existence of Virtual File System (VFS), which is able to provide a common interface for a large number of filesystem types and handle the systems calls relevant to them.
The filesystem types supported by the VFS can be put in these three categories:
- Disk-based filesystems: These manage the memory on a local disk or devices that are used for disk emulation. Some of the most well known ones are:
- Linux filesystems, such as Second Extended Filesystem (Ext2), Third Extended Filesystem (Ext3), and Forth Extended Filesystem (Ext4)
- UNIX filesystems, such as sysv filesystem, UFS, Minix filesystem, and so on
- Microsoft filesystems, such as MS-DOS, NTFS (available since Windows NT), and VFAT (available since Windows 95)
- ISO966 CD-ROM filesystem and disk format DVD filesystem
- Proprietary filesystems, such as the ones from Apple, IBM, and other companies
- Network filesystems: They are allowed to access various filesystem types over a network on other computers. One of the most well known ones is NFS. Of course, there are others but they are not as well known. These include Andrew filesystem (AFS), Novel's NetWare Core Protocol (NCP), Constant Data Availability (Coda), and so on.
- Special filesystems: The
/proc
filesystem is the perfect example for this category of filesystems. This category of filesystems enables an easier access for system applications to interrogate data structures of kernels and implement various features.
The virtual filesystem system call implementation is very well summarized in this image:
In the preceding image, it can be seen how easily the copy is handled from one filesystem type to another. It only uses the basic open()
, close()
, read()
, write()
functions available for all the other filesystem interaction. However, all of them implement the specific functionality underneath for the chosen filesystem. For example, the open()
system calls sys_open()
and it takes the same arguments as open()
and returns the same result. The difference between sys_open()
and open()
is that sys_open()
is a more permissive function.
The process of writing the dirty pages from the page cache back to the disk is defined as page writeback. Both notions, the page cache and page writeback, will be presented in this subsection, mainly in the context of the kernel.
When the kernel issues the read() system call, it first verifies whether the data is present in the page cache. The process by which the page is found inside the RAM is called a cache hit. If the data is not available there, it needs to be read from the disk; this process is called a cache miss.
When the kernel issues the write() system call, there are multiple possibilities for cache interaction. The simplest one is to not cache the write operations and keep the data only on the disk; this scenario is called no-write cache. When the write operation updates the physical memory and the disk data at the same time, the operation is called write-through cache. The third option is the write-back cache, where the page is marked as dirty, added to the dirty list, and, over time, written to the disk and marked as clean again. Here, dirty means that the in-memory copy of the page is not yet synchronized with the disk.
Besides managing its own physical memory, the kernel is also responsible for the memory of user space processes. The memory allocated to each user space process is called the process address space, and it contains the virtual memory addressable by a given process, as well as the addresses that the process uses in its interaction with this virtual memory.
Usually, a process receives a flat 32-bit or 64-bit address space, its size depending on the architecture type. However, there are operating systems that allocate a segmented address space. Threads of the same process are offered the possibility of sharing the address space between them. Although a process can address a large memory space, it usually has permission to access only certain intervals of it. Such an interval is called a memory area, and a process may only access a memory address situated inside a valid memory area. If it tries to access a memory address outside of its valid memory areas, the kernel kills the process with the Segmentation fault notification.
A memory area contains the following:
- The text section, which maps the program's executable code
- The data section, which maps initialized global variables
- The bss section, which maps uninitialized global variables
- The zero page section, which is used for the user space stack
- The text, data, and bss sections of each shared library
- Mapped files
- Anonymous memory mappings, usually linked with functions such as malloc()
- Shared memory segments
A process address space is defined inside the Linux kernel source through a memory descriptor. This structure is called struct mm_struct; it is defined inside the include/linux/mm_types.h header file and contains information relevant to a process address space, such as the number of processes that use the address space, the list of memory areas, the last memory area that was used, the number of memory areas available, and the start and end addresses for the code, data, heap, and stack sections.
A process, as presented previously, is a fundamental unit in a Linux operating system and, at the same time, a form of abstraction. It is, in fact, a program in execution, but a program by itself is not a process: it needs to be in an active state and have resources associated with it. A process is able to become a parent by using the fork() function, which spawns a child process. The parent and child processes reside in separate address spaces, but initially both of them have the same content. The exec() family of functions is able to execute a different program, creating a new address space and loading the program into it.
When fork() is called, the following steps are performed:
- The dup_task_struct() function is called to create a new kernel stack; the task_struct and thread_info structures are created for the new process.
- A check is made that the child does not go beyond the limits of the memory area.
- The child process distinguishes itself from its parent.
- The child is set to the TASK_UNINTERRUPTIBLE state to make sure it does not run yet.
- The flags are updated.
- A PID is associated with the child process.
- The flags that are already set are inspected, and the proper action is performed with respect to their values.
- The cleanup is performed at the end, and the child process pointer is returned.
At the end of its execution, the process needs to be terminated so that its resources can be freed, and the parent of the terminating process needs to be notified about this. The most commonly used method to terminate a process is calling the exit() system call. A number of steps are performed in this process:
- The PF_EXITING flag is set.
- The del_timer_sync() function is called to remove the kernel timers.
- The acct_update_integrals() function is called to write the accounting and logging information.
- exit_mm() is called to release the mm_struct structure of the process.
- exit_sem() is called to dequeue the process from the IPC semaphore queues.
- The exit_files() and exit_fs() functions are called to drop the references to the various file descriptors.
- The task exit code is set.
- exit_notify() is called to notify the parent and set the task exit state to EXIT_ZOMBIE.
- schedule() is called to switch to a new process.
The process scheduler decides which resources are allocated to a runnable process. It is the piece of software responsible for multitasking and for resource allocation to the various processes: it decides how to best distribute the resources and the processor time, and which process should run next.
The first design of the Linux scheduler was very simplistic. It was not able to scale properly when the number of processes increased, so starting with the 2.5 kernel series, a new scheduler was developed. It is called the O(1) scheduler and offers a constant-time algorithm for time slice calculation and a run queue defined on a per-processor basis. Although well suited for large servers, it is not the best solution for a normal desktop system. Starting with the 2.6 kernel series, improvements were made to the O(1) scheduler, such as the fair scheduling concept that later materialized, from kernel version 2.6.23, into the Completely Fair Scheduler (CFS), which became the de facto scheduler.
The idea behind CFS is simple: it behaves as if we had a perfect multitasking processor, where each of the n running processes gets a 1/n slice of the processor's time, and this time slice is incredibly small. Con Kolivas is the Australian programmer who contributed the fair scheduling ideas through his Rotating Staircase Deadline Scheduler (RSDL). The CFS implementation relies on a self-balancing red-black tree to order the runnable processes and on a time slice that is calculated at the nanosecond level. Similarly to the O(1) scheduler, CFS applies the notion of weight, which implies that some processes wait more than others; this is based on the weighted fair queuing algorithm.
For processes to interact with the system, an interface should be provided to give user space applications the possibility of interacting with hardware and with other processes. This interface is provided by system calls, which are used as the boundary between the hardware and the user space. They are also used to ensure stability, security, and abstraction in general. They constitute a common entry point into the kernel, alongside traps and exceptions, as described here:
The interaction with most of the system calls available inside a Linux system is done using the C library. System calls define a number of arguments and return a value that reveals whether they were successful or not: a value of zero usually means that the execution ended with success, and in case an error appears, an error code is made available in the errno variable. When a system call is made, the following steps are followed:
- The switch into kernel mode is made.
- Any restrictions to the kernel space access are eliminated.
- The stack from the user space is passed into the kernel space.
- Any arguments from the user space are checked and copied into the kernel space.
- The associated routine for the system call is identified and run.
- The switch to the user space is made and the execution of the application continues.
The Linux operating system is able to support a large variety of filesystems. This is due to the existence of the Virtual File System (VFS), which provides a common interface for a large number of filesystem types and handles the system calls relevant to them.
The filesystem types supported by the VFS can be put into three categories:
- Disk-based filesystems: These manage the storage space on a local disk or on devices used for disk emulation. Some of the best known ones are:
  - Linux filesystems, such as the Second Extended Filesystem (Ext2), the Third Extended Filesystem (Ext3), and the Fourth Extended Filesystem (Ext4)
  - UNIX filesystems, such as the sysv filesystem, UFS, the Minix filesystem, and so on
  - Microsoft filesystems, such as MS-DOS, NTFS (available since Windows NT), and VFAT (available since Windows 95)
  - The ISO9660 CD-ROM filesystem and the Universal Disk Format (UDF) DVD filesystem
  - Proprietary filesystems, such as the ones from Apple, IBM, and other companies
- Network filesystems: These allow access to various filesystem types over a network, on other computers. One of the best known ones is NFS. There are others that are less well known, such as the Andrew filesystem (AFS), Novell's NetWare Core Protocol (NCP), Constant Data Availability (Coda), and so on.
- Special filesystems: The /proc filesystem is the perfect example of this category. Such filesystems enable system applications to interrogate kernel data structures more easily and to implement various features.
The virtual filesystem system call implementation is very well summarized in this image:
In the preceding image, it can be seen how easily a copy is handled from one filesystem type to another: the user only needs the basic open(), close(), read(), and write() functions, which are available for every filesystem interaction, while each filesystem implements the specific functionality underneath. For example, the open() function calls the sys_open() system call, which takes the same arguments as open() and returns the same result. The difference between sys_open() and open() is that sys_open() is a more permissive function.
own physical memory, the kernel is also responsible for user space process and memory management. The memory allocated for each user space process is called process address space and it contains the virtual memory addressable by a given process. It also contains the related addresses used by the process in its interaction with the virtual memory.
Usually a process receives a flat 32 or 64-bit address space, its size being dependent on the architecture type. However, there are operating systems that allocate a segmented address space. The possibility of sharing the address space between the operating systems is offered to threads. Although a process can access a large memory space, it usually has permission to access only an interval of memory. This is called a memory area and it means that a process can only access a memory address situated inside a viable memory area. If it somehow tries to administrate a memory address outside of its valid memory area, the kernel will kill the process with the Segmentation fault notification.
A memory area contains the following:
- The
text
section maps source code - The
data
section maps initialized global variables - The
bss
section maps uninitialized global variables - The
zero page
section is used to process user space stack - The
shared libraries text
,bss
and data-specific sections - Mapped files
- Anonymous memory mapping is usually linked with functions, such as
malloc()
- Shared memory segments
A process address space is defined inside the Linux kernel source through a memory descriptor. This structure is called struct mm_struct
, which is defined inside the include/linux/mm_types.h
header file and contains information relevant for a process address space, such as the number of processes that use the address space, a list of memory areas, the last memory area that was used, the number of memory areas available, start and finish addresses for the code, data, heap and stack sections.
A process, as presented previously, is a fundamental unit in a Linux operating system and at the same time, is a form of abstraction. It is, in fact, a program in execution, but a program by itself is not a process. It needs to be in an active state and have associated resources. A process is able to become a parent by using the fork()
function, which spawns a child process. Both parent and child processes reside in separate address spaces, but both of them have the same content. The exec()
family of function is the one that is able to execute a different program, create an address space, and load it inside that address space.
- Calls the
dup_task_struct()
function to create a new kernel stack. Thetask_struct
andthread_info
structures are created for a new process. - Checks that the child does not go beyond the limits of the memory area.
- The child process distinguishes itself from its parent.
- It is set as
TASK_UNINTERRUPTIBLE
to make sure it does not run. - Flags are updated.
PID
is associated with the child process.- The flags that are already set are inspected and proper action is performed with respect to their values.
- The clean process is performed at the end when the child process pointer is obtained.
At the end of the execution, the process need to be terminated so that the resources can be freed, and the parent of the executing process needs to be notified about this. The method that is most used to terminate a process is done by calling the exit()
system call. A number of steps are needed for this process:
- The
PF_EXITING
flag is set. - The
del_timer_sync()
function is called to remove the kernel timers. - The
acct_update_integrals()
function is called when writing accounting and logging information. - The
exit_mm()
is called to release themm_struct
structure for the process. - The
exit_sem()
is called to dequeue the process from the IPC semaphore. - The
exit_files()
andexit_fs()
function are called to remove the links to various files descriptors. - The task exit code should be set.
- Call
exit_notify()
to notify the parent and set the task exit state toEXIT_ZOMBIE
. - Call
schedule()
to switch to a new process.
The process scheduler decides which resources are allocated for a runnable process. It is a piece of software that is responsible for multitasking, resource allocation to various processes, and decides how to best set the resources and processor time. it also decides which processes should run next.
The first design of the Linux scheduler was very simplistic. It was not able to scale properly when the number of processes increased, so from the 2.5 kernel version, a new scheduler was developed. It is called O(1) scheduler and offers a constant time algorithm for time slice calculation and a run queue that is defined on a per-processor basis. Although it is perfect for large servers, it is not the best solution for a normal desktop system. From the 2.6 kernel version, improvements have been made to the O(1) scheduler, such as the fair scheduling concept that later materialized from the kernel version 2.6.23 into the Completely Fair Scheduler (CFS), which became the defacto scheduler.
The CFC has a simple idea behind. It behaves as if we have a perfect multitasking processor where each process gets 1/n
slice of the processor's time and this time slice is an incredibly small. The n
value represents the number of running processes. Con Kolivas is the Australian programmer that contributed to the fair scheduling implementation, also known as Rotating Staircase Deadline Scheduler (RSDL). Its implementation required a red-black tree for the priorities of self-balancing and also a time slice that is calculated at the nanosecond level. Similarly to the O(1) scheduler, CFS applies the notion of weight, which implies that some processes wait more than others. This is based on the weighed fair queuing algorithm.
For processes to interact with a system, an interface should be provided to give the user space application the possibility of interacting with hardware and other processes.System
calls. These are used as an interface between the hardware and the user space. They are also used to ensure stability, security, and abstraction, in general. These are common layers that constitute an entry point into the kernel alongside traps and exceptions, as described here:
The interaction with most of the system calls that are available inside the Linux system is done using the C library. They are able to define a number of arguments and return a value that reveals whether they were successful or not. A value of zero
usually means that the execution ended with success, and in case errors appear, an error code will be available inside the errno
variable. When a system call is done, the following steps are followed:
- The switch into kernel mode is made.
- Any restrictions to the kernel space access are eliminated.
- The stack from the user space is passed into the kernel space.
- Any arguments from the user space are checked and copied into the kernel space.
- The associated routine for the system call is identified and run.
- The switch to the user space is made and the execution of the application continues.
The Linux operating system is able to support a large variety of filesystem options. This is done due to the existence of Virtual File System (VFS), which is able to provide a common interface for a large number of filesystem types and handle the systems calls relevant to them.
The filesystem types supported by the VFS can be put in these three categories:
- Disk-based filesystems: These manage the memory on a local disk or devices that are used for disk emulation. Some of the most well known ones are:
- Linux filesystems, such as Second Extended Filesystem (Ext2), Third Extended Filesystem (Ext3), and Forth Extended Filesystem (Ext4)
- UNIX filesystems, such as sysv filesystem, UFS, Minix filesystem, and so on
- Microsoft filesystems, such as MS-DOS, NTFS (available since Windows NT), and VFAT (available since Windows 95)
- ISO966 CD-ROM filesystem and disk format DVD filesystem
- Proprietary filesystems, such as the ones from Apple, IBM, and other companies
- Network filesystems: They are allowed to access various filesystem types over a network on other computers. One of the most well known ones is NFS. Of course, there are others but they are not as well known. These include Andrew filesystem (AFS), Novel's NetWare Core Protocol (NCP), Constant Data Availability (Coda), and so on.
- Special filesystems: The
/proc
filesystem is the perfect example for this category of filesystems. This category of filesystems enables an easier access for system applications to interrogate data structures of kernels and implement various features.
The virtual filesystem system call implementation is very well summarized in this image:
In the preceding image, it can be seen how easily the copy is handled from one filesystem type to another. It only uses the basic open()
, close()
, read()
, write()
functions available for all the other filesystem interaction. However, all of them implement the specific functionality underneath for the chosen filesystem. For example, the open()
system calls sys_open()
and it takes the same arguments as open()
and returns the same result. The difference between sys_open()
and open()
is that sys_open()
is a more permissive function.
presented previously, is a fundamental unit in a Linux operating system and at the same time, is a form of abstraction. It is, in fact, a program in execution, but a program by itself is not a process. It needs to be in an active state and have associated resources. A process is able to become a parent by using the fork()
function, which spawns a child process. Both parent and child processes reside in separate address spaces, but both of them have the same content. The exec()
family of function is the one that is able to execute a different program, create an address space, and load it inside that address space.
- Calls the
dup_task_struct()
function to create a new kernel stack. Thetask_struct
andthread_info
structures are created for a new process. - Checks that the child does not go beyond the limits of the memory area.
- The child process distinguishes itself from its parent.
- It is set as
TASK_UNINTERRUPTIBLE
to make sure it does not run. - Flags are updated.
PID
is associated with the child process.- The flags that are already set are inspected and proper action is performed with respect to their values.
- The clean process is performed at the end when the child process pointer is obtained.
At the end of the execution, the process need to be terminated so that the resources can be freed, and the parent of the executing process needs to be notified about this. The method that is most used to terminate a process is done by calling the exit()
system call. A number of steps are needed for this process:
- The
PF_EXITING
flag is set. - The
del_timer_sync()
function is called to remove the kernel timers. - The
acct_update_integrals()
function is called when writing accounting and logging information. - The
exit_mm()
is called to release themm_struct
structure for the process. - The
exit_sem()
is called to dequeue the process from the IPC semaphore. - The
exit_files()
andexit_fs()
function are called to remove the links to various files descriptors. - The task exit code should be set.
- Call
exit_notify()
to notify the parent and set the task exit state toEXIT_ZOMBIE
. - Call
schedule()
to switch to a new process.
The process scheduler decides which resources are allocated for a runnable process. It is a piece of software that is responsible for multitasking, resource allocation to various processes, and decides how to best set the resources and processor time. it also decides which processes should run next.
The first design of the Linux scheduler was very simplistic. It was not able to scale properly when the number of processes increased, so from the 2.5 kernel version, a new scheduler was developed. It is called O(1) scheduler and offers a constant time algorithm for time slice calculation and a run queue that is defined on a per-processor basis. Although it is perfect for large servers, it is not the best solution for a normal desktop system. From the 2.6 kernel version, improvements have been made to the O(1) scheduler, such as the fair scheduling concept that later materialized from the kernel version 2.6.23 into the Completely Fair Scheduler (CFS), which became the defacto scheduler.
The CFC has a simple idea behind. It behaves as if we have a perfect multitasking processor where each process gets 1/n
slice of the processor's time and this time slice is an incredibly small. The n
The process scheduler decides which resources are allocated to a runnable process. It is a piece of software that is responsible for multitasking and resource allocation to various processes; it decides how to best distribute the resources and processor time, and also which process should run next.
The first design of the Linux scheduler was very simplistic. It was not able to scale properly when the number of processes increased, so starting with the 2.5 kernel version, a new scheduler was developed. It is called the O(1) scheduler and it offers a constant-time algorithm for time slice calculation and a run queue that is defined on a per-processor basis. Although it is well suited for large servers, it is not the best solution for a normal desktop system. Starting with the 2.6 kernel version, improvements were made to the O(1) scheduler, such as the fair scheduling concept that later materialized, from kernel version 2.6.23, into the Completely Fair Scheduler (CFS), which became the de facto scheduler.
The CFS is based on a simple idea: it behaves as if we had a perfect multitasking processor where each process gets a 1/n slice of the processor's time, and this time slice is incredibly small. The n value represents the number of running processes. Con Kolivas is the Australian programmer who contributed the fair scheduling implementation, also known as the Rotating Staircase Deadline Scheduler (RSDL). Its implementation required a self-balancing red-black tree for the priorities and a time slice that is calculated at the nanosecond level. Similarly to the O(1) scheduler, CFS applies the notion of weight, which implies that some processes wait more than others. This is based on the weighted fair queuing algorithm.
For processes to interact with a system, an interface should be provided to give user space applications the possibility of interacting with hardware and other processes. This interface is provided by system calls. They are used as an interface between the hardware and the user space, and they also serve to ensure stability, security, and abstraction in general. These are common layers that constitute an entry point into the kernel, alongside traps and exceptions, as described here:
The interaction with most of the system calls available inside a Linux system is done using the C library. The system calls define a number of arguments and return a value that reveals whether they were successful or not. A value of zero usually means that the execution ended with success, and in case errors appear, an error code will be available inside the errno variable. When a system call is made, the following steps are followed:
- The switch into kernel mode is made.
- Any restrictions to the kernel space access are eliminated.
- The stack from the user space is passed into the kernel space.
- Any arguments from the user space are checked and copied into the kernel space.
- The associated routine for the system call is identified and run.
- The switch to the user space is made and the execution of the application continues.
The Linux operating system is able to support a large variety of filesystem options. This is due to the existence of the Virtual File System (VFS), which provides a common interface for a large number of filesystem types and handles the system calls relevant to them.
The filesystem types supported by the VFS can be put in these three categories:
- Disk-based filesystems: These manage the memory on a local disk or devices that are used for disk emulation. Some of the most well known ones are:
- Linux filesystems, such as Second Extended Filesystem (Ext2), Third Extended Filesystem (Ext3), and Fourth Extended Filesystem (Ext4)
- UNIX filesystems, such as sysv filesystem, UFS, Minix filesystem, and so on
- Microsoft filesystems, such as MS-DOS, NTFS (available since Windows NT), and VFAT (available since Windows 95)
- The ISO9660 CD-ROM filesystem and the Universal Disk Format (UDF) DVD filesystem
- Proprietary filesystems, such as the ones from Apple, IBM, and other companies
- Network filesystems: These allow access to various filesystem types over a network, on other computers. One of the most well known ones is NFS. Of course, there are others, but they are not as well known; these include the Andrew filesystem (AFS), Novell's NetWare Core Protocol (NCP), Constant Data Availability (Coda), and so on.
- Special filesystems: The /proc filesystem is the perfect example of this category of filesystems. Such filesystems enable system applications to easily interrogate the kernel's data structures and implement various features.
The virtual filesystem system call implementation is very well summarized in this image:
In the preceding image, it can be seen how easily a copy is handled from one filesystem type to another: it uses only the basic open(), close(), read(), and write() functions available for all other filesystem interactions. However, each of them implements the specific functionality underneath for the chosen filesystem. For example, the open() function invokes the sys_open() system call, which takes the same arguments as open() and returns the same result. The difference between sys_open() and open() is that sys_open() is a more permissive function.
An interrupt is a representation of an event that changes the succession of instructions performed by the processor. Interrupts imply an electric signal generated by the hardware to signal an event that has happened, such as a key press, a reset, and so on. Interrupts are divided into several categories depending on their reference system, as follows:
The difference between them is that all the available interrupts are permitted to act in the bottom half context. This helps the top half respond to another interrupt while the bottom half is working: the top half is able to save its data in a specific buffer, and this permits the bottom half to operate in a safe environment.
For the bottom half processing, there are four defined mechanisms available:
The available mechanisms are well presented here:
For the top half component of the interrupt, there are three levels of abstraction in the interrupt source code. The first one is the high-level driver API that has functions such as request_irq(), free_irq(), disable_irq(), enable_irq(), and so on. The second one is represented by the high-level IRQ flow handlers, a generic layer with predefined or architecture-specific interrupt flow handlers assigned to respond to various interrupts during device initialization or boot time. It defines a number of predefined functions, such as handle_level_irq(), handle_simple_irq(), handle_percpu_irq(), and so on. The third is represented by chip-level hardware encapsulation. It defines the struct irq_chip structure that holds chip-relevant functions used in the IRQ flow implementation. Some of these functions are irq_ack(), irq_mask(), and irq_unmask().
Some of the flags that can be passed when registering an interrupt handler are:
- SA_SAMPLE_RANDOM: This indicates that the interrupt can contribute to the entropy pool, that is, a pool with bits that possess strong random properties, by sampling unpredictable events, such as mouse movements, inter-key press times, disk interrupts, and so on
- SA_SHIRQ: This shows that the interrupt is sharable between devices
- SA_INTERRUPT: This indicates a fast interrupt handler, so interrupts are disabled on the current processor; this does not represent a very desirable situation
The first mechanism that will be discussed regarding bottom half interrupt handling is represented by softirqs. They are rarely used directly but can be found in the Linux kernel source code inside the kernel/softirq.c file. When it comes to implementation, they are statically allocated at compile time. They are created when an entry is added to the include/linux/interrupt.h header file, and the system information they provide is available inside the /proc/softirqs file. Although not used too often, they can be executed after exceptions, interrupts, and system calls, and when the ksoftirqd daemon is run by the scheduler.
The last and newest addition to the bottom half mechanism options is represented by kernel threads, which operate entirely in kernel mode since they are created and destroyed by the kernel. They appeared during the 2.6.30 kernel release and have the same advantages as work queues, along with some extra features, such as the possibility of having their own context. It is expected that kernel threads will eventually replace work queues and tasklets, since they are similar to user space threads. A driver might want to request a threaded interrupt handler. All it needs to do in this case is use request_threaded_irq() in a similar way to request_irq(). The request_threaded_irq() function offers the possibility of passing a handler and thread_fn to split the interrupt handling code into two parts. In addition to this, quick_check_handler is called to check if the interrupt was called from a device; if that is the case, it will need to return IRQ_WAKE_THREAD to wake up the handler thread and execute thread_fn.
The number of requests a kernel deals with can be likened to the number of requests a server has to serve. This situation can lead to race conditions, so a good synchronization method is required. A number of policies define the way the kernel behaves, in the form of a kernel control path. Here is an example of a kernel control path:
To address such situations, a number of synchronization primitives have been born:
- Per-CPU variables: This is one of the most simple and efficient synchronization methods. It multiplies a data structure so that each one is available for each CPU.
- Atomic operations: This refers to atomic read-modify-write instructions.
- Memory barrier: This safeguards the fact that the operations done before the barrier are all finished before starting the operations after it.
- Spin lock: This represents a type of lock that implements busy waiting.
- Semaphore: This is a form of locking that implements sleep or blocking waiting.
- Seqlocks: This is similar to spin locks, but is based on an access counter.
- Local interrupt disabling: This forbids the use of functions that can be postponed on a single CPU.
- Read-copy-update (RCU): This is a method designed to protect the most commonly read data structures. It offers lock-free access to shared data structures through pointers.
The preceding methods try to fix race condition situations. It is the job of the developer to identify and solve all the eventual synchronization problems that might appear.
Around the Linux kernel, there are a great number of functions that are influenced by time. From the scheduler to the system uptime, they all require a time reference, which includes both absolute and relative time. For example, an event that needs to be scheduled for the future represents a relative time, which, in fact, implies that there is a method used to count time.
The timer implementation can vary depending on the type of the event. The periodical implementations are defined by the system timer, which issues an interrupt at a fixed period of time. The system timer is a hardware component that issues a timer interrupt at a given frequency to update the system time and execute the necessary tasks. Another one that can be used is the real-time clock, which is a chip with a battery attached that keeps counting time long after the system was shut down. Besides the system time, there are dynamic timers available that are managed by the kernel dynamically to plan events that run after a particular time has passed.
The timer interrupt has an occurrence window, and for ARM it is 100 times per second. This is called the system timer frequency or tick rate, and its unit of measurement is hertz (Hz). The tick rate differs from one architecture to another: while most of them have a value of 100 Hz, others have values of 1024 Hz, such as the Alpha and Itanium (IA-64) architectures. The default value, of course, can be changed and increased, but this action has its advantages and disadvantages.
Some of the advantages of higher frequency are:
The total number of ticks done on a Linux operating system from the time it started booting is stored in a variable called jiffies inside the include/linux/jiffies.h header file. At boot time, this variable is initialized to zero, and one is added to its value each time a timer interrupt happens. So, the actual value of the system uptime can be calculated as jiffies/Hz.
Until now, you have been introduced to some of the features of the Linux kernel. Now, it is time to present more information about the development process, the versioning scheme, community contributions, and interaction with the Linux kernel.
The Linux kernel is a well known open source project. To make sure that developers know how to interact with it, information about the git interaction with this project, as well as some information about its development and release procedures, will be presented. The project has evolved, and its development processes and release procedures have evolved with it.
Before presenting the actual development process, a bit of history is necessary. Until the 2.6 version of the Linux kernel project, one release was made every two or three years, and each of them was identified by even middle numbers, such as 1.0.x, 2.0.x, and 2.6.x. The development branches were instead defined using odd middle numbers, such as 1.1.x, 2.1.x, and 2.5.x, and they were used to integrate various features and functionalities until a major release was prepared and ready to be shipped. All the minor releases had names such as 2.6.32 and 2.2.23, and they were released between major release cycles.
- All new minor release versions, such as 2.6.x, contain a two-week merge window in which a number of features can be introduced in the next release
- This merge window is closed with a release test version called 2.6.(x+1)-rc1
- Then, a 6-8 week bug fixing period follows, when all the bugs introduced by the added features should be fixed
- In the bug fixing interval, tests are run on the release candidate and the 2.6.(x+1)-rcY test versions are released
- After the final tests are done and the last release candidate is considered sufficiently stable, a new release is made with a name such as 2.6.(x+1), and the process starts once again
This process worked great, but the only problem was that bug fixes were only released for the latest stable versions of the Linux kernel. People needed long-term support versions of, and security updates for, their older versions, general information about the versions with long-term support, and so on.
Since a great number of patches and features are included in the Linux kernel every day, it becomes difficult to keep track of all the changes and the bigger picture in general. This changed over time because sites, such as http://kernelnewbies.org/LinuxChanges and http://lwn.net/, appeared to help developers keep in touch with the world of the Linux kernel.
Besides these links, the git version control system can offer much needed information. Of course, this requires a clone of the Linux kernel sources to be available on the workstation. Some of the commands that offer a great deal of information are:
- git log: This lists all the commits, with the latest situated at the top of the list
- git log -p: This lists all the commits with their corresponding diffs
- git tag -l: This lists the available tags
- git checkout <tagname>: This checks out a branch or tag from a working repository
- git log v2.6.32..master: This lists all the changes between the given tag and the latest version
- git log -p v2.6.32..master MAINTAINERS: This lists all the differences between the two given references in the MAINTAINERS file
Of course, this is just a small list with helpful commands. All the other commands are available at http://git-scm.com/docs/.
The Linux kernel offers support for a large variety of CPU architectures. Each architecture and individual board have their own maintainers, and this information is available inside the MAINTAINERS
file. Also, the differences in board porting are mostly given by the architecture, PowerPC being very different from ARM or x86. Since the development board that this book focuses on is an Atmel with an ARM Cortex-A5 core, this section will try to focus on the ARM architecture.
The increase in popularity of the ARM architecture came with the refactoring of the work and the introduction of the device tree, which dramatically reduced the amount of code available inside the mach-* directories. If the SoC is supported by the Linux kernel, then adding support for a board is as simple as defining a device tree with an appropriate name, such as <soc-name>-<board-name>.dts, in the arch/arm/boot/dts directory, and including the relevant dtsi files if necessary. Make sure that you build the device tree blob (DTB) by including the device tree in arch/arm/boot/dts/Makefile, and add the missing device drivers for the board.
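As a sketch of such a device tree source for a hypothetical SAMA5D3-based board (the model string, compatible entries, and memory node contents below are invented for illustration, not taken from a real board):

```dts
/* Hypothetical <soc-name>-<board-name>.dts; all names and values are
   illustrative only. */
/dts-v1/;
#include "sama5d3.dtsi"

/ {
	model = "Example SAMA5D3-based board";
	compatible = "vendor,example-board", "atmel,sama5d3", "atmel,sama5";

	memory {
		reg = <0x20000000 0x20000000>;	/* 512 MB of DDR at 0x20000000 */
	};
};
```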
- Generic code files: These usually have a single word name, such as clock.c, led.c, and so on
- CPU-specific code: This is for the machine ID and usually has the <machine-ID>*.c form - for example, at91sam9263.c, at91sam9263_devices.c, sama5d3.c, and so on
- Board-specific code: This is usually defined as board-*.c, such as board-carmeva.c, board-pcontrol-g20.c, and so on
For a given board, the proper configuration should be made first inside the arch/arm/mach-*/Kconfig file; for this, the machine ID should be identified for the board CPU. After the configuration is done, the compilation can begin, so arch/arm/mach-*/Makefile should also be updated with the required files to ensure board support. Another step is represented by the machine structure that defines the board, and the machine type number, which needs to be defined in the board-<machine>.c file.
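A hedged sketch of these two updates for an imaginary board (the exampleboard name and the config symbol are invented for illustration; real entries live in the respective arch/arm/mach-* files):

```
# arch/arm/mach-at91/Kconfig - hypothetical entry for the new board
config MACH_EXAMPLEBOARD
	bool "Example board support"
	depends on ARCH_AT91
	help
	  Select this if you are using the hypothetical Example board.

# arch/arm/mach-at91/Makefile - compile the board file when selected
obj-$(CONFIG_MACH_EXAMPLEBOARD)	+= board-exampleboard.o
```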
When the boot process starts, in the first case, only the dtb needs to be passed to the boot loader and loaded to initialize the Linux kernel, while in the second case, the machine type number needs to be loaded in the R1 register. In the early boot process, __lookup_machine_type looks for the machine_desc structure and loads it to initialize the board.
After this information has been presented to you, and if you are eager to contribute to the Linux kernel, then this section should be read next. If you really want to contribute to the Linux kernel project, then a few steps should be performed before starting this work. This is mostly related to documentation and investigation of the subject. No one wants to send a duplicate patch or replicate the work of someone else in vain, so a search on the Internet on the topic of your interest could save a lot of time. Another useful piece of advice is that, after you've familiarized yourself with the subject, you should avoid sending a workaround; try to get to the root of the problem and offer a solution. If you can't, report the problem and describe it thoroughly. If a solution is found, then make both the problem and the solution available in the patch.
One of the most valuable things in the open source community is the help you can get from others. Share your questions and issues, but do not forget to mention the solution as well. Ask the questions in appropriate mailing lists and try to avoid the maintainers, if possible. They are usually very busy and have hundreds or thousands of e-mails to read and reply to. Before asking for help, try to research the question you want to raise; it will help when formulating it, and it could also provide an answer. Use IRC, if available, for smaller questions, and lastly, but most importantly, try not to overdo it.
Before presenting the actual development process, a bit of history will be necessary. Until the 2.6 version of the Linux kernel project, one release was made every two or three years, and each of them was identified by even middle numbers, such as 1.0.x, 2.0.x, and 2.6.x. The development branches were instead defined using even numbers, such as 1.1.x, 2.1.x, and 2.5.x, and they were used to integrate various features and functionalities until a major release was prepared and ready to be shipped. All the minor releases had names, such as 2.6.32 and 2.2.23, and they were released between major release cycles.
- All new minor release versions, such as 2.6.x, start with a two-week merge window in which a number of features can be introduced into the next release
- This merge window is closed with a release test version called 2.6.(x+1)-rc1
- Then a six-to-eight-week bug fixing period follows, in which all the bugs introduced by the added features should be fixed
- During the bug fixing interval, tests are run on the release candidate and the 2.6.(x+1)-rcY test versions are released
- After the final tests are done and the last release candidate is considered sufficiently stable, a new release is made with a name such as 2.6.(x+1), and the process starts over again
This process worked great, but the only problem was that bug fixes were released only for the latest stable version of the Linux kernel. People needed long-term support versions and security updates for their older versions, as well as general information about which versions would be supported for a long time, and so on.
Since a great number of patches and features are included in the Linux kernel every day, it becomes difficult to keep track of all the changes and of the bigger picture in general. This changed over time because sites such as http://kernelnewbies.org/LinuxChanges and http://lwn.net/ appeared to help developers keep in touch with the world of the Linux kernel.
Besides these links, the git versioning control system can offer much needed information. Of course, this requires a clone of the Linux kernel sources to be available on the workstation. Some of the commands that offer a great deal of information are:
- git log: This lists all the commits, with the latest situated on top of the list
- git log -p: This lists all the commits together with their corresponding diffs
- git tag -l: This lists the available tags
- git checkout <tagname>: This checks out a branch or tag from a working repository
- git log v2.6.32..master: This lists all the changes between the given tag and the latest version
- git log -p v2.6.32..master MAINTAINERS: This lists all the differences between the two given references in the MAINTAINERS file
Of course, this is just a small list with helpful commands. All the other commands are available at http://git-scm.com/docs/.
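The commands above can be tried on any clone; as a hedged, self-contained sketch (the repository, names, and commit messages below are made up for demonstration), a throwaway repository exercises them without needing a kernel clone:

```shell
# Create a disposable repository so the commands can be tried safely.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first commit"
git tag v1.0
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "second commit"

git log --oneline            # all commits, latest on top
git tag -l                   # lists the v1.0 tag
git log --oneline v1.0..HEAD # only the commit made after the tag
```

On a real kernel clone, replace v1.0..HEAD with a kernel tag range such as v2.6.32..master.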
The Linux kernel offers support for a large variety of CPU architectures. Each architecture and individual board has its own maintainers, and this information is available inside the MAINTAINERS file. Also, the differences between board ports are mostly given by the architecture, PowerPC being very different from ARM or x86. Since the development board that this book focuses on is an Atmel with an ARM Cortex-A5 core, this section will focus on the ARM architecture.
The increase in popularity of the ARM architecture came with the refactoring of the work and the introduction of the device tree, which dramatically reduced the amount of code available inside the mach-* directories. If the SoC is supported by the Linux kernel, then adding support for a board is as simple as defining a device tree with an appropriate name, such as <soc-name>-<board-name>.dts, in the arch/arm/boot/dts directory, including the relevant dtsi files if necessary. Make sure that the device tree blob (DTB) is built by adding the device tree to arch/arm/boot/dts/Makefile, and add the missing device drivers for the board.
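As a sketch of this step, a minimal board device tree could look like the following; the file name, model, and compatible strings are illustrative, not taken from a real board port:

```
/* arch/arm/boot/dts/sama5d3-myboard.dts -- illustrative file name */
/dts-v1/;
#include "sama5d3.dtsi"	/* pull in the SoC-level description */

/ {
	model = "Illustrative SAMA5D3-based board";
	compatible = "vendor,myboard", "atmel,sama5d3", "atmel,at91sam9";

	memory {
		reg = <0x20000000 0x20000000>;	/* 512 MB of DDR at 0x20000000 */
	};
};
```

A corresponding dtb-$(...) line then needs to be added to arch/arm/boot/dts/Makefile so that the DTB is built along with the others.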
- Generic code files: These usually have a single-word name, such as clock.c, led.c, and so on
- CPU-specific code: This is for the machine ID and usually has the <machine-ID>*.c form - for example, at91sam9263.c, at91sam9263_devices.c, sama5d3.c, and so on
- Board-specific code: This is usually defined as board-*.c, such as board-carmeva.c, board-pcontrol-g20.c, and so on
For a given board, the proper configuration should first be made inside the arch/arm/mach-*/Kconfig file; for this, the machine ID should be identified for the board CPU. After the configuration is done, compilation can begin, so arch/arm/mach-*/Makefile should also be updated with the files required to ensure board support. Another step is represented by the machine structure that defines the board, and by the machine type number, which needs to be defined in the board-<machine>.c file.
When the boot process starts, in the first (device tree) case, only the dtb needs to be passed to the boot loader and loaded in order to initialize the Linux kernel, while in the second (board file) case, the machine type number needs to be loaded in the R1 register. In the early boot process, __lookup_machine_type looks for the machine_desc structure and loads it for the initialization of the board.
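For the second case, the machine description is registered with the MACHINE_START macro; the following is a hedged sketch with invented names, not a real board file:

```c
/* Illustrative fragment of a board-myboard.c file; names are made up. */
static void __init myboard_init(void)
{
	/* register the board's platform devices here */
}

MACHINE_START(MYBOARD, "Illustrative board")
	.init_machine	= myboard_init,
MACHINE_END
```

The MYBOARD machine type number itself is the identifier that the boot loader passes in R1.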
The official location of the Linux kernel is http://www.kernel.org, but there are a lot of smaller communities that contribute to the Linux kernel with their own features or even maintain their own versions.
Although the Linux core contains the scheduler, memory management, and other features, it is quite small in size. It is the extremely large number of device drivers, the support for numerous architectures and boards, the filesystems, the network protocols, and all the other components that made the size of the Linux kernel really big. This can be seen by taking a look at the size of the directories of the Linux source code.
The Linux source code structure contains the following directories:
- arch: This contains the architecture-dependent code
- block: This contains the block layer core
- crypto: This contains cryptographic libraries
- drivers: This gathers all the device driver implementations, with the exception of the sound ones
- fs: This gathers all the available filesystem implementations
- include: This contains the kernel headers
- init: This has the Linux initialization code
- ipc: This holds the interprocess communication implementation code
- kernel: This is the core of the Linux kernel
- lib: This contains various libraries, such as zlibc, crc, and so on
- mm: This contains the source code for memory management
- net: This offers access to all the network protocol implementations supported inside Linux
- samples: This presents a number of sample implementations, such as kfifo, kobject, and so on
- scripts: This contains scripts used both internally and externally
- security: This has a bunch of security implementations, such as apparmor, selinux, smack, and so on
- sound: This contains the sound drivers and support code
- usr: This contains the sources that generate the initramfs cpio archive
- virt: This holds the source code for virtualization support
- COPYING: This represents the Linux license and defines the copying conditions
- CREDITS: This represents the collection of Linux's main contributors
- Documentation: This contains the corresponding documentation for the kernel sources
- Kbuild: This represents the top-level kernel build system
- Kconfig: This is the top-level descriptor for the configuration parameters
- MAINTAINERS: This is a list with the maintainers of each kernel component
- Makefile: This represents the top-level makefile
- README: This file describes what Linux is, and it is the starting point for understanding the project
- REPORTING-BUGS: This offers information regarding the bug reporting procedure
As can be seen, the source code of the Linux kernel is quite large, so a browsing tool is required. There are a number of tools that can be used, such as Cscope, Kscope, or the web-based Linux Cross Reference (LXR). Cscope is a mature project that is also available with extensions for vim and emacs.
Before building a Linux kernel image, the proper configuration needs to be done. This is hard, taking into consideration that we have access to hundreds or even thousands of components, such as drivers, filesystems, and other items. A selection process takes place inside the configuration stage, and this is possible with the help of dependency definitions. The user has the chance to define a number of options that are enabled in order to select the components that will be used to build the Linux kernel image for a specific board.
Here are a number of value types available for a configuration key:
- bool: These options can have a true or false value
- tristate: This, besides the true and false options, also offers a module option
- int: These values are not that widespread, but they usually have a well-established value range
- string: These values are also not the most widespread ones, but they usually contain some pretty basic information
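A sketch of how these value types appear in a Kconfig file (the symbols below are illustrative, not taken from the kernel sources):

```
config EXAMPLE_FEATURE
	bool "Enable the example feature"

config EXAMPLE_DRIVER
	tristate "Example driver (built-in, module, or disabled)"

config EXAMPLE_QUEUE_SIZE
	int "Example queue size"
	range 1 64
	default 16

config EXAMPLE_NAME
	string "Example device name"
	default "example0"
```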
Manually editing the .config file is the worst configuration option for a developer, mostly because it can miss dependencies between some of the configurations. I would suggest that developers use the make menuconfig command, which launches a text console tool for the configuration of the kernel image.
After the configuration is done, the compilation process can be started. A piece of advice I would like to give is to use as many threads as possible if the host machine offers this possibility, because it helps with the build process. An example of a command to start the build process is make -j 8.
In embedded development, the compilation process implies cross-compilation, the most visible difference from native compilation being that the toolchain binaries carry a prefix with the name of the target architecture. The prefix setup can be done using the ARCH variable, which defines the name of the target board architecture, and the CROSS_COMPILE variable, which defines the prefix of the cross-compilation toolchain. Both of them are defined in the top-level Makefile.
The best option would be to set these variables as environment variables to make sure that a make process is not run for the host machine. Although this only works in the current terminal, it is the best solution when no automation tool, such as the Yocto Project, is available for these tasks. It is not recommended, though, to export them from the .bashrc shell configuration if you are planning to use more than one toolchain on the host machine.
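Assuming an ARM cross-toolchain with the arm-linux-gnueabihf- prefix is installed (both the prefix and the defconfig name here are illustrative), a typical cross-compilation session looks like this:

```shell
export ARCH=arm
export CROSS_COMPILE=arm-linux-gnueabihf-
make sama5_defconfig       # select a board configuration
make menuconfig            # optionally adjust the configuration
make -j 8 zImage dtbs      # build the kernel image and device tree blobs
```

Because the variables are exported rather than written to .bashrc, they affect only the current terminal, which keeps other toolchains usable in parallel.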
As I mentioned previously, the Linux kernel has a lot of kernel modules and drivers that are already implemented and available inside its source code. Being so many, a number of them are also kept outside the Linux kernel image as loadable modules. Keeping them outside reduces the boot time, because they are not initialized at boot time but loaded instead at the request of users and according to their needs. The only difference is that loading and unloading modules requires root access.
For module interaction, multiple utilities are available for multiple operations, such as modinfo, which is used for gathering information about modules, and insmod, which is able to load a module when the full path to the kernel module is given. Similar utilities for modules are available. One of them is called modprobe, and the difference with modprobe is that the full path is not necessary, because it is responsible for loading the dependent modules of the chosen kernel object before loading itself. Another functionality that modprobe offers is the -r option. This is the remove functionality, which offers support for removing the module together with all its dependencies. An alternative to this is the rmmod utility, which removes modules that are not used anymore. The last utility available is lsmod, which lists the loaded modules.
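A typical session with these utilities looks as follows; the module name is illustrative, and the load/unload operations require root access:

```shell
modinfo dummy_module.ko          # show author, license, and parameters
sudo insmod ./dummy_module.ko    # load by full path to the kernel object
sudo modprobe dummy_module       # load by name, resolving dependencies first
sudo modprobe -r dummy_module    # remove the module and its dependencies
sudo rmmod dummy_module          # remove a no-longer-used module
lsmod                            # list the currently loaded modules
```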
The simplest kernel module example that can be written looks something like this:
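Since the original listing is not reproduced here, the following is a hedged reconstruction of such a minimal module, using the classic init_module()/cleanup_module() entry points:

```c
#include <linux/module.h>
#include <linux/kernel.h>

MODULE_LICENSE("GPL");

/* Called when the module is loaded with insmod/modprobe. */
int init_module(void)
{
	printk(KERN_INFO "Hello, world!\n");
	return 0;	/* 0 means the module loaded successfully */
}

/* Called when the module is removed with rmmod/modprobe -r. */
void cleanup_module(void)
{
	printk(KERN_INFO "Goodbye, world!\n");
}
```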
Since the 2.2 version of the Linux kernel, there is the possibility of using the __init and __exit macros in this way:
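A hedged reconstruction of the same module written with the __init and __exit macros together with module_init() and module_exit():

```c
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kernel.h>

MODULE_LICENSE("GPL");

static int __init hello_init(void)
{
	printk(KERN_INFO "Hello, world!\n");
	return 0;
}

static void __exit hello_exit(void)
{
	printk(KERN_INFO "Goodbye, world!\n");
}

module_init(hello_init);
module_exit(hello_exit);
```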
The preceding macros allow the corresponding code to be removed: the __init function is discarded after the initialization is done, and the __exit function is discarded entirely when the module is built into the Linux kernel sources.
As mentioned previously, a kernel module is not only available inside the Linux kernel tree, but also outside of it. For a built-in kernel module, the compile process is similar to that of the other available kernel modules, and a developer can draw inspiration from one of them. A kernel module kept outside of the Linux kernel drivers has a build process that requires access to the sources of the Linux kernel or to the kernel headers.
For a kernel module available outside of the Linux kernel sources, a Makefile
example is available, as follows:
KDIR := <path/to/linux/kernel/sources>
PWD := $(shell pwd)
obj-m := hello_world.o

all:
	$(MAKE) ARCH=arm CROSS_COMPILE=<arm-cross-compiler-prefix> -C $(KDIR) M=$(PWD)
For a module that is implemented inside the Linux kernel, a configuration needs to be made available for it inside the corresponding Kconfig file. Also, the Makefile near that Kconfig file needs to be updated to let the build system know when the configuration for the module is updated and the sources need to be built. We will see an example of this kind for a kernel device driver here.
An example of the Kconfig file is as follows:
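The original listing is not reproduced here; a representative entry (the symbol name and wording are illustrative) could look like this:

```
config HELLO_WORLD
	tristate "Hello world kernel module"
	default n
	help
	  A simple hello world kernel device driver example.
```

Using tristate allows the driver to be disabled, built into the kernel image, or built as a loadable module.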
An example of the Makefile is as follows:
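Again as a representative sketch, the Makefile line that ties the object file to the configuration symbol could be:

```
obj-$(CONFIG_HELLO_WORLD) += hello_world.o
```

When CONFIG_HELLO_WORLD is set to y, the object is linked into the kernel; when set to m, it is built as hello_world.ko.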
A driver is usually used as an interface with a framework that exposes a number of hardware features, or with a bus interface used to detect and communicate with the hardware.
An inheritance mechanism is used to create specialized structures from more generic ones, such as struct device_driver and struct device, for every bus subsystem. The bus driver is responsible for representing each type of bus and for matching the corresponding device driver with the detected devices, the detection being done through an adapter driver. For nondiscoverable devices, a description is made inside the device tree or inside the source code of the Linux kernel. These devices are handled by the platform bus, which supports platform drivers and, in turn, handles platform devices.
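A hedged sketch of a platform driver skeleton illustrating this matching mechanism (the names are invented; the compatible string must match the corresponding device tree node):

```c
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/of.h>

static int demo_probe(struct platform_device *pdev)
{
	/* Called when the platform bus matches this driver to a device. */
	dev_info(&pdev->dev, "demo device matched and probed\n");
	return 0;
}

static int demo_remove(struct platform_device *pdev)
{
	return 0;
}

/* Device tree nodes with this compatible string bind to the driver. */
static const struct of_device_id demo_of_match[] = {
	{ .compatible = "vendor,demo-device" },
	{ }
};
MODULE_DEVICE_TABLE(of, demo_of_match);

static struct platform_driver demo_driver = {
	.probe	= demo_probe,
	.remove	= demo_remove,
	.driver	= {
		.name		= "demo-driver",
		.of_match_table	= demo_of_match,
	},
};
module_platform_driver(demo_driver);

MODULE_LICENSE("GPL");
```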
Having to debug the Linux kernel is not the easiest task, but it needs to be accomplished to make sure that the development process moves forward. Understanding the Linux kernel is, of course, one of the prerequisites. Some bugs are very hard to solve and may remain inside the Linux kernel for a long period of time.
Although Linus Torvalds and the Linux community do not believe that the existence of a kernel debugger will do much good to the project, a better understanding of the code is the best approach for any project. There are still some debugger solutions available. GNU debugger (gdb) is the first one, and it can be used in the same way as for any other process. Another one is kgdb, a patch over gdb that permits debugging over a serial connection.
Moving on to the Yocto Project, we have recipes available for every kernel version inside the BSP support of each supported board, as well as recipes for kernel modules that are built outside the Linux kernel source tree.
The recipe first defines repository-related information through variables such as SRC_URI and SRCREV. It also indicates the branch of the repository through the KBRANCH variable, as well as the place from where the defconfig needs to be taken and put into the source code to define the .config file. As seen in the recipe, there is an update made to the do_deploy task for the kernel recipe to add the device driver to the tmp/deploy/image/sama5d3-xplained directory alongside the kernel image and other binaries.
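As a hedged sketch of what such a kernel recipe fragment might contain (the URL, revision, and file names below are placeholders, not the actual recipe):

```
KBRANCH = "master"
SRC_URI = "git://<kernel-repository-url>;branch=${KBRANCH} \
           file://defconfig"
SRCREV = "<commit-sha>"

do_deploy_append() {
    # copy the device driver next to the kernel image and other binaries
    install -m 0644 <driver-binary> ${DEPLOYDIR}/
}
```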
After the kernel is built by running the bitbake virtual/kernel command, the kernel image will be available inside the tmp/deploy/image/sama5d3-xplained directory under the zImage-sama5d3-xplained.bin name, which is a symbolic link to the file with the full, longer name. The kernel image was deployed here from the place where the Linux kernel tasks were executed. The simplest method to discover that place is to run bitbake -c devshell virtual/kernel. A development shell will then be available to the user for direct interaction with the Linux kernel source code and access to the task scripts. This method is preferred because the developer has access to the same environment as bitbake.
As mentioned in the example of the external Linux kernel module, every kernel module, external or internal, is packaged with the kernel-module- prefix. This makes sure that when the kernel-modules value is added to the IMAGE_INSTALL variable, all the kernel modules available inside the /lib/modules/<kernel-version> directory are installed. A kernel module recipe is very similar to any other available recipe, the major difference being the module class that is inherited, as shown in the line inherit module.
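A hedged sketch of such an external kernel module recipe (the hello_world file names and checksum are illustrative placeholders):

```
SUMMARY = "Hello world external Linux kernel module"
LICENSE = "GPLv2"
LIC_FILES_CHKSUM = "file://COPYING;md5=<checksum>"

inherit module

SRC_URI = "file://hello_world.c \
           file://Makefile \
           file://COPYING"

S = "${WORKDIR}"
```

The inherited module class takes care of invoking the kernel build system with the right ARCH and CROSS_COMPILE values, so the recipe itself stays short.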