Linux: Embedded Development

By Alexandru Vaduva, Alex Gonzalez, Chris Simmonds

Overview of this book

Embedded Linux is a complete Linux distribution employed to operate embedded devices such as smartphones, tablets, PDAs, set-top boxes, and many more. An example of an embedded Linux distribution is Android, developed by Google. This learning path starts with the module Learning Embedded Linux Using the Yocto Project. It introduces embedded Linux software and hardware architecture and presents information about the bootloader. You will go through Linux kernel features and source code and get an overview of the available Yocto Project components. The next module, Embedded Linux Projects Using Yocto Project Cookbook, takes you through the installation of a professional embedded Yocto setup and then advises you on best practices. Finally, it explains how to quickly get hands-on with the Freescale ARM ecosystem and community layer using the affordable and open source Wandboard embedded board. Moving ahead, the final module, Mastering Embedded Linux Programming, takes you through the product cycle and gives you an in-depth description of the components and options that are available at each stage. You will see how functions are split between processes and the usage of POSIX threads. By the end of this learning path, your capabilities will be enhanced to create robust and versatile embedded projects. This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products:

- Learning Embedded Linux Using the Yocto Project by Alexandru Vaduva
- Embedded Linux Projects Using Yocto Project Cookbook by Alex González
- Mastering Embedded Linux Programming by Chris Simmonds

In this chapter, you will learn about toolchains, how to use and customize them, and how code standards apply to them. A toolchain contains a myriad of tools, such as compilers, linkers, assemblers, debuggers, and a variety of miscellaneous utilities that help to manipulate the resulting application binaries. In this chapter, you will learn how to use the GNU toolchain and become familiar with its features. You will be presented with examples that involve manual configuration, and the same examples will then be moved into the Yocto Project environment. At the end of the chapter, we will analyze the similarities and differences between manual and automatic deployment of a toolchain, and the various usage scenarios available for it.

A toolchain represents a compiler and its associated utilities, used to produce the kernels, drivers, and applications necessary for a specific target. A toolchain usually contains a set of tools linked to each other. It consists of gcc, glibc, and binutils, plus optional tools, such as a debugger, or an additional compiler for a specific programming language, such as C++, Ada, Java, Fortran, or Objective-C.

Usually, a toolchain available on a traditional desktop or server executes on that machine and produces executables and libraries that run on the same system. A toolchain normally used for an embedded development environment is called a cross toolchain. In this case, programs such as gcc run on the host system but produce binary code for a different target architecture. This whole process is referred to as cross-compilation, and it is the most common way to build sources for embedded development.
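For illustration, assuming a cross compiler with the arm-linux-gnueabi- prefix is installed on the host (prefixes vary between toolchains), only the tool invoked changes:

    $ gcc -o hello hello.c                     # native: binary runs on the build machine
    $ arm-linux-gnueabi-gcc -o hello hello.c   # cross: binary runs on the ARM target
    $ file hello                               # reports an ARM executable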

Introducing toolchains

In a toolchain environment, three different machines are available:

- The build machine, on which the toolchain is built
- The host machine, on which the toolchain is executed
- The target machine, for which the toolchain produces binaries

These three machines are used to generate four different toolchain build procedures:

- The native toolchain: the build, host, and target machines are the same, as for the toolchains found in normal Linux distributions
- The cross-native toolchain: built on one machine but running on, and producing binaries for, the target machine; typically used to provide a native compiler on the target without building it there
- The cross-compilation toolchain: built and executed on the host machine while producing binaries for the target; the most common case in embedded development
- The cross-canadian toolchain: built on one machine, run on a second, and producing binaries for a third

The way the three machines combine into these four toolchain build procedures is shown in the following diagram:

[Diagram: Introducing toolchains - the three machines and the four toolchain build procedures]
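In GNU configure terms, the same three roles appear as the --build, --host, and --target options. As a hypothetical example, a cross-canadian toolchain build might be configured like this (the triplet values are illustrative):

    # built on x86_64, tools will run on 32-bit x86, binaries target ARM
    $ ./configure --build=x86_64-linux-gnu \
                  --host=i686-linux-gnu \
                  --target=arm-linux-gnueabi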

A toolchain is the set of tools that makes most of today's great projects, open source ones included, possible; this diversity would not exist without a corresponding toolchain. The same holds in the embedded world, where newly available hardware needs the components and support of a corresponding toolchain for its Board Support Package (BSP).

Toolchain configuration is no easy process. Before starting the search for a prebuilt toolchain, or even building one yourself, the best solution would be to check for a target specific BSP; each development platform usually offers one.

The GNU toolchain is a term used for a collection of programming tools under the GNU Project umbrella. This suite of tools is what is normally called a toolchain, and is used for the development of applications and operating systems. It plays an important role in the development of embedded systems and Linux systems, in particular.

The following projects are included in the GNU toolchain:

- GNU make: the automation tool used for compilation and build
- GNU Compiler Collection (GCC): the suite of compilers for several programming languages
- GNU Binutils: tools that include the linker, the assembler, and other binary utilities
- GNU Bison: a parser generator
- GNU m4: a macro processor
- GNU Debugger (GDB): a code debugging tool
- GNU build system (autotools): autoconf, autoheader, automake, and libtool

The projects included in the GNU toolchain are summarized in the following diagram:

[Diagram: the projects included in the GNU toolchain]

Components of toolchains

An embedded development environment needs more than a cross-compilation toolchain. It needs libraries, target-specific packages, such as programs, libraries, and utilities, and host-specific debuggers, editors, and utilities. In some cases, usually when talking about a company's environment, a number of servers host target devices, and certain hardware probes are connected to the host through Ethernet or other methods. This emphasizes the fact that an embedded distribution includes a great number of tools, and, usually, a number of these tools require customization. Presenting each of these would take up more than a chapter in a book.

In this book, however, we will cover only the toolchain building components. These include the following:

- GNU Binutils
- The kernel headers
- The GNU Compiler Collection (GCC)
- The C library

I will start by introducing the first item on this list, the GNU Binutils package. Developed under the GNU GPL license, it represents a set of tools that are used to create and manage binary files, object code, assembly files, and profile data for a given architecture. Here is a list of the tools available in the GNU Binutils package and their functions:

- ld: the GNU Linker
- as: the GNU Assembler
- addr2line: converts addresses into file names and line numbers
- ar: creates, modifies, and extracts from archives
- c++filt: demangles C++ symbols
- gprof: displays profiling information
- nm: lists the symbols available in object files
- objcopy: copies and translates object files
- objdump: displays information from object files
- ranlib: generates an index of an archive's contents
- readelf: displays information about ELF files
- size: lists the section sizes of an object or archive file
- strings: prints the printable character sequences found in a file
- strip: discards symbols from object files
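As a quick illustration of the most frequently used of these tools on an object file:

    $ gcc -c hello.c       # produce hello.o
    $ nm hello.o           # list its symbols
    $ objdump -d hello.o   # disassemble it
    $ size hello.o         # show its section sizes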

The majority of these tools use the Binary File Descriptor (BFD) library for low-level data manipulation, and also, many of them use the opcode library to assemble and disassemble operations.

In the toolchain generation process, the next item on the list is represented by the kernel headers, which are needed by the C library for interaction with the kernel. Before compiling the corresponding C library, the kernel headers need to be supplied so that they can offer access to the available system calls, data structures, and constant definitions. Of course, any C library defines sets of specifications that are specific to each hardware architecture; here, I am referring to the application binary interface (ABI).
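A minimal sketch of this step, using the kernel's own headers_install make target from the top of a kernel source tree (the destination path and the ARM architecture are illustrative):

    # install sanitized, user-space-safe kernel headers into the sysroot
    $ make ARCH=arm INSTALL_HDR_PATH=/opt/toolchain/sysroot/usr headers_install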

An application binary interface (ABI) represents the interface between two modules. It specifies how functions are called and what kind of information should be passed between components or to the operating system. Referring to a book, such as The Linux Kernel Primer, will do you good here; in my opinion, it is a complete guide to what the ABI offers. I will try to reproduce its definition for you.

An ABI can be seen as a set of rules, similar to a protocol or an agreement, that offers a linker the possibility of putting compiled modules together into one component without the need for recompilation. At the same time, an ABI describes the binary interface between these components. Having this sort of convention and conforming to an ABI offers the benefit of linking object files that may have been compiled with different compilers.

It can be easily seen from both of these definitions that an ABI is dependent on the type of platform, which can include physical hardware, some kind of virtual machine, and so on. It may also be dependent on the programming language that is used and the compiler, but most of it depends on the platform.

The ABI also governs how the generated code operates. The code generation process must be aware of the ABI, but when coding in a high-level language, paying attention to the ABI is rarely a problem. Still, this information can be considered necessary knowledge for specifying certain ABI-related options.
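For instance, on ARM, gcc exposes ABI choices directly as command-line options; the following invocation is an illustrative sketch (the flag values depend on the platform and toolchain):

    # select the AAPCS-based Linux ABI with the hard-float calling convention
    $ arm-linux-gnueabihf-gcc -mabi=aapcs-linux -mfloat-abi=hard -march=armv7-a \
          -o app app.c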

As a general rule, the ABI must be respected in interactions with external components. However, with regard to interactions between internal modules, the user is free to do whatever he or she wants. Basically, they are able to reinvent the ABI, limited only by the constraints of the machine. A simple analogy involves the citizens of a country or region: they have learned and known the language of that region since birth, so they understand one another and communicate without problems. For an external citizen to communicate, he or she needs to learn the language of the region; within the community this seems natural, so it does not constitute a problem. Compilers are similarly able to design their own custom calling conventions where they know the limitations of the functions called within a module. This exercise is typically done for optimization reasons. However, it can be considered an abuse of the ABI term.

With reference to the user space ABI, the kernel is backward compatible: binaries generated using kernel headers older than those of the running kernel will work fine. The disadvantage of this is that binaries generated with a toolchain using kernel headers newer than the running kernel might not work, because the new system calls and data structures may be unavailable. The need for the latest kernel headers is thus justified mainly by the need to have access to the latest kernel features.

The GNU Compiler Collection, also known as GCC, represents a compiler system that constitutes the key component of the GNU toolchain. Although it was originally named the GNU C Compiler, because it only handled the C programming language, it soon began to represent a collection of languages, such as C, C++, Objective-C, Fortran, Java, Ada, and Go, as well as the libraries for other languages (such as libstdc++, libgcj, and so on).

It was originally written as the compiler for the GNU operating system and developed as 100 percent free software. It is distributed under the GNU GPL. This helped it extend its functionality across a wide variety of architectures, and it played an important role in the growth of open source software.

The development of GCC started with the effort put in by Richard Stallman to bootstrap the GNU operating system. This quest led Stallman to write his own compiler from scratch. It was released in 1987, with Stallman as the author and others as contributors. By 1991, it had already reached a stable phase, but architectural limitations kept it from including further improvements. This meant that work on GCC version 2 had begun, but it did not take long before the need for new language interfaces appeared as well, and developers started making their own forks of the compiler source code. This fork initiative proved to be very inefficient, and because of the difficulty of the code acceptance procedure, working on it became really frustrating.

This changed in 1997, when a group of developers gathered as the Experimental/Enhanced GNU Compiler System (EGCS) workgroup and started merging several forks into one project. They had so much success in this venture, and gathered so many features, that the Free Software Foundation (FSF) halted its development of GCC version 2 and appointed EGCS the official GCC version and maintainers in April 1999. The two projects were united with the release of GCC 2.95. More information on the history and release history of the GNU Compiler Collection can be found at https://www.gnu.org/software/gcc/releases.html and http://en.wikipedia.org/wiki/GNU_Compiler_Collection#Revision_history.

The GCC interface follows the Unix convention: users call a language-specific driver, which interprets arguments and invokes a compiler. It then runs an assembler on the resulting output and, if necessary, runs a linker to obtain the final executable. For each language, a separate program performs the reading of the source code.

The process of obtaining an executable from source code involves several steps. First, an abstract syntax tree is generated, and at this stage, compiler optimizations and static code analysis can be applied. The optimizations and static code analysis can be applied both on the architecture-independent GIMPLE representation, or its superset GENERIC, and on the architecture-dependent Register Transfer Language (RTL) representation, which is similar to the LISP language. The machine code is generated using a pattern-matching algorithm written by Jack Davidson and Christopher Fraser.
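These intermediate stages can be inspected with stock gcc options; for example:

    # keep the intermediate files: hello.i (preprocessed), hello.s (assembly), hello.o
    $ gcc -save-temps -o hello hello.c
    # dump the GIMPLE representation into a hello.c.*.gimple dump file
    $ gcc -fdump-tree-gimple -c hello.c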

GCC was initially written almost entirely in C, although the Ada frontend is written mostly in Ada. However, in 2012, the GCC committee announced the use of C++ as an implementation language. Even so, GCC cannot be considered a finished project; its main ongoing activities include adding support for new languages, new optimizations, improved runtime libraries, and increased speed for debugging applications.

Each available frontend generates a tree from the given source code. Using this abstract tree form, different languages can share the same backend. GCC initially used Look-Ahead LR (LALR) parsers generated with Bison, but it moved over time to recursive-descent parsers, adopting them for C, C++, and Objective-C in 2006. Today, all available frontends use handwritten recursive-descent parsers.

Until recently, the syntax tree abstraction of a program was not independent of the target processor, because the meaning of the tree differed from one language frontend to another, and each provided its own tree syntax. All this changed with the introduction of the GENERIC and GIMPLE architecture-independent representations in GCC version 4.0.

GENERIC is the more complex intermediate representation, while GIMPLE is a simplified GENERIC targeted by all the frontends of GCC. Frontends such as those for C, C++, and Java produce GENERIC tree representations directly. Others use different intermediate representations that are then parsed and converted to GENERIC representations.

In the GIMPLE transformation, complex expressions are split into three-address code using temporary variables. The GIMPLE representation was inspired by the SIMPLE representation used in the McCAT compiler for simplifying the analysis and optimization of programs.

The middle stage of GCC involves code analysis and optimization, and works independently of both the compiled language and the target architecture. It starts from the GENERIC representation and continues to the Register Transfer Language (RTL) representation. The optimizations mostly involve jump threading, instruction scheduling, loop optimization, common subexpression elimination, and so on. The RTL optimizations are less important than the ones done on the GIMPLE representation. However, they include dead code elimination, global value numbering, partial redundancy elimination, sparse conditional constant propagation, scalar replacement of aggregates, and even automatic vectorization or automatic parallelization.

The GCC backend is mainly represented by preprocessor macros and specific target architecture functions, such as endianness definitions, calling conventions, or word sizes. The initial stage of the backend uses these representations to generate the RTL; this suggests that although GCC's RTL representation is nominally processor-independent, the initial processing of abstract instructions is adapted for each specific target.

A machine-specific description file contains RTL patterns, along with code snippets and operand constraints, that form the final assembly. In the process of RTL generation, the constraints of the target architecture are verified. To generate an RTL snippet, it must match one or more RTL patterns from the machine description file and, at the same time, satisfy the limitations of these patterns; otherwise, conversion of the final RTL into machine code would be impossible. Toward the end of compilation, the RTL representation takes a strict form: each instruction reference holds a real machine register correspondence and a template from the target's machine description file.

As a result, the machine code is obtained by calling small snippets of code, which are associated with corresponding patterns. In this way, instructions are generated from target instruction sets. This process involves the usage of registers, offsets, and addresses from the reload phase.

The last element that needs to be introduced here is the C library. It represents the interface between the Linux kernel and the applications used on a Linux system. At the same time, it aids the easier development of applications. There are a number of C libraries available in this community:

- glibc
- eglibc
- newlib
- bionic
- musl
- uClibc
- dietlibc
- klibc

The choice of the C library used by the GCC compiler is made in the toolchain generation phase, and it is influenced not only by the size and application support offered by the libraries, but also by standards compliance, completeness, and personal preference.

The first library that we'll discuss here is the glibc library, which is designed for performance, standards compliance, and portability. It was developed by the Free Software Foundation for the GNU/Linux operating system and is still present today on all actively maintained GNU/Linux host systems. It is released under the GNU Lesser General Public License.

The glibc library was initially written by Roland McGrath in the 1980s, and it continued to grow until the 1990s, when the Linux kernel developers forked it, calling the fork Linux libc. It was maintained separately until January 1997, when the Free Software Foundation released glibc 2.0. Glibc 2.0 contained so many features that it no longer made sense to continue developing Linux libc, so the fork was discontinued and development returned to glibc. Some changes made in Linux libc were not merged back into glibc because of problems with the authorship of the code.

The glibc library is quite large and is not a suitable fit for small embedded systems, but it provides the functionality required by the Single UNIX Specification (SUS), POSIX, ISO C11, ISO C99, Berkeley Unix interfaces, the System V Interface Definition, and the X/Open Portability Guide, Issue 4.2, with all its extensions common to X/Open System Interface compliant systems, along with X/Open UNIX extensions. In addition to this, glibc also provides extensions that have been deemed useful or necessary while developing GNU.

The next C library that I'm going to discuss is the one that served as the main C library used by the Yocto Project until version 1.7. Here, I'm referring to the eglibc library. This is a version of glibc optimized for use on embedded devices that, at the same time, preserves standards compatibility.

Since 2009, Debian and a number of its derivatives chose to move from the GNU C Library to eglibc. One reason was the difference in licensing between the GNU LGPL and eglibc, which permitted them to accept patches that the glibc developers might reject. Since 2014, the official eglibc homepage has stated that the development of eglibc is discontinued, because glibc has moved to the same licensing; in addition, with the release of Debian Jessie, Debian moved back to glibc. The same happened with Yocto support, when glibc was once again made the primary library support option.

The newlib library is another C library developed with the intention of being used in embedded systems. It is a conglomerate of library components under free software licenses. Developed by Cygnus Support and maintained by Red Hat, it is one of the preferred versions of the C library used for non-Linux embedded systems.

The newlib system calls describe the usage of the C library across multiple operating systems, as well as on embedded systems that do not require an operating system. It is included in commercial GCC distributions, such as those from Red Hat, CodeSourcery, Atollic, KPIT, and others. It is also supported by architecture vendors, including ARM and Renesas, by Unix-like environments such as Cygwin, and even by the proprietary operating system of the Amiga personal computer.

By 2007, it also got support from the toolchain maintainers of Nintendo DS, PlayStation, portable SDK Game Boy Advance systems, Wii, and GameCube development platforms. Another addition was made to this list in 2013 when Google Native Client SDK included newlib as their primary C library.

Bionic is a derivative of the BSD C library, developed by Google for Android, based on the Linux kernel. Its development is independent of Android code development. It is licensed under the 3-clause BSD license and its goals are publicly available. These include the following:

- A small size: keeping the library footprint smaller than that of glibc
- Speed: using CPU cycles sparingly, with a design suited to low clock frequencies
- A BSD license: keeping GPL- and LGPL-licensed code out of Android's user space

It also has a list of restrictions compared to glibc, as follows:

- It does not include C++ exception handling support
- It does not offer wide character support
- Its POSIX threads (pthreads) implementation is only partial
- It does not provide the System V IPC interfaces
- It offers only limited locale support

The next C library that will be discussed is musl. It is a C library intended for use with Linux operating systems in embedded and mobile systems. It has an MIT license and was developed from scratch with the idea of having a clean, standards-compliant, and time-efficient libc. As a C library, it is optimized for static linking. It is compatible with the C99 standard and POSIX 2008, and it implements Linux, glibc, and BSD non-standard functions.
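As a brief illustration, musl ships a musl-gcc wrapper script around the host gcc, which makes producing a small static binary straightforward (assuming musl is installed on the host):

    # compile and statically link against musl instead of glibc
    $ musl-gcc -static -Os -o hello hello.c
    $ ldd hello   # reports "not a dynamic executable"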

Next, we'll discuss uClibc, which is a C standard library designed for Linux embedded systems and mobile devices. Although initially developed for μClinux and designed for microcontrollers, it gained traction and became the weapon of choice for anyone who has limited space on their device. It has become popular for the following reasons:

- It focuses on size rather than performance
- It is released under the LGPL free license
- It is much smaller than glibc and reduces compilation time
- It is highly configurable, as many of its features can be enabled through a menuconfig interface similar to the ones used by the Linux kernel and BusyBox

The uClibc library also has another quality that makes it quite useful. It introduces a new ideology: the C library does not try to support as many standards as possible. Instead, it focuses on embedded Linux and consists of the features necessary for developers who face the limitation of available space. For this reason, the library was written from scratch, and even though it has its fair share of limitations, uClibc is an important alternative to glibc. If we take into consideration the fact that it contains most of the commonly used C library features while being four times smaller, this is understandable. WindRiver, MontaVista, and TimeSys are active maintainers of it.

The dietlibc library is a standard C library developed by Felix von Leitner and released under the GNU GPL v2 license. Although it also contains some commercially licensed components, its design is based on the same idea as uClibc: the possibility of compiling and linking software at the smallest size possible. It bears another resemblance to uClibc: it was developed from scratch and implements only the most used and best-known standard functions. Its primary usage is in the embedded devices market.

The last in the C libraries list is the klibc standard C library. It was developed by H. Peter Anvin to be used as part of the early user space during the Linux startup process. It is used by components that run during the kernel startup process but that do not run in kernel mode and, hence, do not have access to the standard C library.

The development of klibc started in 2002 as an initiative to move the Linux initialization code outside the kernel. Its design makes it suitable for use in embedded devices. It also has another advantage: it is optimized for small size and correctness of data. The klibc library is loaded during the Linux startup process from initramfs (a temporary RAM filesystem) and is incorporated by default into initramfs by the mkinitramfs script on Debian and Ubuntu-based filesystems. It also provides a small set of utilities, such as mount, mkdir, dash, mknod, fstype, nfsmount, run-init, and so on, which are very useful in the early init stage.

The klibc library is licensed under the GNU GPL, since it uses some components from the Linux kernel, so, as a whole, it is visible as GPL-licensed software, limiting its applicability in commercial embedded software. However, most of the library's source code is written under the BSD license.

When generating a toolchain, the first thing that needs to be done is the establishment of an ABI used to generate binaries. This means that the kernel needs to understand this ABI and, at the same time, all the binaries in the system need to be compiled with the same ABI.

When working with the GNU toolchain, a good way of gathering information and understanding how to work with these tools is to consult the GNU coding standards. The coding standards' purpose is very simple: to make sure that work within the GNU ecosystem is performed in a clean, easy, and consistent manner. They are a guideline for people interested in using the GNU tools to write reliable, solid, and portable software. The main focus of the GNU toolchain is the C language, but the rules applied here are also very useful for other programming languages. The purpose of each rule is explained, making sure that the logic behind the given information is passed to the reader.

The main language that we will focus on is, again, the C programming language. With regard to the GNU coding standards, the compatibility of GNU libraries, exceptions, and utilities should be very good when compared with standards such as the ones from Berkeley Unix, Standard C, or POSIX. In case of compatibility conflicts, it is very useful to have compatibility modes for that programming language.

Standards such as POSIX and C impose a number of limitations on the support for extensions; however, these extensions can still be used, since they can be disabled by passing a --posix, --ansi, or --compatible option. In case an extension has a high probability of breaking a program or script by being incompatible, its interface should be redesigned to ensure compatibility.

A large number of GNU programs suppress extensions that are known to conflict with POSIX if the POSIXLY_CORRECT environment variable is defined. The use of user-defined features offers the possibility of interchanging a GNU feature with a totally different, better, or even compatible one. Additional useful features are always welcome.

If we take a quick look at the GNU Standard documentation, some useful information can be learned from it:

It is better to use the int type, although you might consider defining a narrower data type. There are, of course, a number of special cases where this could be hard to use. One such example is the dev_t system type, because it is shorter than int on some machines and wider on others. The only way to offer support for non-standard C types is to check the width of dev_t using Autoconf and then choose the argument type accordingly. However, it may not be worth the trouble.
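A minimal configure.ac sketch of such a check, assuming Autoconf is in use (AC_CHECK_SIZEOF is a stock Autoconf macro):

    # configure.ac fragment: record the width of dev_t in config.h
    AC_CHECK_SIZEOF([dev_t], [], [#include <sys/types.h>])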

For the GNU Project, the implementation of an organization's standard specifications is optional, and it is done only if it makes the system better overall. In most situations, following published standards fits well within users' needs, because their programs or scripts can be considered more portable. One such example is GCC, which implements almost all the features of Standard C, as the standard requires. This offers a great advantage to the developers of C programs. This also applies to GNU utilities that follow POSIX.2 specifications.

There are also specific points in the specifications that are not followed, but this happens with the sole reason of making the GNU system better for users. One such example is the fact that Standard C does not permit extensions to C, but GCC implements many of them, some of which were later embraced by the standard. For developers interested in outputting the error messages required by the standard, the --pedantic argument can be used. It is implemented with a view to making sure that GCC fully implements the standard.

The POSIX.2 standard mentions that commands such as du and df should output sizes in units of 512 bytes. However, users want units of 1 KB, so this is the default behavior that is implemented. If someone is interested in having the behavior required by the POSIX standard, they need to set the POSIXLY_CORRECT environment variable.
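For example:

    $ du -s /tmp/build                      # GNU default: sizes in 1 KB units
    $ POSIXLY_CORRECT=1 du -s /tmp/build    # POSIX behavior: 512-byte units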

Another such example is represented by the GNU utilities, which don't always respect the POSIX.2 standard specifications when referring to support for long named command-line options or intermingling of options with arguments. This incompatibility with the POSIX standard is very useful in practice for developers. The main idea here is not to reject any new feature or remove an older one, although a certain standard mentions it as deprecated or forbidden.


Advice on robust programming

To make sure that you write robust code, a number of guidelines should be mentioned. The first is that arbitrary limitations should not be imposed on any data structure, including files, file names, lines, and symbols. All data structures should be dynamically allocated. One of the reasons for this is that most Unix utilities silently truncate long lines; GNU utilities do not do this kind of thing.

Utilities that are used to read files should avoid dropping null characters or nonprinting characters. The exception is for utilities that are intended to interface with certain types of printers or terminals that are unable to handle such characters. The advice I'd give in this case is to try to make programs work with the UTF-8 character set, or other sequences of bytes used to represent multibyte characters.

Make sure that you check system calls for error return values; the exception is when a developer wishes to ignore the errors. It is a good idea to include the system error text from strerror, perror, or equivalent error-handling functions in error messages that result from a failed system call, along with the name of the source code file and the name of the utility. This is done to make sure that the error message is easy to read and understand by anyone interacting with the source code or the program.

Check the return value of malloc or realloc to verify whether it has returned zero. If realloc is used to make a block smaller on systems that round block dimensions to powers of 2, realloc may behave differently and return a different block. In Unix, when realloc has a bug, it destroys the storage block on a zero return value. In GNU, this bug does not occur: when realloc fails, the original block remains unchanged. If you want to run the same program on Unix and do not want to lose data, you could check whether the bug has been fixed on the Unix system, or use the GNU malloc.
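A minimal sketch of this kind of check; the xmalloc wrapper name is a common convention, not a standard function:

    #include <stdio.h>
    #include <stdlib.h>

    /* Allocate memory or abort with a message: callers never see NULL. */
    static void *xmalloc(size_t size)
    {
        void *p = malloc(size);
        if (p == NULL) {
            fprintf(stderr, "myutil: virtual memory exhausted\n");
            exit(EXIT_FAILURE);
        }
        return p;
    }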

Do not assume that the contents of a block remain available once it has been freed, and do not alter or otherwise interact with it afterwards; anything you still need from the block must be fetched before calling free.

When malloc fails in a noninteractive program, it is a fatal error. When the same situation arises in an interactive program, it is better to abort the current command and return to the read loop. This offers the possibility of freeing up virtual memory, killing other processes, and retrying the command.

To decode arguments, the getopt_long function can be used.
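A self-contained sketch of getopt_long usage; the --verbose and --output options are made up for illustration:

    #include <stdio.h>
    #include <getopt.h>

    int main(int argc, char *argv[])
    {
        /* Hypothetical options: --verbose and --output FILE. */
        static const struct option longopts[] = {
            { "verbose", no_argument,       NULL, 'v' },
            { "output",  required_argument, NULL, 'o' },
            { NULL, 0, NULL, 0 }
        };
        int opt;
        while ((opt = getopt_long(argc, argv, "vo:", longopts, NULL)) != -1) {
            switch (opt) {
            case 'v': puts("verbose mode"); break;
            case 'o': printf("output file: %s\n", optarg); break;
            default:  fprintf(stderr, "usage: %s [-v] [-o FILE]\n", argv[0]);
                      return 1;
            }
        }
        return 0;
    }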

When static storage is to be written into during program execution, use explicit C code for its initialization. Reserve C initialized declarations for data that will not be changed.

Try to keep away from low-level interfaces to unknown Unix data structures; these can fail when the data structure does not work in a compatible fashion. For example, to find all the files inside a directory, a developer can use the readdir function, or any other available high-level interface function, since these do not have compatibility problems.

For signal handling, use the BSD variant of signal and the POSIX sigaction function. The USG signal interface is not the best alternative in this case. Using POSIX signal functions is nowadays considered the easiest way to develop a portable program. However, the use of one function over another is completely up to the developer.
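A short sigaction sketch; the choice of SIGINT and the handler body are illustrative:

    #include <signal.h>
    #include <unistd.h>

    static void handle_sigint(int sig)
    {
        (void)sig;
        /* Only async-signal-safe calls belong in a handler. */
        write(STDERR_FILENO, "interrupted\n", 12);
    }

    int main(void)
    {
        struct sigaction sa = { 0 };
        sa.sa_handler = handle_sigint;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = SA_RESTART;   /* restart interrupted system calls */
        sigaction(SIGINT, &sa, NULL);
        pause();                    /* wait for a signal to arrive */
        return 0;
    }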

For error checks that identify impossible situations, just abort the program, since there is no need to print any messages. These types of checks bear witness to the existence of bugs. To fix these bugs, a developer will have to inspect the available source code and perhaps start a debugger. The best approach to this problem is to describe the bugs and problems using comments inside the source code; the relevant information can then be found inside variables after examining them with a debugger.

Do not use a count of the errors encountered in a program as its exit status. This practice is not the best, mostly because the values of an exit status are limited to 8 bits, and an execution of the executable might have more than 255 errors. For example, if you try to return exit status 256 for a process, the parent process will see a status of zero and consider that the program finished successfully.
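A tiny illustration of the safer pattern:

    #include <stdlib.h>

    /* exit(256) would be truncated to 0, so the parent would see success.
       Report success or failure instead of exporting the raw count. */
    void exit_with_errors(int nerrors)
    {
        exit(nerrors == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
    }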

If temporary files are created, it is a good idea to check the TMPDIR environment variable; if the variable is defined, use it, and otherwise fall back to the /tmp directory. The use of temporary files should be done with caution, because of the possibility of security breaches when creating them in world-writable directories. For C, this can be avoided by creating temporary files in the following manner:
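    /* fail instead of following a pre-existing (possibly hostile) file */
    fd = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0600);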

This can also be done using the mkstemps function, which is made available by Gnulib.

In a bash environment, use the noclobber shell option (set -o noclobber, or its short version set -C) to avoid the previously mentioned problem. Furthermore, the mktemp utility is altogether a better solution for making a temporary file from a shell environment; this utility is available in the GNU Coreutils package.
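For example:

    # create a unique, 0600-mode temporary file, or fail safely
    TMPFILE=$(mktemp "${TMPDIR:-/tmp}/myutil.XXXXXX") || exit 1
    trap 'rm -f "$TMPFILE"' EXIT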


Generating the toolchain

After the introduction of the packages that comprise a toolchain, this section will introduce the steps needed to obtain a custom toolchain. The toolchain that will be generated contains the same sources as the ones available inside the Poky dizzy branch. Here, I am referring to gcc version 4.9, binutils version 2.24, and glibc version 2.20. For Ubuntu systems, there are also shortcuts available: a generic toolchain can be installed using the available package manager, and there are also alternatives, such as downloading custom toolchains available inside Board Support Packages, or even from third parties, including CodeSourcery and Linaro. More information on toolchains can be found at http://elinux.org/Toolchains. The architecture that will be used as a demo is the ARM architecture.

The toolchain build process has eight steps. I will only outline the activities required for each of them, but I must mention that they are all automated inside the Yocto Project recipes. Inside the Yocto Project, the toolchain is generated without user intervention; for interaction with the generated toolchain, the simplest task is to call meta-ide-support, which will be presented in the appropriate section. Broadly, the steps are the following:

1. The setup: creating the working directories and the version variables for the packages involved.
2. Getting the sources: downloading the binutils, gcc, glibc, and kernel sources.
3. The kernel headers setup: installing the sanitized kernel headers for the target architecture.
4. Building binutils: building the cross assembler, linker, and related binary utilities.
5. Building the bootstrap (stage 1) gcc: a minimal C-only compiler, just capable of building the C library.
6. Building the C library: compiling glibc for the target using the bootstrap compiler and the kernel headers.
7. Building the final (stage 2) gcc: the full compiler, linked against the freshly built C library.
8. Testing the toolchain: verifying the result by cross-compiling and inspecting a simple program.

After these steps are performed, a toolchain will be available for the developer to use. The same strategy and build procedure steps are followed inside the Yocto Project.

As I have mentioned, a major advantage and feature of the Yocto Project environment is the fact that a Yocto Project build does not use the packages available on the host, but builds and uses its own. This is done to make sure that a change in the host environment does not influence the available packages, and that the builds generate a custom Linux system. The toolchain is one of these components, because almost all the packages that constitute a Linux distribution need the use of toolchain components.

The first step for the Yocto Project is to identify the exact sources and packages that will be combined to generate the toolchain used by the packages built later, such as the U-Boot bootloader, the kernel, BusyBox, and others. In this book, the sources that will be discussed are the ones available inside the dizzy branch, the latest Poky 12.0 version and Yocto Project version 1.7. The sources can be gathered using the following command:
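    # clone the Poky "dizzy" branch (repository URL as published by the
    # Yocto Project at the time of writing)
    $ git clone -b dizzy git://git.yoctoproject.org/poky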

Gathering the sources and investigating the source code, we can identify a part of the packages mentioned and presented under the preceding headings, as shown here:
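    # the toolchain recipes live in these directories of the Poky tree
    $ cd poky
    $ ls meta/recipes-devtools/gcc meta/recipes-devtools/binutils meta/recipes-core/glibc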

The GNU CC and GCC C compiler package, which consists of all the preceding packages, is split into multiple recipes, each one with its own purpose and scope, such as the sdk components. However, as I mentioned in the introduction of this chapter, there are multiple toolchain build procedures that need to be assured and automated with the same source code. The support available inside Yocto is for the gcc 4.8 and 4.9 versions. A quick look at the available gcc recipes shows the available information:
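    $ ls meta/recipes-devtools/gcc/ | grep 4.9
    # representative entries (abbreviated; exact names may differ per release):
    gcc_4.9.bb
    gcc-cross_4.9.bb
    gcc-cross-canadian_4.9.bb
    gcc-cross-initial_4.9.bb
    gcc-crosssdk_4.9.bb
    gcc-crosssdk-initial_4.9.bb
    gcc-runtime_4.9.bb
    libgcc_4.9.bb
    libgcc-initial_4.9.bb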

The GNU Binutils package represents the binary tools collection, such as the GNU Linker, the GNU Assembler, addr2line, ar, nm, objcopy, objdump, and other tools and related libraries. The Yocto Project offers support for binutils version 2.24, which also depends on the available toolchain build procedures, as can be seen from an inspection of the source code:
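    $ ls meta/recipes-devtools/binutils/
    # representative entries (abbreviated):
    binutils_2.24.bb
    binutils-cross_2.24.bb
    binutils-cross-canadian_2.24.bb
    binutils-crosssdk_2.24.bb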

The last components are the C libraries present inside the Poky dizzy branch. There are two C libraries available that can be used by developers. The first one is the GNU C library, also known as glibc, which is the most widely used C library in Linux systems. The sources for the glibc package can be viewed here:
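    $ ls meta/recipes-core/glibc/
    # representative entries (abbreviated):
    glibc_2.20.bb
    glibc-initial_2.20.bb
    glibc-locale_2.20.bb
    glibc-mtrace_2.20.bb
    glibc-scripts_2.20.bb
    cross-localedef-native_2.20.bb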

The same location also includes tools, such as ldconfig, a standalone native dynamic linker for runtime dependencies, and a binding and cross-locale generation tool. The other C library, uClibc, which, as previously mentioned, is a library designed for embedded systems, has fewer recipes, as can be seen from the Poky source code:
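    $ ls meta/recipes-core/uclibc/
    # representative entries (abbreviated):
    uclibc_git.bb
    uclibc-git.inc
    uclibc.inc
    uclibc-package.inc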

uClibc is used as an alternative to the glibc C library because it generates smaller executable footprints. At the same time, uClibc is the only package in the preceding list that has a bbappend applied to it, since it extends the support for two machines, genericx86-64 and genericx86. The change between glibc and uClibc can be made by setting the TCLIBC variable to the corresponding value, in this way: TCLIBC = "uclibc".

As mentioned previously, the toolchain generation process for the Yocto Project is simpler. It is the first task executed before any recipe is built. To generate the cross-toolchain using BitBake, the bitbake meta-ide-support task is executed first. The task can be executed for the qemuarm architecture, for example, but it can, of course, be generated in a similar manner for any given hardware architecture. After the task finishes executing, the toolchain is generated and populates the build directory. It can then be used by sourcing the environment-setup script available in the tmp directory:
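    $ bitbake meta-ide-support
    # the environment-setup script name encodes the target tuple; for qemuarm
    # it is typically the following:
    $ source tmp/environment-setup-armv5te-poky-linux-gnueabi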

Set the MACHINE variable to the value qemuarm inside the conf/local.conf file:
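    # conf/local.conf
    MACHINE ?= "qemuarm"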

The default C library used for the generation of the toolchain is glibc, but it can be changed according to the developer's needs. As seen in the previous section, the toolchain generation process inside the Yocto Project is very simple and straightforward. It also avoids all the trouble and problems involved in the manual toolchain generation process, and it makes reconfiguration very easy as well.