Let's imagine that we write a C program, and run it on the machine.
Well, hang on a second. C code cannot possibly be directly deciphered by the CPU; it must be converted into machine language. So, we understand that on modern systems we will have a toolchain installed – this includes the compiler, linker, library objects, and various other tools. We compile and link the C source code, converting it into an executable format that can be run on the system.
The processor Instruction Set Architecture (ISA) – documents the machine's instruction formats, the addressing schemes it supports, and its register model. In fact, CPU Original Equipment Manufacturers (OEMs) release a document that describes how the machine works; this document is generally called the ABI. The ABI describes more than just the ISA; it describes the machine instruction formats, the register set details, the calling convention, the linking semantics, and the executable file format, such as ELF. Try out a quick Google for x86 ABI – it should reveal interesting results.
Let's try this out. First, we write a simple Hello, World type of C program:
$ cat hello.c
/*
* hello.c
*
****************************************************************
* This program is part of the source code released for the book
* "Linux System Programming"
* (c) Kaiwan N Billimoria
* Packt Publishers
*
* From:
* Ch 1 : Linux System Architecture
****************************************************************
* A quick 'Hello, World'-like program to demonstrate using
* objdump to show the corresponding assembly and machine
* language.
*/
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
int main(void)
{
int a;
printf("Hello, Linux System Programming, World!\n");
a = 5;
exit(0);
}
$
We build the application via the Makefile, with make. Ideally, the code must compile with no warnings:
$ gcc -Wall -Wextra hello.c -o hello
hello.c: In function ‘main':
hello.c:23:6: warning: variable ‘a' set but not used [-Wunused-but-set-variable]
int a;
^
$
Important! Do not ignore compiler warnings with production code. Strive to get rid of all warnings, even the seemingly trivial ones; this will help a great deal with correctness, stability, and security.
In this trivial example code, we understand and anticipate the unused variable warning that gcc emits, and just ignore it for the purpose of this demo.
The exact warning and/or error messages you see on your system could differ from what you see here. This is because my Linux distribution (and version), compiler/linker, library versions, and perhaps even CPU, may differ from yours. I built this on a x86_64 box running the Fedora 27/28 Linux distribution.
Similarly, we build the debug version of the hello program (again, ignoring the warning for now), and run it:
$ make hello_dbg
[...]
$ ./hello_dbg
Hello, Linux System Programming, World!
$
We use the powerful objdump utility to see the intermixed source-assembly-machine language of our program (objdump's --source option switch
-S, --source Intermix source code with disassembly):
$ objdump --source ./hello_dbg
./hello_dbg: file format elf64-x86-64
Disassembly of section .init:
0000000000400400 <_init>:
400400: 48 83 ec 08 sub $0x8,%rsp
[...]
int main(void)
{
400527: 55 push %rbp
400528: 48 89 e5 mov %rsp,%rbp
40052b: 48 83 ec 10 sub $0x10,%rsp
int a;
printf("Hello, Linux System Programming, World!\n");
40052f: bf e0 05 40 00 mov $0x4005e0,%edi
400534: e8 f7 fe ff ff callq 400430 <puts@plt>
a = 5;
400539: c7 45 fc 05 00 00 00 movl $0x5,-0x4(%rbp)
exit(0);
400540: bf 00 00 00 00 mov $0x0,%edi
400545: e8 f6 fe ff ff callq 400440 <exit@plt>
40054a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
[...]
$
The exact assembly and machine code you see on your system will, in all likelihood, differ from what you see here; this is because my Linux distribution (and version), compiler/linker, library versions, and perhaps even CPU, may differ from yours. I built this on a x86_64 box running Fedora Core 27.
Alright. Let's take the line of source code a = 5; where, objdump reveals the corresponding machine and assembly language:
a = 5;
400539: c7 45 fc 05 00 00 00 movl $0x5,-0x4(%rbp)
We can now clearly see the following:
| C source |
Assembly language |
Machine instructions |
| a = 5; |
movl $0x5,-0x4(%rbp) |
c7 45 fc 05 00 00 00 |
So, when the process runs, at some point it will fetch and execute the machine instructions, producing the desired result. Indeed, that's exactly what a programmable computer is designed to do!
Though we have shown examples of displaying (and even writing a bit of) assembly and machine code for the Intel CPU, the concepts and principles behind this discussion hold up for other CPU architectures, such as ARM, PPC, and MIPS. Covering similar examples for all these CPUs goes beyond the scope of this book; however, we urge the interested reader to study the processor datasheet and ABI, and try it out.