
Nginx HTTP Server

By : Clement Nedelcu

Overview of this book

Nginx is a lightweight HTTP server designed for high-traffic websites, with network scalability as its primary objective. With the advent of high-speed Internet access, short loading times and fast transfer rates have become a necessity. This free open-source solution will either serve as a full replacement for other software such as Apache, or stand in front of your existing infrastructure to improve its overall speed. However, migrating to Nginx requires some guidance.

This book serves as a detailed guide to setting up Nginx in different production situations: as a standalone server, as a reverse proxy, interacting with applications via FastCGI, and more. In addition, the complete directive reference will be your best friend at all stages of the configuration and maintenance processes.

This book is the perfect companion for both Nginx beginners and experienced administrators. For the former, it will take you through the complete process of setting up this lightweight HTTP server on your system and configuring its various modules to get it to do exactly what you need, in a fast and secure way. For the latter, it provides different angles of approach that can help you make the most of your current infrastructure. As the book progresses, it provides a complete reference to all the modules and directives of Nginx. It will explain how to replace your existing server with Nginx or configure Nginx to work as a frontend for your existing server.

By the end of this book, you will be geared up to implement Nginx as a web server that will boost the speed of all your web applications.

Discovering the Linux filesystem


Linux-based operating systems have their files organized in a very specific way that follows more or less closely the long-established Filesystem Hierarchy Standard (FHS). According to the official FHS documentation, this standard enables:

  • Software to predict the location of installed files and directories

  • Users to predict the location of installed files and directories

Although the original standard specification was published in 1993, it is still used by modern distributions, but in a slightly revised version.

Directory structure

Unlike Microsoft Windows operating systems, where all file paths begin with a drive letter (what happens if you have over twenty-six drives on your system?), FHS-based filesystems have a common parent. This parent is called the root directory, also known as / (the slash character). All files and directories, regardless of the device, drive, or partition they are located on, are children of the root directory. Consequently, all absolute paths that you will find in this book start with a slash.

Let us now run cd /, followed by ls in order to discover the many subdirectories defined by the FHS. Please note that this directory structure is purely conventional; nothing actually prevents you from placing your own files in any of these folders or creating more directories at the root.

Path

Description

/

The root directory: Not to be confused with /root. No files are usually placed at the root, although nothing really prevents you from doing so.

/bin

Binaries: Common executable binaries and scripts available for all users of the system. This is where essential programs such as ls, cp, or mv are found.

/boot

Boot: Critical files used at system boot time.

/dev

Devices: Device and special files, more information in the next section.

/etc

Et cetera: System-wide configuration files for services and applications. You will often need to browse this directory, for example, when you need to edit the Nginx server settings and virtual hosts.

/home

Home directories: This directory contains home directories for all users on the system except the root user. In the examples we studied before we used /home/alex, the home directory for the alex user.

/lib

Libraries: System-wide shared libraries and kernel modules, required by binaries found in the /bin and /sbin folders.

/media

Removable media: A directory that allows you to easily access removable media using mount points for devices such as CD-ROMs, USB devices, and so on.

/mnt

Temporarily mounted filesystems: This directory is a suitable placeholder in case the administrator wishes to mount a filesystem on a temporary basis.

/opt

Optional software packages: In theory, this directory should host application files and add-on packages that do not come with the default operating system installation. In practice, it is hardly ever used.

/proc

Kernel and process information virtual filesystem: This directory provides access to a virtual filesystem containing a variety of statistics and details about all running processes.

/root

Root user home directory: The root user, also known as Superuser, does not have its home directory stored in the same folder as regular users (/home). Instead, its personal files are stored in the /root directory. The slash-root (/root) directory is not to be confused with the root directory (/).

/sbin

System binaries: Utilities dedicated to system administration, thus generally accessed by the root user only. Programs such as ifconfig, halt, service, and many others can be found here.

/srv

Service data: A placeholder for data coming from services hosted on the system. Like many others, this directory is rarely used.

/tmp

Temporary files: Files that do not need to be conserved beyond program execution should be stored here. Many operating systems actually clear the contents of this directory on reboot.

/usr

Read-only user data: This directory provides a secondary hierarchy for shareable read-only user data. The /usr directory should contain the following:

  • /usr/bin: Non-essential command binaries and scripts for all users (such as wget, gzip, firefox, and many more)

  • /usr/include: Header files from C libraries for inclusion at compile time

  • /usr/lib: Libraries used by program binaries found in /usr/bin and /usr/sbin

  • /usr/sbin: Non-essential system command binaries and scripts for all users (such as useradd, ntpdate, and so on)

  • /usr/share: Architecture-independent data files

  • /usr/src: Source code for kernel and installed applications

  • /usr/X11R6: X Window System (v11 release 6)-related files

  • /usr/local: A third hierarchy level for local data only

/var

Variable files: Files that are expected to be modified by running applications or services, for example, logfiles, cache, spool, and more. It comes with a hierarchy of its own:

  • /var/lib: Variable state information related to an application or more generally the operating system. Note that MySQL database files are usually stored in /var/lib/mysql.

  • /var/lock: Lock files used for synchronized resource access between applications.

  • /var/log: Logfiles generated by programs, services, or the system kernel.

  • /var/mail: User e-mail-related files. On most systems, /var/mail is now a simple shortcut to the actual location of the files in /var/spool/mail.

  • /var/run: Runtime variable data. Cleared when the system reboots, this directory provides information about the state of the system since it was started.

  • /var/spool: A directory in which files that are expected to be processed are placed such as e-mails and print jobs.

  • /var/tmp: A placeholder for temporary files that should not be deleted when the system reboots.
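Since the layout described above is a convention rather than a guarantee, it can be instructive to check which of these directories actually exist on a given machine. The following is a minimal sketch; the list of directories is taken from the table above, and the variable name present is only illustrative.

```shell
#!/bin/sh
# Walk through the top-level directories listed by the FHS and report
# whether each one exists on this machine.
present=0
for dir in /bin /boot /dev /etc /home /lib /media /mnt /opt /proc \
           /root /sbin /srv /tmp /usr /var; do
    if [ -d "$dir" ]; then
        echo "present: $dir"
        present=$((present + 1))
    else
        echo "missing: $dir"
    fi
done
echo "$present directories found"
```

On a minimal installation or inside a container, some entries such as /media or /srv may be reported as missing, which is consistent with the conventional nature of the hierarchy.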

Special files and devices

As you may have noticed in the directory structure, Linux operating systems have a reserved directory for "device files" (/dev). As a matter of fact, this folder contains elements referred to as nodes, each node representing a different device on the system. They can be actual hardware devices or pseudo devices; either way, the purpose of having them listed as part of the filesystem is to facilitate input and output interactions with programs and services—software developers can access devices as simply as they would read or write to a file. You will learn that device files are used in a number of situations and you should sooner or later have a use for them.

Device types

A large variety of devices may be available in the /dev directory. Unfortunately, most of them bear an obscure name, making it difficult to understand their purpose. Device files are named according to conventions in use in Linux operating systems. Since there is a potentially unlimited number of devices, we will only identify the most common ones. A device filename is composed of a prefix, conventionally defined according to the driver type, and optionally a number (or letter) if there is more than one device of that type present on the system.

Device file conventional prefixes for the most common types:

  • cdrom: CD and DVD-ROM drives

  • fd: Floppy disk drives

  • hd: IDE-connected devices such as hard drives and CD-ROMs

  • md: Metadisks and RAID devices such as hard drives

  • ram: RAM disks

  • sd: SCSI-connected mass-storage device

  • usb: USB-connected devices

Pseudo devices

Some of the devices listed in the /dev directory do not correspond to actual hardware devices. Instead, they are here for the sake of providing administrators and developers with simple input and output access to specific resources. For that reason, we call them "pseudo devices". Here is a brief description of the most commonly-used pseudo devices:

Pseudo Device

Description

/dev/null

Null device

This pseudo device is often nicknamed black hole as its purpose is to disregard all data that is being sent to it. When written to, it always reports the write operation as successful. When read from, the device returns no data.

This is particularly useful if you want to redirect the output of a program to nowhere; in other words, if you want to make sure a command executes but outputs no text on the screen.

[[email protected] ~]$ cat shopping.txt > /dev/null 

/dev/random, /dev/urandom

Random number generators

Streams that generate flows of random numbers. /dev/random generates true random numbers, whereas /dev/urandom provides pseudorandom numbers. These streams can be written to in order to feed the pool.

Since they generate binary data, numbers coming from /dev/random and /dev/urandom cannot be displayed to the console terminal (they would look like a flow of garbage data). These devices are mostly used by developers wishing to collect reliable random numbers.

/dev/full

Full device

This pseudo device is a stream that returns an error when written to as it is always considered full. When read from, it returns an infinite stream of null characters.

The purpose of /dev/full is to provide programmers and administrators with an operation that will always trigger an error:

[[email protected] ~]$ echo Hello! > /dev/full
-bash: echo: write error: No space left on device

/dev/zero

Zero data

Much like /dev/null, the zero pseudo device always provides successful return codes when written to. However, when read from, it outputs an infinite stream of null characters.

There is a variety of cases where reading from /dev/zero can prove useful, such as providing data as input to a program that will generate a file of a given size or writing to a storage device in order to format it.
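The behavior of these pseudo devices can be tried out safely from the shell. The sketch below assumes a Linux system with the usual head, od, and wc utilities; the scratch directory and the file name zeros.bin are made up for the demonstration.

```shell
#!/bin/sh
# Discard output entirely: the command runs, but nothing is printed.
echo "this text disappears" > /dev/null

# Read 8 random bytes and print them as hexadecimal so they are readable.
random_hex=$(head -c 8 /dev/urandom | od -An -tx1 | tr -d ' \n')
echo "random bytes: $random_hex"

# Build a 1 KiB file filled with null bytes by reading from /dev/zero.
workdir=$(mktemp -d)
head -c 1024 /dev/zero > "$workdir/zeros.bin"
zero_size=$(wc -c < "$workdir/zeros.bin" | tr -d ' ')
echo "zeros.bin is $zero_size bytes"

# Writing to /dev/full always fails; the outer check guards systems
# that do not provide this device.
if [ -e /dev/full ]; then
    if echo "Hello!" > /dev/full 2> /dev/null; then
        full_result="write succeeded"
    else
        full_result="write failed"
    fi
    echo "/dev/full: $full_result"
fi
```

Converting the random bytes to hexadecimal with od is one way around the "garbage data" problem mentioned above: the raw stream is binary, but its hexadecimal representation is plain text.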

Mounting a storage device

As you may have noticed in the previous sections, some of the devices available in the /dev directory are storage devices, such as hard disk drives, solid-state drives (SSD), floppies, or CD-ROMs. However, accessing the content that they serve is not as simple as browsing them with the cd command. Storage devices need to be mounted to the filesystem. In other words, devices need to be attached to a fixed directory.

[[email protected] ~]$ cd /dev/md1
-bash: cd: /dev/md1: Not a directory
[[email protected] ~]$ mount /dev/md1 /mnt/alexdrive
[[email protected] ~]$ cd /mnt/alexdrive
[[email protected] alexdrive]$ ls
Documents Music Photos Videos boot.ini

The mount command allows you to attach a device (first argument, /dev/md1 in the previous example) to an existing directory on your system (second argument). Once the drive is mounted, you are able to access the drive like you would access any other directory of the filesystem.

Note

In modern Linux distributions, CD-ROMs and other common devices are automatically mounted by the system.

If you want to obtain information about currently mounted devices, a simple call to mount with no arguments does the job: it tells you where each device is mounted, as well as the filesystem in use.
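On Linux, the same information is also exposed through the /proc/mounts virtual file, which is convenient to process from a script. The following sketch reads it line by line; it assumes the usual field order (device, mount point, filesystem type, options) and falls back to the plain mount command where /proc/mounts is not available.

```shell
#!/bin/sh
# Print device, mount point, and filesystem type for each mounted filesystem.
# /proc/mounts is Linux-specific; fall back to the mount command elsewhere.
if [ -r /proc/mounts ]; then
    mount_count=0
    while read -r device mount_point fs_type rest; do
        echo "$device on $mount_point type $fs_type"
        mount_count=$((mount_count + 1))
    done < /proc/mounts
    echo "$mount_count filesystems mounted"
else
    mount
fi
```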

If you wish to have a drive automatically mounted on system startup, or to simply set a directory to be used as the default mount point for a device, you will need to edit the /etc/fstab file while logged in with administrator privileges. It is a simple text file and thus can be opened with a text editor such as nano. The file, however, follows a specific syntax, and careless changes could cause a lot of damage to your system. More details on the fstab syntax can be found online on websites such as tuxfiles.org.

Finally, if you need to remove a device while the computer is in use (for instance, remove a USB storage drive), you should always unmount it first. Unmounting a device is done using the umount command:

[[email protected] ~]$ umount /dev/usb1

Note that the first argument of the command may either be the device filename or the mount point, producing the same result.

Files and inodes

There is a common misconception about the notion of "filesystem" when it comes to Unix-based operating systems in general. Since those systems respect the FHS, they use a common directory hierarchy regrouping all files and devices. However, each storage device may have its own independent disk filesystem. A disk filesystem is designed for the organization of files on a mass storage device (hard disk drives, CD-ROMs, and so on). Microsoft Windows operating systems favor the FAT, FAT32, and NTFS specifications, whereas the default and most recommended one for working under Linux is the EXT3 filesystem. EXT3 comes with a number of characteristics, and it is essential for administrators to master them in order to fully understand the operating system they work with.

EXT3 filesystem specifications

Unlike Microsoft's antique FAT32 file system that only allows files up to 4 gigabytes, the maximum file size with EXT3 ranges from 16 gigabytes to 2 terabytes depending on the block size. Moreover, the maximum storage space that can be used by EXT3 on a device ranges up to 32 terabytes (16 terabytes with the common 4-kilobyte block size), so you should have no trouble using it for a number of years, unless storage drive capacities suddenly skyrocket. One of the interesting features of EXT3 is that it lays out the data on the storage device in a way that file fragmentation is kept to a minimum and does not affect system performance. As a result, there is no need to defragment your drives.

Filenames

The EXT3 filesystem accepts filenames up to 255 characters. Filename extensions are not required, although they are usually present and correspond to the content offered by the file: a .txt file should contain text, a .mp3 file music, and so on. An important fact, however, is that filenames are case-sensitive. You may find, in the same directory, files named "SHOPPING.TXT", "Shopping.txt", and "shopping.txt"; all three are different files.

Inodes

With Linux disk filesystems such as EXT3, a large variety of information is stored for each and every file. This information is separated both logically and physically from the actual file data and is stored in a specific structure called an inode (index node). Some of the data contained in the inode indicates to the OS how to retrieve the contents of the file on the device. But that is not all: the inode also includes file permissions, user and group ownership, file size, access and modification times, and much more. Note that it does not contain the actual filename.

Each inode has an identifier that is unique to the device. This identifier is called the inode number or i-number and can be used in various situations. It can be retrieved by using the ls -i command.
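As an illustration, the following sketch creates a scratch file and looks up its inode number in two ways. The file name and temporary directory are arbitrary, and the stat -c syntax assumes GNU coreutils, as found on most Linux distributions.

```shell
#!/bin/sh
# Create a scratch file and look up its inode number in two ways.
workdir=$(mktemp -d)
echo "- toothpaste" > "$workdir/shopping.txt"

# ls -i prints the inode number in front of the filename.
ls -i "$workdir/shopping.txt"

# Keep only the first field (the number) for later comparison.
inode_from_ls=$(ls -i "$workdir/shopping.txt" | awk '{print $1}')

# stat -c %i (GNU coreutils syntax) prints only the inode number.
inode_from_stat=$(stat -c %i "$workdir/shopping.txt")
echo "ls -i reports $inode_from_ls, stat reports $inode_from_stat"
```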

Atime, ctime, and mtime

Among the metadata contained in an inode, you will find three different timestamps concerning the file. They are referred to as atime, ctime, and mtime.

Timestamp

Description

atime

Access time

The date and time the file was last accessed. Every time an application or service reads from the file using a system call, the file access time is updated.

mtime

Modification time

The date and time the file was last modified. When a change in the file content occurs, the file modification time is updated.

ctime

Change time

The date and time the file was last changed. This timestamp concerns changes on both the file attributes (in other words, alteration of the file's inode) and the file data.

Make sure to understand the difference between modification time and change time. The former concerns the file data only, whereas the latter tracks modifications of both file attributes and data. Here are some common examples illustrating all three mechanisms:

File access time (atime):

[[email protected] ~]$ nano shopping.txt

The file is opened in a text editor; its content is accessed. The file access time is updated.

File change time (ctime):

[[email protected] ~]$ chmod 0755 script.sh

The file permissions are updated (chmod command detailed in a later section); consequently, the inode is altered and the file change time updated.

File modification time (mtime):

[[email protected] ~]$ echo "- a pair of socks" >> shopping.txt

The file data is modified; as a result, both file modification time and file change time are updated.

As you may have noticed, there is no creation time recorded in the inode, so it is impossible to find out when a file was first created. It remains unclear as to why such an important element was left out. Either way, if you want to know all the timestamps associated with a file, you may use the stat command:

[[email protected] ~]$ stat shopping.txt
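To watch the difference between mtime and ctime in practice, the individual timestamps can be extracted with stat and compared before and after each operation. This is a sketch under a few assumptions: GNU stat (the %Y and %Z format codes give mtime and ctime in seconds since the epoch), a scratch file in a temporary directory, and one-second pauses so that the changes are visible at second resolution.

```shell
#!/bin/sh
workdir=$(mktemp -d)
file="$workdir/shopping.txt"
echo "- toothpaste" > "$file"

# %Y = mtime and %Z = ctime, both in seconds since the epoch (GNU stat).
mtime_before=$(stat -c %Y "$file")
ctime_before=$(stat -c %Z "$file")

sleep 1                     # let the clock advance by a full second
chmod 0755 "$file"          # alters the inode only, not the data
mtime_after_chmod=$(stat -c %Y "$file")
ctime_after_chmod=$(stat -c %Z "$file")

sleep 1
echo "- a pair of socks" >> "$file"   # alters the data
mtime_after_write=$(stat -c %Y "$file")
ctime_after_write=$(stat -c %Z "$file")

echo "after chmod: mtime $mtime_before -> $mtime_after_chmod, ctime $ctime_before -> $ctime_after_chmod"
echo "after write: mtime $mtime_after_chmod -> $mtime_after_write, ctime $ctime_after_chmod -> $ctime_after_write"
```

The chmod step should leave mtime untouched while moving ctime forward, whereas the append should move both.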

Tip

Important information for SSD (Solid-State Drive) users

Leaving the access time feature of the filesystem enabled can cause noticeable performance drops on your drive. Every time a file is read, its inode needs to be updated. As a result, frequent write operations are performed, which is a particular concern with this kind of storage device. Fortunately, a simple solution exists: you can completely disable file access time updates with the noatime option of the mount command. The option can be specified in the /etc/fstab file if you want to enable it permanently. More documentation can be found online with a simple noatime ssd search. Credit goes to Kevin Burton for this important finding.
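To give an idea of what such an entry might look like, the sketch below writes a sample fstab-style line to a scratch file (not to the real /etc/fstab) and checks its options field. The device /dev/sda1 and the mount point /mnt/data are hypothetical names chosen for illustration; an actual entry depends on your own system.

```shell
#!/bin/sh
# Write a sample fstab-style line to a scratch file (NOT the real /etc/fstab).
# /dev/sda1 and /mnt/data are hypothetical names used for illustration only.
sample=$(mktemp)
printf '%s\n' '/dev/sda1  /mnt/data  ext3  defaults,noatime  0  2' > "$sample"

# Fields: device, mount point, filesystem type, options, dump, fsck order.
options=$(awk '{print $4}' "$sample")
echo "mount options: $options"

# Check that noatime appears in the comma-separated option list.
case ",$options," in
    *,noatime,*) noatime_set=yes ;;
    *)           noatime_set=no ;;
esac
echo "noatime enabled: $noatime_set"
```

For a filesystem that is already mounted, the option can usually be applied without a reboot by running mount -o remount,noatime followed by the mount point, with administrator privileges.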

Symbolic and hard links

Symbolic links in Linux are the equivalent of shortcuts in Microsoft Windows operating systems. There are a number of differences that need to be explained though, the most important one being that read or write accesses to the file performed by applications actually affect the target of the link and not the link itself. However, commands such as mv or rm affect the link, not its target.

Creating a link is done via the ln -s command. Here is an example that will help you understand the particularities of symbolic links:

[[email protected] ~]$ ln -s shoppinglist.txt link_to_list
[[email protected] ~]$ ls
link_to_list photo.jpg photo2.jpg shoppinglist.txt
[[email protected] ~]$ cat link_to_list
- toothpaste
- a pair of socks
[[email protected] ~]$ rm link_to_list
[[email protected] ~]$ ls
photo.jpg photo2.jpg shoppinglist.txt

As you can see, reading the file content can be done via the symbolic link. If you delete the link, the target file is not affected; the same can be said for a move or rename operation (the link itself would be moved, but not the target file). Note that cp behaves differently: by default it follows the link and copies the content of the target, unless the -P option is given.

Another difference that makes symbolic links stand apart from Microsoft Windows shortcuts is that they can be connected to files using relative paths. This becomes particularly useful for embedding links within archives—deploying a shortcut using an absolute path would make no sense, as users may extract files to any location on the system.

Finally, Microsoft Windows shortcuts have the ability to include additional metadata. This allows the user to select an icon, assign a keyboard shortcut, and more. However, symbolic links are simple connections to the target file path, and as such, they do not offer the same possibilities.
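The points above can be verified with a short experiment. This sketch works in a temporary directory with made-up file names; readlink is used to display the path stored inside the link, which here is a relative one.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '- toothpaste' '- a pair of socks' > shoppinglist.txt

# Create a symbolic link using a relative target path.
ln -s shoppinglist.txt link_to_list

# readlink prints the path stored in the link, exactly as it was given.
link_target=$(readlink link_to_list)
echo "link_to_list -> $link_target"

# Reading through the link returns the content of the target.
first_line=$(head -n 1 link_to_list)
echo "first line read through the link: $first_line"

# Removing the link leaves the target untouched.
rm link_to_list
[ -f shoppinglist.txt ] && echo "shoppinglist.txt still exists"
```

Because the stored path is relative, the pair of files could be moved together to another directory and the link would still resolve.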

Another type of link, rarely encountered by Windows users, is the hard link. Hard links function a little differently, in that they represent actual connections to file data. Two or more links may connect to the same data on the storage device; when one of those links is deleted, the data itself is unaffected and the other links still point to the data. Only when the last link gets deleted will the data be removed from the storage device.

To illustrate this, let's create a hard link to that shopping list of ours, using the same command but without the -s switch.

[[email protected] ~]$ ln shoppinglist.txt hard_link_to_list

If you decide to delete shoppinglist.txt, hard_link_to_list will remain and the data it points to is still available. Additionally, the newly created link is considered an actual file by some commands such as ls. If you run ls to calculate the total size occupied by files in this directory, you will notice that link file sizes add up. If the shopping list file itself takes up 5 kilobytes of storage space, the total size reported by ls for the directory will be 10 kilobytes: five for the shopping list file itself, and five for its link. However, some tools such as du (for Disk Usage, discussed further below) are able to dig deeper and report the actual occupied storage.
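A quick way to convince yourself that both names lead to the same data is to compare their inode numbers. The sketch below assumes GNU stat (%i for the inode number, %h for the number of hard links) and uses a temporary directory.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "- toothpaste" > shoppinglist.txt
ln shoppinglist.txt hard_link_to_list

# Both names share one inode; %h is the link count (GNU stat).
inode_original=$(stat -c %i shoppinglist.txt)
inode_link=$(stat -c %i hard_link_to_list)
link_count=$(stat -c %h shoppinglist.txt)
echo "inodes: $inode_original and $inode_link, link count: $link_count"

# Delete the original name; the data stays reachable through the other one.
rm shoppinglist.txt
remaining=$(cat hard_link_to_list)
echo "content after rm: $remaining"
```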

File manipulation

The next step towards your discovery of the Linux shell is to learn how to manipulate files with a command-line interface. There are many operations that you can perform with simple tools—editing text, compressing files and folders, modifying file attributes, and so on, but let's begin with a more elementary topic—displaying a file.

Reading a file

First of all, you should understand that we are working with a terminal here; in other words, there is no possibility to work with graphical data, and only text can be displayed on the screen. For that reason, this section deals with text files only; no binary files such as graphics, videos, or any other form of binary data may be displayed on the screen.

The most used and simplest way to display a text file on the terminal is to use the cat command, as you may have noticed in examples from previous sections.

Although the cat command can be used to perform more complex operations (such as concatenation from multiple input sources), its simplest form is cat filename. The content of filename will be displayed to the standard output, in other words, the terminal screen.

If you reuse the grep mechanism that we covered in the process management section, you can achieve interesting results for filtering the output:

[[email protected] ~]$ cat /etc/fstab | grep sys
/dev/sys     /sys      /sysfs      defaults     0  0

As you can see, piping the output to grep allows you to specify a text string; all lines that do not contain the specified string will not be displayed.
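As a side note, grep can also read a file directly, which gives the same result as piping from cat. The sketch below compares both forms on a small sample file; the file content is made up for the demonstration and is not a real fstab.

```shell
#!/bin/sh
workdir=$(mktemp -d)
# A small sample file; the entries are made up for the demonstration.
printf '%s\n' 'proc   /proc  proc   defaults  0 0' \
              'sysfs  /sys   sysfs  defaults  0 0' \
              'tmpfs  /tmp   tmpfs  defaults  0 0' > "$workdir/sample.txt"

# Filter through a pipe, as in the text.
piped=$(cat "$workdir/sample.txt" | grep sys)

# grep can also read the file itself, which saves one process.
direct=$(grep sys "$workdir/sample.txt")

# -c counts matching lines instead of printing them.
match_count=$(grep -c sys "$workdir/sample.txt")
echo "$direct"
echo "$match_count matching line(s)"
```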

You can pipe the output to other programs as well, in order to have your text displayed in a different manner. For example, if your file happens to be a large text document, it will probably not fit in the terminal window. The solution to this problem is to pipe the output of cat to the more command.

The more command allows you to control the document flow: it displays as many lines of text as your terminal can contain and waits until you press the Return key to display more. Pressing Q or Ctrl+C will let you return to the prompt.

Even better, the less command allows you to scroll up and down in the document flow. It is used as a standalone program, so there is no need to pipe the output of cat into it:

[[email protected] ~]$ less /etc/php.ini

Editing a file

If you are a long time Microsoft Windows or Mac OS user, you might be surprised to learn that there are actually advanced command-line text editors. Several of them come with most Linux distributions—vim, emacs, nano, and so on. The question here is—which one should you use? Since you are reading this, the best choice for you should be nano, which has already been mentioned in previous sections.

Nano is a user-friendly text editor that comes with a lot of interesting features such as syntax highlighting, text search and replace, and keyboard shortcuts. Unlike its competitors that usually require a lengthy learning process, nano's interface is intuitive.

Since there is no mouse cursor, the interface is controlled via keyboard shortcuts, and the available operations are listed in the command bar at the bottom of the screen. The ^ character indicates a Control key combination (^G stands for Ctrl+G, ^O stands for Ctrl+O, and so on). Once you have finished editing your document, save (Ctrl+O) and exit (Ctrl+X).

There are other ways to write in a file though, using commands that do not require any form of interface at all. One of the possible ways is to use the mechanism of redirection. This allows you to specify a location for the input and output streams interacting with a shell command. In other words, by default, the text shows up on the screen; but you do have the option to specify other locations. The most common usage for redirections is writing the output of a command to a file. Here is an example demonstrating the syntax:

[[email protected] ~]$ ls /etc > files_in_etc.txt

The command executes normally but does not output any text to the screen; instead, the text is saved to the file you specified. The > character allows you to write the text to the file, and if the specified file already exists on the system, the original is deleted and replaced. In this example, we list the files located in the /etc directory and save the results in a text file. Using >>, you can append the output to an existing file (if the file does not exist, it is created):

[[email protected] ~]$ ls /etc/init.d >> files_in_etc.txt

The list of files found in /etc/init.d is appended to the text file. There is much more you can do with redirections including replacing standard input, but covering it all would be unnecessary to your understanding of Nginx.
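For reference, here is a compact sketch of the most common redirection operators side by side. It works in a temporary directory; the file names listing.txt and errors.txt are arbitrary.

```shell
#!/bin/sh
workdir=$(mktemp -d)
out="$workdir/listing.txt"

printf '%s\n' first  > "$out"    # > creates or overwrites the file
printf '%s\n' second > "$out"    # the previous content is gone
printf '%s\n' third  >> "$out"   # >> appends to what is already there

# < feeds the file to the standard input of a command.
line_count=$(wc -l < "$out" | tr -d ' ')
echo "$out has $line_count lines"

# 2> redirects error messages (standard error) separately from normal output.
ls "$workdir/does_not_exist" > /dev/null 2> "$workdir/errors.txt" || true
echo "captured error: $(cat "$workdir/errors.txt")"
```

After running it, listing.txt contains two lines (second and third), since the second > replaced the first write.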

Finally, the touch command allows you to update the access and modification date of a file without having to actually edit its content.

[[email protected] ~]$ touch shopping.txt

Compression and archiving

Although the ZIP and RAR formats are popular and widespread across the Internet, they are both proprietary software technologies. As a result, they are not mainstream choices in the Linux world; other formats such as Gzip and bzip2 are favored. Of course, solutions exist for both ZIP and RAR under Linux; the point is that most projects and downloadable archives that you will find will come as .tar.gz or .tar.bz2 files.

You read correctly, there are two extensions—tar, and gz or bz2. The first part indicates the method with which files have been gathered together and the second part shows the algorithm used to compress the result. Tar (for Tape archive) is a tool that concatenates multiple files into a single one called tarball. It also gives you the option to compress the tarball once it is created, offering various compression alternatives. The tool is available under most distributions, though in some of the most minimal ones, you may have to install it manually with your system package manager (read the section further below).

The syntax for creating a tarball using Gzip and bzip2 compression respectively is as follows:

tar czvf archive.tar.gz [file1 file2…]
tar cjvf archive.tar.bz2 [file1 file2…]

Conventionally, Linux users do not archive multiple files together; instead they first gather files into a single folder and then archive the folder. As a result, when users extract the archive, only a single item is added to their directory listing. Imagine extracting a ZIP file onto your Windows desktop. Would you rather have all files appearing individually on your desktop, or collected neatly in a single directory? Either way, the syntax remains the same whether you want to archive files or directories.

Tar can, of course, perform the opposite operation: extracting files. However, you need to enter a slightly different command depending on the compression algorithm in use:

tar xzvf archive.tar.gz
tar xjvf archive.tar.bz2

Note that .tar.gz files are also found as .tgz, and .tar.bz2 files as .tbz. Other compression formats handled by tar are LZMA (.tar.lzma) and compress (.tar.Z), but they are now obsolete and there is a good chance you will never have to use them.
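The whole cycle of gathering files into a folder, archiving it, and extracting it elsewhere can be sketched as follows. The directory and file names are invented for the example, and the v (verbose) switch is left out to keep the output short.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Gather the files into one folder first, then archive the folder.
mkdir project
echo "hello" > project/readme.txt
echo "world" > project/notes.txt

# c = create, z = gzip compression, f = archive filename.
tar czf project.tar.gz project

# t = list the contents without extracting anything.
tar tzf project.tar.gz

# x = extract; -C chooses the destination directory.
mkdir restored
tar xzf project.tar.gz -C restored
restored_text=$(cat restored/project/readme.txt)
echo "restored readme.txt contains: $restored_text"
```

Listing the archive with t before extracting is a handy habit: it shows whether the author followed the single-folder convention described earlier.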

If you stumble upon RAR or ZIP files, you may still extract the files they contain by downloading and installing the unrar or unzip tools for Linux. The syntax that they offer is rather simple:

unrar x file.rar
unzip file.zip