Book Image

PostgreSQL 14 Administration Cookbook

By : Simon Riggs, Gianni Ciolli
5 (1)
Book Image

PostgreSQL 14 Administration Cookbook

5 (1)
By: Simon Riggs, Gianni Ciolli

Overview of this book

PostgreSQL is a powerful, open-source database management system with an enviable reputation for high performance and stability. With many new features in its arsenal, PostgreSQL 14 allows you to scale up your PostgreSQL infrastructure. With this book, you'll take a step-by-step, recipe-based approach to effective PostgreSQL administration. This book will get you up and running with all the latest features of PostgreSQL 14 while helping you explore the entire database ecosystem. You’ll learn how to tackle a variety of problems and pain points you may face as a database administrator such as creating tables, managing views, improving performance, and securing your database. As you make progress, the book will draw attention to important topics such as monitoring roles, validating backups, regular maintenance, and recovery of your PostgreSQL 14 database. This will help you understand roles, ensuring high availability, concurrency, and replication. Along with updated recipes, this book touches upon important areas like using generated columns, TOAST compression, PostgreSQL on the cloud, and much more. By the end of this PostgreSQL book, you’ll have gained the knowledge you need to manage your PostgreSQL 14 database efficiently, both in the cloud and on-premise.
Table of Contents (14 chapters)

Locating the database server files

Database server files are initially stored in a location referred to as the data directory. Additional data files may also be stored in tablespaces if any exist.

In this recipe, you will learn how to find the location of these directories on a given database server.

Getting ready

You'll need to get operating system access to the database system, which is what we call the platform on which the database runs.

How to do it...

If you can connect using psql, then you can use this command:

postgres=# SHOW data_directory; 
    data_directory
----------------------
 /var/lib/pgsql/data/

If not, the following are the system default data directory locations:

  • Debian or Ubuntu systems: /var/lib/postgresql/MAJOR_RELEASE/main
  • Red Hat RHEL, CentOS, and Fedora: /var/lib/pgsql/data/
  • Windows: C:\Program Files\PostgreSQL\MAJOR_RELEASE\data

MAJOR_RELEASE is composed of just one number (for release 10 and above) or two (for releases up to 9.6).

On Debian or Ubuntu systems, the configuration files are located in /etc/postgresql/MAJOR_RELEASE/main/, where main is just the name of a database server. Other names are also possible. For the sake of simplicity, we assume that you only have a single installation, although the point of including the release number and database server name as components of the directory path is to allow multiple database servers to coexist on the same host.

Note

The pg_lsclusters utility is specific to Debian/Ubuntu and displays a list of all the available database servers, including information for each server.

The information for each server includes the following:

  • Major release number
  • Port
  • Status (for example, online and down)
  • Data directory
  • Log file

The pg_lsclusters utility is part of the postgresql-common Debian/Ubuntu package, which provides a structure under which multiple versions of PostgreSQL can be installed, and multiple clusters can be maintained, at the same time.

In the packages distributed with Red Hat RHEL, CentOS, and Fedora, the default data directory location also contains the configuration files (*.conf) by default. However, note that the packages distributed by the PostgreSQL community use a different default location: /var/lib/pgsql/MAJOR_RELEASE/data/.

Again, that is just the default location. You can create additional data directories using the initdb utility.

The initdb utility populates the given data directory with the initial content. The directory will be created for convenience if it is missing but, for safety, the utility will stop if the data directory is not empty. The initdb utility will read the data directory name from the PGDATA environment variable unless the -d command-line option is used.

How it works...

Even though the Debian/Ubuntu and Red Hat file layouts are different, they both follow the Linux Filesystem Hierarchy Standard (FHS), so neither layout is wrong.

The Red Hat layout is simpler and easier to understand. The Debian/Ubuntu layout is more complex, but it has different and more adventurous goals. The Debian/Ubuntu layout is similar to the Optimal Flexible Architecture (OFA) of other database systems. As pointed out earlier, the goals are to provide a file layout that will allow you to have multiple PostgreSQL database servers on one system and to allow many versions of the software to exist in the filesystem at once.

Again, the layouts for the Windows and OS X installers are different. Multiple database clusters are possible, but they are also more complex than on Debian/Ubuntu.

I recommend that you follow the Debian/Ubuntu layout on whichever platform you are using. It doesn't really have a name, so I call it the PostgreSQL Flexible Architecture (PFA). Clearly, if you are using Debian or Ubuntu, then the Debian/Ubuntu layout is already being used. If you do this on other platforms, you'll need to lay things out yourself, but it does pay off in the long run. To implement PFA, you can set the following environment variables to name parts of the file layout:

export PGROOT=/var/lib/pgsql/ 
export PGRELEASE=14
export PGSERVERNAME=mamba 
export PGDATA=$PGROOT/$PGRELEASE/$PGSERVERNAME

In this example, PGDATA is /var/lib/pgsql/14/mamba.

Finally, you must run initdb to initialize the data directory, as noted earlier, and custom administration scripts should be prepared to automate actions, such as starting or stopping the database server, when the system undergoes similar procedures.

Note that server applications such as initdb can only work with one major PostgreSQL version. On distributions that allow several major versions, such as Debian or Ubuntu, these applications are placed in dedicated directories, which are not put in the default command path. This means that if you just type initdb, the system will not find the executable, and you will get an error message.

This may look like a bug, but in fact, it is the desired behavior. Instead of accessing initdb directly, you are supposed to use the pg_createcluster utility from postgresql-common, which will select the right initdb utility depending on the major version you specify.

Note

If you plan to run more than one database server on the same host, you must set the preceding variables differently for each server as they determine the name of the data directory. For instance, you can set them in the script that you use to start or stop the database server, which would be enough because PGDATA is mostly used only by the database server process.

There's more…

Once you've located the data directory, you can look for the files that comprise the PostgreSQL database server. The layout is as follows:

Figure 2.2 – Contents of the PostgreSQL data directory

Figure 2.2 – Contents of the PostgreSQL data directory

None of the aforementioned directories contain user-modifiable files, nor should any of the files be manually deleted to save space, or for any other reason. Don't touch it, because you'll break it, and you may not be able to fix it! It's not even sensible to copy files in these directories without carefully following the procedures described in Chapter 11Backup and Recovery. Keep off the grass!

We'll talk about tablespaces later in the book. We'll also discuss a performance enhancement that involves putting the transaction log on its own set of disk drives in Chapter 10Performance and Concurrency.

The only things you are allowed to touch are configuration files, which are all *.conf files, and server message log files. Server message log files may or may not be in the data directory. For more details on this, refer to the next recipe, Locating the database server's message log.