Learn PostgreSQL - Second Edition

By: Luca Ferrari, Enrico Pirozzi

Overview of this book

The latest edition of this PostgreSQL book will help you to start using PostgreSQL from absolute scratch, helping you to quickly understand the internal workings of the database. With a structured approach and practical examples, go on a journey that covers the basics, from SQL statements and how to run server-side programs, to configuring, managing, securing, and optimizing database performance. This new edition will not only help you get to grips with all the recent changes within the PostgreSQL ecosystem but will also dig deeper into concepts like partitioning and replication with a fresh set of examples. The book is also equipped with Docker images for each chapter which makes the learning experience faster and easier. Starting with the absolute basics of databases, the book sails through to advanced concepts like window functions, logging, auditing, extending the database, configuration, partitioning, and replication. It will also help you seamlessly migrate your existing database system to PostgreSQL and contains a dedicated chapter on disaster recovery. Each chapter ends with practice questions to test your learning at regular intervals. By the end of this book, you will be able to install, configure, manage, and develop applications against a PostgreSQL database.



Exploring the disk layout of PGDATA

In the previous sections, you saw how to install PostgreSQL and connect to it, but we have not yet looked at how a cluster stores its data. Since the aim of PostgreSQL, like that of any relational database, is to store data permanently, the cluster needs some form of permanent storage. In particular, PostgreSQL relies on the underlying filesystem to store its data. All of the cluster's files are kept in a directory known as PGDATA.

The PGDATA directory acts as the disk container that stores all the data of the cluster, including the users’ data and cluster configuration.

The following is an example of the content of PGDATA for a running PostgreSQL 16 cluster:

$ ls -1 /postgres/16/data
base
global
pg_commit_ts
pg_dynshmem
pg_hba.conf
pg_ident.conf
pg_logical
pg_multixact
pg_notify
pg_replslot
pg_serial
pg_snapshots
pg_stat
pg_stat_tmp
pg_subtrans
pg_tblspc
pg_twophase
PG_VERSION
pg_wal
pg_xact
postgresql.auto.conf
postgresql.conf
postmaster.opts
postmaster.pid

The PGDATA directory is organized into several files and subdirectories. The main files are as follows:

postgresql.conf is the main configuration file, read by default when the service is started.
postgresql.auto.conf is an automatically included configuration file that stores settings changed dynamically via SQL instructions (for example, with ALTER SYSTEM).
pg_hba.conf is the host-based authentication (HBA) file, which defines which connections to the databases are allowed.
PG_VERSION is a text file that contains the major version number (useful when inspecting the directory to understand which version of PostgreSQL has managed the PGDATA directory).
postmaster.pid is a file that contains the PID of the postmaster process, the first process launched in the cluster.
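
As a minimal sketch of how two of these files can be inspected from the shell, the following function prints the major version recorded in PG_VERSION and, if present, the PID stored on the first line of postmaster.pid. The function name pgdata_summary is hypothetical (it is not a PostgreSQL tool), and the argument is assumed to be a readable PGDATA directory.

```shell
# pgdata_summary: hypothetical helper, not part of PostgreSQL.
# Usage: pgdata_summary /postgres/16/data
pgdata_summary() {
    pgdata="$1"
    # Every PGDATA directory contains a PG_VERSION file.
    if [ ! -f "$pgdata/PG_VERSION" ]; then
        echo "not a PGDATA directory: $pgdata" >&2
        return 1
    fi
    # PG_VERSION holds the major version number as plain text.
    echo "version: $(cat "$pgdata/PG_VERSION")"
    # postmaster.pid is normally present only while the cluster is running;
    # its first line is the PID of the postmaster process.
    if [ -f "$pgdata/postmaster.pid" ]; then
        echo "postmaster pid: $(head -n 1 "$pgdata/postmaster.pid")"
    else
        echo "postmaster pid: none (cluster not running)"
    fi
}
```

For example, pgdata_summary /postgres/16/data would report the version and PID for the cluster shown above, provided the current user can read that directory.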

The main directories available in PGDATA are as follows:

base is a directory that contains all the users’ data, including databases, tables, and other objects.
global is a directory containing cluster-wide objects.
pg_wal is the directory containing the WAL files.
pg_stat and pg_stat_tmp store, respectively, permanent and temporary statistical information about the status and health of the cluster.

Of course, all files and directories in PGDATA are important for the cluster to work properly, but the preceding is the "core" list of objects that you need to know about for now. Other files and directories will be discussed in later chapters.

Objects in the PGDATA directory

PostgreSQL does not name objects on disk, such as tables, in a mnemonic or human-readable way; instead, every file is named after a numeric identifier. You can see this by having a look, for instance, at the base subdirectory:

$ ls -1 /postgres/16/data/base
1
16386
4
5

As you can see from the preceding listing, the base directory contains four objects, named 1, 4, 5, and 16386. Please note that these numbers could be different on your machine. Each of these objects is a directory that contains other files, as shown here:

$ ls -1 /postgres/16/data/base/16386 | head
112
113
1247
1247_fsm
1247_vm
1249
1249_fsm
1249_vm
1255
1255_fsm

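As you can see, each file is named with a numeric identifier. Internally, PostgreSQL keeps a specific catalog that allows the database to match a mnemonic name to a numeric identifier, and vice versa. The integer identifier is named OID (Object Identifier); this is a historical term, and the number that names the file on disk is more precisely called the filenode. For a newly created table the two values are usually the same, although they can diverge after operations that rewrite the table, such as VACUUM FULL or TRUNCATE. For simplicity, the two terms will be used interchangeably in this section.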
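
The directory listing above also shows names such as 1247_fsm and 1247_vm. These are additional files belonging to the same relation: the free space map and the visibility map, respectively. The following sketch classifies a file name taken from a database directory. The function name classify_relfile is hypothetical, and the patterns cover only the cases mentioned in this section (main file, _fsm, _vm, and numbered segments).

```shell
# classify_relfile: hypothetical helper, not part of PostgreSQL.
# Usage: classify_relfile 1247_fsm
classify_relfile() {
    name="$1"
    # Relation files always start with the numeric filenode.
    case "$name" in
        [0-9]*) ;;
        *) echo "$name: not a relation file"; return 0 ;;
    esac
    case "$name" in
        *_fsm|*_fsm.*) kind="free space map" ;;
        *_vm|*_vm.*)   kind="visibility map" ;;
        *[!0-9.]*)     kind="unknown" ;;
        *.*)           kind="extra segment of main file" ;;
        *)             kind="main file" ;;
    esac
    # The filenode is the leading run of digits in the file name.
    filenode="${name%%[!0-9]*}"
    echo "$filenode: $kind"
}
```

Running it over the output of ls in a database directory (for instance, in a while read loop) groups each file under the filenode it belongs to.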

There is a specific utility that allows you to inspect a PGDATA directory and extract mnemonic names: oid2name. For example, if you executed the oid2name utility, you’d get a list of all available databases similar to the following one:

$ oid2name
All databases:
   Oid  Database Name  Tablespace
----------------------------------
 16390        forumdb  pg_default
     5       postgres  pg_default
     4      template0  pg_default
     1      template1  pg_default

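As you can see, the Oid numbers in the oid2name output correspond to the directory names listed in the base directory: every subdirectory of base is named after the OID of the database it stores. (In the listings shown here, forumdb appears as 16390 while the directory listing shows 16386, which suggests that the two outputs were captured on different cluster instances; on a single cluster, the two values always match.)

You can go even further and inspect a single file within a database directory, using the -d flag to specify the database in which to look up the object name and the -f flag to specify the filenode: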

$ cd /postgres/16/data/base/1
$ oid2name -d template1 -f 3395
From database "template1":
  Filenode                 Table Name
-------------------------------------
      3395  pg_init_privs_o_c_o_index

As you can see from the preceding example, the 3395 file in the /postgres/16/data/base/1 directory corresponds to the relation named pg_init_privs_o_c_o_index (which is actually an index, even though oid2name labels the column "Table Name"). Therefore, when PostgreSQL needs to interact with this relation, it will read from and write to the /postgres/16/data/base/1/3395 file.

From the preceding example, it should be clear that every SQL table is stored as a file with a numeric name. However, by default PostgreSQL does not allow a single file to be larger than 1 GB, so what happens if a table grows beyond that limit? PostgreSQL "attaches" another file with a numeric extension that indicates the next 1 GB chunk of data. In other words, if your table is stored in the 123 file, the second gigabyte will be stored in the 123.1 file, and if another gigabyte of storage is needed, another file, 123.2, will be created. Therefore, the filenode refers to the very first file related to a specific table, but a table can span more than one file on disk.
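
The naming scheme described above can be sketched with a little shell arithmetic. The function below lists the file names expected for a table of a given size, assuming the default 1 GB segment size. The name segment_files is hypothetical, and the result is only an approximation of what you would find on disk.

```shell
# segment_files: hypothetical helper, not part of PostgreSQL.
# Usage: segment_files <filenode> <table size in bytes>
segment_files() {
    filenode="$1"
    size_bytes="$2"
    segment_size=1073741824   # assumed default: 1 GB = 1024^3 bytes
    # Number of segments, rounding up; an empty table still has one file.
    count=$(( (size_bytes + segment_size - 1) / segment_size ))
    if [ "$count" -lt 1 ]; then
        count=1
    fi
    i=0
    while [ "$i" -lt "$count" ]; do
        if [ "$i" -eq 0 ]; then
            echo "$filenode"        # first gigabyte: no extension
        else
            echo "$filenode.$i"     # following gigabytes: .1, .2, ...
        fi
        i=$(( i + 1 ))
    done
}
```

For instance, segment_files 123 2684354560 (a table of 2.5 GB) prints 123, 123.1, and 123.2, matching the example in the text.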

Tablespaces

PostgreSQL expects to find all its data within the PGDATA directory, but that does not mean that your cluster is "jailed" in this directory. In fact, PostgreSQL allows "escaping" the PGDATA directory by means of tablespaces. A tablespace is a directory that can be outside the PGDATA directory and can also reside on a different storage device. Tablespaces are mapped into the PGDATA directory by means of symbolic links stored in the pg_tblspc subdirectory. In this way, the PostgreSQL processes do not have to look outside PGDATA, but are still able to access "external" storage. A tablespace can serve different purposes, such as expanding the available storage or providing different storage performance for specific objects. For instance, you can create a tablespace on a slow disk to contain infrequently accessed objects and tables, keeping fast storage within another tablespace for frequently accessed objects.

You don't have to create the links yourself: PostgreSQL provides the TABLESPACE feature to manage this, and the cluster will create and manage the appropriate links under the pg_tblspc subdirectory.

For instance, the following is a PGDATA directory that has three different tablespaces:

$ ls -l /postgres/16/data/pg_tblspc/
lrwxrwxrwx 1 postgres postgres 22 Jan 19 13:08 16384 -> /data/tablespaces/ts_a
lrwxrwxrwx 1 postgres postgres 22 Jan 19 13:08 16385 -> /data/tablespaces/ts_b
lrwxrwxrwx 1 postgres postgres 22 Jan 19 13:08 16386 -> /data/tablespaces/ts_c

As you can see from the preceding example, there are three tablespaces, all located under the /data/tablespaces directory. You can inspect them with oid2name and the -s flag:

$ oid2name -s
All tablespaces:
    Oid  Tablespace Name
------------------------
   1663       pg_default
   1664        pg_global
  16384             ts_a
  16385             ts_b
  16386             ts_c

As you can see, the numeric names of the symbolic links map to the mnemonic names of the tablespaces. From the preceding example, you can observe that there are also two special tablespaces:

pg_default is the default tablespace, that is, the storage used for every object when no tablespace is explicitly specified. In other words, every object stored directly under the PGDATA directory belongs to the pg_default tablespace.
pg_global is the tablespace used for system-wide objects.

By default, both of the preceding tablespaces refer directly to the PGDATA directory (pg_default to the base subdirectory and pg_global to the global subdirectory), meaning that any cluster without a custom tablespace is totally contained within the PGDATA directory.
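
The symbolic links under pg_tblspc can also be read directly from the shell. The following is a minimal sketch that prints each tablespace OID together with the directory it points to. The function name list_tablespace_links is hypothetical, and it assumes that the readlink utility is available and that the current user can read the PGDATA directory.

```shell
# list_tablespace_links: hypothetical helper, not part of PostgreSQL.
# Usage: list_tablespace_links /postgres/16/data
list_tablespace_links() {
    pgdata="$1"
    for link in "$pgdata"/pg_tblspc/*; do
        # Skip anything that is not a symbolic link
        # (this also covers an empty pg_tblspc directory).
        [ -L "$link" ] || continue
        # The link name is the tablespace OID; the target is its directory.
        echo "$(basename "$link") -> $(readlink "$link")"
    done
}
```

On a cluster without custom tablespaces the function prints nothing, because pg_default and pg_global live inside PGDATA and need no link.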
