PostgreSQL 16 Administration Cookbook

By: Gianni Ciolli, Boriss Mejías, Jimmy Angelakos, Vibhor Kumar, Simon Riggs


PostgreSQL with TPA

Trusted Postgres Architect (TPA) is an Ansible-based tool that can be used to deploy database clusters on a variety of platforms.

In this recipe, we will use TPA to configure and deploy a small cluster on our own Linux workstation.

This recipe uses TPA’s docker platform, which is meant to be used only for test clusters. TPA currently supports two other platforms:

  • The aws platform, to provision and use instances on AWS EC2
  • The bare platform, to use existing instances (including bare-metal and already provisioned servers)

For more information on how to use these platforms, please refer to the corresponding pages of the TPA documentation (https://www.enterprisedb.com/docs/tpa/latest/).

Getting ready

First, we need to install TPA, which is free software, released under the GPL v3 license. Therefore, you can download it from public repositories, as explained in the installation instructions:

https://www.enterprisedb.com/docs/tpa/latest/INSTALL/

Make sure you have the latest version installed; you can check it by typing:

tpaexec info

At the time of writing, TPA 23.23 was the latest available release. Because TPA maintains compatibility with clusters installed using previous versions, you should always use the latest version, and you can expect to reproduce this recipe with releases newer than 23.23 as well.

Then, we need to install Docker. If you don’t have it on your laptop already, you can install it as described here: https://www.enterprisedb.com/docs/tpa/latest/platform-docker/#installing-docker.

In the common microservices approach, each container runs a specific service. The way TPA uses Docker is quite different because each container runs a miniature copy of a Linux OS. This approach is not meant for production use, but it is a great way to test the behavior of a cluster with minimal resource use.

How to do it...

This is our first TPA example, so we will deploy the smallest possible PostgreSQL cluster, composed of a single instance plus a backup server: no replication and no high availability (which, most of the time, means not for production!).

First, we create the cluster configuration using the tpaexec configure command as follows:

tpaexec configure myfirstcluster --architecture M1 \
  --platform docker --enable-repmgr --postgresql 16

This command creates a directory named myfirstcluster with the following contents:

myfirstcluster/
├── commands
│   ├── status.yml -> /opt/EDB/TPA/architectures/M1/commands/status.yml
│   ├── switchover.sh -> /opt/EDB/TPA/architectures/M1/commands/switchover.sh
│   ├── switchover.yml -> /opt/EDB/TPA/architectures/M1/commands/switchover.yml
│   └── upgrade.yml -> /opt/EDB/TPA/architectures/M1/commands/upgrade.yml
├── config.yml
└── deploy.yml -> /opt/EDB/TPA/architectures/M1/deploy.yml

The commands directory contains symlinks to commands that are specific to the architecture we have chosen, while deploy.yml is a symlink to the playbook used by the deploy command. All of these files are distributed with TPA itself and are symlinked into the cluster directory so they can easily be used.

The only new file that has been created by this invocation is config.yml, which describes the cluster. It is effectively a template that the user can modify if they want to fine-tune the cluster; in fact, editing that file is quite common because only some of the settings can be specified as options of the tpaexec configure command.

We created a configuration file specifying this architecture:

https://www.enterprisedb.com/docs/tpa/latest/architecture-M1/

As we want a smaller example, we will now edit config.yml to remove some of the instances. By default, the M1 architecture includes a three-node physical replication cluster plus a Barman node, which also acts as a log server and a monitoring server; in this first example, we just want one PostgreSQL instance and one Barman instance.

Let’s locate the instances section, at the end of the file:

instances:
- Name: kennel
  backup: karma
  location: main
  node: 1
  role:
  - primary
- Name: quintet
  location: main
  node: 2
  role:
  - replica
  upstream: kennel
- Name: karma
  location: main
  node: 3
  role:
  - barman
  - log-server
  - monitoring-server
- Name: kinship
  location: dr
  node: 4
  role:
  - replica
  upstream: quintet

The instance names in your example will likely be different every time you run tpaexec configure because TPA by default picks them at random from a built-in list of words; however, the structure will be the same.

From there, we can remove:

  • The physical replicas – that is, instances 2 and 4 (here, quintet and kinship)
  • The additional roles for the Barman instance – that is, log-server and monitoring-server from instance 3 (here, karma)

We end up with the following instances section:

instances:
- Name: kennel
  backup: karma
  location: main
  node: 1
  role:
  - primary
- Name: karma
  location: main
  node: 3
  role:
  - barman
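
The edit above is pure list manipulation, so it can also be scripted. Here is a minimal sketch of the same transformation in Python; for brevity, the instances section is written out as Python literals rather than loaded from config.yml with a YAML library (which is what you would do in practice):

```python
# Sketch: the instances section of config.yml, already parsed into
# Python dicts. The names are the randomly chosen ones from this
# example and will differ in your run.
instances = [
    {"Name": "kennel", "backup": "karma", "location": "main", "node": 1,
     "role": ["primary"]},
    {"Name": "quintet", "location": "main", "node": 2,
     "role": ["replica"], "upstream": "kennel"},
    {"Name": "karma", "location": "main", "node": 3,
     "role": ["barman", "log-server", "monitoring-server"]},
    {"Name": "kinship", "location": "dr", "node": 4,
     "role": ["replica"], "upstream": "quintet"},
]

# Drop the physical replicas (instances 2 and 4)...
kept = [i for i in instances if "replica" not in i["role"]]

# ...and trim the Barman node down to the barman role only.
for i in kept:
    if "barman" in i["role"]:
        i["role"] = ["barman"]

print([(i["Name"], i["role"]) for i in kept])
# -> [('kennel', ['primary']), ('karma', ['barman'])]
```

The result matches the instances section we end up with below: one primary and one Barman node.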

After making these changes, we can deploy the cluster, which is as simple as issuing the following command:

tpaexec deploy myfirstcluster

This command will display copious output, ending like this after a few minutes:

PLAY RECAP *********************************************************************
karma                      : ok=177  changed=40   unreachable=0    failed=0    skipped=163  rescued=0    ignored=0   
kennel                     : ok=316  changed=97   unreachable=0    failed=0    skipped=222  rescued=0    ignored=1   
localhost                  : ok=4    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
real	5m35.687s
user	1m13.249s
sys	0m30.098s

The output is also collected in the ansible.log file, with millisecond timestamps, if you need to inspect the (many) steps afterward.
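
If you run deployments unattended, the PLAY RECAP can also be checked mechanically. A small sketch, assuming the standard Ansible recap layout shown above:

```python
import re

def parse_recap(line):
    """Parse one Ansible PLAY RECAP line into (host, counters)."""
    host, _, rest = line.partition(":")
    counters = {k: int(v) for k, v in re.findall(r"(\w+)=(\d+)", rest)}
    return host.strip(), counters

# Sample lines taken from the recap above.
recap = [
    "karma                      : ok=177  changed=40   unreachable=0    failed=0    skipped=163  rescued=0    ignored=0",
    "kennel                     : ok=316  changed=97   unreachable=0    failed=0    skipped=222  rescued=0    ignored=1",
]

for line in recap:
    host, c = parse_recap(line)
    # A deploy is healthy when nothing failed and every host was reachable.
    ok = c["failed"] == 0 and c["unreachable"] == 0
    print(f"{host}: {'OK' if ok else 'PROBLEM'}")
# Prints:
#   karma: OK
#   kennel: OK
```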

Now that we have a cluster, we can use it. Let’s connect with SSH to the Postgres host:

$ cd myfirstcluster
$ ssh -F ssh_config kennel
[root@kennel ~]# su - postgres
postgres@kennel:~ $ psql
psql (16.0)
Type "help" for help.
postgres=#

We can also open another terminal and connect to the Barman host:

$ ssh -F ssh_config karma
Last login: Mon Sep 18 21:35:41 2023 from 172.17.0.1
[root@karma ~]# su - barman
[barman@karma ~]$ barman list-backup all
kennel 20230918T213317 - Mon Sep 18 21:33:19 2023 - Size: 22.2 MiB - WAL Size: 0 B
kennel 20230918T213310 - Mon Sep 18 21:33:11 2023 - Size: 22.2 MiB - WAL Size: 36.2 KiB
kennel 20230918T213303 - Mon Sep 18 21:33:05 2023 - Size: 22.2 MiB - WAL Size: 36.8 KiB
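
This output also lends itself to simple scripted checks, for example, alerting when the newest backup is too old. A sketch that parses a line in the format shown above (the format is assumed stable here; for anything production-grade, prefer Barman's own checks, such as barman check):

```python
from datetime import datetime

# One line of `barman list-backup` output, as shown above.
line = "kennel 20230918T213317 - Mon Sep 18 21:33:19 2023 - Size: 22.2 MiB - WAL Size: 0 B"

server, backup_id, _rest = line.split(" ", 2)
# The backup ID encodes the backup start time as YYYYMMDDTHHMMSS.
started = datetime.strptime(backup_id, "%Y%m%dT%H%M%S")

print(server, started.isoformat())
# -> kennel 2023-09-18T21:33:17
```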

There’s more...

TPA reads the config.yml file, where the cluster is described in a declarative way, and then performs all the actions needed to deploy the cluster, or to modify an already-deployed cluster if config.yml has been changed since the last run of the deploy command.

The tpaexec deploy command automatically performs the preliminary tpaexec provision, which is the step where TPA populates the Ansible inventory based on the contents of config.yml and then creates the required resources, such as SSH keys, passwords, and instances. Here, “instances” means:

  • Containers, when using the docker platform
  • VMs, when using the aws platform
  • Nothing, when using the bare platform (TPA will expect “bare metal” instances, in the sense that they exist already and TPA has sudo SSH access to them)

For more details, please refer to the TPA online documentation at https://www.enterprisedb.com/docs/tpa/latest/.

Learn more on Discord

To join the Discord community for this book – where you can share feedback, ask the authors questions, and learn about new releases – use the link below:

https://discord.gg/pQkghgmgdG