Book Image

PostgreSQL 14 Administration Cookbook

By : Simon Riggs, Gianni Ciolli
Book Image

PostgreSQL 14 Administration Cookbook

By: Simon Riggs, Gianni Ciolli

Overview of this book

PostgreSQL is a powerful, open-source database management system with an enviable reputation for high performance and stability. With many new features in its arsenal, PostgreSQL 14 allows you to scale up your PostgreSQL infrastructure. With this book, you'll take a step-by-step, recipe-based approach to effective PostgreSQL administration. This book will get you up and running with all the latest features of PostgreSQL 14 while helping you explore the entire database ecosystem. You’ll learn how to tackle a variety of problems and pain points you may face as a database administrator such as creating tables, managing views, improving performance, and securing your database. As you make progress, the book will draw attention to important topics such as monitoring roles, validating backups, regular maintenance, and recovery of your PostgreSQL 14 database. This will help you understand roles, ensuring high availability, concurrency, and replication. Along with updated recipes, this book touches upon important areas like using generated columns, TOAST compression, PostgreSQL on the cloud, and much more. By the end of this PostgreSQL book, you’ll have gained the knowledge you need to manage your PostgreSQL 14 database efficiently, both in the cloud and on-premise.
Table of Contents (14 chapters)

Quickly estimating the number of rows in a table

We don't always need an accurate count of rows, especially on a large table that may take a long time to execute. Administrators often need to estimate how big a table is so that they can estimate how long other operations may take.

How to do it…

The Postgres optimizer can provide a quick estimate of the number of rows in a table simply by using its statistics:

                          QUERY PLAN                            
 Seq Scan on mytable  (cost=0.00..2640.00 rows=100000 width=97)
(1 row)

We can directly compute a similar number using roughly the same calculation:

SELECT (CASE WHEN reltuples > 0 THEN pg_relation_size(oid)*reltuples/(8192*relpages) 
END)::bigint AS estimated_row_count
FROM pg_class
WHERE oid = 'mytable'::regclass;

This gives us the following output:

(1 row)

Both queries return a row count very quickly, no matter how large the table that we are examining is, because they use statistics that were collected in advance. You may want to create a SQL function for the preceding calculation, so you won't need to retype the SQL code every now and then.

The following function estimates the total number of rows using a mathematical procedure called extrapolation. In other words, we take the average number of bytes per row resulting from the last statistics collection, and we apply it to the current table size:

CREATE OR REPLACE FUNCTION estimated_row_count(text) 
RETURNS bigint 
AS $$ 
SELECT (CASE WHEN reltuples > 0 THEN 
             ELSE 0 
FROM pg_class 
WHERE oid = $1::regclass; 

How it works…

We saw the pg_relation_size() function earlier, so we know that it brings back an accurate value for the current size of the table.

When we vacuum a table in Postgres, we record two pieces of information in the pg_class catalog entry for the table. These two items are the number of data blocks in the table (relpages) and the number of rows in the table (reltuples). Some people think they can use the value of reltuples in pg_class as an estimate, but it could be severely out of date. You will also be fooled if you use information in another table named pg_stat_user_tables, which is discussed in more detail in Chapter 10Performance and Concurrency.

The Postgres optimizer uses the relpages and reltuples values to calculate the average rows per block, which is also known as the average tuple density.

If we assume that the average tuple density remains constant over time, then we can calculate the number of rows using this formula: Row estimate = number of data blocks * rows per block.

We include some code to handle cases where the reltuples or relpages fields are zero. The Postgres optimizer actually works a little harder than we do in that case, so our estimate isn't very good.

The WHERE oid = 'mytable'::regclass; syntax introduces the concept of object identifier types. They just use a shorthand trick to convert the name of an object to the object identifier number for that object. The best way to understand this is to think of that syntax as meaning the same as a function named relname2relid().

There's more…

The good thing about the preceding recipe is that it returns a value in about the same time, no matter how big the table is. The bad thing about it is that pg_relation_size() requests a lock on the table, so if any other user has an AccessExclusiveLock lock on the table, then the table size estimate will wait for the lock to be released before returning a value.

Err... so what is an AccessExclusiveLock lock? While performing a SQL maintenance action, such as changing the data type of a column, PostgreSQL will lock out all other actions on that table, including pg_relation_size, which takes a lock in the AccessShareLock mode. For me, a typical case is when I issue some form of SQL maintenance action, such as ALTER TABLE, and the statement takes much longer than I thought it would. At that point, I think, Oh, was that table bigger than I thought? How long will I be waiting? Yes, it's better to calculate that beforehand, but hindsight doesn't get you out of the hole you are in right now. So, we need a way to calculate the size of a table without needing the lock.

A solution is to look at the operating system files that Postgres uses to store data, and figure out how large they are, but that requires a high level of security than most people usually allow. In any case, looking at files without a lock could cause problems if the table were dropped or changed.