Book Image

PostgreSQL 14 Administration Cookbook

By : Simon Riggs, Gianni Ciolli
5 (1)
Book Image

PostgreSQL 14 Administration Cookbook

5 (1)
By: Simon Riggs, Gianni Ciolli

Overview of this book

PostgreSQL is a powerful, open-source database management system with an enviable reputation for high performance and stability. With many new features in its arsenal, PostgreSQL 14 allows you to scale up your PostgreSQL infrastructure. With this book, you'll take a step-by-step, recipe-based approach to effective PostgreSQL administration. This book will get you up and running with all the latest features of PostgreSQL 14 while helping you explore the entire database ecosystem. You’ll learn how to tackle a variety of problems and pain points you may face as a database administrator such as creating tables, managing views, improving performance, and securing your database. As you make progress, the book will draw attention to important topics such as monitoring roles, validating backups, regular maintenance, and recovery of your PostgreSQL 14 database. This will help you understand roles, ensuring high availability, concurrency, and replication. Along with updated recipes, this book touches upon important areas like using generated columns, TOAST compression, PostgreSQL on the cloud, and much more. By the end of this PostgreSQL book, you’ll have gained the knowledge you need to manage your PostgreSQL 14 database efficiently, both in the cloud and on-premise.
Table of Contents (14 chapters)

How many rows are there in a table?

There is no limit on the number of rows in a table, but the table is limited to available disk space and memory/swap space. If you are storing rows that exceed an aggregated data size of 2 KB, then the maximum number of rows may be limited to 4 billion or fewer.

Counting is one of the easiest SQL statements, so it is also many people's first experience of a PostgreSQL query.

How to do it…

From any interface, the SQL command used to count rows is as follows:

SELECT count(*) FROM table;

This will return a single integer value as the result.

In psql, the command looks like the following:

cookbook=# select count(*) from orders;
 count 
-------
   345
(1 row)

How it works...

PostgreSQL can choose between two techniques available to compute the SQL count(*) function. Both are available in all the currently supported versions:

  • The first is called sequential scan. We access every data block in the table one after the other, reading the number of rows in each block. If the table is on the disk, it will cause a beneficial disk access pattern, and the statement will be fairly fast.
  • The other technique is known as an index-only scan. It requires an index on the table, and it covers a more general case than optimizing SQL queries with count(*), so we will cover it in more detail in Chapter 10Performance and Concurrency.

Some people think that the count SQL statement is a good test of the performance of a DBMS. Some DBMSs have specific tuning features for the count SQL statement, and Postgres optimizes this using index-only scans. The PostgreSQL project has talked about this many times, but few people thought we should try to optimize this. Yes, the count function is frequently used within applications, but without any WHERE clause, it is not that useful. Therefore, the index-only scans feature has been implemented, which applies to more real-world situations, as well as this recipe.

We scan every block of the table because of a major feature of Postgres, named Multiversion Concurrency Control (MVCC). MVCC allows us to run the count SQL statement at the same time that we are inserting, updating, or deleting data from the table. That's a very cool feature, and we went to a lot of trouble in Postgres to provide it for you.

MVCC requires us to record information on each row of a table, stating when that change was made. If the changes were made after the SQL statement began to execute, then we just ignore those changes. This means that we need to carry out visibility checks on each row in the table to allow us to work out the results of the count SQL statement. The optimization provided by index-only scans is the ability to skip such checks on the table blocks that are already known to be visible to all current sessions. Rows in these blocks can be counted directly on the index, which is normally smaller than the table, and is, therefore, faster.

If you think a little deeper about this, you'll see that the result of the count SQL statement is just the value at a moment in time. Depending on what happens to the table, that value could change a little or a lot while the count SQL statement is executing. So, once you've executed this, all you really know is that, at a particular point in the past, there were exactly x rows in the table.