How many rows are there in a table?
There is no limit on the number of rows in a table, but the table is limited to available disk space and memory/swap space. If you are storing rows that exceed an aggregated data size of 2 KB, then the maximum number of rows may be limited to 4 billion or fewer.
Counting is one of the easiest SQL statements, so it is also many people's first experience of a PostgreSQL query.
How to do it…
From any interface, the SQL command used to count rows is as follows:
SELECT count(*) FROM table;
This will return a single integer value as the result.
psql, the command looks like the following:
cookbook=# select count(*) from orders; count ------- 345 (1 row)
How it works...
PostgreSQL can choose between two techniques available to compute the SQL
count(*) function. Both are available in all the currently supported versions:
- The first is called sequential scan. We access every data block in the table one after the other, reading the number of rows in each block. If the table is on the disk, it will cause a beneficial disk access pattern, and the statement will be fairly fast.
- The other technique is known as an index-only scan. It requires an index on the table, and it covers a more general case than optimizing SQL queries with
count(*), so we will cover it in more detail in Chapter 10, Performance and Concurrency.
Some people think that the
count SQL statement is a good test of the performance of a DBMS. Some DBMSs have specific tuning features for the
count SQL statement, and Postgres optimizes this using index-only scans. The PostgreSQL project has talked about this many times, but few people thought we should try to optimize this. Yes, the
count function is frequently used within applications, but without any
WHERE clause, it is not that useful. Therefore, the index-only scans feature has been implemented, which applies to more real-world situations, as well as this recipe.
We scan every block of the table because of a major feature of Postgres, named Multiversion Concurrency Control (MVCC). MVCC allows us to run the
count SQL statement at the same time that we are inserting, updating, or deleting data from the table. That's a very cool feature, and we went to a lot of trouble in Postgres to provide it for you.
MVCC requires us to record information on each row of a table, stating when that change was made. If the changes were made after the SQL statement began to execute, then we just ignore those changes. This means that we need to carry out visibility checks on each row in the table to allow us to work out the results of the
count SQL statement. The optimization provided by index-only scans is the ability to skip such checks on the table blocks that are already known to be visible to all current sessions. Rows in these blocks can be counted directly on the index, which is normally smaller than the table, and is, therefore, faster.
If you think a little deeper about this, you'll see that the result of the count SQL statement is just the value at a moment in time. Depending on what happens to the table, that value could change a little or a lot while the
count SQL statement is executing. So, once you've executed this, all you really know is that, at a particular point in the past, there were exactly x rows in the table.