Quickly estimating the number of rows in a table
We don't always need an accurate count of rows, especially on a large table that may take a long time to execute. Administrators often need to estimate how big a table is so that they can estimate how long other operations may take.
How to do it…
The Postgres optimizer can provide a quick estimate of the number of rows in a table simply by using its statistics:
EXPLAIN SELECT * FROM mytable; QUERY PLAN ---------------------------------------------------------------- Seq Scan on mytable (cost=0.00..2640.00 rows=100000 width=97) (1 row)
We can directly compute a similar number using roughly the same calculation:
SELECT (CASE WHEN reltuples > 0 THEN pg_relation_size(oid)*reltuples/(8192*relpages) ELSE 0 END)::bigint AS estimated_row_count FROM pg_class WHERE oid = 'mytable'::regclass;
This gives us the following output:
estimated_row_count --------------------- 99960 (1 row)
Both queries return a row count very quickly, no matter how large the table that we are examining is, because they use statistics that were collected in advance. You may want to create a SQL function for the preceding calculation, so you won't need to retype the SQL code every now and then.
The following function estimates the total number of rows using a mathematical procedure called extrapolation. In other words, we take the average number of bytes per row resulting from the last statistics collection, and we apply it to the current table size:
CREATE OR REPLACE FUNCTION estimated_row_count(text) RETURNS bigint LANGUAGE sql AS $$ SELECT (CASE WHEN reltuples > 0 THEN pg_relation_size($1)*reltuples/(8192*relpages) ELSE 0 END)::bigint FROM pg_class WHERE oid = $1::regclass; $$;
How it works…
We saw the
pg_relation_size() function earlier, so we know that it brings back an accurate value for the current size of the table.
When we vacuum a table in Postgres, we record two pieces of information in the
pg_class catalog entry for the table. These two items are the number of data blocks in the table (
relpages) and the number of rows in the table (
reltuples). Some people think they can use the value of
pg_class as an estimate, but it could be severely out of date. You will also be fooled if you use information in another table named
pg_stat_user_tables, which is discussed in more detail in Chapter 10, Performance and Concurrency.
We include some code to handle cases where the
relpages fields are zero. The Postgres optimizer actually works a little harder than we do in that case, so our estimate isn't very good.
WHERE oid = 'mytable'::regclass; syntax introduces the concept of object identifier types. They just use a shorthand trick to convert the name of an object to the object identifier number for that object. The best way to understand this is to think of that syntax as meaning the same as a function named
The good thing about the preceding recipe is that it returns a value in about the same time, no matter how big the table is. The bad thing about it is that
pg_relation_size() requests a lock on the table, so if any other user has an
AccessExclusiveLock lock on the table, then the table size estimate will wait for the lock to be released before returning a value.
Err... so what is an
AccessExclusiveLock lock? While performing a SQL maintenance action, such as changing the data type of a column, PostgreSQL will lock out all other actions on that table, including
pg_relation_size, which takes a lock in the
AccessShareLock mode. For me, a typical case is when I issue some form of SQL maintenance action, such as
ALTER TABLE, and the statement takes much longer than I thought it would. At that point, I think, Oh, was that table bigger than I thought? How long will I be waiting? Yes, it's better to calculate that beforehand, but hindsight doesn't get you out of the hole you are in right now. So, we need a way to calculate the size of a table without needing the lock.
A solution is to look at the operating system files that Postgres uses to store data, and figure out how large they are, but that requires a high level of security than most people usually allow. In any case, looking at files without a lock could cause problems if the table were dropped or changed.