In this section, you will learn about basic locking mechanisms. The goal is to understand how locking works in general and how to get simple applications right.
To show how things work, we will create a simple table. For demonstration purposes, I will add one row to the table using a simple INSERT statement:
test=# CREATE TABLE t_test (id int);
CREATE TABLE
test=# INSERT INTO t_test VALUES (0);
INSERT 0 1
The first important thing is that tables can be read concurrently. Many users reading the same data at the same time won't block each other. This allows PostgreSQL to handle thousands of users without any problems.
The question now is: what happens if reads and writes occur at the same time? Here is an example; let's assume that the table contains one row whose id is 0:
| Transaction 1                              | Transaction 2         |
|--------------------------------------------|-----------------------|
| BEGIN;                                     | BEGIN;                |
| UPDATE t_test SET id = id + 1 RETURNING *; |                       |
| User will see 1                            | SELECT * FROM t_test; |
|                                            | User will see 0       |
| COMMIT;                                    | COMMIT;               |
Two transactions are opened. The first one changes a row. However, this is not a problem, as the second transaction can proceed: it will see the old row as it was before the UPDATE. This behavior is called Multi-Version Concurrency Control (MVCC).
There is also a second important aspect—many commercial or open source databases are still unable to handle concurrent reads and writes. In PostgreSQL, this is absolutely not a problem—reads and writes can coexist.
After the transaction has been committed, the table will contain 1.
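You can verify this yourself: once both transactions have committed, a fresh read sees the incremented value:

```sql
test=# SELECT * FROM t_test;
 id
----
  1
(1 row)
```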
What will happen if two people change data at the same time? Here is an example:
| Transaction 1                              | Transaction 2                                               |
|--------------------------------------------|-------------------------------------------------------------|
| BEGIN;                                     | BEGIN;                                                      |
| UPDATE t_test SET id = id + 1 RETURNING *; |                                                             |
| It will return 2                           | UPDATE t_test SET id = id + 1 RETURNING *;                  |
|                                            | It will wait for transaction 1                              |
| COMMIT;                                    | It will wait for transaction 1                              |
|                                            | It will reread the row, find 2, set the value, and return 3 |
|                                            | COMMIT;                                                     |
Suppose you want to count the number of hits on a website. If you run the preceding code, no hits will be lost because PostgreSQL guarantees that one UPDATE statement is performed after another.
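As a sketch, a hit counter built on this guarantee could look as follows (the t_hits table and its columns are made up for illustration):

```sql
CREATE TABLE t_hits (page text PRIMARY KEY, hits int DEFAULT 0);
INSERT INTO t_hits (page) VALUES ('/index.html');

-- each page view runs this statement; concurrent UPDATEs on the
-- same row simply queue up behind each other, so no hit is ever lost
UPDATE t_hits SET hits = hits + 1 WHERE page = '/index.html';
```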
It is also worth noting that you can always run concurrent reads. Our two writes will not block reads.
Avoiding typical mistakes and explicit locking
In my life as a professional PostgreSQL consultant (https://www.cybertec-postgresql.com), I have seen a couple of mistakes repeated over and over. If there are constants in life, these typical mistakes are certainly among them.
Here is my favorite:
| Transaction 1                             | Transaction 2                             |
|-------------------------------------------|-------------------------------------------|
| BEGIN;                                    | BEGIN;                                    |
| SELECT max(id) FROM product;              | SELECT max(id) FROM product;              |
| User will see 17                          | User will see 17                          |
| User will decide to use 18                | User will decide to use 18                |
| INSERT INTO product ... VALUES (18, ...)  | INSERT INTO product ... VALUES (18, ...)  |
| COMMIT;                                   | COMMIT;                                   |
In this case, there will be either a duplicate key violation or two identical entries. Neither variation of the problem is all that appealing.
One way to fix this problem is to use explicit table locking. The following code shows us the syntax definition of LOCK:
test=# \h LOCK
Command: LOCK
Description: lock a table
Syntax:
LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ]
where lockmode is one of:
ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE | SHARE UPDATE EXCLUSIVE
| SHARE | SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE
URL: https://www.postgresql.org/docs/13/sql-lock.html
As you can see, PostgreSQL offers eight types of locks to lock an entire table. In PostgreSQL, a lock can be as light as an ACCESS SHARE lock or as heavy as an ACCESS EXCLUSIVE lock. The following list shows what these locks do:
- ACCESS SHARE: This type of lock is taken by reads and conflicts only with ACCESS EXCLUSIVE, which is set by DROP TABLE and so on. Practically, this means that SELECT cannot start if a table is about to be dropped. This also implies that DROP TABLE has to wait until a reading transaction is complete.
- ROW SHARE: PostgreSQL takes this kind of lock in the case of SELECT FOR UPDATE/SELECT FOR SHARE. It conflicts with EXCLUSIVE and ACCESS EXCLUSIVE.
- ROW EXCLUSIVE: This lock is taken by INSERT, UPDATE, and DELETE. It conflicts with SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE.
- SHARE UPDATE EXCLUSIVE: This kind of lock is taken by CREATE INDEX CONCURRENTLY, ANALYZE, ALTER TABLE ... VALIDATE CONSTRAINT, and some other flavors of ALTER TABLE, as well as by VACUUM (not VACUUM FULL). It conflicts with the SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes.
- SHARE: When an index is created, SHARE locks will be set. It conflicts with ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE.
- SHARE ROW EXCLUSIVE: This one is set by CREATE TRIGGER and some forms of ALTER TABLE, and it conflicts with every lock mode except ACCESS SHARE and ROW SHARE.
- EXCLUSIVE: This lock mode conflicts with every other mode except ACCESS SHARE. If a transaction holds this lock, other transactions can still read the table, but nobody else can write to it.
- ACCESS EXCLUSIVE: This is the most restrictive mode; it prevents concurrent transactions from reading and writing alike. It is taken by commands such as DROP TABLE and VACUUM FULL, and it is the default mode of the LOCK TABLE statement.
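You can watch these lock modes being taken by querying the pg_locks system view from a second session. Here is a minimal sketch, reusing the t_test table from earlier:

```sql
-- session 1: take a ROW SHARE lock and keep the transaction open
BEGIN;
SELECT * FROM t_test FOR UPDATE;

-- session 2: while session 1 is still open, inspect the lock table;
-- the output will include a granted RowShareLock on t_test
SELECT relation::regclass, mode, granted
FROM pg_locks
WHERE relation = 't_test'::regclass;

-- session 1: release the lock
COMMIT;
```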
Given the PostgreSQL locking infrastructure, one solution to the max problem we outlined previously would be as follows. The example in the following code shows how to lock a table:
BEGIN;
LOCK TABLE product IN ACCESS EXCLUSIVE MODE;
INSERT INTO product SELECT max(id) + 1, ... FROM product;
COMMIT;
Keep in mind that this is a pretty nasty way of doing this kind of operation because nobody else can read or write to the table during your operation. Therefore, ACCESS EXCLUSIVE should be avoided at all costs.
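If blocking everybody is a concern, the NOWAIT clause shown in the syntax listing above makes the lock attempt fail immediately instead of queuing behind conflicting transactions. A sketch:

```sql
BEGIN;
-- fails at once with an error along the lines of
-- "could not obtain lock on relation ..." if somebody
-- else already holds a conflicting lock
LOCK TABLE product IN ACCESS EXCLUSIVE MODE NOWAIT;
COMMIT;
```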
Considering alternative solutions
There is an alternative solution to this problem. Consider an example where you are asked to write an application to generate invoice numbers. The tax office might require you to create invoice numbers without gaps and without duplicates. How would you do this? Of course, one solution would be a table lock. However, you can really do better. Here is what you can do to handle the numbering problem we are trying to solve:
test=# CREATE TABLE t_invoice (id int PRIMARY KEY);
CREATE TABLE
test=# CREATE TABLE t_watermark (id int);
CREATE TABLE
test=# INSERT INTO t_watermark VALUES (0);
INSERT 0 1
test=# WITH x AS (UPDATE t_watermark SET id = id + 1 RETURNING *)
       INSERT INTO t_invoice SELECT * FROM x RETURNING *;
id
----
1
(1 row)
In this case, we introduced a table called t_watermark, which contains just one row. The UPDATE inside the WITH clause (a common table expression, or CTE) is executed first: the row is locked, incremented, and the new value is returned. Only one transaction at a time can do this. The value returned by the CTE is then inserted into the invoice table, so it is guaranteed to be unique. The beauty of this approach is that there is only a simple row lock on the watermark table, so reads on the invoice table are never blocked. Overall, this way is far more scalable.
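Running the same statement a second time serializes on the watermark row and hands out the next gap-free number:

```sql
test=# WITH x AS (UPDATE t_watermark SET id = id + 1 RETURNING *)
       INSERT INTO t_invoice SELECT * FROM x RETURNING *;
 id
----
  2
(1 row)
```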