Introducing quorum COMMIT

PostgreSQL has offered support to synchronous replication for quite some time now. Traditionally, only one server could act as a synchronous standby. This has changed. In PostgreSQL 10.0, the community has introduced quorum COMMITs. The idea is actually quite simple. Suppose you want five out of seven servers to confirm a transaction before the master returns a COMMIT. This is exactly what a quorum COMMIT does. It gives the developers and administrators a chance to define what COMMIT does in a more fine-grained way.

To configure quorum COMMITs, the syntax of synchronous_standby_names has been extended. Here are two simple examples:

synchronous_standby_names = ANY 1 (s1, s2)  
synchronous_standby_names = ANY 2 (s1, s2, s3)

Partitioning data

There have been talks about introducing partitioning to PostgreSQL for years. However, big, important features take time to implement and this is especially true if you are aiming for a good, extensible, and future-proof implementation. In PostgreSQL 10.0, table partitioning has finally made it to the PostgreSQL core. Of course, the implementation is far from complete, and a lot of work has to be done in the future to add even more features. However, support for partitioning is important and will definitely be one of the most desirable things in PostgreSQL 10.0.

As of now, partitioning is able to:

Automatically create proper child constraints
Route changes made to the parent table to the child table

However, as stated earlier, there are still a couple of missing features that have not been addressed yet. Here are some of the more important things:

Create child tables automatically in case data comes in, which is not covered by partitioning criteria yet
No support for hash partitioning
Move updated rows that no longer match the partition
Handle partitions in parallel

The roadmap for PostgreSQL 11.0 already suggests that many of these things might be supported in the next release.

Making use of CREATE STATISTICS

CREATE STATISTICS is definitely one of my personal favorite features of PostgreSQL 10.0 because it allows consultants to help customers in many real-world situations. So, what is it all about? When you run SQL, the optimizer has to come up with clever decisions to speed up your queries. However, to do so, it has to rely heavily on estimates to figure out how much data a certain clause or a certain operation returns. Before version 10.0, PostgreSQL only had information about individual columns. Let's look at an example:

SELECT * FROM car WHERE vendor = 'Ford' AND model = 'Mini Clubman';

In version 9.6, PostgreSQL checks which fraction of the table matches Ford and which fraction matches Mini Clubman. Then, it would try to guess how many rows match both criteria. Remember, PostgreSQL 9.6 only has information about each column—it does not know that these columns are actually related. Therefore, it will simply multiply the odds of finding Ford with the odds of finding Mini Clubman and use this number. However, Ford does not produce a Mini Clubman instance—only BMW does. Therefore, the estimate is wrong. The same cross column correlation problem can happen in other cases too. The number of rows returned by a join might not be clear and the number of groups returned by a GROUP BY clause might be an issue.

Consider the following example:

SELECT gender, age, count(*) FROM children_born GROUP BY gender, age

The number of children born to people of a certain age will definitely depend on their age. The likelihood that some 30 year old women will have children is pretty high and therefore there will be a count. However, if you happen to be 98, you might not be so lucky and it is pretty unrealistic to have a baby, especially if you are a man (men tend to not give birth to children).

CREATE STATISTICS will give the optimizer a chance to gain deeper insights into what is going on by storing multivariate statistics. The idea is to help the optimizer handle functional dependencies.

Improving parallelism

PostgreSQL 9.6 was the first version supporting parallel queries in their most basic form. Of course, not all parts of the server are fully parallel yet. Therefore, it is an ongoing effort to speed up even more operations than before. PostgreSQL 10.0 is a major step towards even more parallelism as a lot more operations can now benefit from multi-core systems.

Indexes are a key area of improvement and will benefit greatly from additional features introduced into PostgreSQL 10.0. There is now full support for parallel b-tree scans as well as for bitmap scans. For now, only b-tree indexes can benefit from parallelism but this will most likely change in future releases too, to ensure that all types of indexes can enjoy an even better performance.

In addition to indexing, the PostgreSQL community has also worked hard to introduce support for parallel merge joins and to allow for more procedures to run in parallel. Some of the latest blog posts from the PostgreSQL community already suggest that many new features related to parallelism are in the pipeline for PostgreSQL 11.0.

Mastering PostgreSQL 10

By : Hans-Jürgen Schönig

Mastering PostgreSQL 10

By: Hans-Jürgen Schönig

Overview of this book

Introducing quorum COMMIT

Partitioning data

Making use of CREATE STATISTICS

Improving parallelism

Mastering PostgreSQL 10

By : Hans-Jürgen Schönig

Mastering PostgreSQL 10

By: Hans-Jürgen Schönig

Overview of this book

Confirmation

Buy this book with your credits?

Submit Your Feedback

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access