Learning PostgreSQL

Overview of this book

PostgreSQL is one of the most powerful and easy-to-use database management systems. It supports the most advanced features included in the SQL standards. The book starts with an introduction to relational databases with PostgreSQL. It then covers the data definition language (DDL), with emphasis on PostgreSQL and the common DDL commands supported by ANSI SQL. You will then learn the data manipulation language (DML) and advanced topics such as locking and multiversion concurrency control (MVCC). This will give you a robust background for tuning and troubleshooting your application. The book then covers the implementation of data models in the database, such as creating tables, setting up integrity constraints, building indexes, and defining views and other schema objects. Next, it gives you an overview of the NoSQL capabilities of PostgreSQL, including hstore, XML, JSON, and arrays. Finally, by the end of the book, you'll learn to use the JDBC driver and manipulate data objects in the Hibernate framework.

Cross-column correlation


Cross-column correlation can cause the planner to misestimate the number of rows, because PostgreSQL assumes that each column is independent of the other columns. In reality, this assumption often fails. For example, first and last names are correlated in certain cultures, and so are the country and language preference of users. To understand cross-column correlation, let's create a table called users, as follows:

CREATE TABLE users (
  id serial primary key,
  name text,
  country text,
  language text
);
INSERT INTO users(name, country, language) SELECT generate_random_text(8), 'Germany', 'German' FROM generate_series(1, 10);
INSERT INTO users(name, country, language) SELECT generate_random_text(8), 'USA', 'English' FROM generate_series(1, 10);
VACUUM ANALYZE users;
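
Note that generate_random_text is not a built-in PostgreSQL function, so it must exist before the INSERT statements above can run. The following is only a minimal stand-in, assuming the function takes a length and returns a random string of that length; the body shown here is an illustration and may differ from the definition the authors use:

```sql
-- Hypothetical stand-in for generate_random_text (not a built-in function).
-- Assumption: $1 is the desired length of the returned string.
CREATE OR REPLACE FUNCTION generate_random_text(int) RETURNS text AS
$$
  -- Take the first hex character of a fresh md5 hash once per position
  -- and concatenate the characters into a single string.
  SELECT string_agg(substr(md5(random()::text), 1, 1), '')
  FROM generate_series(1, $1);
$$ LANGUAGE sql VOLATILE;
```

Because the function is declared VOLATILE, it is evaluated once per row produced by generate_series, so each inserted user gets a different name.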

A query for users whose language is German and whose country is Germany ends up with a wrong row estimate, as both columns are...
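
One way to observe the effect is to compare the planner's estimate with the real row count. The sketch below assumes the users table was populated exactly as shown above (10 German and 10 US rows) and that statistics are fresh. Under the independence assumption, each predicate is expected to match half of the 20 rows, so the combined estimate would be roughly 20 × 0.5 × 0.5 = 5 rows, while 10 rows actually match. The exact plan output depends on the PostgreSQL version and the collected statistics:

```sql
-- Planner estimate only: look at the "rows=" value in the plan.
EXPLAIN
SELECT * FROM users
WHERE country = 'Germany' AND language = 'German';

-- Estimate and actual row count side by side (this executes the query).
EXPLAIN ANALYZE
SELECT * FROM users
WHERE country = 'Germany' AND language = 'German';

-- Ground truth for comparison.
SELECT count(*) AS matching_users
FROM users
WHERE country = 'Germany' AND language = 'German';
```

Comparing the estimated rows with the actual rows reported by EXPLAIN ANALYZE shows how far the estimate drifts when the two columns carry the same information.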