Book Image

Learning PostgreSQL 11 - Third Edition

By : Salahaldin Juba, Andrey Volkov
Book Image

Learning PostgreSQL 11 - Third Edition

By: Salahaldin Juba, Andrey Volkov

Overview of this book

PostgreSQL is one of the most popular open source database management systems in the world, and it supports advanced features included in SQL standards. This book will familiarize you with the latest features in PostgreSQL 11, and get you up and running with building efficient PostgreSQL database solutions from scratch. Learning PostgreSQL, 11 begins by covering the concepts of relational databases and their core principles. You’ll explore the Data Definition Language (DDL) and commonly used DDL commands supported by ANSI SQL. You’ll also learn how to create tables, define integrity constraints, build indexes, and set up views and other schema objects. As you advance, you’ll come to understand Data Manipulation Language (DML) and server-side programming capabilities using PL/pgSQL, giving you a robust background to develop, tune, test, and troubleshoot your database application. The book will guide you in exploring NoSQL capabilities and connecting to your database to manipulate data objects. You’ll get to grips with using data warehousing in analytical solutions and reports, and scaling the database for high availability and performance. By the end of this book, you’ll have gained a thorough understanding of PostgreSQL 11 and developed the necessary skills to build efficient database solutions.
Table of Contents (18 chapters)

Relational algebra

Relational algebra is the formal language of the relational model. It defines a set of closed operations over relations, that is, the result of each operation is a new relation. Relational algebra inherits many operators from set algebra. Relational algebra operations can be categorized into two groups:

  • The first one is a group of operations that are inherited from set theory such as UNION, intersection, set difference, and Cartesian product, also known as cross product.

  • The second is a group of operations that are specific to the relational model such as SELECT and PROJECT. Relational algebra operations could also be classified as binary and unary operations.

The primitive operators are as follows:

  • SELECT (σ): A unary operation written as σϕR where ϕ is a predicate. The selection retrieves the tuples in R, where ϕ holds.
  • PROJECT (π): A unary operation used to slice the relation in a vertical dimension, that is, attributes. This operation is written as πa1,a2,…,an R(), where a1, a2, ..., an are a set of attribute names.
  • Cartesian product (×): A binary operation used to generate a more complex relation by joining each tuple of its operands together. Let's assume that R and S are two relations, then R×S = (r1, r2, ..., rn, s1, s2, ..., sn) where (r1, r2,...,rn) R and (s1, s2, ..., sn) S.
  • UNION (): Appends two relations together; note that the relations should be UNION-compatible, that is, they should have the same set of ordered attributes. Formally, R∪S = (r1,r2,...rn)(s1,s2,...,sn) where (r1, r2,...,rn) ∈ R and (s1, s2, ..., sn) S.
  • Difference (-): A binary operation in which the operands should be UNION-compatible. Difference creates a new relation from the tuples, which exist in one relation but not in the other. The set difference for the relation R and S can be given as R-S = (r1,r2,...rn) where (r1,r2,...rn) R and (r1,r2,...rn) S.
  • RENAME (ρ): A unary operation that works on attributes. This operator is mainly used to distinguish the attributes with the same names but in different relation when joined together, or it is used to give a more user-friendly name for the attribute for presentation purposes. RENAME is expressed as ρa/bR, where a and b are attribute names and b is an attribute of R.

In addition to the primitive operators, there are aggregation functions such as sum, count, min, max, and avg aggregates. Primitive operators can be used to define other relation operators such as left-join, right-join, equi-join, and intersection. Relational algebra is very important due to its expressive power in optimizing and rewriting queries. For example, the selection is commutative, so σaσbR = σbσaR. A cascaded selection may also be replaced by a single selection with a conjunction of all the predicates, that is, σaσbR = σa AND b R.

The SELECT and PROJECT operations

SELECT is used to restrict tuples from the relation. SELECT always returns a unique set of tuples; that is inherited from entity integrity constraint. For example, the query give me the customer information where the customer_id equals 2 is written as follows:

σcustomer_id =2 customer

The selection, as mentioned earlier, is commutative; the query give me all customers where the customer's email is known, and the customer's first name is kim is written in three different ways, as follows:

σemail is not nullfirst_name =kim customer)
σfirst_name =kimemail is not null customer)
σfirst_name =kim and email is not null (customer)

The selection predicates are certainly determined by the data types. For numeric data types, the comparison operator might be ≠, =, <, >, ≥, or ≤. The predicate expression can also contain complex expressions and functions. The equivalent SQL statement for the SELECT operator is the SELECT * statement, and the predicate is defined in the WHERE clause.

The * symbol means all the relation attributes; note that in a production environment, it is not recommended to use *. Instead, one should list all the relation attributes explicitly. Using * in production code could easily break the application since the order and the type of expected result is given implicitly. This situation can occur when one renames a table attribute field, or adds a new column.

The following SELECT statement is equivalent to the relational algebra expression σcustomer_id =2 customer:

SELECT * FROM customer WHERE customer_id = 2;

The PROJECT operation could be visualized as a vertical slicing of the table. The query give me the customer names is written in relational algebra as follows:

π first_name, last_name customer

The following is the result of projection expression:

first_name

last_name

thomas

sieh

wang

kim

Duplicate tuples are not allowed in the formal relational model; the number of tuples returned from the PROJECT operator is always equal to or less than the number of total tuples in the relation. If a PROJECT operator's attribute list contains a primary key, then the resultant relation has the same number of tuples as the projected relation.

The projection operator also can be optimized, for example, cascading projections could be optimized as the following expression:

πaa,πb(R)) = πa(R)

The SQL equivalent for the PROJECT operator is SELECT DISTINCT. The DISTINCT keyword is used to eliminate duplicates. To get the result shown in the preceding expression, one could execute the following SQL statement:

SELECT DISTINCT first_name, last_name FROM customers;

The sequence of the execution of the PROJECT and SELECT operations can be interchangeable in some cases. The query give me the name of the customer with customer_id equal to 2 could be written as follows:

σcustomer_id =2first_name, last_name customer)
π first_name, last_name(σcustomer_id =2 customer)

In other cases, the PROJECT and SELECT operators must have an explicit order, as shown in the following example; otherwise, it will lead to an incorrect expression. The query give me the last name of the customers where the first name is kim could be written in the following way:

π last_name(σfirst_name=kim customer)

The RENAME operation

The RENAME operation is used to alter the attribute name of the resultant relation or to give a specific name to the resultant relation. The RENAME operation is used to perform the following:

  • Remove confusion if two or more relations have attributes with the same name
  • Provide user-friendly names for attributes, especially when interfacing with reporting engines
  • Provide a convenient way to change the relation definition and still be backward compatible

The AS keyword in SQL is the equivalent of the RENAME operator in relational algebra. The following SQL example creates a relation with one tuple and one attribute, which is renamed PI:

SELECT 3.14::real AS PI;

The set theory operations

The set theory operations are UNION, intersection, and minus (difference). Intersection is not a primitive relational algebra operator, because it can be written using the UNION and difference operators:

A∩B = ((A∪B)-(A-B))-(B-A)

The intersection and union are commutative:

A∩B=B∩A

A∪B=B∪A

For example, the query give me all the customer IDs where the customer does not have a service assigned to him could be written as follows:

πcustomer_id customer-πcustomer_id customer_service

The Cartesian product operation

The Cartesian product operation is used to combine tuples from two relations into a single one. The number of attributes in the single relation equals the sum of the number of attributes of the two relations. The number of tuples in the single relation equals the product of the number of tuples in the two relations. Let's assume that A and B are two relations, and C = A × B:

The number of attributes of C = the number of attributes of A + the number of attributes of B

The number of tuples of C = the number of tuples of A * the number of tuples of B

The following table shows the cross join of customer and customer service:

The equivalent SQL join for Cartesian product is CROSS JOIN; the query for the customer with customer_id equal to 1, retrieve the customer_id, name, and the customer service IDs can be written in SQL as follows:

SELECT DISTINCT customer_id, first_name, last_name, service_id FROM customer AS c CROSS JOIN customer_service AS cs WHERE c.customer_id=cs.customer_id AND c.customer_id = 1;

In the preceding example, one can see the relationship between relational algebra and the SQL language. For example, we have used SELECT, RENAME, PROJECT, and Cartesian product. The preceding example shows how relational algebra could be used to optimize query execution. This example could be executed in several ways:

Execution plan 1:

  1. SELECT the customer where customer_id = 1
  2. SELECT the customer service where customer_id = 1
  1. CROSS JOIN the relations resulting from Step 1 and Step 2
  2. PROJECT customer_id, first_name, last_name, and service_id from the relation resulting from Step 3

Execution plan 2:

  1. CROSS JOIN customer and customer_service
  2. SELECT all the tuples where Customer_service.customer_id=customer.customer_id and customer.customer_id = 1
  3. PROJECT customer_id, first_name, last_name, and service_id from the relation resulting from Step 2
The SELECT query is written in this way to show how to translate relational algebra to SQL. In modern SQL code, we can PROJECT attributes without using DISTINCT. In addition to that, one should use a proper join instead of cross join.

Each execution plan has a cost in terms of CPU, random access memory (RAM), and hard disk operations. The RDBMS picks the one with the lowest cost. In the preceding execution plans, the RENAME and DISTINCT operators were ignored for simplicity.