Learning PostgreSQL 11 - Third Edition

By : Salahaldin Juba, Andrey Volkov

Learning PostgreSQL 11 - Third Edition

By: Salahaldin Juba, Andrey Volkov

Overview of this book

PostgreSQL is one of the most popular open source database management systems in the world, and it supports advanced features included in SQL standards. This book will familiarize you with the latest features in PostgreSQL 11, and get you up and running with building efficient PostgreSQL database solutions from scratch. Learning PostgreSQL, 11 begins by covering the concepts of relational databases and their core principles. You’ll explore the Data Definition Language (DDL) and commonly used DDL commands supported by ANSI SQL. You’ll also learn how to create tables, define integrity constraints, build indexes, and set up views and other schema objects. As you advance, you’ll come to understand Data Manipulation Language (DML) and server-side programming capabilities using PL/pgSQL, giving you a robust background to develop, tune, test, and troubleshoot your database application. The book will guide you in exploring NoSQL capabilities and connecting to your database to manipulate data objects. You’ll get to grips with using data warehousing in analytical solutions and reports, and scaling the database for high availability and performance. By the end of this book, you’ll have gained a thorough understanding of PostgreSQL 11 and developed the necessary skills to build efficient database solutions.

Free Chapter

Relational Databases

Database management systems

Relational and object-relational databases

Relational algebra

Data modeling

Summary

Questions

PostgreSQL in Action

An overview of PostgreSQL

PostgreSQL capabilities

Installing PostgreSQL

Summary

Questions

PostgreSQL Basic Building Blocks

Database coding

PostgreSQL object hierarchy

PostgreSQL database components

PostgreSQL native data types

The car web portal database

Summary

Questions

PostgreSQL Advanced Building Blocks

Views

Indexes

Functions

User-defined data types

Trigger and rule systems

Summary

Questions

SQL Language

SQL fundamentals

Querying data with SELECT statements

Changing the data in a database

Summary

Questions

Advanced Query Writing

Common table expressions

Window functions

Advanced SQL techniques

Summary

Questions

Server-Side Programming with PL/pgSQL

The SQL language and PL/pgSQL – a comparison

Functions and procedures – a comparison

PostgreSQL function parameters

The PostgreSQL PL/pgSQL control statements

Function predefined variables

Exception handling

Dynamic SQL

Summary

Questions

OLAP and Data Warehousing

OLTP and OLAP

Online analytical processing

Summary

Beyond Conventional Data Types

Arrays

The hash store data structure

The JSON data structure

A PostgreSQL full-text search

Summary

Questions

Transactions and Concurrency Control

Transactions

Explicit locking

Transaction in functions and procedures

Summary

Questions

PostgreSQL Security

Authentication in PostgreSQL

Authentication best practices

PostgreSQL default access privileges

Role system and proxy authentication

PostgreSQL security levels

Encrypting data

Summary

Questions

The PostgreSQL Catalog

The system catalog

System catalog for administrators

Cleaning up the database

Tuning for performance

Selective dump

Summary

Questions

Optimizing Database Performance

PostgreSQL configuration tuning

Tuning performance for write

Tuning performance for read

Detecting problems in query plans

Common mistakes in writing queries

Table partitioning

Query rewriting

Summary

Questions

Testing

Technical requirements

Unit testing

Database-abstraction interfaces

Summary

Questions

Using PostgreSQL in Python Applications

Technical requirements

Python DB-API 2.0

Low-level database access with Psycopg 2

Alternative drivers for PostgreSQL

Object-relational mapping with SQLAlchemy

Summary

Questions

Scalability

Technical requirements

The problem of scalability and the CAP theorem

Data replication in PostgreSQL

Building a scalable PostgreSQL solution

Summary

Questions

Database management systems

Different database management systems (DBMS) support diverse application scenarios, use cases, and requirements. DBMS have a long history. First, we will quickly take a look at the recent history, and then explore the market-dominant database management system categories.

A brief history

Broadly, the term database can be used to present a collection of things. Moreover, this term brings to mind many other terms including data, information, data structure, and management. A database can be defined as a collection or repository of data, which has a certain structure, managed by a DBMS. Data can be structured as tabular data, semi-structured as XML documents, or unstructured data that does not fit a predefined data model.

In the early days, databases were mainly aimed at supporting business applications; this led us to the well-defined relational algebra and relational database systems. With the introduction of object-oriented languages, new paradigms of DBMS appeared, such as object-relational databases and object-oriented databases. Also, many businesses, as well as scientific applications, use arrays, images, and spatial data; thus, new models such as raster, map, and array algebra are supported. Graph databases are used to support graph queries such as the shortest path from one node to another, along with supporting traversal queries easily.

With the advent of web applications such as social portals, it is now necessary to support a huge number of requests in a distributed manner. This has led to another new paradigm of databases called Not Only SQL (NoSQL), which has different requirements, such as performance over fault tolerance and horizontal scaling capabilities. In general, the timeline of database evolution was greatly affected by many factors, such as the following:

Functional requirements: The nature of the applications using a DBMS led to the development of extensions on top of relational databases such as PostGIS (for spatial data), or even a dedicated DBMS such as SciDB (for scientific data analytics).
Nonfunctional requirements: The success of object-oriented programming languages has created new trends such as object-oriented databases. Object-relational DBMS have appeared to bridge the gap between relational databases and the object-oriented programming languages. Data explosion and the necessity to handle terabytes of data on commodity hardware have led to database systems that can easily scale up horizontally.

Database categories

Many database models have appeared and vanished, such as the network model and the hierarchical model. The predominant categories now in the market are relational, object-relational, and NoSQL databases. One should not think of NoSQL and SQL databases as rivals; they are complementary to each other. By utilizing different database systems, one can overcome many limitations and get the best of different technologies.

NoSQL databases can provide great benefits such as availability, schema-free, and horizontal scaling, but they also have limitations such as performance, data retrieval constraints, and learning time. Relational databases often adhere to SQL as defined by ISO. SQL is a very expressive and extremely powerful tool for retrieving data in different forms. Many NoSQL databases such as Cassandra lack the capability to retrieve data as in relational databases.

NoSQL databases

NoSQL databases are affected by the CAP theorem, also known as Brewer's theorem. In 2002, S. Gilbert and N. Lynch published a formal proof of the CAP theorem in their article Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In 2009, the NoSQL movement began. Currently, there are over 150 NoSQL databases (nosql-database.org).

The CAP theorem

The CAP theorem states that it is impossible for a distributed computing system to simultaneously provide all three of the following guarantees:

Consistent: All clients see (immediately) the latest data even in the case of updates.
Available: All clients can find a replica of some data even in the case of a node failure. This means that even if some part of the system goes down, the clients can still access consistent and valid data.
Partition tolerance: The system continues to work regardless of arbitrary message loss or failure of part of the system.

The choice of which features to discard determines the nature of the system. For example, one could sacrifice consistency to get a scalable, simple, and high-performance database management system. Often, the main difference between a relational database and a NoSQL database is consistency. A relational database enforces atomicity, consistency, isolation, and durability (ACID) properties. In contrast, many NoSQL databases adopt the basically available, soft-state, eventual consistency (BASE) model.

NoSQL motivation

A NoSQL database does not entity relation model for data storage, manipulation, and retrieval. NoSQL databases are often distributed, open source, and horizontally scalable. NoSQL often adopts the base model, which prizes availability over consistency, and informally guarantees that if no new updates are made on a data item, eventually all access to that data item will return the latest version of that data item. The advantages of this approach include the following:

Simplicity of design
Horizontal scaling and easy replication

Schema-free
A huge amount of data support

We will now explore a few types of NoSQL databases.

Key-value databases

The key-value store is the simplest database store. In this database model, the storage, as its name suggests, is based on maps or hash tables. Some key-value databases allow complex values to be stored as lists and hash tables. Key-value pairs are extremely fast for certain scenarios but lack the support for complex queries and aggregation. Some of the existing open source key-value databases are Riak, Redis, Couchbase Server, and MemcacheDB.

Columnar databases

Columnar or column-oriented databases are based on columns. Data in a certain column in a two-dimensional relation is stored together.

Unlike relational databases, adding columns is inexpensive and is done on a row-by-row basis. Rows can have a different set of columns. Tables can benefit from this structure by eliminating the storage cost of the null values. This model is best suited for distributed databases.

HBase is one of the most famous columnar databases. It is based on the Google Bigtable storage system. Column-oriented databases are designed for huge data scenarios, so they scale up easily. For example, Facebook uses HBase to power their message infrastructure. For small datasets, HBase is not a suitable architecture. First, the recommended hardware topology for HBase is a five-node server deployment. Also, it needs a lot of administration and is difficult to learn and master.

Document databases

A document-oriented database is suitable for documents and semi-structured data. The central concept of a document-oriented database is the notion of a document. Documents encapsulate and encode data (or information) in some standard formats or encodings such as XML, JSON, and BSON. Documents do not adhere to a standard schema or have the same structure, so they provide a high degree of flexibility. Unlike relational databases, changing the structure of the document is simple and does not lock the clients from accessing the data.

Document databases merge the power of relational databases and column-oriented databases. They provide support for ad hoc queries and can be scaled up easily. Depending on the design of the document database, MongoDB is designed to handle a huge amount of data efficiently. On the other hand, CouchDB provides high availability even in the case of hardware failure.

Graph databases

Graph databases are based on graph theory, where a database consists of nodes and edges. The nodes, as well as the edges, can be assigned data. Graph databases allow traversing between the nodes using edges. As a graph is a generic data structure, graph databases are capable of representing different data. A famous implementation of an open source, commercially supported graph database is Neo4j.

Learning PostgreSQL 11 - Third Edition

By : Salahaldin Juba, Andrey Volkov

Learning PostgreSQL 11 - Third Edition

By: Salahaldin Juba, Andrey Volkov

Overview of this book

Related Content you might be interested in

Current Title:

Learning PostgreSQL 11 - Third Edition

PostgreSQL 11 Administration Cookbook

PostgreSQL 16 Administration Cookbook

Learn PostgreSQL