Learning PostgreSQL 10 - Second Edition


Overview of this book

PostgreSQL is one of the most popular open source databases in the world, supporting the most advanced features included in SQL standards. This book will familiarize you with the latest features released in PostgreSQL 10. We’ll start with a thorough introduction to PostgreSQL and the new features introduced in PostgreSQL 10. We’ll cover the Data Definition Language (DDL) with an emphasis on PostgreSQL, and the common DDL commands supported by ANSI SQL. You’ll learn to create tables, define integrity constraints, build indexes, and set up views and other schema objects. Moving on, we’ll cover the concepts of Data Manipulation Language (DML) and PostgreSQL server-side programming capabilities using PL/pgSQL. We’ll also explore the NoSQL capabilities of PostgreSQL and connect to your PostgreSQL database to manipulate data objects. By the end of this book, you’ll have a thorough understanding of the basics of PostgreSQL 10 and will have the necessary skills to build efficient database solutions.

Database management systems

Different database management systems support diverse application scenarios, use cases, and requirements. Database management systems have a long history. First, we will quickly take a look at the recent history, and then explore the market-dominant database management system categories.

A brief history

Broadly, the term database can be used to denote a collection of things. Moreover, this term brings to mind many other terms, including data, information, data structure, and management. A database can be defined as a collection or repository of data, which has a certain structure and is managed by a database management system (DBMS). Data can be structured as tabular data, semi-structured as XML documents, or unstructured, meaning that it does not fit a predefined data model.

In the early days, databases were mainly aimed at supporting business applications; this led us to the well-defined relational algebra and relational database systems. With the introduction of object-oriented languages, new paradigms of database management systems appeared such as object-relational databases and object-oriented databases. Also, many businesses as well as scientific applications use arrays, images, and spatial data; thus, new models such as raster, map, and array algebra are supported. Graph databases are used to support graph queries such as the shortest path from one node to another, along with supporting traversal queries easily.

With the advent of web applications such as social portals, it is now necessary to support a huge number of requests in a distributed manner. This has led to another new paradigm of databases called NoSQL (Not Only SQL), which has different requirements such as performance over fault tolerance and horizontal scaling capabilities. In general, the timeline of database evolution was greatly affected by many factors such as the following:

Functional requirements: The nature of the applications using a DBMS has led to the development of extensions on top of relational databases such as PostGIS (for spatial data) or even dedicated DBMS such as SciDB (for scientific data analytics).
Nonfunctional requirements: The success of object-oriented programming languages has created new trends such as object-oriented databases. Object-relational database management systems have appeared to bridge the gap between relational databases and object-oriented programming languages. Data explosion and the necessity to handle terabytes of data on commodity hardware have led to columnar databases, which can easily scale horizontally.

Database categories

Many database models, such as the network model and the hierarchical model, have appeared and vanished. The predominant categories now in the market are relational, object-relational, and NoSQL databases. One should not think of NoSQL and SQL databases as rivals; they are complementary to each other. By utilizing different database systems, one can overcome many limitations and get the best of different technologies.

The NoSQL databases

The NoSQL databases are affected by the CAP theorem, also known as Brewer's theorem. In 2002, S. Gilbert and N. Lynch published a formal proof of the CAP theorem in their article, Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In 2009, the NoSQL movement began. Currently, there are over 150 NoSQL databases (nosql-database.org).

The CAP theorem

The CAP theorem states that it is impossible for a distributed computing system to simultaneously provide all three of the following guarantees:

Consistency: All clients see (immediately) the latest data even in the case of updates.
Availability: All clients can find a replica of some data even in the case of a node failure. This means that even if some part of the system goes down, the clients can still access the data.
Partition tolerance: The system continues to work regardless of arbitrary message loss or failure of part of the system.

The choice of which feature to discard determines the nature of the system. For example, one could sacrifice consistency to get a scalable, simple, and high-performance database management system. Often, the main difference between a relational database and a NoSQL database is consistency. A relational database enforces ACID (atomicity, consistency, isolation, durability) properties. In contrast, many NoSQL databases adopt the basically available, soft state, eventual consistency (BASE) model.

NoSQL motivation

A NoSQL database provides a means for data storage, manipulation, and retrieval for non-relational data. NoSQL databases are typically distributed, open source, and horizontally scalable. NoSQL often adopts the BASE model, which prizes availability over consistency, and informally guarantees that if no new updates are made to a data item, eventually all accesses to that data item will return its latest version. The advantages of this approach include the following:

Simplicity of design
Horizontal scaling and easy replication
Schema free
Support for huge amounts of data
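
The informal eventual consistency guarantee described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the `Replica` and `Cluster` classes, the single global version counter, and the explicit `sync()` step are all hypothetical and stand in for the asynchronous replication that a real NoSQL system performs in the background.

```python
class Replica:
    """One copy of the data; each key maps to a (version, value) pair."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def apply(self, key, version, value):
        # Keep an update only if it is newer than what this replica holds.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)


class Cluster:
    """Hypothetical cluster: writes land on replica 0 and propagate later."""

    def __init__(self, size):
        self.replicas = [Replica() for _ in range(size)]
        self.pending = []  # updates not yet delivered to the other replicas
        self.version = 0   # assumption: one global counter orders all writes

    def write(self, key, value):
        self.version += 1
        self.replicas[0].apply(key, self.version, value)
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, self.version, value))

    def sync(self):
        # Deliver every queued update; afterwards all replicas agree.
        for index, key, version, value in self.pending:
            self.replicas[index].apply(key, version, value)
        self.pending.clear()


cluster = Cluster(3)
cluster.write("x", "v1")
stale = cluster.replicas[2].read("x")            # replica 2 has not seen the write yet
cluster.sync()
fresh = [r.read("x") for r in cluster.replicas]  # no new updates, so all replicas converge
```

Between `write()` and `sync()`, a client reading from replica 2 gets an outdated answer, yet the system stays available; once updates stop and propagation completes, every replica returns the latest version.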

We will now explore a few types of NoSQL databases.

Key-value databases

The key-value store is the simplest database store. In this database model, the storage, as its name suggests, is based on maps or hash tables. Some key-value databases allow complex values, such as lists and hash tables, to be stored. Key-value stores are extremely fast for certain scenarios, but lack support for complex queries and aggregation. Some of the existing open source key-value databases are Riak, Redis, Membase, and MemcacheDB.
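
As a rough illustration of this model, the following sketch wraps a Python dictionary in a minimal store. The `KeyValueStore` class and its method names are hypothetical and do not correspond to the API of any of the products mentioned; the point is only that lookups by key are direct, while any query on the values has to visit every pair.

```python
class KeyValueStore:
    """A toy in-memory key-value store backed by a hash table (dict)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        # Direct hash lookup by key.
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

    def scan(self, predicate):
        """Return the keys whose values satisfy predicate.

        There is no index on values, so every pair is visited.
        """
        return [k for k, v in self._data.items() if predicate(v)]


store = KeyValueStore()
# Values may be complex structures such as lists and hash tables.
store.put("user:1", {"name": "Alice", "tags": ["admin", "dev"]})
store.put("user:2", {"name": "Bob", "tags": ["dev"]})

alice = store.get("user:1")
admins = store.scan(lambda value: "admin" in value["tags"])
```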

Columnar databases

Columnar or column-oriented databases are based on columns. Data in a certain column in a two-dimensional relation is stored together.
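
To make the difference in layout concrete, the sketch below converts a small row-oriented relation into a column-oriented one. The sample data and the `to_columns` helper are hypothetical, and the helper assumes that every row has the same set of columns; it is not meant to reflect how any particular columnar database stores data on disk.

```python
# Row-oriented layout: each record keeps all of its attributes together.
rows = [
    {"id": 1, "model": "A", "price": 100},
    {"id": 2, "model": "B", "price": 250},
    {"id": 3, "model": "C", "price": 175},
]


def to_columns(rows):
    """Regroup the relation so that the values of each column are stored together."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one column touches only that column's values.
total_price = sum(columns["price"])
```

With this layout, an aggregate such as the total price reads a single contiguous list rather than every full record.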

Note

Unlike in relational databases, adding columns is inexpensive and is done on a row-by-row basis. Rows can have different sets of columns. Tables can benefit from this structure by eliminating the storage cost of null values. This model is best suited for distributed databases.

HBase is one of the most famous columnar databases. It is based on the Google Bigtable storage system. Column-oriented databases are designed for huge data scenarios, so they scale out easily. For small datasets, however, HBase is not a suitable architecture: the recommended hardware topology for HBase is a five-node (five-server) deployment, and it needs a lot of administration and is difficult to learn and master.

Document databases

A document-oriented database is suitable for documents and semi-structured data. The central concept of a document-oriented database is the notion of a document. Documents encapsulate and encode data (or information) in some standard formats or encodings such as XML, JSON, and BSON. Documents do not adhere to a standard schema or have the same structure, so they provide a high degree of flexibility. Unlike relational databases, changing the structure of the document is simple and does not lock the clients from accessing the data.

Document databases merge the power of relational databases and column-oriented databases. They provide support for ad hoc queries and can be scaled out easily. Their strengths depend on their design: MongoDB is designed to handle huge amounts of data efficiently, whereas CouchDB provides high availability even in the case of hardware failure.
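
The flexibility of the document model can be sketched with plain Python dictionaries encoded as JSON. The documents, the `_id` and `type` keys, and the `find` helper below are hypothetical illustrations and are not the query interface of MongoDB or CouchDB.

```python
import json

# Documents in the same collection need not share a structure.
documents = [
    {"_id": 1, "type": "car", "make": "Audi", "features": ["GPS", "ABS"]},
    {"_id": 2, "type": "car", "make": "BMW", "owner": {"name": "Alice"}},
    {"_id": 3, "type": "account", "email": "bob@example.com"},
]


def find(docs, **criteria):
    """Ad hoc query: return the documents whose top-level fields match all criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


cars = find(documents, type="car")

# A document is encoded in a standard format such as JSON for storage or transfer.
encoded = json.dumps(documents[1])

# Changing the structure of one document does not affect the others.
documents[2]["verified"] = True
verified = find(documents, verified=True)
```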

Graph databases

Graph databases are based on graph theory, where a database consists of nodes and edges. The nodes as well as the edges can be assigned data. Graph databases allow traversing between the nodes using edges. As a graph is a generic data structure, graph databases are capable of representing different kinds of data. A famous open source, commercially supported graph database is Neo4j.
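
A traversal query such as the shortest path between two nodes can be sketched with a breadth-first search over an in-memory graph. The node and edge data below are hypothetical, the edges are assumed to be directed and unweighted, and the code does not reflect how Neo4j or any other graph database executes queries.

```python
from collections import deque

# Both nodes and edges carry data.
nodes = {
    "a": {"name": "Alice"},
    "b": {"name": "Bob"},
    "c": {"name": "Carol"},
    "d": {"name": "Dave"},
    "e": {"name": "Eve"},
}
edges = {
    ("a", "b"): {"relation": "knows"},
    ("b", "c"): {"relation": "knows"},
    ("a", "d"): {"relation": "knows"},
    ("d", "c"): {"relation": "knows"},
    ("c", "e"): {"relation": "knows"},
}


def neighbors(node):
    """Nodes reachable from node by following one outgoing edge."""
    return [dst for (src, dst) in edges if src == node]


def shortest_path(start, goal):
    """Breadth-first search; returns a path with the fewest edges, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors(path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


path = shortest_path("a", "e")
names = [nodes[n]["name"] for n in path]
```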
