Book Image

Learning PostgreSQL 10 - Second Edition

Book Image

Learning PostgreSQL 10 - Second Edition

Overview of this book

PostgreSQL is one of the most popular open source databases in the world, supporting the most advanced features included in SQL standards. This book will familiarize you with the latest features released in PostgreSQL 10. We’ll start with a thorough introduction to PostgreSQL and the new features introduced in PostgreSQL 10. We’ll cover the Data Definition Language (DDL) with an emphasis on PostgreSQL, and the common DDL commands supported by ANSI SQL. You’ll learn to create tables, define integrity constraints, build indexes, and set up views and other schema objects. Moving on, we’ll cover the concepts of Data Manipulation Language (DML) and PostgreSQL server-side programming capabilities using PL/pgSQL. We’ll also explore the NoSQL capabilities of PostgreSQL and connect to your PostgreSQL database to manipulate data objects. By the end of this book, you’ll have a thorough understanding of the basics of PostgreSQL 10 and will have the necessary skills to build efficient database solutions.
Table of Contents (23 chapters)
Title Page
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Customer Feedback
Preface

Preface

Picking the right database management system is a difficult task due to the vast number of options on the market. Depending on the business model, one can pick a commercial database or an open source database with commercial support. In addition to this, there are several technical and non-technical factors to assess. When it comes to a relational database management system, PostgreSQL stands at the top for several reasons. The PostgreSQL slogan, The world's most advanced open source database, shows the sophistication of its features and community confidence.

PostgreSQL is an open source object relational database management system. It emphasizes extensibility and competes with major relational database vendors such as Oracle, SQL Server, and MySQL. Due to its rich extensions and open source license, it is often used for research purposes, but PostgreSQL code is also the base for many open source and commercial database management systems such as Greenplum and Vertica. Furthermore, start-ups often favor PostgreSQL due to its licensing costs and because there are a lot of companies that provide commercial support.

PostgreSQL runs on most modern operating systems, including Windows, Mac, and Linux flavors. Also, there are several extensions to access, manage, and monitor PostgreSQL clusters, such as pgAdmin, OmniDB, and psql. PostgreSQL installation and configuration is moderately easy as it is supported by most packaging tools, such as yum and apt. Database developers can easily learn and use PostgreSQL because it complies with ANSI SQL standards. Other than this, there are a lot of resources to help developers learn PostgreSQL; it has a very good documentation manual and a very active and organized community.

PostgreSQL can be used for both OLTP and OLAP applications. As it is ACID compliant, it can be used out of the box for OLTP applications. For OLAP applications, PostgreSQL supports Window functions, FDW, and table inheritance; there are many external extensions for this purpose as well.

Even though PostgreSQL is ACID compliant, it has very good performance as it utilizes state-of-the-art algorithms and techniques. For example, PostgreSQL utilizes MVCC architecture to allow concurrent access to data. In addition to that, PostgreSQL supports both pessimistic and optimistic concurrency control, and the locking behavior can be altered based on the use case. Also, PostgreSQL provides a very good analyzer and advanced features, such as data partitioning using table inheritance and constraint exclusion, to speed up the handling of very large amounts of data. PostgreSQL supports several types of indexes, such as B-Tree, GiN, and GiST, and BRIN indexes. Also, parallel query execution has been supported since PostgreSQL 9.6. Finally, one can use replication to load balance the load to different database clusters.

PostgreSQL is scalable thanks to the many replication solutions in the market, such as Slony and pgpool-II. Additionally, PostgreSQL supports out-of-the-box synchronous and asynchronous streaming replication, as well as logical replication. This makes PostgreSQL very attractive because it can be used to set up highly available and performant systems. 

What this book covers

Chapter 01, Relational Databases, introduces relational database system concepts, including relational database properties, relational algebra, and database modeling. Also, it describes different database management systems such as graph, document, key value, and columnar databases.

Chapter 02, PostgreSQL in Action, provides first-hand experience of installing the PostgreSQL server and client tools on different platforms. This chapter also introduces PostgreSQL capabilities, such as out-of-the-box replication support and its very rich data types.

Chapter 03, PostgreSQL Basic Building Blocks, provides some coding best practices, such as coding conventions of PostgreSQL and identifier names. This chapter describes the basic building blocks and the interaction between these blocks, mainly template databases, user databases, tablespaces, roles, and settings. Also, it describes basic data types and tables.

Chapter 04, PostgreSQL Advanced Building Blocks, introduces several building blocks, including views, indexes, functions, user-defined data types, triggers, and rules. This chapter provides use cases of these building blocks and compares building blocks that can be used for the same case, such as rules and triggers.

Chapter 05, SQL Language, introduces structured query language (SQL), which is used to interact with a database, create and maintain data structures, and enter data into databases as well as, change it, retrieve it, and delete it. SQL has commands related to data definition language (DDL), data manipulation language (DML), and data control language (DCL). Four SQL statements form the basis of DML and are described in this chapter. The SELECT statement is examined in detail to explain SQL concepts such as grouping and filtering to show what SQL expressions and conditions are and how to use subqueries. Some relational algebra topics are also covered in application to joining tables.

Chapter 06, Advanced Query Writing, describes advanced SQL concepts and features such as common table expressions and window functions. This helps you implement a logic that would not be possible without them, such as recursive queries. Other techniques explained here, such as the DISTINCT ON clause, the FILTER clause, and lateral subqueries, are not irreplaceable. However, they can help make a query smaller, easier, and faster.

Chapter 07, Server-Side Programming with PL/pgSQL, covers function parameters, such as the number of returned rows and function cost, which is mainly used by the query planner. Also, it presents control statements such as conditionals and iteration. Finally, it explains the concept of dynamic SQL and some recommended practices when using dynamic SQL.

Chapter 08, OLAP and Data Warehousing, introduces several concepts regarding the usage of recreational databases in the realm of analytical processing. It discusses the deference between OLTP load and OLAP loads; furthermore, it discusses the modeling aspect of OLAP applications. In addition to that, it discusses some technical methods to perform ETL (extract, transform, and load) operations such as the COPY command. Also, it discusses some features of PostgreSQL which increasing data retrieval performance such as index-only scans and table partitioning.

Chapter 09, Beyond Conventional Data types, discusses several rich data types, including arrays, hash stores, JSON documents, and full-text search. It presents operations and functions for each data type to initialize, insert, update, access, and delete these data types. Finally, it shows how PostgreSQL can be combined with Nginx to serve read-only restful requests.

Chapter 10Transactions and Concurrency Control, discusses in detail the ACID properties and the relation between these properties and concurrency controls. This chapter also discusses concepts such as isolation levels and their side-effects and it shows these side-effects using SQL examples. Finally, the chapter discusses different locking methods, including pessimistic locking strategies such as row locking and advisory locks. 

Chapter 11, PostgreSQL Security, covers concepts of authentication and authorization. It describes PostgreSQL authentication methods and explains the structure of a PostgreSQL host-based authentication configuration file. It also discusses the permissions that can be granted to database building objects such as schemas, tables, views, indexes, and columns. Finally, it shows how sensitive data, such as passwords, can be protected using different techniques, including one-way and two-way encryption.

Chapter 12, The PostgreSQL Catalog, provides several recipes to maintain a database cluster, including cleaning up data, maintaining user processes, cleaning up indexes and unused databases objects, discovering and adding indexes to foreign keys, and so on.

Chapter 13, Optimizing Database Performance, discusses several approaches to optimize performance. It presents PostgreSQL cluster configuration settings, which are used in tuning the whole cluster's performance. Also, it presents common mistakes in writing queries and discusses several approaches to increase performance, such as using indexes or table partitioning and constraint exclusion.

Chapter 14, Testing, covers some aspects of the software testing process and how it can be applied to databases. Unit tests for databases can be written as SQL scripts or stored functions in a database. There are several frameworks that help us write unit tests and process the results of testing.

Chapter 15Using PostgreSQL in Python Applications, discusses several advanced concepts, such as connection pooling, asynchronous access, and object relational mappers (ORMs). The chapter shows by example how to connect to database, query it, and perform updates using Python. Finally, it introduces different technologies that interact with PostgreSQL, and this gives the developer a broad overview of the state-of-the-art technologies.

Chapter 16Scalability, discusses the problem of scalability and the CAP theorem in detail. Also, it covers data replication in PostgreSQL, including physical replication and logical replication. Finally, it shows different scaling scenarios and their implementation in PostgreSQL. 

What you need for this book

In general, PostgreSQL server and client tools do not need exceptional hardware. PostgreSQL can be installed on almost all modern platforms, including Linux, Windows, and Mac. Also, in the book, when a certain library is needed, the installation instructions are given.

You need PostgreSQL version 10; however, most of the examples can be executed on earlier versions as well. In order to execute the sample code, scripts, and examples provided in the book, you need to have at least a PostgreSQL client tool installed on your machine—preferably psql—and access to a remote server running the PostgreSQL server.

In a Windows environment, the cmd.exe command prompt is not very convenient; thus, the user might consider using Cygwin (http://www.cygwin.com/) or another alternative, such as PowerShell.

Some chapters might requires additional software. For example, in Chapter 15, Using PostgreSQL in Python application, you need to install Python and the required Python libraries to interact with PostgreSQL. In Chapter 16, Scalability, Docker is used to give the reader a better user experience. 

For other chapters, such as Chapter 09, Beyond Conventional Data types, and Chapter 11, PostgreSQL Security, it is better to use Linux because some of the software used is not very convenient on Windows such as Nginx and GnuPG. To run a Linux distribution on your Windows machine, you can use Virtual Box (https://www.virtualbox.org/).

Who this book is for

If you're interested in learning more about PostgreSQL--one of the most popular relational databases in the world--then this book is for you. Those looking to build solid database or data warehousing applications with PostgreSQL 10 will also find this book a useful resource. No prior knowledge of database programming or administration is required to get started with this book.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The next lines of code read the link and assign it to the open function."

A block of code is set as follows:

fin = open('data/fake_weather_data.csv','r',newline='')
reader = csv.reader(fin)
for row in reader:
    myData.append(row)

Any command-line input or output is written as follows:

$ mongoimport --file fake_weather_data.csv

New terms and important words are shown in bold.

Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "In order to download new modules, we will go to Files | Settings | Project Name | Project Interpreter."

Note

Warnings or important notes appear like this.

Note

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book-what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of. To send us general feedback, simply email [email protected], and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you. You can download the code files by following these steps:

  1. Log in or register to our website using your email address and password.
  2. Hover the mouse pointer on the SUPPORT tab at the top.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the book in the Search box.
  5. Select the book for which you're looking to download the code files.
  6. Choose from the drop-down menu where you purchased this book from.
  7. Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR / 7-Zip for Windows
  • Zipeg / iZip / UnRarX for Mac
  • 7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Learning-PostgreSQL-10-Second-Edition. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/LearningPostgreSQL10SecondEdition_ColorImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books-maybe a mistake in the text or the code-we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title. To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the internet, please provide us with the location address or website name immediately so that we can pursue a remedy. Please contact us at [email protected] with a link to the suspected pirated material. We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.