PostgreSQL High Performance Cookbook

By: Chitij Chauhan, Dinesh Kumar

Overview of this book

PostgreSQL is one of the most powerful and easy-to-use database management systems. It has strong community support and is actively developed, with a new release every year. PostgreSQL supports the most advanced features included in SQL standards. It also provides NoSQL capabilities and very rich data types and extensions. All of this makes PostgreSQL a very attractive solution in software systems. If you run a database, you want it to perform well and you want to be able to secure it. As the world’s most advanced open source database, PostgreSQL has unique built-in ways to achieve these goals. This book will show you a multitude of ways to enhance your database’s performance and give you insights into measuring and optimizing a PostgreSQL database. This book is your one-stop guide to elevate your PostgreSQL knowledge to the next level. First, you’ll become familiar with essential developer/administrator concepts such as load balancing, connection pooling, and distributing connections to multiple nodes. Next, you will explore memory optimization techniques before moving on to the security controls offered by PostgreSQL. Then, you will cover essential database/server monitoring and replication strategies. Finally, you will learn about query processing algorithms.

Discussing RAID levels

In this recipe, we will discuss various RAID levels and their typical uses.

Getting ready

In this recipe, we will discuss several RAID levels that are commonly configured for database workloads. RAID (Redundant Array of Independent Disks) combines multiple disks into a single logical unit. A hardware RAID setup has a dedicated controller to manage the disks, including a separate processor and a battery-backed cache, so that cached data can still be flushed to disk properly when a power failure occurs.

How to do it...

RAID levels differ in how they are configured. RAID uses techniques such as striping, mirroring, and parity to improve disk storage performance, high availability, or both. The most popular RAID levels are 0 through 6, and each level offers its own trade-off between storage capacity, read/write performance, and high availability. The RAID levels commonly configured for a DBMS are 0, 1, 5, 6, and 10 (a combination of 1 and 0).
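As a rough illustration of how storage capacity differs between these levels, the following sketch computes the usable space for an array of equal-sized disks. The function name `raid_usable_capacity`, the minimum disk counts, and the 1000 GB disk size are assumptions made for this example; real controllers also reserve some space for metadata, which is ignored here.

```python
def raid_usable_capacity(level: int, disks: int, disk_size_gb: float) -> float:
    """Return the approximate usable capacity in GB for equal-sized disks.

    Illustrative only: metadata overhead and hot spares are ignored.
    """
    # Conventional minimum disk counts per level (assumed for this sketch).
    minimum = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == 10 and disks % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks")

    if level == 0:
        return disks * disk_size_gb        # striping only, no redundancy
    if level == 1:
        return disk_size_gb                # every disk holds the same copy
    if level == 5:
        return (disks - 1) * disk_size_gb  # one disk's worth of parity
    if level == 6:
        return (disks - 2) * disk_size_gb  # two disks' worth of parity
    return (disks // 2) * disk_size_gb     # RAID 10: half the disks are mirrors


# Compare the levels using hypothetical 1000 GB disks.
for level, disks in [(0, 4), (1, 2), (5, 4), (6, 4), (10, 4)]:
    usable = raid_usable_capacity(level, disks, 1000)
    print(f"RAID {level:>2} with {disks} disks: {usable} GB usable")
```

The comparison makes the trade-off visible: the levels that tolerate disk failures give up part of the raw capacity to mirrors or parity.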

How it works...

Let us discuss how the most commonly used RAID levels work:

RAID 0

This configuration focuses only on read/write performance, which it achieves by striping the data across multiple devices. With this configuration, we can allocate the complete disk storage to the application's data. The major drawback of this configuration is that it provides no high availability. If any single disk fails, the remaining disks become useless, as they are missing the chunks from the failed disk. This RAID configuration is not recommended for real-time database systems, but it is a recommended configuration for storing non-critical business data such as historical application logs, database logs, and so on.
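To make the idea of striping concrete, here is a minimal sketch that maps a logical block number to a disk and an offset on that disk. The function name `locate_block` and the chunk size of four blocks are assumptions for illustration; actual controllers use their own chunk sizes and layouts.

```python
from typing import Tuple


def locate_block(logical_block: int, disks: int, chunk_blocks: int) -> Tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk).

    Chunks of `chunk_blocks` consecutive blocks are placed on the disks
    in round-robin order, which is the essence of RAID 0 striping.
    """
    chunk = logical_block // chunk_blocks   # which chunk the block belongs to
    within = logical_block % chunk_blocks   # position inside that chunk
    disk = chunk % disks                    # round-robin disk selection
    stripe = chunk // disks                 # how many full stripes precede it
    return disk, stripe * chunk_blocks + within


# Show where the first few chunks land on a hypothetical 3-disk array.
for block in (0, 4, 8, 12, 13):
    disk, offset = locate_block(block, disks=3, chunk_blocks=4)
    print(f"logical block {block:>2} -> disk {disk}, offset {offset}")
```

Because consecutive chunks land on different disks, a large sequential read or write is served by all disks at once. The same mapping also shows why losing one disk leaves gaps throughout the data.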

RAID 1

This configuration focuses on high availability rather than on performance, by mirroring the data across two disk drives. That is, an identical copy of the data is kept on each of the two disks. If one disk is corrupted, we can still use the other one for read/write operations. This is also not a recommended configuration for real-time database systems, as it offers no gain in write performance. Also, in this configuration, we use only 50% of the total disk space to store the actual data, and the rest to keep its duplicate for high availability. This is a recommended configuration where the durability of data matters more than write performance.

RAID 5

This configuration provides more storage along with high availability, by distributing parity blocks across the disks. Unlike RAID 1, it offers more disk space for the actual data, as only the equivalent of one disk is used for parity. If one disk is corrupted, we can use the data and parity blocks from the remaining disks to rebuild the missing data. However, this is also not a recommended configuration, since every write operation must also update the parity blocks, and reads from an array with a failed disk must reconstruct the data from parity.
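The parity idea can be sketched with XOR, which is the operation commonly used for single-parity RAID. The helper names `xor_blocks` and `update_parity` and the sample byte strings are assumptions for this example, and the rotation of parity across disks is not modeled.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def update_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Recompute parity after one data block changes (read-modify-write)."""
    return xor_blocks([old_parity, old_data, new_data])


# One stripe: three data blocks plus one parity block (sample data).
data = [b"PGDATA01", b"PGDATA02", b"PGDATA03"]
parity = xor_blocks(data)

# Simulate losing the disk holding data[1] and rebuild it from the rest.
rebuilt = xor_blocks([data[0], data[2], parity])
print("rebuilt block matches:", rebuilt == data[1])

# A small write touches the data block and the parity block.
new_block = b"PGDATA99"
new_parity = update_parity(parity, data[0], new_block)
print("parity consistent:", new_parity == xor_blocks([new_block, data[1], data[2]]))
```

The `update_parity` step hints at where the extra write cost comes from: changing a single data block requires reading the old data and old parity before writing both back.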

RAID 6

This configuration provides more redundancy than RAID 5 by storing two parity blocks for each write operation. That is, if two disks become corrupted, RAID 6 can still rebuild the data from the parity blocks, unlike RAID 5. This configuration is also not recommended for database systems, as its write performance is lower than that of the previous RAID levels.

RAID 10

This configuration is a combination of RAID levels 0 and 1. That is, the data is striped across multiple disks, and each stripe is mirrored to another disk. It is the most recommended RAID level for real-time business applications, as it achieves better performance than RAID 1 and higher availability than RAID 0.
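The availability side of this combination can be explored with a small sketch that checks which disk failures a striped set of mirror pairs can survive. The function name `raid10_survives` and the four-disk layout with pairs (0, 1) and (2, 3) are assumptions for illustration only.

```python
from itertools import combinations
from typing import Iterable, Sequence, Tuple


def raid10_survives(failed: Iterable[int], mirror_pairs: Sequence[Tuple[int, int]]) -> bool:
    """Return True if no mirror pair has lost both of its disks."""
    failed_set = set(failed)
    return all(not set(pair) <= failed_set for pair in mirror_pairs)


# Hypothetical 4-disk array: data is striped across two mirror pairs.
pairs = [(0, 1), (2, 3)]

# Any single-disk failure leaves one healthy copy in every pair.
print("all single failures survive:",
      all(raid10_survives([d], pairs) for d in range(4)))

# Check every possible two-disk failure.
for failure in combinations(range(4), 2):
    status = "survives" if raid10_survives(failure, pairs) else "data lost"
    print(f"disks {failure} failed: {status}")
```

In this layout the array is lost only when both disks of the same mirror pair fail, whereas a RAID 0 array of the same disks would be lost after any single failure.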

Note

For more information about RAID levels, refer to the following URLs:
