PostgreSQL 9.0 High Performance

Overview of this book

PostgreSQL database servers have a common set of problems they encounter as their usage gets heavier and requirements more demanding. You could spend years discovering solutions to them all, step by step as you encounter them. Or you can just look in here.

All successful database applications are destined to eventually run into issues scaling up their performance. Peek into the future of your PostgreSQL database's problems today. Know the warning signs to look for, and how to avoid the most common issues before they even happen.

Surprisingly, most PostgreSQL database applications evolve in the same way:

Choose the right hardware.

Tune the operating system and server memory use.

Optimize queries against the database, with the right indexes.

Monitor every layer, from hardware to queries, using some tools that are inside PostgreSQL and others that are external.

Using monitoring insight, continuously rework the design and configuration.

On reaching the limits of a single server, break things up; connection pooling, caching, partitioning, and replication can all help handle increasing database workloads.

The path to a high performance database system isn't always easy. But it doesn't have to be mysterious with the right guide.

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Note

Reader feedback

Customer support

Tip

Errata

Piracy

Questions

Chapter 1. PostgreSQL Versions

Performance of historical PostgreSQL releases

Choosing a version to deploy

Upgrading to a newer major version

PostgreSQL or another database?

PostgreSQL tools

PostgreSQL contrib

pgFoundry

Additional PostgreSQL-related software

PostgreSQL application scaling lifecycle

Performance tuning as a practice

Summary

Chapter 2. Database Hardware

Balancing hardware spending

CPUs

Memory

Disks

Disk controllers

Reliable controller and disk setup

Write-back caches

Performance impact of write-through caching

Summary

Chapter 3. Database Hardware Benchmarking

CPU and memory benchmarking

memtest86+

STREAM memory testing

CPU benchmarking

Sources of slow memory and processors

Physical disk performance

Random access and I/Os Per Second

Sequential access and ZCAV

Commit rate

Disk benchmarking tools

hdtune

bonnie++

sysbench

Complicated disk benchmarks

Sample disk results

Disk performance expectations

Summary

Chapter 4. Disk Setup

Maximum filesystem sizes

Filesystem crash recovery

Journaling filesystems

Linux filesystems

ext2

ext3

ext4

XFS

Other Linux filesystems

Write barriers

General Linux filesystem tuning

Solaris and FreeBSD filesystems

Solaris UFS

FreeBSD UFS2

ZFS

Windows filesystems

FAT32

NTFS

Disk layout for PostgreSQL

Symbolic links

Tablespaces

Database directory tree

Disk arrays, RAID, and disk layout

Summary

Chapter 5. Memory for Database Caching

Memory units in the postgresql.conf

Increasing UNIX shared memory parameters for larger buffer sizes

Inspecting the database cache

Note

Installing pg_buffercache into a database

Database disk layout

Creating a new block in a database

Writing dirty blocks to disk

Crash recovery and the buffer cache

Checkpoint processing basics

Write-ahead log and recovery processing

Checkpoint timing

Database block lifecycle

Database buffer cache versus operating system cache

Doubly cached data

Checkpoint overhead

Starting size guidelines

Analyzing buffer cache contents

Inspection of the buffer cache queries

Using buffer cache inspection for sizing feedback

Summary

Chapter 6. Server Configuration Tuning

Interacting with the live configuration

Defaults and reset values

Allowed change context

Reloading the configuration file

Server-wide settings

Database connections

Shared memory

Logging

Vacuuming and statistics

Checkpoints

WAL settings

PITR and WAL Replication

Per-client settings

New server tuning

Dedicated server guidelines

Shared server guidelines

pgtune

Summary

Chapter 7. Routine Maintenance

Transaction visibility with multiversion concurrency control

Visibility computation internals

Updates

Row lock conflicts

Deletions

Advantages of MVCC

Disadvantages of MVCC

Transaction ID wraparound

Vacuum

Vacuum Implementation

Cost-based vacuuming

autovacuum

Common vacuum and autovacuum problems

Autoanalyze

Index bloat

Measuring index bloat

Detailed data and index page monitoring

Monitoring query logs

Basic PostgreSQL log setup

Logging difficult queries

Log file analysis

Summary

Chapter 8. Database Benchmarking

pgbench default tests

Table definition

Scale detection

Query script definition

Configuring the database server for pgbench

Running pgbench manually

Graphing results with pgbench-tools

Configuring pgbench-tools

Sample pgbench test results

SELECT-only test

TPC-B-like test

Latency analysis

Sources for bad results and variation

Developer PostgreSQL builds

Worker threads and pgbench program limitations

pgbench custom tests

Insert speed test

Transaction Processing Performance Council benchmarks

Summary

Chapter 9. Database Indexing

Indexing example walkthrough

Measuring query disk and index block statistics

Lookup with an inefficient index

Combining indexes

Switching from indexed to sequential scans

Clustering against an index

Explain with buffer counts

Index creation and maintenance

Unique indexes

Concurrent index creation

Clustering an index

Reindexing

Index types

B-tree

Hash

GIN

GiST

Advanced index use

Multicolumn indexes

Indexes for sorting

Partial indexes

Expression-based indexes

Indexing for full-text search

Summary

Chapter 10. Query Optimization

Sample data sets

Pagila

Dell Store 2

EXPLAIN basics

Timing overhead

Hot and cold cache behavior

Query plan node structure

Basic cost computation

Explain analysis tools

Visual explain

Verbose output

Machine readable explain output

Assembling row sets

Bitmap heap and index scans

Processing nodes

Sort

Limit

Aggregate

HashAggregate

Unique

Result

Append

Group

Subquery Scan and Subplan

Joins

Statistics

Viewing and estimating with statistics

Statistics targets

Difficult areas to estimate

Other query planning parameters

effective_cache_size

work_mem

constraint_exclusion

cursor_tuple_fraction

Executing other statement types

Improving queries

Optimizing for fully cached data sets

Testing for query equivalence

Disabling optimizer features

Working around optimizer bugs

Avoiding plan restructuring with OFFSET

External trouble spots

SQL Limitations

Numbering rows in SQL

Using Window functions for numbering

Using Window functions for cumulatives

Summary

Chapter 11. Database Activity and Statistics

Statistics views

Cumulative and live views

Table statistics

Table I/O

Index statistics

Index I/O

Database wide totals

Connections and activity

Locks

Virtual transactions

Decoding lock information

Transaction lock waits

Table lock waits

Logging lock information

Disk usage

Note

Buffer, background writer, and checkpoint activity

Saving pg_stat_bgwriter snapshots

Tuning using background writer statistics

Summary

Chapter 12. Monitoring and Trending

UNIX monitoring tools

Sample setup

vmstat

iostat

top

sysstat and sar

Windows monitoring tools

Task Manager

Windows System Monitor

Trending software

Types of monitoring and trending software

Nagios

Cacti

Munin

Other trending packages

Summary

Chapter 13. Pooling and Caching

Connection pooling

Pooling connection counts

Database caching

Summary

Chapter 14. Scaling with Replication

Hot Standby

Terminology

Setting up WAL shipping

Streaming Replication

Tuning Hot Standby

Replication queue managers

Slony

Londiste

Read scaling with replication queue software

Special application requirements

Bucardo

pgpool-II

Other interesting replication projects

Summary

Chapter 15. Partitioning Data

Table range partitioning

Determining a key field to partition over

Sizing the partitions

Creating the partitions

Redirecting INSERT statements to the partitions

Empty partition query plans

Date change update trigger

Live migration of a partitioned table

Partitioned queries

Creating new partitions

Partitioning advantages

Common partitioning mistakes

Horizontal partitioning with PL/Proxy

Hash generation

Scaling with PL/Proxy

Scaling with GridSQL

Summary

Chapter 16. Avoiding Common Problems

Bulk loading

Loading methods

Tuning for bulk loads

Skipping WAL acceleration

Recreating indexes and adding constraints

Parallel restore

Post load cleanup

Common performance issues

Counting rows

Unexplained writes

Slow function and prepared statement execution

PL/pgSQL benchmarking

High foreign key overhead

Trigger memory use

Heavy statistics collector overhead

Materialized views

Profiling the database

gprof

OProfile

Visual Studio

DTrace

Performance related features by version

Aggressive PostgreSQL version upgrades

8.1

8.2

8.3

8.4

9.0

Summary

pgpool-II

The oldest of the PostgreSQL-compatible connection pooling packages still in development, pgpool-II improves on the original pgpool in a variety of ways: http://pgpool.projects.postgresql.org/.

Connection pooling is not its only purpose: it also provides load balancing and replication-related capabilities. It even supports some parallel query setups, where queries can be broken into pieces and spread across nodes that each hold a copy of the information being asked about. The "pool" in pgpool refers primarily to handling multiple servers, with the program serving as a proxy server between the clients and some number of databases.

There are a few limitations to using pgpool-II as a connection pooler. One is that each connection is handled by its own process, similar to the database itself, except that the processes are re-used. The memory overhead of that approach, with each process using a chunk of system RAM, can be significant. pgpool-II is not known for having powerful monitoring tools either. But the main drawback of the program is its queuing model. Once you've gone beyond the number of connections that it handles, additional ones are queued up at the operating system level, with each connection waiting for its network connection to be accepted. This can result in timeouts that depend on the network configuration, which is never a good position to be in. It's a good idea to proactively monitor the "waiting for connection" time in your application and look for situations where it has grown very large, so you can correlate that with any timeouts your program runs into.
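One way to track that "waiting for connection" time is to wrap whatever call your application uses to open a connection and record how long it took. The following is only a minimal sketch: the class name `ConnectWaitMonitor`, its methods, and the `warn_seconds` threshold are hypothetical names chosen for illustration, not part of pgpool-II or any PostgreSQL driver.

```python
import time
from typing import Any, Callable, List, Tuple


class ConnectWaitMonitor:
    """Hypothetical helper: records how long each connection attempt waited."""

    def __init__(self, warn_seconds: float) -> None:
        # Waits at or above this threshold are reported as slow (an assumed policy).
        self.warn_seconds = warn_seconds
        self.samples: List[float] = []

    def connect(self, connect_fn: Callable[[], Any]) -> Tuple[Any, float]:
        # Time only the connection attempt itself, not any query work after it.
        start = time.monotonic()
        conn = connect_fn()
        waited = time.monotonic() - start
        self.samples.append(waited)
        return conn, waited

    def slow_waits(self) -> List[float]:
        # These are the samples worth correlating with application timeouts.
        return [s for s in self.samples if s >= self.warn_seconds]

    def worst_wait(self) -> float:
        # Largest wait seen so far, or 0.0 if nothing has been recorded.
        return max(self.samples, default=0.0)
```

In an application, `connect_fn` would be a small function or lambda that opens a connection through the pooler with whichever database driver you already use. Logging `slow_waits()` alongside timeout errors makes it easier to see whether the two rise together.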

pgpool-II load balancing for replication scaling

Because of its replication and load balancing related features, for some purposes pgpool-II is the right approach even though it's not necessarily optimal as just a connection pool. pgpool-II supports what it calls master/slave mode, for situations where you have a master database that handles both reads and writes as well as some number of replicated slaves that are only available for reading.

The default configuration, and the only one available in older versions of the software, assumes you have a set of databases all kept in sync using the Slony-I replication software. A common setup is to have a pgpool-II proxy in front of all your nodes, to spread the query load across them. This lets you scale up a read-only load in a way that's transparent to the application, presuming every node is qualified to answer every query.

Starting in pgpool-II 3.0, you can use this feature with the PostgreSQL 9.0 streaming replication and Hot Standby capabilities too. The read-only slaves will still be subject to the limitations of Hot Standby described in Chapter 14, Scaling with Replication. But within those, pgpool-II will handle the job of figuring out which statements must execute on the master and which can run against slaves instead.

As with the Slony case, it does that by actually parsing the statement that's executing to figure out how to route it. The way it makes that decision is covered in the pgpool-II documentation. Interpreting the SQL being executed is one of the reasons pgpool-II is slower than pgBouncer. But as it enables the intelligent routing capability too, this may be worth doing.
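To make the routing idea concrete, here is a deliberately naive sketch of statement-based routing. It is not pgpool-II's actual logic, which is described in its documentation. The class `ReadWriteRouter`, the node names, and the rule used here (a plain SELECT without a locking clause goes to a slave in round-robin order, everything else goes to the master) are all assumptions made for illustration.

```python
import re
from typing import List

# Leading SQL comments are skipped before looking at the first keyword.
_LEADING_COMMENTS = re.compile(r"^(\s*(--[^\n]*\n|/\*.*?\*/))*\s*", re.DOTALL)
_SELECT = re.compile(r"SELECT\b", re.IGNORECASE)
_LOCKING_CLAUSE = re.compile(r"\bFOR\s+(UPDATE|SHARE)\b", re.IGNORECASE)


class ReadWriteRouter:
    """Hypothetical router: reads go to slaves, everything else to the master."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        self.slaves = list(slaves)
        self._next = 0  # index of the slave that receives the next read

    def is_read_only(self, sql: str) -> bool:
        body = _LEADING_COMMENTS.sub("", sql, count=1)
        if not _SELECT.match(body):
            return False
        # SELECT ... FOR UPDATE/SHARE takes row locks, so treat it as a write.
        # Limitation of this sketch: a SELECT that calls a data-modifying
        # function would still be classified as read-only.
        return _LOCKING_CLAUSE.search(body) is None

    def route(self, sql: str) -> str:
        if self.slaves and self.is_read_only(sql):
            node = self.slaves[self._next % len(self.slaves)]
            self._next += 1
            return node
        return self.master
```

With a router built as `ReadWriteRouter("master", ["slave1", "slave2"])`, successive plain SELECT statements alternate between the two slaves, while an UPDATE or a SELECT ... FOR UPDATE is always sent to the master. A real proxy also has to account for transaction state and other cases this sketch ignores, which is part of why inspecting each statement adds overhead.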
