PostgreSQL 9.0 High Performance

Overview of this book

PostgreSQL database servers have a common set of problems they encounter as their usage gets heavier and requirements more demanding. You could spend years discovering solutions to them all, step by step as you encounter them. Or you can just look in here.

All successful database applications are destined to eventually run into issues scaling up their performance. Peek into the future of your PostgreSQL database's problems today. Know the warning signs to look for, and how to avoid the most common issues before they even happen.

Surprisingly, most PostgreSQL database applications evolve in the same way:

Choose the right hardware.
Tune the operating system and server memory use.
Optimize queries against the database, with the right indexes.
Monitor every layer, from hardware to queries, using some tools that are inside PostgreSQL and others that are external.
Using monitoring insight, continuously rework the design and configuration.
On reaching the limits of a single server, break things up; connection pooling, caching, partitioning, and replication can all help handle increasing database workloads.

The path to a high performance database system isn't always easy. But it doesn't have to be mysterious with the right guide.
Table of Contents (16 chapters)
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Chapter 1. PostgreSQL Versions
PostgreSQL or another database?
PostgreSQL application scaling lifecycle
Performance tuning as a practice
Summary
Chapter 2. Database Hardware
Summary
Chapter 3. Database Hardware Benchmarking
Summary
Chapter 4. Disk Setup
Maximum filesystem sizes
Filesystem crash recovery
Solaris and FreeBSD filesystems
Windows filesystems
Summary
Chapter 5. Memory for Database Caching
Summary
Chapter 6. Server Configuration Tuning
New server tuning
Dedicated server guidelines
Shared server guidelines
pgtune
Summary
Chapter 7. Routine Maintenance
Autoanalyze
Detailed data and index page monitoring
Summary
Chapter 8. Database Benchmarking
Running pgbench manually
Graphing results with pgbench-tools
pgbench custom tests
Transaction Processing Performance Council benchmarks
Summary
Chapter 9. Database Indexing
Index types
Summary
Chapter 10. Query Optimization
Sample data sets
Query plan node structure
Executing other statement types
Summary
Chapter 11. Database Activity and Statistics
Statistics views
Cumulative and live views
Table statistics
Index statistics
Database wide totals
Connections and activity
Disk usage
Summary
Chapter 12. Monitoring and Trending
Summary
Chapter 13. Pooling and Caching
Database caching
Summary
Chapter 14. Scaling with Replication
Special application requirements
Other interesting replication projects
Summary
Chapter 15. Partitioning Data
Summary
Chapter 16. Avoiding Common Problems
Summary

pgBouncer

The PostgreSQL connection pooler with the highest proven performance in the field is pgBouncer, a project originating as part of the database scaling work done by Skype: http://pgfoundry.org/projects/pgbouncer/.

Designed to be nothing but a high-performance connection pooler, it excels at solving that particular problem. pgBouncer runs as a single process, rather than spawning a process per connection. The underlying architecture, which relies on a low-level UNIX library named libevent, was already proven for this purpose in the field: the memcached program uses the same approach. The internal queue management for waiting connections is configurable, making it easy to avoid timeouts.
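
As a sketch of how that tuning looks in a pgbouncer.ini file (the database entry, sizes, and timeouts here are invented placeholders, not recommendations), the reserve pool settings let clients that have been waiting too long borrow extra server connections instead of timing out:

[databases]
; Illustrative entry: route the "postgres" database name to a local server
postgres = host=127.0.0.1 port=5432 dbname=postgres

[pgbouncer]
listen_port = 6432
auth_type = md5
auth_file = users.txt
admin_users = pgbouncer
pool_mode = transaction
; Up to 1000 clients share 50 server connections per database/user pair;
; a client waiting more than 5 seconds can use one of the 10 reserve
; connections instead of timing out.
max_client_conn = 1000
default_pool_size = 50
reserve_pool_size = 10
reserve_pool_timeout = 5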

When the time comes to monitor the pool itself, pgBouncer exposes its internal information through a database-like interface that also accepts commands, serving as both a source of information and a control console. Simply connect to the special pgbouncer database on the port where pgBouncer is running, using the standard psql tool, and you can use the SHOW command to get a variety of information about the internal state of the pool. The console interface accepts commands like PAUSE and RESUME to control the operation of the pool.
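
For example, assuming pgBouncer is listening on its default port of 6432 and the pgbouncer user appears in the admin_users setting, a monitoring session might look like the following. SHOW POOLS lists each database/user pair with its active and waiting client counts, SHOW STATS reports cumulative request and traffic totals, and PAUSE/RESUME stop and restart the handing out of server connections:

$ psql -p 6432 -U pgbouncer pgbouncer
pgbouncer=# SHOW POOLS;
pgbouncer=# SHOW STATS;
pgbouncer=# PAUSE;
pgbouncer=# RESUME;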

Another neat feature of pgBouncer is that it can connect to multiple underlying database servers. You can make databases on different hosts look like different databases on the single host where the pool is running. This allows a form of partitioning for scaling upward if your system's load is split among many databases: simply move each database to its own host and merge them together using pgBouncer as the intermediary, and your application won't even need to be changed.
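
As a hypothetical example, using invented host and database names: if an application's load is split between a sales database and an archive database, the [databases] section of pgbouncer.ini can route each name to a different host, while every client keeps connecting to the single pgBouncer instance:

[databases]
sales   = host=db1.example.com port=5432 dbname=sales
archive = host=db2.example.com port=5432 dbname=archive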

If you have hundreds or thousands of connections and are running out of CPU time, pgBouncer should be your first consideration as a way to reduce the amount of processor time being used. The main situations where pgpool-II works better at this point are ones where its load-balancing features mesh well with the replication approach being used.

Application server pooling

Depending on the application you're running, you may not need a database-level connection pooler at all. Some programming models include what's referred to as an application server, an idea popularized by Java. Popular application servers for Java include Tomcat, JBoss, and others. The Java database access library, JDBC, includes support for connection pooling. Put those together, and you might get efficient database connection pooling without adding any more software to the mix. Tomcat calls this its Database Connection Pool (DBCP). A longer list of open source pooling software is available at http://java-source.net/open-source/connection-pools, and commercial vendors selling application servers might include their own pooler.
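
As an illustrative sketch (the resource name, credentials, and pool sizes are placeholders), a DBCP pool for a PostgreSQL database is typically defined as a JNDI resource in the Tomcat application's context.xml:

<!-- Hypothetical pool definition; adjust names and sizes for your site -->
<Resource name="jdbc/appdb" auth="Container" type="javax.sql.DataSource"
          driverClassName="org.postgresql.Driver"
          url="jdbc:postgresql://dbhost:5432/appdb"
          username="appuser" password="secret"
          maxActive="20" maxIdle="5" maxWait="10000"/>

The application then obtains connections through a JNDI lookup of java:comp/env/jdbc/appdb; closing a connection returns it to the pool rather than actually disconnecting.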

Application-level poolers are available for some other programming environments, too; the idea isn't unique to Java application servers. If you have such an application-level pooling solution available, you should prefer it for two main reasons, beyond just reducing complexity. First, it's probably going to be faster than passing through an additional layer of software just for pooling purposes. Second, monitoring of the pool is already integrated into the application server. You'll still need to monitor the database underneath the pool.