PostgreSQL 9.0 High Performance

Overview of this book

PostgreSQL database servers have a common set of problems they encounter as their usage gets heavier and requirements more demanding. You could spend years discovering solutions to them all, step by step as you encounter them. Or you can just look in here.

All successful database applications are destined to eventually run into issues scaling up their performance. Peek into the future of your PostgreSQL database's problems today. Know the warning signs to look for, and how to avoid the most common issues before they even happen.

Surprisingly, most PostgreSQL database applications evolve in the same way:

Choose the right hardware.
Tune the operating system and server memory use.
Optimize queries against the database, with the right indexes.
Monitor every layer, from hardware to queries, using some tools that are inside PostgreSQL and others that are external.
Using monitoring insight, continuously rework the design and configuration.
On reaching the limits of a single server, break things up; connection pooling, caching, partitioning, and replication can all help handle increasing database workloads.

The path to a high performance database system isn't always easy. But it doesn't have to be mysterious with the right guide.
Table of Contents (16 chapters)
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Chapter 1. PostgreSQL Versions
PostgreSQL or another database?
PostgreSQL application scaling lifecycle
Performance tuning as a practice
Summary
Chapter 2. Database Hardware
Summary
Chapter 3. Database Hardware Benchmarking
Summary
Chapter 4. Disk Setup
Maximum filesystem sizes
Filesystem crash recovery
Solaris and FreeBSD filesystems
Windows filesystems
Summary
Chapter 5. Memory for Database Caching
Summary
Chapter 6. Server Configuration Tuning
New server tuning
Dedicated server guidelines
Shared server guidelines
pgtune
Summary
Chapter 7. Routine Maintenance
Autoanalyze
Detailed data and index page monitoring
Summary
Chapter 8. Database Benchmarking
Running pgbench manually
Graphing results with pgbench-tools
pgbench custom tests
Transaction Processing Performance Council benchmarks
Summary
Chapter 9. Database Indexing
Index types
Summary
Chapter 10. Query Optimization
Sample data sets
Query plan node structure
Executing other statement types
Summary
Chapter 11. Database Activity and Statistics
Statistics views
Cumulative and live views
Table statistics
Index statistics
Database wide totals
Connections and activity
Disk usage
Summary
Chapter 12. Monitoring and Trending
Summary
Chapter 13. Pooling and Caching
Database caching
Summary
Chapter 14. Scaling with Replication
Special application requirements
Other interesting replication projects
Summary
Chapter 15. Partitioning Data
Summary
Chapter 16. Avoiding Common Problems
Summary

pgBouncer

The PostgreSQL connection pooler with the highest proven performance in the field is pgBouncer, a project originating as part of the database scaling work done by Skype: http://pgfoundry.org/projects/pgbouncer/.

Designed to be nothing but a high-performance connection pooler, it excels at solving that particular problem. pgBouncer runs as a single process, rather than spawning a process per connection. The underlying architecture, which relies on a low-level UNIX library named libevent, was already proven for this purpose in the field: the memcached program uses the same approach. The internal queue management for waiting connections is configurable, making it easy to avoid timeouts.
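
As a sketch of how that tuning looks in a pgbouncer.ini file (the database entry, sizes, and timeouts here are invented placeholders, not recommendations), the reserve pool settings let clients that have been waiting too long borrow extra server connections instead of timing out:

[databases]
; Illustrative entry: route the "postgres" database name to a local server
postgres = host=127.0.0.1 port=5432 dbname=postgres

[pgbouncer]
listen_port = 6432
auth_type = md5
auth_file = users.txt
admin_users = pgbouncer
pool_mode = transaction
; Up to 1000 clients share 50 server connections per database/user pair;
; a client waiting more than 5 seconds can use one of the 10 reserve
; connections instead of timing out.
max_client_conn = 1000
default_pool_size = 50
reserve_pool_size = 10
reserve_pool_timeout = 5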

When the time comes to monitor the pool itself, pgBouncer exposes its internal information through a database-like interface that also accepts commands, serving as both a source of information and a control console. Simply connect to the special pgbouncer database on the port where pgBouncer is running, using the standard psql tool, and you can use the SHOW command to get a variety of information about the internal state of the pool. The console interface accepts commands like PAUSE and RESUME to control the operation of the pool.
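
For example, assuming pgBouncer is listening on its default port of 6432 and the pgbouncer user appears in the admin_users setting, a monitoring session might look like the following. SHOW POOLS lists each database/user pair with its active and waiting client counts, SHOW STATS reports cumulative request and traffic totals, and PAUSE/RESUME stop and restart the handing out of server connections:

$ psql -p 6432 -U pgbouncer pgbouncer
pgbouncer=# SHOW POOLS;
pgbouncer=# SHOW STATS;
pgbouncer=# PAUSE;
pgbouncer=# RESUME;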

Another neat feature of pgBouncer is that it can connect to multiple underlying database servers. You can make databases on different hosts look like different databases on the single host where the pool is running. This allows a form of partitioning for scaling upward if your system's load is split among many databases: simply move each database to its own host and merge them together using pgBouncer as the intermediary, and your application won't even need to be changed.
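
As a hypothetical example, using invented host and database names: if an application's load is split between a sales database and an archive database, the [databases] section of pgbouncer.ini can route each name to a different host, while every client keeps connecting to the single pgBouncer instance:

[databases]
sales   = host=db1.example.com port=5432 dbname=sales
archive = host=db2.example.com port=5432 dbname=archive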

If you have hundreds or thousands of connections and are running out of CPU time, pgBouncer should be your first consideration as a way to reduce the amount of processor time being used. The main situations where pgpool-II works better at this point are ones where its load-balancing features mesh well with the replication approach being used.

Application server pooling

Depending on the application you're running, you may not need a database-level connection pooler at all. Some programming models include what's referred to as an application server, an idea popularized by Java. Popular application servers for Java include Tomcat, JBoss, and others. The Java database access library, JDBC, includes support for connection pooling. Put those together, and you might get efficient database connection pooling without adding any more software to the mix. Tomcat calls this its Database Connection Pool (DBCP). A longer list of open source pooling software is available at http://java-source.net/open-source/connection-pools, and commercial vendors selling application servers might include their own pooler.
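
As an illustrative sketch (the resource name, credentials, and pool sizes are placeholders), a DBCP pool for a PostgreSQL database is typically defined as a JNDI resource in the Tomcat application's context.xml:

<!-- Hypothetical pool definition; adjust names and sizes for your site -->
<Resource name="jdbc/appdb" auth="Container" type="javax.sql.DataSource"
          driverClassName="org.postgresql.Driver"
          url="jdbc:postgresql://dbhost:5432/appdb"
          username="appuser" password="secret"
          maxActive="20" maxIdle="5" maxWait="10000"/>

The application then obtains connections through a JNDI lookup of java:comp/env/jdbc/appdb; closing a connection returns it to the pool rather than actually disconnecting.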

Application-level poolers are available for some other programming environments, too; the idea isn't unique to Java application servers. If you have such an application-level pooling solution available, you should prefer it for two main reasons, beyond just reducing complexity. First, it's probably going to be faster than passing through an additional layer of software just for pooling purposes. Second, monitoring of the pool is already integrated into the application server. You'll still need to monitor the database underneath the pool.