Book Image

PostgreSQL 9.0 High Performance

Book Image

PostgreSQL 9.0 High Performance

Overview of this book

PostgreSQL database servers have a common set of problems they encounter as their usage gets heavier and requirements more demanding. You could spend years discovering solutions to them all, step by step as you encounter them. Or you can just look in here.All successful database applications are destined to eventually run into issues scaling up their performance. Peek into the future of your PostgreSQL database's problems today. Know the warning signs to look for, and how to avoid the most common issues before they even happen.Surprisingly, most PostgreSQL database applications evolve in the same way: Choose the right hardware. Tune the operating system and server memory use. Optimize queries against the database, with the right indexes. Monitor every layer, from hardware to queries, using some tools that are inside PostgreSQL and others that are external. Using monitoring insight, continuously rework the design and configuration. On reaching the limits of a single server, break things up; connection pooling, caching, partitioning, and replication can all help handle increasing database workloads. The path to a high performance database system isn't always easy. But it doesn't have to be mysterious with the right guide.
Table of Contents (120 chapters)
Preface
Free Chapter
1
What this book covers
2
What you need for this book
3
Who this book is for
4
Conventions
5
Reader feedback
7
Chapter 1. PostgreSQL Versions
9
PostgreSQL or another database?
11
PostgreSQL application scaling lifecycle
12
Performance tuning as a practice
13
Summary
14
Chapter 2. Database Hardware
17
Summary
18
Chapter 3. Database Hardware Benchmarking
23
Summary
24
Chapter 4. Disk Setup
25
Maximum filesystem sizes
26
Filesystem crash recovery
28
Solaris and FreeBSD filesystems
29
Windows filesystems
31
Summary
37
Summary
38
Chapter 6. Server Configuration Tuning
42
New server tuning
43
Dedicated server guidelines
44
Shared server guidelines
45
pgtune
46
Summary
47
Chapter 7. Routine Maintenance
50
Autoanalyze
52
Detailed data and index page monitoring
54
Summary
55
Chapter 8. Database Benchmarking
57
Running pgbench manually
58
Graphing results with pgbench-tools
61
pgbench custom tests
62
Transaction Processing Performance Council benchmarks
63
Summary
64
Chapter 9. Database Indexing
67
Index types
69
Summary
70
Chapter 10. Query Optimization
71
Sample data sets
73
Query plan node structure
80
Executing other statement types
83
Summary
84
Chapter 11. Database Activity and Statistics
85
Statistics views
86
Cumulative and live views
87
Table statistics
88
Index statistics
89
Database wide totals
90
Connections and activity
92
Disk usage
94
Summary
95
Chapter 12. Monitoring and Trending
99
Summary
100
Chapter 13. Pooling and Caching
102
Database caching
103
Summary
104
Chapter 14. Scaling with Replication
107
Special application requirements
108
Other interesting replication projects
109
Summary
110
Chapter 15. Partitioning Data
113
Summary
114
Chapter 16. Avoiding Common Problems
119
Summary

pgpool-II

The oldest of the PostgreSQL compatible packages used for connection pooling that's still in development, pgpool-II improves on the original pgpool in a variety of ways: http://pgpool.projects.postgresql.org/.

Its primary purpose is not just connection pooling, it also provides load balancing and replication related capabilities. It even supports some parallel query setups, where queries can be broken into pieces and spread across nodes where each has a copy of the information being asked about. The "pool" in pgpool is primarily to handle multiple servers, with the program serving as a proxy server between the clients and some number of databases.

There are a few limitations to pgpool-II setup to serve as a connection pooler. One is that each connection is set up as its own process, similar to the database only re-used. The memory overhead of that approach, with each process using a chunk of system RAM, can be significant. pgpool-II is not known for having powerful monitoring tools either. But the main drawback of the program is its queuing model. Once you've gone beyond the number of connections that it handles, additional ones are queued up at the operating system level, with each connection waiting for its network connection to be accepted. This can result in timeouts that depend on the network configuration, which is never a good position to be in. It's a good idea to proactively monitor the "waiting for connection" time in your application and look for situations where it's grown very large, to let you correlate that with any timeouts that your program might run into.

pgpool-II load balancing for replication scaling

Because of its replication and load balancing related features, for some purposes pgpool-II is the right approach even though it's not necessarily optimal as just a connection pool. pgpool-II supports what it calls master/slave mode, for situations where you have a master database that handles both reads and writes as well as some number of replicated slaves that are only available for reading.

The default replication software it assumes you're using, and only one available in older versions of the software, requires you have a set of databases all kept in sync using the Slony-I replication software. A common setup is to have a pgpool-II proxy in front of all your nodes, to spread the query load across them. This lets you scale up a read-only load in a way that's transparent to the application, presuming every node is qualified to answer every query.

Starting in pgpool-II 3.0, you can use this feature with the PostgreSQL 9.0 streaming replication and Hot Standby capabilities too. The read-only slaves will still be a subject to the limitations of Hot Standby described in the Chapter 14, Scaling with Replication. But within those, pgpool-II will handle the job of figuring out which statements must execute on the master and which can run against slaves instead.

As with the Slony case, it does that by actually parsing the statement that's executing to figure out how to route it. The way it makes that decision is covered in the pgpool-II documentation. This is one of the reasons pgpool-II is slower than pgBouncer, that it's actually interpreting the SQL executing. But as it enables the intelligent routing capability too, this may be worth doing.