PostgreSQL 9 Admin Cookbook

By: Simon Riggs, Hannu Krosing

Overview of this book

PostgreSQL is a powerful, open source object-relational database system. An enterprise database, PostgreSQL includes features such as Multi-Version Concurrency Control (MVCC), point-in-time recovery, tablespaces, asynchronous replication, nested transactions (savepoints), online/hot backups, a sophisticated query planner/optimizer, and write-ahead logging for fault tolerance. PostgreSQL 9 Admin Cookbook covers everything a database administrator needs to protect, manage, and run a healthy and efficient PostgreSQL 9.0 database.

PostgreSQL 9 Admin Cookbook describes key aspects of the PostgreSQL open source database system. The book covers everything a sysadmin or DBA needs to protect, manage, and run a healthy and efficient PostgreSQL 9 database. This hands-on guide will assist developers working on live databases, supporting web or enterprise software applications using Java, Python, Ruby, or .NET from any development framework. It's easy to manage your database when you've got PostgreSQL 9 Admin Cookbook to hand.

PostgreSQL is fast becoming one of the world's most popular server databases, with an enviable reputation for performance, stability, and an enormous range of advanced features. PostgreSQL is one of the oldest open source projects, completely free to use and developed by a very diverse worldwide community. Most of all, It Just Works!

PostgreSQL 9 Admin Cookbook offers the information you need to manage your live production databases on PostgreSQL. The book contains insights direct from the main author of the PostgreSQL replication and recovery features, and the database architect of the most successful startup using PostgreSQL, Skype.

This practical guide gives quick answers to common questions and problems, building on the authors' experience as trainers, users, and core developers of the PostgreSQL database server.

Each technical aspect is broken down into short recipes that demonstrate solutions with working code, then explain why and how each solution works. The book is intended to be a desk reference for both new users and technical experts.

The book covers all the latest features in PostgreSQL 9. Soon you will be running a smooth database with ease!

Randomly sampling data


DBAs may be asked to set up a test server and populate it with test data. Often, that server will be old hardware, possibly with smaller disks, so the subject of data sampling raises its head.

The purpose of sampling is to reduce the size of the data set and improve the speed of later analysis. Some statisticians are so used to the idea of sampling that they may not even question whether its use is valid, or whether it causes further complications.

How to do it...

First, you should realize that there isn't a simple tool to slice off a sample of your database. It would be neat if there were, but there isn't. You'll need to read all of this to understand why.

We first need to consider some SQL to derive a sample. Random sampling is actually very simple, because we can use the SQL function random() within the WHERE clause. For example:

postgres=# SELECT count(*) FROM mybigtable;
 count
-------
 10000
(1 row)
postgres=# SELECT count(*) FROM mybigtable WHERE random() < 0.01;
...
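
As a minimal sketch of how the same predicate could be used to materialize a sample rather than just count it, the following assumes the table mybigtable from the session above; the target table name mybigtable_sample is hypothetical and not from the original text. Because random() is evaluated independently for each row, every row is kept with a probability of about 0.01, so the number of sampled rows is only approximately 1% of the source and will change from one run to the next.

```sql
-- mybigtable_sample is a hypothetical name chosen for illustration.
-- Each row of mybigtable is kept independently when random() < 0.01,
-- so the sample size varies from run to run.
CREATE TABLE mybigtable_sample AS
SELECT *
FROM mybigtable
WHERE random() < 0.01;

-- Compare the size of the source table with the size of the sample.
SELECT (SELECT count(*) FROM mybigtable)        AS total_rows,
       (SELECT count(*) FROM mybigtable_sample) AS sampled_rows;
```

Note that this approach still reads every row of the source table, so it reduces the size of the result, not the cost of producing it.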