Book Image

PostgreSQL 9 Administration Cookbook - Second Edition

Book Image

PostgreSQL 9 Administration Cookbook - Second Edition

Overview of this book

Table of Contents (19 chapters)
PostgreSQL 9 Administration Cookbook Second Edition
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Index

Randomly sampling data


DBAs may be asked to set up a test server and populate it with test data. Often, that server will be old hardware, possibly with smaller disk sizes. So, the subject of data sampling raises its head.

The purpose of sampling is to reduce the size of the data set and improve the speed of later analysis. Some statisticians are so used to the idea of sampling that they may not even question whether its use is valid or it can cause further complications.

How to do it…

In this section, we will take a random sample of a given collection of data (for example, a given table). First, you should realize that there isn't a simple tool to slice off a sample of your database. It would be neat if there were, but there isn't. You'll need to read all of this to understand why:

  1. We first consider using SQL to derive a sample. Random sampling is actually very simple because we can use the random() SQL function within the WHERE clause. Consider the following example:

    postgres=# SELECT count...