Book Image

PostgreSQL for Data Architects

By : Jayadevan M
Book Image

PostgreSQL for Data Architects

By: Jayadevan M

Overview of this book

Table of Contents (19 chapters)
PostgreSQL for Data Architects
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Initializing a cluster


First we add an OS user. This user will be used to start/stop/restart the database. This user will also be the superuser for the cluster. The following command has to be executed as root or with sudo:

adduser postgres

Tip

The new user need not be named postgres. It can be mydbadmin, mydba, or anything we fancy.

Next, we create a directory that will be the base directory for the new cluster. This could be anywhere. A standard location can be /usr/local/pgsql/data. However, you might want to have the database cluster on a separate partition. In the event of an OS and associated file system crash, your database data remains intact. It can also be that you want to use faster spinning disks or Solid State Disks for the database cluster to improve performance. In short, performance and/or reliability concerns can make you choose a location other than the default location to initialize the database cluster. As root, we execute the commands:

[root@MyCentOS extension]# mkdir -p /pgdata/9.3

The -p option ensures that the parent directory is also created if it is nonexistent:

[root@MyCentOS extension]# chown postgres /pgdata/9.3

Then, we switch user to postgres. This is necessary because when we initialize the cluster, the user under which the command was executed becomes the owner of the cluster. The server process will also be owned by this user. We will go with the standard user –postgres, which we created:

su - postgres

The next step is to run the initdb script. Well! Not exactly. If we run initdb now, we will get an error:

[postgres@MyCentOS ~]$ initdb
-bash: initdb: command not found

This is because we haven't added the directory containing the PostgreSQL executables to the environment variable PATH yet. We could provide the absolute path and make it work. However, it is better to set the variables.

In .bash_profile of postgres, we have the following lines:

PATH=$PATH:$HOME/bin
export PATH

Just before the export PATH line, add:

PATH=$PATH:/usr/local/pgsql/bin

Then, try this:

[postgres@MyCentOS ~]$ which initdb
/usr/bin/which: no initdb in (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/postgres/bin)

Not surprising, as .bash_profile doesn't get executed unless we source it or log out and log in. Log out, log in, and try again:

[postgres@MyCentOS ~]$ exit
logout
[root@MyCentOS ~]# su - postgres
[postgres@MyCentOS ~]$ which initdb
/usr/local/pgsql/bin/initdb

Now we are good to go! It's a good idea to execute the following:

initdb --help | more

Among the many parameters available, the important ones, in most cases, will be -D or --pgdata. This parameter is used to define the directory where the cluster will be initialized (where the database cluster should be stored). This is the only mandatory parameter. Another parameter that can be useful is --pwprompt. Using this, we can set the password for the database superuser. So, we execute the following command:

initdb --pgdata=/pgdata/9.3 --pwprompt

If this is not set now and password authentication is to be used, we have to set the password later, as shown here:

As seen in the preceding screenshot, the process asks for the superuser password. Towards the end, it gives a warning that the trust authentication is enabled for local connections. This means that it will be possible to make connections from the localhost without being prompted for a password. It's a good idea to change this setting. We will come to this later. For more options available for initdb, please refer to http://www.postgresql.org/docs/current/static/app-initdb.html.

As always, let's see what happened; which directories got created when we initialized the cluster:

cd /pgdata/9.3
[postgres@MyCentOS 9.3]$ find ./ -maxdepth 1 -type d
./
./base
./pg_stat
./pg_clog
./pg_xlog
./pg_tblspc
./pg_twophase
./pg_subtrans
./global
./pg_notify
./pg_stat_tmp
./pg_snapshots
./pg_multixact
./pg_serial

A quick walk through the directories

We will start the cluster as it will help us to relate the files system with the databases:

[postgres@MyCentOS base]$ expo
rt PGDATA=/pgdata/9.3/

PGDATA is the default data directory location.

[postgres@MyCentOS base]$ pg_ctl start
server starting
[postgres@MyCentOS base]$ LOG:  database system was shut down at 2013-10-13 13:48:07 IST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

The pg_ctl is a utility to start, stop, check the status, or restart the database cluster. Passing init or initdb as a parameter results in the initialization of a cluster. More options can be explored at http://www.postgresql.org/docs/current/static/app-pg-ctl.html.

Now, we will go through the directories:

  • base: This directory holds the databases that will be created by database users. This also holds the pg_defaulttablespace with the databases: postgres, template0, and template1:

    [postgres@MyCentOS base]$ pwd
    /pgdata/9.3/base
    [postgres@MyCentOS base]$ find ./ -type d
    ./
    ./1
    ./12891
    ./12896
    [postgres@MyCentOS base]$ oid2name
    All databases:
    Oid  Database Name  Tablespace
    ----------------------------------
      12896       postgrespg_default
      12891      template0pg_default
          1      template1  pg_default
    
    • We can see here that Oid points to the directory names. We can try creating and dropping a database, as shown in the following code:

      [postgres@MyCentOS base]$ pwd
      /pgdata/9.3/base
      [postgres@MyCentOS base]$ psql
      psql (9.3.0)
      Type "help" for help.
      
      postgres=# \! ls
      1  12891  12896
      postgres=# CREATE DATABASE test;
      CREATE DATABASE
      postgres=# \! ls
      1  12891  12896  16385
      postgres=# DROP DATABASE test;
      DROP DATABASE
      postgres=# \! ls
      1  12891  12896
      

      Tip

      OID stands for Object Identifier. They are used internally by PostgreSQL as primary keys for various system tables. Objects created in the cluster (tables, databases, and so on) will have OIDs associated with them. The oid2name utility is available as a contrib module and helps us examine the databases, tables, and related file nodes.

    • We note a few things here. First, we were not prompted for a password (remember the warning about trust authentication?). Next, we can execute host commands from psql (more on this in a later chapter). Third, creating/dropping databases are actually similar to creating/deleting directories. PostgreSQL does do quite a bit of internal book-keeping when we create or drop objects in addition to manipulating directories.

  • global: This directory contains cluster-wide tables. There are many tables and associated views that keep track of the entire cluster, namely, database roles, system catalog data, and so on.

  • pg_clog: This directory contains the transaction commit status data.

  • pg_multixact: This directory contains multitransaction status data (concurrent transactions waiting for a lock).

  • pg_notify: This directory contains the LISTEN/NOTIFY status data.

  • pg_serial: This directory contains information about committed serializable transactions.

  • pg_snapshots: This directory contains exported snapshots.

  • pg_stat_tmp: This directory contains temporary files for the statistics subsystem.

  • pg_subtrans: This directory contains subtransaction status data.

  • pg_tblspc: This directory contains symbolic links to tablespaces.

  • pg_twophase: This directory contains state files for prepared transactions.

  • pg_xlog: This directory contains Write Ahead Log (WAL) files.

In short, we have directories to hold containers of real user created data (tables and tablespaces), directories to hold data about all the data and data structures (metadata), and then directories to hold data about the state of transactions.

Processes created

Let's take a look at the processes spawned when we started PostgreSQL:

[root@MyCentOS ~]# ps  -fupostgres

The following screenshot illustrates the processes spawned:

We can see that there is one parent process (pid1566), which spawned a few child processes.

Important files created

One important file that gets created when we initialize a database cluster is postgresql.conf. This file contains a number of critical parameters related to the server processes and resource allocation, memory management, logging, file locations, and so on. Another file is pg_hba.conf."hba" (which stands for host-based authentication). Changes to these files are necessary to enable client connection to the database from a different machine. Both of these are in the PGDATA folder.

The postmaster.pid file in the same directory is used by pg_ctl to determine whether the database is running:

[postgres@MyCentOS 9.3]$ pg_ctl status
pg_ctl: server is running (PID: 1566)

The contents of the file are as follows:

[postgres@MyCentOS 9.3]$ head -1 postmaster.pid
1566

The 1566 number in the pid file is the same as what we got for the parent process when we did a process listing earlier.