Book Image

Troubleshooting PostgreSQL

Book Image

Troubleshooting PostgreSQL

Overview of this book

Table of Contents (17 chapters)
Troubleshooting PostgreSQL
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Preventing encoding-related issues


Encoding-related problems are some of the most frequent problems that occur when people start with a fresh PostgreSQL setup. In PostgreSQL, every database in your instance has a specific encoding. One database might be en_US@UTF-8, while some other database might have been created as de_AT@UTF-8 (which denotes German as it is used in Austria).

To figure out which encodings your database system is using, try to run psql -l from your Unix shell. What you will get is a list of all databases in the instance that include those encodings.

So where can we actually expect trouble? Once a database has been created, many people would want to load data into the system. Let's assume that you are loading data into the aUTF-8 database. However, the data you are loading contains some ASCII characters such as ä, ö, and so on. The ASCII code for ö is 148. Binary 148 is not a valid Unicode character. In Unicode, U+00F6 is needed. Boom! Your import will fail and PostgreSQL will error out.

If you are planning to load data into a new database, ensure that the encoding or character set of the data is the same as that of the database. Otherwise, you may face ugly surprises.

To create a database using the correct locale, check out the syntax of CREATE DATABASE:

test=# \h CREATE DATABASE
Command:     CREATE DATABASE
Description: create a new database
Syntax:
CREATE DATABASE name
    [ [ WITH ] [ OWNER [=] user_name ]
           [ TEMPLATE [=] template ]
           [ ENCODING [=] encoding ]
           [ LC_COLLATE [=] lc_collate ]
           [ LC_CTYPE [=] lc_ctype ]
           [ TABLESPACE [=] tablespace_name ]
           [ CONNECTION LIMIT [=] connlimit ] ]

ENCODING and the LC* settings are used here to define the proper encoding for your new database.