Book Image

PostgreSQL Server Programming - Second Edition

Book Image

PostgreSQL Server Programming - Second Edition

Overview of this book

Table of Contents (21 chapters)
PostgreSQL Server Programming Second Edition
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Index

Programming best practices


Developing application software is complicated. Some of the approaches that help manage this complexity are so popular that they have been given simple acronyms that can be remembered. Next, we'll introduce some of these principles and show you how server programming helps make them easier to follow.

KISS – keep it simple stupid

One of the main techniques to successful programming is writing simple code. That is, writing code that you can easily understand 3 years from now and that others can understand as well. It is not always achievable, but it almost always makes sense to write your code in the simplest way possible. You can rewrite parts of it later for various reasons such as speed, code compactness, to show off how clever you are, and so on. However, always write the code in a simple way first, so that you can be absolutely sure that it does what you want. Not only do you get working on the code quickly, but you also have something to compare to when you try more advanced ways to do the same thing.

Remember, debugging is harder than writing code; so, if you write the code in the most complex way you can, you will have a really hard time debugging it.

It is often easier to write a set returning function instead of a complex query. Yes, it will probably run slower than the same thing implemented as a single complex query, due to the fact that the optimizer can do very little to the code written as functions, but the speed may be sufficient for your needs. If more speed is required, it's very likely to refactor the code piece by piece, joining parts of the function into larger queries where the optimizer has a better chance of discovering better query plans until the performance is acceptable again.

Remember that most of the time, you don't need the absolutely fastest code. For your clients or bosses, the best code is the one that does the job well and arrives on time.

DRY – don't repeat yourself

This principle means you should implement any piece of business logic just once and put the code for doing it in the right place.

This may be hard sometimes; for example, you want to do some checks on your web forms in the browser, but still do the final checks in the database. However, as a general guideline, it is very much valid.

Server programming helps a lot here. If your data manipulation code is in the database near the data, all the data users have easy access to it, and you will not need to manage a similar code in a C++ Windows program, two PHP websites, and a bunch of Python scripts doing nightly management tasks. If any of them need to do this thing to a customer's table, they just call:

SELECT * FROM do_this_thing_to_customers(arg1, arg2, arg3);

That's it!

If the logic behind the function needs to be changed, you just change the function with no downtime and no complicated orchestration of pushing database query updates to several clients. Once the function is changed in the database, it is changed for all the users.

YAGNI – you ain't gonna need it

In other words, don't do more than you absolutely need to.

If you have a creepy feeling that your client is not yet well aware of how the final database will look or what it will do, it's helpful to resist the urge to design everything into the database. A much better way is to do a minimal implementation that satisfies the current specifications, but do it with extensibility in mind. It is very easy to "paint yourself into a corner" when implementing a big specification with large imaginary parts.

If you organize your access to the database through functions, it is often possible to do even large rewrites of business logic without touching the frontend application code. Your application still performs SELECT * FROM do_this_thing_to_customers(arg1, arg2, arg3), even after you have rewritten the function five times and changed the whole table structure twice.

SOA – service-oriented architecture

Usually, when you hear the acronym SOA, it will be from enterprise software people trying to sell you a complex set of SOAP services. But the essence of SOA is to organize your software platform as a set of services that clients, and other services, call in order to perform certain well-defined atomic tasks, as follows:

  • Checking a user's password and credentials

  • Presenting him/her with a list of his/her favorite websites

  • Selling him/her a new red dog collar with a complementary membership in the red-collared dog club

These services can be implemented as SOAP calls with corresponding WSDL definitions and Java servers with servlet containers, as well as a complex management infrastructure. They can also be a set of PostgreSQL functions, taking a set of arguments and returning a set of values. If the arguments or return values are complex, they can be passed as XML or JSON, but a simple set of standard PostgreSQL data types is often enough. In Chapter 10, Scaling Your Database with PL/Proxy, you will learn how to make such a PostgreSQL-based SOA service infinitely scalable.

Type extensibility

Some of the preceding techniques are available in other databases, but PostgreSQL's extensibility does not stop here. In PostgreSQL, you can just write UDFs in any of the most popular scripting languages. You can also define your own types, not just domains, which are standard types with some extra constraints attached, and new full-fledged types too.

For example, a Dutch company, MGRID, has developed a value with unit set of data types, so that you can divide 10 km by 0.2 hours and get the result in 50 km/h. Of course, you can also cast the same result to meters per second or any other unit of speed. And yes, you can get this as a fraction of c—the speed of light.

This kind of functionality needs both the types and overloaded operands, which know that if you divide distance by time, then the result is speed. You will also need user-defined casts, which are automatically or manually-invoked conversion functions between types.

MGRID developed this for use in medical applications, where the cost of an error can be high—the difference between 10 ml and 10 cc can be vital. However, using a similar system might also have averted many other disasters, where wrong units ended up producing bad computation results. If the amount is always accompanied by the unit, the possibility for these kinds of errors is diminished. You can also add your own index method if you have some programming skills and your problem domain is not well served by the existing indexes. There is already a respectable set of index types included in the core PostgreSQL, as well as several others that are developed outside the core.

The latest index method that became officially included in PostgreSQL is k nearest neighbor (KNN)—a clever index, which can return K rows ordered by their distance from the desired search target. One use of KNN is in fuzzy text search, where this can be used to rank full-text search results by how well they match the search terms. Before KNN, this kind of thing was done by querying all the rows which matched even slightly, then sorting all these by the distance function, and returning K top rows as the final step.

If done using the KNN index, the index access can start returning the rows in the desired order; so, a simple LIMIT K function will return the K top matches.

The KNN index can also be used for real distances, for example, answering the request "Give me the 10 nearest pizza places to Central Station."

As you saw, index types are different from the data types they index. Another example, is the same General Inverted Index (GIN) can be used for full-text searches (together with stemmers, thesauri, and other text-processing stuff), as well as for indexing elements of integer arrays.