CouchDB and PHP Web Development Beginner's Guide

By: Tim Juravich

Overview of this book

CouchDB is a NoSQL database which is making waves in the development world. It's the tool of choice for many PHP developers, so they need to understand the robust features of CouchDB and the tools that are available to them. CouchDB and PHP Web Development Beginner's Guide will teach you the basics and fundamentals of using CouchDB within a project. You will learn how to build an application from beginning to end, learning the difference between the "quick way" to do things and the "right way" by looking through a variety of code examples and real-world scenarios. You will start with a walkthrough of setting up a sound development environment and then learn to create a variety of documents manually and programmatically. You will also learn how to manage your source control with Git and keep track of your progress. With each new concept, such as adding users and posts to your application, the author will take you through the code step by step and explain how to use CouchDB's robust features. Finally, you will learn how to easily deploy your application and how to use simple replication to scale your application.
The NoSQL database evolution


In the early 1960s, the term database was introduced to the world as a simple layer that would serve as the backbone behind information systems. The simple concept of separating applications from data was new and exciting, and it opened up possibilities for applications to become more robust. At first, databases were stored on tape-based devices, but they soon became more usable with direct-access storage on disks.

In 1970, Edgar Codd proposed a more efficient way of storing data: the relational model. Databases built on this model would later use SQL to allow applications to find the data stored within their tables. The relational model is nearly identical to what we know as traditional relational databases today. While this model was widely accepted, it wasn't until the mid-1980s that there was hardware that could actually make effective use of it. By 1990, hardware finally caught up, and the relational model became the dominant method for storing data.

Just as in any area of technology, competition arose among Relational Database Management Systems (RDBMS). Some examples of popular RDBMS are Oracle, Microsoft SQL Server, MySQL, and PostgreSQL.

As we moved past the year 2000, increasingly complex applications began to produce incredible amounts of data. Social networks entered the scene, and companies wanted to make sense of the vast amounts of data that were available. This shift raised serious concerns about the data structure, scalability, and availability of data, which the relational model didn't seem to handle well. With the uncertainty of how to manage this large amount of ever-changing data, the term NoSQL emerged.

The term NoSQL isn't short for "no SQL"; it actually stands for "not only SQL". NoSQL databases are a group of persistence solutions that do not follow the relational model and do not use SQL for querying. On top of that, NoSQL wasn't introduced to replace relational databases. It was introduced to complement relational databases where they fell short.

What makes NoSQL different

Besides the fact that NoSQL databases do not use SQL to query data, there are a few key characteristics of NoSQL databases. In order to understand these characteristics, we'll need to cover a lot of terminology and definitions. You don't need to memorize everything here, but it's important for you to know exactly what makes up a NoSQL database.

The first thing that makes NoSQL databases different is their data structure. There are a variety of different ways in which NoSQL databases are classified.

Classification of NoSQL databases

NoSQL databases (for the most part) fit into four main data structures:

  • Key-value stores: They save data with a unique key and a value. Their simplicity allows them to be incredibly fast and scale to enormous sizes.

  • Column stores: They are similar to relational databases, but instead of storing records, they store all of the values for a column together in a stream.

  • Document stores: They save data without requiring a schema, storing buckets of key-value pairs inside a self-contained object. This data structure is reminiscent of an associative array in PHP. This is where CouchDB lands on the playing field. We'll go much deeper into this topic in Chapter 3, Getting Started with CouchDB and Futon.

  • Graph databases: They store data in a flexible graph model that contains a node for each object. Nodes have properties and relationships to other nodes.
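To make the four classifications more concrete, here is a minimal Python sketch that models the same small set of blog data in each shape using plain dictionaries, lists, and tuples. It only illustrates the data layouts, not how any real database stores data internally; all keys, field names, and values (`post-1`, `title`, `tags`, `WROTE`, and so on) are hypothetical. The `_id` field mirrors the convention CouchDB uses for document IDs, and a Python dictionary plays the role of the PHP associative array mentioned above.

```python
import json

# Key-value store: an opaque value looked up by a unique key.
key_value = {"post:1": "Hello, CouchDB!"}

# Column store: all values of a column are stored together,
# so reading one column never touches the others.
column_store = {
    "id": ["post-1", "post-2"],
    "title": ["Hello, CouchDB!", "Second post"],
    "user": ["tim", "tim"],
}

# Document store: each record is a self-contained object of key-value pairs
# (much like a PHP associative array), and documents need not share a schema.
document_store = {
    "post-1": {"_id": "post-1", "type": "post", "title": "Hello, CouchDB!",
               "tags": ["nosql", "php"]},
    "post-2": {"_id": "post-2", "type": "post", "title": "Second post"},  # no tags
}

# Graph database: nodes with properties, plus relationships between nodes.
graph = {
    "nodes": {"tim": {"kind": "user"}, "post-1": {"kind": "post"}},
    "edges": [("tim", "WROTE", "post-1")],
}

print(key_value["post:1"])
print(column_store["title"])
print(json.dumps(document_store["post-2"]))
print([dst for src, rel, dst in graph["edges"] if src == "tim" and rel == "WROTE"])
```

Note that `post-2` simply omits the `tags` field; in a document store, that is allowed without changing any schema.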

We won't go too deeply into examples of each of these types of databases, but it's important to look at the different options that are out there. By looking at databases at this level, it's relatively easy for us to see (in general) how the data will scale in size and complexity, as shown in the following diagram:

If you look at this diagram, you'll see that I've placed a Typical Relational Database with a crude performance line. This performance line gives you a simple idea of how a database might scale in size and complexity. How is it possible that NoSQL databases perform so much better when data grows in size and complexity?

For the most part, NoSQL databases are scalable because they rely on distributed systems and ignore the ACID model. Let's talk through what we gain and what we give up through a distributed system, and then define the ACID model.

When talking about any distributed system (not just storage or databases), there is a concept that defines the limitations of what you can do. This is known as the CAP theorem.

CAP theorem

Eric Brewer introduced the CAP theorem in the year 2000. It states that a distributed system cannot simultaneously provide all three of the following guarantees:

  • Consistency: All the servers in the system will have the same data. So, anyone using the system will get the latest data, regardless of which node they talk to in the distributed system.

  • Availability: Every request to a server that is still running receives a response.

  • Partition-tolerance: The system continues to operate as a whole, even if messages between servers are lost or an individual server cannot be reached.

By looking at these choices, you can tell that it would definitely be ideal to have all three of these things guaranteed, but it's theoretically impossible. In the real world, each NoSQL database picks two of the three options, and usually develops some kind of process to mitigate the impact of the third, unhandled property.
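The trade-off is easier to picture with a toy two-server cluster. The following Python sketch is a hypothetical simulation, not how any particular database is implemented: the `Cluster` class, its `"CP"` and `"AP"` modes, and the naive `heal()` reconciliation are assumptions made for illustration. In both modes the system tolerates the partition; what differs is which of the other two guarantees it gives up while the partition lasts.

```python
class Cluster:
    """Toy two-server cluster (hypothetical). mode is "CP" or "AP"."""

    def __init__(self, mode):
        self.mode = mode
        self.server_a = {}        # clients write here
        self.server_b = {}        # a replica that clients may read from
        self.partitioned = False  # True when a and b cannot talk to each other

    def _refuse_if_cp(self):
        # A CP system gives up availability: it refuses requests it cannot
        # serve consistently while the partition lasts.
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable during partition")

    def write(self, key, value):
        self._refuse_if_cp()
        self.server_a[key] = value
        if not self.partitioned:
            self.server_b[key] = value  # replicate immediately when the link is up

    def read_from_b(self, key):
        self._refuse_if_cp()
        # In AP mode this may return stale data during a partition.
        return self.server_b.get(key)

    def heal(self):
        # Naive reconciliation: copy a's data to b once the link is restored.
        self.partitioned = False
        self.server_b.update(self.server_a)


ap = Cluster("AP")
ap.write("title", "v1")
ap.partitioned = True
ap.write("title", "v2")
print(ap.read_from_b("title"))  # v1 (stale, but the system stayed available)
ap.heal()
print(ap.read_from_b("title"))  # v2

cp = Cluster("CP")
cp.write("title", "v1")
cp.partitioned = True
try:
    cp.read_from_b("title")
except RuntimeError as err:
    print("CP cluster:", err)  # refuses rather than risk returning stale data
```

The "process to mitigate the impact of the third property" is represented here by `heal()`; real systems use far more careful conflict handling than a blind overwrite.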

We'll talk about which approach CouchDB takes shortly, but there is still a bit to learn about another concept that NoSQL databases avoid: ACID.

ACID

ACID is a set of properties that apply to database transactions, which are the core of traditional relational databases. While transactions are incredibly powerful, they are also one of the things that make reading and writing quite a bit slower in relational databases.

ACID is made up of four main properties:

  • Atomicity: This is an all or nothing approach to dealing with data. Everything in the transaction must happen successfully, or none of the changes are committed. This is a key property whenever money or currency is handled in a system, and requires a system of checks and balances.

  • Consistency: Data will only be accepted if it passes all of the validation in place on the database, such as triggers, data types, and constraints.

  • Isolation: Transactions will not affect other transactions that are occurring, and other users won't see partial results of a transaction in progress.

  • Durability: Once the data is saved, it is safe against errors, crashes, and other software malfunctions.
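Atomicity and consistency are easiest to see in a classic money transfer. The sketch below uses Python's built-in `sqlite3` module as a stand-in relational database; the `accounts` table, the `transfer()` and `balances()` helpers, and the amounts are hypothetical. A `CHECK` constraint rejects negative balances, and because both updates run inside one transaction, a failed debit also undoes the credit. Isolation and durability are not demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money as one all-or-nothing transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance (consistency), and
        # the credit already applied to dst was rolled back too (atomicity).
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(conn, "alice", "bob", 30))   # True
print(transfer(conn, "alice", "bob", 500))  # False: alice cannot go negative
print(balances(conn))                       # {'alice': 70, 'bob': 30}
```

Crediting before debiting is deliberate: it shows that the credit, which had already been applied, is rolled back when the debit fails, so bob never receives money that alice did not send.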

Again, as you read through the definition of ACID, you are probably thinking to yourself, "These are all must-haves!" That may be the case, but keep in mind that most NoSQL databases do not fully employ ACID, because it's nearly impossible to have all of these restrictions and still have blazing fast writes.

So what does all of that mean?

I've given you a lot of definitions now, but let's try to tie them together into a few simple lists. Let's talk through the advantages and disadvantages of NoSQL databases, when to use them, and when to avoid them.

Advantages of NoSQL databases

With the introduction of NoSQL databases, there are a lot of advantages:

  • You can do things that simply weren't possible with the processing and query power of traditional relational databases.

  • Your data is scalable and flexible, allowing it to scale to size and complexity faster, right out of the box.

  • There are new data models to consider. You don't have to force your data into a relational model if it doesn't make sense.

  • Writing data is blazing fast.

As you can see, there are some clear advantages of NoSQL databases, but as I mentioned before, there are still some negatives that we need to consider.

Negatives of NoSQL databases

However, along with the good, there's also some bad:

  • There are no common standards; each database does things just a little bit differently

  • Querying data does not involve the familiar SQL model to find records

  • NoSQL databases are still relatively immature and constantly evolving

  • There are new data models to consider; sometimes it can be confusing to make your data fit

  • Because a NoSQL database avoids the ACID model, there is no guarantee that all of your data will be successfully written
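To give a feel for querying without SQL: in CouchDB, for example, you query documents through views built from map functions (written in JavaScript by default) instead of SQL statements. The following Python sketch imitates that idea in a simplified form: a map function emits key-value pairs for each matching document, the emitted rows are sorted by key, and a query looks rows up by key. The names `posts_by_user`, `build_view`, and `query` are hypothetical, and this is not CouchDB's API. The rough SQL equivalent of the final query would be `SELECT title FROM posts WHERE user = 'tim'`.

```python
docs = [
    {"_id": "post-1", "type": "post", "user": "tim", "title": "Hello"},
    {"_id": "post-2", "type": "post", "user": "ann", "title": "Hi"},
    {"_id": "user-tim", "type": "user", "name": "tim"},
]


def posts_by_user(doc):
    """Hypothetical map function: emit (key, value) pairs for post documents."""
    if doc.get("type") == "post":
        yield doc["user"], doc["title"]


def build_view(docs, map_fn):
    # Run the map function over every document and sort the emitted rows by key.
    rows = [(key, value, doc["_id"]) for doc in docs for key, value in map_fn(doc)]
    return sorted(rows)


def query(view, key):
    # Return the values of every row whose key matches exactly.
    return [value for k, value, _doc_id in view if k == key]


view = build_view(docs, posts_by_user)
print(query(view, "tim"))  # ['Hello']
```

Instead of describing the result you want, as in SQL, you write code that decides what each document contributes to an index, and then you query that index by key.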

Some of those negatives may be pretty easy for you to stomach, but NoSQL's avoidance of the ACID model is much harder to accept.

When you should use NoSQL databases

Now that we have a good take on the advantages and disadvantages, let's talk about some great use cases for using NoSQL databases:

  • Applications that have a lot of writing

  • Applications where the schema and structure of the data might change

  • Large amounts of unstructured or semi-structured data

  • Projects where traditional relational databases feel restrictive and you want to try something new

That list isn't exhaustive, and there are no clear rules for when you can use NoSQL databases. Really, you can use them for just about every project.

When you should avoid NoSQL databases

There are, however, some pretty clear areas that you should avoid when storing data in NoSQL.

  • Anything involving money or transactions. What happens if one record doesn't save correctly because of NoSQL's avoidance of the ACID model, or the data isn't 100 percent available because of the distributed system?

  • Business critical data or line of business applications, where missing one row of data could mean huge problems.

  • Heavily structured data that requires the functionality of a relational database.

For all of these use cases, you should really focus on using relational databases that will make sure that your data is safe and sound. Of course, you can always include NoSQL databases where it makes sense.

When choosing a database, it's important to remember that "There is no silver bullet." This phrase is used a lot when talking about technology, and it means that there is no one technology that will solve all of your problems without having any side effects or negative consequences. So choose wisely!