Traditional databases are arranged by fields, records, and files. A field is defined as a single piece of information; a record is one complete set of fields; and a file is a collection of records. This recipe explains traditional databases and the bottlenecks in using them.
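The field/record/file arrangement described above can be sketched in a few lines of Python. The customer data here is purely hypothetical and used only to make the three terms concrete:

```python
# A "field" is a single piece of information, a "record" is one complete
# set of fields, and a "file" is a collection of records.

field_names = ["customer_id", "name", "city"]        # the fields

# One record: a complete set of fields for a single customer.
record = {"customer_id": 1, "name": "Alice", "city": "Berlin"}

# A file: a collection of such records.
customer_file = [
    {"customer_id": 1, "name": "Alice", "city": "Berlin"},
    {"customer_id": 2, "name": "Bob",   "city": "Madrid"},
]

print(len(customer_file))  # number of records in the file
```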
Let us look at the features of traditional databases in this section.
The traditional databases available today support only the storage of data. This data may come from a variety of sources: it may be unstructured, or it may originate from data marts, operational data stores, data warehouses, and so on. Every year, a massive amount of data is created, and it is critical for an organization to make decisions based on this large body of data. There are a few challenges, such as cost, latency, architecture, and complexity, in accessing these databases for analyzing Big Data in real time. These result in inadequate access to complete data, and a lag between gathering data and analyzing it.
Let us consider the following simple example to get an idea of the amount of data created on the Internet every minute:
With the evolution of e-commerce, it is essential for organizations to remain competitive. To achieve this, the data of the customers who visit a company's website has to be captured and analyzed. This analysis helps the company draw two major findings:
Customer behavior can be analyzed by analyzing customers' usage patterns. This helps companies understand the types of customers visiting their websites.
Customer satisfaction can be increased by catering to customers' requirements. These requirements can easily be identified by analyzing the usage patterns on the company's website.
Taken together, the preceding points give a company a huge business advantage and help it determine effective ways of advertising. This advantage can be achieved using clickstreams: organizations have already understood the importance of clickstream data and are building Business Intelligence on top of it, which helps them monitor the data, analyze it, and make decisions. There are several techniques for achieving better results in recording and analyzing this data; one of them is the use of data mining, column-oriented DBMSs, and integrated OLAP systems in combination with clickstreams.
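The kind of usage-pattern analysis described above can be illustrated with a minimal sketch. The clickstream events below are hypothetical; a real pipeline would read them from web server logs rather than a hard-coded list:

```python
from collections import Counter

# Hypothetical clickstream: (user_id, page) events captured from a website.
clickstream = [
    ("u1", "/home"), ("u1", "/products"), ("u2", "/home"),
    ("u1", "/checkout"), ("u2", "/products"), ("u3", "/home"),
]

# Usage pattern 1: how often each page is visited across all users.
page_hits = Counter(page for _, page in clickstream)

# Usage pattern 2: how many page views each user generated.
views_per_user = Counter(user for user, _ in clickstream)

print(page_hits.most_common(1))  # the single most popular page
```

Even this toy aggregation shows the two findings mentioned earlier: `page_hits` reveals customer behavior (which pages attract visitors), while `views_per_user` hints at how engaged individual customers are.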
It is a very well-known fact that data never sleeps; data creation is continuous and will only increase as the population grows. Analyzing huge amounts of data in real time remains only a dream when working with traditional databases.
Having looked at the features of traditional databases, now let us see the bottlenecks in using them.
As mentioned at the beginning of this section, traditional databases present a few challenges, such as latency, the cost involved, and the complexity of accessing them.
Databases store data on secondary storage devices. When applications are built on top of databases to analyze the data, disk I/O becomes the main bottleneck in data throughput. The CPU waits for the data to be loaded from disk into the CPU cache, which leads to very high latency. Many changes were made to existing systems to minimize disk access, which in turn minimized the number of pages loaded into main memory when processing a query. The following diagram shows the evolution of memory bandwidth and CPU clock speed over the years:
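A rough back-of-the-envelope calculation makes the latency gap concrete. The numbers below are assumed order-of-magnitude figures, not measurements of any particular system:

```python
# Assumed, order-of-magnitude latencies (illustrative values only):
CYCLE_NS = 0.5           # one CPU cycle at roughly 2 GHz
RAM_NS = 100             # a main-memory access
DISK_NS = 10_000_000     # a spinning-disk seek and read (~10 ms)

# How many cycles the CPU spends waiting per access:
cycles_waiting_ram = RAM_NS / CYCLE_NS
cycles_waiting_disk = DISK_NS / CYCLE_NS

print(int(cycles_waiting_ram))   # cycles stalled per RAM access
print(int(cycles_waiting_disk))  # cycles stalled per disk access
```

Under these assumptions, a single disk access costs tens of millions of CPU cycles, around five orders of magnitude more than a memory access, which is why query processing on disk-based databases is dominated by I/O wait rather than computation.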
With the advent of multicore CPUs and the declining cost of memory, computer architecture has changed in the recent past, making it possible to host an entire database in RAM.
In the current scenario, multicore CPUs (multiple CPUs on one chip or in one package) have become standard, enabling fast communication between processor cores. With these changes in technology, main memory is no longer a limited resource: there are servers that can hold up to 2 TB of system memory, which allows an entire database to be stored in RAM. The processors used in these servers have up to 64 cores and, in the near future, are expected to reach 128 cores. As the number of cores increases, CPUs can process huge amounts of data in parallel. When this happens, the performance bottleneck moves to the path between the CPU cache and main memory, rather than disk I/O.
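The divide-the-data-across-cores idea described above can be sketched as follows. This is a minimal illustration, assuming four workers; for genuinely CPU-bound work in Python one would typically use a process pool instead of threads, but the partitioning pattern is the same:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical workload: aggregate a large dataset by splitting it into
# one chunk per core and processing the chunks concurrently.
data = list(range(1_000_000))
workers = 4                                  # assume four cores for this sketch

chunk_size = -(-len(data) // workers)        # ceiling division
chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

# Each chunk is summed independently; the partial results are combined.
with ThreadPoolExecutor(max_workers=workers) as pool:
    total = sum(pool.map(sum, chunks))

print(total)
```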
In mainframes, transactional data and applications are stored on the same system, because the applications, the operating system, and the underlying databases share the same hardware resources. This means that transactions and reports cannot be processed concurrently. The problem here is cost: if we want to scale, we need another mainframe, which is very expensive. Meanwhile, the cost of memory has come down drastically, which has driven a revolution in increasing memory sizes. The following graph shows the fall in memory prices over the years:
From the preceding graph, it is very clear that the cost of memory has come down tremendously and is predicted to go down further in the near future.
Present-day applications running on traditional databases follow a 3-tier architecture. This is because the databases are not capable of performing calculations that involve complex logic or huge amounts of data; they are only capable of storing the data. One more layer is therefore needed between the database and presentation layers, the application layer, to take care of all the calculations through which business logic is implemented on the base fields, as shown in the following diagram:
Let us look at each layer in detail:
Presentation Layer: This is the top-most layer, and it allows users to manipulate data and provide input for querying. This input is passed on to the database layer through the application layer, and the results are passed back to the application layer, where business logic is applied. The presentation layer can be anything: a web browser, SAP GUI, SAP BEx, SAP Business Objects, and so on. These tools are installed on individual client machines.
Application Layer: This layer is also called the business layer. All business logic is executed in this layer; it controls the application's functionality by performing detailed processing. It can be installed on one machine or distributed across several systems.
Database Layer: This layer receives requests from the business layer and performs the required operations on the database. It contains the database servers that store the data. Data is stored independently of the application layer and business logic. The database layer remains an internal interface and is not exposed to end users; the application layer has to access the data in the database only through this layer.
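The three layers described above can be sketched as a minimal example. The class names, sample rows, and the tax-calculation business logic are all hypothetical, chosen only to show how a request flows down through the layers and back:

```python
class DatabaseLayer:
    """Stores data; exposed only to the application layer, never to end users."""
    def __init__(self):
        self._rows = [{"product": "laptop", "price": 900},
                      {"product": "mouse",  "price": 20}]

    def query(self, product):
        return [r for r in self._rows if r["product"] == product]


class ApplicationLayer:
    """Executes business logic on data fetched from the database layer."""
    def __init__(self, db):
        self._db = db

    def price_with_tax(self, product, rate=0.2):
        # Business logic lives here, not in the database.
        return [round(r["price"] * (1 + rate), 2)
                for r in self._db.query(product)]


class PresentationLayer:
    """Takes user input and displays results; talks only to the application layer."""
    def __init__(self, app):
        self._app = app

    def show(self, product):
        for price in self._app.price_with_tax(product):
            print(f"{product}: {price}")


# The request flows UI -> application -> database, and the result flows back.
ui = PresentationLayer(ApplicationLayer(DatabaseLayer()))
ui.show("laptop")
```

Note that the presentation layer never touches `DatabaseLayer` directly, which mirrors the rule stated above that data is accessed only through the application layer.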
The basics of RDBMS concepts are available at http://www.srnr.arizona.edu/rnr/rnr417/rdbms.pdf