SAP HANA Cookbook
Overview of this book

SAP HANA is a real-time application platform delivered as a multi-purpose, in-memory appliance. Decision makers across an organization can gain instant insight into business operations: all available data can be analyzed, and you can react rapidly to changing business conditions. The real-time platform not only empowers business users and top management but also gives them the capability to make decisions in real time.

This practical and comprehensive guide helps you understand the power of SAP HANA's real-time, in-memory capabilities. It provides step-by-step instructions for exploiting the features of the SAP HANA database, enabling you to harness the full potential of the technology. You will gain an understanding of real-time replication, effective data loading from various sources, and how to create reusable objects such as models and reports.

Use this guide to enable or transform your business landscape by implementing SAP HANA to meet your business requirements. The book shows you how to load data from different types of systems, create models in SAP HANA, and consume data for decision making. It covers the tools used at each stage: creating models with SAP HANA Studio, and consuming data with reporting tools such as SAP BusinessObjects and SAP Lumira. It also explains the architecture of SAP HANA in depth, helping you understand SAP HANA as an appliance, that is, a combination of hardware and software.

The book covers best practices for leveraging SAP HANA's in-memory technology to transform data into insightful information, as well as technology landscaping, solution architecture, connectivity, data loading, and setting up the environment for modeling (including the setup of SAP HANA Studio). If you intend to start a career as an SAP HANA modeler, this book is the perfect starting point.

Explaining traditional databases and bottlenecks


Traditional databases are arranged by fields, records, and files. A field is defined as a single piece of information; a record is one complete set of fields; and a file is a collection of records. This recipe explains traditional databases and the bottlenecks in using them.
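These three terms can be made concrete with a minimal Python sketch (hypothetical customer data, for illustration only): a field is one value, a record is one complete set of fields, and a file is a collection of records.

```python
# A field is a single piece of information (for example, one name or one city).
# A record is one complete set of fields; a file is a collection of records.
record1 = {"id": 1, "name": "Alice", "city": "Berlin"}   # one record
record2 = {"id": 2, "name": "Bob",   "city": "Madrid"}   # another record

customer_file = [record1, record2]  # a "file": a collection of records

# Accessing a single field of a single record:
print(customer_file[0]["name"])  # Alice
```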

How it works…

Let us look at the features of traditional databases in this section.

Traditional databases

The traditional databases available today primarily support the storage of data. This data may come from a variety of sources: it may be unstructured, or it may come from data marts, operational data stores, data warehouses, and so on. Every year, a massive amount of data is created, and it is critical for an organization to make decisions based on this large body of data. There are challenges, such as cost, latency, architecture, and complexity, in accessing these databases to analyze Big Data in real time. These challenges result in inadequate access to the complete data, and in a lag between gathering data and analyzing it.

Consider, as a simple example, the amount of data created on the Internet every minute.

With the evolution of e-commerce, it is essential for organizations to remain competitive. To achieve this, the data of the customers who visit a company's website has to be captured and analyzed. This analysis helps the company draw two major findings:

  • Customer behavior can be studied by analyzing customers' usage patterns. This helps companies understand the types of customers visiting their websites.

  • Customer satisfaction can be increased by catering to customers' requirements, which can easily be identified by analyzing how they use the company's website.

Taken together, these findings give a company a significant business advantage and help it determine effective ways of advertising. This advantage can be achieved using clickstreams. Organizations have already understood the importance of clickstream data and are building Business Intelligence on top of it to monitor, analyze, and act on the data. Several techniques improve the results of recording and analyzing this data; one approach combines data mining, a column-oriented DBMS, and integrated OLAP systems with clickstreams.
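As an illustration (not SAP HANA code), the following Python sketch stores hypothetical clickstream events in a column-oriented layout, one list per column. An aggregation then scans only the columns it needs, which is the key advantage of a column-oriented DBMS for this kind of analysis:

```python
from collections import Counter

# Hypothetical clickstream data in a column-oriented layout:
# each column is stored contiguously as its own list.
clicks = {
    "user":  ["u1", "u2", "u1", "u3", "u2", "u1"],
    "page":  ["/home", "/home", "/cart", "/home", "/cart", "/checkout"],
    "msecs": [120, 340, 80, 200, 150, 90],
}

# Counting visits per page scans only the "page" column;
# the "user" and "msecs" columns are never touched.
visits = Counter(clicks["page"])
print(visits["/home"])  # 3

# Average time spent on /cart: scans two columns, still not all three.
cart_times = [t for p, t in zip(clicks["page"], clicks["msecs"]) if p == "/cart"]
print(sum(cart_times) / len(cart_times))  # 115.0
```

A row-oriented store would instead read every field of every record to answer either query; the column layout touches only the relevant columns.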

It is a well-known fact that data never sleeps; the creation of data is continuous and will only increase with population growth. Analyzing huge amounts of data in real time remains only a dream with traditional databases.

There's more…

Having looked at the features of traditional databases, now let us see the bottlenecks in using them.

Bottlenecks

As mentioned in the beginning of the section, there are a few challenges in traditional databases, such as latency, the cost involved, and complexity in accessing databases.

Latency

Databases store data on secondary storage devices. When applications built on these databases analyze the data, disk I/O is the main bottleneck in data throughput: the CPU waits for data to be loaded from disk into the CPU cache, which leads to very high latency. Many changes have been made to existing systems to minimize disk access, in particular to reduce the number of pages loaded into main memory when processing a query. The following diagram shows the evolution of memory bandwidth and CPU clock speed over the years:
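To put the latency gap in perspective, the following back-of-the-envelope Python sketch compares one million random accesses from disk versus main memory, using widely quoted approximate figures (a spinning-disk seek on the order of 10 ms, a main-memory access on the order of 100 ns; exact values vary by hardware):

```python
# Approximate access latencies in seconds; real values vary by hardware.
DISK_SEEK = 10e-3      # ~10 ms per random seek on a spinning disk
RAM_ACCESS = 100e-9    # ~100 ns per main-memory access

accesses = 1_000_000
disk_time = accesses * DISK_SEEK    # total time if every access hits disk
ram_time = accesses * RAM_ACCESS    # total time if data is already in RAM

print(f"disk: {disk_time:.0f} s, ram: {ram_time:.1f} s")
print(f"memory is ~{disk_time / ram_time:,.0f}x faster here")
```

Even with these rough numbers, the disk-bound workload takes hours while the in-memory one takes a fraction of a second, which is why minimizing disk access dominates traditional database design.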

With the advent of multicore CPUs and the declining cost of memory, computer architecture has changed in the recent past by hosting an entire database in the RAM.

In the current scenario, multicore CPUs (multiple CPU cores on one chip or in one package) have become standard, enabling fast communication between processor cores. With these changes in technology, main memory is no longer a limited resource: there are servers with up to 2 TB of system memory, which makes it possible to hold an entire database in RAM. The processors used in these servers have up to 64 cores and are expected to reach 128 cores in the near future. As the number of cores increases, CPUs can process huge amounts of data in parallel. At that point, the performance bottleneck shifts from disk I/O to the bandwidth between main memory and the CPU cache.
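The way more cores help can be sketched as a partition-and-merge pattern: each core scans its own slice of an in-memory column, and the partial results are combined. The Python illustration below uses a thread pool purely to model the pattern (Python threads do not truly parallelize CPU-bound work because of the GIL, so this shows the structure, not a real speedup):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each "core" scans its own partition of the column independently.
    return sum(chunk)

column = list(range(1_000_000))  # an in-memory column of values
cores = 4
size = len(column) // cores
chunks = [column[i * size:(i + 1) * size] for i in range(cores)]

with ThreadPoolExecutor(max_workers=cores) as pool:
    partials = list(pool.map(partial_sum, chunks))

total = sum(partials)        # merge step: combine the partial results
print(total == sum(column))  # True
```

In a real in-memory column store, each partition scan runs on its own physical core, so the scan time shrinks roughly in proportion to the core count until memory bandwidth becomes the limit.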

Cost

In mainframes, transactional data and applications reside on the same system: applications, the operating system, and the underlying databases share the same hardware resources. This means that transactions and reports cannot be processed concurrently. The problem here is cost: scaling requires another mainframe, which is very expensive. Meanwhile, the cost of memory has come down drastically, which has driven a revolution in memory sizes. The following graph shows the fall in memory prices over the years:

From the preceding graph, it is very clear that the cost of memory has come down tremendously and is predicted to go down further in the near future.

Architecture

Present-day applications running on traditional databases follow a 3-tier architecture. This is because such databases are not capable of performing calculations that involve complex logic or huge amounts of data; they are only capable of storing it. A further layer is therefore needed between the database and presentation layers, the application layer, to take care of all the calculations that implement business logic on the base fields, as shown in the following diagram:

Let us look at each layer in detail:

  • Presentation Layer: This is the topmost layer; it allows users to enter the data that forms the input to queries. This input is passed to the database layer through the application layer, and the results are passed back to the application layer, where business logic is applied. The presentation layer can be anything: a web browser, SAP GUI, SAP BEx, SAP BusinessObjects, and so on. These tools are installed on individual client machines.

  • Application Layer: This layer is also called the business layer; all business logic is executed here. It controls the application's functionality by performing detailed processing, and it can be installed on one machine or distributed across several systems.

  • Database Layer: This layer receives requests from the business layer and performs the required operations on the database. It contains the database servers that store the data. Data is stored independently of the application layer and the business logic. The database layer acts as an internal interface and is not exposed to end users; the application layer must access data only through this layer.
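The three tiers can be sketched as a toy Python program (hypothetical class and method names, for illustration only): the presentation layer talks only to the application layer, and only the application layer touches the database layer.

```python
class DatabaseLayer:
    """Stores data; exposed only to the application layer."""
    def __init__(self):
        self._rows = {1: {"name": "Alice", "credit": 500}}

    def fetch(self, key):
        return self._rows.get(key)


class ApplicationLayer:
    """Executes business logic between presentation and database."""
    def __init__(self, db):
        self._db = db

    def customer_report(self, key):
        row = self._db.fetch(key)
        if row is None:
            return "unknown customer"
        # Business logic lives in this layer, not in the database.
        rating = "good" if row["credit"] >= 400 else "poor"
        return f"{row['name']}: {rating} credit"


class PresentationLayer:
    """Accepts user input and displays results; never touches the DB."""
    def __init__(self, app):
        self._app = app

    def show(self, key):
        return self._app.customer_report(key)


ui = PresentationLayer(ApplicationLayer(DatabaseLayer()))
print(ui.show(1))  # Alice: good credit
```

Note how the database layer only stores and fetches rows; the credit-rating rule sits in the application layer, which is exactly the division of labor the 3-tier architecture enforces.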
