
Storing data – row storage


As seen in the architectural diagram of the SAP HANA IMCE, there are two relational engines at the heart of the IMCE. These relational engines are in-memory, meaning that their primary data persistence is in RAM. The row store keeps the data in rows and, in this respect, behaves like a traditional database, except that the data always resides in RAM. The row store engine is highly optimized for write operations and is interfaced from the calculation/execution layer. All operations on row tables are processed by this row engine. When a query is fired against the SAP HANA database, the optimizer decides in which engine the query has to be executed. For example, there may be functions that the OLAP engine does not support but the row engine does. In that case, the optimizer sends all the data to the row engine and gets the task done. This can be more expensive, as the column data has to be converted to row data before it is processed by the row engine. One such example is a non-equi join: non-equi joins are executed only by the row engine, as they are not supported by the column engine.
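To make this concrete, the following sketch shows what a non-equi join looks like; the table and column names (sales_orders, discount_bands, and so on) are purely illustrative and not part of any standard schema. Because the join condition uses inequality operators instead of =, such a statement is handed to the row engine:

    -- Illustrative non-equi join: match each order to the discount band
    -- whose value range contains the order value. The inequality predicates
    -- make this a non-equi join, which is processed by the row engine.
    SELECT o.order_id,
           o.order_value,
           d.discount_pct
    FROM   sales_orders o
    INNER  JOIN discount_bands d
           ON  o.order_value >= d.min_value
           AND o.order_value <  d.max_value;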

Now, let us see the internal architecture of the SAP HANA row store engine in the following diagram:

The main functions of the different components are explained as follows:

  • Transactional Version Memory: This memory section holds the temporary versions of data. All the recent versions of changed records are maintained in this section. This data is required by MVCC (Multiversion Concurrency Control). For concurrency control, SAP HANA implements the classic MVCC principle to provide concurrent access to the database. Data reading and writing happen in parallel on the database. When data is being written while some users are reading the same data, there is a fair chance that the data read is inconsistent. To avoid this, techniques such as locking and MVCC are implemented.

    Locking is an effective way of handling the concurrency problem, but it takes a lot of time. MVCC, however, is very effective in handling the latest versions of data. When a query hits the database, the data as it existed at that instant of time is returned. Changes made by other transactions are not reflected in the results until those transactions are committed to the database.

    When there is a new set of data to be updated, MVCC does not update the old data set in place. Instead, it marks the old data as outdated and writes the new set of data elsewhere. In this process, many versions of the data are stored, only one of them being the latest. Hence, a considerable amount of memory is required to maintain these data versions. MVCC in combination with a time-travel mechanism allows temporal queries inside the relational engine. (A short SQL sketch after this list illustrates the read behavior under MVCC.)

  • Segments: Segments contain the actual data in the form of pages. All the data in row store tables is stored in segments, in the form of pages. The memory pages are organized using the concept of a linked list, one of the fundamental data structures: a row store table is a linked list of memory pages, and the pages are grouped into segments. The typical size of each page is 16 KB.

  • Page Manager: Page Manager is responsible for memory allocation. It also keeps track of the used pages and the free pages available.

  • Version Memory Consolidation: As discussed earlier, different versions of the data are stored in the transactional version memory, and MVCC takes care of data consistency. When a transaction is committed, it has to be stored in a database table, a row table in this case; Version Memory Consolidation takes care of this activity. The recent versions of the changed records are moved from the transactional version memory to the persisted segment on a commit-ID basis. After the recent version is moved to the persisted segment, all the temporary data and the different versions created by MVCC have to be cleared from the transactional version memory for effective utilization of memory. This activity is also taken care of by Version Memory Consolidation. Hence, Version Memory Consolidation can be considered the garbage collector for MVCC.

  • Persistence Layer: The Persistence Layer is used for write purposes. It is called for log write operations and checkpoints. All the database logs are maintained by the log replay/undo agent. After the data has been reloaded into the data area of the database, the logs are replayed from the log backups and the log area. The database comes back online only after these actions are completed.
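Before going into the persistence and recovery details, here is the SQL sketch promised in the Transactional Version Memory item. It is purely illustrative, assuming a hypothetical employees table with emp_id and salary columns, and shows the read consistency that MVCC provides across two concurrent sessions:

    -- Session 1 (writer): change a record but do not commit yet.
    UPDATE employees SET salary = 6000 WHERE emp_id = 100;

    -- Session 2 (reader): still sees the previously committed value (say, 5000),
    -- because MVCC serves the version that was current when the read started.
    SELECT salary FROM employees WHERE emp_id = 100;

    -- Session 1: commit. The new version becomes the latest committed version,
    -- and the superseded one becomes a candidate for version memory consolidation.
    COMMIT;

    -- Session 2: a fresh query now returns 6000.
    SELECT salary FROM employees WHERE emp_id = 100;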

The redo log information is located in the log backups and in the log area of the database. After the data area has been restored, the recovery process checks the log positions recorded in the data backup. In order to replay the logs, the log position must be available either in the log backups or in the log area, and the system must be able to find the corresponding offset in the log. If the backup being used for recovery is not the latest one, we must ensure that the offset needed for the backups is available in the log backups or in the log area. Unless the required offset is present, log replay cannot be performed.

During recovery, if the system cannot find the log offset in the log area, we see the error message "log and data must be compatible". In this error situation, we must use the clear log option during recovery to get the system online again. Any logs in the log area are then ignored during the log replay phase. Even if the replay of the log area is not performed, the system ends up in a consistent data state, because the data area holds all the undo log information and it is reloaded into the data area during recovery. The replication server will not have a restart point if the log replay has not taken place; when this situation occurs, it is essential to refer to the replication server documentation for information on how to solve this problem.

If we perform a recovery without implicit log replay, the log area is formatted. The log backups are replayed, but not the logs in the log area. In this situation, the .ini files can be recovered, although their recovery is not mandatory. If the .ini files are recovered, parameter changes made after the backup will not be recovered and are therefore lost.

When we use the clear log option, the following actions will be performed:

  • The data changes made after the backup will be lost; as the log entries get cleared from the system, there is no more information available to perform a redo

  • The transactions that are not yet committed in the backup area will be rolled back (undo)

The clear log option has to be used only as an exception, when the log replay of the log area cannot take place.

The following are examples of situations where the log replay may not be possible:

  • When the log area is corrupted and the log information is no longer available

  • When the log backup that links the latest available log backup to the log area is missing

  • While performing a disaster recovery, if the log available in the log backups and the log in the log area are not compatible
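For reference, a recovery command that uses the clear log option might look like the following sketch. This is only an illustration: the backup name and path are placeholders, and the exact syntax of the RECOVER DATA statement and the recovery procedure should be checked against the SAP HANA administration documentation for your release before use.

    -- Illustrative only: recover from a complete data backup and clear the
    -- log area, so no log replay is performed. The path is a placeholder.
    RECOVER DATA USING FILE ('/backup/data/COMPLETE_DATA_BACKUP') CLEAR LOG;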

Let us complete our learning about all the components of the row store engine:

  • Write Operations: Write operations mainly go to the Transactional Version Memory, where all the versions are maintained by MVCC and finally written to the Persisted Segment. The Insert operation also writes the data to the Persisted Segment.

  • Persisted Segment: Persisted Segment contains data that is used in ongoing active transactions and data that has been committed before any active transaction was started.

  • Index: Each row store table has a primary index. ROW ID is a number that specifies the memory segment and page for each record; it contains the segment address and the offset within that segment, so a record is located using Segment Address + Segment Offset. The primary index maps the primary key of the table to the ROW ID. With the ROW ID, the memory page for a table record can be obtained, and that page can then be searched for the record based on the primary key. As mentioned earlier, the ROW ID is part of the primary index of the table.

Indices are never persisted; they are stored in memory only and are never written to disk. When tables are loaded into memory on system startup, the indices for all the row tables are filled.
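Because these indexes exist only in memory, they can be observed at runtime through the monitoring views. The following query is a hedged sketch that assumes the M_RS_INDEXES monitoring view is available in your revision; it simply lists the in-memory indexes of the row store tables:

    -- List the row store indexes currently held in memory (read-only query).
    SELECT schema_name,
           table_name,
           index_name
    FROM   m_rs_indexes
    ORDER  BY schema_name, table_name;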

We can create secondary indices if required (a short SQL sketch follows the list below). It is better to go with row storage in the following situations:

  • It is recommended when the tables contain a low volume of data

  • It is used when the application request has to access the entire row

  • It is used when the data has to be processed record by record
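As mentioned above the list, a secondary index can be added to a row table when needed. The following sketch is illustrative only: the table name, column names, and index name are made up, and the row store is requested explicitly with CREATE ROW TABLE:

    -- Create a table explicitly in the row store; the primary key provides
    -- the primary index described earlier.
    CREATE ROW TABLE customer_contacts (
        contact_id   INTEGER       PRIMARY KEY,
        customer_id  INTEGER,
        phone_number NVARCHAR(20)
    );

    -- Optional secondary index on a column that is frequently filtered on.
    CREATE INDEX idx_contacts_customer ON customer_contacts (customer_id);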

For more information, refer to the following link:

http://scn.sap.com/community/developer-center/hana/blog/2012/08/16/in-a-relationship-with-hana--part-3