Book Image

SAP Data Services 4.x Cookbook

Book Image

SAP Data Services 4.x Cookbook

Overview of this book

Want to cost effectively deliver trusted information to all of your crucial business functions? SAP Data Services delivers one enterprise-class solution for data integration, data quality, data profiling, and text data processing. It boosts productivity with a single solution for data quality and data integration. SAP Data Services also enables you to move, improve, govern, and unlock big data. This book will lead you through the SAP Data Services environment to efficiently develop ETL processes. To begin with, you’ll learn to install, configure, and prepare the ETL development environment. You will get familiarized with the concepts of developing ETL processes with SAP Data Services. Starting from smallest unit of work- the data flow, the chapters will lead you to the highest organizational unit—the Data Services job, revealing the advanced techniques of ETL design. You will learn to import XML files by creating and implementing real-time jobs. It will then guide you through the ETL development patterns that enable the most effective performance when extracting, transforming, and loading data. You will also find out how to create validation functions and transforms. Finally, the book will show you the benefits of data quality management with the help of another SAP solution—Information Steward.
Table of Contents (19 chapters)
SAP Data Services 4.x Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Introduction


Simply put, Extract-Transform-Load (ETL) is an engine of any data warehouse. The nature of the ETL system is straightforward:

  • Extract data from operational databases/systems

  • Transform data according to the requirements of your data warehouse so that the different pieces of data can be used together

  • Apply data quality transformation methods in order to cleanse data and ensure that it is reliable before it gets loaded into a data warehouse

  • Load conformed data into a data warehouse so that end users can access it via reporting tools, using client applications directly, or with the help of SQL-based query tools

While your data warehouse delivery structures or data marts represent the frontend or, in other words, what users see when they access the data, the ETL system itself is a backbone backend solution that does all the work of moving data and getting it ready in time for users to use. Building the ETL system can be a really challenging task, and though it is not part of the data warehouse data structures, it is definitely the key factor in defining the success of the data warehouse solution as a whole. In the end, who wants to use a data warehouse where the data is unreliable, corrupted, or sometimes even missing? This is exactly what ETL is responsible for getting right.

The following data structure types most often used in ETL development to move data between sources and targets are flat files, XML datasets, and DBMS tables, both in normalized schemas and dimensional data models. When choosing an ETL solution, you might face two simple choices: building a handcoded ETL solution or using a commercial one.

The following are some advantages of a handcoded ETL solution:

  • A programming language allows you to build your own sophisticated transformations

  • You are more flexible in building the ETL architecture as you are not limited by the vendor's ETL abilities

  • Sometimes, it can be a cheap way of building a few simplistic ETL processes, whereas buying an ETL solution from a vendor can be overkill

  • You do not have to spend time learning the commercial ETL solution's architecture and functionality

Here are some advantages of a commercial ETL solution:

  • This is more often a simpler, faster, and cheaper development option as a variety of existing tools allow you to build a very sophisticated ETL architecture quickly

  • You do not have to be a professional programmer to use the tool

  • It automatically manages ETL metadata by collecting, storing, and presenting it to the ETL developer, which is another important aspect of any ETL solution

  • It has a huge range of additional ready-to-use functionality, from built-in schedulers to various connectors to existing systems, built-in data lineages, impact analysis reports, and many others

In the majority of DWH projects, the commercial ETL solution from a specific vendor, in spite of the higher immediate cost, eventually saves you a significant amount of money on the development and maintenance of ETL code.

SAP Data Services is an ETL solution provided by SAP and is part of the Enterprise Information Management product stack, which also includes SAP Information Steward; we will review this in one of the last chapters of this book.