SAP Data Services 4.x Cookbook

Overview of this book

Want to cost-effectively deliver trusted information to all of your crucial business functions? SAP Data Services delivers one enterprise-class solution for data integration, data quality, data profiling, and text data processing. It boosts productivity with a single solution for data quality and data integration. SAP Data Services also enables you to move, improve, govern, and unlock big data. This book will lead you through the SAP Data Services environment to efficiently develop ETL processes. To begin with, you’ll learn to install, configure, and prepare the ETL development environment. You will become familiar with the concepts of developing ETL processes with SAP Data Services. Starting from the smallest unit of work, the data flow, the chapters will lead you to the highest organizational unit, the Data Services job, revealing the advanced techniques of ETL design. You will learn to import XML files by creating and implementing real-time jobs. The book will then guide you through the ETL development patterns that enable the most effective performance when extracting, transforming, and loading data. You will also find out how to create validation functions and transforms. Finally, the book will show you the benefits of data quality management with the help of another SAP solution, Information Steward.
Table of Contents (19 chapters)
SAP Data Services 4.x Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Preface

SAP Data Services delivers an enterprise-class solution to build data integration processes as well as perform data quality and data profiling tasks, allowing you to govern your data in a highly efficient way.

Some of the tasks that Data Services helps accomplish include migrating data between databases or applications, extracting data from various source systems into flat files, cleansing data, transforming data using either common database-like functions or complex custom-built functions that are created using an internal scripting language, and, of course, loading data into your data warehouse or external systems. SAP Data Services has an intuitive, user-friendly graphical interface, allowing you to access all its powerful Extract, Transform, and Load (ETL) capabilities from a single Designer tool. However, getting started with SAP Data Services can be difficult, especially for people who have little or no experience in ETL development. The goal of this book is to guide you through easy-to-understand examples of building your own ETL architecture. The book can also be used as a reference to perform specific tasks, as it provides real-world examples of using the tool to solve data integration problems.

What this book covers

Chapter 1, Introduction to ETL Development, explains what Extract, Transform, and Load (ETL) processes are, and what role Data Services plays in ETL development. It includes the steps to configure the database environment used in recipes of the book.

Chapter 2, Configuring the Data Services Environment, explains how to install and configure all Data Services components and applications. It introduces the Data Services development GUI—the Designer tool—with the simple example of "Hello World" ETL code.

Chapter 3, Data Services Basics – Data Types, Scripting Language, and Functions, introduces the reader to the Data Services internal scripting language. It explains the various categories of functions that are available in Data Services, and gives the reader an example of how the scripting language can be used to create custom functions.

Chapter 4, Dataflow – Extract, Transform, and Load, introduces the most important processing unit in Data Services, the dataflow object, and the most useful types of transformations that can be performed inside a dataflow. It gives the reader examples of extracting data from source systems and loading data into target data structures.

Chapter 5, Workflow – Controlling Execution Order, introduces another Data Services object, workflow, which is used to group other workflows, dataflows, and script objects into execution units. It explains the conditional and loop structures available in Data Services.

Chapter 6, Job – Building the ETL Architecture, brings the reader to the job object level and reviews the steps used in the development process to make a successful and robust ETL solution. It covers the monitoring and debugging functionality available in Data Services and embedded audit features.

Chapter 7, Validating and Cleansing Data, introduces the concepts of validating methods, which can be applied to the data passing through the ETL processes in order to cleanse and conform it according to the defined Data Quality standards.

Chapter 8, Optimizing ETL Performance, is the first of the advanced chapters, which explain complex ETL development techniques. This particular chapter helps the user understand how existing processes can be optimized further in Data Services so that they run quickly and efficiently, consuming as few computing resources and as little execution time as possible.

Chapter 9, Advanced Design Techniques, guides the reader through advanced data transformation techniques. It introduces concepts of Change Data Capture methods that are available in Data Services, pivoting transformations, and automatic recovery concepts.

Chapter 10, Developing Real-time Jobs, introduces the concept of nested structures and the transforms that work with nested structures. It covers the main aspects of how they can be created and used in Data Services real-time jobs. It also introduces a new Data Services component, Access Server.

Chapter 11, Working with SAP Applications, is dedicated to the topic of reading data from and loading data into SAP systems, using the SAP ERP system as an example. It presents a real-life use case of loading data into an SAP ERP system module.

Chapter 12, Introduction to Information Steward, covers another SAP product, Information Steward, which accompanies Data Services and provides a comprehensive view of the organization's data, and helps validate and cleanse it by applying Data Quality methods.

What you need for this book

To use the examples given in this book, you will need to download the following software products and make sure that you are licensed to use them:

  • SQL Server Express 2012

  • SAP Data Services 4.2 SP4 or higher

  • SAP Information Steward 4.2 SP4 or higher

  • SAP ERP (ECC)

  • SoapUI 5.2.0

Who this book is for

The book will be useful to application developers and database administrators who want to get familiar with ETL development using SAP Data Services. It can also be useful to ETL developers or consultants who want to improve and extend their knowledge of this tool. The book can also be useful to data and business analysts who want to take a peek at the backend of BI development. The only requirement of this book is that you are familiar with the SQL language and general database concepts. Knowledge of any kind of programming language will be a benefit as well.

Sections

In this book, you will find several headings that appear frequently (Getting ready, How to do it, How it works, There's more, and See also).

To give clear instructions on how to complete a recipe, we use these sections as follows:

Getting ready

This section tells you what to expect in the recipe, and describes how to set up any software or any preliminary settings required for the recipe.

How to do it…

This section contains the steps required to follow the recipe.

How it works…

This section usually consists of a detailed explanation of what happened in the previous section.

There's more…

This section provides additional information about the recipe to deepen the reader's understanding of it.

See also

This section provides helpful links to other useful information for the recipe.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "We can include other contexts through the use of the include directive."

A block of code is set as follows:

select * 
from dbo.al_langtext txt
  JOIN dbo.al_parent_child pc
  on txt.parent_objid = pc.descen_obj_key
where  
  pc.descen_obj = 'WF_continuous';

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

AlGUIComment ("ActaName_1" = 'RSavedAfterCheckOut', "ActaName_2" = 'RDate_created', "ActaName_3" = 'RDate_modified', "ActaValue_1" = 'YES', "ActaValue_2" = 'Sat Jul 04 16:52:33 2015', "ActaValue_3" = 'Sun Jul 05 11:18:02 2015', "x" = '-1', "y" = '-1')
CREATE PLAN WF_continuous::'7bb26cd4-3e0c-412a-81f3-b5fdd687f507'( )
DECLARE
  $l_Directory VARCHAR(255) ;
  $l_File VARCHAR(255) ;
BEGIN
 AlGUIComment ("UI_DATA_XML" = '<UIDATA><MAINICON><LOCATION><X>0</X><Y>0</Y></LOCATION><SIZE><CX>216</CX><CY>-179</CY></SIZE></MAINICON><DESCRIPTION><LOCATION><X>0</X><Y>-190</Y></LOCATION><SIZE><CX>200</CX><CY>200</CY></SIZE><VISIBLE>0</VISIBLE></DESCRIPTION></UIDATA>', "ui_display_name" = 'script', "ui_script_text" = '$l_Directory = \'C:\\\\AW\\\\Files\\\\\';
$l_File = \'flag.txt\';

$g_count = $g_count + 1;

print(\'Execution #\'||$g_count);
print(\'Starting  \'||workflow_name()||\' ...\');
sleep(10000);
print(\'Finishing \'||workflow_name()||\' ...\');', "x" = '116', "y" = '-175')
BEGIN_SCRIPT
$l_Directory = 'C:\\AW\\Files\\';$l_File = 'flag.txt';$g_count = ($g_count + 1);print(('Execution #' || $g_count));print((('Starting  ' || workflow_name()) || ' ...'));sleep(10000);print((('Finishing ' || workflow_name()) || ' ...'));END
END
 SET ("loop_exit" = 'fn_check_flag($l_Directory, $l_File)', "loop_exit_option" = 'yes', "restart_condition" = 'no', "restart_count" = '10', "restart_count_option" = 'yes', "workflow_type" = 'Continuous')
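
For readers unfamiliar with this notation, the control flow of the listing above can be approximated outside Data Services. The following Python sketch is an illustration only, not Data Services code: it assumes that fn_check_flag (whose body is not shown here) returns true once the flag file exists in the given directory, and that the loop_exit condition is evaluated after each run of the script. The names check_flag, run_continuous, delay_seconds, max_runs, and on_cycle are hypothetical and exist only for this example; max_runs is a safety cap for the sketch and does not correspond to the restart_count option.

```python
import os
import time


def check_flag(directory, filename):
    """Hypothetical stand-in for fn_check_flag: true once the flag file exists."""
    return os.path.exists(os.path.join(directory, filename))


def run_continuous(directory, filename, delay_seconds=10.0, max_runs=100, on_cycle=None):
    """Repeat the script body until the flag file appears (or max_runs is reached).

    Returns the number of cycles that were executed.
    """
    count = 0  # plays the role of $g_count in the listing
    while count < max_runs:
        count += 1
        print(f"Execution #{count}")
        print("Starting  WF_continuous ...")
        if on_cycle is not None:
            on_cycle(count)          # optional hook, used here only for demonstration
        time.sleep(delay_seconds)    # the listing waits via sleep(10000)
        print("Finishing WF_continuous ...")
        # Assumed: the exit condition is checked after each cycle completes.
        if check_flag(directory, filename):
            break
    return count
```

Calling run_continuous with the directory and file name used in the listing would keep cycling until a file named flag.txt is created in that directory by some other process.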

Any command-line input or output is written as follows:

setup.exe SERVERINSTALL=Yes

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Open the workflow properties again to edit the continuous options using the Continuous Options tab."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail us and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from: https://www.packtpub.com/sites/default/files/downloads/6565EN_Graphics.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us, and we will do our best to address the problem.