SAP Data Services 4.x Cookbook

Overview of this book

Want to deliver trusted information cost-effectively to all of your crucial business functions? SAP Data Services delivers one enterprise-class solution for data integration, data quality, data profiling, and text data processing. It boosts productivity with a single solution for data quality and data integration. SAP Data Services also enables you to move, improve, govern, and unlock big data. This book will lead you through the SAP Data Services environment so that you can develop ETL processes efficiently. To begin with, you'll learn to install, configure, and prepare the ETL development environment. You will become familiar with the concepts of developing ETL processes with SAP Data Services. Starting from the smallest unit of work, the dataflow, the chapters will lead you to the highest organizational unit, the Data Services job, revealing advanced techniques of ETL design. You will learn to import XML files by creating and implementing real-time jobs. The book will then guide you through the ETL development patterns that enable the most effective performance when extracting, transforming, and loading data. You will also find out how to create validation functions and transforms. Finally, the book will show you the benefits of data quality management with the help of another SAP solution, Information Steward.

Optimizing dataflow execution – the Data_Transfer transform

The Data_Transfer transform is purely an optimization tool: it helps you push down resource-consuming operations and transformations, such as JOIN and GROUP BY, to the database level.

Getting ready

  1. Take the dataflow from the Loading data from a flat file recipe in Chapter 4, Dataflow – Extract, Transform, and Load. This dataflow loads the Friends_*.txt file into a STAGE.FRIENDS table.

  2. Modify the Friends_30052015.txt file and remove all lines except the ones about Jane and Dave.

  3. In the dataflow, add another source table, OLTP.PERSON, and join it to the source file object in the Query transform on the first-name field. Propagate the PERSONTYPE and LASTNAME columns from the source OLTP.PERSON table into the output schema of the Query transform.
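To make the join in step 3 concrete, here is a minimal Python sketch of what the Query transform computes when the join is not pushed down and runs in the engine's memory instead. It is an illustration only, not Data Services code. The file layout, the NAME, CODE, and FIRSTNAME column names, and all sample values are assumptions made for this example; only PERSONTYPE and LASTNAME come from the recipe.

```python
import csv
import io

# Made-up sample data: the real Friends_30052015.txt layout and the contents
# of OLTP.PERSON are not shown in the recipe, so these rows are assumptions.
FRIENDS_TXT = "NAME,CODE\nJane,F1\nDave,F2\n"
PERSON_ROWS = [
    {"FIRSTNAME": "Jane", "LASTNAME": "Doe", "PERSONTYPE": "EM"},
    {"FIRSTNAME": "Dave", "LASTNAME": "Roe", "PERSONTYPE": "SC"},
    {"FIRSTNAME": "Alan", "LASTNAME": "Poe", "PERSONTYPE": "IN"},
]


def join_friends_to_person(friends_txt, person_rows):
    """Inner-join file rows to person rows on first name.

    Only PERSONTYPE and LASTNAME are propagated from the person rows,
    mirroring the columns named in step 3.
    """
    # Index the table rows by first name so each file row needs one lookup.
    by_first_name = {}
    for person in person_rows:
        by_first_name.setdefault(person["FIRSTNAME"], []).append(person)

    joined = []
    for friend in csv.DictReader(io.StringIO(friends_txt)):
        for person in by_first_name.get(friend["NAME"], []):
            joined.append({
                **friend,
                "PERSONTYPE": person["PERSONTYPE"],
                "LASTNAME": person["LASTNAME"],
            })
    return joined


result = join_friends_to_person(FRIENDS_TXT, PERSON_ROWS)
```

In this sketch every row of both inputs has to be read into the process before the join can run. That is the cost the rest of the recipe tries to avoid by moving the work into the database.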

How to do it…

Our goal is to configure this new dataflow so that the insert of the joined dataset, which combines data from the file with data from the OLTP.PERSON table, is pushed down to a database...
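The general idea behind this goal can be sketched in Python with SQLite as a stand-in database. This is a hedged illustration under stated assumptions, not the SQL that Data Services generates: the FRIENDS_TRANSFER table name, the NAME and FIRSTNAME columns, and the sample rows are hypothetical, and the schema prefixes (OLTP, STAGE) are dropped because a single in-memory SQLite database is used.

```python
import sqlite3

# SQLite stands in for the real database; table names omit the OLTP/STAGE
# schema prefixes, and FRIENDS_TRANSFER is a hypothetical transfer table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PERSON (FIRSTNAME TEXT, LASTNAME TEXT, PERSONTYPE TEXT);
    CREATE TABLE FRIENDS_TRANSFER (NAME TEXT);
    CREATE TABLE FRIENDS (NAME TEXT, PERSONTYPE TEXT, LASTNAME TEXT);
    INSERT INTO PERSON VALUES
        ('Jane', 'Doe', 'EM'),
        ('Dave', 'Roe', 'SC'),
        ('Alan', 'Poe', 'IN');
""")

# Step 1: land the file rows in a database table first, so that both join
# inputs live in the same database (made-up rows for Jane and Dave).
file_rows = [("Jane",), ("Dave",)]
conn.executemany("INSERT INTO FRIENDS_TRANSFER (NAME) VALUES (?)", file_rows)

# Step 2: the join and the insert now run as one statement inside the
# database; no joined rows travel back through the client process.
conn.execute("""
    INSERT INTO FRIENDS (NAME, PERSONTYPE, LASTNAME)
    SELECT t.NAME, p.PERSONTYPE, p.LASTNAME
    FROM FRIENDS_TRANSFER AS t
    JOIN PERSON AS p ON p.FIRSTNAME = t.NAME
""")
conn.commit()

loaded = conn.execute(
    "SELECT NAME, PERSONTYPE, LASTNAME FROM FRIENDS ORDER BY NAME"
).fetchall()
```

The design point is the two-step shape: once the file data sits in a table next to the other join input, a single INSERT ... SELECT can do the join and the load together at the database level.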