8. Optimizing ETL Performance | SAP Data Services 4.x Cookbook

SAP Data Services 4.x Cookbook

By: Shomnikov

Overview of this book

Want to cost-effectively deliver trusted information to all of your crucial business functions? SAP Data Services delivers one enterprise-class solution for data integration, data quality, data profiling, and text data processing. It boosts productivity with a single solution for data quality and data integration. SAP Data Services also enables you to move, improve, govern, and unlock big data. This book will lead you through the SAP Data Services environment to efficiently develop ETL processes. To begin with, you'll learn to install, configure, and prepare the ETL development environment. You will become familiar with the concepts of developing ETL processes with SAP Data Services. Starting from the smallest unit of work, the dataflow, the chapters will lead you to the highest organizational unit, the Data Services job, revealing the advanced techniques of ETL design. You will learn to import XML files by creating and implementing real-time jobs. The book will then guide you through the ETL development patterns that enable the most effective performance when extracting, transforming, and loading data. You will also find out how to create validation functions and transforms. Finally, the book will show you the benefits of data quality management with the help of another SAP solution, Information Steward.

Optimizing dataflow execution – the Data_Transfer transform

The Data_Transfer transform is purely an optimization tool: it helps you push down resource-consuming operations and transformations, such as JOIN and GROUP BY, to the database level.

Getting ready

Take the dataflow from the Loading data from a flat file recipe in Chapter 4, Dataflow – Extract, Transform, and Load. This dataflow loads the Friends_*.txt file into the STAGE.FRIENDS table.
Modify the Friends_30052015.txt file and remove all lines except the ones for Jane and Dave.
In the dataflow, add another source table, OLTP.PERSON, and join it to the source file object in the Query transform on the first-name field. Propagate the PERSONTYPE and LASTNAME columns from the source OLTP.PERSON table into the output schema of the Query transform, as shown here:

How to do it…

Our goal is to configure this new dataflow so that the insert of the joined dataset, which combines data from the file with data from the OLTP.PERSON table, is pushed down to a database...
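As a rough illustration of what such a push-down amounts to, the following sketch uses Python's built-in sqlite3 module rather than Data Services itself. It first lands the file rows in a transfer table and then lets the database perform the join and the insert in a single INSERT ... SELECT statement. The table names friends_transfer, oltp_person, and stage_friends, the column names, the sample last names and person types, and the function names are all assumptions made for this example; they are not taken from the recipe or generated by the Data_Transfer transform.

```python
import sqlite3


def push_down_join(conn, first_names):
    """Stage file rows in the database, then join and insert there.

    Conceptual sketch only: all table and column names are hypothetical.
    """
    cur = conn.cursor()

    # Step 1: land the file rows in a transfer table inside the database.
    cur.execute("DROP TABLE IF EXISTS friends_transfer")
    cur.execute("CREATE TABLE friends_transfer (firstname TEXT)")
    cur.executemany(
        "INSERT INTO friends_transfer (firstname) VALUES (?)",
        [(name,) for name in first_names],
    )

    # Step 2: both inputs now live in the database, so the join and the
    # insert can run as one statement without rows passing through the
    # ETL engine.
    cur.execute(
        """
        INSERT INTO stage_friends (firstname, lastname, persontype)
        SELECT t.firstname, p.lastname, p.persontype
        FROM friends_transfer AS t
        JOIN oltp_person AS p ON p.firstname = t.firstname
        """
    )

    # Step 3: the transfer table is only a temporary landing area.
    cur.execute("DROP TABLE friends_transfer")
    conn.commit()


def demo():
    """Build a tiny in-memory database with made-up rows and run the load."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE oltp_person (firstname TEXT, lastname TEXT, persontype TEXT)"
    )
    conn.execute(
        "CREATE TABLE stage_friends (firstname TEXT, lastname TEXT, persontype TEXT)"
    )
    conn.executemany(
        "INSERT INTO oltp_person VALUES (?, ?, ?)",
        [("Jane", "Doe", "EM"), ("Dave", "Smith", "SC"), ("Alex", "Brown", "IN")],
    )

    # Stands in for the two lines left in the modified friends file.
    push_down_join(conn, ["Jane", "Dave"])

    rows = conn.execute(
        "SELECT firstname, lastname, persontype FROM stage_friends ORDER BY firstname"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

The point of the design is that the join never runs in the ETL engine: once the file data sits in a table next to OLTP.PERSON, the database can do all the work in one statement.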
