Getting Started with Oracle Data Integrator 11g: A Hands-On Tutorial

Overview of this book

Oracle Data Integrator (ODI) is Oracle's strategic data integration platform for high-speed data transformation and movement between different systems. From high-volume, SOA-enabled data services to trickle operations, ODI is a cutting-edge platform that offers heterogeneous connectivity, enterprise-level deployment, and strong administrative, diagnostic, and management capabilities. "Getting Started with Oracle Data Integrator 11g: A Hands-On Tutorial" provides you with everything you need to get up and running with Oracle Data Integrator, and more! Following an example scenario, the book covers essential information about the ODI architecture and using ODI across different databases (Oracle, Microsoft SQL Server, and MySQL) and file types such as XML, before covering Orchestrating Data Integration Workflows, Error Management, Operational Management and Monitoring, and beyond.

"Getting Started with Oracle Data Integrator 11g: A Hands-On Tutorial" begins by describing the Oracle Data Integrator architecture and teaching you to install the product following best practices. You'll then be introduced to some of the key concepts of ODI, such as the Knowledge Modules. Later topics include moving and transforming data from sources to targets including the Oracle Database, Microsoft SQL Server, MySQL, flat files, and XML files, each with illustrated hands-on examples for the different technologies. Your learning experience will be made all the richer with chapters introducing, explaining, and leveraging additional ODI functionality such as variables, reusable procedures, temporary indexes, and more. Finally, ODI's workflow and task orchestration capabilities are explained before introducing you to Error Management with ODI's built-in 'error hospital' and 'error recycling' capabilities for non-compliant data, not to mention tackling ODI Studio, ODI Console, and Enterprise Manager Fusion Middleware Control.
Table of Contents (21 chapters)
Getting Started with Oracle Data Integrator 11g: A Hands-On Tutorial
Credits
Foreword
About the Authors
About the Reviewers
www.PacktPub.com
Preface
10. Creating Workflows—Packages and Load Plans
13. Concluding Remarks
Index

Chapter 1. Product Overview

The purpose of ETL (Extract, Transform, Load) tools is to help with the consolidation of data that is dispersed throughout the information system. Data is stored in disparate applications, databases, files, operating systems, and in incompatible formats. The consequences of such a dispersal of the information can be dire: for example, different business units operating on different data will show conflicting results, and information cannot be shared across different entities of the same business.

Imagine the marketing department reporting on the success of their latest campaign while the finance department complains about its lack of efficiency. Both have numbers to back up their assertions, but the numbers do not match!

What could be worse than a shipping department that struggles to understand customer orders, or a support department that cannot confirm whether a customer is current with his/her payment and should indeed receive support? The examples are endless.

The only way to have a centralized view of the information is to consolidate the data—whether in a data warehouse, in a series of data marts, or by normalizing the data across applications with master data management (MDM) solutions. ETL tools usually come into play when a large volume of data has to be exchanged (as opposed to Service-Oriented Architecture infrastructures, for instance, which would be more transaction based).

In the early days of ETL, databases had very weak transformation functions. Apart from using insert and select statements, SQL was a relatively limited language. To perform heavy-duty, complex transformations, vendors put together dedicated transformation platforms—the ETL tools.

Over time, the SQL language has evolved to include more and more transformation capabilities. You can now go as far as handling hierarchies, manipulating XML formats, using analytical functions, and so on. It is not by chance that 50 percent of the ETL implementations in existence today are done in plain SQL scripts—SQL makes it possible.
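
As a quick illustration, here is a minimal sketch of such a set-based transformation in plain SQL, using hypothetical sales and sales_summary tables and an analytical (window) function, the kind of logic that once required a dedicated ETL engine:

    INSERT INTO sales_summary (region, order_date, amount, running_total)
    SELECT region,
           order_date,
           amount,
           -- Running total per region: an analytical function performing the
           -- transformation directly in the database, in a set-based way.
           SUM(amount) OVER (PARTITION BY region ORDER BY order_date)
    FROM   sales;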

This is where the ODI ELT architecture (Extract-Load-Transform—the inversion in the acronym is not a mistake) comes into play. The concept with ELT is that instead of extracting the data from a source, transforming it with a dedicated platform, and then loading it into the target database, you will extract from the source, load into the target, and then transform the data inside the target database, leveraging SQL for the transformations.
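
To make the sequence concrete, here is a hedged sketch of the pattern in plain SQL, assuming a hypothetical stg_orders staging table and dw_orders target table in the target database, and a database link named source_db back to the source system:

    -- Extract and load: copy the source rows as-is into a staging table
    -- that lives in the target database.
    INSERT INTO stg_orders
    SELECT * FROM orders@source_db;

    -- Transform: run the transformation as set-based SQL inside the
    -- target database, reading from the staging table.
    INSERT INTO dw_orders (order_id, customer_id, order_total, load_date)
    SELECT order_id,
           customer_id,
           quantity * unit_price,
           SYSDATE
    FROM   stg_orders
    WHERE  status = 'SHIPPED';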

To some extent, ETL and ELT are marketing acronyms. When you look at ODI for instance, it can perform transformations on the source side as well as on the target side. You can also dedicate a database or schema to the staging and transformation of your data, and end up with something closer to an ETL architecture. Similarly, some ETL tools also have the ability to generate SQL code and to push some transformations down to the database level.

The key differences then for a true ELT architecture are as follows:

  • The ability to dynamically manage a staging area (location, content, automatic management of table alterations)

  • The ability to generate code on source and target systems alike, in the same transformation

  • The ability to generate native SQL for any database on the market—most ETL tools will generate code for their own engines, and then translate that code for the databases—hence limiting their generation capacities to their ability to convert proprietary concepts

  • The ability to generate DML and DDL, and to orchestrate sequences of operations on the heterogeneous systems, as sketched in the example after this list
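
As an illustration of the last two points, here is a hedged sketch, using invented table names and a hypothetical source_db database link, of the kind of DDL and DML such a tool might generate and orchestrate on the target system:

    -- Generated DDL: create a work table in the staging area, matching
    -- the structure of the source table.
    CREATE TABLE c$_customers AS
    SELECT * FROM customers@source_db WHERE 1 = 0;

    -- Generated DML: load the extracted rows into the work table.
    INSERT INTO c$_customers
    SELECT * FROM customers@source_db;

    -- Generated DML: apply the transformed rows to the target table
    -- with a single set-based statement.
    MERGE INTO dw_customers t
    USING (SELECT customer_id, UPPER(customer_name) AS customer_name
           FROM   c$_customers) s
    ON (t.customer_id = s.customer_id)
    WHEN MATCHED THEN UPDATE SET t.customer_name = s.customer_name
    WHEN NOT MATCHED THEN INSERT (customer_id, customer_name)
                          VALUES (s.customer_id, s.customer_name);

    -- Generated DDL: drop the work table once the flow completes.
    DROP TABLE c$_customers;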

In a way, the purpose of an ELT tool is to provide the comfort of a graphical interface with all the functionality of traditional ETL tools, to keep the efficiency of SQL coding with set-based processing of data in the database, and to limit the overhead of moving data from place to place.

In this chapter we will focus on the architecture of Oracle Data Integrator 11g, as well as the key concepts of the product. The topics we will cover are as follows:

  • The elements of the architecture, namely, the repository, the Studio, the Agents, the Console, and integration into Oracle Enterprise Manager

  • An introduction to key concepts, namely, Execution Contexts, Knowledge Modules, Models, Interfaces, Packages, Scenarios, and Load Plans