ODI key concepts


Understanding key concepts in ODI will help developers take advantage of the graphical interface and further improve their productivity. In no specific order, we will now review the notions of Execution Contexts, Knowledge Modules, Models, Interfaces, Packages, and Scenarios.

Execution Contexts

Everyone encounters the same issue: the parameters used to connect to the development server are different from those used for the QA or production servers, and there are often more environments than these few. Some companies add environments such as user acceptance, code consolidation, and pre-production; you name it and it is there! Maintaining the connection parameters is a cumbersome activity. Beyond the maintenance itself, there is also a risk if these parameters are modified in the code after the code has been validated by the QA team. Ideally, connection parameters and environment-specific parameters should be maintained independently of the code. This is what ODI provides with the notion of Execution Contexts.

To make sure that developers are independent of the physical location of the different systems, ODI enforces the use of Logical Schemas. Logical Schemas are labels or aliases that represent the connections for the developers. At execution time, the agent will translate these logical names into physical ones based on the information stored in the repository. This way, the maintenance of the connection parameters, location of the databases, and schema names is entirely independent of the code itself.

In addition, whenever an ODI process is executed, a Context must be selected. The structure of the metadata will always be the same from one environment to the next (say for instance, development, QA, and production) but the connection information will be different. By selecting an execution context, the agent will know which connection definition to use when completing the code.

In the previous figure, the logical name Datawarehouse is actually pointing to the physical schema db_dwh in the development server as long as we execute the transformations in the development context. It will automatically point to the dwh schema in the QA server for the QA context.
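
As a purely illustrative sketch (the table, column, and staging schema names below are hypothetical, and we assume a second logical schema for the staging area), the code generated from one and the same interface would only differ by the physical schema names substituted by the agent:

    -- Development context: "Datawarehouse" resolves to db_dwh on the development server
    INSERT INTO db_dwh.TRG_SALES (SALE_ID, CUST_ID, AMOUNT)
    SELECT SALE_ID, CUST_ID, AMOUNT FROM stg_dev.SRC_SALES;

    -- QA context: the very same interface resolves "Datawarehouse" to dwh on the QA server
    INSERT INTO dwh.TRG_SALES (SALE_ID, CUST_ID, AMOUNT)
    SELECT SALE_ID, CUST_ID, AMOUNT FROM stg_qa.SRC_SALES;

The transformation logic itself never changes; only the resolution of the logical names does.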

The physical details are maintained in the Physical Architecture of the Topology Navigator and the logical names that point to the physical implementations are defined in the Logical Architecture of the Topology Navigator. The Context accordion lets administrators define as many contexts as needed. At run time, the selection of a Context will define which physical implementation to use for each logical name used in the code.

The physical architecture will contain the details for ODI to log in to the database, the IP address or name of the server on which the database runs, the port numbers to connect to the database, as well as the name of the actual schemas, catalogs, libraries, and other objects found in these databases.

Knowledge Modules

A Knowledge Module (KM) is a template of code that defines what types of SQL queries (insert, select, and so on) and what scripts need to be generated to complete the data extraction, loading, and transformation operations.

There were several objectives behind the inception of these KMs. Some of them are as follows:

  • Improve productivity: Integration tasks are very repetitive in nature. Once the extract/load techniques have been defined for a given project, they will be reused over and over again. Consolidating these practices in reusable templates helps improve developers' productivity.

  • Encapsulate integration best practices: Even if a team agrees on integration best practices, there is usually no guarantee that all developers will implement what has been decided. In other cases, only the best developers will come up with the best integration strategies. By encapsulating the integration strategies in a template, all developers can share the same quality of code.

  • Allow for further customizations: Oracle is doing its best to encapsulate industry best practices in all KMs no matter what database you are using—but your environment is unique, and at times only you can further optimize the code that will be generated. Again, to make sure that all developers will benefit from the ultimate optimizations, customized KMs will be shared by all developers.

Typically, an ETL developer will be able to use a Knowledge Module without any need to understand the underlying requirements (creation of staging tables, location of these tables, intermediate operations on these tables, cleanup of these tables, and so on).

The developer will control what code will be generated by setting yes/no options to insert records, perform updates, create the target table, and so on.
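
To give a feel for what these options translate into, here is a deliberately simplified sketch in Oracle SQL (the TRG_CUSTOMER table and its columns are hypothetical, and the I$_ table stands in for the staging table that a KM typically builds). An insert-only strategy and an incremental update strategy would generate code along these lines:

    -- Insert only: append the incoming rows to the target
    INSERT INTO dwh.TRG_CUSTOMER (CUST_ID, CUST_NAME, CITY)
    SELECT CUST_ID, CUST_NAME, CITY
    FROM   dwh.I$_TRG_CUSTOMER;

    -- Insert and update: merge the incoming rows on the key
    MERGE INTO dwh.TRG_CUSTOMER T
    USING dwh.I$_TRG_CUSTOMER S
    ON (T.CUST_ID = S.CUST_ID)
    WHEN MATCHED THEN
      UPDATE SET T.CUST_NAME = S.CUST_NAME, T.CITY = S.CITY
    WHEN NOT MATCHED THEN
      INSERT (CUST_ID, CUST_NAME, CITY) VALUES (S.CUST_ID, S.CUST_NAME, S.CITY);

The developer only flips the options; the KM takes care of producing whichever statement matches.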

There are six types of Knowledge Modules that will enable various steps in the data integration process:

  • Loading Knowledge Modules (LKM): These modules determine how to extract and load data across two heterogeneous technologies. Different Knowledge Modules will offer different connection strategies, such as JDBC connections or database load/unload utilities.

  • Integration Knowledge Modules (IKM): These modules define the strategy used to integrate the data into the target. The different strategies include inserts only, update and inserts, slowly changing dimension, and so on.

  • Check Knowledge Modules (CKM): These modules implement in-line data quality control before insertion into the target such as validate business rules, control the uniqueness of primary keys, and validate the existence of parent records for foreign keys.

  • Reverse-engineering Knowledge Modules (RKM): These modules import the metadata for a given technology into the ODI repository. These KMs are usually used for enterprise applications rather than databases. Most reverse-engineering operations will not require any KM, since the JDBC drivers leveraged by ODI usually support these operations natively.

  • Journalizing Knowledge Modules (JKM): These modules manage the ODI journals and the underlying mechanisms needed to provide automatic Changed Data Capture (CDC) functionality.

  • Service Knowledge Modules (SKM): These modules automatically generate the code and expose the database table operations as web services. Insert, update, delete, select, or consumption of data changes can then be performed by using these web services.

Knowledge Modules are named after the specific database for which they have been optimized, the utilities that they leverage, and the technique that they implement. For instance, an IKM Teradata to File (TTU) will move data from Teradata into a flat file and leverage the TTU utilities for that operation, while an LKM File to Oracle (EXTERNAL TABLE) will expose a flat file as an external table for Oracle. Similarly, an IKM Oracle Slowly Changing Dimension will generate code optimized for the Oracle database that implements a Slowly Changing Dimension (Type 2) integration strategy.
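
As an example of the technique behind that last LKM, a flat file can be exposed to Oracle with DDL of roughly the following shape (a hedged sketch with hypothetical directory, file, and column names; the actual code generated by the KM is more generic and uses ODI's C$_ work table naming):

    CREATE TABLE staging.C$_SRC_SALES (
      SALE_ID    NUMBER,
      CUST_NAME  VARCHAR2(100),
      AMOUNT     NUMBER
    )
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_LOADER
      DEFAULT DIRECTORY src_files_dir
      ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ','
        MISSING FIELD VALUES ARE NULL
      )
      LOCATION ('sales.csv')
    )
    REJECT LIMIT UNLIMITED;

Once created, the file can be queried like any other table, which is exactly what the downstream steps of the interface do.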

Most developers will only use out-of-the-box KMs—only the most advanced developers will modify the code templates. For that reason, we will keep KM modifications outside the scope of this book. But the examples we will go through in the rest of this book will help you understand how best to select a KM for the task at hand.

Models

Models in ODI are used to store the metadata imported from the databases. When developers (or data custodians) create a model, they have the choice of importing only the relevant metadata (you may not want to import the definitions of all tables and views in your environment). As we have seen earlier with Knowledge Modules for reverse-engineering, metadata can also be imported from applications, where objects can be a business representation of the data rather than an actual physical table.

Once the metadata has been imported, it can be organized and enhanced—Models can be grouped in folders. Submodels can be created within Models to organize tables in logical units.

When submodels are created, the users can define how tables will be organized; either they will be manually moved into the appropriate subfolders, or their location will be determined automatically by ODI based on their names.

Metadata can be enhanced by adding more constraints (such as referential integrity, check constraints) that would not exist in the database.

Metadata can also be directly created in ODI. Each model has a Diagrams folder. Diagrams let users graphically design their table structures. More importantly, they can drag-and-drop object definitions from other technologies, and ODI will automatically translate the datatypes from one technology to the next.

If tables are derived from another technology, they can still be manually modified (by changing the column names, datatypes, adding or removing columns, and constraints). The hidden benefit of the use of diagrams to create tables from other Models is the ability to automate the generation of Interfaces.

ODI knows what table was used as a basis for the new one. If you right-click on the model name, and select the menu option Generate Interfaces IN, ODI will automatically create an interface with the original table as a source and the new table as a target with all columns properly mapped. You will be prompted to place the interface in the project and folder of your choice. We will take a detailed look at Interfaces in the next section. If you choose to select the menu option Generate Interfaces OUT, ODI will generate the reverse interface, with the new table as a source and the original one as a target.
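
In terms of the code that will eventually be generated from such an interface, the result is a straight one-to-one copy, along the lines of the following sketch (hypothetical tables; the exact statement depends on the KMs selected):

    -- Each column of the original table is mapped unchanged to the new table
    INSERT INTO dwh.CUSTOMER_COPY (CUST_ID, CUST_NAME, CITY)
    SELECT CUST_ID, CUST_NAME, CITY
    FROM   crm.CUSTOMER;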

Interfaces

Interfaces are where the transformations are defined. We will spend a lot of time in this book covering the details of Interfaces. Interfaces are created in projects, and the various components are organized in Packages.

An interface is made up of multiple parts, such as the description, the mappings, the flow, and the controls, to name a few.

Interface descriptions

Often overlooked, descriptions are the documentation of the objects. Because the descriptions are stored with the objects in the repository, all documentation generated by ODI will contain the descriptions. It may sound trivial, but the descriptions are key to the success of your projects.

Interface mappings

ODI adheres to a concept called "declarative design". The definition of the transformation logic is isolated from the definition of the data movement and integration logic for two reasons which are as follows:

  • Change to one should not alter the other: If I decide to add or remove columns in my mappings, it should have no impact on the type of technology I use (external tables, bulk load, JDBC, and so on) or the type of integration strategy I use (inserts only, updates, and so on). Similarly if I decide to change technology or integration strategy, this should have no impact on my transformations.

  • Productivity can be greatly improved: Mappings are often unique and must be handled one column at a time. Integration strategies and technologies can be selected once for a complete project with minimal changes and reviews. Separating the two allows developers to focus more on the moving parts (the transformations) than on the stable parts (the integration strategy).

Mappings are where the transformation logic is defined.
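
Concretely, the mappings boil down to the projection and the transformation expressions, which could read as follows (hypothetical source table, with Oracle SQL functions used purely for illustration):

    -- What the mappings define: which columns feed the target and how they are transformed
    SELECT SRC.CUSTID                                               AS CUST_ID,
           INITCAP(SRC.FIRST_NAME) || ' ' || INITCAP(SRC.LAST_NAME) AS FULL_NAME,
           UPPER(SRC.CITY)                                          AS CITY
    FROM   crm.SRC_CUSTOMER SRC
    WHERE  SRC.STATUS = 'ACTIVE';

Whether these rows end up being inserted, merged, or used to maintain a slowly changing dimension is decided separately, on the Flow tab, by the choice of Knowledge Modules.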

Interface flow tab

The flow tab of the interfaces is where the integration strategies are selected. The Loading Knowledge Modules (LKM) will define which techniques must be used to extract data from remote source systems and how to load the data in the target system. The Integration Knowledge Module (IKM) will define which integration strategy will be used to integrate the data in the target system.

Interfaces will only list the KMs that have been imported into the project where they are created (and starting with ODI 11.1.1.6, the Global KMs that are shared across projects). One key benefit is that it is possible to control which strategies will be used for a given project, hence limiting the choices of the developers to the choices made by the team.

This will ensure the following:

  • All developers implement the same best practice

  • If changes are required later in the project, they can be done extremely efficiently by changing the standard for a given project

  • Ideally, because the choice can be limited, developers know that ODI will always choose the KM they need and do not even have to worry about that step

Interface controls

One option that is available with some IKMs is the ability to control data integrity on the fly. There are two options called Flow Control and Static Control. Flow Control will check for data inconsistencies, remove invalid records from the staging tables, and write them to a separate error table. Static Control will check for inconsistencies in the target table after it has been loaded with all incoming records. Invalid records identified in the target table are then copied to the error table, but they are not removed from the target table. When developers decide to activate one or both options in the Flow tab of the interface, they can then refine what will be done: the error table can be recreated automatically, and some constraints can be ignored while others are enforced.

All constraints listed here are defined at the metadata level, under the definition of the tables in the Models.
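
As a rough sketch of what Flow Control does behind the scenes (hypothetical names again; the real CKM code is generic, driven by the constraints declared in the Model, and the error-table columns shown here are simplified), invalid rows are copied to the E$_ error table and removed from the I$_ flow table before the load:

    -- Copy rows that violate a declared reference constraint into the error table
    INSERT INTO dwh.E$_TRG_SALES (SALE_ID, CUST_ID, AMOUNT, ERR_MESS)
    SELECT S.SALE_ID, S.CUST_ID, S.AMOUNT, 'FK_SALES_CUSTOMER violation'
    FROM   dwh.I$_TRG_SALES S
    WHERE  NOT EXISTS (SELECT 1 FROM dwh.TRG_CUSTOMER C WHERE C.CUST_ID = S.CUST_ID);

    -- Remove them from the flow table so that only clean rows reach the target
    DELETE FROM dwh.I$_TRG_SALES S
    WHERE  NOT EXISTS (SELECT 1 FROM dwh.TRG_CUSTOMER C WHERE C.CUST_ID = S.CUST_ID);

With Static Control, similar checks run against the target table after the load, and the offending rows are copied to the error table but left in place.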

Packages and Scenarios

Packages are designed to orchestrate the individual objects created by the developers—interfaces, variables, and procedures will be sequenced in Packages. If a step fails in a package, it is possible to branch out and immediately take action based on the errors that are detected. For execution purposes, Packages are compiled into Scenarios. Scenario executions can then be organized with Load Plans. We will look at all of these concepts in more detail in a dedicated chapter.