Book Image

Getting Started with Oracle Data Integrator 11g: A Hands-On Tutorial

Book Image

Getting Started with Oracle Data Integrator 11g: A Hands-On Tutorial

Overview of this book

Oracle Data Integrator (ODI) is Oracle's strategic data integration platform for high-speed data transformation and movement between different systems. From high-volume, SOA-enabled data services, to trickle operations ñ ODI is a cutting-edge platform that offers heterogeneous connectivity, enterprise-level deployment, and strong administrative, diagnostic, and management capabilities. "Getting Started with Oracle Data Integrator 11g: A Hands-On Tutorial" provides you with everything you to get up and running with Oracle Data Integrator, and more! Following an example scenario, the book covers essential information about the ODI architecture and using ODI across different databases (Oracle, Microsoft SQL Server and MySQL), and file types such as XML, before covering Orchestrating Data Integration Workflows, Error Management, Operational Management and Monitoring, and beyond. "Getting Started with Oracle Data Integrator 11g: A Hands-On Tutorial" begins by describing the Oracle Data Integrator architecture and teaching you to install the product following best practices. You'll then be introduced to some of the key concepts of ODI such as the Knowledge Modules. Later topics include moving and transforming data from sources to targets including the Oracle Database, Microsoft SQL Server, MySQL, Flat files, and XML files, each with illustrated hands-on examples for the different technologies. Your learning experience will be made all the more rich with chapters introducing, explaining and leveraging additional ODI functionality such as variables, reusable procedures, temporary indexes and more. Finally ODI's workflow and task orchestration capabilities are explained before introducing you to Error Management with ODI's built-in 'error hospital' and 'error recycling' capabilities for non-compliant data, not to mention tackling ODI Studio, ODI Console and Enterprise Manager Fusion Middleware Control.
Table of Contents (21 chapters)
Getting Started with Oracle Data Integrator 11g: A Hands-On Tutorial
Credits
Foreword
About the Authors
About the Reviewers
www.PacktPub.com
Preface
10
Creating Workflows—Packages and Load Plans
13
Concluding Remarks
Index

ODI product architecture


Since ODI is an ELT tool, it requires no other platform than the source and target systems. But there still are ODI components to be deployed: we will see in this section what these components are and where they should be installed.

The components of the ODI architecture are as follows:

  • Repository: This is where all the information handled by ODI is stored, namely, connectivity details, metadata, transformation rules and scenarios, generated code, execution logs, and statistics.

  • Studio: The Studio is the graphical interface of ODI. It is used by administrators, developers, and operators.

  • Agents: The Agents can be seen as orchestrators for the data movement and transformations. They are very lightweight java components that do not require their own server—we will see in detail where they can be installed.

  • Console: The Console is a web tool that lets users browse the ODI repository, but it is not a tool used to develop new transformations. It can be used by operators though to review code execution, and start or restart processes as needed.

  • The Oracle Enterprise Manager plugin for ODI integrates the monitoring of ODI components directly into OEM so that administrators can consolidate the monitoring of all their Oracle products in one single graphical interface.

    At a high level, here is how the different components of the architecture interact with one another. The administrators, developers, and operators typically work with the ODI Studio on their machine (operators also have the ability to use the Console for a more lightweight environment). All Studios typically connect to a shared repository where all the metadata is stored. At run time, the ODI Agent receives execution orders (from the Studio, or any external scheduler, or via a Web Service call). At this point it connects to the repository, retrieves the code to execute, adds last minute parameters where needed (elements like connection strings, schema names where the data resides, and so on), and sends the code to the databases for execution. Once the databases have executed the code, the agent updates the repository with the status of the execution (successful or not, along with any related error message) and the relevant statistics (number of rows, time to process, and so on).

Now let's look into the details of each component.

ODI repository

To store all its information, ODI requires a repository. The repository is by default a pair of schemas (called Master and Work repositories) stored in a database. Unless ODI is running in a near real time fashion, continuously generating SQL code for the databases to execute the code, there is no need to dedicate a database for the ODI repository. Most customers leverage existing database installations, even if they create a dedicated tablespace for ODI.

Repository overview

The only element you will never find in the repository is the actual data processed by ODI. The data will be in the source and target systems, and will be moved directly from source to target. This is a key element of the ELT architecture. All other elements that are handled through ODI are stored into the repository. An easy way to remember this is that everything that is visible in the ODI Studio is stored in the repository (except, of course, for the actual data), and everything that is saved in the ODI Studio is actually saved into the repository (again, except for the actual data).

The repository is made of two entities which can be separated into two separate database schemas, namely, the Master repository and the Work repository.

We will look at each one of these in more detail later, but for now you can consider that the Master repository will host sensitive data whereas the Work repository will host project-related data. A limited version of the Work repository can be used in production environments, where the source code is not needed for execution.

Repository location

Before going into the details of the Master and Work repositories, let's first look into where to install the repository.

The repository is usually installed in an existing database, often in a separate tablespace. Even though ODI is an Oracle product, the repository does not have to be stored in an Oracle database (but who would not use the best database in the world?). Generally speaking, the databases supported for the ODI repository are Oracle, Microsoft SQL Server, IBM/DB2 (LUW and iSeries), Hypersonic SQL, and Sybase ASE. Specific versions and platforms for each database are published by Oracle and are available at:

http://www.oracle.com/technetwork/middleware/ias/downloads/fusion-certification-100350.html.

It is usual to see the repository share the same system as the target database.

We will now look into the specifics of Master and Work repositories.

Master repository

As stated earlier, the Master repository is where the sensitive data will be stored. This information is of the following types:

  • All the information that pertains to ODI users privileges will be saved here. This information is controlled by administrators through the Security Navigator of the ODI Studio. We will learn more about this navigator when we look into the details of the Studio.

  • All the information that pertains to connectivity to the different systems (sources and targets), and in particular the requisite usernames and passwords, will be stored here. This information will be managed by administrators through the Topology Navigator.

  • In addition, whenever a developer creates several versions of the same object, the subsequent versions of the objects are stored in the Master repository. Versioning is typically accessed from the Designer Navigator.

Work repository

Work repositories will store all the data that is required for the developers to design their data transformations. All the information stored in the Work repository is managed through the Designer Navigator and the Operator Navigator. The Work repository contains the following components:

  • The Metadata that represents the source and target tables, files, applications, message buses. These will be organized in Models in the Designer Navigator.

  • The transformation rules and data movement rules. These will be organized in Interfaces in the Designer Navigator.

  • The workflows designed to orchestrate the transformations and data movement. These are organized in Packages and Load Plans in the Designer Navigator.

  • The jobs schedules, if the ODI Agent is used as the scheduler for the integration tasks. These can be defined either in the Designer Navigator or in the Operator Navigator.

  • The logs generated by ODI, where the generated code can be reviewed, along with execution statistics and statuses of the different executions (running, done successfully or in error, queued, and so on). The logs are accessed from the Operator Navigator.

Execution repository

In a production environment, most customers do not need to expose the source code for the processes that are running. Modifications to the processes that run in production will have to go through a testing cycle anyway, so why store the source code where one would never access it? For that purpose, ODI proposes an execution repository that only stores the operational metadata, namely, generated code, execution results, and statistics. The type of Work repository (execution or development) is selected at installation time. A Work repository cannot be converted from development to execution or execution to development—a new installation will be required if a conversion is needed.

Lifecycle management and repositories

We now know that there will be different types of repositories. All enterprise application development teams have more than one environment to consider. The code development itself occurs in a development environment, the validation of the quality of the code is typically done in a test environment, and the production environment itself will have to be separate from these two. Some companies will add additional layers in this lifecycle, with code consolidation (if remote developers have to combine code together), user acceptance (making sure that the code conforms to user expectations), and pre-production (making sure that everything works as expected in an environment that perfectly mimics the production environment).

In all cases, each environment will typically have a dedicated Work repository. The Master repository can be a shared resource as long as no network barrier prevents access from Master to Work repository. If the production environment is behind a firewall for instance, then a dedicated Master repository will be required for the production environment.

The exchange of metadata between repositories can be done in one of the following ways:

  • Metadata can be exchanged through versioning. All different versions of the objects are uploaded to the Master repository automatically by ODI as they are created. These versions can later be restored to a different Work repository attached to the same Master repository.

  • All objects can be exported as XML files, and XML files can be used to import the exported objects into the new repository. This will be the only option if a firewall prevents connectivity directly to a central Master repository.

In the graphical representations shown previously, the leftmost repository is obviously our development repository, and the rightmost repository is the production repository. Why are we using an execution for the test environment? There are two rationales for this. They are as follows:

  • There is no point in having the source code in the test repository, the source code can always be retrieved from the versioning mechanisms.

  • Testing should not be limited to the validation of the artifacts concocted by the developers; the process of migrating to production should also be validated. By having the same setup for our test and production environments, we ensure that the process of going from a development repository to an execution repository has been validated as well.

Studio

The ODI Studio is the graphical interface provided to all users to interact with ODI.

People who need to use the Studio usually install the software on their own machine and connect to a shared repository. The only exception would be when the repository is not on the same LAN as the Studio. In that case, most customers use Remote Terminal Service technologies to ensure that the Studio is local to the repository (same LAN). Only the actual display is then sent over the WAN.

The Studio includes four navigators that are typically used by different users who can share the same objects and the same repository. Some users may not have access to some navigators, depending on their security profiles. The navigators are as follows:

  • Security Navigator: This navigator is typically used by system administrators, security administrators, and DBAs. Through this interface, they can assign roles and privileges to the different users, making sure that they can only view and modify objects that they are allowed to handle.

  • Topology Navigator: This navigator is usually restricted to DBAs and System administrators. Through this interface, they declare the systems where the data resides (sources, targets, references, and so on), along with the credentials that ODI will use to connect to these systems. Developers and operators will leverage the information stored in the repository, but will not necessarily have the right to modify, or even view that information. They will be provided with a name for the connections and this is all they will need. We will see this in more detail when we address logical schemas.

  • Designer Navigator: This navigator is used by developers and data custodians alike. Metadata are imported and enriched through this navigator. The metadata is then used to define the transformations in objects called Interfaces. The Interfaces are finally orchestrated in workflows called Packages.

  • Operator Navigator: This navigator is used by developers and operators. In a development environment, developers will use the Operator views to check on the code generated by ODI, to debug their transformations, and to validate and understand performance of their developments. In a production environment, operators use this same navigator to view which processes are running, to check whether processes are successful or not, and to check on the performance of the processes that are running.

Agent

The ODI Agent is the component that will orchestrate all the operations. If SQL code must be executed by a database (source or target), the agent will connect to that database and will send the code (DDL and DML, as needed) for that database to perform the transformations. If utilities must be used as part of the transformations (or, more likely, as part of the data transfer) then the agent will generate whatever configuration files or parameter files are required for the utility, and will invoke this utility with the appropriate parameters—SQL Loader, BCP, Multiload, and NZload are just a small list of such utilities.

There are two types of ODI Agent, namely, the standalone agent (available in all releases of ODI) and the JEE agent (available with ODI 11g and after) that runs on top of WebLogic Server. Each type has its own benefits, and both types of agents can co-exist in the same environment:

  • The JEE agent will take advantage of Weblogic in terms of high availability and pooling of the connections.

  • The standalone agents are very lightweight and can easily be installed on any platform. They are small Java applications that do not require a server.

A common configuration is to use the JEE agent as a "Master" agent, whose sole purpose it is to distribute execution requests across several child agents. These children can very well be standalone agents. The master agent will know at all times which children are up or down. The master agent will also balance the load across all child agents.

In a pure standalone environment, the Agent is often installed on the target server. Agents are also often installed on file servers, where they can leverage database loading utilities to bulk load data into the target systems. Load balancing can also be done with a standalone master agent. Multiple standalone agents can run on the same server, as long as they each have a dedicated port. This port number is defined in the Topology navigator, where the agent is defined.

The Agent can receive execution orders from different origins as follows:

  • Execution from the Studio: When a user executes a process from the Studio, he/she is prompted for the name of the agent that will be in charge of the execution.

  • Execution from the Console: Similarly to the Studio execution, the person requesting a process execution will have to choose the Agent in charge.

  • Execution from a command line: In this case, ODI will start a dedicated session of the agent, limited to the execution of the process that is passed as a parameter. The script to be used to start a process from a command line is startscen.bat on Windows or startscen.sh on Unix. This script can be found under the /bin directory under the agent installation path.

  • Execution from a web service: ODI 10g offered this feature but required a dedicated setup. ODI 11g offers this feature as part of the agent deployment. All agents support web services to start processes. For a standalone agent, connect to the agent via HTTP to view the associated WSDL. For instance, if the agent is running on server odi_dev on port 20910, the wsdl can be found on this very machine at http://odi_dev:20910/oraclediagent/OdiInvoke?wsdl.

    The application name for a standalone agent will always be oraclediagent. Customers using a JEE agent will use the application name for the ODI Agent.

  • ODI Schedules: If ODI processes are scheduled from within ODI (from the Operator navigator or the Designer navigator) then the schedule itself is associated with an agent. Either the schedules will be uploaded to the agent by an administrator, or the agent will refresh its list of schedules when it is restarted.

Console

The Console is an HTML interface to the repository. The Console is installed on a WebLogic Server (other application servers will be supported with later releases of the product).

The Console can be used to browse the repository, but no new developments can be created through this interface.

The Console is useful for viewing lineage and impact analysis without having the full Studio installed on a machine. Operators can also perform most of the tasks they would perform with the Studio, including starting or restarting processes.

The exact information that is available in the Operator Navigator of the Studio will be found in the matching view of the Console: generated code, execution statistics, and statuses of executed processes are all available.

Oracle Enterprise Manager

As part of the consolidation of features across all Oracle product lines, ODI now integrates with WebLogic Enterprise Manager.

Administrators can now use one single tool (OEM) to monitor the overall health of their environment, including ODI Agents and ODI processes.