Book Image

Getting Started with Oracle Data Integrator 11g: A Hands-On Tutorial

Book Image

Getting Started with Oracle Data Integrator 11g: A Hands-On Tutorial

Overview of this book

Oracle Data Integrator (ODI) is Oracle's strategic data integration platform for high-speed data transformation and movement between different systems. From high-volume, SOA-enabled data services, to trickle operations ñ ODI is a cutting-edge platform that offers heterogeneous connectivity, enterprise-level deployment, and strong administrative, diagnostic, and management capabilities. "Getting Started with Oracle Data Integrator 11g: A Hands-On Tutorial" provides you with everything you to get up and running with Oracle Data Integrator, and more! Following an example scenario, the book covers essential information about the ODI architecture and using ODI across different databases (Oracle, Microsoft SQL Server and MySQL), and file types such as XML, before covering Orchestrating Data Integration Workflows, Error Management, Operational Management and Monitoring, and beyond. "Getting Started with Oracle Data Integrator 11g: A Hands-On Tutorial" begins by describing the Oracle Data Integrator architecture and teaching you to install the product following best practices. You'll then be introduced to some of the key concepts of ODI such as the Knowledge Modules. Later topics include moving and transforming data from sources to targets including the Oracle Database, Microsoft SQL Server, MySQL, Flat files, and XML files, each with illustrated hands-on examples for the different technologies. Your learning experience will be made all the more rich with chapters introducing, explaining and leveraging additional ODI functionality such as variables, reusable procedures, temporary indexes and more. Finally ODI's workflow and task orchestration capabilities are explained before introducing you to Error Management with ODI's built-in 'error hospital' and 'error recycling' capabilities for non-compliant data, not to mention tackling ODI Studio, ODI Console and Enterprise Manager Fusion Middleware Control.
Table of Contents (21 chapters)
Getting Started with Oracle Data Integrator 11g: A Hands-On Tutorial
Credits
Foreword
About the Authors
About the Reviewers
www.PacktPub.com
Preface
10
Creating Workflows—Packages and Load Plans
13
Concluding Remarks
Index

Preface

Oracle Data Integrator—background and history

Oracle has been a leading provider of database, data warehousing, and other data management technologies for over 30 years. More recently it has also become a leading provider of standards-based integration, Service-oriented architecture (SOA) and Business Process Automation technologies (also known as Middleware), Big Data, and Cloud solutions. Data integration technologies are at the heart of all these solutions. Beyond the technical solutions, adopting and using ODI allows IT to cross the chasm between business requirements and data integration challenges.

In July 2010, the 11gR1 release of Oracle Data Integrator was made available to the marketplace. Oracle Data Integrator 11g (referred to in the rest of this book as ODI) is Oracle's strategic data integration platform. Having roots from the Oracle acquisition of Sunopsis in October 2006, ODI is a market leading data integration solution with capabilities across heterogeneous IT systems. Oracle has quickly and aggressively invested in ODI to provide an easy-to-use and comprehensive approach for satisfying data integration requirements within Oracle software products. As a result, there are dozens of Oracle products such as Hyperion Essbase, Agile PLM, AIA Process Integration Packs, and Business Activity Monitor (BAM) that are creating an explosive increase in the use of ODI within IT organizations. If you are using Oracle software products and have not heard of or used ODI yet, one thing is sure—you soon will!

This book is not meant to be used as a reference book—it is a means to accelerate your learning of ODI 11g. When designing the book, the following top-level objectives were kept in mind:

  • To highlight the key capabilities of the product in relation to data integration tasks (loading, enrichment, quality, and transformation) and the productivity achieved by being able to do so much work with heterogeneous datatypes while writing so little SQL

  • To select a sample scenario that was varied enough to do something useful and cover the types of data sources and targets customers are using most frequently (multiple flavors of relational database, flat files, and XML data) while keeping it small enough to provide an ODI accelerated learning experience

  • To ensure that where possible within our examples, we examine the new features and functionality introduced with version 11g—the first version of ODI architected, designed, and implemented as part of Oracle

Data integration usage scenarios

As seen in the following figure, no matter what aspect of IT you work on, all have a common element among them, that is, Data Integration. Everyone wants their information accessible, up-to-date, consistent, and trusted.

Data warehouses and BI

Before you can put together the advanced reporting metrics required by the different entities of your enterprise, you will have to consolidate, rationalize, and organize the data. Operational systems are too busy serving their customers to be overloaded by additional reporting queries. In addition, they are optimized to serve their applications—not for the purposes of analytics and reporting.

Data warehouses are often time-designed to support reporting requirements. Integrating data from operational systems into data warehouses has traditionally been the prime rationale for investing in integration technologies: disparate and heterogeneous systems hold critical data that must be consolidated; data structures have to be transposed and reorganized. Data Integrator is no exception to the rule and definitely plays a major role in such initiatives.

Throughout this book, we will cover data integration cases that are typical of integration requirements found in a data warehousing environment.

Service-oriented architecture (SOA)

Service-oriented architecture encourages the concept of service virtualization. As a consequence, the actual physical location of where data requests are resolved is of less concern to consumers of SOA-based services. The SOA implementations rely on large amounts of data being processed so that the services built on top of the data can serve the appropriate information. ODI plays a crucial role in many SOA deployments as it seamlessly integrates with web services. We are not focusing on the specifics of web services in this book, but all the logic of data movement and transformations that ODI would perform when working in a SOA environment would remain the same as the ones described in this book.

Applications

More and more applications have their own requirements in terms of data integration. As such, more and more applications utilize a data integration tool to perform all these operations: the generated flows perform better, are easier to design and to maintain. It should be no surprise then that ODI is used under the covers by dozens of applications. In some cases, the ODI code is visible and can be modified by the users of the applications. In other cases, the code is operating "behind the scenes" and does not become visible.

In all cases though, the same development best practices, and design rules are applied. For the most part, application developers will use the same techniques and best practices when using ODI. And if you have to customize these applications, the lessons learned from this book will be equally useful.

Master Data Management

The rationale for Master Data Management (MDM) solutions is to normalize data definitions. Take the example of customer references in an enterprise for instance. The sales application has a definition for customers. The support application has its own definition, so do the finance application, and the shipping application. The objective of MDM solutions is to provide a single definition of the information, so that all entities reference the same data (versus each having their own definition). But the exchange and transformation of data from one environment to the next can only be done with a tool like ODI.

Big Data

The explosion of data in the information age is offering new challenges to IT organizations, often referenced as Big Data. The solutions for Big Data often rely on distributed processing to reduce the complexity of processing gigantic volumes of data. Delegating and distributing processing is what ODI does with its ELT architecture. As new implementation designs are conceived, ODI is ready to endorse these new infrastructures. We will not look into Big Data implementations with ODI in this book, but you have to know that ODI is ready for Big Data integration as of its 11.1.1.6 release.

What this book covers

The number one goal of this book is to get you familiar, comfortable, and successful with using Oracle Data Integrator 11gR1. To achieve this, the largest part of the book is a set of hands-on step-by-step tutorials that build a non-trivial Order Processing solution that you can run, test, monitor, and manage.

Chapter 1, Product Overview, gets you up to speed quickly with the ODI 11g product and terminology by examining the ODI 11g product architecture and concepts.

Chapter 2, Product Installation, provides the necessary instructions for the successful download, installation, and configuration of ODI 11g.

Chapter 3, Using Variables, is a chapter that can be read out of sequence. It covers variables in ODI, a concept that will allow you to have very dynamic code. We will mention variables in the subsequent chapters, so having this reference early can help.

Chapter 4, ODI Sources, Targets, and Knowledge Modules, is a general introduction to the key features of ODI Studio. It will also explain how they map onto core concepts and activities of data integration tasks, such as sources, targets and how data flows between them.

Chapter 5, Working with Databases, is the first chapter that will show how to use ODI Studio to work with databases: how to connect to the databases, how to reverse-engineer metadata, how to design transformations, and how to review the executions. This chapter will specifically concentrate on connecting to Oracle databases, and will be a baseline for chapters 6 to 9.

Chapter 6, Working with MySQL, will introduce the requirements of working with a different technology: MySQL. We will expand on the techniques covered in the previous chapter with a description of how to incorporate joins, lookups, and aggregations in the transformations.

Chapter 7, Working with Microsoft SQL Server, will expand the examples with use of yet another database, this time Microsoft SQL Server. It will focus on possible alteration to transformations: Is the code executed on the source, staging area, or target? When making these choices, where is the code generated in the Operator? We will also detail how to leverage the ODI Expression editor to write the transformations, and how to have ODI create a temporary index to further improve integration performance.

Chapter 8, Integrating File Data, will introduce the notion of flat files and will focus on the differences between flat files and databases.

Chapter 9, Working with XML Files, will focus on a specific type of file, that is XML files. This chapter will show how easy it is with ODI to parse XML files with standard SQL queries.

Chapter 10, Creating Workflows—Packages and Load Plans, will show you how to orchestrate your work and go beyond the basics of integration.

Chapter 11, Error Management, will explore in depth the subject of error management: data error versus process errors, how to trap them, and how to handle them.

Chapter 12, Managing and Monitoring ODI Components, will conclude with the management aspect of the processes, particularly with regard to to scheduling of the jobs designed with ODI.

If it is not obvious by the time you finish reading this book, we really like ODI 11gR1. Those feelings have been earned by rock solid architecture choices and an investment level that allows innovation to flourish—from new agent clustering and manageability features to integrating with any size of system, including the largest data warehouses using Oracle, Exadata, Teradata, and others from files to in-memory data caches.

What you need for this book

If you want to follow the examples in your own environment, you'll need:

  • Oracle Data Integrator 11g

  • Oracle database (10g or 11g)

  • Microsoft SQL Server (2005 or 2008)

  • MySQL 5 and higher

  • RCU (Oracle Repository Creation Utility) and Java 1.6 (needed for the Oracle Universal Installer that installs ODI)

Who this book is for

This book is intended for those who are interested in, or responsible for, the content, freshness, movement, access to, or integration with data. Job roles that are a likely match include ETL developers, Data Warehouse Specialists, Business Intelligence Analysts, Database Administrators, Database Programmers, Enterprise, or Data Architect, among others.

Those interested in, or responsible for, data warehouses, data marts, operational data stores, reporting and analytic servers, bulk data load/movement/transformation, real-time Business Intelligence, and/or MDM will find this material of particular interest.

No prior knowledge or experience with Oracle Data Integrator is required or assumed. However, people with experience in programming with SQL or developing ETL processes with other products will better understand how to achieve the same tasks—hopefully being more productive and with better performance.

Who this book is not for

This book is not for someone looking for a tutorial on SQL and/or relational database concepts. It is not a book on advanced features of ODI, or advanced integration techniques using ODI.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text are shown as follows: "We'll be integrating data into the PURCHASE_ORDER table in the data mart".

A block of code is set as follows:

<?xml version="1.0" encoding="UTF-8"?>
<Building>
  <StreetAddr>32 Lincoln Road</StreetAddr>
  <Locality>Olton</Locality>
  <City>Birmingham</City>
  <StateOrProv>West Midlands</StateOrProv>
  <PostCode>B27 6PA</PostCode>
  <CountryCode>44</CountryCode>
</Building>

Any command-line input or output is written as follows:

OdiFileCopy -FILE=c:/po/input/order_20001.xml-TOFILE=c:/po/input/single_po.xml -CASESENS=yes

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "Next we click on the browse icon to the right of the JDBC Url field to open the URL examples dialog".

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to , and mention the book title through the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/support, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website, or added to any list of existing errata, under the Errata section of that title.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at if you are having a problem with any aspect of the book, and we will do our best to address it.