Book Image

Oracle Information Integration, Migration, and Consolidation

Book Image

Oracle Information Integration, Migration, and Consolidation

Overview of this book

The book covers data migration, data consolidation, and data integration, the three scenarios that are typically part of the information integration life cycle. Organizations typically find themselves migrating data to Oracle and either later, or at the same time, consolidating multiple database instances into a single global instance for a department, or even an entire company. The business savings and technical benefits of data consolidation cannot be overlooked, and this book will help you to use Oracle's technology to achieve these goals. This highly practical and business-applicable book will teach you to be successful with the latest Oracle data and application integration, migration, information life-cycle management, and consolidation products and technologies.In this book, you will gain hands-on advice about data consolidation, integration, and migration using tools and best practices. Along the way you will leverage products like Oracle Data Integrator, Oracle GoldenGate, and SQL Developer, as well as Data Hubs and 11gR2 Database. The book covers everything from the early background of information integration and the impact of SOA, to products like Oracle GoldenGate and Oracle Data Integrator. By the end you'll have a clear idea of where information and application integration is headed and how to plan your own projects.
Table of Contents (17 chapters)
Oracle Information Integration, Migration, and Consolidation
Credits
About The Author
About the Contributing Authors
About the Reviewers
www.PacktPub.com
Preface

The real world: studies in integration


In the following sections, we will take a look at several businesses and their challenges for integration. Such solutions as replication, integration with non-Oracle sources, and queuing will be discussed.

Banking case

Gruppo Sanpaolo d'Intermediazione Mobiliare is Italy's second largest bank and among the top 50 banks worldwide. The investment banking arm of the group is Banca d'Intermediazione Mobiliare (Banca IMI). In addition to servicing the other parts of the Sanpaolo Group, Banca IMI provides investment banking services to a wide range of institutions, including other banks, asset managers for major global corporations, and other financial institutions. In the process of buying and selling a variety of financial instruments, Banca IMI must communicate with all of the major exchanges (for example, the New York Stock Exchange). The volume of its daily trades can frequently scale up to hundreds of thousands.

Because its business operations are global, its IT systems must operate 24/7, and transactions with both internal application systems and external trading centers must be processed with minimum error and as close to real time as possible. This is a very demanding business application and a complex example of data integration. The main applications with which Banca IMI must communicate include its own backend administrative systems and legacy applications, the financial exchanges, domestic and international customers, and financial networks. Not only do all of these have different protocols and formats, but there are also differences just within the financial exchanges themselves. The latter is handled through a marketing interface layer with custom software for each exchange. Domestic customers are connected through open standard protocols (financial information exchange standards, FIX) and are used with international customers and other financial institutions.

Banca IMI uses Oracle Streams (Advanced Queuing) to coordinate transactions across all these stakeholders. Streams is a fully integrated feature of the Oracle database, and takes full advantage of Oracle's security, optimization, performance, and scalability. Streams can accommodate Banca IMI's very high level of transactions close to real time and still perform all the transformations required to communicate in the various protocols needed. Scalability is very important to Banca IMI and the primary reason why it moved to Oracle Streams from a previous solution that relied on another vendor's product. "We're happy about what Advanced Queuing offers us", says Domenico Betunio, Banca IMI's manager of electronic trading. "Besides speed and scalability, we're impressed by messaging reliability, and especially auditing".

Education case

The Hong Kong Institute of Education (HKIEd) is a leading teacher education institution in the Hong Kong Special Administrative Region (HKSAR). The institute was formally established by statute in April 1994 by uniting the former Northcote College of Education, Grantham College of Education, Sir Robert Black College of Education, the Hong Kong Technical Teachers' College, and the Institute of Languages in Education, the earliest of which was started in 1939. The Institute plays a key role in helping the Hong Kong government fulfill its commitments: to develop new curriculum; to achieve its goal of an 'all graduate all trained' teaching profession; and to provide for the continuous professional development of all serving teachers. The Institute is organized around four schools with a current enrollment of nearly 7,000 students in a variety of daytime and evening degree programs. The Institute staff exceeds 1,000, almost 400 of whom are teaching staff. Across the Institute, more than 200 funded research and development projects are being actively pursued. The two main languages the Institute must accommodate are English and Traditional Chinese.

Soon after its founding, HKIEd began in-house development of several administrative applications, all running in conjunction with Sybase databases. These applications included student admission, enrollment and profiling, human resources and payroll, smart card management, and a library interface. In 2002, HKIEd purchased a set of packaged applications: Banner, from SCT. SCT's Banner system runs on Oracle and is analogous to an enterprise resource planning system. It supports student services, admission, enrollment, and finance functions, some of which it took over from the in house Sybase applications. The Sybase in-house applications still account for roughly 50 percent of the Institute's administrative applications, including classroom booking, HR, payroll, JUPAS student selection, smart card management, and the library INNOPAC system.

As these vital Institute administrative systems are on two platforms, Sybase and Oracle, there is a requirement to keep the data consistent in more than ten common data fields. They account for less than 5 percent of the total number of fields, which is still significant. Each database contains more than 10,000 records. The number of daily transactions affecting these data fields ranges from 200 to 5,000.

After experimenting with SQL Loader scripts to update the common fields using a batch upload process, HKIEd switched to using the Oracle Transparent Gateway. Since January 2003, HKIEd has been using the gateway to access and update the Sybase database from Oracle to keep the common data fields in sync in real-time. HKIEd has created views in the Oracle database based on a distributed join of tables from the Oracle and Sybase databases. This enables SCT's Banner system to transparently access and update fields in the Oracle and Sybase databases. The system automatically performs a two-phase commit to preserve transactional consistency across the two databases. The Transparent Gateway has NLS support, enabling access to Sybase data in any character set.

High technology case

This case illustrates the use of Oracle Streams for information integration, load balancing, and consolidation.

Internet Securities, Inc. (ISI), a Euromoney Institutional Investor Company is the pioneering publisher of Internet-delivered emerging market news and information. Internet Securities (www.securities.com) provides hard-to-get information through its network of 20 offices in 19 countries, covering 45 national markets in Asia, Central and Eastern Europe, and Latin America. Its flagship product, the Emerging Markets Information Service aggregates and produces unique company and industry information including financial, economic and political news, for delivery to professionals over the Internet. The subscription-based service enables users to access and search through a comprehensive range of unique business information derived directly from over 6,800 leading local and international sources. Primarily because of its international clientele, the operations of ISI are run on a 24/7 basis. ISI has offices in 18 locales around the globe, with clients in each locale. Its provisioning operations are centralized and located, along with its headquarters, in New York City.

ISI's content is also global and emphasizes information about emerging markets, which in this context means markets in countries like Romania, Brazil, or China. The content being aggregated arrives in automated feeds of various forms at an average rate of 50,000 documents a day, with hourly arrival rates ranging from 100 to several thousand per hour. All documents, regardless of the source language, are converted to a single encoding standard (UTF8) when being loaded into the ISI document base.

One of ISI's competitive differentiators is that information is retained regardless of age. The size of the ISI content base has grown rapidly to over one terabyte and, in tandem, the level of query activity has grown as well. Until recently, daily operations were run on NT-based systems using Oracle. The need for high scalability and availability while superseding performance prompted ISI to migrate its database operations onto Solaris using Oracle9. Oracle Streams was selected to achieve a major increase in availability and performance. Higher availability is obtained by fully replicating to a secondary server and by being able to perform much faster backups. The performance improvements come from load balancing between the servers and the upgraded hardware.

Overall, Oracle Streams is used for three databases supporting ISI operations. Each database is replicated to a secondary server. The three are:

  • Back-office database (100 GB): This database supports the company's proprietary CRM and authentication systems.

  • Document database (50 GB): This contains metadata about documents and true paths to the physical location of documents.

  • Search database (1 TB): This is the database in which the documents are loaded and where the client queries are executed. Documents are stored as BLOBS. Each record is one document.

The replication for each database is such that either replica can service the functions of both, but for performance (load balancing) and administrative reasons, both are typically serving different operational needs. For example, ISI call center agents primarily use 'Backoffice A' while 'Backoffice B' services all of the client activity monitoring. In the context of the Document database, 'Documents A' is the production machine, and 'Documents B' is the standby. In the case of the huge Search database, 'Search A' is used to perform the document loading that occurs in hourly batches, and 'Search B' is used to receive the users' queries that can be executed on either Search database server.

ISI expected to obtain benefits in two ways: performance and availability. It had evaluated other possible solutions, but these fell short in the performance dimension. Oracle Streams did not. With the Streams-based solution, queries are executing in half the time or better. With respect to availability, switchover in case of a failure is now instantaneous. With NT, ISI was using a physical standby that could be switched over in the best case in 7 to 8 minutes for read-only, and 10 to 15 minutes for read/write transactions.