Book Image

Oracle 11g Streams Implementer's Guide

Book Image

Oracle 11g Streams Implementer's Guide

Overview of this book

From smaller businesses through to huge enterprises ñ users all over the world often require access to data 24 hours a day. Distributed database systems proliferate the world of data sharing providing an ability to access real-time data anywhere, anytime. Oracle Streams, a built-in feature of the Oracle database, is a data replication and integration feature critical to the success and wellbeing of enterprises in today's fast moving economy. This book provides the reader with solid techniques to master Oracle Streams technology and successfully deploy distributed database systems. This book quickly goes over the basics and gets you up and running with a simple Oracle 11g Streams environment. It will serve as an excellent companion to the Oracle Streams Administration Guide. It is intended for Oracle database architects and administrators, and provides in-depth discussion on must-know information for the design, implementation, and maintenance of an Oracle Streams environment. The book does not attempt to regurgitate all the information in the Oracle Streams Administration Guides, but rather provides additional clarification and explanation of design, implementation, and troubleshooting concepts that are often elusive in Streams documentation. It also identifies helpful tools and Oracle resources to add to your knowledge base, as well as tried and tested tricks and tips to help you tame Oracle Streams. The book starts by introducing and explaining the components of Oracle Streams and how they work together. It then moves on logically, helping you to determine your distributed environment requirements and design your Streams implementation to meet those requirements. Once these concepts are discussed, the book moves to configuration and basic implementation examples to help solidify those concepts. It then addresses advanced features such as tags, down-stream capture, and conflict resolution. You then move on to maintenance techniques such as documenting the environment, effectively planning and implementing changes to the environment, and monitoring and troubleshooting the environment. When you have studied the techniques and completed the hands-on examples, you will have an understanding of Oracle Streams' core concepts and functionally that will allow you to successfully design, implement, and maintain an Oracle Streamed environment.
Table of Contents (14 chapters)
Oracle 11g Streams Implementer's Guide
Credits
About the Authors
About the Reviewers
Preface

Preface

This Preface and the entire book are a little bit different and that is by design. Both authors wrote this book understanding that our target audience often does not have time to read a whole book, or the Oracle documentation, from cover to cover. As such, we wrote this book with the idea that the table of contents and headings should tell you exactly what is being covered. Bullet lists will be used to quickly highlight key points where appropriate. Where concepts need to be explained in more detail, a supporting narrative is supplied. Another difference is that we make multiple references to Oracle documentation rather than attempting to rewrite everything. This is also by design. Having seen Oracle documentation evolve over the years, both authors, and our publisher, recognize the intrinsic value of getting specific detailed information straight from the "horse's mouth". To promote the development of overall expertise, we focus on helping our readers effectively use all the tools available. The Oracle documentation is one of your most valuable tools. At times, Oracle documentation can be difficult to follow or find information within, but once you develop an expertise in using the documentation, the expertise in the functionality is not far behind. The focus of this book is not to replace the Oracle documentation, but rather to be a quick reference companion to the Oracle documentation.

Replication in general

The concept of replication is simply to duplicate. Birds do it, bees do it, and even cells do it. However, replication is not limited to the biological world. Accurately duplicating data from which information is derived is the foundation of human communication. Whether that data be the words or gestures used to convey a story that is handed down from generation to generation, or the numbers used to quantify the quantifiable, or the grouping of on/off bits stored in a computer file; humans have been replicating data since they discovered the need to communicate.

Now that we have evolved into the wonderful age of computerized technology, we recognize the limitless advantages of sharing data, and the need to accurately and efficiently duplicate and distribute that data.

Distributed database systems

We all know that a database is a collection of data objects that are typically accessed through a client/server architecture, and where the database is the server.

We also know that client/server architecture uses a network communication channel that allows the client to send or get data to/from the database. The client can be local (on the same computer as the database), or it can be remote (on a different computer than the database). Either way, the client uses some type of network connection to access the database.

The sharing of data between two or more databases constitutes a distributed database system (even if the databases reside on the same computer). Distributed database systems can be homogeneous (all on the same platform, such as Oracle) or heterogeneous (two or more platforms, such as Oracle, MS SQL Server, SYBASE, and so on.) These systems can utilize a number of data distribution methods (unidirectional, bidirectional, read-only, synchronous, and asynchronous). The glue that holds this all together is the network and database links between the various databases.

A database link is a one-way communication channel from one database (source) to another database (target) that allows the source database to access the objects in the target database.

Key terms that have been discussed and should be understood here are: database link, communication channel, and network connections. These all work to provide connectivity in a distributed system. It is very important to understand that network connectivity makes or breaks a distributed system. No network connection means no data distribution. An unstable network means unstable data distribution.

Now that you have a distributed database system, add client applications that access one or more databases in that distributed database system, and voila! You have a full-blown distributed system.

What is Data Replication?

Data Replication is literally the act of accomplishing data object changes throughout a distributed system. Period. Replication can be manual, or it can be automated. Automated is the preferred mode of 3 out of 4 DBA's surveyed (we do not really count the 4th, he's semi-retired and has nothing better to do).

How do "Replication" and "Distributed Systems" interact?

Replication makes data located in different databases available to all databases within the distributed system. So replication is the method behind a distributed system. It moves the data around to different sites.

Note

Databases within a distributed system are often referred to as sites. As mentioned earlier, databases can be physically co-located on the same computer, but the databases themselves could still be referred to as separate sites. The term 'site' is more of a logical distinction, than a physical distinction.

Why would we want to replicate?

There are a number of reasons to replicate data, but it is a good bet that they all boil down to increased availability. This means that the same data is available at different sites, and the flow of data between those sites is automated. Replication supports increased availability by providing the following:

  • Change consistency: Ensures that all sites get the same change.

  • Mass deployment/disconnected computing: Data can be sent to secondary computers (laptops, desktops) so that it is available when these devices might be offline.

  • Faster access: Load balancing is the art of distributing client connections over multiple databases. This comes in really handy when the system has a large number of users, and even more so if those users are geographically separated from the system databases. The user just connects the geographically closest database. Network load can also be reduced by directing traffic over different routers for different database sites.

  • Survivability: Data is still accessible if one site fails.

Note

When not to use replication for survivability purposes

If the need is to only support survivability and data changes made at a single site, there are better tools to use to support survivability that require a little less configuration, maintenance, and monitoring. For example: Data Guard!

Replication architecture

Replication architecture refers to the overall structure of the replicated environment. This includes what is replicated between the sites and the role of each site. The following terms are used to make these distinctions:

Master table/object: A table or object that is replicated to another database. A replicated table can be a master table for a snapshot/materialized view, or a table that is duplicated at a remote site. For tables, both the structure and the data are replicated. For non-table objects, the object definition is replicated.

Master/Source site: A database which hosts master tables/objects. The tables can be a master table for a snapshot/materialized view, or a table that is replicated to a remote master site.

Secondary/Target site: A database which hosts replicated objects to which changes are sent by a master site. This can be another master site, or a materialized view site. The expectation of a secondary site is that if a data conflict occurs when attempting to apply the change from the sending master site, the conflicting secondary site data is always replaced by the values from the sending master site.

Replication methods

A replication method describes how data is replicated between sites. This can be broken down into commit synchronization and directional flows.

Commit synchronization flow refers to when changes are committed at and between sites. There are two methods of commit synchronization; synchronous and asynchronous.

Synchronous replication requires that all sites be able to commit the change before it is committed at the originating site. If any site is not able to commit the change, the change is rolled back at all sites, including the originating site. This requires all database sites in the distributed system to be writable over network connections. The nature of synchronous replication keeps the data at all sites synchronized, thus (at least theoretically), eliminating the need for conflict resolution. Synchronous is used for real-time, mission-critical replication.

Asynchronous replication allows the transaction to be committed at the originating site regardless of whether it is successfully committed at the other target sites in the distributed system. In this method, if the commit is successful at the originating site, appropriate deferred transactions for each target site are created and stored to be propagated and applied at a later time (keep in mind "a later time" can be as little as a few seconds). This allows work to continue at the originating site even if the changes cannot be applied to the other sites within the distributed system immediately. This does, however, open up the possibility of data divergence, and requires some form of conflict resolution (manual or automated) to be implemented should divergence occur.

Replication from one site to another can only be synchronous or asynchronous. It cannot be both (in other words, it is mutually exclusive).

Directional flow refers to the direction in which changes are passed between two sites.

Unidirectional means that data changes only flow one way. In this case, changes are made at a primary master and are sent to a secondary site. Direct changes made at secondary sites are either not allowed, or not sent to the primary master site. If changes are made at a secondary site that causes data divergence from the primary master database, subsequent changes from the primary master will either fail due to the data differences, or overwrite that change if conflict resolution mechanisms are in place. Read-only snapshots are an example of unidirectional replication.

Bidirectional (N-Way) replication means that data changes can flow to and from sites within a distributed system. Changes can be made at any master or updateable snapshot site. These changes are then propagated to all other sites. If the bidirectional replication is asynchronous it can lead to data divergence, and requires some form of conflict resolution (manual or automated) to be implemented, should divergence occur. Master-to-Master and Updateable Snapshots are examples of bidirectional replication.

Replication of an object between two sites can only be unidirectional or bidirectional. It cannot be both (again, mutually exclusive).

A commit synchronization method can be applied to either directional flow method, and vice versa.

Replication configurations

Now that you understand replication architecture and methods, these can be combined to create a replication configuration. A replication configuration can also be referred to as a replication environment. The following define the different replication configurations that you can implement:

N-Way/Master-to-Master/Multi-Source: A distributed environment that has two or more change source sites. These source sites push changes to other change source sites and receive changes from other change source sites.

Uni-directional/Master-to-Secondary/Single-Source: A distributed environment where one site is the (change) source site (primary/master). It, in turn, pushes changes to other sites (secondary). If data changes directly at a secondary site, this could result in data divergence and must be addressed through conflict resolution methods.

Hybrid: A distributed environment that has a combination of multi and single source configurations.

Oracle Streams

As you can see, there are many components and methods that can be used to implement replication. Where do you start? What do you use, what don't you use, and when? And most importantly, how does Streams help?

The concept of Streams grew from pairing the distributed theory of Oracle's Advanced Replication with the redo change capture technology of Oracle's LogMiner. Rather than using triggers to capture database changes (as is done with Oracle's Advanced Replication), Streams uses LogMiner to capture the committed changes from the database on-line redo/archive logs. This allows for a more flexible replication architecture (like data capture, propagation, and apply rules that support site-to-site pass-through propagation and data transformations). However, by the nature of redo change capture, Streams replication is always, technically, asynchronous. The data change is committed at the source regardless if it can be committed at the destination. If you require a truly synchronous environment, you will want to explore Oracle Advanced Replication rather than Oracle Streams.

What this book is (and is NOT)

This book is intended to be a quick reference guide to Oracle 11g Streams. Along those lines we are going to quickly go over the basics and have you up and running with a simple Oracle 11g Streams environment in the first sections of this book. This is because we believe that hands-on is the only true and meaningful way of developing an expertise with a technology. Then we will evolve the simple Streams environment to cover areas of concern related to more advanced configurations and the administration of an Oracle 11g Streams configuration in a production environment. The authors do make an attempt to direct the reader to specific Oracle documentation, should the reader desire additional detailed information.

You should also be aware that this book is meant to be read chapter to chapter for the first three chapters. This provides you with the foundation that is needed for later chapters. If you have a background with Oracle Streams, consider jumping to specific configuration chapters (Chapter 4 through to Chapter 6).

Here is the high level layout of the chapters.

Who this book is for

This book is not for the novice Oracle DBA. In order to gain the most out of this book, you should have a good background as a working Oracle DBA and have a good familiarity with the Oracle Streams Components mentioned in Chapter 1. However, Chapters 1 and 2 may prove helpful to the novice in gaining a high-level understanding of Streams architecture and components, and design considerations.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text are shown as follows: "We can include other contexts through the use of the include directive."

A block of code is set as follows:

BEGIN
DBMS_STREAMS_ADM.SET_UP_QUEUE(
queue_table => '"STREAMS_CAPTURE_QT"',
queue_name => '"STREAMS_CAPTURE_Q"',
queue_user => '"STRM_ADMIN"');
END;

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

BEGIN
DBMS_STREAMS_ADM.SET_UP_QUEUE(
queue_table => '"STREAMS_CAPTURE_QT"',
queue_name => '"STREAMS_CAPTURE_Q"',
queue_user => '"STRM_ADMIN"');
END;

Any command-line input or output is written as follows:

ALTER TABLE <table_name> ADD SUPPLEMENTAL LOG GOUP <log_group_name> (col1, col2) ALWAYS;

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "clicking the Next button moves you to the next screen".

Note

Warnings or important notes appear in a box like this.

Note

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an email to , and mention the book title via the subject of your message.

If there is a book that you need and would like to see us publish, please send us a note in the SUGGEST A TITLE form on www.packtpub.com or email .

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book on, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Note

Downloading the example code for the book

Visithttp://www.packtpub.com/files/code/9706_Code.zip to directly download the example code.

The downloadable files contain instructions on how to use them.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books maybe a mistake in the text or the code we would be grateful if you would report this to us. By doing so, you can save other readers from frustration, and help us to improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/support, selecting your book, clicking on the let us know link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata added to any list of existing errata. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or web site name immediately so that we can pursue a remedy.

Please contact us at with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at if you are having a problem with any aspect of the book, and we will do our best to address it.

So, grab your waterwings and let's jump into Streams...