Book Image

IBM InfoSphere Replication Server and Data Event Publisher

By : Pav Kumar-Chatterjee, Pav Kumar Chatterjee
Book Image

IBM InfoSphere Replication Server and Data Event Publisher

By: Pav Kumar-Chatterjee, Pav Kumar Chatterjee

Overview of this book

Business planning is no longer just about defining goals, analyzing critical issues, and then creating strategies. You must aid business integration by linking changed-data events in DB2 databases on Linux, UNIX, and Windows with EAI solutions , message brokers, data transformation tools, and more. Investing in this book will save you many hours of work (and heartache) as it guides you around the many potential pitfalls to a successful conclusion. This book will accompany you throughout your Q replication journey. Compiled from many of author's successful projects, the book will bring you some of the best practices to implement your project smoothly and within time scales. The book has in-depth coverage of Event Publisher, which publishes changed-data events that can run updated data into crucial applications, assisting your business integration processes. Event Publisher also eliminates the hand coding typically required to detect DB2 data changes that are made by operational applications. We start with a brief discussion on what replication is and the Q replication release currently available in the market. We then go on to explore the world of Q replication in more depth. The latter chapters cover all the Q replication components and then talk about the different layers that need to be implemented—the DB2 database layer, the WebSphere MQ layer, and the Q replication layer. We conclude with a chapter on how to troubleshoot a problem. The Appendix (available online) demonstrates the implementation of 13 Q replication scenarios with step-by-step instructions.
Table of Contents (12 chapters)
IBM InfoSphere Replication Server and Data Event Publisher
Credits
About the Author
About the Reviewer
Preface

The different types of Q replication


There are four basic types of Q replication:

  • Unidirectional

  • Bidirectional

  • Peer-to-peer

  • Event Publishing

Replicating to a stored procedure or a Consistent Change Data (CCD) table are a subset of unidirectional replication.

Let's look at each of these in more detail. In the following sections, we talk about source and target tables. You may be wondering, what about views, triggers, and so on? You should check the Pre-setup evaluation section of Appendix A, for a list of objects to check for, before deciding on whether Q replication is the correct solution.

Unidirectional replication

In unidirectional replication, we can replicate all of the rows and columns of a source table or we can just replicate a subset of the rows and columns. We cannot really perform any transformation on this data. If we want to perform some sort of transformation, then we would need to replicate to a stored procedure, which we will discuss in detail in Appendix A.

Replicating to a stored procedure

Stored procedure replication is a subset of unidirectional replication in which the target is not a table as such, but a stored procedure, as shown in the following diagram:

A stored procedure can transform the data and output the results to a target table. This target table is not known to Q Apply. These stored procedures can be written in SQL, C, or Java. An example of replicating to a stored procedure is shown in the Replication to a stored procedure section of Appendix A.

Note

Prior to DB2 9.7 the source table and the stored procedure must have the same name, and the target table name can be any name we like.

Bidirectional replication

In bidirectional replication, we replicate copies of tables between two servers, each of which has a copy of the table. Note that we can only set up bidirectional replication between two servers. Unlike unidirectional replication, where we can replicate a subset of rows and columns, this is not possible in bidirectional replication. The tables on both servers can have different names, but must have the same number of rows and columns. The columns must have identical column names of compatible data types. It is not possible to do any data transformation using this type of replication.

Because we are updating records on both servers, it is possible that the same record will be updated at the same time on both servers.

Note

Although Q replication provides a conflict detection mechanism, we strongly advise that the driving application should be written or modified in such a way that such conflicts be avoided. The conflict detection provided by Q replication should be treated as a safety net and not the primary conflict resolution mechanism.

This mechanism allows us to choose which data values are used to detect conflicts (key column values only, changed column values, or all column values) and which server should win if such a conflict is detected. The row in the losing system is rolled back and the record is written to the IBMQSNAP_EXCEPTIONS table for review. Conflict detection is discussed in detail in the Q replication conflict detection section.

One of the related subjects to conflict detection is the concept of which server takes precedence in a conflict, or to put it more bluntly, which server is the master and which is the slave! If there is a conflict, then whichever server takes precedence will not apply changes from the other server. This ensures that the servers remain in sync. There is a more egalitarian option, which is that no server takes precedence. In this situation, rows are applied irrespective of whether or not there is a conflict, which ultimately leads to a divergence of the contents of the databases, which is not good!

There are two types of bidirectional replication—the first type is where we have an active/passive setup and the second type is where we have an active/active setup. The type of replication you choose will have implications on which server is defined as the master and which as the slave and what to do if a Q subscription is inadvertently inactivated.

In an active/passive setup, the passive server should be made the master. In an active/active setup, the choice of which system is the master is a decision you have to make. See the Conflict detection: update/delete conflict section of Appendix A for further discussion.

Peer-to-peer replication

Peer-to-peer replication allows us to replicate data between two or more servers. This is different from bidirectional replication, which is only between two servers. Each server has a copy of the table (which can have a different schema and name), but must have the same number of rows and columns and these columns must have identical column names and compatible data types. It is not possible to do any data transformation using this type of replication.

In peer-to-peer replication, there is no such thing as a master or slave server—each server will have the most recent copy of the table—eventually! What this means is that there will be a slight delay between the first server having a copy of the table and the last server having that copy. This is an asynchronous process, so at any one time the tables might be different, but once applications stop updating them, then the tables will converge to the most recently updated value. This type of processing means that there isn't any "manual" conflict detection as such (it is handled automatically by Q Apply), because the latest update will always win.

If two applications update the same record at exactly the same time, then Q replication uses the server number allocated when the peer-to-peer environment was set up to determine the winner. This type of processing means that two columns are added to each of the tables in the Q replication environment, where the first column is a timestamp of when the row was last updated (GMT) and the second column is the machine number. These updates are performed through triggers on the tables.

Tree replication

Tree replication comes in two flavors: bidirectional, which we call B-Tree replication, and unidirectional, which we call U-Tree replication.

A variation on the bidirectional replication theme is that it would be nice to be able to replicate from one master to two slaves in a bidirectional manner. This requirement was addressed with B-Tree replication.

A B-Tree replication structure is one, which looks like a tree (as shown in the preceding diagram). DB2A replicates with DB2B, DB2C, and DB2<n> in a bidirectional manner, but DB2B, DB2C, and DB2<n> do not replicate directly with each other—they have to replicate through DB2A, which is what differentiates B-Tree replication from peer-to-peer replication.

For B-Tree replication, we can replicate between one master and many slaves in a bidirectional manner. In SQL replication terms, this was called Update Anywhere. Note that it is not possible to set up B-Tree replication using the Replication Center—we need to use ASNCLP commands, which is described in detail in the Bidirectional replication to two targets (B-Tree) of Appendix A.

We can also replicate from one source to many targets in a unidirectional scenario, which we call a U-Tree scenario. In the preceding figure, DB2A replicates with DB2B, DB2C, and DB2<n> in a unidirectional manner (we can have more than three targets). Note, there is no radio button in the Replication Center to set up unidirectional U-Tree replication. What we have to do is set up unidirectional replication from DB2A to DB2B, and then for DB2A to DB2C, and so on. It is easier to use ASNCLP commands, which are described in detail in the Unidirectional replication to two targets (U-Tree) section of Appendix A.

Replicating to a Consistent Change Data table

Let's first look at the definition of Consistent Change Data (CCD) replication. CCD table replication is a subset of unidirectional replication, in which the target is a CCD table, which contains a row for each insert/delete/update that occurs on the source table. These CCD tables can be complete and/or condensed (this will be explained later).

There are three main uses of CCD table replication:

  • To populate an operational data store

  • To keep a history of changes made to the source table for audit purposes

  • To enable multi-target update

Consider the situation where we want to populate an Operational Data Store (ODS) with data from our live system. We want to replicate all operations apart from delete operations. Before the introduction of CCD tables, our only option was to use a stored procedure. One of the parameters that the Q Apply program passes to a stored procedure is the operation (insert, delete, and so on) that occurred on the source system. See the Replication to a stored procedure section of Appendix A.

We can use CCD tables to keep a history of changes made to a table, or as a feed to InfoSphere DataStage.

The multi-target update scenario uses Q replication to populate the CCD table and then uses SQL Replication to populate the multiple target tables as shown in the following diagram:

In the Replicating to a CCD table section of Appendix A, we go through the steps needed to set up replication to a CCD table.

We can only specify that a target table be a CCD table in a unidirectional setup. A CCD target table is made up of the following columns (only four of the metadata columns are compulsory, the other four are optional, and the order of the columns does not matter):

<user key columns>

<user nonkey columns>

<user computed columns>

 

IBMSNAP_INTENTSEQ

IBMSNAP_OPERATION

IBMSNAP_COMMITSEQ

IBMSNAP_LOGMARKER

Compulsory auditing columns

IBMSNAP_AUTHID

IBMSNAP_AUTHTKN

IBMSNAP_PLANID

IBMSNAP_UOWID

Optional auditing columns

The IBMSNAP_INTENTSEQ column is a sequence number that uniquely identifies a change and is ascending within a transaction. The IBMSNAP_OPERATION column is a flag that indicates the type of operation for a record. The IBMSNAP_COMMITSEQ column is a sequence number that provides transactional order. The IBMSNAP_LOGMARKER column is the time that the data were committed.

The IBMSNAP_AUTHID column is the authorization ID that is associated with the transaction. This column is for used for Linux, UNIX, Windows, and z/OS. For z/OS, this is the primary authorization ID. The IBMSNAP_AUTHTKN column is the authorization token that is associated with the transaction. This column is for z/OS only and is the correlation ID—it will be NULL for Linux, UNIX, and Windows. The IBMSNAP_PLANID column is the plan name that is associated with the transaction. This column is for z/OS only—it will be NULL for Linux, UNIX, and Windows. And finally, the IBMSNAP_UOWID column is the unit-of-work identifier from the log record for this unit of work. This column is used for Linux, UNIX, Windows, and z/OS.

The <user computed columns> columns will be the before image columns and must be NULLABLE (because there is no before image when we insert a row, the before image is NULL).

Now let's look at what data is stored in a CCD table. With CCD tables we can specify that the target table is: COMPLETE or NONCOMPLETE and CONDENSED or NONCONSENSED. These are interpreted as follows:

  • Complete (COMPLETE=Y): A complete CCD table contains every row of interest from the source table and is initialized with a full set of source data. All target table loading options are valid for complete CCDs (automatic, manual, or no load).

  • Noncomplete (COMPLETE=N): A noncomplete CCD table contains only changes to the source table and starts with no data. A noncomplete CCD table records all UPDATE operations at the source. The only valid load option for noncomplete CCD tables is no load.

  • Condensed (CONDENSED=Y): A condensed CCD table contains one row for every key value in the source table and contains only the latest value for the row. For condensed CCD tables, a primary key is required, which is used in case of an update conflict. If such a conflict occurs, all the source columns are forced into the row (CONFLICT_ACTION=F).

  • Noncondensed (CONDENSED=N): A noncondensed CCD table contains multiple rows with the same key value, one row for every UPDATE, INSERT, or DELETE operation at the source table. When added to the CCD table, all of the rows become INSERT operations. No primary key is required.

The options for handling unexpected conditions at the target are limited for CCD tables:

For condensed and complete, two choices are available:

  • Force the source change into the target table (CONFLICT_ACTION=F)

  • Ignore the condition and continue (CONFLICT_ACTION=I)

For any combination other than condensed and complete, the only valid option is to force the change into the target table.

For all CCD table types, the only valid conflict rule is to check only key columns (CONFLICT_RULE=K).

Before we move on, let's quickly look at CCD tables in an SQL Replication environment. In SQL Replication, there is the concept of internal and external CCD tables, which does not exist in Q replication. In SQL Replication terminology, all Q replication CCD tables are external CDD tables.

Event Publishing

The Event Publishing functionality captures changes to source tables and converts committed transactional data to messages in an XML format. Each message can contain an entire transaction or only a row-level change. These messages are put on WebSphere MQ message queues and read by a message broker or other applications. We can publish subsets of columns and rows from source tables so that we publish only the data that we need.