Book Image

Instant Oracle GoldenGate

By : Tony Bruzzese
Book Image

Instant Oracle GoldenGate

By: Tony Bruzzese

Overview of this book

Oracle GoldenGate is a comprehensive package for low-impact, real-time data capture, distribution, and delivery of transactional data across heterogeneous systems for continuous availability, zero downtime migration, and disaster recovery. All in all, it is a precise tool for data replication, regardless of the platform that you use in today’s environment. Instant Oracle GoldenGate exemplifies the ease of use of this package through the use of real-world examples. This book gives a good overview and hands-on approach to the most commonly used implementations in an Oracle GoldenGate environment.This How-to book will take you through a number of real-world examples quickly and effectively by eliminating much of the guess work for all users from novices to experienced users. You will learn about the key components of the architecture and simple one-way replication for a number of database tables or an entire schema. You’ll be covering key concepts and the implementation of high availability configurations such as Oracle RAC, encryption, and many different ways to transform and filter data to your target systems.
Table of Contents (7 chapters)

Managing Oracle GoldenGate (Simple)


So far we've talked mostly about configurations and very little about managing an OGG instance. We'll discuss some of the more common management commands you'll probably use most often.

How to do it…

Here you'll get to know some of the most useful commands for displaying OGG status and/or for root cause analysis investigation.

  1. We would be investigating long-running transactions when there is a sequence lag of 1 or more logs difference between the recovery checkpoint and the current checkpoint. We do it using the following command:

    GGSCI> info ex01sand showch
    

    The output of the preceding command will be as follows:

    
    Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
        Thread #: 1
        Sequence #: 10501   ß way behind by 20 logs
    
    
    Current Checkpoint (position of last record read in the data source):
        Thread #: 1
        Sequence #: 10521
    
    

    You or your database administrator would need to troubleshoot as to which transaction is not committing its work. You might need to wait for the transaction to complete; or sometimes the user of that transaction might have forgotten to commit its work and you need to alert him/her to either commit their work or roll it back to clear the long-running transaction lag.

  2. We can check Extract/Replicat status' for a running instance using the following command.

    GGSCI> send pp01sand status
    

    The output of the preceding command will be as follows:

    EXTRACT PP01SAND (PID 696398)
      Current status: Recovery complete: At EOF
      
      Current read position:
      Sequence #: 11
      RBA: 1096
      Timestamp: 2012-12-17 11:05:44.778119
      Extract Trail: /u01/app/oracle/gg/dirdat/pr
    
      Current write position:
      Sequence #: 6
      RBA: 1408
      Timestamp: 2012-12-17 11:20:03.446074
      Extract Trail: /u01/app/oracle/goldengate/dirdat/rp
    
  3. For measuring the true lag between the data source and Extract process use the following command:

    GGSCI> send * getlag
    

    The output of the preceding command will be as follows:

    Sending GETLAG request to EXTRACT EX01SAND ...
    Last record lag: 1 seconds.
    At EOF, no more records to process.
    
    

    However, the following command is more common:

    GGSCI> info all
    

    The output of the preceding command will be as follows:

    Program     Status      Group       Lag at Chkpt  Time Since Chkpt
    MANAGER     RUNNING
    EXTRACT     RUNNING     EX01SAND    00:00:00      00:00:09
    EXTRACT     RUNNING     PP01SAND    00:00:00      00:00:06
    REPLICAT    RUNNING     RE01SAND    00:00:00      00:00:05
    
  4. The Miscellaneous commands used are as follows. This closes the current trail file and opens a new one:

    GGSCI> send pp01sand rollover  
    GGSCI> send ex01sand stop
    GGSCI> send ext01sand forcestop
    

    When an extract does not stop normally, you can abruptly stop it with the forcestop option.

  5. Following are the different ways to report stats on objects (tables):

    GGSCI> stats ex01sand latest table scott.*
    

    The output of the preceding command will be as follows:

    Extracting from SCOTT.TCUSTMER to SCOTT.TCUSTMER:
    
    *** Latest statistics since 2012-12-17 12:57:36 ***
            Total inserts                                      0.00
            Total updates                                      0.00
            Total deletes                                      1.00
            Total discards                                     0.00
            Total operations                                   1.00
    
    End of Statistics.
    
  6. Reporting the status as a rate of insert/update/deletes per time interval is done using the following commands:

    GGSCI> stats ex01sand table.tcustmer reportrate sec|min|hr
    GGSCI> stats ex01sand totalsonly scott.*
    
  7. For killing a Replicat / Deleting an E/R group, use the following command:

    GGSCI> kill replicat <group>
    GGSCI> delete <group> [!]
    
  8. The trail commands to be used are as follows. When you add trails to your extracts, you can specify the size of the trails with the keyword megabytes n where n is a number representing the size in megabytes.

    GGSCI> add exttrail | rmttrail, extract <group name> megabytes n
    GGSCI> alter exttrail | rmttrail extract <group name> megabytes n
    GGSCI> delete exttrail | rmttrail <trail name>
    

How it works...

Long running transactions (LRT) are identified by OGG via the WARNLONGTRANS parameter which by default is 60 minutes and it checks after every 5 minutes. It is wise to set this in your extract parameter file to a valid value for your environment. When checking for an LRT with the show checkpoint showch option to the info command, you ideally want the sequence number for both the recovery and current checkpoint to be the same; under normal circumstances this will be the case. But if the difference is more than 1 sequence, then you need to investigate further as to which transaction is holding up the delay. An LRT is not a true lag because it is still processing rows, but just not committing until the end of the transaction. The SQL may be a bad design so it's worth investigating and identifying the DML in question and suggesting a more scalable approach to the designer. OGG has a monitoring tool called the Director which normally will alert you of such LRT and will also identify the long-running transaction automatically.

The info <group name> detail command will give you detailed information regarding your environment. Try it, and usually I save the output along with info * showch to a Notepad file for later reference should it be needed for troubleshooting. I normally perform this when I make structural changes to tables or add/remove tables from the OGG configuration.

Checking for lags is a common command used by DBA's as a quick way to detect any lags with the E/R or both with the info all command. If the lag is in the range of several minutes to hours, you need to investigate further. It could be due to an LRT, network latencies, or a number of other issues with your database performance. There are times it is obvious, but other times you really need to dig in through OGG logs and/or database logs. This one time, I had a huge lag with the data pump and the send <group name> status command helped me identify that the issue was related outside the OGG/database. The issue was with network latencies. The command output will display the current read position in the local trail file directory and the current write position to the remote trail. The current read position has a sequence number associated; say 11 as in our case in the preceding example, which is part of the trail file name. An example of a directory listing of the trail files is as follows:

-rw-rw-rw-    1 oracle   oinstall       1096 Dec 17 11:05 pr000010
-rw-rw-rw-    1 oracle   oinstall       1231 Dec 17 12:57 pr000011

-rw-rw-rw-    1 oracle   oinstall       1231 Dec 17 13:40 pr000030

So, the pump is not shipping the trails as fast as they should and we seem to be 29 trail files behind. At this point you need to focus on probable network latencies by contacting your network administrator and working with him/her.

The ...getlag command is also a very useful command when you stop the extract and want to ensure the pump and replicat have a chance to drain all its processing while they are still running and have no more records to process. Equally, you can also issue the following command to either the pump or replicat while the extract is stopped to check for any further processing by either:

GGSCI> send PP01SAND logend

The output of the preceding command will be as follows:

Sending LOGEND request to EXTRACT PP01SAND ...
YES.

YES meaning that all the records have been processed in the data source and there aren't any further records to process.

Finally, you can check out some of the stats commands to display statistics about your objects. Try them in your environment.