Oracle 11g R1 / R2 Real Application Clusters Handbook


Oracle RAC commands and tips


We will now discuss tools and scripts that will benefit you in the administration of Oracle 11g RAC environments. First, we will look at clusterware configuration tools such as the cluster deconfig tool, which can be used to remove the Oracle Clusterware software from your Oracle RAC environment.

Cluster deconfig tool for Oracle RAC

As referenced in the Oracle Database De-installation Tool for Oracle Clusterware and Oracle Real Application Clusters 10g Release 2 (10.2) for Unix and Windows documentation, the cluster deconfig tool allows you to uninstall RAC cluster configurations.

The tool was released after 10.2.0.2 and is available on the OTN website from Oracle (http://otn.oracle.com). It provides you with the following benefits:

  • The cluster deconfig tool removes and deconfigures all of the software and shared files that are associated with an Oracle Clusterware or Oracle RAC database installation.

  • The cluster deconfig tool removes software, clusterware, and database files, along with the global configuration across all of the nodes in a cluster environment.

  • On Windows-based systems, the cluster deconfig tool removes Windows Registry entries.

It is advisable to use the cluster deconfig tool to prepare a cluster for reinstallation of Oracle Clusterware and Oracle database software after a successful or failed installation.

The cluster deconfig tool restores your cluster to its state prior to the installation, enabling you to perform a new installation. You can also use the Oracle Cluster Verification Utility (CVU) to determine the cause of any problems that may have occurred during an installation so that you can correct the errors.
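For example, a quick post-installation check with CVU might look like the following; this is a sketch, and the node names raclinux1 and raclinux2 are hypothetical placeholders for your own cluster nodes:

$ cluvfy stage -post crsinst -n raclinux1,raclinux2 -verbose

The -verbose flag prints the result of each individual check, which makes it easier to pinpoint the failing component.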

Note

The cluster deconfig tool will not remove third-party software that depends on Oracle Clusterware. In addition, the cluster deconfig tool provides no warnings about any third-party software dependencies that may exist with the Oracle Clusterware or Oracle database homes prior to removing the respective homes.

To download the cluster deconfig tool, go to the following website: http://download.oracle.com/otndocs/products/clustering/deinstall/clusterdeconfig.zip.

The cluster deconfig tool needs to be run as the root OS user and provides a built-in help feature to display available options and details for command syntax:

clusterdeconfig -help (or -h)

On both Unix-based and Windows-based systems, the logfile for the clusterdeconfig utility output is located on the cluster node where you ran the tool, in the <cluster deconfig path>/logs directory.

Depending on the software that you want to uninstall, plan the uninstallation so that the components are removed in the correct order.

As such, due to the dependencies among the Oracle Clusterware and Oracle database software components, you must uninstall the components in the following order (see the sketch after this list):

  1. Uninstall all Oracle database homes that do not include ASM software.

  2. Uninstall the Oracle database home that includes ASM software, if you have one.

  3. Uninstall the Oracle Clusterware home.
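As a sketch of that order, assuming hypothetical home paths and using the -home option shown later in this section, the sequence of invocations might look like this (always confirm the exact options with clusterdeconfig -help):

./clusterdeconfig -home /u01/app/oracle/product/11.1.0/db
./clusterdeconfig -home /u01/app/oracle/product/11.1.0/asm
./clusterdeconfig -home /u01/app/oracle/product/11.1.0/crs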

Using the cluster deconfig tool

You can use the cluster deconfig tool to clean up failed installations of Oracle 11g RAC. For example, you would need this tool if the installation halts due to a hardware or operating system failure and you need to clean up the partial installation as a result. You may also want to use it to remove existing databases along with their associated database home and cluster home directories.

Note

Use this tool with caution on production clusters and databases. Always test in a sandbox environment first, if possible.

The cluster deconfig tool can only remove Oracle database components from cluster nodes that are active. For cluster nodes that are inactive, you must rerun the tool when those nodes become active to complete the deconfiguration.

Use the cluster deconfig tool to remove installed components in the following situations:

  • When you have encountered errors during or after installing Oracle database software on a cluster and you want to reattempt an installation.

  • When you have stopped an installation and do not know how to restart it. In such cases, use the cluster deconfig tool to remove the partially-installed product and restart your installation.

  • When you have researched all of the problems with your existing installation using the Cluster Verification Utility (CVU), or by examining the installation logfiles, and cannot determine the cause of a problem.

Limitations of the cluster deconfig tool

If you attempt to run the cluster deconfig tool to uninstall Oracle Clusterware without first removing the related Oracle database homes, then the cluster deconfig tool reports an error instead of a warning. The following example shows this behavior in an Oracle 11gR1 environment:

[oracle@raclinux1]$ ./clusterdeconfig -checkonly -home /u01/app/oracle/product/crs
ORACLE_HOME = /u01/app/oracle/product/11gR1/crs
############ CLUSTERWARE/RAC DEINSTALL DECONFIG TOOL START ############
######################## CHECK OPERATION START ########################
Install check configuration START
Make sure that all Real Application Cluster (RAC) database homes are de-installed, before you choose to de-install the Oracle Clusterware home. Exited from Program

Problems and limitations of the cluster deconfig tool

We can examine a few of the limitations for the cluster deconfig tool. Be sure to check the latest bug list available online at My Oracle Support (http://support.oracle.com) to remain up to date on the latest issues with Oracle. The following bugs mention some key problems that may arise while using the cluster deconfig tool.

BUG 5181492: Listener is not downgraded to 10.1 if the 10.2 home is missing: The cluster deconfig tool fails to downgrade listener configurations to their original home during de-installation and downgrade of an Oracle 10.2 RAC home. This happens only if the Oracle 10.2 RAC home that is being deconfigured does not have the Oracle software present and listener configurations do not exist in the <Oracle home>/network/admin directory.

Workaround: Run <Oracle home>/bin/netca from the original home to recreate the listener configurations.

BUG 5364797: DC Tool not deleting home completely when home is shared: The cluster deconfig tool fails to delete the Oracle software completely from a shared Oracle RAC database home or Oracle Clusterware home. It does, however, completely remove the Oracle configuration that exists on all nodes of the cluster.

Workaround: Run the rm -rf <Oracle home path> command on any one node in the cluster to completely remove the software.

Starting the cluster deconfig tool

There are two ways to start the cluster deconfig tool, depending on the platform on which you run Oracle 11g RAC. For Unix- and Linux-based systems, you start the utility by logging in as the Oracle user account that performed the installation of the Oracle Clusterware and Oracle database home being uninstalled. For Windows, you need to log in to the host as a user who is a member of the local Administrators group.

To start the tool, you will need to issue the clusterdeconfig command.

Before you perform the uninstallation with the clusterdeconfig utility, you must first confirm that the connected user has user equivalence on all of the nodes in the cluster. This means the user can establish an ssh or rsh session between host nodes without being prompted for a password. On Unix or Linux, you can verify this by using the ssh command to confirm that access is permitted without a password between all cluster nodes.
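As a minimal sanity check, assuming two hypothetical nodes named raclinux1 and raclinux2, you can loop over the nodes and confirm that each ssh call returns without prompting for a password:

$ for node in raclinux1 raclinux2; do ssh $node date; done

If user equivalence is configured correctly, each invocation simply prints the remote node's date; any password prompt indicates that user equivalence still needs to be set up.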

To begin the de-installation tool, connect to one of the nodes in your cluster that contains the installation that you want to remove. You can connect to the node directly or you can connect from a client.

Open a command-line interface and enter the following command:

$ clusterdeconfig

The output from this command displays the required and optional parameters.

You can also use the -help or -h option to obtain more information about the clusterdeconfig tool commands and their use.

Silent mode operations using cluster deconfig

The deconfiguration tool also supports a silent (-silent) option.

It is mandatory to have the <Deconfiguration tool home>/templates/rac.properties file in place in order to de-install or downgrade a RAC database.

It is strongly advised that you run the cluster deconfiguration tool with the -checkonly option to generate the <Deconfiguration tool home>/templates/rac.properties file before running the tool to remove or downgrade the Oracle RAC database configuration.

In silent mode, the deconfiguration tool operates without prompting you for information. It attempts to discover the Oracle network listener, the Oracle database, the Oracle ASM instance, and the Oracle Enterprise Manager Database Control information. The successful discovery of these components depends on the configuration of the Oracle listener, Oracle RAC database, Oracle ASM instance, and Oracle EM Database Control, as well as on the availability of the Oracle RAC database software.
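Putting this together, a typical silent run is a two-step sketch (the home path is hypothetical): first generate the properties file with -checkonly, then run the tool again with -silent against the same home:

./clusterdeconfig -checkonly -home /home/oracle/product/11.1.0/db
./clusterdeconfig -silent -home /home/oracle/product/11.1.0/db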

An example of using the cluster deconfig tool in non-silent mode for Oracle 11gR1 RAC is as follows:

[oracle@raclinux1 clusterdeconfig]$ ./clusterdeconfig -home /home/oracle/product/11.1.0/db -checkonly
ORACLE_HOME = /home/oracle/product/11.1.0/db
#### CLUSTERWARE/RAC DEINSTALL DECONFIG TOOL START ############
################### CHECK OPERATION START ########################
Install check configuration START
The cluster node(s) on which the Oracle home exists are: raclinux1, raclinux2
Checking for existance of the Oracle home location /home/oracle/product/11.1.0/db
Checking for existance of central inventory location /home/oracle/raclinux
Checking for existance of the Oracle clusterware home /home/oracle/product/11.1.0/crs_raclinux
The following nodes are part of this cluster: raclinux1,raclinux2
Install check configuration END
Network Configuration check config START
Network de-configuration trace file location: /home/oracle/clusterdeconfig/logs/netdc_check.log.
Location /home/oracle/product/11.1.0/db/network/tools does not exist!
Specify the list of listener prefixes that are configured in this Oracle home. For example, MYLISTENER would be the prefix for listeners named MYLISTENER_node1 and MYLISTENER_node2 []:LISTENER_RACLINUX1,LISTENER_RACLINUX2
Specify prefixes for all listeners to be migrated to another database or ASM home. The target Oracle Database home version should be 10g or above. This ensures that clients can continue to connect to other Oracle instances after the migration. Specify "." if you do not want to migrate any listeners. Listeners that you do not specify here will be de-configured. []:
Network Configuration check config END
Database Check Configuration START
Database de-configuration trace file location '/home/oracle/clusterdeconfig/logs/assistantsdc_check3731.log'
Specify the list of database names that are configured in this Oracle home []: racdb
Specify if Automatic Storage Management (ASM) instance is running from this Oracle home y|n [n]: y
###### For Database 'racdb' ######
Specify the nodes on which this database has instances [raclinux1, raclinux2]:
Specify the instance names [raclinux1, raclinux2]:
Specify the local instance name on node raclinux1 [raclinux1]:
Specify whether this was an upgraded database. The de-configuration tool will attempt to downgrade the database to the lower version if it is an upgraded database y|n [n]:
Specify the storage type used by the Database ASM|CFS|RAW []: ASM
Specify the list of directories if any database files exist on a shared file system. If 'racdb' subdirectory is found, then it will be deleted. Otherwise, the specified directory will be deleted. Alternatively, you can specify list of database files with full path [ ]: /oradata/racdb
Specify the flash recovery area location, if it is configured on the file system. If 'racdb' subdirectory is found, then it will be deleted. []:
Specify the database spfile location [ ]:
Specify the database dump destination directories. If multiple directories are configured, specify comma separated list []:
Specify the archive log destination directories []:
Database Check Configuration END
Enterprise Manager Configuration Assistant START
Checking configuration for database racdb
Enterprise Manager Configuration Assistant END
##################### CHECK OPERATION END #########################
########################### SUMMARY ###############################
Oracle Home selected for de-install is: /home/oracle/oracle/product/11.1.0/db
Inventory Location where the Oracle home registered is: /home/oracle/racdb
Oracle Clusterware Home is: /home/oracle/product/11.1.0/crs_racdb
The cluster node(s) on which the Oracle home exists are: raclinux1,raclinux2
The following databases were selected for de-configuration : racdb
Database unique name : racdb
Storage used : ASM
A log of this session will be written to: '/home/oracle/clusterdeconfig/logs/deinstall_deconfig.out2'
Any error messages from this session will be written to: '/home/oracle/clusterdeconfig/logs/deinstall_deconfig.err2'
################## ADDITIONAL INFORMATION ########################
The clusterdeconfig tool has detected that the Oracle home, /home/oracle/product/11.1.0/db, was removed without properly deconfiguring all of the Oracle database components related to that Oracle home. Because of this, processes such as tnslsnr, dbcontrol and so on that depend on this Oracle home might still be running on the cluster nodes, raclinux1,raclinux2.
Oracle recommends that you kill these processes on these cluster nodes before attempting to re-install the Oracle database software. You can use the command ps -efw to see the full paths of the processes that were started that have an absolute pathname of the Oracle home. Or you can also use lsof +D <Oracle home> to show all the open files in that directory and which user owns them. The -t option to the lsof command displays the process identifiers.
###### CLUSTERWARE/RAC DEINSTALL DECONFIG TOOL END #############

Manual cleanup for RAC

In the event that you cannot use the cluster deconfig tool, you will need to manually clean up the RAC environment using OS commands, as shown next.

On the Linux platform, you will need to use the Linux ps (process status) and kill commands to shut down the Oracle RAC and clusterware background processes.

  • Kill EVM, CRS, and CSS processes if they are currently running:

ps -efl | grep crs
kill <crs pid> <evm pid> <css pid>
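If you want a more targeted view of which daemons are still alive before issuing the kill, a sketch using standard Linux tools is:

ps -ef | grep -E 'crsd|cssd|evmd' | grep -v grep

The second grep simply filters the grep process itself out of the listing.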

Now you will need to log in as the root user at the Linux OS prompt and execute the rm command to delete the clusterware files from /etc/oracle, /etc/init.d, and the additional directories shown as follows.

  • As root user, remove the following files:

    rm /etc/oracle/*
    rm -f /etc/init.d/init.cssd
    rm -f /etc/init.d/init.crs
    rm -f /etc/init.d/init.crsd
    rm -f /etc/init.d/init.evmd
    rm -f /etc/rc2.d/K96init.crs
    rm -f /etc/rc2.d/S96init.crs
    rm -f /etc/rc3.d/K96init.crs
    rm -f /etc/rc3.d/S96init.crs
    rm -f /etc/rc5.d/K96init.crs
    rm -f /etc/rc5.d/S96init.crs
    rm -Rf /etc/oracle/scls_scr
    rm -f /etc/inittab.crs
    cp /etc/inittab.orig /etc/inittab
    
  • If you are using an oraInventory that has other Oracle software installed, then uninstall the CRS home using the Oracle Universal Installer (OUI).

  • If oraInventory has only CRS_HOME that you plan to remove, then as root user, remove CRS_HOME:

# rm -Rf CRS_HOME/*

For the RDBMS installation, if you are using an oraInventory that has other Oracle software installed, then uninstall the RDBMS home using the Oracle Universal Installer.

To know about the required steps for other platforms, refer to My Oracle Support (formerly Metalink) note 239998.1.

If oraInventory has only the RDBMS_HOME that you plan to remove, then as the oracle user, remove the RDBMS_HOME:

rm -Rf $ORACLE_HOME/*

For ASM, clean up any ASM disks if they have already been used, as sketched below.
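A common approach is to overwrite the beginning of each ASM disk with zeroes using dd, which destroys the ASM disk header. The device name below is a hypothetical example; double-check it carefully, because this command is destructive:

# dd if=/dev/zero of=/dev/sdb1 bs=1024k count=100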

If there is no other Oracle software running, you can remove the /var/tmp/.oracle or /tmp/.oracle directory:

# rm -rf /var/tmp/.oracle
# rm -rf /tmp/.oracle

Now that we have cleaned up the environment, let's review how to repair the RAC environment using the rootdelete.sh and rootdeinstall.sh scripts, which do not require reinstallation.

Repairing the RAC environment without reinstalling

Under some circumstances, you may need to clean up an existing failed RAC install and run root.sh without reinstalling CRS. To do so, you can execute the following scripts:

$ORA_CRS_HOME/install/rootdelete.sh
$ORA_CRS_HOME/install/rootdeinstall.sh

You can also run these scripts if you want to reinitialize the OCR and Voting Disk without reinstalling CRS.

Reinitializing OCR and Voting Disks without reinstalling RAC

To reinitialize the OCR and voting disks without a RAC reinstallation, perform the following tasks:

  1. Obtain approval for a maintenance window for downtime to reinitialize the OCR. All resources and CRS must be stopped.

  2. Run the rootdelete.sh script and then the rootdeinstall.sh script from the $ORA_CRS_HOME/install directory.

  3. These scripts will stop CRS, clean up the SCR settings in the /etc/oracle/scls_scr or /var/opt/oracle/scls_scr directory (depending on your platform), and remove the contents of the OCR and OCR mirror using the dd command.

  4. If this succeeds, then reinitialize the OCR and voting disk. If, for any reason, there is a problem with the scripts, you will need to perform a manual cleanup as mentioned earlier.

  5. Stop all resources on all nodes in the RAC cluster using the following commands:

srvctl stop database -d <dbname>
srvctl stop asm -n <node name>
srvctl stop nodeapps -n <node name>

  6. Stop CRS on all nodes while logged in as root user:

    Oracle 10.1: /etc/init.d/init.crs stop

    Oracle 10.2 and later: $ORA_CRS_HOME/bin/crsctl stop crs

  7. Format the OCR and voting disk while logged in as root user, using the following:

# dd if=/dev/zero of=<OCR disk> bs=125829120 count=1
# dd if=/dev/zero of=<Voting disk> bs=20971520 count=1

Note

If the OCR or voting disk resides on a shared filesystem, delete the files at the OS level.

  8. Remove the scls_scr directory on all nodes in order to allow root.sh to re-execute, using the following:

# rm -r /etc/oracle/scls_scr

  9. In a new shell window, execute the root.sh script as the root OS user:

# $ORA_CRS_HOME/root.sh

Note

This needs to be executed on all nodes one after the other (similar to the CRS installation time running root.sh). It cannot be executed at the same time on all nodes.

  10. Once root.sh successfully completes execution on all nodes, CRS should start automatically.

  11. If VIPCA fails during root.sh on the last node in the cluster due to an IP address issue, run VIPCA manually from an X-Window session, with the display set up correctly, to create the VIP/ONS/GSD resources:

# cd $ORA_CRS_HOME/bin
# vipca

  12. Run oifcfg to configure the private interface.

  13. As the Oracle user, issue the following oifcfg commands to set the network interfaces for the cluster nodes (see the sketch after this list):

oifcfg setif -global <if_name>/<subnet>:public
oifcfg setif -global <if_name>/<subnet>:cluster_interconnect

  14. Run the NETCA utility to create a listener. First rename the existing $ORACLE_HOME/network/admin/listener.ora (under the RDBMS ORACLE_HOME), then run $ORACLE_HOME/bin/netca on all cluster nodes and enter the correct information as prompted. Now, the crs_stat -t output should display the listener resource for all cluster nodes.

Note

The ASM instance name (asm_inst_name) can only be +ASM1, +ASM2, and so on. Failure to provide the correct name may cause OCR corruption.

  15. Once all resources are registered, start them using the SRVCTL command as the Oracle user:

$ srvctl start asm -n <node_name>
$ srvctl start instance -d <db_name> -i <inst_name>

  16. Check the crs_stat -t output. It should now display all cluster resources with an ONLINE status on ALL nodes.
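As referenced in step 13, the following sketch shows the oifcfg commands with hypothetical interface names and subnets; substitute the values that apply to your own cluster, and use oifcfg getif to confirm the result:

oifcfg setif -global eth0/192.168.1.0:public
oifcfg setif -global eth1/10.10.10.0:cluster_interconnect
oifcfg getif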

Now that we have verified the status of the Clusterware services, we are ready to show you how to use the rootdelete.sh script to remove a node from the Oracle 11g RAC environment.

Using ROOTDELETE.SH in debug mode

Oracle 11g provides the rootdelete.sh script to remove one or more nodes from the Oracle RAC configuration. To issue the script, run it as the root OS user account from the Oracle 11g CRS_HOME/install directory on the master cluster node. The following example shows the debug (shell trace) output produced while executing the rootdelete.sh script:

[root@raclinux1 install]# ./rootdelete.sh
+ ORA_CRS_HOME=/home/oracle/product/11.1.0/crs_raclinux
+ ORACLE_OWNER=oracle
Start of script
+ DBA_GROUP=oinstall
+ USER_ARGS=
+ LOCALNODE=local
+ SHAREDVAR=nosharedvar
+ SHAREDHOME=sharedhome
+ DOWNGRADE=false
+ VERSION=11.1
+ CH=/home/oracle/product/11.1.0/crs_raclinux
+ ORACLE_HOME=/home/oracle/product/11.1.0/crs_raclinux
+ export ORA_CRS_HOME
+ export ORACLE_HOME
.
+ verifyCRSResources
+ VIP_SUFFIX=.vip
+ GSD_SUFFIX=.gsd
+ ONS_SUFFIX=.ons
+ LSNR_SUFFIX=.lsnr
+ DB_SUFFIX=.db
+ /home/oracle/product/11.1.0/crs_raclinux/bin/crs_stat
+ return 0
+ /bin/echo 'Checking to see if Oracle CRS stack is down...'
Checking to see if Oracle CRS stack is down...
+ /home/oracle/product/11.1.0/crs_raclinux/bin/crs_stat
+ /bin/echo 'Oracle CRS stack is not running.'
Oracle CRS stack is not running.
.
+ /bin/echo 'Oracle CRS stack is down now.'
Oracle CRS stack is down now.
.
+ /sbin/init q
+ /bin/echo 'Removing script for Oracle Cluster Ready services'
Removing script for Oracle Cluster Ready services
+ /bin/rm /etc/init.d/init.crs /etc/init.d/init.crsd /etc/init.d/init.cssd
.
+ '[' local = remote ']'
+ /bin/echo 'Cleaning up SCR settings in '\''/etc/oracle/scls_scr'\'''
Cleaning up SCR settings in '/etc/oracle/scls_scr'
+ /bin/rm -rf /etc/oracle/scls_scr

[root@raclinux1 install]# ./rootdeinstall.sh
+ ORACLE_OWNER=oracle
+ DBA_GROUP=oinstall
+ ORA_CRS_HOME=/home/oracle/product/11.1.0/crs_raclinux
.
+ /bin/echo 'Removing contents from OCR device'
Removing contents from OCR device
+ /bin/dd if=/dev/zero skip=25 bs=4k count=2560 of=/ocfs21/oradata/test2/ocrtest2
2560+0 records in
2560+0 records out
+ /bin/rm /etc/oracle/ocr.loc
.
+ /bin/chown oracle /oradata/raclinux/ocrtest2
+ /bin/chgrp oinstall /ocfs21/oradata/raclinux/ocrtest2
+ /bin/chmod 644 /ocfs21/oradata/raclinux/ocrtest2

Using rootdeinstall.sh

The rootdeinstall.sh script allows you to format the Oracle Cluster Registry (OCR) by using the Unix/Linux dd command. In addition, it changes the ownership of the OCR device back to the Oracle user and its group. You will still need to manually remove the /var/tmp/.oracle or /tmp/.oracle directory to complete the cleanup of the failed installation.

Reinstalling CRS on the same cluster in another CRS_HOME

Now let's illustrate how to reinstall Cluster Ready Services (CRS) on the same cluster into another CRS_HOME directory. The following example uses the SRVCTL and CRSCTL commands to shut down and disable the Oracle 11g Clusterware services. This is mandatory before we can reinstall Cluster Ready Services.

Stopping CRS processes

  1. Stop nodeapps:

./srvctl stop nodeapps -n prrrac1
./srvctl stop nodeapps -n prrrac2

  2. Stop CRS:

./crsctl stop crs

  3. Disable CRS:

./crsctl disable crs

  4. Check for any remaining CRS, CSS, and EVM processes:

ps -ef | grep -E 'crs|css|evm'

Reinstalling CRS on same cluster in another CRS_HOME

Now let's explain how to perform the reinstallation of the CRS software on the same cluster with a different CRS_HOME directory. To do so, you must log in to the server as the root user.

  1. Restore the original inittab:

# cp /etc/inittab /etc/inittab_pretest1
# mv /etc/inittab.orig /etc/inittab

  2. Reboot the node:

# /sbin/shutdown -r now

  3. Move the CRS files to a backup location under the /etc/init.d directory, as shown here:

# mv /etc/init.d/init.cssd /etc/init.d/init.cssd_pretest
# mv /etc/init.d/init.evmd /etc/init.d/init.evmd_pretest
# mv /etc/init.d/init.crsd /etc/init.d/init.crsd_pretest
# mv /etc/init.d/init.crs /etc/init.d/init.crs_pretest
# mv /etc/oracle /etc/oracle_pretest
# mv /etc/oraInst.loc /etc/oraInst.loc_pretest

Note

For Oracle 11gR2, we recommend that you use the new utility called rootcrs.pl or roothas.pl as mentioned in My Oracle Support (formerly Metalink) note 942166.1, How to Proceed from Failed 11gR2 Grid Infrastructure (CRS) Installation.

Oracle 11gR2 cluster removal tools for RAC

Oracle 11gR2 provides a new tool called roothas.pl for standalone grid installations and rootcrs.pl for RAC configurations. To deconfigure an Oracle 11gR2 RAC environment, run the scripts as shown here:

  1. As the root user, execute the following script on all nodes except the last node in the cluster, where $GRID_HOME is the environment variable for your 11gR2 RAC Grid Infrastructure directory:

$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force

  2. As the root user, execute the following script on the last node in the cluster. This command will also format the OCR and voting disks:

$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force -lastnode

  3. As the Oracle Grid user account, execute the following script:

$GRID_HOME/deinstall/deinstall

Tracing RAC issues with Oradebug

Oradebug is an excellent but poorly understood utility for diagnosing database issues. It has features to trace and monitor all of the critical items for Oracle RAC environments, including the ability to monitor and trace the Oracle RAC Clusterware (CRS) stack and Oracle 11g RAC interconnect operations for interprocess communication (IPC) usage. You must be logged in to SQL*Plus as SYSDBA to use Oradebug. The oradebug help command displays its functions and general commands within SQL*Plus; the following listing was produced on Oracle 11g Release 1 (11.1) on the Red Hat Enterprise Linux platform:

SQL> oradebug help
HELP [command] Describe one or all commands
SETMYPID Debug current process
SETOSPID <ospid> Set OS pid of process to debug
SETORAPID <orapid> ['force'] Set Oracle pid of process to debug
SETORAPNAME <orapname> Set Oracle process name to debug
SHORT_STACK Get abridged OS stack
CURRENT_SQL Get current SQL
DUMP <dump_name> <lvl> [addr] Invoke named dump
DUMPSGA [bytes] Dump fixed SGA
DUMPLIST Print a list of available dumps
EVENT <text> Set trace event in process
SESSION_EVENT <text> Set trace event in session
DUMPVAR <p|s|uga> <name> [level] Print/dump a fixed PGA/SGA/UGA variable
DUMPTYPE <address> <type> <count> Print/dump an address with type info
SETVAR <p|s|uga> <name> <value> Modify a fixed PGA/SGA/UGA variable
PEEK <addr> <len> [level] Print/Dump memory
POKE <addr> <len> <value> Modify memory
WAKEUP <orapid> Wake up Oracle process
SUSPEND Suspend execution
RESUME Resume execution
FLUSH Flush pending writes to trace file
CLOSE_TRACE Close trace file
TRACEFILE_NAME Get name of trace file
LKDEBUG Invoke global enqueue service debugger
NSDBX Invoke CGS name-service debugger
-G <Inst-List | def | all> Parallel oradebug command prefix
-R <Inst-List | def | all> Parallel oradebug prefix (return output)
SETINST <instance# .. | all> Set instance list in double quotes
SGATOFILE <SGA dump dir> Dump SGA to file; dirname in double quotes
DMPCOWSGA <SGA dump dir> Dump & map SGA as COW; dirname in double quotes
MAPCOWSGA <SGA dump dir> Map SGA as COW; dirname in double quotes
HANGANALYZE [level] [syslevel] Analyze system hang
FFBEGIN Flash Freeze the Instance
FFDEREGISTER FF deregister instance from cluster
FFTERMINST Call exit and terminate instance
FFRESUMEINST Resume the flash frozen instance
FFSTATUS Flash freeze status of instance
SKDSTTPCS <ifname> <ofname> Helps translate PCs to names
WATCH <address> <len> <self|exist|all|target> Watch a region of memory
DELETE <local|global|target> watchpoint <id> Delete a watchpoint
SHOW <local|global|target> watchpoints Show watchpoints
DIRECT_ACCESS <set/enable/disable command | select query> Fixed table access
CORE Dump core without crashing process
IPC Dump ipc information
UNLIMIT Unlimit the size of the trace file
PROCSTAT Dump process statistics
CALL [-t count] <func> [arg1]...[argn] Invoke function with arguments
SQL>

In order to trace the interconnect events with Oracle RAC, you will need to issue the oradebug ipc command.

The following example shows how to use Oradebug to trace the interconnect and IPC activities; the comments after each command explain the step:

[oracle@raclinux1 ~]$ sqlplus "/as sysdba"   # log on to SQL*Plus as the SYSDBA account
SQL*Plus: Release 11.1.0.7.0 - Production on Fri Aug 15 21:43:46 2008
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP
and Data Mining Scoring Engine options
SQL> oradebug setmypid           -- first, set the process ID to trace
Statement processed.
SQL> oradebug unlimit
Statement processed.
SQL> oradebug ipc                -- then dump the IPC information
Information written to trace file.
SQL> oradebug tracefile_name     -- identify the name of the trace file
/u01/app/oracle/admin/RACDB/udump/racdb1_ora_6391.trc
SQL>
[oracle@raclinux1 ~]$ cd /u01/app/oracle/admin/RACDB/udump   # change to the user dump directory
[oracle@raclinux1 udump]$ view racdb1_ora_6391.trc           # open the trace file to view its contents
/u01/app/oracle/admin/RACDB/udump/racdb1_ora_6391.trc
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP
and Data Mining Scoring Engine options
ORACLE_HOME = /u01/app/oracle/product/11.1.0/db_1
System name: Linux
Node name: raclinux1.us.oracle.com
Release: 2.6.9-5.EL
Version: #1 Sun Jun 12 12:31:23 IDT 2005
Machine: i686
Instance name: RACDB1
Redo thread mounted by this instance: 1
Oracle process number: 20
Unix process pid: 6391, image: oracle@raclinux1.us.oracle.com (TNS V1-V3)   <- the process ID for the trace
system cpu time since last wait 0 sec 0 ticks   <- the CPU time consumed since the last wait
locked 1
blocked 0
timed wait receives 0
admno 0x769fcb68 admport:
SSKGXPT 0xcc75e9c flags SSKGXPT_READPENDING info for network 0
socket no 7 IP 10.10.10.11 UDP 2247   <- here we can find the network configuration details
sflags SSKGXPT_UP
info for network 1
socket no 0 IP 0.0.0.0 UDP 0
sflags SSKGXPT_DOWN
active 0 actcnt 1
context timestamp 0
no ports
sconno accono ertt state seq# sent async sync rtrans acks

The preceding oradebug output shows the network configuration details for the Oracle 11g RAC cluster interconnect, as captured in the dump trace file.

Using Oradebug to trace Oracle 11g Clusterware

We can use Oradebug to trace events for Oracle RAC Clusterware (CRS) by using the following command:

SQL> oradebug dump crs 3

We can also trace the behavior of the Oracle RAC Cluster Synchronization (CSS) operations with the following Oradebug command:

SQL> oradebug dump css 3

Traces can also be performed for the Oracle Cluster Registry (OCR), with Oracle RAC, with the help of the Oradebug command:

SQL> oradebug dump ocr 3
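Combining these dumps with the session commands shown earlier, a complete Clusterware tracing session might look like the following sketch; the trace file name will differ on your system:

SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug dump crs 3
SQL> oradebug tracefile_name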

Server Control Utility

The Server Control Utility (SRVCTL) gives you the ability to manage RAC and ASM services. The following examples display the options available with this utility. A complete listing is available in Appendix A of the Oracle Real Application Clusters Administration and Deployment Guide 11g Release 2 (11.2) documentation, available online from either http://tahiti.oracle.com or http://otn.oracle.com.

As there are hundreds of SRVCTL commands explained in detail in the Oracle documentation, we will present only a few brief examples for Oracle 11gR2 RAC to illustrate how you can use this utility.

Oracle 11gR2 SRVCTL commands

The following SRVCTL commands are available only with Oracle 11gR2:

  • SRVCTL ADD EONS: Adds an eONS daemon to the RAC Grid Infrastructure

  • SRVCTL ADD FILESYSTEM: Adds a volume to Oracle ACFS (ASM Cluster Filesystem)

To manage Oracle ASM with Oracle Database 11g R2 (11.2), you must use the SRVCTL binary in the Oracle Grid Infrastructure home ($GRID_HOME) for a cluster. If you have Oracle RAC or Oracle Database installed, you cannot use the SRVCTL binary in the database home ($ORACLE_HOME) to manage Oracle 11gR2 ASM.

To manage Oracle ACFS on Oracle Database 11g R2 (11.2) installations, use the SRVCTL binary in the Oracle grid infrastructure home for a cluster (Grid home). If you have Oracle RAC or Oracle database installed, then you cannot use the SRVCTL binary in the database home to manage Oracle ACFS.
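For example, to check the status and configuration of ASM on the cluster, you would invoke SRVCTL from the Grid home rather than the database home. This is a sketch that assumes $GRID_HOME points to your 11gR2 Grid Infrastructure home:

$GRID_HOME/bin/srvctl status asm
$GRID_HOME/bin/srvctl config asm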

Managing Oracle Clusterware with the CRSCTL utility

CRSCTL commands function in tandem with the Oracle 11g RAC environment to provide tools to manage the operation of the Oracle 11g Clusterware.

While SRVCTL provides a more comprehensive suite of tools for managing all aspects of the Oracle RAC environment, the CRSCTL utility pertains only to the behavior of the Oracle Clusterware. A complete discussion of every CRSCTL command is beyond the scope of this book. We will highlight key usages of the tool. For a complete listing of the syntax and commands for CRSCTL, we recommend reading Appendix E of the Oracle Clusterware Administration and Deployment Guide 11g Release 2 (11.2) that provides a complete discussion of the CRSCTL utility.

If you wish to receive help when using CRSCTL, you can issue the following command:

$ crsctl -help

If you want help for a specific command such as start, then enter the command and append -help at the end, as shown in the following example:

$ crsctl start -help

You can also use the abbreviations -h or -? instead of -help; this option functions in Linux, Unix, and Windows environments.

Differences between 11gR1 and 11gR2 syntax for CRSCTL

Oracle 11gR2 introduces clusterized commands that are operating system independent. They rely on the Oracle High Availability Service Daemon (OHASD). If the OHASD daemon is running, then you can perform remote operations such as starting, stopping, and checking the status of remote nodes.
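For example, the following clusterized commands, run as root, check the health of the stack on all nodes and then start Clusterware on one remote node; the node name raclinux2 is a hypothetical placeholder:

# crsctl check cluster -all
# crsctl start cluster -n raclinux2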

In addition, the following CRSCTL commands are left over from earlier releases; they are deprecated and no longer used in 11gR2:

  • crs_stat

  • crs_register

  • crs_unregister

  • crs_start

  • crs_stop

  • crs_getperm

  • crs_profile

  • crs_relocate

  • crs_setperm

  • crsctl check crsd

  • crsctl check cssd

  • crsctl check evmd

  • crsctl debug log

  • crsctl set css votedisk

  • crsctl start resources

  • crsctl stop resources

You can use the crsctl add resource command to register a resource to be managed by Oracle Clusterware. A resource can be an application process, a database, a service, a Listener, and so on. We will look at an example of how to use the crsctl command:

$ crsctl check css

Use the crsctl check css command to check the status of Cluster Synchronization Services. This command is most often used when Oracle Automatic Storage Management (Oracle ASM) is installed on the local server.

CRS_STAT

For Oracle 11gR1 and earlier versions of Oracle RAC, the CRS_STAT utility provides you with the ability to monitor the condition of your Oracle RAC environment.

The utility is based on the script called crs_stat.sh.

For Oracle 11gR2:

The crs_stat utility has been deprecated in 11gR2, so you should no longer use it. To display the state of all user resources, run $GRID_HOME/bin/crsctl stat res -t.

By default, ora.gsd is OFFLINE if there is no 9i database in the cluster, and ora.oc4j is OFFLINE in 11.2.0.1 because Database Workload Management (DBWLM) is unavailable.

In 11gR2, you can use the following command to find out the clusterware process state:

$GRID_HOME/bin/crsctl stat res -t -init

The Kernel File OSM Discovery (KFOD) tool

Oracle provides an undocumented command called the Kernel File OSM Discovery (KFOD) tool to monitor Oracle ASM (Automatic Storage Management) environments. To obtain a listing of the available options for this utility, enter kfod help=y at a Unix or Linux shell prompt, as shown in the following example:

[oracle@raclinux1 ~]$ kfod help=y
_asm_a/llow_only_raw_disks KFOD allow only raw devices [_asm_allow_only_raw_disks=TRUE/(FALSE)]
_asm_l/ibraries ASM Libraries[_asm_libraries=lib1,lib2,...]
_asms/id ASM Instance[_asmsid=sid]
a/sm_diskstring ASM Diskstring [asm_diskstring=discoverystring, discoverystring ...]
c/luster KFOD cluster [cluster=TRUE/(FALSE)]
db/_unique_name db_unique_name for ASM instance[db_unique_name=dbname]
di/sks Disks to discover [disks=raw,asm,all]
ds/cvgroup Include group name [dscvgroup=TRUE/(FALSE)]
g/roup Disks in diskgroup [group=diskgroup]
h/ostlist hostlist[hostlist=host1,host2,...]
metadata_a/usize AU Size for Metadata Size Calculation
metadata_c/lients Client Count for Metadata Size Calculation
metadata_d/isks Disk Count for Metadata Size Calculation
metadata_n/odes Node Count for Metadata Size Calculation
metadata_r/edundancy Redundancy for Metadata Size Calculation
n/ohdr KFOD header suppression [nohdr=TRUE/(FALSE)]
o/p KFOD options type [OP=DISKS/CANDIDATES/MISSING/GROUPS/INSTS/VERSION/CLIENTS/RM/RMVERS/DFLTDSTR/GPNPDSTR/METADATA/ALL]
p/file ASM parameter file [pfile=parameterfile]
s/tatus Include disk header status [status=TRUE/(FALSE)]
v/erbose KFOD verbose errors [verbose=TRUE/(FALSE)]
KFOD-01000: USAGE: kfod op=<op> asm_diskstring=... | pfile=...

The KFOD utility also allows you to view the current Oracle ASM configuration by using the kfod disk=all command as shown in the following example:

[oracle@raclinux1 ~]$ kfod disk=all
--------------------------------------------------------------------------------
Disk Size Path User Group
================================================================================
1: 15358 Mb /dev/sdb1 oracle oinstall
2: 15358 Mb /dev/sdc1 oracle oinstall
3: 15358 Mb /dev/sdd1 oracle oinstall
4: 40954 Mb /dev/sde1 oracle oinstall
--------------------------------------------------------------------------------
ORACLE_SID ORACLE_HOME
================================================================================
+ASM1 /u01/app/11.2.0/grid

To view the disk groups in your Oracle 11g ASM configuration, you can issue the kfod op=groups command as shown here:

[oracle@raclinux1 ~]$ kfod op=groups
--------------------------------------------------------------------------------
Group Size Free Redundancy Name
================================================================================
1: 40954 Mb 30440 Mb EXTERN FLASH
2: 46074 Mb 42080 Mb NORMAL DATA
[oracle@raclinux1 ~]$

With Oracle 11g, you can examine the instance configuration for Oracle RAC and ASM by using the kfod op=insts command; the kfod op=version command displays the software version:

[oracle@raclinux1 ~]$ kfod op=insts
--------------------------------------------------------------------------------
ORACLE_SID ORACLE_HOME
================================================================================
+ASM1 /u01/app/11.2.0/grid
[oracle@raclinux1 ~]$
[oracle@raclinux1 ~]$ kfod op=version
--------------------------------------------------------------------------------
ORACLE_SID RAC VERSION
================================================================================
+ASM1 YES 11.2.0.1.0

The KFOD command may also be used to display a list of the current Oracle RAC database and ASM instances by using the kfod op=clients command:

[oracle@raclinux1 ~]$ kfod op=clients
--------------------------------------------------------------------------------
ORACLE_SID VERSION
================================================================================
RACDB_1 11.2.0.1.0
+ASM1 11.2.0.1.0
+ASM1 11.2.0.1.0
RACDB_1 11.2.0.1.0

Another useful function of the KFOD utility is to monitor the Oracle RAC configuration during a rolling migration procedure, with the help of the kfod op=rm command:

[oracle@raclinux1 ~]$ kfod op=rm
--------------------------------------------------------------------------------
Rolling Migration State
================================================================================
Inactive

Once the rolling migration and upgrade are complete, you can use the kfod op=rmvers command to verify that they completed successfully and to display the versions that are compatible with rolling migration:

[oracle@raclinux1 ~]$ kfod op=rmvers
--------------------------------------------------------------------------------
Rolling Migration Compatible Versions
================================================================================
11.1.0.6.0
11.1.0.7.0
[oracle@raclinux1 ~]$

As you can see, the KFOD utility is a powerful tool to keep in your suite of Oracle RAC and ASM utilities!