Oracle 11g R1/R2 Real Application Clusters Essentials

Overview of this book

Oracle RAC, or Real Application Clusters, is a grid computing solution that allows multiple nodes (servers) in a clustered system to mount and open a single database that resides on shared disk storage. Should a single system (node) fail, the database service remains available on the remaining nodes. Oracle RAC is an integral part of the Oracle database setup: you have one database with multiple users accessing it in real time. This book will enable DBAs to get their finger on the pulse of the Oracle 11g RAC environment quickly and easily.

This book covers all areas of the Oracle RAC environment and is indispensable if you are an Oracle DBA who is charged with configuring and implementing Oracle 11g R1, with bonus R2 information included. It presents a complete method for the configuration, installation, and design of Oracle 11g RAC, ultimately enabling rapid administration of Oracle 11g RAC environments.

This practical handbook documents how to administer a complex Oracle 11g RAC environment. Packed with real-world examples, expert tips, and troubleshooting advice, the book begins by introducing the concepts of Oracle RAC and High Availability. It then dives deep into RAC configuration, installation, and design, enabling you to support complex RAC environments for real-world deployments. Chapters cover Oracle RAC and High Availability, Oracle 11g RAC Architecture, Oracle 11g RAC Installation, Automatic Storage Management, Troubleshooting, Workload Management, and much more. By following the practical examples in this book, you will learn the concepts of the RAC environment and how to support complex Oracle 11g R1 and R2 RAC environments for various real-world deployments.

This book is the updated release of our previous Oracle 11g R1/R2 Real Application Clusters Handbook. If you already own a copy of that Handbook, there is no need to upgrade to this book.

Credits

About the Authors

About the Reviewers

www.PacktPub.com

Preface

Free Chapter

High Availability

High availability concepts

Fault-tolerant systems and high availability

High availability solutions for Oracle

Summary

Oracle 11g RAC Architecture

Oracle 11g RAC architecture

Hardware architecture for Oracle 11g RAC

Network architecture for Oracle 11g RAC

Storage architecture for Oracle 11g RAC

Storage protocols for RAC

Oracle 11g RAC components

New ASM features and RAC

New Oracle 11g ASM Disk Group compatibility features

Summary

Clusterware Installation

Preparing for a cluster installation

Oracle 11g R1 Clusterware installation

Oracle 11g R2 Clusterware installation

Removing/Reconfiguring a Grid Infrastructure configuration

Summary

Automatic Storage Management

Overview of Automatic Storage Management (ASM)

ASM instance configuration and management

Overview of ASMCMD

ASM 11g R1 new features

ASM 11g R2 new features

ASM backup strategies

Summary

Managing and Troubleshooting Oracle 11g Clusterware

Oracle 11g RAC Clusterware administration

Managing Oracle 11g Clusterware utilities

Troubleshooting Oracle 11g Clusterware

New features in Oracle 11g R2 Clusterware

Summary

RAC Database Administration and Workload Management

RAC database configuration and creation

What's new in Oracle 11g R1 and R2 databases?

RAC database administration

Automatic Workload Management

What's new in Oracle 11g services' behavior?

Summary

Backup and Recovery

An overview of backup and recovery

An overview of Recovery Manager (RMAN)

Backup types and methods

RMAN new features in 11g R1 and 11g R2

RMAN best practices for RAC

OCR and Voting disk backup and recovery strategies

Summary

Performance Tuning

Tuning differences: single instance versus RAC

New Oracle 11g performance tuning features

New performance features in Oracle 11gR2

Analyzing the Cache Fusion impact on RAC performance

Monitoring RAC cluster interconnect performance

Oracle cluster interconnects

Monitoring RAC wait events

Summary

Oracle 11g Clusterware Upgrade

Overview of an upgrade

Upgrading Oracle 10g R2 Clusterware to Oracle 11g R1

Upgrading to Oracle 11g R2 Clusterware

Downgrading Oracle Clusterware after an upgrade

Summary

Real-world Scenarios

Adding a new node to an existing cluster

Removing a node from the cluster

Adding an RAC database instance

Deleting an RAC database instance

Converting a single-instance database to an RAC database

Relocating an RAC database and instances across nodes

Summary

Enabling RAC for EBS

EBS architecture

Oracle 11g RAC suitability

Installing EBS 12.1.1

EBS implementation on Oracle 11g RAC

RAC-enabling EBS 12.1.1

Establishing applications environment for Oracle RAC

Setting up load balancing

Configuring Parallel Concurrent Processing

Cloning EBS concepts in brief

Summary

Maximum Availability

Oracle 11g Streams for RAC

Best practices for Streams in an RAC environment

New features for Streams in Oracle 11g R2

Oracle 11g Data Guard and RAC

New features for Data Guard in Oracle 11g R2

Summary

Additional Resources and Tools for the Oracle RAC Professional

Sample configurations

Oracle RAC commands and tips

Operating system-level commands for tuning and diagnosis

Additional references and tips

Clusterware startup sequence for Oracle 11g R2

Index

Fault-tolerant systems and high availability

Fault tolerance is a data center technology that enables a system to continue to function correctly when one or more faults occur within any key component of the system architecture or data center. If operating quality degrades at all, the loss of functionality in a fault-tolerant environment is usually in direct proportion to the severity of the failure, whereas a poorly designed system can fail completely and break down after even a small failure. In other words, fault tolerance gives you an added layer of protection and support to avoid a total meltdown of your mission-critical data center and, in our case, Oracle servers and database systems. Fault tolerance is often associated with highly available systems such as those built with Oracle Data Guard and Oracle RAC technologies.

Data formats may also be designed to degrade gracefully. For example, in the case of Oracle RAC environments, services provide for load balancing to minimize performance issues in the event that one or more nodes in the cluster are lost due to an unforeseen event.

Recovery from errors in fault-tolerant systems provides for either rollforward or rollback operations. For instance, whenever the Oracle server detects an error condition and cannot find data from a missed transaction, a rollback occurs at either the instance level or the application level (a transaction must be atomic, in that all of its elements must either commit or roll back). Oracle takes the system state at that time and rolls back the transactional changes so that it can move forward. Whenever a rollback is required for a transaction, Oracle reverts the system state to some earlier correct version, for example by using the database checkpoint and rollback process inherent in the Oracle database engine, and moves forward from there.
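
The all-or-nothing behavior of a transaction can be sketched with a toy in-memory store that keeps an undo log. This is only an illustration of the idea of rollback; the `ToyStore` class and `transfer` function are hypothetical names and do not describe how the Oracle database engine is implemented.

```python
class ToyStore:
    """In-memory key/value store with an undo log (illustrative only)."""

    def __init__(self):
        self.data = {}
        self._undo = []  # entries of (key, key_existed, old_value), newest last

    def put(self, key, value):
        # Record the previous state before changing it.
        self._undo.append((key, key in self.data, self.data.get(key)))
        self.data[key] = value

    def commit(self):
        # Changes become permanent; the undo information is discarded.
        self._undo.clear()

    def rollback(self):
        # Revert uncommitted changes in reverse order.
        while self._undo:
            key, existed, old = self._undo.pop()
            if existed:
                self.data[key] = old
            else:
                self.data.pop(key, None)


def transfer(store, src, dst, amount):
    """Move amount from src to dst atomically; return True if committed."""
    try:
        store.put(src, store.data[src] - amount)
        if store.data[src] < 0:
            raise ValueError("insufficient funds")
        store.put(dst, store.data.get(dst, 0) + amount)
        store.commit()
        return True
    except (KeyError, ValueError):
        store.rollback()
        return False


store = ToyStore()
store.put("A", 100)
store.put("B", 0)
store.commit()

ok = transfer(store, "A", "B", 30)       # both changes commit together
failed = transfer(store, "A", "B", 500)  # fails midway, so both are undone
print(ok, failed, store.data)
```

The second transfer changes account A before the error is detected, and the rollback restores A to its last committed value, so no partial transaction is ever left visible.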

Rollback recovery requires that the operations between the checkpoint and the detected erroneous state can be made transparent (implicit checkpoints are never required for transactional recovery). Some systems make use of both rollforward and rollback recovery for different errors or for different parts of one error.

For Oracle, database recovery first rolls forward by applying the redo logs, which also restores the contents of the rollback or undo segments, and then rolls back failed (uncommitted) transactions using those rollback or undo segments. However, when it comes to transactional-based recovery, Oracle only rolls back.

Within the scope of an individual system, fault tolerance can be achieved by anticipating exceptional conditions and building the system to cope with them and, in general, by aiming for self-healing so that the system converges towards an error-free state. In any case, if the consequence of a system failure is catastrophic, the system must be able to use reversion to fall back to a safe mode. This is similar to rollback recovery, but it can be a human action if humans are present in the loop.

Requirements for implementing fault tolerance

The basic characteristics of fault tolerance are:

No single point of failure
No single point of repair
Fault isolation to the failing component
Fault containment to prevent propagation of the failure
Availability of reversion modes

In addition, fault-tolerant systems are characterized in terms of both planned and unplanned service outages. These are usually measured at the application level and not just at the hardware level. The figure of merit is called availability and is expressed as a percentage. For instance, a five nines system would statistically provide 99.999% availability. Fault-tolerant systems are typically based on the concept of redundancy. In theory, complete availability would be ideal; in reality, it is an elusive and impractical goal. Due to the time required to fail over, reestablish middle-tier connections, and perform application restarts, it is not realistic to have complete availability. Four nines is the best realistic goal for high availability with Oracle systems. For Oracle RAC, you can deploy a fault-tolerant environment by using multiple network interface cards, dual Host Bus Adapters (HBAs), and multiple switches to avoid any single point of failure.
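
To make the availability percentages concrete, the following sketch converts an availability figure into the downtime it permits per year. It assumes a 365-day year and counts all outages together; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def downtime_minutes_per_year(availability_percent):
    """Return the downtime per year allowed by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% availability allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, five nines leaves a little over five minutes of downtime per year and four nines a little under an hour, which shows how quickly failover, reconnection, and restart times consume the budget.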

Fault tolerance and replication

By using spare components, we address the first fundamental characteristic of fault tolerance (no single point of failure) in the following two ways:

Replication: This provides multiple identical instances of the same system or subsystem and directs tasks or requests to all of them simultaneously. Oracle Streams and Oracle GoldenGate, as well as third-party solutions such as Quest SharePlex, are replication technologies.
Redundancy: This provides multiple identical instances of the same system and switches to one of the remaining instances in case of a failure. This switchover and failover process is available with standby database technology through Oracle Data Guard. Oracle RAC also provides node/server failover capability through services, using Fast Connection Failover (FCF) together with Fast Application Notification (FAN).
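
The difference between the two approaches can be sketched with a toy model. The `Node` class and the `replicate` and `failover` functions below are hypothetical and only illustrate the request flow; they are not an API of Oracle Streams, Data Guard, or RAC.

```python
class Node:
    """A toy server that records the requests it has handled."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.log = []

    def handle(self, request):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.log.append(request)
        return f"{self.name}:{request}"


def replicate(nodes, request):
    """Replication: direct the same request to every available node."""
    return [node.handle(request) for node in nodes if node.up]


def failover(nodes, request):
    """Redundancy: use one node, switching to the next if it has failed."""
    for node in nodes:
        try:
            return node.handle(request)
        except ConnectionError:
            continue
    raise RuntimeError("no surviving node")


nodes = [Node("node1"), Node("node2")]
replicate(nodes, "r1")           # both nodes process r1
nodes[0].up = False              # simulate the loss of node1
answer = failover(nodes, "r2")   # the request is served by node2
print(answer)
```

With replication every instance does the work all the time; with redundancy the spare instances matter only once the active one fails.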

At the storage layer, the major implementations of RAID (Redundant Array of Independent Disks), with the exception of disk striping (RAID 0), provide you with fault-tolerant appliances that also use data redundancy.

Bringing the replicas into synchrony requires making their internal stored states the same. They can be started from a fixed initial state, such as the reset state. Alternatively, the internal state of one replica can be copied to another replica.

One variant of dual modular redundancy (DMR) is pair-and-spare. Two replicated elements operate in lockstep as a pair, with a voting circuit that detects any mismatch between their operations and outputs a signal indicating that there is an error. Another pair operates in exactly the same way. A final circuit selects the output of the pair that does not proclaim that it is in error. Pair-and-spare requires four replicas rather than the three of triple modular redundancy (TMR), but it has been used commercially.
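
The pair-and-spare selection logic described above can be sketched in software. This is a simplified model in which each replica is an ordinary function; real implementations are hardware circuits running in lockstep, and the names used here are illustrative.

```python
def run_pair(replica_a, replica_b, x):
    """Run two replicas on the same input; return (output, error_flag).

    The error flag plays the role of the voting circuit: it is set
    whenever the two replicas disagree.
    """
    out_a, out_b = replica_a(x), replica_b(x)
    return out_a, out_a != out_b


def pair_and_spare(first_pair, second_pair, x):
    """Select the output of the pair that does not report an error."""
    results = [run_pair(a, b, x) for a, b in (first_pair, second_pair)]
    for output, error in results:
        if not error:
            return output
    raise RuntimeError("both pairs report an error")


def good(x):
    return x * x


def faulty(x):
    return x * x + 1  # simulated hardware fault


# The first pair disagrees internally, so the second pair's output is selected.
result = pair_and_spare((good, faulty), (good, good), 4)
print(result)
```

A pair can only detect a mismatch, not decide which of its two replicas is wrong, which is why a second pair is needed to supply a trustworthy output.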

If a system experiences a failure, it must continue to operate without interruption during the repair process.

When a failure occurs, the system must be able to isolate the failure to the offending component. This requires the addition of dedicated failure-detection mechanisms that exist only for the purpose of fault isolation.

Recovery from a fault condition requires classifying the fault or failing component. The National Institute of Standards and Technology (NIST) categorizes faults based on locality, cause, duration, and effect.
