RAC CONCEPTS (SERVICES AND COMPONENTS)

Oracle Clusterware processes for 10g

Cluster Synchronization Services (ocssd): Manages cluster node membership and runs as the oracle user; failure of this process results in cluster restart.

Cluster Ready Services (crsd): The crsd process manages cluster resources (which could be a database, an instance, a service, a listener, a virtual IP (VIP) address, an application process, and so on) based on the resource configuration information stored in the OCR. This includes start, stop, monitor, and failover operations. This process runs as the root user.

Event Manager daemon (evmd): A background process that publishes the events that CRS creates.

Process Monitor Daemon (OPROCD): This process monitors the cluster and provides I/O fencing. OPROCD performs its check, stops running for a set interval, and, if it wakes up later than the expected time, resets the processor and reboots the node. An OPROCD failure results in Oracle Clusterware restarting the node. OPROCD uses the hangcheck timer on Linux platforms.

RACG (racgmain, racgimon): Extends Oracle Clusterware to support Oracle-specific requirements and complex resources. It runs server callout scripts when FAN events occur.
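The OPROCD check described above boils down to a timing comparison: sleep for a fixed interval, measure how long the sleep actually took, and reset the node if the overshoot exceeds a tolerated margin. The Python sketch below is a simplified model of that decision only, not OPROCD's implementation; the function names and the 1000 ms interval and 500 ms margin defaults are illustrative assumptions.

```python
def should_fence(expected_interval_ms: float, actual_elapsed_ms: float,
                 margin_ms: float) -> bool:
    """Return True if the node should be reset in this simplified model.

    A wake-up counts as late when the measured sleep exceeds the requested
    interval by more than the allowed margin, which suggests the node was
    hung and may no longer be safe to write to shared storage.
    """
    overshoot_ms = actual_elapsed_ms - expected_interval_ms
    return overshoot_ms > margin_ms


def monitor(samples_ms, interval_ms=1000.0, margin_ms=500.0):
    """Scan measured sleep durations; return the index of the first late wake-up, or None."""
    for i, elapsed in enumerate(samples_ms):
        if should_fence(interval_ms, elapsed, margin_ms):
            return i  # in the real daemon, this is where the node would be reset
    return None


observed = [1002.0, 1010.0, 1450.0, 1720.0]  # hypothetical measured sleep times (ms)
print(monitor(observed))  # prints 3 (1720 ms overshoots by 720 ms > 500 ms)
```

A 450 ms overshoot (the third sample) is tolerated; only the fourth sample crosses the margin.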

Oracle Clusterware Components

Voting Disk: Oracle RAC uses the voting disk to manage cluster membership by way of a health check, and to arbitrate cluster ownership among the instances in case of network failures. The voting disk must reside on shared disk.

Oracle Cluster Registry (OCR): Maintains cluster configuration information as well as configuration information about any cluster database within the cluster. The OCR must reside on shared disk that is accessible by all of the nodes in your cluster.
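A rule commonly applied to voting disks is that a node must be able to access a strict majority (more than half) of the configured voting disks to stay in the cluster, which is why an odd number of voting disks is usually configured. The sketch below models only that rule; the function name is hypothetical.

```python
def has_voting_majority(accessible: int, configured: int) -> bool:
    """Return True if a node can see a strict majority of the voting disks.

    Simplified model: a node that cannot access more than half of the
    configured voting disks must leave the cluster.
    """
    if configured <= 0:
        raise ValueError("at least one voting disk must be configured")
    if not 0 <= accessible <= configured:
        raise ValueError("accessible must be between 0 and configured")
    return accessible > configured // 2


for configured in (1, 3, 5):
    # Largest number of disk failures a node can survive and still keep a majority.
    tolerated = max(f for f in range(configured + 1)
                    if has_voting_majority(configured - f, configured))
    print(configured, tolerated)  # prints: 1 0, then 3 1, then 5 2
```

With four disks the function also tolerates only one failure, the same as with three, which illustrates why an even count adds no resilience.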

Oracle database background processes specific to RAC

• LMS: Global Cache Service Process

• LMD: Global Enqueue Service Daemon

• LMON: Global Enqueue Service Monitor

• LCK0: Instance Enqueue Process

* Lock monitor (LMON) process: The LMON process monitors all instances in a cluster to detect the failure of an instance. It then facilitates the recovery of the global locks held by the failed instance. It is also responsible for reconfiguring locks and other resources when instances leave or are added to the cluster (as they fail and come back online, or as new instances are added to the cluster in real time).

* Lock manager daemon (LMD) process: The LMD process handles lock manager service requests for the global cache service (keeping the block buffers consistent between instances). It works primarily as a broker, sending requests for resources to a queue that is handled by the LMSn processes. The LMD handles global deadlock detection and resolution, and monitors for lock timeouts in the global environment.

* Lock manager server (LMSn) process: In a RAC environment, each Oracle instance runs on a different machine in the cluster, and all instances access the same set of database files in read-write fashion. To achieve this, the SGA block buffer caches must be kept consistent with each other, and this is one of the main goals of the LMSn processes. In earlier releases (Oracle Parallel Server, OPS), this was accomplished via a ping: if a node in the cluster needed a read-consistent view of a block that another node held in exclusive mode, the data was exchanged through a disk flush (the block was pinged). This was a very expensive operation just to read data. With LMSn, the exchange is instead done as a fast cache-to-cache transfer over the cluster's high-speed interconnect. You may have up to ten LMSn processes per instance.
Its primary job is to transport blocks across the nodes for cache-fusion requests. If there is a consistent-read request, the LMS process rolls back the block to create a consistent-read (CR) image and then ships that image across the HSI (High Speed Interconnect) to the requesting process on the remote node.

* Lock (LCK0) process: This process is very similar in functionality to the LMD process described earlier, but it handles requests for all global resources other than database block buffers.

* Diagnosability daemon (DIAG) process: The DIAG process is used exclusively in a RAC environment. It is responsible for monitoring the overall ‘health’ of the instance, and it captures information needed in the processing of instance failures.
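The consistent-read step in the LMSn description can be illustrated with a toy model: keep a block's current rows plus the undo needed to reverse each change, and build a CR image for a query SCN by rolling back, on a copy, every change newer than that SCN. This is a hypothetical sketch (the `Block`, `UndoRecord`, and `make_cr_image` names are invented here) that ignores real block formats and undo segment structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UndoRecord:
    scn: int               # SCN of the change this record reverses
    row: str               # row key that was changed
    before: Optional[str]  # value before the change (None = row did not exist)


@dataclass
class Block:
    rows: Dict[str, str]
    undo: List[UndoRecord] = field(default_factory=list)  # assumed in ascending SCN order

    def update(self, scn: int, row: str, value: str) -> None:
        """Apply a change and record how to undo it."""
        self.undo.append(UndoRecord(scn, row, self.rows.get(row)))
        self.rows[row] = value


def make_cr_image(block: Block, query_scn: int) -> Dict[str, str]:
    """Build a consistent-read copy of the block as of query_scn.

    The current block is not modified; changes newer than query_scn are
    rolled back on a copy, newest first.
    """
    image = dict(block.rows)
    for rec in reversed(block.undo):
        if rec.scn > query_scn:
            if rec.before is None:
                image.pop(rec.row, None)
            else:
                image[rec.row] = rec.before
    return image


blk = Block(rows={"r1": "A"})
blk.update(scn=110, row="r1", value="B")
blk.update(scn=120, row="r2", value="X")
print(make_cr_image(blk, query_scn=100))  # {'r1': 'A'}
print(make_cr_image(blk, query_scn=115))  # {'r1': 'B'}
print(blk.rows)                           # {'r1': 'B', 'r2': 'X'}
```

The key design point is that the CR image is a separate copy: the current block stays intact for other sessions while the copy is shipped to the requester.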

To ensure that each Oracle RAC database instance obtains the block that it needs to satisfy a query or transaction, Oracle RAC instances use two processes, the Global Cache Service (GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the statuses of each data file and each cached block using a Global Resource Directory (GRD). The GRD contents are distributed across all of the active instances.
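A GRD record for a cached block can be pictured as the list of instances holding the block and the mode each holds. The sketch below uses the three basic GCS block modes, N (null), S (shared), and X (exclusive), and grants a request by downgrading conflicting holders. It is a hypothetical, heavily simplified model (the `GrdEntry` class and the file and block numbers are invented here); it ignores past images, local versus global roles, and the actual messaging.

```python
from typing import Dict, List

# Compatibility of the basic GCS block modes: N (null), S (shared), X (exclusive).
COMPATIBLE = {
    ("N", "N"): True, ("N", "S"): True, ("N", "X"): True,
    ("S", "N"): True, ("S", "S"): True, ("S", "X"): False,
    ("X", "N"): True, ("X", "S"): False, ("X", "X"): False,
}


class GrdEntry:
    """Simplified directory record for one cached block (hypothetical model)."""

    def __init__(self, file_no: int, block_no: int) -> None:
        self.resource = (file_no, block_no)
        self.holders: Dict[str, str] = {}  # instance name -> mode held

    def conflicts(self, instance: str, mode: str) -> List[str]:
        """Other instances whose held mode is incompatible with the request."""
        return [inst for inst, held in self.holders.items()
                if inst != instance and not COMPATIBLE[(held, mode)]]

    def request(self, instance: str, mode: str) -> List[str]:
        """Grant `mode` to `instance`, downgrading conflicting holders to N.

        Returns the instances that had to give up their mode; in RAC these
        holders would be asked to ship or release the block first.
        """
        downgraded = self.conflicts(instance, mode)
        for inst in downgraded:
            self.holders[inst] = "N"
        self.holders[instance] = mode
        return downgraded


entry = GrdEntry(file_no=4, block_no=128)
print(entry.request("orcl1", "S"))  # []
print(entry.request("orcl2", "S"))  # []
print(entry.request("orcl2", "X"))  # ['orcl1']
print(entry.holders)                # {'orcl1': 'N', 'orcl2': 'X'}
```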

Private Interconnect

Clusterware uses the private interconnect for cluster synchronization (network heartbeat) and daemon communication between the clustered nodes. This communication is based on the TCP protocol.
RAC uses the interconnect for Cache Fusion (UDP) and inter-process communication (TCP). Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches of participating nodes in the cluster.

Virtual IP (VIP) in Oracle RAC

Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 minutes) before getting an error. As a result, you don't really have a good HA solution without using VIPs.

When a node fails, the VIP associated with it is automatically failed over to another node, and the new node re-ARPs the world (sends gratuitous ARP) to announce a new MAC address for the IP. Subsequent packets sent to the VIP go to the new node, which sends error RST packets back to the clients. As a result, the clients get errors immediately.
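The difference VIP relocation makes can be shown with a toy model: if the failed node's address simply stops answering, the client waits out its TCP timeout; if the VIP moves to a live node that has no matching session, the client gets an RST at once. This is a hypothetical sketch; the `VipCluster` class, the 10.0.0.x addresses, and the 600-second timeout (echoing the 10-minute figure mentioned earlier) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class VipCluster:
    """Hypothetical model of VIP placement; not how CRS stores VIP state."""
    vip_owner: Dict[str, str]                                    # VIP address -> hosting node
    alive: Set[str]                                              # nodes that are up
    sessions: Dict[str, Set[str]] = field(default_factory=dict)  # node -> clients with open sessions

    def fail_node(self, node: str, relocate_vips: bool = True) -> None:
        """Mark `node` dead; optionally move its VIPs to a surviving node."""
        self.alive.discard(node)
        self.sessions.pop(node, None)  # the dead node's TCP sessions are gone
        survivors = sorted(self.alive)
        if relocate_vips and survivors:
            for vip, owner in self.vip_owner.items():
                if owner == node:
                    # The new owner would announce the VIP with a gratuitous ARP.
                    self.vip_owner[vip] = survivors[0]

    def send(self, vip: str, client: str,
             tcp_timeout_s: float = 600.0) -> Tuple[str, float]:
        """Return (outcome, seconds until the client finds out) for a packet to `vip`."""
        owner = self.vip_owner[vip]
        if owner not in self.alive:
            return "TIMEOUT", tcp_timeout_s  # nobody answers; client waits out TCP
        if client not in self.sessions.get(owner, set()):
            return "RST", 0.0                # live host, unknown connection: reset
        return "OK", 0.0


def demo(relocate: bool) -> Tuple[str, float]:
    c = VipCluster(vip_owner={"10.0.0.11": "node1", "10.0.0.12": "node2"},
                   alive={"node1", "node2"},
                   sessions={"node1": {"app1"}, "node2": set()})
    c.fail_node("node1", relocate_vips=relocate)
    return c.send("10.0.0.11", "app1")


print(demo(relocate=True))   # ('RST', 0.0)
print(demo(relocate=False))  # ('TIMEOUT', 600.0)
```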

Number of Nodes Supported in a RAC Database

Oracle Database 10g Release 2 supports up to 100 nodes in a cluster using Oracle Clusterware, and up to 100 instances in a RAC database.

The following processes are unique to a RAC environment. You will not see them otherwise.

The additional RAC-centric processes are DIAG, LCK, LMON, LMDn, and LMSn. We will give a brief description of each and discuss how they interact in a RAC environment next.

DIAG: This is a diagnostic daemon. It constantly monitors the health of the instances across the RAC and watches for possible failures in the RAC. There is one per instance.

LCK: This lock process manages requests that are not cache-fusion requests, such as row cache and library cache requests. Only a single LCK process is allowed for each instance.

LMD: The Lock Manager Daemon. This is also sometimes referred to as the GES (Global Enqueue Service) daemon since its job is to manage the global enqueue and global resource access. It also detects deadlocks and monitors lock conversion timeouts.

LMON: The Lock Monitor Process. It is the GES monitor. It reconfigures the lock resources when nodes are added or removed, and generates a trace file every time a node reconfiguration takes place. It also monitors the RAC cluster-wide, detects a node's demise, and triggers a quick reconfiguration.

LMS: This is the Lock Manager Server Process, sometimes also called the GCS (Global Cache Services) process. Its primary job is to transport blocks across the nodes for cache-fusion requests. If there is a consistent-read request, the LMS process rolls back the block to create a consistent-read image and then ships that image across the HSI (High Speed Interconnect) to the requesting process on the remote node. LMS must also constantly check with the LMD background process (the GES process) to get the lock requests placed by LMD. Up to 10 such processes can be generated dynamically.
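Global deadlock detection, mentioned for LMD above, is conceptually a search for a cycle in a wait-for graph whose edges can cross instances. The sketch below uses a standard depth-first search; it is a generic illustration, not LMD's actual algorithm, and the session labels are hypothetical.

```python
from typing import Dict, List, Optional

# Wait-for graph: each session maps to the sessions it is waiting on.
# Labels carry an instance prefix to show that waits can span instances.
WaitFor = Dict[str, List[str]]


def find_deadlock(graph: WaitFor) -> Optional[List[str]]:
    """Return one cycle of waiting sessions, or None if there is no deadlock.

    Depth-first search with three colours: unvisited (WHITE), on the current
    path (GREY), and finished (BLACK). Reaching a GREY node means the waits
    loop back on themselves.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}
    for waits in graph.values():
        for node in waits:
            colour.setdefault(node, WHITE)
    path: List[str] = []

    def dfs(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            if colour[nxt] == GREY:
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for start in list(colour):
        if colour[start] == WHITE:
            cycle = dfs(start)
            if cycle:
                return cycle
    return None


waits = {
    "orcl1:sid12": ["orcl2:sid40"],  # waits for a row locked via instance 2
    "orcl2:sid40": ["orcl1:sid12"],  # which waits for a row locked via instance 1
    "orcl2:sid41": [],
}
print(find_deadlock(waits))                  # ['orcl1:sid12', 'orcl2:sid40', 'orcl1:sid12']
print(find_deadlock({"a": ["b"], "b": []}))  # None
```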

A Real Application Clusters database has the same processes as single-instance Oracle databases such as process monitor (PMON), database writer (DBWRn), log writer (LGWR), and so on. There are also additional Real Application Clusters-specific processes as shown in Figure 3-1. The exact names of these processes and the trace files that they create are platform-dependent.

Global Cache Service Processes (LMSn), where n ranges from 0 to 9 depending on the amount of messaging traffic, control the flow of messages to remote instances and manage global data block access. LMSn processes also transmit block images between the buffer caches of different instances. This processing is part of the Cache Fusion feature.
The Global Enqueue Service Monitor (LMON) monitors global enqueues and resources across the cluster and performs global enqueue recovery operations. Enqueues are shared memory structures that serialize row updates.
The Global Enqueue Service Daemon (LMD) manages global enqueue and global resource access. Within each instance, the LMD process manages incoming remote resource requests.
The Lock Process (LCK) manages non-Cache Fusion resource requests such as library and row cache requests.
The Diagnosability Daemon (DIAG) captures diagnostic data about process failures within instances. The operation of this daemon is automated and it updates an alert log file to record the activity that it performs.

The ONS Daemon Explained in a RAC/CRS Environment
=================================================

Purpose of the ons daemon

The Oracle Notification Service (ONS) daemon is a daemon started by the CRS clusterware as part of the nodeapps. One ONS daemon is started per cluster node.

The ONS daemon receives a subset of the published clusterware events via the local evmd and racgimon clusterware daemons, and forwards those events to application subscribers and to the local listeners.

This is done to support:

a. Fast Application Notification (FAN), which allows applications to respond to database state changes.

b. The 10gR2 Load Balancing Advisory, which permits load balancing across different RAC nodes depending on the load on each node. The RDBMS MMON process creates an advisory for the distribution of work every 30 seconds and forwards it via racgimon and ONS to listeners and applications.
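The forwarding role of the ONS daemon is a publish/subscribe fan-out: one event arrives and is delivered to every subscriber registered for that kind of event. The toy hub below illustrates only that idea; it is not the ONS protocol or API, and the event type names `service_status` and `lb_advisory` are hypothetical labels chosen for the example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]
Subscriber = Callable[[Event], None]


class NotificationHub:
    """Toy publish/subscribe hub illustrating how one daemon fans events out.

    Events arriving from local sources are forwarded to every subscriber
    (application or listener) registered for that event type.
    """

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Subscriber]] = defaultdict(list)

    def subscribe(self, event_type: str, callback: Subscriber) -> None:
        self._subs[event_type].append(callback)

    def publish(self, event: Event) -> int:
        """Deliver the event; return how many subscribers received it."""
        targets = self._subs.get(event["type"], [])
        for cb in targets:
            cb(event)
        return len(targets)


received: List[str] = []
hub = NotificationHub()
hub.subscribe("service_status", lambda e: received.append("app:" + e["status"]))
hub.subscribe("service_status", lambda e: received.append("listener:" + e["status"]))
hub.subscribe("lb_advisory", lambda e: received.append("listener:advisory"))

hub.publish({"type": "service_status", "status": "down"})
hub.publish({"type": "lb_advisory", "status": "n/a"})  # stands in for the periodic advisory
print(received)  # ['app:down', 'listener:down', 'listener:advisory']
```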

Launching the ons daemon

The ONS daemon is started as part of the nodeapps from the $ORA_CRS_HOME environment and runs as the oracle user, for example:

crs_stat -p ora.<hostname>.ons | grep ACTION_SCRIPT
ACTION_SCRIPT=/u01/app/oracle/product/crs/bin/racgwrap

crs_getperm ora.<hostname>.ons
Name: ora.<hostname>.ons
owner:oracle:rwx,pgrp:dba:r-x,other::r--,

The commands used by the clusterware to start, stop, and ping ONS are 'onsctl start', 'onsctl stop', and 'onsctl ping'.

For debugging purposes, it is possible to start or stop the ONS daemon on one node via the clusterware commands:
crs_start ora.<hostname>.ons
crs_stop ora.<hostname>.ons

The Global Services Daemon

The Global Services Daemon (GSD) runs on each node with one GSD process per node. The GSD coordinates with the cluster manager to receive requests from clients such as the DBCA, EM, and the SRVCTL utility to execute administrative job tasks such as instance startup or shutdown. The GSD is not an Oracle instance background process and is therefore not started with the Oracle instance.

Global Resource Directory with Distributed Architecture

The GCS and GES maintain a Global Resource Directory to record information about resources. The Global Resource Directory resides in memory, is distributed throughout the cluster, and is available to all active instances. In this distributed architecture, each node participates in the management of information in the directory. This distributed scheme provides fault tolerance and enhanced runtime performance.

The GCS and GES ensure the integrity of the Global Resource Directory even if multiple nodes fail. The shared database is always accessible if at least one instance is active after recovery is completed. The fault tolerance of the resource directory also enables Real Application Clusters instances to start and stop at any time, in any order.
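The distributed directory and its fault tolerance can be sketched as follows: each resource is mastered by one instance chosen by hashing the resource name, and when an instance fails only the resources it mastered are reassigned among the survivors. This is a simplified, hypothetical model (the function names are invented here); Oracle's real mastering scheme, including dynamic remastering, is more involved.

```python
import hashlib
from typing import Dict, List, Tuple

Resource = Tuple[int, int]  # (file#, block#), used as the resource name


def master_of(resource: Resource, instances: List[str]) -> str:
    """Choose a master instance by hashing the resource name.

    SHA-256 is used instead of Python's built-in hash(), which is randomized
    per process for strings, so every node computes the same answer.
    """
    key = f"{resource[0]}:{resource[1]}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    members = sorted(instances)
    return members[digest % len(members)]


def reconfigure(masters: Dict[Resource, str], failed: str,
                survivors: List[str]) -> Dict[Resource, str]:
    """Reassign only the resources the failed instance mastered."""
    return {res: (master_of(res, survivors) if owner == failed else owner)
            for res, owner in masters.items()}


instances = ["orcl1", "orcl2", "orcl3"]
blocks = [(4, b) for b in range(1, 13)]
masters = {res: master_of(res, instances) for res in blocks}
after = reconfigure(masters, failed="orcl3", survivors=["orcl1", "orcl2"])
moved = [res for res in blocks if masters[res] != after[res]]
print(f"{len(moved)} of {len(blocks)} resources remastered")
```

Leaving the survivors' resources in place keeps a failure from disturbing the whole directory, which matches the fault-tolerance property described above.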

What is GRD?

GRD stands for Global Resource Directory. The GES and GCS maintain records of the status of each data file and each cached block using the Global Resource Directory. The GRD is the foundation on which Cache Fusion operates, and it helps preserve data integrity.

Give Details on Cache Fusion

Oracle RAC is composed of two or more instances. When an instance in the cluster has read a block from a data file and another instance needs the same block, it is cheaper to get the block image from the instance that already has it in its SGA than to read it from disk. To enable this inter-instance communication, Oracle RAC uses the interconnect. The Global Cache Service (GCS), through the LMSn processes, manages these Cache Fusion block transfers.
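The read path just described (local cache first, then a peer instance over the interconnect, and disk only if no instance has the block) can be sketched as a small model. It is hypothetical: the `CacheFusionModel` class is invented here, and it searches peer caches directly, whereas real RAC consults the block's GRD master to locate the holder.

```python
from typing import Dict, Tuple

Block = Tuple[int, int]  # (file#, block#)


class CacheFusionModel:
    """Hypothetical model: per-instance buffer caches plus a shared 'disk'."""

    def __init__(self, disk: Dict[Block, str]) -> None:
        self.disk = disk
        self.caches: Dict[str, Dict[Block, str]] = {}

    def add_instance(self, name: str) -> None:
        self.caches[name] = {}

    def read(self, instance: str, block: Block) -> Tuple[str, str]:
        """Return (data, source); source is 'local', 'interconnect:<peer>' or 'disk'."""
        local = self.caches[instance]
        if block in local:
            return local[block], "local"
        for peer, cache in self.caches.items():
            if peer != instance and block in cache:
                local[block] = cache[block]  # shipped over the interconnect
                return local[block], f"interconnect:{peer}"
        local[block] = self.disk[block]      # no instance has it cached: physical read
        return local[block], "disk"


c = CacheFusionModel(disk={(4, 7): "row data v1"})
c.add_instance("orcl1")
c.add_instance("orcl2")
print(c.read("orcl1", (4, 7)))  # ('row data v1', 'disk')
print(c.read("orcl2", (4, 7)))  # ('row data v1', 'interconnect:orcl1')
print(c.read("orcl2", (4, 7)))  # ('row data v1', 'local')
```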

Components in RAC That Must Reside on Shared Storage

All data files, control files, SPFILEs, and redo log files must reside on cluster-aware shared storage.

Interconnect network
An interconnect network is a private network that connects all of the servers in a cluster. The interconnect network uses one or more switches that only the nodes in the cluster can access.
Cache Fusion uses the cluster interconnect for inter-instance communication.

FAN

Fast Application Notification (FAN) covers events related to instances, services, and nodes. It is a notification mechanism that Oracle RAC uses to inform other processes about configuration and service-level information, including service status changes such as UP and DOWN events. Applications can respond to FAN events and take immediate action.
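An application reacting to FAN events essentially parses each event and maps its status to an action. The sketch below assumes an event line made of an event type followed by name=value fields (modeled loosely on the text that server-side callouts receive); the field names, the sample line, and the chosen actions are illustrative assumptions, not a documented FAN API.

```python
from typing import Dict, Tuple


def parse_fan_event(line: str) -> Tuple[str, Dict[str, str]]:
    """Split an event line into its type and a dict of name=value fields.

    Assumed layout: EVENT_TYPE name=value name=value ...
    """
    event_type, *tokens = line.split()
    fields = {}
    for token in tokens:
        name, sep, value = token.partition("=")
        if sep:  # ignore tokens without '='
            fields[name.lower()] = value
    return event_type, fields


def react(line: str) -> str:
    """Decide what a hypothetical application should do for one event."""
    event_type, f = parse_fan_event(line)
    status = f.get("status", "").lower()
    if status == "down":
        return f"drop pooled connections to instance {f.get('instance', '?')}"
    if status == "up":
        return f"rebalance connections for service {f.get('service', '?')}"
    return f"ignore {event_type}"


sample = ("SERVICEMEMBER VERSION=1.0 service=sales database=orcl "
          "instance=orcl2 host=node2 status=down reason=failure")
print(react(sample))  # drop pooled connections to instance orcl2
```

Reacting to a DOWN event by discarding connections immediately is what lets clients avoid the long TCP waits described in the VIP section.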