Wednesday, July 1, 2026

Oracle RAC Monitoring Framework

For production environments, I recommend building RAC monitoring as a modular framework rather than a single script. A modular layout is easier to schedule, troubleshoot, and extend.

Directory Structure

rac_monitoring/
├── rac_health_check.sh
├── db_health.sql
├── asm_health.sql
├── wait_events.sql
├── blocking_sessions.sql
├── tablespace.sql
├── fra_usage.sql
├── archive_log.sql
├── cpu_memory.sh
├── alert_log.sh
├── generate_report.sh
├── reports/
├── logs/
└── config.env

1. Configuration File (config.env)

#!/bin/bash

export ORACLE_BASE=/u01/app/oracle
export GRID_HOME=/u01/app/19.0.0/grid
export ORACLE_HOME=/u01/app/oracle/product/19.0.0/dbhome_1

export ORACLE_SID=PROD1

export PATH=$GRID_HOME/bin:$ORACLE_HOME/bin:$PATH

DB_NAME=PROD

REPORT_DIR=/home/oracle/rac_monitoring/reports
LOG_DIR=/home/oracle/rac_monitoring/logs

DATE=$(date +"%Y%m%d_%H%M%S")

REPORT=${REPORT_DIR}/RAC_Health_${DATE}.html
LOGFILE=${LOG_DIR}/RAC_Health_${DATE}.log
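
Every script in the framework depends on these variables, and an empty value fails quietly (an unset REPORT_DIR, for example, turns the report path into /RAC_Health_<date>.html). The sketch below is one way to fail early instead. The function name require_vars is hypothetical and is not part of the original scripts.

```shell
# Hypothetical helper (not part of the original scripts): fail early if any
# named variable is unset or empty after config.env has been sourced.
# The body runs in a subshell, so its working variables do not leak.
require_vars() (
    missing=0
    for name in "$@"; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "config error: $name is not set" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
)
```

A script could call it right after sourcing the configuration, for example: require_vars ORACLE_HOME GRID_HOME ORACLE_SID DB_NAME REPORT_DIR LOG_DIR || exit 1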

2. RAC Health Check Script (rac_health_check.sh)

#!/bin/bash

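# Resolve config.env relative to this script so the cron jobs work from any directory.
SCRIPT_DIR=$(cd "$(dirname "$0")" && pwd)
source "${SCRIPT_DIR}/config.env"

# Create the log directory if needed, then capture both stdout and stderr.
mkdir -p "$LOG_DIR"
exec > "$LOGFILE" 2>&1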

echo "==============================================="
echo "Oracle RAC Health Check"
echo "Server : $(hostname)"
echo "Date   : $(date)"
echo "==============================================="

echo
echo "=============================="
echo "Clusterware Status"
echo "=============================="
crsctl check crs

echo
echo "=============================="
echo "Cluster Resources"
echo "=============================="
crsctl stat res -t

echo
echo "=============================="
echo "Node Status"
echo "=============================="
olsnodes -n -s

echo
echo "=============================="
echo "ASM Status"
echo "=============================="
srvctl status asm

echo
echo "=============================="
echo "Diskgroups"
echo "=============================="
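# asmcmd must run against the ASM instance from the Grid home.
# The SID +ASM1 is an assumption; use this node's actual ASM instance name.
ORACLE_HOME=$GRID_HOME ORACLE_SID=+ASM1 asmcmd lsdg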

echo
echo "=============================="
echo "Database Status"
echo "=============================="
srvctl status database -d ${DB_NAME}

echo
echo "=============================="
echo "Services"
echo "=============================="
srvctl status service -d ${DB_NAME}

echo
echo "=============================="
echo "Listener"
echo "=============================="
srvctl status listener

echo
echo "=============================="
echo "SCAN Listener"
echo "=============================="
srvctl status scan_listener

echo
echo "=============================="
echo "VIP"
echo "=============================="
srvctl status vip

echo
echo "=============================="
echo "OCR"
echo "=============================="
ocrcheck

echo
echo "=============================="
echo "Voting Disk"
echo "=============================="
crsctl query css votedisk

echo
echo "Health Check Completed"
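
The script prints the output of each command but does not record whether a command failed, so the log has to be read by hand. A minimal sketch of a wrapper that prints the section banner, runs one check, and counts failures follows. The names run_check and FAILED_CHECKS are hypothetical, and exit codes from Oracle utilities are only a first-line signal, not a full health verdict.

```shell
# Hypothetical helper: print a section banner, run one check, count failures.
FAILED_CHECKS=0

run_check() {
    title=$1
    shift
    echo
    echo "=============================="
    echo "$title"
    echo "=============================="
    # "$@" is the command and its arguments, executed exactly as passed in.
    if ! "$@"; then
        echo "WARNING: '$*' exited with a non-zero status"
        FAILED_CHECKS=$((FAILED_CHECKS + 1))
    fi
}
```

With this in place, a section becomes one line, such as run_check "Clusterware Status" crsctl check crs, and the script can finish with a non-zero exit status when FAILED_CHECKS is greater than zero.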

3. Wait Event Monitoring (wait_events.sql)

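set lines 200
col event format a45

-- Idle events are excluded so they do not dominate the ranking.
-- time_waited and average_wait are reported in centiseconds.
-- FETCH FIRST requires 12c or later; on 11gR2 use a ROWNUM subquery instead.
SELECT
event,
total_waits,
time_waited,
average_wait
FROM v$system_event
WHERE wait_class <> 'Idle'
ORDER BY time_waited DESC
FETCH FIRST 20 ROWS ONLY;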
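
The directory layout lists several .sql files but does not show how the shell scripts invoke them. One possible approach is sketched below: a small wrapper around SQL*Plus. The function name run_sql is hypothetical, and the sketch assumes OS authentication ("/ as sysdba") is available to the oracle user.

```shell
# Hypothetical helper: run one SQL file through SQL*Plus.
# Assumes OS authentication ("/ as sysdba") works for the calling user.
run_sql() {
    sql_file=$1
    if [ ! -r "$sql_file" ]; then
        echo "cannot read $sql_file" >&2
        return 1
    fi
    # -S suppresses the banner; -L makes a failed logon exit instead of re-prompting.
    sqlplus -S -L "/ as sysdba" <<EOF
WHENEVER SQLERROR EXIT FAILURE
SET PAGESIZE 200 LINESIZE 200
@$sql_file
EXIT
EOF
}
```

For example, run_sql wait_events.sql >> "$LOGFILE" would append the top wait events to the health check log.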

4. Blocking Sessions

set lines 200

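-- blocking_instance identifies the node that holds the blocker, which matters in RAC.
SELECT
inst_id,
sid,
serial#,
username,
blocking_instance,
blocking_session,
seconds_in_wait,
event
FROM gv$session
WHERE blocking_session IS NOT NULL;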

5. ASM Monitoring

set lines 200

SELECT
name,
state,
type,
total_mb,
free_mb,
ROUND(free_mb*100/NULLIF(total_mb,0),2) FREE_PERCENT
FROM
v$asm_diskgroup;

6. Tablespace Monitoring

SELECT
tablespace_name,
ROUND(used_percent,2) USED_PERCENT
FROM dba_tablespace_usage_metrics
ORDER BY used_percent DESC;
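
A usage query is more useful in an automated run when it is paired with a threshold. The sketch below filters "name percent" rows, such as spooled output from this query, and keeps only those at or above a limit. The function name over_threshold is hypothetical, and the limit is whatever your site treats as a warning level.

```shell
# Hypothetical helper: print "name pct" rows whose percentage is at or above a limit.
# Input: whitespace-separated rows of <name> <percent> on stdin.
over_threshold() {
    limit=$1
    awk -v limit="$limit" '
        # Skip headers, separators and blank lines: the second field must be numeric.
        $2 ~ /^[0-9]+(\.[0-9]+)?$/ && $2 + 0 >= limit + 0 { print $1, $2 }
    '
}
```

For example, piping the tablespace rows into over_threshold 85 prints only the tablespaces that are at least 85% used, and an empty result means nothing needs attention.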

7. FRA Monitoring

SELECT
SPACE_LIMIT/1024/1024 MB_LIMIT,
SPACE_USED/1024/1024 MB_USED,
SPACE_RECLAIMABLE/1024/1024 MB_RECLAIMABLE
FROM
V$RECOVERY_FILE_DEST;

8. Archive Log Generation

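-- Rows are counted once per archive destination; add a dest_id filter if more than one is configured.
SELECT
TRUNC(first_time) LOG_DATE,
COUNT(*) LOG_COUNT,
ROUND(SUM(blocks*block_size)/1024/1024/1024,2) GB
FROM
v$archived_log
GROUP BY
TRUNC(first_time)
ORDER BY
1 DESC;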

9. CPU & Memory Monitoring (cpu_memory.sh)

#!/bin/bash

echo "========== CPU =========="
top -bn1 | head -5

echo

echo "========== Memory =========="
free -g

echo

echo "========== Swap =========="
swapon -s

echo

echo "========== Disk =========="
df -h
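
The commands above print raw figures, while a report usually wants a single percentage. A minimal sketch that derives memory usage from /proc/meminfo follows. The function name mem_used_percent is hypothetical, and the sketch assumes a Linux kernel that exposes the MemAvailable field.

```shell
# Hypothetical helper: percentage of RAM in use, from a /proc/meminfo-style file.
# Assumes the MemAvailable field is present (recent Linux kernels).
mem_used_percent() {
    meminfo=${1:-/proc/meminfo}
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total > 0) {
                printf "%d\n", (total - avail) * 100 / total
            } else {
                exit 1
            }
        }
    ' "$meminfo"
}
```

Calling mem_used_percent with no argument reads the live /proc/meminfo; passing a file path makes the function easy to test against a saved copy.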

10. Alert Log Monitoring (alert_log.sh)

#!/bin/bash

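# show alert -tail needs a single ADR home. The path below assumes the default
# layout diag/rdbms/<db_unique_name>/<instance>; adjust it for your system.
adrci exec="set homepath diag/rdbms/prod/PROD1; show alert -tail 200"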
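
Two hundred lines of alert log are hard to scan in an email, so a summary of the error codes can help. The sketch below counts ORA- codes in whatever text it receives on stdin. The function name ora_error_summary is hypothetical, and it relies on grep -o, which GNU and BSD grep provide.

```shell
# Hypothetical helper: count ORA- error codes in alert-log text read from stdin.
# Output: one "<code> <count>" row per code, most frequent first.
ora_error_summary() {
    grep -oE 'ORA-[0-9]+' | sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
}
```

It could be fed from the alert log tail, and an empty result means no ORA- codes appeared in that window.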

11. Cluster Log Collection

#!/bin/bash

diagcollection.pl --collect cluster

12. Email Report

mailx -s "Oracle RAC Health Report $(hostname)" \
shashi_dba@shashidba.com < $LOGFILE

13. Cron Scheduling

Run every hour:

0 * * * * /home/oracle/rac_monitoring/rac_health_check.sh

Run daily at 8 AM:

0 8 * * * /home/oracle/rac_monitoring/rac_health_check.sh

Run every Sunday at 6 AM:

0 6 * * 0 /home/oracle/rac_monitoring/rac_health_check.sh
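
If more than one of these entries is installed, they coincide: the hourly job also fires at 08:00 and at 06:00 on Sunday, so two runs start in the same second. A minimal sketch of a guard that lets only one run proceed follows. The function names and the lock path are hypothetical, and a lock left behind by a crashed run has to be removed by hand.

```shell
# Hypothetical guard: allow only one health-check run at a time.
# mkdir is atomic, so a directory can serve as a simple lock without extra tools.
acquire_lock() {
    lock_dir=$1
    if mkdir "$lock_dir" 2>/dev/null; then
        return 0
    fi
    echo "another run holds $lock_dir, skipping" >&2
    return 1
}

release_lock() {
    rmdir "$1" 2>/dev/null
}
```

At the top of rac_health_check.sh this might look like: acquire_lock /tmp/rac_health.lock || exit 0, followed by trap 'release_lock /tmp/rac_health.lock' EXIT so the lock is released when the script ends.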

Sample Health Check Output

===================================================
Oracle RAC Health Check
===================================================

Hostname : racnode1
Date     : 01-Jul-2026 08:00

✔ CRS Status               ONLINE
✔ Cluster Resources        ONLINE
✔ Node Status              ACTIVE
✔ ASM                      RUNNING
✔ Diskgroups               DATA, RECO, OCR
✔ Database                 PROD OPEN
✔ Services                 RUNNING
✔ Listener                 RUNNING
✔ SCAN                     RUNNING
✔ VIP                      RUNNING
✔ OCR                      HEALTHY
✔ Voting Disk              NORMAL

Tablespace Usage
----------------------------
SYSTEM        72%
SYSAUX        61%
USERS         42%
TEMP          15%

ASM Usage
----------------------------
DATA      67%
RECO      58%

Blocking Sessions : NONE

Top Wait Event
----------------------------
db file sequential read

CPU Usage : 18%
Memory Usage : 63%

Overall RAC Health : PASS

Oracle RAC Administration Handbook

Given the amount of content, this material is best developed as a complete handbook. The planned outline follows.

📘 Oracle RAC Administration Handbook (100–150 Pages)

Section 1 – Oracle RAC Fundamentals

Oracle RAC Architecture
RAC Components
Grid Infrastructure
Oracle Clusterware
ASM Architecture
Cache Fusion
Global Cache Service (GCS)
Global Enqueue Service (GES)
OCR & Voting Disk
SCAN, VIP, GNS
RAC Networking
RAC Storage Architecture
RAC vs Single Instance
RAC vs Data Guard
Real-world RAC Deployment Architecture

Section 2 – Oracle RAC Installation

Hardware Prerequisites
OS Configuration
Kernel Parameters
User Configuration
Passwordless SSH
Network Planning
Storage Planning
ASM Configuration
Grid Infrastructure Installation
RAC Database Installation
Post-installation Verification
Architecture diagrams throughout

Section 3 – RAC Administration

Instance Management
Service Management
Listener Management
SCAN Management
VIP Management
OCR Backup & Restore
Voting Disk Management
Node Addition
Node Deletion
Database Creation
Database Deletion
RAC Patching
OPatchAuto
Rolling Patch
One-off Patch
RU Upgrade

Section 4 – Oracle RAC Health Check Framework

This section expands the framework into approximately 25–30 pages.

Includes:

Clusterware Health Check
ASM Health Check
Database Health Check
Node Health Check
Listener Health Check
VIP Health Check
SCAN Health Check
OCR Health Check
Voting Disk Health Check
CRS Resource Health Check
Cache Fusion Monitoring
Interconnect Latency Checks
Redo Log Health
Undo Health
Tablespace Health
FRA Health
Archive Log Health
Alert Log Review
ADRCI Diagnostics
AWR Health Indicators
ASH Monitoring
Blocking Sessions
Wait Events
OS Monitoring
Filesystem Checks

Each topic will include:

Purpose
Commands
Sample outputs
Interpretation
Common issues
Troubleshooting steps
Best practices

Section 5 – RAC Monitoring Scripts

Cluster Health Script

#!/bin/bash

echo "================================="
echo "Oracle RAC Health Check"
echo "================================="

hostname

echo
echo "CRS Status"
crsctl check crs

echo
echo "Node Status"
olsnodes -s

echo
echo "ASM Status"
srvctl status asm

echo
echo "Diskgroups"
asmcmd lsdg

echo
echo "Database Status"
srvctl status database -d PROD

echo
echo "Services"
srvctl status service -d PROD

echo
echo "VIP Status"
srvctl status vip

echo
echo "SCAN Listener"
srvctl status scan_listener

echo
echo "OCR"
ocrcheck

echo
echo "Voting Disk"
crsctl query css votedisk

echo
echo "Resources"
crsctl stat res -t

Wait Event Monitoring Script

SELECT
event,
total_waits,
time_waited
FROM
v$system_event
ORDER BY
time_waited DESC;

Blocking Session Script

SELECT
blocking_session,
sid,
serial#,
username,
event
FROM
gv$session
WHERE
blocking_session IS NOT NULL;

ASM Space Monitoring

SELECT
name,
total_mb,
free_mb,
ROUND(free_mb*100/NULLIF(total_mb,0),2) FREE_PERCENT
FROM
v$asm_diskgroup;

Cluster Resource Report

crsctl stat res -t

VIP Verification

srvctl status vip

OCR Verification

ocrcheck

CRS Alert Monitoring

adrci

show alert

Cluster Log Collection

diagcollection.pl --collect cluster

Section 6 – Automation Framework

The handbook will include a Daily Health Check Automation that generates HTML reports, CSV summaries, and email notifications.

Features:

Clusterware status
ASM status
Diskgroup utilization
Database status
Listener status
Services
SCAN
VIP
OCR
Voting disks
CPU
Memory
Disk usage
Top wait events
Blocking sessions
FRA usage
Archive log generation
Tablespace utilization
Alert log errors
CRS errors

Output formats:

HTML dashboard
CSV report
Email summary
Log file
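
As an illustration of the HTML output, the sketch below converts "check|status" lines into a table. The function name to_html_table and the pipe-separated input format are assumptions made for this example, not part of the original framework.

```shell
# Hypothetical helper: turn "check|status" lines on stdin into an HTML table.
to_html_table() {
    echo '<table border="1">'
    echo '<tr><th>Check</th><th>Status</th></tr>'
    awk -F'|' '
        # Escape the characters that would otherwise be read as HTML markup.
        function esc(s) {
            gsub(/&/, "\\&amp;", s)
            gsub(/</, "\\&lt;", s)
            gsub(/>/, "\\&gt;", s)
            return s
        }
        NF == 2 { printf "<tr><td>%s</td><td>%s</td></tr>\n", esc($1), esc($2) }
    '
    echo '</table>'
}
```

The same intermediate format maps directly to the CSV report, since each line already holds one check and one status.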

Section 7 – Performance Tuning

Cache Fusion tuning
Interconnect tuning
ASM tuning
HugePages
NUMA
Linux kernel tuning
AWR analysis
ASH analysis
ADDM
SQL Monitoring
OSWatcher
ExaWatcher
Cluster Health Monitor (CHM)

Section 8 – Production Incident Runbooks (40+)

Examples include:

Node Eviction
CRS Won't Start
CSS Failure
ASM Disk Offline
OCR Corruption
Voting Disk Failure
VIP Not Failing Over
SCAN Listener Down
Split Brain
ORA-29740
ORA-29702
CRS-4535
CRS-4530
CRS-1606
PRCR-1079
PRCR-1064
ORA-15064
ORA-15032
ORA-15041
ORA-15042
ORA-00257
ORA-19809
Interconnect Packet Loss
High GCS Waits
gc buffer busy
gc cr request
gc current block busy

Each runbook will include:

Symptoms
Root cause
Diagnostic commands
Resolution steps
Validation
Prevention
Lessons learned

Section 9 – Oracle RAC Interview Guide

500+ interview questions
L1 questions
L2 questions
L3 questions
Oracle ACE–level scenarios
Whiteboard architecture questions
Real production case studies

Section 10 – Architecture Diagrams

The handbook will contain over 50 professional diagrams, including:

Oracle RAC Architecture
Grid Infrastructure
Cache Fusion Flow
GCS/GES Communication
SCAN Listener Flow
VIP Failover
OCR Architecture
Voting Disk Layout
ASM Diskgroup Architecture
Redo Thread Architecture
RAC Networking
Client Connection Flow
Clusterware Stack
Service Failover
Node Eviction Flow
Split Brain Detection
CRS Startup Sequence
Rolling Patch Architecture
RAC + Data Guard Hybrid Architecture
RAC Backup Architecture
RAC Disaster Recovery Design

Oracle RAC Health Check Framework

Standard Operating Procedure (SOP)

Document Version: 1.0
Applicable Versions: Oracle RAC 11gR2, 12c, 18c, 19c, 21c, 23ai, 26ai
Prepared For: Oracle Database Administrators (L1/L2/L3)

Purpose

This document provides a structured Oracle RAC Health Check Framework that helps DBAs verify the health of Oracle Clusterware, ASM, Database, Network, and Cluster Resources. Performing these checks regularly helps detect issues early, reduce downtime, and maintain high availability.

Health Check Workflow

Clusterware
      │
      ▼
Node Status
      │
      ▼
ASM Health
      │
      ▼
Database Health
      │
      ▼
Network Health
      │
      ▼
Cluster Resources

1. Clusterware Health Check

Objective

Verify that Oracle Clusterware components are running correctly.

Components

OHASD
CSSD
CRSD
EVMD

Command

crsctl check crs

Expected Output

CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Validation

Component	Expected Status
OHASD	Online
CSSD	Online
CRSD	Online
EVMD	Online

If Failed

Check Clusterware logs.
Verify voting disks.
Verify OCR accessibility.
Restart Clusterware if required.
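
The validation step above can be scripted by checking for the four message codes listed under Expected Output. A minimal sketch follows. The function name crs_all_online is hypothetical, and it assumes the message codes and wording match the expected output shown in this section.

```shell
# Hypothetical helper: succeed only if all four daemons report "is online".
# Reads `crsctl check crs` output on stdin.
crs_all_online() {
    output=$(cat)
    for code in CRS-4638 CRS-4537 CRS-4529 CRS-4533; do
        if ! printf '%s\n' "$output" | grep -q "^$code: .* is online"; then
            echo "not online: $code" >&2
            return 1
        fi
    done
}
```

A health check script could then run crsctl check crs | crs_all_online and treat a non-zero status as a failed Clusterware check.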

2. Node Health Check

Objective

Ensure all RAC nodes are available and participating in the cluster.

Commands

olsnodes

olsnodes -s

olsnodes -n

Expected Output

racnode1 Active
racnode2 Active

Validation

All nodes visible
Status should be Active
Node numbers should match cluster configuration

Troubleshooting

If a node is missing:

Verify private interconnect
Check Clusterware
Verify CSSD
Review node logs
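
To automate the node validation, the status column of the olsnodes output can be filtered. The sketch below lists every node whose last column is not Active. The function name inactive_nodes is hypothetical.

```shell
# Hypothetical helper: list nodes whose status (last column) is not "Active".
# Reads `olsnodes -s` or `olsnodes -n -s` output on stdin.
inactive_nodes() {
    awk 'NF >= 2 && $NF != "Active" { print $1 }'
}
```

For example, olsnodes -s | inactive_nodes prints nothing when every node is Active, so any output at all can trigger an alert.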

3. ASM Health Check

Objective

Verify ASM availability and storage health.

Check ASM Status

srvctl status asm

Expected

ASM is running on racnode1
ASM is running on racnode2

Check Diskgroups

asmcmd lsdg

Example

DATA
RECO
OCR

Verify

Mounted
Free Space
Offline Disks
Redundancy
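
The free-space check can be scripted from the asmcmd lsdg output. The sketch below finds the Total_MB and Free_MB columns by header name rather than by position, since the column set differs between releases. The function name lsdg_free_percent is hypothetical, and the sketch assumes the header row is the first line of input.

```shell
# Hypothetical helper: print "<name> <free_pct>" per disk group from `asmcmd lsdg` output.
# Columns are located by header name because their positions differ between releases.
lsdg_free_percent() {
    awk '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        col["Total_MB"] && col["Free_MB"] && $(col["Total_MB"]) > 0 {
            name = $NF
            sub(/\/$/, "", name)   # lsdg prints names with a trailing slash
            printf "%s %.1f\n", name, $(col["Free_MB"]) * 100 / $(col["Total_MB"])
        }
    '
}
```

For NORMAL or HIGH redundancy disk groups, the Usable_file_MB column is a better guide to remaining capacity than raw Free_MB, because it accounts for mirroring.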

SQL Validation

SELECT
name,
state,
type,
total_mb,
free_mb
FROM v$asm_diskgroup;

Troubleshooting

Check failed disks
Verify ASM alert log
Validate storage connectivity

4. Database Health Check

Objective

Ensure all RAC database instances and services are available.

Database Status

srvctl status database -d <db_name>

Expected

Instance PROD1 is running
Instance PROD2 is running

Service Status

srvctl status service -d <db_name>

Verify

Application services
Preferred instances
Available instances

SQL Validation

SELECT
INSTANCE_NAME,
STATUS,
DATABASE_STATUS
FROM GV$INSTANCE;

Expected

OPEN
ACTIVE
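
The instance check can be reduced to a list of instances that are down. The sketch below filters srvctl status database output of the form shown under Expected. The function name down_instances is hypothetical, and the "is not running" wording is assumed to match your srvctl version.

```shell
# Hypothetical helper: list instances that srvctl reports as not running.
# Reads `srvctl status database -d <db_name>` output on stdin.
down_instances() {
    awk '/^Instance .* is not running/ { print $2 }'
}
```

An empty result means every instance reported as running.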

5. Network Health Check

Objective

Verify communication between RAC nodes.

Public and Private Network

oifcfg getif

Verify

Public Interface
Private Interconnect

Network Configuration

srvctl config network

SCAN Configuration

srvctl config scan

Verify

SCAN Name
SCAN IPs
SCAN Listeners

VIP Status

srvctl status vip

Expected

VIP is enabled
VIP is running

Troubleshooting

Verify DNS
Check SCAN listeners
Verify VIP failover
Test private interconnect latency
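
The interface verification can also be scripted. The sketch below reads oifcfg getif output and confirms that at least one public and one cluster_interconnect interface are registered. The function name check_interfaces is hypothetical, and it assumes the interface role is the last column of each line.

```shell
# Hypothetical helper: confirm that both a public and a cluster_interconnect
# interface are registered. Reads `oifcfg getif` output on stdin.
check_interfaces() {
    awk '
        $NF == "public"              { pub++ }
        $NF ~ /cluster_interconnect/ { priv++ }
        END {
            if (!pub)  print "no public interface registered"
            if (!priv) print "no cluster_interconnect interface registered"
            status = (pub && priv) ? 0 : 1
            exit status
        }
    '
}
```

The pattern match on cluster_interconnect also accepts combined roles such as cluster_interconnect,asm.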

6. Cluster Resource Health Check

Objective

Verify all Oracle Cluster resources are online.

Command

crsctl stat res -t

Verify

Database
ASM
Listeners
VIPs
SCAN Listeners
Diskgroups

Expected Status

ONLINE
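
The tabular -t output is convenient to read but awkward to parse. The sketch below works on the attribute-style output of crsctl stat res (without -t) and reports resources whose target is ONLINE while some state is OFFLINE. The function name offline_resources is hypothetical, and the NAME=, TARGET=, STATE= layout is an assumption to verify against your release.

```shell
# Hypothetical helper: list resources that should be online but are not.
# Reads attribute-style `crsctl stat res` output (NAME=/TARGET=/STATE= lines) on stdin.
offline_resources() {
    awk -F= '
        $1 == "NAME"   { name = $2 }
        $1 == "TARGET" { target = $2 }
        $1 == "STATE"  {
            # Resources with an OFFLINE target are intentionally stopped and are skipped.
            if (target ~ /ONLINE/ && $2 ~ /OFFLINE/) print name
        }
    '
}
```

Resources that are deliberately disabled keep an OFFLINE target, so they do not raise false alarms.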

Additional Recommended Health Checks

Listener Status

srvctl status listener

SCAN Listener Status

srvctl status scan_listener

OCR Check

ocrcheck

Expected

Device/File integrity check succeeded
Cluster registry integrity check succeeded

Voting Disk

crsctl query css votedisk

Verify

All voting disks accessible

Cluster Synchronization

crsctl check css

CRS Stack

crsctl stat res -t -init

Verify that every resource with an ONLINE target is also in the ONLINE state.

Daily RAC Health Check Checklist

Check	Status
Clusterware Running	☐
All Nodes Active	☐
ASM Running	☐
Diskgroups Mounted	☐
Database Open	☐
RAC Services Running	☐
Public Network Healthy	☐
Private Interconnect Healthy	☐
VIP Running	☐
SCAN Listener Running	☐
OCR Healthy	☐
Voting Disk Healthy	☐
Cluster Resources ONLINE	☐

Common Production Issues

Issue	Possible Cause	Resolution
Node Eviction	Interconnect failure	Check private network and CSS logs
ASM Down	Storage unavailable	Verify SAN/ASM disks and restart ASM
VIP Offline	Network issue	Validate interface and relocate VIP
Service Not Running	Instance failure	Start service with SRVCTL
CRS Resource Offline	Clusterware issue	Review CRS logs and restart the affected resource
Diskgroup Not Mounted	Disk failure	Check ASM disks and storage connectivity

Best Practices

Perform RAC health checks daily.
Monitor ASM free space and rebalance operations.
Verify OCR and voting disk health after maintenance.
Monitor interconnect latency to prevent node eviction.
Ensure SCAN listeners and VIPs are functioning correctly.
Keep Clusterware and database patches up to date.
Review alert logs and CRS logs regularly.
Automate routine health checks using shell scripts or Enterprise Manager where possible.

Conclusion

A disciplined RAC health check routine is essential for maintaining a stable Oracle RAC environment. Regular verification of Clusterware, nodes, ASM, databases, networking, and cluster resources helps identify issues proactively, minimize downtime, and ensure continuous availability of critical business applications.