Wednesday, July 1, 2026

Oracle RAC Health Check Framework

 

Standard Operating Procedure (SOP)

Document Version: 1.0
Applicable Versions: Oracle RAC 11gR2, 12c, 18c, 19c, 21c, 23ai, 26ai
Prepared For: Oracle Database Administrators (L1/L2/L3)


Purpose

This document provides a structured Oracle RAC Health Check Framework that helps DBAs verify the health of Oracle Clusterware, ASM, Database, Network, and Cluster Resources. Performing these checks regularly helps detect issues early, reduce downtime, and maintain high availability.


Health Check Workflow

Clusterware
      │
      ▼
Node Status
      │
      ▼
ASM Health
      │
      ▼
Database Health
      │
      ▼
Network Health
      │
      ▼
Cluster Resources

1. Clusterware Health Check

Objective

Verify that Oracle Clusterware components are running correctly.

Components

  • OHASD

  • CSSD

  • CRSD

  • EVMD

Command

crsctl check crs

Expected Output

CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Validation

Component   Expected Status
OHASD       Online
CSSD        Online
CRSD        Online
EVMD        Online
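
The validation above can be scripted. The following is a minimal sketch, assuming the output of crsctl check crs has already been captured as text; the function name check_crs and the code-to-component mapping are illustrative choices built only from the four messages listed under Expected Output.

```python
import re

# Message codes from the expected output, mapped to the component each one reports on.
CRS_CODES = {
    "CRS-4638": "OHASD",
    "CRS-4537": "CRSD",
    "CRS-4529": "CSSD",
    "CRS-4533": "EVMD",
}


def check_crs(output: str) -> dict:
    """Return {component: True/False} from captured `crsctl check crs` output."""
    # Every component starts as False; a missing or different message leaves it False.
    status = {name: False for name in CRS_CODES.values()}
    for line in output.splitlines():
        match = re.match(r"\s*(CRS-\d+):\s*(.*)", line)
        if match and match.group(1) in CRS_CODES:
            component = CRS_CODES[match.group(1)]
            status[component] = match.group(2).strip().endswith("is online")
    return status


sample = """\
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
"""

for component, online in check_crs(sample).items():
    print(f"{component:<6} {'Online' if online else 'NOT ONLINE'}")
```

Because each component defaults to "not online", an unexpected message for a daemon is reported as a failure rather than silently ignored.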

If Failed

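  • Check the Clusterware alert log: under $GRID_HOME/log/<hostname>/ in 11gR2, or in the ADR under $ORACLE_BASE/diag/crs/<hostname>/crs/trace/ from 12.1.0.2 onward.

  • Verify voting disks with crsctl query css votedisk.

  • Verify OCR accessibility with ocrcheck.

  • Restart Clusterware on the affected node if required (crsctl stop crs, then crsctl start crs, as root).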


2. Node Health Check

Objective

Ensure all RAC nodes are available and participating in the cluster.

Commands

olsnodes
olsnodes -s
olsnodes -n

Expected Output (olsnodes -s)

racnode1 Active
racnode2 Active

Validation

  • All nodes visible

  • Status should be Active

  • Node numbers should match cluster configuration
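
The three validation points can be checked together. This is a sketch only, assuming the output of olsnodes -n -s (node name, node number, status on each line) has been captured as text; the EXPECTED_NODES map and the function names are hypothetical and would come from your own cluster configuration.

```python
# Hypothetical expected configuration: node name -> node number.
EXPECTED_NODES = {"racnode1": 1, "racnode2": 2}


def parse_olsnodes(output: str) -> dict:
    """Parse `olsnodes -n -s` output into {node_name: (node_number, status)}."""
    nodes = {}
    for line in output.splitlines():
        fields = line.split()
        # Expect exactly: name, number, status. Anything else is skipped.
        if len(fields) == 3 and fields[1].isdigit():
            nodes[fields[0]] = (int(fields[1]), fields[2])
    return nodes


def node_problems(output: str, expected: dict) -> list:
    """List every expected node that is missing, not Active, or misnumbered."""
    found = parse_olsnodes(output)
    problems = []
    for name, number in expected.items():
        if name not in found:
            problems.append(f"{name}: missing from cluster")
            continue
        actual_number, status = found[name]
        if status != "Active":
            problems.append(f"{name}: status is {status}")
        if actual_number != number:
            problems.append(f"{name}: node number {actual_number}, expected {number}")
    return problems


sample = "racnode1\t1\tActive\nracnode2\t2\tActive\n"
print(node_problems(sample, EXPECTED_NODES) or "all nodes healthy")
```

Comparing against an expected list matters because a node that has left the cluster may not be reported as Active, or may not be listed at all.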

Troubleshooting

If a node is missing:

  • Verify private interconnect

  • Check Clusterware

  • Verify CSSD

  • Review node logs


3. ASM Health Check

Objective

Verify ASM availability and storage health.

Check ASM Status

srvctl status asm

Expected

ASM is running on racnode1,racnode2

Check Diskgroups

asmcmd lsdg

Example

DATA
RECO
OCR

Verify

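  • State is MOUNTED for every diskgroup

  • Free space (Free_MB) is sufficient

  • Offline_disks is 0

  • Redundancy type (EXTERN, NORMAL, or HIGH) matches the storage design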


SQL Validation

SELECT
name,
state,
type,
total_mb,
free_mb
FROM v$asm_diskgroup;
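
The rows returned by this query can be evaluated automatically. The sketch below assumes the rows have already been fetched by whatever database driver you use and arrive as tuples in the same column order as the query; the 15% free-space threshold and the sample figures are illustrative assumptions, not Oracle defaults.

```python
MIN_FREE_PCT = 15.0  # assumed alert threshold; tune for your environment


def diskgroup_problems(rows, min_free_pct=MIN_FREE_PCT):
    """Flag diskgroups that are not mounted or are low on free space.

    Each row is (name, state, type, total_mb, free_mb), matching the query above.
    """
    problems = []
    for name, state, dg_type, total_mb, free_mb in rows:
        # MOUNTED is reported by the ASM instance; a database instance that is
        # using the diskgroup reports CONNECTED instead.
        if state not in ("MOUNTED", "CONNECTED"):
            problems.append(f"{name}: state is {state}")
            continue
        free_pct = 100.0 * free_mb / total_mb if total_mb else 0.0
        if free_pct < min_free_pct:
            problems.append(f"{name}: {free_pct:.1f}% free")
    return problems


# Illustrative rows only; sizes are made up.
sample_rows = [
    ("DATA", "MOUNTED", "NORMAL", 1024000, 512000),
    ("RECO", "MOUNTED", "NORMAL", 512000, 40960),
    ("OCR", "DISMOUNTED", "NORMAL", 0, 0),
]

for problem in diskgroup_problems(sample_rows):
    print(problem)
```

Note that FREE_MB is raw space. With NORMAL or HIGH redundancy, the space usable for new files is lower because of mirroring, so a stricter threshold may be appropriate for those diskgroups.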

Troubleshooting

  • Check failed disks

  • Verify ASM alert log

  • Validate storage connectivity


4. Database Health Check

Objective

Ensure all RAC database instances and services are available.

Database Status

srvctl status database -d <db_name>

Expected

Instance PROD1 is running on node racnode1
Instance PROD2 is running on node racnode2

Service Status

srvctl status service -d <db_name>

Verify

  • Application services

  • Preferred instances

  • Available instances


SQL Validation

SELECT
INSTANCE_NAME,
STATUS,
DATABASE_STATUS
FROM GV$INSTANCE;

Expected

STATUS = OPEN
DATABASE_STATUS = ACTIVE

One row should be returned for each configured instance.
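
An instance that is down returns no row from GV$INSTANCE, so the result is best compared against the list of instances you expect. The following sketch assumes the rows were fetched elsewhere as (instance_name, status, database_status) tuples; the function name and the expected instance list are hypothetical.

```python
def instance_problems(rows, expected_instances):
    """Check GV$INSTANCE rows against a list of expected instance names."""
    seen = set()
    problems = []
    for instance_name, status, database_status in rows:
        seen.add(instance_name)
        if status != "OPEN":
            problems.append(f"{instance_name}: STATUS is {status}")
        if database_status != "ACTIVE":
            problems.append(f"{instance_name}: DATABASE_STATUS is {database_status}")
    # An expected instance with no row at all is treated as down.
    for name in sorted(set(expected_instances) - seen):
        problems.append(f"{name}: no row in GV$INSTANCE")
    return problems


# Illustrative rows: PROD1 is healthy and PROD2 returned no row.
sample_rows = [("PROD1", "OPEN", "ACTIVE")]
print(instance_problems(sample_rows, ["PROD1", "PROD2"]))
```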

5. Network Health Check

Objective

Verify communication between RAC nodes.


Public and Private Network

oifcfg getif

Verify

  • Public Interface

  • Private Interconnect
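
As a rough sketch, the check for both interface roles can be automated by parsing captured oifcfg getif output. It assumes each line carries the interface name, subnet, scope, and a comma-separated role list (public, cluster_interconnect, and on newer releases asm); the interface names and subnets in the sample are made up.

```python
def parse_getif(output: str) -> list:
    """Parse `oifcfg getif` output into a list of interface dictionaries."""
    interfaces = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            name, subnet, scope = fields[:3]
            interfaces.append({
                "name": name,
                "subnet": subnet,
                "scope": scope,
                "roles": set(fields[3].split(",")),
            })
    return interfaces


def network_problems(output: str) -> list:
    """Report a problem if no public or no private interface is registered."""
    roles = set().union(*(iface["roles"] for iface in parse_getif(output)))
    problems = []
    if "public" not in roles:
        problems.append("no public interface registered")
    if "cluster_interconnect" not in roles:
        problems.append("no private interconnect registered")
    return problems


# Illustrative output; interface names and subnets are hypothetical.
sample = """\
eth0  192.168.56.0  global  public
eth1  10.10.10.0  global  cluster_interconnect,asm
"""
print(network_problems(sample) or "public and private interfaces registered")
```

This confirms only that the interfaces are registered with Clusterware. It does not test connectivity or latency on the interconnect.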


Network Configuration

srvctl config network

SCAN Configuration

srvctl config scan

Verify

  • SCAN Name

  • SCAN IPs

  • SCAN Listeners


VIP Status

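srvctl status vip -n <node_name>

Run this once per node; srvctl status vip requires either a node name or a VIP name.

Expected

VIP racnode1-vip is enabled
VIP racnode1-vip is running on node: racnode1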

Troubleshooting

  • Verify DNS

  • Check SCAN listeners

  • Verify VIP failover

  • Test private interconnect latency


6. Cluster Resource Health Check

Objective

Verify all Oracle Cluster resources are online.

Command

crsctl stat res -t

Verify

  • Database

  • ASM

  • Listeners

  • VIPs

  • SCAN Listeners

  • Diskgroups

Expected Status

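ONLINE for every resource whose TARGET is ONLINE. A resource whose TARGET is OFFLINE has been stopped or disabled deliberately and is not a failure on its own.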
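
The tabular -t layout is awkward to parse, so the sketch below assumes the plain crsctl stat res form instead, which prints NAME, TYPE, TARGET, and STATE as key=value lines with a blank line between resources. It flags a resource when fewer instances are ONLINE than its TARGET asks for; the function names and sample resource names are illustrative.

```python
def parse_stat_res(output: str) -> list:
    """Split `crsctl stat res` output into one dict per resource."""
    resources, current = [], {}
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current resource block.
            if current:
                resources.append(current)
                current = {}
            continue
        key, sep, value = line.partition("=")
        if sep:
            current[key] = value
    if current:
        resources.append(current)
    return resources


def count_online(value: str) -> int:
    """Count comma-separated entries that begin with ONLINE."""
    return sum(1 for part in value.split(",") if part.strip().startswith("ONLINE"))


def offline_resources(output: str) -> list:
    """Names of resources with fewer ONLINE states than ONLINE targets."""
    return [
        res.get("NAME", "?")
        for res in parse_stat_res(output)
        if count_online(res.get("STATE", "")) < count_online(res.get("TARGET", ""))
    ]


# Illustrative output with one degraded database and one intentionally offline resource.
sample = """\
NAME=ora.DATA.dg
TYPE=ora.diskgroup.type
TARGET=ONLINE , ONLINE
STATE=ONLINE on racnode1, ONLINE on racnode2

NAME=ora.prod.db
TYPE=ora.database.type
TARGET=ONLINE , ONLINE
STATE=ONLINE on racnode1, OFFLINE

NAME=ora.gsd
TYPE=ora.gsd.type
TARGET=OFFLINE, OFFLINE
STATE=OFFLINE, OFFLINE
"""
print(offline_resources(sample))
```

Comparing STATE against TARGET avoids false alarms for resources that are offline by design.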

Additional Recommended Health Checks

Listener Status

srvctl status listener

SCAN Listener Status

srvctl status scan_listener

OCR Check

ocrcheck

Expected

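Device/File integrity check succeeded
Cluster registry integrity check succeeded

Run ocrcheck as root so that the logical corruption check is also performed.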

Voting Disk

crsctl query css votedisk

Verify

  • All voting disks accessible
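
A minimal sketch of this check, assuming the output of crsctl query css votedisk has been captured as text and that each disk line has the form "N. STATE file-id (path) [diskgroup]". The file IDs and device paths in the sample are made up.

```python
import re

# Matches lines such as " 1. ONLINE   <file id> (/dev/asm-disk1) [OCR]".
VOTE_LINE = re.compile(r"^\s*(\d+)\.\s+(\S+)\s+(\S+)\s+\((.*?)\)")


def voting_disk_problems(output: str) -> list:
    """Return a problem for each voting disk that is not ONLINE."""
    disks = []
    for line in output.splitlines():
        match = VOTE_LINE.match(line)
        if match:
            disks.append((match.group(2), match.group(4)))  # (state, path)
    problems = [f"{path}: state is {state}" for state, path in disks if state != "ONLINE"]
    if not disks:
        problems.append("no voting disks listed")
    return problems


# Illustrative output; file IDs and device paths are hypothetical.
sample = """\
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   11111111111111111111111111111111 (/dev/asm-disk1) [OCR]
 2. ONLINE   22222222222222222222222222222222 (/dev/asm-disk2) [OCR]
 3. OFFLINE  33333333333333333333333333333333 (/dev/asm-disk3) [OCR]
Located 3 voting disk(s).
"""
print(voting_disk_problems(sample))
```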


Cluster Synchronization

crsctl check css

CRS Stack

crsctl stat res -t

Verify every resource is ONLINE.


Daily RAC Health Check Checklist

Check                           Status
Clusterware Running
All Nodes Active
ASM Running
Diskgroups Mounted
Database Open
RAC Services Running
Public Network Healthy
Private Interconnect Healthy
VIP Running
SCAN Listener Running
OCR Healthy
Voting Disk Healthy
Cluster Resources ONLINE

Common Production Issues

Issue                   Possible Cause         Resolution
Node Eviction           Interconnect failure   Check private network and CSS logs
ASM Down                Storage unavailable    Verify SAN/ASM disks and restart ASM
VIP Offline             Network issue          Validate interface and relocate VIP
Service Not Running     Instance failure       Start service with SRVCTL
CRS Resource Offline    Clusterware issue      Review CRS logs and restart the affected resource
Diskgroup Not Mounted   Disk failure           Check ASM disks and storage connectivity

Best Practices

  • Perform RAC health checks daily.

  • Monitor ASM free space and rebalance operations.

  • Verify OCR and voting disk health after maintenance.

  • Monitor interconnect latency to prevent node eviction.

  • Ensure SCAN listeners and VIPs are functioning correctly.

  • Keep Clusterware and database patches up to date.

  • Review alert logs and CRS logs regularly.

  • Automate routine health checks using shell scripts or Enterprise Manager where possible.
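
As one possible starting point for automation, the sketch below runs a few of the commands from this document and prints a pass/fail line for each. It is an illustration under stated assumptions rather than a complete framework: the check names, the pass criteria, and the fake_runner used for the demonstration are all hypothetical, and the real run_command assumes the Grid Infrastructure binaries are on the PATH.

```python
import subprocess


def run_command(command):
    """Run a command and return its stdout, or "" if it cannot be run."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout


def clusterware_ok(output):
    # All four daemons should report "is online" in `crsctl check crs`.
    return output.count("is online") == 4


def nodes_ok(output):
    # Every non-empty line of `olsnodes -s` should end with Active.
    lines = [line for line in output.splitlines() if line.strip()]
    return bool(lines) and all(line.split()[-1] == "Active" for line in lines)


def css_ok(output):
    return "is online" in output


CHECKS = [
    ("Clusterware Running", ["crsctl", "check", "crs"], clusterware_ok),
    ("All Nodes Active", ["olsnodes", "-s"], nodes_ok),
    ("Cluster Synchronization", ["crsctl", "check", "css"], css_ok),
]


def run_checks(checks=CHECKS, runner=run_command):
    """Return [(check_name, "PASS" or "FAIL")] using the supplied runner."""
    return [(name, "PASS" if ok(runner(cmd)) else "FAIL") for name, cmd, ok in checks]


def fake_runner(command):
    """Canned outputs so the sketch can be demonstrated without a cluster."""
    canned = {
        ("crsctl", "check", "crs"): (
            "CRS-4638: Oracle High Availability Services is online\n"
            "CRS-4537: Cluster Ready Services is online\n"
            "CRS-4529: Cluster Synchronization Services is online\n"
            "CRS-4533: Event Manager is online\n"
        ),
        ("olsnodes", "-s"): "racnode1\tActive\nracnode2\tInactive\n",
        ("crsctl", "check", "css"): "CRS-4529: Cluster Synchronization Services is online\n",
    }
    return canned.get(tuple(command), "")


for name, result in run_checks(runner=fake_runner):
    print(f"{name:<28}{result}")
```

On a real cluster node, calling run_checks() with no arguments uses run_command instead of the canned outputs. Passing the runner as a parameter keeps the pass/fail logic testable away from the cluster.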


Conclusion

A disciplined RAC health check routine is essential for maintaining a stable Oracle RAC environment. Regular verification of Clusterware, nodes, ASM, databases, networking, and cluster resources helps identify issues proactively, minimize downtime, and ensure continuous availability of critical business applications.
