Standard Operating Procedure (SOP)

Document Version: 1.0
Applicable Versions: Oracle RAC 11gR2, 12c, 18c, 19c, 21c, 23ai, 26ai
Prepared For: Oracle Database Administrators (L1/L2/L3)

Purpose

This document provides a structured Oracle RAC Health Check Framework that helps DBAs verify the health of Oracle Clusterware, ASM, Database, Network, and Cluster Resources. Performing these checks regularly helps detect issues early, reduce downtime, and maintain high availability.

Health Check Workflow

Clusterware
      │
      ▼
Node Status
      │
      ▼
ASM Health
      │
      ▼
Database Health
      │
      ▼
Network Health
      │
      ▼
Cluster Resources

1. Clusterware Health Check

Objective

Verify that Oracle Clusterware components are running correctly.

Components

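OHASD (Oracle High Availability Services)
CSSD (Cluster Synchronization Services)
CRSD (Cluster Ready Services)
EVMD (Event Manager)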

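Command (checks the local node only; to check CRS, CSS, and EVM on every node at once, run crsctl check cluster -all)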

crsctl check crs

Expected Output

CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Validation

Component	Expected Status
OHASD	Online
CSSD	Online
CRSD	Online
EVMD	Online
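The validation above can be scripted. The following is a minimal bash sketch, not an Oracle-supplied tool: check_crs is a hypothetical helper that reads crsctl check crs output on stdin and reports any line that does not end in "is online". It assumes the four-line output format shown in Expected Output.

```bash
# Hypothetical helper: validate `crsctl check crs` output read from stdin.
# Assumes every healthy line ends in "is online", as in the expected output.
check_crs() {
  local line seen=0 bad=0
  while IFS= read -r line || [[ -n "$line" ]]; do
    [[ -z "$line" ]] && continue
    seen=$((seen + 1))
    if [[ "$line" != *"is online" ]]; then
      echo "NOT ONLINE: $line"
      bad=1
    fi
  done
  # Four status lines are expected: OHASD, CRSD, CSSD and EVMD.
  if (( seen < 4 )); then
    echo "ONLY $seen STATUS LINE(S) RETURNED"
    bad=1
  fi
  if (( bad == 0 )); then echo "CLUSTERWARE OK"; fi
  return "$bad"
}

# Usage on a cluster node with the Grid Infrastructure environment set:
#   crsctl check crs | check_crs
```

The function returns a non-zero exit status on any failure, so it can be chained into a monitoring script.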

If Failed

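Check the Clusterware alert log ($ORACLE_BASE/diag/crs/<hostname>/crs/trace/alert.log on 12.1.0.2 and later; $GRID_HOME/log/<hostname>/alert<hostname>.log on 11gR2).
Verify voting disks (crsctl query css votedisk).
Verify OCR accessibility (ocrcheck).
Restart Clusterware if required (as root: crsctl stop crs, then crsctl start crs).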

2. Node Health Check

Objective

Ensure all RAC nodes are available and participating in the cluster.

Commands

olsnodes

olsnodes -s

olsnodes -n

Expected Output (olsnodes -s)

racnode1 Active
racnode2 Active

Validation

All nodes visible
Status should be Active
Node numbers should match cluster configuration
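A minimal bash sketch of the node validation. check_nodes is a hypothetical helper that reads olsnodes -s output (node name followed by Active or Inactive) and compares the node count with the number you pass in, for example 2 for the racnode1/racnode2 cluster used in this document.

```bash
# Hypothetical helper: validate `olsnodes -s` output ("<node> Active|Inactive").
# $1 = number of nodes the cluster is expected to have.
check_nodes() {
  local expected="$1" name state count=0 bad=0
  while read -r name state _ || [[ -n "$name" ]]; do
    [[ -z "$name" ]] && continue
    count=$((count + 1))
    if [[ "$state" != "Active" ]]; then
      echo "NODE $name IS ${state:-UNKNOWN}"
      bad=1
    fi
  done
  if (( count != expected )); then
    echo "EXPECTED $expected NODE(S), FOUND $count"
    bad=1
  fi
  if (( bad == 0 )); then echo "ALL $count NODES ACTIVE"; fi
  return "$bad"
}

# Usage: olsnodes -s | check_nodes 2
```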

Troubleshooting

If a node is missing:

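Verify the private interconnect between the nodes
Check Clusterware status on the missing node (crsctl check crs)
Verify CSSD (crsctl check css)
Review the Clusterware alert log and CSSD (ocssd) trace on that node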

3. ASM Health Check

Objective

Verify ASM availability and storage health.

Check ASM Status

srvctl status asm

Expected

ASM is running on racnode1,racnode2

Check Diskgroups

asmcmd lsdg

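Example diskgroups

DATA
RECO
OCR

Verify (asmcmd lsdg columns)

State is MOUNTED
Free_MB and Usable_file_MB leave enough headroom
Offline_disks is 0
Type (EXTERN, NORMAL, or HIGH redundancy) matches the design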
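A bash sketch of these checks, assuming asmcmd lsdg prints a header row followed by one row per diskgroup. Columns are looked up by header name (State, Total_MB, Free_MB, Offline_disks, Name) because the column set differs slightly between releases. check_diskgroups and its default 20 percent free-space threshold are illustrative assumptions, not Oracle defaults.

```bash
# Hypothetical helper: validate `asmcmd lsdg` output read from stdin.
# $1 = minimum free space as a percentage of Total_MB (assumed default: 20).
# Columns are located by header name, so their order does not matter.
check_diskgroups() {
  awk -v min="${1:-20}" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    NF == 0 { next }
    {
      name = $(col["Name"]); sub(/\/$/, "", name)   # lsdg prints names as "DATA/"
      total = $(col["Total_MB"]) + 0
      free = $(col["Free_MB"]) + 0
      pct = (total > 0) ? 100 * free / total : 0
      if ($(col["State"]) != "MOUNTED") { printf "%s: state %s\n", name, $(col["State"]); bad = 1 }
      if ($(col["Offline_disks"]) + 0 > 0) { printf "%s: %d offline disk(s)\n", name, $(col["Offline_disks"]); bad = 1 }
      if (pct < min + 0) { printf "%s: only %.1f%% free\n", name, pct; bad = 1 }
    }
    END { if (!bad) print "DISKGROUPS OK"; exit bad }
  '
}

# Usage as the Grid Infrastructure owner: asmcmd lsdg | check_diskgroups 15
```

The threshold uses raw Free_MB for simplicity; for NORMAL or HIGH redundancy diskgroups, Usable_file_MB is the more realistic measure of remaining capacity.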

SQL Validation

SELECT
name,
state,
type,
total_mb,
free_mb,
usable_file_mb,
offline_disks
FROM v$asm_diskgroup;

Troubleshooting

Check failed disks
Verify ASM alert log
Validate storage connectivity

4. Database Health Check

Objective

Ensure all RAC database instances and services are available.

Database Status

srvctl status database -d <db_name>

Expected

Instance PROD1 is running on node racnode1
Instance PROD2 is running on node racnode2
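A bash sketch for the database status check. check_db_instances is a hypothetical helper; it assumes srvctl reports each instance on its own line as "Instance <name> is running ..." or "Instance <name> is not running ...".

```bash
# Hypothetical helper: validate `srvctl status database -d <db_name>` output.
check_db_instances() {
  local line running=0 bad=0
  while IFS= read -r line || [[ -n "$line" ]]; do
    case "$line" in
      # Test "is not running" first so it is never counted as running.
      Instance*"is not running"*) echo "DOWN: $line"; bad=1 ;;
      Instance*"is running"*)     running=$((running + 1)) ;;
    esac
  done
  if (( running == 0 )); then echo "NO RUNNING INSTANCES"; bad=1; fi
  if (( bad == 0 )); then echo "$running INSTANCE(S) RUNNING"; fi
  return "$bad"
}

# Usage: srvctl status database -d PROD | check_db_instances
```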

Service Status

srvctl status service -d <db_name>

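Verify

Application services are running
Each service runs on its preferred instances
Available (failover) instances are configured as intended; srvctl config service -d <db_name> lists the preferred and available instances of each service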
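A bash sketch for checking service placement. check_service_placement is a hypothetical helper; it assumes srvctl status service prints lines of the form "Service <name> is running on instance(s) PROD1,PROD2", and it takes the preferred instance list as an argument (for example, copied from srvctl config service output).

```bash
# Hypothetical helper: confirm a service runs on all of its preferred instances.
# $1 = service name, $2 = comma-separated preferred instances.
# Reads `srvctl status service -d <db_name>` output on stdin.
check_service_placement() {
  local svc="$1" preferred="$2" line running="" inst bad=0
  local -a want
  while IFS= read -r line || [[ -n "$line" ]]; do
    if [[ "$line" == "Service $svc is running on instance(s) "* ]]; then
      running="${line##* }"   # last field, e.g. PROD1,PROD2
    fi
  done
  if [[ -z "$running" ]]; then
    echo "$svc: NOT RUNNING"
    return 1
  fi
  IFS=',' read -r -a want <<< "$preferred"
  for inst in "${want[@]}"; do
    # Wrap both sides in commas so PROD1 does not match PROD10.
    if [[ ",$running," != *",$inst,"* ]]; then
      echo "$svc: not running on preferred instance $inst"
      bad=1
    fi
  done
  if (( bad == 0 )); then echo "$svc: running on all preferred instances"; fi
  return "$bad"
}

# Usage (oltp_svc is a hypothetical service name):
#   srvctl status service -d PROD | check_service_placement oltp_svc PROD1,PROD2
```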

SQL Validation

SELECT
INSTANCE_NAME,
STATUS,
DATABASE_STATUS
FROM GV$INSTANCE;

Expected (for every instance)

STATUS = OPEN
DATABASE_STATUS = ACTIVE
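A bash sketch that checks the query result row by row. check_instances is a hypothetical helper; it assumes the query output has been reduced to bare rows of INSTANCE_NAME, STATUS, and DATABASE_STATUS, for example with SQL*Plus SET HEADING OFF FEEDBACK OFF PAGESIZE 0.

```bash
# Hypothetical helper: validate rows of "INSTANCE_NAME STATUS DATABASE_STATUS".
check_instances() {
  local inst status dbstatus count=0 bad=0
  while read -r inst status dbstatus _ || [[ -n "$inst" ]]; do
    [[ -z "$inst" ]] && continue
    count=$((count + 1))
    if [[ "$status" != "OPEN" || "$dbstatus" != "ACTIVE" ]]; then
      echo "$inst: STATUS=$status DATABASE_STATUS=$dbstatus"
      bad=1
    fi
  done
  if (( count == 0 )); then echo "NO INSTANCES RETURNED"; bad=1; fi
  if (( bad == 0 )); then echo "$count INSTANCE(S) OPEN AND ACTIVE"; fi
  return "$bad"
}

# Usage sketch (run as the database owner on any node):
#   sqlplus -s / as sysdba <<'EOF' | check_instances
#   SET HEADING OFF FEEDBACK OFF PAGESIZE 0
#   SELECT instance_name, status, database_status FROM gv$instance;
#   EOF
```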

5. Network Health Check

Objective

Verify communication between RAC nodes.

Public and Private Network

oifcfg getif

Verify

Public interface (type public)
Private interconnect (type cluster_interconnect)
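A bash sketch of this verification. check_interfaces is a hypothetical helper; it assumes oifcfg getif prints one row per interface as "<interface> <subnet> global <type>", where the type column may list several roles separated by commas (for example cluster_interconnect,asm).

```bash
# Hypothetical helper: confirm `oifcfg getif` lists a public interface
# and a cluster_interconnect interface.
check_interfaces() {
  awk '
    NF >= 4 {
      n = split($4, roles, ",")    # a row may carry more than one role
      for (i = 1; i <= n; i++) seen[roles[i]] = seen[roles[i]] " " $1
    }
    END {
      if (!("public" in seen))               { print "MISSING: public interface"; bad = 1 }
      if (!("cluster_interconnect" in seen)) { print "MISSING: cluster_interconnect"; bad = 1 }
      if (!bad) print "public:" seen["public"] " | interconnect:" seen["cluster_interconnect"]
      exit bad
    }
  '
}

# Usage: oifcfg getif | check_interfaces
```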

Network Configuration

srvctl config network

SCAN Configuration

srvctl config scan

Verify

SCAN name resolves in DNS
SCAN IPs (Oracle recommends three)
SCAN listeners (srvctl config scan_listener)

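VIP Status

srvctl status vip -node <node_name>

(On 11gR2, use srvctl status vip -n <node_name>.)

Expected

VIP racnode1-vip is enabled
VIP racnode1-vip is running on node: racnode1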
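A bash sketch that also detects a VIP that has failed over away from its home node. check_vip is a hypothetical helper; it assumes srvctl reports a running VIP as "VIP <vip_name> is running on node: <node>", and racnode1-vip is an illustrative VIP name.

```bash
# Hypothetical helper: confirm a VIP is running on its home node.
# $1 = VIP name, $2 = home node. Reads `srvctl status vip` output on stdin.
check_vip() {
  local vip="$1" home="$2" line node=""
  while IFS= read -r line || [[ -n "$line" ]]; do
    case "$line" in
      "VIP $vip is running on node: "*) node="${line##*: }" ;;
    esac
  done
  if [[ -z "$node" ]]; then
    echo "$vip: NOT RUNNING"
    return 1
  fi
  if [[ "$node" != "$home" ]]; then
    echo "$vip: FAILED OVER to $node (home node $home)"
    return 1
  fi
  echo "$vip: running on home node $home"
}

# Usage: srvctl status vip -node racnode1 | check_vip racnode1-vip racnode1
```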

Troubleshooting

Verify DNS
Check SCAN listeners
Verify VIP failover
Test private interconnect latency

6. Cluster Resource Health Check

Objective

Verify all Oracle Cluster resources are online.

Command

crsctl stat res -t

Verify

Database
ASM
Listeners
VIPs
SCAN Listeners
Diskgroups

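Expected Status

TARGET and STATE are both ONLINE on every node where the resource is configured. A resource whose TARGET is OFFLINE has been stopped or disabled by design (for example, ora.gsd on 11gR2) and is not a failure in itself.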
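A bash sketch that flags resources running below their target. It reads crsctl stat res without -t, which prints each resource as NAME=, TYPE=, TARGET=, and STATE= lines that are easier to parse than the tabular view. check_crs_resources is a hypothetical helper; it counts ONLINE entries in TARGET and STATE, so resources that are OFFLINE by design are not reported.

```bash
# Hypothetical helper: report resources with fewer ONLINE members than targeted.
# Reads `crsctl stat res` (key=value format) on stdin.
check_crs_resources() {
  awk '
    /^NAME=/   { name = substr($0, 6); want = 0 }
    /^TARGET=/ { want = gsub(/ONLINE/, "&") }    # members expected ONLINE
    /^STATE=/  {
      have = gsub(/ONLINE/, "&")                 # members actually ONLINE
      if (have < want) { printf "%s: %d of %d expected ONLINE\n", name, have, want; bad = 1 }
    }
    END { if (!bad) print "ALL RESOURCES AT TARGET"; exit bad }
  '
}

# Usage: crsctl stat res | check_crs_resources
```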

Additional Recommended Health Checks

Listener Status

srvctl status listener

SCAN Listener Status

srvctl status scan_listener

OCR Check

ocrcheck

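Expected

Device/File integrity check succeeded (for each OCR location)
Cluster registry integrity check succeeded
Logical corruption check succeeded (performed only when ocrcheck is run as root)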
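A bash sketch for scanning ocrcheck output. check_ocr is a hypothetical helper; it assumes a healthy run contains the line "Cluster registry integrity check succeeded" and that any failed check is reported with the word "failed".

```bash
# Hypothetical helper: scan `ocrcheck` output for failures.
check_ocr() {
  local out
  out=$(cat)
  if grep -qi 'failed' <<< "$out"; then
    echo "OCR CHECK REPORTED A FAILURE"
    grep -i 'failed' <<< "$out"
    return 1
  fi
  if ! grep -q 'Cluster registry integrity check succeeded' <<< "$out"; then
    echo "OCR INTEGRITY NOT CONFIRMED"
    return 1
  fi
  echo "OCR OK"
}

# Usage (as root, so the logical corruption check also runs): ocrcheck | check_ocr
```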

Voting Disk

crsctl query css votedisk

Verify

All voting disks are listed with STATE ONLINE
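A bash sketch of this check. check_votedisks is a hypothetical helper; it assumes crsctl query css votedisk lists each disk on a numbered row (" 1. ONLINE ...") followed by a "Located N voting disk(s)." summary line.

```bash
# Hypothetical helper: validate `crsctl query css votedisk` output.
check_votedisks() {
  awk '
    /^ *[0-9]+\./ {
      total++
      if ($2 == "ONLINE") online++
      else { print "NOT ONLINE: " $0; bad = 1 }
    }
    /^Located [0-9]+ voting disk/ { located = $2 }
    END {
      if (total == 0) { print "NO VOTING DISKS LISTED"; exit 1 }
      if (located != "" && located + 0 != total) { print "LISTED " total " BUT LOCATED " located; bad = 1 }
      if (!bad) print online " VOTING DISK(S) ONLINE"
      exit bad
    }
  '
}

# Usage: crsctl query css votedisk | check_votedisks
```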

Cluster Synchronization

crsctl check css

Expected

CRS-4529: Cluster Synchronization Services is online

CRS Stack (lower-stack resources managed by OHASD, such as ora.cssd and ora.crsd)

crsctl stat res -t -init

Verify every resource is ONLINE.

Daily RAC Health Check Checklist

Check	Status
Clusterware Running	☐
All Nodes Active	☐
ASM Running	☐
Diskgroups Mounted	☐
Database Open	☐
RAC Services Running	☐
Public Network Healthy	☐
Private Interconnect Healthy	☐
VIP Running	☐
SCAN Listener Running	☐
OCR Healthy	☐
Voting Disk Healthy	☐
Cluster Resources ONLINE	☐

Common Production Issues

Issue	Possible Cause	Resolution
Node Eviction	Interconnect failure	Check private network and CSS logs
ASM Down	Storage unavailable	Verify SAN/ASM disks and restart ASM
VIP Offline	Network issue	Validate interface and relocate VIP
Service Not Running	Instance failure	Start service with SRVCTL
CRS Resource Offline	Clusterware issue	Review CRS logs and restart the affected resource
Diskgroup Not Mounted	Disk failure	Check ASM disks and storage connectivity

Best Practices

Perform RAC health checks daily.
Monitor ASM free space and rebalance operations.
Verify OCR and voting disk health after maintenance.
Monitor interconnect latency to prevent node eviction.
Ensure SCAN listeners and VIPs are functioning correctly.
Keep Clusterware and database patches up to date.
Review alert logs and CRS logs regularly.
Automate routine health checks using shell scripts or Enterprise Manager where possible.
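Following the automation recommendation above, a minimal bash runner sketch can record PASS or FAIL for each line of the daily checklist. run_check and clusterware_ok are hypothetical names; each check is any command or shell function that exits 0 when healthy.

```bash
# Hypothetical runner: print [PASS]/[FAIL] per check and count failures.
FAILED=0
run_check() {
  local label="$1"; shift
  if "$@" > /dev/null 2>&1; then
    printf '[PASS] %s\n' "$label"
  else
    printf '[FAIL] %s\n' "$label"
    FAILED=$((FAILED + 1))
  fi
}

# Example wiring on a cluster node (commented out; needs the Grid environment):
#   clusterware_ok() { [ "$(crsctl check crs | grep -c 'is online')" -eq 4 ]; }
#   run_check "Clusterware Running" clusterware_ok
#   echo "$FAILED check(s) failed"
```

Scheduling the script from cron and mailing the output gives a simple daily report; Enterprise Manager remains the better choice where it is already deployed.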

Conclusion

A disciplined RAC health check routine is essential for maintaining a stable Oracle RAC environment. Regular verification of Clusterware, nodes, ASM, databases, networking, and cluster resources helps identify issues proactively, minimize downtime, and ensure continuous availability of critical business applications.

Wednesday, July 1, 2026

Oracle RAC Health Check Framework
