Saturday, July 4, 2026

Oracle RAC SCAN Listener Architecture (How It Works)



1. Introduction

One of the biggest challenges in a clustered database environment is providing a single, stable connection point for client applications. Before Oracle Database 11g Release 2, clients had to know the hostname or IP address of every RAC node. Whenever a new node was added or removed, client connection strings often needed to be updated.

To solve this problem, Oracle introduced SCAN (Single Client Access Name).

SCAN provides a single hostname that clients use to connect to the RAC database. Oracle Clusterware and the SCAN Listeners automatically redirect client connections to the most appropriate RAC node, providing load balancing, high availability, and simplified administration.

Definition:
SCAN (Single Client Access Name) is a cluster-wide virtual hostname that provides a single entry point for all client connections to an Oracle RAC database.


2. Why SCAN is Required

Before SCAN (Oracle RAC 10g)

Clients needed to specify every node in the connection string.

Example:

Node-1 : racnode1.bsl.com
Node-2 : racnode2.bsl.com
Node-3 : racnode3.bsl.com

Connection String:

(DESCRIPTION=
 (ADDRESS=(HOST=racnode1))
 (ADDRESS=(HOST=racnode2))
 (ADDRESS=(HOST=racnode3))
)

Problems:

  • Complex client configuration

  • Difficult maintenance

  • Client updates required when adding/removing nodes

  • Harder to manage large clusters


After SCAN (Oracle 11gR2 and Later)

Clients use a single hostname:

prod-scan.company.com

Oracle automatically routes the connection to an appropriate node.


3. What is SCAN?

SCAN consists of:

  • One SCAN Name (DNS hostname)

  • Typically three SCAN IP addresses

  • Three SCAN Listeners (recommended)

Example:

SCAN Name

prod-scan.bsl.com

↓

192.168.1.210

192.168.1.211

192.168.1.212

Oracle Clusterware manages the SCAN listeners as cluster resources.


4. Oracle RAC SCAN Architecture

                      Client Applications
               (SQL*Plus / JDBC / OEM / ODP.NET)
                             │
                             │
                    DNS Lookup (SCAN Name)
                             │
                 prod-scan.company.com
                             │
        ┌───────────────────────────────────────┐
        │            DNS Resolution             │
        └───────────────────────────────────────┘
                 │          │           │
                 ▼          ▼           ▼
          SCAN IP-1   SCAN IP-2   SCAN IP-3
                 │          │           │
                 ▼          ▼           ▼
         SCAN Listener1  SCAN Listener2  SCAN Listener3
                 │
                 ▼
        Cluster Resource Manager
                 │
                 ▼
       Determines Best Available Instance
                 │
        ┌────────┴────────┐
        ▼                 ▼
     Node-1            Node-2
   Local Listener    Local Listener
        │                 │
        ▼                 ▼
   Instance-1        Instance-2

5. Components of SCAN Architecture

ComponentPurpose
SCAN NameSingle hostname used by clients
DNSResolves SCAN name to multiple SCAN IPs
SCAN VIPVirtual IP for SCAN Listener
SCAN ListenerAccepts initial client connection
Local ListenerListener running on each RAC node
Database ServiceDetermines which instance receives the connection
Oracle ClusterwareManages SCAN resources

6. SCAN Name

Clients never connect directly to a node.

Instead they connect to:

prod-scan.bsl.com

Advantages:

  • One hostname

  • Easy administration

  • No client changes when nodes are added

  • Simplified configuration


7. SCAN IP Addresses

Oracle recommends three SCAN IPs.

Example:

SCAN NameSCAN IP
prod-scan.bsl.com192.168.1.210
prod-scan.bsl.com192.168.1.211
prod-scan.bsl.com192.168.1.212

The DNS server returns these IPs using round-robin resolution.


8. SCAN Listeners

Each SCAN IP hosts one SCAN Listener.

Example:

SCAN-1

↓

SCAN Listener-1

↓

Node-1
SCAN-2

↓

SCAN Listener-2

↓

Node-2
SCAN-3

↓

SCAN Listener-3

↓

Node-3

A SCAN Listener can run on any RAC node. If a node fails, Oracle Clusterware automatically relocates the SCAN Listener to another healthy node.


9. Local Listener

Every RAC node runs its own Local Listener.

Example:

Node-1

↓

Local Listener

↓

Instance-1

The Local Listener:

  • Registers local database services

  • Accepts redirected client connections

  • Creates server processes


10. Dynamic Service Registration

Oracle instances automatically register themselves with the Local Listener using PMON (or equivalent background registration mechanisms in newer releases).

The Local Listener then shares service information with the SCAN Listener.

Registered information includes:

  • Instance Name

  • Database Name

  • Services

  • Load Information

  • Instance Status


11. Complete Client Connection Flow

Step 1

Application connects:

prod-scan.bsl.com

Step 2

DNS returns:

192.168.1.210

192.168.1.211

192.168.1.212

Step 3

Client connects to one SCAN Listener.

Client

↓

SCAN Listener

Step 4

SCAN Listener checks:

  • Which services are available?

  • Which instances provide the requested service?

  • Which instance is least loaded?


Step 5

SCAN Listener redirects the client.

SCAN Listener

↓

Local Listener

↓

Node-2

Step 6

Local Listener creates a dedicated (or shared, if configured) server process.

Listener

↓

Dedicated Server Process

↓

Instance

Step 7

Client communicates directly with the database instance.

The SCAN Listener is no longer involved in SQL processing after the initial connection.


12. SCAN Connection Flow Diagram

Application
      │
      ▼
SCAN Name (prod-scan.bsl.com)
      │
      ▼
DNS (3 SCAN IPs)
      │
      ▼
SCAN Listener
      │
      ▼
Local Listener
      │
      ▼
Database Service
      │
      ▼
Oracle Instance
      │
      ▼
SQL Processing

13. Load Balancing

SCAN supports Connection Load Balancing.

Example:

100 Users

↓

SCAN Listener

↓

40 Users → Node-1

35 Users → Node-2

25 Users → Node-3

Distribution depends on service configuration and instance load.


14. Node Failure Scenario

Suppose:

Node-2 crashes.

Before Failure:

Node-1

Node-2

Node-3

After Failure:

Node-1

Node-3

Oracle Clusterware:

  • Detects node failure.

  • Restarts the SCAN Listener (if necessary) on another node.

  • Relocates affected services according to policy.

  • New client connections are redirected to healthy instances.


15. Adding a New RAC Node

Without SCAN:

  • Modify every client connection string.

  • Add the new node information.

With SCAN:

Add Node

↓

Cluster Registers Services

↓

Clients Continue Using Same SCAN Name

No client-side connection string changes are required.


16. SCAN vs VIP

SCANVIP
Single client entry pointNode-specific virtual IP
Cluster-wideOne per RAC node
Uses SCAN ListenersUses Local Listener
Simplifies client configurationProvides fast node failover
Load balancingNode availability

17. SCAN vs Local Listener

SCAN ListenerLocal Listener
Accepts initial client connectionAccepts redirected connection
Cluster-wideRuns on each node
Redirects clientEstablishes database session
Managed by ClusterwareAssociated with a specific node

18. Useful SCAN Commands

Check SCAN Configuration

srvctl config scan

Example:

SCAN name: prod-scan.bsl.com

Check SCAN Listener

srvctl status scan_listener

Check SCAN VIP

srvctl status scan

View Listener Status

lsnrctl status

View Cluster Resources

crsctl stat res -t

19. Real-Time Production Scenario

Scenario: One RAC Node Goes Down

Environment

  • 3-node Oracle RAC

  • Banking application

  • SCAN configured with three IPs

Incident

Node-2 unexpectedly crashes due to a hardware failure.

What Happens?

  1. CSSD detects heartbeat loss.

  2. CRSD marks Node-2 resources offline.

  3. SCAN Listener previously running on Node-2 is relocated to another node if required.

  4. Services configured on Node-2 are restarted on surviving instances according to policy.

  5. New client connections continue using prod-scan.company.com.

  6. The SCAN Listener redirects new sessions to Node-1 or Node-3.

Result

  • Existing sessions on Node-2 are disconnected unless protected by Application Continuity.

  • New connections continue successfully.

  • Users typically experience minimal disruption.


20. SCAN Best Practices

  • Configure three SCAN IP addresses for production clusters.

  • Register the SCAN name in DNS (or use Grid Naming Service where appropriate).

  • Ensure SCAN resolves correctly from client systems.

  • Do not hard-code node hostnames in client applications.

  • Use database services instead of instance names.

  • Monitor SCAN Listener status daily.

  • Verify that dynamic service registration is functioning.

  • Test failover and connection redirection during maintenance windows.


21. Common SCAN Issues and Troubleshooting

ProblemPossible CauseVerification
Clients cannot connectDNS resolution failurenslookup prod-scan.bsl.com
SCAN Listener offlineCluster resource issuesrvctl status scan_listener
Services not visibleDynamic registration issuelsnrctl status
Uneven connection distributionService configurationsrvctl config service -d <db_name>
Connection delaysNetwork or listener problemsCheck listener logs and Clusterware resources

22. Summary

The SCAN Listener Architecture is one of the most significant enhancements introduced in Oracle RAC. It provides a single, cluster-wide connection point that hides the complexity of multiple RAC nodes from client applications. Through DNS resolution, SCAN Listeners, Local Listeners, and database services, Oracle automatically directs client connections to the most appropriate database instance while supporting connection load balancing and high availability.

By eliminating node-specific connection strings and allowing seamless cluster expansion without client reconfiguration, SCAN greatly simplifies RAC administration and improves application resilience. In modern Oracle RAC deployments, SCAN, together with VIPs, Oracle Clusterware, and database services, forms the foundation of a robust and highly available client connectivity architecture.

Wednesday, July 1, 2026

Oracle RAC Monitoring Framework

For production environments, I recommend turning it into a modular RAC Monitoring Framework rather than a single script. That makes it easier to schedule, troubleshoot, and extend.

Oracle RAC Monitoring Framework

Directory Structure

rac_monitoring/
├── rac_health_check.sh
├── db_health.sql
├── asm_health.sql
├── wait_events.sql
├── blocking_sessions.sql
├── tablespace.sql
├── fra_usage.sql
├── archive_log.sql
├── cpu_memory.sh
├── alert_log.sh
├── generate_report.sh
├── reports/
├── logs/
└── config.env

1. Configuration File (config.env)

#!/bin/bash

export ORACLE_BASE=/u01/app/oracle
export GRID_HOME=/u01/app/19.0.0/grid
export ORACLE_HOME=/u01/app/oracle/product/19.0.0/dbhome_1

export ORACLE_SID=PROD1

export PATH=$GRID_HOME/bin:$ORACLE_HOME/bin:$PATH

DB_NAME=PROD

REPORT_DIR=/home/oracle/rac_monitoring/reports
LOG_DIR=/home/oracle/rac_monitoring/logs

DATE=$(date +"%Y%m%d_%H%M%S")

REPORT=${REPORT_DIR}/RAC_Health_${DATE}.html
LOGFILE=${LOG_DIR}/RAC_Health_${DATE}.log

2. RAC Health Check Script (rac_health_check.sh)

#!/bin/bash

source ./config.env

exec > $LOGFILE

echo "==============================================="
echo "Oracle RAC Health Check"
echo "Server : $(hostname)"
echo "Date   : $(date)"
echo "==============================================="

echo
echo "=============================="
echo "Clusterware Status"
echo "=============================="
crsctl check crs

echo
echo "=============================="
echo "Cluster Resources"
echo "=============================="
crsctl stat res -t

echo
echo "=============================="
echo "Node Status"
echo "=============================="
olsnodes -n -s

echo
echo "=============================="
echo "ASM Status"
echo "=============================="
srvctl status asm

echo
echo "=============================="
echo "Diskgroups"
echo "=============================="
asmcmd lsdg

echo
echo "=============================="
echo "Database Status"
echo "=============================="
srvctl status database -d ${DB_NAME}

echo
echo "=============================="
echo "Services"
echo "=============================="
srvctl status service -d ${DB_NAME}

echo
echo "=============================="
echo "Listener"
echo "=============================="
srvctl status listener

echo
echo "=============================="
echo "SCAN Listener"
echo "=============================="
srvctl status scan_listener

echo
echo "=============================="
echo "VIP"
echo "=============================="
srvctl status vip

echo
echo "=============================="
echo "OCR"
echo "=============================="
ocrcheck

echo
echo "=============================="
echo "Voting Disk"
echo "=============================="
crsctl query css votedisk

echo
echo "Health Check Completed"

3. Wait Event Monitoring (wait_events.sql)

set lines 200
col event format a45

SELECT
event,
total_waits,
time_waited,
average_wait
FROM v$system_event
ORDER BY time_waited DESC
FETCH FIRST 20 ROWS ONLY;

4. Blocking Sessions

set lines 200

SELECT
inst_id,
sid,
serial#,
username,
blocking_session,
seconds_in_wait,
event
FROM gv$session
WHERE blocking_session IS NOT NULL;

5. ASM Monitoring

set lines 200

SELECT
name,
state,
type,
total_mb,
free_mb,
ROUND(free_mb*100/total_mb,2) FREE_PERCENT
FROM
v$asm_diskgroup;

6. Tablespace Monitoring

SELECT
tablespace_name,
ROUND(used_percent,2) USED_PERCENT
FROM dba_tablespace_usage_metrics
ORDER BY used_percent DESC;

7. FRA Monitoring

SELECT
SPACE_LIMIT/1024/1024 MB_LIMIT,
SPACE_USED/1024/1024 MB_USED,
SPACE_RECLAIMABLE/1024/1024 MB_RECLAIMABLE
FROM
V$RECOVERY_FILE_DEST;

8. Archive Log Generation

SELECT
TRUNC(first_time),
COUNT(*),
ROUND(SUM(blocks*block_size)/1024/1024/1024,2) GB
FROM
v$archived_log
GROUP BY
TRUNC(first_time)
ORDER BY
1 DESC;

9. CPU & Memory Monitoring (cpu_memory.sh)

#!/bin/bash

echo "========== CPU =========="
top -bn1 | head -5

echo

echo "========== Memory =========="
free -g

echo

echo "========== Swap =========="
swapon -s

echo

echo "========== Disk =========="
df -h

10. Alert Log Monitoring (alert_log.sh)

#!/bin/bash

adrci exec="show alert -tail 200"

11. Cluster Log Collection

#!/bin/bash

diagcollection.pl --collect cluster

12. Email Report

mailx -s "Oracle RAC Health Report $(hostname)" \
shashi_dba@shashidba.com < $LOGFILE

13. Cron Scheduling

Run every hour:

0 * * * * /home/oracle/rac_monitoring/rac_health_check.sh

Run daily at 8 AM:

0 8 * * * /home/oracle/rac_monitoring/rac_health_check.sh

Run every Sunday:

0 6 * * 0 /home/oracle/rac_monitoring/rac_health_check.sh

Sample Health Check Output

===================================================
Oracle RAC Health Check
===================================================

Hostname : racnode1
Date     : 01-Jul-2026 08:00

✔ CRS Status               ONLINE
✔ Cluster Resources        ONLINE
✔ Node Status              ACTIVE
✔ ASM                      RUNNING
✔ Diskgroups               DATA, RECO, OCR
✔ Database                 PROD OPEN
✔ Services                 RUNNING
✔ Listener                 RUNNING
✔ SCAN                     RUNNING
✔ VIP                      RUNNING
✔ OCR                      HEALTHY
✔ Voting Disk              NORMAL

Tablespace Usage
----------------------------
SYSTEM        72%
SYSAUX        61%
USERS         42%
TEMP          15%

ASM Usage
----------------------------
DATA      67%
RECO      58%

Blocking Sessions : NONE

Top Wait Event
----------------------------
db file sequential read

CPU Usage : 18%
Memory Usage : 63%

Overall RAC Health : PASS


Oracle RAC Administration Handbook

 Absolutely. Given the amount of content, this is best developed as a complete handbook rather than a single chat response.

📘 Oracle RAC Administration Handbook (100–150 Pages)

Section 1 – Oracle RAC Fundamentals

  • Oracle RAC Architecture

  • RAC Components

  • Grid Infrastructure

  • Oracle Clusterware

  • ASM Architecture

  • Cache Fusion

  • Global Cache Service (GCS)

  • Global Enqueue Service (GES)

  • OCR & Voting Disk

  • SCAN, VIP, GNS

  • RAC Networking

  • RAC Storage Architecture

  • RAC vs Single Instance

  • RAC vs Data Guard

  • Real-world RAC Deployment Architecture


Section 2 – Oracle RAC Installation

  • Hardware Prerequisites

  • OS Configuration

  • Kernel Parameters

  • User Configuration

  • Passwordless SSH

  • Network Planning

  • Storage Planning

  • ASM Configuration

  • Grid Infrastructure Installation

  • RAC Database Installation

  • Post-installation Verification

  • Architecture diagrams throughout


Section 3 – RAC Administration

  • Instance Management

  • Service Management

  • Listener Management

  • SCAN Management

  • VIP Management

  • OCR Backup & Restore

  • Voting Disk Management

  • Node Addition

  • Node Deletion

  • Database Creation

  • Database Deletion

  • RAC Patching

  • OPatchAuto

  • Rolling Patch

  • One-off Patch

  • RU Upgrade


Section 4 – Oracle RAC Health Check Framework

This section expands the framework into approximately 25–30 pages.

Includes:

  • Clusterware Health Check

  • ASM Health Check

  • Database Health Check

  • Node Health Check

  • Listener Health Check

  • VIP Health Check

  • SCAN Health Check

  • OCR Health Check

  • Voting Disk Health Check

  • CRS Resource Health Check

  • Cache Fusion Monitoring

  • Interconnect Latency Checks

  • Redo Log Health

  • Undo Health

  • Tablespace Health

  • FRA Health

  • Archive Log Health

  • Alert Log Review

  • ADRCI Diagnostics

  • AWR Health Indicators

  • ASH Monitoring

  • Blocking Sessions

  • Wait Events

  • OS Monitoring

  • Filesystem Checks

Each topic will include:

  • Purpose

  • Commands

  • Sample outputs

  • Interpretation

  • Common issues

  • Troubleshooting steps

  • Best practices


Section 5 – RAC Monitoring Scripts

Cluster Health Script

#!/bin/bash

echo "================================="
echo "Oracle RAC Health Check"
echo "================================="

hostname

echo
echo "CRS Status"
crsctl check crs

echo
echo "Node Status"
olsnodes -s

echo
echo "ASM Status"
srvctl status asm

echo
echo "Diskgroups"
asmcmd lsdg

echo
echo "Database Status"
srvctl status database -d PROD

echo
echo "Services"
srvctl status service -d PROD

echo
echo "VIP Status"
srvctl status vip

echo
echo "SCAN Listener"
srvctl status scan_listener

echo
echo "OCR"
ocrcheck

echo
echo "Voting Disk"
crsctl query css votedisk

echo
echo "Resources"
crsctl stat res -t

Wait Event Monitoring Script

SELECT
event,
total_waits,
time_waited
FROM
v$system_event
ORDER BY
time_waited DESC;

Blocking Session Script

SELECT
blocking_session,
sid,
serial#,
username,
event
FROM
gv$session
WHERE
blocking_session IS NOT NULL;

ASM Space Monitoring

SELECT
name,
total_mb,
free_mb,
ROUND(free_mb*100/total_mb,2) FREE_PERCENT
FROM
v$asm_diskgroup;

Cluster Resource Report

crsctl stat res -t

VIP Verification

srvctl status vip

OCR Verification

ocrcheck

CRS Alert Monitoring

adrci

show alert

Cluster Log Collection

diagcollection.pl --collect cluster

Section 6 – Automation Framework

The handbook will include a Daily Health Check Automation that generates HTML reports, CSV summaries, and email notifications.

Features:

  • Clusterware status

  • ASM status

  • Diskgroup utilization

  • Database status

  • Listener status

  • Services

  • SCAN

  • VIP

  • OCR

  • Voting disks

  • CPU

  • Memory

  • Disk usage

  • Top wait events

  • Blocking sessions

  • FRA usage

  • Archive log generation

  • Tablespace utilization

  • Alert log errors

  • CRS errors

Output formats:

  • HTML dashboard

  • CSV report

  • Email summary

  • Log file


Section 7 – Performance Tuning

  • Cache Fusion tuning

  • Interconnect tuning

  • ASM tuning

  • HugePages

  • NUMA

  • Linux kernel tuning

  • AWR analysis

  • ASH analysis

  • ADDM

  • SQL Monitoring

  • OSWatcher

  • ExaWatcher

  • Cluster Health Monitor (CHM)


Section 8 – Production Incident Runbooks (40+)

Examples include:

  • Node Eviction

  • CRS Won't Start

  • CSS Failure

  • ASM Disk Offline

  • OCR Corruption

  • Voting Disk Failure

  • VIP Not Failing Over

  • SCAN Listener Down

  • Split Brain

  • ORA-29740

  • ORA-29702

  • CRS-4535

  • CRS-4530

  • CRS-1606

  • PRCR-1079

  • PRCR-1064

  • ORA-15064

  • ORA-15032

  • ORA-15041

  • ORA-15042

  • ORA-00257

  • ORA-19809

  • Interconnect Packet Loss

  • High GCS Waits

  • gc buffer busy

  • gc cr request

  • gc current block busy

Each runbook will include:

  • Symptoms

  • Root cause

  • Diagnostic commands

  • Resolution steps

  • Validation

  • Prevention

  • Lessons learned


Section 9 – Oracle RAC Interview Guide

  • 500+ interview questions

  • L1 questions

  • L2 questions

  • L3 questions

  • Oracle ACE–level scenarios

  • Whiteboard architecture questions

  • Real production case studies


Section 10 – Architecture Diagrams

The handbook will contain over 50 professional diagrams, including:

  • Oracle RAC Architecture

  • Grid Infrastructure

  • Cache Fusion Flow

  • GCS/GES Communication

  • SCAN Listener Flow

  • VIP Failover

  • OCR Architecture

  • Voting Disk Layout

  • ASM Diskgroup Architecture

  • Redo Thread Architecture

  • RAC Networking

  • Client Connection Flow

  • Clusterware Stack

  • Service Failover

  • Node Eviction Flow

  • Split Brain Detection

  • CRS Startup Sequence

  • Rolling Patch Architecture

  • RAC + Data Guard Hybrid Architecture

  • RAC Backup Architecture

  • RAC Disaster Recovery Design


Oracle RAC Health Check Framework

 

Standard Operating Procedure (SOP)

Document Version: 1.0
Applicable Versions: Oracle RAC 11gR2, 12c, 18c, 19c, 21c, 23ai, 26ai
Prepared For: Oracle Database Administrators (L1/L2/L3)


Purpose

This document provides a structured Oracle RAC Health Check Framework that helps DBAs verify the health of Oracle Clusterware, ASM, Database, Network, and Cluster Resources. Performing these checks regularly helps detect issues early, reduce downtime, and maintain high availability.


Health Check Workflow

Clusterware
      │
      ▼
Node Status
      │
      ▼
ASM Health
      │
      ▼
Database Health
      │
      ▼
Network Health
      │
      ▼
Cluster Resources

1. Clusterware Health Check

Objective

Verify that Oracle Clusterware components are running correctly.

Components

  • OHASD

  • CSSD

  • CRSD

  • EVMD

Command

crsctl check crs

Expected Output

CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Validation

ComponentExpected Status
OHASDOnline
CSSDOnline
CRSDOnline
EVMDOnline

If Failed

  • Check Clusterware logs.

  • Verify voting disks.

  • Verify OCR accessibility.

  • Restart Clusterware if required.


2. Node Health Check

Objective

Ensure all RAC nodes are available and participating in the cluster.

Commands

olsnodes
olsnodes -s
olsnodes -n

Expected Output

racnode1 Active
racnode2 Active

Validation

  • All nodes visible

  • Status should be Active

  • Node numbers should match cluster configuration

Troubleshooting

If a node is missing:

  • Verify private interconnect

  • Check Clusterware

  • Verify CSSD

  • Review node logs


3. ASM Health Check

Objective

Verify ASM availability and storage health.

Check ASM Status

srvctl status asm

Expected

ASM is running on racnode1
ASM is running on racnode2

Check Diskgroups

asmcmd lsdg

Example

DATA
RECO
OCR

Verify

  • Mounted

  • Free Space

  • Offline Disks

  • Redundancy


SQL Validation

SELECT
name,
state,
type,
total_mb,
free_mb
FROM v$asm_diskgroup;

Troubleshooting

  • Check failed disks

  • Verify ASM alert log

  • Validate storage connectivity


4. Database Health Check

Objective

Ensure all RAC database instances and services are available.

Database Status

srvctl status database -d <db_name>

Expected

Instance PROD1 is running
Instance PROD2 is running

Service Status

srvctl status service -d <db_name>

Verify

  • Application services

  • Preferred instances

  • Available instances


SQL Validation

SELECT
INSTANCE_NAME,
STATUS,
DATABASE_STATUS
FROM GV$INSTANCE;

Expected

OPEN
ACTIVE

5. Network Health Check

Objective

Verify communication between RAC nodes.


Public and Private Network

oifcfg getif

Verify

  • Public Interface

  • Private Interconnect


Network Configuration

srvctl config network

SCAN Configuration

srvctl config scan

Verify

  • SCAN Name

  • SCAN IPs

  • SCAN Listeners


VIP Status

srvctl status vip

Expected

VIP is enabled
VIP is running

Troubleshooting

  • Verify DNS

  • Check SCAN listeners

  • Verify VIP failover

  • Test private interconnect latency


6. Cluster Resource Health Check

Objective

Verify all Oracle Cluster resources are online.

Command

crsctl stat res -t

Verify

  • Database

  • ASM

  • Listeners

  • VIPs

  • SCAN Listeners

  • Diskgroups

Expected Status

ONLINE

Additional Recommended Health Checks

Listener Status

srvctl status listener

SCAN Listener Status

srvctl status scan_listener

OCR Check

ocrcheck

Expected

Status : healthy

Voting Disk

crsctl query css votedisk

Verify

  • All voting disks accessible


Cluster Synchronization

crsctl check css

CRS Stack

crsctl stat res -t

Verify every resource is ONLINE.


Daily RAC Health Check Checklist

CheckStatus
Clusterware Running
All Nodes Active
ASM Running
Diskgroups Mounted
Database Open
RAC Services Running
Public Network Healthy
Private Interconnect Healthy
VIP Running
SCAN Listener Running
OCR Healthy
Voting Disk Healthy
Cluster Resources ONLINE

Common Production Issues

IssuePossible CauseResolution
Node EvictionInterconnect failureCheck private network and CSS logs
ASM DownStorage unavailableVerify SAN/ASM disks and restart ASM
VIP OfflineNetwork issueValidate interface and relocate VIP
Service Not RunningInstance failureStart service with SRVCTL
CRS Resource OfflineClusterware issueReview CRS logs and restart the affected resource
Diskgroup Not MountedDisk failureCheck ASM disks and storage connectivity

Best Practices

  • Perform RAC health checks daily.

  • Monitor ASM free space and rebalance operations.

  • Verify OCR and voting disk health after maintenance.

  • Monitor interconnect latency to prevent node eviction.

  • Ensure SCAN listeners and VIPs are functioning correctly.

  • Keep Clusterware and database patches up to date.

  • Review alert logs and CRS logs regularly.

  • Automate routine health checks using shell scripts or Enterprise Manager where possible.


Conclusion

A disciplined RAC health check routine is essential for maintaining a stable Oracle RAC environment. Regular verification of Clusterware, nodes, ASM, databases, networking, and cluster resources helps identify issues proactively, minimize downtime, and ensure continuous availability of critical business applications.

Tuesday, June 30, 2026

Oracle Database Performance Tuning: It's More Than Just SQL

 Many DBAs immediately focus on SQL tuning when performance issues arise. While optimizing SQL is important, production environments often reveal that the real bottlenecks exist outside the query itself.

Here are a few areas that deserve equal attention:

🔹 Leverage Materialized Views Wisely

For reporting or read-intensive workloads, Materialized Views can dramatically improve performance by storing precomputed results. They help reduce expensive joins, aggregations, and repetitive calculations. Just ensure the refresh strategy aligns with your business requirements.

🔹 Check Operating System Limits

Oracle relies heavily on the underlying OS. Review resource limits such as:
nproc
nofile

Improper values can restrict processes and file descriptors, leading to unexpected slowdowns during peak workloads.

🔹 Validate Linux Kernel Parameters

Kernel settings play a critical role in Oracle performance. Verify parameters like:
✔️ fs.file-max
✔️ kernel.shmmax
✔️ kernel.shmall

Well-configured shared memory ensures efficient SGA allocation and better database stability.

🔹 Enable HugePages

HugePages is one of the easiest ways to optimize memory usage for databases with large SGAs.

Benefits include:
• Lower CPU utilization
• Reduced memory fragmentation
• Faster memory access
• Improved overall database stability

🔹 Think Beyond the Database

Performance tuning is an end-to-end process. A healthy Oracle environment depends on tuning across every layer:

📌 SQL Execution Plans
📌 Optimizer Statistics
📌 Index Design
📌 Materialized Views
📌 SGA & PGA Configuration
📌 Linux Resource Limits
📌 Kernel Parameters
📌 Storage & I/O Performance

💡 The best Oracle DBAs don't just tune SQL—they optimize the entire ecosystem.

What are the most impactful non-SQL performance optimizations you've implemented in your production environment? Share your experience in the comments!

#Oracle #OracleDatabase #OracleDBA #PerformanceTuning #SQLTuning #Linux #Unix #HugePages #KernelParameters #MaterializedViews #DatabasePerformance #SGA #PGA #OracleCloud #Exadata #DatabaseAdministration #AIOUG #OracleACE #TechCommunity

Sunday, June 21, 2026

ORA-01034: ORACLE Not Available – Database Down After Server Reboot

 

ORA-01034: ORACLE Not Available – Database Down After Server Reboot

Incident Summary

Incident Priority: P1 (Critical)

Error Reported:

ORA-01034: ORACLE not available

Impact:

  • Production database was unavailable.

  • All application users were unable to connect.

  • Business transactions stopped.

  • Multiple application timeout alerts were generated.


Environment

  • Oracle Database: 19c

  • Operating System: Oracle Linux / RHEL

  • Environment: Production

  • Architecture: Single Instance (Applicable to RAC with additional cluster checks)


Symptoms

Application team reported:

  • ORA-01034: ORACLE not available

  • ORA-27101: Shared memory realm does not exist

  • Database connection failures

  • OEM alerts showing database status as Down


Initial Response

Step 1: Acknowledge the P1 Incident

  • Acknowledge the ServiceNow/Incident ticket immediately.

  • Inform stakeholders that investigation has started.

  • Join the bridge call if applicable.


Step 2: Verify Database Server Accessibility

SSH into the production server:

ssh oracle@prod-db-server

Verify server uptime:

uptime

Check whether the server was recently rebooted:

last reboot

Observation: The server had rebooted after scheduled OS patching.


Step 3: Verify Database Status

Set the Oracle environment:

export ORACLE_SID=PRODDB
. oraenv

Check PMON process:

ps -ef | grep pmon

No PMON process was running, confirming that the database was down.


Step 4: Review the Alert Log

Locate the diagnostic directory:

SHOW PARAMETER diagnostic_dest;

Navigate to the trace directory:

cd $ORACLE_BASE/diag/rdbms/proddb/PRODDB/trace

Monitor the alert log:

tail -100 alert_PRODDB.log

Observation:

  • No corruption or startup failure errors.

  • Database had shut down cleanly before the OS reboot.

  • No automatic startup attempt after the reboot.


Step 5: Verify Listener Status

lsnrctl status

Listener was running successfully.


Step 6: Start the Database

Connect as SYSDBA:

sqlplus / as sysdba

Start the database:

STARTUP;

Verify status:

SELECT STATUS FROM V$INSTANCE;

SELECT OPEN_MODE FROM V$DATABASE;

Expected Output:

STATUS      : OPEN
OPEN_MODE   : READ WRITE

Step 7: Application Validation

  • Ask the application team to retry connections.

  • Verify business transactions.

  • Confirm application functionality.

Issue resolved successfully.


Root Cause Analysis (RCA)

The production server was rebooted after operating system patching.

Although the listener started automatically, Oracle Database auto-start was not configured.

The /etc/oratab entry was:

PRODDB:/u01/app/oracle/product/19c/dbhome_1:N

Since the last field was N, the database did not start automatically after the reboot.


Permanent Fix

Update /etc/oratab

Change:

PRODDB:/u01/app/oracle/product/19c/dbhome_1:N

To:

PRODDB:/u01/app/oracle/product/19c/dbhome_1:Y

Enable Oracle Auto Startup Script

On Linux systems, ensure the Oracle startup service (dbstart/dbshut or the appropriate systemd service) is configured to start during server boot.


Validate

Perform a controlled server reboot and confirm:

  • Listener starts automatically.

  • Database starts automatically.

  • Application connectivity is restored without manual intervention.


Commands Used During Troubleshooting

uptime

last reboot

ps -ef | grep pmon

lsnrctl status

tail -100 alert_PRODDB.log

sqlplus / as sysdba

STARTUP;

SELECT STATUS FROM V$INSTANCE;

SELECT OPEN_MODE FROM V$DATABASE;

Lessons Learned

  • Always verify Oracle auto-start configuration after installation or cloning.

  • Review the alert log first to identify the root cause.

  • Validate database startup after every OS patching activity.

  • Maintain clear communication with application and infrastructure teams during P1 incidents.

  • Record the complete RCA, resolution steps, and preventive actions in the incident management tool (e.g., ServiceNow).


Interview Answer

Question: Tell us about a production issue you resolved.

Answer:

"During a scheduled OS patching activity, the production server rebooted successfully, but the database did not come up automatically. The application team reported ORA-01034 errors and users were unable to access the application. I immediately acknowledged the P1 incident, connected to the server, verified that the PMON process was not running, reviewed the alert log, and confirmed there were no database errors. I identified that the database auto-start was not configured because the /etc/oratab entry was set to 'N'. I manually started the database, validated application connectivity, and then permanently fixed the issue by enabling Oracle auto-start. The entire incident was resolved in approximately 17 minutes with no data loss."

Saturday, May 30, 2026

Physical Standby vs Logical Standby vs Snapshot Standby

 For Oracle DBA interviews and real production environments, understanding Physical Standby vs Logical Standby vs Snapshot Standby is critical because interviewers often ask not only the definitions but also why companies choose one over another.


Oracle Data Guard Standby Types – Complete Comparison

FeaturePhysical StandbyLogical StandbySnapshot Standby
Database StructureExact block-by-block copyLogical copy using SQLPhysical standby temporarily converted to RW
Redo Apply MethodRedo Apply (MRP)SQL Apply (LSP)Redo received but not applied
Read OnlyYes (Active Data Guard)YesNo
Read WriteNo (unless activated)YesYes
DR ReadyYesPartialNo
Switchover SupportYesYesNo
Failover SupportYesLimitedNo
Performance ImpactLowestHigherModerate
Most Common in ProductionYesRareRare

1. Physical Standby Database

How It Works Internally

The primary database generates redo logs.

Primary Database
      |
      v
 Archive Logs / Redo Transport
      |
      v
Physical Standby
      |
      v
MRP (Managed Recovery Process)
      |
      v
Apply Redo Blocks

The standby receives redo and applies changes at the block level.

Oracle does not execute SQL.

It simply recreates the exact same database blocks as the primary.


Example

Primary:

INSERT INTO EMP VALUES (101,'SCOTT');
COMMIT;

Redo generated:

Block #123 modified

Standby:

Apply Block #123

No SQL execution occurs.


Read/Write Capability

Standard Physical Standby

Read = Yes
Write = No

Active Data Guard

Read = Yes
Write = No
Real-Time Queries = Yes

Advantages

✔ Exact copy of primary

✔ Fast redo apply

✔ Supports switchover/failover

✔ Simplest architecture

✔ Best DR solution

✔ Minimal administration


Limitations

✖ Cannot modify data

✖ Requires Active Data Guard license for real-time reporting


Real Production Usage

Most companies use:

Primary RAC
      |
      |
Physical Standby

Typical industries:

  • Banking

  • Insurance

  • Healthcare

  • Manufacturing

  • E-Commerce

This is the industry standard DR architecture.


2. Logical Standby Database

How It Works Internally

Instead of applying redo blocks, Oracle converts redo into SQL statements.

Primary
   |
Redo Logs
   |
LogMiner
   |
SQL Apply
   |
Logical Standby

Oracle uses:

LogMiner

to extract SQL from redo logs.


Example

Primary:

INSERT INTO EMP VALUES (101,'SCOTT');
COMMIT;

Redo arrives.

Logical standby converts it to:

INSERT INTO EMP VALUES (101,'SCOTT');

Then executes the SQL.


Internal Flow

Redo
  |
LogMiner
  |
Generate SQL
  |
Execute SQL
  |
Update Standby

Read/Write Capability

Read = Yes
Write = Yes

Users can create additional:

  • Tables

  • Indexes

  • Materialized Views

  • Reporting Objects

without affecting replication.


Advantages

✔ Open read/write

✔ Useful for reporting

✔ Can create local objects

✔ Supports heterogeneous workloads


Limitations

✖ More CPU intensive

✖ SQL Apply slower than Redo Apply

✖ Not all datatypes supported

✖ More administration

✖ Not widely used today


Real Production Usage

Usually for:

  • Reporting databases

  • Data warehouse environments

  • Separate reporting schemas

Example:

Primary OLTP
       |
Logical Standby
       |
BI Reports

3. Snapshot Standby Database

How It Works Internally

Snapshot standby begins as a physical standby.

Physical Standby
       |
Convert
       |
Snapshot Standby

When converted:

  • Redo continues arriving

  • Redo is NOT applied

Oracle stores incoming redo for later use.


Internal Architecture

Primary
   |
Redo Transport
   |
Snapshot Standby
   |
Redo Stored
(Not Applied)

Users can modify data freely.


Example

Developer creates:

DELETE FROM EMP;
COMMIT;

Allowed.

Later:

Convert back to Physical Standby

Oracle discards local changes and applies accumulated redo.


Read/Write Capability

Read = Yes
Write = Yes

Fully read/write.


Advantages

✔ Testing environment

✔ Application upgrade testing

✔ Patch validation

✔ Production-like testing


Limitations

✖ Not DR ready

✖ Cannot failover

✖ Redo apply paused

✖ Local changes lost after conversion


Real Production Usage

Common for:

Application Upgrade Testing

Production
      |
Snapshot Standby
      |
Upgrade Testing

Patch Testing

Patch Validation
Before Production Rollout

Which Standby Type Do Companies Prefer?

Enterprise Reality

If you visit 100 companies:

Standby TypeUsage Percentage
Physical Standby~90%
Snapshot Standby~8%
Logical Standby~2%

Why Physical Standby Wins

Because organizations need:

✔ Disaster Recovery

✔ Fast Failover

✔ Data Protection

✔ Simplicity

✔ Low Administration

Physical standby provides all of these.


Licensing Considerations

Physical Standby

Included with Data Guard.


Active Data Guard

Requires additional license.

Provides:

✔ Read-only reporting

✔ Real-time query

✔ Automatic block repair


Logical Standby

Included with Data Guard.


Snapshot Standby

Included with Data Guard.


Interview Answer (2-Minute Version)

Physical Standby is an exact block-level copy of the primary database. Redo is applied using Managed Recovery Process (MRP). It is mainly used for High Availability and Disaster Recovery and is the most widely used standby type in production.

Logical Standby converts redo into SQL statements using LogMiner and applies them through SQL Apply. It supports read-write access and is mainly used for reporting environments, though it is less common due to complexity and datatype limitations.

Snapshot Standby is a temporary read-write version of a physical standby. Redo continues to be received but is not applied until the database is converted back to physical standby. It is commonly used for testing, patch validation, and upgrade rehearsals.

In real enterprise environments, Physical Standby is the preferred and most commonly deployed Data Guard configuration because it provides the best balance of data protection, performance, and operational simplicity.