Wednesday, July 1, 2026

Oracle RAC Monitoring Framework

For production environments, I recommend turning it into a modular RAC Monitoring Framework rather than a single script. That makes it easier to schedule, troubleshoot, and extend.

Oracle RAC Monitoring Framework

Directory Structure

rac_monitoring/
├── rac_health_check.sh
├── db_health.sql
├── asm_health.sql
├── wait_events.sql
├── blocking_sessions.sql
├── tablespace.sql
├── fra_usage.sql
├── archive_log.sql
├── cpu_memory.sh
├── alert_log.sh
├── generate_report.sh
├── reports/
├── logs/
└── config.env

1. Configuration File (config.env)

#!/bin/bash

export ORACLE_BASE=/u01/app/oracle
export GRID_HOME=/u01/app/19.0.0/grid
export ORACLE_HOME=/u01/app/oracle/product/19.0.0/dbhome_1

export ORACLE_SID=PROD1

export PATH=$GRID_HOME/bin:$ORACLE_HOME/bin:$PATH

DB_NAME=PROD

REPORT_DIR=/home/oracle/rac_monitoring/reports
LOG_DIR=/home/oracle/rac_monitoring/logs

DATE=$(date +"%Y%m%d_%H%M%S")

REPORT=${REPORT_DIR}/RAC_Health_${DATE}.html
LOGFILE=${LOG_DIR}/RAC_Health_${DATE}.log

2. RAC Health Check Script (rac_health_check.sh)

#!/bin/bash

source ./config.env

exec > $LOGFILE

echo "==============================================="
echo "Oracle RAC Health Check"
echo "Server : $(hostname)"
echo "Date   : $(date)"
echo "==============================================="

echo
echo "=============================="
echo "Clusterware Status"
echo "=============================="
crsctl check crs

echo
echo "=============================="
echo "Cluster Resources"
echo "=============================="
crsctl stat res -t

echo
echo "=============================="
echo "Node Status"
echo "=============================="
olsnodes -n -s

echo
echo "=============================="
echo "ASM Status"
echo "=============================="
srvctl status asm

echo
echo "=============================="
echo "Diskgroups"
echo "=============================="
asmcmd lsdg

echo
echo "=============================="
echo "Database Status"
echo "=============================="
srvctl status database -d ${DB_NAME}

echo
echo "=============================="
echo "Services"
echo "=============================="
srvctl status service -d ${DB_NAME}

echo
echo "=============================="
echo "Listener"
echo "=============================="
srvctl status listener

echo
echo "=============================="
echo "SCAN Listener"
echo "=============================="
srvctl status scan_listener

echo
echo "=============================="
echo "VIP"
echo "=============================="
srvctl status vip

echo
echo "=============================="
echo "OCR"
echo "=============================="
ocrcheck

echo
echo "=============================="
echo "Voting Disk"
echo "=============================="
crsctl query css votedisk

echo
echo "Health Check Completed"

3. Wait Event Monitoring (wait_events.sql)

set lines 200
col event format a45

SELECT
event,
total_waits,
time_waited,
average_wait
FROM v$system_event
ORDER BY time_waited DESC
FETCH FIRST 20 ROWS ONLY;

4. Blocking Sessions

set lines 200

SELECT
inst_id,
sid,
serial#,
username,
blocking_session,
seconds_in_wait,
event
FROM gv$session
WHERE blocking_session IS NOT NULL;

5. ASM Monitoring

set lines 200

SELECT
name,
state,
type,
total_mb,
free_mb,
ROUND(free_mb*100/total_mb,2) FREE_PERCENT
FROM
v$asm_diskgroup;

6. Tablespace Monitoring

SELECT
tablespace_name,
ROUND(used_percent,2) USED_PERCENT
FROM dba_tablespace_usage_metrics
ORDER BY used_percent DESC;

7. FRA Monitoring

SELECT
SPACE_LIMIT/1024/1024 MB_LIMIT,
SPACE_USED/1024/1024 MB_USED,
SPACE_RECLAIMABLE/1024/1024 MB_RECLAIMABLE
FROM
V$RECOVERY_FILE_DEST;

8. Archive Log Generation

SELECT
TRUNC(first_time),
COUNT(*),
ROUND(SUM(blocks*block_size)/1024/1024/1024,2) GB
FROM
v$archived_log
GROUP BY
TRUNC(first_time)
ORDER BY
1 DESC;

9. CPU & Memory Monitoring (cpu_memory.sh)

#!/bin/bash

echo "========== CPU =========="
top -bn1 | head -5

echo

echo "========== Memory =========="
free -g

echo

echo "========== Swap =========="
swapon -s

echo

echo "========== Disk =========="
df -h

10. Alert Log Monitoring (alert_log.sh)

#!/bin/bash

adrci exec="show alert -tail 200"

11. Cluster Log Collection

#!/bin/bash

diagcollection.pl --collect cluster

12. Email Report

mailx -s "Oracle RAC Health Report $(hostname)" \
shashi_dba@shashidba.com < $LOGFILE

13. Cron Scheduling

Run every hour:

0 * * * * /home/oracle/rac_monitoring/rac_health_check.sh

Run daily at 8 AM:

0 8 * * * /home/oracle/rac_monitoring/rac_health_check.sh

Run every Sunday:

0 6 * * 0 /home/oracle/rac_monitoring/rac_health_check.sh

Sample Health Check Output

===================================================
Oracle RAC Health Check
===================================================

Hostname : racnode1
Date     : 01-Jul-2026 08:00

✔ CRS Status               ONLINE
✔ Cluster Resources        ONLINE
✔ Node Status              ACTIVE
✔ ASM                      RUNNING
✔ Diskgroups               DATA, RECO, OCR
✔ Database                 PROD OPEN
✔ Services                 RUNNING
✔ Listener                 RUNNING
✔ SCAN                     RUNNING
✔ VIP                      RUNNING
✔ OCR                      HEALTHY
✔ Voting Disk              NORMAL

Tablespace Usage
----------------------------
SYSTEM        72%
SYSAUX        61%
USERS         42%
TEMP          15%

ASM Usage
----------------------------
DATA      67%
RECO      58%

Blocking Sessions : NONE

Top Wait Event
----------------------------
db file sequential read

CPU Usage : 18%
Memory Usage : 63%

Overall RAC Health : PASS


Oracle RAC Administration Handbook

 Absolutely. Given the amount of content, this is best developed as a complete handbook rather than a single chat response.

📘 Oracle RAC Administration Handbook (100–150 Pages)

Section 1 – Oracle RAC Fundamentals

  • Oracle RAC Architecture

  • RAC Components

  • Grid Infrastructure

  • Oracle Clusterware

  • ASM Architecture

  • Cache Fusion

  • Global Cache Service (GCS)

  • Global Enqueue Service (GES)

  • OCR & Voting Disk

  • SCAN, VIP, GNS

  • RAC Networking

  • RAC Storage Architecture

  • RAC vs Single Instance

  • RAC vs Data Guard

  • Real-world RAC Deployment Architecture


Section 2 – Oracle RAC Installation

  • Hardware Prerequisites

  • OS Configuration

  • Kernel Parameters

  • User Configuration

  • Passwordless SSH

  • Network Planning

  • Storage Planning

  • ASM Configuration

  • Grid Infrastructure Installation

  • RAC Database Installation

  • Post-installation Verification

  • Architecture diagrams throughout


Section 3 – RAC Administration

  • Instance Management

  • Service Management

  • Listener Management

  • SCAN Management

  • VIP Management

  • OCR Backup & Restore

  • Voting Disk Management

  • Node Addition

  • Node Deletion

  • Database Creation

  • Database Deletion

  • RAC Patching

  • OPatchAuto

  • Rolling Patch

  • One-off Patch

  • RU Upgrade


Section 4 – Oracle RAC Health Check Framework

This section expands the framework into approximately 25–30 pages.

Includes:

  • Clusterware Health Check

  • ASM Health Check

  • Database Health Check

  • Node Health Check

  • Listener Health Check

  • VIP Health Check

  • SCAN Health Check

  • OCR Health Check

  • Voting Disk Health Check

  • CRS Resource Health Check

  • Cache Fusion Monitoring

  • Interconnect Latency Checks

  • Redo Log Health

  • Undo Health

  • Tablespace Health

  • FRA Health

  • Archive Log Health

  • Alert Log Review

  • ADRCI Diagnostics

  • AWR Health Indicators

  • ASH Monitoring

  • Blocking Sessions

  • Wait Events

  • OS Monitoring

  • Filesystem Checks

Each topic will include:

  • Purpose

  • Commands

  • Sample outputs

  • Interpretation

  • Common issues

  • Troubleshooting steps

  • Best practices


Section 5 – RAC Monitoring Scripts

Cluster Health Script

#!/bin/bash

echo "================================="
echo "Oracle RAC Health Check"
echo "================================="

hostname

echo
echo "CRS Status"
crsctl check crs

echo
echo "Node Status"
olsnodes -s

echo
echo "ASM Status"
srvctl status asm

echo
echo "Diskgroups"
asmcmd lsdg

echo
echo "Database Status"
srvctl status database -d PROD

echo
echo "Services"
srvctl status service -d PROD

echo
echo "VIP Status"
srvctl status vip

echo
echo "SCAN Listener"
srvctl status scan_listener

echo
echo "OCR"
ocrcheck

echo
echo "Voting Disk"
crsctl query css votedisk

echo
echo "Resources"
crsctl stat res -t

Wait Event Monitoring Script

SELECT
event,
total_waits,
time_waited
FROM
v$system_event
ORDER BY
time_waited DESC;

Blocking Session Script

SELECT
blocking_session,
sid,
serial#,
username,
event
FROM
gv$session
WHERE
blocking_session IS NOT NULL;

ASM Space Monitoring

SELECT
name,
total_mb,
free_mb,
ROUND(free_mb*100/total_mb,2) FREE_PERCENT
FROM
v$asm_diskgroup;

Cluster Resource Report

crsctl stat res -t

VIP Verification

srvctl status vip

OCR Verification

ocrcheck

CRS Alert Monitoring

adrci

show alert

Cluster Log Collection

diagcollection.pl --collect cluster

Section 6 – Automation Framework

The handbook will include a Daily Health Check Automation that generates HTML reports, CSV summaries, and email notifications.

Features:

  • Clusterware status

  • ASM status

  • Diskgroup utilization

  • Database status

  • Listener status

  • Services

  • SCAN

  • VIP

  • OCR

  • Voting disks

  • CPU

  • Memory

  • Disk usage

  • Top wait events

  • Blocking sessions

  • FRA usage

  • Archive log generation

  • Tablespace utilization

  • Alert log errors

  • CRS errors

Output formats:

  • HTML dashboard

  • CSV report

  • Email summary

  • Log file


Section 7 – Performance Tuning

  • Cache Fusion tuning

  • Interconnect tuning

  • ASM tuning

  • HugePages

  • NUMA

  • Linux kernel tuning

  • AWR analysis

  • ASH analysis

  • ADDM

  • SQL Monitoring

  • OSWatcher

  • ExaWatcher

  • Cluster Health Monitor (CHM)


Section 8 – Production Incident Runbooks (40+)

Examples include:

  • Node Eviction

  • CRS Won't Start

  • CSS Failure

  • ASM Disk Offline

  • OCR Corruption

  • Voting Disk Failure

  • VIP Not Failing Over

  • SCAN Listener Down

  • Split Brain

  • ORA-29740

  • ORA-29702

  • CRS-4535

  • CRS-4530

  • CRS-1606

  • PRCR-1079

  • PRCR-1064

  • ORA-15064

  • ORA-15032

  • ORA-15041

  • ORA-15042

  • ORA-00257

  • ORA-19809

  • Interconnect Packet Loss

  • High GCS Waits

  • gc buffer busy

  • gc cr request

  • gc current block busy

Each runbook will include:

  • Symptoms

  • Root cause

  • Diagnostic commands

  • Resolution steps

  • Validation

  • Prevention

  • Lessons learned


Section 9 – Oracle RAC Interview Guide

  • 500+ interview questions

  • L1 questions

  • L2 questions

  • L3 questions

  • Oracle ACE–level scenarios

  • Whiteboard architecture questions

  • Real production case studies


Section 10 – Architecture Diagrams

The handbook will contain over 50 professional diagrams, including:

  • Oracle RAC Architecture

  • Grid Infrastructure

  • Cache Fusion Flow

  • GCS/GES Communication

  • SCAN Listener Flow

  • VIP Failover

  • OCR Architecture

  • Voting Disk Layout

  • ASM Diskgroup Architecture

  • Redo Thread Architecture

  • RAC Networking

  • Client Connection Flow

  • Clusterware Stack

  • Service Failover

  • Node Eviction Flow

  • Split Brain Detection

  • CRS Startup Sequence

  • Rolling Patch Architecture

  • RAC + Data Guard Hybrid Architecture

  • RAC Backup Architecture

  • RAC Disaster Recovery Design


Oracle RAC Health Check Framework

 

Standard Operating Procedure (SOP)

Document Version: 1.0
Applicable Versions: Oracle RAC 11gR2, 12c, 18c, 19c, 21c, 23ai, 26ai
Prepared For: Oracle Database Administrators (L1/L2/L3)


Purpose

This document provides a structured Oracle RAC Health Check Framework that helps DBAs verify the health of Oracle Clusterware, ASM, Database, Network, and Cluster Resources. Performing these checks regularly helps detect issues early, reduce downtime, and maintain high availability.


Health Check Workflow

Clusterware
      │
      ▼
Node Status
      │
      ▼
ASM Health
      │
      ▼
Database Health
      │
      ▼
Network Health
      │
      ▼
Cluster Resources

1. Clusterware Health Check

Objective

Verify that Oracle Clusterware components are running correctly.

Components

  • OHASD

  • CSSD

  • CRSD

  • EVMD

Command

crsctl check crs

Expected Output

CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Validation

ComponentExpected Status
OHASDOnline
CSSDOnline
CRSDOnline
EVMDOnline

If Failed

  • Check Clusterware logs.

  • Verify voting disks.

  • Verify OCR accessibility.

  • Restart Clusterware if required.


2. Node Health Check

Objective

Ensure all RAC nodes are available and participating in the cluster.

Commands

olsnodes
olsnodes -s
olsnodes -n

Expected Output

racnode1 Active
racnode2 Active

Validation

  • All nodes visible

  • Status should be Active

  • Node numbers should match cluster configuration

Troubleshooting

If a node is missing:

  • Verify private interconnect

  • Check Clusterware

  • Verify CSSD

  • Review node logs


3. ASM Health Check

Objective

Verify ASM availability and storage health.

Check ASM Status

srvctl status asm

Expected

ASM is running on racnode1
ASM is running on racnode2

Check Diskgroups

asmcmd lsdg

Example

DATA
RECO
OCR

Verify

  • Mounted

  • Free Space

  • Offline Disks

  • Redundancy


SQL Validation

SELECT
name,
state,
type,
total_mb,
free_mb
FROM v$asm_diskgroup;

Troubleshooting

  • Check failed disks

  • Verify ASM alert log

  • Validate storage connectivity


4. Database Health Check

Objective

Ensure all RAC database instances and services are available.

Database Status

srvctl status database -d <db_name>

Expected

Instance PROD1 is running
Instance PROD2 is running

Service Status

srvctl status service -d <db_name>

Verify

  • Application services

  • Preferred instances

  • Available instances


SQL Validation

SELECT
INSTANCE_NAME,
STATUS,
DATABASE_STATUS
FROM GV$INSTANCE;

Expected

OPEN
ACTIVE

5. Network Health Check

Objective

Verify communication between RAC nodes.


Public and Private Network

oifcfg getif

Verify

  • Public Interface

  • Private Interconnect


Network Configuration

srvctl config network

SCAN Configuration

srvctl config scan

Verify

  • SCAN Name

  • SCAN IPs

  • SCAN Listeners


VIP Status

srvctl status vip

Expected

VIP is enabled
VIP is running

Troubleshooting

  • Verify DNS

  • Check SCAN listeners

  • Verify VIP failover

  • Test private interconnect latency


6. Cluster Resource Health Check

Objective

Verify all Oracle Cluster resources are online.

Command

crsctl stat res -t

Verify

  • Database

  • ASM

  • Listeners

  • VIPs

  • SCAN Listeners

  • Diskgroups

Expected Status

ONLINE

Additional Recommended Health Checks

Listener Status

srvctl status listener

SCAN Listener Status

srvctl status scan_listener

OCR Check

ocrcheck

Expected

Status : healthy

Voting Disk

crsctl query css votedisk

Verify

  • All voting disks accessible


Cluster Synchronization

crsctl check css

CRS Stack

crsctl stat res -t

Verify every resource is ONLINE.


Daily RAC Health Check Checklist

CheckStatus
Clusterware Running
All Nodes Active
ASM Running
Diskgroups Mounted
Database Open
RAC Services Running
Public Network Healthy
Private Interconnect Healthy
VIP Running
SCAN Listener Running
OCR Healthy
Voting Disk Healthy
Cluster Resources ONLINE

Common Production Issues

IssuePossible CauseResolution
Node EvictionInterconnect failureCheck private network and CSS logs
ASM DownStorage unavailableVerify SAN/ASM disks and restart ASM
VIP OfflineNetwork issueValidate interface and relocate VIP
Service Not RunningInstance failureStart service with SRVCTL
CRS Resource OfflineClusterware issueReview CRS logs and restart the affected resource
Diskgroup Not MountedDisk failureCheck ASM disks and storage connectivity

Best Practices

  • Perform RAC health checks daily.

  • Monitor ASM free space and rebalance operations.

  • Verify OCR and voting disk health after maintenance.

  • Monitor interconnect latency to prevent node eviction.

  • Ensure SCAN listeners and VIPs are functioning correctly.

  • Keep Clusterware and database patches up to date.

  • Review alert logs and CRS logs regularly.

  • Automate routine health checks using shell scripts or Enterprise Manager where possible.


Conclusion

A disciplined RAC health check routine is essential for maintaining a stable Oracle RAC environment. Regular verification of Clusterware, nodes, ASM, databases, networking, and cluster resources helps identify issues proactively, minimize downtime, and ensure continuous availability of critical business applications.

Tuesday, June 30, 2026

Oracle Database Performance Tuning: It's More Than Just SQL

 Many DBAs immediately focus on SQL tuning when performance issues arise. While optimizing SQL is important, production environments often reveal that the real bottlenecks exist outside the query itself.

Here are a few areas that deserve equal attention:

🔹 Leverage Materialized Views Wisely

For reporting or read-intensive workloads, Materialized Views can dramatically improve performance by storing precomputed results. They help reduce expensive joins, aggregations, and repetitive calculations. Just ensure the refresh strategy aligns with your business requirements.

🔹 Check Operating System Limits

Oracle relies heavily on the underlying OS. Review resource limits such as:
nproc
nofile

Improper values can restrict processes and file descriptors, leading to unexpected slowdowns during peak workloads.

🔹 Validate Linux Kernel Parameters

Kernel settings play a critical role in Oracle performance. Verify parameters like:
✔️ fs.file-max
✔️ kernel.shmmax
✔️ kernel.shmall

Well-configured shared memory ensures efficient SGA allocation and better database stability.

🔹 Enable HugePages

HugePages is one of the easiest ways to optimize memory usage for databases with large SGAs.

Benefits include:
• Lower CPU utilization
• Reduced memory fragmentation
• Faster memory access
• Improved overall database stability

🔹 Think Beyond the Database

Performance tuning is an end-to-end process. A healthy Oracle environment depends on tuning across every layer:

📌 SQL Execution Plans
📌 Optimizer Statistics
📌 Index Design
📌 Materialized Views
📌 SGA & PGA Configuration
📌 Linux Resource Limits
📌 Kernel Parameters
📌 Storage & I/O Performance

💡 The best Oracle DBAs don't just tune SQL—they optimize the entire ecosystem.

What are the most impactful non-SQL performance optimizations you've implemented in your production environment? Share your experience in the comments!

#Oracle #OracleDatabase #OracleDBA #PerformanceTuning #SQLTuning #Linux #Unix #HugePages #KernelParameters #MaterializedViews #DatabasePerformance #SGA #PGA #OracleCloud #Exadata #DatabaseAdministration #AIOUG #OracleACE #TechCommunity

Sunday, June 21, 2026

ORA-01034: ORACLE Not Available – Database Down After Server Reboot

 

ORA-01034: ORACLE Not Available – Database Down After Server Reboot

Incident Summary

Incident Priority: P1 (Critical)

Error Reported:

ORA-01034: ORACLE not available

Impact:

  • Production database was unavailable.

  • All application users were unable to connect.

  • Business transactions stopped.

  • Multiple application timeout alerts were generated.


Environment

  • Oracle Database: 19c

  • Operating System: Oracle Linux / RHEL

  • Environment: Production

  • Architecture: Single Instance (Applicable to RAC with additional cluster checks)


Symptoms

Application team reported:

  • ORA-01034: ORACLE not available

  • ORA-27101: Shared memory realm does not exist

  • Database connection failures

  • OEM alerts showing database status as Down


Initial Response

Step 1: Acknowledge the P1 Incident

  • Acknowledge the ServiceNow/Incident ticket immediately.

  • Inform stakeholders that investigation has started.

  • Join the bridge call if applicable.


Step 2: Verify Database Server Accessibility

SSH into the production server:

ssh oracle@prod-db-server

Verify server uptime:

uptime

Check whether the server was recently rebooted:

last reboot

Observation: The server had rebooted after scheduled OS patching.


Step 3: Verify Database Status

Set the Oracle environment:

export ORACLE_SID=PRODDB
. oraenv

Check PMON process:

ps -ef | grep pmon

No PMON process was running, confirming that the database was down.


Step 4: Review the Alert Log

Locate the diagnostic directory:

SHOW PARAMETER diagnostic_dest;

Navigate to the trace directory:

cd $ORACLE_BASE/diag/rdbms/proddb/PRODDB/trace

Monitor the alert log:

tail -100 alert_PRODDB.log

Observation:

  • No corruption or startup failure errors.

  • Database had shut down cleanly before the OS reboot.

  • No automatic startup attempt after the reboot.


Step 5: Verify Listener Status

lsnrctl status

Listener was running successfully.


Step 6: Start the Database

Connect as SYSDBA:

sqlplus / as sysdba

Start the database:

STARTUP;

Verify status:

SELECT STATUS FROM V$INSTANCE;

SELECT OPEN_MODE FROM V$DATABASE;

Expected Output:

STATUS      : OPEN
OPEN_MODE   : READ WRITE

Step 7: Application Validation

  • Ask the application team to retry connections.

  • Verify business transactions.

  • Confirm application functionality.

Issue resolved successfully.


Root Cause Analysis (RCA)

The production server was rebooted after operating system patching.

Although the listener started automatically, Oracle Database auto-start was not configured.

The /etc/oratab entry was:

PRODDB:/u01/app/oracle/product/19c/dbhome_1:N

Since the last field was N, the database did not start automatically after the reboot.


Permanent Fix

Update /etc/oratab

Change:

PRODDB:/u01/app/oracle/product/19c/dbhome_1:N

To:

PRODDB:/u01/app/oracle/product/19c/dbhome_1:Y

Enable Oracle Auto Startup Script

On Linux systems, ensure the Oracle startup service (dbstart/dbshut or the appropriate systemd service) is configured to start during server boot.


Validate

Perform a controlled server reboot and confirm:

  • Listener starts automatically.

  • Database starts automatically.

  • Application connectivity is restored without manual intervention.


Commands Used During Troubleshooting

uptime

last reboot

ps -ef | grep pmon

lsnrctl status

tail -100 alert_PRODDB.log

sqlplus / as sysdba

STARTUP;

SELECT STATUS FROM V$INSTANCE;

SELECT OPEN_MODE FROM V$DATABASE;

Lessons Learned

  • Always verify Oracle auto-start configuration after installation or cloning.

  • Review the alert log first to identify the root cause.

  • Validate database startup after every OS patching activity.

  • Maintain clear communication with application and infrastructure teams during P1 incidents.

  • Record the complete RCA, resolution steps, and preventive actions in the incident management tool (e.g., ServiceNow).


Interview Answer

Question: Tell us about a production issue you resolved.

Answer:

"During a scheduled OS patching activity, the production server rebooted successfully, but the database did not come up automatically. The application team reported ORA-01034 errors and users were unable to access the application. I immediately acknowledged the P1 incident, connected to the server, verified that the PMON process was not running, reviewed the alert log, and confirmed there were no database errors. I identified that the database auto-start was not configured because the /etc/oratab entry was set to 'N'. I manually started the database, validated application connectivity, and then permanently fixed the issue by enabling Oracle auto-start. The entire incident was resolved in approximately 17 minutes with no data loss."

Saturday, May 30, 2026

Physical Standby vs Logical Standby vs Snapshot Standby

 For Oracle DBA interviews and real production environments, understanding Physical Standby vs Logical Standby vs Snapshot Standby is critical because interviewers often ask not only the definitions but also why companies choose one over another.


Oracle Data Guard Standby Types – Complete Comparison

FeaturePhysical StandbyLogical StandbySnapshot Standby
Database StructureExact block-by-block copyLogical copy using SQLPhysical standby temporarily converted to RW
Redo Apply MethodRedo Apply (MRP)SQL Apply (LSP)Redo received but not applied
Read OnlyYes (Active Data Guard)YesNo
Read WriteNo (unless activated)YesYes
DR ReadyYesPartialNo
Switchover SupportYesYesNo
Failover SupportYesLimitedNo
Performance ImpactLowestHigherModerate
Most Common in ProductionYesRareRare

1. Physical Standby Database

How It Works Internally

The primary database generates redo logs.

Primary Database
      |
      v
 Archive Logs / Redo Transport
      |
      v
Physical Standby
      |
      v
MRP (Managed Recovery Process)
      |
      v
Apply Redo Blocks

The standby receives redo and applies changes at the block level.

Oracle does not execute SQL.

It simply recreates the exact same database blocks as the primary.


Example

Primary:

INSERT INTO EMP VALUES (101,'SCOTT');
COMMIT;

Redo generated:

Block #123 modified

Standby:

Apply Block #123

No SQL execution occurs.


Read/Write Capability

Standard Physical Standby

Read = Yes
Write = No

Active Data Guard

Read = Yes
Write = No
Real-Time Queries = Yes

Advantages

✔ Exact copy of primary

✔ Fast redo apply

✔ Supports switchover/failover

✔ Simplest architecture

✔ Best DR solution

✔ Minimal administration


Limitations

✖ Cannot modify data

✖ Requires Active Data Guard license for real-time reporting


Real Production Usage

Most companies use:

Primary RAC
      |
      |
Physical Standby

Typical industries:

  • Banking

  • Insurance

  • Healthcare

  • Manufacturing

  • E-Commerce

This is the industry standard DR architecture.


2. Logical Standby Database

How It Works Internally

Instead of applying redo blocks, Oracle converts redo into SQL statements.

Primary
   |
Redo Logs
   |
LogMiner
   |
SQL Apply
   |
Logical Standby

Oracle uses:

LogMiner

to extract SQL from redo logs.


Example

Primary:

INSERT INTO EMP VALUES (101,'SCOTT');
COMMIT;

Redo arrives.

Logical standby converts it to:

INSERT INTO EMP VALUES (101,'SCOTT');

Then executes the SQL.


Internal Flow

Redo
  |
LogMiner
  |
Generate SQL
  |
Execute SQL
  |
Update Standby

Read/Write Capability

Read = Yes
Write = Yes

Users can create additional:

  • Tables

  • Indexes

  • Materialized Views

  • Reporting Objects

without affecting replication.


Advantages

✔ Open read/write

✔ Useful for reporting

✔ Can create local objects

✔ Supports heterogeneous workloads


Limitations

✖ More CPU intensive

✖ SQL Apply slower than Redo Apply

✖ Not all datatypes supported

✖ More administration

✖ Not widely used today


Real Production Usage

Usually for:

  • Reporting databases

  • Data warehouse environments

  • Separate reporting schemas

Example:

Primary OLTP
       |
Logical Standby
       |
BI Reports

3. Snapshot Standby Database

How It Works Internally

Snapshot standby begins as a physical standby.

Physical Standby
       |
Convert
       |
Snapshot Standby

When converted:

  • Redo continues arriving

  • Redo is NOT applied

Oracle stores incoming redo for later use.


Internal Architecture

Primary
   |
Redo Transport
   |
Snapshot Standby
   |
Redo Stored
(Not Applied)

Users can modify data freely.


Example

Developer creates:

DELETE FROM EMP;
COMMIT;

Allowed.

Later:

Convert back to Physical Standby

Oracle discards local changes and applies accumulated redo.


Read/Write Capability

Read = Yes
Write = Yes

Fully read/write.


Advantages

✔ Testing environment

✔ Application upgrade testing

✔ Patch validation

✔ Production-like testing


Limitations

✖ Not DR ready

✖ Cannot failover

✖ Redo apply paused

✖ Local changes lost after conversion


Real Production Usage

Common for:

Application Upgrade Testing

Production
      |
Snapshot Standby
      |
Upgrade Testing

Patch Testing

Patch Validation
Before Production Rollout

Which Standby Type Do Companies Prefer?

Enterprise Reality

If you visit 100 companies:

Standby TypeUsage Percentage
Physical Standby~90%
Snapshot Standby~8%
Logical Standby~2%

Why Physical Standby Wins

Because organizations need:

✔ Disaster Recovery

✔ Fast Failover

✔ Data Protection

✔ Simplicity

✔ Low Administration

Physical standby provides all of these.


Licensing Considerations

Physical Standby

Included with Data Guard.


Active Data Guard

Requires additional license.

Provides:

✔ Read-only reporting

✔ Real-time query

✔ Automatic block repair


Logical Standby

Included with Data Guard.


Snapshot Standby

Included with Data Guard.


Interview Answer (2-Minute Version)

Physical Standby is an exact block-level copy of the primary database. Redo is applied using Managed Recovery Process (MRP). It is mainly used for High Availability and Disaster Recovery and is the most widely used standby type in production.

Logical Standby converts redo into SQL statements using LogMiner and applies them through SQL Apply. It supports read-write access and is mainly used for reporting environments, though it is less common due to complexity and datatype limitations.

Snapshot Standby is a temporary read-write version of a physical standby. Redo continues to be received but is not applied until the database is converted back to physical standby. It is commonly used for testing, patch validation, and upgrade rehearsals.

In real enterprise environments, Physical Standby is the preferred and most commonly deployed Data Guard configuration because it provides the best balance of data protection, performance, and operational simplicity.

Oracle DBA Interview Questions & Answers (L2/L3/Senior DBA)

 


1. Patching & Maintenance

Q1. What is Database Patching?

Database patching is the process of applying Oracle-provided fixes to resolve bugs, security vulnerabilities, performance issues, and stability problems without reinstalling the database software.

Real-time Example:
We regularly apply Release Updates (RU) on Oracle 19c every quarter to stay compliant with Oracle Critical Patch Updates (CPU).


Q2. What is an Oracle Patch?

An Oracle patch is a software update released by Oracle to fix:

  • Security vulnerabilities

  • Database bugs

  • RAC issues

  • ASM issues

  • Performance problems


Q3. Why is Patching Important?

Benefits:

  • Security compliance

  • Bug fixes

  • Improved stability

  • Better performance

  • Oracle support certification

Failure to patch may expose systems to known security exploits.


Q4. Types of Oracle Patches

Interim Patch

  • Fixes specific bug

Release Update (RU)

  • Quarterly patch bundle

Release Update Revision (RUR)

  • Additional fixes on RU

OJVM Patch

  • Java VM security fixes

One-Off Patch

  • Single bug fix

Grid Infrastructure Patch

  • RAC/ASM/Clusterware fixes


Q5. What is PSU Patching?

PSU (Patch Set Update) was used in Oracle 11g and 12c.

It contained:

  • Security fixes

  • Selected bug fixes

PSU has now been replaced by RU (Release Updates).


Q6. How to Verify PSU/RU Installed?

SQL

SELECT * FROM dba_registry_sqlpatch;

OPatch

opatch lsinventory

Q7. How to Check OPatch Conflicts?

opatch prereq CheckConflictAgainstOHWithDetail \
-phBaseDir /tmp/patch

Interviewers like this answer because it shows practical experience.


Q8. Have You Applied OEM Patches?

Yes.

OEM patching involves:

  • OMS patch

  • Agent patch

  • Plug-in patch

Commands:

emctl stop oms -all
opatch apply
emctl start oms

Always validate OMS and repository after patching.


Q9. How to Patch if OraInventory is Missing?

Recreate inventory

runInstaller -silent \
-attachHome \
ORACLE_HOME=/u01/app/oracle/product/19c

Verify:

opatch lsinventory

Q10. Why RAC Shows Rolling Patch Status?

Check:

crsctl query crs activeversion

Occurs when:

  • One node patched

  • Remaining nodes not patched

Complete patching on all nodes:

rootcrs.sh -postpatch

Then rolling patch mode disappears.


2. ASM, RAC & Clusterware

Q11. Have You Worked on ASM?

Yes.

Typical architecture:

Disk
 ↓
Failure Group
 ↓
Disk Group
 ↓
ASM Instance
 ↓
Database

Common disk groups:

  • DATA

  • FRA

  • OCR

  • VOTE


Q12. What is Cache Fusion?

Cache Fusion is RAC's mechanism to transfer data blocks directly between instances through the private interconnect.

Instead of:

Disk → Instance

Oracle performs:

Instance A → Interconnect → Instance B

Benefits:

  • Reduced disk IO

  • Faster block access


Q13. What is SCAN Listener?

SCAN = Single Client Access Name

Benefits:

  • Single connection string

  • Load balancing

  • Failover

Example:

myrac-scan.company.com

Usually resolves to 3 IP addresses.


Q14. How to Locate SCAN Listener?

srvctl config scan
srvctl config scan_listener
srvctl status scan_listener

Q15. What is TFA?

Trace File Analyzer

Used for:

  • RAC troubleshooting

  • Log collection

  • Incident diagnostics

Commands:

tfactl status
tfactl diagcollect

Q16. What is AWR?

Automatic Workload Repository

Captures:

  • SQL statistics

  • Wait events

  • IO activity

  • Memory usage

Snapshots taken every hour by default.


3. Backup & Recovery

Q17. What is Recovery Catalog?

Separate schema/database used by RMAN to store:

  • Backup metadata

  • Scripts

  • Recovery history

Benefits:

  • Centralized backup management

  • Longer retention


Q18. Media Recovery vs Crash Recovery

Crash Recovery

Occurs automatically after instance failure.

Uses:

Online Redo Logs

No DBA intervention required.


Media Recovery

Required when datafiles are lost.

Uses:

Backup + Archive Logs

Requires DBA action.


Q19. Restore vs Recover

Restore

Copies backup files back.

Backup → Disk

Recover

Applies redo logs.

Datafiles + Redo

Restore alone is not sufficient.


Q20. Thousands of Archive Log Gaps in Standby

Check gap

SELECT * FROM v$archive_gap;

Solution

  1. Incremental backup from SCN

BACKUP INCREMENTAL FROM SCN;
  1. Transfer backup

  2. Recover standby

This is faster than transferring thousands of logs.


Q21. Prevent Block Corruption in Standby

Enable:

db_block_checking=FULL
db_block_checksum=FULL

RMAN:

BACKUP VALIDATE

Data Guard automatically detects physical corruption during redo apply.


4. Data Guard & Disaster Recovery

Q22. Have You Worked on DR?

Yes.

Experience includes:

  • Physical Standby

  • Switchover

  • Failover

  • DR Drills

  • Active Data Guard


Q23. Logs Not Reaching Standby

Check:

Network

tnsping standby

Archive Destination

SELECT dest_name,status,error
FROM v$archive_dest;

Listener

lsnrctl status

Data Guard Processes

SELECT process,status
FROM v$managed_standby;

Q24. Physical vs Snapshot Standby

Physical Standby

  • Read-only

  • Redo Apply ON

  • DR ready

Snapshot Standby

  • Read/Write

  • Redo Apply OFF

  • Used for testing

Can be converted back later.


5. Performance Tuning

Q25. Important Sections in AWR

Focus on:

Database Time

Top waits

Top SQL

High resource SQL

Load Profile

Transactions/sec

Wait Events

IO bottlenecks

RAC Statistics

Global cache waits

Advisory Section

Memory recommendations


Q26. Oracle Performance Tools

  • AWR

  • ASH

  • ADDM

  • OEM

  • SQL Monitor

  • SQL Tuning Advisor

  • Explain Plan

  • TKPROF


Q27. Why Rebuild Index?

Reasons:

  • Excessive fragmentation

  • High deleted leaf blocks

  • Space reclamation

ALTER INDEX idx_name REBUILD ONLINE;

Do not rebuild routinely without evidence.


Q28. Production Slow During Peak Hours

Step 1

Check database load

SELECT * FROM v$sysmetric;

Step 2

Generate AWR

Step 3

Check Top Wait Events

Examples:

db file sequential read
log file sync
gc buffer busy

Step 4

Check blocking sessions

v$session

Step 5

Check storage latency

iostat -x

Step 6

Tune SQL

Step 7

Increase resources if required

Real Production Example:
A batch job introduced a Cartesian join causing CPU to hit 95%. After creating a composite index and gathering statistics, execution time reduced from 3 hours to 12 minutes.


6. Archive Log Full Scenario

Q29. Archive Destination Full

Immediate Actions

Identify usage:

df -h

Delete obsolete backups:

DELETE OBSOLETE;

Backup archive logs:

BACKUP ARCHIVELOG ALL DELETE INPUT;

Add temporary storage.

Resume database:

ALTER SYSTEM ARCHIVE LOG CURRENT;

Long-Term Prevention

  • FRA monitoring

  • OEM alerts

  • Archive deletion policy

CONFIGURE ARCHIVELOG DELETION POLICY;
  • Increase FRA size

  • Fix Data Guard lag issues


7. Database Administration

Q30. Minimum Parameters for Table Export

expdp system/password \
tables=EMP \
directory=DATA_PUMP_DIR \
dumpfile=emp.dmp \
logfile=emp.log

Required:

  • TABLES

  • DIRECTORY

  • DUMPFILE


Q31. High-Level Cloning Steps

Source

  • Take RMAN backup

Target

  • Restore controlfile

  • Restore datafiles

  • Recover database

Rename

nid

Update

spfile
listener
tnsnames

Open database


Q32. What is Table Partitioning?

Partitioning divides a large table into smaller logical segments.

Types

  • Range

  • List

  • Hash

  • Composite

Benefits

  • Faster queries

  • Faster maintenance

  • Partition pruning

  • Easier archival

Example:

Instead of one 5 TB SALES table:

SALES_2023
SALES_2024
SALES_2025
SALES_2026

Oracle scans only required partitions, significantly improving performance.


Senior DBA Interview Closing Statement

When answering scenario-based questions, use the format:

Problem → Analysis → Action → Result

Example:

Problem: Archive destination reached 100%.

Analysis: FRA full due to standby transport issue.

Action: Cleared obsolete archives, increased FRA temporarily, fixed Data Guard transport.

Result: Database resumed without downtime and preventive monitoring was implemented.

This structure demonstrates production support experience, which is exactly what interviewers look for in Oracle DBA candidates with 8–15+ years of experience.