Monday, April 13, 2026

ORA-01555 on Standby (Active Data Guard)



๐Ÿšจ ORA-01555 on Standby (Active Data Guard)

๐Ÿ“˜ RCA + Troubleshooting + Prevention Guide


๐ŸŽฏ Objective

To analyze and resolve ORA-01555 snapshot too old error occurring on a physical standby (Active Data Guard) and establish preventive best practices.


๐Ÿงญ 1. Problem Summary

  • ORA-01555 observed on standby database

  • Same queries working on primary

  • Occurring intermittently across multiple SQL IDs

  • Query duration varied from seconds to hours


๐Ÿ” 2. Key Observations

  • Multiple SQL IDs → ❌ Not SQL-specific issue

  • Query duration inconsistent → ❌ Not purely long-running query issue

  • Same SQL works on primary → ✅ Indicates standby-specific behavior

  • Undo retention = 900 sec (15 mins)


๐Ÿง  3. Core Concept

๐Ÿ”‘ What is ORA-01555?

Occurs when Oracle cannot reconstruct consistent read (CR) due to missing undo.

๐Ÿ‘‰ Based on consistent read mechanism


⚠️ 4. Why ORA-01555 Happens on Standby

❗ Important Difference

PrimaryStandby
User DML generates undoRedo apply updates undo
Controlled locallyDriven by primary

๐Ÿ‘‰ Even without user activity:

  • Redo apply modifies undo blocks

  • Old undo gets overwritten


๐Ÿ”ฅ 5. Critical Insight (Most Important)

๐Ÿšจ UNDO_RETENTION on standby has NO EFFECT

✔ Effective undo retention = Primary database setting


๐Ÿ“Š 6. Root Cause Analysis

๐Ÿงฉ Root Cause 1: Insufficient Undo Retention

  • Query duration > undo retention

  • Example:

    • Query: 316846 sec (~88 hours)

    • Undo retention: 900 sec

๐Ÿ‘‰ Clearly insufficient


๐Ÿงฉ Root Cause 2: Redo Apply Lag (SCN Jump Issue)

❗ Problem Scenario:

  • Missing / insufficient Standby Redo Logs (SRLs)

  • No Real-Time Apply

  • Batch redo apply


๐Ÿ“‰ SCN Jump Behavior

Primary SCN:   200 → 210 → 220 → 230 → 240 → 250
Standby SCN:   200 ───────────────→ 250 (jump)

Impact:

  • Query starts at old SCN (200)

  • Undo for SCN 200 already overwritten

  • ❌ ORA-01555 triggered


⚙️ 7. Real-Time Apply Importance

๐Ÿ‘‰ With proper configuration:

  • Continuous SCN progression

  • Better read consistency

  • Reduced ORA-01555 risk


๐Ÿงช 8. Scenario Explanation

๐ŸŸข Primary

  • Query SCN: 240

  • Undo available → Query succeeds

๐Ÿ”ด Standby

  • Query SCN: 200

  • Undo expired → Query fails


๐Ÿ› ️ 9. Troubleshooting Checklist

✅ Step 1: Validate Data Guard Configuration

Use:

DGMGRL> validate database verbose <standby_db>;

๐Ÿšจ Check:

  • Insufficient SRLs

  • Apply lag

  • Sync status


✅ Step 2: Verify Standby Redo Logs

✔ Requirements:

  • Same size as online redo logs

  • At least +1 group per thread


✅ Step 3: Check Real-Time Apply

SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY;

✔ Expected:

  • MRP running in real-time apply mode


✅ Step 4: Check Undo Retention

SHOW PARAMETER undo_retention;

๐Ÿ‘‰ Change on PRIMARY ONLY


✅ Step 5: Check Tuned Undo Retention

SELECT MAX(TUNED_UNDORETENTION) FROM V$UNDOSTAT;

✅ Step 6: Check Apply Lag

SELECT NAME, VALUE FROM V$DATAGUARD_STATS;

๐Ÿ”ง 10. Resolution Steps

✔ Fix 1: Configure SRLs Properly

  • Add missing standby redo logs

  • Match size with primary redo logs


✔ Fix 2: Enable Real-Time Apply

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;

✔ Fix 3: Increase Undo Retention (PRIMARY)

ALTER SYSTEM SET undo_retention=3600 SCOPE=BOTH;

✔ Fix 4: Resize Undo Tablespace

  • Ensure space supports higher retention


๐Ÿš€ 11. Prevention Strategy

AreaAction
Data GuardAlways use Real-Time Apply
SRLProper sizing & count
UndoSize for longest query
MonitoringTrack apply lag
ReportingAvoid extremely long queries

๐Ÿ’ก 12. Key Takeaways

  • ORA-01555 on standby ≠ normal undo issue

  • Undo retention controlled by primary only

  • Redo apply lag can trigger early failures

  • SCN jumps break consistent reads


๐Ÿง  13. Interview-Ready Explanation (๐Ÿ”ฅ Must Use)

“We encountered ORA-01555 on Active Data Guard standby where queries were failing despite working on primary.

Root cause was twofold: insufficient undo retention on primary for long-running queries, and improper standby redo log configuration causing SCN jumps due to batch redo apply.

We resolved it by configuring SRLs correctly, enabling real-time apply, and resizing undo tablespace on primary to support longer retention. This stabilized consistent reads on standby.”


๐Ÿ Conclusion

In **Oracle Active Data Guard environments:

๐Ÿ‘‰ Consistent read depends on:

  • Primary undo availability

  • Redo apply behavior

๐Ÿ‘‰ Proper configuration ensures:

  • Reliable reporting

  • No ORA-01555 surprises



The Real Skill That Gets You Hired as a DBA

 Most DBAs prepare for interviews by listing tools.

Oracle. RMAN. Data Guard. OEM.

But in 2026, hiring decisions are not made on tools.

They are made on one question:

“๐‚๐š๐ง ๐ฒ๐จ๐ฎ ๐ก๐š๐ง๐๐ฅ๐ž ๐ฉ๐ซ๐จ๐๐ฎ๐œ๐ญ๐ข๐จ๐ง ๐ฐ๐ก๐ž๐ง ๐ญ๐ก๐ข๐ง๐ ๐ฌ ๐ ๐จ ๐ฐ๐ซ๐จ๐ง๐ ?”
Here’s what senior hiring managers actually look for.


๐—ฆ๐—ถ๐—ด๐—ป๐—ฎ๐—น 1: Problem-Solving Mindset
Strong candidates don’t jump to solutions.
They break problems down.

They can explain:
• what they observed
• how they narrowed it down
• why they chose a specific approach

This shows ๐ฌ๐ญ๐ซ๐ฎ๐œ๐ญ๐ฎ๐ซ๐ž๐ ๐ญ๐ก๐ข๐ง๐ค๐ข๐ง๐  ๐ฎ๐ง๐๐ž๐ซ ๐ฉ๐ซ๐ž๐ฌ๐ฌ๐ฎ๐ซ๐ž.

“๐‘ฎ๐’๐’๐’… ๐‘ซ๐‘ฉ๐‘จ๐’” ๐’‡๐’Š๐’™ ๐’Š๐’”๐’”๐’–๐’†๐’”. ๐‘ฎ๐’“๐’†๐’‚๐’• ๐‘ซ๐‘ฉ๐‘จ๐’” ๐’–๐’๐’…๐’†๐’“๐’”๐’•๐’‚๐’๐’… ๐’•๐’‰๐’†๐’Ž ๐’‡๐’Š๐’“๐’”๐’•.”

๐—ฆ๐—ถ๐—ด๐—ป๐—ฎ๐—น 2: Production RCA Capability ๐Ÿ”
Anyone can say “issue resolved.”
Few can explain ๐ฐ๐ก๐ฒ ๐ข๐ญ ๐ก๐š๐ฉ๐ฉ๐ž๐ง๐ž๐.

What stands out:
• connecting metrics, logs, and events
• identifying root cause vs symptom
• explaining impact and prevention

This is where real experience becomes visible.

“๐‘น๐’†๐’”๐’๐’๐’–๐’•๐’Š๐’๐’ ๐’„๐’๐’๐’”๐’†๐’” ๐’Š๐’๐’„๐’Š๐’…๐’†๐’๐’•๐’”. ๐‘น๐‘ช๐‘จ ๐’‘๐’“๐’†๐’—๐’†๐’๐’•๐’” ๐’•๐’‰๐’†๐’Ž.”

๐—ฆ๐—ถ๐—ด๐—ป๐—ฎ๐—น 3: Backup & DR Confidence
This is non-negotiable.

Hiring managers expect clarity on:
• RMAN strategies
• restore and recovery scenarios
• RPO/RTO discussions

Hesitation here signals ๐ซ๐ข๐ฌ๐ค ๐ข๐ง ๐ฉ๐ซ๐จ๐๐ฎ๐œ๐ญ๐ข๐จ๐ง ๐จ๐ฐ๐ง๐ž๐ซ๐ฌ๐ก๐ข๐ฉ.

๐—ฆ๐—ถ๐—ด๐—ป๐—ฎ๐—น 4: Monitoring & Observability Awareness

Modern DBAs don’t wait for failures.

They understand:
• wait events
• performance metrics
• alert patterns
• system behavior trends

Monitoring reflects ๐จ๐ฉ๐ž๐ซ๐š๐ญ๐ข๐จ๐ง๐š๐ฅ ๐ฆ๐š๐ญ๐ฎ๐ซ๐ข๐ญ๐ฒ.

“Strong DBAs detect early. Weak DBAs react late.”

๐—ฆ๐—ถ๐—ด๐—ป๐—ฎ๐—น 5: Automation & Efficiency Thinking

Manual processes don’t scale.

Strong candidates show:
• scripting ability (Shell/Python)
• automation of routine tasks
• consistency in operations

Automation shows you think beyond the immediate problem.


๐“๐ก๐ž ๐“๐ž๐œ๐ก๐ง๐ข๐œ๐š๐ฅ ๐‘๐ž๐š๐ฅ๐ข๐ญ๐ฒ

Hiring evaluation typically follows this model:
๐Š๐ง๐จ๐ฐ๐ฅ๐ž๐๐ ๐ž → ๐€๐ฉ๐ฉ๐ฅ๐ข๐œ๐š๐ญ๐ข๐จ๐ง → ๐ƒ๐ž๐œ๐ข๐ฌ๐ข๐จ๐ง-๐Œ๐š๐ค๐ข๐ง๐  → ๐Ž๐ฐ๐ง๐ž๐ซ๐ฌ๐ก๐ข๐ฉ

Most candidates stop at knowledge.
Senior DBAs demonstrate ownership under uncertainty.

๐Š๐ž๐ฒ ๐“๐š๐ค๐ž๐š๐ฐ๐š๐ฒ๐ฌ

• Problem-solving matters more than tool knowledge
• RCA capability separates mid-level from senior DBAs
• Backup, DR, and monitoring define production readiness

In 2026, Oracle DBAs are not just administrators.
They are ๐ซ๐ž๐ฅ๐ข๐š๐›๐ข๐ฅ๐ข๐ญ๐ฒ ๐ž๐ง๐ ๐ข๐ง๐ž๐ž๐ซ๐ฌ ๐ซ๐ž๐ฌ๐ฉ๐จ๐ง๐ฌ๐ข๐›๐ฅ๐ž ๐Ÿ๐จ๐ซ ๐›๐ฎ๐ฌ๐ข๐ง๐ž๐ฌ๐ฌ ๐œ๐จ๐ง๐ญ๐ข๐ง๐ฎ๐ข๐ญ๐ฒ.

Thursday, April 9, 2026

Automated Health Check Script (Shell + HTML report)

I’ve created a production-ready automated health check script in the canvas for you ๐Ÿ‘

https://github.com/shashidba1208/shashidba/blob/master/oracle_automated_health_check_script_shell_html

๐Ÿ”ฅ What this script does:

  • Connects via sqlplus / as sysdba

  • Generates a timestamped HTML report

  • Covers:

    • ✅ Uptime

    • ✅ DB Status

    • ✅ Tablespace usage (with color coding ๐ŸŸข๐ŸŸก๐Ÿ”ด)

    • ✅ ASM usage

    • ✅ RMAN backup status

    • ✅ Active sessions

    • ✅ Invalid objects

    • ✅ Data Guard stats


๐Ÿš€ How to Use

1. Save script

vi db_health_check.sh

2. Give permission

chmod +x db_health_check.sh

3. Run manually

./db_health_check.sh

4. Schedule via cron

crontab -e

Example (run daily at 8 AM):

0 8 * * * /path/db_health_check.sh

๐Ÿ“Š Output

  • HTML report generated in:

/tmp/db_health/

Open in browser:

xdg-open report.html


Oracle Database Health Check & Monitoring


# ๐Ÿ“Š Oracle Database Health Monitoring SOP


## ๐ŸŽฏ Objective


To proactively monitor Oracle database health, detect issues early, and ensure high availability, performance, and data integrity across environments (Standalone / RAC / Data Guard).


---


# ๐Ÿงญ 1. Daily Health Check Overview


| Category    | Check                             | Frequency |

| ----------- | --------------------------------- | --------- |

| Instance    | Uptime, Status                    | Daily     |

| Database    | Role, Open Mode                   | Daily     |

| PDB         | Open Status                       | Daily     |

| Backup      | Last RMAN Backup                  | Daily     |

| Storage     | ASM, Tablespace                   | Daily     |

| Performance | Active Sessions                   | Daily     |

| DR          | Data Guard Sync                   | Daily     |

| Integrity   | Block Corruption, Invalid Objects | Daily     |


---


# ๐ŸŸข 2. Instance & Database Status


## ✅ Database Uptime


```sql

SELECT 

    'UPTIME: ' || 

    FLOOR(SYSDATE - STARTUP_TIME) || ' days, ' ||

    FLOOR(MOD((SYSDATE - STARTUP_TIME) * 24, 24)) || ' hours, ' ||

    FLOOR(MOD((SYSDATE - STARTUP_TIME) * 24 * 60, 60)) || ' minutes'

FROM GV$INSTANCE;

```


### ✔️ Expected:


* Should be stable (no frequent restarts)


### ๐Ÿšจ Alert If:


* Unexpected restart observed


---


## ✅ Database Role & Status


```sql

SELECT database_role, name, db_unique_name, open_mode, log_mode, 

       flashback_on, protection_mode, protection_level 

FROM gv$database;

```


### ✔️ Expected:


* PRIMARY → READ WRITE

* STANDBY → READ ONLY / MOUNTED


---


# ๐Ÿงฉ 3. PDB Health Check


## ✅ PDB Status


```sql

SELECT inst_id, con_id, name, open_mode, restricted,

       TO_CHAR(open_time,'DD-MON-YYYY HH24:MI:SS') open_time

FROM gv$pdbs;

```


### ✔️ Expected:


* All required PDBs in **READ WRITE**


---


## ✅ PDB Size


```sql

SELECT con_id, name, open_mode,

       total_size/1024/1024/1024 "PDB_SIZE_GB"

FROM v$pdbs;

```


---


# ๐Ÿ’พ 4. Backup Monitoring


## ✅ Last Backup Status


```sql

SELECT 

    'LAST BACKUP: ' || 

    INPUT_TYPE || ' - ' || 

    STATUS || ' - ' || 

    TO_CHAR(START_TIME, 'DD-MON-YYYY HH24:MI') || ' to ' ||

    TO_CHAR(END_TIME, 'DD-MON-YYYY HH24:MI')

FROM V$RMAN_BACKUP_JOB_DETAILS

WHERE ROWNUM = 1;

```


### ✔️ Expected:


* Status = COMPLETED


### ๐Ÿšจ Alert If:


* FAILED / RUNNING too long / No backup


---


# ⚡ 5. Performance Monitoring


## ✅ Active Sessions


```sql

SELECT count(*), inst_id, status 

FROM gv$session 

GROUP BY inst_id, status;

```


### ✔️ Expected:


* Balanced ACTIVE vs INACTIVE sessions


### ๐Ÿšจ Alert If:


* Too many ACTIVE sessions → Possible load issue


---


# ๐Ÿ’ฝ 6. Storage Monitoring


## ✅ ASM Diskgroup Usage


```sql

SELECT NAME,

       TOTAL_MB/1024 TOTAL_GB,

       FREE_MB/1024 FREE_GB,

       (FREE_MB/TOTAL_MB)*100 FREE_PER

FROM v$asm_diskgroup;

```


### ✔️ Threshold:


* Free space > 20%


---


## ✅ Tablespace Usage


```sql

SELECT file_name, tablespace_name, bytes/1024/1024 MB, status 

FROM dba_data_files;

```


---


## ✅ TEMP Tablespace


```sql

SELECT tablespace_name, 

       GB_total, 

       GB_used, 

       GB_free

FROM ...

```


---


## ✅ UNDO Tablespace


```sql

SELECT tablespace_name, 

       tot_ts_size_GB, 

       free_ts_size_GB, 

       usedspace 

FROM ...

```


---


# ๐Ÿ” 7. Data Guard Monitoring


## ✅ Archive Sync Status


```sql

SELECT a.thread#, b.last_seq, a.applied_seq,

       b.last_seq-a.applied_seq ARC_DIFF

FROM ...

```


### ✔️ Expected:


* ARC_DIFF ≈ 0


### ๐Ÿšจ Alert If:


* Lag increasing


---


## ✅ Data Guard Stats


```sql

SELECT name, value, unit, time_computed 

FROM v$dataguard_stats;

```


---


# ๐Ÿ”’ 8. Locks & Blocking Sessions


## ✅ Blocking Locks


```sql

SELECT * FROM v$lock WHERE block != 0;

```


### ๐Ÿšจ Action:


* Identify blocking session

* Kill if required


---


# ⚠️ 9. Database Integrity Checks


## ❌ Block Corruption


```sql

SELECT * FROM v$database_block_corruption;

```


### ๐Ÿšจ Action:


* Restore block using RMAN


---


## ❌ Invalid Objects


```sql

SELECT owner, object_type, COUNT(*) 

FROM dba_objects 

WHERE status = 'INVALID'

GROUP BY owner, object_type;

```


### ๐Ÿšจ Action:


```sql

EXEC UTL_RECOMP.RECOMP_SERIAL();

```


---


# ๐Ÿšจ 10. Alert Threshold Summary


| Metric           | Warning      | Critical            |

| ---------------- | ------------ | ------------------- |

| ASM Free Space   | < 20%        | < 10%               |

| Tablespace Usage | > 80%        | > 90%               |

| Active Sessions  | Sudden spike | System slowdown     |

| Data Guard Lag   | > 5 logs     | Continuous increase |

| Backup           | Delayed      | Failed              |


---

##################


Sunday, April 5, 2026

ORACLE DATA GUARD 19c Daily Monitoring Guide

ORACLE DATA GUARD 19c Daily Monitoring Guide

 

 

1.  Check standby database role and status (run on STANDBY)

select INST_ID,INSTANCE_NAME, name DB_NAME, db_unique_name,HOST_NAME,DATABASE_ROLE,OPEN_MODE,status, to_char(STARTUP_TIME,'DD-MON-YYYY HH24:MI:SS') "DB UP TIME" from v$database,gv$instance;

 

2.  Check if redo apply (MRP) is running on standby

 

3.  Check apply lag (run on STANDBY)

4.  Check lag using v$managed_standby (MRP process)

 

 

 

5.  Archive log GAP

 

6.  Check all Data Guard processes (run on STANDBY)

 

7.  Verify Data Guard Configuration Status

 

Dgmgrl / as sysdba

Show configuration;

8.  DGMGRL>Check Database Status (Primary & Standby)

dgmgrl -silent / <<EOF >> $LOGFILE

SHOW CONFIGURATION;

DGMGRL> SHOW DATABASE VERBOSE <PRIMARY_DB>;

DGMGRL> SHOW DATABASE VERBOSE <STANDBY_DB>;

 

9.  DGMGRL>Validate Database Health

dgmgrl -silent / <<EOF >> $LOGFILE

SHOW CONFIGURATION;

DGMGRL> VALIDATE DATABASE VERBOSE <PRIMARY_DB>;

DGMGRL> VALIDATE DATABASE VERBOSE <STANDBY_DB>;

 

 

10.              Daily Data Guard Monitoring Checklist

 

 

SQL Script (dg_check.sql)

SET LINES 200

SET PAGES 200

PROMPT === DATABASE STATUS ===

SELECT NAME, DATABASE_ROLE, OPEN_MODE FROM V$DATABASE;

 

PROMPT === DATAGUARD STATS ===

SELECT NAME, VALUE FROM V$DATAGUARD_STATS;

 

PROMPT === ARCHIVE DEST ===

SELECT DEST_ID, STATUS, ERROR FROM V$ARCHIVE_DEST;

 

PROMPT === MRP STATUS ===

SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY;

 

PROMPT === ARCHIVE GAP ===

SELECT * FROM V$ARCHIVE_GAP;

 

PROMPT === FRA USAGE ===

SELECT SPACE_LIMIT, SPACE_USED FROM V$RECOVERY_FILE_DEST;

 

Data Guard Daily Monitoring Automation Script

 

Shell Script (dg_health_check.sh)

 

#!/bin/bash

 

# Set Oracle Environment

export ORACLE_SID=YOUR_DB_SID

export ORACLE_HOME=/u01/app/oracle/product/19.0.0/dbhome_1

export PATH=$ORACLE_HOME/bin:$PATH

 

# Date

DATE=$(date +"%Y-%m-%d_%H-%M")

LOGFILE=/tmp/dg_health_$DATE.log

 

echo "=========================================" >> $LOGFILE

echo " Data Guard Daily Health Check - $DATE " >> $LOGFILE

echo "=========================================" >> $LOGFILE

 

# Run SQL Checks

sqlplus -s / as sysdba<<EOF >> $LOGFILE

 

SET LINESIZE 200

SET PAGESIZE 200

 

PROMPT === DATABASE ROLE & STATUS ===

SELECT NAME, DATABASE_ROLE, OPEN_MODE, SWITCHOVER_STATUS FROM V\\$DATABASE;

 

PROMPT === DATAGUARD STATS ===

SELECT NAME, VALUE, UNIT FROM V\\$DATAGUARD_STATS;

 

PROMPT === ARCHIVE DEST STATUS ===

SELECT DEST_ID, STATUS, ERROR FROM V\\$ARCHIVE_DEST WHERE STATUS <> 'INACTIVE';

 

PROMPT === ARCHIVE GAP ===

SELECT * FROM V\\$ARCHIVE_GAP;

 

PROMPT === MANAGED STANDBY ===

SELECT PROCESS, STATUS, THREAD#, SEQUENCE# FROM V\\$MANAGED_STANDBY;

 

PROMPT === STANDBY REDO LOGS ===

SELECT GROUP#, THREAD#, SEQUENCE#, STATUS FROM V\\$STANDBY_LOG;

 

PROMPT === FLASHBACK STATUS ===

SELECT FLASHBACK_ON FROM V\\$DATABASE;

 

PROMPT === FRA USAGE ===

SELECT NAME, SPACE_LIMIT, SPACE_USED FROM V\\$RECOVERY_FILE_DEST;

 

EXIT;

EOF

 

# Run DGMGRL Checks

echo "=== DGMGRL CONFIGURATION ===" >> $LOGFILE

 

dgmgrl -silent / <<EOF >> $LOGFILE

SHOW CONFIGURATION;

DGMGRL> SHOW DATABASE VERBOSE <PRIMARY_DB>;

DGMGRL> SHOW DATABASE VERBOSE <STANDBY_DB>;

DGMGRL> VALIDATE DATABASE VERBOSE <PRIMARY_DB>;

DGMGRL> VALIDATE DATABASE VERBOSE <STANDBY_DB>;

EXIT;

EOF

 

# Basic Alert Check

echo "=== ALERT LOG ERRORS (last 50 lines) ===" >> $LOGFILE

tail -50 $ORACLE_BASE/diag/rdbms/*/*/trace/alert_*.log >> $LOGFILE

 

# Output location

echo "Health check completed. Log: $LOGFILE"