Monday, April 13, 2026

ORA-01555 on Standby (Active Data Guard)



๐Ÿšจ ORA-01555 on Standby (Active Data Guard)

๐Ÿ“˜ RCA + Troubleshooting + Prevention Guide


๐ŸŽฏ Objective

To analyze and resolve ORA-01555 snapshot too old error occurring on a physical standby (Active Data Guard) and establish preventive best practices.


๐Ÿงญ 1. Problem Summary

  • ORA-01555 observed on standby database

  • Same queries working on primary

  • Occurring intermittently across multiple SQL IDs

  • Query duration varied from seconds to hours


๐Ÿ” 2. Key Observations

  • Multiple SQL IDs → ❌ Not SQL-specific issue

  • Query duration inconsistent → ❌ Not purely long-running query issue

  • Same SQL works on primary → ✅ Indicates standby-specific behavior

  • Undo retention = 900 sec (15 mins)


๐Ÿง  3. Core Concept

๐Ÿ”‘ What is ORA-01555?

Occurs when Oracle cannot reconstruct consistent read (CR) due to missing undo.

๐Ÿ‘‰ Based on consistent read mechanism


⚠️ 4. Why ORA-01555 Happens on Standby

❗ Important Difference

PrimaryStandby
User DML generates undoRedo apply updates undo
Controlled locallyDriven by primary

๐Ÿ‘‰ Even without user activity:

  • Redo apply modifies undo blocks

  • Old undo gets overwritten


๐Ÿ”ฅ 5. Critical Insight (Most Important)

๐Ÿšจ UNDO_RETENTION on standby has NO EFFECT

✔ Effective undo retention = Primary database setting


๐Ÿ“Š 6. Root Cause Analysis

๐Ÿงฉ Root Cause 1: Insufficient Undo Retention

  • Query duration > undo retention

  • Example:

    • Query: 316846 sec (~88 hours)

    • Undo retention: 900 sec

๐Ÿ‘‰ Clearly insufficient


๐Ÿงฉ Root Cause 2: Redo Apply Lag (SCN Jump Issue)

❗ Problem Scenario:

  • Missing / insufficient Standby Redo Logs (SRLs)

  • No Real-Time Apply

  • Batch redo apply


๐Ÿ“‰ SCN Jump Behavior

Primary SCN:   200 → 210 → 220 → 230 → 240 → 250
Standby SCN:   200 ───────────────→ 250 (jump)

Impact:

  • Query starts at old SCN (200)

  • Undo for SCN 200 already overwritten

  • ❌ ORA-01555 triggered


⚙️ 7. Real-Time Apply Importance

๐Ÿ‘‰ With proper configuration:

  • Continuous SCN progression

  • Better read consistency

  • Reduced ORA-01555 risk


๐Ÿงช 8. Scenario Explanation

๐ŸŸข Primary

  • Query SCN: 240

  • Undo available → Query succeeds

๐Ÿ”ด Standby

  • Query SCN: 200

  • Undo expired → Query fails


๐Ÿ› ️ 9. Troubleshooting Checklist

✅ Step 1: Validate Data Guard Configuration

Use:

DGMGRL> validate database verbose <standby_db>;

๐Ÿšจ Check:

  • Insufficient SRLs

  • Apply lag

  • Sync status


✅ Step 2: Verify Standby Redo Logs

✔ Requirements:

  • Same size as online redo logs

  • At least +1 group per thread


✅ Step 3: Check Real-Time Apply

SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY;

✔ Expected:

  • MRP running in real-time apply mode


✅ Step 4: Check Undo Retention

SHOW PARAMETER undo_retention;

๐Ÿ‘‰ Change on PRIMARY ONLY


✅ Step 5: Check Tuned Undo Retention

SELECT MAX(TUNED_UNDORETENTION) FROM V$UNDOSTAT;

✅ Step 6: Check Apply Lag

SELECT NAME, VALUE FROM V$DATAGUARD_STATS;

๐Ÿ”ง 10. Resolution Steps

✔ Fix 1: Configure SRLs Properly

  • Add missing standby redo logs

  • Match size with primary redo logs


✔ Fix 2: Enable Real-Time Apply

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;

✔ Fix 3: Increase Undo Retention (PRIMARY)

ALTER SYSTEM SET undo_retention=3600 SCOPE=BOTH;

✔ Fix 4: Resize Undo Tablespace

  • Ensure space supports higher retention


๐Ÿš€ 11. Prevention Strategy

AreaAction
Data GuardAlways use Real-Time Apply
SRLProper sizing & count
UndoSize for longest query
MonitoringTrack apply lag
ReportingAvoid extremely long queries

๐Ÿ’ก 12. Key Takeaways

  • ORA-01555 on standby ≠ normal undo issue

  • Undo retention controlled by primary only

  • Redo apply lag can trigger early failures

  • SCN jumps break consistent reads


๐Ÿง  13. Interview-Ready Explanation (๐Ÿ”ฅ Must Use)

“We encountered ORA-01555 on Active Data Guard standby where queries were failing despite working on primary.

Root cause was twofold: insufficient undo retention on primary for long-running queries, and improper standby redo log configuration causing SCN jumps due to batch redo apply.

We resolved it by configuring SRLs correctly, enabling real-time apply, and resizing undo tablespace on primary to support longer retention. This stabilized consistent reads on standby.”


๐Ÿ Conclusion

In **Oracle Active Data Guard environments:

๐Ÿ‘‰ Consistent read depends on:

  • Primary undo availability

  • Redo apply behavior

๐Ÿ‘‰ Proper configuration ensures:

  • Reliable reporting

  • No ORA-01555 surprises



No comments:

Post a Comment