๐จ ORA-01555 on Standby (Active Data Guard)
๐ RCA + Troubleshooting + Prevention Guide
๐ฏ Objective
To analyze and resolve ORA-01555 snapshot too old error occurring on a physical standby (Active Data Guard) and establish preventive best practices.
๐งญ 1. Problem Summary
ORA-01555 observed on standby database
Same queries working on primary
Occurring intermittently across multiple SQL IDs
Query duration varied from seconds to hours
๐ 2. Key Observations
Multiple SQL IDs → ❌ Not SQL-specific issue
Query duration inconsistent → ❌ Not purely long-running query issue
Same SQL works on primary → ✅ Indicates standby-specific behavior
Undo retention = 900 sec (15 mins)
๐ง 3. Core Concept
๐ What is ORA-01555?
Occurs when Oracle cannot reconstruct consistent read (CR) due to missing undo.
๐ Based on consistent read mechanism
⚠️ 4. Why ORA-01555 Happens on Standby
❗ Important Difference
| Primary | Standby |
|---|---|
| User DML generates undo | Redo apply updates undo |
| Controlled locally | Driven by primary |
๐ Even without user activity:
Redo apply modifies undo blocks
Old undo gets overwritten
๐ฅ 5. Critical Insight (Most Important)
๐จ UNDO_RETENTION on standby has NO EFFECT
✔ Effective undo retention = Primary database setting
๐ 6. Root Cause Analysis
๐งฉ Root Cause 1: Insufficient Undo Retention
Query duration > undo retention
Example:
Query: 316846 sec (~88 hours)
Undo retention: 900 sec
๐ Clearly insufficient
๐งฉ Root Cause 2: Redo Apply Lag (SCN Jump Issue)
❗ Problem Scenario:
Missing / insufficient Standby Redo Logs (SRLs)
No Real-Time Apply
Batch redo apply
๐ SCN Jump Behavior
Primary SCN: 200 → 210 → 220 → 230 → 240 → 250
Standby SCN: 200 ───────────────→ 250 (jump)
Impact:
Query starts at old SCN (200)
Undo for SCN 200 already overwritten
❌ ORA-01555 triggered
⚙️ 7. Real-Time Apply Importance
๐ With proper configuration:
Continuous SCN progression
Better read consistency
Reduced ORA-01555 risk
๐งช 8. Scenario Explanation
๐ข Primary
Query SCN: 240
Undo available → Query succeeds
๐ด Standby
Query SCN: 200
Undo expired → Query fails
๐ ️ 9. Troubleshooting Checklist
✅ Step 1: Validate Data Guard Configuration
Use:
DGMGRL> validate database verbose <standby_db>;
๐จ Check:
Insufficient SRLs
Apply lag
Sync status
✅ Step 2: Verify Standby Redo Logs
✔ Requirements:
Same size as online redo logs
At least +1 group per thread
✅ Step 3: Check Real-Time Apply
SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY;
✔ Expected:
MRP running in real-time apply mode
✅ Step 4: Check Undo Retention
SHOW PARAMETER undo_retention;
๐ Change on PRIMARY ONLY
✅ Step 5: Check Tuned Undo Retention
SELECT MAX(TUNED_UNDORETENTION) FROM V$UNDOSTAT;
✅ Step 6: Check Apply Lag
SELECT NAME, VALUE FROM V$DATAGUARD_STATS;
๐ง 10. Resolution Steps
✔ Fix 1: Configure SRLs Properly
Add missing standby redo logs
Match size with primary redo logs
✔ Fix 2: Enable Real-Time Apply
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
✔ Fix 3: Increase Undo Retention (PRIMARY)
ALTER SYSTEM SET undo_retention=3600 SCOPE=BOTH;
✔ Fix 4: Resize Undo Tablespace
Ensure space supports higher retention
๐ 11. Prevention Strategy
| Area | Action |
|---|---|
| Data Guard | Always use Real-Time Apply |
| SRL | Proper sizing & count |
| Undo | Size for longest query |
| Monitoring | Track apply lag |
| Reporting | Avoid extremely long queries |
๐ก 12. Key Takeaways
ORA-01555 on standby ≠ normal undo issue
Undo retention controlled by primary only
Redo apply lag can trigger early failures
SCN jumps break consistent reads
๐ง 13. Interview-Ready Explanation (๐ฅ Must Use)
“We encountered ORA-01555 on Active Data Guard standby where queries were failing despite working on primary.
Root cause was twofold: insufficient undo retention on primary for long-running queries, and improper standby redo log configuration causing SCN jumps due to batch redo apply.
We resolved it by configuring SRLs correctly, enabling real-time apply, and resizing undo tablespace on primary to support longer retention. This stabilized consistent reads on standby.”
๐ Conclusion
In **Oracle Active Data Guard environments:
๐ Consistent read depends on:
Primary undo availability
Redo apply behavior
๐ Proper configuration ensures:
Reliable reporting
No ORA-01555 surprises
No comments:
Post a Comment