π¨ War Story 1: ORA-01555 on Standby (Active Data Guard)
Situation
Reporting team complained:
“Queries failing randomly on standby with ORA-01555”
Observation
Only happening on standby (Active Data Guard)
Same query working fine on primary
Undo retention = 900 sec
Long-running reports
Root Cause
On Active Data Guard:
Undo is not retained the same way as primary
Heavy redo apply + long queries = undo overwritten
Fix
Increased undo retention:
ALTER SYSTEM SET undo_retention=3600;
Enabled:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
Tuned reporting queries
Lesson
π Standby is not a reporting replica like a data warehouse
π Treat Active Data Guard carefully for long queries
π¨ War Story 2: ORA-16532 – Broker Config Missing
Situation
Post-maintenance:
SHOW CONFIGURATION;
ORA-16532: broker configuration does not exist
Panic
Team thought:
“Data Guard configuration is gone π¨”
Root Cause
Connected to:
Standby instead of Primary
Fix
dgmgrl sys@primary
Lesson
π Broker metadata is controlled from primary
π Always verify connection before troubleshooting
π¨ War Story 3: Switchover Fails with ORA-16516
Situation
During DR drill:
SWITCHOVER TO standby;
ORA-16516
Observation
Lag = 0
Broker = SUCCESS
Everything looked perfect
Hidden Issue
Standby was:
π READ ONLY WITH APPLY (Active Data Guard)
Fix
EDIT DATABASE standby SET STATE='APPLY-OFF';
Then:
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
Retry switchover → ✅ SUCCESS
Lesson
π Switchover requires MOUNT mode, not Active Data Guard
π¨ War Story 4: Switchover Stuck – NOT ALLOWED
Situation
SELECT SWITCHOVER_STATUS FROM V$DATABASE;
NOT ALLOWED
Investigation
Checked standby:
SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY;
MRP0 APPLYING_LOG
Checked primary:
SHOW PARAMETER log_archive_dest_state_2;
RESET ❌
Root Cause
Redo transport disabled:
π log_archive_dest_state_2=RESET
Fix
ALTER SYSTEM SET log_archive_dest_state_2=ENABLE;
After few minutes:
RESOLVABLE GAP → TO STANDBY
Switchover → ✅ SUCCESS
Lesson
π Switchover depends on redo transport health
π Always check archive destination state
π¨ War Story 5: Redo Lag Suddenly Increased (Network Issue)
Situation
Monitoring alert:
Transport lag = 45 minutes
Apply lag increasing
Checks
SHOW DATABASE VERBOSE standby;
Found:
TransportDisconnectedThresholdbreached
Root Cause
Network latency spike
Packet drops between DCs
Fix
Switched to ASYNC temporarily:
EDIT DATABASE standby SET PROPERTY LogXptMode='ASYNC';
Network team fixed issue
Switched back to SYNC
Lesson
π Network is the backbone of Data Guard
π Always coordinate with infra team
π¨ War Story 6: Standby Not Applying Logs (MRP Stuck)
Situation
Lag increasing, but logs reaching standby
Check
SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY;
Result:
MRP0 not running ❌
Root Cause
MRP crashed silently after:
Disk space issue
Archive log corruption
Fix
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;
Lesson
π Always verify MRP0 status, not just lag
π¨ War Story 7: Archive Destination Full (Production Impact)
Situation
Primary database froze
Error
ORA-00257: archiver error. Connect internal only
Root Cause
Standby archive location full
Primary unable to ship logs
Fix
Cleaned archive logs on standby
Restarted archiver
Lesson
π Storage monitoring is critical
π Archive issues affect primary availability
π¨ War Story 8: Broker Shows SUCCESS but Data Not Syncing
Situation
Broker: SUCCESS ✅
Business: “Data mismatch”
Check
SELECT MAX(SEQUENCE#) FROM V$ARCHIVED_LOG;
Mismatch found between primary & standby
Root Cause
Gap not resolved automatically
FAL server misconfigured
Fix
ALTER SYSTEM SET FAL_SERVER=primary;
Manual gap resolution done
Lesson
π Never trust only Broker
π Always validate at SQL level
π¨ War Story 9: Accidental TRANSPORT-OFF in Production
Situation
Lag suddenly increased
Check
SHOW DATABASE primary;
Found:
Intended State: TRANSPORT-OFF
Root Cause
Someone ran:
EDIT DATABASE primary SET STATE='TRANSPORT-OFF';
Fix
EDIT DATABASE primary SET STATE='TRANSPORT-ON';
Lesson
π Always audit DGMGRL changes
π Restrict access to Broker
π¨ War Story 10: Failover Required During DC Outage
Situation
Primary DC down completely
Action
FAILOVER TO standby;
Challenge
Some redo loss possible
Decision
Used:
π FORCE FAILOVER
Post Steps
Recreated old primary as standby
Lesson
π Understand:
Switchover = Zero data loss
Failover = Possible data loss
No comments:
Post a Comment