Oracle Data Guard is one of the most critical disaster recovery solutions used in enterprise environments. But one issue every DBA eventually faces is:
“Standby Redo Logs are shipping, but logs are not applying.”
This problem can quickly lead to:
Apply lag
Data synchronization delays
Increased RPO risk
Failover inconsistency
Production escalation calls at 2 AM
In this article, we’ll walk through:
Real-time production scenario
MRP troubleshooting
Standby Redo Log (SRL) validation
Apply lag diagnosis
Parallel apply tuning
Step-by-step fixes
Best practices used in production
Real-Time Production Scenario
A banking application running on Oracle 19c RAC + Data Guard suddenly reports:
Standby database lag increasing continuously
Archive logs shipping successfully
But redo apply is stuck
Primary database generates heavy redo during:
End-of-day batch jobs
Large payment processing
ETL loads
The monitoring team raises a critical alert:
ORA-16853: apply lag has exceeded specified threshold
At first glance:
Transport lag = 0
Network looks fine
Archive logs are arriving
But standby is still behind by 45 minutes.
This is where deeper Data Guard troubleshooting begins.
Understanding the Redo Flow
Before troubleshooting, let’s quickly understand the redo flow.
Primary Database
LGWR writes redo
Redo shipped via LNS process
RFS receives redo on standby
Standby Database
Redo written into Standby Redo Logs (SRL)
MRP (Managed Recovery Process) applies redo
If any component breaks:
Apply lag increases
Recovery stops
Synchronization fails
Step 1 — Check Data Guard Status
First step is always checking overall Data Guard health.
SELECT
DEST_ID,
STATUS,
ERROR,
DESTINATION
FROM V$ARCHIVE_DEST
WHERE STATUS <> 'INACTIVE';
Check Apply Lag
SELECT
NAME,
VALUE,
TIME_COMPUTED
FROM V$DATAGUARD_STATS
WHERE NAME IN ('apply lag','transport lag');
Typical output:
apply lag +00 00:45:12
transport lag +00 00:00:00
This confirms:
Redo shipping is fine
Apply process is delayed
Step 2 — Verify MRP Process
MRP is the heart of redo apply.
Check recovery process:
SELECT
PROCESS,
STATUS,
THREAD#,
SEQUENCE#
FROM V$MANAGED_STANDBY;
Problematic output:
MRP0 WAIT_FOR_LOG
or:
MRP0 APPLYING_LOG
but sequence not progressing.
Common Reasons MRP Stops Applying
1. Missing Standby Redo Logs
One of the most common issues.
Check SRL configuration:
SELECT
GROUP#,
THREAD#,
SEQUENCE#,
ARCHIVED,
STATUS
FROM V$STANDBY_LOG;
Production Issue Example
In a 2-node RAC environment:
Primary redo logs:
Thread 1 → 4 redo groups
Thread 2 → 4 redo groups
But standby had only:
4 standby redo logs total
This is incorrect.
Oracle recommends:
(Number of Online Redo Log Groups per thread + 1)
for each thread
Correct SRL Configuration
Example:
If each RAC thread has:
4 redo log groups
Then standby should have:
5 SRLs for Thread 1
5 SRLs for Thread 2
Add Missing SRLs
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1
GROUP 11 ('/u01/oradata/STD/srl11.log') SIZE 2G;
Repeat for all required groups.
Step 3 — Check Alert Logs
Always inspect standby alert log.
Common errors:
ORA-16037: user requested cancel of managed recovery
ORA-00313: open failed for members of log group
ORA-01153: an incompatible media recovery is active
Step 4 — Restart MRP Properly
Sometimes MRP hangs internally.
Cancel and restart recovery:
Stop Recovery
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
Start Recovery Again
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
For Oracle 12c and above:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT NODELAY;
Real-Time Incident Example
In one production incident:
Redo transport was healthy
SRLs existed
No network issue
But MRP was frozen on one sequence for hours.
Root cause:
Corrupted archived log received on standby
Solution:
Remove corrupted archive
Re-transfer archive from primary
Restart MRP
Apply resumed immediately.
Step 5 — Check Apply Rate
Heavy redo generation can overwhelm standby apply.
Monitor apply performance:
SELECT
ITEM,
SOFAR,
UNITS
FROM V$RECOVERY_PROGRESS;
Identify Apply Bottleneck
Look for:
Slow disk I/O
CPU saturation
Single-threaded apply
Excessive checkpoints
Storage latency
Step 6 — Enable Parallel Apply
One major improvement for high redo environments is parallel apply.
Enable parallel recovery:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE PARALLEL 16 DISCONNECT;
Or dynamically:
ALTER SYSTEM SET RECOVERY_PARALLELISM=16;
Real Production Performance Improvement
A telecom production standby experienced:
Apply lag = 2 hours
Redo generation = 150 GB/hour
After enabling:
RECOVERY_PARALLELISM=32
Apply lag reduced to:
Less than 5 minutes
Step 7 — Validate Real-Time Apply
Check whether standby is using:
Archive apply
orReal-time apply
Query:
SELECT RECOVERY_MODE FROM V$ARCHIVE_DEST_STATUS;
Desired output:
MANAGED REAL TIME APPLY
If not enabled:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
Step 8 — Monitor Archive Gap
Check for missing archive logs.
SELECT * FROM V$ARCHIVE_GAP;
If gaps exist:
Transfer missing archives manually
Register logs if needed
ALTER DATABASE REGISTER LOGFILE '/tmp/1_45566.arc';
Advanced Troubleshooting Scenarios
Scenario 1 — Standby Redo Logs in ACTIVE State Forever
Cause:
MRP stuck
SRL corruption
I/O freeze
Fix:
Restart MRP
Clear standby logfile
ALTER DATABASE CLEAR LOGFILE GROUP 11;
Scenario 2 — High Apply Lag but No Transport Lag
Cause:
Standby CPU bottleneck
Slow storage
Insufficient parallelism
Fix:
Increase parallel apply
Tune I/O
Add faster storage
Scenario 3 — Frequent Log Switches
Cause:
Small redo logs
Fix:
Increase redo log size.
Recommended:
15–20 minute log switch frequency
Scenario 4 — MRP Waiting for Log
Output:
WAIT_FOR_LOG
Possible causes:
Network delay
Archive gap
Transport issue
RFS not receiving logs
Useful Data Guard Monitoring Queries
Check Apply Status
SELECT PROCESS, STATUS, THREAD#, SEQUENCE# FROM V$MANAGED_STANDBY;
Check Lag
SELECT NAME, VALUE FROM V$DATAGUARD_STATS;
Check SRLs
SELECT GROUP#, THREAD#, STATUS FROM V$STANDBY_LOG;
Check Archive Gap
SELECT * FROM V$ARCHIVE_GAP;
Best Practices for Data Guard Apply Performance
1. Configure Proper SRLs
Always follow Oracle recommendation:
Redo Log Groups + 1 per thread
2. Use Real-Time Apply
Reduces failover data loss significantly.
3. Enable Parallel Apply for Heavy Workloads
Especially for:
Batch systems
Banking
Telecom
Large OLTP databases
4. Monitor Lag Continuously
Use:
OEM
Custom scripts
Data Guard Broker
5. Separate Redo and Data Disks
Avoid I/O contention.
Data Guard Broker Validation
Using DGMGRL:
show configuration;
show database verbose standbydb;
Look for:
Apply Lag
Transport Lag
Real Time Query
LogXptMode
Final Thoughts
Standby redo apply issues are among the most common Oracle Data Guard production incidents.
The key to fast resolution is understanding the entire redo pipeline:
LGWR
LNS
RFS
SRL
MRP
Most apply lag issues usually come down to:
Improper SRL configuration
MRP hangs
Archive gaps
Insufficient parallelism
Storage bottlenecks
A strong Oracle DBA not only fixes the issue quickly but also proactively designs Data Guard environments to avoid these failures entirely.
In enterprise systems where every second matters, properly tuned redo apply can make the difference between:
Seamless failover
orMajor business outage.