🧠 What Every DBA Must Know
In a clustered database environment like Oracle RAC, maintaining data consistency across nodes is critical. But what happens when nodes suddenly stop communicating with each other?
Welcome to one of the most dangerous failure scenarios in RAC:
⚠️ Split Brain Syndrome
What is Split Brain?
In an Oracle RAC cluster, nodes communicate using a private interconnect.
If this interconnect fails:
Nodes cannot see each other
Each node assumes others are down
Each continues processing independently
This leads to:
❌ Multiple “brains” operating simultaneously
❌ No coordination
❌ High risk of data corruption
⚡ Real Problem Explained
Imagine a 2-node RAC:
Node 1 updates a data block
Node 2 updates the same block
No communication between them
Result:
Data inconsistency / corruption
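
To make the 2-node example concrete, here is a toy Python sketch (not Oracle code) of two nodes changing the same block with no coordination. The dict `shared_block`, the helper `node_update`, and the balance values are all hypothetical. In a healthy RAC cluster, Cache Fusion would coordinate access to the block over the interconnect, and that is exactly what is missing here.

```python
# Toy model: one "data block" on shared storage, represented as a dict.
shared_block = {"balance": 100}


def node_update(local_copy: dict, delta: int) -> dict:
    """A node changes its own cached copy of the block."""
    updated = dict(local_copy)
    updated["balance"] += delta
    return updated


# Interconnect is down, so there is no block coordination:
# both nodes start from the same version of the block.
node1_copy = dict(shared_block)
node2_copy = dict(shared_block)

node1_copy = node_update(node1_copy, +50)  # Node 1 adds 50
node2_copy = node_update(node2_copy, -30)  # Node 2 subtracts 30

# Both write back; the last writer silently overwrites the first.
shared_block = node1_copy
shared_block = node2_copy

print(shared_block["balance"])  # 70, although a coordinated result would be 120
```

Neither node sees an error. Node 1's change is simply lost, and that silent lost update is the kind of damage split brain produces.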
🧩 Why Does This Happen?
Split brain occurs when:
Private interconnect fails
Network heartbeat is lost
Nodes are still physically UP
Database instances continue running
Each node thinks:
“I am the only surviving node.”
Types of Heartbeats in RAC
Oracle uses two mechanisms to detect node health:
1️⃣ Network Heartbeat
Via private interconnect
Fast communication between nodes
2️⃣ Disk Heartbeat
Via Voting Disk
Runs alongside the network heartbeat and becomes decisive when the network heartbeat fails
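
The following is a rough Python model of the two heartbeat checks. It assumes each channel is declared missed once its last beat is older than a fixed timeout. The 30 s and 200 s values mirror commonly cited Clusterware defaults for CSS `misscount` and `disktimeout`. Treat them as assumptions and check your own cluster with `crsctl get css misscount` and `crsctl get css disktimeout`. `HeartbeatState` and `missed_heartbeats` are hypothetical names.

```python
from dataclasses import dataclass

# Assumed thresholds in seconds; verify the real values on your cluster.
NETWORK_TIMEOUT = 30.0   # stands in for CSS misscount
DISK_TIMEOUT = 200.0     # stands in for CSS disktimeout


@dataclass
class HeartbeatState:
    last_network_beat: float  # time of the last heartbeat seen over the interconnect
    last_disk_beat: float     # time of the last heartbeat written to the voting disk


def missed_heartbeats(state: HeartbeatState, now: float) -> dict:
    """Report which heartbeat channels have exceeded their timeout."""
    return {
        "network_missed": now - state.last_network_beat > NETWORK_TIMEOUT,
        "disk_missed": now - state.last_disk_beat > DISK_TIMEOUT,
    }


# Interconnect failed at t=0, but the peer kept writing to the voting disk.
peer = HeartbeatState(last_network_beat=0.0, last_disk_beat=44.0)
print(missed_heartbeats(peer, now=45.0))
# {'network_missed': True, 'disk_missed': False}
```

In this case the network heartbeat is missed but the disk heartbeat is not. The peer is still alive, which is precisely the situation where the voting disk has to arbitrate.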
🗳️ Role of Voting Disk
The Voting Disk is the brain behind conflict resolution.
Each node:
Writes its heartbeat to the voting disk
Records which nodes it can reach over the interconnect
In a Split Brain Scenario:
Nodes form sub-clusters
Each group tries to claim the majority
Voting disk decides:
✅ Which nodes survive
❌ Which nodes get evicted
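
Here is one simplified way to picture how sub-clusters form. Treat each node's report of which peers it can reach as an undirected graph, then group nodes into connected components. This is a sketch under that assumption, not Clusterware's actual algorithm. `form_subclusters` and the `reachable` map are hypothetical.

```python
def form_subclusters(reachable: dict[int, set[int]]) -> list[set[int]]:
    """Group nodes into sub-clusters (connected components of the reachability graph).

    A link counts only if both nodes report it, so reachability is treated as
    symmetric. Nodes linked indirectly through a peer land in the same group.
    """
    unvisited = set(reachable)
    groups = []
    while unvisited:
        start = min(unvisited)  # deterministic order: lowest node number first
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(
                peer for peer in reachable[node]
                if node in reachable.get(peer, set())
            )
        unvisited -= group
        groups.append(group)
    return groups


# 5-node example: nodes 1-3 still see each other, nodes 4-5 are cut off from them.
reachable = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5}, 5: {4}}
print(form_subclusters(reachable))
```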
⚖️ Who Wins?
Majority rule applies: the largest sub-cluster survives
Example:
10-node RAC cluster
6 nodes can communicate
4 nodes isolated
Result:
6-node group survives
4-node group gets evicted
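
The survivor decision can be sketched as follows. The largest sub-cluster wins. When sizes are equal (for example, a 2-node cluster split 1/1, where neither side has a majority), the group holding the lowest node number wins. That tie-break reflects long-standing Clusterware behavior, but newer releases may weigh additional factors. Treat `choose_survivors`, a hypothetical helper, as a simplification.

```python
def choose_survivors(subclusters: list[set[int]]) -> tuple[set[int], list[set[int]]]:
    """Pick the surviving sub-cluster under a simplified arbitration rule.

    The largest group wins. On a size tie, the group containing the lowest
    node number wins. Every other group is evicted.
    """
    # Ranking key: larger size first, then the smaller lowest node number.
    winner = max(subclusters, key=lambda group: (len(group), -min(group)))
    evicted = [group for group in subclusters if group is not winner]
    return winner, evicted


# 10-node example from the text: 6 nodes still communicate, 4 are isolated.
survivors, evicted = choose_survivors([set(range(1, 7)), set(range(7, 11))])
print(sorted(survivors), [sorted(group) for group in evicted])
# [1, 2, 3, 4, 5, 6] [[7, 8, 9, 10]]

# 2-node cluster split 1/1: no majority, so the tie-break decides.
print(choose_survivors([{2}, {1}])[0])  # {1}
```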
🚫 Node Eviction – Who Does It?
The eviction is handled by:
CSSD (Cluster Synchronization Services Daemon)
🧠 CSSD Responsibilities:
Monitor node health
Check heartbeats
Detect communication failure
Evict problematic nodes
⚙️ How CSSD Monitors Nodes
| Mechanism | Purpose |
|---|---|
| Network Heartbeat | Interconnect communication |
| Disk Heartbeat | Voting disk verification |
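
Below is a hypothetical decision table in Python showing how a monitor might read the two heartbeats together. The labels are illustrative, not CSSD messages. The point is that the combination "network lost, disk still alive" is the split-brain case that needs voting disk arbitration.

```python
def classify_peer(network_ok: bool, disk_ok: bool) -> str:
    """Simplified reading of the two heartbeats for one peer node (hypothetical)."""
    if network_ok and disk_ok:
        return "healthy"
    if disk_ok:
        # Still writing to the voting disk, so the node is alive but cut off
        # from the interconnect: the split-brain case the voting disk arbitrates.
        return "alive but unreachable: arbitrate"
    if network_ok:
        # Reachable over the interconnect, but no longer heartbeating to the voting disk.
        return "lost voting disk access: evict"
    return "no heartbeats at all: treat as down and evict"


for net, disk in [(True, True), (False, True), (True, False), (False, False)]:
    print(net, disk, "->", classify_peer(net, disk))
```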
⚡ Node Eviction Process
When a node is unhealthy:
CSSD detects heartbeat failure
Voting disk validation occurs
Node is forcibly evicted
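Node is restarted automatically (traditionally with an OS reboot; newer Clusterware releases first try to restart only the Clusterware stack)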
Cluster reconfigures
🚨 Common Error
ORA-29740: evicted by member N, group incarnation M
Indicates:
The instance was evicted from the cluster by another member
Cluster protection mechanism triggered
🧪 Real-World Scenario
Situation:
4-node RAC
Node 3 loses interconnect
What happens:
Node 1 detects issue
Voting disk confirms
Node 3 is evicted
Remaining nodes continue
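
The eviction steps listed earlier can be traced through this 4-node scenario with a toy model. `Cluster`, `evict`, and the `incarnation` counter are hypothetical. The counter only illustrates that each reconfiguration produces a new version of the cluster membership.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    members: set[int]
    incarnation: int = 1  # hypothetical version counter, bumped on each reconfiguration
    events: list[str] = field(default_factory=list)


def evict(cluster: Cluster, node: int, alive_on_voting_disk: bool) -> None:
    """Apply the eviction steps to one node whose network heartbeat failed."""
    # Step 1: CSSD detects the missed network heartbeat.
    cluster.events.append(f"missed network heartbeat from node {node}")
    # Step 2: voting disk validation (is the node still writing its disk heartbeat?).
    status = "alive but isolated" if alive_on_voting_disk else "no disk heartbeat"
    cluster.events.append(f"voting disk: node {node} {status}")
    # Steps 3-4: evict the node; it is then expected to restart.
    cluster.members.discard(node)
    cluster.events.append(f"node {node} evicted")
    # Step 5: the cluster reconfigures around the remaining members.
    cluster.incarnation += 1
    cluster.events.append(
        f"reconfigured: members {sorted(cluster.members)}, incarnation {cluster.incarnation}"
    )


# 4-node RAC: node 3 loses the interconnect but still writes to the voting disk.
cluster = Cluster(members={1, 2, 3, 4})
evict(cluster, node=3, alive_on_voting_disk=True)
print("\n".join(cluster.events))
```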
Why Eviction is Important
Eviction is NOT a failure.
It is a protection mechanism
Without eviction:
Multiple nodes update the same data
Corruption occurs
With eviction:
✅ Data integrity is preserved
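
Here is a conceptual sketch of why fencing protects data: shared storage that accepts writes only from current cluster members. This models I/O fencing in general, not Oracle's implementation, which relies on evicting and restarting the node. `SharedStorage` and its methods are hypothetical.

```python
class SharedStorage:
    """Toy shared storage that accepts writes only from current cluster members."""

    def __init__(self, members: set[int]):
        self.members = set(members)
        self.blocks: dict[int, str] = {}

    def fence(self, node: int) -> None:
        """Evict a node: from now on its writes are rejected."""
        self.members.discard(node)

    def write(self, node: int, block_id: int, value: str) -> bool:
        if node not in self.members:
            return False  # fenced node: write refused, block stays intact
        self.blocks[block_id] = value
        return True


storage = SharedStorage(members={1, 2})
storage.fence(2)                              # node 2 lost the split-brain arbitration
print(storage.write(1, 42, "node1 update"))  # True
print(storage.write(2, 42, "node2 update"))  # False
print(storage.blocks[42])                    # node1 update
```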
🧠 Key DBA Takeaways
Split brain is a network-related issue: it starts when the interconnect fails
Always ensure:
Redundant interconnect
Stable network
Monitor:
CSSD logs
Clusterware alerts
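
As one way to act on the monitoring advice, a small Python helper can flag ORA-29740 entries in an alert log. It matches only the error code, so it does not depend on the exact message wording. `find_evictions` and the sample lines are hypothetical. In practice you would pass an open alert log file in place of `sample`.

```python
import re
from typing import Iterable

# Match the error code itself so the check does not depend on the exact
# wording of the message text, which can differ between releases.
ORA_29740 = re.compile(r"\bORA-29740\b")


def find_evictions(lines: Iterable[str]) -> list[tuple[int, str]]:
    """Return (line_number, line) for every line that reports ORA-29740."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if ORA_29740.search(line)
    ]


# Hypothetical excerpt standing in for real alert log content.
sample = [
    "unrelated alert log entry\n",
    "ORA-29740: evicted by member 1, group incarnation 8\n",
]
for number, line in find_evictions(sample):
    print(number, line)
# 2 ORA-29740: evicted by member 1, group incarnation 8
```

Iterating over a file object yields lines, so `with open(path) as log: find_evictions(log)` works the same way, where `path` is your instance's alert log.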
🎯 Interview Questions
❓ What is Split Brain in RAC?
When nodes cannot communicate but continue working independently, risking data corruption.
❓ How is Split Brain resolved?
Using Voting Disk + CSSD
❓ Who evicts nodes?
CSSD process
❓ What decides survival?
Voting Disk (majority rule)
❓ What is fencing?
Isolating/evicting a node to protect cluster integrity.
Final Thought
“In RAC, survival is not about being alive…
It’s about being connected.”
