Sunday, May 3, 2026

IP Requirements for 3-Node Oracle RAC Database

 

🎯 Objective

This SOP explains the IP addressing requirements for a 3-node Oracle Real Application Clusters (RAC) environment, including:

  • Public IPs

  • Private Interconnect IPs

  • Virtual IPs (VIPs)

  • SCAN IPs

  • DNS / Hosts configuration

  • Best practices


🏗️ 1. Oracle RAC Network Architecture Overview

A RAC cluster requires multiple network interfaces for:

  • Client connectivity

  • Cluster communication

  • Failover handling

  • SCAN-based load balancing


📊 2. IP Types Required in 3-Node RAC

| IP Type          | Purpose                            | Quantity Required |
| Public IP        | Node communication & client access | 3                 |
| Private IP       | Interconnect / Cache Fusion        | 3                 |
| VIP (Virtual IP) | Fast failover handling             | 3                 |
| SCAN IP          | Client load balancing              | 3                 |
| DNS Entry        | Name resolution                    | Required          |

🧠 3. Understanding Each IP Type


🌐 A. Public IP

Purpose

  • Normal communication

  • SSH access

  • Node hostname resolution

Requirement

One public IP per node.

Example

| Node  | Hostname | Public IP     |
| Node1 | rac1     | 192.168.1.101 |
| Node2 | rac2     | 192.168.1.102 |
| Node3 | rac3     | 192.168.1.103 |

🔒 B. Private IP (Interconnect)

Purpose

Used for:

  • Cache Fusion

  • Cluster heartbeat

  • Internode communication

Important

This network must be:

  • Private

  • Low latency

  • High speed

Example

| Node  | Private Hostname | Private IP   |
| Node1 | rac1-priv        | 10.10.10.101 |
| Node2 | rac2-priv        | 10.10.10.102 |
| Node3 | rac3-priv        | 10.10.10.103 |

⚡ C. VIP (Virtual IP)

Purpose

Fast client failover.

If a node fails:

  • VIP relocates to another node

  • Client gets immediate TCP reset

  • Faster reconnection

Requirement

One VIP per node.

Example

| Node  | VIP Hostname | VIP IP        |
| Node1 | rac1-vip     | 192.168.1.111 |
| Node2 | rac2-vip     | 192.168.1.112 |
| Node3 | rac3-vip     | 192.168.1.113 |

🔄 D. SCAN IP (Single Client Access Name)

Purpose

Provides:

  • Load balancing

  • Simplified client connection

  • Transparent node addition/removal

Requirement

Minimum 3 SCAN IPs recommended by Oracle.

Example

| SCAN Name | SCAN IP       |
| rac-scan  | 192.168.1.120 |
| rac-scan  | 192.168.1.121 |
| rac-scan  | 192.168.1.122 |

🧾 4. Total IP Requirement Summary

For 3-Node RAC

| Type       | Count  |
| Public IP  | 3      |
| Private IP | 3      |
| VIP        | 3      |
| SCAN IP    | 3      |
| Total      | 12 IPs |
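
The same count generalizes to other cluster sizes: each node needs one public, one private, and one VIP address, while the SCAN addresses belong to the cluster as a whole. A minimal Python sketch of that arithmetic (the function name `rac_ip_requirements` is made up for illustration):

```python
def rac_ip_requirements(node_count: int, scan_ips: int = 3) -> dict:
    """Return how many addresses of each type an n-node RAC cluster needs."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    counts = {
        "public": node_count,   # one public IP per node
        "private": node_count,  # one interconnect IP per node
        "vip": node_count,      # one VIP per node
        "scan": scan_ips,       # SCAN is per cluster, not per node
    }
    counts["total"] = sum(counts.values())
    return counts


print(rac_ip_requirements(3))
```

For 3 nodes this gives 12 addresses, matching the table; a 4-node cluster would need 15.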

📝 5. Sample /etc/hosts Configuration

# Public IPs
192.168.1.101   rac1
192.168.1.102   rac2
192.168.1.103   rac3

# VIPs
192.168.1.111   rac1-vip
192.168.1.112   rac2-vip
192.168.1.113   rac3-vip

# Private IPs
10.10.10.101    rac1-priv
10.10.10.102    rac2-priv
10.10.10.103    rac3-priv

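# SCAN
# Resolve the SCAN through DNS or GNS rather than /etc/hosts.
# A hosts file returns only one address per name, so the three-address
# SCAN cannot be served from here. Lab-only fallback (single address):
# 192.168.1.120   rac-scan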
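
For larger clusters, typing these entries by hand is error-prone. The sketch below generates the public, VIP, and private lines with Python's standard `ipaddress` module. The function name `hosts_entries` and the "consecutive addresses from a starting IP" scheme are assumptions for illustration. SCAN is left out on purpose, since it is normally resolved through DNS or GNS.

```python
import ipaddress


def hosts_entries(prefix, count, first_public, first_vip, first_private):
    """Build /etc/hosts lines for nodes <prefix>1 .. <prefix>N."""
    groups = (
        ("Public IPs", ipaddress.ip_address(first_public), ""),
        ("VIPs", ipaddress.ip_address(first_vip), "-vip"),
        ("Private IPs", ipaddress.ip_address(first_private), "-priv"),
    )
    lines = []
    for label, base, suffix in groups:
        lines.append(f"# {label}")
        for i in range(count):
            # Adding an int to an address object yields the next address.
            lines.append(f"{str(base + i):<16}{prefix}{i + 1}{suffix}")
    return lines


for line in hosts_entries("rac", 3, "192.168.1.101", "192.168.1.111", "10.10.10.101"):
    print(line)
```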

🌐 6. DNS Requirements

Oracle strongly recommends resolving the SCAN through one of:

  • DNS configuration

  • GNS (Grid Naming Service)


✅ DNS Checks

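nslookup rac-scan     # should return all three SCAN addresses
ping rac1-vip         # confirms the name resolves; before installation the VIP address itself must not reply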

⚙️ 7. Network Interface Example

| Interface | Purpose              |
| eth0      | Public               |
| eth1      | Private Interconnect |

🚨 8. Important Best Practices

✅ Public Network

  • Use bonded NICs if possible

  • Ensure low packet loss


✅ Private Interconnect

  • Dedicated network only

  • Jumbo frames recommended

  • No public traffic


✅ VIP

  • Must be unused IPs

  • Same subnet as public IP


✅ SCAN

  • Must resolve to 3 IPs

  • Round-robin DNS recommended
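
Several of these rules can be checked mechanically before installation. The following is a rough sketch, not an Oracle tool: `check_plan` is a hypothetical helper, and the `/24` netmask used in the example call is an assumption, since the addressing tables do not state one.

```python
import ipaddress


def check_plan(public_net, public_ips, vips, scan_ips, private_ips):
    """Return a list of problems in a RAC address plan (empty list means OK)."""
    net = ipaddress.ip_network(public_net)
    problems = []
    # Public IPs, VIPs and SCAN IPs are expected on the public subnet.
    for kind, addrs in (("public", public_ips), ("VIP", vips), ("SCAN", scan_ips)):
        for addr in addrs:
            if ipaddress.ip_address(addr) not in net:
                problems.append(f"{kind} {addr} is outside {net}")
    # The interconnect must stay off the public network.
    for addr in private_ips:
        if ipaddress.ip_address(addr) in net:
            problems.append(f"private {addr} overlaps the public network")
    all_ips = public_ips + vips + scan_ips + private_ips
    if len(set(all_ips)) != len(all_ips):
        problems.append("duplicate address in plan")
    if len(scan_ips) != 3:
        problems.append(f"SCAN has {len(scan_ips)} addresses, expected 3")
    return problems


print(check_plan(
    "192.168.1.0/24",
    ["192.168.1.101", "192.168.1.102", "192.168.1.103"],
    ["192.168.1.111", "192.168.1.112", "192.168.1.113"],
    ["192.168.1.120", "192.168.1.121", "192.168.1.122"],
    ["10.10.10.101", "10.10.10.102", "10.10.10.103"],
))
```

An empty list means the plan passed every check; each string in a non-empty list names one violation.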


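🔍 9. Pre-Installation Validation Commands

Verify Interconnect

oifcfg iflist -p -n     # lists OS interfaces and subnets; usable before installation
oifcfg getif            # shows interfaces registered with Clusterware; works only after Grid Infrastructure is installed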

Verify Cluster Network

cluvfy comp nodecon -n rac1,rac2,rac3 -verbose

Verify SCAN

nslookup rac-scan
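
The same SCAN check can be scripted. This is a small sketch using `socket.gethostbyname_ex` from the Python standard library; the helper names `scan_addresses` and `scan_is_valid` are made up here, and the `resolver` parameter exists only so the logic can be exercised without a real DNS server.

```python
import socket


def scan_addresses(name, resolver=socket.gethostbyname_ex):
    """Return the distinct IPv4 addresses the SCAN name resolves to."""
    _hostname, _aliases, addresses = resolver(name)
    return sorted(set(addresses))


def scan_is_valid(name, expected=3, resolver=socket.gethostbyname_ex):
    """True if the name resolves to exactly the expected number of addresses."""
    return len(scan_addresses(name, resolver)) == expected


# Stand-in resolver so the example runs without DNS; on a real host,
# call scan_is_valid("rac-scan") and let it query the system resolver.
def fake_resolver(name):
    return (name, [], ["192.168.1.120", "192.168.1.121", "192.168.1.122"])


print(scan_is_valid("rac-scan", resolver=fake_resolver))
```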

🧪 10. Oracle RAC Connection Flow

Client
   ↓
SCAN Listener
   ↓
Node Listener
   ↓
RAC Instance

⚠️ 11. Common Issues

| Issue            | Cause                        |
| Node eviction    | Private interconnect failure |
| Slow failover    | VIP misconfiguration         |
| Connection issue | SCAN DNS issue               |
| Split brain      | Interconnect latency         |

🔥 12. Interview Questions & Answers

Q1: Why are VIPs used in RAC?

Answer:

VIPs provide faster client failover by immediately rejecting failed TCP connections instead of waiting for timeout.


Q2: Why 3 SCAN IPs?

Answer:

Oracle recommends 3 SCAN IPs for high availability and load balancing.


Q3: Can SCAN IPs be on a different subnet?

Answer:

No. SCAN IPs should normally be on the same public subnet as the node VIPs.


🏁 13. Final Checklist Before RAC Installation

| Check                   | Status |
| Public IP configured    |        |
| Private IP configured   |        |
| VIP available           |        |
| SCAN resolves correctly |        |
| DNS working             |        |
| Interconnect tested     |        |

🚀 14. Recommended Enterprise Design

Production Environment

  • Bonded NICs

  • Redundant switches

  • Dedicated interconnect VLAN

  • DNS-managed SCAN



Friday, May 1, 2026

Oracle Database Performance Tuning Guide (Production-Oriented)

 

📘 Objective

This document provides a structured approach to Oracle Database Performance Tuning using real production methodologies followed by senior DBAs and performance engineers.


🎯 What is Performance Tuning?

Performance tuning is the process of:

  • Identifying bottlenecks

  • Reducing response time

  • Improving throughput

  • Optimizing resource utilization


🧠 Core Performance Tuning Philosophy

“Do not tune blindly. Identify the bottleneck first.”


🏗️ Oracle Performance Architecture

Performance issues usually come from one of these areas:

| Area          | Symptoms               |
| CPU           | High load, slow SQL    |
| Memory        | Swapping, cache misses |
| I/O           | Slow reads/writes      |
| Network       | Session delays         |
| SQL           | High elapsed time      |
| Locks         | Blocking sessions      |
| Configuration | Poor parameter setup   |

🔍 Performance Tuning Methodology

🔥 Standard Workflow

Problem Detection
      ↓
Collect Metrics
      ↓
Identify Bottleneck
      ↓
Analyze Root Cause
      ↓
Implement Fix
      ↓
Validate Improvement

📊 1. Initial Health Check

✅ Database Load

SELECT * FROM v$sysmetric_summary;

✅ Active Sessions

SELECT inst_id, status, COUNT(*)
FROM gv$session
GROUP BY inst_id, status;

✅ Top Wait Events

SELECT event, total_waits, time_waited
FROM v$system_event
WHERE wait_class <> 'Idle'
ORDER BY time_waited DESC;

⚡ 2. Wait Event Analysis (Most Important)

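🔑 Common Oracle Wait Events

| Wait Event                    | Meaning                                              |
| db file sequential read       | Single-block read (typically index access)           |
| db file scattered read        | Multiblock read (full table or index fast full scan) |
| log file sync                 | Commit wait                                          |
| enq: TX - row lock contention | Row-level locking issue                              |
| latch free                    | Latch contention                                     |
| direct path read/write        | Parallel query / temp usage                          |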

🧠 Golden Rule

Tune the highest DB time contributor first.
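
To apply the rule, rank the wait events by their share of total waited time rather than by raw counts. A minimal Python sketch: the function name `rank_by_time` and the sample figures are invented for illustration, and in practice the input would come from a wait-event query or an AWR report.

```python
def rank_by_time(events):
    """events: iterable of (name, time_waited). Returns [(name, pct_of_total)], largest first."""
    events = list(events)
    total = sum(waited for _, waited in events)
    if total == 0:
        return []
    ranked = sorted(events, key=lambda e: e[1], reverse=True)
    return [(name, round(100.0 * waited / total, 1)) for name, waited in ranked]


sample = [  # made-up numbers, same unit for every row
    ("log file sync", 200),
    ("db file sequential read", 700),
    ("enq: TX - row lock contention", 100),
]
for name, pct in rank_by_time(sample):
    print(f"{pct:5.1f}%  {name}")
```

The first row of the output is the contributor to tune first.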


📘 3. AWR Report Analysis

Generate AWR

@?/rdbms/admin/awrrpt.sql

🔍 Important Sections in AWR

✅ Load Profile

Check:

  • DB Time

  • Logical Reads

  • Physical Reads


✅ Top Foreground Wait Events

Identify:

  • CPU bottleneck

  • I/O bottleneck

  • Lock contention


✅ SQL Ordered by Elapsed Time

Focus on:

  • High CPU SQL

  • High buffer gets

  • Full scans


✅ Instance Efficiency

Check:

  • Buffer cache hit ratio

  • Soft parse %
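
Both figures are simple ratios. The sketch below uses the commonly cited definitions (cache hits as a share of logical reads, soft parses as a share of all parses). The function names are made up, and the inputs are assumed to be counter deltas over the same interval.

```python
def buffer_cache_hit_ratio(physical_reads, db_block_gets, consistent_gets):
    """Percentage of logical reads satisfied without a physical read."""
    logical_reads = db_block_gets + consistent_gets
    if logical_reads == 0:
        return None  # no activity in the interval
    return 100.0 * (logical_reads - physical_reads) / logical_reads


def soft_parse_pct(total_parses, hard_parses):
    """Percentage of parse calls that did not require a hard parse."""
    if total_parses == 0:
        return None
    return 100.0 * (total_parses - hard_parses) / total_parses


print(buffer_cache_hit_ratio(physical_reads=50, db_block_gets=200, consistent_gets=800))
print(soft_parse_pct(total_parses=1000, hard_parses=20))
```

A high hit ratio alone does not prove the workload is healthy, so read these numbers alongside the wait events rather than in isolation.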


🧪 4. ASH Analysis (Real-Time Troubleshooting)

Active Sessions

SELECT sample_time,
       session_id,
       sql_id,
       wait_class,
       event
FROM v$active_session_history
ORDER BY sample_time DESC;

💥 5. SQL Performance Tuning

Identify Expensive SQL

SELECT sql_id,
       executions,
       elapsed_time/1000000 elapsed_sec,
       cpu_time/1000000 cpu_sec
FROM v$sql
ORDER BY elapsed_time DESC;

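🔎 Execution Plan Analysis

-- Actual row counts appear only if the statement ran with
-- STATISTICS_LEVEL = ALL or the GATHER_PLAN_STATISTICS hint.
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));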

🚨 Common SQL Issues

| Problem            | Solution                                      |
| Full Table Scan    | Create an index if the predicate is selective |
| Cartesian Join     | Fix the join condition                        |
| Hard Parsing       | Use bind variables                            |
| Bad Execution Plan | Gather statistics                             |

📦 6. Statistics Management

Gather Table Stats

EXEC DBMS_STATS.GATHER_TABLE_STATS('SCOTT','EMP');

Gather Schema Stats

EXEC DBMS_STATS.GATHER_SCHEMA_STATS('SCOTT');

💾 7. Memory Tuning

SGA Components

| Component    | Purpose         |
| Buffer Cache | Data blocks     |
| Shared Pool  | SQL parsing     |
| Large Pool   | RMAN / Parallel |

Check Memory Usage

SHOW PARAMETER sga;
SHOW PARAMETER pga;

🔥 PGA Analysis

SELECT * FROM v$pgastat;

💽 8. I/O Performance Tuning

Check File I/O

SELECT df.name AS file_name,
       fs.phyrds,
       fs.phywrts
FROM v$datafile df,
     v$filestat fs
WHERE df.file# = fs.file#;

🚨 Symptoms of I/O Bottleneck

  • High db file sequential read

  • Slow queries

  • High disk latency


🔒 9. Lock & Blocking Analysis

Blocking Sessions

SELECT blocking_session,
       sid,
       serial#
FROM v$session
WHERE blocking_session IS NOT NULL;

⚡ Kill Blocking Session

ALTER SYSTEM KILL SESSION 'sid,serial#' IMMEDIATE;

🔁 10. Redo & Commit Tuning

Log File Sync Issues

Causes:

  • Frequent commits

  • Slow redo disk


Check Redo Waits

SELECT event, total_waits
FROM v$system_event
WHERE event LIKE 'log file%';

🧠 11. RAC Performance Tuning

Important RAC Waits

| Wait           | Meaning                                                                 |
| gc cr request  | Waiting for a consistent-read block from another instance (Cache Fusion) |
| gc buffer busy | Block contention across instances                                       |

🔥 RAC Tips

  • Reduce block contention

  • Optimize interconnect

  • Partition hot tables


📊 12. Data Guard Performance

Apply Lag

SELECT name, value
FROM v$dataguard_stats;

🚀 13. Performance Tuning Best Practices

✅ Do’s

  • Tune SQL first

  • Use AWR + ASH together

  • Gather statistics regularly

  • Monitor trends


❌ Don’ts

  • Increase memory blindly

  • Create unnecessary indexes

  • Ignore execution plans


🧪 14. Real Production Scenarios


🔥 Scenario 1: Database Slow

Root Cause

  • Full table scan

Solution

  • Index creation

  • SQL rewrite


🔥 Scenario 2: High CPU

Root Cause

  • Bad execution plan

Solution

  • SQL tuning

  • Stats refresh


🔥 Scenario 3: Lock Contention

Root Cause

  • Uncommitted transaction

Solution

  • Kill blocker

  • Application fix


📘 15. RCA Framework

Always document:

| Area       | Details          |
| Symptom    | What happened    |
| Impact     | Business impact  |
| Root Cause | Why it happened  |
| Fix        | Resolution       |
| Prevention | Future avoidance |

🎤 16. Interview-Ready Answer

“I approach performance tuning by identifying the top DB time contributors using AWR and ASH. I analyze wait events, expensive SQL, and execution plans to isolate bottlenecks. Then I implement targeted fixes like SQL tuning, indexing, or configuration optimization, and validate improvements through before-vs-after analysis.”


🏁 Conclusion

Performance tuning in Oracle Database is not about memorizing commands.

It is about:

  • Understanding system behavior

  • Identifying bottlenecks

  • Performing RCA

  • Implementing sustainable fixes


🚀 Next-Level Topics

You can further expand into:

  • SQL Plan Management

  • Adaptive Query Optimization

  • Exadata tuning

  • OEM Performance Hub

  • ASH Analytics

  • Automatic Indexing

  • Parallel Query tuning


💡 Final Thought

“Senior DBAs don’t just fix slow systems.
They understand why systems became slow in the first place.”

Friday, April 24, 2026

Split Brain Syndrome in Oracle RAC



🧠 What Every DBA Must Know

In a clustered database environment like Oracle RAC, maintaining data consistency across nodes is critical. But what happens when nodes suddenly stop communicating with each other?

Welcome to one of the most critical scenarios in RAC:

⚠️ Split Brain Syndrome


🔍 What is Split Brain?

In an Oracle RAC cluster, nodes communicate using a private interconnect.

👉 If this interconnect fails:

  • Nodes cannot see each other

  • Each node assumes others are down

  • Each continues processing independently

This leads to:

❌ Multiple “brains” operating simultaneously
❌ No coordination
❌ High risk of data corruption


⚡ Real Problem Explained

Imagine a 2-node RAC:

  • Node 1 updates a data block

  • Node 2 updates the same block

  • No communication between them

👉 Result:

💥 Data inconsistency / corruption


🧩 Why Does This Happen?

Split brain occurs when:

  • Private interconnect fails

  • Network heartbeat is lost

  • Nodes are still physically UP

  • Database instances continue running

Each node thinks:

“I am the only surviving node.”


🔁 Types of Heartbeats in RAC

Oracle uses two mechanisms to detect node health:

1️⃣ Network Heartbeat

  • Via private interconnect

  • Fast communication between nodes


2️⃣ Disk Heartbeat

  • Via Voting Disk

  • Backup mechanism when network fails


🗳️ Role of Voting Disk

The Voting Disk is the brain behind conflict resolution.

👉 Each node:

  • Writes its presence

  • Checks connectivity with others


🔥 In Split Brain Scenario:

  • Nodes form sub-clusters

  • Each group tries to claim majority

  • Voting disk decides:

✅ Which nodes survive
❌ Which nodes get evicted


⚖️ Who Wins?

👉 Majority rule applies

Example:

  • 10-node RAC cluster

  • 6 nodes can communicate

  • 4 nodes isolated

👉 Result:

  • 6-node group survives

  • 4-node group gets evicted
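
The majority rule can be expressed in a few lines. This Python sketch is a simplification, not Oracle's implementation: `surviving_subcluster` is a made-up name, and the tie-break (the group holding the lowest node number wins) is an assumption based on commonly described behavior. Newer Clusterware releases may weigh other factors.

```python
def surviving_subcluster(subclusters):
    """subclusters: list of lists of node numbers. Returns the group that survives."""
    # Larger group wins. On a tie, prefer the group containing the
    # lowest node number (assumed tie-break, see the note above).
    return max(subclusters, key=lambda group: (len(group), -min(group)))


groups = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10]]
print(surviving_subcluster(groups))
```

With the 10-node example above, the 6-node group is returned and the 4-node group would be evicted.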


🚫 Node Eviction – Who Does It?

The eviction is handled by:

👉 CSSD (Cluster Synchronization Services Daemon)


🔧 CSSD Responsibilities:

  • Monitor node health

  • Check heartbeats

  • Detect communication failure

  • Evict problematic nodes


⚙️ How CSSD Monitors Nodes

| Mechanism         | Purpose                    |
| Network Heartbeat | Interconnect communication |
| Disk Heartbeat    | Voting disk verification   |

⚡ Node Eviction Process

When a node is unhealthy:

  1. CSSD detects heartbeat failure

  2. Voting disk validation occurs

  3. Node is forcibly evicted

  4. Node is usually rebooted automatically

  5. Cluster reconfigures
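
Step 1 is essentially a timeout check on the last heartbeat seen from each node. The sketch below shows the idea only. It is not how CSSD is implemented, `nodes_to_evict` is a hypothetical name, and the 30-second timeout is an illustrative value (check the real setting on your cluster with `crsctl get css misscount`).

```python
def nodes_to_evict(last_heartbeat, now, timeout_seconds):
    """last_heartbeat: {node: time of last heartbeat, in seconds}.

    Returns the nodes whose heartbeat is older than the timeout.
    """
    return sorted(
        node for node, seen in last_heartbeat.items()
        if now - seen > timeout_seconds
    )


heartbeats = {"rac1": 100, "rac2": 99, "rac3": 60}  # made-up timestamps
print(nodes_to_evict(heartbeats, now=100, timeout_seconds=30))
```

Here only `rac3` is flagged, because its last heartbeat is 40 seconds old. In a real cluster the voting disk check (step 2) still decides whether the node is actually evicted.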


🚨 Common Error

ORA-29740: evicted by instance

👉 Indicates:

  • An instance was evicted from the cluster database

  • Cluster protection mechanism triggered


🧪 Real-World Scenario

Situation:

  • 4-node RAC

  • Node 3 loses interconnect

What happens:

  • Node 1 detects issue

  • Voting disk confirms

  • Node 3 is evicted

  • Remaining nodes continue


🔄 Why Eviction is Important

Eviction is NOT a failure.

👉 It is a protection mechanism

Without eviction:

  • Multiple nodes update same data

  • Corruption occurs

With eviction:

✅ Data integrity is preserved


🧠 Key DBA Takeaways

  • Split brain is a network-related issue

  • Always ensure:

    • Redundant interconnect

    • Stable network

  • Monitor:

    • CSSD logs

    • Clusterware alerts


🎯 Interview Questions


❓ What is Split Brain in RAC?

👉 When nodes cannot communicate but continue working independently, risking data corruption.


❓ How is Split Brain resolved?

👉 Using Voting Disk + CSSD


❓ Who evicts nodes?

👉 CSSD process


❓ What decides survival?

👉 Voting Disk (majority rule)


❓ What is fencing?

👉 Isolating/evicting a node to protect cluster integrity.


🚀 Final Thought

“In RAC, survival is not about being alive…
It’s about being connected.”



Thursday, April 23, 2026

OEM Runbook (L2/L3) – Oracle Monitoring & Alert Management




Using Oracle Enterprise Manager 13c


🎯 Objective

To:

  • Monitor database health

  • Detect issues proactively

  • Reduce alert noise

  • Troubleshoot incidents quickly


🧭 1. First Response Playbook (When an Alert Comes)

🚨 Step 1: Open Incident Manager

📍 Navigation:

Enterprise → Monitoring → Incident Manager

Check:

  • Severity (Critical / Warning)

  • Target (DB / Host / Listener)

  • Message (Tablespace / CPU / Lock etc.)


🧠 Step 2: Identify Issue Type

| Alert Type       | Meaning            | Action           |
| CPU High         | Performance issue  | Check SQL / load |
| Tablespace Full  | Storage issue      | Add space        |
| Session Blocking | Lock issue         | Kill blocker     |
| Host Down        | Infra issue        | Check server     |
| Listener Down    | Connectivity issue | Restart          |
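
If alerts are triaged by a script or chat bot, the table maps naturally to a lookup. A minimal Python sketch: the names `FIRST_ACTION` and `first_action` are made up, and the fallback text for unknown alert types is an assumption, not part of the runbook.

```python
# Alert type -> first action, taken from the table above.
FIRST_ACTION = {
    "CPU High": "Check SQL / load",
    "Tablespace Full": "Add space",
    "Session Blocking": "Kill blocker",
    "Host Down": "Check server",
    "Listener Down": "Restart",
}


def first_action(alert_type):
    """Return the runbook's first action, or a fallback for unknown types."""
    # The fallback wording is hypothetical; adapt it to your escalation policy.
    return FIRST_ACTION.get(alert_type, "Escalate for manual triage")


print(first_action("Tablespace Full"))
```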

🔍 2. Deep Dive Troubleshooting


⚡ Case 1: Database Performance Issue

Step 1: Open Performance Page

📍 Navigation:

Target → Database → Performance → Top Activity

Check:

  • CPU usage

  • Wait events

  • Active sessions


Step 2: Identify Top SQL

📍 Navigation:

Performance → SQL Monitoring / Top SQL

Action:

  • Find high elapsed time SQL

  • Capture SQL_ID


Step 3: Analyze Execution Plan

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('<SQL_ID>', NULL, 'ALLSTATS LAST'));

Step 4: Fix

  • Add index

  • Gather stats

  • Apply SQL Profile


🔒 Case 2: Blocking / Locking Issue

Step 1: Check Blocking Sessions

📍 Navigation:

Performance → Blocking Sessions

OR SQL:

SELECT blocking_session, sid, serial#
FROM v$session
WHERE blocking_session IS NOT NULL;

Step 2: Kill Blocking Session

ALTER SYSTEM KILL SESSION 'SID,SERIAL#' IMMEDIATE;

Step 3: Root Cause

  • Application not committing

  • Long transactions


💾 Case 3: Tablespace Full

Step 1: Check Usage

📍 Navigation:

Storage → Tablespaces

Step 2: Add Space

ALTER DATABASE DATAFILE '/path/file.dbf'
RESIZE 10G;

OR

ALTER TABLESPACE users
ADD DATAFILE '/path/file02.dbf' SIZE 5G;

🔥 Case 4: CPU Spike

Step 1: Check Load

📍 Navigation:

Performance → Top Activity

Step 2: Identify Cause

  • High SQL load

  • Batch jobs

  • Parallel queries


Step 3: Action

  • Tune SQL

  • Kill runaway sessions

  • Limit parallelism


🌐 Case 5: Listener / Connectivity Issue

Step 1: Check Listener Status

lsnrctl status

Step 2: Restart Listener

lsnrctl stop
lsnrctl start

🔁 3. Alert Noise Reduction (VERY IMPORTANT)


🔧 Configure Thresholds

📍 Navigation:

Target → Monitoring → Metric and Collection Settings

Best Practice:

  • Warning: 80%

  • Critical: 90%

  • Occurrence: 3


🔁 Configure Incident Rules

📍 Navigation:

Setup → Incidents → Incident Rules

Enable:

  • ✔ Add to existing incident

  • ✔ Event grouping


🔕 Configure Notifications

📍 Navigation:

Setup → Notifications

Rule:

  • Only Critical alerts → Email


⛔ Configure Blackouts

📍 Navigation:

Enterprise → Monitoring → Blackouts

Use During:

  • Patching

  • Maintenance


📊 4. Daily Health Check (L2 Task)


✅ Check 1: Incident Summary

Enterprise → Monitoring → Incident Manager

✅ Check 2: DB Status

Targets → Databases

✅ Check 3: Tablespace Usage

Storage → Tablespaces

✅ Check 4: Backup Status

Availability → Backup Reports

✅ Check 5: Performance

Performance → Top Activity

🚀 5. L3 Advanced Activities


🔬 AWR / ADDM Analysis

📍 Navigation:

Performance → AWR → Reports

⚙️ SQL Tuning Advisor

📍 Navigation:

Performance → SQL → Tuning Advisor

🧠 ASH Analytics

📍 Navigation:

Performance → ASH Analytics

🔄 Corrective Actions (Auto-Healing)

📍 Navigation:

Metric Settings → Corrective Actions

Example:

lsnrctl start

🎯 6. SLA / Escalation Matrix

| Severity | Action        | SLA      |
| Critical | Immediate fix | 15 mins  |
| High     | Investigate   | 30 mins  |
| Medium   | Monitor       | 2 hrs    |
| Low      | Review        | Next day |
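
The matrix can drive an automatic "respond by" timestamp on each ticket. A small Python sketch: `SLA` and `response_due` are made-up names, and "Next day" is approximated as 24 hours, which is an assumption you may want to replace with a business-calendar rule.

```python
from datetime import datetime, timedelta

# Severity -> response window, from the matrix above.
SLA = {
    "Critical": timedelta(minutes=15),
    "High": timedelta(minutes=30),
    "Medium": timedelta(hours=2),
    "Low": timedelta(days=1),  # "Next day" approximated as 24 hours
}


def response_due(severity, raised_at):
    """Return the latest time by which the alert should be handled."""
    return raised_at + SLA[severity]


print(response_due("Critical", datetime(2026, 4, 23, 11, 5)))
```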

🧠 7. Interview Questions (L2/L3)


❓ How do you troubleshoot an OEM alert?

Answer:

I check Incident Manager, identify alert type, drill into performance metrics, analyze SQL or system issue, and apply corrective action.


❓ How do you reduce alert noise?

Answer:

  • Tune thresholds

  • Configure incident rules

  • Use blackout

  • Filter notifications


❓ What is the difference between an Event and an Incident?

| Event     | Incident      |
| Raw alert | Grouped issue |

❓ What is your first step in a performance issue?

Answer:

Check Top Activity and wait events to identify bottleneck.


🏁 Final Production Mindset

“OEM is not just a monitoring tool…
It is your control tower for the entire database ecosystem.”



OEM UI Walkthrough (Text Mockups)

The text mockups below (illustrations, not captured screenshots) approximate the main screens of Oracle Enterprise Manager 13c, based on Oracle docs and typical console layouts, so you can see how each screen is organized. 👇


🖥️ 1. OEM Dashboard (Home Page)

----------------------------------------------------------
| Enterprise Manager Console                            |
----------------------------------------------------------
| Targets Status | Incidents | Alerts | Performance     |
----------------------------------------------------------
| DB1  ✅ Up     | Incidents: 3 🔴                    |
| DB2  ⚠ Warning | CPU High                          |
| HOST1 ✅       | Tablespace 90%                    |
----------------------------------------------------------
| Top Activity | CPU | Memory | Sessions              |
----------------------------------------------------------
| Graphs showing load, sessions, SQL activity          |
----------------------------------------------------------

🔍 What you see:

  • Overall DB status

  • Alerts summary

  • Performance graphs

👉 This is your first landing page


🚨 2. Incident Manager Screen (MOST IMPORTANT)

📍 Navigation:
Enterprise → Monitoring → Incident Manager

----------------------------------------------------------
| Incident Manager                                      |
----------------------------------------------------------
| Summary:                                              |
| Open: 12 | Critical: 3 🔴 | Warning: 5 ⚠              |
----------------------------------------------------------
| Charts:                                               |
| - Incidents by Severity                               |
| - Incidents by Target                                 |
----------------------------------------------------------
| Incident List:                                        |
----------------------------------------------------------
| Time     | Target | Message             | Severity     |
----------------------------------------------------------
| 11:05    | DB1    | Tablespace Full     | CRITICAL 🔴  |
| 11:06    | DB1    | Tablespace Full     | CRITICAL 🔴  |
| 11:07    | DB1    | Tablespace Full     | CRITICAL 🔴  |
----------------------------------------------------------

💡 Key Point:

👉 With alert compression enabled

  • These 3 rows → become 1 incident

📌 OEM groups events into incidents automatically (Oracle Documentation)


📊 3. Incident Dashboard (Graph View)

----------------------------------------------------------
| Incident Dashboard                                   |
----------------------------------------------------------
| 🔴 Critical: 3   ⚠ Warning: 5   ℹ Info: 4              |
----------------------------------------------------------
| Pie Chart:                                            |
| DB Issues: 50%                                        |
| Host Issues: 30%                                      |
| Listener: 20%                                         |
----------------------------------------------------------
| Actions:                                              |
| [Acknowledge] [Assign] [Escalate]                     |
----------------------------------------------------------

🔍 What this gives:

  • Visual distribution of issues

  • Quick filtering

👉 Dashboard auto-refreshes every ~30 sec (Oracle Documentation)


⚙️ 4. Metric & Threshold Configuration Screen

📍 Navigation:
Target → Monitoring → Metric and Collection Settings

----------------------------------------------------------
| Metric Settings: CPU Utilization                     |
----------------------------------------------------------
| Warning Threshold: 80%                               |
| Critical Threshold: 90%                              |
| Occurrences: 3                                       |
----------------------------------------------------------
| Corrective Action:                                   |
| Script: restart_service.sh                           |
----------------------------------------------------------
| [Save]                                               |
----------------------------------------------------------

💡 Important:

  • Occurrences = key for alert suppression

  • Prevents alert flapping


🔁 5. Incident Rules (Alert Compression Engine)

📍 Navigation:
Setup → Incidents → Incident Rules

----------------------------------------------------------
| Rule Set: DB_ALERT_COMPRESSION                       |
----------------------------------------------------------
| Condition:                                           |
| Target Type = Database                              |
| Severity = Critical                                 |
----------------------------------------------------------
| Actions:                                             |
| ✔ Create Incident                                   |
| ✔ Add to existing open incident (IMPORTANT)         |
| ✔ Send Notification                                |
----------------------------------------------------------

🔥 This is the feature:

👉 That actually performs alert grouping/compression


🔕 6. Notification Rules Screen

📍 Navigation:
Setup → Notifications → Notification Rules

----------------------------------------------------------
| Notification Rule                                   |
----------------------------------------------------------
| Target: All Databases                               |
| Event Type: Metric Alert                            |
| Severity: Critical Only                             |
----------------------------------------------------------
| Actions:                                             |
| Send Email: dba@company.com                         |
----------------------------------------------------------

💡 Result:

  • No spam emails

  • Only critical alerts sent


⛔ 7. Blackout Screen (Maintenance Mode)

📍 Navigation:
Enterprise → Monitoring → Blackouts

----------------------------------------------------------
| Create Blackout                                     |
----------------------------------------------------------
| Target: DB1                                         |
| Start Time: 10:00                                   |
| Duration: 2 Hours                                  |
----------------------------------------------------------
| Options:                                            |
| ✔ Stop Monitoring                                  |
| ✔ Suppress Alerts                                  |
----------------------------------------------------------

🎯 How It All Connects (Architecture Flow)

DB Metric → Threshold Breach → Event Generated
                ↓
        Incident Rule Applied
                ↓
     Event Grouped (Compression)
                ↓
         Incident Created
                ↓
     Notification Sent (if needed)

👉 OEM uses agents + rules to convert events into incidents (Oracle)
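
The grouping step in this flow can be pictured as folding events that share a key into one incident. The Python sketch below is a toy model, not OEM's rule engine: `group_events` is a made-up name, and keying on (target, message) is an assumption chosen to match the Incident Manager mockup.

```python
def group_events(events):
    """Fold raw events into incidents keyed by (target, message)."""
    incidents = {}
    for ev in events:
        key = (ev["target"], ev["message"])
        if key not in incidents:
            # The first event for this key opens a new incident.
            incidents[key] = {
                "target": ev["target"],
                "message": ev["message"],
                "severity": ev["severity"],
                "first_seen": ev["time"],
                "last_seen": ev["time"],
                "event_count": 0,
            }
        # Later events join the open incident instead of creating a new one.
        incidents[key]["last_seen"] = ev["time"]
        incidents[key]["event_count"] += 1
    return list(incidents.values())


events = [
    {"time": "11:05", "target": "DB1", "message": "Tablespace Full", "severity": "CRITICAL"},
    {"time": "11:06", "target": "DB1", "message": "Tablespace Full", "severity": "CRITICAL"},
    {"time": "11:07", "target": "DB1", "message": "Tablespace Full", "severity": "CRITICAL"},
]
for incident in group_events(events):
    print(incident)
```

The three Tablespace Full events collapse into a single incident with `event_count` of 3.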


🧠 Real DBA Understanding

When you see OEM UI:

| Screen           | Purpose        |
| Dashboard        | Overall health |
| Incident Manager | Actual issues  |
| Metric Settings  | Control alerts |
| Incident Rules   | Reduce noise   |
| Notifications    | Alert delivery |

🚀 Pro Tip (From Production)

Most DBAs fail because:

  • They only look at Incidents

  • But never configure:

    • Thresholds

    • Rules

👉 Result = Alert storm



OEM 13.5 – Alert Noise Reduction (UI Configuration)



🔧 1. Configure Metric Thresholds (First & Most Important)

👉 This is where most alert noise comes from.

📍 Navigation:

Targets → Databases → Select your DB → Monitoring → Metric and Collection Settings

🪜 Steps:

  1. Search for a metric (e.g., CPU Utilization, Tablespace Used (%))

  2. Click Edit (pencil icon)

  3. Set thresholds properly:

    • Warning: e.g., 80%

    • Critical: e.g., 90%

  4. Set Occurrences (very important):

    • Example: Trigger only if 3 consecutive collections breach the threshold

  5. Click OK → Save

💡 Tip:

  • Use “Occurrences > 1” to avoid false alerts (flapping)
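
To see why occurrences suppress flapping, here is a small Python model of the idea. It is a sketch, not OEM's evaluation logic: `alert_states` is a made-up name, and the `>=` comparison is an assumption (OEM lets you choose the comparison operator per metric).

```python
def alert_states(samples, warning=80, critical=90, occurrences=3):
    """Return the severity reported after each metric sample."""
    states = []
    crit_run = warn_run = 0  # consecutive breaches seen so far
    for value in samples:
        crit_run = crit_run + 1 if value >= critical else 0
        warn_run = warn_run + 1 if value >= warning else 0
        if crit_run >= occurrences:
            states.append("CRITICAL")
        elif warn_run >= occurrences:
            states.append("WARNING")
        else:
            states.append("CLEAR")
    return states


# A single spike to 95 is ignored; only the sustained run raises an alert.
print(alert_states([95, 50, 95, 95, 95]))
```

With `occurrences=1` the very first 95 would already raise a critical alert, which is exactly the noise this setting removes.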


🔁 2. Configure Incident Rules (Event Grouping / Compression)

👉 This is the real “alert compression engine”

📍 Navigation:

Setup → Incidents → Incident Rules

🪜 Steps:

  1. Click Create Rule Set

  2. Name it (e.g., DB_ALERT_COMPRESSION)

  3. Click Create Rule


🔹 Rule Configuration:

Condition:

  • Target Type = Database Instance

  • Severity = Critical / Warning

Actions:

  • ✔ Create Incident

  • ✔ Add to existing open incident (IMPORTANT)

  • ✔ Set Incident Priority

👉 This ensures:

Same issue → 1 incident instead of many alerts


🔕 3. Configure Notification Rules (Avoid Spam Emails)

📍 Navigation:

Setup → Notifications → Notification Rules

🪜 Steps:

  1. Click Create

  2. Define:

    • Target (DB / Host)

    • Event Type (Metric Alert)

    • Severity (Critical only recommended)

  3. Configure:

    • Send email only for Critical

    • Suppress Warning alerts (optional)


💡 Pro Tip:

  • Send:

    • Warning → Dashboard only

    • Critical → Email/SMS
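
The routing in this tip amounts to a severity-to-channel lookup. A minimal Python sketch: `notification_channels` is a hypothetical helper, and sending unknown severities to the dashboard only is an assumption.

```python
def notification_channels(severity):
    """Return where an alert of the given severity should be delivered."""
    routes = {
        "Critical": ["email", "sms"],
        "Warning": ["dashboard"],
    }
    # Anything not listed stays on the dashboard (assumed default).
    return routes.get(severity, ["dashboard"])


print(notification_channels("Critical"))
print(notification_channels("Warning"))
```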


⛔ 4. Configure Blackouts (Maintenance Mode)

👉 Prevent alerts during planned work

📍 Navigation:

Enterprise → Monitoring → Blackouts

🪜 Steps:

  1. Click Create Blackout

  2. Select Target (DB / Host)

  3. Define:

    • Duration (e.g., 2 hours)

  4. Enable:

    • ✔ Stop monitoring

    • ✔ Suppress alerts


✅ Result:

No alerts during:

  • Patching

  • Restart

  • Maintenance
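
Conceptually, a blackout is a time window during which events for the target are dropped. The Python sketch below models just that; `in_blackout` and `suppress_during_blackout` are made-up names, and the half-open window (start included, end excluded) is an assumption.

```python
from datetime import datetime, timedelta


def in_blackout(event_time, start, duration):
    """True if event_time falls inside [start, start + duration)."""
    return start <= event_time < start + duration


def suppress_during_blackout(events, start, duration):
    """Drop (timestamp, message) events raised while the blackout is active."""
    return [e for e in events if not in_blackout(e[0], start, duration)]


start = datetime(2026, 4, 23, 10, 0)
events = [
    (datetime(2026, 4, 23, 9, 55), "CPU High"),
    (datetime(2026, 4, 23, 11, 30), "Listener Down"),   # inside the 2-hour window
    (datetime(2026, 4, 23, 12, 5), "Tablespace Full"),
]
print(suppress_during_blackout(events, start, timedelta(hours=2)))
```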


🔄 5. Configure Corrective Actions (Auto-Healing)

👉 Stops repeated alerts automatically

📍 Navigation:

Targets → Database → Monitoring → Metric and Collection Settings

🪜 Steps:

  1. Select a metric (e.g., Listener Down)

  2. Click Edit

  3. Go to Corrective Actions

  4. Add script:

Example:

lsnrctl start

💡 Result:

  • Issue auto-fixed

  • Alert doesn’t repeat


🔁 6. Enable Event De-duplication & Correlation

👉 Mostly automatic but configurable

📍 Navigation:

Setup → Incidents → Incident Rules → Advanced Settings

🪜 Steps:

  1. Enable:

    • ✔ Event de-duplication

    • ✔ Event correlation

  2. Define time window (e.g., 5–10 mins)


💡 Example:

  • Same alert every minute
    ➡️ Only one incident shown
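
Time-window de-duplication can be modeled as "keep an event only if the same key was not kept within the last N minutes". The Python sketch below illustrates that idea only; it is not OEM's internal algorithm, and `deduplicate` and the event key format are made up.

```python
from datetime import datetime, timedelta


def deduplicate(events, window=timedelta(minutes=10)):
    """events: time-ordered list of (timestamp, key).

    Keeps an event only if no event with the same key was kept
    within the preceding window.
    """
    last_kept = {}
    kept = []
    for ts, key in events:
        previous = last_kept.get(key)
        if previous is None or ts - previous >= window:
            kept.append((ts, key))
            last_kept[key] = ts
    return kept


base = datetime(2026, 4, 23, 11, 0)
# The same alert fires every minute for five minutes.
events = [(base + timedelta(minutes=i), "DB1:CPU high") for i in range(5)]
print(len(deduplicate(events)))
```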


📊 7. Validate Configuration

📍 Navigation:

Enterprise → Monitoring → Incident Manager

Check:

  • Alerts grouped properly

  • No duplicate incidents

  • Reduced alert count


🎯 Real Production Setup (Recommended)

| Feature              | Setting       |
| Threshold Occurrence | 3             |
| Incident Grouping    | Enabled       |
| Notifications        | Critical only |
| Blackouts            | Mandatory     |
| Auto-healing         | Enabled       |

🧠 Interview-Ready Answer

👉 “How do you reduce alert noise in OEM?”

Answer:

“I tune metric thresholds with occurrence settings, configure incident rules to group alerts, use notification filtering to avoid unnecessary emails, apply blackouts during maintenance, and enable corrective actions to auto-resolve recurring issues.”


🚀 Final Thought

“OEM is powerful… but without tuning, it becomes noisy.
A good DBA makes OEM quiet but intelligent.”



Alert Compression / Noise Reduction in Oracle Enterprise Manager 13c


🧠 What is “Alert Compression”?

OEM doesn’t use the exact term “compression” officially, but in practice it means:

👉 Reducing duplicate, repetitive, or noisy alerts into fewer meaningful alerts


⚡ Why It’s Needed

Without this:

  • Same issue → 100+ alerts

  • DBAs get flooded

  • Real issues get missed

With alert compression:

  • Duplicate alerts → grouped / suppressed

  • Only actionable alerts remain


🔧 Key Features That Enable Alert Compression

1️⃣ Incident Rules (Event Compression Engine)

👉 Core mechanism behind alert reduction

What it does:

  • Groups multiple events into a single incident

  • Prevents duplicate alerts

Example:

  • 10 tablespace alerts
    ➡️ 1 incident instead of 10 alerts


2️⃣ Event De-duplication

OEM automatically:

  • Detects same event repeating

  • Suppresses repeated notifications

👉 Example:

  • “CPU high” every minute
    ➡️ Only one alert generated


3️⃣ Event Correlation

👉 Combines related alerts into one

Example:

  • DB down

  • Listener down

  • Host down

➡️ OEM shows one root incident


4️⃣ Metric Threshold Suppression

👉 Avoids alert flapping

How:

  • Warning/Critical thresholds

  • Clear condition required before re-alert


5️⃣ Blackouts (Temporary Alert Suppression)

👉 Used during maintenance

Patch window → No alerts triggered

6️⃣ Notification Rules Filtering

👉 Only send alerts when needed

  • Based on severity

  • Based on target

  • Based on time


7️⃣ Corrective Actions (Auto-Healing)

👉 Prevents repeated alerts

Example:

  • Listener down
    ➡️ Auto restart script
    ➡️ No repeated alerts


📊 Real Example (Before vs After)

❌ Without Alert Compression:

  • 50 alerts for:

    • Tablespace full

    • CPU spike

    • Session blocking


✅ With OEM Features:

  • 1 incident for tablespace

  • 1 incident for CPU

  • 1 incident for blocking

👉 Huge noise reduction


🎯 Best Practices (Production)

✅ 1. Use Incident Rules

  • Group related alerts

  • Define severity properly


✅ 2. Tune Thresholds

Avoid:

  • Too sensitive alerts

  • Too many false positives


✅ 3. Enable Blackouts

During:

  • Patching

  • Maintenance


✅ 4. Use Corrective Actions

Automate:

  • Restart services

  • Clear temp issues


🧠 Interview Questions


❓ What is alert compression in OEM?

Answer:

It is the process of reducing duplicate or repetitive alerts using incident rules, event correlation, and suppression mechanisms.


❓ How does OEM avoid alert flooding?

Answer:

  • Incident rules

  • Event de-duplication

  • Threshold tuning

  • Blackouts


❓ What is the difference between Event and Incident?

| Event     | Incident                 |
| Raw alert | Grouped actionable alert |
| Many      | Few                      |

❓ How do you reduce alert noise in OEM?

Answer:

  • Tune thresholds

  • Configure incident rules

  • Use blackout

  • Enable auto corrective actions


🚀 Final Thought

“A good DBA doesn’t monitor more alerts…
They monitor fewer, smarter alerts.”