Thursday, April 23, 2026

OEM Runbook (L2/L3) – Oracle Monitoring & Alert Management




Using Oracle Enterprise Manager 13c


🎯 Objective

This runbook enables you to:

  • Monitor database health

  • Detect issues proactively

  • Reduce alert noise

  • Troubleshoot incidents quickly


🧭 1. First Response Playbook (When Alert Comes)

🚨 Step 1: Open Incident Manager

πŸ“ Navigation:

Enterprise → Monitoring → Incidents

Check:

  • Severity (Critical / Warning)

  • Target (DB / Host / Listener)

  • Message (Tablespace / CPU / Lock etc.)


🧠 Step 2: Identify Issue Type

Alert Type       | Meaning            | Action
---------------- | ------------------ | ----------------
CPU High         | Performance issue  | Check SQL / load
Tablespace Full  | Storage issue      | Add space
Session Blocking | Lock issue         | Kill blocker
Host Down        | Infra issue        | Check server
Listener Down    | Connectivity issue | Restart

πŸ” 2. Deep Dive Troubleshooting


⚡ Case 1: Database Performance Issue

Step 1: Open Performance Page

πŸ“

Target → Database → Performance → Top Activity

Check:

  • CPU usage

  • Wait events

  • Active sessions


Step 2: Identify Top SQL

πŸ“

Performance → SQL Monitoring / Top SQL

Action:

  • Find high elapsed time SQL

  • Capture SQL_ID


Step 3: Analyze Execution Plan

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('<SQL_ID>', NULL, 'ALLSTATS LAST'));

Step 4: Fix

  • Add index

  • Gather stats

  • Apply SQL Profile
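When stale statistics are the suspected cause, the fix can be sketched like this (schema and table names are hypothetical placeholders; target the objects behind the SQL_ID you captured):

```sql
-- Refresh optimizer stats on the table behind the offending SQL_ID.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'APP_SCHEMA',   -- placeholder schema
    tabname => 'ORDERS',       -- placeholder table
    cascade => TRUE);          -- also gather index stats
END;
/
```

Re-run the DBMS_XPLAN check afterwards to confirm the plan actually improved.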


πŸ”’ Case 2: Blocking / Locking Issue

Step 1: Check Blocking Sessions

πŸ“

Performance → Blocking Sessions

OR SQL:

SELECT blocking_session, sid, serial#
FROM v$session
WHERE blocking_session IS NOT NULL;
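Before killing anything, it helps to see who the blocker is and what it is running. A sketch against the same v$session view (standard columns):

```sql
-- Blockers and waiters side by side.
SELECT sid, serial#, username, sql_id, event, blocking_session
FROM   v$session
WHERE  blocking_session IS NOT NULL               -- waiters
   OR  sid IN (SELECT blocking_session
               FROM   v$session
               WHERE  blocking_session IS NOT NULL);  -- blockers
```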

Step 2: Kill Blocking Session

ALTER SYSTEM KILL SESSION 'SID,SERIAL#' IMMEDIATE;

Step 3: Root Cause

  • Application not committing

  • Long transactions


πŸ’Ύ Case 3: Tablespace Full

Step 1: Check Usage

πŸ“

Storage → Tablespaces

Step 2: Add Space

ALTER DATABASE DATAFILE '/path/file.dbf'
RESIZE 10G;

OR

ALTER TABLESPACE users
ADD DATAFILE '/path/file02.dbf' SIZE 5G;
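To confirm which tablespaces are actually near threshold before adding space, a quick check against the standard metrics view (it accounts for autoextend headroom):

```sql
-- Percent used per tablespace, worst first.
SELECT tablespace_name,
       ROUND(used_percent, 1) AS used_pct
FROM   dba_tablespace_usage_metrics
ORDER  BY used_percent DESC;
```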

πŸ”₯ Case 4: CPU Spike

Step 1: Check Load

πŸ“

Performance → Top Activity

Step 2: Identify Cause

  • High SQL load

  • Batch jobs

  • Parallel queries


Step 3: Action

  • Tune SQL

  • Kill runaway sessions

  • Limit parallelism
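To pin the spike on specific statements outside OEM, a sketch against v$sqlstats (cpu_time is reported in microseconds; FETCH FIRST needs 12c+):

```sql
-- Top 10 SQL by total CPU since cursor load.
SELECT sql_id,
       ROUND(cpu_time / 1e6) AS cpu_sec,
       executions
FROM   v$sqlstats
ORDER  BY cpu_time DESC
FETCH FIRST 10 ROWS ONLY;
```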


🌐 Case 5: Listener / Connectivity Issue

Step 1: Check Listener Status

lsnrctl status

Step 2: Restart Listener

lsnrctl stop
lsnrctl start
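A blind stop/start can bounce a healthy listener, so a wrapper that restarts only when a status probe fails is safer. This is a sketch: the health check simply greps the probe output, and both commands are passed in as parameters so you can adapt them.

```shell
# Sketch: restart the listener only when a status probe fails.
# Typical invocation (as the oracle OS user, ORACLE_HOME set):
#   check_and_restart "lsnrctl status" "lsnrctl start"
check_and_restart() {
  status_cmd="$1"   # command whose output signals health
  start_cmd="$2"    # command to run when the probe fails
  if $status_cmd 2>&1 | grep -q "successfully"; then
    echo "listener OK"
  else
    echo "listener down - restarting"
    $start_cmd
  fi
}
```

On a healthy listener, `lsnrctl status` ends with "The command completed successfully", which is what the grep keys on.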

πŸ” 3. Alert Noise Reduction (VERY IMPORTANT)


πŸ”§ Configure Thresholds

πŸ“

Targets → Monitoring → Metric Settings

Best Practice:

  • Warning: 80%

  • Critical: 90%

  • Occurrence: 3


πŸ” Configure Incident Rules

πŸ“

Setup → Incidents → Incident Rules

Enable:

  • ✔ Add to existing incident

  • ✔ Event grouping


πŸ”• Configure Notifications

πŸ“

Setup → Notifications

Rule:

  • Only Critical alerts → Email


⛔ Configure Blackouts

πŸ“

Enterprise → Monitoring → Blackouts

Use During:

  • Patching

  • Maintenance
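Blackouts can also be scripted, which helps when patching many targets at once. A hedged EMCLI sketch (target name, reason, and schedule are placeholders; flag syntax varies by release, so verify with `emcli help create_blackout`):

```shell
# Create a 2-hour blackout on DB1 before patching (placeholders throughout).
emcli login -username=sysman
emcli create_blackout \
  -name="patching_db1" \
  -add_targets="DB1:oracle_database" \
  -reason="Monthly patching" \
  -schedule="duration:2:00"
```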


πŸ“Š 4. Daily Health Check (L2 Task)


✅ Check 1: Incident Summary

Enterprise → Incidents

✅ Check 2: DB Status

Targets → Databases

✅ Check 3: Tablespace Usage

Storage → Tablespaces

✅ Check 4: Backup Status

Availability → Backup Reports

✅ Check 5: Performance

Performance → Top Activity

πŸš€ 5. L3 Advanced Activities


πŸ”¬ AWR / ADDM Analysis

πŸ“

Performance → AWR → Reports
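If you need the AWR report outside the console, the same data is available via DBMS_WORKLOAD_REPOSITORY. Snapshot IDs 100/101 below are placeholders; list real ones from dba_hist_snapshot:

```sql
-- Plain-text AWR report between two snapshots.
SELECT output
FROM   TABLE(DBMS_WORKLOAD_REPOSITORY.awr_report_text(
         (SELECT dbid FROM v$database),  -- database id
         1,                              -- instance number
         100,                            -- begin snapshot (placeholder)
         101));                          -- end snapshot (placeholder)
```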

⚙️ SQL Tuning Advisor

πŸ“

Performance → SQL → Tuning Advisor

🧠 ASH Analytics

πŸ“

Performance → ASH Analytics

πŸ”„ Corrective Actions (Auto-Healing)

πŸ“

Metric Settings → Corrective Actions

Example:

lsnrctl start

🎯 6. SLA / Escalation Matrix

Severity | Action        | SLA
-------- | ------------- | ---------
Critical | Immediate fix | 15 mins
High     | Investigate   | 30 mins
Medium   | Monitor       | 2 hrs
Low      | Review        | Next day

🧠 7. Interview Questions (L2/L3)


❓ How do you troubleshoot an OEM alert?

Answer:

I check Incident Manager, identify alert type, drill into performance metrics, analyze SQL or system issue, and apply corrective action.


❓ How do you reduce alert noise?

Answer:

  • Tune thresholds

  • Configure incident rules

  • Use blackouts

  • Filter notifications


❓ What is the difference between an Event and an Incident?

Event     | Incident
--------- | --------------
Raw alert | Grouped issue

❓ What is your first step in a performance issue?

Answer:

Check Top Activity and wait events to identify bottleneck.


🏁 Final Production Mindset

“OEM is not just a monitoring tool…
It is your control tower for the entire database ecosystem.”



Screenshots of real OEM UI visuals

Here are text sketches of real OEM UI screens (based on Oracle docs and actual console layouts) so you can understand how they look in Oracle Enterprise Manager 13c.

I’ll walk you through the main screens with visual-style explanation (like screenshots) πŸ‘‡


πŸ–₯️ 1. OEM Dashboard (Home Page)

----------------------------------------------------------
| Enterprise Manager Console                            |
----------------------------------------------------------
| Targets Status | Incidents | Alerts | Performance     |
----------------------------------------------------------
| DB1  ✅ Up     | Incidents: 3 πŸ”΄                    |
| DB2  ⚠ Warning | CPU High                          |
| HOST1 ✅       | Tablespace 90%                    |
----------------------------------------------------------
| Top Activity | CPU | Memory | Sessions              |
----------------------------------------------------------
| Graphs showing load, sessions, SQL activity          |
----------------------------------------------------------

πŸ” What you see:

  • Overall DB status

  • Alerts summary

  • Performance graphs

πŸ‘‰ This is your first landing page


🚨 2. Incident Manager Screen (MOST IMPORTANT)

πŸ“ Navigation:
Enterprise → Monitoring → Incident Manager

----------------------------------------------------------
| Incident Manager                                      |
----------------------------------------------------------
| Summary:                                              |
| Open: 12 | Critical: 3 πŸ”΄ | Warning: 5 ⚠              |
----------------------------------------------------------
| Charts:                                               |
| - Incidents by Severity                               |
| - Incidents by Target                                 |
----------------------------------------------------------
| Incident List:                                        |
----------------------------------------------------------
| Time     | Target | Message             | Severity     |
----------------------------------------------------------
| 11:05    | DB1    | Tablespace Full     | CRITICAL πŸ”΄  |
| 11:06    | DB1    | Tablespace Full     | CRITICAL πŸ”΄  |
| 11:07    | DB1    | Tablespace Full     | CRITICAL πŸ”΄  |
----------------------------------------------------------

πŸ’‘ Key Point:

πŸ‘‰ With alert compression enabled

  • These 3 rows → become 1 incident

πŸ“Œ OEM groups events into incidents automatically (Oracle Documentation)


πŸ“Š 3. Incident Dashboard (Graph View)

----------------------------------------------------------
| Incident Dashboard                                   |
----------------------------------------------------------
| πŸ”΄ Critical: 3   ⚠ Warning: 5   β„Ή Info: 4              |
----------------------------------------------------------
| Pie Chart:                                            |
| DB Issues: 50%                                        |
| Host Issues: 30%                                      |
| Listener: 20%                                         |
----------------------------------------------------------
| Actions:                                              |
| [Acknowledge] [Assign] [Escalate]                     |
----------------------------------------------------------

πŸ” What this gives:

  • Visual distribution of issues

  • Quick filtering

πŸ‘‰ Dashboard auto-refreshes every ~30 sec (Oracle Documentation)


⚙️ 4. Metric & Threshold Configuration Screen

πŸ“ Navigation:
Target → Monitoring → Metric and Collection Settings

----------------------------------------------------------
| Metric Settings: CPU Utilization                     |
----------------------------------------------------------
| Warning Threshold: 80%                               |
| Critical Threshold: 90%                              |
| Occurrences: 3                                       |
----------------------------------------------------------
| Corrective Action:                                   |
| Script: restart_service.sh                           |
----------------------------------------------------------
| [Save]                                               |
----------------------------------------------------------

πŸ’‘ Important:

  • Occurrences = key for alert suppression

  • Prevents alert flapping


πŸ” 5. Incident Rules (Alert Compression Engine)

πŸ“ Navigation:
Setup → Incidents → Incident Rules

----------------------------------------------------------
| Rule Set: DB_ALERT_COMPRESSION                       |
----------------------------------------------------------
| Condition:                                           |
| Target Type = Database                              |
| Severity = Critical                                 |
----------------------------------------------------------
| Actions:                                             |
| ✔ Create Incident                                   |
| ✔ Add to existing open incident (IMPORTANT)         |
| ✔ Send Notification                                |
----------------------------------------------------------

πŸ”₯ This is the feature:

πŸ‘‰ That actually performs alert grouping/compression


πŸ”• 6. Notification Rules Screen

πŸ“ Navigation:
Setup → Notifications → Notification Rules

----------------------------------------------------------
| Notification Rule                                   |
----------------------------------------------------------
| Target: All Databases                               |
| Event Type: Metric Alert                            |
| Severity: Critical Only                             |
----------------------------------------------------------
| Actions:                                             |
| Send Email: dba@company.com                         |
----------------------------------------------------------

πŸ’‘ Result:

  • No spam emails

  • Only critical alerts sent


⛔ 7. Blackout Screen (Maintenance Mode)

πŸ“ Navigation:
Enterprise → Monitoring → Blackouts

----------------------------------------------------------
| Create Blackout                                     |
----------------------------------------------------------
| Target: DB1                                         |
| Start Time: 10:00                                   |
| Duration: 2 Hours                                  |
----------------------------------------------------------
| Options:                                            |
| ✔ Stop Monitoring                                  |
| ✔ Suppress Alerts                                  |
----------------------------------------------------------

🎯 How It All Connects (Architecture Flow)

DB Metric → Threshold Breach → Event Generated
                ↓
        Incident Rule Applied
                ↓
     Event Grouped (Compression)
                ↓
         Incident Created
                ↓
     Notification Sent (if needed)

πŸ‘‰ OEM uses agents + rules to convert events into incidents (Oracle)


🧠 Real DBA Understanding

When you see OEM UI:

Screen           | Purpose
---------------- | ---------------
Dashboard        | Overall health
Incident Manager | Actual issues
Metric Settings  | Control alerts
Incident Rules   | Reduce noise
Notifications    | Alert delivery

πŸš€ Pro Tip (From Production)

Most DBAs fail because:

  • They only look at Incidents

  • They never configure:

    • Thresholds

    • Rules

πŸ‘‰ Result = Alert storm



OEM 13.5 – Alert Noise Reduction (UI Configuration)



πŸ”§ 1. Configure Metric Thresholds (First & Most Important)

πŸ‘‰ This is where most alert noise comes from.

πŸ“ Navigation:

Targets → Databases → Select your DB → Monitoring → Metric and Collection Settings

πŸͺœ Steps:

  1. Search for a metric (e.g., CPU Utilization, Tablespace Used (%))

  2. Click Edit (pencil icon)

  3. Set thresholds properly:

    • Warning: e.g., 80%

    • Critical: e.g., 90%

  4. Set Occurrences (very important):

    • Example: Trigger only if 3 consecutive collections fail

  5. Click OK → Save

πŸ’‘ Tip:

  • Use “Occurrences > 1” to avoid false alerts (flapping)


πŸ” 2. Configure Incident Rules (Event Grouping / Compression)

πŸ‘‰ This is the real “alert compression engine”

πŸ“ Navigation:

Setup → Incidents → Incident Rules

πŸͺœ Steps:

  1. Click Create Rule Set

  2. Name it (e.g., DB_ALERT_COMPRESSION)

  3. Click Create Rule


πŸ”Ή Rule Configuration:

Condition:

  • Target Type = Database Instance

  • Severity = Critical / Warning

Actions:

  • ✔ Create Incident

  • ✔ Add to existing open incident (IMPORTANT)

  • ✔ Set Incident Priority

πŸ‘‰ This ensures:

Same issue → 1 incident instead of many alerts


πŸ”• 3. Configure Notification Rules (Avoid Spam Emails)

πŸ“ Navigation:

Setup → Notifications → Notification Rules

πŸͺœ Steps:

  1. Click Create

  2. Define:

    • Target (DB / Host)

    • Event Type (Metric Alert)

    • Severity (Critical only recommended)

  3. Configure:

    • Send email only for Critical

    • Suppress Warning alerts (optional)


πŸ’‘ Pro Tip:

  • Send:

    • Warning → Dashboard only

    • Critical → Email/SMS


⛔ 4. Configure Blackouts (Maintenance Mode)

πŸ‘‰ Prevent alerts during planned work

πŸ“ Navigation:

Enterprise → Monitoring → Blackouts

πŸͺœ Steps:

  1. Click Create Blackout

  2. Select Target (DB / Host)

  3. Define:

    • Duration (e.g., 2 hours)

  4. Enable:

    • ✔ Stop monitoring

    • ✔ Suppress alerts


✅ Result:

No alerts during:

  • Patching

  • Restart

  • Maintenance


πŸ”„ 5. Configure Corrective Actions (Auto-Healing)

πŸ‘‰ Stops repeated alerts automatically

πŸ“ Navigation:

Targets → Database → Monitoring → Metric and Collection Settings

πŸͺœ Steps:

  1. Select a metric (e.g., Listener Down)

  2. Click Edit

  3. Go to Corrective Actions

  4. Add script:

Example:

lsnrctl start

πŸ’‘ Result:

  • Issue auto-fixed

  • Alert doesn’t repeat


πŸ” 6. Enable Event De-duplication & Correlation

πŸ‘‰ Mostly automatic but configurable

πŸ“ Navigation:

Setup → Incidents → Incident Rules → Advanced Settings

πŸͺœ Steps:

  1. Enable:

    • ✔ Event de-duplication

    • ✔ Event correlation

  2. Define time window (e.g., 5–10 mins)


πŸ’‘ Example:

  • Same alert every minute
    ➡️ Only one incident shown


πŸ“Š 7. Validate Configuration

πŸ“ Navigation:

Enterprise → Monitoring → Incidents

Check:

  • Alerts grouped properly

  • No duplicate incidents

  • Reduced alert count


🎯 Real Production Setup (Recommended)

Feature              | Setting
-------------------- | --------------
Threshold Occurrence | 3
Incident Grouping    | Enabled
Notifications        | Critical only
Blackouts            | Mandatory
Auto-healing         | Enabled

🧠 Interview-Ready Answer

πŸ‘‰ “How do you reduce alert noise in OEM?”

Answer:

“I tune metric thresholds with occurrence settings, configure incident rules to group alerts, use notification filtering to avoid unnecessary emails, apply blackouts during maintenance, and enable corrective actions to auto-resolve recurring issues.”


πŸš€ Final Thought

“OEM is powerful… but without tuning, it becomes noisy.
A good DBA makes OEM quiet but intelligent.”



Alert Compression / Noise Reduction in Oracle Enterprise Manager 13c


🧠 What is “Alert Compression”?

OEM doesn’t use the exact term “compression” officially, but in practice it means:

πŸ‘‰ Reducing duplicate, repetitive, or noisy alerts into fewer meaningful alerts


⚡ Why It’s Needed

Without this:

  • Same issue → 100+ alerts

  • DBAs get flooded

  • Real issues get missed

With alert compression:

  • Duplicate alerts → grouped / suppressed

  • Only actionable alerts remain


πŸ”§ Key Features That Enable Alert Compression

1️⃣ Incident Rules (Event Compression Engine)

πŸ‘‰ Core mechanism behind alert reduction

What it does:

  • Groups multiple events into a single incident

  • Prevents duplicate alerts

Example:

  • 10 tablespace alerts
    ➡️ 1 incident instead of 10 alerts


2️⃣ Event De-duplication

OEM automatically:

  • Detects same event repeating

  • Suppresses repeated notifications

πŸ‘‰ Example:

  • “CPU high” every minute
    ➡️ Only one alert generated


3️⃣ Event Correlation

πŸ‘‰ Combines related alerts into one

Example:

  • DB down

  • Listener down

  • Host down

➡️ OEM shows one root incident


4️⃣ Metric Threshold Suppression

πŸ‘‰ Avoids alert flapping

How:

  • Warning/Critical thresholds

  • Clear condition required before re-alert


5️⃣ Blackouts (Temporary Alert Suppression)

πŸ‘‰ Used during maintenance

Patch window → No alerts triggered

6️⃣ Notification Rules Filtering

πŸ‘‰ Only send alerts when needed

  • Based on severity

  • Based on target

  • Based on time


7️⃣ Corrective Actions (Auto-Healing)

πŸ‘‰ Prevents repeated alerts

Example:

  • Listener down
    ➡️ Auto restart script
    ➡️ No repeated alerts


πŸ“Š Real Example (Before vs After)

❌ Without Alert Compression:

  • 50 alerts for:

    • Tablespace full

    • CPU spike

    • Session blocking


✅ With OEM Features:

  • 1 incident for tablespace

  • 1 incident for CPU

  • 1 incident for blocking

πŸ‘‰ Huge noise reduction


🎯 Best Practices (Production)

✅ 1. Use Incident Rules

  • Group related alerts

  • Define severity properly


✅ 2. Tune Thresholds

Avoid:

  • Too sensitive alerts

  • Too many false positives


✅ 3. Enable Blackouts

During:

  • Patching

  • Maintenance


✅ 4. Use Corrective Actions

Automate:

  • Restart services

  • Clear temp issues


🧠 Interview Questions


❓ What is alert compression in OEM?

Answer:

It is the process of reducing duplicate or repetitive alerts using incident rules, event correlation, and suppression mechanisms.


❓ How does OEM avoid alert flooding?

Answer:

  • Incident rules

  • Event de-duplication

  • Threshold tuning

  • Blackouts


❓ What is the difference between an Event and an Incident?

Event     | Incident
--------- | -------------------------
Raw alert | Grouped actionable alert
Many      | Few

❓ How do you reduce alert noise in OEM?

Answer:

  • Tune thresholds

  • Configure incident rules

  • Use blackouts

  • Enable auto corrective actions


πŸš€ Final Thought

“A good DBA doesn’t monitor more alerts…
They monitor fewer, smarter alerts.”



Thursday, April 16, 2026

One-Page Quick Execution Checklist: Rename PDB (PDB_OLD → PDB_NEW)

 Below are two formats: first a one-page quick execution checklist (ideal for night deployments), followed by a scripted automation version you can adapt.


✅ One-Page Quick Execution Checklist (Night Deployment)

Change: Rename PDB (PDB_OLD → PDB_NEW)


πŸ”Ή Pre-Checks (Must Complete Before Start)

  •  Data Guard healthy

SELECT database_role, open_mode FROM v$database;
SELECT dest_id, status, error FROM v$archive_dest;

  •  No redo/apply lag

SELECT process, status FROM v$managed_standby;

  •  PDB exists and is open

SELECT name, open_mode FROM v$pdbs;

  •  Backup / Restore point created

CREATE RESTORE POINT before_pdb_rename GUARANTEE FLASHBACK DATABASE;

πŸ”Ή Execution Steps

1. Restrict PDB (Primary)

ALTER PLUGGABLE DATABASE PDB_OLD CLOSE;
ALTER PLUGGABLE DATABASE PDB_OLD OPEN RESTRICTED;

2. Close PDB (Standby)

ALTER PLUGGABLE DATABASE PDB_OLD CLOSE;

3. Rename PDB (Primary)

ALTER SESSION SET CONTAINER=PDB_OLD;
ALTER PLUGGABLE DATABASE RENAME GLOBAL_NAME TO PDB_NEW;

4. Restart PDB (Primary)

ALTER PLUGGABLE DATABASE CLOSE IMMEDIATE;
ALTER PLUGGABLE DATABASE OPEN;

5. Monitor Standby Apply

SELECT process, status, sequence# FROM v$managed_standby;
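The process view shows apply is running; the lag view confirms it is keeping up before you proceed:

```sql
-- Transport and apply lag on the standby (should be near zero).
SELECT name, value
FROM   v$dataguard_stats
WHERE  name IN ('transport lag', 'apply lag');
```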

6. Validate on Standby

SELECT name FROM v$pdbs;

7. Open PDB (Standby)

ALTER PLUGGABLE DATABASE PDB_NEW OPEN READ ONLY;

πŸ”Ή Post-Validation

  •  PDB name updated on Primary

  •  PDB name updated on Standby

  •  Redo apply running

  •  Application connectivity OK

  •  Services updated


πŸ”΄ Rollback (Quick)

If issue occurs:

ALTER SESSION SET CONTAINER=PDB_NEW;
ALTER PLUGGABLE DATABASE RENAME GLOBAL_NAME TO PDB_OLD;
ALTER PLUGGABLE DATABASE CLOSE IMMEDIATE;
ALTER PLUGGABLE DATABASE OPEN;


⚙️ Scripted Automation Version (SQL + Bash Style)

You can adapt this for manual execution or automation tools (Ansible/Shell).


πŸ”Ή Primary Script (rename_pdb_primary.sql)

-- Pre-check
SELECT name, open_mode FROM v$pdbs;

-- Restrict PDB
ALTER PLUGGABLE DATABASE PDB_OLD CLOSE;
ALTER PLUGGABLE DATABASE PDB_OLD OPEN RESTRICTED;

-- Rename
ALTER SESSION SET CONTAINER=PDB_OLD;
ALTER PLUGGABLE DATABASE RENAME GLOBAL_NAME TO PDB_NEW;

-- Restart
ALTER PLUGGABLE DATABASE CLOSE IMMEDIATE;
ALTER PLUGGABLE DATABASE OPEN;

-- Validate
SELECT name, open_mode FROM v$pdbs;

πŸ”Ή Standby Script (rename_pdb_standby.sql)

-- Close old PDB
ALTER PLUGGABLE DATABASE PDB_OLD CLOSE;

-- Monitor apply
SELECT process, status, sequence# FROM v$managed_standby;

-- Validate rename
SELECT name FROM v$pdbs;

-- Open new PDB
ALTER PLUGGABLE DATABASE PDB_NEW OPEN READ ONLY;

πŸ”Ή Optional Bash Wrapper

#!/bin/bash

ORACLE_SID=CDB1
export ORACLE_SID

echo "Starting PDB Rename ΰ€ͺ्ΰ€°ΰ€•्ΰ€°िΰ€―ा..."

sqlplus / as sysdba <<EOF

@rename_pdb_primary.sql

EXIT;
EOF

echo "Primary completed. Verify standby manually."

πŸ”Ή Rollback Script (rollback_pdb.sql)

ALTER SESSION SET CONTAINER=PDB_NEW;

ALTER PLUGGABLE DATABASE RENAME GLOBAL_NAME TO PDB_OLD;

ALTER PLUGGABLE DATABASE CLOSE IMMEDIATE;
ALTER PLUGGABLE DATABASE OPEN;

SELECT name FROM v$pdbs;

πŸ’‘ Pro Tips (From Real Deployments)

  • Always keep one session monitoring standby apply

  • Keep alert logs open (tail -f) on both servers

  • Don’t rush after rename—wait for redo sync

  • If using services, update them immediately after rename
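For that last tip, a sketch of re-pointing an application service inside the renamed PDB (the service name app_svc is a hypothetical placeholder; RAC environments would use srvctl instead of DBMS_SERVICE):

```sql
-- Recreate and start the app service inside the renamed PDB.
ALTER SESSION SET CONTAINER = PDB_NEW;
EXEC DBMS_SERVICE.CREATE_SERVICE(service_name => 'app_svc', network_name => 'app_svc');
EXEC DBMS_SERVICE.START_SERVICE('app_svc');
```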