ORA-01034: ORACLE Not Available – Database Down After Server Reboot
Incident Summary
Incident Priority: P1 (Critical)
Error Reported:
ORA-01034: ORACLE not available
Impact:
Production database was unavailable.
All application users were unable to connect.
Business transactions stopped.
Multiple application timeout alerts were generated.
Environment
Oracle Database: 19c
Operating System: Oracle Linux / RHEL
Environment: Production
Architecture: Single Instance (Applicable to RAC with additional cluster checks)
Symptoms
Application team reported:
ORA-01034: ORACLE not available
ORA-27101: Shared memory realm does not exist
Database connection failures
OEM alerts showing database status as Down
Initial Response
Step 1: Acknowledge the P1 Incident
Acknowledge the ServiceNow/Incident ticket immediately.
Inform stakeholders that investigation has started.
Join the bridge call if applicable.
Step 2: Verify Database Server Accessibility
SSH into the production server:
ssh oracle@prod-db-server
Verify server uptime:
uptime
Check whether the server was recently rebooted:
last reboot
Observation: The server had rebooted after scheduled OS patching.
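The reboot check can also be scripted for use in a runbook or monitoring job. This is a minimal sketch, assuming a Linux host that exposes /proc/uptime; the function names uptime_minutes and recently_rebooted and the 60-minute default are illustrative and not part of the original procedure.

```shell
#!/usr/bin/env bash
# Print whole minutes since boot. Reads /proc/uptime by default;
# a different file can be passed as $1 (useful for testing).
uptime_minutes() {
    local src="${1:-/proc/uptime}"
    # The first field is seconds since boot, e.g. "3600.52 7100.10".
    awk '{ printf "%d\n", $1 / 60 }' "$src"
}

# Succeeds when the host booted fewer than $1 minutes ago (default 60).
# $2 optionally overrides the uptime source file.
recently_rebooted() {
    local limit="${1:-60}"
    [ "$(uptime_minutes "${2:-/proc/uptime}")" -lt "$limit" ]
}
```

Example: recently_rebooted 120 && echo "host rebooted within the last two hours"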
Step 3: Verify Database Status
Set the Oracle environment (oraenv prompts for the SID and offers the exported value as the default):
export ORACLE_SID=PRODDB
. oraenv
Check PMON process:
ps -ef | grep pmon
No ora_pmon_PRODDB process was listed (the grep command itself may appear in the output), confirming that the instance was down.
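The PMON check can be made exact so that it never matches the grep process itself. The sketch below relies on the ora_pmon_<SID> process naming used by database instances; the function names running_sids and instance_is_up are hypothetical helpers introduced for illustration.

```shell
#!/usr/bin/env bash
# Extract instance SIDs from process command lines read on stdin.
# A database instance's PMON process appears as ora_pmon_<SID>.
running_sids() {
    sed -n 's/^ora_pmon_\([A-Za-z0-9_]*\)$/\1/p'
}

# Succeeds when the SID given as $1 has a PMON process on this host.
instance_is_up() {
    ps -eo args= | running_sids | grep -qxF -- "$1"
}
```

Example: instance_is_up PRODDB || echo "PRODDB instance is down"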
Step 4: Review the Alert Log
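Locate the trace directory. SHOW PARAMETER diagnostic_dest needs a running instance and would itself fail with ORA-01034 at this point, so use the default ADR location under $ORACLE_BASE (diag/rdbms/<db_unique_name>/<SID>/trace), assuming diagnostic_dest was left at its default:
cd $ORACLE_BASE/diag/rdbms/proddb/PRODDB/trace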
Review the last 100 lines of the alert log:
tail -100 alert_PRODDB.log
Observation:
No corruption or startup failure errors.
Database had shut down cleanly before the OS reboot.
No automatic startup attempt after the reboot.
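Scanning the tail of the alert log for ORA- errors can be automated instead of reading it by eye. This is a rough sketch; the function name recent_ora_errors is an assumption, and it only looks for the ORA-nnnnn pattern, so it complements rather than replaces a manual review.

```shell
#!/usr/bin/env bash
# Print ORA- error lines found in the last N lines of an alert log.
# $1 = alert log path, $2 = number of trailing lines to scan (default 100).
# Exit status is 0 when at least one error line is found, 1 when none are.
recent_ora_errors() {
    local log="$1" lines="${2:-100}"
    tail -n "$lines" "$log" | grep -E 'ORA-[0-9]{5}'
}
```

Example: recent_ora_errors alert_PRODDB.log || echo "no ORA- errors in the last 100 lines"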
Step 5: Verify Listener Status
lsnrctl status
Listener was running successfully.
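A running listener does not by itself mean the database service is registered with it. With dynamic registration, the service appears in the lsnrctl status output only once the instance is up. The sketch below parses that output for the Service "NAME" has N instance(s) lines; the function name registered_services is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Read `lsnrctl status` output on stdin and print the registered service names,
# taken from lines of the form: Service "NAME" has 1 instance(s).
registered_services() {
    sed -n 's/^Service "\([^"]*\)" has [0-9][0-9]* instance(s).*$/\1/p'
}
```

Example: lsnrctl status | registered_services | grep -qx PRODDB && echo "PRODDB is registered"

After a manual STARTUP, dynamic registration can take up to about a minute; ALTER SYSTEM REGISTER; forces it immediately.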
Step 6: Start the Database
Connect as SYSDBA:
sqlplus / as sysdba
Start the database:
STARTUP;
Verify status:
SELECT STATUS FROM V$INSTANCE;
SELECT OPEN_MODE FROM V$DATABASE;
Expected Output:
STATUS : OPEN
OPEN_MODE : READ WRITE
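The two verification queries can be combined into a single non-interactive check. This is a sketch, assuming ORACLE_HOME and ORACLE_SID are already set and OS authentication (/ as sysdba) works; the function names query_db_state and state_is_healthy are illustrative only.

```shell
#!/usr/bin/env bash
# Print "<instance status>|<open mode>" using one silent SQL*Plus session.
# Assumes ORACLE_HOME and ORACLE_SID are set and OS authentication works.
query_db_state() {
    sqlplus -s / as sysdba <<'SQL'
SET HEADING OFF FEEDBACK OFF PAGESIZE 0
SELECT i.status || '|' || d.open_mode FROM v$instance i, v$database d;
EXIT
SQL
}

# Succeeds only when the first non-blank line of $1 is exactly "OPEN|READ WRITE".
state_is_healthy() {
    local state
    state=$(printf '%s\n' "$1" \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' \
        | head -n 1)
    [ "$state" = "OPEN|READ WRITE" ]
}
```

Example: state_is_healthy "$(query_db_state)" && echo "database is open read write"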
Step 7: Application Validation
Ask the application team to retry connections.
Verify business transactions.
Confirm application functionality.
Issue resolved successfully.
Root Cause Analysis (RCA)
The production server was rebooted after operating system patching.
Although the listener started automatically, Oracle Database auto-start was not configured.
The /etc/oratab entry was:
PRODDB:/u01/app/oracle/product/19c/dbhome_1:N
The last field is the auto-start flag that the dbstart and dbshut scripts read. Because it was N, the database did not start automatically after the reboot.
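The flag can be checked for any SID without opening the file by hand. A minimal sketch, assuming the standard SID:ORACLE_HOME:flag layout of /etc/oratab; the function name oratab_flag is hypothetical.

```shell
#!/usr/bin/env bash
# Print the auto-start flag (Y or N) for a SID from an oratab-style file.
# $1 = SID, $2 = oratab path (default /etc/oratab). Comment lines are skipped.
oratab_flag() {
    awk -F: -v sid="$1" \
        '$0 !~ /^[ \t]*#/ && $1 == sid { print $3; exit }' \
        "${2:-/etc/oratab}"
}
```

Example: [ "$(oratab_flag PRODDB)" = "Y" ] || echo "PRODDB is not flagged for auto-start"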
Permanent Fix
Update /etc/oratab
Change:
PRODDB:/u01/app/oracle/product/19c/dbhome_1:N
To:
PRODDB:/u01/app/oracle/product/19c/dbhome_1:Y
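The change can be applied with sed rather than a manual edit, which keeps a backup and touches only the intended entry. This is a sketch under the assumption that the entry ends in :N with no trailing comment; the function name enable_autostart is illustrative.

```shell
#!/usr/bin/env bash
# Set the auto-start flag to Y for one SID in an oratab-style file.
# $1 = SID, $2 = oratab path (default /etc/oratab). Keeps a .bak copy.
enable_autostart() {
    local sid="$1" file="${2:-/etc/oratab}"
    # Only the line that starts with "<SID>:" and ends in ":N" is changed.
    sed -i.bak "s|^\(${sid}:.*\):N[[:space:]]*\$|\1:Y|" "$file"
}
```

Example: enable_autostart PRODDB /etc/oratab, then compare /etc/oratab with /etc/oratab.bak before removing the backup.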
Enable Oracle Auto Startup Script
Changing the flag is not enough by itself: on Linux systems, a boot-time service (an init script or a systemd unit that calls dbstart and dbshut) must also be installed and enabled.
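One possible shape for such a service is a systemd unit that calls dbstart at boot and dbshut at shutdown. The sketch below only prints a unit file. The oracle user, the oinstall group, the unit name oracle-db.service, and the choice of Type=oneshot with RemainAfterExit=yes are assumptions to adapt, not settings taken from this incident.

```shell
#!/usr/bin/env bash
# Print a systemd unit that runs dbstart at boot and dbshut at shutdown.
# $1 = ORACLE_HOME. The oracle user and oinstall group are assumptions.
render_oracle_unit() {
    local home="$1"
    cat <<EOF
[Unit]
Description=Oracle Database auto-start (dbstart/dbshut)
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
User=oracle
Group=oinstall
Environment=ORACLE_HOME=${home}
ExecStart=${home}/bin/dbstart ${home}
ExecStop=${home}/bin/dbshut ${home}

[Install]
WantedBy=multi-user.target
EOF
}
```

Example (as root): render_oracle_unit /u01/app/oracle/product/19c/dbhome_1 > /etc/systemd/system/oracle-db.service, followed by systemctl daemon-reload and systemctl enable oracle-db.service. Limits defined in /etc/security/limits.conf may not apply to a systemd service, so directives such as LimitNOFILE may need to be added to the unit.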
Validate
Perform a controlled server reboot and confirm:
Listener starts automatically.
Database starts automatically.
Application connectivity is restored without manual intervention.
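During the controlled reboot, the checks above can be polled rather than run once, since the database may take a few minutes to open after the host comes back. This is a generic sketch; the function name wait_until and the timeout values in the example are illustrative.

```shell
#!/usr/bin/env bash
# Run a check command repeatedly until it succeeds or the timeout expires.
# $1 = timeout in seconds, $2 = seconds between attempts, rest = the command.
wait_until() {
    local timeout="$1" interval="$2" waited=0
    shift 2
    while ! "$@"; do
        waited=$((waited + interval))
        # Give up once the next wait would exceed the timeout.
        [ "$waited" -gt "$timeout" ] && return 1
        sleep "$interval"
    done
    return 0
}
```

Example: wait_until 300 10 sh -c 'ps -eo args= | grep -qx ora_pmon_PRODDB' && echo "PRODDB instance started"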
Commands Used During Troubleshooting
uptime
last reboot
ps -ef | grep pmon
lsnrctl status
tail -100 alert_PRODDB.log
sqlplus / as sysdba
STARTUP;
SELECT STATUS FROM V$INSTANCE;
SELECT OPEN_MODE FROM V$DATABASE;
Lessons Learned
Always verify Oracle auto-start configuration after installation or cloning.
Review the alert log first to identify the root cause.
Validate database startup after every OS patching activity.
Maintain clear communication with application and infrastructure teams during P1 incidents.
Record the complete RCA, resolution steps, and preventive actions in the incident management tool (e.g., ServiceNow).
Interview Answer
Question: Tell us about a production issue you resolved.
Answer:
"During a scheduled OS patching activity, the production server rebooted successfully, but the database did not come up automatically. The application team reported ORA-01034 errors and users were unable to access the application. I immediately acknowledged the P1 incident, connected to the server, verified that the PMON process was not running, reviewed the alert log, and confirmed there were no database errors. I identified that the database auto-start was not configured because the /etc/oratab entry was set to 'N'. I manually started the database, validated application connectivity, and then permanently fixed the issue by enabling Oracle auto-start. The entire incident was resolved in approximately 17 minutes with no data loss."