Namenode Not Exiting Safe Mode

Applies To: Hadoop HDFS NameNode 
Category: Troubleshooting → HDFS 

Issue Summary 

The HDFS NameNode remains in safemode even after startup, preventing write operations to HDFS and signaling that the cluster is not fully healthy. Safemode usually means the NameNode is still waiting for a sufficient percentage of blocks to be reported by the DataNodes. 
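
To confirm that the NameNode is still in safemode (rather than just passing through it briefly at startup), check its current status with the standard dfsadmin command: 

hdfs dfsadmin -safemode get 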

Possible Cause(s)  

Common reasons why this issue may occur: 

  1. Insufficient number of DataNodes reporting blocks. 

  2. DataNode failures or slow DataNode startups. 

  3. Misconfigured dfs.namenode.safemode.threshold-pct (percentage of blocks required). 

  4. Network issues preventing DataNodes from communicating with the NameNode. 

  5. Disk space full on DataNodes, preventing block reports. 

Step-by-Step Resolution  

  1. Check NameNode Logs: 

  • Examine the NameNode logs. Look for messages about entering/exiting safemode, block reports received, and any errors related to block corruption or DataNode communication. 

cat $HADOOP_HOME/logs/hadoop-hadoop-namenode-<hostname>.log  

  • Look for lines like Exiting Safe mode or Leaving safe mode to see whether it is attempting to exit. 

cat $HADOOP_HOME/logs/hadoop-hadoop-namenode-<hostname>.log | grep -E "Exiting Safe mode|Leaving safe mode" 
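
The NameNode log also reports how many blocks are still outstanding before the safemode threshold is reached; the exact wording varies between Hadoop versions, but a search along these lines usually finds it: 

grep -i "reported blocks" $HADOOP_HOME/logs/hadoop-hadoop-namenode-<hostname>.log 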

  2. Check DataNode Logs: 

  • Read the DataNode logs at: 

cat $HADOOP_HOME/logs/hadoop-hadoop-datanode-<hostname>.log
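
To narrow the output, filter for errors and block report activity (log message wording may differ slightly across versions): 

grep -iE "error|exception|block report" $HADOOP_HOME/logs/hadoop-hadoop-datanode-<hostname>.log 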

  3. Check DataNode Status: 

  • Access the NameNode UI and verify that a sufficient number of DataNodes are live and registered. 

  • Check for any DataNodes listed as dead or unhealthy. 

http://<hostname>:9870/dfshealth.html#tab-datanode 
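
If the UI is not reachable, the same information is available from the command line; the dfsadmin report lists live and dead DataNodes along with their capacity and last contact time: 

hdfs dfsadmin -report 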

  4. Inspect Missing/Corrupted Blocks: 

  • On the NameNode UI, check the Summary section for Missing Blocks or Corrupt Blocks. 

  • If missing blocks are present, identify the files causing them. These files may need to be deleted or recovered if possible. 

hdfs fsck / -files -blocks -locations 
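
To focus on just the affected files, fsck can list the corrupt file blocks directly; as a last resort, the -delete option permanently removes the corrupted files, so use it only after confirming they cannot be recovered: 

hdfs fsck / -list-corruptfileblocks 

hdfs fsck <path-to-affected-file> -delete 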

  5. Verify DataNode Connectivity: 

  • Ensure DataNodes can communicate with the NameNode. 

To check connectivity from a DataNode to the NameNode: 

ping <IP-address-of-namenode> 

telnet <IP-address-of-namenode> 8020 
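
The RPC port depends on your configuration (8020 is a common default, though some deployments use 9000); you can confirm the address clients are configured to use with: 

hdfs getconf -confKey fs.defaultFS 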

  • Check DataNode logs for connection errors to the NameNode. 

cat $HADOOP_HOME/logs/hadoop-hadoop-datanode-<hostname>.log 

  6. Force Exit Safemode (with caution): 

  • Only do this if you understand the implications and are certain no critical data will be lost. Forcing an exit when blocks are truly missing can lead to data loss. 

To leave the safe mode: 

hdfs dfsadmin -safemode leave 
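
Afterwards, confirm that safemode is actually off; in scripts, the wait option can be used instead, as it blocks until the NameNode leaves safemode: 

hdfs dfsadmin -safemode get 

hdfs dfsadmin -safemode wait 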

  7. Adjust Safemode Threshold (if necessary): 

  • If you have a very small cluster or a specific use case, you might temporarily lower dfs.namenode.safemode.threshold-pct in hdfs-site.xml to allow the NameNode to exit safemode with fewer reported blocks. This is generally not recommended for production. 

Edit hdfs-site.xml: 

<property> 
  <name>dfs.namenode.safemode.threshold-pct</name> 
  <value>0.999f</value> 
  <description>Specifies the percentage of blocks that should satisfy the minimal replication requirement defined by dfs.namenode.replication.min. Values less than or equal to 0 mean not to wait for any particular percentage of blocks before exiting safemode. Values greater than 1 will make safe mode permanent.</description> 
</property> 
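
After restarting the NameNode with the new value, you can confirm the setting in effect on that host (getconf reads the local configuration files): 

hdfs getconf -confKey dfs.namenode.safemode.threshold-pct 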

  8. Restart DataNodes (if they are stuck): 

  • If DataNodes are sluggish or have issues reporting blocks, try restarting them one by one. 

hadoop-daemon.sh stop datanode 

hadoop-daemon.sh start datanode 
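
Note that hadoop-daemon.sh is deprecated in Hadoop 3.x; on those versions the equivalent commands are typically: 

hdfs --daemon stop datanode 

hdfs --daemon start datanode 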

Additional Notes: 

  • Safemode is a protective measure. The NameNode enters it to prevent data corruption when the cluster block replica count is below the configured threshold. 

  • The ideal solution is to bring up all DataNodes and ensure all blocks are reported successfully. 

  • In an HA (High Availability) setup, safemode behavior is managed differently, as there are active and standby NameNodes; see the example checks below. 
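
For example, in an HA cluster you can check each NameNode's role and then the reported safemode state; the commands below assume the NameNode IDs are nn1 and nn2, as defined in dfs.ha.namenodes.<nameservice>: 

hdfs haadmin -getServiceState nn1 

hdfs haadmin -getServiceState nn2 

hdfs dfsadmin -safemode get 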

Related Articles 

  • Standby Namenode Startup Failures After a Recent System Crash 

  • Critical Configuration Properties for HDFS, YARN, Spark, and Other Hadoop Components 

  • Managing HDFS Space and Replication 

  • Troubleshooting YARN Application Failures 

  • Resolving Delayed DataNode Initialization: Effective Strategies 