Category: Troubleshooting → Logging and monitoring
Applies To: Hadoop HA clusters.
Issue Summary
In a distributed Hadoop HA cluster, component logs are the primary source of truth for monitoring system health, diagnosing failures, and troubleshooting performance issues. This document outlines the key log files, their locations, and how to use them effectively for clusters with High Availability.
Common Log File Locations
All Hadoop component logs are typically located in the $HADOOP_HOME/logs directory on their respective nodes.
The log file name follows the format:
hadoop-<user>-<component>-<hostname>.log.
For example, on a master node named master1, the NameNode log would be $HADOOP_HOME/logs/hadoop-hadoop-namenode-master1.log.
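For example, to see which daemon logs exist on a node and follow one in real time, commands like the following can be used (a minimal sketch assuming the default "hadoop" user and the master1 hostname from the example above):

    # list all component logs on this node
    ls $HADOOP_HOME/logs/
    # follow the NameNode log as new entries arrive
    tail -f $HADOOP_HOME/logs/hadoop-hadoop-namenode-master1.log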
Step-by-Step Resolution: Log File Breakdown by Node Role
NameNode Log
Logs are generated in $HADOOP_HOME/logs/hadoop-hadoop-namenode-<namenode_hostname>.log
What it logs:
The complete lifecycle of the HDFS namespace, including block management, DataNode heartbeats, file system operations, and most importantly, the synchronization state with the other NameNode.
Use for troubleshooting:
Failover Events: On both master nodes, this log will show when a NameNode transitions from active to standby or vice versa. Look for messages like "Transitioning to active state" or "Transitioning to standby state" (see the example commands after this list).
Synchronization: On the standby NameNode, this log confirms it is successfully reading EditLog transactions from the JournalNodes to stay in sync with the active NameNode.
Safemode: Tracks when a NameNode enters or exits safemode, which is a key indicator of cluster health.
DataNode Health: Records heartbeats from DataNodes. Missing heartbeats from a DataNode will be logged here, indicating a potential worker node failure.
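A quick way to review recent failover transitions and safemode activity is to search the NameNode log directly; the commands below are a sketch assuming the default log path and the master1 hostname, and the dfsadmin call reports the live safemode state:

    # find state transitions recorded by this NameNode
    grep -iE "transitioning to (active|standby) state" $HADOOP_HOME/logs/hadoop-hadoop-namenode-master1.log
    # show the most recent safemode messages
    grep -i "safemode" $HADOOP_HOME/logs/hadoop-hadoop-namenode-master1.log | tail -20
    # confirm the current safemode status from the cluster itself
    hdfs dfsadmin -safemode get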
JournalNode Log
Logs are generated in $HADOOP_HOME/logs/hadoop-hadoop-journalnode-<journalnode_hostname>.log
What it logs:
These logs track the writes from the active NameNode and reads from the standby NameNode, ensuring consistent metadata across the cluster.
Use for troubleshooting:
EditLog Write Errors: If the active NameNode fails to write EditLog entries to the JournalNodes, it will be logged here. This is a critical issue that can prevent failover (see the example after this list).
Standby Read Errors: Shows if the standby NameNode is having trouble reading from the JournalNodes, which would prevent it from staying in sync and taking over as active.
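To spot EditLog write or read problems quickly, scan the JournalNode log for warnings and errors; a sketch assuming the default log location and a JournalNode running on a host named master1:

    # surface the latest warnings and errors from this JournalNode
    grep -E "WARN|ERROR|FATAL" $HADOOP_HOME/logs/hadoop-hadoop-journalnode-master1.log | tail -20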
ZKFC Log
Logs are generated in $HADOOP_HOME/logs/hadoop-hadoop-zkfc-<namenode_hostname>.log
What it logs:
The ZKFC (ZooKeeper Failover Controller) manages the automatic failover process. Its log provides details on communication with ZooKeeper and the NameNode's health checks.
Use for troubleshooting:
ZooKeeper Session: Verifies that the ZKFC has a healthy session with the ZooKeeper quorum.
Health Checks: Logs the results of periodic health checks on the NameNode. If a NameNode is unhealthy, the ZKFC will attempt a failover.
Failover Attempts: Detailed logs of failover attempts, including election results and fencing actions. This is the first place to check if an automatic failover fails (see the example after this list).
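When an automatic failover fails, searching the ZKFC log for election, fencing, and health-check messages usually narrows down the cause; exact message text varies by Hadoop version, so the search terms below are only illustrative, and nn1/nn2 are placeholder NameNode IDs from hdfs-site.xml:

    # look for fencing, election, and health-check activity around the failure
    grep -iE "fenc|elect|health" $HADOOP_HOME/logs/hadoop-hadoop-zkfc-master1.log | tail -30
    # confirm which NameNode is currently active and which is standby
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2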
ResourceManager Log
Logs are generated in $HADOOP_HOME/logs/hadoop-hadoop-resourcemanager-<resourcemanager_hostname>.log
What it logs:
The ResourceManager tracks overall cluster resources, application submissions, and the health of all NodeManagers.
Use for troubleshooting:
Application Lifecycle: Tracks jobs from submission to completion. If a job is stuck, this log will show if it's due to resource limitations.
NodeManager Health: Records heartbeats from NodeManagers. A lack of heartbeats indicates a worker node is down (see the example after this list).
Resource Scheduling: Provides insight into how resources are allocated to applications.
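The yarn CLI can confirm what the ResourceManager log reports about NodeManager health; the grep below is only illustrative because the exact heartbeat-expiry wording varies by version, and master1 is an assumed ResourceManager host:

    # list all NodeManagers and their states (RUNNING, LOST, UNHEALTHY, ...)
    yarn node -list -all
    # look for NodeManagers whose heartbeats lapsed
    grep -i "expired" $HADOOP_HOME/logs/hadoop-hadoop-resourcemanager-master1.log | tail -20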
DataNode Log
Logs are generated in $HADOOP_HOME/logs/hadoop-hadoop-datanode-<datanode_hostname>.log
What it logs:
Data block storage, serving read/write requests, and communication with the NameNode.
Use for troubleshooting:
Block Corruption: Reports if any data blocks on the local disk are corrupted (see the example after this list).
Disk I/O Errors: Records any issues with the local disks where data blocks are stored.
Heartbeats: Confirms that heartbeats are being successfully sent to the NameNode.
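Corruption reported in a DataNode log can be cross-checked cluster-wide with fsck; a sketch assuming the default log path and a worker host named worker1 (a placeholder):

    # report any blocks HDFS currently considers corrupt
    hdfs fsck / -list-corruptfileblocks
    # check this DataNode's log for disk and block errors
    grep -iE "error|corrupt" $HADOOP_HOME/logs/hadoop-hadoop-datanode-worker1.log | tail -20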
NodeManager Log
Logs are generated in $HADOOP_HOME/logs/hadoop-hadoop-nodemanager-<datanode_hostname>.log
What it logs:
The lifecycle of YARN containers on the worker node, resource usage by containers, and communication with the ResourceManager.
Use for troubleshooting:
Container Failures: Logs container-level failures, often due to exceeding resource limits (e.g., OutOfMemoryError).
Local Disk Issues: Reports issues with the local directories used for storing container logs and data.
Application-Specific Logs: Provides pointers to the location of individual application container logs (see the example after this list).
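Once the NodeManager log gives you the failing container's application ID, the application's own container logs can be pulled with the yarn CLI (assuming log aggregation is enabled; the application ID and the worker1 hostname below are placeholders, and exact message wording varies by version):

    # retrieve all aggregated container logs for one application
    yarn logs -applicationId application_1234567890123_0001
    # check whether containers on this worker were killed for exceeding memory limits
    grep -i "beyond physical memory" $HADOOP_HOME/logs/hadoop-hadoop-nodemanager-worker1.log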
Metastore Log
Logs are generated in /var/log/mysql/mysqld.log
What it logs:
All interactions with the metastore database, including table creation, partition management, and schema updates.
Use for troubleshooting:
Essential for diagnosing any metadata-related issues, such as "Table not found" errors, permission issues on the database, or failures to connect to the external MySQL metastore (see the example below).
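To confirm that the metastore database is reachable and that its schema matches the Hive version, checks like the following can complement the MySQL log; this is a sketch that assumes an external MySQL metastore and that Hive's schematool is on the PATH:

    # review recent MySQL server messages (connection failures, permission errors, ...)
    tail -50 /var/log/mysql/mysqld.log
    # verify connectivity to the metastore and report the schema version
    schematool -dbType mysql -info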
Hive CLI / Beeline Client Logs
Logs are generated in /tmp/<user_name>/hive.log
What it logs:
The client-side activity, including commands issued and responses received from HiveServer2.
Use for troubleshooting:
Hive uses log4j for logging. These logs are not emitted to the standard output by default but are instead captured to a log file specified by Hive's log4j properties file. By default, Hive will use hive-log4j.default in the conf/ directory of the Hive installation, which writes logs to /tmp/<user_name>/hive.log and uses the WARN level.
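For a one-off troubleshooting session, the client log level can be raised on the command line instead of editing the log4j properties file; a sketch for the Hive CLI:

    # send DEBUG-level logging to the console for this session only
    hive --hiveconf hive.root.logger=DEBUG,console
    # or follow the default client log for the current user
    tail -f /tmp/$(whoami)/hive.log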
Additional Notes:
Logging Levels: To get more detailed information, you can adjust the logging level for any component in its respective log4j.properties file (e.g., from INFO to DEBUG); see the example after these notes.
Correlating Timestamps: When troubleshooting a cluster-wide issue, always compare timestamps across logs from different nodes to build a chronological sequence of events; see the example after these notes.
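For example, raising a Hadoop daemon's verbosity typically means editing its log4j.properties and restarting the daemon; the lines below are an illustrative sketch (the file lives under $HADOOP_HOME/etc/hadoop/ in recent releases, and RFA is the rolling file appender defined in the stock file):

    # $HADOOP_HOME/etc/hadoop/log4j.properties
    # switch the root logger from INFO to DEBUG, still writing to the rolling file appender
    hadoop.root.logger=DEBUG,RFA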
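A practical way to correlate events is to pull the same time window out of every log on each node and then compare the nodes side by side; the timestamp prefix below is purely illustrative:

    # extract everything logged on this node between 10:10 and 10:19 on the given date
    grep "2024-05-01 10:1" $HADOOP_HOME/logs/*.log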