Title: Hadoop YARN Job Stuck in ACCEPTED State - Step-by-Step Troubleshooting Guide
Category: Troubleshooting
Applies To:
Last Updated: 23/06/2025
Issue Summary
A job submitted via YARN remains in the ACCEPTED state indefinitely and does not transition to RUNNING.
Possible Cause(s)
Insufficient Resources (Memory/CPU) in the cluster to allocate the ApplicationMaster.
YARN Scheduler Misconfiguration (e.g., resource allocation or queue limits).
One or more NodeManagers not available or unhealthy.
Queue Capacity Limit Reached in Capacity/Fair scheduler.
High number of pending applications or containers (see the quick check below).
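To see how many applications are waiting, list those still in the ACCEPTED state (output columns vary slightly between Hadoop versions):
yarn application -list -appStates ACCEPTED
To count them:
yarn application -list -appStates ACCEPTED 2>/dev/null | grep -c "application_"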
Step-by-Step Resolution
Step 1: Check Application State
yarn application -status <Application_ID>
Look for:
State: ACCEPTED
FinalStatus: UNDEFINED
Diagnostics for any error messages.
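For example, using a hypothetical application ID, the State, FinalStatus, and Diagnostics fields can be pulled out directly; the Diagnostics text often states exactly why the application cannot be scheduled (queue limits, user limits, no available resources):
yarn application -status application_1718000000000_0042 | grep -iE "state|diagnostics"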
Step 2: Resource Availability
Check if the cluster has enough free memory and vCores:
yarn node -list
Or use:
yarn rmadmin -checkHealth <serviceId>    (HA setups)
Also visit the ResourceManager UI:
http://<resourcemanager-host>:8088/cluster/scheduler
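As an alternative sketch, the ResourceManager REST API exposes the same cluster-wide numbers; this assumes the default web port 8088 and that jq is installed for formatting:
curl -s http://<resourcemanager-host>:8088/ws/v1/cluster/metrics | jq '.clusterMetrics | {availableMB, availableVirtualCores, appsPending, appsRunning}'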
Step 3: Review Queue Configuration
If using CapacityScheduler, ensure your queue has enough space:
yarn queue -status <queue-name>
Check capacity-scheduler.xml:
<property>
  <name>yarn.scheduler.capacity.root.<queue-name>.capacity</name>
  <value>...</value>
</property>
Verify:
Queue isn’t full.
User has access to that queue.
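A minimal sketch of checking and reloading queue settings (the queue name "default" is an assumption; substitute your own). yarn rmadmin -refreshQueues reloads capacity-scheduler.xml without restarting the ResourceManager:
yarn queue -status default
yarn rmadmin -refreshQueues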
Step 4: Check NodeManager Health
Ensure enough NodeManagers are healthy:
yarn node -list
Check the NodeManager logs (/var/log/hadoop-yarn/nodemanager/) for errors:
grep -i "ERROR" /var/log/hadoop-yarn/nodemanager/*.log
Step 5: ApplicationMaster Launch Delay
If the job waits too long for its ApplicationMaster to launch, increase the ApplicationMaster resource limit or reduce the number of other running jobs.
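As an illustration for the CapacityScheduler, the cluster-wide ApplicationMaster limit is controlled by the following property (default 0.1, i.e. 10% of resources may be used for ApplicationMasters; 0.2 below is only an example value):
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.2</value>
</property>
Apply the change with yarn rmadmin -refreshQueues, as in Step 3.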
Also check ResourceManager logs:
/var/log/hadoop-yarn/resourcemanager/yarn-yarn-resourcemanager-*.log
Step 6: Restart ResourceManager or NodeManagers (if needed)
If resources appear to be available but are still not being allocated, and overall system health is degraded, restart the YARN daemons (only when it is safe to do so):
yarn --daemon stop resourcemanager
yarn --daemon stop nodemanager
yarn --daemon start resourcemanager
yarn --daemon start nodemanager
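After the restart, it can help to confirm that all NodeManagers have re-registered with the ResourceManager before resubmitting the job (healthy nodes should show the RUNNING state):
yarn node -list -all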
Additional notes:
This issue is more frequent during high-load conditions.
Implement autoscaling or increase NodeManager count if cluster usage is consistently high.