Hadoop/YARN Jobs Not Starting - Stuck in ACCEPTED State

Title: Hadoop YARN Job Stuck in ACCEPTED State - Step-by-Step Troubleshooting Guide 

Category: Troubleshooting  

Applies To: Apache Hadoop YARN 

Last Updated: 23/06/2025 

Issue Summary 

A job submitted via YARN remains in the ACCEPTED state indefinitely and does not transition to RUNNING. 

Possible Cause(s) 

  • Insufficient resources (memory/CPU) in the cluster to allocate the ApplicationMaster. 

  • YARN scheduler misconfiguration (e.g., resource allocation or queue limits). 

  • NodeManagers not available or unhealthy. 

  • Queue capacity limit reached in the Capacity or Fair Scheduler. 

  • High number of pending applications or containers. 

Step-by-Step Resolution 

Step 1: Check Application State 

yarn application -status <Application_ID> 

Look for: 

State: ACCEPTED 

FinalStatus: UNDEFINED 

Check the Diagnostics field for any error messages. 
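The same fields can also be read from the ResourceManager REST API. A minimal sketch, assuming the default web port 8088 and the same placeholders as above: 

curl -s "http://<resourcemanager-host>:8088/ws/v1/cluster/apps/<Application_ID>" \
  | grep -oE '"(state|finalStatus|diagnostics)":"[^"]*"' 

The diagnostics string often states exactly why the scheduler is holding the application back (for example, a queue or AM resource limit being hit). 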

Step 2: Check Resource Availability 

Check if the cluster has enough free memory and vCores: 

yarn node -list 

Or, in an HA setup, ask a specific ResourceManager to run its health check: 

yarn rmadmin -checkHealth <serviceId> 

Also visit the ResourceManager UI: 

http://<resourcemanager-host>:8088/cluster/scheduler 
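For a quick cluster-wide view from the command line, the metrics endpoint of the same REST API reports free memory and vCores. A minimal sketch, assuming the default port 8088: 

curl -s "http://<resourcemanager-host>:8088/ws/v1/cluster/metrics" \
  | grep -oE '"available(MB|VirtualCores)":[0-9]+' 

If availableMB or availableVirtualCores is close to zero, the ApplicationMaster simply cannot be scheduled until resources are freed or added. 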

Step 3: Review Queue Configuration 

If using CapacityScheduler, ensure your queue has enough space: 

yarn queue -status <queue-name> 

Check capacity-scheduler.xml: 

<property> 

  <name>yarn.scheduler.capacity.root.<queue-name>.capacity</name> 

  <value>...</value> 

</property> 

Verify: 

  • The queue is not already at its capacity or maximum-capacity limit. 

  • The submitting user has access (ACLs) to that queue. 
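For reference, a minimal sketch of a dedicated queue definition in capacity-scheduler.xml; the queue name "analytics" and the percentage values are purely illustrative: 

<!-- Illustrative example only: a queue guaranteed 40% of the cluster, allowed to grow to 60% -->
<property>
  <name>yarn.scheduler.capacity.root.analytics.capacity</name>
  <value>40</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.analytics.maximum-capacity</name>
  <value>60</value>
</property>

After editing capacity-scheduler.xml, the queue configuration can usually be reloaded without restarting the ResourceManager: 

yarn rmadmin -refreshQueues 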

Step 4: Check NodeManager Health 

Ensure enough NodeManagers are healthy: 

yarn node -list 

Check whether the NodeManager logs (/var/log/hadoop-yarn/nodemanager/) show any issues: 

grep -i "ERROR" /var/log/hadoop-yarn/nodemanager/*.log 

Step 5: ApplicationMaster Launch Delay 

If the job waits too long for its ApplicationMaster (AM) to launch, increase the AM resource limit for the queue (see the configuration sketch below) or reduce the number of other running jobs. 

Also check ResourceManager logs: 

/var/log/hadoop-yarn/resourcemanager/yarn-yarn-resourcemanager-*.log 
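On the CapacityScheduler, the share of cluster resources that may be used by ApplicationMasters is governed by yarn.scheduler.capacity.maximum-am-resource-percent (default 0.1, i.e. 10%). A sketch of raising it in capacity-scheduler.xml; the value 0.2 is illustrative and should be increased cautiously: 

<!-- Illustrative example: allow up to 20% of queue resources for ApplicationMasters -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.2</value>
</property>

Apply the change with yarn rmadmin -refreshQueues (as in Step 3) and watch whether pending ApplicationMasters start to launch. 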

Step 6: Restart ResourceManager or NodeManagers (if needed) 

If resources appear available but are not being allocated and overall system health is affected, restart the services (only if it is safe to do so). Run the resourcemanager commands on the ResourceManager host and the nodemanager commands on each worker node: 

yarn --daemon stop resourcemanager 

yarn --daemon stop nodemanager 

yarn --daemon start resourcemanager 

yarn --daemon start nodemanager 
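After the daemons are back up, a quick sanity check that the ResourceManager is serving requests and the nodes have re-registered (same host and port placeholders as above): 

yarn node -list 
curl -s "http://<resourcemanager-host>:8088/ws/v1/cluster/info" | grep -o '"state":"[^"]*"' 

The cluster state reported by /ws/v1/cluster/info should be STARTED, and the node list should show the expected number of RUNNING nodes. 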

Additional Notes: 

  • This issue is more frequent during high-load conditions. 

  • Implement autoscaling or increase the NodeManager count if cluster usage is consistently high. 

Related Articles

    • Troubleshooting Yarn Application Failures

    • Critical Configuration Properties for HDFS, YARN, Spark, and Other Hadoop Components

    • Job Not Progressing - Stuck in NEW_SAVING After Submission

    • Understanding Hadoop Logs – Types, Use Cases, and Common Locations

    • How to Debug Spark Application Logs (YARN UI)