Title: Hadoop YARN Job Stuck in ACCEPTED State - Step-by-Step Troubleshooting Guide
Category: Troubleshooting
Applies To:
Last Updated: 23/06/2025
Issue Summary
A job submitted via YARN remains in the ACCEPTED state indefinitely and does not transition to RUNNING.
Possible Cause(s)
Insufficient Resources (Memory/CPU) in the cluster to allocate the ApplicationMaster.
YARN Scheduler Misconfiguration (e.g., resource allocation or queue limits).
One or more NodeManagers not available or unhealthy.
Queue Capacity Limit Reached in Capacity/Fair scheduler.
High number of pending applications or containers (see the quick check below).
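To see how many applications are waiting, list those still in the ACCEPTED state (output columns vary slightly between Hadoop versions):
yarn application -list -appStates ACCEPTED
To count them:
yarn application -list -appStates ACCEPTED 2>/dev/null | grep -c "application_"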
Step-by-Step Resolution
Step 1: Check Application State
yarn application -status <Application_ID>
Look for:
State: ACCEPTED
FinalStatus: UNDEFINED
Diagnostics for any error messages.
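For example, using a hypothetical application ID, the State, FinalStatus, and Diagnostics fields can be pulled out directly; the Diagnostics text often states exactly why the application cannot be scheduled (queue limits, user limits, no available resources):
yarn application -status application_1718000000000_0042 | grep -iE "state|diagnostics"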
Step 2: Resource Availability
Check if the cluster has enough free memory and vCores:
yarn node -list
Or use:
yarn rmadmin -checkHealth <serviceId>    (HA setups)
Also visit the ResourceManager UI:
http://<resourcemanager-host>:8088/cluster/scheduler
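As an alternative sketch, the ResourceManager REST API exposes the same cluster-wide numbers; this assumes the default web port 8088 and that jq is installed for formatting:
curl -s http://<resourcemanager-host>:8088/ws/v1/cluster/metrics | jq '.clusterMetrics | {availableMB, availableVirtualCores, appsPending, appsRunning}'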
Step 3: Review Queue Configuration
If using CapacityScheduler, ensure your queue has enough space:
yarn queue -status <queue-name>
Check capacity-scheduler.xml:
<property>
  <name>yarn.scheduler.capacity.root.<queue-name>.capacity</name>
  <value>...</value>
</property>
Verify:
Queue isn’t full.
User has access to that queue.
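A minimal sketch of checking and reloading queue settings (the queue name "default" is an assumption; substitute your own). yarn rmadmin -refreshQueues reloads capacity-scheduler.xml without restarting the ResourceManager:
yarn queue -status default
yarn rmadmin -refreshQueues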
Step 4: Check NodeManager Health
Ensure enough NodeManagers are healthy:
yarn node -list
Check the NodeManager logs (/var/log/hadoop-yarn/nodemanager/) for errors:
grep -i "ERROR" /var/log/hadoop-yarn/nodemanager/*.log
Step 5: ApplicationMaster Launch Delay
If the job waits too long for its ApplicationMaster to launch, increase the ApplicationMaster resource limit or reduce the number of other running jobs.
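As an illustration for the CapacityScheduler, the cluster-wide ApplicationMaster limit is controlled by the following property (default 0.1, i.e. 10% of resources may be used for ApplicationMasters; 0.2 below is only an example value):
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.2</value>
</property>
Apply the change with yarn rmadmin -refreshQueues, as in Step 3.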
Also check ResourceManager logs:
/var/log/hadoop-yarn/resourcemanager/yarn-yarn-resourcemanager-*.log
Step 6: Restart ResourceManager or NodeManagers (if needed)
If resources appear to be available but are still not being allocated, and overall system health is degraded, restart the YARN daemons (only when it is safe to do so):
yarn --daemon stop resourcemanager
yarn --daemon stop nodemanager
yarn --daemon start resourcemanager
yarn --daemon start nodemanager
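After the restart, it can help to confirm that all NodeManagers have re-registered with the ResourceManager before resubmitting the job (healthy nodes should show the RUNNING state):
yarn node -list -all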
Additional notes:
This issue is more frequent during high-load conditions.
Implement autoscaling or increase NodeManager count if cluster usage is consistently high.