Title: Hadoop YARN Job Stuck in NEW_SAVING State
Category: Troubleshooting
Applies To: Hadoop 3.4.1
Last Updated: 23/06/2025
Issue Summary
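A submitted YARN application remains indefinitely in the NEW_SAVING state and never progresses to SUBMITTED, ACCEPTED or RUNNING. NEW_SAVING is the state in which the ResourceManager persists the application to its state store. As a result, the job never starts executing.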
Possible Cause(s)
ZooKeeper quorum failure or other HA issues (if ResourceManager HA is enabled)
ResourceManager is in standby mode or cannot communicate with ZooKeeper.
Corrupted ResourceManager state store
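If ResourceManager HA is enabled, one way to check the ZooKeeper cause is to ask each ZooKeeper server for its role. The sketch below is a minimal example, not part of Hadoop. The function names zk_mode and zk_quorum_status are hypothetical. It assumes nc is installed and that the srvr four-letter command is permitted by the ZooKeeper setting 4lw.commands.whitelist. A healthy ensemble reports one leader and followers for the rest. UNREACHABLE, or no leader at all, points to a quorum problem.

```shell
# Hypothetical helper: print the "Mode:" value (leader, follower, standalone)
# from ZooKeeper "srvr" output read on stdin.
zk_mode() {
  awk -F': ' '$1 == "Mode" { print $2; exit }'
}

# Hypothetical helper: query each ZooKeeper server given as host:port and
# print "<server> <mode>", or UNREACHABLE if no Mode line came back.
# Assumes nc is installed (its options vary by distribution) and that "srvr"
# is allowed by 4lw.commands.whitelist.
zk_quorum_status() {
  local server mode
  for server in "$@"; do
    # ${server%:*} is the host part, ${server##*:} the port part
    mode=$(echo srvr | nc -w 2 "${server%:*}" "${server##*:}" 2>/dev/null | zk_mode)
    echo "$server ${mode:-UNREACHABLE}"
  done
}
```

Example usage, with placeholder host names: zk_quorum_status zk1:2181 zk2:2181 zk3:2181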
Step-by-Step Resolution
Step 1: Check Job Status
yarn application -status <application_id>
If the State field shows NEW_SAVING, continue to the next step.
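To script this check, the State line can be pulled out of the status report. This is a minimal sketch. It assumes the report contains a line of the form "State : NEW_SAVING", which is how Hadoop 3.x formats it. The helper names app_state and is_stuck_new_saving are hypothetical.

```shell
# Hypothetical helper: print the value of the "State" field from
# `yarn application -status` output read on stdin. Lines such as
# "Final-State : UNDEFINED" are skipped because their key is not exactly "State".
app_state() {
  awk -F':' '{
    key = $1
    gsub(/^[ \t]+|[ \t]+$/, "", key)    # trim whitespace around the key
    if (key == "State") {
      val = $2
      gsub(/^[ \t]+|[ \t]+$/, "", val)  # trim whitespace around the value
      print val
      exit
    }
  }'
}

# Hypothetical helper: print the current state and return 0 only if the
# application is still in NEW_SAVING.
# Usage: is_stuck_new_saving <application_id>
is_stuck_new_saving() {
  local state
  state=$(yarn application -status "$1" 2>/dev/null | app_state)
  echo "Application $1 is in state: ${state:-unknown}"
  [ "$state" = "NEW_SAVING" ]
}
```

Example: is_stuck_new_saving <application_id> && echo "still stuck, continue to Step 2"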
Step 2: Inspect ResourceManager Logs
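Check the logs on the active ResourceManager node. Logs are written to $HADOOP_LOG_DIR (default $HADOOP_HOME/logs). The file name includes the daemon user and host name, so a wildcard is the safest match:
cd ${HADOOP_LOG_DIR:-$HADOOP_HOME/logs}
less *resourcemanager*.log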
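Search for the application ID and look for errors or warnings near it, such as:
Scheduler queue not found
Cannot assign application to queue
User not authorized
Because NEW_SAVING is the state in which the ResourceManager writes the application to its state store, also look for exceptions from the state store or from the ZooKeeper client.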
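The log search can be narrowed with grep. This is a minimal sketch assuming GNU or BSD grep. The helpers rm_log_context and rm_log_known_errors are hypothetical. The patterns are only the example messages from this step, not a complete list of ResourceManager errors.

```shell
# Hypothetical helper: show log lines that mention an application ID,
# with five lines of context before and after each match.
# Usage: rm_log_context <application_id> <log_file>...
rm_log_context() {
  local app_id=$1
  shift
  # -n: line numbers, -C 5: context lines, -F: match the ID as a fixed string
  grep -n -C 5 -F -- "$app_id" "$@"
}

# Hypothetical helper: count lines matching the example error messages from
# this step (case-insensitive). With several files, grep prints file:count.
rm_log_known_errors() {
  grep -c -i -E 'queue not found|cannot assign application to queue|not authorized' "$@"
}
```

Example: rm_log_context <application_id> *resourcemanager*.log | less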
Step 3: Check Available Cluster Resources
yarn node -list
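Verify that the cluster has enough free memory and vCores to launch the ApplicationMaster (add -showDetails to see per-node resource usage). The ApplicationMaster's request depends on the framework. For MapReduce, yarn.app.mapreduce.am.resource.mb defaults to 1536 MB and yarn.app.mapreduce.am.resource.cpu-vcores defaults to 1.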
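A quick first check is that at least one NodeManager is in RUNNING state, since without one no node can host the ApplicationMaster. This is a minimal sketch. It assumes yarn node -list prints one row per node with the node state in the second column, as in Hadoop 3.x. The names count_running_nodes and check_nodes are hypothetical. It does not check free memory, which still needs the -showDetails output or the web UI.

```shell
# Hypothetical helper: count rows whose second column is RUNNING in
# `yarn node -list` output read on stdin. Header lines ("Total Nodes:N",
# the column titles) are ignored because their second field differs.
count_running_nodes() {
  awk '$2 == "RUNNING" { n++ } END { print n + 0 }'
}

# Hypothetical helper: fail if no NodeManager is RUNNING.
check_nodes() {
  local n
  n=$(yarn node -list 2>/dev/null | count_running_nodes)
  if [ "$n" -eq 0 ]; then
    echo "No RUNNING NodeManagers: no node can host the ApplicationMaster" >&2
    return 1
  fi
  echo "$n RUNNING NodeManager(s)"
}
```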
If ResourceManager HA is enabled, also check which ResourceManager is active and which is standby:
yarn rmadmin -getAllServiceState
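The HA check can be automated so that it fails when no ResourceManager, or more than one, reports active. This is a minimal sketch. It assumes each output line of yarn rmadmin -getAllServiceState ends with the service state (active, standby, and so on). The names count_active_rms and check_single_active_rm are hypothetical.

```shell
# Hypothetical helper: count lines whose last field is "active" in
# `yarn rmadmin -getAllServiceState` output read on stdin.
count_active_rms() {
  awk '$NF == "active" { n++ } END { print n + 0 }'
}

# Hypothetical helper: return non-zero unless exactly one ResourceManager
# is active.
check_single_active_rm() {
  local n
  n=$(yarn rmadmin -getAllServiceState 2>/dev/null | count_active_rms)
  if [ "$n" -ne 1 ]; then
    echo "Expected exactly one active ResourceManager, found $n" >&2
    return 1
  fi
  echo "Exactly one active ResourceManager"
}
```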
Step 4: Restart the ResourceManager (if necessary)
If the ResourceManager is unresponsive or keeps failing to save new applications, restart it on the affected node:
yarn --daemon stop resourcemanager
yarn --daemon start resourcemanager
If the ResourceManager runs as a systemd service, restart it with systemctl instead. The unit name depends on how Hadoop was installed.
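In an HA cluster, a restart can be followed by polling until the ResourceManager reports a service state again. This is a minimal sketch with several assumptions. HA must be enabled, because yarn rmadmin -getServiceState takes an RM ID from yarn.resourcemanager.ha.rm-ids. The script must run as the user that owns the ResourceManager daemon. The function name restart_rm_and_wait and its defaults (30 attempts, 5 seconds apart) are hypothetical choices.

```shell
# Hypothetical helper: restart the local ResourceManager, then poll its HA
# service state until it reports active or standby, or attempts run out.
# Usage: restart_rm_and_wait <rm-id> [attempts] [sleep-seconds]
restart_rm_and_wait() {
  local rm_id=$1 attempts=${2:-30} delay=${3:-5} state i
  yarn --daemon stop resourcemanager
  yarn --daemon start resourcemanager
  for i in $(seq 1 "$attempts"); do
    # Strip whitespace so "active\n" compares cleanly
    state=$(yarn rmadmin -getServiceState "$rm_id" 2>/dev/null | tr -d '[:space:]')
    case "$state" in
      active|standby)
        echo "ResourceManager $rm_id is $state"
        return 0
        ;;
    esac
    sleep "$delay"
  done
  echo "ResourceManager $rm_id did not report a service state" >&2
  return 1
}
```

Example, with rm1 as a placeholder RM ID: restart_rm_and_wait rm1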
Additional Notes
In HA setups, make sure exactly one ResourceManager is in the active state.
Monitor the ResourceManager web UI at http://<rm-host>:8088 (the default web UI port) to track the application's state transitions.
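The same transitions can be followed from a terminal through the ResourceManager REST API. It returns the application state as JSON, for example {"state":"ACCEPTED"}, at /ws/v1/cluster/apps/<application_id>/state. This is a minimal sketch with several assumptions: curl is available, the web port is the default 8088, and the web endpoint is not Kerberos-protected. The helper names json_state and watch_app_state are hypothetical. The sed expression is a simple extraction, not a full JSON parser.

```shell
# Hypothetical helper: extract the state value from a JSON body such as
# {"state":"ACCEPTED"} read on stdin.
json_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p'
}

# Hypothetical helper: poll the REST API until the application leaves
# NEW/NEW_SAVING. Returns 0 once it does, 1 if attempts run out.
# Usage: watch_app_state <rm-host> <application_id> [attempts] [sleep-seconds]
watch_app_state() {
  local rm_host=$1 app_id=$2 attempts=${3:-20} delay=${4:-10} state i
  for i in $(seq 1 "$attempts"); do
    # -L follows the redirect a standby ResourceManager may return
    state=$(curl -s -L "http://${rm_host}:8088/ws/v1/cluster/apps/${app_id}/state" | json_state)
    echo "$(date '+%H:%M:%S') ${app_id} ${state:-unknown}"
    if [ -n "$state" ] && [ "$state" != "NEW" ] && [ "$state" != "NEW_SAVING" ]; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}
```

Example: watch_app_state <rm-host> <application_id>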