Job Not Progressing - Stuck in NEW_SAVING After Submission

Title: Hadoop YARN Job Stuck in NEW_SAVING State

Category: Troubleshooting

Applies To: Hadoop 3.4.1

Last Updated: 23/06/2025

Issue Summary

A submitted YARN application or job remains indefinitely in the NEW_SAVING state and does not transition to ACCEPTED or RUNNING. This prevents the job from executing.

Possible Cause(s)

ZooKeeper quorum failure or HA issues (if HA is enabled)

ResourceManager in standby mode or unable to coordinate with ZooKeeper

Corrupted YARN state-store

Step-by-Step Resolution

Step 1: Check Job Status

yarn application -status <application_id>

If it shows NEW_SAVING, continue to the next step.
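
For scripted checks, the state can be extracted from the status report instead of being read by eye. The following is a minimal sketch, assuming the report contains a line of the form "State : <VALUE>"; the function name app_state is illustrative and not part of the yarn CLI.

```shell
# Print the value of the "State" line from a status report read on stdin.
# Assumption: the report has a line shaped like "State : NEW_SAVING".
# The "Final-State" line is skipped because its first field differs.
app_state() {
  awk '$1 == "State" && $2 == ":" { print $3; exit }'
}

# Example use (requires a working yarn client):
#   yarn application -status <application_id> | app_state
```

If the function prints NEW_SAVING on repeated runs several minutes apart, the application is stuck rather than merely slow to transition.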

Step 2: Inspect ResourceManager Logs

Check logs on the active ResourceManager node:

cd $HADOOP_HOME/logs/

less yarn-yarn-resourcemanager-*.log

Look for errors or messages around the application ID, like:

Scheduler queue not found

Cannot assign application to queue

User not authorized
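
To narrow a large log down to the relevant lines, a filter along the following lines may help. It is a sketch that assumes the usual Hadoop log layout, where the level (WARN, ERROR, FATAL) appears as a space-delimited word on each line; the function name app_log_problems is hypothetical.

```shell
# Read ResourceManager log text on stdin and print only WARN/ERROR/FATAL
# lines that mention the given application ID.
# $1: application ID
# Assumption: log lines look like "<date> <time> <LEVEL> <class>: <message>".
app_log_problems() {
  grep -E ' (WARN|ERROR|FATAL) ' | grep -F -- "$1"
}

# Example use:
#   cat "$HADOOP_HOME"/logs/yarn-yarn-resourcemanager-*.log | app_log_problems <application_id>
```

State-store or ZooKeeper errors do not always include the application ID, so an empty result does not rule those causes out.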

Step 3: Check Available Cluster Resources

yarn node -list

Verify that total memory and vCores are sufficient to run the ApplicationMaster (the default minimum container allocation is 1024 MB and 1 vCore).

Also:

yarn rmadmin -getAllServiceState

Check the active/standby state of Resource Managers.
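
In an HA setup, the output of yarn rmadmin -getAllServiceState can be checked automatically for exactly one active ResourceManager. The sketch below assumes each output line has the form "<host:port> <state>"; the function name count_active_rms is illustrative.

```shell
# Count the ResourceManagers reported as "active" in text read on stdin.
# Assumption: each line looks like "rm-host:8033   active" or "... standby".
count_active_rms() {
  awk '$2 == "active" { n++ } END { print n + 0 }'
}

# Example use (HA clusters only):
#   yarn rmadmin -getAllServiceState | count_active_rms
```

A result of 0 means no ResourceManager has become active, which is consistent with applications staying in NEW_SAVING; a result above 1 points to a split-brain condition.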

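Step 4: Verify ZooKeeper Quorum Health

Check ZooKeeper health and availability on every node that hosts a ZooKeeper server:

echo stat | nc <host> <port>

On ZooKeeper 3.5 and later, four-letter commands such as stat respond only if they are listed in 4lw.commands.whitelist. If the command fails on a majority of the nodes, the quorum is lost; start ZooKeeper on the failed nodes using: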

zkServer.sh start
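
The majority rule can be made explicit with two small helpers. This is a sketch under stated assumptions: the stat output contains a "Mode: <role>" line, and the names zk_mode and has_quorum, as well as the host names in the usage comment, are hypothetical.

```shell
# Print the server role (leader, follower, standalone) from "stat" output on stdin.
zk_mode() {
  awk -F': ' '$1 == "Mode" { print $2; exit }'
}

# Succeed when strictly more than half of the ensemble is healthy.
# $1: number of healthy servers, $2: total servers in the ensemble
has_quorum() {
  [ $(( $1 * 2 )) -gt "$2" ]
}

# Example use (host names are placeholders; 2181 is the usual client port):
#   healthy=0
#   for h in zk1 zk2 zk3; do
#     [ -n "$(echo stat | nc "$h" 2181 | zk_mode)" ] && healthy=$((healthy + 1))
#   done
#   has_quorum "$healthy" 3 || echo "ZooKeeper quorum lost"
```

With three servers, two healthy nodes keep the quorum and one does not; with four servers, two healthy nodes are not enough.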

Step 5: Restart Resource Manager (if necessary)

If the RM is unresponsive or misbehaving:

yarn --daemon stop resourcemanager

yarn --daemon start resourcemanager

Or restart the RM service via systemctl if installed as a service.
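
After a restart, the ResourceManager needs some time before it answers requests. A generic retry helper such as the one sketched below can wait for it; the name retry_until and the attempt and delay values in the usage comment are assumptions, not values from this article.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# $1: maximum attempts, $2: delay in seconds between attempts, rest: command
retry_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example use: wait up to about one minute for the RM web service to respond.
#   retry_until 30 2 curl -sf "http://<rm-host>:8088/ws/v1/cluster/info"
```

Once the helper returns successfully, re-run the status check from Step 1 to see whether the application has left NEW_SAVING.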

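Step 6: Format the RM State Store (if the issue persists)

If the issue persists after restarting the ResourceManager, stop the ResourceManager, format the RM state store, and then start the ResourceManager again. Formatting clears all stored application state, so run it only when that state is no longer needed and while the ResourceManager is not running.

Command:

yarn resourcemanager -format-state-store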

Additional Notes

For HA setups, make sure only one ResourceManager is in active state.

Monitor UI at http://<rm-host>:8088 to track job transitions.
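
The same port also serves the ResourceManager REST API, which can be polled from a script instead of watching the UI. The sketch below assumes the application state endpoint returns JSON of the form {"state":"<VALUE>"}; the function name rest_app_state is hypothetical.

```shell
# Extract the state value from a JSON body such as {"state":"NEW_SAVING"} on stdin.
rest_app_state() {
  sed -n 's/.*"state"[ ]*:[ ]*"\([A-Z_]*\)".*/\1/p'
}

# Example use:
#   curl -s "http://<rm-host>:8088/ws/v1/cluster/apps/<application_id>/state" | rest_app_state
```

A plain sed pattern is used here to avoid extra dependencies; a JSON-aware tool is a safer choice if one is available on the node.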
