Job Not Progressing - Stuck in NEW_SAVING After Submission

Title: Hadoop YARN Job Stuck in NEW_SAVING State 

Category: Troubleshooting  

Applies To: Hadoop 3.4.1  

Last Updated: 23/06/2025 

Issue Summary 

A submitted YARN application or job remains indefinitely in the NEW_SAVING state and does not transition to ACCEPTED or RUNNING. This prevents the job from executing. 

Possible Cause(s) 

  • ZooKeeper quorum failure or HA issues (if HA is enabled) 

  • ResourceManager in standby mode or unable to coordinate with ZooKeeper 

  • Corrupted YARN state store


Step-by-Step Resolution 

Step 1: Check Job Status 

yarn application -status <application_id> 

If the State field shows NEW_SAVING (or NEW), continue to the next step.  
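When the check has to be scripted, the state can be pulled out of the status report. The following is a minimal sketch, assuming the report contains a line shaped like "State : <VALUE>"; the helper name extract_state is hypothetical and not part of YARN.

```shell
#!/usr/bin/env bash
# Print the value of the "State" field from `yarn application -status`
# output read on stdin.
# Assumption: the report has a line shaped like "<whitespace>State : <VALUE>".
# The "Final-State" line is skipped because the match is anchored to the
# start of the line.
extract_state() {
  awk -F' : ' '/^[ \t]*State :/ { gsub(/[ \t\r]+$/, "", $2); print $2; exit }'
}

# Usage (requires a working yarn client):
#   yarn application -status <application_id> 2>/dev/null | extract_state
```

An empty result means the State line was not found, for example because the application ID is unknown to the ResourceManager.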

Step 2: Inspect ResourceManager Logs 

Check logs on the active ResourceManager node: 

cd $HADOOP_HOME/logs/ 

less yarn-yarn-resourcemanager-*.log 

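Look for errors or messages around the application ID, such as: 

Scheduler queue not found 

Cannot assign application to queue 

User not authorized 

Because NEW_SAVING means the ResourceManager is still persisting the application to its state store, also look for state-store or ZooKeeper connection errors logged around the submission time. 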
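On a busy cluster the log is easier to read after filtering. The sketch below keeps only WARN, ERROR, and FATAL lines that mention the application ID. It assumes the log level appears as a separate space-delimited word on each line; the helper name app_log_problems is hypothetical.

```shell
#!/usr/bin/env bash
# Print log lines that mention the application ID and are logged at
# WARN, ERROR or FATAL.
# $1 = application ID, remaining arguments = log files to search.
# Assumption: the level is a space-delimited word, e.g. "... ERROR ...".
app_log_problems() {
  local app_id="$1"
  shift
  grep -h -F -- "$app_id" "$@" | grep -E ' (WARN|ERROR|FATAL) '
}

# Usage:
#   app_log_problems <application_id> $HADOOP_HOME/logs/yarn-yarn-resourcemanager-*.log
```

Errors that are not tagged with the application ID (for example, a lost ZooKeeper session) will not show up in this filter, so it complements rather than replaces reading the log around the submission time.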

Step 3: Check Available Cluster Resources 

yarn node -list 

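Verify that total memory and vCores are sufficient to run the ApplicationMaster (the scheduler's default minimum allocation is 1024 MB and 1 vCore). Adding -showDetails to the command above includes per-node memory and vCore figures. 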

Also: 

yarn rmadmin -getAllServiceState 

Check the active/standby state of the ResourceManagers. In an HA setup, exactly one should report active. 
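The active/standby check can be automated by counting the ResourceManagers that report active. This is a sketch that assumes the output of yarn rmadmin -getAllServiceState lists one ResourceManager per line with the state as the last field; count_active_rms is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Count how many ResourceManagers report "active" in text read on stdin.
# Assumption: one RM per line, state is the last whitespace-separated field.
count_active_rms() {
  awk '$NF == "active" { n++ } END { print n + 0 }'
}

# Usage (requires a working yarn client):
#   yarn rmadmin -getAllServiceState | count_active_rms
```

A result of 0 suggests that no ResourceManager has won the election, which would match a ZooKeeper problem; a result above 1 points to a split-brain situation.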

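Step 4: Verify ZooKeeper Quorum Health 

Check ZooKeeper health and availability on every node that hosts ZooKeeper: 

echo stat | nc <host> <port> 

On ZooKeeper 3.5 and later, stat responds only if it is listed in 4lw.commands.whitelist; srvr is allowed by default and can be used instead. 

If the command fails on a majority of the nodes, the quorum is lost. Start ZooKeeper on the failed nodes using: 

zkServer.sh start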
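The per-node check can be looped over the whole ensemble. The sketch below assumes nc is installed, that the stat command is permitted on the servers, and that a healthy server answers with a line starting with "Mode:". The function names and the host names in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Succeed when strictly more than half of the ensemble is healthy.
# $1 = number of healthy servers, $2 = ensemble size.
has_quorum() {
  local healthy="$1" total="$2"
  [ "$healthy" -gt $(( total / 2 )) ]
}

# Probe each "host:port" argument with the stat four-letter command
# and report whether a majority answered.
check_zk_ensemble() {
  local healthy=0 total=$# hp
  for hp in "$@"; do
    if echo stat | nc -w 2 "${hp%%:*}" "${hp##*:}" 2>/dev/null | grep -q '^Mode:'; then
      healthy=$(( healthy + 1 ))
    else
      echo "unhealthy: $hp"
    fi
  done
  if has_quorum "$healthy" "$total"; then
    echo "quorum OK ($healthy/$total)"
  else
    echo "quorum LOST ($healthy/$total)"
    return 1
  fi
}

# Usage (hypothetical hosts):
#   check_zk_ensemble zk1:2181 zk2:2181 zk3:2181
```

The majority rule is why a three-node ensemble tolerates one failed server and a five-node ensemble tolerates two.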

Step 5: Restart Resource Manager (if necessary) 

If the RM is unresponsive or misbehaving: 

yarn --daemon stop resourcemanager 

yarn --daemon start resourcemanager 

Or restart the RM service via systemctl if installed as a service. 


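Step 6: Format the RM State Store (if the issue persists) 

If the issue persists after restarting the ResourceManager, stop the ResourceManager, format the RM state store, and start the ResourceManager again. Formatting clears all stored application state, so previously submitted applications cannot be recovered afterwards. Run the command only while the ResourceManager is stopped. 

Command:
yarn resourcemanager -format-state-store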
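Because formatting is destructive, it can help to wrap the stop, format, and start sequence so that it can be previewed first. This is a sketch only: run and reset_rm_state_store are hypothetical helpers, DRY_RUN is a variable introduced here for illustration, and the format step uses the -format-state-store option of yarn resourcemanager.

```shell
#!/usr/bin/env bash
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Stop the RM, format the state store, then start the RM again.
# The chain stops at the first command that fails.
reset_rm_state_store() {
  run yarn --daemon stop resourcemanager &&
  run yarn resourcemanager -format-state-store &&
  run yarn --daemon start resourcemanager
}

# Preview without changing anything:
#   DRY_RUN=1 reset_rm_state_store
```

In an HA setup the same state store is shared by all ResourceManagers, so the other ResourceManagers should presumably be stopped as well before formatting.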

Additional Notes 

For HA setups, make sure only one ResourceManager is in the active state. 

Monitor the ResourceManager UI at http://<rm-host>:8088 to track job transitions. 
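Job transitions can also be followed from a terminal through the ResourceManager REST API on the same port. The sketch below polls the application state endpoint and assumes curl is installed and that the response body looks like {"state":"ACCEPTED"}; the function names, the attempt count, and the 5-second interval are illustrative choices.

```shell
#!/usr/bin/env bash
# Extract the value of "state" from a JSON body on stdin,
# e.g. {"state":"ACCEPTED"} gives ACCEPTED.
parse_state_json() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p'
}

# Poll the RM until the application leaves NEW/NEW_SAVING or the attempts
# run out. $1 = RM base URL, $2 = application ID, $3 = attempts (default 12).
wait_until_saved() {
  local rm_url="$1" app_id="$2" attempts="${3:-12}" state i
  for i in $(seq 1 "$attempts"); do
    state=$(curl -s "$rm_url/ws/v1/cluster/apps/$app_id/state" | parse_state_json)
    echo "attempt $i: ${state:-unknown}"
    case "$state" in
      ""|NEW|NEW_SAVING) sleep 5 ;;
      *) return 0 ;;
    esac
  done
  return 1
}

# Usage:
#   wait_until_saved http://<rm-host>:8088 <application_id>
```

A non-zero return after all attempts means the application was still in NEW or NEW_SAVING (or the state could not be read), which is the condition this article addresses.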
