Title: Memory Overhead and OOM in Spark - Tuning Executor Settings
Category: Troubleshooting → Spark
Applies To: Apache Spark 2.x, 3.x (running on YARN, Mesos, or Standalone)
Issue Summary
Spark applications frequently encounter OutOfMemoryError (OOM) exceptions, causing tasks or entire executors to fail, or leading to YARN killing containers for exceeding their memory limits. This often stems from misunderstanding or misconfiguring Spark's memory model, particularly the interplay between JVM heap memory and the non-heap "memory overhead."
Possible Cause(s):
Insufficient spark.executor.memory: The allocated JVM heap size (-Xmx) for the executor process is too small to accommodate the data being processed or cached within Spark's managed heap space. This often results in java.lang.OutOfMemoryError: Java heap space.
Insufficient spark.executor.memoryOverhead: The non-heap memory usage (e.g., JVM's internal memory for stacks, code cache, direct buffers, off-heap memory used by native libraries like Netty for shuffle, or by Tungsten if off-heap is enabled) exceeds the configured memoryOverhead value. YARN (or other resource managers) then kills the container without an OutOfMemoryError being logged by the JVM itself within the executor's stdout/stderr. The NodeManager logs will show "Container killed by YARN for exceeding memory limits."
Data Skew: A single task (processing one partition) receives a disproportionately large amount of data, causing its executor to run out of memory, even if other tasks are fine.
Inefficient Code/Data Structures: Application code creates excessive objects, large intermediate data structures, or uses inefficient data types (e.g., many small String objects instead of primitive types).
Long Garbage Collection (GC) Pauses: The JVM spends too much time in garbage collection, so tasks slow down or time out and executors miss heartbeats, which can cause the driver to mark them as lost and the containers to be restarted or killed.
spark.memory.fraction / spark.memory.storageFraction Misconfiguration: These advanced internal memory parameters (defaults 0.6 and 0.5, respectively) are set too aggressively, limiting the memory available for execution or storage even when the total JVM heap is large.
Step-by-Step Resolution / Tuning Strategies
Understand Spark's Memory Model & YARN Integration:
spark.executor.memory: This is the JVM heap size (-Xmx) for each Spark executor. It holds execution memory (shuffles, joins, sorts, aggregations), storage memory (cached RDDs and DataFrames), and the general Java objects created by your code.
spark.executor.memoryOverhead: This is additional memory allocated outside the JVM heap (named spark.yarn.executor.memoryOverhead before Spark 2.3). It covers JVM overhead (Metaspace, thread stacks), off-heap allocations (e.g., by Netty for shuffle, or by native libraries), and direct byte buffers.
Total YARN Container Size: When running on YARN, the total memory requested from YARN for an executor is spark.executor.memory + spark.executor.memoryOverhead. This total must fit within YARN's yarn.scheduler.maximum-allocation-mb.
Default: If not set explicitly, spark.executor.memoryOverhead is max(384 MB, 10% of spark.executor.memory).
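For illustration, a minimal Scala sketch with hypothetical sizes (in practice these settings usually live in spark-defaults.conf or on the spark-submit command line):

    import org.apache.spark.sql.SparkSession

    // 8 GB heap per executor; memoryOverhead is left at its default.
    val spark = SparkSession.builder()
      .appName("memory-sizing-example")          // hypothetical app name
      .config("spark.executor.memory", "8g")     // JVM heap (-Xmx) per executor
      .getOrCreate()

    // Default overhead = max(384 MB, 0.10 * 8192 MB) = ~819 MB.
    // YARN container request per executor ≈ 8g + 819m ≈ 8.8 GB,
    // rounded up by YARN's allocation increment, and it must stay
    // below yarn.scheduler.maximum-allocation-mb.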
Diagnose the OOM Type from Logs:
Search executor logs (stdout/stderr/log4j):
If you see java.lang.OutOfMemoryError: Java heap space: Your spark.executor.memory is too low for the actual data/computations. Increase it.
If you don't see an OOM in the executor logs, but YARN logs state "Container killed by YARN for exceeding memory limits": Your spark.executor.memoryOverhead is likely too low.
Tune Executor Memory (spark.executor.memory):
Increase Gradually: Start by incrementally increasing spark.executor.memory (e.g., from 4g to 6g, then 8g, etc.).
Consider Data Size: If you're caching data or working with very large partitions, you might need more memory.
Monitor Spark UI: Check the "Executors" tab in the Spark UI for per-executor storage memory usage, and the "Peak Execution Memory" task metric on the Stages tab, to see how much memory tasks actually consume.
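A quick programmatic check is also possible via SparkContext.getExecutorMemoryStatus, which reports the maximum and remaining storage memory per executor block manager; a sketch, assuming an active SparkSession named spark:

    val mb = 1024 * 1024
    spark.sparkContext.getExecutorMemoryStatus.foreach {
      case (executor, (maxMem, remaining)) =>
        // maxMem/remaining are in bytes, keyed by "host:port"
        println(s"$executor: storage max = ${maxMem / mb} MB, free = ${remaining / mb} MB")
    }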
Tune Memory Overhead (spark.executor.memoryOverhead):
When to Adjust: When containers are killed by YARN due to exceeding memory limits without a Java OutOfMemoryError in the Spark logs.
How to Adjust: Explicitly set spark.executor.memoryOverhead (e.g., to 512m or 1024m). It's usually safe to set it to 10-20% of spark.executor.memory.
Example: If spark.executor.memory=8g, try spark.executor.memoryOverhead=1g or 1200m.
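A sketch of overriding the overhead explicitly, here roughly 15% of an 8 GB heap (values are illustrative):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.memory", "8g")
      .set("spark.executor.memoryOverhead", "1200m")  // overrides the 10% default
    // Resulting YARN request per executor ≈ 8g + 1200m ≈ 9.2 GB.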
Optimize spark.executor.cores:
Too many cores per executor (e.g., >5) can lead to:
I/O contention on local disks for shuffle.
Increased memoryOverhead requirements (more threads, more stack space).
Recommendation: Aim for 3-5 cores per executor. This balances parallelism within an executor with I/O and overhead.
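A sizing sketch for a hypothetical worker node with 16 cores and 64 GB of RAM, reserving roughly one core and 1 GB for the OS and NodeManager:

    import org.apache.spark.SparkConf

    // (16 - 1) cores / 5 cores per executor = 3 executors per node
    // (64 - 1) GB / 3 executors ≈ 21 GB per executor container
    //   ≈ 19g heap + ~2g overhead
    val conf = new SparkConf()
      .set("spark.executor.cores", "5")
      .set("spark.executor.memory", "19g")
      .set("spark.executor.memoryOverhead", "2g")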
JVM Garbage Collection Tuning:
Enable GC Logs: Add spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCDateStamps to spark-defaults.conf. Analyze the logs for long GC pauses.
Use G1GC: For larger heaps (above 4-6 GB), spark.executor.extraJavaOptions=-XX:+UseG1GC is generally recommended over the Parallel GC that Java 8 uses by default, as G1GC aims for short, predictable pause times.
Tune Heap Size: A smaller spark.executor.memory might lead to more frequent but shorter GCs. A larger heap might lead to fewer but longer GCs. Find the right balance for your workload.
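A combined sketch of the GC settings above. Note that -XX:+PrintGCDetails and -XX:+PrintGCDateStamps exist only on Java 8; Java 9+ replaced them with unified logging (-Xlog:gc*):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // Java 8: G1GC plus the classic GC logging flags
      .set("spark.executor.extraJavaOptions",
           "-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps")
    // Java 9+ equivalent:
    // .set("spark.executor.extraJavaOptions", "-XX:+UseG1GC -Xlog:gc*")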
Handle Data Skew:
Identification: In the Spark UI, look at the "Stages" tab. If one or a few tasks in a stage run significantly longer, read far more shuffle data, spill to disk, or fail while the rest succeed, that is a strong sign of data skew.
Strategies:
Salting: For highly skewed keys in joins/aggregations, add a random prefix/suffix to the key so the hot key is spread across many partitions, then aggregate again on the original key to remove the salt (see the sketch after this list).
spark.sql.adaptive.enabled=true (Spark 3.0+): Adaptive Query Execution (AQE) can dynamically coalesce shuffle partitions and split skewed join partitions (the latter is governed by spark.sql.adaptive.skewJoin.enabled, which is on by default when AQE is enabled).
Manual repartition()/repartitionByRange(): Use these to redistribute data more evenly before a shuffle, if you know the skew pattern.
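A minimal salting sketch in Scala. The toy input and column names are hypothetical: "key" is heavily skewed and "value" is summed. The first aggregation spreads the hot key across saltBuckets partitions; the second folds the partial sums back together:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().appName("salting-example").getOrCreate()
    import spark.implicits._

    // Hypothetical skewed input
    val df = Seq(("hot", 1L), ("hot", 2L), ("cold", 3L)).toDF("key", "value")

    val saltBuckets = 16  // tune to the degree of skew

    val partial = df
      .withColumn("salt", (rand() * saltBuckets).cast("int"))
      .groupBy(col("key"), col("salt"))            // stage 1: partial aggregation
      .agg(sum("value").as("partial_sum"))

    val result = partial
      .groupBy("key")                              // stage 2: drop the salt
      .agg(sum("partial_sum").as("total"))

    // Spark 3.x alternative: let AQE split skewed shuffle partitions
    // spark.conf.set("spark.sql.adaptive.enabled", "true")
    // spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")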
Leverage Off-Heap Memory:
For very large shuffles, aggregations, or working with native libraries, off-heap memory can reduce GC pressure.
Enable with spark.memory.offHeap.enabled=true and set the size with spark.memory.offHeap.size (a byte value; size suffixes such as 2g are accepted). This memory is managed by Spark's Tungsten engine rather than the JVM GC, but it still counts toward the executor's total footprint on the node.
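A minimal sketch; the 2g size is illustrative, and note that recent Spark 3.x releases add the off-heap size to the YARN container request on top of heap and overhead:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.memory.offHeap.enabled", "true")
      .set("spark.memory.offHeap.size", "2g")  // Tungsten-managed, outside the JVM heap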
Efficient Serialization and Data Formats:
Kryo: Set spark.serializer=org.apache.spark.serializer.KryoSerializer. It is faster and more compact than Java serialization. Register your custom classes with Kryo for the best results (see the sketch at the end of this article).
Columnar Formats: Use Parquet or ORC for storing data in HDFS. These formats are optimized for read performance, compression, and predicate pushdown, which reduces the amount of data Spark needs to process in memory.
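A combined sketch of both recommendations; the Event case class and output path are hypothetical:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    case class Event(id: Long, category: String, amount: Double)

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(classOf[Event]))   // avoids writing full class names per record

    val spark = SparkSession.builder().config(conf).getOrCreate()
    import spark.implicits._

    // Write a small dataset as compressed, columnar Parquet
    val events = Seq(Event(1L, "a", 10.0), Event(2L, "b", 20.0)).toDS()
    events.write.mode("overwrite").parquet("/tmp/events_parquet")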