rdd - What does "Stage Skipped" mean in Apache Spark web UI?

Question


1 Answer

深蓝 · Answer 1 · 2021-10-16T22:16:40+0000

Typically it means that the data was fetched from cache, so there was no need to re-execute the given stage. This is consistent with your DAG, which shows that the next stage requires shuffling (reduceByKey). Whenever shuffling is involved, Spark automatically caches the generated shuffle data, as the Spark programming guide explains:

Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files are preserved until the corresponding RDDs are no longer used and are garbage collected. This is done so the shuffle files don’t need to be re-created if the lineage is re-computed.
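
To make the idea concrete, here is a toy model in plain Python. It is not Spark code and not how Spark's scheduler is implemented; ToyScheduler, run_map_stage, run_job and word_pairs are hypothetical names made up for this sketch. It only mimics the behaviour described above: the output of the map-side stage is kept after the first job, so a second job over the same lineage reports that stage as skipped and runs only the reduce side.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Tuple[str, int]


class ToyScheduler:
    """Toy model of stage skipping. Hypothetical, not Spark's real scheduler."""

    def __init__(self) -> None:
        # Stands in for the shuffle files Spark keeps on disk, keyed by stage name.
        self.shuffle_outputs: Dict[str, Dict[str, List[int]]] = {}
        # Records what happened to each stage, in order.
        self.log: List[Tuple[str, str]] = []

    def run_map_stage(
        self, name: str, records: Callable[[], Iterable[Record]]
    ) -> Dict[str, List[int]]:
        # If the shuffle output already exists, reuse it and skip the stage.
        if name in self.shuffle_outputs:
            self.log.append((name, "skipped"))
            return self.shuffle_outputs[name]
        # Otherwise compute it: group values by key, as a shuffle would.
        grouped: Dict[str, List[int]] = {}
        for key, value in records():
            grouped.setdefault(key, []).append(value)
        self.shuffle_outputs[name] = grouped
        self.log.append((name, "executed"))
        return grouped

    def run_job(self, records: Callable[[], Iterable[Record]]) -> Dict[str, int]:
        # Stage 0 produces the shuffle output; stage 1 reduces it per key.
        shuffled = self.run_map_stage("stage-0 (map)", records)
        result = {key: sum(values) for key, values in shuffled.items()}
        self.log.append(("stage-1 (reduce)", "executed"))
        return result


def word_pairs() -> Iterable[Record]:
    # Made-up input: (word, 1) pairs, like the input to a reduceByKey word count.
    for word in "a b a c b a".split():
        yield (word, 1)


scheduler = ToyScheduler()
first = scheduler.run_job(word_pairs)   # both stages execute
second = scheduler.run_job(word_pairs)  # map stage is skipped, reduce stage runs

for stage, status in scheduler.log:
    print(stage, status)
```

In a real application the same pattern shows up when two actions are run on the same shuffled RDD: the second job in the web UI lists the map-side stage as skipped because its shuffle files are still available.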
