After a Spark program completes, three temporary directories remain in the temp directory.
The directory names are like this: spark-2e389487-40cc-4a82-a5c7-353c0feefbb7
The directories are empty.
When the Spark program runs on Windows, a Snappy DLL file also remains in the temp directory.
The file name is like this: snappy-1.0.4.1-6e117df4-97b6-4d69-bf9d-71c4a627940c-snappyjava
They are created every time the Spark program runs. So the number of files and directories keeps growing.
How can I have them deleted automatically?
Spark version is 1.3.1 with Hadoop 2.6.
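As a stopgap, the leftovers can be removed by an external cleanup step. The following is a minimal sketch, not part of Spark: the function name `remove_spark_leftovers` is hypothetical, the name patterns are inferred only from the two examples above, and it assumes the directory you pass in is the one the JVM uses as `java.io.tmpdir` (Python's default temp directory may differ). It should only be run when no Spark program is active.

```python
import re
import tempfile
from pathlib import Path

UUID = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"
# Patterns inferred from the example names in the question (assumption).
SPARK_DIR = re.compile(r"spark-" + UUID)
SNAPPY_FILE = re.compile(r"snappy-[0-9.]+-" + UUID + r"-snappyjava")


def remove_spark_leftovers(temp_dir=None):
    """Delete empty spark-<uuid> directories and snappy-...-snappyjava files.

    Returns the sorted names of the entries that were removed.
    """
    root = Path(temp_dir or tempfile.gettempdir())
    removed = []
    for entry in list(root.iterdir()):
        try:
            if entry.is_dir() and SPARK_DIR.fullmatch(entry.name):
                entry.rmdir()  # rmdir only succeeds on empty directories
                removed.append(entry.name)
            elif entry.is_file() and SNAPPY_FILE.match(entry.name):
                entry.unlink()
                removed.append(entry.name)
        except OSError:
            # Directory not empty, or file still locked by a running JVM: skip.
            pass
    return sorted(removed)
```

Because `rmdir` refuses to delete non-empty directories, a directory that a running job is still using is left alone.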
UPDATE
I've traced the Spark source code.
The methods that create the three temp directories are as follows:
- DiskBlockManager.createLocalDirs
- HttpFileServer.initialize
- SparkEnv.sparkFilesDir
They (eventually) call Utils.getOrCreateLocalRootDirs and then Utils.createDirectory, which intentionally does NOT mark the directory for automatic deletion.
The comment on the createDirectory method says: "The directory is guaranteed to be newly created, and is not marked for automatic deletion."
I don't know why they are not marked. Is this really intentional?