In Spark, what is the best way to control the size of the output files? For example, in log4j we can specify a maximum file size, after which the file rotates.
I am looking for a similar solution for Parquet files. Is there a maximum file size option available when writing a file?
I have a few workarounds, but none of them is good. If I want to limit files to 64 MB, one option is to repartition the data and write it to a temporary location, then merge the files together based on their sizes in the temporary location. But getting the correct file size is difficult.
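To illustrate the repartition workaround: a minimal sketch, assuming you can estimate the total data size up front (e.g. from the source files). The helper below only computes a partition count; the PySpark call is shown as a comment, and `df` and `estimated_bytes` are hypothetical names. Note that Parquet compression means the on-disk size will differ from the estimate, which is exactly why getting the correct file size is hard:

```python
import math

def num_partitions(total_bytes, target_bytes=64 * 1024 * 1024):
    """Partition count so each output file is roughly target_bytes or less."""
    return max(1, math.ceil(total_bytes / target_bytes))

# Hypothetical usage with PySpark, given a DataFrame `df` and an
# estimated total size `estimated_bytes` of the data to be written:
# df.repartition(num_partitions(estimated_bytes)).write.parquet("/tmp/out")
```

The estimate is only approximate: skewed partitions and compression ratios both move the actual file sizes away from the 64 MB target.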