I have a Spark batch job that consumes data from a Kafka topic with 300 partitions.
Since the partition count is fixed, should I set a static executor count of 75 with 4 cores each (75 * 4 = 300 concurrent tasks, one per partition), or should I set spark.dynamicAllocation.enabled=true with a minimum of 75 executors (so that, at 4 cores per executor, the read from Kafka is still fully parallel) and some higher maximum? After reading from Kafka, the job also performs some narrow and wide transformations.
I am setting spark.executor.cores=4 in both cases.
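For concreteness, here is roughly how I would submit each variant (the jar name and the maxExecutors value of 150 are placeholders I made up; spark.dynamicAllocation.shuffleTracking.enabled is an assumption on my part, for running dynamic allocation without an external shuffle service on Spark 3.x):

# Option A: static allocation sized to the 300 Kafka partitions (75 executors * 4 cores = 300 task slots)
spark-submit \
  --conf spark.executor.instances=75 \
  --conf spark.executor.cores=4 \
  my-kafka-batch-job.jar

# Option B: dynamic allocation with a floor of 75 executors so the Kafka read stage stays fully parallel
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=75 \
  --conf spark.dynamicAllocation.maxExecutors=150 \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.executor.cores=4 \
  my-kafka-batch-job.jar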
Thanks in advance.