I encountered a very similar problem.
Our build was similar to yours (but we used sbt) and is described in detail here: https://stackoverflow.com/a/45479379/1549135
Running this solution locally works fine, but spark-submit would ignore all the exclusions and the new logging framework (logback), because Spark's classpath has priority over the deployed jar. Since that classpath contains log4j 1.2.xx, Spark would simply load it and ignore our setup.
Solution
I have used several sources, but to quote the Spark 1.6.1 docs (this applies to Spark 2.2.0, the latest version at the time of writing, as well):
spark.driver.extraClassPath
Extra classpath entries to prepend to the classpath of the driver.
Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-class-path command line option or in your default properties file.
spark.executor.extraClassPath
Extra classpath entries to prepend to the classpath of executors. This exists primarily for backwards-compatibility with older versions of Spark. Users typically should not need to set this option.
What is not written here, though, is that extraClassPath takes precedence over Spark's default classpath!
So now the solution should be quite obvious.
1. Download those jars:
- log4j-over-slf4j-1.7.25.jar
- logback-classic-1.2.3.jar
- logback-core-1.2.3.jar
2. Run spark-submit:
libs="/absolute/path/to/libs/*"
spark-submit \
  ... \
  --master yarn \
  --conf "spark.driver.extraClassPath=$libs" \
  --conf "spark.executor.extraClassPath=$libs" \
  ... \
  /my/application/application-fat.jar \
  param1 param2
I am just not yet sure if you can put those jars on HDFS. We have them locally next to the application jar.
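The three jars from step 1 are published on Maven Central, so fetching them can be scripted. The following is only a sketch: the repository base URL and the helper names `maven_url` and `download_logging_jars` are my own assumptions for illustration, and the target directory is whatever you later pass as `libs`.

```shell
#!/bin/sh
# Sketch only: MAVEN_BASE and both helper names are illustrative assumptions.
MAVEN_BASE="https://repo1.maven.org/maven2"

# Print the download URL for a jar, following the standard Maven repository
# layout: <base>/<group with dots as slashes>/<artifact>/<version>/<artifact>-<version>.jar
maven_url() {
  group_path=$(printf '%s' "$1" | tr '.' '/')
  printf '%s/%s/%s/%s/%s-%s.jar\n' "$MAVEN_BASE" "$group_path" "$2" "$3" "$2" "$3"
}

# Download the three logging jars from step 1 into the directory given as $1.
download_logging_jars() {
  target_dir=$1
  mkdir -p "$target_dir" || return 1
  while read -r group artifact version; do
    curl -fsSL -o "$target_dir/$artifact-$version.jar" \
      "$(maven_url "$group" "$artifact" "$version")" || return 1
  done <<'EOF'
org.slf4j log4j-over-slf4j 1.7.25
ch.qos.logback logback-classic 1.2.3
ch.qos.logback logback-core 1.2.3
EOF
}
```

Calling `download_logging_jars /absolute/path/to/libs` would then populate the directory used as `libs` in the spark-submit call. Nothing is downloaded until the function is invoked.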
userClassPathFirst
Strangely enough, while using Spark 1.6.1 I also found this option in the docs:
spark.driver.userClassPathFirst, spark.executor.userClassPathFirst
(Experimental) Whether to give user-added jars precedence over Spark's own jars when loading classes in the driver. This feature can be used to mitigate conflicts between Spark's dependencies and user dependencies. It is currently an experimental feature. This is used in cluster mode only.
But simply setting:
--conf "spark.driver.userClassPathFirst=true"
--conf "spark.executor.userClassPathFirst=true"
did not work for me, so I am happily sticking with extraClassPath!
Cheers!
Loading logback.xml
If you face any problems loading logback.xml into Spark, my question here might help you out:
Pass system property to spark-submit and read file from classpath or custom path
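One way this could look, as a rough sketch rather than a verified recipe: point logback at the file through its `logback.configurationFile` system property and ship the file to the executors with `--files`. The path to `logback.xml` and the wrapper name `submit_with_logback` are hypothetical, and I am assuming client mode on YARN, where the driver can read the local path and executors find the shipped copy in their working directory.

```shell
#!/bin/sh
# Sketch only: both paths are placeholders and the wrapper name is hypothetical.
libs="/absolute/path/to/libs/*"
logback_xml="/absolute/path/to/logback.xml"

# Driver (client mode): read the file from its local path.
driver_opts="-Dlogback.configurationFile=$logback_xml"
# Executors: --files copies the file into each working directory,
# so the bare file name is used there (assumption, see lead-in).
executor_opts="-Dlogback.configurationFile=$(basename "$logback_xml")"

# Wrapper around spark-submit; application jar and parameters are passed through.
submit_with_logback() {
  spark-submit \
    --master yarn \
    --files "$logback_xml" \
    --conf "spark.driver.extraClassPath=$libs" \
    --conf "spark.executor.extraClassPath=$libs" \
    --conf "spark.driver.extraJavaOptions=$driver_opts" \
    --conf "spark.executor.extraJavaOptions=$executor_opts" \
    "$@"
}
```

It would be invoked as `submit_with_logback /my/application/application-fat.jar param1 param2`. Whether the executors pick up the relative path may depend on your cluster manager and deploy mode, so treat that part as something to verify.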