Creating Scala Uber JAR with Spark 3.1 Included
Problem
Package a Scala Spark application into a single `.jar` that can be executed as a console application elsewhere. None of the dependencies are provided.
Solution
You might know about the sbt-assembly plugin, which does exactly this. Unfortunately, because of the library documentation's ego, it's extremely unclear how to use it. I tried Alvin's instructions, which also didn't work for me, hence the very simple instructions here.
- Create `project/plugins.sbt` in the same folder as your `build.sbt`. The folder is actually called `project`, not your project name or anything else. These are literally the files, with their exact names:

  ```
  |-- project
  |   |-- plugins.sbt
  |-- build.sbt
  ```
- Set the content of `plugins.sbt` to:

  ```scala
  addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")
  ```
- This is how `build.sbt` should look:

  ```scala
  name := "whatever"

  version := "0.1"

  scalaVersion := "2.12.13"

  // https://mvnrepository.com/artifact/org.apache.spark/spark-sql
  libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.1.1"
  libraryDependencies += "io.delta" %% "delta-core" % "0.8.0"

  assemblyMergeStrategy in assembly := {
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case x                             => MergeStrategy.first
  }
  ```
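Optionally, sbt-assembly also lets you pin the entry point and the jar name in `build.sbt`, so the assembled jar can later be started with plain `java -jar`. These settings are not required for the build to work; `com.example.Main` below is a placeholder, not a class from this project:

```scala
// Optional sbt-assembly settings (0.15.x-era syntax).
// "com.example.Main" is a placeholder -- use your application's entry point.
mainClass in assembly := Some("com.example.Main")

// Overrides the default "<name>-assembly-<version>.jar" file name.
assemblyJarName in assembly := "whatever.jar"
```

With `mainClass` set, the generated manifest gets a `Main-Class` entry, so the jar behaves like a normal executable jar.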
Note that `assemblyMergeStrategy` is required; otherwise you'll hit another alien error, something like:

```
[info] set current project to delta-ingest (in build file:...)
[error] 18 errors were encountered during merge
[error] java.lang.RuntimeException: deduplicate: different file contents found in the following:
[error] C:\dev\spark-s3-raw-ingest\src\null\Coursier\cache\v1\https\repo1.maven.org\maven2\org\apache\arrow\arrow-format\2.0.0\arrow-format-2.0.0.jar:git.properties
```
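If the `xs @ _*` in the merge strategy looks alien, it's ordinary Scala sequence pattern matching: it binds the remaining path segments to `xs`. A plain-Scala sketch of the same dispatch idea, using a hypothetical `classify` helper in place of the real sbt-assembly machinery:

```scala
// Hypothetical sketch: mimics how the merge strategy dispatches
// on the segments of a file path inside the jar.
object MergeDemo {
  def classify(segments: List[String]): String = segments match {
    // Anything under META-INF gets discarded, like the real strategy.
    case "META-INF" :: rest => s"discard (rest: ${rest.mkString("/")})"
    // Everything else: keep the first copy encountered.
    case _                  => "first"
  }

  def main(args: Array[String]): Unit = {
    println(classify(List("META-INF", "MANIFEST.MF"))) // discard (rest: MANIFEST.MF)
    println(classify(List("org", "apache", "Foo.class"))) // first
  }
}
```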
- To generate the uber JAR, run `sbt assembly`:

  ```
  > sbt assembly
  [info] welcome to sbt 1.4.7 (Oracle Corporation Java 1.8.0_261)
  [info] loading global plugins from C:\Users\ivang\.sbt\1.0\plugins
  [info] loading settings for project src-build from plugins.sbt ...
  [info] loading project definition from C:\dev\spark-s3-raw-ingest\src\project
  [info] loading settings for project src from build.sbt ...
  [info] set current project to delta-ingest (in build file:/C:/dev/spark-s3-raw-ingest/src/)
  [info] Strategy 'discard' was applied to 610 files (Run the task at debug level to see details)
  [info] Strategy 'first' was applied to 19 files (Run the task at debug level to see details)
  [success] Total time: 52 s, completed 05-Mar-2021 14:55:46
  ```
- The resulting `.jar` is placed under `target/scala-2.12`:

  ```
  > ls

      Directory: C:\dev\spark-s3-raw-ingest\src\target\scala-2.12

  Mode    LastWriteTime        Length    Name
  ----    -------------        ------    ----
  d----   05/03/2021 12:57          1    classes
  d----   05/03/2021 12:37          1    update
  d----   05/03/2021 14:49          1    zinc
  -a---   05/03/2021 14:55   124.80MB    whatever-assembly-0.1.jar
  ```
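Since Spark itself is baked into the jar, it can then be started like any console application, with no `spark-submit` and nothing on the classpath besides the jar itself. A sketch, where `com.example.Main` is a placeholder for your own entry point:

```shell
# Placeholder main class -- substitute your application's entry point.
java -cp target/scala-2.12/whatever-assembly-0.1.jar com.example.Main
```

The build log above shows Java 8, which Spark 3.1 supports out of the box; on newer JVMs Spark may need additional JVM flags.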
.------.____
.-' \ ___)
.-' \\\
.-' ___ \\)
.-' / (\ |)
__ \ ( | |
/ \ \__'| |
/ \____).-'
.' / |
/ . / |
.' / \/ |
/ / \ |
/ / _|_
\ / /\ /\
\ / /__v__\
' | |
| .#|
|#. .##|
|#######|
|#######|
To contact me, send an email anytime or leave a comment below.