Creating Scala Uber JAR with Spark 3.1 Included

Problem

Package a Scala Spark application into a single .jar that can be executed as a console application elsewhere. None of the dependencies are provided by the target environment, so Spark itself has to be bundled into the .jar.
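
For concreteness, the kind of application meant here could be as small as the sketch below. The file path, the object name Main and the rowCount helper are hypothetical and not taken from the original project. The master is set to local[*] on the assumption that the .jar is launched directly with java rather than through spark-submit.

```scala
// src/main/scala/Main.scala (hypothetical path and names, for illustration only)
import org.apache.spark.sql.SparkSession

object Main {
  // Runs a trivial Spark job: counts the rows of a generated range [0, n)
  def rowCount(spark: SparkSession, n: Long): Long =
    spark.range(n).count()

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM, so no cluster is required
    val spark = SparkSession.builder()
      .appName("whatever")
      .master("local[*]")
      .getOrCreate()
    try {
      println(s"rows: ${rowCount(spark, 5L)}")
    } finally {
      spark.stop()
    }
  }
}
```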

Solution

You might know about the sbt-assembly plugin, which does exactly this. Unfortunately, its documentation makes it extremely unclear how to use it. I tried following Alvin’s instructions, which also didn’t work for me, hence the very simple instructions here.

  1. Create a folder named project next to your build.sbt and put a plugins.sbt file inside it. The folder is literally called project, not your project name or anything else. These are the files, with their exact names:
|-- project
|   |-- plugins.sbt
|-- build.sbt
  1. Set the content of plugins.sbt to:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")
  1. This is what build.sbt should look like:
name := "whatever"

version := "0.1"

scalaVersion := "2.12.13"

// https://mvnrepository.com/artifact/org.apache.spark/spark-sql
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.1.1"

libraryDependencies += "io.delta" %% "delta-core" % "0.8.0"

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}

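Note that assemblyMergeStrategy is required. Without it, sbt-assembly fails whenever two dependencies ship a file with the same path but different contents, and you’ll hit another cryptic error, something like: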

[info] set current project to delta-ingest (in build file:...)
[error] 18 errors were encountered during merge
[error] java.lang.RuntimeException: deduplicate: different file contents found in the following:
[error] C:\dev\spark-s3-raw-ingest\src\null\Coursier\cache\v1\https\repo1.maven.org\maven2\org\apache\arrow\arrow-format\2.0.0\arrow-format-2.0.0.jar:git.properties
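
One caveat with discarding everything under META-INF: that folder also holds the META-INF/services files that java.util.ServiceLoader reads, and Spark uses that mechanism to resolve data source names. If the assembled .jar builds fine but fails at runtime because a data source or file system implementation cannot be found, a variant along these lines may help. It is a sketch under that assumption, not something the build shown here was tested with.

```scala
// build.sbt fragment (sketch; same sbt-assembly 0.15.0 syntax as the file above)
assemblyMergeStrategy in assembly := {
  // Keep ServiceLoader registration files and merge them line by line,
  // dropping duplicate entries
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines
  // Discard the rest of META-INF (signatures, per-library manifests, etc.)
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  // For any other conflict, keep the first copy found on the classpath
  case x => MergeStrategy.first
}
```

The order of the cases matters: the more specific META-INF/services pattern has to come before the general META-INF one, otherwise it never matches.
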
  1. To generate the uber JAR, run sbt assembly:
> sbt assembly
[info] welcome to sbt 1.4.7 (Oracle Corporation Java 1.8.0_261)
[info] loading global plugins from C:\Users\ivang\.sbt\1.0\plugins
[info] loading settings for project src-build from plugins.sbt ...
[info] loading project definition from C:\dev\spark-s3-raw-ingest\src\project
[info] loading settings for project src from build.sbt ...
[info] set current project to delta-ingest (in build file:/C:/dev/spark-s3-raw-ingest/src/)
[info] Strategy 'discard' was applied to 610 files (Run the task at debug level to see details)
[info] Strategy 'first' was applied to 19 files (Run the task at debug level to see details)
[success] Total time: 52 s, completed 05-Mar-2021 14:55:46
  1. The resulting .jar is placed under target/scala-2.12:
> ls
    Directory:  C:\dev\spark-s3-raw-ingest\src\target\scala-2.12

Mode                LastWriteTime     Length Name
----                -------------     ------ ----
d----        05/03/2021     12:57        1   classes
d----        05/03/2021     12:37        1   update
d----        05/03/2021     14:49        1   zinc
-a---        05/03/2021     14:55   124.80MB whatever-assembly-0.1.jar
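
To start the result as a console application with java -jar, the .jar manifest needs a Main-Class entry. sbt can usually detect it when the project defines exactly one main class. If there are several, or if you want a shorter file name, settings like the following can be added to build.sbt. Both com.example.Main and whatever.jar are placeholder values, not names from the original project.

```scala
// build.sbt fragment (sbt-assembly 0.15.0 syntax, matching the file above)

// Entry point written to the manifest; "com.example.Main" is a placeholder
mainClass in assembly := Some("com.example.Main")

// Output file name under target/scala-2.12; "whatever.jar" is a placeholder
assemblyJarName in assembly := "whatever.jar"
```

With these settings the application would be started with java -jar target/scala-2.12/whatever.jar.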

