Set up a Standalone Scala SBT Application with Delta Lake

Setting this up is relatively easy; the only issue you might run into is that particular Delta Lake releases are only compatible with particular versions of Apache Spark. Here is the compatibility matrix:

Delta Lake version    Apache Spark version
1.1                   3.2.x
1.0.x                 3.1.x
0.7.x and 0.8.x       3.0.x
Below 0.7.0           2.4.2 - 2.4.x

A sample working build.sbt:

name := "myproject"

version := "0.1"

scalaVersion := "2.12.15"
// Spark 3.2.x pairs with Delta Lake 1.1.x (see the matrix above)
val sparkVersion = "3.2.0"

libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion

libraryDependencies += "io.delta" %% "delta-core" % "1.1.0"

And here is the minimal code:

package uk.aloneguid.myproject

import org.apache.spark.sql.{Row, SaveMode, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import io.delta.tables._

object Delta {
  def main(args: Array[String]): Unit = {
    // a local session with a single worker thread is enough for this demo
    val spark = SparkSession
      .builder()
      .master("local[1]")
      .getOrCreate()

    val data = Seq(Row(1, "Aloneguid"), Row(2, "Blogging"))

    val schema = StructType(Seq(
      StructField("id", IntegerType),
      StructField("subject", StringType)))

    val df = spark
      .createDataFrame(
        spark.sparkContext.parallelize(data),
        schema)

    // write the DataFrame out as a Delta table
    df.write.format("delta").mode(SaveMode.Overwrite).save("c://tmp//delta.test")

    df.show()
  }
}
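To read the table back, or to use the DeltaTable utility API and Delta SQL commands, the Delta Lake quickstart recommends registering the Delta session extension and catalog on the SparkSession. A minimal sketch (the object name DeltaRead is my own, and the path is assumed to be the one written above):

package uk.aloneguid.myproject

import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession

object DeltaRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[1]")
      // recommended by the Delta Lake quickstart; needed for Delta SQL
      // and for DeltaTable operations such as update/delete/merge
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()

    // plain DataFrame read of the table written above
    spark.read.format("delta").load("c://tmp//delta.test").show()

    // DeltaTable gives access to table utilities, e.g. the commit history
    val table = DeltaTable.forPath(spark, "c://tmp//delta.test")
    table.history().show()

    spark.stop()
  }
}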
