Set up a Standalone Scala SBT Application with Delta Lake
Setting it up is relatively easy; the only issue you might face is that certain Delta Lake releases are not compatible with certain versions of Spark. Here is the compatibility matrix:
| Delta Lake version | Apache Spark version |
|---|---|
| 1.1 | 3.2.x |
| 1.0.x | 3.1.x |
| 0.7.x and 0.8.x | 3.0.x |
| Below 0.7.0 | 2.4.2 - 2.4.x |
Sample working `build.sbt`:

```scala
name := "myproject"
version := "0.1"
scalaVersion := "2.12.15"

val sparkVersion = "3.2.0"

libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion
libraryDependencies += "io.delta" %% "delta-core" % "1.1.0"
```
And the minimal code:

```scala
package uk.aloneguid.myproject

import org.apache.spark.sql.{Row, SaveMode, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import io.delta.tables._

object Delta {

  def main(args: Array[String]): Unit = {

    // start a local Spark session
    val spark = SparkSession
      .builder()
      .master("local[1]")
      .getOrCreate()

    // build a tiny DataFrame with an explicit schema
    val data = Seq(Row(1, "Aloneguid"), Row(2, "Blogging"))
    val schema = StructType(Seq(
      StructField("id", IntegerType),
      StructField("subject", StringType)))

    val df = spark
      .createDataFrame(
        spark.sparkContext.parallelize(data),
        schema)

    // write the DataFrame out as a Delta table and print its contents
    df.write.format("delta").mode(SaveMode.Overwrite).save("c://tmp//delta.test")
    df.show()
  }
}
```
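The `io.delta.tables._` import is not actually used in the example above; it becomes useful once you want to read the table back or inspect its history. Below is a rough sketch of what that could look like as a separate entry point. The `DeltaRead` object name is made up for illustration, the path is the one written above, and the two `config` lines are the extension and catalog settings the Delta Lake docs recommend when using the DeltaTable and SQL APIs.

```scala
package uk.aloneguid.myproject

import org.apache.spark.sql.SparkSession
import io.delta.tables._

object DeltaRead {

  def main(args: Array[String]): Unit = {

    val spark = SparkSession
      .builder()
      .master("local[1]")
      // recommended by the Delta Lake docs for the DeltaTable and SQL APIs
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()

    // read back the table written by the example above
    val df = spark.read.format("delta").load("c://tmp//delta.test")
    df.show()

    // DeltaTable exposes table-level operations, e.g. the commit history
    val table = DeltaTable.forPath(spark, "c://tmp//delta.test")
    table.history().show()

    spark.stop()
  }
}
```

Running this after the writer should print the two rows plus a history entry for the overwrite commit.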
Thanks! You can always email me or use the contact form for more questions, comments, etc.