Create Apache Spark DataFrame in memory
Let’s say you have an in-memory list of rows and you want to create a
DataFrame from it:
| id | subject   |
|----|-----------|
| 1  | Aloneguid |
| 2  | Blogging  |
Using PySpark
from pyspark.sql import SparkSession
spark = (SparkSession
         .builder
         .master("local[1]")
         .getOrCreate())
df = spark.createDataFrame([(1, "Aloneguid"), (2, "Blogging")], "id int, subject string")
df.show()
+---+---------+
| id|  subject|
+---+---------+
|  1|Aloneguid|
|  2| Blogging|
+---+---------+
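In practice the in-memory data often arrives as a list of dictionaries rather than ready-made tuples. Below is a minimal sketch of one way to bridge that gap, assuming the same two columns as above. The helper names (`to_rows`, `to_ddl`, `to_dataframe`) and the `COLUMNS` list are hypothetical and only there for illustration; the only Spark call involved is `createDataFrame` with a list of tuples and a DDL schema string, exactly as in the snippet above.

```python
# Hypothetical column list: (name, Spark SQL type) pairs in the desired order.
COLUMNS = [("id", "int"), ("subject", "string")]


def to_rows(records, columns):
    """Turn a list of dicts into tuples ordered like `columns`."""
    return [tuple(record[name] for name, _ in columns) for record in records]


def to_ddl(columns):
    """Build a DDL schema string such as 'id int, subject string'."""
    return ", ".join(f"{name} {dtype}" for name, dtype in columns)


def to_dataframe(spark, records, columns=COLUMNS):
    """Create a DataFrame from in-memory records using an existing SparkSession."""
    return spark.createDataFrame(to_rows(records, columns), to_ddl(columns))


records = [
    {"id": 1, "subject": "Aloneguid"},
    {"id": 2, "subject": "Blogging"},
]

rows = to_rows(records, COLUMNS)   # plain tuples, no Spark needed yet
schema = to_ddl(COLUMNS)           # "id int, subject string"
```

With the `spark` session created earlier, `to_dataframe(spark, records).show()` should give the same table as before. Keeping the column order in one place means the tuples and the schema string cannot drift apart.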
Using Scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
val spark = SparkSession
  .builder()
  .master("local[*]")
  .getOrCreate()
val data = Seq(Row(1, "Aloneguid"), Row(2, "Blogging"))
val schema = StructType(Seq(
  StructField("id", IntegerType),
  StructField("subject", StringType)))
val df = spark
  .createDataFrame(
    spark.sparkContext.parallelize(data),
    schema)
df.show()