Spark - Export DataFrame Schema, and then Import it Later.
Problem
During some execution I've ended up with a DataFrame whose very specific schema I don't know beforehand. I'd like to export this schema to disk so that I can reuse it later.
Solution
Export Schema as JSON
json_text: str = df.schema.json()
Then save this string somewhere, such as a text file.
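One possible way to do the saving step is to write the string to a local file. This is only a minimal sketch: the file name `schema.json` is an assumption, and the hard-coded JSON string is a stand-in for whatever `df.schema.json()` returns in your job.

```python
from pathlib import Path

# Stand-in for the string returned by df.schema.json(); in a real job,
# use that call instead of this hard-coded example.
json_text = (
    '{"type":"struct","fields":['
    '{"name":"id","type":"long","nullable":true,"metadata":{}}]}'
)

# Hypothetical destination; any writable path works.
schema_path = Path("schema.json")
schema_path.write_text(json_text, encoding="utf-8")

# Later, possibly in another process, read the text back.
loaded_text = schema_path.read_text(encoding="utf-8")
```

Note that a plain local path like this lives on the machine running the driver, so on a cluster you may prefer shared storage instead.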
Import Schema from JSON
import json
from pyspark.sql.types import StructType
# json_text is the string exported earlier, read back from wherever it was saved
json_object = json.loads(json_text)
schema = StructType.fromJson(json_object)