Spark - Export DataFrame Schema, and then Import it Later.

Problem

During a job I ended up with a DataFrame whose schema I could not know beforehand. I'd like to export that schema to disk so I can reuse it later.

Solution

Export Schema as JSON

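# Name the variable json_text rather than json, so it does not shadow the json module used on import
json_text: str = df.schema.json()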

Then save that string somewhere durable, such as a file.
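As a minimal sketch of the saving step, assuming the schema should live on the driver's local filesystem: the helper names `save_schema_json` and `load_schema_json` and the file name `schema.json` are hypothetical, chosen only for illustration. For HDFS or an object store you would need a different writer.

```python
from pathlib import Path


def save_schema_json(json_text: str, path: str) -> None:
    """Write the schema JSON string to a local file (hypothetical helper)."""
    Path(path).write_text(json_text, encoding="utf-8")


def load_schema_json(path: str) -> str:
    """Read the schema JSON string back from a local file (hypothetical helper)."""
    return Path(path).read_text(encoding="utf-8")
```

Usage would look like `save_schema_json(df.schema.json(), "schema.json")` at export time and `json_text = load_schema_json("schema.json")` at import time.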

Import Schema from JSON

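import json
from pyspark.sql.types import StructType

# json_text is the string produced earlier by df.schema.json(), read back from wherever it was saved
json_object = json.loads(json_text)

# Rebuild the StructType from the parsed dictionary
schema = StructType.fromJson(json_object)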

