How to convert .parquet to JSON with Python

To do that, first install the two packages that do the work: pip install pyarrow pandas

Here is a simple script that accepts a .parquet file path as an argument and prints JSON to standard output:

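import sys
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

if __name__ == "__main__":
    # Read the Parquet file named by the first command-line argument.
    tbl: pa.Table = pq.read_table(sys.argv[1])

    # Go through pandas, whose JSON writer copes with more column types.
    df: pd.DataFrame = tbl.to_pandas()

    # Serialize with the default layout and print to standard output.
    j: str = df.to_json()
    print(j)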
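
A note on the output shape: with no arguments, to_json() nests the data by column and then by row index. If you would rather get one JSON object per row, pandas also accepts orient="records" together with lines=True. Here is a minimal sketch on a made-up two-row DataFrame (the column names are only for illustration, not from any real file):

```python
import json

import pandas as pd

# Hypothetical sample data standing in for a table read from Parquet.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Default layout: one object per column, keyed by row index.
default_json = df.to_json()

# Alternative layout: one JSON object per row, one row per line.
records_json = df.to_json(orient="records", lines=True)

# Parse each non-empty line back to confirm it is valid JSON on its own.
rows = [json.loads(line) for line in records_json.splitlines() if line]

print(default_json)
print(records_json)
```

The line-per-row form is usually easier to pipe into other command-line tools, but which layout is better depends on what reads the output.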

Why use pandas? You might be tempted to call the .to_pydict() method on the Arrow table and then just dump the result with the json module, but that fails on more complex data types: values such as timestamps come back as Python objects that json cannot serialize. The snippet above seems to work for everything I’ve encountered so far, which is not much.

