To do that, first install the two packages that do the work:
pip install pyarrow pandas
Here is a simple script that accepts
a .parquet file path as an argument and prints JSON to standard output:
import sys

import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

if __name__ == "__main__":
    # sys.argv[0] is the script name; the Parquet path is the first real argument
    tbl: pa.Table = pq.read_table(sys.argv[1])
    df: pd.DataFrame = tbl.to_pandas()
    j: str = df.to_json()
    print(j)
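One thing worth knowing about the output: with no arguments, to_json() groups the values by column and then by row index, which may not be the shape you expect. Here is a rough sketch of how the orient parameter changes that. The DataFrame and its column names are made up for illustration and do not come from any real file.

```python
import json

import pandas as pd

# Hypothetical data standing in for a table loaded from Parquet.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Default: one object per column, keyed by row index.
by_column = df.to_json()

# One object per row, collected in a JSON array.
by_row = df.to_json(orient="records")

# One object per row, one row per line (JSON Lines).
as_lines = df.to_json(orient="records", lines=True)

print(by_column)
print(by_row)
print(as_lines)
```

If the JSON is going to another tool row by row, orient="records" is probably the shape you want; the default is fine for a quick look at the data.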
Why use pandas? You might be tempted to call the
.to_pydict() method on the Arrow table and then just dump the result with the
json module, but that fails on complex datatypes: json.dumps raises a TypeError for values it cannot serialize, such as timestamps. The snippet above has worked for everything I've tried so far, which is not much.
To contact me, send an email anytime or leave a comment below.