Parquet.Net: Pure Apache Parquet Port to .NET

Fully managed .NET library to read and write Apache Parquet files.

Runs on all flavours of Windows, Linux and macOS, on mobile devices (iOS, Android) via Xamarin, on gaming consoles, or anywhere else .NET Standard runs, which is a lot! This is a picture of an Xbox One running Parquet.Net in my old living room:

Performs integration tests with parquet-mr (the original Java Parquet implementation) to test for identical behavior. We are planning to add integration with more third-party platforms as well.

Why

Parquet libraries are mostly available for Java, C++ and Python, which somewhat limits the .NET/C# platform in big data applications. Whereas C# is a beautiful language (C# is just Java done right) that works on all platforms and devices, we still don’t have anything good in this area. Note that ParquetSharp provides a P/Invoke wrapper around the parquet-cpp library; however, it’s a Windows-only version with plenty of usability limitations, is generally slower, and leaks memory.

Who

Parquet.Net is used by many small and large organisations across the globe. Unfortunately I can’t list these organisations because they can’t be bothered to give the legal right to do so (the MIT License doesn’t oblige anyone to contribute anything back!). The official public NuGet stats already show that it’s being used by Azure Machine Learning and ML.NET, which are both big, but I have bigger and smaller users as well.

How

Despite the size of the codebase and the importance of this library, it was created and written mostly (99.9%) by myself. Some contributions came from the OSS community, and they were important; however, the project received neither a considerable chunk of work nor any financing or income. Today I can understand the mistakes made, but can’t call them mistakes, because the original idea didn’t involve earning any income; the project just became quite popular.

Performance

As of 2020, this is the fastest Parquet library in the world, not just on the .NET runtime but compared to all platforms.

Source Code

Is available on GitHub; this is the original source code repository.