Fully managed .NET library to read and write Apache Parquet files. Supports:
.NET 4.5 and up.
.NET Standard 1.4 and up (for those who are in a tank, that means it implicitly supports all versions of .NET Core).
Runs on all flavours of Windows, Linux, MacOSX, mobile devices (iOS, Android) via Xamarin, gaming consoles, or anywhere else .NET Standard runs, which is a lot! This is a picture of an Xbox One running Parquet.Net in my old living room:
Performs integration tests with parquet-mr (the original Java Parquet implementation) to test for identical behavior. We are planning to add integration with more third-party platforms as well.
Why
Parquet libraries are mostly available for Java, C++ and Python, which somewhat limits the .NET/C# platform in big data applications. Whereas C# is a beautiful language (C# is just Java done right) working on all platforms and devices, we still don’t have anything good in this area. Note that ParquetSharp provides a P/Invoke wrapper around the parquet-cpp library; however, it’s a Windows-only version with plenty of limitations around usability, is generally slower, and leaks memory.
Who
Parquet.Net is used by many small and large organisations across the globe. Unfortunately I can’t list these organisations because they can’t be bothered to give me the legal right to do so (the MIT License doesn’t oblige anyone to contribute anything back!). The official public NuGet stats already show it’s being used by Azure Machine Learning and ML.NET, which are both big, but I have bigger and smaller users as well.
How
Despite the size of the codebase and the importance of this library, it was created and written mostly (99.9%) by myself. Some contributions came from the OSS community, and they were important; however, the project received neither a considerable chunk of work nor any financing/income. Today I can understand the mistakes made, but can’t call them mistakes because the original idea didn’t involve earning any income; the project just became quite popular.
Performance
As of 2020, this is the fastest Parquet library in the world, not just on the .NET runtime but compared to all platforms.
Source Code
The source code is available on GitHub; this is the original source code repository.
To contact me, send an email anytime or leave a comment below.