
Embedding user-defined indexes in Apache Parquet
created: July 14, 2025, 4:29 p.m. | updated: July 15, 2025, 12:51 p.m.
Parquet is extensible with user-defined indexes: it tolerates unknown bytes within the file body and permits arbitrary key/value pairs in its footer metadata.
Footer metadata and offset-based addressing already provide everything needed to embed user-defined index structures within Parquet files without breaking compatibility with other Parquet readers.
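The mechanism described here (extra bytes in the file body, plus a footer key/value entry recording where they live) can be sketched with a toy container. This is not the Parquet format: the footer is JSON rather than Thrift, and the `TOY1` magic, the `my_index_offset` / `my_index_length` keys, and all function names are assumptions made up for illustration. Only the overall shape (body, footer, 4-byte little-endian footer length, trailing magic) mirrors how a Parquet file ends.

```python
import json
import os
import struct
import tempfile

MAGIC = b"TOY1"  # hypothetical magic for this toy format; real Parquet uses b"PAR1"


def write_file(path, data, index_blob):
    """Write data, then the custom index, then a footer that points at the index."""
    with open(path, "wb") as f:
        f.write(MAGIC)
        f.write(data)                # stands in for row groups and data pages
        index_offset = f.tell()      # where the custom index starts
        f.write(index_blob)          # bytes that an index-unaware reader never visits
        footer = json.dumps({
            # Key/value metadata values are stored as strings (assumed key names).
            "key_value_metadata": {
                "my_index_offset": str(index_offset),
                "my_index_length": str(len(index_blob)),
            }
        }).encode("utf-8")
        f.write(footer)
        f.write(struct.pack("<I", len(footer)))  # 4-byte little-endian footer length
        f.write(MAGIC)


def read_footer(path):
    """Locate the footer from the end of the file, as a footer-first reader would."""
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)
        (footer_len,) = struct.unpack("<I", f.read(4))
        if f.read(4) != MAGIC:
            raise ValueError("not a toy container file")
        f.seek(-(8 + footer_len), os.SEEK_END)
        return json.loads(f.read(footer_len))


def read_index(path):
    """An index-aware reader follows the recorded offset to the embedded bytes."""
    kv = read_footer(path)["key_value_metadata"]
    offset, length = int(kv["my_index_offset"]), int(kv["my_index_length"])
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def demo():
    """Round-trip an index blob through a temporary file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_file(path, b"row group bytes", b"distinct:foo,bar")
        return read_index(path)
    finally:
        os.remove(path)
```

A reader that ignores the two extra keys still finds the footer and the data it describes, which is the compatibility property the summary refers to. A reader that knows the keys can jump straight to the index bytes.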
Parquet File Anatomy & Standard Index Structures
Logically, Parquet files contain row groups, each with column chunks, which in turn contain data pages.
Example: Embedding a User Defined Distinct Value Index in Parquet Files
This section demonstrates how to embed a simple distinct value index in Parquet files and use it for file-level pruning (skipping) in DataFusion.
Introduction to Distinct Value Indexes
A distinct value index stores the unique values of a specific column.
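As a rough illustration of how such an index enables file-level pruning, the sketch below builds one set of distinct values per file and skips any file whose set cannot contain the value in an equality predicate. It is plain Python rather than DataFusion code, and the function names and the example file names are hypothetical.

```python
def build_distinct_index(column_values):
    """Collect the unique values of one column for one file."""
    return frozenset(column_values)


def can_skip_file(distinct_index, wanted):
    """For a predicate `column = wanted`, a file is skippable if the value never occurs in it."""
    return wanted not in distinct_index


def prune_files(indexes_by_file, wanted):
    """Return only the files that might contain matching rows."""
    return [
        name
        for name, index in indexes_by_file.items()
        if not can_skip_file(index, wanted)
    ]


# Hypothetical per-file indexes for a `category` column.
indexes = {
    "a.parquet": build_distinct_index(["foo", "bar", "foo"]),
    "b.parquet": build_distinct_index(["baz"]),
}
files_to_scan = prune_files(indexes, "baz")
```

This kind of index pays off mainly for low-cardinality columns, where the set of distinct values stays small relative to the data it describes.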
1 day, 9 hours ago: Hacker News