Computer Science > Databases
[Submitted on 19 Feb 2026]
Title: Do GPUs Really Need New Tabular File Formats?
Abstract: Parquet is the de facto columnar file format in modern analytical systems, yet its configuration guidelines have largely been shaped by CPU-centric execution models. As GPU-accelerated data processing becomes increasingly prevalent, Parquet files generated with CPU-oriented defaults can severely underutilize GPU parallelism, turning GPU scans into a performance bottleneck. In this work, we systematically study how Parquet configurations affect GPU scan performance. We show that Parquet’s poor GPU performance is not inherent to the format itself but rather a consequence of suboptimal configuration choices. By applying GPU-aware configurations, we increase effective read bandwidth up to 125 GB/s without modifying the Parquet specification.
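To make the link between file layout and GPU parallelism concrete, the following is a minimal sketch that estimates how many data pages a Parquet file contains under different layout settings. It is not taken from the paper: the function name `count_pages`, the fixed-width uncompressed value model, and the example sizes are all illustrative assumptions, and it further assumes a GPU reader that can decode each page as an independent unit of work. The abstract does not state which settings the authors actually recommend.

```python
import math


def count_pages(total_rows, num_columns, rows_per_row_group,
                bytes_per_value, page_size_bytes):
    """Estimate the number of data pages in a Parquet file.

    Hypothetical model: every column holds fixed-width, uncompressed
    values, and each column chunk is cut into pages of at most
    page_size_bytes. Real files differ once encoding and compression
    are applied.
    """
    num_row_groups = math.ceil(total_rows / rows_per_row_group)
    pages = 0
    for rg in range(num_row_groups):
        # The last row group may hold fewer rows than the others.
        rows = min(rows_per_row_group, total_rows - rg * rows_per_row_group)
        chunk_bytes = rows * bytes_per_value
        pages += num_columns * math.ceil(chunk_bytes / page_size_bytes)
    return pages


# Illustrative table: 10 million rows, 4 columns of 8-byte values,
# 1 million rows per row group.
large_pages = count_pages(10_000_000, 4, 1_000_000, 8, 1024 * 1024)  # 1 MiB pages
small_pages = count_pages(10_000_000, 4, 1_000_000, 8, 64 * 1024)    # 64 KiB pages

print(large_pages)  # 320
print(small_pages)  # 4920
```

Under these assumptions, shrinking the page size from 1 MiB to 64 KiB raises the number of independently decodable units from 320 to 4920 for the same data. This only shows why layout settings can matter to a massively parallel reader; it does not predict bandwidth.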