Announcing support for GROUP BY, SUM, and other aggregation queries in R2 SQL

• When you’re dealing with large amounts of data, it’s helpful to get a quick overview, which is exactly what aggregations provide in SQL.
• Aggregations, known as “GROUP BY queries”, provide a bird’s eye view, so you can quickly gain insights from vast volumes of data.
• That’s why we are excited to announce support for aggregations in R2 SQL, Cloudflare’s serverless, distributed, analytics query engine, which is capable of running SQL queries over data stored in R2 Data Catalog.
• Aggregations will allow users of R2 SQL to spot important trends and changes in the data, generate reports and find anomalies in logs.
• This release builds on the already supported filter queries, which are foundational for analytical workloads, and allow users to find needles in haystacks of Apache Parquet files.
• In this post, we’ll unpack the utility and quirks of aggregations, and then dive into how we extended R2 SQL to support running such queries over vast amounts of data stored in R2 Data Catalog.

Article Summaries:

Cloudflare has added aggregation support to its R2 SQL engine, enabling users to run queries that combine GROUP BY with aggregate functions such as SUM and COUNT, plus ORDER BY and HAVING clauses, on data stored in the R2 Data Catalog. The update lets analysts generate concise reports, spot trends, and detect anomalies across large Parquet datasets without moving data. R2 SQL now performs a two‑phase execution: first computing aggregate columns, then applying filters or sorting to the aggregated results. This builds on the engine’s existing filter‑query capabilities and expands its analytical workload support, making it easier to summarize and explore vast volumes of data where they are stored.
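
The two-phase idea described above can be illustrated with a small conceptual sketch in plain Python. This is not R2 SQL's implementation and does not call any Cloudflare API: the sample rows, the column names (`department`, `amount`), and the helper functions `aggregate` and `having_and_order` are all hypothetical, chosen only to show why a HAVING filter or an ORDER BY on an aggregate can run only after the aggregates have been computed.

```python
from collections import defaultdict

# Hypothetical sample rows; the column names are illustrative only.
ROWS = [
    {"department": "eng", "amount": 120},
    {"department": "eng", "amount": 80},
    {"department": "sales", "amount": 50},
    {"department": "ops", "amount": 300},
    {"department": "sales", "amount": 25},
]


def aggregate(rows, key, value):
    """Phase 1: compute SUM and COUNT per group (the GROUP BY step)."""
    totals = defaultdict(lambda: {"sum": 0, "count": 0})
    for row in rows:
        group = totals[row[key]]
        group["sum"] += row[value]
        group["count"] += 1
    return [{key: k, **v} for k, v in totals.items()]


def having_and_order(groups, min_sum):
    """Phase 2: filter on the aggregate (HAVING), then sort descending (ORDER BY)."""
    kept = [g for g in groups if g["sum"] >= min_sum]
    return sorted(kept, key=lambda g: g["sum"], reverse=True)


# Roughly mirrors the shape of:
#   SELECT department, SUM(amount), COUNT(*) ... GROUP BY department
#   HAVING SUM(amount) >= 100 ORDER BY SUM(amount) DESC
result = having_and_order(aggregate(ROWS, "department", "amount"), min_sum=100)
for row in result:
    print(row)
```

The filter in the second phase refers to `sum`, a value that does not exist on any input row, which is why it cannot be pushed into the first pass over the data in this sketch.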

Sources:

https://blog.cloudflare.com/r2-sql-aggregations/