Write a blog post on fast vectorized grouping for high cardinality #6988
Comments
I have drafted a blog post about this with @tustvold and @Dandandan. It will be published on the InfluxData blog first, and then I will propose reposting it on the Arrow blog site. I expect to have a draft up later this week.
Here is the blog post we wrote about how to do high cardinality grouping really fast: https://www.influxdata.com/blog/aggregating-millions-groups-fast-apache-arrow-datafusion/ I will propose a PR to cross-post the content to the Arrow blog as well in the coming days.
PR on arrow-site ready: apache/arrow-site#386
…ion 28.0.0 (#386)
Closes apache/datafusion#6988
**Note**: This describes work @tustvold, @Dandandan, and I did in DataFusion 28.0.0. This content was originally published on the [InfluxData Blog](https://www.influxdata.com/blog/aggregating-millions-groups-fast-apache-arrow-datafusion/), but since it is generally applicable to Apache Arrow DataFusion I would like to syndicate it here because:
1. This is a forum where the community can comment / keep it up to date via PR
2. It is hosted on a platform with a different lifetime than a company blog
This is the same model we followed with https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/ which was also republished on the Arrow blog after the InfluxData blog.
It also gives me a chance to use my original ASCII art diagrams :)
It is now re-published on https://arrow.apache.org/blog/2023/08/05/datafusion_fast_grouping/ ✅
The idea here is to write a blog post explaining / motivating the improvement in DataFusion grouping made in #6904