Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve grouping performance via better vectorization in accumulate functions #6954

Closed
wants to merge 4 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -139,10 +139,14 @@ impl NullState {
// no nulls, no filter,
(false, None) => {
let iter = group_indices.iter().zip(data.iter());

for (&group_index, &new_value) in iter {
seen_values.set_bit(group_index, true);
value_fn(group_index, new_value);
}
// update seen values in separate loop
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the idea is to use a separate loop

Note we can skip this update entirely once all the groups have been seen -- I will explore if I can figure out how to do that next.

for &group_index in group_indices.iter() {
seen_values.set_bit(group_index, true);
Copy link
Contributor

@Dandandan Dandandan Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we assume seen_values is true for every group_index for "no nulls, no filter" until we observed a first null-value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is an excellent point. I will try and take that into account

}
}
// nulls, no filter
(true, None) => {
Expand All @@ -157,6 +161,7 @@ impl NullState {
let data_remainder = data_chunks.remainder();

group_indices_chunks
.clone()
.zip(data_chunks)
.zip(bit_chunks.iter())
.for_each(|((group_index_chunk, data_chunk), mask)| {
Expand All @@ -167,14 +172,28 @@ impl NullState {
// valid bit was set, real value
let is_valid = (mask & index_mask) != 0;
if is_valid {
seen_values.set_bit(group_index, true);
value_fn(group_index, new_value);
}
index_mask <<= 1;
},
)
});

group_indices_chunks.zip(bit_chunks.iter()).for_each(
|(group_index_chunk, mask)| {
// index_mask has value 1 << i in the loop
let mut index_mask = 1;
group_index_chunk.iter().for_each(|&group_index| {
// valid bit was set, real value
let is_valid = (mask & index_mask) != 0;
if is_valid {
seen_values.set_bit(group_index, true);
}
index_mask <<= 1;
})
},
);

// handle any remaining bits (after the initial 64)
let remainder_bits = bit_chunks.remainder_bits();
group_indices_remainder
Expand All @@ -184,10 +203,17 @@ impl NullState {
.for_each(|(i, (&group_index, &new_value))| {
let is_valid = remainder_bits & (1 << i) != 0;
if is_valid {
seen_values.set_bit(group_index, true);
value_fn(group_index, new_value);
}
});
group_indices_remainder.iter().enumerate().for_each(
|(i, &group_index)| {
let is_valid = remainder_bits & (1 << i) != 0;
if is_valid {
seen_values.set_bit(group_index, true);
}
},
);
}
// no nulls, but a filter
(false, Some(filter)) => {
Expand All @@ -201,10 +227,17 @@ impl NullState {
.zip(filter.iter())
.for_each(|((&group_index, &new_value), filter_value)| {
if let Some(true) = filter_value {
seen_values.set_bit(group_index, true);
value_fn(group_index, new_value);
}
})
});

group_indices.iter().zip(filter.iter()).for_each(
|(&group_index, filter_value)| {
if let Some(true) = filter_value {
seen_values.set_bit(group_index, true);
}
},
)
}
// both null values and filters
(true, Some(filter)) => {
Expand Down