Sort out testcases in `aggregate.slt` #14301

logan-keede · 2025-01-25T21:38:28Z

Which issue does this PR close?

Rationale for this change

Better Readability and Navigation.

What changes are included in this PR?

Refactor of aggregate.slt to multiple files,

Are these changes tested?

cargo test --test sqllogictests -- --complete (this does not check expected behaviour of sql commands) runs without failure, using patch mentioned in #14254.

Are there any user-facing changes?

No

logan-keede · 2025-01-25T21:40:05Z

cc @Rachelint @Omega359

Rachelint · 2025-01-26T03:43:59Z

Thanks @logan-keede

I see still some failed tests exist, like:

External error: query failed: DataFusion error: Error during planning: table 'datafusion.public.float_table' not found

logan-keede · 2025-01-26T08:24:03Z

Thanks @logan-keede

I see still some failed tests exist, like:
External error: query failed: DataFusion error: Error during planning: table 'datafusion.public.float_table' not found

fixed!
thanks for your patience

buraksenn · 2025-01-26T09:10:15Z

nit: I think PR title should reflect this is sqllogictest refactor to be more descriptive

Rachelint · 2025-01-26T10:26:00Z

Main thing I am worried about is that this pr seems too large, it seems hard to ensure all exists testcases are moved rightly. Maybe we can push it forward more incrementally? I have some thoughts about it:

Only extract the common part(mainly table creatings, e.g. aggregate_test_100, d_table, test...), and the files structure after extracting may be:

-- aggregate
  -- init_data.slt.part
  -- aggregate.slt

Sort out the cases in aggregate.slt using comments firstly before moving to a specific slt, and think a way to ensure no cases are lost(I can help this step). It may be like:

aggregate.slt

#######
# Count tests
#######
...

#######
# Sum tests
#######
...

#######
# Min/Max tests
#######
...

Move and split the testcases to specific slts.

-- aggregate
  -- init_data.slt.part
  -- aggregate.slt (cases difficult to classify, or still waiting to be moved in following prs)
  -- count.slt
  -- sum.slt
  -- min/max.slt
  ...

Rachelint · 2025-01-26T10:27:53Z

Main thing I am worried about is that this pr seems too large, it seems hard to ensure all exists testcases are moved rightly. Maybe we can push it forward more incrementally? I have some thoughts about it:

* Only extract the common part(mainly table creatings, e.g. `aggregate_test_100`, `d_table`, `test`...), and the files structure after extracting may be:

-- aggregate
  -- init_data.slt.part
  -- aggregate.slt

* Sort the cases in `aggregate.slt` using comments firstly before moving to a specific `slt`, and think a way to ensure no cases are lost(I can help this step). It may be like:

aggregate.slt

#######
# Count tests
#######
...

#######
# Sum tests
#######
...

#######
# Min/Max tests
#######
...

* Move and split the testcases to specific `slt`s.

-- aggregate
  -- init_data.slt.part
  -- aggregate.slt (cases difficult to classify, or still waiting to be moved in following prs)
  -- count.slt
  -- sum.slt
  -- min/max.slt
  ...

And how about we just start from the first step in this pr? And continue pushing it forward in following prs?

Rachelint · 2025-01-26T08:56:45Z

datafusion/sqllogictest/test_files/aggregate/aggregate.slt

+
+# csv_query_bit_and_distinct
+query IIIII
+SELECT bit_and(distinct c5), bit_and(distinct c6), bit_and(distinct c7), bit_and(distinct c8), bit_and(distinct c9) FROM aggregate_test_100


It seems a part of bit_and_or_xor?

Rachelint · 2025-01-26T09:28:53Z

datafusion/sqllogictest/test_files/aggregate/string.slt

Seems switching to use data type to split rather than function?

function name is STRING_AGG maybe we should switch that. I think it might not be bad to have some split based on datatype like date_time.slt (present in this PR) even if we decide to make them redundant.

🤔 just a bit worried that, when the splitting approach becomes complex, if it will be confusing about deciding which file should we place the new testcases into

aggregate.slt already had loose subsections for eg. most of the tests in date_time.slt are made to work together(because of mix of different functions) + there is a section for postgres dialect only somewhere in original file. I doubt there would be much confusing tests left after that and we can just put them in both files.

logan-keede · 2025-01-26T11:07:41Z

agreed, I think if we are going to do this it might be better to open a new PR and start anew.

Sort the cases in aggregate.slt using comments firstly before moving to a specific slt, and think a way to ensure no cases are lost(I can help this step). It may be like:

a brutish way to solve this might be to keep aggregate.slt as a whole even after refactoring, it hardly adds a second to tests(we can add a redundancy warning to avoid misunderstanding), this way we have better navigation but also original testcases.

logan-keede · 2025-01-26T11:27:43Z

Sort the cases in aggregate.slt using comments firstly before moving to a specific slt, and think a way to ensure no cases are lost(I can help this step). It may be like:

I think an alternative to this can be to divide this PR into increments and keep a aggregate_supplement.slt file and initially it will be same as aggregate.slt but as we move out queries we subtract them from the supplement file only. This way even if we there is some query that is left in supplement we can just move it later when it is discovered by using reference from original. (We can always find it in commit history but that's a bit inconvenient)

dejavu

It seems a part of bit_and_or_xor?

I say this because this sorting might become a huge PR by itself, this approach divides this sorting into increments with PRs.
This is kind of what I am doing but in a more incremental manner with added redundancy.
@Rachelint Let me know if you think this is a bad Idea. I will open a new PR once we are on the same page.
Thanks,
Logan

Rachelint · 2025-01-26T12:08:40Z

I think an alternative to this can be to divide this PR into increments and keep a aggregate_supplement.slt file and initially it will be same as aggregate.slt but as we move out queries we subtract them from the supplement file only. This way even if we there is some query that is left in supplement we can just move it later when it is discovered by using reference from original. (We can always find it in commit history but that's a bit inconvenient

@logan-keede it is said that still one pr, but we perform extract and subtract for a specific function in one commit, and then we check it between commits?

It seems workable too, but actullay a bit inconvenient to check if large changes in one pr?

And for the extract and subtract approach, I think it seems really good.

Rachelint · 2025-01-26T13:04:09Z

I think an alternative to this can be to divide this PR into increments and keep a aggregate_supplement.slt file and initially it will be same as aggregate.slt but as we move out queries we subtract them from the supplement file only. This way even if we there is some query that is left in supplement we can just move it later when it is discovered by using reference from original. (We can always find it in commit history but that's a bit inconvenient

@logan-keede it is said that still one pr, but we perform extract and subtract for a specific function in one commit, and then we check it between commits?

It seems workable too, but actullay a bit inconvenient to check if large changes in one pr?

And for the extract and subtract approach, I think it seems really good.

Or I misunderstand? We will split it into multiple prs and we perform extract and subtract for a specific function in one pr? I think we may be on the same page if it means that.

logan-keede · 2025-01-26T13:05:26Z

Thanks for the review.

It seems workable too, but actullay a bit inconvenient to check if large changes in one pr?

I meant to divide it into multiple PRs not commits.

I will be opening a new pr with just addition of init.slt.aprt and supplement.slt shortly.

Rachelint · 2025-01-26T13:09:59Z

Thanks for the review.

It seems workable too, but actullay a bit inconvenient to check if large changes in one pr?

I meant to divide it into multiple PRs not commits.

I will be opening a new pr with just addition of init.slt.aprt and supplement.slt shortly.

Ok, thanks @logan-keede

refactor aggregate

af04147

github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jan 25, 2025

remove falsely generated code

9caf659

logan-keede marked this pull request as draft January 26, 2025 07:33

fix: repetition and mislocation

87205fd

logan-keede marked this pull request as ready for review January 26, 2025 08:23

Rachelint changed the title ~~refactor aggregate~~ Extract common part in aggregation.slt Jan 26, 2025

Rachelint changed the title ~~Extract common part in aggregation.slt~~ Sort out testcases in aggregation.slt Jan 26, 2025

Rachelint reviewed Jan 26, 2025

View reviewed changes

logan-keede changed the title ~~Sort out testcases in aggregation.slt~~ Sort out testcases in aggregate.slt Jan 26, 2025

logan-keede closed this Jan 26, 2025

logan-keede mentioned this pull request Jan 26, 2025

start refactoring process by setting up base + init #14306

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sort out testcases in `aggregate.slt` #14301

Sort out testcases in `aggregate.slt` #14301

logan-keede commented Jan 25, 2025 •

edited by Rachelint

Loading

logan-keede commented Jan 25, 2025

Rachelint commented Jan 26, 2025 •

edited

Loading

logan-keede commented Jan 26, 2025

buraksenn commented Jan 26, 2025

Rachelint commented Jan 26, 2025 •

edited

Loading

Rachelint commented Jan 26, 2025

Rachelint Jan 26, 2025

Rachelint Jan 26, 2025

logan-keede Jan 26, 2025

Rachelint Jan 26, 2025

logan-keede Jan 26, 2025

logan-keede commented Jan 26, 2025

logan-keede commented Jan 26, 2025 •

edited

Loading

Rachelint commented Jan 26, 2025 •

edited

Loading

Rachelint commented Jan 26, 2025

logan-keede commented Jan 26, 2025

Rachelint commented Jan 26, 2025

Sort out testcases in aggregate.slt #14301

Sort out testcases in aggregate.slt #14301

Conversation

logan-keede commented Jan 25, 2025 • edited by Rachelint Loading

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

logan-keede commented Jan 25, 2025

Rachelint commented Jan 26, 2025 • edited Loading

logan-keede commented Jan 26, 2025

buraksenn commented Jan 26, 2025

Rachelint commented Jan 26, 2025 • edited Loading

Rachelint commented Jan 26, 2025

Rachelint Jan 26, 2025

Choose a reason for hiding this comment

Rachelint Jan 26, 2025

Choose a reason for hiding this comment

logan-keede Jan 26, 2025

Choose a reason for hiding this comment

Rachelint Jan 26, 2025

Choose a reason for hiding this comment

logan-keede Jan 26, 2025

Choose a reason for hiding this comment

logan-keede commented Jan 26, 2025

logan-keede commented Jan 26, 2025 • edited Loading

Rachelint commented Jan 26, 2025 • edited Loading

Rachelint commented Jan 26, 2025

logan-keede commented Jan 26, 2025

Rachelint commented Jan 26, 2025

Sort out testcases in `aggregate.slt` #14301

Sort out testcases in `aggregate.slt` #14301

logan-keede commented Jan 25, 2025 •

edited by Rachelint

Loading

Rachelint commented Jan 26, 2025 •

edited

Loading

Rachelint commented Jan 26, 2025 •

edited

Loading

logan-keede commented Jan 26, 2025 •

edited

Loading

Rachelint commented Jan 26, 2025 •

edited

Loading