Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort out testcases in aggregate.slt #14301

Closed
wants to merge 3 commits into from

Conversation

logan-keede
Copy link
Contributor

@logan-keede logan-keede commented Jan 25, 2025

Which issue does this PR close?

Part of #13723

Rationale for this change

Better Readability and Navigation.

What changes are included in this PR?

Refactor of aggregate.slt to multiple files,

Are these changes tested?

cargo test --test sqllogictests -- --complete (this does not check expected behaviour of sql commands) runs without failure, using patch mentioned in #14254.
image

Are there any user-facing changes?

No

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jan 25, 2025
@logan-keede
Copy link
Contributor Author

cc @Rachelint @Omega359

@Rachelint
Copy link
Contributor

Rachelint commented Jan 26, 2025

Thanks @logan-keede

I see still some failed tests exist, like:

External error: query failed: DataFusion error: Error during planning: table 'datafusion.public.float_table' not found

@logan-keede logan-keede marked this pull request as draft January 26, 2025 07:33
@logan-keede logan-keede marked this pull request as ready for review January 26, 2025 08:23
@logan-keede
Copy link
Contributor Author

Thanks @logan-keede

I see still some failed tests exist, like:

External error: query failed: DataFusion error: Error during planning: table 'datafusion.public.float_table' not found

fixed!
thanks for your patience

@buraksenn
Copy link
Contributor

nit: I think PR title should reflect this is sqllogictest refactor to be more descriptive

@Rachelint
Copy link
Contributor

Rachelint commented Jan 26, 2025

Main thing I am worried about is that this pr seems too large, it seems hard to ensure all exists testcases are moved rightly. Maybe we can push it forward more incrementally? I have some thoughts about it:

  • Only extract the common part(mainly table creatings, e.g. aggregate_test_100, d_table, test...), and the files structure after extracting may be:
-- aggregate
  -- init_data.slt.part
  -- aggregate.slt
  • Sort out the cases in aggregate.slt using comments firstly before moving to a specific slt, and think a way to ensure no cases are lost(I can help this step). It may be like:
aggregate.slt

#######
# Count tests
#######
...

#######
# Sum tests
#######
...

#######
# Min/Max tests
#######
...

  • Move and split the testcases to specific slts.
-- aggregate
  -- init_data.slt.part
  -- aggregate.slt (cases difficult to classify, or still waiting to be moved in following prs)
  -- count.slt
  -- sum.slt
  -- min/max.slt
  ...

@Rachelint
Copy link
Contributor

Main thing I am worried about is that this pr seems too large, it seems hard to ensure all exists testcases are moved rightly. Maybe we can push it forward more incrementally? I have some thoughts about it:

* Only extract the common part(mainly table creatings, e.g. `aggregate_test_100`, `d_table`, `test`...), and the files structure after extracting may be:
-- aggregate
  -- init_data.slt.part
  -- aggregate.slt
* Sort the cases in `aggregate.slt` using comments firstly before moving to a specific `slt`, and think a way to ensure no cases are lost(I can help this step). It may be like:
aggregate.slt

#######
# Count tests
#######
...

#######
# Sum tests
#######
...

#######
# Min/Max tests
#######
...
* Move and split the testcases to specific `slt`s.
-- aggregate
  -- init_data.slt.part
  -- aggregate.slt (cases difficult to classify, or still waiting to be moved in following prs)
  -- count.slt
  -- sum.slt
  -- min/max.slt
  ...

And how about we just start from the first step in this pr? And continue pushing it forward in following prs?

@Rachelint Rachelint changed the title refactor aggregate Extract common part in aggregation.slt Jan 26, 2025
@Rachelint Rachelint changed the title Extract common part in aggregation.slt Sort out testcases in aggregation.slt Jan 26, 2025

# csv_query_bit_and_distinct
query IIIII
SELECT bit_and(distinct c5), bit_and(distinct c6), bit_and(distinct c7), bit_and(distinct c8), bit_and(distinct c9) FROM aggregate_test_100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a part of bit_and_or_xor?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems switching to use data type to split rather than function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function name is STRING_AGG maybe we should switch that. I think it might not be bad to have some split based on datatype like date_time.slt (present in this PR) even if we decide to make them redundant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 just a bit worried that, when the splitting approach becomes complex, if it will be confusing about deciding which file should we place the new testcases into

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aggregate.slt already had loose subsections for eg. most of the tests in date_time.slt are made to work together(because of mix of different functions) + there is a section for postgres dialect only somewhere in original file. I doubt there would be much confusing tests left after that and we can just put them in both files.

@logan-keede logan-keede changed the title Sort out testcases in aggregation.slt Sort out testcases in aggregate.slt Jan 26, 2025
@logan-keede
Copy link
Contributor Author

agreed, I think if we are going to do this it might be better to open a new PR and start anew.

  • Sort the cases in aggregate.slt using comments firstly before moving to a specific slt, and think a way to ensure no cases are lost(I can help this step). It may be like:

a brutish way to solve this might be to keep aggregate.slt as a whole even after refactoring, it hardly adds a second to tests(we can add a redundancy warning to avoid misunderstanding), this way we have better navigation but also original testcases.

@logan-keede
Copy link
Contributor Author

logan-keede commented Jan 26, 2025

  • Sort the cases in aggregate.slt using comments firstly before moving to a specific slt, and think a way to ensure no cases are lost(I can help this step). It may be like:

I think an alternative to this can be to divide this PR into increments and keep a aggregate_supplement.slt file and initially it will be same as aggregate.slt but as we move out queries we subtract them from the supplement file only. This way even if we there is some query that is left in supplement we can just move it later when it is discovered by using reference from original. (We can always find it in commit history but that's a bit inconvenient)

dejavu

It seems a part of bit_and_or_xor?

I say this because this sorting might become a huge PR by itself, this approach divides this sorting into increments with PRs.
This is kind of what I am doing but in a more incremental manner with added redundancy.
@Rachelint Let me know if you think this is a bad Idea. I will open a new PR once we are on the same page.
Thanks,
Logan

@Rachelint
Copy link
Contributor

Rachelint commented Jan 26, 2025

I think an alternative to this can be to divide this PR into increments and keep a aggregate_supplement.slt file and initially it will be same as aggregate.slt but as we move out queries we subtract them from the supplement file only. This way even if we there is some query that is left in supplement we can just move it later when it is discovered by using reference from original. (We can always find it in commit history but that's a bit inconvenient

@logan-keede it is said that still one pr, but we perform extract and subtract for a specific function in one commit, and then we check it between commits?

It seems workable too, but actullay a bit inconvenient to check if large changes in one pr?

And for the extract and subtract approach, I think it seems really good.

@Rachelint
Copy link
Contributor

I think an alternative to this can be to divide this PR into increments and keep a aggregate_supplement.slt file and initially it will be same as aggregate.slt but as we move out queries we subtract them from the supplement file only. This way even if we there is some query that is left in supplement we can just move it later when it is discovered by using reference from original. (We can always find it in commit history but that's a bit inconvenient

@logan-keede it is said that still one pr, but we perform extract and subtract for a specific function in one commit, and then we check it between commits?

It seems workable too, but actullay a bit inconvenient to check if large changes in one pr?

And for the extract and subtract approach, I think it seems really good.

Or I misunderstand? We will split it into multiple prs and we perform extract and subtract for a specific function in one pr? I think we may be on the same page if it means that.

@logan-keede
Copy link
Contributor Author

Thanks for the review.

It seems workable too, but actullay a bit inconvenient to check if large changes in one pr?

I meant to divide it into multiple PRs not commits.

I will be opening a new pr with just addition of init.slt.aprt and supplement.slt shortly.

@Rachelint
Copy link
Contributor

Thanks for the review.

It seems workable too, but actullay a bit inconvenient to check if large changes in one pr?

I meant to divide it into multiple PRs not commits.

I will be opening a new pr with just addition of init.slt.aprt and supplement.slt shortly.

Ok, thanks @logan-keede

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants