Update fuzzy dedup test args to account for minhash algo changes. #442

Merged: 1 commit, Dec 20, 2024

@ayushdg ayushdg commented Dec 19, 2024

Description

24.12 moves to the new minhash_permuted API, which uses different seed values and therefore produces different minhash values. Consider tests with few minhashes (3) and the strings
"The quick brown fox jumps over the lazy dog."
"The quick black cat jumps over the lazy dog."
Given that 12 ngrams do not match between the two strings, there is a chance that none of the few hashes overlap, leading to failing tests.

This PR increases the number of minhashes used in the tests to raise the probability of a collision.
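
To make the reasoning above concrete, here is a minimal Python sketch of the textbook MinHash model, not the `minhash_permuted` implementation. It assumes character n-grams of width 5 (an assumption: the description does not state the width, although width 5 does yield 12 non-matching n-grams per string) and idealized independent hash functions, under which each hash agrees with probability equal to the Jaccard similarity of the n-gram sets. All function names are hypothetical.

```python
"""Estimate how likely two strings are to share no minhash values.

Idealized model only: each of `num_hashes` independent minhashes agrees
with probability equal to the Jaccard similarity of the n-gram sets.
"""

A = "The quick brown fox jumps over the lazy dog."
B = "The quick black cat jumps over the lazy dog."

# Assumption: character n-grams of width 5 (not stated in the description).
NGRAM_WIDTH = 5


def char_ngrams(text: str, width: int) -> set:
    """Return the set of character n-grams of the given width."""
    return {text[i:i + width] for i in range(len(text) - width + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity (intersection over union) of two non-empty sets."""
    return len(a & b) / len(a | b)


def prob_no_match(similarity: float, num_hashes: int) -> float:
    """Probability that none of `num_hashes` ideal minhashes agree."""
    return (1.0 - similarity) ** num_hashes


if __name__ == "__main__":
    grams_a = char_ngrams(A, NGRAM_WIDTH)
    grams_b = char_ngrams(B, NGRAM_WIDTH)
    sim = jaccard(grams_a, grams_b)
    print(f"n-grams only in A: {len(grams_a - grams_b)}")
    print(f"Jaccard similarity: {sim:.3f}")
    for k in (3, 8, 16):  # hypothetical hash counts for comparison
        print(f"{k:>2} hashes -> P(no overlap) = {prob_no_match(sim, k):.4f}")
```

Under these assumptions the similarity is 28/52, so with 3 hashes the chance of no overlap is roughly 10%, and it shrinks geometrically as more hashes are added.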

Usage

Checklist

  • I am familiar with the Contributing Guide.
  • New or Existing tests cover these changes.
  • The documentation is up to date with these changes.

@ayushdg ayushdg added gpuci Run GPU CI/CD on PR bugfix Fixes a bug in the codebase labels Dec 19, 2024
@ayushdg ayushdg marked this pull request as ready for review December 20, 2024 10:58
@ayushdg ayushdg added gpuci Run GPU CI/CD on PR and removed gpuci Run GPU CI/CD on PR labels Dec 20, 2024
@sarahyurick sarahyurick left a comment

LGTM, thanks!

@ayushdg ayushdg merged commit c929203 into NVIDIA:main Dec 20, 2024
7 of 9 checks passed