Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor updates to duplicate removal #570

Merged
merged 2 commits into from
Feb 25, 2025

Conversation

sarahyurick
Copy link
Collaborator

Fixes a few nits from #509.

Comment on lines 275 to 277
if not duplicates_to_remove:
return None
warnings.warn("No fuzzy duplicates to remove, returning original dataset")
return dataset
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing I wanted to change with this PR is to return the original dataset when there are no duplicates, instead of returning None.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also prefer removing the Dataset, but I think @ayushdg might have some thoughts
#326 (comment)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is a bit different than the change I am making here. This does not return an empty dataset, it returns the original dataset (AKA a dataset with no duplicates removed).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it sounds like identify will still retain the behavior of returning None but remove might default to returning the original dataset.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is correct.

Copy link
Collaborator

@praateekmahajan praateekmahajan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Signed-off-by: Sarah Yurick <[email protected]>
@sarahyurick sarahyurick added the gpuci Run GPU CI/CD on PR label Feb 25, 2025
@sarahyurick sarahyurick merged commit a080400 into NVIDIA:main Feb 25, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gpuci Run GPU CI/CD on PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants