-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix gpu_exact_dups
due to deprecated flag
#351
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this, the PR looks good to me, you will have to sign your commits though, using DCO though. A
Lines 76 to 85 in 93b7922
1) Make sure your PR does one thing. Have a clear answer to "What does this PR do?". | |
2) Read General Principles and style guide above | |
3) Make sure you sign your commits. E.g. use ``git commit -sS`` when committing. | |
1) If you forget to do this, please follow the steps below to undo the commits and reapply the changes under a new (signed and signed-off) commit. Note: This will preserve your changes, but delete the git history of commits. | |
```bash | |
git reset --soft HEAD~N | |
git add <insert all files you want to include> | |
git commit -sS -m "My commit message" | |
git push --force | |
``` |
See below for learning how to fix this:
https://github.com/NVIDIA/NeMo-Curator/pull/351/checks?check_run_id=32854401493
Hi @VibhuJawa, I have done the sign-off. lmk if there is any other issues. Thank you! |
@davzoku , Its still missing them to signed, i think we missed the https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits |
Signed-off-by: Walter Teng <[email protected]>
@VibhuJawa, the commit should be signed now |
@davzoku , Thanks a lot for your contribution, merged the fix in . Appreciate it. |
Signed-off-by: Walter Teng <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
@VibhuJawa @davzoku Is this update applied to the docker image? With the docker image of nvcr.io/nvidia/nemo:24.09, the exact duplicate still not work |
Signed-off-by: Walter Teng <[email protected]> Signed-off-by: Rucha Apte <[email protected]>
Description
This PR is to fix
gpu_exact_dups
mention in this issue, #350Referencing these 2 commits:
I noticed the
args.no_gpu
is a deprecated argument, however during the refactoring process, one of this argument is being left out. This blocksgpu_exact_dups
from executing correctly.Usage
Prereqs:
add_id is executed on the jsonl to add id for dedup
existing .jsonl files in file directory eg.books_dedup/
Checklist
Results
Notice that after this commit, we can run
data:image/s3,"s3://crabby-images/7bd36/7bd3633537f47f8e282d0ddcf78f5e5a9e2c1939" alt="image"
gpu_exact_dups
to completion.