Update deduplication docs (NVIDIA#258)
* initial exact dedup doc updates

Signed-off-by: Ayush Dattagupta <[email protected]>

* More fuzzy dedup doc updates

Signed-off-by: Ayush Dattagupta <[email protected]>

* more updates

Signed-off-by: Ayush Dattagupta <[email protected]>

* Add semdedup to GPU modules

Signed-off-by: Ayush Dattagupta <[email protected]>

* Apply suggestions from code review

Co-authored-by: Sarah Yurick <[email protected]>
Signed-off-by: Ayush Dattagupta <[email protected]>

* Address reviews

Signed-off-by: Ayush Dattagupta <[email protected]>

* Apply suggestions from code review

Co-authored-by: Sarah Yurick <[email protected]>
Signed-off-by: Ayush Dattagupta <[email protected]>

* address more review comments

Signed-off-by: Ayush Dattagupta <[email protected]>

* Apply suggestions from code review

Co-authored-by: Sarah Yurick <[email protected]>
Signed-off-by: Ayush Dattagupta <[email protected]>

* Fix position of message for exact dedup api

Signed-off-by: Ayush Dattagupta <[email protected]>

* Add id field param to cli scripts

Signed-off-by: Ayush Dattagupta <[email protected]>

---------

Signed-off-by: Ayush Dattagupta <[email protected]>
Co-authored-by: Sarah Yurick <[email protected]>
ayushdg and sarahyurick authored Oct 14, 2024
1 parent b5f6827 commit f130aed
Showing 2 changed files with 361 additions and 116 deletions.
docs/user-guide/cpuvsgpu.rst (1 change: 1 addition, 0 deletions)
@@ -64,6 +64,7 @@ The following NeMo Curator modules are GPU based.
 
 * Exact Deduplication
 * Fuzzy Deduplication
+* Semantic Deduplication
 * Distributed Data Classification
 
 * Domain Classification
