Commit

Fix a few bugs in fuzzy dedup and docs (NVIDIA#156)
* Fix arg type

Signed-off-by: Ryan Wolf <[email protected]>

* Fix arg in docs

Signed-off-by: Ryan Wolf <[email protected]>

---------

Signed-off-by: Ryan Wolf <[email protected]>
ryantwolf authored Jul 29, 2024
1 parent 9c50fb0 commit e654281
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion docs/user-guide/gpudeduplication.rst
@@ -154,7 +154,7 @@ steps (all scripts are included in the :code:`nemo_curator/scripts/` subdirector
# same as `python connected_components.py`
gpu_connected_component \
- --jaccard-pairs_path /path/to/dedup_output/jaccard_similarity_results.parquet \
+ --jaccard-pairs-path /path/to/dedup_output/jaccard_similarity_results.parquet \
--output-dir /path/to/dedup_output \
--cache-dir /path/to/cc_cache \
--jaccard-threshold 0.8
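
The docs fix matters because argparse matches option strings literally: an underscore in place of a hyphen is treated as an unknown flag. A minimal sketch of that behaviour follows, assuming the script registers the flag as `--jaccard-pairs-path`; the parser and the `accepts` helper are hypothetical stand-ins, not the script's actual code.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the script's parser; only the flag
    # spelling is taken from the corrected docs line.
    parser = argparse.ArgumentParser(prog="gpu_connected_component")
    parser.add_argument("--jaccard-pairs-path", type=str)
    return parser


def accepts(argv):
    """Return True if argparse accepts argv, False if it exits with an error."""
    try:
        build_parser().parse_args(argv)
        return True
    except SystemExit:
        # argparse reports unrecognized arguments by printing usage and exiting.
        return False


# The hyphenated spelling parses; the value lands on `jaccard_pairs_path`.
args = build_parser().parse_args(["--jaccard-pairs-path", "/tmp/pairs.parquet"])
print(args.jaccard_pairs_path)

# The spelling from the old docs is rejected as an unrecognized argument.
print(accepts(["--jaccard-pairs_path", "/tmp/pairs.parquet"]))
```

Note that the attribute on the parsed namespace uses underscores (`jaccard_pairs_path`) even though the command-line flag uses hyphens, which is a plausible source of the original typo.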
@@ -67,7 +67,7 @@ def attach_args(parser=None):
)
parser.add_argument(
"--jaccard-threshold",
- type=int,
+ type=float,
default=0.8,
help="Jaccard threshold below which we don't consider documents"
" to be duplicate",
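
The type change is easy to miss because argparse does not convert a non-string default: with `type=int` the default of 0.8 passes through untouched, and the failure only appears once a fractional threshold is given on the command line. The sketch below illustrates this with a throwaway parser; `parse_threshold` is a hypothetical helper, and only the flag name, types, and default come from the diff.

```python
import argparse


def parse_threshold(arg_type, argv):
    """Parse --jaccard-threshold with the given type; return None on a parse error."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--jaccard-threshold", type=arg_type, default=0.8)
    try:
        return parser.parse_args(argv).jaccard_threshold
    except SystemExit:
        # argparse exits when the type conversion raises ValueError.
        return None


# Old behaviour: the float default survives, but an explicit 0.9 is rejected
# because int("0.9") raises ValueError.
print(parse_threshold(int, []))
print(parse_threshold(int, ["--jaccard-threshold", "0.9"]))

# New behaviour: fractional thresholds parse as expected.
print(parse_threshold(float, ["--jaccard-threshold", "0.9"]))
```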