add comment to clear cache bw runs
Signed-off-by: Praateek <[email protected]>
praateekmahajan committed Oct 30, 2024
1 parent ad6a11b commit a921c6e
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion docs/user-guide/gpudeduplication.rst
@@ -184,7 +184,7 @@ Python API
 from nemo_curator import FuzzyDuplicatesConfig
 
 config = FuzzyDuplicatesConfig(
-    cache_dir="/path/to/dedup_outputs",
+    cache_dir="/path/to/dedup_outputs", # must be cleared between runs
     id_field="my_id",
     text_field="text",
     seed=42,
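The new comment says cache_dir must be cleared between runs but does not say how. A minimal sketch, assuming "cleared" simply means the directory holds nothing from an earlier run: the hypothetical helper clear_cache_dir (not part of NeMo Curator) deletes the directory with shutil.rmtree and recreates it empty, and would be called before FuzzyDuplicatesConfig is built.

```python
import shutil
from pathlib import Path


def clear_cache_dir(cache_dir: str) -> Path:
    """Delete everything under cache_dir and recreate it as an empty directory.

    Hypothetical helper, not part of nemo_curator. Intended to run before
    FuzzyDuplicatesConfig is constructed so each run starts from a clean cache.
    """
    path = Path(cache_dir)
    if path.exists():
        # Remove intermediate outputs left behind by a previous run.
        shutil.rmtree(path)
    # Recreate the (now empty) directory, including any missing parents.
    path.mkdir(parents=True)
    return path
```

The returned path can then be passed to the config as cache_dir=str(path). Because rmtree deletes everything beneath the path, point it only at a directory used solely as the deduplication cache.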
2 changes: 1 addition & 1 deletion examples/fuzzy_deduplication.py
@@ -31,7 +31,7 @@ def main(args):
 
     dataset_dir = "/path/to/dataset"
     log_dir = "./"
-    cache_dir = "./fuzzy_cache"
+    cache_dir = "./fuzzy_cache" # must be cleared between runs
     output_dir = "./output"
     dataset_id_field = "id"
     dataset_text_field = "text"
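In the example script cache_dir is the relative path ./fuzzy_cache, so rerunning from the same working directory reuses whatever the previous run left there. A non-destructive alternative to deleting it automatically is to check at startup that the directory is empty and stop otherwise. This is a sketch under the same assumption that "cleared" means empty or absent; ensure_empty_cache_dir is a hypothetical name, not a NeMo Curator function.

```python
from pathlib import Path


def ensure_empty_cache_dir(cache_dir: str) -> Path:
    """Create cache_dir if it is missing; raise if it still holds old files.

    Hypothetical helper, not part of nemo_curator. Unlike deleting the
    directory automatically, this never touches existing files.
    """
    path = Path(cache_dir)
    path.mkdir(parents=True, exist_ok=True)
    # any() stops at the first entry, so a large cache is not fully listed.
    if any(path.iterdir()):
        raise RuntimeError(
            f"cache_dir {path} is not empty; clear it before rerunning "
            "fuzzy deduplication"
        )
    return path
```

main(args) could call ensure_empty_cache_dir(cache_dir) right after cache_dir is assigned. Raising instead of deleting guards against a mistyped cache_dir that points at data you want to keep; the cost is that the user has to clear the directory by hand.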
