Add READMEs to `examples/` and `nemo_curator/scripts` directories #332
Signed-off-by: Sarah Yurick <[email protected]>
| Python Command                           | CLI Command                    |
|------------------------------------------|--------------------------------|
| python add_id.py                         | add_id                         |
| python blend_datasets.py                 | blend_datasets                 |
| python download_and_extract.py           | download_and_extract           |
| python filter_documents.py               | filter_documents               |
| python find_exact_duplicates.py          | gpu_exact_dups                 |
| python find_matching_ngrams.py           | find_matching_ngrams           |
| python find_pii_and_deidentify.py        | deidentify                     |
| python get_common_crawl_urls.py          | get_common_crawl_urls          |
| python get_wikipedia_urls.py             | get_wikipedia_urls             |
| python make_data_shards.py               | make_data_shards               |
| python prepare_fasttext_training_data.py | prepare_fasttext_training_data |
| python prepare_task_data.py              | prepare_task_data              |
| python remove_matching_ngrams.py         | remove_matching_ngrams         |
| python separate_by_metadata.py           | separate_by_metadata           |
| python text_cleaning.py                  | text_cleaning                  |
| python train_fasttext.py                 | train_fasttext                 |
| python verify_classification_results.py  | verify_classification_results  |
For more information about the arguments each script accepts, run its command with `--help` (for example, `add_id --help`).
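A small sketch of how the table could be used programmatically, for example in a test or a wrapper script. Note that two CLI commands do not match their script's file name (`find_exact_duplicates.py` is `gpu_exact_dups`, `find_pii_and_deidentify.py` is `deidentify`), so a plain lookup is safer than stripping `.py`. This assumes the CLI commands are on `PATH` after installing NeMo Curator; the names `SCRIPT_TO_CLI` and `help_command` are hypothetical and not part of the package. The helper only builds argument lists, which you could pass to `subprocess.run` yourself.

```python
"""Sketch: map each script in the table to its CLI command (hypothetical helper)."""

# Copied from the table in this PR: script file -> installed CLI command.
SCRIPT_TO_CLI = {
    "add_id.py": "add_id",
    "blend_datasets.py": "blend_datasets",
    "download_and_extract.py": "download_and_extract",
    "filter_documents.py": "filter_documents",
    "find_exact_duplicates.py": "gpu_exact_dups",
    "find_matching_ngrams.py": "find_matching_ngrams",
    "find_pii_and_deidentify.py": "deidentify",
    "get_common_crawl_urls.py": "get_common_crawl_urls",
    "get_wikipedia_urls.py": "get_wikipedia_urls",
    "make_data_shards.py": "make_data_shards",
    "prepare_fasttext_training_data.py": "prepare_fasttext_training_data",
    "prepare_task_data.py": "prepare_task_data",
    "remove_matching_ngrams.py": "remove_matching_ngrams",
    "separate_by_metadata.py": "separate_by_metadata",
    "text_cleaning.py": "text_cleaning",
    "train_fasttext.py": "train_fasttext",
    "verify_classification_results.py": "verify_classification_results",
}


def help_command(script: str, use_cli: bool = True) -> list[str]:
    """Return the argv that prints --help for a script.

    Hypothetical helper for illustration; not part of NeMo Curator.
    With use_cli=True it uses the installed CLI name, otherwise `python <script>`.
    """
    if script not in SCRIPT_TO_CLI:
        raise KeyError(f"unknown script: {script}")
    if use_cli:
        return [SCRIPT_TO_CLI[script], "--help"]
    return ["python", script, "--help"]


if __name__ == "__main__":
    # List the commands whose CLI name differs from the script's file stem.
    for script, cli in SCRIPT_TO_CLI.items():
        if cli != script.removesuffix(".py"):
            print(f"{script} -> {cli}")
```

Running the module prints the two scripts whose CLI names differ from their file names.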
I love this so much. I will do a review later but wanted to share that this is super helpful.
LGTM, thanks for making this clean and usable for us @sarahyurick
…IDIA#332)

* save progress
  Signed-off-by: Sarah Yurick <[email protected]>
* add remaining docs
  Signed-off-by: Sarah Yurick <[email protected]>
* add titles and table
  Signed-off-by: Sarah Yurick <[email protected]>
* remove trailing whitespace
  Signed-off-by: Sarah Yurick <[email protected]>
* add --help instructions
  Signed-off-by: Sarah Yurick <[email protected]>

---------

Signed-off-by: Sarah Yurick <[email protected]>
Signed-off-by: Rucha Apte <[email protected]>
Closes #108.