Skip reading files with incorrect extension #318
Signed-off-by: Sarah Yurick <[email protected]>
We might need to expand the list of extensions, since some files are formatted like ...
    input_extensions = {os.path.splitext(f)[-1] for f in input_files}
    if len(input_extensions) != 1:
        raise RuntimeError(
An example of when we would expect this RuntimeError is:
    doc = DocumentDataset.read_json(in_files)
where in_files is a string path to a directory containing multiple JSONL files and a CRC file. Since the CRC file is not explicitly filtered out, we raise the error.
We can leave this as is for now.
In theory there might be cases where a user filters by [.json, .jsonl] using the file filter and still hits the error raised here. In practice I expect that to be unlikely, so we can wait and see if there is any user feedback around this.
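To make the failure mode discussed above concrete, here is a minimal sketch of the extension check, built only from the three lines visible in the diff. The function name check_single_extension and the error message are hypothetical, not NeMo Curator's actual code.

```python
import os
from typing import List


def check_single_extension(input_files: List[str]) -> str:
    """Return the one extension shared by all files, or raise RuntimeError."""
    # Hypothetical helper: the set comprehension mirrors the line in the diff.
    input_extensions = {os.path.splitext(f)[-1] for f in input_files}
    if len(input_extensions) != 1:
        # The real message is truncated in the diff; this wording is assumed.
        raise RuntimeError(
            f"Expected exactly one file extension, found: {sorted(input_extensions)}"
        )
    return input_extensions.pop()
```

Under these assumptions, a listing such as ["a.jsonl", "a.jsonl.crc"] raises, and so does a mix of .json and .jsonl files, which is the case the review comment describes.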
nemo_curator/utils/file_utils.py
Outdated
    root: str,
    recurse_subdirectories: bool = True,
    followlinks: bool = False,
    filter_by: Optional[Union[str, List[str]]] = None,
All of these examples work:
(1)
input_files = get_all_files_paths_under(in_files, filter_by="jsonl")
input_dataset = DocumentDataset.read_json(input_files)
(2)
input_files = get_all_files_paths_under(in_files, filter_by=["jsonl"])
input_dataset = DocumentDataset.read_json(input_files)
(3)
# Returns a list containing only .jsonl, .parquet, and .csv files
input_files = get_all_files_paths_under(in_files, filter_by=["jsonl", "parquet", "csv"])
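For readers who want to see how a filter_by argument of type Optional[Union[str, List[str]]] could behave, here is a rough sketch of the filtering step alone. The helper filter_paths_by_extension is hypothetical and works on a list of paths rather than walking a directory. Normalizing "jsonl" and ".jsonl" to the same suffix is my assumption, not something the PR confirms.

```python
from typing import List, Optional, Union


def filter_paths_by_extension(
    paths: List[str],
    filter_by: Optional[Union[str, List[str]]] = None,
) -> List[str]:
    """Keep only the paths whose suffix matches one of the given extensions."""
    # Hypothetical stand-in for the filtering step; not the library's code.
    if filter_by is None:
        return list(paths)
    if isinstance(filter_by, str):
        # Accept a single extension as well as a list, as in examples (1) and (2).
        filter_by = [filter_by]
    # Assumption: treat "jsonl" and ".jsonl" as the same extension.
    suffixes = tuple("." + ext.lstrip(".") for ext in filter_by)
    return [p for p in paths if p.endswith(suffixes)]
```

With this sketch, passing "jsonl" or ["jsonl"] gives the same result, and passing ["jsonl", "parquet", "csv"] keeps files of all three types.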
Thanks for the changes. Overall changes lgtm! Minor nits/comments.
As a follow-up, it might make sense to track updating the tutorials/notebooks to use this newer filter arg in the API, but that is not required for this PR.
nemo_curator/utils/file_utils.py
Outdated
    if file.endswith(tuple(file_extensions)):
        filtered_files.append(file)
    else:
        warnings.warn(f"Skipping read for file: {file}")
I wonder if this might get too noisy in some cases. I'm leaning towards warning once if we have to skip, but not for every file we skip.
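One way to act on this suggestion is to count the skipped files and emit a single warning at the end. The sketch below is only an illustration: the function name filter_with_single_warning and the message text are hypothetical, and the PR does not show what the final code looks like.

```python
import warnings
from typing import List


def filter_with_single_warning(
    files: List[str], file_extensions: List[str]
) -> List[str]:
    """Filter files by extension, warning at most once about skipped files."""
    # Same endswith(tuple(...)) test as in the diff above.
    kept = [f for f in files if f.endswith(tuple(file_extensions))]
    skipped = len(files) - len(kept)
    if skipped:
        # One aggregate warning instead of one warning per skipped file.
        warnings.warn(
            f"Skipped {skipped} file(s) not matching extensions {file_extensions}"
        )
    return kept
```

The trade-off is that the individual skipped paths are no longer reported; they could be logged at debug level if that detail matters.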
Thanks @ayushdg! Updated.
Thank you @praateekmahajan! I have addressed all your comments.
LGTM! Thanks for adding type hint as well. (left two small nits on comment / typehint)
Review has been addressed, thanks!
* filter_files_by_extension function
  Signed-off-by: Sarah Yurick <[email protected]>
* add type checking
  Signed-off-by: Sarah Yurick <[email protected]>
* add filter_by param to get_all_files_paths_under
  Signed-off-by: Sarah Yurick <[email protected]>
* isort
  Signed-off-by: Sarah Yurick <[email protected]>
* address ayush's comments
  Signed-off-by: Sarah Yurick <[email protected]>
* run black
  Signed-off-by: Sarah Yurick <[email protected]>
* trailing whitespace
  Signed-off-by: Sarah Yurick <[email protected]>
* more whitespace
  Signed-off-by: Sarah Yurick <[email protected]>
* address praateek's review
  Signed-off-by: Sarah Yurick <[email protected]>
* praateek's review
  Signed-off-by: Sarah Yurick <[email protected]>
---------
Signed-off-by: Sarah Yurick <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]>
Signed-off-by: Walter Teng <[email protected]>
Closes #214.