This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

Fix FileNotFound error for Parquet files in S3 #430

Merged
merged 5 commits into from
Oct 31, 2023


esheehan-gsl
Contributor

No description provided.

We need to exercise this function against different storage locations
because there's a bug when using an `s3://` URI. The first step is to add a
test for local files with a `file://` URI so that we don't regress.

Using `pathlib.Path` to build the string we pass to `pandas.read_parquet`
doesn't work with S3 because `Path` doesn't understand the protocol prefix and
collapses the `//` after it. So instead we just assume this is a URI and join
the path components with a "/".
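The failure mode described in the commit message can be reproduced in a few lines. This is a minimal sketch, not the code from this pull request: the bucket name, the path components, and the `join_uri` helper are hypothetical and only illustrate the idea of joining URI components with "/".

```python
from pathlib import PurePosixPath


def join_uri(base: str, *parts: str) -> str:
    """Join URI components with "/" (hypothetical helper, not the project's code)."""
    segments = [base.rstrip("/")] + [part.strip("/") for part in parts]
    return "/".join(segments)


# Hypothetical bucket and prefix, used only for illustration.
base = "s3://bucket/diagnostics"

# pathlib parses the URI as an ordinary relative path, so the empty segment
# between the two slashes is dropped and the scheme separator is damaged.
mangled = str(PurePosixPath(base) / "model" / "obs.parquet")
# mangled == "s3:/bucket/diagnostics/model/obs.parquet"

# Plain string joining leaves the "s3://" prefix untouched.
joined = join_uri(base, "model", "obs.parquet")
# joined == "s3://bucket/diagnostics/model/obs.parquet"
```

The same helper works for a `file://` URI, which is presumably why a single code path can serve both the local test and the S3 case.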
@esheehan-gsl esheehan-gsl self-assigned this Oct 30, 2023
@esheehan-gsl esheehan-gsl linked an issue Oct 30, 2023 that may be closed by this pull request
@esheehan-gsl esheehan-gsl requested a review from ian-noaa October 30, 2023 21:23
Dunno why the pre-commit hook didn't catch this
@github-actions

Code Coverage

| Package | Line Rate | Branch Rate | Health |
| --- | --- | --- | --- |
| unified_graphics | 83% | 72% | |
| unified_graphics.etl | 97% | 96% | |
| utils.s3 | 68% | 69% | |
| Summary | 86% (412 / 481) | 82% (98 / 120) | |

Minimum allowed line rate is 60%

@esheehan-gsl esheehan-gsl temporarily deployed to vlab October 30, 2023 21:37 — with GitHub Actions Inactive
Collaborator

@ian-noaa ian-noaa left a comment

It's good to know that pathlib doesn't support URIs. Switching to os.path seems like a reasonable option.
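For comparison, here is a minimal sketch of what an `os.path`-style join looks like on a URI. It uses `posixpath` directly (the POSIX implementation behind `os.path`) so the separator is always "/" regardless of platform. The URI below is hypothetical, and this is not necessarily what the merged change does.

```python
import posixpath

# Hypothetical URI, used only for illustration.
base = "s3://bucket/diagnostics"

# posixpath.join concatenates components with "/" and does not normalize
# the result, so the "//" after the scheme is preserved.
uri = posixpath.join(base, "model", "obs.parquet")

# Caveat: a component that starts with "/" discards everything before it.
reset = posixpath.join(base, "/obs.parquet")
```

Because of the caveat shown in `reset`, components would need to be relative (no leading "/") for this approach to be safe.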

@esheehan-gsl esheehan-gsl merged commit a015271 into main Oct 31, 2023
9 checks passed
@esheehan-gsl esheehan-gsl deleted the 429-parquet-files-not-found-with-s3-uri branch October 31, 2023 15:44
@esheehan-gsl esheehan-gsl temporarily deployed to vlab October 31, 2023 15:44 — with GitHub Actions Inactive
Development

Successfully merging this pull request may close these issues.

Parquet files not found with S3 URI
2 participants