fix: add support for s3:// into spark dataset #1008

Open. Wants to merge 3 commits into base: main
1 change: 1 addition & 0 deletions kedro-datasets/RELEASE.md
@@ -9,6 +9,7 @@

- Fixed `polars.CSVDataset` `save` method on Windows using `utf-8` as default encoding.
- Made `table_name` a keyword argument in the `ibis.FileDataset` implementation to be compatible with Ibis 10.0.
+ - Fixed adding `s3://` in spark dataset for supporting old implementation.

## Breaking Changes

4 changes: 2 additions & 2 deletions kedro-datasets/kedro_datasets/spark/spark_dataset.py
@@ -193,7 +193,7 @@ def __init__( # noqa: PLR0913
None, the latest version will be loaded. If its ``save``
attribute is None, save version will be autogenerated.
credentials: Credentials to access the S3 bucket, such as
- ``key``, ``secret``, if ``filepath`` prefix is ``s3a://`` or ``s3n://``.
+ ``key``, ``secret``, if ``filepath`` prefix is ``s3a://`` or ``s3://``.
Optional keyword arguments passed to ``hdfs.client.InsecureClient``
if ``filepath`` prefix is ``hdfs://``. Ignored otherwise.
metadata: Any arbitrary metadata.
@@ -216,7 +216,7 @@ def __init__( # noqa: PLR0913
"filepath is a known source of error. You must add this prefix to %s",
filepath,
)
- if fs_prefix and fs_prefix in ("s3a://"):
+ if fs_prefix and fs_prefix in ("s3://", "s3a://"):
_s3 = S3FileSystem(**credentials)
exists_function = _s3.exists
# Ensure cache is not used so latest version is retrieved correctly.
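
A note on the condition changed in this hunk: in the old line, `("s3a://")` is a parenthesised string rather than a one-element tuple, so `in` performs a substring test instead of a membership test. The new line uses a real two-element tuple. The sketch below isolates that difference; the helper names `uses_s3_substring` and `uses_s3_tuple` are hypothetical and do not exist in `spark_dataset.py`.

```python
def uses_s3_substring(fs_prefix: str) -> bool:
    """Old form of the check: the right-hand side is a plain string."""
    # ("s3a://") == "s3a://", so `in` asks "is fs_prefix a substring of 's3a://'?"
    return bool(fs_prefix) and fs_prefix in ("s3a://")


def uses_s3_tuple(fs_prefix: str) -> bool:
    """New form of the check: the right-hand side is a tuple of prefixes."""
    # `in` now asks "is fs_prefix equal to one of the listed prefixes?"
    return bool(fs_prefix) and fs_prefix in ("s3://", "s3a://")


if __name__ == "__main__":
    for prefix in ("s3a://", "s3://", "a://", ""):
        print(repr(prefix), uses_s3_substring(prefix), uses_s3_tuple(prefix))
```

With the string form, `"s3://"` is rejected because it is not a substring of `"s3a://"`, while a fragment such as `"a://"` is accepted. The tuple form accepts exactly the two listed prefixes.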