Using pandas-path on the right side of `/` with cloudpathlib #12
What is …? This comment may be helpful in showing why that happens: …
@ejm714 pandas-path doesn't support cloudpathlib out of the box (edited), and the behavior you're observing is correct for regular paths. When it converts the singleton string … to a path, `pathlib` collapses the `//` after the scheme:

```python
from pathlib import Path

Path("s3://mybucket/data/raw")
#> PosixPath('s3:/mybucket/data/raw')
```
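The collapse can be reproduced with the standard library alone. This sketch uses `PurePosixPath` so it behaves the same on every OS; the bucket name is the placeholder from the comment. For contrast, `posixpath.join` only concatenates with `/` and never normalizes, so the scheme's double slash survives there.

```python
import posixpath
from pathlib import PurePosixPath

uri = "s3://mybucket/data/raw"

# pathlib treats "s3:" as an ordinary segment and collapses the
# repeated slash, so the URI scheme is corrupted.
as_path = str(PurePosixPath(uri))
print(as_path)  # s3:/mybucket/data/raw

# posixpath.join just inserts "/" between parts and does not
# normalize, so "s3://" is preserved.
as_string = posixpath.join(uri, "a")
print(as_string)  # s3://mybucket/data/raw/a
```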
Also, see here for cloudpathlib support: https://github.com/drivendataorg/pandas-path?tab=readme-ov-file#custom-path-accessors
The same behavior happens if … . I don't see how the custom path accessor helps here, but let me know if I'm missing something. I think a join with an S3 path on the left side is just an unsupported case.
I think this is a cloudpathlib bug where we should raise …

```python
import pandas as pd
from pandas_path import path
from cloudpathlib import CloudPath

CloudPath("s3://bucket/path") / pd.Series(["a", "b", "c"]).path
#> Traceback (most recent call last):
#>   File "<string>", line 1, in <module>
#>   File "/Users/bull/miniconda3/envs/sandbox/lib/python3.11/site-packages/cloudpathlib/cloudpath.py", line 891, in __truediv__
#>     raise TypeError(f"Can only join path {repr(self)} with strings or posix paths.")
#> TypeError: Can only join path S3Path('s3://bucket/path') with strings or posix paths.
```
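The mechanism at stake is Python's binary-operator protocol: if the left operand's `__truediv__` returns `NotImplemented`, Python next tries the right operand's `__rtruediv__`, whereas raising `TypeError` ends the lookup immediately. A minimal sketch with two hypothetical classes, `ToyCloudPath` and `ToyPathSeries`, which stand in for a cloud path and a path accessor and are not cloudpathlib or pandas-path code:

```python
class ToyCloudPath:
    """Hypothetical stand-in for a cloud path (not cloudpathlib's class)."""

    def __init__(self, uri):
        self.uri = uri.rstrip("/")

    def __truediv__(self, other):
        if isinstance(other, str):
            return ToyCloudPath(f"{self.uri}/{other}")
        # Returning NotImplemented (rather than raising TypeError)
        # lets Python fall back to other.__rtruediv__(self).
        return NotImplemented

    def __repr__(self):
        return f"ToyCloudPath({self.uri!r})"


class ToyPathSeries:
    """Hypothetical stand-in for an accessor wrapping many path strings."""

    def __init__(self, items):
        self.items = list(items)

    def __rtruediv__(self, left):
        # Reached for `left / self` after left.__truediv__ gave up.
        return [left / item for item in self.items]


joined = ToyCloudPath("s3://bucket/path") / ToyPathSeries(["a", "b", "c"])
print(joined)
```

If neither operand supports the other (for example `ToyCloudPath("s3://bucket") / 5`), Python still raises a `TypeError`, so deferring costs nothing in error reporting.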
Ironic, because the bug behind Limitation 3 has been fixed since Python 3.8, and we made a conceptually similar mistake in cloudpathlib.
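Assuming Limitation 3 concerns `pathlib`'s `/` not deferring to the right-hand operand (an assumption; the limitation's text isn't quoted in this thread), the Python 3.8+ behavior can be checked with a hypothetical `RightOperand` class that defines `__rtruediv__`:

```python
from pathlib import PurePosixPath


class RightOperand:
    """Hypothetical object that knows how to be joined onto a path."""

    def __rtruediv__(self, left):
        # Reached because PurePath.__truediv__ returns NotImplemented
        # for operands it cannot convert to a path (Python 3.8+).
        return ("deferred", str(left))


result = PurePosixPath("bucket/dir") / RightOperand()
print(result)
```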
I would expect the following to work portably (if cloudpathlib were working correctly). @pjbull's example would still break on a Windows computer. (Edited per the comment below to use `PurePosixPath`.)

```python
from pathlib import PurePosixPath

import pandas as pd
from pandas_path import path, register_path_accessor
from cloudpathlib import S3Path

register_path_accessor("pure_posix_path", PurePosixPath)
S3Path("s3://bucket/dir") / pd.Series(["a", "b", "c"]).pure_posix_path
#> Traceback (most recent call last):
#>   File "<string>", line 1, in <module>
#>   File "/Users/jqi/Downloads/pp-test/.venv/lib/python3.12/site-packages/cloudpathlib/cloudpath.py", line 891, in __truediv__
#>     raise TypeError(f"Can only join path {repr(self)} with strings or posix paths.")
#> TypeError: Can only join path S3Path('s3://bucket/dir') with strings or posix paths.
```
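The reason `PurePosixPath` is the portable choice: the concrete `Path` class becomes `WindowsPath` on Windows and joins with backslashes, while the pure POSIX flavor always uses `/`, which is the separator intended for S3 keys. Both pure classes can be instantiated on any OS, so the difference is easy to check:

```python
from pathlib import PurePosixPath, PureWindowsPath

key = "dir"

# Always "/" regardless of the host OS.
posix_joined = str(PurePosixPath(key) / "a")
# What Path would produce on Windows.
windows_joined = str(PureWindowsPath(key) / "a")

print(posix_joined)    # dir/a
print(windows_joined)  # dir\a
```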
Good point. I believe that would also break on Windows, so it should be …
I'm trying to create a dataframe with two columns, one with the original file path and one with the destination file path, so that I can iterate over the dataframe and do something like …

These paths will be S3 paths. However, when I set up the file paths by chaining `/`, the second slash in `s3://` gets dropped, making the result an invalid S3 path.
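Until the `/` join works, one workaround is to build the URIs as plain strings, which no path class normalizes, and wrap them in a cloud path only when each file is processed. A minimal standard-library sketch; the bucket prefixes and file names are hypothetical placeholders, and the same string join could be applied column-wise in a dataframe:

```python
import posixpath

# Hypothetical source and destination prefixes.
SRC_PREFIX = "s3://source-bucket/data/raw"
DST_PREFIX = "s3://dest-bucket/data/processed"

filenames = ["a.csv", "b.csv", "c.csv"]

# posixpath.join inserts "/" between parts and never collapses "//",
# so the s3:// scheme stays intact.
rows = [
    {
        "src": posixpath.join(SRC_PREFIX, name),
        "dst": posixpath.join(DST_PREFIX, name),
    }
    for name in filenames
]

for row in rows:
    # Placeholder for the per-file work, e.g. wrapping row["src"]
    # in a cloud path object and copying it to row["dst"].
    print(row["src"], "->", row["dst"])
```

One caveat: `posixpath.join` discards everything before a component that starts with `/`, so file names must be relative.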