is_dir() and is_file() are not working properly for gs #493

Open
shanirosen-airis opened this issue Dec 24, 2024 · 4 comments

shanirosen-airis commented Dec 24, 2024

Hey,
I tried the is_dir and is_file methods on both s3 and gs paths, and found that a trailing slash changes the result on gs but not on s3:

from cloudpathlib import AnyPath

p1 = AnyPath("gs://my-bucket/test/test_dir")
p2 = AnyPath("gs://my-bucket/test/test_dir/")

print(p1.is_dir())  # True
print(p2.is_dir())  # False

p1 = AnyPath("s3://my-bucket/test/test_dir")
p2 = AnyPath("s3://my-bucket/test/test_dir/")

print(p1.is_dir())  # True
print(p2.is_dir())  # True

It looks like on gs everything is classified as a file unless I strip the trailing "/".
Any idea why this is happening?
I am using cloudpathlib==0.20.0.

Thanks in advance
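A minimal sketch of the workaround described above, assuming that stripping the trailing "/" before building the path is acceptable for the use case. The helper name without_trailing_slash is hypothetical and not part of cloudpathlib; it only normalizes the URI string.

```python
def without_trailing_slash(uri: str) -> str:
    """Drop trailing slashes from a cloud URI while keeping the scheme intact.

    Hypothetical helper for illustration; not a cloudpathlib function.
    """
    scheme, sep, rest = uri.partition("://")
    if not sep:
        # No scheme present: treat it as a plain path, but never return "".
        return uri.rstrip("/") or uri
    return f"{scheme}://{rest.rstrip('/')}"


# Intended usage (requires cloudpathlib and bucket access, so left commented out):
# from cloudpathlib import AnyPath
# AnyPath(without_trailing_slash("gs://my-bucket/test/test_dir/")).is_dir()
```

This only sidesteps the difference between the two spellings; it does not explain why gs and s3 disagree.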

Member

pjbull commented Dec 24, 2024

This doesn't repro generically, so it has something to do with the configuration of your bucket/storage and the objects that actually exist in your storage. For example, I see:

In [1]: from cloudpathlib import CloudPath

In [2]: CloudPath('gs://cloudpathlib-test-bucket/performance_tests/').is_dir()
Out[2]: True

In [3]: CloudPath('gs://cloudpathlib-test-bucket/performance_tests').is_dir()
Out[3]: True

Do you have more information about your use case?

A few helpful questions:

Author

shanirosen-airis commented Dec 29, 2024

Thanks for the response! I'm still seeing this. I just created a folder using the GCS UI:
[attached image]

>>> from cloudpathlib import AnyPath
>>> p1 = AnyPath("gs://airis-packages-tests/folder1")
>>> p1.exists()
True
>>> p1.is_dir()
True
>>> p2 = AnyPath("gs://airis-packages-tests/folder1/")
>>> p2.is_dir()
False

For some reason, the blob metadata function only returns a result for "folder1/"; for "folder1" I get None:

Blob: folder1/
Bucket: airis-packages-tests
Storage class: STANDARD
ID: airis-packages-tests/folder1//1735484279960870
Size: 0 bytes
Updated: 2024-12-29 14:58:00.020000+00:00
Generation: 1735484279960870
Metageneration: 1
Etag: CKbqlOCezYoDEAE=
Owner: None
Component count: None
Crc32c: AAAAAA==
md5_hash: 1B2M2Y8AsgTpgAmY7PhCfg==
Cache-control: None
Content-type: text/plain
Content-disposition: None
Content-encoding: None
Content-language: None
Metadata: None
Medialink: https://storage.googleapis.com/download/storage/v1/b/airis-packages-tests/o/folder1%2F?generation=1735484279960870&alt=media
Custom Time: None
Temporary hold:  disabled
Event based hold:  disabled
Retention mode: None
Retention retain until time: None

From this, it looks like the blob name itself ends with "/", and I don't know why. Maybe that's the issue?
The bucket itself was created with GCP's default configuration.
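To check whether a given name is one of these slash-terminated, zero-byte objects, something like the following could be used. It is a sketch that assumes a google-cloud-storage Bucket object, whose get_blob(name) returns None when no object with that exact name exists; the function name is_folder_placeholder is hypothetical.

```python
def is_folder_placeholder(bucket, name: str) -> bool:
    """Return True if `name` is an existing zero-byte object whose name ends in "/".

    `bucket` is assumed to behave like google.cloud.storage.Bucket:
    get_blob(name) returns a blob with .name and .size, or None if missing.
    """
    blob = bucket.get_blob(name)
    if blob is None:
        return False
    return blob.name.endswith("/") and blob.size == 0


# Intended usage (requires google-cloud-storage and credentials, so left commented out):
# from google.cloud import storage
# bucket = storage.Client().bucket("airis-packages-tests")
# is_folder_placeholder(bucket, "folder1/")
```

With the metadata shown above (name "folder1/", size 0 bytes), this check would return True for "folder1/" and False for "folder1", which matches the None result for the name without the slash.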

@shanirosen-airis
Author

Quick update: I tried creating a path with gsutil (gsutil cp file1.txt gs://airis-packages-tests/folder8/123.txt), and is_dir worked perfectly fine on folder8. It seems the issue is with folders created through the GCS UI; maybe it creates an empty blob or something like that. Would it be possible to support that as well? Thanks in advance!
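One way a directory check could tolerate such empty folder objects is to test for anything stored at or under the "key/" prefix before testing for an exact object match. The sketch below works on a plain list of object names rather than a live bucket. The function classify is hypothetical and is not cloudpathlib's implementation, and the idea that an exact match on "folder1/" is what causes the False result is an assumption that nobody in this thread has confirmed.

```python
from typing import Iterable, Optional


def classify(object_names: Iterable[str], key: str) -> Optional[str]:
    """Return "dir", "file", or None for `key`, given the object names in a bucket."""
    names = set(object_names)
    if not key:
        return "dir"  # the bucket root is always a directory
    prefix = key if key.endswith("/") else key + "/"
    # Any object at or under the prefix marks a directory, including a
    # zero-byte placeholder whose name is exactly the prefix (e.g. "folder1/").
    if any(name.startswith(prefix) for name in names):
        return "dir"
    if key in names:
        return "file"
    return None


names = ["folder1/", "folder8/123.txt"]
print(classify(names, "folder1"), classify(names, "folder1/"))
print(classify(names, "folder8"), classify(names, "folder8/123.txt"))
```

Checking the prefix first means "folder1" and "folder1/" get the same answer whether the folder came from the UI or from an upload under that prefix.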

@shanirosen-airis
Author

Hey @pjbull, any updates on this? 😅
