Ability to delete local cloudpath cache after upload #29

Open
lazear opened this issue Nov 15, 2022 · 2 comments

Comments

@lazear

lazear commented Nov 15, 2022

I found myself downloading a large amount of data from PRIDE (PXD004452) on a small EC2 instance (64 GB of disk space) with the goal of transferring it directly to an S3 bucket (I have done this several times; I ❤️ ppx). I have always just started a small instance with minimal disk space, because I figured that since I was transferring directly to S3 it wouldn't matter... This is not the case, though! I am out of disk space due to cloudpathlib's local caching.

If I delete the files in the /tmp directory, I can free up space and try to resume the search, but when I restart it, the completed raw files are re-synced back to the /tmp directory. Based on the issues linked below, I think there should be a way to manually delete the locally cached file after upload; I'm not sure how that would work for a restarted search. I can take a stab at this if it's something you think ppx could support. This is probably too specialized to upstream to cloudpathlib: raw files downloaded from PRIDE etc. are immutable, so we don't need to worry about syncing changes from local to cloud, only about whether the file is synced between cloud storage and the repository.

https://cloudpathlib.drivendata.org/stable/caching/

drivendataorg/cloudpathlib#233
drivendataorg/cloudpathlib#153
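The proposal above (delete the cached copy once it has been uploaded, and on a restart trust that an immutable raw file already in the bucket is synced) could look roughly like the sketch below. This is a hypothetical illustration, not how ppx or cloudpathlib currently behave: a local directory stands in for the S3 bucket, an in-memory dict stands in for the PRIDE repository, the function name `transfer` is made up, and comparing file sizes is an assumed stand-in for a real "is it synced" check.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def transfer(repo_files: dict[str, bytes], cache_dir: Path, bucket: Path) -> list[str]:
    """Copy each repository file into `bucket` via `cache_dir`, evicting the
    cached copy as soon as it has been uploaded.

    Hypothetical helper: returns the names actually transferred on this run.
    """
    transferred = []
    for name, data in repo_files.items():
        remote = bucket / name
        # Raw files are assumed immutable, so a remote copy of the expected
        # size is treated as already synced and skipped on a restart.
        if remote.exists() and remote.stat().st_size == len(data):
            continue
        cached = cache_dir / name
        cached.write_bytes(data)         # stand-in for downloading from PRIDE
        shutil.copyfile(cached, remote)  # stand-in for uploading to S3
        cached.unlink()                  # free the local disk space immediately
        transferred.append(name)
    return transferred


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache, bucket = Path(tmp, "cache"), Path(tmp, "bucket")
        cache.mkdir()
        bucket.mkdir()
        files = {"run1.raw": b"\x00" * 10, "run2.raw": b"\x01" * 20}
        print(transfer(files, cache, bucket))  # first run copies both files
        print(transfer(files, cache, bucket))  # a restart skips them: []
```

With cloudpathlib, the stand-ins would presumably map to an upload call such as `CloudPath.upload_from` and an existence check on the remote path; the caching docs linked above describe how the local cache itself is managed.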

@lazear
Author

lazear commented Nov 15, 2022

Hmm, I can also work around the problem by downloading in smaller chunks, so this is perhaps low priority, since it is probably an unusual use case.
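The smaller-chunks workaround can be made systematic by grouping files so that each batch fits on the instance's disk, then clearing the cache between batches. A minimal sketch under assumptions: the function name `batches_by_size` is hypothetical, and the per-file sizes are assumed to come from the repository's file listing.

```python
from __future__ import annotations


def batches_by_size(sizes: dict[str, int], budget: int) -> list[list[str]]:
    """Greedily group files, in listing order, so that the total size of each
    batch stays within `budget` bytes of local disk."""
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for name, size in sizes.items():
        if size > budget:
            raise ValueError(f"{name} ({size} bytes) exceeds the disk budget")
        if used + size > budget:
            # Close the current batch; its cached files can be deleted
            # after upload before the next batch is downloaded.
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches
```

On a 64 GB instance one might pass something like `budget=48 * 1024**3` to leave headroom for the OS and other files; the exact margin is a judgment call.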

@wfondrie
Owner

Interesting, I'll have to look into this. Thanks for bringing it to my attention!
