I found myself downloading a large amount of data from PRIDE (PXD004452) on a small EC2 instance (64 GB of disk space), with the goal of transferring the data directly to an S3 bucket (I have done this several times; I ❤️ ppx). I have always started a small instance with minimal disk space, because I figured that since I was transferring directly to S3 it wouldn't matter... That is not the case, though! I ran out of disk space because of cloudpathlib's local caching.
If I delete the files in the /tmp directory, I can free up space and try to resume the download, but when I restart it, the completed raw files are re-synced back to the /tmp directory. Based on the issues linked below, I think there should be a way to manually delete the locally cached file after it is uploaded; I'm not sure how that would work for a restarted download, though. I can take a stab at this if it's something you feel could be supported in ppx. It is probably too specialized to upstream to cloudpathlib: raw files downloaded from PRIDE and similar repositories are immutable, so we don't need to worry about syncing local changes back to the cloud, only about whether each file is already synced between cloud storage and the repository.
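A minimal sketch of the "delete after upload, skip what is already synced" idea, assuming nothing about ppx internals. `transfer_immutable` and its `remote_exists`, `download`, and `upload` callables are hypothetical names standing in for the repository download and the S3 write. Because the raw files are treated as immutable, an existence check on the remote side is enough to skip a file on restart, and each local copy is removed as soon as its upload returns, so peak scratch usage stays around one file.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def transfer_immutable(
    names: Iterable[str],
    remote_exists: Callable[[str], bool],   # hypothetical: "is this file already in the bucket?"
    download: Callable[[str, Path], None],  # hypothetical: fetch one file from the repository
    upload: Callable[[Path, str], None],    # hypothetical: push one local file to cloud storage
    scratch_dir: Path,
) -> List[str]:
    """Move files from the repository to cloud storage one at a time.

    Files that already exist remotely are skipped without being fetched
    again, and every local copy is deleted right after its upload attempt.
    Returns the names that were transferred on this run.
    """
    transferred = []
    for name in names:
        if remote_exists(name):
            continue  # synced on an earlier run; immutable, so nothing to refresh
        local = scratch_dir / name
        download(name, local)
        try:
            upload(local, name)
        finally:
            local.unlink(missing_ok=True)  # free the disk space immediately (Python 3.8+)
        transferred.append(name)
    return transferred
```

With cloudpathlib, `upload` could plausibly wrap `S3Path.upload_from` and `remote_exists` could wrap `S3Path.exists`; cloudpathlib's caching documentation also describes a `file_cache_mode` client setting and a `clear_cache()` method for emptying the local cache. How this would best be wired into ppx is still an open question.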
Hmm, I can also work around the problem by performing the downloads in smaller chunks, so perhaps this is low priority; it is probably an unusual use case.
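The "smaller chunks" workaround can be made systematic by grouping files so that each batch fits within a disk budget. This is a hypothetical helper (`chunk_by_size` is not part of ppx or cloudpathlib), and it assumes the file sizes are already known, for example from the repository's file listing.

```python
from typing import Dict, Iterator, List


def chunk_by_size(sizes: Dict[str, int], budget: int) -> Iterator[List[str]]:
    """Yield batches of file names whose total size stays within `budget` bytes.

    Files keep their input order; a batch is closed as soon as the next
    file would push it over the budget.
    """
    batch: List[str] = []
    used = 0
    for name, size in sizes.items():
        if size > budget:
            raise ValueError(f"{name} ({size} bytes) alone exceeds the budget")
        if batch and used + size > budget:
            yield batch
            batch, used = [], 0
        batch.append(name)
        used += size
    if batch:
        yield batch
```

Each batch would then be downloaded and uploaded with whatever ppx call is already in use, and the local cache emptied before the next batch starts. The budget should leave headroom below the instance's free space, since a file may briefly exist both in the download location and in the cache.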
https://cloudpathlib.drivendata.org/stable/caching/
drivendataorg/cloudpathlib#233
drivendataorg/cloudpathlib#153