File management in S3 buckets
Jonathan Rochkind edited this page Dec 20, 2018 · 6 revisions
Different classes of objects we want in S3, divided into categories depending on differences in access control, lifecycle management, and/or CORS settings. I may be missing some, hopefully not, but other categories may come up as we develop!
- No public ACLs, S3 policies preventing
- local_env key: s3_bucket_originals
- staging bucket: scihi-kithe-stage-originals
- special backup/lifecycle treatment (documented (in text or code) where?)
- DO have public ACLs right now; we'd need to do more app work to keep them private
- local_env key: s3_bucket_derivatives
- staging bucket: scihi-kithe-stage-derivatives
- Can be re-generated at any time, loss would at most be temporary (although can take a while to regenerate)
- backup details: TBD
- No public ACLs, S3 policies preventing
- Uploaded from a browser directly to an S3 bucket. Corresponds to the "uploaded files" location in sufia, but in the new app this doesn't need to be mounted block storage; it can be all S3.
- Don't need to be backed up at all.
- local_env key: s3_bucket_uploads
- staging bucket: scihi-kithe-stage-uploads
- DO need lifecycle rules to purge files older than X days. The app can't clean up after itself in all cases, because files can be abandoned by a browser mid-process with no way for the app to know the next ingest step won't be taken.
- Does need special CORS settings
- no public ACL needed, S3 permissions limiting such recommended
- The one that gets an icon on Windows desktops, used for ingest process
- Currently shared with the production sufia app: one Windows-mounted ingest bucket
- local_env key: s3_bucket_ingest
- bucket_name: scih-uploads
- DO need to have public S3 ACLs, as we still haven't figured out any good access controls that work with our setup
- Are otherwise like derivatives, in that backups are not important
- See the note about regeneration time in derivatives
- Don't need a public ACL
- Can be re-generated at any time, in fact our front-end code will automatically re-generate on demand
- We treat these as 'cache', don't actually want to hold onto them forever, so also need lifecycle rules to delete files that haven't been accessed in X days
- Currently on S3 just because, why not; it avoids the need for another persistent file system (that can be served by the web)
- Doesn't need to be backed up at all, doesn't need any lifecycle management
- public ACL
- Since this is a single file, we can look at this on the file level rather than a bucket/prefix level if we want.
Any other categories/types of S3 files? Not that I'm thinking of now!
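The uploads bucket above combines three settings: no public access, a lifecycle purge after X days, and CORS for direct browser uploads. As a rough sketch only, these could be expressed as plain Python dictionaries in the shapes that boto3's `put_public_access_block`, `put_bucket_lifecycle_configuration`, and `put_bucket_cors` accept. The function names, the rule ID, the allowed methods and headers, and the example origin are all assumptions for illustration and are not taken from the app; the value of X is left as a parameter because it has not been decided here.

```python
def block_public_access():
    """Settings that refuse public ACLs and public bucket policies."""
    return {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    }


def purge_after_days(days, rule_id="purge-abandoned-uploads"):
    """Lifecycle configuration that expires every object `days` after creation.

    `rule_id` is a hypothetical name. The rule also aborts multipart
    uploads a browser started but never completed.
    """
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: applies to the whole bucket
                "Expiration": {"Days": days},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


def browser_upload_cors(allowed_origins):
    """CORS configuration letting the listed origins upload from a browser.

    The methods and headers here are assumptions; adjust them to whatever
    the upload widget actually sends.
    """
    return {
        "CORSRules": [
            {
                "AllowedOrigins": list(allowed_origins),
                "AllowedMethods": ["GET", "POST", "PUT"],
                "AllowedHeaders": ["*"],
                "ExposeHeaders": ["ETag"],
                "MaxAgeSeconds": 3000,
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical values: 7 days and an example origin.
    print(purge_after_days(7))
    print(browser_upload_cors(["https://staging.example.org"]))
```

With boto3 installed, something like `client.put_bucket_lifecycle_configuration(Bucket="scihi-kithe-stage-uploads", LifecycleConfiguration=purge_after_days(7))` would apply the purge rule. Note that S3 lifecycle expiration counts days since object creation, so the "not accessed in X days" cache category would likely need a different mechanism or an age-based approximation.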
The app has a mode for using S3 in development, where it puts all files in a shared S3 dev bucket. This is triggered by env storage_mode: dev_s3, which is the default in dev.
It will use the bucket in env s3_dev_bucket (default scih-uploads-dev), and put a given dev app's files under a segregated prefix. By default the prefix is built from the username and hostname where the app is running, but it can be set with env s3_dev_prefix.
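The way these three env keys resolve can be sketched as below. This is an illustration only, not the app's actual code: the function name is hypothetical, and the exact format of the default prefix (here `username.hostname`) is an assumption, since the text above only says it is derived from the username and hostname.

```python
import getpass
import socket


def dev_storage_settings(env, user=None, host=None):
    """Resolve dev-mode S3 settings from an env-like mapping.

    `user` and `host` can be passed explicitly (useful in tests);
    otherwise they are read from the running system.
    """
    user = user or getpass.getuser()
    host = host or socket.gethostname()
    return {
        # dev_s3 is described as the default storage mode in dev
        "storage_mode": env.get("storage_mode", "dev_s3"),
        # shared dev bucket, with the documented default
        "bucket": env.get("s3_dev_bucket", "scih-uploads-dev"),
        # assumed default format; the real app may join these differently
        "prefix": env.get("s3_dev_prefix", f"{user}.{host}"),
    }


if __name__ == "__main__":
    print(dev_storage_settings({}))
    print(dev_storage_settings({"s3_dev_prefix": "my-experiment"}))
```

Keeping each developer's files under a separate prefix lets everyone share one dev bucket without overwriting each other's objects.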