1 parent ec9f51c · commit 088cf6d · Showing 1 changed file with 141 additions and 144 deletions.
@@ -1,82 +1,96 @@
# AwsApps
Genomic data focused toolkit for working with AWS services (e.g. S3 and EC2). Includes exhaustive JUnit testing for each app. See [Misc/WorkingWithTheAWSJobRunner.pdf](https://github.com/HuntsmanCancerInstitute/AwsApps/blob/master/Misc/WorkingWithTheAWSJobRunner.pdf) for details.
<pre>
u0028003$ java -jar -Xmx1G ~/Code/AwsApps/target/GSync_0.6.jar
Genomic data focused toolkit for working with AWS services (e.g. S3 and EC2). Includes exhaustive JUnit testing for each app.

<pre>
u0028003$ java -jar -Xmx1G ~/Code/AwsApps/target/S3Copy_0.3.jar
**************************************************************************************
** GSync : June 2020 **
** S3 Copy : Feb 2024 **
**************************************************************************************
GSync pushes files with a particular extension that exceed a given size and age to
Amazon's S3 object store. Associated genomic index files are also moved. Once
correctly uploaded, GSync replaces the original file with a local txt placeholder file
containing information about the S3 object. Files are restored or deleted by modifying
the name of the placeholder file. Symbolic links are ignored.

WARNING! This app has the potential to destroy precious genomic data. TEST IT on a
pilot system before deploying in production. BACKUP your local files and ENABLE S3
Object Versioning before running. This app is provided with no guarantee of proper
function.
SC copies AWS S3 objects, unarchiving them as needed, within the same or different
accounts or downloads them to your local computer. Run this as a daemon with -l or run
repeatedly until complete. To upload files to S3, use the AWS CLI.

To use the app:
1) Create a new S3 bucket dedicated solely to this purpose. Use it for nothing else.
2) Enable S3 Object Locking and Versioning on the bucket to assist in preventing
   accidental object overwriting. Add lifecycle rules to abort incomplete multipart
   uploads and to transition objects to Glacier Deep Archive.
3) It is a good policy when working on AWS S3 to limit your ability to accidentally
   delete buckets and objects. To do so, create and assign yourself to an AWS Group
   called AllExceptS3Delete with a custom permission policy that denies s3:Delete*:
      {"Version": "2012-10-17", "Statement": [
      {"Effect": "Allow", "Action": "*", "Resource": "*"},
      {"Effect": "Deny", "Action": "s3:Delete*", "Resource": "*"} ]}
   For standard upload and download gsyncs, assign yourself to the AllExceptS3Delete
   group. When you need to delete or update objects, switch to the Admin group, then
   switch back. Accidental overwrites are OK since object versioning is enabled.
   To add another layer of protection, apply object legal locks via the AWS CLI.
4) Create a ~/.aws/credentials file with your access, secret, and region info, chmod
   600 the file and keep it private. Use a text editor or the AWS CLI configure
   command, see https://aws.amazon.com/cli Example ~/.aws/credentials file:
      [default]
      aws_access_key_id = AKIARHBDRGYUIBR33RCJK6A
      aws_secret_access_key = BgDV2UHZv/T5ENs395867ueESMPGV65HZMpUQ
      region = us-west-2
5) Execute GSync to upload large old files to S3 and replace them with a placeholder
   file named xxx.S3.txt
6) To download and restore an archived file, rename the placeholder
   xxx.S3.txt.restore and run GSync.
7) To delete an S3 archived file, its placeholder, and any local files, rename the
   placeholder xxx.S3.txt.delete and run GSync.
   Before executing, switch the GSync/AWS user to the Admin group.
8) Placeholder files may be moved, see -u
Create a ~/.aws/credentials file with your access, secret, and region info, chmod
600 the file and keep it private. Use a text editor or the AWS CLI configure
command, see https://aws.amazon.com/cli Example ~/.aws/credentials file:
   [default]
   aws_access_key_id = AKIARHBDRGYUIBR33RCJK6A
   aws_secret_access_key = BgDV2UHZv/T5ENs395867ueESMPGV65HZMpUQ
   region = us-west-2
Repeat these entries for multiple accounts replacing the word 'default' with a single
unique account name.
Required:
-d One or more local directories with the same parent to sync. This parent dir
   becomes the base key in S3, e.g. BucketName/Parent/.... Comma delimited, no
   spaces, see the example.
-b Dedicated S3 bucket name
-j Provide a comma delimited string of copy jobs or a txt file with one per line.
   A copy job consists of a full S3 URI as the source and a destination separated
   by '>', e.g. 's3://source/tumor.cram > s3://destination/collabTumor.cram' or
   folders 's3://source/alignments/tumor > s3://destination/Collab/' or local
   's3://source/alignments/tumor > .' Note, the trailing '/' is required in the
   S3 destination for a recursive copy or when the local folder doesn't exist.

Optional:
-f File extensions to consider, comma delimited, no spaces, case sensitive. Defaults
   to '.bam,.cram,.gz,.zip'
-a Minimum days old for archiving, defaults to 120
-g Minimum gigabyte size for archiving, defaults to 5
-r Perform a real run, defaults to just listing the actions that would be taken.
-k Delete local files that were successfully uploaded.
-u Update S3 Object keys to match current placeholder paths.
-c Recreate deleted placeholder files using info from orphaned S3 Objects.
-q Quiet verbose output.
-e Email addresses to send gsync messages, comma delimited, no spaces.
-s SMTP host, defaults to hci-mail.hci.utah.edu
-x Execute every 6 hrs until complete, defaults to just once, good for downloading
   latent glacier objects.
Optional/ Defaults:
-d Perform a dry run to list the actions that would be taken
-r Perform a recursive copy, defaults to an exact source key match
-e Email address(es) to send status messages, comma delimited, no spaces. Note,
   the sendmail app must be configured on your system. Test it:
   echo 'Subject: Hello' | sendmail [email protected]
-x Expedite archive retrieval, increased cost $0.03/GB vs $0.01/GB, 1-5min vs 3-12hr,
   defaults to standard.
-l Execute every hour (standard) or minute (expedited) until complete
-t Maximum threads to utilize, defaults to 8
-p AWS credentials profile, defaults to 'default'
-n Number of days to keep restored files in S3, defaults to 1
-a Print instructions for copying files between different accounts

Example: java -Xmx20G -jar pathTo/GSync_X.X.jar -r -u -k -b hcibioinfo_gsync_repo
   -q -a 90 -g 1 -d -d /Repo/DNA,/Repo/RNA,/Repo/Fastq -e [email protected]
Example: java -Xmx10G -jar pathTo/S3Copy_x.x.jar -e [email protected] -p obama -d -l
   -j 's3://source/Logs.zip>s3://destination/,s3://source/normal > ~/Downloads/' -r
**************************************************************************************
</pre>
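Below is a small usage sketch for S3Copy's -j option with a job file instead of an inline string. The file name jobs.txt and the S3 URIs are hypothetical placeholders; the flags (-j, -r recursive, -d dry run, -l loop) are the ones documented above:
<pre>
# Hypothetical copy-job file, one 'source > destination' job per line
cat > jobs.txt << 'EOF'
s3://source/tumor.cram > s3://destination/collabTumor.cram
s3://source/alignments/tumor > ~/Downloads/
EOF

# Dry run: list the copy and unarchiving actions that would be taken
java -Xmx10G -jar pathTo/S3Copy_x.x.jar -j jobs.txt -r -d

# Real run: drop -d and add -l to keep running until any Glacier restores complete
java -Xmx10G -jar pathTo/S3Copy_x.x.jar -j jobs.txt -r -l
</pre>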

<pre>
u0028003$ java -jar -Xmx1G ~/Code/AwsApps/target/VersionManager_0.2.jar
**************************************************************************************
** AWS S3 Version Manager : August 2023 **
**************************************************************************************
Bucket versioning in S3 protects objects from being deleted or overwritten by hiding
the original when 'deleting' or overwriting an existing object. Use this tool to
delete these hidden S3 objects and any delete markers from your buckets. Use the
options to select particular redundant objects to delete in a dry run, review the
actions, and rerun it with the -r option to actually delete them. This app will not
delete any isLatest=true object.

WARNING! This app has the potential to destroy precious data. TEST IT on a
pilot system before deploying in production. Although extensively unit tested, this
app is provided with no guarantee of proper function.

To use the app:
1) Enable S3 Object versioning on your bucket.
2) Install and configure the AWS CLI with your region, access and secret keys. See
   https://aws.amazon.com/cli
3) Use CLI commands like 'aws s3 rm s3://myBucket/myObj.txt' or the AWS web Console to
   'delete' particular objects. Then run this app to actually delete them.

Required Parameters:
-b Versioned S3 bucket name

Optional Parameters:
-r Perform a real run, defaults to a dry run where no objects are deleted
-c Credentials profile name, defaults to 'default'
-a Minimum age, in days, of object to delete, defaults to 30
-s Object key suffixes to delete, comma delimited, no spaces
-p Object key prefixes to delete, comma delimited, no spaces
-v Verbose output
-t Maximum threads to use, defaults to 8

Example: java -Xmx10G -jar pathTo/VersionManager_X.X.jar -b mybucket-vm-test
   -s .cram,.bam,.gz,.zip -a 7 -c MiloLab

**************************************************************************************
</pre>
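A usage sketch of the dry-run then real-run cycle described above; the bucket and profile come from the help text's own example, while the object key old/sample1.cram is a hypothetical placeholder:
<pre>
# 'Delete' an object; with versioning enabled it is only hidden behind a delete marker
aws s3 rm s3://mybucket-vm-test/old/sample1.cram

# Dry run: list the hidden versions and delete markers that would be removed
java -Xmx10G -jar pathTo/VersionManager_X.X.jar -b mybucket-vm-test -s .cram -a 7 -c MiloLab

# Real run: add -r to actually delete them (isLatest=true objects are never touched)
java -Xmx10G -jar pathTo/VersionManager_X.X.jar -b mybucket-vm-test -s .cram -a 7 -c MiloLab -r
</pre>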

See [Misc/WorkingWithTheAWSJobRunner.pdf](https://github.com/HuntsmanCancerInstitute/AwsApps/blob/master/Misc/WorkingWithTheAWSJobRunner.pdf) for details.
<pre>
u0028003$ java -jar -Xmx1G ~/Code/AwsApps/target/JobRunner_0.3.jar

**************************************************************************************

@@ -137,96 +151,79 @@ Example: java -jar -Xmx1G JobRunner.jar -x -t
-c 'https://my-jr.s3.us-west-2.amazonaws.com/aws.cred.txt?X-AmRun...'

**************************************************************************************
</pre>

u0028003$ java -jar ~/Code/AwsApps/target/VersionManager_0.1.jar

**************************************************************************************
** AWS S3 Version Manager : January 2022 **
<pre>
u0028003$ java -jar -Xmx1G ~/Code/AwsApps/target/GSync_0.6.jar
**************************************************************************************
Bucket versioning in S3 protects objects from being deleted or overwritten by hiding
the original when 'deleting' or overwriting an existing object. Use this tool to
delete these hidden S3 objects and any delete markers from your buckets. Use the
options to select particular redundant objects to delete in a dry run, review the
actions, and rerun it with the -r option to actually delete them. This app will not
delete any isLatest=true object.

WARNING! This app has the potential to destroy precious data. TEST IT on a
pilot system before deploying in production. Although extensively unit tested, this
app is provided with no guarantee of proper function.

To use the app:
1) Enable S3 Object versioning on your bucket.
2) Install and configure the AWS CLI with your region, access and secret keys. See
   https://aws.amazon.com/cli
3) Use CLI commands like 'aws s3 rm s3://myBucket/myObj.txt' or the AWS web Console to
   'delete' particular objects. Then run this app to actually delete them.

Required Parameters:
-b Versioned S3 bucket name
-l Bucket region location

Optional Parameters:
-r Perform a real run, defaults to a dry run where no objects are deleted
-c Credentials profile name, defaults to 'default'
-a Minimum age, in days, of object to delete, defaults to 30
-s Object key suffixes to delete, comma delimited, no spaces
-p Object key prefixes to delete, comma delimited, no spaces
-q Quiet output.

Example: java -Xmx10G -jar pathTo/VersionManager_X.X.jar -b mybucket-vm-test
   -s .cram,.bam,.gz,.zip -a 7 -c MiloLab -l us-west-2

** GSync : June 2020 **
**************************************************************************************
GSync pushes files with a particular extension that exceed a given size and age to
Amazon's S3 object store. Associated genomic index files are also moved. Once
correctly uploaded, GSync replaces the original file with a local txt placeholder file
containing information about the S3 object. Files are restored or deleted by modifying
the name of the placeholder file. Symbolic links are ignored.

u0028003$ java -jar ~/Code/AwsApps/target/S3Copy_0.1.jar

**************************************************************************************
** S3 Copy : Jan 2023 **
**************************************************************************************
SC copies AWS S3 objects, unarchiving them as needed, within the same or different
accounts or downloads them to your local computer. Run this as a daemon with -l or run
repeatedly until complete. To upload files to S3, use the AWS CLI.
WARNING! This app has the potential to destroy precious genomic data. TEST IT on a
pilot system before deploying in production. BACKUP your local files and ENABLE S3
Object Versioning before running. This app is provided with no guarantee of proper
function.

To use the app:
Create a ~/.aws/credentials file with your access, secret, and region info, chmod
600 the file and keep it private. Use a text editor or the AWS CLI configure
command, see https://aws.amazon.com/cli Example ~/.aws/credentials file:
   [default]
   aws_access_key_id = AKIARHBDRGYUIBR33RCJK6A
   aws_secret_access_key = BgDV2UHZv/T5ENs395867ueESMPGV65HZMpUQ
   region = us-west-2
Repeat these entries for multiple accounts replacing the word 'default' with a single
unique account name.
1) Create a new S3 bucket dedicated solely to this purpose. Use it for nothing else.
2) Enable S3 Object Locking and Versioning on the bucket to assist in preventing
   accidental object overwriting. Add lifecycle rules to abort incomplete multipart
   uploads and to transition objects to Glacier Deep Archive.
3) It is a good policy when working on AWS S3 to limit your ability to accidentally
   delete buckets and objects. To do so, create and assign yourself to an AWS Group
   called AllExceptS3Delete with a custom permission policy that denies s3:Delete*:
      {"Version": "2012-10-17", "Statement": [
      {"Effect": "Allow", "Action": "*", "Resource": "*"},
      {"Effect": "Deny", "Action": "s3:Delete*", "Resource": "*"} ]}
   For standard upload and download gsyncs, assign yourself to the AllExceptS3Delete
   group. When you need to delete or update objects, switch to the Admin group, then
   switch back. Accidental overwrites are OK since object versioning is enabled.
   To add another layer of protection, apply object legal locks via the AWS CLI.
4) Create a ~/.aws/credentials file with your access, secret, and region info, chmod
   600 the file and keep it private. Use a text editor or the AWS CLI configure
   command, see https://aws.amazon.com/cli Example ~/.aws/credentials file:
      [default]
      aws_access_key_id = AKIARHBDRGYUIBR33RCJK6A
      aws_secret_access_key = BgDV2UHZv/T5ENs395867ueESMPGV65HZMpUQ
      region = us-west-2
5) Execute GSync to upload large old files to S3 and replace them with a placeholder
   file named xxx.S3.txt
6) To download and restore an archived file, rename the placeholder
   xxx.S3.txt.restore and run GSync.
7) To delete an S3 archived file, its placeholder, and any local files, rename the
   placeholder xxx.S3.txt.delete and run GSync.
   Before executing, switch the GSync/AWS user to the Admin group.
8) Placeholder files may be moved, see -u

Required:
-j Provide a comma delimited string of copy jobs or a txt file with one per line.
   A copy job consists of a full S3 URI as the source and a destination separated
   by '>', e.g. 's3://source/tumor.cram > s3://destination/collabTumor.cram' or
   folders 's3://source/alignments/tumor > s3://destination/Collab/' or local
   's3://source/alignments/tumor > .' Note, the trailing '/' is required in the
   S3 destination for a recursive copy or when the local folder doesn't exist.
-d One or more local directories with the same parent to sync. This parent dir
   becomes the base key in S3, e.g. BucketName/Parent/.... Comma delimited, no
   spaces, see the example.
-b Dedicated S3 bucket name

Optional/ Defaults:
-d Perform a dry run to list the actions that would be taken
-r Perform a recursive copy, defaults to an exact source key match
-e Email address(es) to send status messages, comma delimited, no spaces. Note,
   the sendmail app must be configured on your system. Test it:
   echo 'Subject: Hello' | sendmail [email protected]
-x Expedite archive retrieval, increased cost $0.03/GB vs $0.01/GB, 1-5min vs 3-12hr,
   defaults to standard.
-l Execute every hour (standard) or minute (expedited) until complete
-t Maximum threads to utilize, defaults to 8
-p AWS credentials profile, defaults to 'default'
-n Number of days to keep restored files in S3, defaults to 1
-a Print instructions for copying files between different accounts
Optional:
-f File extensions to consider, comma delimited, no spaces, case sensitive. Defaults
   to '.bam,.cram,.gz,.zip'
-a Minimum days old for archiving, defaults to 120
-g Minimum gigabyte size for archiving, defaults to 5
-r Perform a real run, defaults to just listing the actions that would be taken.
-k Delete local files that were successfully uploaded.
-u Update S3 Object keys to match current placeholder paths.
-c Recreate deleted placeholder files using info from orphaned S3 Objects.
-q Quiet verbose output.
-e Email addresses to send gsync messages, comma delimited, no spaces.
-s SMTP host, defaults to hci-mail.hci.utah.edu
-x Execute every 6 hrs until complete, defaults to just once, good for downloading
   latent glacier objects.

Example: java -Xmx20G -jar pathTo/S3Copy_x.x.jar -e [email protected] -p obama -d -l
   -j 's3://source/Logs.zip>s3://destination/,s3://source/normal > ~/Downloads/' -r
**************************************************************************************
Example: java -Xmx20G -jar pathTo/GSync_X.X.jar -r -u -k -b hcibioinfo_gsync_repo
   -q -a 90 -g 1 -d -d /Repo/DNA,/Repo/RNA,/Repo/Fastq -e [email protected]

**************************************************************************************
</pre>
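The GSync bucket setup in steps 1-3 above can be scripted with the AWS CLI. A minimal sketch, assuming a hypothetical bucket my-gsync-repo in us-west-2 and a hypothetical IAM user gsyncUser; adjust names, region, and lifecycle timings before running anything for real:
<pre>
# Dedicated bucket with Object Lock enabled (Object Lock requires Versioning)
aws s3api create-bucket --bucket my-gsync-repo --object-lock-enabled-for-bucket \
  --create-bucket-configuration LocationConstraint=us-west-2

# Lifecycle rules: abort stale multipart uploads, transition objects to Glacier Deep Archive
cat > lifecycle.json << 'EOF'
{ "Rules": [
  { "ID": "AbortStaleUploads", "Status": "Enabled", "Filter": { "Prefix": "" },
    "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 } },
  { "ID": "ToDeepArchive", "Status": "Enabled", "Filter": { "Prefix": "" },
    "Transitions": [ { "Days": 30, "StorageClass": "DEEP_ARCHIVE" } ] } ] }
EOF
aws s3api put-bucket-lifecycle-configuration --bucket my-gsync-repo \
  --lifecycle-configuration file://lifecycle.json

# AllExceptS3Delete group built from the deny policy shown in step 3
cat > allExceptS3Delete.json << 'EOF'
{"Version": "2012-10-17", "Statement": [
  {"Effect": "Allow", "Action": "*", "Resource": "*"},
  {"Effect": "Deny", "Action": "s3:Delete*", "Resource": "*"} ]}
EOF
aws iam create-policy --policy-name AllExceptS3Delete --policy-document file://allExceptS3Delete.json
aws iam create-group --group-name AllExceptS3Delete
aws iam attach-group-policy --group-name AllExceptS3Delete \
  --policy-arn arn:aws:iam::123456789012:policy/AllExceptS3Delete   # substitute the ARN returned by create-policy
aws iam add-user-to-group --group-name AllExceptS3Delete --user-name gsyncUser
</pre>
And a sketch of the placeholder rename workflow from steps 5-8; the file names are hypothetical, while the bucket and directories come from the GSync example above:
<pre>
# A prior real run (-r) replaced each archived file with a placeholder, e.g. sample1.bam.S3.txt
mv /Repo/DNA/sample1.bam.S3.txt /Repo/DNA/sample1.bam.S3.txt.restore   # request a restore/download
mv /Repo/DNA/sample2.bam.S3.txt /Repo/DNA/sample2.bam.S3.txt.delete    # request deletion (switch to the Admin group first)
java -Xmx20G -jar pathTo/GSync_X.X.jar -b hcibioinfo_gsync_repo -d /Repo/DNA,/Repo/RNA,/Repo/Fastq -r
</pre>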