These are scripts for working with the HCI GNomEx Bio-Repository file server, particularly with respect in preparation and migration of GNomEx data to AWS Buckets (and formerly Seven Bridges).
Main scripts are in the bin
directory. Modules required for execution are in the lib
directory.
Current operation is predicated around collecting disparate bits of information, including metadata from the GNomEx database, metadata for each GNomEx project directory on the bio-repository file server, collated information regarding known AWS lab divisions, and metadata regarding actions performed on GNomEx projects, all coalesced into a Catalog database file. Actions on the GNomEx projects take place through, and recorded into, the Catalog file.
Main scripts in the bin
folder.
manage_repository.pl
The primary script for working with the GNomEx repository projects and file directories. This is a comprehensive application for cataloging, managing, processing, and uploading GNomEx projects. All functions revolve around a local Catalog database file.
process_project.pl
General unified script for processing both GNomEx Experiment Request and Analysis
projects in the repository. Writes a unified structure project MANIFEST.csv
file
with the metadata of the files, including file type, size, date stamp, MD5 checksum,
and sequencing information (for Fastq files only). Writes zip archive and remove file
list files in parent directory and hidden from the GNomEx web application.
upload_repo_projects.pl
Manifest-based, multi-threaded, file uploader of GNomEx repository projects to AWS buckets.
- other
There are a few other sundry scripts for various one-off purposes and such.
Special purpose scripts and modules, mostly retired, are moved into separate folders for sanity purposes.
- Seven Bridges
There are multiple scripts for working with the
Seven Bridges platform (now part of
Velsera), and the
Cancer Genomics Cloud platform. These include
sbg_project_manager.pl
(an extensive file and project manager for listing, moving,
and exporting files), sbg_vol_manager.pl
(manager for mounted AWS buckets),
sbg_async_folder_copy.pl
(bulk, recursive, asynchronous file copy within or between
projects), check_divisions.pl
(list projects, members, tasks, and more), and others.
These are mostly retired. These are in the sevenbridges directory.
Perl libraries that wrap around the Seven Bridges API have been moved to a private repository, Net::SB. The code was originally part of this package but split off to make it more generally accessible. It is currently not on CPAN and must be installed manually.
- Migration
Several scripts used in the Great Data Migration from the Seven Bridges platform to
private AWS buckets. These include generate_project_export_prefixes.pl
for automatic
generation of destination buckets and prefixes for manual inspection,
select_sbg_project_files_to_delete.pl
for finding and removing unwanted files from
Seven Bridges projects, generate_bulk_export_commands.pl
for generating the actual
shell scripts for managing and executing the transfer, and verify_transfers.pl
for
directly comparing the paths and sizes between Seven Bridges projects and AWS buckets
for confirmation of successful file transfer. These are in the migration
directory.
- Deprecated
Old scripts and stuff are placed in the deprecated directory for posterity only. See its dedicated page for more information.
There are specific Perl library modules for working with the Catalog database file, Repository project folders on the file server, and GNomEx. These are fairly specific to the scripts here and unlikely to be of general interest.
NOTE: These are not installed by a package manager. All of the applications in the
script
folder will look for the lib
folder in the parent directory. Downloading
the package as-is and running in place should work. Otherwise, they can be manually
placed in your PERL5LIB
path.
The scripts are written in Perl and can be executed under the system Perl. Additional
Perl modules are required for execution. These may be installed with a Perl package
manager, such as CPANminus or CPAN, using
the included cpanfile
, or if you're brave, your Linux OS package manager.
cpanm --installdeps .
The Net::SB module is not currently on CPAN and must be installed manually.
To ease distribution to other servers, single file executables can be generated using PAR-Packer. Note that this is not perfect, and may not work across different systems, e.g. Linux distributions with different kernels.
pp -c -o executables/script bin/script.pl
For some scripts that use runtime loader modules, you will have to actually execute
the script with real arguments (no, simple --help
won’t cut it).
pp -x --xargs " --option foo --bar" -o executables/script bin/script.pl
This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.
Timothy J. Parnell, PhD
Bioinformatic Analysis Shared Resource
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112