Run workflows on the All of Us platform

Install and initialize the Google Cloud CLI

  1. How to install the Google Cloud CLI: see the instructions.
  2. How to initialize the Google Cloud CLI: see here.
  3. Use a different Google Cloud account (the UCSD email, not the AoU one) to log in to Google Cloud and create a project or select an existing project.
  4. Switch between multiple accounts and projects: the current project information is PROJECT_ID: . A sketch of creating and switching configurations follows this list.
* For multiple projects or accounts, create a separate configuration for each project/account; for details, type `gcloud topic configurations`.
* To create a new configuration, use `gcloud init` (tested; unclear how to change the config name) or `gcloud config configurations create <my-config>`.
* To activate a configuration, use `gcloud config configurations activate <my-config>`; to display the path of the active configuration, run `gcloud info --format="get(config.paths.active_config_path)"`.
* To view the currently active configuration, use `gcloud config list`; to view all configurations, use `gcloud config configurations list`.
  5. Other commonly used commands:
* Configuration file parameters can be changed with `gcloud config set`.
* List available accounts: `gcloud auth list`
* Switch the active account: `gcloud config set account <account-email>` 
* List available projects: `gcloud projects list`
* Switch to project `gcloud config set project <project-id>`
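
A minimal sketch of creating a second configuration and switching between them; the configuration name, account, and project ID below are placeholders, not values from this workspace:

    # create a named configuration for the UCSD account/project (placeholder values)
    gcloud config configurations create ucsd
    gcloud config set account <ucsd-email>
    gcloud config set project <ucsd-project-id>

    # switch back to the default configuration
    gcloud config configurations activate default

    # confirm which configuration, account, and project are active
    gcloud config configurations list
    gcloud config list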

Set up GCR (based on this tutorial)

  1. Enable the API (may only be needed on first use):
  • Use the Google Cloud Console
  • Use the gcloud command:
    gcloud services enable containerregistry.googleapis.com
  • To disable the API: go to this link, select the project, click Manage, then click Disable API.
  2. Commands for setting up GCR:
    gcloud auth login
    gcloud config set project PROJECT_ID
    gcloud auth configure-docker
    docker tag <image-name> <gcr-path>
    docker tag yli091230/hipstr:amd64 gcr.io/ucsd-medicine-cast/hipstr:amd64
    docker push gcr.io/ucsd-medicine-cast/hipstr:amd64
  3. Roles and permissions:
  • Using a service account is recommended; see the sketch after this list.
    1. The first push requires the Storage Admin role to create a storage bucket for the registry.
    2. After the initial image push:
    • Stoc
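
A minimal sketch of pushing with a service account instead of a user account; the service account email, key file path, and image path are placeholders:

    # activate the service account from a downloaded JSON key (placeholder values)
    gcloud auth activate-service-account <sa-name>@<project-id>.iam.gserviceaccount.com \
        --key-file=/path/to/key.json

    # register gcloud as the Docker credential helper for gcr.io
    gcloud auth configure-docker

    # the push now runs with the service account's roles
    docker push gcr.io/<project-id>/<image-name>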

How to use workbench tools

  1. Cohort builder --> Concept set selector --> Dataset builder --> Jupyter notebook
  • Cohort builder:
    • Create review set
  • Dataset Builder:
    • Cohorts: participants; Concept sets (for each sample): rows; Values: columns
  2. Build a Jupyter notebook directly
  3. Docker images must be stored in GCR.
  • Example of pushing the docker image busybox to GCR (my-project is the project ID):
    docker pull busybox
    docker tag busybox gcr.io/my-project/busybox
    docker push gcr.io/my-project/busybox
  • The user needs permission to pull and push images; if access is missing, see the sketch below.
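
Permissions can be granted with project-level IAM bindings; a sketch, assuming `roles/storage.objectViewer` is enough for pulls and `roles/storage.objectAdmin` for pushes once the registry bucket exists (the member email and project are placeholders):

    # allow pulling images (read access to the GCR storage bucket)
    gcloud projects add-iam-policy-binding my-project \
        --member="user:someone@example.com" \
        --role="roles/storage.objectViewer"

    # allow pushing images once the registry bucket already exists
    gcloud projects add-iam-policy-binding my-project \
        --member="user:someone@example.com" \
        --role="roles/storage.objectAdmin"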

Submit jobs through the command line

Check dsub
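
A minimal dsub sketch, assuming the Google Cloud Life Sciences provider; the project, region, logging bucket, and image are placeholders:

    # submit a single command to a cloud VM and wait for it to finish
    dsub \
        --provider google-cls-v2 \
        --project my-project \
        --regions us-central1 \
        --logging gs://my-bucket/dsub-logs/ \
        --image ubuntu:20.04 \
        --command 'echo "hello from dsub"' \
        --wait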

Where to store the data

  1. Google Cloud bucket and local storage (workspace bucket)
  • Need to check the Google Cloud bucket and how it works in the AoU platform
  • Can we output all of the files to the fixed/permanent bucket?
  • Do we need to leave the notebook running while the WDL workflow runs?

Google Cloud Storage access

NOTE: for the All of Us project, the WORKSPACE_BUCKET cannot be accessed from a local terminal.
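
Inside the workbench (notebook or cloud environment terminal), the bucket is reachable with gsutil through the WORKSPACE_BUCKET environment variable; a small sketch with placeholder file names:

    # list the workspace bucket contents
    gsutil ls ${WORKSPACE_BUCKET}

    # copy a result file into the bucket and back out
    gsutil cp results.tsv ${WORKSPACE_BUCKET}/data/
    gsutil cp ${WORKSPACE_BUCKET}/data/results.tsv .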

Cromwell

Set up the configuration file

To transfer multiple large files to an instance, enable parallel composite uploads in the Cromwell configuration file. Example file:

backend {
  ...
  providers {
    ...
    PapiV2 {
      actor-factory = "cromwell.backend.google.pipelines.v2beta.PipelinesApiLifecycleActorFactory"
      config {
        ...
        genomics {
          ...
          parallel-composite-upload-threshold = 150M
          ...
        }
        ...
      }
    }
  }
}
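
For manual transfers outside Cromwell, gsutil can apply the same idea through its parallel composite upload threshold; a sketch with a placeholder bucket path:

    # upload a large file, splitting it into parallel composite parts above 150 MB
    gsutil -o GSUtil:parallel_composite_upload_threshold=150M \
        cp large_file.bam gs://my-bucket/inputs/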

Set up optional parameters

List of Google pipelines API workflow options.

  1. Run with a preemptible instance. This runs the job on a preemptible instance one time; if preempted, it falls back to an on-demand machine. The snippet below writes the matching options file; a launch example follows this list.
import json

# write a Cromwell options file allowing one preemptible attempt before on-demand
options_content = {
    "jes_gcs_root": output_bucket,  # output_bucket: path to the workflow output bucket
    "default_runtime_attributes": {"preemptible": "1"},
}
with open("options.json", "w") as fp:
    json.dump(options_content, fp, indent=2)
  2. Output directory
{
    "final_workflow_outputs_dir": "/Users/michael_scott/cromwell/outputs",
    "use_relative_output_paths": true,
    "final_workflow_log_dir": "/Users/michael_scott/cromwell/wf_logs",
    "final_call_logs_dir": "/Users/michael_scott/cromwell/call_logs"
}
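
A sketch of passing the options file when launching a workflow with the Cromwell jar; the workflow and inputs file names are placeholders:

    # run a workflow with an inputs file and the options file written above
    java -jar cromwell.jar run workflow.wdl \
        --inputs inputs.json \
        --options options.json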

Questions

  1. How to SSH into the VM locally?
  2. How to transfer files in parallel?
  • Using Parallel Composite Uploads
  • How to use enable_fuse in Cromwell for Google Cloud?
  3. How to customize configuration files?