Pipeline on GCP fails with "Error: pipeline dependencies not found" #224

Open
amtseng opened this issue Apr 6, 2021 · 9 comments

amtseng commented Apr 6, 2021

Describe the bug

I've submitted a good number (25) of ChIP-seq jobs to Caper, and the jobs begin running, but somehow halfway through, the Caper server dies suddenly. Examining the logs and grepping for "error", I find that all of the job logs (in cromwell-workflow-logs/) contain "Error: pipeline dependencies not found".
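
For reference, this is roughly how I searched the logs (the directory path and file naming are just from my setup):

$ grep -ril "error" cromwell-workflow-logs/
$ grep -i "pipeline dependencies not found" cromwell-workflow-logs/*.log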

I have consulted Issue #172, but I have verified that I activated the encode-chip-seq-pipeline environment both when launching the Caper server and when submitting the jobs. I am also experiencing these issues on GCP, not on macOS, so I felt it was prudent to create a new issue for this.

OS/Platform

  • OS/Platform: Google Cloud
  • Conda version: 4.7.12
  • Pipeline version: I'm not sure how to check this, sorry
  • Caper version: 1.4.2

Caper configuration file

backend=gcp
gcp-prj=gbsc-gcp-lab-kundaje
tmp-dir=/data/tmp_amtseng
singularity-cachedir=/data/singularity_cachedir_amtseng
file-db=/data/caper_db/caper_file_db_amtseng
db-timeout=120000
max-concurrent-tasks=1000
max-concurrent-workflows=50
use-google-cloud-life-sciences=True
gcp-region=us-central1

Input JSON file

Here, I'm showing one of the 25 jobs submitted.

{
  "chip.title": "A549_cJun_FLAG cells untreated",
  "chip.description": "A549_cJun_FLAG cells untreated",

  "chip.pipeline_type": "tf",

  "chip.aligner": "bowtie2",
  "chip.align_only": false,
  "chip.true_rep_only": false,

  "chip.genome_tsv": "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v3/hg38.tsv",

  "chip.paired_end": false,
  "chip.ctl_paired_end": false,

  "chip.always_use_pooled_ctl": true,

  "chip.align_cpu": 4,
  "chip.call_peak_cpu": 4,

  "chip.fastqs_rep1_R1": [
    "gs://caper_in/amtseng/AP1/fastqs/SRR12090532.fastq.gz"
  ],
  "chip.fastqs_rep2_R1": [
    "gs://caper_in/amtseng/AP1/fastqs/SRR12090533.fastq.gz"
  ],
  "chip.fastqs_rep3_R1": [
    "gs://caper_in/amtseng/AP1/fastqs/SRR12090534.fastq.gz"
  ],

  "chip.ctl_fastqs_rep1_R1": [
    "gs://caper_in/amtseng/AP1/fastqs/SRR12090601.fastq.gz"
  ],
  "chip.ctl_fastqs_rep2_R1": [
    "gs://caper_in/amtseng/AP1/fastqs/SRR12090602.fastq.gz"
  ],
  "chip.ctl_fastqs_rep3_R1": [
    "gs://caper_in/amtseng/AP1/fastqs/SRR12090603.fastq.gz"
  ]
}

Troubleshooting result

Unfortunately, because the Caper server dies, I am unable to use caper troubleshoot {jobID} to diagnose.
Instead, I've attached the cromwell log for the job, along with cromwell.out:

workflow.3d1cb136-9b32-4514-9a33-3262d8303d6f.log
cromwell.out

Thanks!


leepc12 commented Apr 6, 2021

I looked at the two files but couldn't find any helpful information for debugging.
It looks like Cromwell got a SIGTERM and gracefully shut itself down.

2021-04-06 19:33:06,677  ERROR - Timed out trying to gracefully stop WorkflowStoreActor. Forcefully stopping it.

Can you upgrade Caper (which includes a Cromwell version upgrade, 52->59) and try again? Please follow the upgrade instructions in Caper's release notes.

$ pip3 install autouri caper --upgrade
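
To confirm the upgrade took effect, something like the following should show the installed versions (pip3 show is standard pip; the -v flag is assumed to print the version in recent Caper releases):

$ pip3 show caper autouri
$ caper -v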


amtseng commented Apr 6, 2021

I'll give that a try and report back. Thanks, Jin!


amtseng commented Apr 7, 2021

I've upgraded Caper/Cromwell (and verified the version update). Running the same 25 jobs, I still get the exact same errors, and the Caper server crashes.

I then tried running just one job. Intriguingly, it succeeded! That suggests to me that either a subset of the jobs is crashing and taking the entire Caper server (and the other jobs) down with it, or simply having too many jobs at a time is causing trouble...

Very strange! Any ideas? In the meantime, I'm going to try running a few more jobs on their own and see how that goes...
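
If it does turn out to be a load problem, one stopgap I might also try is lowering the concurrency limits in my Caper config (these keys are already in my config above; the reduced values are just a guess):

# in ~/.caper/default.conf (lower values are only a guess to lighten the load)
max-concurrent-workflows=10
max-concurrent-tasks=200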


leepc12 commented Apr 7, 2021

How did you run the server? Did you use Caper's shell script to make a server instance?
https://github.com/ENCODE-DCC/caper/tree/master/scripts/gcp_caper_server


amtseng commented Apr 7, 2021

I started the server using this command in a tmux session:

caper server --port 8000 --gcp-loc-dir=gs://caper_out/amtseng/.caper_tmp --gcp-out-dir gs://caper_out/amtseng/


leepc12 commented Apr 7, 2021

That command line looks good as long as your Google user account has enough permissions for GCE, GCS, the Google Life Sciences API, and so on.

Why don't you use a configuration file, ~/.caper/default.conf? You can create a good template for it by running the following:

# this will overwrite the existing conf file. please make a backup if you need one.
$ caper init gcp
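
For reference, the generated ~/.caper/default.conf for the gcp backend would then look roughly like this (the exact set of keys depends on your Caper version; the project ID and bucket paths below are placeholders):

backend=gcp
gcp-prj=YOUR_PROJECT_ID
gcp-out-dir=gs://YOUR_BUCKET/caper_out
gcp-loc-dir=gs://YOUR_BUCKET/caper_tmp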

BTW, I strongly recommend using the above shell script, because ENCODE DCC runs thousands of pipelines without any problem on instances created by that script.

I'm not sure if you have a service account with the correct permission settings. Please use the above script.


amtseng commented Apr 8, 2021

I generated the default configuration file using caper init gcp, specifying only the gcp-prj and gcp-out-dir fields. I then started a Caper server using just caper server in a tmux session.
Caper still crashed, although the logs now contain not only the "pipeline dependencies not found" error but also java.lang.OutOfMemoryError: GC overhead limit exceeded errors.

I've attached cromwell.out and an example workflow log, again.

cromwell.out.txt
workflow.225a8edd-5ee7-45c2-b77f-d5123797d313.log.txt


leepc12 commented Apr 8, 2021

It looks like a Java memory issue?

java.sql.SQLException: java.lang.OutOfMemoryError: GC overhead limit exceeded

That's why I recommend the shell script. That script will create an instance with enough memory, and all Caper settings are configured automatically.
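
For what it's worth, if recreating the instance isn't possible right away, Caper's configuration also exposes a Java heap setting for the server process that could be raised as a stopgap (the key name below is an assumption; check caper server --help for your version):

# in ~/.caper/default.conf (java-heap-server is assumed; verify against your Caper version)
java-heap-server=16G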


amtseng commented Apr 8, 2021

Ah, I'm sorry. I misunderstood which script you were referring to. I'll try creating an instance with create_instance.sh instead of using the pre-existing instance we have in the lab.
