
Fail at chip.call_peak_ppr1 step #219

Open
huynhvietlinh opened this issue Mar 24, 2021 · 3 comments

huynhvietlinh commented Mar 24, 2021

[encode_chip_Med1_KD-1671587.log](https://github.com/ENCODE-DCC/chip-seq-pipeline2/files/6193467/encode_chip_Med1_KD-1671587.log)

Describe the bug

I installed the pipeline and ran the test dataset (ENCSR000DYI, subsampled 1/25, chr19_chrM only) successfully, but when I run the pipeline on my own data it fails at the call_peak_ppr1 step.

OS/Platform

  • OS/Platform: CentOS 6.3
  • Conda version: 4.9.2
  • Pipeline version: 1.7.1
  • Caper version: 1.4.2

Caper configuration file


backend=slurm

# define one of the following (or both) according to your
# cluster's SLURM configuration.
slurm-partition=all
#slurm-account=

# Hashing strategy for call-caching (3 choices)
# This parameter is for local (local/slurm/sge/pbs) backend only.
# This is important for call-caching,
# which means re-using outputs from previous/failed workflows.
# Cache will miss if different strategy is used.
# "file" method has been default for all old versions of Caper<1.0.
# "path+modtime" is a new default for Caper>=1.0,
#   file: use md5sum hash (slow).
#   path: use path.
#   path+modtime: use path and modification time.
local-hash-strat=path+modtime

# Local directory for localized files and Cromwell's intermediate files
# If not defined, Caper will make .caper_tmp/ on local-out-dir or CWD.
# /tmp is not recommended here since Caper store all localized data files
# on this directory (e.g. input FASTQs defined as URLs in input JSON).
local-loc-dir=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/tmp

cromwell=/mnt/work1/users/home/linhh/.caper/cromwell_jar/cromwell-52.jar
womtool=/mnt/work1/users/home/linhh/.caper/womtool_jar/womtool-52.jar

Input JSON file


{
    "chip.pipeline_type" : "histone",
    "chip.genome_tsv" : "/mnt/work1/users/hoffmangroup/lhuynh/ChromatinHub/data/2021_03_22_raw_ChIP_seq/mm9.tsv",
    "chip.fastqs_rep1_R1" : ["/mnt/work1/users/hoffmangroup/lhuynh/ChromatinHub/data/2021_03_22_raw_ChIP_seq/Med1_ChIPseq_siCtrl_Rep1.fastq"],
    "chip.fastqs_rep2_R1" : ["/mnt/work1/users/hoffmangroup/lhuynh/ChromatinHub/data/2021_03_22_raw_ChIP_seq/Med1_ChIPseq_siCtrl_Rep2.fastq"],
    "chip.ctl_fastqs_rep1_R1" : ["/mnt/work1/users/hoffmangroup/lhuynh/ChromatinHub/data/2021_03_22_raw_ChIP_seq/Input_ChIPseq_siCtrl_Rep1.fastq"],
    "chip.ctl_fastqs_rep2_R1" : ["/mnt/work1/users/hoffmangroup/lhuynh/ChromatinHub/data/2021_03_22_raw_ChIP_seq/Input_ChIPseq_siCtrl_Rep2.fastq"],
    "chip.paired_end" : false,
    "chip.title" : "Med1Ctrl_mm9",
    "chip.description" : "Med1 ChIP-seq of Ctrl mESC mm9"
}

And here is what I got from the SLURM log file (the full log is attached):

==== NAME=chip.call_peak_ppr1, STATUS=RetryableFailure, PARENT=
SHARD_IDX=-1, RC=1, JOB_ID=32045
START=2021-03-23T04:29:16.227Z, END=2021-03-23T08:25:58.647Z
STDOUT=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_Output/chip/b83eb85b-c760-440b-84d5-3d7b5e4cf3ec/call-call_peak_ppr1/execution/stdout
STDERR=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_Output/chip/b83eb85b-c760-440b-84d5-3d7b5e4cf3ec/call-call_peak_ppr1/execution/stderr
STDERR_CONTENTS=
Traceback (most recent call last):
  File "/mnt/work1/users/hoffmangroup/lhuynh/miniconda3/envs/encode-chip-seq-pipeline/bin/encode_task_spp.py", line 133, in <module>
    main()
  File "/mnt/work1/users/hoffmangroup/lhuynh/miniconda3/envs/encode-chip-seq-pipeline/bin/encode_task_spp.py", line 119, in main
    args.ctl_subsample, args.ctl_paired_end, args.nth, args.out_dir)
  File "/mnt/work1/users/hoffmangroup/lhuynh/miniconda3/envs/encode-chip-seq-pipeline/bin/encode_task_spp.py", line 90, in spp
    run_shell_cmd(cmd0)
  File "/mnt/work1/users/hoffmangroup/lhuynh/miniconda3/envs/encode-chip-seq-pipeline/bin/encode_lib_common.py", line 331, in 
[encode_chip_Med1_KD-1671587.log](https://github.com/ENCODE-DCC/chip-seq-pipeline2/files/6193467/encode_chip_Med1_KD-1671587.log)
run_shell_cmd
    raise Exception(err_str)
Exception: PID=17945, PGID=17945, RC=137, DURATION_SEC=13967.0
STDERR=Loading required package: Rcpp
Warning: stack imbalance in 'lapply', 20 then 107
Warning: stack imbalance in 'lapply', 11 then 98
/bin/bash: line 1: 17948 Killed                  Rscript --max-ppsize=500000 $(which run_spp.R) -c=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_Output/chip/b83eb85b-c760-440b-84d5-3d7b5e4cf3ec/call-call_peak_ppr1/inputs/-1843169244/rep-pr1.pooled.tagAlign.gz -i=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_Output/chip/b83eb85b-c760-440b-84d5-3d7b5e4cf3ec/call-call_peak_ppr1/inputs/-333660208/ctl.pooled.tagAlign.gz -npeak=300000 -odir=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_Output/chip/b83eb85b-c760-440b-84d5-3d7b5e4cf3ec/call-call_peak_ppr1/execution -speak=145 -savr=rep-pr1.pooled_x_ctl.pooled.300K.regionPeak.gz.tmp -fdr=0.01 -rf
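
For context: a POSIX shell reports exit status 128 + N when a child process is killed by signal N, so the RC=137 above decodes to SIGKILL (signal 9), which is what SLURM or the kernel OOM killer sends when a job exceeds its memory allocation. A quick sanity check in Python:

import signal

# Shells report exit status 128 + N for a process killed by signal N.
# 137 - 128 = 9, i.e. SIGKILL, the signal SLURM / the kernel OOM killer
# uses to terminate jobs that exceed their memory limit.
assert 137 - 128 == signal.SIGKILL
print(signal.Signals(137 - 128).name)  # prints: SIGKILL
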
leepc12 (Contributor) commented Mar 24, 2021

A Killed message in STDERR usually means an out-of-memory error.
Please define the following in your input JSON and try again. This will double the amount of memory for the failed call_peak job.

{
    "chip.call_peak_spp_mem_factor": 10.0
}
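
For reference, merging this suggestion into the input JSON posted above would give (only the last key is new):

{
    "chip.pipeline_type" : "histone",
    "chip.genome_tsv" : "/mnt/work1/users/hoffmangroup/lhuynh/ChromatinHub/data/2021_03_22_raw_ChIP_seq/mm9.tsv",
    "chip.fastqs_rep1_R1" : ["/mnt/work1/users/hoffmangroup/lhuynh/ChromatinHub/data/2021_03_22_raw_ChIP_seq/Med1_ChIPseq_siCtrl_Rep1.fastq"],
    "chip.fastqs_rep2_R1" : ["/mnt/work1/users/hoffmangroup/lhuynh/ChromatinHub/data/2021_03_22_raw_ChIP_seq/Med1_ChIPseq_siCtrl_Rep2.fastq"],
    "chip.ctl_fastqs_rep1_R1" : ["/mnt/work1/users/hoffmangroup/lhuynh/ChromatinHub/data/2021_03_22_raw_ChIP_seq/Input_ChIPseq_siCtrl_Rep1.fastq"],
    "chip.ctl_fastqs_rep2_R1" : ["/mnt/work1/users/hoffmangroup/lhuynh/ChromatinHub/data/2021_03_22_raw_ChIP_seq/Input_ChIPseq_siCtrl_Rep2.fastq"],
    "chip.paired_end" : false,
    "chip.title" : "Med1Ctrl_mm9",
    "chip.description" : "Med1 ChIP-seq of Ctrl mESC mm9",
    "chip.call_peak_spp_mem_factor" : 10.0
}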

huynhvietlinh (Author) commented Mar 25, 2021

I doubled the amount of memory as you suggested, and the job ran longer (12 hours vs. 9.5 hours), but it still failed at the peak-calling step.
Should I increase chip.call_peak_spp_mem_factor to 20.0?

2021-03-24 23:28:03,447|caper.cromwell_workflow_monitor|INFO| Task: id=9d74db00-84e2-449d-803b-dd5745e1007a, task=chip.call_peak_ppr1:-1, retry=1, status=Done
2021-03-25 01:50:30,059|caper.cromwell_workflow_monitor|INFO| Task: id=9d74db00-84e2-449d-803b-dd5745e1007a, task=chip.call_peak_pooled:-1, retry=1, status=Done
2021-03-25 01:50:31,293|caper.cromwell_workflow_monitor|INFO| Workflow: id=9d74db00-84e2-449d-803b-dd5745e1007a, status=Failed
2021-03-25 01:50:45,230|caper.cromwell_metadata|INFO| Wrote metadata file. /mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_workspace/Med1_Ctrl_Error/chip/9d74db00-84e2-449d-803b-dd5745e1007a/metadata.json
2021-03-25 01:50:45,231|caper.cromwell|INFO| Workflow failed. Auto-troubleshooting...
2021-03-25 01:50:45,875|caper.nb_subproc_thread|ERROR| Cromwell failed. returncode=1
2021-03-25 01:50:45,876|caper.cli|ERROR| Check stdout in /mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_workspace/Med1_Ctrl_Error/cromwell.out
* Started troubleshooting workflow: id=9d74db00-84e2-449d-803b-dd5745e1007a, status=Failed
* Found failures JSON object.
[
    {
        "causedBy": [
            {
                "message": "Job chip.call_peak_pr2:1:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "message": "Job chip.call_peak_pr2:0:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "message": "Job chip.call_peak:1:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "message": "Job chip.call_peak_pr1:0:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "message": "Job chip.call_peak:0:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "message": "Job chip.call_peak_pr1:1:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "message": "Job chip.call_peak_ppr2:NA:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "message": "Job chip.call_peak_ppr1:NA:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "causedBy": [],
                "message": "Job chip.call_peak_pooled:NA:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details."
            }
        ],
        "message": "Workflow failed"
    }
]
* Recursively finding failures in calls (tasks)...

==== NAME=chip.call_peak_ppr1, STATUS=RetryableFailure, PARENT=
SHARD_IDX=-1, RC=1, JOB_ID=15174
START=2021-03-24T19:31:58.414Z, END=2021-03-24T23:30:37.757Z
STDOUT=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_workspace/Med1_Ctrl_Error/chip/9d74db00-84e2-449d-803b-dd5745e1007a/call-call_peak_ppr1/execution/stdout
STDERR=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_workspace/Med1_Ctrl_Error/chip/9d74db00-84e2-449d-803b-dd5745e1007a/call-call_peak_ppr1/execution/stderr
STDERR_CONTENTS=
Traceback (most recent call last):
  File "/mnt/work1/users/hoffmangroup/lhuynh/miniconda3/envs/encode-chip-seq-pipeline/bin/encode_task_spp.py", line 133, in <module>
    main()
  File "/mnt/work1/users/hoffmangroup/lhuynh/miniconda3/envs/encode-chip-seq-pipeline/bin/encode_task_spp.py", line 119, in main
    args.ctl_subsample, args.ctl_paired_end, args.nth, args.out_dir)
  File "/mnt/work1/users/hoffmangroup/lhuynh/miniconda3/envs/encode-chip-seq-pipeline/bin/encode_task_spp.py", line 90, in spp
    run_shell_cmd(cmd0)
  File "/mnt/work1/users/hoffmangroup/lhuynh/miniconda3/envs/encode-chip-seq-pipeline/bin/encode_lib_common.py", line 331, in run_shell_cmd
    raise Exception(err_str)
Exception: PID=31395, PGID=31395, RC=137, DURATION_SEC=14121.0
STDERR=Loading required package: Rcpp
Warning: stack imbalance in 'lapply', 20 then 107
Warning: stack imbalance in 'lapply', 11 then 98
/bin/bash: line 1: 31397 Killed                  Rscript --max-ppsize=500000 $(which run_spp.R) -c=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_workspace/Med1_Ctrl_Error/chip/9d74db00-84e2-449d-803b-dd5745e1007a/call-call_peak_ppr1/inputs/-916781023/rep-pr1.pooled.tagAlign.gz -i=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_workspace/Med1_Ctrl_Error/chip/9d74db00-84e2-449d-803b-dd5745e1007a/call-call_peak_ppr1/inputs/592728013/ctl.pooled.tagAlign.gz -npeak=300000 -odir=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_workspace/Med1_Ctrl_Error/chip/9d74db00-84e2-449d-803b-dd5745e1007a/call-call_peak_ppr1/execution -speak=145 -savr=rep-pr1.pooled_x_ctl.pooled.300K.regionPeak.gz.tmp -fdr=0.01 -rf

leepc12 (Contributor) commented Mar 26, 2021

It's Killed again. Can you triple it and try again?
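
Since a factor of 10.0 was said to double the memory, tripling it presumably means something like the following (assuming the same baseline; the exact default factor depends on the pipeline version):

{
    "chip.call_peak_spp_mem_factor": 15.0
}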
