
Fail at chip.call_peak_ppr1 step #219

Open
huynhvietlinh opened this issue Mar 24, 2021 · 3 comments

huynhvietlinh commented Mar 24, 2021

[encode_chip_Med1_KD-1671587.log](https://github.com/ENCODE-DCC/chip-seq-pipeline2/files/6193467/encode_chip_Med1_KD-1671587.log)

Describe the bug

I installed the pipeline and ran the test dataset (ENCSR000DYI, subsampled 1/25, chr19_chrM only) successfully, but when I run the pipeline on my own data it fails at the call_peak_ppr1 step.

OS/Platform

  • OS/Platform: CentOS 6.3
  • Conda version: 4.9.2
  • Pipeline version: 1.7.1
  • Caper version: 1.4.2

Caper configuration file


backend=slurm

# define one of the following (or both) according to your
# cluster's SLURM configuration.
slurm-partition=all
#slurm-account=

# Hashing strategy for call-caching (3 choices)
# This parameter is for local (local/slurm/sge/pbs) backend only.
# This is important for call-caching,
# which means re-using outputs from previous/failed workflows.
# Cache will miss if different strategy is used.
# "file" method has been default for all old versions of Caper<1.0.
# "path+modtime" is a new default for Caper>=1.0,
#   file: use md5sum hash (slow).
#   path: use path.
#   path+modtime: use path and modification time.
local-hash-strat=path+modtime

# Local directory for localized files and Cromwell's intermediate files
# If not defined, Caper will make .caper_tmp/ on local-out-dir or CWD.
# /tmp is not recommended here since Caper store all localized data files
# on this directory (e.g. input FASTQs defined as URLs in input JSON).
local-loc-dir=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/tmp

cromwell=/mnt/work1/users/home/linhh/.caper/cromwell_jar/cromwell-52.jar
womtool=/mnt/work1/users/home/linhh/.caper/womtool_jar/womtool-52.jar

Input JSON file


{
    "chip.pipeline_type" : "histone",
    "chip.genome_tsv" : "/mnt/work1/users/hoffmangroup/lhuynh/ChromatinHub/data/2021_03_22_raw_ChIP_seq/mm9.tsv",
    "chip.fastqs_rep1_R1" : ["/mnt/work1/users/hoffmangroup/lhuynh/ChromatinHub/data/2021_03_22_raw_ChIP_seq/Med1_ChIPseq_siCtrl_Rep1.fastq"],
    "chip.fastqs_rep2_R1" : ["/mnt/work1/users/hoffmangroup/lhuynh/ChromatinHub/data/2021_03_22_raw_ChIP_seq/Med1_ChIPseq_siCtrl_Rep2.fastq"],
    "chip.ctl_fastqs_rep1_R1" : ["/mnt/work1/users/hoffmangroup/lhuynh/ChromatinHub/data/2021_03_22_raw_ChIP_seq/Input_ChIPseq_siCtrl_Rep1.fastq"],
    "chip.ctl_fastqs_rep2_R1" : ["/mnt/work1/users/hoffmangroup/lhuynh/ChromatinHub/data/2021_03_22_raw_ChIP_seq/Input_ChIPseq_siCtrl_Rep2.fastq"],
    "chip.paired_end" : false,
    "chip.title" : "Med1Ctrl_mm9",
    "chip.description" : "Med1 ChIP-seq of Ctrl mESC mm9"
}

And here is what I got from the SLURM log file (the full log is attached):

==== NAME=chip.call_peak_ppr1, STATUS=RetryableFailure, PARENT=
SHARD_IDX=-1, RC=1, JOB_ID=32045
START=2021-03-23T04:29:16.227Z, END=2021-03-23T08:25:58.647Z
STDOUT=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_Output/chip/b83eb85b-c760-440b-84d5-3d7b5e4cf3ec/call-call_peak_ppr1/execution/stdout
STDERR=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_Output/chip/b83eb85b-c760-440b-84d5-3d7b5e4cf3ec/call-call_peak_ppr1/execution/stderr
STDERR_CONTENTS=
Traceback (most recent call last):
  File "/mnt/work1/users/hoffmangroup/lhuynh/miniconda3/envs/encode-chip-seq-pipeline/bin/encode_task_spp.py", line 133, in <module>
    main()
  File "/mnt/work1/users/hoffmangroup/lhuynh/miniconda3/envs/encode-chip-seq-pipeline/bin/encode_task_spp.py", line 119, in main
    args.ctl_subsample, args.ctl_paired_end, args.nth, args.out_dir)
  File "/mnt/work1/users/hoffmangroup/lhuynh/miniconda3/envs/encode-chip-seq-pipeline/bin/encode_task_spp.py", line 90, in spp
    run_shell_cmd(cmd0)
  File "/mnt/work1/users/hoffmangroup/lhuynh/miniconda3/envs/encode-chip-seq-pipeline/bin/encode_lib_common.py", line 331, in 
[encode_chip_Med1_KD-1671587.log](https://github.com/ENCODE-DCC/chip-seq-pipeline2/files/6193467/encode_chip_Med1_KD-1671587.log)
run_shell_cmd
    raise Exception(err_str)
Exception: PID=17945, PGID=17945, RC=137, DURATION_SEC=13967.0
STDERR=Loading required package: Rcpp
Warning: stack imbalance in 'lapply', 20 then 107
Warning: stack imbalance in 'lapply', 11 then 98
/bin/bash: line 1: 17948 Killed                  Rscript --max-ppsize=500000 $(which run_spp.R) -c=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_Output/chip/b83eb85b-c760-440b-84d5-3d7b5e4cf3ec/call-call_peak_ppr1/inputs/-1843169244/rep-pr1.pooled.tagAlign.gz -i=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_Output/chip/b83eb85b-c760-440b-84d5-3d7b5e4cf3ec/call-call_peak_ppr1/inputs/-333660208/ctl.pooled.tagAlign.gz -npeak=300000 -odir=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_Output/chip/b83eb85b-c760-440b-84d5-3d7b5e4cf3ec/call-call_peak_ppr1/execution -speak=145 -savr=rep-pr1.pooled_x_ctl.pooled.300K.regionPeak.gz.tmp -fdr=0.01 -rf
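
For context: a POSIX shell reports exit status 128 + N when a child process is killed by signal N, so the RC=137 above decodes to SIGKILL (signal 9), which is what SLURM or the kernel OOM killer sends when a job exceeds its memory allocation. A quick sanity check in Python:

import signal

# Shells report exit status 128 + N for a process killed by signal N.
# 137 - 128 = 9, i.e. SIGKILL, the signal SLURM / the kernel OOM killer
# uses to terminate jobs that exceed their memory limit.
assert 137 - 128 == signal.SIGKILL
print(signal.Signals(137 - 128).name)  # prints: SIGKILL
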
leepc12 (Contributor) commented Mar 24, 2021

A Killed message in STDERR usually means an out-of-memory error.
Please define the following in your input JSON and try again. This will double the amount of memory for the failed call_peak job.

{
    "chip.call_peak_spp_mem_factor": 10.0
}
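
For reference, merging this suggestion into the input JSON posted above would give (only the last key is new):

{
    "chip.pipeline_type" : "histone",
    "chip.genome_tsv" : "/mnt/work1/users/hoffmangroup/lhuynh/ChromatinHub/data/2021_03_22_raw_ChIP_seq/mm9.tsv",
    "chip.fastqs_rep1_R1" : ["/mnt/work1/users/hoffmangroup/lhuynh/ChromatinHub/data/2021_03_22_raw_ChIP_seq/Med1_ChIPseq_siCtrl_Rep1.fastq"],
    "chip.fastqs_rep2_R1" : ["/mnt/work1/users/hoffmangroup/lhuynh/ChromatinHub/data/2021_03_22_raw_ChIP_seq/Med1_ChIPseq_siCtrl_Rep2.fastq"],
    "chip.ctl_fastqs_rep1_R1" : ["/mnt/work1/users/hoffmangroup/lhuynh/ChromatinHub/data/2021_03_22_raw_ChIP_seq/Input_ChIPseq_siCtrl_Rep1.fastq"],
    "chip.ctl_fastqs_rep2_R1" : ["/mnt/work1/users/hoffmangroup/lhuynh/ChromatinHub/data/2021_03_22_raw_ChIP_seq/Input_ChIPseq_siCtrl_Rep2.fastq"],
    "chip.paired_end" : false,
    "chip.title" : "Med1Ctrl_mm9",
    "chip.description" : "Med1 ChIP-seq of Ctrl mESC mm9",
    "chip.call_peak_spp_mem_factor" : 10.0
}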

huynhvietlinh (Author) commented Mar 25, 2021

I doubled the amount of memory as you suggested, and the job ran longer (12 hours vs. 9.5 hours), but it still failed at the peak-calling step.
Should I increase chip.call_peak_spp_mem_factor to 20.0?

2021-03-24 23:28:03,447|caper.cromwell_workflow_monitor|INFO| Task: id=9d74db00-84e2-449d-803b-dd5745e1007a, task=chip.call_peak_ppr1:-1, retry=1, status=Done
2021-03-25 01:50:30,059|caper.cromwell_workflow_monitor|INFO| Task: id=9d74db00-84e2-449d-803b-dd5745e1007a, task=chip.call_peak_pooled:-1, retry=1, status=Done
2021-03-25 01:50:31,293|caper.cromwell_workflow_monitor|INFO| Workflow: id=9d74db00-84e2-449d-803b-dd5745e1007a, status=Failed
2021-03-25 01:50:45,230|caper.cromwell_metadata|INFO| Wrote metadata file. /mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_workspace/Med1_Ctrl_Error/chip/9d74db00-84e2-449d-803b-dd5745e1007a/metadata.json
2021-03-25 01:50:45,231|caper.cromwell|INFO| Workflow failed. Auto-troubleshooting...
2021-03-25 01:50:45,875|caper.nb_subproc_thread|ERROR| Cromwell failed. returncode=1
2021-03-25 01:50:45,876|caper.cli|ERROR| Check stdout in /mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_workspace/Med1_Ctrl_Error/cromwell.out
* Started troubleshooting workflow: id=9d74db00-84e2-449d-803b-dd5745e1007a, status=Failed
* Found failures JSON object.
[
    {
        "causedBy": [
            {
                "message": "Job chip.call_peak_pr2:1:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "message": "Job chip.call_peak_pr2:0:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "message": "Job chip.call_peak:1:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "message": "Job chip.call_peak_pr1:0:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "message": "Job chip.call_peak:0:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "message": "Job chip.call_peak_pr1:1:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "message": "Job chip.call_peak_ppr2:NA:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "message": "Job chip.call_peak_ppr1:NA:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "causedBy": [],
                "message": "Job chip.call_peak_pooled:NA:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details."
            }
        ],
        "message": "Workflow failed"
    }
]
* Recursively finding failures in calls (tasks)...

==== NAME=chip.call_peak_ppr1, STATUS=RetryableFailure, PARENT=
SHARD_IDX=-1, RC=1, JOB_ID=15174
START=2021-03-24T19:31:58.414Z, END=2021-03-24T23:30:37.757Z
STDOUT=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_workspace/Med1_Ctrl_Error/chip/9d74db00-84e2-449d-803b-dd5745e1007a/call-call_peak_ppr1/execution/stdout
STDERR=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_workspace/Med1_Ctrl_Error/chip/9d74db00-84e2-449d-803b-dd5745e1007a/call-call_peak_ppr1/execution/stderr
STDERR_CONTENTS=
Traceback (most recent call last):
  File "/mnt/work1/users/hoffmangroup/lhuynh/miniconda3/envs/encode-chip-seq-pipeline/bin/encode_task_spp.py", line 133, in <module>
    main()
  File "/mnt/work1/users/hoffmangroup/lhuynh/miniconda3/envs/encode-chip-seq-pipeline/bin/encode_task_spp.py", line 119, in main
    args.ctl_subsample, args.ctl_paired_end, args.nth, args.out_dir)
  File "/mnt/work1/users/hoffmangroup/lhuynh/miniconda3/envs/encode-chip-seq-pipeline/bin/encode_task_spp.py", line 90, in spp
    run_shell_cmd(cmd0)
  File "/mnt/work1/users/hoffmangroup/lhuynh/miniconda3/envs/encode-chip-seq-pipeline/bin/encode_lib_common.py", line 331, in run_shell_cmd
    raise Exception(err_str)
Exception: PID=31395, PGID=31395, RC=137, DURATION_SEC=14121.0
STDERR=Loading required package: Rcpp
Warning: stack imbalance in 'lapply', 20 then 107
Warning: stack imbalance in 'lapply', 11 then 98
/bin/bash: line 1: 31397 Killed                  Rscript --max-ppsize=500000 $(which run_spp.R) -c=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_workspace/Med1_Ctrl_Error/chip/9d74db00-84e2-449d-803b-dd5745e1007a/call-call_peak_ppr1/inputs/-916781023/rep-pr1.pooled.tagAlign.gz -i=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_workspace/Med1_Ctrl_Error/chip/9d74db00-84e2-449d-803b-dd5745e1007a/call-call_peak_ppr1/inputs/592728013/ctl.pooled.tagAlign.gz -npeak=300000 -odir=/mnt/work1/users/hoffmangroup/lhuynh/Tools/chip-seq-pipeline2/Linh_workspace/Med1_Ctrl_Error/chip/9d74db00-84e2-449d-803b-dd5745e1007a/call-call_peak_ppr1/execution -speak=145 -savr=rep-pr1.pooled_x_ctl.pooled.300K.regionPeak.gz.tmp -fdr=0.01 -rf

leepc12 (Contributor) commented Mar 26, 2021

It's Killed again. Can you triple it and try again?
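
Since a factor of 10.0 was said to double the memory, tripling it presumably means something like the following (assuming the same baseline; the exact default factor depends on the pipeline version):

{
    "chip.call_peak_spp_mem_factor": 15.0
}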
