Peak calling parameter chip.fdr_thresh never changes from default 0.01 #204

alexadowdell · 2020-12-11T19:31:03Z

Describe the bug

I'm running the transcription factor chipseq pipeline. I have one sample out of 16 that failed the pipeline. I'm trying to determine whether the sample may be poor or whether I can recover some peaks by loosening the fdr threshold from the default 0.01 it was originally run at. The pipeline failed with the following error:
Exception: File is empty (20200622_Chip_H1_S8_L001_R1_001.merged.nodup.pr2_x_20200622_Chip_G1_S7_L001_R1_001.merged.nodup.300K.regionPeak.gz). Help: No peaks found. FDR threshold (fdr_thresh in your input JSON) might be too stringent or poor quality sample?

I went back into my input json and added a chip.fdr_thresh of 0.05 and re-ran the pipeline. I received the exact same results. I re-ran again with fdr of 0.2 in an attempt to sanity check and received the same results. For the other samples that successfully made it through the pipeline the html output at fdr of 0.05 and 0.2 never changes, the number of peaks and everything remains the same as what originally was called at the default fdr level. Along those same lines the number of raw peaks called (capped at 300000) says "at an fdr of 0.01" in the html in every case, even when I specifically changed the fdr parameter in the input json. I then went into the metadata.json and grepped for fdr_thresh and confirmed the fdr threshold was the value I passed in the input json but the results are always at a fdr of 0.01 regardless of what the input json and metadata.json has. Down below in the troubleshooting section is the output from grepping the metadata.json for "fdr_thresh".

OS/Platform

OS/Platform: Ubuntu 16.04
Conda version: conda 4.8.3
Pipeline version: [e.g. v1.6.0]
Caper version: [e.g. v1.2.0]

Caper configuration file

Paste contents of ~/.caper/default.conf.

backend=local

# Hashing strategy for call-caching (3 choices)
# This parameter is for local (local/slurm/sge/pbs) backend only.
# This is important for call-caching,
# which means re-using outputs from previous/failed workflows.
# Cache will miss if different strategy is used.
# "file" method has been default for all old versions of Caper<1.0.
# "path+modtime" is a new default for Caper>=1.0,
#   file: use md5sum hash (slow).
#   path: use path.
#   path+modtime: use path and modification time.
local-hash-strat=path+modtime

# Local directory for localized files and Cromwell's intermediate files
# If not defined, Caper will make .caper_tmp/ on local-out-dir or CWD.
# /tmp is not recommended here since Caper store all localized data files
# on this directory (e.g. input FASTQs defined as URLs in input JSON).
local-loc-dir=/home/ubuntu/20200730_KRISTINA_CHIPSEQ/tmp-caper-cache

cromwell=/home/ubuntu/.caper/cromwell_jar/cromwell-52.jar
womtool=/home/ubuntu/.caper/womtool_jar/womtool-52.jar

Input JSON file

Paste contents of your input JSON file.

{
    "chip.title" : "Kristina CHIPSeq (paired-end) H1 vs G1",
    "chip.description" : "Chip H1 vs Chip G1 as control",

    "chip.pipeline_type" : "tf",
    "chip.aligner" : "bowtie2",
    "chip.align_only" : false,
    "chip.true_rep_only" : false,

    "chip.genome_tsv" : "/home/ubuntu/20200730_KRISTINA_CHIPSEQ/software/chip-seq-pipeline2/mm10.tsv",

    "chip.paired_end" : true,
    "chip.ctl_paired_end" : true,

    "chip.always_use_pooled_ctl" : true,
    "chip.fdr_thresh" : 0.05,

    "chip.fastqs_rep1_R1" : [ "/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_H1/20200622_Chip_H1_S8_L001_R1_001.fastq.gz", "/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_H1/20200622_Chip_H1_S8_L002_R1_001.fastq.gz" ],
    "chip.fastqs_rep1_R2" : [ "/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_H1/20200622_Chip_H1_S8_L001_R2_001.fastq.gz", "/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_H1/20200622_Chip_H1_S8_L002_R2_001.fastq.gz" ],
    
    "chip.ctl_fastqs_rep1_R1" : [ "/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_G1/20200622_Chip_G1_S7_L001_R1_001.fastq.gz", "/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_G1/20200622_Chip_G1_S7_L002_R1_001.fastq.gz" ],
    "chip.ctl_fastqs_rep1_R2" : [ "/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_G1/20200622_Chip_G1_S7_L001_R2_001.fastq.gz", "/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_G1/20200622_Chip_G1_S7_L002_R2_001.fastq.gz" ]
        
}

Troubleshooting result

If you ran caper run without Caper server then Caper automatically runs a troubleshooter for failed workflows. Find troubleshooting result in the bottom of Caper's screen log.

If you ran caper submit with a running Caper server then first find your workflow ID (1st column) with caper list and run caper debug [WORKFLOW_ID].

Paste troubleshooting result.

Since the pipeline failing isn't exactly the problem right now, below are contents of metadata.json confirming changed fdr_thresh value.

"inputs": "{\n    \"chip.title\" : \"Kristina CHIPSeq (paired-end) H1 vs G1\",\n    \"chip.description\" : \"Chip H1 vs Chip G1 as control\",\n\n    \"chip.pipeline_type\" : \"tf\",\n    \"chip.aligner\" : \"bowtie2\",\n    \"chip.align_only\" : false,\n    \"chip.true_rep_only\" : false,\n\n    \"chip.genome_tsv\" : \"/home/ubuntu/20200730_KRISTINA_CHIPSEQ/software/chip-seq-pipeline2/mm10.tsv\",\n\n    \"chip.paired_end\" : true,\n    \"chip.ctl_paired_end\" : true,\n\n    \"chip.always_use_pooled_ctl\" : true,\n    \"chip.fdr_thresh\" : 0.05,\n\n    \"chip.fastqs_rep1_R1\" : [ \"/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_H1/20200622_Chip_H1_S8_L001_R1_001.fastq.gz\", \"/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_H1/20200622_Chip_H1_S8_L002_R1_001.fastq.gz\" ],\n    \"chip.fastqs_rep1_R2\" : [ \"/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_H1/20200622_Chip_H1_S8_L001_R2_001.fastq.gz\", \"/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_H1/20200622_Chip_H1_S8_L002_R2_001.fastq.gz\" ],\n    \n    \"chip.ctl_fastqs_rep1_R1\" : [ \"/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_G1/20200622_Chip_G1_S7_L001_R1_001.fastq.gz\", \"/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_G1/20200622_Chip_G1_S7_L002_R1_001.fastq.gz\" ],\n    \"chip.ctl_fastqs_rep1_R2\" : [ \"/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_G1/20200622_Chip_G1_S7_L001_R2_001.fastq.gz\", \"/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_G1/20200622_Chip_G1_S7_L002_R2_001.fastq.gz\" ]\n    \n}\n",
                            "Float fdr_thresh": "B14399CBAAC6DA4B5B733B483106383F",
                    "fdr_thresh": 0.05,
                            "Float fdr_thresh": "B14399CBAAC6DA4B5B733B483106383F",
                    "fdr_thresh": 0.05,
                            "Float fdr_thresh": "B14399CBAAC6DA4B5B733B483106383F",
                    "fdr_thresh": 0.05,
                            "Float fdr_thresh": "B14399CBAAC6DA4B5B733B483106383F",
                    "fdr_thresh": 0.05,
        "fdr_thresh": 0.05,

The text was updated successfully, but these errors were encountered:

ambeys · 2020-12-15T17:47:49Z

Exactly the same issue I am facing with. "fdr_thresh" value never changes.

akundaje · 2020-12-16T00:57:48Z

We'll fix this very soon

…

On Tue, Dec 15, 2020, 9:48 AM ambeys ***@***.***> wrote: Exactly the same issue I am facing with. "fdr_thresh" value never changes. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#204 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABDWEL3745WXJEWZHYQIKDSU6ONLANCNFSM4UXEARWQ> .

leepc12 · 2021-01-07T21:24:25Z

@alexadowdell: Sorry about late response. I would like to check if chip.fdr_thresh defined in the input JSON is correctly passed to the SPP app run_spp.R. Can you post a full metadata.json file here?

alexadowdell · 2021-01-07T21:41:38Z

@leepc12: Thanks for working on it! I've attached the metadata.json with a txt extension since Github doesn't allow JSON file attachments.
metadata.json.txt

leepc12 · 2021-01-07T22:22:17Z

I checked that fdr_thresh as 0.05 was correctly passed to the WDL task level. But still need to check if it's passed to the R app run_spp.R.

Can you also post /home/ubuntu/20200730_KRISTINA_CHIPSEQ/ENCODE_pipeline/chip/48b7b1c3-fd32-4b62-a287-c976e200f5f8/call-call_peak/shard-0/execution/stdout?

I would like to check if run_spp.R ... -fdr=0.05 is there in the file.

alexadowdell · 2021-01-07T22:35:42Z

@leepc12 Please see attached stdout file

stdout.txt

leepc12 · 2021-01-07T22:44:42Z

I found this in the script.

Rscript --max-ppsize=500000 $(which run_spp.R) ... -fdr=0.05

Also this one in run_spp.R's log.

...
exclusion(max): NaN 
num parallel nodes: NA 
FDR threshold: 0.05 
NumPeaks Threshold: 3e+05 
Output Directory: /home/ubuntu/20200730_KRISTINA_CHIPSEQ/ENCODE_pipeline/chip/48b7b1c3-fd32-4b62-a287-c976e200f5f8/call-call_peak/shard-0/execution 
...

So it looks like run_spp.R took in fdr_thresh as 0.05 correctly, but it ignored the threshold somehow or your peaks didn't even hit the relaxed threshold 0.05.

Can you try with more relaxed threshold and also check the quality of your samples in the alignment level? e.g. mapping rate, number of reads after filtering.

akundaje · 2021-01-07T22:55:21Z

If numpeaks_threshold is specified, the FDR parameter gets ignored. If you want to use a specific FDR then don't set the num_peaks parameter.

…

-Anshul.

On Thu, Jan 7, 2021 at 2:44 PM Jin Lee ***@***.***> wrote: I found this in the script. Rscript --max-ppsize=500000 $(which run_spp.R) ... -fdr=0.05 Also this one in run_spp.R's log. ... exclusion(max): NaN num parallel nodes: NA FDR threshold: 0.05 NumPeaks Threshold: 3e+05 Output Directory: /home/ubuntu/20200730_KRISTINA_CHIPSEQ/ENCODE_pipeline/chip/48b7b1c3-fd32-4b62-a287-c976e200f5f8/call-call_peak/shard-0/execution ... So it looks like run_spp.R took in fdr_thresh as 0.05 correctly, but it ignored the threshold somehow or your peaks didn't even hit the relaxed threshold 0.05. Can you try with more relaxed threshold and also check the quality of your samples in the alignment level? e.g. mapping rate, number of reads after filtering. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#204 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABDWEKBKI635KE7PB3MOXLSYY2ORANCNFSM4UXEARWQ> .

akundaje · 2021-01-07T22:57:59Z

To be more clear, num_peaks overrides the FDR parameter. If num_peaks is set, the FDR gets set to 0.99 to a complete ranked list of bins and then upto num_peaks of the top ranked peaks are used. If you set the FDR and just don't set num_peaks, then it will just use the FDR to get whatever peaks pass the FDR. The SPP FDR is not very stable or easy to set parameter. For small changes in FDR, you can get massive changes in peaks. Or for large changes in FDR you can get no changes in peaks. This is why we usually just call as many peaks as possible, take the top 300K and pass it through IDR to adaptively learn the thresholds.

…

-Anshul.

On Thu, Jan 7, 2021 at 2:55 PM Anshul Kundaje ***@***.***> wrote: If numpeaks_threshold is specified, the FDR parameter gets ignored. If you want to use a specific FDR then don't set the num_peaks parameter. -Anshul. On Thu, Jan 7, 2021 at 2:44 PM Jin Lee ***@***.***> wrote: > I found this in the script. > > Rscript --max-ppsize=500000 $(which run_spp.R) ... -fdr=0.05 > > Also this one in run_spp.R's log. > > ... > exclusion(max): NaN > num parallel nodes: NA > FDR threshold: 0.05 > NumPeaks Threshold: 3e+05 > Output Directory: /home/ubuntu/20200730_KRISTINA_CHIPSEQ/ENCODE_pipeline/chip/48b7b1c3-fd32-4b62-a287-c976e200f5f8/call-call_peak/shard-0/execution > ... > > So it looks like run_spp.R took in fdr_thresh as 0.05 correctly, but it > ignored the threshold somehow or your peaks didn't even hit the relaxed > threshold 0.05. > > Can you try with more relaxed threshold and also check the quality of > your samples in the alignment level? e.g. mapping rate, number of reads > after filtering. > > — > You are receiving this because you commented. > Reply to this email directly, view it on GitHub > <#204 (comment)>, > or unsubscribe > <https://github.com/notifications/unsubscribe-auth/AABDWEKBKI635KE7PB3MOXLSYY2ORANCNFSM4UXEARWQ> > . >

alexadowdell · 2021-01-07T22:58:02Z

It looks like the numpeaks_threshold is set by default to 3e+05. It's not a parameter specifically passed in the input json. Is there a way to not set the num_peaks parameter?

akundaje · 2021-01-07T23:01:17Z

The run_spp.R script can take it either parameter or both. Jin - you may need to adjust the way num_peaks is being set to allow for direct use of the FDR parameter. If the user specifies FDR explicitly in the JSON then num_peaks should not be used when calling run_spp.R

…

-Anshul.

On Thu, Jan 7, 2021 at 2:58 PM Alexa Dowdell ***@***.***> wrote: It looks like the numpeaks_threshold is set by default to 3e+05. It's not a parameter specifically passed in the input json. Is there a way to not set the num_peaks parameter? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#204 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABDWENH4EFOQ3TXH7JE26LSYY4ARANCNFSM4UXEARWQ> .

leepc12 · 2021-01-07T23:01:23Z

@alexadowdell: Currently there is no way to unset numpeaks_threshold (run_spp.R -npeak=) in the pipeline. I will fix this in the next release so that chip.fdr_thresh is defined in an input JSON then chip.cap_num_peak_spp is not passed to run_spp.R.

olechnwin · 2023-10-17T19:58:38Z

I am trying to change chip.fdr_thresh as well. I'm guessing this is still not working?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Peak calling parameter chip.fdr_thresh never changes from default 0.01 #204

Peak calling parameter chip.fdr_thresh never changes from default 0.01 #204

alexadowdell commented Dec 11, 2020

ambeys commented Dec 15, 2020

akundaje commented Dec 16, 2020 via email

leepc12 commented Jan 7, 2021

alexadowdell commented Jan 7, 2021 •

edited

Loading

leepc12 commented Jan 7, 2021

alexadowdell commented Jan 7, 2021

leepc12 commented Jan 7, 2021

akundaje commented Jan 7, 2021 via email

akundaje commented Jan 7, 2021 via email

alexadowdell commented Jan 7, 2021

akundaje commented Jan 7, 2021 via email

leepc12 commented Jan 7, 2021

olechnwin commented Oct 17, 2023

Peak calling parameter chip.fdr_thresh never changes from default 0.01 #204

Peak calling parameter chip.fdr_thresh never changes from default 0.01 #204

Comments

alexadowdell commented Dec 11, 2020

Describe the bug

OS/Platform

Caper configuration file

Input JSON file

Troubleshooting result

ambeys commented Dec 15, 2020

akundaje commented Dec 16, 2020 via email

leepc12 commented Jan 7, 2021

alexadowdell commented Jan 7, 2021 • edited Loading

leepc12 commented Jan 7, 2021

alexadowdell commented Jan 7, 2021

leepc12 commented Jan 7, 2021

akundaje commented Jan 7, 2021 via email

akundaje commented Jan 7, 2021 via email

alexadowdell commented Jan 7, 2021

akundaje commented Jan 7, 2021 via email

leepc12 commented Jan 7, 2021

olechnwin commented Oct 17, 2023

alexadowdell commented Jan 7, 2021 •

edited

Loading