Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: cutadapt adapter trimming with default 3 bases overlap instead of 6 - bug in Snakemake templates #21

Open
mbatiuk opened this issue Apr 28, 2024 · 0 comments

Comments

@mbatiuk
Copy link

mbatiuk commented Apr 28, 2024

cutadapt command in snakemake templates (for example cemba_data/mapping/Snakefile_template/mc.Snakefile) has a bug

Trim reads

rule trim_r1:
input:
"fastq/{cell_id}-R1.fq.gz"
output:
fq=temp("fastq/{cell_id}-R1.trimmed.fq.gz"),
stats=temp("fastq/{cell_id}-R1.trimmed.stats.tsv")
threads:
2
shell:
"cutadapt --report=minimal -a {r1_adapter} {input} 2> {output.stats} | "
"cutadapt --report=minimal -O 6 -q 20 -u {r1_left_cut} -u -{r1_right_cut} -m 30 "
"-o {output.fq} - >> {output.stats}"

rule trim_r2:
input:
"fastq/{cell_id}-R2.fq.gz"
output:
fq=temp("fastq/{cell_id}-R2.trimmed.fq.gz"),
stats=temp("fastq/{cell_id}-R2.trimmed.stats.tsv")
threads:
2
shell:
"cutadapt --report=minimal -a {r2_adapter} {input} 2> {output.stats} | "
"cutadapt --report=minimal -O 6 -q 20 -u {r2_left_cut} -u -{r2_right_cut} -m 30 "
"-o {output.fq} - >> {output.stats}"

Adaptor trimming is defined here:

"cutadapt --report=minimal -a {r1_adapter} {input} 2> {output.stats} | "

then output is piped to another instance of cutadapt:

"cutadapt --report=minimal -O 6 -q 20 -u {r2_left_cut} -u -{r2_right_cut} -m 30 "

that has -O 6 option for specifying minimal overlap of adaptor and read

But -O option is not taken into account since adaptor trimming happened already at the first execution of cutadapt

Thus reads are not trimmed from adaptor if there is 6 bases overlap but only if there is 3 bases overlap

Broad institute version of this pipeline (https://broadinstitute.github.io/warp/docs/Pipelines/snm3C/README#snm3c-installation) does not have this problem:

      start=$(date +%s)
      echo "Run cutadapt"
      /opt/conda/bin/cutadapt \
      -a R1Adapter=~{r1_adapter} \
      -A R2Adapter=~{r2_adapter} \
      --report=minimal -O 6 -q 20 \
      -u ~{r1_left_cut} \
      -u -~{r1_right_cut} \
      -U ~{r2_left_cut} \
      -U -~{r2_right_cut} \
      -Z \
      -m ${min_read_length}:${min_read_length} \
      --pair-filter 'both' \
      -o ${sample_id}-R1_trimmed.fq.gz \
      -p ${sample_id}-R2_trimmed.fq.gz \
      ${sample_id}-R1_sorted.fq ${sample_id}-R2_sorted.fq \
      > ${sample_id}.trimmed.stats.txt
      end=$(date +%s) 
      elapsed=$((end - start)) 
      echo "Elapsed time to run cutadapt: $elapsed seconds"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant