Remove redundant BAM file open in paired mode #32

siddharthab · 2024-09-04T06:23:57Z

Fixes #31.

Opening a BAM file is expensive because the index needs to be fully
read. In paired reads mode, the file was being reopened at every contig
change to iterate over all reads from the previous contig. This is
usually not an issue for genome alignments, but transcriptome
alignments may have ~100k contigs, so the cost of the repeated opens
adds up.

Ideally, the two-pass mode would not have to read the file again, and
would instead just maintain a rolling window of reads in memory.
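
The rolling-window idea mentioned above could look roughly like the sketch below. It is not the change made in this PR and not the project's code: the `Read` tuple, `process_by_contig`, and `handle_contig` are hypothetical names used only for illustration, and the input is assumed to be an iterator over reads already grouped by contig (as in a coordinate-sorted file). The point is that each contig's reads are buffered once during a single pass, so nothing has to be reopened at a contig change.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical stand-in for an alignment record: (contig, read_name).
Read = Tuple[str, str]


def process_by_contig(
    reads: Iterable[Read],
    handle_contig: Callable[[str, List[Read]], None],
) -> int:
    """Make one pass over reads grouped by contig.

    Reads for the current contig are kept in an in-memory window and
    handed to `handle_contig` when the contig changes, so the input
    never needs to be reopened. Returns the number of contigs flushed.
    """
    window: List[Read] = []
    current: Optional[str] = None
    flushed = 0

    for read in reads:
        contig = read[0]
        if current is not None and contig != current:
            # Contig boundary: process the buffered reads, then start a
            # fresh list so the handler may safely keep the old one.
            handle_contig(current, window)
            flushed += 1
            window = []
        current = contig
        window.append(read)

    # Flush the final contig, if any reads were seen.
    if current is not None:
        handle_contig(current, window)
        flushed += 1

    return flushed
```

Memory use in this sketch is bounded by the largest single contig rather than the whole file, which is the trade-off against re-reading the previous contig from disk.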


siddharthab · 2024-09-04T07:08:31Z

With this change, the test case in the linked issue now takes 14 minutes instead of 6.3 hours (roughly a 27x speedup).

siddharthab · 2024-09-16T22:43:05Z

@Daniel-Liu-c0deb0t Can you please accept this PR?

MatthiasZepper · 2024-09-19T12:48:55Z

I just wanted to express explicit support for this proposal!

While I am not familiar with the implementation details, I think it is a very important fix. Transcriptomic alignments and draft genome assemblies typically have numerous contigs, and if this fix speeds up the deduplication of those input files so dramatically, I would love to see it merged!

MatthiasZepper mentioned this pull request Oct 2, 2024

RC 3.16.0 nf-core/rnaseq#1395

Merged

11 tasks

MatthiasZepper mentioned this pull request Oct 18, 2024

Update Version of UMICollapse to 1.1.0 nf-core/modules#6805

Merged

1 task
