Defaulted to excluding duplicate and QC failing reads from pileup. #42

Open
wants to merge 2 commits into
base: main
from

Member

@tfenne tfenne commented Jul 16, 2024

No description provided.

@tfenne tfenne requested review from msto, nh13 and ameynert July 16, 2024 19:28
Member

@nh13 nh13 left a comment

lgtm

@ameynert
ameynert commented Jul 16, 2024

I ran the new version against the example data sent internally.

Lines output:

  • samtools markdup marking duplicates only, prior to this change: 2030
  • samtools markdup marking duplicates only, after this change: 588
  • samtools markdup | samtools view -F 0x400, both prior to and after this change: 550

When I looked at the differences, I noticed that the reported read support appears to include the duplicate reads. In one example, the hard-filtered version reported 37 total reads and the soft-marked version reported 129.

If the user wants to exclude duplicates, then only the unmarked reads should be counted.
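
A minimal sketch of the counting rule described above, assuming read support is simply the number of reads whose SAM flag has none of the excluded bits set (the same test `samtools view -F` applies). This is not fgsv's code: the names `DEFAULT_EXCLUDE`, `is_counted`, and `read_support` are hypothetical, and the flag values in `toy_flags` are made up for illustration. Only the bit values 0x200 (QC fail) and 0x400 (duplicate) come from the SAM specification.

```python
# SAM flag bits as defined in the SAM specification.
FLAG_QC_FAIL = 0x200    # read fails platform/vendor quality checks
FLAG_DUPLICATE = 0x400  # read is a PCR or optical duplicate

# Hypothetical default mask: exclude both duplicates and QC failures.
DEFAULT_EXCLUDE = FLAG_QC_FAIL | FLAG_DUPLICATE


def is_counted(flag: int, exclude_mask: int = DEFAULT_EXCLUDE) -> bool:
    """Return True if none of the excluded bits are set (like `samtools view -F`)."""
    return (flag & exclude_mask) == 0


def read_support(flags: list[int], exclude_mask: int = DEFAULT_EXCLUDE) -> int:
    """Count reads that pass the filter; excluded reads contribute no support."""
    return sum(1 for flag in flags if is_counted(flag, exclude_mask))


# Toy flags for one breakpoint (illustrative only): two ordinary reads,
# two duplicate-marked reads, and one QC-failing read.
toy_flags = [
    0x63,
    0x93,
    0x63 | FLAG_DUPLICATE,
    0x93 | FLAG_DUPLICATE,
    0x63 | FLAG_QC_FAIL,
]

print(read_support(toy_flags, exclude_mask=0))  # every read counted
print(read_support(toy_flags))                  # marked reads excluded
```

Under this rule, a soft-marked input and a hard-filtered input would report the same support for the same breakpoint, which is the behaviour requested here.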

@ameynert ameynert left a comment

Read support reporting shouldn't count the duplicate-marked reads.

@msto
Contributor

msto commented Jul 17, 2024

I'm also surprised that the number of pileups changes when duplicate reads are excluded. Given that pileups are reported with any number of supporting reads, I would expect the number of pileups to stay constant while the number of supporting reads per pileup may decrease.

@@ -4,7 +4,7 @@ title: fgsv tools

# fgsv tools

-The following tools are available in fgsv version 0.2.0-d603e95.
+The following tools are available in fgsv version 0.2.0-5ce8bc6.
Contributor

Could you please bump the major version, as excluding duplicates (and QC fails) by default will be a breaking change?

Development

Successfully merging this pull request may close these issues.

SvPileup should ignore reads marked as duplicates
4 participants