Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Fix - SamRecordClipper.numBasesExtendingPastMate #937

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

JoeVieira
Copy link
Contributor

@JoeVieira JoeVieira commented Sep 20, 2023

@nh13 @tfenne

When operating on SamRecords with multiple serialized operations, use of mate cigar record information is relied upon for clipping past mate end, but it is not synchronized between operations with it's corresponding SamRecord object's data.

This results in an inconsistent data model for clipping.

With previous code this test case would fail.

it should "clip FR reads that extend past mate in asymmetrical read" in {
    val builder = new SamBuilder(readLength = 200, sort = Some(Queryname))
    val clipper = new ClipBam(input = dummyBam, output = dummyBam, ref = ref, clipBasesPastMate = true,
      readTwoFivePrime = 50, readOneFivePrime = 10)
    val Seq(r1, r2) = builder.addPair(start1 = 100, start2 = 90)

    r1.end shouldBe 299
    r2.end shouldBe 289
    clipper.clipPair(r1, r2)
    r2.end shouldBe r2.end
    r1.start shouldBe r2.start
  }

It would erroneously calculate r2.start == 100 & r1.start != r2.start, because the mate.start data from the MC tag was used, which was not updated with the 5' clipping from the first operation.

Math.max(0, rec.readPosAtRefPos(pos=rec.mateStart, returnLastBaseIfDeleted=false) - 1)

This reliance on a possibly dirty MC tag is the cause of this #878 inconsistent behavior. I believe this fix, makes this issue irrelevant & it certainly might fix some other oddities that people seem to have pointed out.

I've removed convenience functions which allow this to happen, and enforce either explicit passing of mates or start / end.

I do this rather than updating the MC record, because that extra step between each operation doesn't seem worthwhile, when we have the object loaded in memory already, the tags should be updated after all operations are complete a single time.

The exception to this is a single method which is used in consensus calling, where mate isn't easily available - this pattern still requires getting from mate cigar, which might still result in this bug occurring, as each mate could have clipping applied but again the data isn't synchronized to the MC tag.

I'll also work on that bug also, which I believe should be handled by operating on mates together, so all relevant data is loaded in memory explicitly, rather than using the tag.

As a sidenote: I don't understand the white space formatting rules for the project, do you happen to have an intellj profile i could use to keep these consistent for your formatting?

@eboyden
Copy link

eboyden commented Mar 28, 2024

Bumping this. As it pertains to a stealthy bug that can produce incorrect output, rather than a new feature, I hope it can be prioritized.

Copy link

codecov bot commented Jul 15, 2024

Codecov Report

Attention: Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 95.63%. Comparing base (ab8959d) to head (a2a37da).
Report is 25 commits behind head on main.

Files Patch % Lines
...ala/com/fulcrumgenomics/bam/SamRecordClipper.scala 91.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #937      +/-   ##
==========================================
+ Coverage   95.60%   95.63%   +0.02%     
==========================================
  Files         126      126              
  Lines        7307     7308       +1     
  Branches      507      479      -28     
==========================================
+ Hits         6986     6989       +3     
+ Misses        321      319       -2     
Flag Coverage Δ
unittests 95.63% <91.66%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants