-
-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run length bwt #440
Run length bwt #440
Conversation
} | ||
|
||
func (bwt BWT) getFirstColumnStr() string { | ||
firstColumn := strings.Builder{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is helpful for debugging. I recommend keeping it for the future.
I've rewritten this thing about 7 times thinking I wouldn't need it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some comments explaining why it's helpful for debugging so we know not to delete in future? A test in bwt_test.go
would be a good place to keep an example since it's unexported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TimothyStiles I added an explanation and demo in this commit: fe3b686#diff-c08e2ebed9f275352a42178cdcc0b15abc0ca64e019428e5fe8b6ff5ea3af188R636
Also, I'm not sure if we have any better way of switching on debug for Poly's internals (didn't find anything), but I added that debug field in the BWT. LMK if there's another approach that is preferred or if the explanation along with the lint ignore annotation is good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the debugging functions and the lint directive but without some example usage I'm not sure how they'd be used to debug and would probably end up removing them in the future if they were merged without them.
Think you could write some tiny tests just to demo their usage? You can just put each example in a simple non-example test and refer to it in-line with something like, "To see usage example check out TestDebugRelatedFunction
in bwt_test.go
.
} | ||
|
||
func (bwt BWT) getFirstColumnStr() string { | ||
firstColumn := strings.Builder{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some comments explaining why it's helpful for debugging so we know not to delete in future? A test in bwt_test.go
would be a good place to keep an example since it's unexported.
Changes in this PR
This PR is a memory optimization for the BWT. This implements the RLBWT, Run-Length Burrows Wheeler Transform. The difference between the RLBWT and the BWT is that instead of having the whole last column (the actual BWT string), we have a run length compressed version of the last column. For example, the Last column of banana is: "annb$aa". It's run length compression is "anb$a".
The RLBWT is very effective for repetitive texts like genomic data. For example, in the test data we have for the BWT, we repeat the base string 3 times. The resulting Last column has a length of 340 in the normal BWT implementation. In this implementation, it only has a length of 94 (along with some negligible auxiliary info about run start positions and character cumulative counts).
Ben would do a much better job at explaining the mechanics and O complexity than I:
Why are you making these changes?
This is a needed change for the BWT to be a viable tool for real workloads. There will be two follow ups after this PR:
Are any changes breaking? (IMPORTANT)
No
Pre-merge checklist
All of these must be satisfied before this PR is considered
ready for merging. Mergeable PRs will be prioritized for review.
primers/primers_test.go
for what this might look like.CHANGELOG.md
in the[Unreleased]
section.PS: I didn't update the changelog since the BWT is still unreleased at the time of writing.