-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Emit primer pairs in penalty order. #87
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #87 +/- ##
==========================================
+ Coverage 96.66% 96.70% +0.03%
==========================================
Files 26 26
Lines 1708 1728 +20
Branches 189 193 +4
==========================================
+ Hits 1651 1671 +20
Misses 31 31
Partials 26 26 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @tfenne
Thanks for opening this, I think this will be a nice improvement.
I think this PR should likely be broken into ~3 separate PRs to facilitate the introduction of this feature.
- Fix the bug where amplicons that are too small are still emitted.
- Refactor
build_primer_pairs()
to extract the logic associated with the construction ofPrimerPair
objects into aclassmethod
on that class (closing Extract the body of the main for loop inbuild_primer_pairs()
into an alternative constructor onPrimerPair
#85). - Ensure that primer pairs are emitted in penalty order.
Separating out this bug fix into a separate PR will make it easier to review that fix in isolation. Similarly, reducing the complexity of build_primer_pairs()
before adding functionality to it will make it easier to review and test those changes. In particular, I think we lack coverage over any cases where primer pairs are actually built.
a7cef00
to
aded082
Compare
WalkthroughThe pull request introduces modifications to the Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
prymer/api/picking.py (2)
Line range hint
126-126
: Update docstring to reflect sorting.Docstring incorrectly states "unsorted". Pairs are now sorted by penalty.
- an iterator over all the valid primer pairs, unsorted + an iterator over all the valid primer pairs, sorted by penalty score
176-176
: Document sort criteria.Add comment explaining sort order.
+ # Sort primer pairs by their penalty scores (lower is better) pairings.sort(key=lambda tup: tup[2])
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
prymer/api/picking.py
(3 hunks)
🔇 Additional comments (3)
prymer/api/picking.py (3)
24-24
: LGTM!
Clean import addition for type hinting.
85-90
: LGTM!
Consistent handling of zero optimal temperature case.
142-143
: 🛠️ Refactor suggestion
Consider using a dataclass instead of tuple.
4-element tuple with multiple elements of same type is error-prone. A dataclass would provide named fields and type safety.
from dataclasses import dataclass
@dataclass
class PrimerPairing:
left_idx: int
right_idx: int
penalty: float
tm: float
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
tests/api/test_picking.py (1)
Line range hint
276-309
: Add test for penalty orderTest verifies size filtering but not penalty-based ordering. Add test case to verify pairs are emitted in ascending penalty order.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
prymer/api/picking.py
(4 hunks)tests/api/test_picking.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- prymer/api/picking.py
🔇 Additional comments (1)
tests/api/test_picking.py (1)
277-277
: LGTM: Test range change aligns with PR objectives
Change expands test coverage for minimum amplicon size filtering.
weights=weights, | ||
) | ||
|
||
pairings.append((i, j, penalty, amp_tm)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion You might consider constructing PrimerPair
here, and then using dataclasses.replace
to populate bases
below (where the PrimerPair
is currently being instantiated).
That way the function is using established data structures from the library instead of tuples, the sort can reference pair.penalty
instead of a tuple element, and I think it would save a few LOC overall.
Take it or leave it though 😄
first_right_primer_idx = max(first_right_primer_idx, j+1) | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue Could we cover these lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
prymer/api/picking.py (2)
124-125
: Fix trailing whitespaceRemove trailing whitespace on line 124.
- An iterator over all the valid primer pairs, sorted by primer pair penalty. + An iterator over all the valid primer pairs, sorted by primer pair penalty. Primer pairs with smaller penalties are returned first.🧰 Tools
🪛 Ruff (0.8.2)
124-124: Trailing whitespace
Remove trailing whitespace
(W291)
137-139
: Document sorting impactAdd comment explaining how sorting enables the optimization strategy in the nested loops.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
prymer/api/picking.py
(4 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
prymer/api/picking.py
124-124: Trailing whitespace
Remove trailing whitespace
(W291)
🔇 Additional comments (4)
prymer/api/picking.py (4)
24-24
: LGTM: Import needed for type hints
85-90
: LGTM: Improved tm_penalty calculation
Zero check prevents unnecessary penalty calculations when optimal temperature is zero.
147-148
: Consider using a dataclass instead of tuple
A dataclass would provide named fields and prevent index errors. Example:
@dataclass
class PrimerPairing:
left_idx: int
right_idx: int
penalty: float
tm: float
153-160
: Verify optimization assumptions
The optimization relies on two key assumptions. Let's verify they hold true:
✅ Verification successful
Optimization assumptions are correctly implemented
Code validates both optimization assumptions:
- Early exit when right primer is leftward (
rp.span.start < lp.span.start
) or amplicon too short - Break loop when amplicon too large (
amp_span.length > amplicon_sizes.max
)
Test suite confirms functionality with various primer combinations and edge cases.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify sorting and size constraints are correctly enforced
# Check if primers are properly sorted
ast-grep --pattern 'def build_primer_pairs($_) {
$$$
left_primers = sorted($_, key=$_)
right_primers = sorted($_, key=$_)
$$$
}'
# Look for any test cases validating the optimization
rg -l "test.*build_primer_pairs.*optimization"
Length of output: 187
Script:
#!/bin/bash
# Let's check the actual implementation and test cases
rg -A 20 "def build_primer_pairs"
# Check test files for primer pair validation
fd "test.*\.py" --exec rg -A 10 "build_primer_pairs"
Length of output: 6108
Script:
#!/bin/bash
# Let's check the actual sorting and position-based logic
rg -B 5 -A 15 "for left_primer in left_primers:" prymer/api/picking.py
Length of output: 72
Script:
#!/bin/bash
# Search for the nested loop implementation
rg -A 30 "# Nested loops over indices" prymer/api/picking.py
Length of output: 1838
b6490ab
to
f7f889b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
prymer/api/picking.py (1)
153-161
: Add examples to clarify optimization logicCurrent comments explain what happens but not why. Add example scenarios.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
prymer/api/picking.py
(4 hunks)tests/api/test_picking.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/api/test_picking.py
🔇 Additional comments (6)
prymer/api/picking.py (6)
24-24
: LGTM: Import addition is correct
124-125
: LGTM: Docstring accurately reflects new sorting behavior
147-148
: Consider using dataclass instead of tuple
Four-element tuple risks index errors. Dataclass would be safer.
200-201
: Add comment about penalty-based sorting
85-90
: LGTM: Improved tm_penalty calculation
Zero check for optimal tm matches size_penalty logic.
✅ Verification successful
Tests confirm tm_penalty logic is well covered
Tests in tests/api/test_picking.py
verify both cases:
- Higher than optimal tm:
test_score_when_amplicon_tm_higher_than_optimal
- Lower than optimal tm:
test_score_when_amplicon_tm_lower_than_optimal
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check test coverage for tm_penalty edge cases
rg -A 5 "def test.*tm_penalty" tests/
Length of output: 39
Script:
#!/bin/bash
# Search for test files related to picking or tm calculations
fd "test.*\.py" tests/
# After finding test files, let's look for tm-related tests with broader patterns
rg -i "test.*tm|tm.*test" tests/
Length of output: 4585
208-210
: Verify heterodimer check implementation
✅ Verification successful
Heterodimer check implementation is properly tested
The implementation is verified through comprehensive test coverage:
test_duplex_tm_default_params
tests various sequence combinations including reverse complementstest_valid_ntthal_params
verifies thermodynamic parameters- Heterodimer filtering is tested in
test_build_primers_heterodimer_filtering
with specific test cases
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check test coverage for heterodimer filtering
rg -A 5 "def test.*heterodimer" tests/
Length of output: 363
Script:
#!/bin/bash
# Get the test implementation details
cat tests/api/test_picking.py | grep -A 20 "test_build_primers_heterodimer_filtering"
# Check the ntthal usage in the codebase
rg -A 3 "ntthal\.duplex_tm"
Length of output: 1180
Script:
#!/bin/bash
# Check ntthal implementation and imports
rg -A 3 "import.*ntthal"
# Look for ntthal tests
rg -A 5 "test.*ntthal" tests/
Length of output: 1760
Script:
# Get the full test implementation for ntthal
cat tests/ntthal/test_ntthal.py
Length of output: 3904
This PR does two things:
(2) requires materializing a small tuple for all valid pairs, and then sorting by score. The tuple contains two ints (indices into the primer sequences) and two floats (the penalty and the tm, the last for convenience so we don't have to recompute it). It the sorts the tuples by penalty, and starts generating PrimerPairs in penalty order.
If you have e.g. 500 left and 500 right primers, this could construct ~250k tuples and calculate 250k Tms, but in reality the number is probably substantially smaller constrained by amplicon sizes.