Emit primer pairs in penalty order. #87

tfenne · 2024-11-14T03:27:53Z

This PR does two things:

1. Fix a bug where amplicons that were too small were still emitted
2. Ensure that primer pairs are emitted in penalty order

(2) requires materializing a small tuple for every valid pair and then sorting by penalty. The tuple contains two ints (indices into the left and right primer sequences) and two floats (the penalty and the amplicon Tm; the Tm is kept only for convenience, so we don't have to recompute it). It then sorts the tuples by penalty and starts generating PrimerPairs in penalty order.
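A minimal sketch of the materialize, sort, then emit approach described above. `SimplePrimer`, `pairs_in_penalty_order`, and the `amplicon_tm` callback are hypothetical stand-ins, not prymer's API. The penalty is also simplified to the sum of the two primer penalties, whereas the real score additionally includes amplicon size and Tm terms.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Sequence


@dataclass(frozen=True)
class SimplePrimer:
    """Hypothetical stand-in for a primer: 0-based inclusive span plus a penalty."""
    start: int
    end: int
    penalty: float


def pairs_in_penalty_order(
    lefts: Sequence[SimplePrimer],
    rights: Sequence[SimplePrimer],
    min_size: int,
    max_size: int,
    amplicon_tm: Callable[[int, int], float],  # hypothetical Tm calculator
) -> Iterator[tuple[SimplePrimer, SimplePrimer, float, float]]:
    # Step 1: materialize a small (i, j, penalty, tm) tuple for every valid pair.
    pairings: list[tuple[int, int, float, float]] = []
    for i, lp in enumerate(lefts):
        for j, rp in enumerate(rights):
            size = rp.end - lp.start + 1
            if size < min_size or size > max_size:
                continue  # amplicon outside the allowed size range
            tm = amplicon_tm(lp.start, rp.end)  # computed once, kept for later
            penalty = lp.penalty + rp.penalty  # simplified scoring
            pairings.append((i, j, penalty, tm))

    # Step 2: sort by penalty; lower is better.
    pairings.sort(key=lambda tup: tup[2])

    # Step 3: yield results lazily, in penalty order.
    for i, j, penalty, tm in pairings:
        yield lefts[i], rights[j], penalty, tm


if __name__ == "__main__":
    lefts = [SimplePrimer(100, 119, 1.0), SimplePrimer(150, 169, 0.2)]
    rights = [SimplePrimer(280, 299, 0.5), SimplePrimer(400, 419, 0.1)]
    for lp, rp, penalty, tm in pairs_in_penalty_order(lefts, rights, 100, 250, lambda s, e: 60.0):
        print(lp.start, rp.end, round(penalty, 2))
    # prints "150 299 0.7" then "100 299 1.5"
```

Each candidate is stored as indices plus two floats, which keeps the materialized list cheap. In this sketch, the output is produced lazily, so a caller that stops after the first few pairs never pays to build the rest.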

If you have, e.g., 500 left and 500 right primers, this could construct ~250k tuples and calculate 250k Tms, but in practice the number is probably substantially smaller, because the amplicon size constraints rule out many of those pairings.
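To show how strongly the amplicon size range can prune the 500 × 500 = 250,000 combinations, here is a brute-force count under a purely hypothetical layout. It assumes 500 left primers starting every 10 bp and 500 right primers ending every 10 bp over the same ~5 kb region, with an assumed amplicon size range of 100 to 250 bp.

```python
def count_pairs(left_starts, right_ends, min_size, max_size):
    """Return (all left/right combinations, combinations whose amplicon fits the size range)."""
    total = 0
    in_range = 0
    for start in left_starts:
        for end in right_ends:
            total += 1
            if min_size <= end - start + 1 <= max_size:
                in_range += 1
    return total, in_range


# Hypothetical layout: 500 left primers starting every 10 bp (0, 10, ..., 4990)
# and 500 right primers ending every 10 bp (19, 29, ..., 5009).
left_starts = range(0, 5000, 10)
right_ends = range(19, 5019, 10)
total, in_range = count_pairs(left_starts, right_ends, min_size=100, max_size=250)
print(total, in_range)  # → 250000 7752
```

In this layout only about 3% of the combinations fall inside the size range. Real primer sets will differ, so this is an illustration of the effect rather than a general figure.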

codecov · 2024-11-14T03:28:08Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.70%. Comparing base (fbd1aa1) to head (f7f889b).
Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #87      +/-   ##
==========================================
+ Coverage   96.66%   96.70%   +0.03%     
==========================================
  Files          26       26              
  Lines        1708     1728      +20     
  Branches      189      193       +4     
==========================================
+ Hits         1651     1671      +20     
  Misses         31       31              
  Partials       26       26

msto

Hey @tfenne

Thanks for opening this, I think this will be a nice improvement.

I think this PR should likely be broken into ~3 separate PRs to facilitate the introduction of this feature.

1. Fix the bug where amplicons that are too small are still emitted.
2. Refactor build_primer_pairs() to extract the logic that constructs PrimerPair objects into a classmethod on that class (closing #85, "Extract the body of the main for loop in build_primer_pairs() into an alternative constructor on PrimerPair").
3. Ensure that primer pairs are emitted in penalty order.

Separating this bug fix into its own PR will make it easier to review in isolation. Similarly, reducing the complexity of build_primer_pairs() before adding functionality to it will make those changes easier to review and test. In particular, I think we lack test coverage for any cases where primer pairs are actually built.
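To illustrate the refactor proposed for the second PR (the alternative constructor from #85), here is a minimal sketch of the classmethod pattern. `SimplePrimer`, `SimplePrimerPair`, `from_primers`, and `extra_penalty` are hypothetical, simplified stand-ins. They do not reproduce prymer's actual `PrimerPair`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimplePrimer:
    """Hypothetical primer: bases, 0-based start, and a penalty."""
    bases: str
    start: int
    penalty: float

    @property
    def end(self) -> int:
        # 0-based inclusive end of the primer.
        return self.start + len(self.bases) - 1


@dataclass(frozen=True)
class SimplePrimerPair:
    """Hypothetical primer pair holding only the fields needed for this sketch."""
    left: SimplePrimer
    right: SimplePrimer
    amplicon_start: int
    amplicon_end: int
    penalty: float

    @classmethod
    def from_primers(
        cls, left: SimplePrimer, right: SimplePrimer, extra_penalty: float = 0.0
    ) -> "SimplePrimerPair":
        """Alternative constructor: derive the amplicon span and total penalty from two primers."""
        return cls(
            left=left,
            right=right,
            amplicon_start=left.start,
            amplicon_end=right.end,
            penalty=left.penalty + right.penalty + extra_penalty,
        )
```

Moving construction into a named alternative constructor lets the pairing function shrink to the loop itself, and lets the constructor be unit-tested on its own.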

coderabbitai · 2024-12-04T16:11:40Z

Walkthrough

The pull request introduces modifications to the score and build_primer_pairs functions in the prymer/api/picking.py file. The score function's logic for calculating the melting temperature penalty has been updated to check if the optimal melting temperature is zero before making comparisons with the provided amplicon_tm. In the build_primer_pairs function, a new list named pairings has been added to store tuples of primer indices, penalties, and melting temperatures. The nested loops responsible for generating primer pairs have been restructured to validate pairings based on size and melting temperature constraints before processing them. This change improves the organization of handling valid primer pairs. The method signatures for both functions have been updated, though the parameters remain unchanged. Additionally, an import statement for Tuple has been added for type hinting purposes.
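The zero check described above can be sketched as a standalone function. The names `tm_penalty`, `opt_tm`, `weight_gt`, and `weight_lt` are assumptions for illustration, not prymer's signature. The sketch also assumes separate weights for amplicon Tms above and below the optimum. The key behavior is that an optimal Tm of zero disables the Tm penalty entirely.

```python
def tm_penalty(amplicon_tm: float, opt_tm: float, weight_gt: float, weight_lt: float) -> float:
    """Sketch of a Tm penalty that is disabled when the optimal Tm is zero."""
    if opt_tm == 0.0:
        # No optimum configured: do not penalize the amplicon Tm at all.
        return 0.0
    if amplicon_tm > opt_tm:
        # Above the optimum: weight the excess.
        return (amplicon_tm - opt_tm) * weight_gt
    # At or below the optimum: weight the shortfall.
    return (opt_tm - amplicon_tm) * weight_lt
```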

Possibly related PRs

Small changes to simplify build_primer_pairs(). #89: Modifies the build_primer_pairs function, directly related to the changes in the main PR.
Add check for amplicon size too small. #90: Introduces a new check for amplicon size in build_primer_pairs, relevant to the main PR's logic modifications.
Use Sequence instead of list[] in parameters to build_primer_pairs() #86: Changes the parameter types in the build_primer_pairs function, related to the main PR's modifications.
Reducing picking.py to be simpler, more focused, and more performant. #75: Simplifies the picking.py module, including changes to the score and build_primer_pairs functions also modified in the main PR.

Suggested reviewers

nh13: Suggested for review due to familiarity with the relevant codebase.
msto: Recommended for review based on expertise in the area affected by the changes.

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)

prymer/api/picking.py (2)
Line range hint 126-126: Update docstring to reflect sorting.

Docstring incorrectly states "unsorted". Pairs are now sorted by penalty.
-        an iterator over all the valid primer pairs, unsorted
+        an iterator over all the valid primer pairs, sorted by penalty score
176-176: Document sort criteria.

Add comment explaining sort order.
+    # Sort primer pairs by their penalty scores (lower is better)
     pairings.sort(key=lambda tup: tup[2])

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 385e1b1 and aded082.

📒 Files selected for processing (1)

prymer/api/picking.py (3 hunks)

🔇 Additional comments (3)

prymer/api/picking.py (3)

24-24: LGTM!

Clean import addition for type hinting.

85-90: LGTM!

Consistent handling of zero optimal temperature case.

142-143: 🛠️ Refactor suggestion

Consider using a dataclass instead of tuple.

4-element tuple with multiple elements of same type is error-prone. A dataclass would provide named fields and type safety.

from dataclasses import dataclass

@dataclass
class PrimerPairing:
    left_idx: int
    right_idx: int
    penalty: float
    tm: float
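A short usage sketch of the suggested dataclass, showing the sort reading a named field instead of `tup[2]`. The sample values are made up.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass(frozen=True)
class PrimerPairing:
    left_idx: int  # index into the left primers
    right_idx: int  # index into the right primers
    penalty: float
    tm: float


# Made-up sample values.
pairings = [
    PrimerPairing(left_idx=0, right_idx=3, penalty=2.5, tm=81.0),
    PrimerPairing(left_idx=1, right_idx=2, penalty=0.8, tm=79.5),
    PrimerPairing(left_idx=2, right_idx=2, penalty=1.4, tm=80.2),
]

# Sort on a named field instead of tup[2]; lower penalty first.
pairings.sort(key=attrgetter("penalty"))
print([(p.left_idx, p.right_idx) for p in pairings])  # → [(1, 2), (2, 2), (0, 3)]
```

One trade-off worth weighing: a dataclass instance costs more memory and construction time than a plain tuple. That matters if hundreds of thousands of candidates are materialized, and `@dataclass(slots=True)` (Python 3.10+) narrows the gap.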

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)

tests/api/test_picking.py (1)

Line range hint 276-309: Add test for penalty order

Test verifies size filtering but not penalty-based ordering. Add test case to verify pairs are emitted in ascending penalty order.
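A minimal sketch of the kind of check such a test could make. `assert_penalty_order` and `FakePair` are hypothetical helpers. A real test would pass the output of `build_primer_pairs()` for a fixture with several valid pairs; those arguments are omitted here because they depend on the test setup.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class HasPenalty(Protocol):
    penalty: float


def assert_penalty_order(pairs: Iterable[HasPenalty]) -> None:
    """Fail if any pair has a smaller penalty than the pair emitted before it."""
    penalties = [pair.penalty for pair in pairs]
    for i, (prev, cur) in enumerate(zip(penalties, penalties[1:]), start=1):
        assert prev <= cur, f"pair {i} has penalty {cur}, lower than previous {prev}"


@dataclass
class FakePair:
    """Hypothetical stand-in for a PrimerPair; only the penalty matters here."""
    penalty: float


def test_assert_penalty_order() -> None:
    # Ties are allowed; only a decrease should fail.
    assert_penalty_order([FakePair(0.1), FakePair(0.1), FakePair(2.0)])
    try:
        assert_penalty_order([FakePair(1.0), FakePair(0.5)])
    except AssertionError:
        pass
    else:
        raise AssertionError("out-of-order penalties were not detected")
```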

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between aded082 and 23e73e5.

📒 Files selected for processing (2)

prymer/api/picking.py (4 hunks)
tests/api/test_picking.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

prymer/api/picking.py

🔇 Additional comments (1)

tests/api/test_picking.py (1)

277-277: LGTM: Test range change aligns with PR objectives

Change expands test coverage for minimum amplicon size filtering.

msto · 2024-12-05T14:03:24Z

prymer/api/picking.py

+                weights=weights,
+            )
+
+            pairings.append((i, j, penalty, amp_tm))


suggestion: You might consider constructing PrimerPair here, and then using dataclasses.replace to populate bases below (where the PrimerPair is currently being instantiated).

That way the function is using established data structures from the library instead of tuples, the sort can reference pair.penalty instead of a tuple element, and I think it would save a few LOC overall.

Take it or leave it though 😄
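A minimal sketch of the suggested shape. It uses a simplified, hypothetical `SimplePair` dataclass rather than prymer's `PrimerPair`. Candidates are created without `bases` and sorted on `pair.penalty`, and `dataclasses.replace` fills in `bases` only for pairs that are actually emitted.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class SimplePair:
    """Hypothetical, simplified stand-in for PrimerPair (not prymer's class)."""
    left_idx: int
    right_idx: int
    amplicon_start: int  # 0-based, inclusive
    amplicon_end: int  # 0-based, inclusive
    penalty: float
    amplicon_tm: float
    bases: Optional[str] = None  # filled in only for pairs that are emitted


def emit_in_penalty_order(pairs: Iterable[SimplePair], reference: str) -> Iterator[SimplePair]:
    # Sort on a named attribute instead of a tuple index; lower penalty first.
    for pair in sorted(pairs, key=lambda p: p.penalty):
        amplicon = reference[pair.amplicon_start : pair.amplicon_end + 1]
        # replace() returns a new frozen instance with only `bases` changed.
        yield replace(pair, bases=amplicon)
```

Because `replace` copies the instance, the candidates themselves stay unchanged. The trade-off versus the tuple approach is one full object per candidate rather than a small tuple.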

msto · 2024-12-05T14:04:32Z

prymer/api/picking.py

+                first_right_primer_idx = max(first_right_primer_idx, j+1)
+                continue


issue: Could we cover these lines?

Co-authored-by: Matt Stone <[email protected]>

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)

prymer/api/picking.py (2)
124-125: Fix trailing whitespace

Remove trailing whitespace on line 124.
-        An iterator over all the valid primer pairs, sorted by primer pair penalty. 
+        An iterator over all the valid primer pairs, sorted by primer pair penalty.
         Primer pairs with smaller penalties are returned first.
🧰 Tools

🪛 Ruff (0.8.2)

124-124: Trailing whitespace

Remove trailing whitespace

(W291)

137-139: Document sorting impact

Add comment explaining how sorting enables the optimization strategy in the nested loops.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 23e73e5 and b6490ab.

📒 Files selected for processing (1)

prymer/api/picking.py (4 hunks)

🔇 Additional comments (4)

prymer/api/picking.py (4)

24-24: LGTM: Import needed for type hints

85-90: LGTM: Improved tm_penalty calculation

Zero check skips the Tm penalty entirely when the optimal temperature is zero.

147-148: Consider using a dataclass instead of tuple

A dataclass would provide named fields and prevent index errors. Example:

@dataclass
class PrimerPairing:
    left_idx: int
    right_idx: int
    penalty: float
    tm: float

153-160: Verify optimization assumptions

The optimization relies on two key assumptions. Let's verify they hold true:

✅ Verification successful

Optimization assumptions are correctly implemented

Code validates both optimization assumptions:

Skip (continue) when the right primer starts to the left of the left primer (rp.span.start < lp.span.start) or the amplicon is too short
Break loop when amplicon too large (amp_span.length > amplicon_sizes.max)

Test suite confirms functionality with various primer combinations and edge cases.
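The two conditions above can be sketched as a nested loop over sorted primers with a moving start index for the inner loop. This is an illustrative sketch, not prymer's code. Primers are plain `(start, end)` tuples (0-based, inclusive), and the sketch assumes `lefts` is sorted by start and `rights` is ordered so that both starts and ends are non-decreasing (for example, sorted by start when right primers never nest inside one another). Under those assumptions, a right primer that is leftward or too short for one left primer stays invalid for every later left primer. Likewise, once an amplicon is too long, every later right primer is too.

```python
from typing import Iterator, Sequence

Span = tuple[int, int]  # (start, end), 0-based and inclusive


def valid_pair_indices(
    lefts: Sequence[Span],
    rights: Sequence[Span],
    min_size: int,
    max_size: int,
) -> Iterator[tuple[int, int]]:
    """Yield (i, j) for each left/right pair whose amplicon length is in [min_size, max_size].

    Assumes `lefts` is sorted by start, and `rights` is ordered so that both
    starts and ends are non-decreasing.
    """
    first_right_idx = 0
    for i, (left_start, _) in enumerate(lefts):
        for j in range(first_right_idx, len(rights)):
            right_start, right_end = rights[j]
            length = right_end - left_start + 1
            if right_start < left_start or length < min_size:
                # Later left primers start at or after this one, so this right
                # primer stays invalid for them too: never revisit it.
                first_right_idx = max(first_right_idx, j + 1)
                continue
            if length > max_size:
                # Right ends are non-decreasing, so every later right primer
                # would give an even longer amplicon for this left primer.
                break
            yield i, j
```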

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify sorting and size constraints are correctly enforced

# Check if primers are properly sorted
ast-grep --pattern 'def build_primer_pairs($_) {
  $$$
  left_primers = sorted($_, key=$_)
  right_primers = sorted($_, key=$_)
  $$$
}'

# Look for any test cases validating the optimization
rg -l "test.*build_primer_pairs.*optimization"

Length of output: 187

Script:

#!/bin/bash
# Let's check the actual implementation and test cases
rg -A 20 "def build_primer_pairs" 

# Check test files for primer pair validation
fd "test.*\.py" --exec rg -A 10 "build_primer_pairs"

Length of output: 6108

Script:

#!/bin/bash
# Let's check the actual sorting and position-based logic
rg -B 5 -A 15 "for left_primer in left_primers:" prymer/api/picking.py

Length of output: 72

Script:

#!/bin/bash
# Search for the nested loop implementation
rg -A 30 "# Nested loops over indices" prymer/api/picking.py

Length of output: 1838

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)

prymer/api/picking.py (1)

153-161: Add examples to clarify optimization logic

Current comments explain what happens but not why. Add example scenarios.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6490ab and f7f889b.

📒 Files selected for processing (2)

prymer/api/picking.py (4 hunks)
tests/api/test_picking.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tests/api/test_picking.py

🔇 Additional comments (6)

prymer/api/picking.py (6)

24-24: LGTM: Import addition is correct

124-125: LGTM: Docstring accurately reflects new sorting behavior

147-148: Consider using dataclass instead of tuple

Four-element tuple risks index errors. Dataclass would be safer.

200-201: Add comment about penalty-based sorting

85-90: LGTM: Improved tm_penalty calculation

Zero check for optimal tm matches size_penalty logic.

✅ Verification successful

Tests confirm tm_penalty logic is well covered

Tests in tests/api/test_picking.py verify both cases:

Higher than optimal tm: test_score_when_amplicon_tm_higher_than_optimal
Lower than optimal tm: test_score_when_amplicon_tm_lower_than_optimal

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check test coverage for tm_penalty edge cases
rg -A 5 "def test.*tm_penalty" tests/

Length of output: 39

Script:

#!/bin/bash
# Search for test files related to picking or tm calculations
fd "test.*\.py" tests/

# After finding test files, let's look for tm-related tests with broader patterns
rg -i "test.*tm|tm.*test" tests/

Length of output: 4585

208-210: Verify heterodimer check implementation

✅ Verification successful

Heterodimer check implementation is properly tested
The implementation is verified through comprehensive test coverage:

test_duplex_tm_default_params tests various sequence combinations including reverse complements
test_valid_ntthal_params verifies thermodynamic parameters
Heterodimer filtering is tested in test_build_primers_heterodimer_filtering with specific test cases

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check test coverage for heterodimer filtering
rg -A 5 "def test.*heterodimer" tests/

Length of output: 363

Script:

#!/bin/bash
# Get the test implementation details
cat tests/api/test_picking.py | grep -A 20 "test_build_primers_heterodimer_filtering"

# Check the ntthal usage in the codebase
rg -A 3 "ntthal\.duplex_tm"

Length of output: 1180

Script:

#!/bin/bash
# Check ntthal implementation and imports
rg -A 3 "import.*ntthal"

# Look for ntthal tests
rg -A 5 "test.*ntthal" tests/

Length of output: 1760

Script:

# Get the full test implementation for ntthal
cat tests/ntthal/test_ntthal.py

Length of output: 3904

tfenne requested a review from msto November 14, 2024 03:27

tfenne requested a review from nh13 as a code owner November 14, 2024 03:27

msto requested changes Nov 14, 2024

prymer/api/picking.py (resolved)

tests/api/test_picking.py (outdated, resolved)

prymer/api/picking.py (resolved)

Base automatically changed from tf_build_primers_list_to_sequence to main November 14, 2024 23:09

fulcrumgenomics deleted a comment from msto Nov 14, 2024

msto assigned tfenne Nov 15, 2024

tfenne requested a review from msto December 4, 2024 15:42

msto reviewed Dec 4, 2024

prymer/api/picking.py (outdated, resolved)

prymer/api/picking.py (resolved)

coderabbitai bot approved these changes Dec 4, 2024

tfenne force-pushed the tf_emit_pairs_in_order branch from a7cef00 to aded082 December 4, 2024 16:09

coderabbitai bot reviewed Dec 4, 2024

msto approved these changes Dec 5, 2024

nh13 approved these changes Dec 13, 2024

tfenne and others added 6 commits December 13, 2024 13:47

Add .idea to .gitignore

1950caa

Emit primer pairs in penalty order.

2e0ba25

Little more optimization

372c271

Fix return doc

00eb5c4

More optimization.

b69e231

Update prymer/api/picking.py

354e5bf

Co-authored-by: Matt Stone <[email protected]>

coderabbitai bot reviewed Dec 13, 2024

review fixups

f7f889b

tfenne force-pushed the tf_emit_pairs_in_order branch from b6490ab to f7f889b December 13, 2024 20:51

coderabbitai bot reviewed Dec 13, 2024

tfenne merged commit b781cd6 into main Dec 13, 2024
7 checks passed

tfenne deleted the tf_emit_pairs_in_order branch December 13, 2024 21:00
