Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emit primer pairs in penalty order. #87

Merged
merged 7 commits into from
Dec 13, 2024
Merged

Emit primer pairs in penalty order. #87

merged 7 commits into from
Dec 13, 2024

Conversation

tfenne
Copy link
Member

@tfenne tfenne commented Nov 14, 2024

This PR does two things:

  1. Fix a bug where amplicons that were too small were still emitted
  2. Ensure that primer pairs are emitted in penalty order

(2) requires materializing a small tuple for all valid pairs, and then sorting by score. The tuple contains two ints (indices into the primer sequences) and two floats (the penalty and the tm, the last for convenience so we don't have to recompute it). It the sorts the tuples by penalty, and starts generating PrimerPairs in penalty order.

If you have e.g. 500 left and 500 right primers, this could construct ~250k tuples and calculate 250k Tms, but in reality the number is probably substantially smaller constrained by amplicon sizes.

@tfenne tfenne requested a review from msto November 14, 2024 03:27
@tfenne tfenne requested a review from nh13 as a code owner November 14, 2024 03:27
Copy link

codecov bot commented Nov 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.70%. Comparing base (fbd1aa1) to head (f7f889b).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #87      +/-   ##
==========================================
+ Coverage   96.66%   96.70%   +0.03%     
==========================================
  Files          26       26              
  Lines        1708     1728      +20     
  Branches      189      193       +4     
==========================================
+ Hits         1651     1671      +20     
  Misses         31       31              
  Partials       26       26              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@msto msto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @tfenne

Thanks for opening this, I think this will be a nice improvement.

I think this PR should likely be broken into ~3 separate PRs to facilitate the introduction of this feature.

  1. Fix the bug where amplicons that are too small are still emitted.
  2. Refactor build_primer_pairs() to extract the logic associated with the construction of PrimerPair objects into a classmethod on that class (closing Extract the body of the main for loop in build_primer_pairs() into an alternative constructor on PrimerPair #85).
  3. Ensure that primer pairs are emitted in penalty order.

Separating out this bug fix into a separate PR will make it easier to review that fix in isolation. Similarly, reducing the complexity of build_primer_pairs() before adding functionality to it will make it easier to review and test those changes. In particular, I think we lack coverage over any cases where primer pairs are actually built.

prymer/api/picking.py Show resolved Hide resolved
tests/api/test_picking.py Outdated Show resolved Hide resolved
prymer/api/picking.py Show resolved Hide resolved
Base automatically changed from tf_build_primers_list_to_sequence to main November 14, 2024 23:09
@fulcrumgenomics fulcrumgenomics deleted a comment from msto Nov 14, 2024
@tfenne tfenne requested a review from msto December 4, 2024 15:42
prymer/api/picking.py Outdated Show resolved Hide resolved
prymer/api/picking.py Show resolved Hide resolved
@tfenne tfenne force-pushed the tf_emit_pairs_in_order branch from a7cef00 to aded082 Compare December 4, 2024 16:09
Copy link

coderabbitai bot commented Dec 4, 2024

Walkthrough

The pull request introduces modifications to the score and build_primer_pairs functions in the prymer/api/picking.py file. The score function's logic for calculating the melting temperature penalty has been updated to check if the optimal melting temperature is zero before making comparisons with the provided amplicon_tm. In the build_primer_pairs function, a new list named pairings has been added to store tuples of primer indices, penalties, and melting temperatures. The nested loops responsible for generating primer pairs have been restructured to validate pairings based on size and melting temperature constraints before processing them. This change improves the organization of handling valid primer pairs. The method signatures for both functions have been updated, though the parameters remain unchanged. Additionally, an import statement for Tuple has been added for type hinting purposes.

Possibly related PRs

Suggested reviewers

  • nh13: Suggested for review due to familiarity with the relevant codebase.
  • msto: Recommended for review based on expertise in the area affected by the changes.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Experiment)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
prymer/api/picking.py (2)

Line range hint 126-126: Update docstring to reflect sorting.

Docstring incorrectly states "unsorted". Pairs are now sorted by penalty.

-        an iterator over all the valid primer pairs, unsorted
+        an iterator over all the valid primer pairs, sorted by penalty score

176-176: Document sort criteria.

Add comment explaining sort order.

+    # Sort primer pairs by their penalty scores (lower is better)
     pairings.sort(key=lambda tup: tup[2])
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 385e1b1 and aded082.

📒 Files selected for processing (1)
  • prymer/api/picking.py (3 hunks)
🔇 Additional comments (3)
prymer/api/picking.py (3)

24-24: LGTM!

Clean import addition for type hinting.


85-90: LGTM!

Consistent handling of zero optimal temperature case.


142-143: 🛠️ Refactor suggestion

Consider using a dataclass instead of tuple.

4-element tuple with multiple elements of same type is error-prone. A dataclass would provide named fields and type safety.

from dataclasses import dataclass

@dataclass
class PrimerPairing:
    left_idx: int
    right_idx: int
    penalty: float
    tm: float

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
tests/api/test_picking.py (1)

Line range hint 276-309: Add test for penalty order

Test verifies size filtering but not penalty-based ordering. Add test case to verify pairs are emitted in ascending penalty order.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between aded082 and 23e73e5.

📒 Files selected for processing (2)
  • prymer/api/picking.py (4 hunks)
  • tests/api/test_picking.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • prymer/api/picking.py
🔇 Additional comments (1)
tests/api/test_picking.py (1)

277-277: LGTM: Test range change aligns with PR objectives

Change expands test coverage for minimum amplicon size filtering.

prymer/api/picking.py Outdated Show resolved Hide resolved
weights=weights,
)

pairings.append((i, j, penalty, amp_tm))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion You might consider constructing PrimerPair here, and then using dataclasses.replace to populate bases below (where the PrimerPair is currently being instantiated).

That way the function is using established data structures from the library instead of tuples, the sort can reference pair.penalty instead of a tuple element, and I think it would save a few LOC overall.

Take it or leave it though 😄

Comment on lines +167 to +169
first_right_primer_idx = max(first_right_primer_idx, j+1)
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue Could we cover these lines?

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
prymer/api/picking.py (2)

124-125: Fix trailing whitespace

Remove trailing whitespace on line 124.

-        An iterator over all the valid primer pairs, sorted by primer pair penalty. 
+        An iterator over all the valid primer pairs, sorted by primer pair penalty.
         Primer pairs with smaller penalties are returned first.
🧰 Tools
🪛 Ruff (0.8.2)

124-124: Trailing whitespace

Remove trailing whitespace

(W291)


137-139: Document sorting impact

Add comment explaining how sorting enables the optimization strategy in the nested loops.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 23e73e5 and b6490ab.

📒 Files selected for processing (1)
  • prymer/api/picking.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
prymer/api/picking.py

124-124: Trailing whitespace

Remove trailing whitespace

(W291)

🔇 Additional comments (4)
prymer/api/picking.py (4)

24-24: LGTM: Import needed for type hints


85-90: LGTM: Improved tm_penalty calculation

Zero check prevents unnecessary penalty calculations when optimal temperature is zero.


147-148: Consider using a dataclass instead of tuple

A dataclass would provide named fields and prevent index errors. Example:

@dataclass
class PrimerPairing:
    left_idx: int
    right_idx: int
    penalty: float
    tm: float

153-160: Verify optimization assumptions

The optimization relies on two key assumptions. Let's verify they hold true:

✅ Verification successful

Optimization assumptions are correctly implemented

Code validates both optimization assumptions:

  • Early exit when right primer is leftward (rp.span.start < lp.span.start) or amplicon too short
  • Break loop when amplicon too large (amp_span.length > amplicon_sizes.max)

Test suite confirms functionality with various primer combinations and edge cases.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify sorting and size constraints are correctly enforced

# Check if primers are properly sorted
ast-grep --pattern 'def build_primer_pairs($_) {
  $$$
  left_primers = sorted($_, key=$_)
  right_primers = sorted($_, key=$_)
  $$$
}'

# Look for any test cases validating the optimization
rg -l "test.*build_primer_pairs.*optimization" 

Length of output: 187


Script:

#!/bin/bash
# Let's check the actual implementation and test cases
rg -A 20 "def build_primer_pairs" 

# Check test files for primer pair validation
fd "test.*\.py" --exec rg -A 10 "build_primer_pairs"

Length of output: 6108


Script:

#!/bin/bash
# Let's check the actual sorting and position-based logic
rg -B 5 -A 15 "for left_primer in left_primers:" prymer/api/picking.py

Length of output: 72


Script:

#!/bin/bash
# Search for the nested loop implementation
rg -A 30 "# Nested loops over indices" prymer/api/picking.py

Length of output: 1838

@tfenne tfenne force-pushed the tf_emit_pairs_in_order branch from b6490ab to f7f889b Compare December 13, 2024 20:51
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
prymer/api/picking.py (1)

153-161: Add examples to clarify optimization logic

Current comments explain what happens but not why. Add example scenarios.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6490ab and f7f889b.

📒 Files selected for processing (2)
  • prymer/api/picking.py (4 hunks)
  • tests/api/test_picking.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/api/test_picking.py
🔇 Additional comments (6)
prymer/api/picking.py (6)

24-24: LGTM: Import addition is correct


124-125: LGTM: Docstring accurately reflects new sorting behavior


147-148: Consider using dataclass instead of tuple

Four-element tuple risks index errors. Dataclass would be safer.


200-201: Add comment about penalty-based sorting


85-90: LGTM: Improved tm_penalty calculation

Zero check for optimal tm matches size_penalty logic.

✅ Verification successful

Tests confirm tm_penalty logic is well covered

Tests in tests/api/test_picking.py verify both cases:

  • Higher than optimal tm: test_score_when_amplicon_tm_higher_than_optimal
  • Lower than optimal tm: test_score_when_amplicon_tm_lower_than_optimal
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check test coverage for tm_penalty edge cases
rg -A 5 "def test.*tm_penalty" tests/

Length of output: 39


Script:

#!/bin/bash
# Search for test files related to picking or tm calculations
fd "test.*\.py" tests/

# After finding test files, let's look for tm-related tests with broader patterns
rg -i "test.*tm|tm.*test" tests/

Length of output: 4585


208-210: Verify heterodimer check implementation

✅ Verification successful

Heterodimer check implementation is properly tested
The implementation is verified through comprehensive test coverage:

  • test_duplex_tm_default_params tests various sequence combinations including reverse complements
  • test_valid_ntthal_params verifies thermodynamic parameters
  • Heterodimer filtering is tested in test_build_primers_heterodimer_filtering with specific test cases
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check test coverage for heterodimer filtering
rg -A 5 "def test.*heterodimer" tests/

Length of output: 363


Script:

#!/bin/bash
# Get the test implementation details
cat tests/api/test_picking.py | grep -A 20 "test_build_primers_heterodimer_filtering"

# Check the ntthal usage in the codebase
rg -A 3 "ntthal\.duplex_tm" 

Length of output: 1180


Script:

#!/bin/bash
# Check ntthal implementation and imports
rg -A 3 "import.*ntthal"

# Look for ntthal tests
rg -A 5 "test.*ntthal" tests/

Length of output: 1760


Script:

# Get the full test implementation for ntthal
cat tests/ntthal/test_ntthal.py

Length of output: 3904

@tfenne tfenne merged commit b781cd6 into main Dec 13, 2024
7 checks passed
@tfenne tfenne deleted the tf_emit_pairs_in_order branch December 13, 2024 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants