Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Small changes to simplify build_primer_pairs(). #89

Merged
merged 3 commits into from
Nov 18, 2024
Merged

Conversation

tfenne
Copy link
Member

@tfenne tfenne commented Nov 14, 2024

No description provided.

@tfenne tfenne requested a review from msto November 14, 2024 23:58
@tfenne tfenne requested a review from nh13 as a code owner November 14, 2024 23:58
Copy link

coderabbitai bot commented Nov 15, 2024

Walkthrough

The pull request modifies the build_primer_pairs function in prymer/api/picking.py, enhancing the calculation of the amplicon span by replacing direct calculations with a call to the static method PrimerPair.calculate_amplicon_span(lp, rp). This change improves modularity and eliminates the intermediate variable amp_mapping, opting to use amp_span directly. The error handling remains intact, ensuring primer pairs are generated only if they meet specific constraints.

In prymer/api/primer_pair.py, the PrimerPair class is updated to include the new static method calculate_amplicon_span, which enhances validation checks and error messaging related to primer validation. The private method _calculate_amplicon is removed, and the _amplicon attribute is now initialized during the __post_init__ method. Additionally, the tests in tests/api/test_primer_pair.py have updated error messages for clarity and introduced a new test function to validate the behavior of the calculate_amplicon_span method.

Possibly related PRs

Suggested reviewers

  • nh13
  • msto

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between febbefb and e503bf6.

📒 Files selected for processing (1)
  • prymer/api/picking.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • prymer/api/picking.py

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8c39970 and 4d22e2b.

📒 Files selected for processing (3)
  • prymer/api/picking.py (3 hunks)
  • prymer/api/primer_pair.py (2 hunks)
  • tests/api/test_primer_pair.py (3 hunks)
🔇 Additional comments (6)
prymer/api/picking.py (2)

144-145: Good refactor.

Moving span calculation to PrimerPair class improves modularity.


162-162: Clean parameter passing.

Direct use of amp_span reduces unnecessary variables.

tests/api/test_primer_pair.py (2)

Line range hint 430-439: LGTM: Clearer error message for reference mismatch

More concise error message that maintains the same test coverage.


453-453: LGTM: Improved error clarity for primer order

More descriptive error message that better explains the validation failure.

prymer/api/primer_pair.py (2)

87-91: Assignment of _amplicon using calculate_amplicon_span is clear and correct


233-263: Static method calculate_amplicon_span is well-defined and correctly implemented

prymer/api/picking.py Outdated Show resolved Hide resolved
prymer/api/picking.py Outdated Show resolved Hide resolved
prymer/api/primer_pair.py Show resolved Hide resolved
tests/api/test_primer_pair.py Outdated Show resolved Hide resolved
prymer/api/primer_pair.py Show resolved Hide resolved
prymer/api/primer_pair.py Show resolved Hide resolved
Comment on lines +87 to +91
object.__setattr__(
self,
"_amplicon",
PrimerPair.calculate_amplicon_span(self.left_primer, self.right_primer),
)
Copy link
Collaborator

@msto msto Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion Is this a good opportunity to refactor to make amplicon a cached_property, rather than mucking about with a private field and setattr?

e.g.

@cached_property
def amplicon(self) -> Span:
    """The interval spanned by the pair's amplicon."""
    return self.calculate_amplicon_span(self.left_primer, self.right_primer)

and then no need for the post-init

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's questionable. For better or worse, PrimerPair relies on the checks in calculate_amplicon_span() to enforce that the pairing makes sense ... so we could make it a cached property and the still have a post_init that accesses it, or replicate the checks ... or separate the checks into yet another function. None of which seem obviously better.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really like to remove _amplicon as a "private" field.

As a consequence of the current implementation, both asdict() and fields() return a field that isn't accepted by the class constructor. This is unusual behavior and requires the user to remember and manually remove the field.

Could we make amplicon a cached_property, and update the post-init checks to reference that property? That protects all the cases I'm aware of motivating the current implementation:

  1. Mutation-free access to an amplicon "attribute".
  2. The value of amplicon is still derived from the input primers.

As a bonus, it removes the need to use setattr (currently necessary because we're trying to mutate a frozen dataclass after instantiation).

Copy link
Member Author

@tfenne tfenne Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried ... but failed. Changed it to a cached property and tried calling it from post_init() to run the validation. Now I get this in tests, which is super unhelpful and I'm not sure why:

ERROR tests/api/test_primer_pair.py - TypeError: No '__dict__' attribute on 'PrimerPair' instance to cache 'amplicon' property.
ERROR tests/api/test_primer_pair.py - TypeError: No '__dict__' attribute on 'PrimerPair' instance to cache 'amplicon' property.```

Going to merge without making the change, and we can circle back to it separately.

prymer/api/picking.py Show resolved Hide resolved
@tfenne tfenne requested a review from msto November 15, 2024 17:37
Copy link

codecov bot commented Nov 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.66%. Comparing base (8c39970) to head (e503bf6).
Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #89   +/-   ##
=======================================
  Coverage   96.66%   96.66%           
=======================================
  Files          26       26           
  Lines        1707     1710    +3     
  Branches      331      333    +2     
=======================================
+ Hits         1650     1653    +3     
  Misses         31       31           
  Partials       26       26           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
prymer/api/primer_pair.py (2)

267-267: Fix copy-pasted comment.

Comment is incorrect - this check verifies the right primer ends after left primer ends.


242-244: Clarify overlap rules in docstring.

Specify allowed overlap percentage between primers.

-            a Span starting at the first base of the left primer and ending at the last base of
-             the right primer
+            A Span starting at the first base of the left primer and ending at the last base of
+            the right primer. Primers may overlap by up to 50% of their length.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4d22e2b and febbefb.

📒 Files selected for processing (2)
  • prymer/api/primer_pair.py (2 hunks)
  • tests/api/test_primer_pair.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/api/test_primer_pair.py

Comment on lines +259 to +264
# Require that the left primer starts before the right primer
if left_primer.span.start > right_primer.span.start:
raise ValueError(
"The reference must be the same across primers in a pair; received "
f"left primer ref: {self.left_primer.span.refname}, "
f"right primer ref: {self.right_primer.span.refname}"
"Left primer does not start before the right primer. "
f"Left primer span: {left_primer.span}, "
f"Right primer span: {right_primer.span}"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add validation for primer overlap.

Current check only verifies start positions. Also validate that left primer doesn't overlap significantly with right primer.

Add this check after the existing validation:

 if left_primer.span.start > right_primer.span.start:
     raise ValueError(
         "Left primer does not start before the right primer. "
         f"Left primer span: {left_primer.span}, "
         f"Right primer span: {right_primer.span}"
     )
+
+# Ensure primers don't have significant overlap
+overlap = min(left_primer.span.end, right_primer.span.end) - max(left_primer.span.start, right_primer.span.start)
+if overlap > len(left_primer.bases) / 2:  # Allow up to 50% overlap
+    raise ValueError(
+        f"Primers overlap too much ({overlap} bases). "
+        f"Left primer span: {left_primer.span}, "
+        f"Right primer span: {right_primer.span}"
+    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Require that the left primer starts before the right primer
if left_primer.span.start > right_primer.span.start:
raise ValueError(
"The reference must be the same across primers in a pair; received "
f"left primer ref: {self.left_primer.span.refname}, "
f"right primer ref: {self.right_primer.span.refname}"
"Left primer does not start before the right primer. "
f"Left primer span: {left_primer.span}, "
f"Right primer span: {right_primer.span}"
# Require that the left primer starts before the right primer
if left_primer.span.start > right_primer.span.start:
raise ValueError(
"Left primer does not start before the right primer. "
f"Left primer span: {left_primer.span}, "
f"Right primer span: {right_primer.span}"
)
# Ensure primers don't have significant overlap
overlap = min(left_primer.span.end, right_primer.span.end) - max(left_primer.span.start, right_primer.span.start)
if overlap > len(left_primer.bases) / 2: # Allow up to 50% overlap
raise ValueError(
f"Primers overlap too much ({overlap} bases). "
f"Left primer span: {left_primer.span}, "
f"Right primer span: {right_primer.span}"
)

Comment on lines +559 to +577
def test_calculate_amplicon_span() -> None:
left = Oligo(name="l", bases="AACCGGTTAA", tm=60, penalty=1, span=Span("chr1", 50, 59))
right = Oligo(name="l", bases="AACCGGTTAA", tm=60, penalty=1, span=Span("chr1", 150, 159))
assert PrimerPair.calculate_amplicon_span(left, right) == Span("chr1", 50, 159)

left = Oligo(name="l", bases="AACCGGTTAA", tm=60, penalty=1, span=Span("chr2", 50, 59))
right = Oligo(name="l", bases="AACCGGTTAA", tm=60, penalty=1, span=Span("chr3", 150, 159))
with pytest.raises(ValueError, match="different references"):
PrimerPair.calculate_amplicon_span(left, right)

left = Oligo(name="l", bases="AACCGGTTAA", tm=60, penalty=1, span=Span("chr1", 150, 159))
right = Oligo(name="l", bases="AACCGGTTAA", tm=60, penalty=1, span=Span("chr1", 50, 59))
with pytest.raises(ValueError, match="Left primer does not start before the right primer"):
PrimerPair.calculate_amplicon_span(left, right)

left = Oligo(name="l", bases="AACCGGTTAAACGTT", tm=60, penalty=1, span=Span("chr1", 150, 164))
right = Oligo(name="l", bases="AACCGGTTAA", tm=60, penalty=1, span=Span("chr1", 150, 159))
with pytest.raises(ValueError, match="Right primer ends before left primer ends"):
PrimerPair.calculate_amplicon_span(left, right)
Copy link
Collaborator

@msto msto Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note I usually try to use pytest.mark.parametrize to parallelize test execution as much as possible.

Tests exit after the first failed assertion, so if you bundle test cases in a single test, you run the risk of obscuring later failures. If you parametrize, you hit as many failures as possible in a single test run, and don't find yourself stuck in a loop of addressing one test case, running the test suite, and discovering a previously hidden failure.

I usually have one test parametrized over "good" cases, with each test case paired with its expected output, and one test parametrized over "bad" cases, where each case is expected to raise an exception.

Since here the span is the primary variable being manipulated by each case, I'd write something like the following:

@pytest.mark.parametrize(
    "left_primer,right_primer,expected_amplicon",
    [
        (Span("chr1", 50, 59), Span("chr1", 150, 159), Span("chr1", 50, 159)),
    ]
)
def test_calculate_amplicon_span(left_primer: Span, right_primer: Span, expected_amplicon: Span) -> None:
    """The amplicon should span from the start of the left primer to the end of the right primer."""
    # TODO add logic to build an `Oligo` from the test `Span`
    actual_amplicon: Span = calculate_amplicon_span(left_primer, right_primer)
    assert actual_amplicon == expected_amplicon


@pytest.mark.parametrize(
    "left_primer,right_primer,error_msg",
    [
        (Span("chr2", 50, 59), Span("chr3", 150, 159), "different references"),
    ],
)
def test_calculate_amplicon_span_raises(left_primer: Span, right_primer: Span, error_msg: str) -> None:
    """An error should be raised if the spans are on different references, or if the left primer is not to the left of the right primer."""
    with pytest.raises(ValueError, match=error_msg):
        calculate_amplicon_span(left_primer, right_primer)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is, to me, a stylistic thing, and I come down very strongly on the other side. I almost never use pytest.mark.parametrize because I personally find that it makes tests less maintainable, harder to read etc.

If the tests aren't as simple as these, then I tend to break them up into multiple test functions.

@tfenne tfenne merged commit ef7d90e into main Nov 18, 2024
5 of 7 checks passed
@tfenne tfenne deleted the tf_simplify_pairing branch November 18, 2024 22:00
msto added a commit that referenced this pull request Nov 20, 2024
See this discussion for context:
#89 (comment)

This PR also updates the mypy config to exclude the `mkdocs`
directories.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants