Commit

refactor: Primer -> Oligo, PrimerLike -> OligoLike; corresponding updates to imports and tests
emmcauley committed Oct 7, 2024
1 parent f71dc65 commit c3f0389
Showing 23 changed files with 1,254 additions and 681 deletions.
18 changes: 10 additions & 8 deletions docs/overview.md
@@ -3,8 +3,8 @@
 The `prymer` Python library is intended to be used for three main purposes:
 
 1. [Clustering targets](#clustering-targets) into larger amplicons prior to designing primers.
-2. [Designing primers](#designing-primers) (left or right) or primer pairs using Primer3 for each target from (1).
-3. [Build and Picking a set of primer pairs](#build-and-picking-primer-pairs) from the designed primer pairs produced in (2).
+2. [Designing](#designing-primers) primers (single and paired) and internal hybridization probes using Primer3 for each target from (1).
+3. [Building and picking a set of primer pairs](#build-and-picking-primer-pairs) from the design candidates produced in (2).
 
 ## Clustering Targets
 
@@ -18,22 +18,24 @@ amplicons prior to primer design.
 Designing primers (left or right) or primer pairs using Primer3 is primarily performed using the
 [`Primer3`][prymer.primer3.primer3.Primer3] class, which wraps the
 [`primer3` command line tool](https://github.com/primer3-org/primer3). The
-[`design_primers()`][prymer.primer3.primer3.Primer3.design_primers] facilitates the design of single and paired primers
+[`design()`][prymer.primer3.primer3.Primer3.design] method facilitates the design of primers (single and paired) and internal hybridization probes
 for a single target. The `Primer3` instance is intended to be re-used to design primers across multiple targets, or
 re-design (after changing parameters) for the same target, or both!
 
-Common input parameters are specified in [`Primer3Parameters()`][prymer.primer3.primer3_parameters.Primer3Parameters] and
-[`Primer3Weights()`][prymer.primer3.primer3_weights.Primer3Weights], while the task type (left primer,
+Common input parameters for designing primers are specified in [`PrimerAndAmpliconParameters()`][prymer.primer3.primer3_parameters.PrimerAndAmpliconParameters] and
+[`PrimerAndAmpliconWeights()`][prymer.primer3.primer3_weights.PrimerAndAmpliconWeights], while the task type (left primer,
 right primer, or primer pair design) is specified with the corresponding
-[`Primer3Task`][prymer.primer3.primer3_task.Primer3Task].
+[`Primer3Task`][prymer.primer3.primer3_task.Primer3Task].
+Design specifications for internal probes are stored in [`ProbeParameters()`][prymer.primer3.primer3_parameters.ProbeParameters].
+Penalty weights for designing internal probes are specified in [`ProbeWeights()`][prymer.primer3.primer3_weights.ProbeWeights].
 
 The result of a primer design is encapsulated in the [`Primer3Result`][prymer.primer3.primer3.Primer3Result] class. It
-provides the primers (or primer pairs) that were designed, as well as a list of reasons some primers were not returned,
+provides the primers, probes, or primer pairs that were designed, as well as a list of reasons some designs were not returned,
 for example exceeding the melting temperature threshold, too high GC content, and so on. These failures are
 encapsulated in the [`Primer3Failures`][prymer.primer3.primer3.Primer3Failure] class.
 
 The [`Primer3Result`][prymer.primer3.primer3.Primer3Result] returned by the primer design contains either a list of
-[`Primer`][prymer.api.primer.Primer]s or [`PrimerPair`][prymer.api.primer_pair.PrimerPair]s, depending on the
+[`Oligo`][prymer.api.oligo.Oligo]s or [`PrimerPair`][prymer.api.primer_pair.PrimerPair]s, depending on the
 [`Primer3Task`][prymer.primer3.primer3_task.Primer3Task] specified in the input parameters.
 These can be subsequently filtered or examined.
 
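The `penalty` each design carries is Primer3's score of how far the candidate strays from the optimum values in the parameters, with the weights setting how much each deviation costs (Primer3 exposes these as options such as `PRIMER_WT_TM_LT` and `PRIMER_WT_TM_GT`). The sketch below illustrates that idea for just two properties, melting temperature and length. `RangeSketch`, `deviation_penalty()`, and `primer_penalty()` are hypothetical names, not part of prymer or Primer3, and returning an infinite penalty for out-of-range values is a simplification standing in for Primer3 rejecting such candidates outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeSketch:
    """Hypothetical min/opt/max design range (a stand-in, not prymer's `MinOptMax`)."""

    min: float
    opt: float
    max: float


def deviation_penalty(value: float, target: RangeSketch, wt_lt: float, wt_gt: float) -> float:
    """Weighted distance of `value` from `target.opt`.

    Values below the optimum are scaled by `wt_lt`, values above it by `wt_gt`.
    Values outside [min, max] return infinity, standing in for outright rejection.
    """
    if value < target.min or value > target.max:
        return float("inf")
    if value < target.opt:
        return wt_lt * (target.opt - value)
    return wt_gt * (value - target.opt)


def primer_penalty(
    tm: float,
    length: int,
    tm_range: RangeSketch,
    length_range: RangeSketch,
    weights: dict[str, float],
) -> float:
    """Sum of the Tm and length terms only; Primer3 adds many more (GC content, etc.)."""
    tm_term = deviation_penalty(tm, tm_range, weights["tm_lt"], weights["tm_gt"])
    size_term = deviation_penalty(
        float(length), length_range, weights["size_lt"], weights["size_gt"]
    )
    return tm_term + size_term


tm_range = RangeSketch(min=57.0, opt=60.0, max=63.0)
length_range = RangeSketch(min=18.0, opt=20.0, max=27.0)
weights = {"tm_lt": 1.0, "tm_gt": 1.0, "size_lt": 0.5, "size_gt": 0.1}

# 1.5 degrees above the optimal Tm and 2 bases longer than optimal: 1.5 * 1.0 + 2 * 0.1
print(primer_penalty(61.5, 22, tm_range, length_range, weights))
```

Making `size_gt` larger than `size_lt` would make longer-than-optimal candidates cost more than shorter ones; this asymmetry is presumably what the separate `_lt`/`_gt` weights are for.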
8 changes: 4 additions & 4 deletions prymer/api/__init__.py
@@ -1,12 +1,12 @@
 from prymer.api.clustering import ClusteredIntervals
 from prymer.api.clustering import cluster_intervals
 from prymer.api.minoptmax import MinOptMax
+from prymer.api.oligo import Oligo
+from prymer.api.oligo_like import OligoLike
 from prymer.api.picking import FilteringParams
 from prymer.api.picking import build_and_pick_primer_pairs
 from prymer.api.picking import build_primer_pairs
 from prymer.api.picking import pick_top_primer_pairs
-from prymer.api.primer import Primer
-from prymer.api.primer_like import PrimerLike
 from prymer.api.primer_pair import PrimerPair
 from prymer.api.span import BedLikeCoords
 from prymer.api.span import Span
@@ -27,8 +27,8 @@
     "build_primer_pairs",
     "pick_top_primer_pairs",
     "build_and_pick_primer_pairs",
-    "PrimerLike",
-    "Primer",
+    "OligoLike",
+    "Oligo",
     "PrimerPair",
     "Span",
     "Strand",
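`docs/overview.md` notes that the designs returned in a `Primer3Result` "can be subsequently filtered or examined". Since they arrive as a list, ordinary Python is enough. The sketch below keeps candidates inside a Tm window and under a GC ceiling, then ranks them by penalty. `DesignSketch`, `gc_fraction()`, and `pick_designs()` are hypothetical stand-ins that mirror the `tm`, `penalty`, and `bases` attributes of `Oligo`, and the thresholds are arbitrary. With real `Oligo` objects, the inherited `percent_gc_content` property would be the natural replacement for `gc_fraction()` (presumably expressed as a percentage rather than a fraction).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignSketch:
    """Hypothetical stand-in for a designed oligo (mirrors a few `Oligo` attributes)."""

    tm: float
    penalty: float
    bases: str


def gc_fraction(bases: str) -> float:
    """Fraction of G/C bases (case-insensitive); 0.0 for an empty sequence."""
    if not bases:
        return 0.0
    return sum(1 for base in bases.upper() if base in "GC") / len(bases)


def pick_designs(
    designs: list[DesignSketch],
    min_tm: float,
    max_tm: float,
    max_gc: float,
    top_n: int,
) -> list[DesignSketch]:
    """Keep designs inside the Tm window and under the GC limit, lowest penalty first."""
    kept = [
        d for d in designs if min_tm <= d.tm <= max_tm and gc_fraction(d.bases) <= max_gc
    ]
    return sorted(kept, key=lambda d: d.penalty)[:top_n]


designs = [
    DesignSketch(tm=60.1, penalty=0.4, bases="ACGTACGTACGTACGTACGT"),
    DesignSketch(tm=64.0, penalty=0.2, bases="ACGTACGTACGTACGTACGT"),  # Tm too high
    DesignSketch(tm=59.5, penalty=1.1, bases="GGGGCCCCGGGGCCCCGGGG"),  # all G/C
    DesignSketch(tm=59.8, penalty=0.9, bases="ATATACGTACGTACGTATAT"),
]
best = pick_designs(designs, min_tm=59.0, max_tm=61.0, max_gc=0.6, top_n=2)
print([d.penalty for d in best])
```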
207 changes: 207 additions & 0 deletions prymer/api/oligo.py
@@ -0,0 +1,207 @@
+"""
+# Oligo Class and Methods
+
+This module contains a class and class methods to represent an oligo (e.g., designed by Primer3).
+Oligos can represent single primer and/or internal probe designs.
+
+Class attributes include the base sequence, melting temperature, and the score of the oligo. The
+mapping of the oligo to the genome is also stored.
+
+Optional attributes include naming information and a tail sequence to attach to the 5' end of the
+oligo (if applicable). Optional attributes also include the thermodynamic results from Primer3.
+
+## Examples of interacting with the `Oligo` class
+
+```python
+>>> from prymer.api.span import Span, Strand
+>>> oligo_span = Span(refname="chr1", start=1, end=20)
+>>> oligo = Oligo(tm=70.0, penalty=-123.0, span=oligo_span, bases="AGCT" * 5)
+>>> oligo.longest_hp_length()
+1
+>>> oligo.length
+20
+>>> oligo.name is None
+True
+>>> oligo = Oligo(tm=70.0, penalty=-123.0, span=oligo_span, bases="GACGG"*4)
+>>> oligo.longest_hp_length()
+3
+>>> oligo.untailed_length()
+20
+>>> oligo.tailed_length()
+20
+>>> primer = oligo.with_tail(tail="GATTACA")
+>>> primer.untailed_length()
+20
+>>> primer.tailed_length()
+27
+>>> primer = primer.with_name(name="fwd_primer")
+>>> primer.name
+'fwd_primer'
+
+```
+
+Oligos may also be written to a file and subsequently read back in, as the `Oligo` class is an
+`fgpyo` `Metric` class:
+
+```python
+>>> from pathlib import Path
+>>> left_span = Span(refname="chr1", start=1, end=20)
+>>> left = Oligo(tm=70.0, penalty=-123.0, span=left_span, bases="G"*20)
+>>> right_span = Span(refname="chr1", start=101, end=120)
+>>> right = Oligo(tm=70.0, penalty=-123.0, span=right_span, bases="T"*20)
+>>> path = Path("/tmp/path/to/primers.txt")
+>>> Oligo.write(path, left, right)  # doctest: +SKIP
+>>> primers = Oligo.read(path)  # doctest: +SKIP
+>>> list(primers)  # doctest: +SKIP
+[
+    Oligo(tm=70.0, penalty=-123.0, span=left_span, bases="G"*20),
+    Oligo(tm=70.0, penalty=-123.0, span=right_span, bases="T"*20)
+]
+
+```
+"""
+
+from dataclasses import dataclass
+from dataclasses import replace
+from typing import Any
+from typing import Callable
+from typing import Dict
+from typing import Optional
+
+from fgpyo.fasta.sequence_dictionary import SequenceDictionary
+from fgpyo.sequence import longest_dinucleotide_run_length
+from fgpyo.sequence import longest_homopolymer_length
+from fgpyo.util.metric import Metric
+
+from prymer.api.oligo_like import MISSING_BASES_STRING
+from prymer.api.oligo_like import OligoLike
+from prymer.api.span import Span
+
+
+@dataclass(frozen=True, init=True, kw_only=True, slots=True)
+class Oligo(OligoLike, Metric["Oligo"]):
+    """Stores the properties of the designed oligo.
+
+    Oligos can include both single primer and internal probe designs. The penalty score of the
+    design is emitted by Primer3 and controlled by the corresponding design parameters.
+    The penalty for a primer is set by the combination of `PrimerAndAmpliconParameters` and
+    `PrimerAndAmpliconWeights`, whereas a probe penalty is set by `ProbeParameters` and
+    `ProbeWeights`.
+
+    Attributes:
+        tm: the calculated melting temperature of the oligo
+        penalty: the penalty or score for the oligo
+        span: the mapping of the oligo to the genome
+        bases: the base sequence of the oligo (excluding any tail)
+        tail: an optional tail sequence to put on the 5' end of the oligo
+        name: an optional name to use for the oligo
+    """
+
+    tm: float
+    penalty: float
+    span: Span
+    bases: Optional[str] = None
+    tail: Optional[str] = None
+
+    def __post_init__(self) -> None:
+        super(Oligo, self).__post_init__()
+
+    def longest_hp_length(self) -> int:
+        """Length of longest homopolymer in the oligo."""
+        if self.bases is None:
+            return 0
+        else:
+            return longest_homopolymer_length(self.bases)
+
+    @property
+    def length(self) -> int:
+        """Length of un-tailed oligo."""
+        return self.span.length
+
+    def untailed_length(self) -> int:
+        """Length of un-tailed oligo."""
+        return self.span.length
+
+    def tailed_length(self) -> int:
+        """Length of tailed oligo."""
+        return self.span.length if self.tail is None else self.span.length + len(self.tail)
+
+    def longest_dinucleotide_run_length(self) -> int:
+        """Number of bases in the longest dinucleotide run in an oligo.
+
+        A dinucleotide run is when a length-two repeat unit is repeated. For example,
+        TCTC (length = 4) or ACACACACAC (length = 10). If there are no such runs, returns 2
+        (or 0 if there are fewer than 2 bases, or if `bases` is None)."""
+        if self.bases is None:
+            return 0
+        return longest_dinucleotide_run_length(self.bases)
+
+    def with_tail(self, tail: str) -> "Oligo":
+        """Returns a copy of the oligo with the tail sequence attached."""
+        return replace(self, tail=tail)
+
+    def with_name(self, name: str) -> "Oligo":
+        """Returns a copy of the oligo with the given name."""
+        return replace(self, name=name)
+
+    def bases_with_tail(self) -> Optional[str]:
+        """
+        Returns the sequence of the oligo prepended by the tail.
+
+        If `tail` is None, only return `bases`.
+        """
+        if self.tail is None:
+            return self.bases
+        return f"{self.tail}{self.bases}"
+
+    def to_bed12_row(self) -> str:
+        """Returns the 12-column BED representation of this oligo:
+        https://genome.ucsc.edu/FAQ/FAQformat.html#format1"""
+        bed_coord = self.span.get_bedlike_coords()
+        return "\t".join(
+            map(
+                str,
+                [
+                    self.span.refname,  # contig
+                    bed_coord.start,  # start
+                    bed_coord.end,  # end
+                    self.id,  # name
+                    500,  # score
+                    self.span.strand.value,  # strand
+                    bed_coord.start,  # thick start
+                    bed_coord.end,  # thick end
+                    "100,100,100",  # color
+                    1,  # block count
+                    f"{self.length}",  # block sizes
+                    "0",  # block starts (relative to `start`)
+                ],
+            )
+        )
+
+    def __str__(self) -> str:
+        """
+        Returns a string representation of this oligo.
+        """
+        # If the bases field is None, replace with MISSING_BASES_STRING
+        bases: str = self.bases if self.bases is not None else MISSING_BASES_STRING
+        return f"{bases}\t{self.tm}\t{self.penalty}\t{self.span}"
+
+    @classmethod
+    def _parsers(cls) -> Dict[type, Callable[[str], Any]]:
+        return {
+            Span: lambda value: Span.from_string(value),
+        }
+
+    @staticmethod
+    def compare(this: "Oligo", that: "Oligo", seq_dict: SequenceDictionary) -> int:
+        """Compares this oligo to that oligo by their span, ordering references using the given
+        sequence dictionary.
+
+        Args:
+            this: the first oligo
+            that: the second oligo
+            seq_dict: the sequence dictionary used to order references
+
+        Returns:
+            -1 if this oligo is less than that oligo, 0 if equal, 1 otherwise
+        """
+        return Span.compare(this=this.span, that=that.span, seq_dict=seq_dict)
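The length and composition helpers on `Oligo` delegate to `fgpyo.sequence`. To make their docstring definitions concrete, here is a stdlib-only sketch of the same quantities: longest homopolymer, longest dinucleotide run (returning 2 when nothing repeats and 0 for fewer than 2 bases, as the `Oligo` docstring describes), and tailed length. These are illustrative re-implementations, not fgpyo's code, and they may differ from it in edge cases (for example, this version counts a homopolymer such as `AAAA` as a dinucleotide run). Note also that `Oligo` derives its untailed length from `span.length`, whereas this sketch uses `len(bases)`.

```python
from typing import Optional


def longest_homopolymer(bases: str) -> int:
    """Length of the longest run of a single repeated base (0 for an empty sequence)."""
    best, run, prev = 0, 0, ""
    for base in bases:
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def longest_dinucleotide_run(bases: str) -> int:
    """Bases in the longest run of a repeated two-base unit, checked in both reading frames.

    Returns 2 when no unit repeats and 0 for fewer than 2 bases, following the convention
    described for `Oligo.longest_dinucleotide_run_length()`.
    """
    if len(bases) < 2:
        return 0
    best = 2
    for frame in (0, 1):
        prev = bases[frame : frame + 2]
        run = 1  # consecutive copies of `prev` seen so far in this frame
        for i in range(frame + 2, len(bases) - 1, 2):
            unit = bases[i : i + 2]
            if unit == prev:
                run += 1
                best = max(best, 2 * run)
            else:
                prev, run = unit, 1
    return best


def tailed_length(bases: str, tail: Optional[str]) -> int:
    """Length including an optional 5' tail."""
    return len(bases) if tail is None else len(bases) + len(tail)


print(longest_homopolymer("GACGG" * 4))  # the GGG run spans two repeat copies
print(longest_dinucleotide_run("ACACACACAC"))
print(tailed_length("AGCT" * 5, "GATTACA"))
```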
35 changes: 15 additions & 20 deletions prymer/api/primer_like.py → prymer/api/oligo_like.py
@@ -1,22 +1,22 @@
 """
-# Class and Methods for primer-like objects
+# Class and Methods for oligo-like objects
 
-The `PrimerLike` class is an abstract base class designed to represent primer-like objects,
-such as individual primers or primer pairs. This class encapsulates common attributes and
+The `OligoLike` class is an abstract base class designed to represent oligo-like objects,
+such as individual primers, probes, or primer pairs. This class encapsulates common attributes and
 provides a foundation for more specialized implementations.
 
 In particular, the following methods/attributes need to be implemented:
 
-- [`span()`][prymer.api.primer_like.PrimerLike.span] -- the mapping of the primer-like
+- [`span()`][prymer.api.oligo_like.OligoLike.span] -- the mapping of the oligo-like
   object to the genome.
-- [`bases()`][prymer.api.primer_like.PrimerLike.bases] -- the bases of the primer-like
+- [`bases()`][prymer.api.oligo_like.OligoLike.bases] -- the bases of the oligo-like
   object, or `None` if not available.
-- [`to_bed12_row()`][prymer.api.primer_like.PrimerLike.to_bed12_row] -- the 12-field BED
-  representation of this primer-like object.
+- [`to_bed12_row()`][prymer.api.oligo_like.OligoLike.to_bed12_row] -- the 12-field BED
+  representation of this oligo-like object.
 
 See the following concrete implementations:
 
-- [`Primer`][prymer.api.primer.Primer] -- a class to store an individual primer
+- [`Oligo`][prymer.api.oligo.Oligo] -- a class to store an individual oligo
 - [`PrimerPair`][prymer.api.primer_pair.PrimerPair] -- a class to store a primer pair
 
 """
@@ -25,7 +25,6 @@
 from abc import abstractmethod
 from dataclasses import dataclass
 from typing import Optional
-from typing import TypeVar
 from typing import assert_never
 
 from fgpyo.sequence import gc_content
@@ -38,9 +37,9 @@
 
 
 @dataclass(frozen=True, init=True, slots=True)
-class PrimerLike(ABC):
+class OligoLike(ABC):
     """
-    An abstract base class for primer-like objects, such as individual primers or primer pairs.
+    An abstract base class for oligo-like objects, such as individual primers or primer pairs.
 
     Attributes:
         name: an optional name to use for the primer
@@ -67,12 +66,12 @@ def __post_init__(self) -> None:
     @property
     @abstractmethod
     def span(self) -> Span:
-        """Returns the mapping of the primer-like object to a genome."""
+        """Returns the mapping of the oligo-like object to a genome."""
 
     @property
     @abstractmethod
     def bases(self) -> Optional[str]:
-        """Returns the base sequence of the primer-like object."""
+        """Returns the base sequence of the oligo-like object."""
 
     @property
     def percent_gc_content(self) -> float:
@@ -88,7 +87,7 @@ def percent_gc_content(self) -> float:
     @property
     def id(self) -> str:
         """
-        Returns the identifier for the primer-like object. This shall be the `name`
+        Returns the identifier for the oligo-like object. This shall be the `name`
         if one exists, otherwise a generated value based on the location of the object.
         """
         if self.name is not None:
@@ -98,7 +97,7 @@ def id(self) -> str:
 
     @property
     def location_string(self) -> str:
-        """Returns a string representation of the location of the primer-like object."""
+        """Returns a string representation of the location of the oligo-like object."""
         return (
             f"{self.span.refname}_{self.span.start}_"
             + f"{self.span.end}_{self._strand_to_location_string()}"
@@ -107,7 +106,7 @@ def location_string(self) -> str:
     @abstractmethod
     def to_bed12_row(self) -> str:
         """
-        Formats the primer-like into 12 tab-separated fields matching the BED 12-column spec.
+        Formats the oligo-like object into 12 tab-separated fields matching the BED 12-column spec.
         See: https://genome.ucsc.edu/FAQ/FAQformat.html#format1
         """
 
@@ -124,7 +123,3 @@ def _strand_to_location_string(self) -> str:
             case _:  # pragma: no cover
                 # Not calculating coverage on this line as it should be impossible to reach
                 assert_never(f"Encountered unhandled Strand value: {self.span.strand}")
-
-
-PrimerLikeType = TypeVar("PrimerLikeType", bound=PrimerLike)
-"""Type variable for classes generic over `PrimerLike` types."""
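`OligoLike` leaves `span` and `bases` abstract and derives `location_string` and `id` from them, with `id` preferring `name` and otherwise falling back to a location-based value. A stripped-down sketch of that pattern follows. `SpanSketch`, `OligoLikeSketch`, and `ProbeSketch` are hypothetical, and two details are assumptions because this diff does not show them: the `F`/`R` strand suffix produced by `_strand_to_location_string()`, and that the fallback for `id` is exactly `location_string`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanSketch:
    """Hypothetical minimal span: 1-based, inclusive coordinates plus a strand."""

    refname: str
    start: int
    end: int
    strand: str = "+"  # "+" (forward) or "-" (reverse)


class OligoLikeSketch(ABC):
    """Subclasses provide `span` and `bases`; identifiers are derived from them."""

    name: Optional[str] = None

    @property
    @abstractmethod
    def span(self) -> SpanSketch:
        """Mapping of the object to the genome."""

    @property
    @abstractmethod
    def bases(self) -> Optional[str]:
        """Base sequence, or None if not available."""

    @property
    def location_string(self) -> str:
        # Assumption: forward strand -> "F", reverse strand -> "R".
        suffix = "F" if self.span.strand == "+" else "R"
        return f"{self.span.refname}_{self.span.start}_{self.span.end}_{suffix}"

    @property
    def id(self) -> str:
        # Prefer an explicit name; otherwise fall back to the location-derived string.
        return self.name if self.name is not None else self.location_string


class ProbeSketch(OligoLikeSketch):
    """Hypothetical concrete subclass that stores its span and bases privately."""

    def __init__(self, span: SpanSketch, bases: Optional[str], name: Optional[str] = None) -> None:
        self._span = span
        self._bases = bases
        self.name = name

    @property
    def span(self) -> SpanSketch:
        return self._span

    @property
    def bases(self) -> Optional[str]:
        return self._bases


unnamed = ProbeSketch(SpanSketch("chr1", 100, 119, strand="-"), "ACGT" * 5)
named = ProbeSketch(SpanSketch("chr1", 1, 20), None, name="probe_1")
print(unnamed.id)
print(named.id)
```

The subclass here implements the abstract properties explicitly; `Oligo` instead declares `span` and `bases` as dataclass fields.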
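Both `OligoLike.to_bed12_row()` and its `Oligo` implementation target the 12-column BED layout, with a fixed score of 500, a grey `100,100,100` color, and a single block spanning the whole oligo. The sketch below reproduces that column order with the standard library. `bed12_row()` is hypothetical, and the `start - 1` conversion assumes 1-based, inclusive span coordinates (consistent with the `Oligo` docstring example, where `Span(start=1, end=20)` has length 20), which `get_bedlike_coords()` presumably turns into BED's 0-based, half-open form.

```python
def bed12_row(refname: str, start: int, end: int, strand: str, name: str) -> str:
    """Format one single-block feature as a 12-column BED line.

    Assumes `start`/`end` are 1-based and inclusive; BED itself is 0-based and half-open.
    """
    if start < 1 or end < start:
        raise ValueError(f"invalid span {refname}:{start}-{end}")
    bed_start, bed_end = start - 1, end  # 1-based closed -> 0-based half-open
    length = end - start + 1
    fields = [
        refname,  # chrom
        bed_start,  # chromStart
        bed_end,  # chromEnd
        name,  # name
        500,  # score
        strand,  # strand
        bed_start,  # thickStart
        bed_end,  # thickEnd
        "100,100,100",  # itemRgb
        1,  # blockCount
        length,  # blockSizes
        0,  # blockStarts (relative to chromStart)
    ]
    return "\t".join(map(str, fields))


row = bed12_row("chr1", 1, 20, "+", "chr1_1_20_F")
print(row.split("\t"))
```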

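`Oligo.compare()` delegates to `Span.compare()` with a `SequenceDictionary`, so references are ranked by their order in the dictionary rather than alphabetically. The sketch below shows that idea with `functools.cmp_to_key`. The tuple representation, `compare_spans()`, and the start-then-end tie-break are hypothetical simplifications, since `Span.compare()` itself is not part of this diff.

```python
from functools import cmp_to_key

# Hypothetical span representation: (refname, start, end).
SpanTuple = tuple[str, int, int]


def compare_spans(this: SpanTuple, that: SpanTuple, ref_rank: dict[str, int]) -> int:
    """Return -1, 0, or 1, ordering by reference rank, then start, then end."""
    this_key = (ref_rank[this[0]], this[1], this[2])
    that_key = (ref_rank[that[0]], that[1], that[2])
    return (this_key > that_key) - (this_key < that_key)


# Reference order as it might appear in a sequence dictionary for a FASTA.
ref_order = ["chr1", "chr2", "chr10"]
ref_rank = {name: index for index, name in enumerate(ref_order)}

spans = [("chr10", 5, 25), ("chr2", 100, 120), ("chr1", 50, 70), ("chr1", 10, 30)]
ordered = sorted(spans, key=cmp_to_key(lambda a, b: compare_spans(a, b, ref_rank)))
print(ordered)
```

A plain `sorted(spans)` would place `chr10` before `chr2`; ranking through the dictionary keeps the sequence dictionary's order instead.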