Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc: Expand the documentation for OffTargetDetector #46

Merged
merged 6 commits into from
Sep 24, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
128 changes: 105 additions & 23 deletions prymer/offtarget/offtarget_detector.py
Original file line number Diff line number Diff line change
Expand Up @@ -102,11 +102,14 @@ class OffTargetResult:
"""Information obtained by running a single primer pair through the off-target detector.

Attributes:
primer_pair: the primer submitted
passes: True if the primer pair passes all checks, False otherwise
primer_pair: the primer pair submitted
passes: True if neither primer exceeds the specified maximum number of hits, and the primer
pair would generate an acceptable number of amplicons in the reference genome (i.e.
`min_primer_pair_hits <= num_amplicons <= max_primer_pair_hits`). False otherwise.
cached: True if this result is part of a cache, False otherwise. This is useful for testing
spans: the set of mappings of the primer pair to the genome or an empty list if mappings
were not retained
spans: The set of mappings of the primer pair to the genome (i.e., the region spanned by the
inferred amplicon). This list will be empty if the generating [[OffTargetDetector]] was
constructed with `keep_spans=False`.
left_primer_spans: the list of mappings for the left primer, independent of the pair
mappings, or an empty list
right_primer_spans: the list of mappings for the right primer, independent of the pair
Expand All @@ -122,15 +125,28 @@ class OffTargetResult:


class OffTargetDetector:
"""A class for detecting off-target mappings of primers and primer pairs that uses a custom
version of "bwa aln".
"""
Detect off-target mappings of primers and primer pairs.

The off-target detection is faster and more sensitive than traditional isPCR and in addition can
correctly detect primers that are repetitive and contain many thousands or millions of mappings
to the genome.
`OffTargetDetector` uses a [custom, interactive
version](https://github.com/fulcrumgenomics/bwa-aln-interactive/) of `bwa aln` to perform
off-target detection. This approach is faster and more sensitive than traditional isPCR and in
addition can correctly detect primers that are repetitive and contain many thousands or millions
of mappings to the genome.

Note that while this class invokes BWA with multiple threads, it is not itself thread-safe.
Only one thread at a time should invoke methods on this class without external synchronization.

Off-target detection may be performed for individual primers and/or primer pairs.

To detect off-target hits for individual primers, use the `OffTargetDetector.filters()` method,
which will remove any primers that have more than the specified number of maximum hits against
the reference.

To detect off-target amplification of primer pairs, use the `OffTargetDetector.check_one()` or
`OffTargetDetector.check_all()` methods. These methods screen the individual primers in each
pair for off-target hits, and verify that the possible amplicons inferred by the primers'
alignments does not exceed the specified maximum number of primer pair hits.
"""

def __init__(
Expand All @@ -150,6 +166,19 @@ def __init__(
executable: str | Path = "bwa",
) -> None:
"""
Initialize an [[OffTargetDetector]].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the primer pair related parameters be given defaults? I don't need them for checking single primers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been wondering something similar - if it's possible to provide defaults for more of these parameters, or if the features should be decoupled

cc @tfenne @emmcauley


This class accepts a large number of parameters to control the behavior of the off-target
detection. The parameters are used in the various aspects of off-target detection as
follows:

1. Alignment (off-target hit detection): `ref`, `executable`, `threads`,
`three_prime_region_length`, `max_mismatches_in_three_prime_region`, `max_mismatches`,
and `max_primer_hits`.
2. Filtering of individual primers: `max_primer_hits`.
3. Checking of primer pairs: `max_primer_hits`, `min_primer_pair_hits`,
`max_primer_pair_hits`, and `max_amplicon_size`.
Comment on lines +175 to +180
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it'd be helpful to have the parameters grouped by the function(s) that use them. I feel like there could be a better way to organize/present this, lmk if you have any suggestions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is helpful. I'm happy to come back later and expand this further too.


Args:
ref: the reference genome fasta file (must be indexed with BWA)
max_primer_hits: the maximum number of hits an individual primer can have in the genome
Expand All @@ -173,10 +202,10 @@ def __init__(
max_amplicon_size: the maximum amplicon size to consider amplifiable
cache_results: if True, cache results for faster re-querying
threads: the number of threads to use when invoking bwa
keep_spans: if True, [[OffTargetResult]] objects will have amplicon spans
keep_spans: if True, [[OffTargetResult]] objects will be reported with amplicon spans
populated, otherwise not
keep_primer_spans: if True, [[OffTargetResult]] objects will have left and right
primer spans
keep_primer_spans: if True, [[OffTargetResult]] objects will be reported with left and
right primer spans
executable: string or Path representation of the `bwa` executable path
"""
self._primer_cache: dict[str, BwaResult] = {}
Expand All @@ -200,25 +229,61 @@ def __init__(
self._keep_primer_spans: bool = keep_primer_spans

def filter(self, primers: list[PrimerType]) -> list[PrimerType]:
"""Filters an iterable of Primers to return only those that have less than
`max_primer_hits` mappings to the genome."""
"""
Remove primers that have more than `max_primer_hits` mappings to the genome.

This method maps each primer to the specified reference with `bwa aln` to search for
off-target hits. Note that when the reference includes the sequence from which the primers
were designed, the on-target hit will be included in the hit count. `max_primer_hits` should
be set to at least 1 in this case.

Args:
primers: A list of primers to filter.

Returns:
The input primers, filtered to remove any primers with more than `max_primer_hits`
mappings to the genome.
"""
results: dict[str, BwaResult] = self.mappings_of(primers)
return [
primer for primer in primers if results[primer.bases].hit_count <= self._max_primer_hits
]

def check_one(self, primer_pair: PrimerPair) -> OffTargetResult:
"""Checks a PrimerPair for off-target sites in the genome at which it might amplify."""
"""
Check an individual primer pair for off-target amplification.

See `OffTargetDetector.check_all()` for details.

Args:
primer_pair: The primer pair to check.

Returns:
The results of the off-target detection.
See `OffTargetResult` for details regarding the attributes included in the result.
"""
result: dict[PrimerPair, OffTargetResult] = self.check_all([primer_pair])
return result[primer_pair]

def check_all(self, primer_pairs: list[PrimerPair]) -> dict[PrimerPair, OffTargetResult]:
"""Checks a collection of primer pairs for off-target sites, returning a dictionary of
`PrimerPair`s to `OffTargetResult`.
"""
Check a collection of primer pairs for off-target amplification.

This method maps each primer to the specified reference with `bwa aln` to search for
off-target hits. Possible amplicons are identified from the pairwise combinations of these
alignments, up to the specified `max_amplicon_size`.

Primer pairs are marked as passing if both of the following are true:
1. Each primer has no more than `max_primer_hits` alignments to the genome.
2. The size of the set of possible amplicons does not exceed `max_primer_pair_hits`, and is
at least `min_primer_pair_hits`.

Args:
primer_pairs: The primer pairs to check.

Returns:
a Map containing all given primer pairs as keys with the values being the result of
off-target checking.
A dictionary mapping each checked primer pair to its off-target detection results.
See `OffTargetResult` for details regarding the attributes included in each result.
"""

primer_pair_results: dict[PrimerPair, OffTargetResult] = {}
Expand Down Expand Up @@ -294,10 +359,27 @@ def _build_off_target_result(
return result

def mappings_of(self, primers: list[PrimerType]) -> dict[str, BwaResult]:
"""Function to take a set of primers and map any that are not cached, and return mappings
for all of them. Note: the genomics sequence of the returned mappings are on the opposite
strand of that of the strand of the primer. I.e. we map the complementary bases (reversed)
to that of the primer."""
"""
Map primers to the reference genome using `bwa aln`.

Alignment results may be optionally cached for efficiency. Set `cache_results` to `True`
when instantiating the [[OffTargetDetector]] to enable caching.

Any primers without cached results, or all primers when `cache_results` is `False`, will be
mapped to the reference genome using `bwa aln` and the specified alignment parameters.

**Note**: The reverse complement of each primer sequence is used for mapping. The `query`
sequence in each `BwaResult` will match the input primer sequence, as will the sequences
used as keys in the output dictionary. However, the coordinates reported in each `BwaHit`
associated with a result will correspond to complementary sequence.
Comment on lines +371 to +374
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this sufficiently clear?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, but I think this function should be private.


Args:
primers: A list of primers to map.

Returns:
A dictionary mapping each primer's sequence to its alignment results.
See `BwaResult` for details regarding the attributes included in each result.
"""

primers_to_map: list[PrimerType]
if not self._cache_results:
Expand Down
Loading