-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: use bwa-aln-interactive and upgrade developer's docs #32
Conversation
Local and remote tests hang forever when you install My full stack trace is: Stack Trace<doctest prymer.offtarget.bwa[3]>:1:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <prymer.offtarget.bwa.BwaAlnInteractive object at 0x13e8920d0>
ref = PosixPath('tests/offtarget/data/miniref.fa'), max_hits = 1, executable = 'bwa-aln-interactive'
max_mismatches = 3, max_mismatches_in_seed = 3, max_gap_opens = 0, max_gap_extensions = -1, seed_length = 20
reverse_complement = False, include_alt_hits = False, threads = 16
def __init__(
self,
ref: Path,
max_hits: int,
executable: str | Path = "bwa-aln-interactive",
max_mismatches: int = 3,
max_mismatches_in_seed: int = 3,
max_gap_opens: int = 0,
max_gap_extensions: int = -1,
seed_length: int = 20,
reverse_complement: bool = False,
include_alt_hits: bool = False,
threads: Optional[int] = None,
):
"""
Args:
ref: the path to the reference FASTA, which must be indexed with bwa.
max_hits: the maximum number of hits to report - if more than this number of seed hits
are found, report only the count and not each hit.
executable: string or Path representation of the `bwa-aln-interactive` executable path
max_mismatches: the maximum number of mismatches allowed in the full query sequence
max_mismatches_in_seed: the maximum number of mismatches allowed in the seed region
max_gap_opens: the maximum number of gap opens allowed in the full query sequence
max_gap_extensions: the maximum number of gap extensions allowed in the full query
sequence
seed_length: the length of the seed region
reverse_complement: reverse complement each query sequence before alignment
include_alt_hits: if true include hits to references with names ending in _alt,
otherwise do not include them.
threads: the number of threads to use. If `None`, use all available CPUs.
"""
threads = os.cpu_count() if threads is None else threads
executable_path = ExecutableRunner.validate_executable_path(executable=executable)
self.reverse_complement: bool = reverse_complement
self.include_alt_hits: bool = include_alt_hits
self.max_hits: int = max_hits
missing_aux_paths = []
for aux_ext in BWA_AUX_EXTENSIONS:
aux_path = Path(f"{ref}{aux_ext}")
if not aux_path.exists():
missing_aux_paths.append(aux_path)
if len(missing_aux_paths) > 0:
message: str
if len(missing_aux_paths) > 1:
message = "BWA index files do not exist:\n\t"
else:
message = "BWA index file does not exist:\n\t"
message += "\t\n".join(f"{p}" for p in missing_aux_paths)
raise FileNotFoundError(f"{message}\nPlease index with: `{executable_path} index {ref}`")
# -N = non-iterative mode: search for all n-difference hits (slooow)
# -S = output SAM (run samse)
# -Z = interactive mode (no input buffer and force processing with empty lines between recs)
command: list[str] = [
f"{executable_path}",
"aln",
"-t",
f"{threads}",
"-n",
f"{max_mismatches}",
"-o",
f"{max_gap_opens}",
"-e",
f"{max_gap_extensions}",
"-l",
f"{seed_length}",
"-k",
f"{max_mismatches_in_seed}",
"-X",
f"{max_hits}",
"-N",
"-S",
"-Z",
"-D",
f"{ref}",
"/dev/stdin",
]
super().__init__(command=command)
# HACK ALERT
# This is a hack. By trial and error, pysam.AlignmentFile() will block reading unless
# there's at least a few records due to htslib wanting to read a few records for format
# auto-detection. Lame. So a hundred queries are sent to the aligner to align enable the
# htslib auto-detection to complete, and for us to be able to read using pysam.
num_warmup: int = 100
for i in range(num_warmup):
query = Query(id=f"ignoreme:{i}", bases="A" * 100)
fastq_str = query.to_fastq(reverse_complement=self.reverse_complement)
self._subprocess.stdin.write(fastq_str)
self.__signal_bwa() # forces the input to be sent to the underlying process.
> self._reader = sam.reader(path=self._subprocess.stdout, file_type=sam.SamFileType.SAM)
prymer/offtarget/bwa.py:307:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
path = <_io.TextIOWrapper name=20 encoding='UTF-8'>, file_type = <SamFileType.SAM: ('', '.sam')>, unmapped = False
def reader(
path: SamPath, file_type: Optional[SamFileType] = None, unmapped: bool = False
) -> SamFile:
"""Opens a SAM/BAM/CRAM for reading.
To read from standard input, provide any of `"-"`, `"stdin"`, or `"/dev/stdin"` as the input
`path`.
Args:
path: a file handle or path to the SAM/BAM/CRAM to read or write.
file_type: the file type to assume when opening the file. If None, then the file
type will be auto-detected.
unmapped: True if the file is unmapped and has no sequence dictionary, False otherwise.
"""
> return _pysam_open(path=path, open_for_reading=True, file_type=file_type, unmapped=unmapped)
../../../.miniforge-x86/envs/prymer/lib/python3.12/site-packages/fgpyo/sam/__init__.py:309:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
path = <_io.TextIOWrapper name=20 encoding='UTF-8'>, open_for_reading = True
file_type = <SamFileType.SAM: ('', '.sam')>, unmapped = False, kwargs = {'mode': 'r'}
def _pysam_open(
path: SamPath,
open_for_reading: bool,
file_type: Optional[SamFileType] = None,
unmapped: bool = False,
**kwargs: Any,
) -> SamFile:
"""Opens a SAM/BAM/CRAM for reading or writing.
This function permits reading from standard input and writing to standard output. The specified
path may be the UNIX conventional `"-"`, the more explicit `"stdin"` or `"stdout"`, or an
absolute path to either of the standard streams `"/dev/stdin"` or `"/dev/stdout"`.
When writing to standard output, the file type must be specified.
Args:
path: a file handle or path to the SAM/BAM/CRAM to read or write.
open_for_reading: True to open for reading, false otherwise.
file_type: the file type to assume when opening the file. If None, then the file type
will be auto-detected for reading and must be a path-like object for writing.
unmapped: True if the file is unmapped and has no sequence dictionary, False otherwise.
kwargs: any keyword arguments to be passed to
`pysam.AlignmentFile`; may not include "mode".
"""
if isinstance(path, (str, Path)):
if str(path) in _STDIN_PATHS and open_for_reading:
path = sys.stdin
elif str(path) in _STDOUT_PATHS and not open_for_reading:
assert file_type is not None, "Must specify file_type when writing to standard output"
path = sys.stdout
else:
file_type = file_type or SamFileType.from_path(path)
path = str(path)
elif not isinstance(path, _IOClasses): # type: ignore
open_type = "reading" if open_for_reading else "writing"
raise TypeError(f"Cannot open '{type(path)}' for {open_type}.")
if file_type is None and not open_for_reading:
raise ValueError("file_type must be given when writing to a file-like object")
# file_type must be set when writing, so if file_type is None, then we must be opening it
# for reading. Hence, only set mode in kwargs to pysam when file_type is set and when
# writing since we can let pysam auto-recognize the file type when reading. See discussion:
# https://github.com/pysam-developers/pysam/issues/655
if file_type is not None:
kwargs["mode"] = "r" if open_for_reading else "w" + file_type.mode
else:
assert open_for_reading, "Bug: file_type was None but open_for_reading was False"
if unmapped and open_for_reading:
kwargs["check_sq"] = False
# Open it alignment file, suppressing stderr in case index files are older than SAM file
with fgpyo.io.suppress_stderr():
> alignment_file = pysam.AlignmentFile(path, **kwargs)
E KeyboardInterrupt
../../../.miniforge-x86/envs/prymer/lib/python3.12/site-packages/fgpyo/sam/__init__.py:290: KeyboardInterrupt |
e0b2797
to
e69b48f
Compare
EDIT: I solved the issue. See later comments. I've narrowed the issue down to this "hack" which does not appear to work for me: prymer/prymer/offtarget/bwa.py Lines 287 to 300 in e942368
Increasing the records for warmup doesn't resolve the issue. This is difficult to debug because stderr is suppressed for everything (bwa, htslib, etc.). Forcing flush/unbufferd IO with Although it initially appears pysam AlignmentFile is to blame (hence the hack), I have discovered the issue is not actually with pysam at all. If I remove the dependency on Removing the "hack" is easy and I suggest we do it anyway. But it exposes an issue with |
I figured it out. The GitHub release of As a side quest, I was able to remove the I wish |
Can you open a follow-up issue for this? I agree |
You have one open for Primer3: I wrote one up for bwa: |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #32 +/- ##
=======================================
Coverage 96.76% 96.77%
=======================================
Files 26 26
Lines 1701 1703 +2
Branches 328 327 -1
=======================================
+ Hits 1646 1648 +2
Misses 29 29
Partials 26 26 ☔ View full report in Codecov by Sentry. |
prymer/offtarget/bwa.py
Outdated
self._subprocess.stdin.flush() | ||
self._subprocess.stdin.write("\n" * 16) | ||
self._subprocess.stdin.flush() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Bioconda built executable requires more newlines before emitting an alignment.
Why the executable needs more than 2 newlines is currently a mystery.
@msto's tested the developer docs and tests passed. I think this is ready for review! |
@coderabbitai full review |
✅ Actions performedFull review triggered. |
WalkthroughThe pull request introduces several modifications across multiple files to enhance the project's configuration and documentation. Key changes include updates to the GitHub Actions workflow for testing, revisions to the Changes
Assessment against linked issues
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (13)
docs/index.md (1)
7-7
: LGTM! Consider updating the section title for clarity.The update to the installation documentation link is appropriate and aligns with the PR objectives of enhancing developer documentation. This change will help users find the expanded information more easily.
Consider updating the section title to reflect the expanded content:
-* [Installation](installation-and-developers-documentation.md) +* [Installation & Developer Guide](installation-and-developers-documentation.md)This change would make it immediately clear to users that the section now includes both installation instructions and developer documentation.
.github/workflows/tests.yml (1)
50-52
: Excellent updates to testing and documentation building!The simplification of the test command to
poetry run pytest
is a good improvement, making the workflow more straightforward.The addition of a documentation build test with
poetry run mkdocs build --strict
is a great practice. It ensures that the documentation can be built successfully and catches potential issues early. The use of the--strict
flag is particularly commendable as it treats warnings as errors, promoting high-quality documentation.Suggestion for improvement:
Consider adding the--cov
flag to the pytest command to generate coverage reports. This would align with the subsequent step that uploads coverage reports to Codecov.Example:
run: poetry run pytest --cov=./ --cov-report=xmlThis change would ensure that coverage data is generated during the test run.
Also applies to: 59-61
.gitignore (2)
Line range hint
88-116
: Consider uncommenting specific environment management patternsThe patterns for pyenv, pipenv, poetry, and pdm are commented out. Depending on your project's needs, you might want to uncomment some of these.
For example, if you're using Poetry, you might want to uncomment:
-#poetry.lock +poetry.lockThis ensures consistent dependency versions across different development environments.
Line range hint
162-167
: Consider uncommenting PyCharm-specific ignore patternThe PyCharm-specific ignore pattern is commented out. If your team uses PyCharm, you might want to uncomment this line.
If PyCharm is commonly used in your development environment, consider uncommenting:
-#.idea/ +.idea/This will ignore PyCharm project settings, which are typically user-specific.
README.md (2)
20-34
: Great improvements to installation and download information!The updated badges for Bioconda and PyPI installation, along with the addition of download statistics, provide valuable information to users and potential contributors. This enhances the project's credibility and makes it easier for users to choose their preferred installation method.
Consider adding a brief explanation in the README about the significance of Bioconda for bioinformatics projects, as not all users may be familiar with it.
48-62
: Great improvements to the installation instructions!The new "Recommended Installation" section provides clear and concise instructions, addressing the PR objectives. The explicit mention of prerequisites and the use of Bioconda for installation simplify the process for users.
Consider adding a brief note about alternative installation methods (e.g., PyPI) for users who may not prefer or have access to Bioconda. This could be a simple line like:
"For alternative installation methods, please refer to the [developer's instructions][developers-instructions-link]."docs/installation-and-developers-documentation.md (4)
13-39
: LGTM: Comprehensive development setup instructions.The development setup instructions are clear, comprehensive, and well-organized. They cover all necessary steps from environment setup to package installation.
Consider adding a note about potential troubleshooting steps or common issues that developers might encounter during the setup process.
41-54
: LGTM: Clear instructions for testing and linting.The build checking section provides clear instructions for running tests, including mypy checks, ruff checks, and unit tests. The separation of formatting and linting commands is appropriate.
Consider adding a brief explanation of what each tool (mypy, ruff, pytest) does for developers who might be unfamiliar with them.
64-73
: LGTM: Well-documented release process.The release process is well-documented and follows best practices for version control and release management. The steps are clear and easy to follow.
Consider adding a note about the importance of updating the changelog before creating a release, as this is often overlooked but crucial for maintaining a good release history.
74-92
: LGTM: Comprehensive release automation and versioning explanation.The explanation of Semantic Versioning and the description of the automated release process are clear and comprehensive. The note about editing the changelog is important and well-placed.
Regarding the static analysis hints:
- The phrase "In brief" (line 76) is concise and appropriate in this context. No change needed.
- Consider hyphenating "backwards-compatible" in lines 79 and 80 for improved readability:
-* `MINOR` version when you add functionality in a backwards compatible manner -* `PATCH` version when you make backwards compatible bug fixes +* `MINOR` version when you add functionality in a backwards-compatible manner +* `PATCH` version when you make backwards-compatible bug fixes🧰 Tools
🪛 LanguageTool
[style] ~76-~76: ‘In brief’ might be wordy. Consider a shorter alternative.
Context: ...tic Versioning](https://semver.org/). > In brief: > > *MAJOR
version when you make i...(EN_WORDINESS_PREMIUM_IN_BRIEF)
[uncategorized] ~79-~79: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...version when you add functionality in a backwards compatible manner > *PATCH
version when you mak...(EN_COMPOUND_ADJECTIVE_INTERNAL)
[uncategorized] ~80-~80: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...anner > *PATCH
version when you make backwards compatible bug fixes GitHub Actions will take car...(EN_COMPOUND_ADJECTIVE_INTERNAL)
prymer/offtarget/bwa.py (3)
18-23
: Excellent addition of installation information.The new documentation clearly explains the requirement for
bwa-aln-interactive
and provides helpful links for installation. This improves the usability of the module.Consider adding a brief note about the purpose or advantages of using
bwa-aln-interactive
over standardbwa
to provide context for why this specific version is required.
203-206
: Good addition of reference links in the class docstring.The added links to the GitHub repository and Bioconda package in the
BwaAlnInteractive
class docstring provide valuable context and resources for users.Consider using a more concise format for the links, such as inline links, to improve readability. For example:
See [bwa-aln-interactive on GitHub](https://github.com/fulcrumgenomics/bwa-aln-interactive) and its [Bioconda package](https://bioconda.github.io/recipes/bwa-aln-interactive/README.html).
308-313
: Improved documentation for BWA signaling, but further clarification needed.The added comments provide helpful context about the need for multiple newlines when signaling BWA. However, the exact reason for using 16 newlines and the nature of the platform differences remain unclear.
To address the previous review comment and improve clarity:
- Consider adding a brief explanation of which platforms were tested and how they behaved differently.
- If possible, provide more specific information about why 16 newlines were chosen and how this number was determined.
- Consider adding a TODO comment to investigate and document the root cause of this platform-dependent behavior in the future.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (10)
- .github/workflows/tests.yml (1 hunks)
- .gitignore (1 hunks)
- README.md (1 hunks)
- docs/index.md (1 hunks)
- docs/installation-and-developers-documentation.md (1 hunks)
- docs/installation.md (0 hunks)
- prymer.yml (1 hunks)
- prymer/offtarget/bwa.py (6 hunks)
- prymer/offtarget/offtarget_detector.py (2 hunks)
- pyproject.toml (1 hunks)
💤 Files with no reviewable changes (1)
- docs/installation.md
🧰 Additional context used
🪛 LanguageTool
README.md
[style] ~37-~37: Using many exclamation marks might seem excessive (in this case: 4 exclamation marks for a text that’s 706 characters long)
Context: ...python-tests-badge]][python-tests-link] [![publish prymer][publish-prymer-badge]][...(EN_EXCESSIVE_EXCLAMATION)
docs/installation-and-developers-documentation.md
[style] ~76-~76: ‘In brief’ might be wordy. Consider a shorter alternative.
Context: ...tic Versioning](https://semver.org/). > In brief: > > *MAJOR
version when you make i...(EN_WORDINESS_PREMIUM_IN_BRIEF)
[uncategorized] ~79-~79: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...version when you add functionality in a backwards compatible manner > *PATCH
version when you mak...(EN_COMPOUND_ADJECTIVE_INTERNAL)
[uncategorized] ~80-~80: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...anner > *PATCH
version when you make backwards compatible bug fixes GitHub Actions will take car...(EN_COMPOUND_ADJECTIVE_INTERNAL)
🔇 Additional comments (31)
prymer.yml (2)
6-8
: Approve new dependencies and channel specifications.The new dependencies align well with the project objectives:
- The addition of
bwa-aln-interactive
from Bioconda addresses the integration requirement mentioned in the PR objectives.- The explicit channel specifications for each package (bioconda, conda-forge) is a good practice for reproducibility.
- The Python dependency is now explicitly from conda-forge, which is appropriate.
Let's verify if the specified version of bwa-aln-interactive is the latest stable version:
#!/bin/bash # Description: Check if bwa-aln-interactive 0.7.18 is the latest version on Bioconda # Test: Search for the latest version of bwa-aln-interactive on Bioconda conda search -c bioconda bwa-aln-interactive | tail -n 1If a newer version is available, consider updating to the latest stable release.
6-8
: Verify the removal of development dependencies.Several development and testing tools have been removed from the conda environment file:
- ruff (linter)
- mypy (static type checker)
- pytest and pytest-workflow (testing frameworks)
- poetry and pyproject_hooks (dependency management)
While this simplifies the conda environment, it raises some concerns:
- How will linting, type checking, and testing be performed now?
- Is there an alternative method for managing development dependencies?
To ensure these tools are still available for development, let's check if they're specified elsewhere:
If these development tools are not specified elsewhere, consider:
- Adding them to a separate development-specific conda environment file.
- Specifying them in a
pyproject.toml
file if using Poetry for dependency management.- Documenting the development setup process if these tools are intended to be installed globally or through other means.
✅ Verification successful
Development dependencies are correctly managed in
pyproject.toml
.The removed development and testing tools are specified under
[tool.poetry.group.dev.dependencies]
inpyproject.toml
, ensuring they remain available for development purposes.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for development dependencies in pyproject.toml or other config files # Test 1: Check if pyproject.toml exists and contains dev dependencies if [ -f pyproject.toml ]; then echo "Contents of pyproject.toml:" cat pyproject.toml fi # Test 2: Check for any other potential config files that might specify dev dependencies echo "Other potential config files:" find . -maxdepth 1 -type f \( -name "*.toml" -o -name "*.ini" -o -name "*.cfg" \) -not -name "pyproject.toml"Length of output: 226
Script:
#!/bin/bash # Description: Check for development dependencies in common config files # Test 1: Check if pyproject.toml exists and contains dev dependencies if [ -f pyproject.toml ]; then echo "Contents of pyproject.toml:" cat pyproject.toml else echo "pyproject.toml not found." fi # Test 2: Check for requirements.txt and list dev dependencies if present if [ -f requirements.txt ]; then echo "Contents of requirements.txt:" cat requirements.txt else echo "requirements.txt not found." fi # Test 3: Check for setup.py and list development dependencies if [ -f setup.py ]; then echo "Contents of setup.py:" cat setup.py else echo "setup.py not found." fi # Test 4: Search for other potential config files that might specify dev dependencies echo "Other potential config files containing 'dev' or 'development':" grep -rilE 'dev(dependencies)?' .Length of output: 5349
.github/workflows/tests.yml (4)
28-32
: Excellent addition of Poetry installation step!This new step explicitly manages the Poetry installation, ensuring consistency across environments by using a specific version. The use of parallel installation is a nice touch for potential performance improvements.
This change effectively addresses the previous suggestion about managing Poetry in its own environment, albeit using a different approach than the suggested
pipx
.
46-48
: Great simplification of dependency installation!The change to use
poetry install
for dependency installation is a positive improvement. This approach is the standard and recommended way to install dependencies for Poetry-based projects, ensuring that all dependencies specified in the pyproject.toml file are correctly installed.This simplification makes the workflow more straightforward and easier to maintain.
Line range hint
1-61
: Overall excellent improvements to the workflow!The changes made to this workflow file significantly improve its efficiency and align it with best practices:
- The addition of a dedicated Poetry installation step ensures consistent dependency management.
- Simplification of dependency installation and testing commands improves readability and maintainability.
- The inclusion of a documentation build test enhances the overall quality assurance process.
These modifications collectively contribute to a more robust and reliable CI/CD pipeline. The workflow now better handles dependency management, testing, and documentation building, which should lead to improved development and deployment processes.
Great job on these updates!
38-38
: Verify the intention behind channel reordering.The channels have been reordered from "conda-forge,bioconda" to "bioconda,conda-forge". This change may affect package resolution priorities, potentially prioritizing bioinformatics-specific packages from Bioconda over more general packages from conda-forge.
Please confirm if this reordering is intentional and consider any potential impacts on package versions or conflicts that might arise from this change.
✅ Verification successful
Channel reordering verified as intentional.
The channels have been reordered to prioritize "bioconda" over "conda-forge" in both the configuration file and documentation. This ensures that bioinformatics-specific packages from Bioconda are given priority, aligning with the project's dependency requirements.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for any conda-related issues in the repository # Test: Search for conda environment files and their channel specifications echo "Searching for conda environment files and their channel specifications:" rg --type yaml 'channels:' -A 5 # Test: Look for any explicit mentions of conda channels in Python files echo "Checking for explicit mentions of conda channels in Python files:" rg --type python 'conda[-_]forge|bioconda' # Test: Search for any conda-related issues or discussions in documentation echo "Searching for conda-related discussions in documentation:" rg --type md 'conda[-_]forge|bioconda'Length of output: 1679
.gitignore (11)
1-2
: LGTM: General system and IDE filesThe inclusion of
.DS_Store
and.vscode/
is appropriate for ignoring system-specific and IDE files.
Line range hint
4-13
: LGTM: Python-specific patternsThis section correctly ignores common Python bytecode and compiled files.
Line range hint
15-31
: LGTM: Distribution and packagingComprehensive coverage of distribution and packaging artifacts. This section will help keep the repository clean of build-related files.
Line range hint
33-37
: LGTM: PyInstallerAppropriate patterns for ignoring PyInstaller build artifacts.
Line range hint
39-41
: LGTM: Installer logsCorrectly ignores pip-related log files.
Line range hint
43-57
: LGTM: Testing and coverageComprehensive coverage of test-related files and directories for various testing frameworks.
Line range hint
59-61
: LGTM: TranslationsAppropriate patterns for ignoring translation files.
Line range hint
63-70
: LGTM: Web frameworks (Django, Flask)Correctly ignores framework-specific files and directories.
Line range hint
72-86
: LGTM: Other tools (Scrapy, Sphinx, PyBuilder, Jupyter, IPython)Good coverage of various Python-related tools and their artifacts.
Line range hint
118-160
: LGTM: Other Python-related tools and frameworksComprehensive coverage of various Python tools, including Celery, SageMath, virtual environments, and type checkers.
Line range hint
1-167
: Overall assessment: Comprehensive and well-structured .gitignore fileThis
.gitignore
file is comprehensive and well-structured, covering a wide range of Python development scenarios. It will effectively keep the repository clean of unnecessary files and build artifacts.A few minor suggestions:
- Consider uncommenting specific environment management patterns (e.g.,
poetry.lock
) if they apply to your project.- If PyCharm is used in your development environment, consider uncommenting the
.idea/
pattern.These minor adjustments can further tailor the
.gitignore
file to your project's specific needs.README.md (3)
5-18
: Excellent updates to the project badges!The addition of MyPy, Poetry, and Ruff badges significantly improves the README by showcasing the project's commitment to modern development practices. The reorganization enhances clarity and provides valuable information to potential contributors and users.
37-46
: Excellent addition of CI/CD and code quality badges!The new badges for tests, publishing, and code coverage provide crucial information about the project's quality and development processes. This transparency is valuable for both users and potential contributors, instilling confidence in the project's reliability and maintenance.
🧰 Tools
🪛 LanguageTool
[style] ~37-~37: Using many exclamation marks might seem excessive (in this case: 4 exclamation marks for a text that’s 706 characters long)
Context: ...python-tests-badge]][python-tests-link] [![publish prymer][publish-prymer-badge]][...(EN_EXCESSIVE_EXCLAMATION)
1-62
: Excellent updates to the README.md file!The changes made to this file significantly enhance its quality and usefulness:
- The updated badges provide a comprehensive overview of the project's quality, development practices, and installation options.
- The new "Recommended Installation" section addresses Issue Update README to refer to bioconda package for
bwa-aln-interactive
#30 by clearly referencing the Bioconda package and providing specific installation instructions.- The promotion of Bioconda for installation indirectly supports Issue Add
prymer
to bioconda #5 by encouraging users to use this bioinformatics-focused package manager.- The overall structure and content of the README now provide a more informative and user-friendly introduction to the project.
These improvements align well with the PR objectives and will greatly benefit both new and existing users of the project.
🧰 Tools
🪛 LanguageTool
[style] ~37-~37: Using many exclamation marks might seem excessive (in this case: 4 exclamation marks for a text that’s 706 characters long)
Context: ...python-tests-badge]][python-tests-link] [![publish prymer][publish-prymer-badge]][...(EN_EXCESSIVE_EXCLAMATION)
docs/installation-and-developers-documentation.md (3)
1-11
: LGTM: Clear and concise installation instructions.The recommended installation section provides clear and concise instructions for installing
prymer
using bioconda. It correctly mentions the dependencies and provides the exact command for installation.
56-62
: LGTM: Concise documentation building instructions.The documentation building section provides clear and concise instructions for building and serving the documentation using mkdocs. The combined command for building and serving is efficient.
1-92
: Overall: Excellent documentation for installation and development.This new documentation file provides comprehensive and well-structured guidance for installing, developing, and releasing the
prymer
package. It covers all necessary aspects, including environment setup, testing, documentation building, and the release process. The instructions are clear, concise, and follow best practices.A few minor suggestions have been made to further enhance the documentation, but overall, this is an excellent addition to the project that will greatly benefit both users and developers.
🧰 Tools
🪛 LanguageTool
[style] ~76-~76: ‘In brief’ might be wordy. Consider a shorter alternative.
Context: ...tic Versioning](https://semver.org/). > In brief: > > *MAJOR
version when you make i...(EN_WORDINESS_PREMIUM_IN_BRIEF)
[uncategorized] ~79-~79: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...version when you add functionality in a backwards compatible manner > *PATCH
version when you mak...(EN_COMPOUND_ADJECTIVE_INTERNAL)
[uncategorized] ~80-~80: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...anner > *PATCH
version when you make backwards compatible bug fixes GitHub Actions will take car...(EN_COMPOUND_ADJECTIVE_INTERNAL)
pyproject.toml (1)
Line range hint
149-152
: Approve Ruff linting and isort configuration changesThe changes to the Ruff configuration are beneficial:
- Adding "B" to the
unfixable
list prevents automatic fixes for bugbear checks, which often require human review.- Setting
force-single-line = true
for isort will format all imports on separate lines, improving readability.These changes will enhance code quality and consistency.
prymer/offtarget/bwa.py (4)
195-195
: Good addition of docstring for BWA_AUX_EXTENSIONS.The added docstring clearly explains the purpose of the
BWA_AUX_EXTENSIONS
constant, improving code readability and maintainability.
221-221
: Correct update of default executable name.Changing the default value of the
executable
parameter from "bwa" to "bwa-aln-interactive" ensures consistency with the module's requirements and previous documentation updates.
236-236
: Appropriate update to executable parameter docstring.The docstring for the
executable
parameter has been correctly updated to reflect the use ofbwa-aln-interactive
, maintaining consistency with previous changes.
266-266
: Correct update to error message for missing BWA index files.The error message has been appropriately updated to use the correct executable name when suggesting how to index the reference. The use of
f"{executable_path}"
ensures that the message remains accurate even if a custom executable path is provided.prymer/offtarget/offtarget_detector.py (3)
Line range hint
129-167
: Excellent improvement to the class docstring!The expanded docstring for the
OffTargetDetector
class is a significant enhancement. It now provides:
- Clear information about the custom "bwa-aln-interactive" tool being used.
- An overview of the class's functionality and main methods.
- Important notes on thread safety.
- A comprehensive explanation of off-target detection for both individual primers and primer pairs.
These improvements greatly enhance the usability and understanding of the class for potential users.
Line range hint
169-228
: Great improvement to the init method docstring!The expanded docstring for the
__init__
method is a significant enhancement:
- It now groups parameters by their role in the off-target detection process, making it easier to understand their purpose.
- Each parameter is explained in detail, providing crucial information for users of the class.
- The docstring clarifies the default value and use case for
min_primer_pair_hits
, which is particularly helpful.These improvements will greatly aid users in properly initializing and using the
OffTargetDetector
class.
Line range hint
229-516
: Overall improvements focus on documentation and alignment with custom BWA versionThe changes in this file are primarily focused on improving documentation and aligning the
OffTargetDetector
class with the use of the custom "bwa-aln-interactive" tool. The core functionality of the class remains unchanged, which is good for maintaining compatibility with existing usage.Key improvements:
- Enhanced class and method docstrings provide better clarity on the class's purpose and usage.
- The
executable
parameter now defaults to "bwa-aln-interactive", aligning with the custom BWA version mentioned in the documentation.These changes should improve the usability and understanding of the
OffTargetDetector
class without introducing breaking changes to its core functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
tests/offtarget/test_offtarget.py (1)
40-40
: Update function signature to match usageThe addition of
executable=BWA_EXECUTABLE_NAME
to theOffTargetDetector
constructor is good. However, the_build_detector
function signature should be updated to include this new parameter for consistency and to prevent potential issues in the future.Consider updating the
_build_detector
function signature as follows:def _build_detector( ref_fasta: Path, max_primer_hits: int = 1, max_primer_pair_hits: int = 1, three_prime_region_length: int = 20, max_mismatches_in_three_prime_region: int = 0, max_mismatches: int = 0, max_amplicon_size: int = 250, cache_results: bool = True, executable: str = BWA_EXECUTABLE_NAME, ) -> OffTargetDetector:prymer/offtarget/offtarget_detector.py (1)
Line range hint
130-168
: LGTM: Improved docstring and flexible executable parameterThe updates to the class docstring and
__init__
method are beneficial:
- The expanded docstring provides clearer information about the custom BWA version used, which is helpful for users and maintainers.
- Changing the
executable
parameter's default value toBWA_EXECUTABLE_NAME
improves flexibility and maintainability.These changes align well with the updated import statement.
Consider adding a brief explanation in the docstring about the
BWA_EXECUTABLE_NAME
constant and where it's defined. This could help developers quickly understand where to look if they need to modify the executable name.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
- prymer/offtarget/bwa.py (7 hunks)
- prymer/offtarget/offtarget_detector.py (3 hunks)
- pyproject.toml (1 hunks)
- tests/api/test_picking.py (3 hunks)
- tests/offtarget/test_offtarget.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- prymer/offtarget/bwa.py
- pyproject.toml
🧰 Additional context used
🔇 Additional comments (6)
tests/offtarget/test_offtarget.py (2)
11-11
: LGTM: Import of BWA_EXECUTABLE_NAMEThe addition of this import is appropriate and follows good practices by importing only the specific name needed.
Line range hint
1-284
: Verify impact on other test casesThe changes to import and use
BWA_EXECUTABLE_NAME
are appropriate and align with the PR objectives. However, it's important to ensure that these changes don't inadvertently affect other test cases in this file.Please run the following command to check if any other test cases might be impacted:
This will help identify any other places where BWA configurations might need to be updated for consistency.
✅ Verification successful
Please run the following command to check for any other occurrences of BWA-related configurations in the test file:
BWA-related configurations are confined to lines 8-14 in
tests/offtarget/test_offtarget.py
.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for any other occurrences of BWA-related configurations in the test file rg --type python "bwa|BWA" tests/offtarget/test_offtarget.pyLength of output: 93
Script:
#!/bin/bash # Description: Check for any other occurrences of BWA-related configurations in the test file rg --type py "bwa|BWA" tests/offtarget/test_offtarget.pyLength of output: 334
prymer/offtarget/offtarget_detector.py (1)
93-93
: LGTM: ImportingBWA_EXECUTABLE_NAME
constantThe import of
BWA_EXECUTABLE_NAME
fromprymer.offtarget.bwa
is a good practice. It centralizes the definition of the BWA executable name, which can make future updates easier and more consistent across the codebase.tests/api/test_picking.py (3)
30-30
: Consistent Import ofBWA_EXECUTABLE_NAME
Importing
BWA_EXECUTABLE_NAME
fromprymer.offtarget.bwa
ensures that the test suite uses the correct executable consistently. This improves maintainability and reduces the risk of discrepancies between the tests and the actual implementation.
589-589
: Explicitly Specifyingexecutable
Parameter inOffTargetDetector
Adding
executable=BWA_EXECUTABLE_NAME
to theOffTargetDetector
instantiation makes the executable explicit, enhancing clarity and ensuring that the correct BWA executable is used during testing. This change improves test reliability by avoiding any default behaviors that might differ from the intended configuration.
897-897
: Ensuring Correct Executable intest_and_pick_primer_pairs
By passing
executable=BWA_EXECUTABLE_NAME
to theOffTargetDetector
withintest_and_pick_primer_pairs
, you ensure consistency across the test cases. This explicit specification enhances reproducibility and aligns the test setup with the application's expected behavior.
Co-authored-by: Tim Fennell <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
prymer/offtarget/bwa.py (1)
63-65
: Good addition of BWA_EXECUTABLE_NAME constantThe introduction of this constant improves code maintainability by centralizing the executable name.
Consider expanding the docstring slightly to provide more context:
-"""The executable name for the interactive build of bwa aln.""" +"""The executable name for the interactive build of bwa aln, used for alignment operations."""
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- prymer/offtarget/bwa.py (7 hunks)
🧰 Additional context used
🔇 Additional comments (6)
prymer/offtarget/bwa.py (6)
18-23
: Excellent addition of installation instructionsThe new documentation clearly explains the requirement for
bwa-aln-interactive
and provides useful links to the GitHub repository and Bioconda package. This addition enhances the usability of the module and helps users set up the necessary dependencies.
Line range hint
224-239
: Improved constructor signature and documentationThe changes to the constructor are well-implemented:
- Using
BWA_EXECUTABLE_NAME
as the default value for theexecutable
parameter ensures consistency.- The updated documentation for the
executable
parameter is clear and informative.These modifications enhance the usability and maintainability of the class.
269-269
: Enhanced error message for missing BWA index filesThe updated error message now includes the correct executable path in the suggested indexing command. This improvement makes it easier for users to resolve issues with missing BWA index files, enhancing the overall user experience.
198-198
: Typo correction in commentThe typo in the comment for
BWA_AUX_EXTENSIONS
has been corrected from "extensiosn" to "extensions". This minor change improves the overall quality and readability of the code.
Line range hint
1-458
: Overall assessment of changesThe modifications in this file significantly improve the integration of
bwa-aln-interactive
, enhance documentation, and address several issues mentioned in the linked GitHub issues and past review comments. Key improvements include:
- Clear installation instructions for
bwa-aln-interactive
.- Introduction of a constant for the BWA executable name, improving maintainability.
- Updated constructor signature and documentation for better clarity.
- Enhanced error messages for missing BWA index files.
- Implementation of a workaround for platform-specific newline issues.
These changes align well with the PR objectives and contribute to a more robust and user-friendly codebase. The only area that might benefit from further investigation is the platform-specific newline behavior in the
__signal_bwa
method.
311-316
: Workaround implemented for platform-specific newline issueThe changes address the issue mentioned in past comments about the executable requiring more newlines. The implementation of 16 newlines seems to work across different platforms.
While this workaround is functional, it would be beneficial to understand the root cause of this behavior. Consider the following:
- Investigate why different platforms require a different number of newlines.
- Explore if there's a more robust solution that doesn't rely on a fixed number of newlines.
- Document any findings in the code or in a separate document for future reference.
To help with the investigation, you could run the following script to check if there are any platform-specific configurations or if this behavior is documented:
Closes #5
Closes #30
Closes #27
This PR also: