- major changes to the command-line API
- the new API now uses a subcommand system: `pyani <cmd> <arguments>`
- major changes to result storage
- results are now stored in an SQLite database, rather than reported to `.tab` files
- enables reuse of previously calculated results in new analyses
- enables generation of multiple output files from the same analysis, after the analysis is complete
- more output formats
- tabular output is available as HTML tables, as well as plain text
- new documentation
- documentation is now available at ReadTheDocs
- papers citing or referring to `pyani` are now listed
- `pyani download` (replacing `genbank_get_genomes_by_taxon.py`) allows use of an NCBI API key for faster/more stable downloads
- `genbank_get_genomes_by_taxon.py` label/class file output updated to include a hash, matching the `pyani download` output format
- `pyani plot` now produces distribution plots in addition to heatmaps
- A number of bug fixes were implemented, including:
- consistent handling of filenames as `Path`s
- alignment length calculation for ANIm was corrected
- major refactoring of code ported from v0.2
- static typing implemented
- testing converted to `pytest` conventions from `unittest`
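
The subcommand system noted in the entries above (`pyani <cmd> <arguments>`) can be illustrated with a minimal `argparse` sketch. This is not `pyani`'s actual implementation: only the subcommand names `download` and `plot` come from these notes, while the `outdir` positional and the `--api_key` flag are hypothetical names chosen for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative `pyani <cmd> <arguments>` style parser."""
    parser = argparse.ArgumentParser(prog="pyani")
    # Each subcommand gets its own parser and its own arguments
    subparsers = parser.add_subparsers(dest="cmd", required=True)

    # Subcommand names are taken from the notes; arguments are assumptions
    download = subparsers.add_parser("download", help="download genomes")
    download.add_argument("outdir", help="hypothetical: output directory")
    download.add_argument("--api_key", default=None,
                          help="hypothetical: NCBI API key")

    plot = subparsers.add_parser("plot", help="draw plots")
    plot.add_argument("outdir", help="hypothetical: output directory")

    return parser


# Parse an explicit argument list instead of sys.argv for demonstration;
# "XYZ" is a placeholder, not a real key
args = build_parser().parse_args(["download", "genomes", "--api_key", "XYZ"])
print(args.cmd, args.outdir, args.api_key)
```

With this layout each subcommand owns its arguments, so options for one subcommand cannot be passed to another by mistake.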
- update legacy BLAST download location in TravisCI
- update concordance tests (issue #105)
- extend test suites (issue #104)
- modify ANIm concordance test to accommodate new command structure
- add `delta-filter` wrapper for compatibility with SGE/OGE schedulers
- Fix for issue #97 where numeric arguments to the GenBank download script were not recognised
- GenBank download script now insists on integer input for `--batchsize`, `--retries`, and `--timeout`
- Added `setup.cfg` that points to README.md
- Fix issue #97 where valid input arguments were not recognised in the download script
- Add Dockerfiles for making Docker images
- `ANIm` now uses `delta-filter` to remove alignments of repeat regions (issue #91)
- added `--filter_exe` option to specify location of `delta-filter` utility (issue #91)
- fixed `--format` option so that GenBank downloads work again (issue #89)
- add `--SGEargs` option to `average_nucleotide_identity.py` for custom qsub settings
- `README.md` badges now clickable
- `--version` switch added to `average_nucleotide_identity.py`
- FTP timeouts are now caught differently in `genbank_get_genomes_by_taxon.py`
- Additional characters in NCBI FTP URIs are now escaped in `genbank_get_genomes_by_taxon.py`, which should result in fewer failed downloads
- Modified error messaging when `NUCmer` alignment fails
- `average_nucleotide_identity.py` argument documentation improvements
- Script now fails immediately if label or class files are missing (issue #78)
- Changes to `--noclobber` log behaviour (issue #79)
- fixed `--rerender` code (issue #85)
- fixes a bug in the installed scripts where the shebang (`#!`) in wheel and egg packages pointed to a development Python
- fix for issue #53 (--maxmatch has no effect)
- fix to `genbank_get_genomes_by_taxon.py` to account for NCBI FTP location changes
- fixed issue #52 (local variable bug)
- fixed issue #49 (TETRA failure) and #51 (matplotlib bug)
- add several tests and support for `codecov.io`, `landscape.io` and `Travis-CI`
- removed requirement for `rpy2`
- moved scripts to `bin/` subdirectory
- `pyani` now requires `rpy2` v2.8.0 in order to satisfy running under Anaconda (see issue #26)
- `pyani` now checks for the presence of `rpy2`. When run from source, if `rpy2` is not available, `pyani` doesn't throw an error until R graphical output is requested. If installed via `pip`, then `pyani` still raises `pkg_resources.DistributionNotFound` if `rpy2` is missing.
- Updated `genbank_get_genomes_by_taxon.py` script to use the new FTP locations at NCBI for each assembly.
- Fixed bug where `ANIb` would not go to completion if empty BLASTN files were generated (see issue #27)
- Fixed bug where `ANIm` would not finish under `multiprocessing` if input sequences were highly divergent.
- Added Hadamard product of percentage identity and alignment coverage as output.
- Fixed bug where label/classes are out of sync with new NCBI downloaded filenames
- Added --rerender option to draw (new) graphics from old output, without recalculation
- Corrected matplotlib row dendrogram orientation
- Seaborn output no longer dumps core on large (ca. 500 genome) datasets
- `genbank_get_genomes_by_taxon.py` attempts to identify the cause of failed downloads and correct it, where nomenclature/versions are at fault
- graceful replacement of classes that are not present in `classes.txt`
- add `pyani` version to log file
- Merged pull request from peterjc to make printing from tests Python3-friendly.
- Merged pull request from peterjc to use `open()` for opening files.
- Merged pull request from peterjc to cope with missing labels/classes more gracefully
- Fixed `-s`/`--fragsize` option in `average_nucleotide_identity.py` (thanks to Joseph Adelskov for the report).
- BLAST and `nucmer` results are now written to a subdirectory of the output folder. By default, these sequence search output files are compressed, but this behaviour can be suppressed using the `--nocompress` option.
- Added `genbank_get_genomes_by_taxon.py` as an aid to downloading publicly-available genome files from GenBank, for analysis.
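
One entry above adds the Hadamard product of percentage identity and alignment coverage as an output. A Hadamard product is the element-wise product of two matrices of the same shape, so each pairwise identity value is scaled by the coverage for the same pair of genomes. The sketch below shows that operation in plain Python. The function name and the example values are made up for illustration and are not taken from `pyani`.

```python
from typing import List

Matrix = List[List[float]]


def hadamard(identity: Matrix, coverage: Matrix) -> Matrix:
    """Return the element-wise (Hadamard) product of two equal-shape matrices."""
    same_rows = len(identity) == len(coverage)
    if not same_rows or any(len(i) != len(c) for i, c in zip(identity, coverage)):
        raise ValueError("matrices must have the same shape")
    return [
        [i * c for i, c in zip(irow, crow)]
        for irow, crow in zip(identity, coverage)
    ]


# Illustrative values only: fractional identity and coverage for two genomes
identity = [[1.0, 0.9], [0.9, 1.0]]
coverage = [[1.0, 0.5], [0.25, 1.0]]

product = hadamard(identity, coverage)
for row in product:
    print(row)
```

A pair with high identity but low coverage gets a low product, which is why the combined value can be more informative than identity alone.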