Releases: dportik/SuperCRUNCH
v1.3.2
v1.3.1
v1.3.0
Changes in v1.3.0:
- Added a `conda` environment recipe for SuperCRUNCH, allowing easy installation of all requirements except MACSE.
- `Parse_Loci.py`: Added new feature that allows a term to be added to the loci search terms that will exclude a record if a match is found. For example, adding the negative term `pseudogene` will exclude all records containing that word, even if they match the other abbreviation or description terms. This requires a four-column search terms file, where the fourth column is the negative term (`N/A` in this column indicates no negative term should be used). This module was made backwards-compatible with the three-column search terms file: if a fourth column is not present, the `N/A` is automatically generated.
- `Filter_Seqs_and_Species.py`: Added `--accessions_include` flag. This points to a text file of accession numbers (one per line). When used with the `--seq_selection oneseq` option, if an accession included in the list is found in the available seqs for a taxon and gene, it must be selected. This is not just an "allowed list"; this list will override other settings for selection such as length. Also added the `--accessions_exclude` flag, which points to a text file of accession numbers (one per line). These accessions will NEVER be selected: they are removed from all searches. This is the equivalent of including a "blocked list".
- `Taxa_Assessment.py`: Altered SQL search query for "unmatched" taxa to avoid exceeding the SQL variable limit. Also, now invokes the `SeqIO.index_db()` method for sequence files >5GB rather than the `SeqIO.index()` method, because `SeqIO.index_db()` is much more memory efficient for big data. The `SeqIO.index_db()` method is already used in `Parse_Loci.py`.
- `Cluster_Blast_Extract.py`: Added feature to remove problematic long sequences if they somehow end up in the main cluster of sequences for a gene. The new filter removes all seqs that are 1.3x the length of the 95th percentile of all lengths.
- Added a new `Remove_Long_Accessions.py` module, which can filter a downloaded GenBank fasta file to remove extremely long sequences (>150kb). This will eliminate whole genome sequencing records, which are not useful for SuperCRUNCH.
- Updated recognition for file extensions produced by updated blastn tools (`.ndb`, `.not`, `.ntf`, `.nto`).
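The negative-term behavior described for `Parse_Loci.py` can be illustrated with a minimal sketch. This is not the module's actual code: the function names `parse_terms_line` and `record_passes`, the tab-delimited column layout, and the case-insensitive substring matching are all assumptions made for illustration. Only the four-column layout, the `N/A` placeholder, and the three-column fallback come from the notes above.

```python
def parse_terms_line(line):
    """Split one search-terms line into (locus, abbreviation, description, negative).

    Assumes tab-delimited columns (an assumption for this sketch). A missing
    fourth column is filled with "N/A", mirroring the backwards-compatible
    behavior described in the release notes.
    """
    cols = [c.strip() for c in line.rstrip("\n").split("\t")]
    if len(cols) == 3:
        cols.append("N/A")
    if len(cols) != 4:
        raise ValueError("expected 3 or 4 columns, got {}".format(len(cols)))
    return tuple(cols)


def record_passes(description_line, abbreviation, description, negative):
    """Return True if a record matches a search term and not the negative term.

    Matching here is a plain case-insensitive substring test (hypothetical;
    the real module may match differently).
    """
    text = description_line.upper()
    # A negative-term hit excludes the record, even if other terms match.
    if negative != "N/A" and negative.upper() in text:
        return False
    return abbreviation.upper() in text or description.upper() in text
```

With a fourth column of `pseudogene`, a record line containing "cytochrome b pseudogene" is rejected, while the same line is accepted when the search terms come from a three-column file.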
Release for Zenodo archiving
V1.2.1 added interleaved nexus output
Release 1.2
- Version 1.2:
- Made all modules compatible with Python 2.7 and Python 3.7.
- SQL now implemented in `Parse_Loci.py` (up to 30x speedup!), `Filter_Seqs_and_Species.py` (3x speedup), and `Taxon_Assessment.py` (3x speedup).
- Added output directory specification to all modules.
- Two trimming modules now included: `Trim_Alignments_Trimal.py` and `Trim_Alignments_Custom.py`. The `Trim_Alignments_Custom.py` module allows finding start and stop block positions, and row-wise (internal) sliding window trimming based on divergence.
- Added new module `Filter_Fasta_by_Min_Seqs.py` to filter fasta files using a minimum number of sequences.
- Output directory structures improved for all modules.
- Added `--quiet` option to `Filter_Seqs_and_Species.py` for less output on screen (useful when processing large numbers of loci).
- Added option `--numerical` to `Fasta_Get_Taxa.py` to allow non-alphabetical identifiers for subspecies/trinomial name combinations. This allows museum, field, or numerical codes to be discovered.
- Re-ordered tasks in `Cluster_Blast_Extract.py` to allow completion of all steps for one fasta file before moving to next fasta file in sequence.
- Added multithreading for BLAST searches and new `--bp_bridge` flag for coordinate merging in `Cluster_Blast_Extract.py` and `Reference_Blast_Extract.py`.
- Remove empty fasta files sometimes produced by `Coding_Translation_Tests.py`.
- Complete code re-write for `Align.py`, `Cluster_Blast_Extract.py`, `Filter_Seqs_and_Species.py`, `Parse_Loci.py`, `Taxon_Assessment.py`.
- Module `Relabel_Fasta.py` is now `Fasta_Relabel_Seqs.py`.
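As a rough illustration of what filtering fasta files by a minimum number of sequences involves, here is a minimal sketch. It is not the code of `Filter_Fasta_by_Min_Seqs.py`: the function names, the `.fasta` extension check, the copy-to-output-directory behavior, and the inclusive (`>=`) threshold are all assumptions for this example.

```python
import os
import shutil


def count_fasta_records(path):
    """Count records in a fasta file by counting header lines starting with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def filter_fasta_by_min_seqs(in_dir, out_dir, min_seqs):
    """Copy each .fasta file in in_dir with at least min_seqs records to out_dir.

    Returns the sorted list of file names that passed the filter.
    The inclusive threshold is an assumption of this sketch.
    """
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    kept = []
    for name in sorted(os.listdir(in_dir)):
        if not name.endswith(".fasta"):
            continue
        src = os.path.join(in_dir, name)
        if count_fasta_records(src) >= min_seqs:
            shutil.copy(src, os.path.join(out_dir, name))
            kept.append(name)
    return kept
```

Counting header lines avoids parsing the sequences themselves, which keeps the check cheap when many locus files need to be screened.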
Release 1.1
- Version 1.1:
- Added multithreading option for MAFFT and Clustal-O in `Align.py`
- Added multithreading option for MAFFT in `Adjust_Direction.py`
- Added arg to specify output directory for `Concatenation.py`
- Corrected output column labeling in label key output files from `Relabel_Fasta.py`
- Added gappyout option for trimming with trimAl in `Trim_Alignments.py`
- Output sequences failing similarity searches to own file in `Cluster_Blast_Extract.py` and `Reference_Blast_Extract.py`
- Updated documentation on wiki pages
initial release
Initial release of SuperCRUNCH.