Home
Felix Thalén edited this page Apr 23, 2024
Patchwork is an alignment-based program for retrieving and concatenating phylogenetic markers from whole-genome sequencing (WGS) data. The program searches the provided DNA query contigs against one or more amino acid reference sequences. Multiple, overlapping hits are merged to derive a single, continuous sequence for each provided reference sequence.
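The merging step described above can be pictured as interval merging on reference coordinates. The following is only a minimal sketch of that idea, not Patchwork's actual implementation: the function name `merge_overlapping` and the representation of a hit as an inclusive `(start, end)` pair are assumptions made for illustration, and the sketch ignores alignment scores entirely.

```python
from typing import List, Tuple


def merge_overlapping(hits: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge inclusive (start, end) intervals that overlap on the reference.

    Hypothetical helper for illustration; a real implementation would also
    have to decide which hit supplies the residues in an overlapping region.
    """
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            # The hit overlaps the previous region, so extend that region.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # No overlap: the hit starts a new region.
            merged.append((start, end))
    return merged


# Made-up reference coordinates for three hits against one reference sequence.
regions = merge_overlapping([(1, 40), (30, 90), (120, 150)])
print(regions)
```

In this toy input the first two hits overlap and collapse into one region, while the third remains separate.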
usage: Patchwork.jl [--contigs PATH [PATH...]]
--reference PATH [PATH...] [--search-results PATH]
[--database PATH] [--matrix NAME]
[--custom-matrix PATH]
[--species-delimiter CHARACTER]
[--fasta-extension STRING] [--wrap-column NUMBER]
[--no-plots] [--output-dir PATH] [--overwrite]
[--query-gencode NUMBER] [--strand STRING]
[--min-orf NUMBER] [--fast] [--mid-sensitive]
[--sensitive] [--more-sensitive]
[--very-sensitive] [--ultra-sensitive]
[--iterate [MODE...]] [--frameshift NUMBER]
[--evalue NUMBER] [--min-score NUMBER]
[--max-target-seqs NUMBER] [--top NUMBER]
[--max-hsps NUMBER] [--id PERCENTAGE]
[--query-cover PERCENTAGE]
[--subject-cover PERCENTAGE] [--masking MODE]
[--len NUMBER] [--gapopen NUMBER]
[--gapextend NUMBER] [--retain-stops]
[--retain-ambiguous] [--no-trimming]
[--window-size NUMBER]
[--required-distance NUMBER] [--threads NUMBER]
[--block-size NUMBER] [--version] [-h]
Alignment-based retrieval and concatenation of phylogenetic markers
from whole-genome sequencing data
optional arguments:
--version show version information and exit
-h, --help show this help message and exit
input/output:
--contigs PATH [PATH...]
PATH to 1+ nucleotide sequence files in FASTA
or FASTQ format. Can be GZip compressed.
--reference PATH [PATH...]
PATH to 1+ amino acid sequence files in the
FASTA format.
--search-results PATH
PATH to a tabular DIAMOND output file, with
one header line in format: 6 qseqid sseqid
pident length mismatch gapopen qstart qend
sstart send evalue bitscore qframe sseq seq.
--database PATH Path to a subject DIAMOND or BLAST database to
search against.
--matrix NAME Specifies the NAME of the scoring matrix
--custom-matrix PATH PATH to a custom scoring matrix
--species-delimiter CHARACTER
Set the CHARACTER used to separate the OTU
from the rest in sequence IDs (type: Char,
default: '@')
--fasta-extension STRING
Filetype extension used for output FASTA files
(default: ".fas")
--wrap-column NUMBER Wrap output sequences at column NUMBER. 0 = no
wrap (type: Int64, default: 0)
--no-plots Do not include plots
--output-dir PATH Write output files to this directory PATH
(default: "patchwork_output")
--overwrite Overwrite old content in the output directory
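As an illustration of the `--search-results` and `--species-delimiter` options above, the sketch below reads a tabular file with the listed columns and splits a sequence ID at the delimiter. It rests on several assumptions that the help text does not confirm: that the single header line is simply skipped, that fields are tab-separated, and that the OTU is the part before the delimiter. The helper names and the example record are invented for the demonstration.

```python
import csv
import io

# Column names as listed in the --search-results help text.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore",
           "qframe", "sseq", "seq"]


def read_hits(handle, has_header=True):
    """Return one dict per row; values are kept as strings."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the single header line
    return [dict(zip(COLUMNS, row)) for row in reader if row]


def split_otu(sequence_id, delimiter="@"):
    """Split an ID at the first delimiter (assumption: the OTU comes first)."""
    otu, _, rest = sequence_id.partition(delimiter)
    return otu, rest


# Made-up example record; none of these values come from real data.
example = io.StringIO(
    "\t".join(COLUMNS) + "\n"
    + "\t".join(["contig_1", "Homo_sapiens@gene1", "66.7", "3", "1", "0",
                 "1", "9", "1", "3", "1e-20", "75.1", "1", "MKV", "MKL"])
    + "\n"
)
hits = read_hits(example)
otu, marker = split_otu(hits[0]["sseqid"])
print(otu, marker)
```

Numeric fields are left as strings here; converting `evalue` or `bitscore` to `float` is a natural next step once the column layout of an actual file has been checked.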
DIAMOND BLASTX:
--query-gencode NUMBER
Genetic code used for translation of query
sequences. A list of possible values can be
found on the NCBI website. Standard Code is
used by default (type: Int64)
--strand STRING Specifies the strand of the query. Possible
values are: 'both', 'plus', and 'minus'. Both
strands are searched by default
--min-orf NUMBER DIAMOND ignores translated sequences with
smaller open reading frames. Default is:
disabled for sequences smaller than 30, 20 for
sequences smaller than 100, and 40 otherwise.
Set to 1 to disable (type: Int64)
--fast Set DIAMOND sensitivity mode to 'fast'.
--mid-sensitive Set DIAMOND sensitivity mode to
'mid-sensitive'.
--sensitive Set DIAMOND sensitivity mode to 'sensitive'.
--more-sensitive Set DIAMOND sensitivity mode to
'more-sensitive'.
--very-sensitive Set DIAMOND sensitivity mode to
'very-sensitive'.
--ultra-sensitive Set DIAMOND sensitivity mode to
'ultra-sensitive'.
--iterate [MODE...] Set DIAMOND option --iterate. In version
2.0.12 or higher, you can optionally specify a
space-separated list of sensitivity modes to
iterate over. Allowed values are 'fast',
'mid-sensitive', 'sensitive',
'more-sensitive', 'very-sensitive',
'ultra-sensitive', 'default' and none
(default: ["PATCHWORK_OFF"])
--frameshift NUMBER Allow frameshift in DIAMOND and set frameshift
penalty. Without this option, frameshift is
disabled entirely (type: Int64)
--evalue NUMBER Only report DIAMOND hits with lower e-values
than the given value (type: Float64)
--min-score NUMBER Only report DIAMOND hits with bitscores >= the
given value. Overrides the --evalue
option (type: Float64)
--max-target-seqs NUMBER
The maximum number of subject sequences that
DIAMOND may report per query. Default is 25;
setting it to 0 will report all hits (type:
Int64)
--top NUMBER Discard DIAMOND hits outside the given
percentage range of the top alignment score.
This option overrides --max-target-seqs (type:
Int64)
--max-hsps NUMBER Maximum number of HSPs DIAMOND may report per
target sequence for each query. Default is
reporting only the highest-scoring HSP.
Setting this option to 0 will report all
alternative HSPs (type: Int64)
--id PERCENTAGE Discard DIAMOND hits with less sequence
identity than the given percentage (type:
Float64)
--query-cover PERCENTAGE
Discard DIAMOND hits with less query cover
than the given percentage (type: Float64)
--subject-cover PERCENTAGE
Discard DIAMOND hits with less subject cover
than the given percentage (type: Float64)
--masking MODE Set the DIAMOND mode for repeat masking. Note
that, contrary to the DIAMOND default (tantan
masking enabled), Patchwork disables masking
by default. Set to 1 to enable tantan masking,
or to 2 to enable default BLASTP SEG masking.
Note that the latter requires a DIAMOND
version >= 2.0.12 (type:
Int64, default: 0)
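The `--top` filter listed above can be read as keeping every hit whose score lies within the given percentage of the best score for that query. The sketch below shows that reading on a plain list of scores; the function name `filter_top` is hypothetical, and the exact cutoff arithmetic is an assumption based on the option's description, so DIAMOND's own behaviour may differ in detail.

```python
from typing import List


def filter_top(scores: List[float], top_percent: float) -> List[float]:
    """Keep scores that are at most `top_percent` percent below the best one."""
    if not scores:
        return []
    best = max(scores)
    # Assumed interpretation: the cutoff is a fraction of the top score.
    cutoff = best * (1 - top_percent / 100)
    return [score for score in scores if score >= cutoff]


# Made-up alignment scores for a single query.
kept = filter_top([200.0, 185.0, 150.0], top_percent=10)
print(kept)
```

Unlike `--max-target-seqs`, which caps the number of reported subjects, this kind of filter lets the number of retained hits vary with how closely their scores cluster around the best one.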
alignment:
--len NUMBER Discard DIAMOND hits shorter than the provided
NUMBER (type: Int64)
--gapopen NUMBER Set the gap open penalty to this positive
NUMBER (type: Int64)
--gapextend NUMBER Set the gap extension penalty to this positive
NUMBER (type: Int64)
--retain-stops Do not remove stop codons (`*`) in the output
sequences
--retain-ambiguous Do not remove ambiguous characters from the
output sequences
sliding window:
--no-trimming Skip sliding window-based trimming of
alignments
--window-size NUMBER Specifies the NUMBER of positions to average
across (type: Int64, default: 4)
--required-distance NUMBER
Specifies the average distance required (type:
Float64, default: -7.0)
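To make the two sliding-window parameters above more concrete, here is a minimal sketch of window-based trimming. It is an assumption-laden illustration rather than Patchwork's algorithm: it presumes that each alignment position already has a numeric score (for example from a substitution matrix), that windows are averaged with a step of one position, and that everything outside the first and last passing window is trimmed. The function names are hypothetical.

```python
from typing import List, Tuple


def window_means(scores: List[float], window_size: int) -> List[float]:
    """Average the scores in every full window of `window_size` positions."""
    return [sum(scores[i:i + window_size]) / window_size
            for i in range(len(scores) - window_size + 1)]


def trim_bounds(scores: List[float], window_size: int = 4,
                required: float = -7.0) -> Tuple[int, int]:
    """Return a half-open (start, end) slice of positions to keep.

    (0, 0) means that no window reached the required average.
    """
    means = window_means(scores, window_size)
    passing = [i for i, mean in enumerate(means) if mean >= required]
    if not passing:
        return (0, 0)
    # Keep everything from the first passing window to the end of the last.
    return (passing[0], passing[-1] + window_size)


# Made-up per-position scores: poor ends around a well-aligned core.
scores = [-10, -10, -10, -10, 5, 4, 6, 5, -10, -10, -10, -10]
start, end = trim_bounds(scores, window_size=4, required=0.0)
print(scores[start:end])
```

A larger window smooths over isolated mismatches, while a stricter `required` value trims more aggressively at the alignment ends.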
resources:
--threads NUMBER Number of threads to utilize (type: Int64,
default: 12)
--block-size NUMBER Billions of sequence letters to be processed
at a time. A larger block size leads to
increased performance at the expense of disk
and memory usage. Values >20 are not
recommended (type: Float64, default: 2.0)
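The options above can be combined into a single command line. The sketch below only assembles and prints such a command; it does not run anything. The executable name `patchwork` and the input file names are placeholders, since this page does not state how the program is installed or invoked, while the flags themselves are taken from the help text.

```python
import shlex
from typing import List


def build_command(contigs: List[str], references: List[str],
                  output_dir: str = "patchwork_output", threads: int = 12,
                  executable: str = "patchwork") -> List[str]:
    """Assemble an argument list using flags from the help text.

    `executable` is a placeholder; adjust it to match the local installation.
    """
    return [executable,
            "--contigs", *contigs,
            "--reference", *references,
            "--output-dir", output_dir,
            "--threads", str(threads)]


# Hypothetical input files, used only to show the resulting command line.
cmd = build_command(["contigs.fa.gz"], ["markers.fas"], threads=4)
print(shlex.join(cmd))
```

Keeping the arguments in a list avoids quoting problems if the command is later passed to `subprocess.run`.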