Options for taxonomic classification module:

General options:
    --query
        Query file (fastq or fasta)
    --human
        Human genome set in config folder, default human.genome_set.
    --decoy
        Decoy genome set in config folder, default plasmid.genome_set.
    --species
        Genome set for species identification in config folder, default species_id.genome_set.
    --assembly
        Genome set for assembly identification in config folder, default assembly_id.genome_set.
   
Environment config:
    --python 
        Path to python, default python3.
    --temp_folder
        Temporary folder, default ''.
    --RAM_folder
        Temporary folder in RAM, default '/run/shm'.
    --taxonomy_db
        Taxonomy database, default 'db/ncbi_taxonomy.db'.
    --tool_folder
        Tool folder, default 'tools/'.
    --config_folder
        Config folder, default 'config/'.
    --assembly_folder
        Assembly folder, default 'genomes/'.
    --aligner
        Path to the minimap2 aligner, default: the minimap2 found in the PATH.
    --read_simulator
        Path to read simulation program, default 'tools/nanosim/simulator.py'.
    --read_simulation_profiles
        Path to read simulation profiles, default 'tools/nanosim/nanosim_profiles'.
    --human_similar_filter_assembly_id
        Assembly ID for human similar region filter, default 'GCF_000001405.37'.
    --max_aligner_thread INT
        Maximum number of threads used by aligner, default 64.
    --max_qcat_thread INT
        Maximum number of threads used by qcat, default 64.
    --genus_height INT
        Height in taxonomy to be considered as genus, default 7.

Fastq filter options:
    --head_crop INT
        Crop INT bp from read head, default 0.
    --tail_crop INT
        Crop INT bp from read tail, default 0.
    --min_read_length INT
        Reads shorter than INT bp are filtered out, default 500.
    --min_read_quality FLOAT
        Reads with average base quality lower than FLOAT are filtered out, default 7.0.
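
The four Fastq filter options above can be read as a crop followed by two pass/fail checks. The sketch below is an illustration only, not the tool's own code: the function name `crop_and_filter` is hypothetical, the order (crop first, then filter) is an assumption, and so is the use of a plain arithmetic mean of Phred scores as the "average base quality".

```python
from statistics import mean

def crop_and_filter(seq, quals, head_crop=0, tail_crop=0,
                    min_read_length=500, min_read_quality=7.0):
    """Return the cropped (seq, quals) pair, or None if the read is filtered out.

    seq   -- read bases as a string
    quals -- per-base Phred scores as a list of numbers, same length as seq
    """
    # Crop head_crop bases from the start and tail_crop bases from the end.
    end = max(head_crop, len(seq) - tail_crop)
    seq, quals = seq[head_crop:end], quals[head_crop:end]

    # Length check (assumed to apply to the cropped read).
    if not seq or len(seq) < min_read_length:
        return None

    # Quality check: arithmetic mean of Phred scores (an assumption).
    if mean(quals) < min_read_quality:
        return None

    return seq, quals

# A 600 bp read cropped by 50 bp on each side is exactly 500 bp and is kept.
kept = crop_and_filter("A" * 600, [10] * 600, head_crop=50, tail_crop=50)
print(len(kept[0]))
```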

Alignment filter options:
    --adaptor_trimming, --no-adaptor_trimming
        Adapter trimming is on by default.
    --read_trimming, --no-read_trimming
        Read trimming is on by default.
    --read_filter, --no-read_filter
        Read quality filter is on by default.
    --human_filter, --no-human_filter
        Human filter is on by default.
    --decoy_filter, --no-decoy_filter
        Decoy filter is on by default.
    --variable_region_adjustment, --no-variable_region_adjustment
        Variable region adjustment is off by default.
    --spike_filter, --no-spike_filter
        Spike filter is on by default.
    --closing_spike_filter, --no-closing_spike_filter
        Closing spike filter is on by default.
    --human_similar_filter, --no-human_similar_filter
        Human similar region filter is on by default.
    --microbe_similar_filter, --no-microbe_similar_filter
        Microbe similar region filter is on by default.
    --short_alignment_filter, --no-short_alignment_filter
        Short alignment filter is on by default.
    --unique_alignment, --no-unique_alignment
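        Unique alignment filter is on by default.
    --noise_projection, --no-noise_projection
        Noise projection is off by default.
    --similar_species_marker, --no-similar_species_marker
        Similar species marker is on by default.
    --mapping_only, --no-mapping_only
        Mapping-only mode is off by default.
    --reassign_read_id, --no-reassign_read_id
        Read ID reassignment is off by default.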
    --all_steps, --filter_fq_only
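
The paired `--name, --no-name` switches above follow the pattern that Python's `argparse.BooleanOptionalAction` (Python 3.9+) produces. The sketch below shows how such a pair behaves for two of the listed options; whether the tool's own parser is built this way is an assumption, and only the option names and defaults are taken from the listing.

```python
import argparse

parser = argparse.ArgumentParser()

# On by default: pass --no-human_filter to switch it off.
parser.add_argument("--human_filter",
                    action=argparse.BooleanOptionalAction, default=True)

# Off by default: pass --noise_projection to switch it on.
parser.add_argument("--noise_projection",
                    action=argparse.BooleanOptionalAction, default=False)

defaults = parser.parse_args([])
args = parser.parse_args(["--no-human_filter", "--noise_projection"])

print(defaults.human_filter, defaults.noise_projection)
print(args.human_filter, args.noise_projection)
```

When both forms of a switch are given, argparse keeps whichever appears last on the command line.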

    --min_alignment_score INT
        Minimal alignment score, default 0.
    --human_filter_alignment_score_threshold INT
        Alignment score threshold for flagging a read as a human read, default 1000.
    --human_filter_alignment_score_percent_threshold INT
        Alignment score (normalized by read length) threshold (in percent) for flagging a read as a human read, default 100.
    --decoy_filter_alignment_score_threshold INT
        Alignment score threshold for flagging a read as a decoy read, default 1000.
    --decoy_filter_alignment_score_percent_threshold INT
        Alignment score (normalized by read length) threshold (in percent) for flagging a read as a decoy read, default 100.
    --species_id_min_aligned_bp INT
        Minimal covered BP to include a species for analysis, default 0.
    --good_alignment_threshold INT
        Alignment score threshold in percentage of best alignment score, default 80.
    --assembly_id_min_average_depth FLOAT
        Minimal average depth to perform assembly selection, default 0.5.
    --variable_region_percent INT
        Maximum percentage of strands aligned for a region to be labeled as variable, default 50.
    --expected_max_depth_stdev INT
        Number of standard deviations for calculating expected max depth, default 6.
    --microbe_similar_filter_abundance_threshold_80 FLOAT
        Difference (no. of times) in apparent abundance to trigger similar region filter with 80% similarity, default 160.
    --microbe_similar_filter_abundance_threshold_90 FLOAT
        Difference (no. of times) in apparent abundance to trigger similar region filter with 90% similarity, default 80.
    --microbe_similar_filter_abundance_threshold_95 FLOAT
        Difference (no. of times) in apparent abundance to trigger similar region filter with 95% similarity, default 40.
    --microbe_similar_filter_abundance_threshold_98 FLOAT
        Difference (no. of times) in apparent abundance to trigger similar region filter with 98% similarity, default 16.
    --microbe_similar_filter_abundance_threshold_99 FLOAT
        Difference (no. of times) in apparent abundance to trigger similar region filter with 99% similarity, default 8.
    --microbe_similar_filter_abundance_threshold_99_2 FLOAT
        Difference (no. of times) in apparent abundance to trigger similar region filter with 99.2% similarity, default 6.4.
    --microbe_similar_filter_targeted_max_span_percent INT
        Maximum percent of regions (targeted) to be marked as similar region, default 90.
    --microbe_similar_filter_allowed_max_span_percent INT
        Maximum percent of regions (allowed) to be marked as similar region, default 97.
    --microbe_similar_filter_min_average_depth FLOAT
        Minimum average depth to be considered as source of noise, default 0.2.
    --microbe_similar_filter_max_span_percent_overall INT
        Maximum percent of regions to be marked as similar region (overall), default 97.
    --max_alignment_noise_overlap INT
        The maximum percent for an alignment to overlap with noise regions without being removed, default 50.
    --min_alignment_length INT
        Minimum alignment length to be considered as evidence, default 250.
    --closing_expected_max_depth_stdev INT
        Number of standard deviations for calculating expected max depth for closing spike filter, default 9.
    --unique_alignment_threshold INT
        An alignment counts as unique only if no other alignment has an alignment score within this percent of its score, default 80.
    --number_of_genus_to_perform_noise_projection INT
        Number of genera on which to perform noise projection, default 3.
    --min_percent_abundance_to_perform_noise_projection INT
        Minimum percent of abundance relative to the most abundant species in a genus to perform noise projection, default 25.
    --noise_projection_simulated_read_length_bin_size INT
        Read length bin size for generating simulated reads, default 1000.
    --noise_projection_simulated_read_length_multiplier FLOAT
        Multiplier over average read length to obtain maximum read length, default 0.5.
    --noise_projection_simulated_read_error_profile 
        Error profile for generating simulated reads, default 'ecoli_R91D'.
    --noise_projection_num_read_to_simulate INT
        Number of simulated reads to generate, default 10000.
    --similar_species_marker_num_genus INT
        Number of top most abundant species (1 per genus) to be considered as possible source of noise, default 3.
    --similar_species_marker_alignment_similarity_1 INT
        Similarity cutoff (1) used for alignment, available choices are 99, 98, 95, 90, 80, default 98.
    --similar_species_marker_aligned_region_threshold_1 INT
        Percentage of aligned region (1) to be considered as highly similar, default 50.
    --similar_species_marker_alignment_similarity_2 INT
        Similarity cutoff (2) used for alignment, available choices are 99, 98, 95, 90, 80, default 95.
    --similar_species_marker_aligned_region_threshold_2 INT
        Percentage of aligned region (2) to be considered as highly similar, default 75.
    --similar_species_marker_similarity_combine_logic
        Logic for combining criteria 1 and 2, choices are 'and', 'or', default 'or'.
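
The last five `--similar_species_marker_*` options describe two criteria, each a similarity cutoff paired with an aligned-region threshold, joined by 'and' or 'or'. The sketch below illustrates that combination with the listed defaults. The function name `is_highly_similar` is hypothetical, and two details are assumptions: that the aligned-region percentages are computed elsewhere (one per similarity cutoff), and that a value equal to the threshold passes.

```python
def is_highly_similar(aligned_pct_1, aligned_pct_2,
                      threshold_1=50, threshold_2=75, combine_logic="or"):
    """Combine the two similar-species criteria.

    aligned_pct_1 -- percent of region aligned at similarity cutoff 1 (default 98)
    aligned_pct_2 -- percent of region aligned at similarity cutoff 2 (default 95)
    """
    criterion_1 = aligned_pct_1 >= threshold_1  # '>=' is an assumption
    criterion_2 = aligned_pct_2 >= threshold_2

    if combine_logic == "and":
        return criterion_1 and criterion_2
    if combine_logic == "or":
        return criterion_1 or criterion_2
    raise ValueError("combine_logic must be 'and' or 'or'")

# 60% aligned at cutoff 1 passes criterion 1; 10% at cutoff 2 fails criterion 2.
print(is_highly_similar(60, 10))                       # default 'or'
print(is_highly_similar(60, 10, combine_logic="and"))
```

With the default 'or', a species pair is marked when either criterion holds; 'and' is the stricter setting and marks fewer pairs.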

Output options:
    --output_prefix
        Output prefix, default: the query file name.
    --output_folder
        Output folder, default ./.
    --archive_format
        Format used for output archive file, choices are 'zip', 'tar', 'gztar', 'bztar', default 'gztar'.
    --quality_score_bin_size FLOAT
        Bin size for quality score histogram, default 0.2.
    --read_length_bin_size INT
        Bin size for read length histogram, default 100.
    --output_adaptor_trimmed_query, --no-output_adaptor_trimmed_query
        Fastq after adaptor trimming is not outputted by default.
    --output_trimmed_and_filtered_query, --no-output_trimmed_and_filtered_query
        Fastq after trimming and quality filtering is outputted by default.
    --output_human_and_decoy_filtered_query, --no-output_human_and_decoy_filtered_query
        Fastq passed human and decoy filter is outputted by default.
    --output_PAF, --no-output_PAF
        Alignment in PAF format is outputted by default.
    --output_noise_stat, --no-output_noise_stat
        Noise regions are outputted by default.
    --output_separate_noise_bed, --no-output_separate_noise_bed
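        Separate noise BED files are outputted by default.
    --output_raw_signal, --no-output_raw_signal
        Raw signal is outputted by default.
    --output_id_signal, --no-output_id_signal
        ID signal is outputted by default.
    --output_per_read_data, --no-output_per_read_data
        Per-read data is outputted by default.
    --output_quality_score_histogram, --no-output_quality_score_histogram
        Quality score histogram is outputted by default.
    --output_read_length_histogram, --no-output_read_length_histogram
        Read length histogram is outputted by default.
    --output_human_stat, --no-output_human_stat
        Human statistics are outputted by default.
    --output_decoy_stat, --no-output_decoy_stat
        Decoy statistics are outputted by default.
    --output_genome_set, --no-output_genome_set
        Genome set is outputted by default.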
    --aligner_log
        Log for stderr output from aligner program, default 'minimap2.log'.
    --read_sim_log
        Log for stderr output from read simulator, default 'read_sim.log'.
    --adaptor_trimming_log
        Log for stdout output from adaptor trimming program, default 'adaptor_trimming.log'.
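
The four `--archive_format` choices listed under Output options ('zip', 'tar', 'gztar', 'bztar') are also format names accepted by Python's `shutil.make_archive`. The sketch below shows what each name produces. That the tool builds its archive this way is an assumption, and the helper name `archive_outputs` and the file names are hypothetical.

```python
import os
import shutil
import tempfile

def archive_outputs(output_folder, archive_base, archive_format="gztar"):
    """Pack everything under output_folder into one archive and return its path."""
    # shutil appends the extension itself: .zip, .tar, .tar.gz or .tar.bz2.
    return shutil.make_archive(archive_base, archive_format, root_dir=output_folder)

# Demo on a throwaway folder holding one placeholder file.
with tempfile.TemporaryDirectory() as work:
    results = os.path.join(work, "results")
    os.makedirs(results)
    with open(os.path.join(results, "report.txt"), "w") as handle:
        handle.write("placeholder\n")

    for fmt in ("zip", "tar", "gztar", "bztar"):
        archive = archive_outputs(results, os.path.join(work, "sample"), fmt)
        print(fmt, "->", os.path.basename(archive))
```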