rdpstaff/Xander-HMMgs
Using HMMgs:

See detailed step-by-step instructions in the Xander_assembler repository (https://github.com/rdpstaff/Xander_assembler).

Build - Build a de Bruijn graph from a set of reads

    java -jar hmmgs.jar build <read_file> <bloom_out> <kmerSize> <bloomSizeLog2> [cutoff = 2] [# hashCount = 4] [bitsetSizeLog2 = 30]

    read_file        fasta or fastq files containing the reads to build the graph from
    bloom_out        file to write the bloom filter to
    kmerSize         should be a multiple of 3 (recommend 45, minimum 30, maximum 63)
    bloomSizeLog2    the size of the bloom filter (or memory needed) is 2^bloomSizeLog2 bits; increase if the predicted false positive rate is greater than 1%
    cutoff           minimum number of times a kmer has to be observed in read_file to be included in the final bloom filter
    hashCount        number of hash functions, recommend 4
    bitsetSizeLog2   the size of one bitSet is 2^bitsetSizeLog2, recommend 30

Bloom filter stats, such as the predicted false positive rate, are written to stdout.

Search - Perform local assembly starting at the given start points in a given de Bruijn graph

Output files: <kmers>_nucl.fasta, _prot.fasta. Search stats are written to stdout.

    java -jar hmmgs.jar search [-h] [-u] [-p <n_nodes>] <k> <limit_in_seconds> <bloom_filter> <for_hmm> <rev_hmm> <kmers>

    -u                  don't normalize the hmm input
    -p n_nodes          prune the search if the score does not improve after n_nodes (default 20, set to 0 to disable pruning)
    k                   number of best local assemblies to return for each kmer
    limit_in_seconds    time limit for individual searches (conservative suggestion = 100)
    bloom_filter        bloom filter built using hmmgs build
    for_hmm, rev_hmm    hidden Markov models, HMMER3 format
    kmers               starting points (can use KmerFilter's fast_kmer_filter to identify starting points)
    [#threads]          experimental, suggested 1 (not thoroughly tested)

Merge - Merge the left and right contigs generated by hmmgs search

    java -jar hmmgs.jar merge [options] <hmm> <hmmgs_file> <nucl_contig>

    -a,--all                 Generate all combinations for multiple paths for each starting kmer, instead of just the best
    -b,--min-bits <arg>      Minimum bits score
    -l,--min-length <arg>    Minimum length
    -o,--out <arg>           Write output to file instead of stdout

KmerFilter:

fast_kmer_filter - Search a set of reads against a set of reference sequences to identify starting points for assembly

    java -jar KmerFilter.jar fast_kmer_filter <kmerSize> <query_file> [name=]<ref_file> ...

    -a,--aligned               Build trie from aligned sequences
    -o,--out <arg>             Redirect output to file
    -T,--transl-table <arg>    Translation table to use when translating nucleotide to protein sequences
    -t,--threads <arg>         #Threads to use
    <kmerSize>                 kmer length, should be a multiple of 3 (recommend 45, minimum 30, maximum 63)
    <query_file>               read file to search for starting points in (use the same fasta file used to build the de Bruijn graph)
    [name=]<ref_file> ...      1 or more aligned reference files (aligned using the same HMM that will be used to search), each with an optional reference name (e.g. nifh=my_nifh_refs_aligned.fasta)

Other uses:

HMMgs can also be used to extract subgraphs from starting points, instead of contigs, for further analysis (see edu.msu.cme.rdp.graph.GraphSearch).

HMMgs can also be used to compute base coverage for contigs, whether generated by hmmgs or by other programs (see edu.msu.cme.rdp.graph.abundance.ReadKmerMapper and base_coverage.py).

NOTES:

When using fast_kmer_filter to identify start points there are two things to be aware of.

1. While the Bloom Filter Builder allows any k-size (hmmgs, however, requires a k divisible by 3), fast_kmer_filter requires k <= 63.

2. fast_kmer_filter allows starting points for multiple genes to be searched for at the same time. Each search requires a scan over the read file, so it is faster to do every gene at once. However, this means the output file is multiplexed and must be demultiplexed before it is used in hmmgs search. This can be done with the following command:

    grep 'gene_name' <multiplexed_starts_file> | cut -f2- > <demultiplexed_gene_start_points>
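
The grep/cut demultiplexing step can also be sketched in Python. This is a minimal illustration, not part of HMMgs. It assumes, based only on the grep/cut command in the notes, that the multiplexed file is tab-separated with the reference name in the first column. The function name `demultiplex_starts` and the sample records are hypothetical. Unlike `grep`, which matches the gene name anywhere on the line (so 'nifh' would also match a line mentioning 'nifh2'), this sketch requires an exact match on the first column.

```python
import io


def demultiplex_starts(lines, gene_name):
    """Yield the records for one gene with the leading gene-name column removed.

    Assumption: each line is tab-separated and its first field is the
    reference name (the optional name= given to fast_kmer_filter).
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Keep only lines whose first column is exactly the requested gene.
        if len(fields) > 1 and fields[0] == gene_name:
            # Equivalent of `cut -f2-`: drop the first column, keep the rest.
            yield "\t".join(fields[1:])


if __name__ == "__main__":
    # Hypothetical multiplexed content, for illustration only.
    sample = io.StringIO(
        "nifh\tread_1\tACGT\n"
        "rplb\tread_2\tTTGA\n"
        "nifh\tread_3\tGGCC\n"
    )
    for record in demultiplex_starts(sample, "nifh"):
        print(record)
```

To use it on a real file, open the multiplexed starts file and pass the file object as `lines`, then write each yielded record on its own line to the per-gene output file.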