Add colored pipeline description & prepare for v1.3.0
ivartb committed Aug 25, 2022
1 parent a99f633 commit 265d596
Showing 5 changed files with 1,863 additions and 3 deletions.
44 changes: 44 additions & 0 deletions Pipelines.md
@@ -6,6 +6,7 @@ Here three possible usage pipelines of MetaFast toolkit are presented. Each pipe
* [Pipeline 1. Metagenomic distance estimation](#pipeline-1-metagenomic-distance-estimation)
* [Pipeline 2. Unique metagenomic features finder](#pipeline-2-unique-metagenomic-features-finder)
* [Pipeline 3. Specific metagenomic features counter](#pipeline-3-specific-metagenomic-features-counter)
* [Pipeline 4. Colored metagenomic features finder](#pipeline-4-colored-metagenomic-features-finder)
* [Format conversion tools](#format-conversion-tools)

## Pipeline 1. Metagenomic distance estimation
@@ -170,6 +171,49 @@ This tool is designed to be used with 3 categories of metagenomes and provide th
java -jar metafast.jar -t kmers-multiple-filters -k <k> -i <ungroupped.kmers.bin> -cd <workDir/group_1/n_samples.kmers.bin> -uc <workDir/group_2/n_samples.kmers.bin> -nonibd <workDir/group_3/n_samples.kmers.bin>
`

## Pipeline 4. Colored metagenomic features finder

Pipeline for extracting group-specific features from metagenomic samples and working with them. Feature construction is based on k-mer occurrences in samples from different categories, represented as colored nodes in a de Bruijn graph (see figure below).

![Colored graph](img/pipe4_colors.svg)

The data analysis pipeline is similar to the [unique features finder](#pipeline-2-unique-metagenomic-features-finder), with new steps for k-mer filtering and feature extraction. Step-by-step data processing is shown in the image below.

![Pipeline 4](img/pipe4.svg)

Order of tools to run:

1. **K-mers counter**
Extracts k-mers from each metagenomic sample and saves them in an internal binary format for further processing (`workDir/kmers/*.kmers.bin`). This step can be performed separately for metagenomes with known and unknown categories. For convenience, we refer below to samples with known categories as _group\_1.kmers.bin_ ... _group\_N.kmers.bin_ for N categories, and to samples with unknown category as _ungroupped.kmers.bin_.
`
java -jar metafast.jar -t kmer-counter-many -k <k> -i <inputFiles>
`
2. **K-mers coloring (group frequencies counter)**
Counts the occurrence frequency of each k-mer in each category of samples and saves the result in an internal binary format for further processing (`workDir/colored_kmers/colored_kmers.kmers.bin`). The mandatory parameter `--class` requires a tab-separated text file with two columns: sample_name [string] and class [0|1|2]. If the `-val` flag is set, a k-mer's occurrence is counted as its total coverage in samples; otherwise it is counted as the number of samples in which it occurs.
`
java -jar metafast.jar -t kmers-color -k <k> -kf <group_{1..N}.kmers.bin> --class <samples_classes.tab> [-val]
`

3. **Colored component extractor**
Extracts graph components from the tangled graph based on k-mer coloring. These subgraph components can be used as features specific to the analyzed category (`workDir/colored-components/components_color_[0|1|2].bin`).
**_Parameters:_**\
`--n_groups <int>` – number of classes (default: 3)\
`--separate` – use only color-specific k-mers in components (does not work in linear mode)\
`--linear` – choose the best path at each fork to create linear components\
`--n_comps <int>` – select at most this many components for each class (default: -1, meaning all components)\
`--perc` – relative abundance of a k-mer in a group required for it to become color-specific (default: 0.9)\
`
java -jar metafast.jar -t component-colored -k <k> -i <colored_kmers.kmers.bin>
`

4. **Features calculator**
Counts the coverage of each component (subgraph) by k-mers for each metagenomic sample independently. For each sample, outputs a numerical feature vector of coverages (`workDir/vectors/*.vec`). Feature vectors for samples with known categories can then be used to train a machine learning model to predict categories for samples with unknown categories.
`
java -jar metafast.jar -t features-calculator -k <k> -cm <components.bin> -ka <*.kmers.bin>
`
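
For step 2, the `--class` file is described as a tab-separated table with two columns, sample_name and class. A minimal Python sketch for producing such a table is shown here. The sample names are hypothetical, and two details are assumptions not stated in the description: that the file has no header row, and that sample names should match the names of the samples whose k-mers were counted.

```python
import csv
import io

# Hypothetical sample names and class labels; replace with your own.
SAMPLE_CLASSES = {
    "sample_A": 0,
    "sample_B": 1,
    "sample_C": 2,
}


def format_class_table(sample_classes):
    """Return tab-separated text with one 'sample_name<TAB>class' row per sample."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for name, label in sample_classes.items():
        # The description lists 0, 1 and 2 as the allowed class values.
        if label not in (0, 1, 2):
            raise ValueError(f"class for {name!r} must be 0, 1 or 2, got {label!r}")
        writer.writerow([name, label])
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the table; redirect stdout to a file to save it.
    print(format_class_table(SAMPLE_CLASSES), end="")
```

Saving the output, for example as `samples_classes.tab`, gives a file in the shape that `--class` is described as expecting.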

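The two counting modes of step 2 (with and without `-val`) can be illustrated on toy data. The sketch below is not MetaFast's implementation: it works on plain Python dictionaries rather than the binary k-mer files, and the function and argument names are made up for illustration.

```python
from collections import defaultdict


def group_frequencies(sample_counts, sample_class, use_coverage=False):
    """Return {kmer: {class_label: frequency}}.

    sample_counts: {sample_name: {kmer: count_in_sample}}
    sample_class:  {sample_name: class_label}
    use_coverage:  True mimics the described `-val` behaviour (sum of
                   coverages); False counts the number of samples.
    """
    freqs = defaultdict(lambda: defaultdict(int))
    for sample, counts in sample_counts.items():
        label = sample_class[sample]
        for kmer, count in counts.items():
            if count <= 0:
                continue  # k-mer is absent from this sample
            freqs[kmer][label] += count if use_coverage else 1
    return {kmer: dict(per_class) for kmer, per_class in freqs.items()}


if __name__ == "__main__":
    # Toy data: three samples, two classes.
    samples = {"s1": {"ACG": 5, "CGT": 1}, "s2": {"ACG": 2}, "s3": {"CGT": 4}}
    classes = {"s1": 0, "s2": 0, "s3": 1}
    print(group_frequencies(samples, classes))
    print(group_frequencies(samples, classes, use_coverage=True))
```

In sample-count mode a k-mer seen many times in one sample contributes 1 to its class, whereas in coverage mode it contributes its full count, so the two modes can rank k-mers differently.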

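The `--perc` parameter of step 3 is described as the relative abundance a k-mer needs in a group to become color-specific. One possible reading is sketched below, under stated assumptions: relative abundance is taken as the class frequency divided by the total over all classes, the threshold is inclusive, and no normalization by group size is applied. The actual rule used by MetaFast may differ.

```python
def color_of(class_freqs, perc=0.9):
    """Return the class label a k-mer would be colored with, or None.

    class_freqs: {class_label: frequency of the k-mer in that class}
    perc:        minimum share of the total frequency that one class must
                 hold (assumed inclusive) for the k-mer to be color-specific.
    """
    total = sum(class_freqs.values())
    if total == 0:
        return None  # k-mer not observed in any class
    # Pick the class with the highest frequency and check its share.
    label, freq = max(class_freqs.items(), key=lambda item: item[1])
    return label if freq / total >= perc else None


if __name__ == "__main__":
    print(color_of({0: 19, 1: 1}))  # class 0 holds 95% of occurrences
    print(color_of({0: 5, 1: 5}))   # evenly shared between two classes
```

Under this reading, lowering `perc` colors more k-mers at the cost of less class-specific components.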

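Step 4 notes that feature vectors of samples with known categories can be used to train a model that predicts categories for other samples. As a minimal illustration, the sketch below uses a nearest-centroid classifier, chosen here only for brevity and not prescribed by MetaFast. It assumes the vectors have already been loaded into Python lists; the layout of the `.vec` files is not covered.

```python
import math


def train_centroids(vectors, labels):
    """Average the feature vectors of each class into one centroid per class."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, vec):
    """Return the class whose centroid is closest (Euclidean distance) to vec."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


if __name__ == "__main__":
    # Toy coverage vectors over two components for three labelled samples.
    known_vectors = [[1.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
    known_labels = [0, 0, 1]
    model = train_centroids(known_vectors, known_labels)
    print(predict(model, [2.5, 0.5]))
```

Any standard classifier could take the place of the centroid model; the point is only that each sample contributes one fixed-length vector with one coverage value per component.
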
## Format conversion tools

#### Binary to Fasta convertor
4 changes: 2 additions & 2 deletions README.md
@@ -54,9 +54,9 @@ ant
~~~


-## MetaFast 1.5
+## MetaFast 1.3

-A new version of MetaFast software is being prepared for the release. New pipelines for comparative metagenomics data analysis have been implemented. Three recommended use cases (including the original one) and a detailed description of available tools are presented in [Pipelines.md](Pipelines.md)
+A new version of MetaFast software is being prepared for the release. New pipelines for comparative metagenomics data analysis have been implemented. Four recommended use cases (including the original one) and a detailed description of available tools are presented in [Pipelines.md](Pipelines.md)

## Running instructions

2 changes: 1 addition & 1 deletion build.xml
@@ -1,5 +1,5 @@
<project name="jarsBuilder" default="metafast">
-<property name="VERSION" value="1.2.0"/>
+<property name="VERSION" value="1.3.0"/>
<exec executable="git" outputproperty="revision">
<arg value="rev-parse"/>
<arg value="--short"/>
3 changes: 3 additions & 0 deletions img/pipe4.svg
1,813 changes: 1,813 additions & 0 deletions img/pipe4_colors.svg
