Update README.md
Added explanation of the motif file format
Michael Hiller authored Nov 4, 2020
1 parent 1d4b1f5 commit a8f158c
Showing 1 changed file with 13 additions and 6 deletions.
19 changes: 13 additions & 6 deletions README.md
@@ -39,7 +39,7 @@ TFforge.py data/tree_simulation.nwk motifs.ls data/species_lost_simulation.txt e
--windowsize 200 --scorefile scores_simulation -bg=background/
# Run job list as batch or in parallel
-bash alljobs_simulation > scores_simulation
+bash alljobs_scores_simulation > scores_simulation
# Run the association test
TFforge_statistics.py data/tree_simulation.nwk motifs.ls data/species_lost_simulation.txt \
@@ -50,7 +50,13 @@ TFforge_statistics.py data/tree_simulation.nwk motifs.ls data/species_lost_simul
# General workflow
## Input data
- species tree in newick format
-- motif files in wtmx format (see example/data/ACA1.wtmx as example)
+- motif files in wtmx format (see example/data/ACA1.wtmx as example) which is:
+> \>*motif_name* *motif_length* [optional fields]
+> *pos1_freqA* *pos1_freqC* *pos1_freqG* *pos1_freqT*
+> *pos2_freqA* *pos2_freqC* *pos2_freqG* *pos2_freqT*
+> ..
+> *posN_freqA* *posN_freqC* *posN_freqG* *posN_freqT*
+> \<
- motif list: path of wtmx files of one TF per line (see example/motifs.ls)
- Phenotype-loss species list: one species per line (see example/data/species_lost_simulation.txt)
- CRE fastafiles: each file contains the sequence for every (ancestral or extant) species
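
The wtmx layout described in this hunk can be read with a few lines of Python. The following is only a minimal sketch, not code from TFforge: `parse_wtmx` is a hypothetical helper, and it assumes whitespace-separated fields, numeric frequency values, and that *motif_length* equals the number of position rows. The toy motif is invented for illustration and is not the content of example/data/ACA1.wtmx.

```python
def parse_wtmx(text):
    """Parse wtmx-formatted text into a list of motif dictionaries.

    Hypothetical helper, not part of TFforge. Assumes each motif starts
    with a '>' header line and ends with a line starting with '<'.
    """
    motifs = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header: >motif_name motif_length [optional fields]
            fields = line[1:].split()
            current = {
                "name": fields[0],
                "length": int(fields[1]),
                "extra": fields[2:],
                "rows": [],
            }
        elif line.startswith("<"):
            # Terminator: check the row count against the declared length
            if current is None:
                raise ValueError("'<' found before any motif header")
            if len(current["rows"]) != current["length"]:
                raise ValueError("row count does not match motif_length")
            motifs.append(current)
            current = None
        else:
            # Position row: freqA freqC freqG freqT
            if current is None:
                raise ValueError("frequency row found outside a motif")
            values = [float(value) for value in line.split()]
            if len(values) != 4:
                raise ValueError("expected four values per position")
            current["rows"].append(dict(zip("ACGT", values)))
    if current is not None:
        raise ValueError("motif is missing its closing '<' line")
    return motifs


# Invented two-position motif, for illustration only
example = """>toy_motif 2
0.7 0.1 0.1 0.1
0.1 0.1 0.1 0.7
<
"""
motifs = parse_wtmx(example)
```

Storing each position as a dictionary keyed by base keeps the A, C, G, T column order from the format description explicit.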
@@ -61,16 +67,17 @@ Generate the TFforge branch_scoring commands for all CREs and TFs.
```
TFforge.py <tree> <motif_list> <lost_species_list> <element_list>
```
-This creates for every CRE and every TF a TFforge_branch_scoring.py job. Each line in alljobs_<scorefile> consists a single job. Each job is completely independent of any other job, thus each job can be run in parallel to others.
+This creates for every CRE and every TF a TFforge_branch_scoring.py job. Each line in alljobs_\<scorefile\> consists a single job. Each job is completely independent of any other job, thus each job can be run in parallel to others.

Execute that alljobs file. Either sequentially via
```
bash alljobs_simulation > scores_simulation
```
or run it in parallel by using a compute cluster.
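
Because the jobs are independent, they could also be run in parallel on a single machine instead of a cluster. The sketch below is one possible way to do that, not something shipped with TFforge: `run_job`, `run_jobs`, and the `workers` default are hypothetical, and it assumes each non-empty line of the jobs file is a standalone shell command that prints its result to standard output.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_job(command):
    """Run one shell command and return what it printed to stdout."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return result.stdout


def run_jobs(jobs_path, score_path, workers=4):
    """Run every non-empty line of jobs_path as a job and collect the output.

    Hypothetical helper: outputs are written to score_path in the same
    order as the jobs appear in the jobs file.
    """
    with open(jobs_path) as handle:
        commands = [line.strip() for line in handle if line.strip()]
    # Threads are sufficient here because the work happens in child processes
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(run_job, commands))
    with open(score_path, "w") as out:
        out.writelines(outputs)
    return len(commands)


# Possible usage (file names follow the simulation example and may differ):
# run_jobs("alljobs_scores_simulation", "scores_simulation", workers=8)
```

`pool.map` returns results in submission order, so the score file lines stay aligned with the job list even though jobs finish at different times.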
-Every job returns a line in the following format:
-motif_file CRE_file (branch_start>branch_end:branch_score )* <
-which should be concatenated into a file called "<scorefile>".
+Every job returns a line in the following format:
+> *motif_file* *CRE_file* (*branch_start*>*branch_end*:*branch_score* )* <
+which should be concatenated into a file called "\<scorefile\>".
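
The score line format described here could be read back with a short parser. This is a minimal sketch under stated assumptions, not TFforge's own code: `parse_score_line` is a hypothetical name, fields are assumed to be whitespace-separated, branch names are assumed to contain no whitespace, and scores are assumed to be numeric. The example line uses invented file names, branch names, and scores.

```python
def parse_score_line(line):
    """Split one score line into motif file, CRE file, and branch scores.

    Hypothetical helper, not part of TFforge. Expected layout:
    motif_file CRE_file (branch_start>branch_end:branch_score )* <
    """
    tokens = line.split()
    if len(tokens) < 3 or tokens[-1] != "<":
        raise ValueError("line must hold two file names and end with '<'")
    motif_file, cre_file = tokens[0], tokens[1]
    branches = []
    for token in tokens[2:-1]:
        # The score follows the last ':'; the branch is start>end
        branch, score = token.rsplit(":", 1)
        start, end = branch.split(">", 1)
        branches.append((start, end, float(score)))
    return motif_file, cre_file, branches


# Invented example line, for illustration only
example = "ACA1.wtmx cre1.fa anc1>speciesA:0.5 anc1>speciesB:-1.25 <"
motif_file, cre_file, branches = parse_score_line(example)
```

Splitting the score off with `rsplit` keeps the parser working if a branch name itself contains a colon.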

## Step 2: Association test
```