The overall idea is to use SibeliaZ to build the alignment and then construct longer synteny blocks with the maf2synteny module, as described here.
Let's assume you already have your genomes in FASTA format in the file merged.fna.
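If the genomes are still in separate files, merged.fna can be assembled with a few lines of shell. This is only a minimal sketch: the function names merge_fastas and duplicate_ids and the genomes/ directory in the usage comment are hypothetical, and it assumes one .fna file per genome. Sequence IDs repeated across genomes would make block coordinates ambiguous, so the second helper lists them as a cheap precaution.

```shell
# merge_fastas DIR OUT: concatenate every *.fna file in DIR into OUT.
# "awk 1" re-prints each line with a trailing newline, so a file that
# lacks a final newline cannot glue its last line to the next header.
# Keep OUT outside DIR (or give it another extension) so it is not
# picked up by the glob on a second run.
merge_fastas() {
    local dir="$1" out="$2"
    awk 1 "$dir"/*.fna > "$out"
}

# duplicate_ids FASTA: print sequence IDs (first word of each header)
# that occur more than once; no output means all IDs are unique.
duplicate_ids() {
    grep '^>' "$1" | cut -d ' ' -f 1 | sort | uniq -d
}

# Example usage (hypothetical directory name):
#   merge_fastas genomes merged.fna
#   duplicate_ids merged.fna
```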
We recommend installing SibeliaZ with conda:
conda install sibeliaz -c bioconda
We recommend the following parameters:
- -k 15: k-mer size for bacterial-size genomes (recommended in the SibeliaZ documentation for bacterial genomes);
- -a N*20, where N is the number of genomes: for detecting highly duplicated blocks (20 corresponds to 10 duplications within a genome);
- -n: skip the nucleotide alignment and output only the coordinates of the alignment, saving time and memory;
- -t 4: optional, the desired number of threads;
- Do not change the -m parameter. Changing it will slow down the computation significantly and will not give you blocks. Blocks need to be obtained with the maf2synteny tool.
Final command:
sibeliaz -k 15 -n -o sibeliaz_out merged.fna
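The command above leaves out -a and -t. If you want to include them as recommended in the parameter list, the -a value has to be computed from the number of genomes. A minimal sketch, assuming a hypothetical helper sibeliaz_cmd that only prints the command so you can inspect it before running:

```shell
# sibeliaz_cmd N [THREADS]: print a SibeliaZ command for N genomes.
# -a is set to N*20 as recommended above; THREADS defaults to 4.
# The helper name is hypothetical; the flags are the ones listed above.
sibeliaz_cmd() {
    local n_genomes="$1" threads="${2:-4}"
    local abundance=$(( n_genomes * 20 ))
    echo "sibeliaz -k 15 -a ${abundance} -n -t ${threads} -o sibeliaz_out merged.fna"
}

# For 12 genomes this prints:
#   sibeliaz -k 15 -a 240 -n -t 4 -o sibeliaz_out merged.fna
sibeliaz_cmd 12
```

Note that N is the number of genomes, not the number of sequences in merged.fna, so it has to be supplied by hand when genomes consist of several contigs.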
Now you have blocks_coords.gff in the output folder (sibeliaz_out)!
We recommend using the fine parameters for merging alignments into blocks.
For this, create a file fine.txt with the following content:
30 150
100 500
500 1500
Also, create a file fine_500.txt if you want to get shorter blocks (shorter than 1500):
30 150
100 500
More about these parameters and what they mean here.
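Both parameter files can be written straight from the shell instead of a text editor. A minimal sketch, assuming they should land in the current directory:

```shell
# Write the "fine" simplification parameters used for blocks >= 5000.
cat > fine.txt <<'EOF'
30 150
100 500
500 1500
EOF

# Same parameters without the last step, for blocks shorter than 1500.
cat > fine_500.txt <<'EOF'
30 150
100 500
EOF
```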
Then you can run the maf2synteny module of SibeliaZ with the desired minimal block size -b.
For getting blocks with minimal size 5000:
maf2synteny -s fine.txt -b 5000 blocks_coords.gff
For getting blocks with minimal size 1000:
maf2synteny -s fine_500.txt -b 1000 blocks_coords.gff
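Since the same call is repeated with different parameter files and block sizes, it can be wrapped in a small function that first checks its inputs. This is a sketch only: run_synteny and the MAF2SYNTENY override variable are hypothetical names, and the function uses no flags beyond the -s and -b shown above.

```shell
# run_synteny PARAMS MIN_BLOCK COORDS
# Checks that both input files exist and are non-empty, then calls
# maf2synteny with the same arguments as the commands above.
# MAF2SYNTENY (hypothetical) can point at a different executable.
run_synteny() {
    local params="$1" min_block="$2" coords="$3"
    [ -s "$params" ] || { echo "missing parameter file: $params" >&2; return 1; }
    [ -s "$coords" ] || { echo "missing alignment file: $coords" >&2; return 1; }
    "${MAF2SYNTENY:-maf2synteny}" -s "$params" -b "$min_block" "$coords"
}

# Example usage, from the folder that contains blocks_coords.gff:
#   run_synteny fine.txt 5000 blocks_coords.gff
#   run_synteny fine_500.txt 1000 blocks_coords.gff
```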