This repository contains the Karyon pipeline.
Karyon is a pipeline for the assembly and analysis of highly heterozygous genomes. It uses redundans (Pryszcz & Gabaldón, 2016) to reduce heterozygosity during the assembly process, and then maps the original libraries against the reduced assembly to analyze the distribution of heterozygous regions. With this information, it generates a series of plots that can help researchers form informed hypotheses about the architecture of their genomes.
Scripts contained in this repository:
- karyon.py -> complete pipeline, including genome assembly, assembly reduction, SNP calling and plot generation
- prepare_libraries.py -> Karyon dependency. It uses Trimmomatic to trim input libraries before genome assembly.
- spades_recipee.py -> Karyon dependency. It generates a file that launches dipSPAdes with the input.
- varcall_recipee.py -> Karyon dependency. It generates a file that launches all steps in the SNP calling pipeline.
- karyonplots.py -> Karyon dependency. It generates all the plots as part of the Karyon pipeline.
- all_plots.py -> Standalone version of karyonplots.py. It allows the user to input karyon results to generate the plots again.
- nQuire_plot.py -> It allows the user to run the local ploidy plot alone.
- Dockerfile -> Dockerfile used to build the image that runs Karyon
- install.sh -> Bash script required to install the remaining dependencies in the Dockerfile
- redundans_env.yml & busco_env.yml -> Conda environments in YAML format required to install some of the trickiest dependencies.
- Quick start: you can install Karyon using the standard installation or through Docker.
- Standard installation
Follow these steps:
# First clone the Karyon repository
git clone https://github.com/Gabaldonlab/karyon.git
# Change to karyon/scripts directory
cd karyon/scripts/
# Then, run the installation script.
bash installation.sh
- Docker installation
To run this container you need Docker installed.
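As a minimal sketch of that prerequisite check, the snippet below tests whether the `docker` command is on the PATH before you build or pull the image. The helper name `have_cmd` is hypothetical and not part of Karyon. Finding the command does not guarantee that the Docker daemon is running.

```shell
# Hypothetical helper (not part of Karyon): succeeds if a command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether Docker is available before attempting to build or pull the image.
if have_cmd docker; then
    echo "docker found: $(command -v docker)"
else
    echo "docker not found on PATH; install Docker before continuing" >&2
fi
```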
- Use the Dockerfile build
# From the karyon git directory
docker build --no-cache -t cgenomics/karyon:1.2 .
# Start the container and indicate a volume and a container name
docker run -dit --name=karyon -v $(pwd):/root/src/karyon/shared --rm cgenomics/karyon:1.2
# Install the necessary dependencies inside the running container. First, open an interactive shell in the container
docker exec -it karyon bash
# Change directory to the karyon volume in the container, where the Dockerfile is located
cd /root/src/karyon/shared/
# Run the dependency installation script
bash scripts/docker_install.sh
- Pull the docker image from Docker Hub
Pull gabaldonlab/karyon from the Docker repository:
# First pull the image
docker pull cgenomics/karyon:1.2
# Start the container and indicate a volume and a container name
docker run -dit --name=karyon -v $(pwd):/root/src/karyon/shared --rm cgenomics/karyon:1.2
# Install the necessary dependencies inside the running container. First, open an interactive shell in the container
docker exec -it karyon bash
# Change directory to the karyon volume in the container, where the Dockerfile is located
cd /root/src/karyon/shared/
# Run the dependency installation script
bash scripts/docker_install.sh
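Whichever Docker route you follow, it can help to confirm from the host that the container is actually running before you call `docker exec`. The sketch below assumes the container was started with `--name=karyon`, as in the commands above. The helper name `is_container_running` is hypothetical and not part of Karyon.

```shell
# Hypothetical helper (not part of Karyon): succeeds if a container with
# exactly this name appears in the list of running containers.
is_container_running() {
    docker ps --format '{{.Names}}' | grep -Fqx -- "$1"
}

# Usage, from the host (assumes the container was started with --name=karyon):
#   is_container_running karyon && echo "karyon is running"
```

Because the container is started with `--rm`, it is removed as soon as it stops, so a failed check usually means `docker run` has to be repeated.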
The test dataset consists of two sequencing libraries from NCBI SRA, corresponding to Lichtheimia ramosa B5399, one of the strains analyzed in the main publication.
# Open an interactive shell in the Docker container
docker exec -it karyon bash
# Configure SRA tools within the docker container
~/src/karyon/shared/dependencies/sratoolkit.3.0.0-ubuntu64/bin/vdb-config --interactive
# Download the SRA libraries at the desired location
cd /root/src/karyon/shared/
~/src/karyon/shared/dependencies/sratoolkit.3.0.0-ubuntu64/bin/fastq-dump --split-files SRR974799 SRR974800
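After the download, a quick sanity check can confirm that the expected FASTQ files are present. With `--split-files`, `fastq-dump` writes one file per read, named `<accession>_1.fastq`, `<accession>_2.fastq`, and so on. The sketch below assumes that both libraries are paired-end and that the files were written to the directory passed as the first argument. The helper name `check_fastq_pairs` is hypothetical and not part of Karyon.

```shell
# Hypothetical helper (not part of Karyon): check that both mate files exist
# and are non-empty for each accession in the given directory.
# Assumes paired-end libraries, so fastq-dump --split-files is expected to
# produce <accession>_1.fastq and <accession>_2.fastq.
check_fastq_pairs() {
    dir="$1"
    shift
    status=0
    for acc in "$@"; do
        for mate in 1 2; do
            file="$dir/${acc}_${mate}.fastq"
            if [ ! -s "$file" ]; then
                echo "missing or empty: $file" >&2
                status=1
            fi
        done
    done
    return "$status"
}

# Usage, from inside the container:
#   check_fastq_pairs /root/src/karyon/shared SRR974799 SRR974800
```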
Please check the manual for comprehensive instructions on using Karyon.
- Miguel Ángel Naranjo Ortiz - Pipeline work - MANaranjo
- Manuel Molina Marín - Docker work - manumolina
- Diego Fuentes Palacios - Docker work & testing - dfupa
- Toni Gabaldón - Intellectual design & validation - tgabaldon
This project is licensed under the GNU General Public License - see the LICENSE.md file for details.