Analyzing RNA-Seq data using Python 3 Snakemake

Alex Soupir

Snakemake is a workflow management tool widely used in bioinformatics. It is valuable when analyzing a large amount of data with multiple tools. This Snakefile was written as a way to learn a workflow manager; Nextflow is another option for managing large analysis workflows. Here, Snakemake was used to run everything that is usually run by hand on Linux in an RNA-Seq analysis (here is my long-winded version of an RNA-Seq analysis of a p53 gene mutation in mice).

The Snakefile was developed to analyze sequencing data from a recent publication: "Dietary walnut altered gene expressions related to tumor growth, survival, and metastasis in breast cancer patients: a pilot clinical trial". The raw sequences were downloaded from the Sequence Read Archive Run Selector using sra-tools.

The genome and GTF files were downloaded, and an index of the genome was created.
The MiniKraken database was downloaded and extracted.
Homo sapiens rRNA sequences were downloaded from NCBI.
PhiX sequences were downloaded from Illumina.
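
As a rough sketch of the first preparation step above, the genome index for STAR could be built with a command like the one assembled below. The file names (genome.fa, annotation.gtf), the output directory, and the thread count are placeholders and not the paths used in this repository. The helper function itself is hypothetical; only the STAR flags are standard options.

```python
from typing import List


def star_index_command(genome_fasta: str, gtf: str, index_dir: str,
                       threads: int = 8) -> List[str]:
    """Assemble a STAR command that builds a genome index.

    All paths are hypothetical placeholders; adjust them to your own files.
    """
    return [
        "STAR",
        "--runMode", "genomeGenerate",       # build an index instead of aligning reads
        "--genomeDir", index_dir,            # directory the index is written to
        "--genomeFastaFiles", genome_fasta,  # reference genome in FASTA format
        "--sjdbGTFfile", gtf,                # annotation used for splice junctions
        "--runThreadN", str(threads),        # number of threads STAR may use
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without STAR installed.
    print(" ".join(star_index_command("genome.fa", "annotation.gtf", "star_index")))
```

Building the command as a list keeps it easy to inspect, and it can be passed to subprocess.run once the paths point at real files.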

Conda Tools Used:

FastQC
Trimmomatic
STAR
featureCounts (conda Subread)
Bowtie2
bwa
Samtools
Bam2Fastx
MultiQC

Running

Snakemake was run on a large-memory cluster node. From the folder containing the Snakefile, I ran snakemake -j 80, which lets Snakemake use up to 80 cores. If no file name is given, Snakemake searches the current directory for a file named Snakefile (or snakefile).
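
The lookup and the command line described above can be sketched in Python as follows. This illustrates the behaviour and is not Snakemake's own code. The function names are hypothetical, and the list of default locations reflects my understanding of the Snakemake documentation, so treat it as an assumption.

```python
from pathlib import Path
from typing import List, Optional

# Locations checked when no --snakefile is given (assumed from the Snakemake docs).
DEFAULT_SNAKEFILES = (
    "Snakefile",
    "snakefile",
    "workflow/Snakefile",
    "workflow/snakefile",
)


def find_snakefile(directory: str = ".") -> Optional[Path]:
    """Return the first default Snakefile found in `directory`, or None."""
    for name in DEFAULT_SNAKEFILES:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


def snakemake_command(cores: int, snakefile: Optional[str] = None) -> List[str]:
    """Build the snakemake call; -j sets the maximum number of cores to use."""
    cmd = ["snakemake", "-j", str(cores)]
    if snakefile is not None:
        # Only needed when the workflow file is not in a default location.
        cmd += ["--snakefile", snakefile]
    return cmd


if __name__ == "__main__":
    print(find_snakefile("."))
    print(" ".join(snakemake_command(80)))
```

Passing the result of snakemake_command(80) to subprocess.run would be equivalent to typing snakemake -j 80 in the workflow folder.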

Running Snakemake produces a QC folder with results for all of the steps, along with MultiQC reports, which are HTML pages that summarize the results.

MultiQC doesn't have the ability to identify and create summaries for microbial contamination with KrakenUniq.

