Pipeline to create the multiscale methylation plot

Components of the workflow

Calculate x-values for plot
Add chromosome lengths to x-values file (needed for next step)
Create windows around x-values for calculating average methylation value
Calculate average methylation values for each window
Calculate cross-sample mean and standard deviation for each window size
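
The last three steps above can be illustrated with a minimal, standard-library Python sketch. This is not the pipeline's own implementation (the workflow relies on bedtools and pandas). The function names, the half_width parameter, the half-open window convention, and the toy data are all assumptions made for illustration.

```python
from statistics import mean, stdev


def make_window(chrom_len, x, half_width):
    """Return a (start, end) window centred on x, clipped to the chromosome."""
    start = max(0, x - half_width)
    end = min(chrom_len, x + half_width)
    return start, end


def window_mean(cpgs, start, end):
    """Average the methylation values of CpGs with start <= position < end.

    cpgs is a list of (position, value) pairs; returns None for an empty window.
    """
    values = [value for pos, value in cpgs if start <= pos < end]
    return mean(values) if values else None


def cross_sample_stats(per_sample_means):
    """Mean and sample standard deviation across samples, skipping missing values."""
    present = [m for m in per_sample_means if m is not None]
    if not present:
        return None, None
    sd = stdev(present) if len(present) > 1 else 0.0
    return mean(present), sd


# Toy data (hypothetical): CpG position and methylation fraction for two samples.
samples = {
    "sample_a": [(100, 0.2), (150, 0.4), (900, 1.0)],
    "sample_b": [(100, 0.4), (150, 0.6), (900, 0.0)],
}
chrom_len = 1000  # in practice this would come from the FAI index

start, end = make_window(chrom_len, x=120, half_width=200)
means = [window_mean(cpgs, start, end) for cpgs in samples.values()]
avg, sd = cross_sample_stats(means)
```

Repeating the same calculation with several values of half_width gives one mean and standard deviation per window size, which is the kind of table the final step describes.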

Dependencies

The following dependencies are installed automatically when you run the pipeline with --use-conda; otherwise, they must be available in your PATH.

snakemake (version 6.0+)
bedtools
python (version 3.7+)
- pandas

Running the pipeline

Clone or download the repo.
- git clone git@github.com:huishenlab/multiscale_methylation_plot_pipeline.git, OR
- Download from the releases page.
Place gzipped BED files in the raw_data directory.
- Alternatively, you can specify the path to your gzipped BED files in config/config.yaml.
Replace the temporary sample names in config/samples.tsv with your sample names (everything before .bed.gz in your input file names).
- Alternatively, you can specify the path to your sample sheet in config/config.yaml. If you use your own file, be sure to include "sample" as the header line.
Modify the config/config.yaml file with your chosen inputs.
- At minimum, you will need to specify the FAI index file location for your genome. All inputs have defaults set.
The pipeline can then be run on the command line (snakemake --cores 1 --use-conda) or submitted to a job scheduler on a cluster (a PBS script is provided: qsub bin/run_snakemake_workflow.sh).
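
If you have many input files, the sample sheet can be generated rather than typed by hand. The following is a minimal sketch, assuming the sheet is a single column with "sample" as its header line and that each sample name is the file name with .bed.gz removed. The helper names sample_names and write_sample_sheet are hypothetical and not part of the pipeline.

```python
from pathlib import Path


def sample_names(bed_dir):
    """Return sorted sample names: file names with the .bed.gz suffix removed."""
    suffix = ".bed.gz"
    return sorted(
        p.name[: -len(suffix)] for p in Path(bed_dir).glob("*" + suffix)
    )


def write_sample_sheet(names, out_path):
    """Write one sample name per line under the required 'sample' header."""
    lines = ["sample", *names]
    Path(out_path).write_text("\n".join(lines) + "\n")
```

With the default layout, calling write_sample_sheet(sample_names("raw_data"), "config/samples.tsv") from the repository root would overwrite the placeholder sample sheet, so check the result before starting the run.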

After the pipeline finishes running

Three directories will be created in your specified output directory.
- analysis/: results from the pipeline
  - means/: mean methylation values in bins of varying size for each input sample
  - stats/: average and standard deviation for values in each bin across all samples
  - x_values/: bins used for calculating mean values
- benchmarks/: benchmarking data compiled when each rule runs
- logs/: log files generated as rules finish

Creating the multiscale plot

The multiscale plot can be created with bisplotti, using the multiscaleMethylationPlot() function. The input is either a single sample's file from analysis/means/ or the average across all samples from analysis/stats/. If you processed two or more sample groups together, you can also compute per-bin average values for a specified set of samples with multiscaleGroupAverage() in bisplotti.

Example dataset

An example dataset is provided as a .zip file on the releases page. After unzipping the file, see the included README for instructions on using it.
