- Calculate x-values for plot
- Add chromosome lengths to x-values file (needed for next step)
- Create windows around x-values for calculating average methylation value
- Calculate average methylation values for each window
- Calculate cross-sample mean and standard deviation for each window size
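
The steps above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the pipeline's own code (the pipeline relies on bedtools and pandas). The function names, the toy data, and the choice of the sample standard deviation are all assumptions made for this example. The pipeline repeats the calculation for several window sizes, while the sketch uses a single one.

```python
from statistics import mean, stdev


def make_window(center, half_width, chrom_length):
    """Return a (start, end) window around center, clipped to the chromosome."""
    return max(0, center - half_width), min(chrom_length, center + half_width)


def window_mean(sites, window):
    """Average the methylation values of all sites that fall inside the window."""
    start, end = window
    values = [beta for pos, beta in sites if start <= pos <= end]
    return mean(values) if values else None


def cross_sample_stats(per_sample_means):
    """Mean and sample standard deviation across samples, ignoring empty windows."""
    present = [m for m in per_sample_means if m is not None]
    if not present:
        return None, None
    return mean(present), (stdev(present) if len(present) > 1 else 0.0)


# Hypothetical toy data: (position, methylation fraction) pairs for each sample.
samples = {
    "sampleA": [(100, 0.2), (150, 0.4), (900, 0.8)],
    "sampleB": [(100, 0.4), (150, 0.6), (900, 1.0)],
}
chrom_length = 1000   # assumed chromosome length, as would be read from an FAI file
x_values = [125, 900]  # assumed window centers
half_width = 50        # one assumed window size

stats = {}
for x in x_values:
    window = make_window(x, half_width, chrom_length)
    means = [window_mean(sites, window) for sites in samples.values()]
    stats[x] = cross_sample_stats(means)
```

Clipping the window to the chromosome length is the reason the chromosome lengths are needed before the windows are created.
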

The following dependencies are downloaded with `--use-conda`; otherwise, you must have them in your PATH.
- `snakemake` (version 6.0+)
- `bedtools`
- `python` (version 3.7+)
- `pandas`
- Clone or download the repo.
  - `git clone [email protected]:huishenlab/multiscale_methylation_plot_pipeline.git`, OR
  - Download from the releases page.
- Place gzipped BED files in the `raw_data` directory.
  - Alternatively, you can specify the path to your gzipped BED files in `config/config.yaml`.
- Replace the temporary sample names in `config/samples.tsv` with your sample names (everything before `.bed.gz` in your input files).
  - Alternatively, you can specify the path to your sample sheet in `config/config.yaml`. If you use your own file, be sure to include "sample" as the header line.
- Modify the `config/config.yaml` file with your chosen inputs.
  - At minimum, you will need to specify the FAI index file location for your genome. All other inputs have defaults set.
- The pipeline can then be run on the command line (`snakemake --cores 1 --use-conda`) or submitted to a job scheduler on a cluster (a PBS script is provided: `qsub bin/run_snakemake_workflow.sh`).
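
If you have many input files, the sample sheet can be generated rather than typed by hand. The sketch below is one possible way to do this. The helper names are hypothetical, and it assumes the sheet is a single column with "sample" as the header line and one sample name per row, where each name is everything before `.bed.gz`. The demonstration runs in a temporary directory so it does not touch a real `raw_data` directory.

```python
import tempfile
from pathlib import Path

SUFFIX = ".bed.gz"


def sample_names(bed_dir):
    """Return sorted sample names: each file name with the .bed.gz suffix removed."""
    return sorted(p.name[: -len(SUFFIX)] for p in Path(bed_dir).glob("*" + SUFFIX))


def write_sample_sheet(names, out_path):
    """Write a one-column sheet with 'sample' as the header line."""
    Path(out_path).write_text("\n".join(["sample", *names]) + "\n")


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    raw_data = Path(tmp) / "raw_data"
    raw_data.mkdir()
    for name in ("sampleB.bed.gz", "sampleA.bed.gz", "notes.txt"):
        (raw_data / name).touch()

    names = sample_names(raw_data)
    sheet_path = Path(tmp) / "samples.tsv"
    write_sample_sheet(names, sheet_path)
    sheet_text = sheet_path.read_text()
```

To use it on real data, point `sample_names()` at your BED directory and write the result to `config/samples.tsv` or to the sample sheet path set in `config/config.yaml`.
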
- Three directories will be created in your specified output directory.
  - `analysis/`: results from the pipeline
    - `means/`: mean methylation values in bins of varying size for each input sample
    - `stats/`: average and standard deviation for values in each bin across all samples
    - `x_values/`: bins used for calculating mean values
  - `benchmarks/`: benchmarking data compiled when each rule runs
  - `logs/`: log files generated as rules finish
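
After a run, a quick check that the expected directories exist can catch an incomplete run early. The following is a minimal sketch that uses only the directory names listed above. The function name is hypothetical, and the demonstration builds a deliberately incomplete layout in a temporary directory.

```python
import tempfile
from pathlib import Path

# Directory layout described in this README, relative to the output directory.
EXPECTED_DIRS = [
    "analysis/means",
    "analysis/stats",
    "analysis/x_values",
    "benchmarks",
    "logs",
]


def missing_outputs(output_dir):
    """Return the expected directories that are absent under output_dir."""
    root = Path(output_dir)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


# Demonstration: create everything except logs/ and confirm it is reported.
with tempfile.TemporaryDirectory() as tmp:
    for rel in EXPECTED_DIRS[:-1]:
        (Path(tmp) / rel).mkdir(parents=True)
    missing = missing_outputs(tmp)
```

An empty list from `missing_outputs()` means all of the listed directories were created.
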

The multiscale plot can be created using bisplotti, specifically the `multiscaleMethylationPlot()` function. Input is a specific sample in `analysis/means/` or the average for all samples from `analysis/stats/`. If you have two (or more) sample groups that you processed together, you can also find per-bin average values for a specified set of samples using `multiscaleGroupAverage()` in bisplotti.

An example dataset has been provided as a `.zip` file on the releases page. Instructions for using it can be found in a README once the file has been unzipped.