Skip to content

This repository comprises scripts to select putatively relevant sites for analysis with scTAM-seq.

License

Notifications You must be signed in to change notification settings

veltenlab/CpGSelectionPipeline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CpGSelectionPipeline

This repository comprises scripts to select CpGs from bulk DNA methylation data for subsequent analysis with scTAM-seq.

In general, three classes of CpGs are selected:

  • Differentially Methylated CpGs (DMCs, in RnBeads)
  • Intermediately Methylated CpGs (IMCs, in IMC)
  • CpGs harboring Within-Sample Heterogeneity (WSH, in WSH)

Further scripts, such as those for annotating selected sites, selecting constitutively methylated and unmethylated sites, and amplicons without an HhaI cutsite, as well as validation and plotting functions can be found in scripts.

General configurations of the pipeline, such as the number of CpGs to be covered in the panel, can be provided in the file config.yaml.

Collecting the data

As an example of the pipeline, we used the Kulis et al. data from the BLUEPRINT project (https://ega-archive.org/datasets/EGAD00001001304, https://doi.org/10.1038/ng.3291). We recommend using a standard bisulfite sequencing data processing pipeline, such as bismark for low-level data processing. Bismark output files can be used directly as input to RnBeads, the generated bam-files can be used to compute the PDR and qFDRP. Notably, the panel designed for our current publication on scTAM-seq was generated with an earlier version of the pipeline.

Installing dependencies

The pipeline mainly depends on RnBeads, which can be installed using:

source('https://rnbeads.org/data/install.R')

Furthermore, the following R packages are required: -yaml -data.table -argparse -ggsci

To facilitate installing, we provide a yaml file for easy installation with conda/bioconda

conda create -f selection_pipeline.yml -n selection_pipeline

Running the pipeline

The pipeline has been targeted to an HPC system operated by the SGE jobs scheduling system. For other HPC environments or a local installation, the scripts have to be changed accordingly.

For each of the categories of CpGs mentioned above, a dedicated script has been created:

  • In RnBeads, you'll have to point in the rnbOptions.xml and script.R file to the downloaded data directory and adjust the file according to your system. See the RnBeads documentation for further information in case you want to modify the analysis. The example sample_annotation.csv may have to be adapted according to the filenames of the downloaded files. Lastly, the script select_reliable_DMRs.R will select sites that are specifically methylated in each of the populations.
  • To compute the WSH scores PDR and qFDRP, the bam-files generated by bismark are leveraged. Scripts to compute the scores are available from WSH and have to point to the corresponding BAM files and the RnBeads output file generated earlier.
  • IMCs are defined as sites with intermediate methylation in the HSCs with low PDR scores. They can be determined using the find_IMCs_low_PDR.R in the IMC folder.
  • Variably methylated sites are selected based on those that are not IMCs (i.e. have high PDR), but also high qFDRP scores. For this, the script filter_qFDRP_results.R is responsible.
  • To include control regions (non-digested, constitutively methylated, constitutively unmethylated), corresponding scripts have been generated in scripts.

After successful execution of the scripts, you can combine differend DMRs from different panels and upload them to the Tapestri Designer tool (https://designer.missionbio.com/).

Contact

You can contact Michael Scherer for questions and suggestions.

Citation

If you use this pipeline in your publication, please cite: Bianchi, A., Scherer, M., Zaurin, R. et al. scTAM-seq enables targeted high-confidence analysis of DNA methylation in single cells. Genome Biol 23, 229 (2022). https://doi.org/10.1186/s13059-022-02796-7

About

This repository comprises scripts to select putatively relevant sites for analysis with scTAM-seq.

Resources

License

Stars

Watchers

Forks

Packages

No packages published