Overview

Using DNMTools to analyze whole-genome methylation sequencing data

Using SCG for storage and computing: Primer for SCG with information about nodes, modules, etc.

General setup: installing DNMTools and HTSlib

Using specific server

Using sessions to have separate jobs running simultaneously

tmux -- starts a new session; you can then switch between sessions
Ctrl-b, release both keys, then press d -- detaches from the session (it keeps running in the background)
tmux ls -- lists all sessions; run pwd and hostname to confirm that you are connected to the server
tmux a -t 15 -- attaches to session 15, for example
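
Numbered sessions get hard to tell apart once several jobs are running, so naming them can help. The following is a minimal sketch of that idea; the session name wgms-map is a made-up example and not something from the SCG setup.

```shell
session="wgms-map"   # hypothetical session name, chosen for illustration

if command -v tmux >/dev/null 2>&1; then
    # Start a detached session only if one with this name does not exist yet.
    tmux has-session -t "$session" 2>/dev/null \
        || tmux new-session -d -s "$session" \
        || echo "could not start tmux session $session"
    # List sessions; the named one should appear in the output.
    tmux ls || echo "no tmux server is running"
else
    echo "tmux not found on this machine"
fi

# To work inside the session:  tmux attach -t wgms-map
# To leave it running:         press Ctrl-b, release, then press d
```

Attaching by name works the same way as attaching by number.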

Navigate to the shared directory under lab_vsebast: cd /oak/stanford/scg/lab_vsebast/shared/wgms
Load the DNMTools module that SCG provides
module spider dnmtools
module add dnmtools
- advantage: the manual installation described above has been having issues, while loading the module is a single step
- disadvantage: we cannot (re)install these packages ourselves to get the latest version (we are limited to the version SCG provides)
Test that dnmtools works by typing "dnmtools" (no quotes)
- this will not work in a separate session until the module is loaded in that session as well
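
Because a module only changes the environment of the shell it was loaded in, it can be handy to check for the tool at the start of each session. A small sketch of such a check follows; the helper name have_tool is an assumption introduced here, not part of SCG or DNMTools.

```shell
# Return 0 if the named command is on PATH, non-zero otherwise.
# (have_tool is a hypothetical helper defined only for this example.)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool dnmtools; then
    echo "dnmtools found at: $(command -v dnmtools)"
else
    echo "dnmtools not on PATH; run 'module add dnmtools' in this session"
fi
```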

Indexing the genome
- module load dnmtools
- dnmtools abismalidx hg38.fa hg38.idx
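
Indexing a whole genome takes a while, so it may be worth skipping the step when the index is already there. The sketch below wraps the command above in a function; build_index is a hypothetical name, and the check only tests that the index file exists and is non-empty, not that it is complete.

```shell
# Build the abismal index unless a non-empty index file already exists.
# Arguments: genome FASTA path, output index path.
build_index() {
    genome="$1"
    index="$2"
    if [ -s "$index" ]; then
        echo "index $index already exists; skipping"
        return 0
    fi
    if [ ! -r "$genome" ]; then
        echo "cannot read genome file $genome" >&2
        return 1
    fi
    dnmtools abismalidx "$genome" "$index"
}

# Usage on the cluster, after 'module load dnmtools':
#   build_index hg38.fa hg38.idx
```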

Decompress raw .fq.gz file
- gzip -d <file.gz>
- Faster to work with decompressed files, but they take more space
- abismal can also take compressed files directly (it decompresses them during the run and then deletes the decompressed copy)
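
Note that gzip -d replaces the .gz file with the decompressed one. To look inside a compressed file, or to keep both copies, the file can be streamed with gzip -dc instead. The sketch below shows this on a tiny made-up record in a temporary directory; the file name sample.fq is only an example.

```shell
workdir=$(mktemp -d)

# Tiny made-up FASTQ record, for illustration only.
printf '@read1\nTTGATTTAGG\n+\nIIIIIIIIII\n' > "$workdir/sample.fq"
gzip "$workdir/sample.fq"            # creates sample.fq.gz and removes sample.fq

# Peek at the compressed file without writing a decompressed copy.
gzip -dc "$workdir/sample.fq.gz" | head -n 2

# Decompress to a new file while keeping the .gz original.
gzip -dc "$workdir/sample.fq.gz" > "$workdir/sample.fq"

# Compare on-disk sizes in bytes.
wc -c "$workdir/sample.fq" "$workdir/sample.fq.gz"
```

For a file this small the compressed copy is not smaller, because of the gzip header overhead; the space savings only show up on real read files.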
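head <file.fq>
- shows the first lines of the file (10 by default)
- the first line of each record is the read identifier: it starts with @ and typically encodes where the read came from on the sequencer, not its position in the genome
- the second line is the read sequence itself: it contains very few Cs because unmethylated Cs are converted during library preparation and are read as T (ordinary DNA sequencing data would show Cs at the usual frequency)
- head -n 2 <file.fq> prints only the first 2 lines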
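
To check the C depletion more systematically than by eye, the bases on the sequence lines can be counted. The following is a rough sketch on two made-up records; it assumes the standard four-line FASTQ layout (identifier, sequence, +, quality), and the reads are invented for illustration, not real data.

```shell
fq=$(mktemp)

# Two made-up records, for illustration only (not real data).
printf '@read1\nTTGATTTAGGTTAT\n+\nIIIIIIIIIIIIII\n' > "$fq"
printf '@read2\nGATTTTCGGTTAGA\n+\nIIIIIIIIIIIIII\n' >> "$fq"

# One record = 4 lines: identifier, sequence, '+', quality string.
head -n 4 "$fq"

# Number of reads = number of lines divided by 4.
reads=$(awk 'END { print NR / 4 }' "$fq")
echo "reads: $reads"

# Count each base on the sequence lines (line 2 of every 4-line record).
composition=$(awk 'NR % 4 == 2 {
        for (i = 1; i <= length($0); i++) count[substr($0, i, 1)]++
    }
    END { for (b in count) print b, count[b] }' "$fq" | sort)
echo "$composition"

rm -f "$fq"
```

On converted reads the count for C should be much lower than for the other bases. On a real file, replace the temporary file with the path to the decompressed .fq.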
Removing adapters (trimming)
Quality control (if you are not sure about the quality of the sequencing itself)