Name		Name	Last commit message	Last commit date
parent directory ..
MAnorm.r		MAnorm.r
MAnorm.sh		MAnorm.sh
MAnorm2.R		MAnorm2.R
MAnorm2.sh		MAnorm2.sh
MAnorm3.R		MAnorm3.R
MAnorm3.sh		MAnorm3.sh
README.md		README.md
classfy_by_M_value.sh		classfy_by_M_value.sh

README.md

MAnorm - identifying differential binding in chip-seq data using linear normalization on shared peaks

MAnorm3 is my improved version which allows for 2 v 2 comparisons (replicates) using edgeR. I published a paper comparing this method to others: https://www.ncbi.nlm.nih.gov/pubmed/25972895

Original MAnorm can be found here Published in Genome Biology 2012

Workflow

StepI: clean and sort input
StepII: classify common or unique peaks
StepIII: count peak read
StepIV: normalize using common peaks

Experience from running on a quad core workstation with 32gb of ram:

StepI is I/O bound (mostly reading from disk and writing back to disk)
StepII & StepIII is CPU bound (lots of parallel bedtools)
Ram usage I've seen with ENCODE datasets is 20gb+/-5gb but I dont actually think all of this is used for processing

Files

MAnorm - original
MAnorm2 - retains input/output compatibility w/original but will be faster and will pool replicates
MAnorm3 - has different input/output, uses edgeR with replicates (2v2 comparisons only currently)

Problems with MAnorm

I think there is something wrong with how the p-values are calculated (see code in MAnorm2.R starting from line 50 for details).
- pval calculation is not optimized (very slow)
  - It is faster to use choose() and run in parallel
- Stirling approximation seems to be done incorrectly
  - This calculation is consistant in matlab version (more details in matlab MAnorm than R MAnorm)
- pval are not symmetric (calculations from x vs y do not give the same pvalues as y vs x)
mergeBed command in MAnorm.sh does not actually work (need to sort first)
Lots of tmp files generated by MAnorm.sh and a lot of steps could be done in parallel

Changes in MAnorm3

This version uses edgeR for significance testing.

Requires a replicate (since ENCODE samples have 2 reps)
Run bedtools in parallel (faster unless you have IO bottleneck)
Added parameter specify output directory
MA adjustment is included as an offset to edgeR model
sortBed before mergeBed command so the merged dataset is actually merged
Requires input to be gzipped (will add switch for this later)
Outputs a bunch of plots

Todo

Contact MAnorm author about some of the issues in p-value calculation and merging
Add error checks and helpful diagnostic messages
Add switch to check for .gz ending
Change code so samples with no replicates can run

Changes in MAnorm2

This version tries to keep the same input/output as original MAnorm but has the following changes:

I run bedtools in parallel in the background for significant speedup.
Allows replicates (see 'read in' section of MAnorm2.sh code)
- Concatenates replicates (proper way would be to downsample so replicates have equal weight)
- Not well tested, use MAnorm3 if you have replicates

Example of MAnorm3 run:

MAnorm3.sh test_cond1_cond2 \
  peaksA.bed.gz \
  peaksB.bed.gz \
  readsA1.bed.gz \
  readsA2.bed.gz \
  readsB1.bed.gz \
  readsB2.bed.gz \
  60 70 55 53

Where 60 70 55 53 correspond to shift lengths (half of estimated fragment length) for readsA1.bed readsA2.bed readsB1.bed readsB2.bed and readsA1 and readsA2 are replicates used to identify peaks in peaksA. You must have MAnorm3.sh in your $PATH and MAnorm3.R must be in current working directory (you could modify this at the end of MAnorm3.sh). As with before, the output is tab seperated file ending in .xls

Second example:

Usage: MAnorm3.sh test_cond1_cond2 gr_peak.bed er_peak.bed gr_rep1.bed gr_rep2.bed er_rep1.bed er_rep2.bed 120 110 95 100

gr_peak.bed: sample 1 significant peak list
er_peak.bed: sample 2 significant peak list
gr_rep1.bed: sample 1 raw reads rep1
gr_rep2.bed: sample 1 raw reads rep2
er_rep1.bed: sample 2 raw reads rep1
er_rep2.bed: sample 2 raw reads rep2

shifts (1/2 est fragment size):
gr_rep1: 120
gr_rep2: 110
er_rep1: 95
er_rep2: 100

Note that test_cond1_cond2 is the name of the folder that will be created. The script is currently hardcoded to look for a (name)(type1)(type2) layout

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MAnorm

MAnorm

README.md

MAnorm - identifying differential binding in chip-seq data using linear normalization on shared peaks

Workflow

Files

Problems with MAnorm

Changes in MAnorm3

Todo

Changes in MAnorm2

Example of MAnorm3 run:

Files

MAnorm

Directory actions

More options

Directory actions

More options

Latest commit

History

MAnorm

Folders and files

parent directory

README.md

MAnorm - identifying differential binding in chip-seq data using linear normalization on shared peaks

Workflow

Files

Problems with MAnorm

Changes in MAnorm3

Todo

Changes in MAnorm2

Example of MAnorm3 run: