Compute cross-mappability for the rat genome

Reference genome versions and other details can be found in RatGTExPortal.
1. Libraries were sequenced using 150-bp paired-end sequencing.
Cross-mappability is calculated with the original pipeline, which was written for human and is modified here for rat.

Preparation of files

Make a copy of the reference genome that is split by chromosome for downstream analysis.
This step was run with 12G of memory.
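
A minimal sketch of the per-chromosome split, assuming the reference is a single multi-record FASTA file. The function name `split_fasta_by_chrom` and the `<name>.fa` output naming are hypothetical, not part of the original workflow; each output file is named after the first word of its FASTA header.

```shell
#!/usr/bin/env bash
# Split a multi-record FASTA into one file per sequence (chromosome or contig).
# Usage: split_fasta_by_chrom <ref_genome.fa> <out_dir>
# Hypothetical helper: the name and output layout are assumptions.
split_fasta_by_chrom() {
    local fasta="$1" out_dir="$2"
    mkdir -p "$out_dir"
    awk -v dir="$out_dir" '
        /^>/ {
            # New record: close the previous file and open a new one,
            # named after the first word of the header (without ">").
            if (out != "") close(out)
            name = substr($1, 2)
            out = dir "/" name ".fa"
        }
        # Write header and sequence lines to the current file.
        out != "" { print > out }
    ' "$fasta"
}
```

For example, `split_fasta_by_chrom ref.fa ref_by_chrom` would write `ref_by_chrom/1.fa`, `ref_by_chrom/2.fa`, and so on, depending on the header names in the FASTA.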
The BigWig output contains a column with dna; remove this column before downstream processing.
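
One way to drop such a column is sketched below. It assumes the mappability track has already been converted to whitespace-delimited text and that the unwanted column holds values starting with `dna` (for example `dna:chromosome`); both are assumptions, and `drop_dna_columns` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Remove every field that starts with "dna" and print the rest tab-separated.
# Usage: drop_dna_columns [file ...]   (reads stdin when no file is given)
# Hypothetical helper: the "dna" prefix match is an assumption about the data.
drop_dna_columns() {
    awk 'BEGIN { OFS = "\t" }
    {
        n = 0
        # Keep only the fields that do not start with "dna".
        for (i = 1; i <= NF; i++)
            if ($i !~ /^dna/) kept[++n] = $i
        line = (n > 0) ? kept[1] : ""
        for (i = 2; i <= n; i++) line = line OFS kept[i]
        print line
    }' "$@"
}
```

Check a few lines of the real file first, since the position and content of the column may differ from what is assumed here.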
The paper tested different k-mer lengths but did not give any details on how to choose one. Based on the description, they use the read length as the k-mer length for exons, so the same setting is used here.

Download bowtie 1.2.2 and index the genome using the command bowtie-build <ref_genome> <prefix>.
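
The indexing command can be wrapped as in the sketch below, which only adds a check that `bowtie-build` is on the PATH and creates the output directory for the prefix. The wrapper name `build_bowtie_index` is hypothetical; the underlying call is the `bowtie-build <ref_genome> <prefix>` command given above.

```shell
#!/usr/bin/env bash
# Build a bowtie index for a reference genome.
# Usage: build_bowtie_index <ref_genome> <prefix>
# Hypothetical wrapper around: bowtie-build <ref_genome> <prefix>
build_bowtie_index() {
    local ref_genome="$1" prefix="$2"
    # Fail early with a clear message if bowtie is not available.
    if ! command -v bowtie-build >/dev/null 2>&1; then
        echo "bowtie-build not found on PATH" >&2
        return 127
    fi
    # Make sure the directory that will hold the index files exists.
    mkdir -p "$(dirname "$prefix")"
    bowtie-build "$ref_genome" "$prefix"
}
```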

R: r/4.0.2-openblas, cpu/0.15.4 gcc/9.2.0
Modifications to the cross_mappability repository:
1. In all R scripts, change the argparser flags from - to --.
2. Install the required libraries using the env.R files, and add the library path to all scripts.
3. In compute_mappability.R and gtf_to_bed.R, change the code to pick up UTRs in the rat GTF, which are labeled 'three_prime_utr' and 'five_prime_utr'.
4. In compute_mappability.R, the rat reference genome contains more contigs, and some contigs appear only in exons; change the code to avoid the resulting error.
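
To illustrate item 3, the sketch below selects the two UTR feature types from a GTF and prints them as BED-like rows. It is written in shell with awk and is not the repository's R code; the function name `gtf_utrs_to_bed` and the six-column output layout are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Print UTR features from a GTF as tab-separated BED-like rows:
#   chrom, start (0-based), end, feature type, ".", strand
# Usage: gtf_utrs_to_bed [file.gtf ...]   (reads stdin when no file is given)
# Hypothetical helper: not part of the cross_mappability repository.
gtf_utrs_to_bed() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
    /^#/ { next }   # skip GTF header/comment lines
    $3 == "three_prime_utr" || $3 == "five_prime_utr" {
        # GTF coordinates are 1-based inclusive; BED starts are 0-based.
        print $1, $4 - 1, $5, $3, ".", $7
    }' "$@"
}
```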
Export the bowtie path.
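
A minimal sketch of the export, assuming bowtie was unpacked into a directory of its own. `BOWTIE_DIR` and its default location are hypothetical; point it at the actual install directory.

```shell
#!/usr/bin/env bash
# BOWTIE_DIR is a hypothetical variable; its default path is an assumption.
BOWTIE_DIR="${BOWTIE_DIR:-$HOME/tools/bowtie-1.2.2}"

# Put the bowtie directory first on PATH so its binaries are found.
export PATH="$BOWTIE_DIR:$PATH"

# Warn (without failing) if the bowtie binary still cannot be found.
command -v bowtie >/dev/null 2>&1 || echo "warning: bowtie not found on PATH" >&2
```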

The workflow has not been optimized; the original one is used as is. set_variables.sh contains several parts: run each part in sequence by commenting out the other parts. Snakemake might be a better way to organize the workflow.
Set user_slurm=0 and run bash set_variables.sh on an interactive node.
Set user_slurm=1 and run bash set_variables.sh <expans_account> to submit jobs.
