- Reference genome versions and other details can be found in RatGTExPortal.
- Libraries were sequenced using 150-bp paired-end sequencing.
- Cross-mappability was calculated with the original pipeline written for human, modified for rat.
- Run gem_index.sh to index the genome; indexing the whole rat genome takes about 1 hour.
- A copy of the reference genome split by chromosome is needed for downstream analysis.
- Uses 12 GB of memory.
- The BigWig output contains a column with dna; this column needs to be removed for downstream processing.
- The paper tested different k-mer lengths but did not give details on how to choose one. Based on the description, they used the read length as the k-mer length for exons, so we use the same setting here.
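The per-chromosome copy of the reference genome mentioned above can be produced in several ways; the following is a minimal sketch, not the script actually used here. The function name `split_fasta_by_chrom`, its argument order, and the `<name>.fa` output naming are assumptions made for illustration. Each output file is named after the first word of the FASTA header.

```shell
#!/usr/bin/env bash
# Sketch: split a multi-record FASTA into one file per sequence.
# Assumptions (not from the original notes): function name, argument order,
# and the <name>.fa output naming are illustrative only.
split_fasta_by_chrom() {
    local fasta="$1" outdir="$2"
    mkdir -p "$outdir"
    awk -v outdir="$outdir" '
        /^>/ {
            if (out != "") close(out)   # keep the number of open files small
            name = substr($1, 2)        # first word of the header, without ">"
            out = outdir "/" name ".fa"
            print > out                 # write the full header line
            next
        }
        out != "" { print > out }       # sequence lines go to the current file
    ' "$fasta"
}

# Usage (paths are placeholders):
#   split_fasta_by_chrom <ref_genome> <output_dir>
```

Closing each file before opening the next matters for the rat assembly, which has many small contigs in addition to the main chromosomes.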
- Potential issues:
- The paper builds its index with Bowtie, which may not be aware of splicing events.
- Download bowtie 1.2.2 and index the genome using the command `bowtie-build <ref_genome> <prefix>`.
- Bowtie does not work on Expanse (reason unknown), so the index was built on snorlax.
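Because Bowtie is installed by hand here, it can help to fail early when `bowtie-build` is not found. The wrapper below is a hedged sketch: the function name `build_bowtie_index` and the `bowtie_dir` argument are hypothetical, and only the `bowtie-build <ref_genome> <prefix>` call comes from the notes above.

```shell
#!/usr/bin/env bash
# Sketch: put a manually installed Bowtie on PATH, then build the index.
# Assumptions: the function name and the bowtie_dir argument are illustrative.
build_bowtie_index() {
    local bowtie_dir="$1" ref_genome="$2" prefix="$3"
    export PATH="$bowtie_dir:$PATH"     # also makes bowtie visible to later scripts
    if ! command -v bowtie-build >/dev/null 2>&1; then
        echo "bowtie-build not found in $bowtie_dir or on PATH" >&2
        return 1
    fi
    bowtie-build "$ref_genome" "$prefix"
}

# Usage (paths are placeholders):
#   build_bowtie_index <bowtie_install_dir> <ref_genome> <prefix>
```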
- Modules that need to be loaded:
- R: r/4.0.2-openblas, cpu/0.15.4, gcc/9.2.0
- Modifications to the cross_mappability repository:
- In all R scripts, change the argparser flag prefix from - to --.
- Install the required libraries using the env.R file, and add the library path to all scripts.
- In compute_mappability.R and gtf_to_bed.R, change the code to pick up UTRs in the rat GTF, where they are specified as 'three_prime_utr' and 'five_prime_utr'.
- In compute_mappability.R, the rat reference genome contains more contigs, and some contigs appear only in exons; change the code to avoid the resulting error.
- Export the Bowtie path.
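The UTR and contig changes above were made inside the R scripts. As an illustration of the underlying idea only (this is not the code from compute_mappability.R or gtf_to_bed.R), the sketch below works directly on a GTF file. It relies on the GTF layout, in which column 1 is the contig name and column 3 the feature type. The function names are hypothetical, and "contigs that appear only in exons" is read here as contigs with exon records but no UTR records, which is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: not the repository's R code. Function names are hypothetical.
# GTF is tab-separated; column 1 is the contig, column 3 the feature type.

# Print UTR records from a rat GTF, where UTRs are labelled
# 'three_prime_utr' and 'five_prime_utr'.
rat_utr_records() {
    awk -F '\t' '$3 == "three_prime_utr" || $3 == "five_prime_utr"' "$1"
}

# Print contigs that have exon records but no UTR records. A per-contig
# loop over UTRs would need to skip or special-case these.
exon_only_contigs() {
    awk -F '\t' '
        /^#/ { next }                                   # skip header comments
        $3 == "exon" { exon[$1] = 1 }
        $3 == "three_prime_utr" || $3 == "five_prime_utr" { utr[$1] = 1 }
        END { for (c in exon) if (!(c in utr)) print c }
    ' "$1" | sort
}

# Usage (path is a placeholder):
#   rat_utr_records <rat_gtf> | head
#   exon_only_contigs <rat_gtf>
```

Listing the exon-only contigs before running the pipeline gives a quick way to check whether the contig-handling change is needed for a given annotation.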
- To run jobs:
- The workflow has not been optimized; the original one is used as is. `set_variables.sh` is divided into several parts: run each part sequentially by commenting out the other parts. Snakemake might be a better way to organize the workflow.
- Set user_slurm=0 and run `bash set_variables.sh` in an interactive node.
- Set user_slurm=1 and run `bash set_variables.sh <expans_account>` to submit jobs.