Notebooks for "Mapping MAVE data for use in human genomics applications"

Code for data analysis and figure generation for "Mapping MAVE data for use in human genomics applications" (Arbesfeld et al.):

  • mavedb_mapping.ipynb: This notebook applies the mapping algorithm to a set of 209 examined score sets from MaveDB, successfully creating mappings for ~2.5 million variant pairs across 207 score sets.
  • mapping_analysis.ipynb: This notebook computes reference sequence concordance across the generated VRS mapping pairs. The notebook also computes the number of unique pre-mapped and post-mapped variants.
  • mavedb_scoreset_breakdown.ipynb: This notebook generates the summary statistics described in the manuscript.

Environment

A compatible Python environment can be generated using the included requirements.txt file.

First, create and activate a virtual environment of your preference. For example, using virtualenv:

python3 -m virtualenv venv
source venv/bin/activate

Then install all requirements in requirements.txt:

python3 -m pip install -r requirements.txt
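
Before launching the notebooks, it can help to confirm that the running interpreter really is the one inside the virtual environment. The following is a minimal sketch, not part of this repository; the helper name `in_virtual_environment` is hypothetical.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # venv and virtualenv >= 20 make sys.prefix differ from sys.base_prefix;
    # older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
```

If this reports `False` after `source venv/bin/activate`, the `python3` on your `PATH` is probably not the one from `venv`.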

Directory layout

After executing mapping code, this directory will contain working and output data in the following locations:

├── README.md
├── analysis_files
│   ├── mappings
│   │   └── <mapping output files>
│   └── <mapping checkpoint files>
├── experiment_scoresets.txt
├── mapping_analysis.ipynb
├── mave_mapping_fig_3b.R
├── mavedb_files
│   └── <Scoreset records and metadata from MaveDB>
├── mavedb_mapping.ipynb
├── mavedb_scoreset_breakdown.ipynb
└── requirements.txt
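
The layout above can be checked programmatically. The sketch below is illustrative only: the path names are copied from the tree, while the helper `missing_paths` and the two constants are hypothetical and not part of this repository. Since `analysis_files` and `mavedb_files` hold working and output data, they may be absent until the mapping code has been run.

```python
from pathlib import Path

# Paths copied from the directory tree above; the helper itself is illustrative.
EXPECTED_DIRS = ("analysis_files", "analysis_files/mappings", "mavedb_files")
EXPECTED_FILES = (
    "README.md",
    "experiment_scoresets.txt",
    "mapping_analysis.ipynb",
    "mave_mapping_fig_3b.R",
    "mavedb_mapping.ipynb",
    "mavedb_scoreset_breakdown.ipynb",
    "requirements.txt",
)


def missing_paths(root: Path) -> list[str]:
    """Return the expected directories and files that are absent under root."""
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # Run from the repository root; prints nothing when the layout is complete.
    for path in missing_paths(Path.cwd()):
        print("missing:", path)
```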