Code for data analysis and figure generation for "Mapping MAVE data for use in human genomics applications" (Arbesfeld et al.):
mavedb_mapping.ipynb
: This notebook applies the mapping algorithm to a set of 209 examined score sets from MaveDB, successfully creating mappings for ~2.5 million variant pairs across 207 score sets.

mapping_analysis.ipynb
: This notebook computes reference sequence concordance across the generated VRS mapping pairs. The notebook also computes the number of unique pre-mapped and post-mapped variants.

mavedb_scoreset_breakdown.ipynb
: This notebook generates the summary statistics that are described in the manuscript.
A compatible Python environment can be created from the included requirements.txt file.
First, create and activate a virtual environment of your choice. For example, using virtualenv:
python3 -m virtualenv venv
source venv/bin/activate
Then install all requirements in requirements.txt:
python3 -m pip install -r requirements.txt
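To confirm that the environment is complete before opening the notebooks, a small check along the following lines may help. This is a minimal sketch and not part of the repository: the helper names requirement_names and missing_packages are hypothetical, and the parser assumes simple `name==version` style lines, skipping pip options and URL-based requirements.

```python
import re
from importlib import metadata
from pathlib import Path

# Matches the distribution name at the start of a simple requirement line.
_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(text):
    """Return distribution names from requirements-style text.

    Assumption: lines look like `name==1.2.3` or `name>=1.0`. Comments,
    blank lines, pip options (`-r`, `-e`, ...) and URL requirements are skipped.
    """
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-") or "://" in line:
            continue
        match = _NAME.match(line)
        if match:
            names.append(match.group(1))
    return names


def missing_packages(names):
    """Return the names that are not installed in the active environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req_file = Path("requirements.txt")
    if req_file.exists():
        absent = missing_packages(requirement_names(req_file.read_text()))
        print("Missing packages:", ", ".join(absent) if absent else "none")
    else:
        print("requirements.txt not found in the current directory")
```

Run it from the repository root with the virtual environment activated. It only checks that each package is present, not that the installed version matches the pinned one.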
After the mapping code has been executed, this directory will contain working and output data in the following locations:
├── README.md
├── analysis_files
│ ├── mappings
│ │ └── <mapping output files>
│ └── <mapping checkpoint files>
├── experiment_scoresets.txt
├── mapping_analysis.ipynb
├── mave_mapping_fig_3b.R
├── mavedb_files
│ └── <Scoreset records and metadata from MaveDB>
├── mavedb_mapping.ipynb
├── mavedb_scoreset_breakdown.ipynb
└── requirements.txt
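The layout above can be checked programmatically once the mapping code has run. The following is a minimal sketch rather than repository code: the function name check_layout is hypothetical, and the path lists are copied from the tree shown above. The contents of the placeholder entries (mapping output files, checkpoint files, MaveDB records) are not inspected, because their file names are not specified here.

```python
from pathlib import Path

# Directories and files taken from the directory tree in this README.
EXPECTED_DIRS = [
    "analysis_files",
    "analysis_files/mappings",
    "mavedb_files",
]
EXPECTED_FILES = [
    "README.md",
    "experiment_scoresets.txt",
    "mapping_analysis.ipynb",
    "mave_mapping_fig_3b.R",
    "mavedb_mapping.ipynb",
    "mavedb_scoreset_breakdown.ipynb",
    "requirements.txt",
]


def check_layout(root="."):
    """Return the expected paths (relative to root) that are missing."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    absent = check_layout(".")
    if absent:
        print("Missing paths:")
        for path in absent:
            print("  " + path)
    else:
        print("All expected paths are present.")
```

An empty result means every directory and file in the tree exists. It does not confirm that the mapping run itself completed successfully.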