Skip to content

LaetitiaPapaxanthos/SAPIENs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

54 Commits
 
 
 
 
 
 

Repository files navigation

SAPIENs

This repository contains the code of the model Sequence-Activity Prediction In Ensemble of Networks (SAPIENs). The provided code allows to reproduce the prediction results presented in the manuscript "Large-scale DNA-based phenotypic recording and deep learning enable highly accurate sequence-function mapping". Further detail about the model can be found in the main document and in the machine learning annex of the manuscript.

The repository is organised as follows:

  1. code/trained_model contains the weights of the trained model presented in the main document of the aforementioned paper and a Jupyter notebook that loads the weights of the model and computes the main predictive performance results (Figure 4 of the manuscript).

  2. code/training contains scripts that allow to entirely retrain the model, given the hyperparameters chosen on the validation set. This directory also contains results_sample, which shows the output of SAPIENs obtained with the sample training and validation sets provided in data (sequences_train_sample.npy, targets_train_sample.npy, sequences_validation_sample.npy, targets_validation_sample.npy).

  3. data contains the RBS sequences (as DNA strings) and their RBS strength (target), split across training, validation and test sets. This split is the one used for the main document of the manuscript ("Split0"). The sampled training and validation sets are extracted from the training and validation sets used in the manuscript.

Loading the trained model code/trained_model

The trained model can be loaded in the Jupyter notebook load_trained_model.ipynb and can be used to predict the target values of the sequences in the test set. The weights of the model are available in weights.

Training the model code/training

The repository contains resnet_model.py that implements the core architecture of a single resnet model, run_resnet.py that trains the model given a set of hyperparameters and main.sh that loads the data, trains the ensemble on five GPUs at a time and saves the prediction results. The hyperparameters are set in the script utils_model.py. To train the model, please run bash main.sh. The input file format corresponds to the standard binary file format in Python (.npy format), with one DNA sequence per row (no further preprocessing is needed).

Please be aware that different parts of this repository obey different licenses.

Dependencies

The code only supports python3 and requires the following packages:

  • numpy (tested on 1.16.3)
  • sklearn (tested on 0.21.3)
  • matplotlib (tested on 3.1.1)
  • tensorflow==1.12.0
  • keras==2.2.4

Contact

[email protected], [email protected]

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published