- Description
- Requirements
- How to run an example experiment
- How to run the experiments of the paper
- How to run an experiment on your own data
- Code organisation
- Note
- Acknowledgements
- How to cite this work
- License
- Address
## Description

This is the supporting code for the paper «Beam search for automated design and scoring of novel ROR ligands with machine intelligence». It allows you to replicate the experiments of the paper, as well as to run our method on your own set of molecules.
A preprint version is available (not up to date with the published version).
Abstract of the paper: Chemical language models enable de novo drug design without the requirement for explicit molecular construction rules. While such models have been applied to generate novel compounds with desired bioactivity, the actual prioritization and selection of the most promising computational designs remains challenging. In this work, we leveraged the probabilities learnt by chemical language models with the beam search algorithm as a model-intrinsic technique for automated molecule design and scoring. Prospective application of this method yielded three novel inverse agonists of retinoic acid receptor-related orphan receptors (RORs). Each design was synthesizable in three reaction steps and presented low-micromolar to nanomolar potency towards RORγ. This model-intrinsic sampling technique eliminates the strict need for external compound scoring functions, thereby further extending the applicability of generative artificial intelligence to data-driven drug discovery.
## Requirements

First, you need to clone the repository:

```
git clone git@github.com:ETHmodlab/molecular_design_with_beam_search.git
```
Then, you can run the following commands, which will create a conda virtual environment and install all the needed dependencies (if you don't have conda installed, you can get it first by following the official conda installation instructions):

```
cd molecular_design_with_beam_search/
sh install.sh
```
This command will also install a git submodule in order to use the WHALES descriptor. Once the installation is done, you can activate the conda virtual environment:

```
conda activate molb
```

Please note that you will need to activate this virtual environment every time you want to use this project.
## How to run an example experiment

Now, you can quickly try the code with a toy experiment by using our example configuration file (which contains an explanation of each parameter). Running it with the following commands will carry out a fine-tuning experiment with the four natural-product modulators of RORγ used in this paper, for two epochs only (to be fast), and sample molecules with the beam search:

```
cd experiments/
sh run_morty.sh configfiles/0_parameters_example.ini
```
The results of the analysis can be found in experiments/results/0_parameters_example/. There, you will find a picture of the top-ranked molecules, a reproduction of the paper's figures, and a .txt file with the SMILES of the top-ranked molecules. Please note that only sampled molecules not found in ChEMBL (based on a similarity search with the ChEMBL webresource client) are used in the results.
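For instance, once the toy experiment has finished, you can inspect the output from the repository root (the exact file names may differ; check the results folder):

```
ls experiments/results/0_parameters_example/
cat experiments/results/0_parameters_example/*.txt
```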
## How to run the experiments of the paper

If you want to run the same experiments as in the paper, you can run the following for the fine-tuning on the four natural products:

```
cd experiments/
sh run_morty.sh configfiles/A_experiment_one_fine_tuning_step.ini
```
For the experiment with the two-step fine-tuning, run the following two experiments in sequence, as the second part (defined by B2_experiment_two_fine_tuning_steps.ini) needs the results of the first part (B1_experiment_two_fine_tuning_steps.ini):

```
cd experiments/
sh run_morty.sh configfiles/B1_experiment_two_fine_tuning_steps.ini
sh run_morty.sh configfiles/B2_experiment_two_fine_tuning_steps.ini
```
Note that the experiment with the fine-tuning on the four natural products is fast, even on a CPU. For the other experiments, some patience will be needed if you don't have a GPU, even though we provide the pretrained weights of the chemical language model. Moreover, make sure you run B1_experiment_two_fine_tuning_steps.ini before B2_experiment_two_fine_tuning_steps.ini, as the latter uses the model trained by the former.
## How to run an experiment on your own data

To run an experiment on your own set of molecules, you will need to create your own configuration file (a .ini file). In this file, you can choose your own parameters for the beam search and the final ranking, as well as provide the path to your fine-tuning molecules (a .txt file with one SMILES string per line).
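For example, you could start from the documented example configuration and point it at your own SMILES file. A minimal sketch, assuming you work in experiments/ (the names my_experiment.ini and my_molecules.txt are illustrative; the actual parameter names are documented in configfiles/0_parameters_example.ini):

```
cd experiments/
# start from the documented example configuration
cp configfiles/0_parameters_example.ini configfiles/my_experiment.ini
# create a fine-tuning set with one SMILES string per line (aspirin and caffeine shown here)
cat > my_molecules.txt << 'EOF'
CC(=O)Oc1ccccc1C(=O)O
CN1C=NC2=C1C(=O)N(C)C(=O)N2C
EOF
# then edit configfiles/my_experiment.ini so that its fine-tuning path points to my_molecules.txt
```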
Then, you can just run the following command:

```
sh run_morty.sh configfiles/{your_parameter_file_name}.ini
```

You will find the results of your experiment in experiments/results/{your_parameter_file_name}/.
## Code organisation

The main script (run_morty.sh), which allows you to run the full experiment with one command, is composed of specific scripts that can also be used separately. If you wish, for example, to only fine-tune a model on your own data, you can run the following:

```
sh run_training.sh configfiles/{your_parameter_file_name}.ini
```
All the specific scripts (to fine-tune, to create the plots, etc.) can be run in the same way, as sketched below.
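For instance (the script names here are hypothetical, shown only to illustrate the calling pattern; see the experiments/ folder for the actual file names):

```
# hypothetical script names, for illustration only
sh run_beam_search.sh configfiles/{your_parameter_file_name}.ini
sh run_plots.sh configfiles/{your_parameter_file_name}.ini
```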
## Note

This work (code and paper) is built on top of our previous research. Notably, if you wish to pretrain a chemical language model on your own data, rather than using one of the two pretrained models available here, we recommend using the open-source code of our previous paper (https://github.com/ETHmodlab/virtual_libraries).
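For example, you can get that code by cloning its repository:

```
git clone https://github.com/ETHmodlab/virtual_libraries.git
```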
## Acknowledgements

This research was supported by the Swiss National Science Foundation (grant no. 205321_182176 to Gisbert Schneider), the RETHINK initiative at ETH Zurich, and the Novartis Forschungsstiftung (FreeNovation grant "AI in Drug Discovery" to Gisbert Schneider).
## How to cite this work

Moret, M., Helmstädter, M., Grisoni, F., Schneider, G. and Merk, D. (2021), Beam search for automated design and scoring of novel ROR ligands with machine intelligence. Angew. Chem. Int. Ed., Accepted Author Manuscript. https://doi.org/10.1002/anie.202104405
## Address

MODLAB
ETH Zurich
Inst. of Pharm. Sciences
HCI H 413
Vladimir-Prelog-Weg 4
CH-8093 Zurich