cd biosynthesis/multistep
conda env create -f environment.yml
conda activate biosynthesis
git clone https://github.com/OpenNMT/OpenNMT-py.git
cd OpenNMT-py
pip install -e .
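The data and code for single-step reactions are in singlestep/:
biogenisis_reaction.txt, reactions.txt: the raw training data. The former contains the bio reactions; from the latter, mainly the np_like reactions are used.
word_preprocess.py: data preprocessing script. It splits the data 9:1 into train and valid sets; the valid set is also used as the test set.
retro_en_de.yaml: configuration file for vocabulary building and model training.
mol_trans/all_train, mol_trans/bio_train, mol_trans/nplike_train: the pre-split training data. all_train is the mixture of the bio and np_like data.
mol_trans/model: the trained models. The 20 checkpoints were obtained by training on all_train, from step 110,000 to step 300,000.
mol_trans/run: stores the vocabulary.
score_predictions.py: evaluation script for the prediction results.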
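The 9:1 split described for word_preprocess.py can be illustrated with a minimal sketch. The function name `split_train_valid`, the fixed seed, and the toy data are assumptions made for illustration only; the actual script may parse and split the reaction files differently.

```python
import random


def split_train_valid(pairs, valid_fraction=0.1, seed=0):
    """Shuffle (source, target) pairs and split them into train and valid lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_valid = round(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]


# Toy data standing in for parsed reactions (hypothetical placeholder strings).
pairs = [(f"product_{i}", f"reactants_{i}") for i in range(100)]
train, valid = split_train_valid(pairs)
test = valid  # the validation set doubles as the test set
print(len(train), len(valid))
```

Because valid and test are the same set here, scores reported on it are not from data held out during model selection.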
After editing the configuration file, train the model with:
onmt_train -config retro_en_de.yaml
Commonly used translation (prediction) options are shown below:
onmt_translate -model mol_trans/model/new_all_step_140000.pt \
    -src mol_trans/bio_train/new-src-val.txt \
    -output mol_trans/pred.txt \
    -batch_size 64 \
    -max_length 200 \
    -beam_size 10 \
    -n_best 10 \
    -gpu 0 \
    -replace_unk
For the remaining options, see https://opennmt.net/OpenNMT-py/options/translate.html
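To compare several of the saved checkpoints, the same command can be generated once per checkpoint. The sketch below only builds and prints the commands. It assumes that checkpoints were saved every 10,000 steps and that all of them follow the `new_all_step_<step>.pt` naming of the example above; the per-checkpoint output file name is hypothetical.

```python
import shlex


def translate_command(step, src="mol_trans/bio_train/new-src-val.txt"):
    """Build the onmt_translate argument list for one checkpoint."""
    model = f"mol_trans/model/new_all_step_{step}.pt"  # naming assumed from the example
    output = f"mol_trans/pred_step_{step}.txt"  # hypothetical output file name
    return [
        "onmt_translate", "-model", model, "-src", src, "-output", output,
        "-batch_size", "64", "-max_length", "200",
        "-beam_size", "10", "-n_best", "10",
        "-gpu", "0", "-replace_unk",
    ]


# Assumption: one checkpoint every 10,000 steps between 110,000 and 300,000.
steps = range(110_000, 300_001, 10_000)
commands = [translate_command(step) for step in steps]
for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the argument lists can be passed to `subprocess.run` once the paths have been checked.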
python score_predictions.py -predictions xxx -targets xxx
-predictions is the file of predictions; -targets is the ground-truth file.
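As a rough illustration of what such an evaluation can compute, the sketch below calculates top-k exact-match accuracy. It is not taken from score_predictions.py. It assumes that the predictions file holds n_best consecutive lines per target line and that tokens are separated by spaces; the function name `top_k_accuracy` is hypothetical. A real evaluation would probably also canonicalize the SMILES strings before comparing them.

```python
def top_k_accuracy(predictions, targets, n_best=10, ks=(1, 3, 5, 10)):
    """Fraction of targets found among the first k of their n_best predictions."""
    if len(predictions) != n_best * len(targets):
        raise ValueError("expected n_best prediction lines per target line")
    hits = {k: 0 for k in ks}
    for i, target in enumerate(targets):
        truth = target.replace(" ", "")  # drop the token separators
        group = predictions[i * n_best:(i + 1) * n_best]
        candidates = [p.replace(" ", "") for p in group]
        for k in ks:
            if truth in candidates[:k]:
                hits[k] += 1
    return {k: hits[k] / len(targets) for k in ks}


# Toy example with n_best = 2 (made-up strings, not real model output).
targets = ["C C O", "C C ( = O ) O"]
predictions = ["C C O", "C O C", "C C N", "C C ( = O ) O"]
accuracy = top_k_accuracy(predictions, targets, n_best=2, ks=(1, 2))
print(accuracy)
```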
cd multistep
pip install -e packages/mlp_retrosyn
pip install -e packages/rdchiral
pip install -e .
pip install -e onmt
The interface for predicting a molecule is in interface.py. The core code is as follows:
import os
import shutil

import torch
from rdkit import Chem
from retro_star.api import RSPlanner  # import path assumed from the retro_star package layout


def run(mol, top_k, building_block_pth='building_block.csv'):
    # Make GPUs 0-7 visible; the device list is written without spaces.
    os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3,4,5,6,7'
    assert torch.cuda.is_available()

    # Install the caller's building-block file; copyfile overwrites any existing copy.
    shutil.copyfile(building_block_pth, 'retro_star/dataset/building_block.csv')

    # Round-trip the input string through RDKit to normalise it.
    mol = Chem.MolToSmiles(Chem.MolFromSmarts(mol))

    planner = RSPlanner(
        gpu=1,
        use_value_fn=True,
        iterations=100,
        expansion_topk=30,
        top_k=top_k,
        viz=False
    )
    result = planner.plan(mol)
    if result is None:
        return None

    # Map each result's rank to its route and score.
    mol_dict = {}
    for i, ele in enumerate(result):
        mol_dict[i] = {'routes': ele[0], 'routes_score': ele[1]}
    return mol_dict
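Input: mol is the molecule to predict; top_k is the number of highest-probability results to return (depending on iterations, the final number of results may be smaller than k); building_block_pth is the path to the building-block file.
Output: a dictionary {0: {'routes': xxx, 'routes_score': xxx}, 1: {'routes': xxx, 'routes_score': xxx}, ...}, or None if the planner returns no result.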
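A caller has to handle both the dictionary and the None case. The following sketch shows one way to do that. The helper name `summarize_routes` is hypothetical, and the mock dictionary uses placeholder values in place of real planner output.

```python
def summarize_routes(mol_dict):
    """Return (rank, score) pairs in rank order, or [] when no route was found."""
    if mol_dict is None:
        return []
    return [(rank, entry["routes_score"]) for rank, entry in sorted(mol_dict.items())]


# Mock result with placeholder values; real routes come from run().
mock_result = {
    0: {"routes": "route_a", "routes_score": 0.9},
    1: {"routes": "route_b", "routes_score": 0.4},
}
summary = summarize_routes(mock_result)
for rank, score in summary:
    print(rank, score)
```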
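The evaluation code is in eval.py. It reads the prediction results directly from the run logs and evaluates them; the details are documented in the code comments.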