1. Set up the environment

cd biosynthesis/multistep
conda env create -f environment.yml
conda activate biosynthesis

2. Single-step reactions

git clone https://github.com/OpenNMT/OpenNMT-py.git
cd OpenNMT-py
pip install -e .

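The data and code for single-step reactions live in singlestep/:
biogenisis_reaction.txt, reactions.txt: the raw training data. The former contains biosynthetic (bio) reactions; from the latter, mainly the np_like subset is used;
word_preprocess.py: data-processing script. It splits the data 9:1 into train and valid, and the valid set is also used as the test set;
retro_en_de.yaml: configuration file for vocabulary building and model training;
mol_trans/all_train, mol_trans/bio_train, mol_trans/nplike_train: the pre-split training data. all_train is a mixture of the bio and np_like data;
mol_trans/model: trained models. The 20 models are checkpoints from training on all_train between step 110,000 and step 300,000;
mol_trans/run: the saved vocabularies;
score_predictions.py: evaluation script.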
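The 9:1 train/valid split done by word_preprocess.py can be sketched as below. This is a minimal illustration, not the script itself: split_train_valid is a hypothetical helper, and the shuffling, seed, and rounding are assumptions. It works on in-memory (source, target) pairs and returns the valid set a second time as the test set, matching "valid = test" above.

```python
import random


def split_train_valid(pairs, seed=42):
    """Shuffle (source, target) pairs and split them 9:1 into train/valid.

    Hypothetical helper: the seed and the rounding (valid = len // 10) are
    assumptions. The valid split is also returned as the test split.
    """
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)  # deterministic shuffle for a given seed
    n_valid = len(pairs) // 10          # 1 part in 10 goes to valid
    valid = pairs[:n_valid]
    train = pairs[n_valid:]
    test = valid                        # valid = test
    return train, valid, test
```

Keeping source and target together as pairs before shuffling guarantees that the split never misaligns the src-*.txt and tgt-*.txt lines.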

1) Training

After editing the configuration file, run:

onmt_train -config retro_en_de.yaml
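The file list states that mol_trans/model holds 20 checkpoints from step 110,000 to step 300,000. Twenty checkpoints over that range is consistent with saving every 10,000 steps (110k, 120k, ..., 300k). The sketch below enumerates the expected paths under that assumption, using the new_all_step_<N>.pt naming from the prediction command in this README. Both the interval and the naming pattern are inferences, not values read from retro_en_de.yaml, and checkpoint_paths is a hypothetical helper.

```python
def checkpoint_paths(start=110_000, end=300_000, every=10_000,
                     pattern="mol_trans/model/new_all_step_{}.pt"):
    """List checkpoint paths for steps start..end inclusive.

    Assumption: one checkpoint every `every` steps, named after `pattern`.
    """
    return [pattern.format(step) for step in range(start, end + 1, every)]
```

Such a list is handy for running prediction and evaluation over every checkpoint to pick the best one.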

2) Prediction

Commonly used options:

onmt_translate -model mol_trans/model/new_all_step_140000.pt \
               -src mol_trans/bio_train/new-src-val.txt \
               -output mol_trans/pred.txt \
               -batch_size 64 \
               -max_length 200 \
               -beam_size 10 \
               -n_best 10 \
               -gpu 0 \
               -replace_unk

For other options, see https://opennmt.net/OpenNMT-py/options/translate.html
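To script the command above, for example to run it over several checkpoints, it can be wrapped in Python. This is a sketch: translate_cmd and run_translate are hypothetical helpers that only pass through the flags listed above, with the same default values.

```python
import subprocess


def translate_cmd(model, src, output, batch_size=64, max_length=200,
                  beam_size=10, n_best=10, gpu=0, replace_unk=True):
    """Build the onmt_translate argument list shown above."""
    cmd = [
        "onmt_translate",
        "-model", model,
        "-src", src,
        "-output", output,
        "-batch_size", str(batch_size),
        "-max_length", str(max_length),
        "-beam_size", str(beam_size),
        "-n_best", str(n_best),
        "-gpu", str(gpu),
    ]
    if replace_unk:
        cmd.append("-replace_unk")
    return cmd


def run_translate(model, src, output, **options):
    # check=True raises CalledProcessError if onmt_translate exits non-zero
    subprocess.run(translate_cmd(model, src, output, **options), check=True)
```

Passing an argument list (rather than a shell string) avoids quoting problems with paths.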

3) Evaluation

python score_predictions.py -predictions xxx -targets xxx

-predictions is the prediction file; -targets is the ground truth.
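This README does not spell out the metric score_predictions.py computes. Models decoded with -n_best are commonly scored by top-k exact-match accuracy, which the sketch below computes. It assumes the predictions file holds n_best consecutive lines per source line (the order onmt_translate writes them in) and compares tokenized SMILES after removing spaces. It skips RDKit canonicalization, which a real evaluation would usually apply; top_k_accuracy is a hypothetical helper, not the script's interface.

```python
def _strip_spaces(s):
    """Undo tokenization by removing all whitespace."""
    return "".join(s.split())


def top_k_accuracy(predictions, targets, n_best=10, ks=(1, 3, 5, 10)):
    """Fraction of targets found among the first k predictions, for each k.

    predictions: n_best lines per target, in rank order.
    """
    if len(predictions) != n_best * len(targets):
        raise ValueError("expected exactly n_best predictions per target")
    hits = {k: 0 for k in ks}
    for i, target in enumerate(targets):
        block = predictions[i * n_best:(i + 1) * n_best]
        candidates = [_strip_spaces(p) for p in block]
        t = _strip_spaces(target)
        rank = candidates.index(t) if t in candidates else None
        for k in ks:
            if rank is not None and rank < k:
                hits[k] += 1
    return {k: hits[k] / len(targets) for k in ks}
```

For example, with n_best=2, a target matched by its second prediction counts toward top-2 but not top-1.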

3. Multi-step reactions

cd multistep
pip install -e packages/mlp_retrosyn
pip install -e packages/rdchiral
pip install -e .
pip install -e onmt

1) Interface

The interface for predicting a molecule is in interface.py.
Its core code is:

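import os
import shutil

import torch
from rdkit import Chem
from retro_star.api import RSPlanner  # import path assumed from the upstream retro_star package


def run(mol, top_k, building_block_pth='building_block.csv'):
    # Must be set before CUDA is initialized; no spaces in the device list
    os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3,4,5,6,7'
    assert torch.cuda.is_available()

    # Replace the planner's building-block set with the user-supplied file.
    # (The original check `in os.walk(...)` was always False.)
    target_bb = 'retro_star/dataset/building_block.csv'
    if os.path.exists(target_bb):
        os.remove(target_bb)
    shutil.copyfile(building_block_pth, target_bb)

    # Canonicalize the input SMILES (parsed as SMILES, not SMARTS)
    parsed = Chem.MolFromSmiles(mol)
    if parsed is None:
        raise ValueError(f'invalid SMILES: {mol}')
    mol = Chem.MolToSmiles(parsed)

    planner = RSPlanner(
        gpu=1,
        use_value_fn=True,
        iterations=100,
        expansion_topk=30,
        top_k=top_k,
        viz=False
    )

    result = planner.plan(mol)
    if result is None:
        return None

    # Map each result index to its route and route score
    mol_dict = {}
    for i, ele in enumerate(result):
        mol_dict[i] = {
            'routes': ele[0],
            'routes_score': ele[1]
        }
    return mol_dict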

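Input: mol is the molecule (SMILES) to predict; top_k is the number of highest-probability results to return (the number of results is also limited by iterations, so it is <= k); building_block_pth is the path to the building-block file.
Output: a dict {0: {'routes': xxx, 'routes_score': xxx}, 1: {'routes': xxx, 'routes_score': xxx}, ...}, or None if no route is found.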
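A minimal sketch of consuming run()'s return value, based only on the output format described above. format_routes is a hypothetical helper; the internal structure of 'routes' is not documented here, so it is printed as-is.

```python
def format_routes(mol_dict):
    """Render run()'s return value as one line per route.

    run() returns None when no route is found; otherwise a dict keyed by
    0, 1, ... whose values hold 'routes' and 'routes_score'.
    """
    if mol_dict is None:
        return ["no route found"]
    lines = []
    for i in sorted(mol_dict):
        entry = mol_dict[i]
        lines.append(f"{i}: score={entry['routes_score']} routes={entry['routes']}")
    return lines


# Example call (needs the multistep environment and a GPU):
# from interface import run
# for line in format_routes(run("CCO", top_k=5)):
#     print(line)
```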

2) Batch evaluation

The code is in eval.py. It extracts the predictions directly from the run logs and evaluates them; see the comments in the code for details.
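The metrics eval.py reports are not listed here. As an illustration only, a common summary for multi-step planning is the success rate (fraction of targets with at least one route). The sketch assumes the batch results have already been collected into a mapping from target SMILES to run()'s return value; summarize_batch and that mapping are hypothetical and do not reflect eval.py's log parsing.

```python
def summarize_batch(results):
    """Summarize multi-step planning over many targets.

    results: mapping from target SMILES to run()'s return value
    (None or an empty dict means no route was found).
    Returns (success_rate, mean number of routes among solved targets).
    """
    if not results:
        raise ValueError("results is empty")
    solved = [r for r in results.values() if r]  # skips None and {}
    success_rate = len(solved) / len(results)
    mean_routes = sum(len(r) for r in solved) / len(solved) if solved else 0.0
    return success_rate, mean_routes
```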
