1. Environment setup

cd biosynthesis/multistep
conda env create -f environment.yml
conda activate biosynthesis

2. Single-step reaction prediction

git clone https://github.com/OpenNMT/OpenNMT-py.git
cd OpenNMT-py
pip install -e .

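The data and code for single-step prediction live in singlestep/:
biogenisis_reaction.txt, reactions.txt: the raw training data. The former holds the bio reactions; from the latter, mainly the np_like reactions are used;
word_preprocess.py: the data-processing script. It splits the data 9:1 into train and valid, and the valid set also serves as the test set;
retro_en_de.yaml: the configuration file for vocabulary building and model training;
mol_trans/all_train, mol_trans/bio_train, mol_trans/nplike_train: the pre-split training data. all_train is the mix of bio and np_like data;
mol_trans/model: the trained models. The 20 checkpoints were saved while training on all_train from 110,000 to 300,000 steps;
mol_trans/run: the saved vocabulary;
score_predictions.py: the evaluation script;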
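
The 9:1 split described for word_preprocess.py can be illustrated with a minimal sketch. This is not the script's actual code: the function name `split_train_valid`, the fixed seed, and the placeholder records are assumptions made for illustration only.

```python
import random


def split_train_valid(lines, valid_fraction=0.1, seed=0):
    """Shuffle the records and split them into (train, valid) lists."""
    shuffled = list(lines)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]


# Placeholder records standing in for reaction lines (hypothetical data).
reactions = [f"reaction_{i}" for i in range(100)]
train, valid = split_train_valid(reactions)
print(len(train), len(valid))
```

Because the valid set doubles as the test set here, reported scores come from the same data used for checkpoint selection, which is worth keeping in mind when comparing models.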

1) Training

After editing the configuration file, run:

onmt_train -config retro_en_de.yaml

2) Prediction

Commonly used arguments:

onmt_translate -model mol_trans/model/new_all_step_140000.pt \
               -src mol_trans/bio_train/new-src-val.txt \
               -output mol_trans/pred.txt \
               -batch_size 64 \
               -max_length 200 \
               -beam_size 10 \
               -n_best 10 \
               -gpu 0 \
               -replace_unk

For the remaining options, see https://opennmt.net/OpenNMT-py/options/translate.html
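
To compare several checkpoints, the same command can be generated once per model. The sketch below only builds the argument lists. It assumes the 20 checkpoints are spaced 10,000 steps apart and follow the `new_all_step_<step>.pt` naming of the example above; the per-checkpoint output file name is hypothetical.

```python
import shlex


def build_translate_cmd(step, src="mol_trans/bio_train/new-src-val.txt"):
    """Return the onmt_translate argument list for one checkpoint."""
    return [
        "onmt_translate",
        "-model", f"mol_trans/model/new_all_step_{step}.pt",
        "-src", src,
        # Hypothetical naming so that runs do not overwrite each other.
        "-output", f"mol_trans/pred_step_{step}.txt",
        "-batch_size", "64",
        "-max_length", "200",
        "-beam_size", "10",
        "-n_best", "10",
        "-gpu", "0",
        "-replace_unk",
    ]


# Assumed spacing: one checkpoint every 10,000 steps from 110,000 to 300,000.
steps = range(110_000, 300_001, 10_000)
commands = [build_translate_cmd(step) for step in steps]

for cmd in commands[:2]:
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to execute it.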

3) Evaluation

python score_predictions.py -predictions xxx -targets xxx

-predictions is the prediction file; -targets is the ground-truth file.
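
For orientation, top-k accuracy over an n_best prediction file is commonly computed as sketched below. This is not the code of score_predictions.py: the function name, the whitespace-stripping normalization, and the toy data are assumptions. The sketch relies on onmt_translate writing the n_best hypotheses for each source line consecutively, and it compares plain strings, whereas a real evaluation would typically canonicalize the SMILES first.

```python
def top_k_accuracy(predictions, targets, n_best, ks=(1, 3, 5, 10)):
    """Fraction of targets found among the first k hypotheses, for each k."""
    if not targets:
        raise ValueError("no targets given")
    if len(predictions) != len(targets) * n_best:
        raise ValueError("expected n_best prediction lines per target")

    def normalize(line):
        # Remove the spaces between tokens and any trailing newline.
        return line.replace(" ", "").strip()

    hits = {k: 0 for k in ks}
    for i, target in enumerate(targets):
        group = [normalize(p) for p in predictions[i * n_best:(i + 1) * n_best]]
        wanted = normalize(target)
        for k in ks:
            if wanted in group[:k]:
                hits[k] += 1
    return {k: hits[k] / len(targets) for k in ks}


# Toy data with n_best = 2 (hypothetical, for illustration only).
targets = ["C C O", "c 1 c c c c c 1"]
predictions = ["C C O", "C C N", "C C C", "c 1 c c c c c 1"]
scores = top_k_accuracy(predictions, targets, n_best=2, ks=(1, 2))
print(scores)  # {1: 0.5, 2: 1.0}
```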

3. Multi-step reaction prediction

cd multistep
pip install -e packages/mlp_retrosyn
pip install -e packages/rdchiral
pip install -e .
pip install -e onmt

1) Interface

The interface for predicting a molecule is in interface.py.
The core code is as follows:

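import os
import shutil

import torch
from rdkit import Chem
# Import path assumed from the upstream Retro* package layout;
# adjust it if interface.py imports RSPlanner differently.
from retro_star.api import RSPlanner


def run(mol, top_k, building_block_pth='building_block.csv'):
    # Expose GPUs 0-7 to CUDA and fail early if CUDA is unavailable.
    os.environ['CUDA_VISIBLE_DEVICES'] = '0, 1, 2, 3, 4, 5, 6, 7'
    assert torch.cuda.is_available()

    # Install the caller's building-block file where the planner expects it.
    # copyfile overwrites any existing destination file.
    shutil.copyfile(building_block_pth, 'retro_star/dataset/building_block.csv')

    # Normalize the input string through RDKit
    # (parsed as SMARTS, written back as SMILES).
    mol = Chem.MolToSmiles(Chem.MolFromSmarts(mol))

    planner = RSPlanner(
        gpu=1,
        use_value_fn=True,
        iterations=100,
        expansion_topk=30,
        top_k=top_k,
        viz=False
    )

    result = planner.plan(mol)
    if result is None:
        return None

    # Map each result index to its route and score.
    mol_dict = {}
    for i, ele in enumerate(result):
        mol_dict[i] = {'routes': ele[0], 'routes_score': ele[1]}
    return mol_dict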

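Input: mol is the molecule to predict; top_k asks for the k highest-probability results (the search is bounded by the number of iterations, so the final count is <= k); building_block_pth is the path to the building-block file.
Output: a dictionary {0: {'routes': xxx, 'routes_score': xxx}, 1: {'routes': xxx, 'routes_score': xxx}, ...}, or None if the planner returns no result.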
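
A caller has to handle both the None case and the index-keyed dictionary. The helper below is a minimal sketch of that; `list_routes` is a hypothetical name, and the sample dictionary only mimics the documented shape with made-up values.

```python
def list_routes(mol_dict):
    """Flatten the result of run() into (index, routes, score) tuples.

    Returns an empty list when planning produced no result (None).
    """
    if mol_dict is None:
        return []
    return [
        (idx, entry['routes'], entry['routes_score'])
        for idx, entry in sorted(mol_dict.items())
    ]


# Made-up data in the documented shape (not real planner output).
sample = {
    0: {'routes': 'route-a', 'routes_score': 0.9},
    1: {'routes': 'route-b', 'routes_score': 0.4},
}
for idx, routes, score in list_routes(sample):
    print(idx, routes, score)
```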

2) Batch evaluation

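The relevant code is in eval.py. It reads the predictions directly from the run logs and evaluates them; see the comments in the code for details.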