Consolidate all results #147

jyaacoub · 2024-11-13T01:10:33Z

code

import logging
from matplotlib import pyplot as plt

from src.analysis.figures import prepare_df, fig_combined, custom_fig

dft = prepare_df('./results/model_media/model_stats.csv')
dfv = prepare_df('./results/model_media/model_stats_val.csv')

models = {
    'DG': ('nomsa', 'binary', 'original', 'binary'),
    'esm': ('ESM', 'binary', 'original', 'binary'), # esm model
    'aflow': ('nomsa', 'aflow', 'original', 'binary'),
    # 'gvpP': ('gvp', 'binary', 'original', 'binary'),
    'gvpL': ('nomsa', 'binary', 'gvp', 'binary'),
    # 'aflow_ring3': ('nomsa', 'aflow_ring3', 'original', 'binary'),
    'gvpL_aflow': ('nomsa', 'aflow', 'gvp', 'binary'),
    # 'gvpl_esm':('ESM', 'binary', 'gvp', 'binary'),
    # 'gvpL_aflow_rng3': ('nomsa', 'aflow_ring3', 'gvp', 'binary'),
    #GVPL_ESMM_davis3D_nomsaF_aflowE_48B_0.00010636872718329864LR_0.23282479481785903D_2000E_gvpLF_binaryLE
    # 'gvpl_esm_aflow': ('ESM', 'aflow', 'gvp', 'binary'),
}

fig, axes = fig_combined(dft, datasets=['davis', 'kiba', 'PDBbind'], fig_callable=custom_fig,
            models=models, metrics=['cindex', 'mse'],
            fig_scale=(10,5), add_stats=True, title_postfix=" test set performance", box=True)
plt.xticks(rotation=45)

# fig, axes = fig_combined(dfv, datasets=['davis'], fig_callable=custom_fig,
#             models=models, metrics=['cindex', 'mse'],
#             fig_scale=(10,5), add_stats=True, title_postfix=" validation set performance", box=True, fold_labels=True)
# plt.xticks(rotation=45)

Final models - these are the ones we will show in the paper.

    models = {
        'DG': ('nomsa', 'binary', 'original', 'binary'),
        'esm': ('ESM', 'binary', 'original', 'binary'), # esm model
        'aflow': ('nomsa', 'aflow', 'original', 'binary'),
        # 'gvpP': ('gvp', 'binary', 'original', 'binary'),
        'gvpL': ('nomsa', 'binary', 'gvp', 'binary'),
        # 'aflow_ring3': ('nomsa', 'aflow_ring3', 'original', 'binary'),
        'gvpL_aflow': ('nomsa', 'aflow', 'gvp', 'binary'),
        # 'gvpl_esm':('ESM', 'binary', 'gvp', 'binary'),
        # 'gvpL_aflow_rng3': ('nomsa', 'aflow_ring3', 'gvp', 'binary'),
        #GVPL_ESMM_davis3D_nomsaF_aflowE_48B_0.00010636872718329864LR_0.23282479481785903D_2000E_gvpLF_binaryLE
        # 'gvpl_esm_aflow': ('ESM', 'aflow', 'gvp', 'binary'),
    }

jyaacoub · 2024-12-01T21:49:31Z

FIG 1

DATASET INFO

TABLE COUNTS

FULL TABLE COUNTS:

 | Dataset   |   Protein |   Compound |  Total Binding Entities |
 |-----------|-----------|------------|-------------------------|
 | davis     |       442 |         68 |                   30056 |
 | kiba      |       229 |       2111 |                  118254 |
 | pdbbind   |      3889 |      12639 |                   19443 |

USED TABLE COUNTS:

Due to memory limitations, some records were excluded from our runs. These are the counts that were actually used.

 | Dataset   |   Protein |   Compound |  Total Binding Entities |
 |-----------|-----------|------------|-------------------------|
 | davis     |       439 |         68 |                   29852 |
 | kiba      |       226 |       2111 |                  117590 |
 | pdbbind   |      3785 |      10950 |                   16265 |
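To make the size of the exclusion concrete, here is a small sketch that computes how many records were dropped per dataset. The totals are copied from the FULL and USED tables above; `excluded_summary` is just an illustrative helper, not something in the repo.

```python
# "Total Binding Entities" copied from the FULL and USED tables above.
full = {"davis": 30056, "kiba": 118254, "pdbbind": 19443}
used = {"davis": 29852, "kiba": 117590, "pdbbind": 16265}


def excluded_summary(full_counts, used_counts):
    """Return {dataset: (n_excluded, pct_excluded)} for each dataset."""
    summary = {}
    for name, total in full_counts.items():
        n_excluded = total - used_counts[name]
        summary[name] = (n_excluded, 100 * n_excluded / total)
    return summary


summary = excluded_summary(full, used)
for name, (n_excluded, pct) in summary.items():
    print(f"{name:8s} excluded {n_excluded:5d} records ({pct:.2f}%)")
```

By this count the exclusions are under 1% for davis and kiba but noticeably larger for pdbbind.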

SEQUENCE LENGTH DISTRIBUTION

Plot without overlay or normalization

Normalized and overlaid plot

MODEL RESULTS

All (except pocket) results - 2x3 - MSE and Cindex

Stratified with pocket results

see below additional figures

Still need pocket versions to be plotted.

jyaacoub · 2024-12-02T20:06:16Z

FIG 3 - Platinum Dataset

DATASET INFO

TABLE COUNTS

	Unique protein sequence counts: 860
	            Unique protein IDs: 361
	          Unique ligand counts: 197
	                 Total records: 1962

Distribution for the number of mutations per protein

pkd distributions (stratified by # of mutations)

Model results

Stratified figures below are better at showing the model results #147 (comment)

Raw predictive performance

This plot shows how well the model predicts the pkd directly from the protein sequence and ligand SMILES.

Davis is terrible - no better than random
Kiba is better
PDBbind is best with cindex > 0.70
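For context, "no better than random" corresponds to a cindex of about 0.5, and 1.0 is a perfect ranking. Below is a minimal sketch of the concordance index, assuming the usual pairwise definition. `concordance_index` here is an illustrative helper; the metric implementation used in the repo may handle ties differently.

```python
def concordance_index(y_true, y_pred):
    """Naive O(n^2) concordance index.

    Pairs with identical true values are skipped; pairs with tied
    predictions count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            true_diff = y_true[i] - y_true[j]
            if true_diff == 0:
                continue  # not a comparable pair
            comparable += 1
            pred_diff = y_pred[i] - y_pred[j]
            if pred_diff == 0:
                concordant += 0.5
            elif (true_diff > 0) == (pred_diff > 0):
                concordant += 1.0
    return concordant / comparable if comparable else float("nan")


perfect = concordance_index([5.0, 6.0, 7.0], [0.1, 0.2, 0.3])
inverted = concordance_index([5.0, 6.0, 7.0], [0.3, 0.2, 0.1])
constant = concordance_index([5.0, 6.0, 7.0], [1.0, 1.0, 1.0])
```

A model that outputs the same value for every sample gets 0.5, which is the baseline the results above are being compared against.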

Delta predictive performance

Instead of looking at absolute predictive performance, this plot shows how well the model is able to predict the delta between a mutated and unmutated sequence.

Here all 3 are terrible - no better than random.
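A rough sketch of what "delta" means here, assuming each record pairs a wildtype and a mutated measurement for the same ligand. The tuple layout, the toy values, and the use of Pearson correlation are assumptions for illustration only; the actual figures may use a different metric.

```python
import math


def delta_pairs(records):
    """records: iterable of (true_wt, true_mt, pred_wt, pred_mt) pkd values.

    Returns (true_delta, pred_delta) where delta = mutated - wildtype.
    """
    true_delta = [t_mt - t_wt for t_wt, t_mt, _, _ in records]
    pred_delta = [p_mt - p_wt for _, _, p_wt, p_mt in records]
    return true_delta, pred_delta


def pearson(x, y):
    """Plain Pearson correlation; returns nan if either input is constant."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Toy values, not real results.
records = [
    (7.0, 6.0, 6.5, 6.4),
    (5.0, 5.5, 5.2, 5.1),
    (8.0, 6.5, 7.0, 7.2),
]
true_delta, pred_delta = delta_pairs(records)
r = pearson(true_delta, pred_delta)
```

The point is that a model can rank absolute pkd reasonably well while its predicted deltas stay close to zero, which is consistent with the raw vs delta difference noted above.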

…edictions #147 aflow models will only have one PDB file, which should be tied to the pid. This is an issue for some proteins, since the way we grab those files for non-PDBbind proteins is with `f.startswith(pid)`, which breaks when one pid is a prefix of another (e.g. PIK3CA.pdb and PIK3CA(Q546K).pdb). #147
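A minimal sketch of the prefix problem and one possible way around it, assuming the file name without its extension is exactly the pid. `match_pdb_files` is a hypothetical helper, not the function used in the repo, and it would need adapting if the real file names carry extra suffixes.

```python
from pathlib import Path


def match_pdb_files(filenames, pid):
    """Return files whose stem equals pid exactly, not just as a prefix."""
    return [f for f in filenames if Path(f).stem == pid]


files = ["PIK3CA.pdb", "PIK3CA(Q546K).pdb"]

# Prefix matching picks up the mutant file as well as the wildtype one.
prefix_hits = [f for f in files if f.startswith("PIK3CA")]

# Exact stem matching keeps the two pids separate.
exact_hits = match_pdb_files(files, "PIK3CA")
mutant_hits = match_pdb_files(files, "PIK3CA(Q546K)")
```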

**Still missing kiba_gvpl_aflow**: the checkpoint is on Graham or Cedar, which are both down at the moment. Once they come back online, run the following to check whether it is there:

```bash
# Pattern is quoted so the shell does not expand it before grep sees it.
find results/model_checkpoints -type f | grep '.*GVPLM_kiba.*nomsaF.*_aflowE_16B.*.model*'
```

#147

jyaacoub · 2025-01-15T18:29:02Z

Additional figures - stratified performance results

MODEL TEST DATASETS

pocket vs full protein representation

These are the only ones we trained with the pocket representation:

NOTE: I don't think it's worth training the remaining 11 model configurations (55 models in total, one for each of the 5 folds; this would take at least a couple of months at my current pace).

Platinum:

RAW model predictive performance (all vs only mutated vs only wildtype)

In pocket vs out of pocket mutation differences

RAW predictive performance:

DELTA predictive performance:

single vs multiple mutations

RAW predictive performance:

DELTA predictive performance:

jyaacoub added a commit that referenced this issue Dec 1, 2024

feat(paper_figures): dataset info #147

e352c69

jyaacoub added a commit that referenced this issue Dec 1, 2024

feat(paper_figures): dataset info #147

37639ef

Still need pocket versions to be plotted.

jyaacoub added a commit that referenced this issue Dec 1, 2024

feat(model_results): padded layout #147

6651b70

jyaacoub added a commit that referenced this issue Dec 2, 2024

feat(paper_figures): build platinum dataset #147

0249897

jyaacoub added a commit that referenced this issue Dec 2, 2024

feat(platinum): table counts for platinum #147

afa2d04

jyaacoub added a commit that referenced this issue Dec 2, 2024

feat(paper_figures): plot_Platinum_mutations_dist #147

24526d1

jyaacoub added a commit that referenced this issue Dec 2, 2024

fix #147

f746d9b

jyaacoub added a commit that referenced this issue Dec 4, 2024

fix(load_tuned_model): transform for load_state_dict #147

f2aeb9b

jyaacoub added a commit that referenced this issue Dec 4, 2024

feat(paper_figures): model platinum results #147 #94

ca2e25d

jyaacoub added a commit that referenced this issue Jan 15, 2025

feat(paper_figures): platinum in pocket indices

199b88b

Get mutations in pocket and out of pocket for stratified figures to show if there is a difference. #147

jyaacoub added a commit that referenced this issue Jan 15, 2025

feat(paper_figures): resampling for more robust metrics when comparin…

d83aea5

…g subsets of platinum dataset #147

jyaacoub self-assigned this Jan 15, 2025

jyaacoub added a commit that referenced this issue Jan 20, 2025

results(platinum): raw predictive performance plot #147

2f346cb

jyaacoub added a commit that referenced this issue Jan 20, 2025

feat(figures): custom_fig_stratified #147

1b71703

For comparing against different groups within our test sets.

jyaacoub added a commit that referenced this issue Jan 23, 2025

fix(platinum): updated kiba_gvpl_aflow predictions with proper checkp…

68163a9

…oint #147

jyaacoub added a commit that referenced this issue Jan 23, 2025

feat(platinum): single vs multiple mutations figure #147

27e531a

jyaacoub added a commit that referenced this issue Jan 23, 2025

fix(platinum): increasing sampling for more robust results #147

657a0e8

jyaacoub added a commit that referenced this issue Jan 24, 2025

feat(paper_figures): pocket rep #147

57f95f5

only davis and kiba checkpoints for DG and aflow models were available.
