1. Evaluation of OPI-Llama-3.1-8B-Instruct model on 9 tasks.
Each result is obtained by fine-tuning the Llama-3.1-8B-Instruct model on OPI_full_1.61M_train.json and then evaluating it on the testing set of each task.
Task Type | Task Name | Testing file | Accuracy | Precision | Recall | F1 | Rouge-L |
---|---|---|---|---|---|---|---|
Sequence Understanding | EC Number Prediction (split100) | CLEAN_EC_number_new_test | - | 0.3724 | 0.3374 | 0.3468 | - |
Sequence Understanding | EC Number Prediction (split100) | CLEAN_EC_number_price_test | - | 0.0738 | 0.0738 | 0.0738 | - |
Sequence Understanding | Fold Type Prediction | fold_type_test_Fold_Holdout | 0.1045 | - | - | - | - |
Sequence Understanding | Fold Type Prediction | fold_type_test_Superfamily_Holdout | 0.1507 | - | - | - | - |
Sequence Understanding | Fold Type Prediction | fold_type_test_Family_Holdout | 0.6145 | - | - | - | - |
Sequence Understanding | Subcellular Localization Prediction | subcell_loc_test | 0.4214 | - | - | - | - |
Annotation Prediction | Function Keywords Prediction | CASPSimilarSeq_keywords_test | - | 0.4202 | 0.5057 | 0.4385 | - |
Annotation Prediction | Function Keywords Prediction | IDFilterSeq_keywords_test | - | 0.6762 | 0.6905 | 0.6650 | - |
Annotation Prediction | Function Keywords Prediction | UniProtSeq_keywords_test | - | 0.7606 | 0.7489 | 0.7374 | - |
Annotation Prediction | Gene Ontology (GO) Terms Prediction | CASPSimilarSeq_go_terms_test | - | 0.1113 | 0.0936 | 0.099 | - |
Annotation Prediction | Gene Ontology (GO) Terms Prediction | IDFilterSeq_go_terms_test | - | 0.6686 | 0.6287 | 0.6304 | - |
Annotation Prediction | Gene Ontology (GO) Terms Prediction | UniProtSeq_go_terms_test | - | 0.7150 | 0.6897 | 0.6849 | - |
Annotation Prediction | Function Description Prediction | CASPSimilarSeq_function_test | - | - | - | - | 0.7524 |
Annotation Prediction | Function Description Prediction | IDFilterSeq_function_test | - | - | - | - | 0.4786 |
Annotation Prediction | Function Description Prediction | UniProtSeq_function_test | - | - | - | - | 0.5144 |
Knowledge Mining | Tissue Location Prediction from Gene Symbol | gene_symbol_to_tissue_test | - | 0.4002 | 0.9356 | 0.5466 | - |
Knowledge Mining | Cancer Prediction from Gene Symbol | gene_symbol_to_cancer_test | - | 0.2890 | 0.2701 | 0.2664 | - |
Knowledge Mining | Cancer Prediction from Gene Name | gene_name_to_cancer_test | - | 0.2786 | 0.2707 | 0.2659 | - |
2. Evaluation of OPI-Galactica-6.7B model on 9 tasks.
Each result is obtained by fine-tuning the Galactica-6.7B model on OPI_full_1.61M_train.json and then evaluating it on the testing set of each task.
Task Type | Task Name | Testing file | Accuracy | Precision | Recall | F1 | Rouge-L |
---|---|---|---|---|---|---|---|
Sequence Understanding | EC Number Prediction (split100) | CLEAN_EC_number_new_test | - | 0.2700 | 0.2663 | 0.2596 | - |
Sequence Understanding | EC Number Prediction (split100) | CLEAN_EC_number_price_test | - | 0.0268 | 0.0268 | 0.0268 | - |
Sequence Understanding | Fold Type Prediction | fold_type_test_Fold_Holdout | 0.0808 | - | - | - | - |
Sequence Understanding | Fold Type Prediction | fold_type_test_Superfamily_Holdout | 0.1348 | - | - | - | - |
Sequence Understanding | Fold Type Prediction | fold_type_test_Family_Holdout | 0.4854 | - | - | - | - |
Sequence Understanding | Subcellular Localization Prediction | subcell_loc_test | 0.7771 | - | - | - | - |
Annotation Prediction | Function Keywords Prediction | CASPSimilarSeq_keywords_test | - | 0.8120 | 0.7360 | 0.7643 | - |
Annotation Prediction | Function Keywords Prediction | IDFilterSeq_keywords_test | - | 0.8377 | 0.8019 | 0.8070 | - |
Annotation Prediction | Function Keywords Prediction | UniProtSeq_keywords_test | - | 0.8596 | 0.8196 | 0.8276 | - |
Annotation Prediction | Gene Ontology (GO) Terms Prediction | CASPSimilarSeq_go_terms_test | - | 0.7613 | 0.7492 | 0.7476 | - |
Annotation Prediction | Gene Ontology (GO) Terms Prediction | IDFilterSeq_go_terms_test | - | 0.7404 | 0.7274 | 0.7207 | - |
Annotation Prediction | Gene Ontology (GO) Terms Prediction | UniProtSeq_go_terms_test | - | 0.7638 | 0.7373 | 0.7358 | - |
Annotation Prediction | Function Description Prediction | CASPSimilarSeq_function_test | - | - | - | - | 0.7430 |
Annotation Prediction | Function Description Prediction | IDFilterSeq_function_test | - | - | - | - | 0.7014 |
Annotation Prediction | Function Description Prediction | UniProtSeq_function_test | - | - | - | - | 0.7133 |
Knowledge Mining | Tissue Location Prediction from Gene Symbol | gene_symbol_to_tissue_test | - | 0.3917 | 0.9077 | 0.5303 | - |
Knowledge Mining | Cancer Prediction from Gene Symbol | gene_symbol_to_cancer_test | - | 0.3555 | 0.3189 | 0.3229 | - |
Knowledge Mining | Cancer Prediction from Gene Name | gene_name_to_cancer_test | - | 0.2728 | 0.2554 | 0.2533 | - |
3. Performance comparison between OPI-Llama-3.1-8B-Instruct and OPI-Galactica-6.7B across 9 tasks.
The comparison highlights task-specific strengths of each model. Llama-3.1 excels in EC number prediction and fold type prediction, whose prediction targets are numeric, such as 3.4.11.4 (an EC number) and 10 (a fold type label). Galactica leads in all three Annotation Prediction (AP) tasks, apart from a narrow Llama-3.1 edge on CASPSimilarSeq_function_test (0.7524 vs. 0.7430 Rouge-L), and in cancer prediction from gene symbols. The prediction targets of these tasks are text.
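
The per-task comparison can be checked directly against the two tables above. The following is a minimal sketch, not part of the OPI evaluation code: it copies one score per testing file (F1 where reported, otherwise Accuracy or Rouge-L) and reports which model is ahead and by how much. The names `SCORES` and `compare` are hypothetical and exist only for this illustration.

```python
# One score per testing file, copied from the two tables above:
# F1 where reported, otherwise Accuracy (fold type, subcellular
# localization) or Rouge-L (function description).
# Each value is (OPI-Llama-3.1-8B-Instruct, OPI-Galactica-6.7B).
SCORES = {
    "CLEAN_EC_number_new_test": (0.3468, 0.2596),
    "CLEAN_EC_number_price_test": (0.0738, 0.0268),
    "fold_type_test_Fold_Holdout": (0.1045, 0.0808),
    "fold_type_test_Superfamily_Holdout": (0.1507, 0.1348),
    "fold_type_test_Family_Holdout": (0.6145, 0.4854),
    "subcell_loc_test": (0.4214, 0.7771),
    "CASPSimilarSeq_keywords_test": (0.4385, 0.7643),
    "IDFilterSeq_keywords_test": (0.6650, 0.8070),
    "UniProtSeq_keywords_test": (0.7374, 0.8276),
    "CASPSimilarSeq_go_terms_test": (0.099, 0.7476),
    "IDFilterSeq_go_terms_test": (0.6304, 0.7207),
    "UniProtSeq_go_terms_test": (0.6849, 0.7358),
    "CASPSimilarSeq_function_test": (0.7524, 0.7430),
    "IDFilterSeq_function_test": (0.4786, 0.7014),
    "UniProtSeq_function_test": (0.5144, 0.7133),
    "gene_symbol_to_tissue_test": (0.5466, 0.5303),
    "gene_symbol_to_cancer_test": (0.2664, 0.3229),
    "gene_name_to_cancer_test": (0.2659, 0.2533),
}


def compare(scores):
    """Return (test_file, leader, margin) rows.

    margin is the Llama-3.1 score minus the Galactica score,
    rounded to four decimals to match the tables.
    """
    rows = []
    for test_file, (llama, galactica) in scores.items():
        margin = round(llama - galactica, 4)
        if margin > 0:
            leader = "Llama-3.1"
        elif margin < 0:
            leader = "Galactica"
        else:
            leader = "tie"
        rows.append((test_file, leader, margin))
    return rows


if __name__ == "__main__":
    for test_file, leader, margin in compare(SCORES):
        print(f"{test_file:<36} {leader:<10} {margin:+.4f}")
```

Because the three metric families (F1, Accuracy, Rouge-L) are not comparable with each other, the margins are only meaningful within a single row, not across rows.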