1. Evaluation of OPI-Llama-3.1-8B-Instruct model on 9 tasks.

Each result below comes from the Llama-3.1-8B-Instruct model fine-tuned on OPI_full_1.61M_train.json and then evaluated on the testing set of the corresponding task.

| Task Type | Task Name | Testing file | Accuracy | Precision | Recall | F1 | Rouge-L |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Sequence Understanding | EC Number Prediction (split100) | CLEAN_EC_number_new_test | - | 0.3724 | 0.3374 | 0.3468 | - |
| | EC Number Prediction (split100) | CLEAN_EC_number_price_test | - | 0.0738 | 0.0738 | 0.0738 | - |
| | Fold Type Prediction | fold_type_test_Fold_Holdout | 0.1045 | - | - | - | - |
| | | fold_type_test_Superfamily_Holdout | 0.1507 | - | - | - | - |
| | | fold_type_test_Family_Holdout | 0.6145 | - | - | - | - |
| | Subcellular Localization Prediction | subcell_loc_test | 0.4214 | - | - | - | - |
| Annotation Prediction | Function Keywords Prediction | CASPSimilarSeq_keywords_test | - | 0.4202 | 0.5057 | 0.4385 | - |
| | | IDFilterSeq_keywords_test | - | 0.6762 | 0.6905 | 0.6650 | - |
| | | UniProtSeq_keywords_test | - | 0.7606 | 0.7489 | 0.7374 | - |
| | Gene Ontology (GO) Terms Prediction | CASPSimilarSeq_go_terms_test | - | 0.1113 | 0.0936 | 0.0990 | - |
| | | IDFilterSeq_go_terms_test | - | 0.6686 | 0.6287 | 0.6304 | - |
| | | UniProtSeq_go_terms_test | - | 0.7150 | 0.6897 | 0.6849 | - |
| | Function Description Prediction | CASPSimilarSeq_function_test | - | - | - | - | 0.7524 |
| | | IDFilterSeq_function_test | - | - | - | - | 0.4786 |
| | | UniProtSeq_function_test | - | - | - | - | 0.5144 |
| Knowledge Mining | Tissue Location Prediction from Gene Symbol | gene_symbol_to_tissue_test | - | 0.4002 | 0.9356 | 0.5466 | - |
| | Cancer Prediction from Gene Symbol | gene_symbol_to_cancer_test | - | 0.2890 | 0.2701 | 0.2664 | - |
| | Cancer Prediction from Gene Name | gene_name_to_cancer_test | - | 0.2786 | 0.2707 | 0.2659 | - |

2. Evaluation of OPI-Galactica-6.7B model on 9 tasks.

Each result below comes from the Galactica-6.7B model fine-tuned on OPI_full_1.61M_train.json and then evaluated on the testing set of the corresponding task.

| Task Type | Task Name | Testing file | Accuracy | Precision | Recall | F1 | Rouge-L |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Sequence Understanding | EC Number Prediction (split100) | CLEAN_EC_number_new_test | - | 0.2700 | 0.2663 | 0.2596 | - |
| | EC Number Prediction (split100) | CLEAN_EC_number_price_test | - | 0.0268 | 0.0268 | 0.0268 | - |
| | Fold Type Prediction | fold_type_test_Fold_Holdout | 0.0808 | - | - | - | - |
| | | fold_type_test_Superfamily_Holdout | 0.1348 | - | - | - | - |
| | | fold_type_test_Family_Holdout | 0.4854 | - | - | - | - |
| | Subcellular Localization Prediction | subcell_loc_test | 0.7771 | - | - | - | - |
| Annotation Prediction | Function Keywords Prediction | CASPSimilarSeq_keywords_test | - | 0.8120 | 0.7360 | 0.7643 | - |
| | Function Keywords Prediction | IDFilterSeq_keywords_test | - | 0.8377 | 0.8019 | 0.8070 | - |
| | Function Keywords Prediction | UniProtSeq_keywords_test | - | 0.8596 | 0.8196 | 0.8276 | - |
| | Gene Ontology (GO) Terms Prediction | CASPSimilarSeq_go_terms_test | - | 0.7613 | 0.7492 | 0.7476 | - |
| | Gene Ontology (GO) Terms Prediction | IDFilterSeq_go_terms_test | - | 0.7404 | 0.7274 | 0.7207 | - |
| | Gene Ontology (GO) Terms Prediction | UniProtSeq_go_terms_test | - | 0.7638 | 0.7373 | 0.7358 | - |
| | Function Description Prediction | CASPSimilarSeq_function_test | - | - | - | - | 0.7430 |
| | Function Description Prediction | IDFilterSeq_function_test | - | - | - | - | 0.7014 |
| | Function Description Prediction | UniProtSeq_function_test | - | - | - | - | 0.7133 |
| Knowledge Mining | Tissue Location Prediction from Gene Symbol | gene_symbol_to_tissue_test | - | 0.3917 | 0.9077 | 0.5303 | - |
| | Cancer Prediction from Gene Symbol | gene_symbol_to_cancer_test | - | 0.3555 | 0.3189 | 0.3229 | - |
| | Cancer Prediction from Gene Name | gene_name_to_cancer_test | - | 0.2728 | 0.2554 | 0.2533 | - |

3. Performance comparison between OPI-Llama-3.1-8B-Instruct and OPI-Galactica-6.7B across 9 tasks.

The comparison highlights task-specific strengths of each model. Llama-3.1 excels at EC number prediction and fold type prediction, whose prediction targets are numeric, such as 3.4.11.4 and 10. Galactica leads in all three Annotation Prediction (AP) tasks, apart from a marginally lower Rouge-L on CASPSimilarSeq_function_test (0.7430 vs. 0.7524), as well as in cancer prediction from gene symbols, whose prediction targets are text.
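The per-file comparison can be reproduced from the numbers reported in sections 1 and 2. The following is a minimal sketch, not part of the original evaluation code: it takes the one headline metric reported for each testing file (Accuracy, F1, or Rouge-L, whichever the tables give) and reports which model scores higher. The names `RESULTS`, `leader`, and `compare` are hypothetical and exist only for this illustration.

```python
# Headline metric per testing file, copied from the tables in sections 1 and 2.
# Each value is (OPI-Llama-3.1-8B-Instruct, OPI-Galactica-6.7B).
# The metric is Accuracy, F1, or Rouge-L, depending on the task.
RESULTS = {
    "CLEAN_EC_number_new_test": (0.3468, 0.2596),
    "CLEAN_EC_number_price_test": (0.0738, 0.0268),
    "fold_type_test_Fold_Holdout": (0.1045, 0.0808),
    "fold_type_test_Superfamily_Holdout": (0.1507, 0.1348),
    "fold_type_test_Family_Holdout": (0.6145, 0.4854),
    "subcell_loc_test": (0.4214, 0.7771),
    "CASPSimilarSeq_keywords_test": (0.4385, 0.7643),
    "IDFilterSeq_keywords_test": (0.6650, 0.8070),
    "UniProtSeq_keywords_test": (0.7374, 0.8276),
    "CASPSimilarSeq_go_terms_test": (0.0990, 0.7476),
    "IDFilterSeq_go_terms_test": (0.6304, 0.7207),
    "UniProtSeq_go_terms_test": (0.6849, 0.7358),
    "CASPSimilarSeq_function_test": (0.7524, 0.7430),
    "IDFilterSeq_function_test": (0.4786, 0.7014),
    "UniProtSeq_function_test": (0.5144, 0.7133),
    "gene_symbol_to_tissue_test": (0.5466, 0.5303),
    "gene_symbol_to_cancer_test": (0.2664, 0.3229),
    "gene_name_to_cancer_test": (0.2659, 0.2533),
}


def leader(llama: float, galactica: float) -> str:
    """Return the name of the model with the higher score, or 'tie'."""
    if llama > galactica:
        return "Llama-3.1"
    if galactica > llama:
        return "Galactica"
    return "tie"


def compare(results: dict) -> list:
    """Return (testing file, leading model, Llama minus Galactica) per file."""
    rows = []
    for name, (llama, galactica) in results.items():
        rows.append((name, leader(llama, galactica), round(llama - galactica, 4)))
    return rows


if __name__ == "__main__":
    for name, best, delta in compare(RESULTS):
        print(f"{name:40s} {best:10s} {delta:+.4f}")
```

Because each testing file is scored by a different metric, the differences are only meaningful within a row and should not be averaged across tasks.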
