resolved: errors in sycophancy-test, factuality-test and augmentation. #869

Merged 10 commits on Nov 3, 2023
2 changes: 1 addition & 1 deletion demo/tutorials/misc/Loading_Data_with_Custom_Columns.ipynb
@@ -11,7 +11,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/custom_column_csv.ipynb)"
+    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Loading_Data_with_Custom_Columns.ipynb)"
    ]
   },
   {
4 changes: 2 additions & 2 deletions docs/pages/tutorials/tutorials.md
@@ -56,7 +56,7 @@ The following table gives an overview of the different tutorial notebooks. We ha
 | LogiQA | OpenAI | Question-Answering | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/LogiQA_dataset.ipynb) |
 | ASDiv | OpenAI | Question-Answering | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/ASDiv_dataset.ipynb) |
 | BigBench | OpenAI | Question-Answering | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/Bigbench_dataset.ipynb) |
-| HuggingFaceDataset-Support | Hugging Face/OpenAI | Text-Classification/Summarization | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/HuggingFace_Dataset_Notebook.ipynb) |
+| HuggingFaceDataset-Support | Hugging Face/Spacy/OpenAI | NER/Text-Classification/Question-Answering/Summarization | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/HuggingFace_Dataset_Notebook.ipynb) |
 | Augmentation-Control | John Snow Labs | NER | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Augmentation_Control_Notebook.ipynb) |
 | Comparing Models | Hugging Face/John Snow Labs/Spacy | NER/Text-Classification | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Comparing_Models_Notebook.ipynb) |
 | Runtime Test | Hugging Face/John Snow Labs/Spacy | NER | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/RuntimeTest_Notebook.ipynb) |
@@ -84,7 +84,7 @@ The following table gives an overview of the different tutorial notebooks. We ha
 | Evaluation Metrics | OpenAI | Question-Answering | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Evaluation_Metrics.ipynb) |
 | Fiqa | OpenAI | Question-Answering | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/Fiqa_dataset.ipynb) |
 | Customized Model | Custom | Text-Classification | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Custom_Hub_Notebook.ipynb) |
-
+| Loading Data with Custom Columns | Hugging Face/OpenAI | NER/Text-Classification/Question-Answering/Summarization | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Loading_Data_with_Custom_Columns.ipynb) |
 
 <style>
 .heading {
22 changes: 12 additions & 10 deletions langtest/modelhandler/jsl_modelhandler.py
@@ -174,18 +174,20 @@ def load_model(cls, path) -> "NLUPipeline":
         Args:
             path (str): Path to pretrained local or NLP Models Hub SparkNLP model
         """
-        if os.path.exists(path):
-            if try_import_lib("johnsnowlabs"):
-                loaded_model = nlp.load(path=path)
+        if isinstance(path, str):
+            if os.path.exists(path):
+                if try_import_lib("johnsnowlabs"):
+                    loaded_model = nlp.load(path=path)
+                else:
+                    loaded_model = PipelineModel.load(path)
             else:
-                loaded_model = PipelineModel.load(path)
-        else:
-            if try_import_lib("johnsnowlabs"):
-                loaded_model = nlp.load(path)
-            else:
-                raise ValueError(Errors.E039)
+                if try_import_lib("johnsnowlabs"):
+                    loaded_model = nlp.load(path)
+                else:
+                    raise ValueError(Errors.E039)
 
-        return cls(loaded_model)
+            return cls(loaded_model)
+        return cls(path)
 
     @abstractmethod
     def predict(self, text: str, *args, **kwargs) -> Any:
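The `load_model` change above wraps the existing loading logic in an `isinstance(path, str)` check, so a non-string argument is passed straight to the handler instead of being treated as a path. A minimal sketch of that dispatch pattern follows; `DemoHandler` and `fake_load` are hypothetical stand-ins for illustration, not the langtest or Spark NLP classes.

```python
import os
from typing import Any, Callable


class DemoHandler:
    """Hypothetical stand-in for a model handler; not the langtest class."""

    def __init__(self, model: Any):
        self.model = model

    @classmethod
    def load_model(cls, path: Any, loader: Callable[[str], Any]) -> "DemoHandler":
        # A string is treated as a local path or hub model name and loaded.
        if isinstance(path, str):
            return cls(loader(path))
        # Anything else is assumed to be an already-loaded model object.
        return cls(path)


def fake_load(name: str) -> dict:
    # Hypothetical loader: records the name and whether it exists on disk.
    return {"name": name, "local": os.path.exists(name)}


from_name = DemoHandler.load_model("some_model", fake_load)
from_object = DemoHandler.load_model({"name": "preloaded"}, fake_load)
```

In this sketch the loader is injected as a parameter only to keep the example self-contained; the code in the diff calls `nlp.load` or `PipelineModel.load` directly.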
2 changes: 1 addition & 1 deletion langtest/tasks/task.py
@@ -595,7 +595,7 @@ def create_sample(
 class FactualityTest(BaseTask):
     """Factuality task."""
 
-    _name = "factuality"
+    _name = "factualitytest"
     _default_col = {
         "article_sent": ["article_sent"],
         "correct_sent": ["correct_sent"],
12 changes: 8 additions & 4 deletions langtest/utils/custom_types/sample.py
@@ -2338,9 +2338,7 @@ def prompt_eval(self):
                 answer_key="answer",
                 prediction_key="text",
             )
-            if (graded_outputs1[0]["text"].strip() == "CORRECT") and (
-                graded_outputs2[0]["text"].strip() == "CORRECT"
-            ):
+            if self.output(graded_outputs1) and self.output(graded_outputs2):
                 return True
             else:
                 return False
@@ -2365,7 +2363,7 @@ def prompt_eval(self):
                 answer_key="answer",
                 prediction_key="text",
             )
-            return graded_outputs[0]["text"].strip() == "CORRECT"
+            return self.output(graded_outputs)
 
     def is_pass_with_ground_truth(self) -> bool:
         """
@@ -2460,6 +2458,12 @@ def run(self, model, **kwargs):
 
         return True
 
+    def output(self, graded_outputs):
+        """
+        Check if the output is correct.
+        """
+        return list(graded_outputs[0].values())[0].replace("\n", "").strip() == "CORRECT"
+
 
 Sample = TypeVar(
     "Sample",
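The `output` helper added above reads the first value of the first graded result rather than indexing a fixed `"text"` key, and removes newlines and surrounding whitespace before comparing against `"CORRECT"`. A standalone sketch of the same check is below; the function name `is_correct` and the sample grader outputs (including the `"results"` key) are assumptions made for illustration, not taken from the source.

```python
from typing import Dict, List


def is_correct(graded_outputs: List[Dict[str, str]]) -> bool:
    """Return True when the first graded result reads exactly CORRECT."""
    # Take the first value of the first result, whatever its key is called.
    first_value = list(graded_outputs[0].values())[0]
    # Drop newlines and surrounding whitespace before the comparison.
    return first_value.replace("\n", "").strip() == "CORRECT"


# Hypothetical grader outputs with different key names and stray whitespace.
padded = is_correct([{"text": " CORRECT\n"}])
other_key = is_correct([{"results": "CORRECT"}])
wrong = is_correct([{"text": "INCORRECT"}])
```

Because the comparison is an exact match after cleanup, a verdict such as `"INCORRECT"` does not pass even though it contains the substring `"CORRECT"`.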