How to solve the ("All samples must be of the same type") issue? #1875
Comments
Hi @tvsathish, the error you're encountering is likely due to a mismatch in the types of the items in the list you use to create the EvaluationDataset. When you call eval_dataset = EvaluationDataset(samples), it internally calls validate_samples:

def validate_samples(self, samples: t.List[Sample]) -> t.List[Sample]:
    """Validates that all samples are of the same type."""
    if len(samples) == 0:
        return samples
    first_sample_type = type(self.samples[0])
    if not all(isinstance(sample, first_sample_type) for sample in self.samples):
        raise ValueError("All samples must be of the same type")
    return samples

This function checks that each sample in the list is of the same type as the first one. If there's any inconsistency in the sample types, it raises a ValueError. To fix the error, please ensure that all the samples are of the same type before passing them to EvaluationDataset.
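One quick way to run a similar check before building the dataset is to tally the types present in the list. This is a minimal sketch using only the standard library; `FakeSample`, `count_sample_types`, and the `samples` list are hypothetical stand-ins for illustration, not part of Ragas.

```python
from collections import Counter


class FakeSample:
    """Hypothetical stand-in for a real sample class such as SingleTurnSample."""


def count_sample_types(samples):
    """Return a Counter mapping each exact type name to its number of samples."""
    return Counter(type(sample).__name__ for sample in samples)


# Illustrative data: one entry is None instead of a sample object.
samples = [FakeSample(), FakeSample(), None, FakeSample()]
type_counts = count_sample_types(samples)
print(type_counts)
```

More than one key in the result means the list is mixed. Note that the library check uses isinstance, so instances of a subclass of the first sample's type would still pass validation even though they appear under a separate name in this tally.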
Dear @sahusiddharth, I understand the validation happening here. My question is how to find the mismatched record among the many records I created. Can you please suggest a way?
Hi @tvsathish, I understand your concern. To find the mismatched records, you can add type checking and logging inside the create_turn_sample function. This will help identify and log any records that don’t match the expected types, making it easier to spot the problematic ones.

# Imports inferred from the names used below
import re
from urllib.request import urlopen

import pandas as pd
from bs4 import BeautifulSoup
from ragas import SingleTurnSample

def empty_nan_value(cell_value, default=''):
    """Helper function to return the default (empty string) if NaN, else the cell value."""
    return default if pd.isna(cell_value) else cell_value

def create_turn_sample(row):
    # Extract the first URL from the reference cell and safely handle it
    url = re.split(r'[,\n ]+', empty_nan_value(row['reference']))[0]
    try:
        page = urlopen(url)
    except ValueError:
        # urlopen raises ValueError for an empty or malformed URL.
        # This returns None, which is not a SingleTurnSample, so filter
        # such results out before building the dataset.
        return
    # Parse the page with BeautifulSoup
    soup = BeautifulSoup(page, features='lxml')
    # Get user input (ensure it's a string)
    user_input = str(row['user_input']) if isinstance(row['user_input'], str) else ''
    # Get the list of contexts, ensuring each is a string
    retrieved_contexts = [str(empty_nan_value(row.get(f'context{i}'))) for i in range(1, 5)]
    # Get response (ensure it's a string)
    response = str(empty_nan_value(row['response']))
    # Get reference text (ensure it's a string)
    reference = str(soup.get_text())
    # Return the created SingleTurnSample
    return SingleTurnSample(
        user_input=user_input,
        retrieved_contexts=retrieved_contexts,
        response=response,
        reference=reference
    )
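To locate the offending records once the list has been built, one option is to scan it and collect the index and type of every entry that differs from the first. The sketch below assumes the samples were created in row order, one per data frame row, so each index maps back to a row position. `FakeSample` and `find_mismatched` are hypothetical names used for illustration only.

```python
class FakeSample:
    """Hypothetical stand-in for SingleTurnSample."""

    def __init__(self, user_input):
        self.user_input = user_input


def find_mismatched(samples):
    """Return (index, type name) pairs for entries that are not
    instances of the first sample's type."""
    if not samples:
        return []
    expected = type(samples[0])
    return [
        (i, type(s).__name__)
        for i, s in enumerate(samples)
        if not isinstance(s, expected)
    ]


# Illustrative data: indices 1 and 3 mimic rows whose URL could not be
# opened, so the builder function returned None.
samples = [FakeSample("q1"), None, FakeSample("q3"), None]
mismatched = find_mismatched(samples)
print(mismatched)  # [(1, 'NoneType'), (3, 'NoneType')]

# Keep only well-formed samples before constructing the dataset.
clean_samples = [s for s in samples if isinstance(s, FakeSample)]
```

Because the comparison is against the first entry, a bad first entry (for example None) would cause the valid samples to be reported instead, so it is worth checking index 0 by hand.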
I think a better solution would be to change the validate_samples function so that the error says which sample has the wrong type, telling the user where to look. Something like this:

def validate_samples(self, samples: t.List[Sample]) -> t.List[Sample]:
    """Validates that all samples are of the same type."""
    if len(samples) == 0:
        return samples
    first_sample_type = type(samples[0])
    for i, sample in enumerate(samples):
        if not isinstance(sample, first_sample_type):
            raise ValueError(f"Sample at index {i} is of type {type(sample)}, expected {first_sample_type}")
    return samples
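To see what the proposed message would look like, here is a standalone adaptation of the function above. It drops `self` and the Ragas type hints so it runs on its own, and the input list of plain strings with one None is purely illustrative.

```python
def validate_samples(samples):
    """Standalone adaptation of the proposed check (no `self`, no Ragas types)."""
    if len(samples) == 0:
        return samples
    first_sample_type = type(samples[0])
    for i, sample in enumerate(samples):
        if not isinstance(sample, first_sample_type):
            raise ValueError(
                f"Sample at index {i} is of type {type(sample)}, "
                f"expected {first_sample_type}"
            )
    return samples


# Illustrative input: strings stand in for real sample objects.
try:
    validate_samples(["a", "b", None, "d"])
except ValueError as exc:
    message = str(exc)
    print(message)
    # Sample at index 2 is of type <class 'NoneType'>, expected <class 'str'>
```

This version stops at the first mismatch, so if several records are bad, each one surfaces only after the previous one is fixed.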
Changed the validate_samples functionality to also tell which indexed sample is causing the issue. #1875 Co-authored-by: Vidit Ostwal
hey @Vidit-Ostwal that would be a better error message like you suggest 🙂
[ ] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
I am getting the following error when manually creating SingleTurnSamples from my dataset.
("All samples must be of the same type")
How to find the data frame that contributes to a mismatched sample record?
Ragas version: 0.2.12
Python version: 3.9
Code to Reproduce
I can't share the excel sheet itself due to privacy reasons
Error trace
Traceback (most recent call last):
Expected behavior
I expected the samples to be created properly and evaluation to start
Additional context
Please help me find the troublesome sample record and where the problem is. At the moment, the error message by itself is not very helpful for spotting the error among the many sample records created.