
Error while running code in here(https://docs.ragas.io/en/stable/concepts/test_data_generation/rag/) #1868

Open
sunnyeesl opened this issue Jan 22, 2025 · 2 comments
Labels
bug Something isn't working module-testsetgen Module testset generation question Further information is requested

Comments

@sunnyeesl

[O] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question
Hi, I'm testing the code on this page (https://docs.ragas.io/en/stable/concepts/test_data_generation/rag/) and ran into some errors.
I couldn't find a way to run the examples in the docs end to end.

ragas 0.2.12
python 3.10.12
llm: gpt-4o

(https://docs.ragas.io/en/stable/concepts/test_data_generation/rag/?h=jaccardsimilaritybuilder#extractors)

from ragas.testset.transforms.extractors import NERExtractor

extractor = NERExtractor(llm=azure_llm)
output = [await extractor.extract(node) for node in sample_nodes]
output

The code above, taken from the docs, ran successfully, but the output I got differs from the output shown in the docs.
Specifically, the format of the output is not the same.

# my output
[('entities', ['Einstein']), ('entities', ['Einstein'])]

# output in docs
('entities',
 {'ORG': [],
  'LOC': [],
  'PER': ['Einstein'],
  'MISC': ['theory of relativity',
   'space',
   'time',
   "observer's frame of reference"]})

I also hit an error when I ran the following code:

from ragas.testset.graph import KnowledgeGraph
from ragas.testset.transforms.relationship_builders.traditional import JaccardSimilarityBuilder

kg = KnowledgeGraph(nodes=sample_nodes)
rel_builder = JaccardSimilarityBuilder(property_name="entities", key_name="PER", new_property_name="entity_jaccard_similarity")
relationships = await rel_builder.transform(kg)
relationships

It fails with the error message below.
The error seems to be caused by the format of the extractor output.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_70752/3507502496.py in <module>
      4 kg = KnowledgeGraph(nodes=sample_nodes)
      5 rel_builder = OverlapScoreBuilder(property_name="entities", key_name="PER", new_property_name="entity_jaccard_similarity")
----> 6 relationships = await rel_builder.transform(kg)
      7 relationships

~/.local/lib/python3.10/site-packages/ragas/testset/transforms/relationship_builders/traditional.py in transform(self, kg)
    122                     )
    123                 if self.key_name is not None:
--> 124                     node_x_items = node_x_items.get(self.key_name, [])
    125                     node_y_items = node_y_items.get(self.key_name, [])
    126 

AttributeError: 'list' object has no attribute 'get'
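
The failure can be reproduced without ragas. The sketch below is not the library's code: `select_items` is a hypothetical stand-in for the `key_name` lookup visible in the traceback, and the two sample values mirror the two output shapes quoted above. It only illustrates that the `.get(key_name, [])` call works on a dict and fails on a flat list.

```python
def select_items(items, key_name=None):
    """Hypothetical stand-in for the key_name lookup seen in the traceback."""
    if key_name is not None:
        # Works only when `items` is a dict such as {"PER": [...], "ORG": [...]}.
        items = items.get(key_name, [])
    return items


flat = ["Einstein"]                       # shape I get from the extractor
keyed = {"PER": ["Einstein"], "ORG": []}  # shape shown in the docs

print(select_items(keyed, key_name="PER"))  # dict supports .get
print(select_items(flat))                   # no key_name: the list passes through

try:
    select_items(flat, key_name="PER")
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

If this reading is right, passing `key_name="PER"` only makes sense when the stored property is a dict keyed by entity type.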
@sunnyeesl sunnyeesl added the question Further information is requested label Jan 22, 2025
@dosubot dosubot bot added bug Something isn't working module-testsetgen Module testset generation labels Jan 22, 2025
@Vidit-Ostwal
Contributor

Hi @sunnyeesl

I believe the output you are getting is the expected one. I went through the NER documentation, and the sole purpose of the NER extractor is to extract the entities and nothing else.

The documentation does show

('entities',
 {'ORG': [],
  'LOC': [],
  'PER': ['Einstein'],
  'MISC': ['theory of relativity',
   'space',
   'time',
   "observer's frame of reference"]})

But I believe the actual output looks more like

('entities', List[str])
('entities', ['Einstein', 'theory of relativity', 'space', 'time'])

This is confirmed by the definition of the NEROutput class:

class NEROutput(BaseModel):
    entities: t.List[str]

I think this is a documentation error.

@Vidit-Ostwal
Contributor

For the 2nd error you were facing,

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_70752/3507502496.py in <module>
      4 kg = KnowledgeGraph(nodes=sample_nodes)
      5 rel_builder = OverlapScoreBuilder(property_name="entities", key_name="PER", new_property_name="entity_jaccard_similarity")
----> 6 relationships = await rel_builder.transform(kg)
      7 relationships

~/.local/lib/python3.10/site-packages/ragas/testset/transforms/relationship_builders/traditional.py in transform(self, kg)
    122                     )
    123                 if self.key_name is not None:
--> 124                     node_x_items = node_x_items.get(self.key_name, [])
    125                     node_y_items = node_y_items.get(self.key_name, [])
    126 

AttributeError: 'list' object has no attribute 'get'

I would first suggest updating the ragas version.
I also tried the example given in the documentation:

from ragas.testset.graph import Node

sample_nodes = [Node(
    properties={"page_content": "Einstein's theory of relativity revolutionized our understanding of space and time. It introduced the concept that time is not absolute but can change depending on the observer's frame of reference."}
),Node(
    properties={"page_content": "Time dilation occurs when an object moves close to the speed of light, causing time to pass slower relative to a stationary observer. This phenomenon is a key prediction of Einstein's special theory of relativity."}
)]
output = [await extractor.extract(node) for node in sample_nodes]

kg = KnowledgeGraph(nodes=sample_nodes)
rel_builder = JaccardSimilarityBuilder(property_name="entities", key_name="PER", new_property_name="entity_jaccard_similarity")
relationships = await rel_builder.transform(kg)
relationships

but received this error

Cell In[16], line 21
     19 kg = KnowledgeGraph(nodes=sample_nodes)
     20 rel_builder = JaccardSimilarityBuilder(property_name="entities", key_name="PER", new_property_name="entity_jaccard_similarity")
---> 21 relationships = await rel_builder.transform(kg)
     22 relationships

File /opt/homebrew/anaconda3/envs/RagasEnv/lib/python3.11/site-packages/ragas/testset/transforms/relationship_builders/traditional.py:34, in JaccardSimilarityBuilder.transform(self, kg)
     32 items2 = node2.get_property(self.property_name)
     33 if items1 is None or items2 is None:
---> 34     raise ValueError(
     35         f"Node {node1.id} or {node2.id} has no {self.property_name}"
     36     )
     37 if self.key_name is not None:
     38     items1 = items1.get(self.key_name, [])

ValueError: Node a98373dd-fc01-422e-af9e-1dac80971d96 or 18d994a0-8f40-4f77-b9b9-0aff23232ed3 has no entities

I believe this is because the builder is being called incorrectly:

rel_builder = JaccardSimilarityBuilder(property_name="entities", key_name="PER", new_property_name="entity_jaccard_similarity")

The correct call should be

rel_builder = JaccardSimilarityBuilder(property_name="page_content", new_property_name="entity_jaccard_similarity")

because the sample_nodes that are passed as input to the KnowledgeGraph class

sample_nodes = [Node(
    properties={"page_content": "Einstein's theory of relativity revolutionized our understanding of space and time. It introduced the concept that time is not absolute but can change depending on the observer's frame of reference."}
),Node(
    properties={"page_content": "Time dilation occurs when an object moves close to the speed of light, causing time to pass slower relative to a stationary observer. This phenomenon is a key prediction of Einstein's special theory of relativity."}
)]

don't have an entities property in their definition. Also, page_content is a plain string, not a dictionary, so key_name, which is used to look one level deeper into the property, is not needed.
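
To make the missing-property condition concrete, here is a rough sketch in plain Python. It is not ragas code: the nodes are ordinary dicts, `jaccard` is a hand-written helper, and the entity lists are made up for illustration. It assumes the extractor result is a `(name, value)` tuple that has to be stored on the node before a builder can read it. Whether and how ragas does that storing is not confirmed here.

```python
def jaccard(items_a, items_b):
    """Jaccard similarity of two collections: |A & B| / |A | B|."""
    set_a, set_b = set(items_a), set(items_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical stand-ins for node properties (plain dicts, not ragas Node objects).
node_1 = {"page_content": "Einstein's theory of relativity"}
node_2 = {"page_content": "Einstein's special theory of relativity"}

# Until something stores the extractor output, there is nothing under "entities".
missing = node_1.get("entities") is None

# Assumed step: store each (name, value) extractor result on its node.
outputs = [
    ("entities", ["Einstein", "theory of relativity"]),
    ("entities", ["Einstein", "special theory of relativity"]),
]
for node, (name, value) in zip((node_1, node_2), outputs):
    node[name] = value

score = jaccard(node_1["entities"], node_2["entities"])
print(missing, score)
```

With flat entity lists like these, no key_name is involved: the similarity is computed directly on the two lists.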
