[Bug]: < File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length raise ValueError("Columns must be same length as key")>
#631
Closed
myyourgit opened this issue on Jul 19, 2024 · 2 comments
Describe the bug
The error log below is printed while running: python -m graphrag.index --root ./ragtest0716
FO dependencies for create_base_entity_graph: ['create_summarized_entities']
22:56:23,463 graphrag.index.run INFO read table from storage: create_summarized_entities.parquet
22:56:23,487 datashaper.workflow.workflow INFO executing verb cluster_graph
22:56:23,501 graphrag.index.verbs.graph.clustering.cluster_graph WARNING Graph has no nodes
22:56:23,514 datashaper.workflow.workflow ERROR Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key
Traceback (most recent call last):
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb
result = node.verb.func(**verb_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 102, in cluster_graph
output_df[[level_to, to]] = pd.DataFrame(
~~~~~~~~~^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4299, in __setitem__
self._setitem_array(key, value)
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4341, in _setitem_array
check_key_length(self.columns, key, value)
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length
raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
22:56:23,523 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key details=None
22:56:23,523 graphrag.index.run ERROR error running workflow create_base_entity_graph
Traceback (most recent call last):
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\graphrag\index\run.py", line 323, in run_pipeline
result = await workflow.run(context, callbacks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\datashaper\workflow\workflow.py", line 369, in run
timing = await self._execute_verb(node, context, callbacks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb
result = node.verb.func(**verb_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 102, in cluster_graph
output_df[[level_to, to]] = pd.DataFrame(
~~~~~~~~~^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4299, in __setitem__
self._setitem_array(key, value)
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4341, in _setitem_array
check_key_length(self.columns, key, value)
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length
raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
22:56:23,526 graphrag.index.reporting.file_workflow_callbacks INFO Error running pipeline! details=None
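For reference, the ValueError itself can be reproduced outside GraphRAG. The traceback shows a DataFrame being assigned to a two-element column key; pandas raises this error when the value has a different number of columns than the key. The sketch below is a minimal illustration only: the column names and the helper `split_into_two_columns` are hypothetical and are not taken from `cluster_graph.py`.

```python
import pandas as pd


def split_into_two_columns(df: pd.DataFrame, values: pd.DataFrame) -> pd.DataFrame:
    """Assign `values` to two new columns (hypothetical names, for illustration)."""
    out = df.copy()
    # Same shape of statement as in the traceback: a two-element key on the
    # left, a DataFrame on the right. pandas requires matching column counts.
    out[["level", "clustered_graph"]] = values
    return out


base = pd.DataFrame({"entity_graph": ["g0", "g1"]})

# Two value columns for two keys: the assignment succeeds.
ok = split_into_two_columns(
    base, pd.DataFrame([[0, "a"], [1, "b"]], index=base.index)
)

# Zero value columns for two keys, as could happen when there is nothing
# to unpack (assumption: this is what an empty graph leads to).
try:
    split_into_two_columns(base, pd.DataFrame(index=base.index))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Read together with the "Graph has no nodes" warning just before the error, this suggests the clustering step had nothing to unpack, so the problem most likely lies upstream of `cluster_graph`.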
Steps to reproduce
Run python -m graphrag.index --root ./ragtest0716
Expected Behavior
No response
GraphRAG Config Used
Running LM Studio,
with Gemma 2B enabled as the LLM model
and Nomic AI enabled as the embedding model.
settings.yaml:
encoding_model: cl100k_base
skip_workflows: []
llm:
api_key: lm-studio
type: openai_chat # or azure_openai_chat
model: gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf
model_supports_json: true # recommended if this is available for your model.
api_base: http://localhost:1234/v1
max_tokens: 4000
request_timeout: 180.0
api_base: https://.openai.azure.com
api_version: 2024-02-15-preview
organization: <organization_id>
deployment_name: <azure_model_deployment_name>
tokens_per_minute: 150_000 # set a leaky bucket throttle
requests_per_minute: 10_000 # set a leaky bucket throttle
max_retries: 10
max_retry_wait: 10.0
sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
concurrent_requests: 25 # the number of parallel inflight requests that may be made
parallelization:
stagger: 0.3
num_threads: 50 # the number of threads to use for parallel processing
async_mode: threaded # or asyncio
embeddings:
parallelization: override the global parallelization settings for embeddings
async_mode: threaded # or asyncio
llm:
api_key: lm-studio
type: openai_embedding # or azure_openai_embedding
model: nomic-ai/nomic-embed-text-v1.5-GGUF/nomic-embed-text-v1.5.Q4_K_M.gguf
api_base: http://localhost:1234/v1
api_version: 2024-02-15-preview
# organization: <organization_id>
# deployment_name: <azure_model_deployment_name>
# tokens_per_minute: 150_000 # set a leaky bucket throttle
# requests_per_minute: 10_000 # set a leaky bucket throttle
max_retries: 100
max_retry_wait: 10.0
# sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
# concurrent_requests: 25 # the number of parallel inflight requests that may be made
# batch_size: 16 # the number of documents to send in a single request
# batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
# target: required # or optional
Logs and screenshots
{"type": "error", "data": "Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key", "stack": "Traceback (most recent call last):\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb\n result = node.verb.func(**verb_args)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 102, in cluster_graph\n output_df[[level_to, to]] = pd.DataFrame(\n ~~~~~~~~~^^^^^^^^^^^^^^^^\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4299, in setitem\n self._setitem_array(key, value)\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4341, in _setitem_array\n check_key_length(self.columns, key, value)\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length\n raise ValueError("Columns must be same length as key")\nValueError: Columns must be same length as key\n", "source": "Columns must be same length as key", "details": null}
{"type": "error", "data": "Error running pipeline!", "stack": "Traceback (most recent call last):\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\graphrag\index\run.py", line 323, in run_pipeline\n result = await workflow.run(context, callbacks)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\datashaper\workflow\workflow.py", line 369, in run\n timing = await self._execute_verb(node, context, callbacks)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb\n result = node.verb.func(**verb_args)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 102, in cluster_graph\n output_df[[level_to, to]] = pd.DataFrame(\n ~~~~~~~~~^^^^^^^^^^^^^^^^\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4299, in setitem\n self._setitem_array(key, value)\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4341, in _setitem_array\n check_key_length(self.columns, key, value)\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length\n raise ValueError("Columns must be same length as key")\nValueError: Columns must be same length as key\n", "source": "Columns must be same length as key", "details": null}
Additional Information
GraphRAG Version: latest
Operating System: win10
Python Version: 3.11.9
Related Issues:
myyourgit added the bug (Something isn't working) and triage (Default label assignment, indicates new issue needs to be reviewed by a maintainer) labels on Jul 19, 2024
This is generally caused by faulty entity extraction. I would recommend taking a look at the generated cache files for this step: it could be that the LLM either returned a malformed response or is being chatty when answering.
We are centralizing other LLM discussions in these threads:
Other LLM/Api bases: #339,
Ollama: #345
Local embeddings: #370
I'll resolve this issue so we can keep the focus on those threads
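A rough way to act on the cache-file suggestion is to scan the cached LLM responses and flag the ones that do not look like extraction output. The sketch below rests on several assumptions that should be checked against your own run: that cache entries are JSON files with a `result` field, that the default extraction prompt separates fields with `<|>`, and that the cache lives under a directory such as `./ragtest0716/cache/entity_extraction`. The function name `find_suspect_responses` is made up for this example.

```python
import json
from pathlib import Path


def find_suspect_responses(cache_dir, tuple_delimiter="<|>"):
    """Return (path, preview) pairs for cached responses lacking the delimiter.

    Assumptions: each cache entry is a JSON object with a "result" field, and
    well-formed extraction output contains `tuple_delimiter`. Files that are
    not JSON are checked as raw text instead.
    """
    suspects = []
    for path in sorted(Path(cache_dir).rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        try:
            payload = json.loads(text)
        except json.JSONDecodeError:
            payload = None
        if isinstance(payload, dict) and "result" in payload:
            response = str(payload["result"])
        else:
            response = text
        # A chatty or malformed answer usually contains no delimiter at all.
        if tuple_delimiter not in response:
            suspects.append((path, response[:200]))
    return suspects
```

Calling `find_suspect_responses("./ragtest0716/cache/entity_extraction")` (path assumed) and printing the previews should show quickly whether the model answered in free-form prose instead of delimited records. If every response is flagged, the model is probably not following the extraction prompt.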
Hi, Alonso:
Thank you!
The three threads above seem to address the local Ollama settings issue, not my issue.
Thanks