[Bug]: < File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length raise ValueError("Columns must be same length as key")> #631

myyourgit · 2024-07-19T15:25:28Z

Describe the bug

The error log below is printed when running python -m graphrag.index --root ./ragtest0716

FO dependencies for create_base_entity_graph: ['create_summarized_entities']
22:56:23,463 graphrag.index.run INFO read table from storage: create_summarized_entities.parquet
22:56:23,487 datashaper.workflow.workflow INFO executing verb cluster_graph
22:56:23,501 graphrag.index.verbs.graph.clustering.cluster_graph WARNING Graph has no nodes
22:56:23,514 datashaper.workflow.workflow ERROR Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key
Traceback (most recent call last):
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb
result = node.verb.func(**verb_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 102, in cluster_graph
output_df[[level_to, to]] = pd.DataFrame(
~~~~~~~~~^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4299, in setitem
self._setitem_array(key, value)
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4341, in _setitem_array
check_key_length(self.columns, key, value)
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length
raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
22:56:23,523 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key details=None
22:56:23,523 graphrag.index.run ERROR error running workflow create_base_entity_graph
Traceback (most recent call last):
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\graphrag\index\run.py", line 323, in run_pipeline
result = await workflow.run(context, callbacks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\datashaper\workflow\workflow.py", line 369, in run
timing = await self._execute_verb(node, context, callbacks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb
result = node.verb.func(**verb_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 102, in cluster_graph
output_df[[level_to, to]] = pd.DataFrame(
~~~~~~~~~^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4299, in setitem
self._setitem_array(key, value)
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4341, in _setitem_array
check_key_length(self.columns, key, value)
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length
raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
22:56:23,526 graphrag.index.reporting.file_workflow_callbacks INFO Error running pipeline! details=None
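
The ValueError at the bottom of the traceback can be reproduced in isolation. The following is a minimal sketch, not the GraphRAG implementation: the helper names `split_pairs` and `try_split` and the column names `pair`, `level`, and `graph` are hypothetical. It uses the same assignment pattern that appears in the traceback (a DataFrame assigned to a two-column key) and shows that pandas raises this error when the right-hand DataFrame does not have exactly two columns, which is what happens when there are no rows to expand.

```python
import pandas as pd


def split_pairs(pairs):
    """Expand a column of (level, graph) pairs into two separate columns."""
    df = pd.DataFrame({"pair": pairs})
    # Same pattern as the failing line: a DataFrame assigned to a two-column key.
    expanded = pd.DataFrame(df["pair"].tolist(), index=df.index)
    df[["level", "graph"]] = expanded
    return df


def try_split(pairs):
    """Return None on success, or the ValueError message on failure."""
    try:
        split_pairs(pairs)
    except ValueError as err:
        return str(err)
    return None


if __name__ == "__main__":
    # Two well-formed pairs expand into two columns without error.
    print(try_split([(0, "g0"), (1, "g1")]))
    # No pairs at all: the expanded frame has zero columns, so pandas refuses.
    print(try_split([]))
```

In this sketch the empty input stands in for the "Graph has no nodes" warning logged just before the error: with nothing to cluster, there is nothing to split into two columns.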

Steps to reproduce

run python -m graphrag.index --root ./ragtest0716

Expected Behavior

No response

GraphRAG Config Used

Running LM Studio, with Gemma 2B as the LLM and Nomic AI as the embedding model.

settings.yaml:

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: lm-studio
  type: openai_chat # or azure_openai_chat
  model: gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf
  model_supports_json: true # recommended if this is available for your model.
  api_base: http://localhost:1234/v1
  # max_tokens: 4000
  # request_timeout: 180.0
  # api_base: https://.openai.azure.com
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  # parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: lm-studio
    type: openai_embedding # or azure_openai_embedding
    model: nomic-ai/nomic-embed-text-v1.5-GGUF/nomic-embed-text-v1.5.Q4_K_M.gguf
    api_base: http://localhost:1234/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    max_retries: 100
    max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

Logs and screenshots

{"type": "error", "data": "Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key", "stack": "Traceback (most recent call last):\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb\n result = node.verb.func(**verb_args)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 102, in cluster_graph\n output_df[[level_to, to]] = pd.DataFrame(\n ~~~~~~~~~^^^^^^^^^^^^^^^^\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4299, in setitem\n self._setitem_array(key, value)\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4341, in _setitem_array\n check_key_length(self.columns, key, value)\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length\n raise ValueError("Columns must be same length as key")\nValueError: Columns must be same length as key\n", "source": "Columns must be same length as key", "details": null}
{"type": "error", "data": "Error running pipeline!", "stack": "Traceback (most recent call last):\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\graphrag\index\run.py", line 323, in run_pipeline\n result = await workflow.run(context, callbacks)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\datashaper\workflow\workflow.py", line 369, in run\n timing = await self._execute_verb(node, context, callbacks)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb\n result = node.verb.func(**verb_args)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 102, in cluster_graph\n output_df[[level_to, to]] = pd.DataFrame(\n ~~~~~~~~~^^^^^^^^^^^^^^^^\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4299, in setitem\n self._setitem_array(key, value)\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4341, in _setitem_array\n check_key_length(self.columns, key, value)\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length\n raise ValueError("Columns must be same length as key")\nValueError: Columns must be same length as key\n", "source": "Columns must be same length as key", "details": null}

Additional Information

GraphRAG Version: latest
Operating System: win10
Python Version: 3.11.9
Related Issues:


AlonsoGuevara · 2024-07-19T23:45:42Z

Hi!

This is generally caused by faulty entity extraction. I would recommend taking a look at the generated cache files for this step: either the LLM returned a malformed response, or it is being chatty when answering.
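
One way to act on that advice is to scan the cache for responses that do not look like extraction output. The sketch below rests on several assumptions that should be checked against your own cache: that each cache file is a JSON object, that the model's reply is stored under a `result` key, and that a well-formed reply contains the marker `("entity"`. The function name `find_suspect_responses` is hypothetical and is not part of GraphRAG.

```python
import json
from pathlib import Path

# Assumption: a well-formed entity extraction reply contains this marker.
ENTITY_MARKER = '("entity"'


def find_suspect_responses(cache_dir, marker=ENTITY_MARKER, key="result"):
    """Return the names of cache files whose stored reply lacks the marker."""
    suspects = []
    for path in sorted(Path(cache_dir).iterdir()):
        if not path.is_file():
            continue
        try:
            payload = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            # Unreadable or non-JSON files are flagged for manual inspection.
            suspects.append(path.name)
            continue
        # Assumption: the reply text is stored under `key` in a JSON object.
        response = payload.get(key) if isinstance(payload, dict) else None
        if not isinstance(response, str) or marker not in response:
            suspects.append(path.name)
    return suspects
```

For the run in this issue, the directory to pass would be somewhere under ./ragtest0716; the exact subfolder (for example a cache/entity_extraction folder) is an assumption and may differ between versions. Any file the function returns is worth opening by hand to see whether the model answered in prose instead of the expected tuple format.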

We are centralizing other LLM discussions in these threads:
Other LLM/Api bases: #339,
Ollama: #345
Local embeddings: #370

I'll resolve this issue so we can keep the focus on those threads

myyourgit · 2024-07-21T07:08:59Z

Hi, Alonso:
Thank you!
The three threads above seem to address local Ollama setup issues, not my issue.
Thanks

myyourgit added the bug and triage labels Jul 19, 2024

AlonsoGuevara closed this as completed Jul 19, 2024

AlonsoGuevara added the oss_llm label and removed the bug and triage labels Jul 19, 2024

Ikaros-521 mentioned this issue Jul 22, 2024

Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key details=None Ikaros-521/GraphRAG-Ollama-UI#6

Closed

etiennebonnafoux mentioned this issue Jul 23, 2024

[Bug]: How to solve ValueError: Columns must be same length as key？ #670

Closed
