[Bug]: < File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length raise ValueError("Columns must be same length as key")> #631

Closed
myyourgit opened this issue Jul 19, 2024 · 2 comments

Comments

@myyourgit

myyourgit commented Jul 19, 2024

Describe the bug

The error log below is printed when running python -m graphrag.index --root ./ragtest0716:

FO dependencies for create_base_entity_graph: ['create_summarized_entities']
22:56:23,463 graphrag.index.run INFO read table from storage: create_summarized_entities.parquet
22:56:23,487 datashaper.workflow.workflow INFO executing verb cluster_graph
22:56:23,501 graphrag.index.verbs.graph.clustering.cluster_graph WARNING Graph has no nodes
22:56:23,514 datashaper.workflow.workflow ERROR Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key
Traceback (most recent call last):
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb
result = node.verb.func(**verb_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 102, in cluster_graph
output_df[[level_to, to]] = pd.DataFrame(
~~~~~~~~~^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4299, in __setitem__
self._setitem_array(key, value)
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4341, in _setitem_array
check_key_length(self.columns, key, value)
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length
raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
22:56:23,523 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key details=None
22:56:23,523 graphrag.index.run ERROR error running workflow create_base_entity_graph
Traceback (most recent call last):
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\graphrag\index\run.py", line 323, in run_pipeline
result = await workflow.run(context, callbacks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\datashaper\workflow\workflow.py", line 369, in run
timing = await self._execute_verb(node, context, callbacks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb
result = node.verb.func(**verb_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 102, in cluster_graph
output_df[[level_to, to]] = pd.DataFrame(
~~~~~~~~~^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4299, in __setitem__
self._setitem_array(key, value)
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4341, in _setitem_array
check_key_length(self.columns, key, value)
File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length
raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
22:56:23,526 graphrag.index.reporting.file_workflow_callbacks INFO Error running pipeline! details=None
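
The log shows the "Graph has no nodes" warning immediately before the failing assignment at cluster_graph.py line 102. The following is a minimal sketch of how such an assignment can fail, not GraphRAG's actual code: the helper `split_pairs`, the column names, and the sample values are hypothetical, and it assumes that an empty clustering result leaves a NaN in the cell where a (level, graph) pair was expected.

```python
import pandas as pd


def split_pairs(df: pd.DataFrame, column: str, level_column: str) -> pd.DataFrame:
    """Split a column of (level, graph) pairs into two columns.

    Hypothetical helper that mirrors the shape of the assignment in the
    traceback; it is not taken from the GraphRAG source.
    """
    out = df.copy()
    out[[level_column, column]] = pd.DataFrame(out[column].tolist(), index=out.index)
    return out


# Normal case: each cell holds a (level, graph) pair, so tolist() yields two columns.
ok = split_pairs(
    pd.DataFrame({"clustered_graph": [(0, "<graphml/>")]}),
    "clustered_graph",
    "level",
)

# Assumed empty-graph case: the cell is NaN instead of a pair, so tolist()
# yields one column, which cannot be assigned to two column names.
empty = pd.DataFrame({"clustered_graph": [float("nan")]})
try:
    split_pairs(empty, "clustered_graph", "level")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If this assumption holds, the pandas error is only a symptom: the underlying problem is that no entities reached the clustering step.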

Steps to reproduce

run python -m graphrag.index --root ./ragtest0716

Expected Behavior

No response

GraphRAG Config Used

Running LM Studio, with Gemma 2B as the LLM and Nomic AI as the embedding model.

settings.yaml:

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: lm-studio
  type: openai_chat # or azure_openai_chat
  model: gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf
  model_supports_json: true # recommended if this is available for your model.
  api_base: http://localhost:1234/v1
  # max_tokens: 4000
  # request_timeout: 180.0
  # api_base: https://.openai.azure.com
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: lm-studio
    type: openai_embedding # or azure_openai_embedding
    model: nomic-ai/nomic-embed-text-v1.5-GGUF/nomic-embed-text-v1.5.Q4_K_M.gguf
    api_base: http://localhost:1234/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    max_retries: 100
    max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

Logs and screenshots

{"type": "error", "data": "Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key", "stack": "Traceback (most recent call last):\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb\n result = node.verb.func(**verb_args)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 102, in cluster_graph\n output_df[[level_to, to]] = pd.DataFrame(\n ~~~~~~~~~^^^^^^^^^^^^^^^^\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4299, in setitem\n self._setitem_array(key, value)\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4341, in _setitem_array\n check_key_length(self.columns, key, value)\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length\n raise ValueError("Columns must be same length as key")\nValueError: Columns must be same length as key\n", "source": "Columns must be same length as key", "details": null}
{"type": "error", "data": "Error running pipeline!", "stack": "Traceback (most recent call last):\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\graphrag\index\run.py", line 323, in run_pipeline\n result = await workflow.run(context, callbacks)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\datashaper\workflow\workflow.py", line 369, in run\n timing = await self._execute_verb(node, context, callbacks)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb\n result = node.verb.func(**verb_args)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 102, in cluster_graph\n output_df[[level_to, to]] = pd.DataFrame(\n ~~~~~~~~~^^^^^^^^^^^^^^^^\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4299, in setitem\n self._setitem_array(key, value)\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\frame.py", line 4341, in _setitem_array\n check_key_length(self.columns, key, value)\n File "C:\ProgramData\anaconda3\envs\graphrag_env0716\Lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length\n raise ValueError("Columns must be same length as key")\nValueError: Columns must be same length as key\n", "source": "Columns must be same length as key", "details": null}

Additional Information

  • GraphRAG Version: latest
  • Operating System: win10
  • Python Version: 3.11.9
  • Related Issues:
myyourgit added the bug and triage labels Jul 19, 2024
@AlonsoGuevara
Contributor

Hi!

This is generally caused by faulty entity extraction. I would recommend taking a look at the generated cache files for this step: either the LLM returned a malformed response, or it is being chatty when answering.
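
As a rough illustration of what "malformed" or "chatty" means here, the sketch below checks whether a raw model response contains any parseable entity records. It is not part of GraphRAG: `parse_records` is a hypothetical helper, the delimiters are assumptions modeled on the default entity-extraction prompt (adjust them to match the prompt you actually use), and the two sample responses are invented.

```python
import re

# Assumed delimiters; change these to match the extraction prompt in use.
TUPLE_DELIMITER = "<|>"
RECORD_DELIMITER = "##"
COMPLETION_MARKER = "<|COMPLETE|>"


def parse_records(response: str) -> list[list[str]]:
    """Return the entity/relationship records found in a raw LLM response."""
    records = []
    cleaned = response.replace(COMPLETION_MARKER, "")
    for chunk in cleaned.split(RECORD_DELIMITER):
        # A record is expected to be a single parenthesized tuple.
        match = re.fullmatch(r"\((.*)\)", chunk.strip(), flags=re.DOTALL)
        if match is None:
            continue  # free text rather than a record
        fields = [f.strip().strip('"') for f in match.group(1).split(TUPLE_DELIMITER)]
        if fields[0] in ("entity", "relationship"):
            records.append(fields)
    return records


# Invented sample responses for illustration only.
well_formed = (
    '("entity"<|>ACME<|>ORGANIZATION<|>A fictional company)##'
    '("entity"<|>BOB<|>PERSON<|>An employee)<|COMPLETE|>'
)
chatty = "Sure! Here are the entities I found: ACME (an organization) and Bob (a person)."

print(len(parse_records(well_formed)))
print(len(parse_records(chatty)))
```

A response that yields zero records, as the chatty one does, would contribute no nodes to the graph. Pasting a raw response from the cache into a check like this is one way to see whether the model is following the expected format.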

We are centralizing other LLM discussions in these threads:
Other LLM/API bases: #339
Ollama: #345
Local embeddings: #370

I'll resolve this issue so we can keep the focus on those threads.

AlonsoGuevara added the oss_llm label and removed the bug and triage labels Jul 19, 2024
@myyourgit
Author

Hi Alonso,
Thank you!
The three threads above seem to address local Ollama setup issues, not my issue.
Thanks
