[Bug]: FileNotFoundError: Table default-community-full_content does not exist #1628

Open
HENScience opened this issue Jan 16, 2025 · 0 comments
Labels
bug Something isn't working triage Default label assignment, indicates new issue needs reviewed by a maintainer

@HENScience

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

The following error occurs during indexing, when the pipeline reaches flows.generate_text_embeddings:
15:11:42,754 graphrag.index.run.run_workflows ERROR error running workflow generate_text_embeddings
Traceback (most recent call last):
  File "D:\software\Anaconda3\envs\graphrag\Lib\site-packages\lancedb\table.py", line 1195, in open
    tbl.version
  File "D:\software\Anaconda3\envs\graphrag\Lib\site-packages\lancedb\table.py", line 1237, in version
    return self._dataset.version
           ^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\graphrag\Lib\site-packages\lancedb\table.py", line 1210, in _dataset
    return self._ref.dataset
           ^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\graphrag\Lib\site-packages\lancedb\table.py", line 1086, in dataset
    self._dataset = lance.dataset(
                    ^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\graphrag\Lib\site-packages\lance\__init__.py", line 107, in dataset
    ds = LanceDataset(
         ^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\graphrag\Lib\site-packages\lance\dataset.py", line 171, in __init__
    self._ds = _Dataset(
               ^^^^^^^^^
ValueError: Dataset at path E:/data/graphrag/index/books/output/lancedb/default-community-full_content.lance was not found: Not found: E:/data/graphrag/index/books/output/lancedb/default-community-full_content.lance/_versions, D:\a\lance\lance\rust\lance-table\src\io\commit.rs:286:23, D:\a\lance\lance\rust\lance\src\dataset\builder.rs:317:35

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\software\Anaconda3\envs\graphrag\Lib\site-packages\graphrag\index\run\run_workflows.py", line 166, in _run_workflows
    result = await run_workflow(
             ^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\graphrag\Lib\site-packages\graphrag\index\workflows\generate_text_embeddings.py", line 45, in run_workflow
    await generate_text_embeddings(
  File "D:\software\Anaconda3\envs\graphrag\Lib\site-packages\graphrag\index\flows\generate_text_embeddings.py", line 98, in generate_text_embeddings
    await _run_and_snapshot_embeddings(
  File "D:\software\Anaconda3\envs\graphrag\Lib\site-packages\graphrag\index\flows\generate_text_embeddings.py", line 121, in _run_and_snapshot_embeddings
    data["embedding"] = await embed_text(
                        ^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\graphrag\Lib\site-packages\graphrag\index\operations\embed_text\embed_text.py", line 89, in embed_text
    return await _text_embed_with_vector_store(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\graphrag\Lib\site-packages\graphrag\index\operations\embed_text\embed_text.py", line 206, in _text_embed_with_vector_store
    vector_store.load_documents(documents, overwrite and i == 0)
  File "D:\software\Anaconda3\envs\graphrag\Lib\site-packages\graphrag\vector_stores\lancedb.py", line 76, in load_documents
    self.document_collection = self.db_connection.open_table(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\graphrag\Lib\site-packages\lancedb\db.py", line 445, in open_table
    return LanceTable.open(self, name, index_cache_size=index_cache_size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\graphrag\Lib\site-packages\lancedb\table.py", line 1198, in open
    raise FileNotFoundError(f"Table {name} does not exist")
FileNotFoundError: Table default-community-full_content does not exist
15:11:42,762 graphrag.callbacks.file_workflow_callbacks INFO Error running pipeline! details=None
15:11:42,908 graphrag.cli.index ERROR Errors occurred during the pipeline run, see logs for more details.
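The two chained exceptions point at one code path: embed_text calls vector_store.load_documents(documents, overwrite and i == 0), and load_documents then calls open_table, which raises because no default-community-full_content table exists yet. With overwrite: false in the config, that flag is False for every batch. The following is a minimal, self-contained model of that control flow, assuming (inferred from the traceback, not from reading the GraphRAG source) that load_documents creates the table only when overwrite is true and otherwise opens an existing table. FakeVectorStore, embed_batches and embed_batches_guarded are hypothetical names; the guarded variant sketches one possible way to avoid the failure by creating the table on the first batch when it is missing.

```python
"""Minimal model of the failing control flow in the traceback.

Assumption (inferred from the traceback, not from GraphRAG's source):
load_documents creates the table only when overwrite is True and otherwise
opens an existing table, which raises if the table was never created.
All names here are hypothetical.
"""


class FakeVectorStore:
    """Hypothetical in-memory stand-in for a LanceDB-backed vector store."""

    def __init__(self) -> None:
        self.tables: dict[str, list[str]] = {}

    def table_exists(self, name: str) -> bool:
        return name in self.tables

    def load_documents(self, name: str, docs: list[str], overwrite: bool) -> None:
        if overwrite:
            # Create the table (replacing any old one) from this batch.
            self.tables[name] = list(docs)
            return
        # Append mode: like open_table(), fail if the table does not exist.
        if not self.table_exists(name):
            raise FileNotFoundError(f"Table {name} does not exist")
        self.tables[name].extend(docs)


def embed_batches(store: FakeVectorStore, name: str,
                  batches: list[list[str]], overwrite: bool) -> None:
    # Same flag expression as the traceback: only batch 0 may overwrite.
    for i, batch in enumerate(batches):
        store.load_documents(name, batch, overwrite and i == 0)


def embed_batches_guarded(store: FakeVectorStore, name: str,
                          batches: list[list[str]], overwrite: bool) -> None:
    # Hypothetical fix: on batch 0, also create the table if it is missing.
    for i, batch in enumerate(batches):
        create = i == 0 and (overwrite or not store.table_exists(name))
        store.load_documents(name, batch, create)


if __name__ == "__main__":
    table = "default-community-full_content"
    batches = [["report 1"], ["report 2"]]

    try:
        embed_batches(FakeVectorStore(), table, batches, overwrite=False)
    except FileNotFoundError as err:
        print(err)  # Table default-community-full_content does not exist

    store = FakeVectorStore()
    embed_batches_guarded(store, table, batches, overwrite=False)
    print(store.tables[table])  # ['report 1', 'report 2']
```

In this model, passing overwrite=True also avoids the error, at the cost of replacing any existing table on the first batch; whether GraphRAG v1.2.0 behaves exactly this way would need to be confirmed against its source.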

### logs.json:
{
"type": "error",
"data": "Error running pipeline!",
"stack": "Traceback (most recent call last):\n File \"D:\\software\\Anaconda3\\envs\\graphrag\\Lib\\site-packages\\lancedb\\table.py\", line 1195, in open\n tbl.version\n File \"D:\\software\\Anaconda3\\envs\\graphrag\\Lib\\site-packages\\lancedb\\table.py\", line 1237, in version\n return self._dataset.version\n ^^^^^^^^^^^^^\n File \"D:\\software\\Anaconda3\\envs\\graphrag\\Lib\\site-packages\\lancedb\\table.py\", line 1210, in _dataset\n return self._ref.dataset\n ^^^^^^^^^^^^^^^^^\n File \"D:\\software\\Anaconda3\\envs\\graphrag\\Lib\\site-packages\\lancedb\\table.py\", line 1086, in dataset\n self._dataset = lance.dataset(\n ^^^^^^^^^^^^^^\n File \"D:\\software\\Anaconda3\\envs\\graphrag\\Lib\\site-packages\\lance\\__init__.py\", line 107, in dataset\n ds = LanceDataset(\n ^^^^^^^^^^^^^\n File \"D:\\software\\Anaconda3\\envs\\graphrag\\Lib\\site-packages\\lance\\dataset.py\", line 171, in __init__\n self._ds = _Dataset(\n ^^^^^^^^^\nValueError: Dataset at path E:/data/graphrag/index/books/output/lancedb/default-community-full_content.lance was not found: Not found: E:/data/graphrag/index/books/output/lancedb/default-community-full_content.lance/_versions, D:\\a\\lance\\lance\\rust\\lance-table\\src\\io\\commit.rs:286:23, D:\\a\\lance\\lance\\rust\\lance\\src\\dataset\\builder.rs:317:35\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"D:\\software\\Anaconda3\\envs\\graphrag\\Lib\\site-packages\\graphrag\\index\\run\\run_workflows.py\", line 166, in _run_workflows\n result = await run_workflow(\n ^^^^^^^^^^^^^^^^^^^\n File \"D:\\software\\Anaconda3\\envs\\graphrag\\Lib\\site-packages\\graphrag\\index\\workflows\\generate_text_embeddings.py\", line 45, in run_workflow\n await generate_text_embeddings(\n File \"D:\\software\\Anaconda3\\envs\\graphrag\\Lib\\site-packages\\graphrag\\index\\flows\\generate_text_embeddings.py\", line 98, in generate_text_embeddings\n await _run_and_snapshot_embeddings(\n File \"D:\\software\\Anaconda3\\envs\\graphrag\\Lib\\site-packages\\graphrag\\index\\flows\\generate_text_embeddings.py\", line 121, in _run_and_snapshot_embeddings\n data[\"embedding\"] = await embed_text(\n ^^^^^^^^^^^^^^^^^\n File \"D:\\software\\Anaconda3\\envs\\graphrag\\Lib\\site-packages\\graphrag\\index\\operations\\embed_text\\embed_text.py\", line 89, in embed_text\n return await _text_embed_with_vector_store(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"D:\\software\\Anaconda3\\envs\\graphrag\\Lib\\site-packages\\graphrag\\index\\operations\\embed_text\\embed_text.py\", line 206, in _text_embed_with_vector_store\n vector_store.load_documents(documents, overwrite and i == 0)\n File \"D:\\software\\Anaconda3\\envs\\graphrag\\Lib\\site-packages\\graphrag\\vector_stores\\lancedb.py\", line 76, in load_documents\n self.document_collection = self.db_connection.open_table(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"D:\\software\\Anaconda3\\envs\\graphrag\\Lib\\site-packages\\lancedb\\db.py\", line 445, in open_table\n return LanceTable.open(self, name, index_cache_size=index_cache_size)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"D:\\software\\Anaconda3\\envs\\graphrag\\Lib\\site-packages\\lancedb\\table.py\", line 1198, in open\n raise FileNotFoundError(f\"Table {name} does not exist\")\nFileNotFoundError: Table default-community-full_content does not exist\n",
"source": "Table default-community-full_content does not exist",
"details": null
}

Steps to reproduce

No response

Expected Behavior

No response

GraphRAG Config Used

parallelization:
  stagger: 0.3
  num_threads: 20

async_mode: asyncio # or threaded

embeddings:
  async_mode: asyncio # or threaded
  parallelization:
    num_threads: 10
  batch_max_tokens: 8192
  vector_store: 
    type: lancedb
    db_uri: 'output\lancedb'
    container_name: default
    overwrite: false
  llm:
    api_key: XXXX
    type: azure_openai_embedding # or openai_embedding
    model: text-embedding-3-large
    api_base: XXXX
    api_version: XXXX
    deployment_name: text-embedding-3-large

### Input settings ###

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

chunks:
  size: 1200   # size in tokens
  overlap: 30
  group_by_columns: [id]

### Storage settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

cache:
  type: file # or blob
  base_dir: "cache"

reporting:
  type: file # or console, blob
  base_dir: "logs"

storage:
  type: file # or blob
  base_dir: "output"

## only turn this on if running `graphrag index` with custom settings
## we normally use `graphrag update` with the defaults
update_index_storage:
  # type: file # or blob
  # base_dir: "update_output"

### Workflow settings ###

skip_workflows: []

entity_extraction:
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1
  # strategy: 
  #   type: nltk

summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  prompt: "prompts/community_report.txt"
  max_length: 1000
  max_input_length: 2000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: true
  embeddings: false
  transient: false

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  prompt: "prompts/drift_search_system_prompt.txt"
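The table name in the error, default-community-full_content, matches the configured container_name (default) joined to the embedding name community.full_content, with dots turned into dashes; that naming rule is an inference from this error message, not something the report states. The Lance error also shows that a table counts as present only if <db_uri>/<name>.lance/_versions exists. The following stdlib-only sketch checks an output directory for that layout before rerunning the embeddings step; collection_name and missing_tables are hypothetical helpers, and the embedding-name list passed in is only an example.

```python
"""Check which LanceDB tables are missing from a GraphRAG output directory.

Assumptions: table names follow f"{container_name}-{embedding_name}" with
dots replaced by dashes (inferred from the table name in the error), and a
Lance table is a "<name>.lance" directory that must contain "_versions"
(taken from the Lance "Not found" message). Helper names are hypothetical.
"""
from pathlib import Path


def collection_name(container_name: str, embedding_name: str) -> str:
    # ("default", "community.full_content") -> "default-community-full_content"
    return f"{container_name}-{embedding_name}".replace(".", "-")


def missing_tables(db_uri: str, container_name: str,
                   embedding_names: list[str]) -> list[str]:
    """Return the expected table names whose _versions directory is absent."""
    root = Path(db_uri)
    missing = []
    for embedding_name in embedding_names:
        name = collection_name(container_name, embedding_name)
        if not (root / f"{name}.lance" / "_versions").is_dir():
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Absolute db path taken from the traceback; db_uri in the config is relative.
    db_path = "E:/data/graphrag/index/books/output/lancedb"
    print(missing_tables(db_path, "default", ["community.full_content"]))
```

An empty list would mean every checked table has a version history on disk; any name returned is a table that open_table cannot find.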

Logs and screenshots

No response

Additional Information

  • GraphRAG Version: v1.2.0
  • Operating System: Windows 11
  • Python Version: 3.12.8
  • Related Issues:
HENScience added the bug and triage labels on Jan 16, 2025.