[Question]: Unable to interpret file #4927
Comments
Major error:
Is this on a Mac?
I used this image on x86 Linux and had the same problem: the file could not be parsed. Switching from MySQL to MariaDB did not help; it still happened.
I'm also running on an x86 Linux virtual machine, and I'm unable to use MySQL: the MySQL Docker container fails to start. After switching to MariaDB, it started successfully. Additionally, the original MinIO image RELEASE.2023-12-10T10-51-33Z-cpuv2 failed to start, so I replaced it with RELEASE.2023-12-02T10-51-33Z-cpuv1. With those changes the whole system starts and runs. However, I have no idea how to resolve the errors described in this issue.
Describe your problem
I uploaded a very simple text file, but parsing stalls every time before it reaches 10%, and the progress does not update for dozens of minutes. Checking the log, I found an error. How can I solve it? The server has enough disk space, 10 cores, and 32 GB of memory, and CPU usage is very low while the task is stuck.
log:
2025-02-13 12:02:34,419 INFO 20 task_consumer_0 reported heartbeat: {"name": "task_consumer_0", "now": "2025-02-13T12:02:34.417+08:00", "boot_at": "2025-02-13T11:42:32.798+08:00", "pending": 0, "lag": 0, "done": 0, "failed": 4, "current": null}
2025-02-13 12:02:34,450 INFO 19 172.18.0.3 - - [13/Feb/2025 12:02:34] "POST /v1/document/run HTTP/1.1" 200 -
2025-02-13 12:02:34,471 INFO 20 handle_task begin for task {"id": "5adc36a2e9bf11ef97940242ac120003", "doc_id": "da05ad88e9b911ef9f7b0242ac120004", "from_page": 0, "to_page": 100000000, "retry_count": 0, "kb_id": "b16deaf2e9b911ef8b450242ac120004", "parser_id": "naive", "parser_config": {"auto_keywords": 3, "auto_questions": 1, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}, "chunk_token_num": 128, "delimiter": "\n!?;\u3002\uff1b\uff01\uff1f", "layout_recognize": "DeepDOC", "html4excel": false}, "name": "11111.txt", "type": "doc", "location": "11111.txt", "size": 3735, "tenant_id": "44670308e9b911efa2950242ac120004", "language": "Chinese", "embd_id": "EntropyYue/jina-embeddings-v2-base-zh:160m@Ollama", "pagerank": 2, "kb_parser_config": {"auto_keywords": 3, "auto_questions": 1, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}}, "img2txt_id": "", "asr_id": "", "llm_id": "qwen2.5@Ollama", "update_time": 1739419354436, "task_type": ""}
2025-02-13 12:02:34,481 INFO 19 172.18.0.3 - - [13/Feb/2025 12:02:34] "GET /v1/document/list?kb_id=b16deaf2e9b911ef8b450242ac120004&keywords=&page_size=10&page=1 HTTP/1.1" 200 -
2025-02-13 12:02:34,849 INFO 20 HTTP Request: POST http://kode.work:11434/api/embeddings "HTTP/1.1 200 OK"
2025-02-13 12:02:35,090 INFO 20 HEAD http://es01:9200/ragflow_44670308e9b911efa2950242ac120004 [status:200 duration:0.222s]
2025-02-13 12:02:35,109 INFO 20 From minio(0.018906587000174113) 11111.txt/11111.txt
2025-02-13 12:02:35,119 INFO 20 set_progress(5adc36a2e9bf11ef97940242ac120003), progress: 0.1, progress_msg: 12:02:35 Page(1~100000001): Start to parse.
2025-02-13 12:02:35,130 INFO 20 set_progress(5adc36a2e9bf11ef97940242ac120003), progress: -1, progress_msg: 12:02:35 Page(1~100000001): [ERROR]Internal server error while chunking: failed to acquire lock update_progress
2025-02-13 12:02:35,141 INFO 20 set_progress(5adc36a2e9bf11ef97940242ac120003), progress: -1, progress_msg: 12:02:35 [ERROR][Exception]: failed to acquire lock update_progress
2025-02-13 12:02:35,143 ERROR 20 handle_task got exception for task {"id": "5adc36a2e9bf11ef97940242ac120003", "doc_id": "da05ad88e9b911ef9f7b0242ac120004", "from_page": 0, "to_page": 100000000, "retry_count": 0, "kb_id": "b16deaf2e9b911ef8b450242ac120004", "parser_id": "naive", "parser_config": {"auto_keywords": 3, "auto_questions": 1, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}, "chunk_token_num": 128, "delimiter": "\n!?;\u3002\uff1b\uff01\uff1f", "layout_recognize": "DeepDOC", "html4excel": false}, "name": "11111.txt", "type": "doc", "location": "11111.txt", "size": 3735, "tenant_id": "44670308e9b911efa2950242ac120004", "language": "Chinese", "embd_id": "EntropyYue/jina-embeddings-v2-base-zh:160m@Ollama", "pagerank": 2, "kb_parser_config": {"auto_keywords": 3, "auto_questions": 1, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}}, "img2txt_id": "", "asr_id": "", "llm_id": "qwen2.5@Ollama", "update_time": 1739419354436, "task_type": ""}
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 218, in build_chunks
    cks = chunker.chunk(task["name"], binary=binary, from_page=task["from_page"],
  File "/ragflow/rag/app/naive.py", line 250, in chunk
    callback(0.1, "Start to parse.")
  File "/ragflow/rag/svr/task_executor.py", line 134, in set_progress
    TaskService.update_progress(task_id, d)
  File "/ragflow/.venv/lib/python3.10/site-packages/peewee.py", line 3128, in inner
    return fn(*args, **kwargs)
  File "/ragflow/api/db/services/task_service.py", line 193, in update_progress
    with DB.lock("update_progress", -1):
  File "/ragflow/api/db/db_models.py", line 371, in __enter__
    self.lock()
  File "/ragflow/api/db/db_models.py", line 355, in lock
    raise Exception(f'failed to acquire lock {self.lock_name}')
Exception: failed to acquire lock update_progress
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 626, in handle_task
    do_handle_task(task)
  File "/ragflow/rag/svr/task_executor.py", line 559, in do_handle_task
    chunks = build_chunks(task, progress_callback)
  File "/ragflow/rag/svr/task_executor.py", line 225, in build_chunks
    progress_callback(-1, "Internal server error while chunking: %s" % str(e).replace("'", ""))
  File "/ragflow/rag/svr/task_executor.py", line 134, in set_progress
    TaskService.update_progress(task_id, d)
  File "/ragflow/.venv/lib/python3.10/site-packages/peewee.py", line 3128, in inner
    return fn(*args, **kwargs)
  File "/ragflow/api/db/services/task_service.py", line 193, in update_progress
    with DB.lock("update_progress", -1):
  File "/ragflow/api/db/db_models.py", line 371, in __enter__
    self.lock()
  File "/ragflow/api/db/db_models.py", line 355, in lock
    raise Exception(f'failed to acquire lock {self.lock_name}')
Exception: failed to acquire lock update_progress
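
For readers trying to follow the traceback: both exceptions come from the same place, a context manager entered via `with DB.lock("update_progress", -1)` whose `lock()` method raises when the lock is not obtained. The sketch below is only an illustration of that pattern, not the RAGFlow implementation. It assumes the lock is backed by the SQL function `GET_LOCK` (not confirmed anywhere in this thread), and `NamedLock`, `run_scalar`, `fake_server`, and `try_lock` are hypothetical names. A fake executor stands in for the database so the failure path can be reproduced without a server.

```python
from typing import Callable, Optional


class NamedLock:
    """Hypothetical stand-in for the lock object seen in the traceback."""

    def __init__(self, run_scalar: Callable[[str, tuple], Optional[int]],
                 lock_name: str, timeout: int = 10) -> None:
        # run_scalar executes SQL and returns the first column of the first row.
        self.run_scalar = run_scalar
        self.lock_name = lock_name
        self.timeout = timeout

    def lock(self) -> None:
        # Assumption: the lock is a database named lock. GET_LOCK returns 1 on
        # success, 0 on timeout, and NULL (None here) on error.
        result = self.run_scalar("SELECT GET_LOCK(%s, %s)",
                                 (self.lock_name, self.timeout))
        if result != 1:
            raise Exception(f"failed to acquire lock {self.lock_name}")

    def unlock(self) -> None:
        self.run_scalar("SELECT RELEASE_LOCK(%s)", (self.lock_name,))

    def __enter__(self) -> "NamedLock":
        self.lock()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.unlock()


def fake_server(get_lock_result: Optional[int]):
    """Build a fake executor that answers GET_LOCK with a fixed value."""
    def run_scalar(sql: str, params: tuple) -> Optional[int]:
        return get_lock_result if "GET_LOCK" in sql else 1
    return run_scalar


def try_lock(get_lock_result: Optional[int]) -> str:
    """Enter the lock once and report what happened."""
    try:
        with NamedLock(fake_server(get_lock_result), "update_progress", -1):
            return "acquired"
    except Exception as e:
        return str(e)


if __name__ == "__main__":
    for value in (1, 0, None):
        print(value, "->", try_lock(value))
```

If the assumption holds, any reply other than 1 from the database produces exactly the message in the log. MySQL documents a negative timeout as "wait indefinitely"; whether the MariaDB image used here treats -1 the same way is worth checking directly, for example by running `SELECT GET_LOCK('update_progress', -1);` in the database container and looking at the returned value.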