[Question]: Unable to interpret file #4927

Open
polemp opened this issue Feb 13, 2025 · 5 comments
Labels
question Further information is requested

Comments

@polemp

polemp commented Feb 13, 2025

Describe your problem

I uploaded a very simple text file, but parsing stalls every time before it reaches 10%, and the progress does not update for dozens of minutes. The log shows an error. How can I solve it? The server has enough disk space, 10 cores, and 32 GB of memory, and CPU usage is very low.

log:
2025-02-13 12:02:34,419 INFO 20 task_consumer_0 reported heartbeat: {"name": "task_consumer_0", "now": "2025-02-13T12:02:34.417+08:00", "boot_at": "2025-02-13T11:42:32.798+08:00", "pending": 0, "lag": 0, "done": 0, "failed": 4, "current": null}
2025-02-13 12:02:34,450 INFO 19 172.18.0.3 - - [13/Feb/2025 12:02:34] "POST /v1/document/run HTTP/1.1" 200 -
2025-02-13 12:02:34,471 INFO 20 handle_task begin for task {"id": "5adc36a2e9bf11ef97940242ac120003", "doc_id": "da05ad88e9b911ef9f7b0242ac120004", "from_page": 0, "to_page": 100000000, "retry_count": 0, "kb_id": "b16deaf2e9b911ef8b450242ac120004", "parser_id": "naive", "parser_config": {"auto_keywords": 3, "auto_questions": 1, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}, "chunk_token_num": 128, "delimiter": "\n!?;\u3002\uff1b\uff01\uff1f", "layout_recognize": "DeepDOC", "html4excel": false}, "name": "11111.txt", "type": "doc", "location": "11111.txt", "size": 3735, "tenant_id": "44670308e9b911efa2950242ac120004", "language": "Chinese", "embd_id": "EntropyYue/jina-embeddings-v2-base-zh:160m@Ollama", "pagerank": 2, "kb_parser_config": {"auto_keywords": 3, "auto_questions": 1, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}}, "img2txt_id": "", "asr_id": "", "llm_id": "qwen2.5@Ollama", "update_time": 1739419354436, "task_type": ""}
2025-02-13 12:02:34,481 INFO 19 172.18.0.3 - - [13/Feb/2025 12:02:34] "GET /v1/document/list?kb_id=b16deaf2e9b911ef8b450242ac120004&keywords=&page_size=10&page=1 HTTP/1.1" 200 -
2025-02-13 12:02:34,849 INFO 20 HTTP Request: POST http://kode.work:11434/api/embeddings "HTTP/1.1 200 OK"
2025-02-13 12:02:35,090 INFO 20 HEAD http://es01:9200/ragflow_44670308e9b911efa2950242ac120004 [status:200 duration:0.222s]
2025-02-13 12:02:35,109 INFO 20 From minio(0.018906587000174113) 11111.txt/11111.txt
2025-02-13 12:02:35,119 INFO 20 set_progress(5adc36a2e9bf11ef97940242ac120003), progress: 0.1, progress_msg: 12:02:35 Page(1~100000001): Start to parse.
2025-02-13 12:02:35,130 INFO 20 set_progress(5adc36a2e9bf11ef97940242ac120003), progress: -1, progress_msg: 12:02:35 Page(1~100000001): [ERROR]Internal server error while chunking: failed to acquire lock update_progress
2025-02-13 12:02:35,141 INFO 20 set_progress(5adc36a2e9bf11ef97940242ac120003), progress: -1, progress_msg: 12:02:35 [ERROR][Exception]: failed to acquire lock update_progress
2025-02-13 12:02:35,143 ERROR 20 handle_task got exception for task {"id": "5adc36a2e9bf11ef97940242ac120003", "doc_id": "da05ad88e9b911ef9f7b0242ac120004", "from_page": 0, "to_page": 100000000, "retry_count": 0, "kb_id": "b16deaf2e9b911ef8b450242ac120004", "parser_id": "naive", "parser_config": {"auto_keywords": 3, "auto_questions": 1, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}, "chunk_token_num": 128, "delimiter": "\n!?;\u3002\uff1b\uff01\uff1f", "layout_recognize": "DeepDOC", "html4excel": false}, "name": "11111.txt", "type": "doc", "location": "11111.txt", "size": 3735, "tenant_id": "44670308e9b911efa2950242ac120004", "language": "Chinese", "embd_id": "EntropyYue/jina-embeddings-v2-base-zh:160m@Ollama", "pagerank": 2, "kb_parser_config": {"auto_keywords": 3, "auto_questions": 1, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}}, "img2txt_id": "", "asr_id": "", "llm_id": "qwen2.5@Ollama", "update_time": 1739419354436, "task_type": ""}
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 218, in build_chunks
    cks = chunker.chunk(task["name"], binary=binary, from_page=task["from_page"],
  File "/ragflow/rag/app/naive.py", line 250, in chunk
    callback(0.1, "Start to parse.")
  File "/ragflow/rag/svr/task_executor.py", line 134, in set_progress
    TaskService.update_progress(task_id, d)
  File "/ragflow/.venv/lib/python3.10/site-packages/peewee.py", line 3128, in inner
    return fn(*args, **kwargs)
  File "/ragflow/api/db/services/task_service.py", line 193, in update_progress
    with DB.lock("update_progress", -1):
  File "/ragflow/api/db/db_models.py", line 371, in __enter__
    self.lock()
  File "/ragflow/api/db/db_models.py", line 355, in lock
    raise Exception(f'failed to acquire lock {self.lock_name}')
Exception: failed to acquire lock update_progress

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 626, in handle_task
    do_handle_task(task)
  File "/ragflow/rag/svr/task_executor.py", line 559, in do_handle_task
    chunks = build_chunks(task, progress_callback)
  File "/ragflow/rag/svr/task_executor.py", line 225, in build_chunks
    progress_callback(-1, "Internal server error while chunking: %s" % str(e).replace("'", ""))
  File "/ragflow/rag/svr/task_executor.py", line 134, in set_progress
    TaskService.update_progress(task_id, d)
  File "/ragflow/.venv/lib/python3.10/site-packages/peewee.py", line 3128, in inner
    return fn(*args, **kwargs)
  File "/ragflow/api/db/services/task_service.py", line 193, in update_progress
    with DB.lock("update_progress", -1):
  File "/ragflow/api/db/db_models.py", line 371, in __enter__
    self.lock()
  File "/ragflow/api/db/db_models.py", line 355, in lock
    raise Exception(f'failed to acquire lock {self.lock_name}')
Exception: failed to acquire lock update_progress
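
For readers following the tracebacks: both end in the same place, a context manager entered by `with DB.lock("update_progress", -1)` whose `lock()` call raises when the lock is not obtained. The sketch below reproduces only that control flow with Python's standard `threading.Lock`. The `NamedLock` class, its registry, and `try_update` are hypothetical stand-ins, not RAGFlow's database-backed implementation, and the timeout handling here is an assumption of the sketch; what `-1` means for the real lock is not shown in the log.

```python
import threading

# Hypothetical in-process registry; the lock in the traceback lives in the database.
_locks = {}
_registry_guard = threading.Lock()


class NamedLock:
    """Context manager that raises if the named lock cannot be acquired."""

    def __init__(self, lock_name, timeout=0.0):
        self.lock_name = lock_name
        self.timeout = timeout
        with _registry_guard:
            self._lock = _locks.setdefault(lock_name, threading.Lock())

    def lock(self):
        # In this sketch: timeout > 0 waits up to that many seconds,
        # anything else tries once without waiting.
        if self.timeout > 0:
            acquired = self._lock.acquire(timeout=self.timeout)
        else:
            acquired = self._lock.acquire(blocking=False)
        if not acquired:
            raise Exception(f"failed to acquire lock {self.lock_name}")

    def unlock(self):
        self._lock.release()

    def __enter__(self):
        self.lock()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.unlock()
        return False  # do not swallow exceptions from the with-body


def try_update(lock_name):
    """Return 'ok' if the lock was taken, otherwise the error message."""
    try:
        with NamedLock(lock_name):
            return "ok"
    except Exception as e:
        return str(e)
```

The second traceback in the log follows from the first: the error handler in `build_chunks` reports the failure through the same `set_progress` path, which needs the same lock, so it fails the same way.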

polemp added the question label Feb 13, 2025
@polemp
Author

polemp commented Feb 13, 2025

The main error:
2025-02-13 12:02:35,130 INFO 20 set_progress(5adc36a2e9bf11ef97940242ac120003), progress: -1, progress_msg: 12:02:35 Page(1~100000001): [ERROR]Internal server error while chunking: failed to acquire lock update_progress
2025-02-13 12:02:35,141 INFO 20 set_progress(5adc36a2e9bf11ef97940242ac120003), progress: -1, progress_msg: 12:02:35 [ERROR][Exception]: failed to acquire lock update_progress
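
To pull lines like these out of a longer log, a small filter can help. This is a minimal sketch that assumes the line layout seen in the log above (date, time, level, process id, message); the name `find_problem_lines` is made up for illustration. Note that the lock failures are logged at INFO level with an `[ERROR]` tag in the message, so filtering on the level alone would miss them.

```python
import re

# Assumed layout: "2025-02-13 12:02:35,130 INFO 20 message..."
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<pid>\d+) (?P<msg>.*)$"
)


def find_problem_lines(lines):
    """Yield (timestamp, level, message) for ERROR-level lines
    and for any line whose message contains '[ERROR]'."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # traceback frames and other continuation lines
        if m["level"] == "ERROR" or "[ERROR]" in m["msg"]:
            yield m["ts"], m["level"], m["msg"]
```

The function accepts any iterable of strings, so an open log file can be passed directly.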

@polemp
Author

polemp commented Feb 13, 2025

Image

@KevinHuSh
Collaborator

Are you running this on a Mac?
What about changing MySQL to MariaDB in docker-compose-base.yaml?

@juquxiang

Is it on MAC? What about changing Mysql to mariaDB in docker-compose-base.yaml?

I used this image on x86 Linux and had the same problem: the file could not be parsed. I changed MySQL to MariaDB and it still happened.

@polemp
Author

polemp commented Feb 14, 2025

I'm also running on an x86 Linux virtual machine, and I'm unable to use MySQL. The MySQL Docker container fails to start. After switching to MariaDB, it started successfully. Additionally, the original MinIO image RELEASE.2023-12-10T10-51-33Z-cpuv2 failed to start, so I replaced it with RELEASE.2023-12-02T10-51-33Z-cpuv1. Ultimately, the entire system is now functional. However, I have no idea how to resolve the aforementioned errors.
