Celery worker/consumer loses connection to RabbitMQ broker after 20 to 30 minutes #434

griffinschulte · 2024-05-29T16:33:03Z

Hi folks,

I'm developing a Flask application that utilizes Celery with RabbitMQ as the broker. Below are the version details of my dependencies:

Flask=3.0.3
amqp=5.2.0
celery=5.4.0
kombu=5.3.7
RabbitMQ=3.12.13

I'm running into an issue where Celery workers handling long-running tasks (> 30 minutes) appear to lose their connection to RabbitMQ after about 20-30 minutes of execution. The worker keeps executing the task; however, once the task completes, my Celery worker logs the following error and no longer picks up any new tasks:

[2024-05-29 09:36:19,714: INFO/MainProcess] Task project.tasks.validation_workflow[0f2dda43-4a09-41a9-ad52-2edb22721d57] succeeded in 1800.813s: None
[2024-05-29 09:36:19,717: CRITICAL/MainProcess] Couldn't ack 1, reason:SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2396)')
Traceback (most recent call last):
  File "C:\Users\User_Name\project\venv\lib\site-packages\kombu\message.py", line 131, in ack_log_error
    self.ack(multiple=multiple)
  File "C:\Users\User_Name\project\venv\lib\site-packages\kombu\message.py", line 126, in ack
    self.channel.basic_ack(self.delivery_tag, multiple=multiple)
  File "C:\Users\User_Name\project\venv\lib\site-packages\amqp\channel.py", line 1407, in basic_ack
    return self.send_method(
  File "C:\Users\User_Name\project\venv\lib\site-packages\amqp\abstract_channel.py", line 70, in send_method
    conn.frame_writer(1, self.channel_id, sig, args, content)
  File "C:\Users\User_Name\project\venv\lib\site-packages\amqp\method_framing.py", line 186, in write_frame
    write(buffer_store.view[:offset])
  File "C:\Users\User_Name\project\venv\lib\site-packages\amqp\transport.py", line 347, in write
    self._write(s)
  File "C:\Users\User_Name\project\venv\lib\site-packages\amqp\transport.py", line 597, in _write
    n = write(s)
  File "C:\Program Files\Python310\lib\ssl.py", line 1149, in write
    return self._sslobj.write(data)
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2396)

As a result, the task is returned to the queue and starts executing again once I restart the worker. Around 20-30 minutes into execution, I can see the consumer disappear from the Celery queue. The Celery worker reports no errors until the task finishes. If the task succeeds, it is shown as successful in both the Celery worker console and Flower, but the message is never acknowledged; the error above appears immediately after the success message in the worker console. I initially thought this was a RabbitMQ consumer_timeout issue, but after raising it from the default of 30 minutes to 10 hours, I'm still getting the same error.
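RabbitMQ's consumer_timeout is specified in milliseconds, and its default of 1800000 corresponds to 30 minutes. The log above reports a runtime of 1800.813 s, which lands just past that default. The sketch below is a minimal illustration, assuming the timeout is set in rabbitmq.conf, of the unit conversion and of comparing a task's runtime against the timeout. The helper names are hypothetical, and the comparison alone does not confirm what is actually closing the connection.

```python
# Hypothetical helpers: RabbitMQ's consumer_timeout (rabbitmq.conf) takes milliseconds,
# e.g. "consumer_timeout = 36000000" for 10 hours.
MS_PER_SECOND = 1_000
SECONDS_PER_MINUTE = 60
SECONDS_PER_HOUR = 3_600


def minutes_to_ms(minutes: float) -> int:
    """Convert minutes to the millisecond value consumer_timeout expects."""
    return int(minutes * SECONDS_PER_MINUTE * MS_PER_SECOND)


def hours_to_ms(hours: float) -> int:
    """Convert hours to the millisecond value consumer_timeout expects."""
    return int(hours * SECONDS_PER_HOUR * MS_PER_SECOND)


def exceeds_timeout(task_runtime_s: float, consumer_timeout_ms: int) -> bool:
    """Return True if a delivery left unacknowledged for the task's runtime would outlive the timeout."""
    return task_runtime_s * MS_PER_SECOND > consumer_timeout_ms


DEFAULT_TIMEOUT_MS = minutes_to_ms(30)  # RabbitMQ's default consumer_timeout
RAISED_TIMEOUT_MS = hours_to_ms(10)     # the value tried in this issue
observed_runtime_s = 1800.813           # "succeeded in 1800.813s" from the worker log

print(DEFAULT_TIMEOUT_MS, RAISED_TIMEOUT_MS)
print(exceeds_timeout(observed_runtime_s, DEFAULT_TIMEOUT_MS))
print(exceeds_timeout(observed_runtime_s, RAISED_TIMEOUT_MS))
```

With the default, the observed runtime exceeds the timeout; with the raised value it does not, which is consistent with the error persisting after the change pointing elsewhere.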

I'm having a hard time identifying what may be the issue here. Any help would be greatly appreciated.

Thank you!
