Restart Worker when Celery fails #135

Open
Denis-Gavrielov opened this issue Sep 9, 2019 · 6 comments

Comments

@Denis-Gavrielov
Contributor

There seems to be an issue with Celery where the worker stops listening to requests if a Celery heartbeat is missed.
These are some threads that I found where people seem to have had similar issues:
celery/celery#4997
celery/celery#4185
analyseether/ether_sql#42
celery/celery#2296

Right now, there is a cronjob that checks every hour if the worker's log shows a missed heartbeat and restarts the container in that case. This works fine for now, but there might be more elegant solutions for the future.

@waqas-ali-pk

@Denis-Gavrielov That's a great idea indeed, to run a cronjob every hour and check the status of the workers. Are you restarting only that specific worker/container, or restarting everything?

Can you share the cronjob details/code and how you have implemented it for your case? I want to implement the same solution, so it would be a great help if you could share it.

@Denis-Gavrielov
Contributor Author

Hi @waqas-ali-pk
It has been a while since I last worked with this project. If there is still a cronjob configured, then it is probably set up via Ansible. Let me know if that was helpful.

@andronat
Member

andronat commented Dec 31, 2020

Hey @waqas-ali-pk I think this is what you are looking for. Honestly, there is another cronjob that I've been using recently, but lately I'm finding it very difficult to find the time to make a proper PR.

Nevertheless, I'll just paste what I'm using here in a quick and dirty way:

- name: "Restart {{ kleeweb_worker_container }} if celery heartbeat missed"
  cron:
    name: "Restart {{ kleeweb_worker_container }} if celery heartbeat missed"
    minute: "30"
    job: 'if sudo docker logs --tail=1 {{ kleeweb_worker_container }} | grep -q "missed heartbeat from celery"; then sudo docker restart {{ kleeweb_worker_container }}; fi'
    user: klee
  when: not ci

- name: "Kill all remaining klee containers every day"
  cron:
    name: "Kill all remaining klee containers every day"
    minute: "0"
    hour: "0"
    job: "sudo docker ps --filter ancestor=klee/klee -q | xargs sudo docker kill"
    user: klee
  when: not ci

- name: "Restart {{ kleeweb_worker_container }} every day"
  cron:
    name: "Restart {{ kleeweb_worker_container }} every day"
    minute: "0"
    hour: "0"
    job: "sudo docker restart {{ kleeweb_worker_container }}"
    user: klee
  when: not ci
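
For anyone who would rather run the heartbeat check from Python than from a shell one-liner, here is a minimal sketch of the same logic as the first cron job above. It is not part of the project: the function names are made up for illustration, and it assumes the `docker` CLI is on the `PATH` and that the user running it is allowed to call it. Unlike the cron job, it also reads the container's stderr stream, on the assumption that the Celery worker may be logging there.

```python
import subprocess

# Same string the cron job greps for.
HEARTBEAT_MARKER = "missed heartbeat from celery"


def missed_heartbeat(log_tail: str) -> bool:
    """Return True if the last non-empty log line reports a missed heartbeat."""
    lines = [line for line in log_tail.splitlines() if line.strip()]
    return bool(lines) and HEARTBEAT_MARKER in lines[-1]


def restart_if_heartbeat_missed(container: str, run=subprocess.run) -> bool:
    """Restart `container` when its most recent log line shows a missed heartbeat.

    `run` is injectable so the logic can be tested without Docker.
    """
    logs = run(
        ["docker", "logs", "--tail", "1", container],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # assumption: the worker may log to stderr
        text=True,
        check=True,
    )
    if not missed_heartbeat(logs.stdout):
        return False
    run(["docker", "restart", container], check=True)
    return True
```

Like the cron job, this only looks at the most recent log line, so a missed heartbeat that is followed by any other output will not trigger a restart.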

@waqas-ali-pk

Hi @andronat Thanks for sharing this!! I also have another scenario: sometimes the Celery worker stops without showing any error message. How can that case be handled? I really appreciate your help on this.

How can we restart only that specific worker if we do not use Docker?

@andronat
Member

andronat commented Jan 1, 2021

> I also have another scenario: sometimes the Celery worker stops without showing any error message. How can that case be handled?

Hm, well, in general I was hoping to find time to upgrade to the latest Celery, as it seems to be more robust (e.g. heartbeats), but I never managed. PRs are always welcome 😃. So there are definitely two things I can think of: 1) you could put in the time to upgrade to the latest Celery, or 2) you could pick a fixed time at which you just blindly restart the workers. Not all of them together, but maybe with a rolling strategy.
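
As a rough illustration of option 2, a rolling restart could look something like the sketch below. Everything here is hypothetical (the function name, the worker names, the 60 second pause); the actual restart command is passed in, since it depends on how the workers are run.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    workers: Iterable[str],
    restart: Callable[[str], None],
    pause_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Restart workers one at a time, pausing between consecutive restarts.

    Returns the names of the workers that were restarted, in order.
    """
    restarted = []
    for index, worker in enumerate(workers):
        if index > 0:
            # Give the previous worker time to come back before touching the next.
            sleep(pause_seconds)
        restart(worker)
        restarted.append(worker)
    return restarted
```

With Docker, for example, `restart` could be something like `lambda name: subprocess.run(["docker", "restart", name], check=True)`, so that at any moment at most one worker is down.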

> How can we restart only that specific worker if we do not use Docker?

Well that depends on how you run the celery project on your infrastructure. Docker in general is the easy way out and I highly recommend it.
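
Without Docker, one possible approach (only a sketch, not something this project does) is to ping the workers through Celery's remote control API and restart only the ones that stay silent, using whatever process supervisor you have. The helper names and worker names below are hypothetical; `app` is assumed to be your Celery application object, and `app.control.ping()` is expected to return a list of single-entry dicts keyed by worker name. Whether a worker that has stopped after a missed heartbeat still answers pings is something you would need to verify on your own setup.

```python
from typing import Dict, Iterable, List


def silent_workers(expected: Iterable[str], replies: List[Dict[str, dict]]) -> List[str]:
    """Return the expected worker names that did not answer a ping.

    `replies` has the shape returned by Celery's `app.control.ping()`,
    e.g. [{"celery@host1": {"ok": "pong"}}].
    """
    answered = {name for reply in replies for name in reply}
    return [name for name in expected if name not in answered]


def check_workers(app, expected: Iterable[str], timeout: float = 5.0) -> List[str]:
    """Ping the expected workers through a Celery app and list the silent ones."""
    expected = list(expected)
    replies = app.control.ping(destination=expected, timeout=timeout)
    return silent_workers(expected, replies)
```

The returned names can then be handed to your supervisor of choice (systemd, supervisord, and so on) to restart just those workers.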

@waqas-ali-pk

@andronat Thanks!! This is helpful.
