
uWSGI HTTP Process Continuously Respawning #1649

Open
0xPierre opened this issue Feb 2, 2025 · 6 comments

Comments


0xPierre commented Feb 2, 2025

Hello,

Description

I am trying to deploy an instance of internet.nl, but the internetnl-prod-app container running uWSGI keeps getting killed due to an out-of-memory (OOM) condition.

I am running a fresh VPS:

  • OS: Ubuntu 24.10
  • vCPU: 4
  • Memory: 8 GB

Journalctl logs

Feb 02 23:26:57 vps-7de75884 kernel: Memory cgroup out of memory: Killed process 833379 (uwsgi) total-vm:8497512kB, anon-rss:5230280kB, file-rss:1216kB, shmem-rss:0kB, UID:65534 pgtables:10340kB oom_score_adj:0
Feb 02 23:26:57 vps-7de75884 systemd[1]: docker-8fd32c8348bf3c6e9ea7a58da0c93f468d6c2952586d82769b5529792801867f.scope: A process of this unit has been killed by the OOM killer.
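The kernel line above already tells you how much memory the process held when it was killed. A minimal sketch for extracting that figure; in practice you would pipe `journalctl -k` in, but here the sample line from this report is embedded so the snippet is self-contained:

```shell
# Pull the anonymous RSS of an OOM-killed process out of a kernel log line.
sample='Feb 02 23:26:57 vps-7de75884 kernel: Memory cgroup out of memory: Killed process 833379 (uwsgi) total-vm:8497512kB, anon-rss:5230280kB, file-rss:1216kB, shmem-rss:0kB, UID:65534 pgtables:10340kB oom_score_adj:0'
echo "$sample" | grep -o 'anon-rss:[0-9]*kB'
# prints: anon-rss:5230280kB  (~5 GiB resident at the moment of the kill)
```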

Logs of internetnl-prod-app

*** Starting uWSGI 2.0.22 (64bit) on [Sun Feb  2 23:12:37 2025] ***
compiled with version: 10.2.1 20210110 on 30 January 2025 09:42:25
os: Linux-6.11.0-12-generic #13-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 21 20:03:13 UTC 2024
nodename: app
machine: x86_64
clock source: unix
detected number of CPU cores: 4
current working directory: /app
detected binary path: /usr/local/bin/uwsgi
!!! no internal routing support, rebuild with pcre support !!!
your memory page size is 4096 bytes
detected max file descriptor number: 1073741816
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uWSGI http bound on 0.0.0.0:8080 fd 4
uwsgi socket 0 bound to TCP address 127.0.0.1:40435 (port auto-assigned) fd 3
Python version: 3.9.2 (default, Dec  1 2024, 12:12:57)  [GCC 10.2.1 20210110]
*** Python threads support is disabled. You can enable it with --enable-threads ***
Python main interpreter initialized at 0x5d3998d4e220
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 145840 bytes (142 KB) for 1 cores
*** Operational MODE: single process ***
Single domain scan enabled, batch scanning and API not available.
WSGI app 0 (mountpoint='') ready in 3 seconds on interpreter 0x5d3998d4e220 pid: 1 (default app)
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 1)
spawned uWSGI worker 1 (pid: 7, cores: 1)
*** Stats server enabled on 127.0.0.1:1717 fd: 17 ***
spawned uWSGI http 1 (pid: 8)
respawned uWSGI http 1 (pid: 15)
respawned uWSGI http 1 (pid: 16)
respawned uWSGI http 1 (pid: 17)
respawned uWSGI http 1 (pid: 18)
respawned uWSGI http 1 (pid: 19)
respawned uWSGI http 1 (pid: 20)
respawned uWSGI http 1 (pid: 21)
respawned uWSGI http 1 (pid: 22)
respawned uWSGI http 1 (pid: 23)
respawned uWSGI http 1 (pid: 24)
[etc...]

I already tried running only one process of uWSGI.

Any idea how to investigate it or fix it?

@0xPierre
Author

Hi,
I figured it out: after letting it run for several days, it finally worked.
Does anyone have an idea why it took so long?

I kept getting this type of error over and over in the routinator instance:

[WARN] rsync://rpki.miralium.net/repo/: rsync: getaddrinfo: rpki.miralium.net 873: Name has no usable address
[WARN] rsync://rpki.miralium.net/repo/: rsync error: error in socket IO (code 10) at clientserver.c(139) [Receiver=3.4.0]
[WARN] rsync://krill.accuristechnologies.ca/repo/Accuris-Technologies/0/8A15107195E63966ABA1997AD31382979C75F736.cer: no valid manifest rsync://rpki.miralium.net/repo/Miralium-Research-RPKI-CA-A1/0/8A15107195E63966ABA1997AD31382979C75F736.mft found.
[WARN] rsync://subrepo.wildtky.com/repo/: rsync: [Receiver] failed to connect to subrepo.wildtky.com (23.180.200.177): Host is unreachable (113)
[WARN] rsync://subrepo.wildtky.com/repo/: rsync error: error in socket IO (code 10) at clientserver.c(139) [Receiver=3.4.0]
[WARN] rsync://repodepot.wildtky.com/repo/WTAFarms-Jan2025/0/A0D77953B7619F183E3407CF1E95764F0A641BDD.cer: no valid manifest rsync://subrepo.wildtky.com/repo/SubCAJan2025/0/A0D77953B7619F183E3407CF1E95764F0A641BDD.mft found.
[WARN] rsync://rpki.cc/repo/MythicalKitten/12/7BBD0E669176F6F2E8BB8FC3104A8D23435175AE.cer: no valid manifest rsync://krill.ca-bc-01.ssmidge.xyz/repo/SsmidgeLLC/1/7BBD0E669176F6F2E8BB8FC3104A8D23435175AE.mft found.
[WARN] rsync://cloudie-repo.rpki.app/repo/CLOUDIE-RPKI/0/73236D2CCA0EE5A74A9C40FFF721835444703ABE.cer: no valid manifest rsync://rpki.uz/repo/pedjoeang-digital-networks/4/73236D2CCA0EE5A74A9C40FFF721835444703ABE.mft found.
[WARN] rsync://rsync.paas.rpki.ripe.net/repository/0c70401c-7f41-4a6b-9434-cc80dca093e6/2/3B7184989F76A03708039261134F384B50D011BB.cer: no valid manifest rsync://krill.immarket.space/repo/imh/0/3B7184989F76A03708039261134F384B50D011BB.mft found.
[WARN] rsync://rpki.cc/repo/MythicalKitten/1/4173C015E8E1FED254D4938B7E69CB256CCF6936.cer: no valid manifest rsync://krill.ca-bc-01.ssmidge.xyz/repo/AS199177/0/4173C015E8E1FED254D4938B7E69CB256CCF6936.mft found.
[WARN] rsync://rpki-repository.haruue.net/repo/YC3254-RPKI/2/3F0AC25D352C83DA8307594B98ED061BE8489682.mft: certificate has expired.
[WARN] rsync://cloudie-repo.rpki.app/repo/CLOUDIE-RPKI/0/3F0AC25D352C83DA8307594B98ED061BE8489682.cer: no valid manifest rsync://rpki-repository.haruue.net/repo/YC3254-RPKI/2/3F0AC25D352C83DA8307594B98ED061BE8489682.mft found.
[WARN] RRDP https://krill.stonham.uk/rrdp/notification.xml: HTTP status server error (522 <unknown status code>) for url (https://krill.stonham.uk/rrdp/notification.xml)
[WARN] rsync://krill.stonham.info/repo/: rsync error: timeout waiting for daemon connection (code 35) at socket.c(278) [Receiver=3.4.0]
[WARN] rsync://cloudie-repo.rpki.app/repo/CLOUDIE-RPKI/0/635C29FF238CC286AC1625A68EFCC04E2E460171.cer: no valid manifest rsync://krill.stonham.info/repo/Stonham/1/635C29FF238CC286AC1625A68EFCC04E2E460171.mft found.

@bwbroersma
Collaborator

bwbroersma commented Feb 14, 2025

Note we are currently also investigating an OOM issue, which is indeed an app-container RAM spike (for us daily, just after 03:00, which seems related to activity in the cron container):


8GB is a bit tight, but should work, Routinator can be a bit resource demanding, see:

Which internet.nl version are you running, and what is the load?

If your load is light, you could use a public Routinator instance by overriding ROUTINATOR_URL in local.env, and removing routinator from COMPOSE_PROFILES in your local.env. Note that the profiles changed between v1.9.0 and main; the following is only valid for main:

The Routinator instance is an RPKI Relying Party implementation that downloads
and verifies RPKI data. The check connects to the HTTP API to find ROAs.
This is configured in the `ROUTINATOR_URL` setting or environment variable.
There are some publicly available instances that can be used for local
testing, like `https://rpki-validator.ripe.net/api/v1/validity`. For large
scale or production setups, you should run your own instance.

# use public routinator for development so we don't have to let routinator fetch all data
ROUTINATOR_URL=https://rpki-validator.ripe.net/api/v1/validity

# Disable (do not enable) the `routinator` profile which is enabled by default in `defaults.env`.
# Routinator is slow to start initially and requires a lot of resources which is not ideal for
# development environments.
COMPOSE_PROFILES=
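If you do point at a public instance, it is worth sanity-checking the endpoint first. A minimal sketch that builds the URL the RPKI check would query; the AS number and prefix below (AS3333 / 193.0.0.0/21, RIPE NCC's own resources) are sample inputs chosen for illustration, not taken from this issue:

```shell
# Build a validity-API request URL from the overridden base URL.
ROUTINATOR_URL=https://rpki-validator.ripe.net/api/v1/validity
asn="AS3333"; prefix="193.0.0.0/21"   # sample inputs
url="${ROUTINATOR_URL}/${asn}/${prefix}"
echo "$url"
```

Fetching that URL with curl should return a JSON validity verdict; a timeout or non-200 response means the override will not help.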

v1.9.0 also needs cron (and connectiontest if you have that set up).

@0xPierre
Author

0xPierre commented Feb 14, 2025

8GB is a bit tight, but should work, Routinator can be a bit resource demanding, see:

I tried on an instance with 32 GB of RAM. It does the same; uwsgi gets killed at around 7-8 GB.
I also removed routinator from COMPOSE_PROFILES.

Which internet.nl version are you running, and what is the load?

I am using v1.9.0

@bwbroersma
Collaborator

How often are you seeing an Out Of Memory (OOM) kill?
Currently we're having the issue once a day at internet.nl, which is also running 1.9.0: the app container uses about 1.4 GiB of memory, then spikes to 5 GiB at 03:00 and gets OOM killed. The testing instance running the main branch experiences the same problem.

@0xPierre
Author

Hi,
I am experiencing the OOM kill every second; my instance is still not accessible after 3 days.

root@vps-7de75884:/opt/Internet.nl# docker logs internetnl-prod-app-1 -f --tail 10
respawned uWSGI http 1 (pid: 66948)
respawned uWSGI http 1 (pid: 66949)
respawned uWSGI http 1 (pid: 66950)
respawned uWSGI http 1 (pid: 66951)
respawned uWSGI http 1 (pid: 66952)
respawned uWSGI http 1 (pid: 66953)
respawned uWSGI http 1 (pid: 66954)
respawned uWSGI http 1 (pid: 66955)
respawned uWSGI http 1 (pid: 66956)
respawned uWSGI http 1 (pid: 66957)
respawned uWSGI http 1 (pid: 66958)
respawned uWSGI http 1 (pid: 66959)
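The respawn storm can be quantified straight from the container log: a healthy instance logs the http router being spawned once, so any sustained count of `respawned` lines indicates the crash loop. A small self-contained sketch, with sample lines copied from the output above (in practice you would pipe `docker logs internetnl-prod-app-1` in):

```shell
# Count http-router respawns in a captured log excerpt.
logs='respawned uWSGI http 1 (pid: 66948)
respawned uWSGI http 1 (pid: 66949)
respawned uWSGI http 1 (pid: 66950)'
printf '%s\n' "$logs" | grep -c '^respawned uWSGI http'
# prints: 3
```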

@0xPierre
Author

Finally, I downgraded to Ubuntu 22.04 and it works perfectly, so there appears to be a problem when running on Ubuntu 24.10.
Thanks
