Culler not working when there are too many users? #88
After looking into the issue I found the problem. During pagination, `req.url = next_info["url"]`, which throws an error during fetching:

```
[I 250123 20:02:44 __init__:156] Fetching page 2 https://<my_domain>/hub/api/users?state=ready&offset=50&limit=50
[E 250123 20:02:44 ioloop:770] Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOMainLoop object at 0xf73e1c04f580>>, <Task finished name='Task-1' coro=<cull_idle() done, defined at /usr/local/lib/python3.10/dist-packages/jupyterhub_idle_culler/__init__.py:78> exception=ConnectionRefusedError(111, 'Connection refused')>)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tornado/ioloop.py", line 750, in _run_callback
    ret = callback()
  File "/usr/local/lib/python3.10/dist-packages/tornado/ioloop.py", line 774, in _discard_future_result
    future.result()
  File "/usr/local/lib/python3.10/dist-packages/jupyterhub_idle_culler/__init__.py", line 436, in cull_idle
    async for user in fetch_paginated(req):
  File "/usr/local/lib/python3.10/dist-packages/jupyterhub_idle_culler/__init__.py", line 142, in fetch_paginated
    response = await resp_future
  File "/usr/local/lib/python3.10/dist-packages/jupyterhub_idle_culler/__init__.py", line 124, in fetch
    return await client.fetch(req)
ConnectionRefusedError: [Errno 111] Connection refused
```

A dummy fix for my use case would be:

```python
req.url = next_info["url"].replace(
    "<my_url>",
    "<my_url>:5081",
)
```

Not ideal, but I was wondering: is this something JupyterHub should return properly, or is the problem in the culler? I believe it could be JupyterHub, since if we have the hub on `https://<my_domain>:5081`, the first call would include the port, and the pagination would return `https://<my_domain>/hub/api/users?state=ready&offset=50&limit=50` (without the port) as `next`.
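A less brittle variant of that workaround would be to rebuild the next-page URL from the originally configured one, rather than string-replacing the domain. A minimal sketch, reusing the `req`/`next_info` names from `fetch_paginated` above; `merge_next_url` is a hypothetical helper, not part of jupyterhub-idle-culler:

```python
from urllib.parse import urlparse, urlunparse

def merge_next_url(configured_url: str, next_url: str) -> str:
    """Keep scheme/host/port from the configured hub API URL;
    take path and query from the hub's paginated `next` URL."""
    base = urlparse(configured_url)  # e.g. https://<my_domain>:5081/hub/api/users?...
    nxt = urlparse(next_url)         # e.g. https://<my_domain>/hub/api/users?offset=50&limit=50
    return urlunparse((base.scheme, base.netloc, nxt.path, nxt.params, nxt.query, nxt.fragment))

# In fetch_paginated, instead of `req.url = next_info["url"]`:
# req.url = merge_next_url(req.url, next_info["url"])
```

This keeps the port (and scheme) from the URL passed via `--url`, so it would still work even if the hub reports its public URL without the internal port.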
Please can you tell us your Z2JH version, and share your full configuration?
We are using Z2JH version 2.0.0. As for our configuration:
For our culler config:
Installed versions that I believe matter:
Bug description
The culler is not culling, despite the last activity clearly exceeding the maximum idle time allowed.
Current setup, from our k8s logs:

```
Starting service 'cull-idle': ['python3', '-m', 'jupyterhub_idle_culler', '--url=https://<url>:5081/hub/api', '--timeout=14400', '--cull-every=600', '--concurrency=10']
```
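For context, a hedged sketch of how a `cull-idle` service with these flags is typically registered in `jupyterhub_config.py`; Z2JH generates something equivalent from its `cull` values, so the actual generated config may differ:

```python
# jupyterhub_config.py - illustrative sketch, not the Z2JH-generated config
c.JupyterHub.services = [
    {
        "name": "cull-idle",
        "command": [
            "python3", "-m", "jupyterhub_idle_culler",
            "--url=https://<url>:5081/hub/api",
            "--timeout=14400",    # cull servers idle for more than 4 hours
            "--cull-every=600",   # run the check every 10 minutes
            "--concurrency=10",   # limit concurrent API requests
        ],
    },
]
# With JupyterHub >= 2.0, the culler also needs a role granting these scopes:
c.JupyterHub.load_roles = [
    {
        "name": "idle-culler",
        "scopes": ["list:users", "read:users:activity", "read:servers", "delete:servers"],
        "services": ["cull-idle"],
    },
]
```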
We observe two things:
In the second case the `last_activity` was over 3 days ago, which is way more than the timeout of 14400 seconds (4 hours).
Example obtained from `/api/users/<user>`:
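For comparison, a quick way to pull such a user model yourself; a sketch, with `<url>`, `<user>`, and the token as placeholders:

```python
import json
import urllib.request

req = urllib.request.Request(
    "https://<url>:5081/hub/api/users/<user>",
    headers={"Authorization": "token <api_token>"},  # token needs read access to users
)
with urllib.request.urlopen(req) as resp:
    user = json.load(resp)

# `last_activity` is the timestamp the culler compares against --timeout
print(user["last_activity"])
```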
How to reproduce
I don't know how to reproduce this. We have JupyterHub deployed in several datacenters, but it only fails in our busiest one, with over 150 servers running simultaneously (not sure if that's relevant).
I'm afraid the service is blocked / not running as expected because there are too many users.
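One way to test that hypothesis directly (a sketch, assuming an API token that can list users; the pagination media type is the one the culler itself requests) is to ask the hub for the paginated user list and inspect the `next` URL it hands back:

```python
import json
import urllib.request

req = urllib.request.Request(
    "https://<url>:5081/hub/api/users?state=ready&limit=50",
    headers={
        "Authorization": "token <api_token>",
        "Accept": "application/jupyterhub-pagination+json",  # opt in to paginated responses
    },
)
with urllib.request.urlopen(req) as resp:
    page = json.load(resp)

nxt = page["_pagination"]["next"]
# With more than 50 ready servers this should be non-null; check whether
# its URL still carries the :5081 port the culler was configured with.
print(nxt and nxt["url"])
```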
Expected behaviour
To cull the servers after they time out.
Actual behaviour
It is not culling the servers after the timeout.
Your personal set up
Latest versions of JupyterHub and jupyterhub-idle-culler.
Logs
I can see the following logs from the culler:
⚠ Not sure if it's important, but the "Fetching page 2" log doesn't show the port, just the URL.
Sometimes I also get: