
IMPORTANT: Worker starts successfully but then gets killed by JupyterHub #194

Closed
Hoeze opened this issue Nov 5, 2020 · 3 comments

Hoeze commented Nov 5, 2020

Hi, I've run into a serious problem:
The worker starts normally, but JupyterHub kills it right away.
Logs of the worker:

+ batchspawner-singleuser jupyterhub-singleuser --ip=0.0.0.0 --NotebookApp.default_url=/lab
[I 2020-11-05 13:22:43.763 SingleUserNotebookApp manager:81] [nb_conda_kernels] enabled, 18 kernels found
[I 2020-11-05 13:22:44.808 SingleUserNotebookApp extension:162] JupyterLab extension loaded from /opt/modules/i12g/anaconda/envs/jupyterhub/lib/python3.7/site-packages/jupyterlab
[I 2020-11-05 13:22:44.808 SingleUserNotebookApp extension:163] JupyterLab application directory is /opt/modules/i12g/anaconda/envs/jupyterhub/share/jupyter/lab
[I 2020-11-05 13:22:44.988 SingleUserNotebookApp __init__:34] [Jupytext Server Extension] Deriving a JupytextContentsManager from LargeFileManager
[I 2020-11-05 13:22:44.989 SingleUserNotebookApp singleuser:561] Starting jupyterhub-singleuser server version 1.1.0
[I 2020-11-05 13:22:44.996 SingleUserNotebookApp notebookapp:2209] Serving notebooks from local directory: /data/nasif12/home_if12/hoelzlwi
[I 2020-11-05 13:22:44.996 SingleUserNotebookApp notebookapp:2209] Jupyter Notebook 6.1.4 is running at:
[I 2020-11-05 13:22:44.996 SingleUserNotebookApp notebookapp:2209] http://[...]:50758/
[I 2020-11-05 13:22:44.996 SingleUserNotebookApp notebookapp:2210] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[I 2020-11-05 13:22:45.010 SingleUserNotebookApp singleuser:542] Updating Hub with activity every 300 seconds
slurmstepd: error: *** JOB 377371 ON [...] CANCELLED AT 2020-11-05T13:23:39 ***

Logs of JupyterHub:

[I 2020-11-05 13:27:39.649 JupyterHub log:181] 302 POST /jupyter/hub/spawn/<user> -> /jupyter/hub/spawn-pending/<user> (<user>@192.168.16.11) 1013.71ms
[I 2020-11-05 13:27:39.761 JupyterHub pages:398] <user> is pending spawn
[I 2020-11-05 13:27:39.771 JupyterHub log:181] 200 GET /jupyter/hub/spawn-pending/<user> (<user>@192.168.16.11) 29.93ms
[I 2020-11-05 13:27:41.587 JupyterHub log:181] 200 POST /jupyter/hub/api/batchspawner (<user>@192.168.16.13) 24.47ms
[I 2020-11-05 13:27:43.792 JupyterHub log:181] 200 GET /jupyter/hub/api (@192.168.16.13) 2.84ms
[I 2020-11-05 13:27:43.843 JupyterHub log:181] 200 POST /jupyter/hub/api/users/<user>/activity (<user>@192.168.16.13) 36.08ms
[W 2020-11-05 13:27:48.647 JupyterHub base:995] User <user> is slow to start (timeout=10)
[W 2020-11-05 13:28:38.764 JupyterHub user:684] <user>'s server failed to start in 60 seconds, giving up
[I 2020-11-05 13:28:39.153 JupyterHub batchspawner:408] Stopping server job 377372
[I 2020-11-05 13:28:39.155 JupyterHub batchspawner:293] Cancelling job 377372: sudo -E -u <user> scancel 377372
[W 2020-11-05 13:28:51.948 JupyterHub batchspawner:419] Notebook server job 377372 at node03:0 possibly failed to terminate
[E 2020-11-05 13:28:52.010 JupyterHub gen:624] Exception in Future <Task finished coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at /opt/modules/i12g/anaconda/envs/jupyterhub/lib/python3.7/site-packages/jupyterhub/handlers/base.py:884> exception=TimeoutError('Timeout')> after timeout
    Traceback (most recent call last):
      File "/opt/modules/i12g/anaconda/envs/jupyterhub/lib/python3.7/site-packages/tornado/gen.py", line 618, in error_callback
        future.result()
      File "/opt/modules/i12g/anaconda/envs/jupyterhub/lib/python3.7/site-packages/jupyterhub/handlers/base.py", line 891, in finish_user_spawn
        await spawn_future
      File "/opt/modules/i12g/anaconda/envs/jupyterhub/lib/python3.7/site-packages/jupyterhub/user.py", line 708, in spawn
        raise e
      File "/opt/modules/i12g/anaconda/envs/jupyterhub/lib/python3.7/site-packages/jupyterhub/user.py", line 607, in spawn
        url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
    tornado.util.TimeoutError: Timeout
    
[I 2020-11-05 13:28:52.019 JupyterHub log:181] 200 GET /jupyter/hub/api/users/<user>/server/progress (<user>@192.168.16.11) 71227.51ms

I am using Python 3.7, JupyterHub 1.2, batchspawner 1.0.1, and the current git version of wrapspawner.

What could be causing this problem?
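For reference, the 60-second limit in the hub log ("failed to start in 60 seconds, giving up") corresponds to JupyterHub's Spawner.start_timeout setting, while "slow to start (timeout=10)" appears to be the hub's slow-spawn notice rather than a hard failure. Below is a minimal, illustrative sketch of raising the timeouts in jupyterhub_config.py; the values are placeholders, not my actual configuration.

# jupyterhub_config.py -- illustrative sketch only, values are placeholders
c = get_config()  # provided by JupyterHub when it loads this file

# The hard 60-second limit seen in the hub log above; give the batch job
# more time to start before the hub gives up and cancels it with scancel.
c.Spawner.start_timeout = 300

# Time to wait for the single-user server to answer over HTTP once started.
c.Spawner.http_timeout = 120

In my case the single-user server is clearly up and reporting activity well before the hub gives up, which suggests the hub never registers the server as started rather than the job simply being slow.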

Hoeze added the bug label Nov 5, 2020
Hoeze commented Nov 5, 2020

I think it's a problem with ProfilesSpawner. When I use batchspawner directly, it works.
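For context, a rough sketch of the two setups (the Slurm resource options below are illustrative, not my exact profiles):

# jupyterhub_config.py -- rough sketch, resource values are illustrative
c = get_config()  # provided by JupyterHub when it loads this file

# Works: use batchspawner directly
# c.JupyterHub.spawner_class = 'batchspawner.SlurmSpawner'

# Fails as described above: wrap batchspawner in wrapspawner's ProfilesSpawner
c.JupyterHub.spawner_class = 'wrapspawner.ProfilesSpawner'
c.ProfilesSpawner.profiles = [
    # (display name, unique key, Spawner class, dict of spawner options)
    ('Slurm - 4 cores, 8 GB, 8 hours', 'slurm-small', 'batchspawner.SlurmSpawner',
     dict(req_nprocs='4', req_memory='8G', req_runtime='8:00:00')),
]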

Hoeze commented Nov 5, 2020

I moved this to ProfilesSpawner.
If the maintainers agree that this issue is independent of batchspawner, please close :)

rcthomas commented Nov 6, 2020

Agreed, this can be closed here, since the repro at jupyterhub/wrapspawner#41 works without batchspawner at all.

Hoeze closed this as completed Nov 6, 2020