
Cannot connect to Docker container on Windows 11 #290

Closed

Travis-Barton opened this issue Oct 23, 2024 · 4 comments

Comments

@Travis-Barton

I've got the models downloaded and my container starts:

docker run -it -p 5000:5000 -v C:/Users/sivar/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu

But when I try to connect to my container I get no response:

$ curl http://localhost:5000/health
curl: (7) Failed to connect to localhost port 5000 after 2243 ms: Couldn't connect to server

Any idea why?

I also cannot send a hello-world request, either with the Python client:

$ python -m llama_stack.apis.inference.client localhost 5000 
User>hello world, write me a 2 sentence poem about the moon
Traceback (most recent call last):
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-
packages\httpx\_transports\default.py", line 72, in map_httpcore_exceptions
    yield
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-
packages\httpx\_transports\default.py", line 377, in handle_async_request
    resp = await self._pool.handle_async_request(req)       
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-
packages\httpcore\_async\connection_pool.py", line 216, in handle_async_request
    raise exc from None
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-
packages\httpcore\_async\connection_pool.py", line 196, in handle_async_request
    response = await connection.handle_async_request(       
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-
packages\httpcore\_async\connection.py", line 99, in handle_async_request
    raise exc
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-
packages\httpcore\_async\connection.py", line 76, in handle_async_request
    stream = await self._connect(request)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-
packages\httpcore\_async\connection.py", line 122, in _connect
    stream = await self._network_backend.connect_tcp(**kwargs)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-
packages\httpcore\_backends\auto.py", line 30, in connect_tcp
    return await self._backend.connect_tcp(
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-
packages\httpcore\_backends\anyio.py", line 115, in connect_tcp
    with map_exceptions(exc_map):
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-
packages\httpcore\_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ConnectError: All connection attempts failed       

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-
packages\llama_stack\apis\inference\client.py", line 198, in <module>
    fire.Fire(main)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\fire\core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\fire\core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(        
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\fire\core.py", line 684, in _CallAndUpdateTrace    
    component = fn(*varargs, **kwargs)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-
packages\llama_stack\apis\inference\client.py", line 194, in main
    asyncio.run(run_main(host, port, stream, model, logprobs))
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\asyncio\base_events.py", line 649, in run_until_complete
    return future.result()
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-
packages\llama_stack\apis\inference\client.py", line 154, in run_main
    async for log in EventLogger().log(iterator):
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-
packages\llama_stack\apis\inference\event_logger.py", line 32, in log
    async for chunk in event_generator:
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-
packages\llama_stack\apis\inference\client.py", line 93, in _stream_chat_completion
    async with client.stream(
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpx\_client.py", line 1628, in stream
    response = await self.send(
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpx\_client.py", line 1674, in send
    response = await self._send_handling_auth(
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-
packages\httpx\_client.py", line 1702, in _send_handling_auth
    response = await self._send_handling_redirects(
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-
packages\httpx\_client.py", line 1739, in _send_handling_redirects
    response = await self._send_single_request(request)     
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-
packages\httpx\_client.py", line 1776, in _send_single_request
    response = await transport.handle_async_request(request)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-
packages\httpx\_transports\default.py", line 376, in handle_async_request
    with map_httpcore_exceptions():
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-
packages\httpx\_transports\default.py", line 89, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ConnectError: All connection attempts failed


or with curl:

$ curl -X POST http://localhost:5000/inference/chat_completion -H "Content-Type: application/json" -d '{"model": "Llama3.1-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write me a 2 sentence poem about the stars."}], "sampling_params": {"temperature": 0.7, "max_tokens": 50}}'
curl: (7) Failed to connect to localhost port 5000 after 2257 ms: Couldn't connect to server
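
As a general sanity check (not something done in this thread), it can help to confirm the container is actually running and that the server inside it started cleanly before suspecting the port mapping. The container ID below is a placeholder, and the last command assumes curl is available inside the image:

# Confirm the container is up and the 0.0.0.0:5000->5000 mapping is listed
docker ps

# Look for startup errors or the address/port the server bound to
docker logs <container-id>

# Probe the endpoint from inside the container, bypassing the host port mapping
docker exec -it <container-id> curl http://localhost:5000/health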
@ashwinb
Contributor

ashwinb commented Oct 23, 2024

Can you try disabling IPv6? Pass --disable-ipv6 to the docker run command. @yanxi0830, that should work, right?

@wukaixingxp

I managed to run inference with conda using llama stack run test_8b --port 5000 --disable-ipv6, but I do not know how to pass the --disable-ipv6 argument to docker.

@yanxi0830
Contributor

Could you try adding the --disable-ipv6 flag at the end of the command, i.e. docker run -it -p 5000:5000 -v C:/Users/sivar/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu --disable-ipv6?
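
For context on why the flag goes at the very end: in docker run, options placed before the image name are consumed by Docker itself, while anything after the image name is forwarded as arguments to the container's entrypoint (here, presumably the llama-stack server, which is what makes --disable-ipv6 take effect). The general shape is:

docker run [docker options] IMAGE [arguments forwarded to the entrypoint]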

@Travis-Barton
Author

That worked!! Thanks!
