I've got the models downloaded and my container starts:
docker run -it -p 5000:5000 -v C:/Users/sivar/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu
But when I try to connect to my container I get no response:
$ curl http://localhost:5000/health
curl: (7) Failed to connect to localhost port 5000 after 2243 ms: Couldn't connect to server
Any idea why?
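One way to narrow this down (my suggestion, not part of the original report): probe the IPv4 and IPv6 loopback addresses separately. On Windows, "localhost" can mean either 127.0.0.1 or ::1, and a server that binds only one address family will be unreachable on the other. The port 5000 here is just the one from the commands above.

```python
import socket

# Probe IPv4 and IPv6 loopback separately: the server may be listening
# on one address family but not the other.
for host in ("127.0.0.1", "::1"):
    try:
        with socket.create_connection((host, 5000), timeout=2):
            print(f"{host} port 5000 is accepting connections")
    except OSError as exc:
        print(f"{host} port 5000 unreachable: {exc}")
```

If 127.0.0.1 connects but ::1 does not (or vice versa), that points at an address-family mismatch rather than a problem with the container itself.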
I also can't send a hello-world request with the Python client:
$ python -m llama_stack.apis.inference.client localhost 5000
User>hello world, write me a 2 sentence poem about the moon
Traceback (most recent call last):
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpx\_transports\default.py", line 72, in map_httpcore_exceptions
    yield
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpx\_transports\default.py", line 377, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpcore\_async\connection_pool.py", line 216, in handle_async_request
    raise exc from None
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpcore\_async\connection_pool.py", line 196, in handle_async_request
    response = await connection.handle_async_request(
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpcore\_async\connection.py", line 99, in handle_async_request
    raise exc
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpcore\_async\connection.py", line 76, in handle_async_request
    stream = await self._connect(request)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpcore\_async\connection.py", line 122, in _connect
    stream = await self._network_backend.connect_tcp(**kwargs)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpcore\_backends\auto.py", line 30, in connect_tcp
    return await self._backend.connect_tcp(
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpcore\_backends\anyio.py", line 115, in connect_tcp
    with map_exceptions(exc_map):
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpcore\_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ConnectError: All connection attempts failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\llama_stack\apis\inference\client.py", line 198, in <module>
    fire.Fire(main)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\fire\core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\fire\core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\fire\core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\llama_stack\apis\inference\client.py", line 194, in main
    asyncio.run(run_main(host, port, stream, model, logprobs))
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\asyncio\base_events.py", line 649, in run_until_complete
    return future.result()
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\llama_stack\apis\inference\client.py", line 154, in run_main
    async for log in EventLogger().log(iterator):
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\llama_stack\apis\inference\event_logger.py", line 32, in log
    async for chunk in event_generator:
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\llama_stack\apis\inference\client.py", line 93, in _stream_chat_completion
    async with client.stream(
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpx\_client.py", line 1628, in stream
    response = await self.send(
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpx\_client.py", line 1674, in send
    response = await self._send_handling_auth(
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpx\_client.py", line 1702, in _send_handling_auth
    response = await self._send_handling_redirects(
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpx\_client.py", line 1739, in _send_handling_redirects
    response = await self._send_single_request(request)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpx\_client.py", line 1776, in _send_single_request
    response = await transport.handle_async_request(request)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpx\_transports\default.py", line 376, in handle_async_request
    with map_httpcore_exceptions():
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpx\_transports\default.py", line 89, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ConnectError: All connection attempts failed
or
$ curl -X POST http://localhost:5000/inference/chat_completion -H "Content-Type: application/json" -d '{"model": "Llama3.1-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write me a 2 sentence poem about the stars."}], "sampling_params": {"temperature": 0.7, "max_tokens": 50}}'
curl: (7) Failed to connect to localhost port 5000 after 2257 ms: Couldn't connect to server
I managed to run inference with conda using llama stack run test_8b --port 5000 --disable-ipv6, but I don't know how to pass the --disable-ipv6 argument through Docker.
Could you try appending --disable-ipv6 after the image name, so it is passed to the server inside the container: docker run -it -p 5000:5000 -v C:/Users/sivar/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu --disable-ipv6?
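A possible explanation for why --disable-ipv6 matters (my assumption, not confirmed in this thread): name resolution may return ::1 before 127.0.0.1 for "localhost", so clients try IPv6 first and fail if the server only bound the IPv4 address, or if IPv6 isn't usable inside the container. You can inspect the resolution order like this (port 5000 is just the one used above):

```python
import socket

# List every address "localhost" resolves to, in the order clients
# will typically try them.
for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        "localhost", 5000, proto=socket.IPPROTO_TCP):
    print(socket.AddressFamily(family).name, "->", sockaddr[0])
```

If AF_INET6 appears first, connecting to 127.0.0.1 explicitly (instead of localhost) is another workaround worth trying on the client side.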