CSGHub provides several different ways to build an LLM app.
You can use an inference endpoint to host an LLM server, or build a Gradio/Streamlit app to host a demo.
You can check our official inference docs: https://opencsg.com/docs/inferencefinetune/inference_finetune_intro
Currently HF TGI, vLLM, and SGLang are supported; llama.cpp and Ollama are planned.
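Once an endpoint is up, you can query it from Python. This is only a minimal sketch, assuming your endpoint exposes the OpenAI-compatible API that vLLM and TGI provide; the base URL, token, and model id below are placeholders, not real defaults, so replace them with your own deployment's values.

```python
# Minimal sketch: query an inference endpoint through the
# OpenAI-compatible API that vLLM/TGI expose.
# base_url, api_key, and model below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-endpoint.example.com/v1",  # your endpoint URL
    api_key="your-access-token",                      # your access token
)

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # whatever you deployed
    messages=[{"role": "user", "content": "Hello! Briefly introduce yourself."}],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```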
If you are familiar with Gradio or Streamlit, you can try the Space function as well; a minimal Gradio sketch is below.
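For the Space route, a demo app can be as small as the following. Again this is a hedged example rather than an official template: it assumes the same hypothetical OpenAI-compatible endpoint as the snippet above and wraps it in a Gradio chat UI (the tuple-style history shown here is the classic Gradio default; newer versions may prefer message dicts).

```python
# Minimal Gradio chat demo sketch for a Space. Assumes the same
# hypothetical OpenAI-compatible endpoint as the previous snippet.
import gradio as gr
from openai import OpenAI

client = OpenAI(
    base_url="https://your-endpoint.example.com/v1",  # placeholder
    api_key="your-access-token",                      # placeholder
)

def chat(message, history):
    # Rebuild the conversation for the endpoint, then append the new turn.
    messages = []
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})
    resp = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # placeholder
        messages=messages,
    )
    return resp.choices[0].message.content

gr.ChatInterface(chat).launch()
```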
For the DeepSeek-R1 models: as far as I know, running R1 or R1-Zero requires at least two nodes of 8 H100 GPUs (16 GPUs in total).
Personally I would suggest trying DeepSeek-R1-Distill-Qwen or DeepSeek-R1-Distill-Llama, which can be loaded with TGI or vLLM at a reasonable GPU requirement. For even lower GPU budgets (such as a single GPU), you can use vLLM to serve an AWQ-quantized version. All of these functions are supported now.
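As a rough sketch of the single-GPU AWQ route: the checkpoint id below is a placeholder, so substitute a real AWQ quant of whichever distill model you pick.

```python
# Minimal vLLM offline-inference sketch for an AWQ-quantized model.
# The model id is a placeholder for a real AWQ checkpoint on the hub.
from vllm import LLM, SamplingParams

llm = LLM(
    model="your-org/DeepSeek-R1-Distill-Qwen-7B-AWQ",  # placeholder AWQ quant
    quantization="awq",   # tell vLLM to load AWQ weights
    max_model_len=4096,   # cap context length to fit a single GPU
)

params = SamplingParams(temperature=0.6, max_tokens=256)
outputs = llm.generate(
    ["Explain what AWQ quantization does in one paragraph."], params
)
print(outputs[0].outputs[0].text)
```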
If you run into any problems, feel free to comment 😄
Hi Team,
I would like to deploy deepseek-r1:1.5b or any other LLM on a Linux server using CSGHub for demo purposes. Could you please provide a step-by-step tutorial or a video guide? A tutorial in Chinese is also acceptable.
Looking forward to your guidance!