AI00 RWKV Server
is an inference API server for the RWKV
language model based upon the web-rwkv
inference engine.
It supports VULKAN
parallel and concurrent batched inference and can run on all GPUs that support VULKAN
. No need for Nvidia cards!!! AMD cards and even integrated graphics can be accelerated!!!
No need for bulky pytorch
, CUDA
and other runtime environments, it's compact and ready to use out of the box!
Compatible with OpenAI's ChatGPT API interface.
100% open source and commercially usable, under the MIT license.
If you are looking for a fast, efficient, and easy-to-use LLM API server, then AI00 RWKV Server
is your best choice. It can be used for various tasks, including chatbots, text generation, translation, and Q&A.
Join the AI00 RWKV Server
community now and experience the charm of AI!
QQ Group for communication: 30920262
- Based on the
RWKV
model, it has high performance and accuracy - Supports
VULKAN
inference acceleration, you can enjoy GPU acceleration without the need forCUDA
! Supports AMD cards, integrated graphics, and all GPUs that supportVULKAN
- No need for bulky
pytorch
,CUDA
and other runtime environments, it's compact and ready to use out of the box! - Compatible with OpenAI's ChatGPT API interface
- Chatbots
- Text generation
- Translation
- Q&A
- Any other tasks that LLM can do
-
Directly download the latest version from Release
-
After downloading the model, place the model in the
assets/models/
path, for example,assets/models/RWKV-x060-World-3B-v2-20240228-ctx4096.st
-
Optionally modify
assets/Config.toml
for model configurations like model path, quantization layers, etc. -
Run in the command line
$ ./ai00_rwkv_server
-
Open the browser and visit the WebUI
http://localhost:65530
-
Clone this repository
$ git clone https://github.com/cgisky1980/ai00_rwkv_server.git $ cd ai00_rwkv_server
-
After downloading the model, place the model in the
assets/models/
path, for example,assets/models/RWKV-x060-World-3B-v2-20240228-ctx4096.st
-
Compile
$ cargo build --release
-
After compilation, run
$ cargo run --release
-
Open the browser and visit the WebUI
http://localhost:65530
It only supports Safetensors models with the .st
extension now. Models saved with the .pth
extension using torch need to be converted before use.
-
In the Release you could find an executable called
converter
. Run
$ ./converter --input /path/to/model.pth
- If you are building from source, run
$ cargo run --release --bin converter -- --input /path/to/model.pth
- Just like the steps mentioned above, place the model in the
.st
model in theassets/models/
path and modify the model path inassets/Config.toml
--config
: Configure file path (default:assets/Config.toml
)--ip
: The IP address the server is bound to--port
: Running port
The API service starts at port 65530, and the data input and output format follow the Openai API specification.
/api/oai/v1/models
/api/oai/models
/api/oai/v1/chat/completions
/api/oai/chat/completions
/api/oai/v1/completions
/api/oai/completions
/api/oai/v1/embeddings
/api/oai/embeddings
- Support for
text_completions
andchat_completions
- Support for sse push
- Add
embeddings
- Integrate basic front-end
- Parallel inference via
batch serve
- Support for
int8
quantization - Support for
NF4
quantization - Support for
LoRA
model - Hot loading and switching of
LoRA
model
We are always looking for people interested in helping us improve the project. If you are interested in any of the following, please join us!
- 💀Writing code
- 💬Providing feedback
- 🔆Proposing ideas or needs
- 🔍Testing new features
- ✏Translating documentation
- 📣Promoting the project
- 🏅Anything else that would be helpful to us
No matter your skill level, we welcome you to join us. You can join us in the following ways:
- Join our Discord channel
- Join our QQ group
- Submit issues or pull requests on GitHub
- Leave feedback on our website
We can't wait to work with you to make this project better! We hope the project is helpful to you!
Thank you to these awesome individuals who are insightful and outstanding for their support and selfless dedication to the project
顾真牛 📖 💻 🖋 🎨 🧑🏫 |
研究社交 💻 💡 🤔 🚧 👀 📦 |
josc146 🐛 💻 🤔 🔧 |
l15y 🔧 🔌 💻 |
Cahya Wirawan 🐛 |
yuunnn_w 📖 |
longzou 💻 🛡️ |