ValueError: Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating #160
Comments
Running python -W ignore server.py produces this error. May I ask how I can solve this? Thanks!
Having the same issue
Having the same issue using vila-infer
How can I solve this?
Getting the same error with TensorRT-LLM when running python3 build_visual_engine.py --model_path tmp/hf_models/${MODEL_NAME} --model_type vila --vila_path ${VILA_PATH} # for VILA (it also prints a FutureWarning from /usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:128).
You can add a chat_template entry to the LLM's tokenizer_config.json file. The old models don't have that entry, and I think the HuggingFace library used to apply a default chat template for unknown models. However, since they changed that behavior because of logic errors caused by the default template, you now need to add a chat template yourself. I'm not sure whether the chat_template entry from the NVILA model would work, but you can get the idea: https://huggingface.co/Efficient-Large-Model/NVILA-8B/blob/049743cc51bdd8872cdf696a1a58b03fef2c367e/llm/tokenizer_config.json#L50 For comparison, here is the tokenizer_config.json from the old model; there is no entry for chat_template: https://huggingface.co/Efficient-Large-Model/VILA1.5-3b/blob/main/llm/tokenizer_config.json
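For illustration, here is a minimal sketch of setting the template programmatically and writing it back into tokenizer_config.json instead of hand-editing the file. The model path and the ChatML-style template string are placeholders for illustration, not the template VILA was trained with; copy the actual template from the NVILA tokenizer_config.json linked above if you want matching behavior.

```python
# Sketch only: set tokenizer.chat_template in code and persist it, instead of
# hand-editing tokenizer_config.json. The path and the ChatML-style template
# below are placeholders, not the template VILA was actually trained with.
from transformers import AutoTokenizer

llm_dir = "path/to/VILA1.5-3b/llm"  # hypothetical path to the LLM subfolder
tokenizer = AutoTokenizer.from_pretrained(llm_dir)

tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>\\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\\n' }}{% endif %}"
)

# save_pretrained() writes the chat_template entry back into tokenizer_config.json.
tokenizer.save_pretrained(llm_dir)
```

Whether generations stay coherent depends on the template matching the conversation format the checkpoint was trained with, so copying the real NVILA template is the safer route.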
I have added the same content to the tokenizer_config.json of VILA1.5-3b, but it still reports the same error.
You only need to add this to the tokenizer_config.json of VILA1.5-3b; it works for me with TensorRT-LLM.
Okay, it seems this problem has been solved. Taking this opportunity, I would like to ask another similar question. Thank you! When I start the server, a similar error is reported when the client initiates a request, whether with NVILA-15B or VILA1.5-3B: Traceback (most recent call last):
Thanks |
Is your VILA code version the newest?
I'm having this issue right now with the server code too. Did anyone solve this?
It seems we haven't found a solution to this problem yet, but I have rebuilt a server and client using the project's inference code, and that works for inference. You can refer to the code in VILA/llava/cli/infer.py.
@HAOYON-666 thanks for the suggestion! Yes, that's what I ended up doing too, but it would still be cool to have an official server implementation compatible with streaming and OpenAPI.
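For anyone taking the same route, here is a rough, hypothetical sketch of what such a minimal wrapper can look like: a FastAPI endpoint around whatever inference call you lift from llava/cli/infer.py. The run_vila_inference helper is a placeholder for that code, not a function that exists in the repo.

```python
# Hypothetical minimal wrapper: expose inference lifted from llava/cli/infer.py over HTTP.
# run_vila_inference is a placeholder name, not part of the VILA repo.
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn

app = FastAPI()

class InferRequest(BaseModel):
    prompt: str
    image_path: str | None = None  # local path or URL, depending on your inference code

def run_vila_inference(prompt: str, image_path: str | None) -> str:
    # Placeholder: call the model exactly the way llava/cli/infer.py does
    # (load the model once at startup and reuse it here).
    raise NotImplementedError

@app.post("/generate")
def generate(req: InferRequest) -> dict:
    text = run_vila_inference(req.prompt, req.image_path)
    return {"response": text}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

Loading the model once at startup (e.g. in a lifespan handler, as server.py already does) and reusing it per request is the main design point; the rest is plumbing.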
(VILA) (base) user@ubuntu(125):/data/workspace/zhaoyong/model/VILA$ sh 1.sh
[2024-12-18 09:46:02,468] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
INFO: Started server process [2989388]
INFO: Waiting for application startup.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00, 1.08s/it]
We've detected an older driver with an RTX 4000 series GPU. These drivers have issues with P2P. This can affect the multi-gpu inference when using accelerate device_map.Please make sure to update your driver to the latest version which resolves this.
ERROR: Traceback (most recent call last):
File "/home/user/miniconda3/envs/VILA/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
async with self.lifespan_context(app) as maybe_state:
File "/home/user/miniconda3/envs/VILA/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/data/workspace/zhaoyong/model/VILA/server.py", line 118, in lifespan
tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, model_name, None)
File "/data/workspace/zhaoyong/model/VILA/llava/model/builder.py", line 115, in load_pretrained_model
model = LlavaLlamaModel(config=config, low_cpu_mem_usage=True, **kwargs)
File "/data/workspace/zhaoyong/model/VILA/llava/model/language_model/llava_llama.py", line 49, in __init__
self.init_vlm(config=config, *args, **kwargs)
File "/data/workspace/zhaoyong/model/VILA/llava/model/llava_arch.py", line 74, in init_vlm
self.llm, self.tokenizer = build_llm_and_tokenizer(llm_cfg, config, *args, **kwargs)
File "/data/workspace/zhaoyong/model/VILA/llava/model/language_model/builder.py", line 203, in build_llm_and_tokenizer
tokenizer.stop_tokens = infer_stop_tokens(tokenizer)
File "/data/workspace/zhaoyong/model/VILA/llava/utils/tokenizer.py", line 174, in infer_stop_tokens
template = tokenize_conversation(DUMMY_CONVERSATION, tokenizer, overrides={"gpt": SENTINEL_TOKEN})
File "/data/workspace/zhaoyong/model/VILA/llava/utils/tokenizer.py", line 110, in tokenize_conversation
text = tokenizer.apply_chat_template(conversation, add_generation_prompt=add_generation_prompt, tokenize=False)
File "/home/user/miniconda3/envs/VILA/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1803, in apply_chat_template
chat_template = self.get_chat_template(chat_template, tools)
File "/home/user/miniconda3/envs/VILA/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1967, in get_chat_template
raise ValueError(
ValueError: Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating
ERROR: Application startup failed. Exiting.