Describe the bug
The model loads successfully in 8- and 4-bit quant, confirmed by the VRAM usage observed in nvtop. On the first prompt to the model, the attached logs appear in journalctl; nvtop shows zero GPU usage throughout, while the high VRAM usage persists in both quant states.
This issue recommends pinning the transformers package to exactly 4.44.2. I am having trouble getting into the venv this repo sets up in order to up- or downgrade transformers as required, and would like some guidance.
Is there an existing issue for this?
I have searched the existing issues
Reproduction
Download THUDM/codegeex4-all-9b using the web UI
Load the model in 8- or 4-bit quant, trusting remote code
Prompt the model (a rough script equivalent of these steps is sketched below)
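For reference, here is an out-of-UI equivalent of these steps. This is a sketch only: it assumes the standard transformers + bitsandbytes loading path rather than the web UI's actual loader code, and the prompt string is a placeholder.

```python
# Minimal sketch of the reproduction outside the web UI, assuming the usual
# transformers + bitsandbytes path. The model ID is from the report; the
# prompt is a placeholder, and everything else is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "THUDM/codegeex4-all-9b"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # or load_in_4bit=True
    device_map="auto",
    trust_remote_code=True,
)

# The traceback below fires during tokenization of the first prompt:
input_ids = tokenizer.encode(
    "prompt goes here", return_tensors="pt", add_special_tokens=True
)
print(model.generate(input_ids.to(model.device), max_new_tokens=32))
```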
Screenshot
No response
Logs
Oct 30 11:02:12 llama start_linux.sh[405]: 11:02:12-571562 INFO Loaded "THUDM_codegeex4-all-9b" in 50.95 seconds.
Oct 30 11:02:12 llama start_linux.sh[405]: 11:02:12-573108 INFO LOADER: "Transformers"
Oct 30 11:02:12 llama start_linux.sh[405]: 11:02:12-573962 INFO TRUNCATION LENGTH: 2048
Oct 30 11:02:12 llama start_linux.sh[405]: 11:02:12-574881 INFO INSTRUCTION TEMPLATE: "Alpaca"
Oct 30 11:02:34 llama start_linux.sh[405]: Traceback (most recent call last):
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/queueing.py", line 566, in process_events
Oct 30 11:02:34 llama start_linux.sh[405]: response = await route_utils.call_process_api(
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/route_utils.py", line 261, in call_process_api
Oct 30 11:02:34 llama start_linux.sh[405]: output = await app.get_blocks().process_api(
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/blocks.py", line 1786, in process_api
Oct 30 11:02:34 llama start_linux.sh[405]: result = await self.call_function(
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/blocks.py", line 1350, in call_function
Oct 30 11:02:34 llama start_linux.sh[405]: prediction = await utils.async_iteration(iterator)
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 583, in async_iteration
Oct 30 11:02:34 llama start_linux.sh[405]: return await iterator.__anext__()
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 576, in __anext__
Oct 30 11:02:34 llama start_linux.sh[405]: return await anyio.to_thread.run_sync(
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
Oct 30 11:02:34 llama start_linux.sh[405]: return await get_async_backend().run_sync_in_worker_thread(
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
Oct 30 11:02:34 llama start_linux.sh[405]: return await future
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 943, in run
Oct 30 11:02:34 llama start_linux.sh[405]: result = context.run(func, *args)
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 559, in run_sync_iterator_async
Oct 30 11:02:34 llama start_linux.sh[405]: return next(iterator)
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 742, in gen_wrapper
Oct 30 11:02:34 llama start_linux.sh[405]: response = next(iterator)
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/modules/chat.py", line 436, in generate_chat_reply_wrapper
Oct 30 11:02:34 llama start_linux.sh[405]: for i, history in enumerate(generate_chat_reply(text, state, regenerate, _continue, loading_message=True, for_ui=True)):
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/modules/chat.py", line 403, in generate_chat_reply
Oct 30 11:02:34 llama start_linux.sh[405]: for history in chatbot_wrapper(text, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message, for_ui=for_ui):
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/modules/chat.py", line 348, in chatbot_wrapper
Oct 30 11:02:34 llama start_linux.sh[405]: prompt = generate_chat_prompt(text, state, **kwargs)
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/modules/chat.py", line 200, in generate_chat_prompt
Oct 30 11:02:34 llama start_linux.sh[405]: encoded_length = get_encoded_length(prompt)
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/modules/text_generation.py", line 189, in get_encoded_length
Oct 30 11:02:34 llama start_linux.sh[405]: return len(encode(prompt)[0])
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/modules/text_generation.py", line 140, in encode
Oct 30 11:02:34 llama start_linux.sh[405]: input_ids = shared.tokenizer.encode(str(prompt), return_tensors='pt', add_special_tokens=add_special_tokens)
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2783, in encode
Oct 30 11:02:34 llama start_linux.sh[405]: encoded_inputs = self.encode_plus(
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3202, in encode_plus
Oct 30 11:02:34 llama start_linux.sh[405]: return self._encode_plus(
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/tokenization_utils.py", line 801, in _encode_plus
Oct 30 11:02:34 llama start_linux.sh[405]: return self.prepare_for_model(
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3698, in prepare_for_model
Oct 30 11:02:34 llama start_linux.sh[405]: encoded_inputs = self.pad(
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3500, in pad
Oct 30 11:02:34 llama start_linux.sh[405]: encoded_inputs = self._pad(
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: TypeError: ChatGLM4Tokenizer._pad() got an unexpected keyword argument 'padding_side'
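For context on the TypeError above: newer transformers releases pass a padding_side keyword down to the tokenizer's internal _pad(), and the remote-code ChatGLM4Tokenizer bundled with this model does not accept it, which is exactly what the last frame shows. Purely as an illustration of where the mismatch lives (my own sketch, not a fix shipped by transformers or THUDM), a shim that drops keywords the custom _pad() does not declare would look like this:

```python
# Illustrative shim only; assumes the incompatibility is just the extra
# keyword(s) and that ChatGLM4Tokenizer._pad() is otherwise compatible.
import inspect

def drop_unknown_pad_kwargs(tokenizer):
    """Wrap tokenizer._pad so keywords it does not declare (such as the
    padding_side argument passed by newer transformers) are discarded."""
    original_pad = tokenizer._pad
    accepted = set(inspect.signature(original_pad).parameters)

    def _pad(*args, **kwargs):
        filtered = {k: v for k, v in kwargs.items() if k in accepted}
        return original_pad(*args, **filtered)

    tokenizer._pad = _pad
    return tokenizer
```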
I'm not going to close the issue since the model I attempted to use still throws, but will provide a solution to anyone else having issues with Codegeex4-all-9B.
Downgrading transformers to v4.44.2 can be done by editing its entry in requirements.txt in the repository root. The venv can then be entered by running the appropriate cmd_<os> script in the same directory, and the downgrade applied with pip install --upgrade -r requirements.txt. However, this doesn't get the model working: the padding_side error goes away, only to be replaced by a new and exciting error I did not care to troubleshoot or even record.
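To confirm the pin actually took effect inside the bundled env (from the shell that the cmd_<os> script opens), a quick check is:

```python
# Sanity check, run inside the webui's env after reinstalling requirements.
import transformers
print(transformers.__version__)  # should report 4.44.2 if the pin took effect
```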
Instead, I updated this repository to latest and downloaded one of the GGUF files from bartowski, which fired up flawlessly.