Releases · oobabooga/text-generation-webui
v2.5
Changes
- Add a "Show after" parameter to the UI, to use with DeepSeek
</think>
- Minor UI improvements (list styles, light theme style)
Bug fixes
v2.4
Changes
- Installer: do not redownload `.whl` requirements during updates unless they have changed, or the commit in the local repo has changed since the last time the update script was executed (e.g. you switched to a different branch manually)
- UI: add "Continue" and "Remove" buttons below the last chat message
- Downloader: prevent progress bars from jumping around in the terminal. They look much nicer after this change.
- Add a helpful error message when llama.cpp fails to load the model (telling you to lower the context length)
- Update/fix some API examples in the documentation
- Add `strftime_now` to Jinja to satisfy the Llama 3.1 and 3.2 (and Granite) templates (#6692); see the sketch after this list. Thanks @FartyPants.
- Give SillyTavern a bit of leeway in the way it does OpenAI requests (#6685). Thanks @FartyPants.
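A minimal sketch of what exposing `strftime_now` enables, assuming a plain Jinja2 environment (the web UI wires the helper up through its own template loader, so treat this as illustrative rather than the project's exact code): the Llama 3.1/3.2 chat templates call `strftime_now("%d %b %Y")` to insert the current date, which fails unless such a function is registered.

```python
# Illustrative only: register a strftime_now() helper in a Jinja2 environment
# so templates that call it (e.g. Llama 3.1's chat template) can render.
from datetime import datetime
from jinja2 import Environment

def strftime_now(format_str: str) -> str:
    """Return the current local time formatted with strftime."""
    return datetime.now().strftime(format_str)

env = Environment()
env.globals["strftime_now"] = strftime_now  # expose the helper to templates

template = env.from_string("Today Date: {{ strftime_now('%d %b %Y') }}")
print(template.render())  # e.g. "Today Date: 28 Jan 2025"
```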
Bug fixes
- Workaround for a `convert_to_markdown` bug
- Training pro: remove monkeypatch references (#6695). Thanks @FartyPants.
Backend updates
- llama-cpp-python: bump to 0.3.7 (llama.cpp commit `794fe23f29fb40104975c91fe19f23798f7c726e`, January 28th, 2025).
v2.3
Changes
- Major UI optimization: use the morphdom library to make incremental updates to the Chat tab during streaming (#6653). With this:
- The CPU usage is drastically reduced for long contexts or high tokens/second.
- The UI doesn't become sluggish in those scenarios anymore.
- You can select and copy text or code from previous messages during streaming, as those elements remain static with the "morphing" operations performed by morphdom. Only what has changed gets updated.
- Add a button to copy the raw message content below each chat message.
- Add a button to regenerate the reply below the last chat message.
- Activate "auto_max_new_tokens" by default, to avoid having to "continue" the chat reply for every 512 tokens.
- Installer:
- Update Miniconda to 24.11.1 (latest version). Note: Miniconda is only used during the initial setup.
- Make the checksum verification for the Miniconda installer more robust on Windows, to account for systems where it was previously failing to execute at all.
Bug fixes
Backend updates
- Transformers: bump to 4.48.
- flash-attention: bump to 2.7.3.
v2.2
Changes
- UI:
- Add a new "Branch chat" option to the chat tab.
- Add a new "Search chats" menu to the chat tab.
- Improve handling of markdown lists (#6626). This greatly improves the rendering of lists and nested lists in the UI. Thanks, @mamei16.
- Reduce the size of HTML and CSS sent to the UI during streaming. This improves performance and reduces CPU usage.
- Optimize the JavaScript to reduce the CPU usage during streaming.
- Add a horizontal scrollbar to code blocks that are wider than the chat area.
- Make responses start faster by removing unnecessary cleanup calls (#6625). This removes a 0.2 second delay for llama.cpp and ExLlamaV2 while also increasing the reported tokens/second.
- Add a `--torch-compile` flag for Transformers (improves performance).
- Add a "Static KV cache" option for transformers (improves performance).
- Connect XTC, DRY, smoothing_factor, and dynatemp to the ExLlamaV2 loader (non-HF).
- Remove the AutoGPTQ loader (#6641). The project was discontinued, and no wheels had been available for a while. GPTQ models can still be loaded through ExLlamaV2.
- Streamline the one-click installer by asking one question to NVIDIA users instead of two.
- Add a `--exclude-pattern` flag to the `download-model.py` script (#6542). Thanks, @JackCloudman.
- Add IPv6 support to the API (#6559). Thanks, @BPplays.
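As a rough illustration of what the two new Transformers options map to at the library level (the model name below is an arbitrary example and the snippet is a simplified sketch, not the web UI's loader code):

```python
# Simplified sketch: what "Static KV cache" and --torch-compile roughly
# correspond to when using the Transformers library directly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"  # arbitrary example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# "Static KV cache": preallocate the cache so its shape stays fixed across
# decoding steps, avoiding reallocation (and recompilation) during generation.
model.generation_config.cache_implementation = "static"

# "--torch-compile": compile the forward pass for faster decoding.
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```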
Bug fixes
- Fix an `orjson.JSONDecodeError` error on page reload.
- Fix the font size of lists in chat mode.
- Fix CUDA error on MPS backend during API request (#6572). Thanks, @skywinder.
- Add a `UnicodeDecodeError` workaround for `modules/llamacpp_model.py` (#6040). Thanks, @nclok1405.
- Training_PRO fix: add `if 'quantization_config' in shared.model.config.to_dict()` (#6640). Thanks, @FartyPants.
Backend updates
- llama-cpp-python: bump to 0.3.6 (llama.cpp commit `f7cd13301c2a88f97073fd119072b4cc92c08df1`, January 8, 2025).
v2.1
(Before / After comparison screenshots.)
Changes
- Organize the Parameters tab (see above): group similar input fields together (sliders, checkboxes, etc) and create headings for types of parameters (curve shape, curve cutoff), to reduce visual clutter and improve navigation.
- Organize the Model tab in a similar way.
- Improve the style of headings, lists, and links in chat messages.
- Improve the typing cursor `|` that appears during chat streaming.
- Slightly improve the chat colors in light mode.
- Reduce the number of built-in presets from 11 to 6, removing presets that I do not consider useful and adding two new presets that I personally use: `Instruct` and `Creative`. The old presets can be found here.
Bug fixes
- Fix interface loading with dark theme even when 'dark_theme' is set to false (#6614). Thanks @mamei16.
- Fix newlines in the markdown renderer (#6599). Thanks @mamei16.
Backend updates
- ExLlamaV2: bump to 0.2.7.
v2.0 - New looks for text-generation-webui!
Changes
- Improved the UI by pushing Gradio to its limits and making it look like ChatGPT, specifically the early 2023 ChatGPT look (which I think looked better than the current darker theme).
- I have used chatbot-ui (the "legacy" version, v1.0, April/2023) as a reference for the old ChatGPT styles, and copied a lot of CSS and some icons from there. Credits to chatbot-ui!
- Mobile support is now much better, with collapsible sidebars added for easier navigation.
- Better, more readable fonts in instruct mode.
- Improved "past chats" menu, now in its own sidebar visually separated from the chat area.
- Converted the top navigation bar (Chat / Default / Notebook, etc.) into a vertical sidebar on the left.
- Reduced margins and removed borders throughout the UI. The "Parameters" tab looks much tidier now, closer to how Gradio is used in AUTOMATIC1111/stable-diffusion-webui.
- Updated Gradio from version 4.26.0 to 4.37.1, bringing important security fixes.
- For people who feel nostalgic about the old colors, a new `--old-colors` flag has been added to make the UI as similar as possible to its previous look.
- Improved HTML rendering for lists with sub-lists (sub-items were not previously rendered correctly).
- Allow more granular KV cache settings (#6561). Thanks @dinerburger.
Bug fixes
- openai extension fix: Handle Multiple Content Items in Messages (#6528). Thanks @hronoas.
- Filter whitespaces in downloader fields in model tab (#6518). Thanks @mefich.
- Fix an issue caused during the installation of tts (#6496). Thanks @Aluisio-Pires.
- Fix the history upload event in the UI.
Backend updates
- llama-cpp-python: bump to 0.3.5.
- ExLlamaV2: bump to 0.2.6.
- Transformers: bump to 4.47.
- flash-attention: bump to v2.7.2.post1.
- Accelerate: bump to 1.2.
- bitsandbytes: bump to 0.45.
v1.16
Backend updates
- Transformers: bump to 4.46.
- Accelerate: bump to 1.0.
Changes
- Add whisper turbo (#6423). Thanks @SeanScripts.
- Add RWKV-World instruction template (#6456). Thanks @MollySophia.
- Minor Documentation update - query cuda compute for docker .env (#6469). Thanks @practical-dreamer.
- Remove lm_eval and optimum from requirements (they don't seem to be necessary anymore).
Bug fixes
- Fix llama.cpp loader not being random. Thanks @reydeljuego12345.
- Fix temperature_last when temperature not in sampler priority (#6439). Thanks @ThisIsPIRI.
- Make token bans work again on HF loaders (#6488). Thanks @ThisIsPIRI.
- Fix for systems that have bash in a non-standard directory (#6428). Thanks @LuNeder.
- Fix intel bug described in #6253 (#6433). Thanks @schorschie.
- Fix locally compiled llama-cpp-python failing to import.
v1.15
Backend updates
- Transformers: bump to 4.45.
- ExLlamaV2: bump to 0.2.3.
- flash-attention: bump to 2.6.3.
- llama-cpp-python: bump to 0.3.1.
- bitsandbytes: bump to 0.44.
- PyTorch: bump to 2.4.1.
- ROCm: bump wheels to 6.1.2.
- Remove AutoAWQ, AutoGPTQ, HQQ, and AQLM from `requirements.txt`:
- AutoAWQ and AutoGPTQ were removed due to lack of support for PyTorch 2.4.1 and CUDA 12.1.
- HQQ and AQLM were removed to make the project leaner since they're experimental with limited use.
- You can still install those libraries manually if you are interested.
Changes
- Exclude Top Choices (XTC): a sampler that boosts creativity, breaks writing clichés, and inhibits non-verbatim repetition (#6335); see the sketch after this list. Thanks @p-e-w.
- Make it possible to sort repetition penalties with "Sampler priority". The new keywords are:
  - `repetition_penalty`
  - `presence_penalty`
  - `frequency_penalty`
  - `dry`
  - `encoder_repetition_penalty`
  - `no_repeat_ngram`
  - `xtc` (not a repetition penalty, but also added in this update)
- Don't import PEFT unless necessary. This makes the web UI launch faster.
- Add beforeunload event to add confirmation dialog when leaving page (#6279). Thanks @leszekhanusz.
- update API documentation with examples to list/load models (#5902). Thanks @joachimchauvet.
- Training pro: update `script.py` (#6359). Thanks @FartyPants.
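A simplified sketch of the XTC idea, assuming the `xtc_threshold` and `xtc_probability` parameters from #6335: with probability `xtc_probability`, every candidate token whose probability is at or above the threshold is removed except the least likely of them, steering generation away from the most predictable continuations. The helper below illustrates that rule and is not the sampler's actual implementation:

```python
# Illustrative XTC ("Exclude Top Choices") filter over a token probability map.
import random

def xtc_filter(probs: dict[str, float],
               xtc_threshold: float = 0.1,
               xtc_probability: float = 0.5) -> dict[str, float]:
    if random.random() >= xtc_probability:
        return probs  # the sampler is not applied on this step
    top = [tok for tok, p in probs.items() if p >= xtc_threshold]
    if len(top) < 2:
        return probs  # nothing to exclude: keep at least one viable candidate
    keep = min(top, key=lambda tok: probs[tok])  # spare the least likely "top choice"
    return {tok: p for tok, p in probs.items() if tok not in top or tok == keep}

print(xtc_filter({"the": 0.5, "a": 0.3, "an": 0.15, "one": 0.05},
                 xtc_threshold=0.1, xtc_probability=1.0))
# -> {'an': 0.15, 'one': 0.05}  (the two most likely candidates are excluded)
```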
Bug fixes
- Fix UnicodeDecodeError for BPE-based Models (especially GLM-4) (#6357). Thanks @GralchemOz.
- API: Relax multimodal format, fixes HuggingFace Chat UI (#6353). Thanks @Papierkorb.
- Force /bin/bash shell for conda (#6386). Thanks @Thireus.
- Do not set value for histories in chat when --multi-user is used (#6317). Thanks @mashb1t.
- Fix a typo in the OpenAI response format (#6365). Thanks @jsboige.
v1.14
Backend updates
- llama-cpp-python: bump to 0.2.89.
- Transformers: bump to 4.44.
Other changes
- Model downloader: use a single session for all downloaded files to reduce the time to start each download.
- Add a `--tokenizer-dir` flag to be used with `llamacpp_HF`.
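The downloader speed-up above comes from connection reuse: a single `requests.Session` keeps the HTTP connection (and TLS handshake) alive between files, so each new download starts immediately. A rough sketch of the idea, with hypothetical URLs and none of the downloader's actual logic (resume support, progress bars, etc.):

```python
# Illustrative only: one shared requests.Session for several file downloads.
import requests

files = [
    # Hypothetical URLs, used only to show the pattern.
    "https://huggingface.co/some-org/some-model/resolve/main/config.json",
    "https://huggingface.co/some-org/some-model/resolve/main/model.safetensors",
]

session = requests.Session()  # reused for every file: no per-file handshake
for url in files:
    with session.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        filename = url.rsplit("/", 1)[-1]
        with open(filename, "wb") as f:
            for chunk in response.iter_content(chunk_size=1024 * 1024):
                f.write(chunk)
```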
v1.13
Backend updates
- llama-cpp-python: bump to 0.2.85 (adds Llama 3.1 support).
UI updates
- Make `compress_pos_emb` a float (#6276). Thanks @hocjordan.
- Make `n_ctx`, `max_seq_len`, and `truncation_length` numbers rather than sliders, to make it possible to type the context length manually.
- Improve the style of headings in chat messages.
- LaTeX rendering:
- Add back single `$` for inline equations.
- Fix rendering for equations enclosed between `\[` and `\]`.
- Fix rendering for multiline equations.
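For reference, both delimiter styles the renderer now handles, shown as illustrative message content rather than project code:

```latex
% Inline math with single dollar signs:
The identity $e^{i\pi} + 1 = 0$ renders inline with the surrounding text.

% Display math between \[ and \], including multiline equations:
\[
  \int_0^1 x^2 \, dx
  = \left. \frac{x^3}{3} \right|_0^1
  = \frac{1}{3}
\]
```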
Bug fixes
- Fix saving characters through the UI.
- Fix instruct mode displaying "quotes" as ""double quotes"".
- Fix chat sometimes not scrolling down after sending a message.
- Fix the chat "stop" event.
- Make `--idle-timeout` work for API requests.
Other changes
- Model downloader: improve the progress bar by adding the filename, size, and download speed for each downloaded file.
- Better handle the Llama 3.1 Jinja2 template by not including its optional "tools" headers.