Releases: t41372/Open-LLM-VTuber
v0.5.2
v0.5.2 patch
The default ASR provider was changed back to FunASR. It was mistakenly set to faster-whisper a while ago without notice, which caused a lot of problems for people who use Nvidia GPUs without cuDNN. This change was not intended and has now been reversed.
Full Changelog: v0.5.1...v0.5.2
v0.5.1
What's New
If you wonder where v0.5.0 went, it's gone forever.
(But if you were lucky enough to get v0.5.0, it's not a big deal. The only difference between v0.5.0 and v0.5.1 is a new CORS fix, which is sort of irrelevant to you.)
Enhancements
- 🎉 llama.cpp Integration: You can now run GGUF model files (LLM models) directly within the project, eliminating the need for external services like Ollama, LM Studio, or other APIs.
- 🎉 Sherpa-ONNX Support for ASR and TTS: Added support for Sherpa-ONNX, enabling a better speech recognition and text-to-speech experience. Contributed by @Neil2893 in #50.
  - Sherpa-ONNX makes running models like SenseVoiceSmall, MeloTTS, and PiperTTS as easy as hell. With more testing and scripting to automate the model downloading process, Sherpa-ONNX with SenseVoiceSmall and MeloTTS or PiperTTS will likely be the new default ASR and TTS setup for this project, as these models deliver great performance with fast inference even on CPU. The current SenseVoiceSmall implementation with FunASR is bulky and buggy, MeloTTS is about as difficult to install as it gets, and PiperTTS is a dead project with hundreds of unfixed bugs, including one that stops me from integrating it into this project. Those issues are addressed with Sherpa-ONNX. Thanks a lot for the work done by @Neil2893 🎉 🎉 🎉.
- 🎉 VAD Tuning Options: Introduced negativeSpeechThreshold and redemptionFrames parameters, giving users more control over VAD (Voice Activity Detection) settings to enhance their AI interaction experience. Contributed by @Neil2893 in #53.
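To give a rough feel for what the two VAD parameters control, here is a minimal Python sketch of threshold-plus-redemption segmentation. It is an illustration only, not the front-end VAD code used by this project: the function name `segment_speech`, the `positive_threshold` argument, and all default values are assumptions. The idea sketched here is that a segment starts when the speech probability rises above a positive threshold, and ends only after `redemption_frames` frames have fallen below `negative_threshold` without a confident speech frame in between.

```python
from typing import List, Sequence, Tuple


def segment_speech(
    probabilities: Sequence[float],
    positive_threshold: float = 0.5,   # assumed value, for illustration only
    negative_threshold: float = 0.35,  # plays the role of negativeSpeechThreshold
    redemption_frames: int = 8,        # plays the role of redemptionFrames
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of detected speech segments.

    `end` is the index of the frame at which the segment was closed.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    low_count = 0
    for i, p in enumerate(probabilities):
        if start is None:
            # Not speaking yet: wait for a confident speech frame.
            if p >= positive_threshold:
                start = i
                low_count = 0
        elif p >= positive_threshold:
            # Confident speech again: forgive the pause seen so far.
            low_count = 0
        elif p < negative_threshold:
            # Likely silence: count it toward ending the segment.
            low_count += 1
            if low_count >= redemption_frames:
                segments.append((start, i))
                start = None
    if start is not None:
        # Audio ended while still speaking.
        segments.append((start, len(probabilities)))
    return segments


frames = [0.1, 0.9, 0.9, 0.2, 0.9, 0.2, 0.2, 0.2, 0.1]
patient = segment_speech(frames, redemption_frames=3)
impatient = segment_speech(frames, redemption_frames=1)
```

In this sketch, a larger redemption count bridges the short pause at frame 3 and yields one segment, while a count of 1 cuts the same audio into two segments.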
Bug fix
- 🐛 CORS policy issue: If you hosted the web part of this project separately, the browser would throw a CORS policy error and the Live2D model would not load. This issue is fixed (although your browser may have cached the old CORS response, which can prevent you from seeing the change).
New Contributor
- Welcome [@Neil2893](https://github.com/Neil2893), who made their first contribution in [#50](#50)!
Full Changelog: [v0.4.4...v0.5.1](v0.4.4...v0.5.1)
Regarding the next version
I will start refactoring this project, which includes breaking changes, as I want to change the architecture, clean up some tech debt, and prepare this project for more features. The next version, if everything goes well, will be v1.0.0. I'm also working with some folks to rewrite the front end with React, and an awesome guy is working on making the installation process super easy.
I barely knew Python when I started this project (I started writing it as if it were JavaScript and did not even bother with OOP initially). Most of my knowledge about Python and best practices came from doing this project, and I made some wrong decisions along the way. I refactored many ugly parts in the past few months, but some of the changes I want involve breaking changes. I think it's a good idea to collect the breaking changes I can think of and do them in one go, and this is what I will be doing.
The biggest planned change that will affect the users (you) is that I will be removing the CLI mode in v1.0.0. I can't see a reason for anyone to run this project in CLI mode without the Live2D body after I added the text input feature in v0.4.0. If you worry about the GPU usage, just run the webpage in the background, and it won't render the Live2D body when it's not on the screen. In addition, the code will be much cleaner without the CLI mode. Let me know if you are super upset about the removal of the CLI mode.
Regarding v1.0.0, you can check my to-do list and progress on GitHub Project. If you have any suggestions, please let me know. I'm not a super-experienced developer and might do things wrong or make the wrong decisions. Let me know about those things before I finish the first big breaking change in this project (or second, but I had too few users at that time to make a long announcement about it).
The to-do list will be in Chinese because... well, most of my users and all of my awesome contributors in this project speak Chinese (and also because Chinese is my first language, after all). I still write announcements in English because this is what I have been doing, but this text will be translated into Chinese when I post the same announcement on the QQ channel and QQ group. So yeah.
v0.4.4
What's Changed
- Update dockerfile by @SunnyPai0413 in #52
- FunASR started requiring onnx as a dependency without notice. Our docs and the auto-installation script have been updated accordingly.
New Contributors
- @SunnyPai0413 made their first contribution in #52
Full Changelog: v0.4.3...v0.4.4
v0.4.3
What's Changed
- bugfix: env won't reinstall if it doesn't exist. by @SunKSugaR in #47
New Contributors
- @SunKSugaR made their first contribution in #47
Full Changelog: v0.4.2...v0.4.3
v0.4.2
Emergency Update 0.4.2, everyone 🚨🚨🚨
In version 0.4.1, I accidentally removed the persona settings option and released it without noticing...
Earlier versions are not affected, but users of v0.4.1 might notice the AI's personality acting a bit strange, and the option to choose persona settings (personality configuration) has disappeared. This happened because I deleted it by mistake!
If you're using v0.4.1, you can either update to the latest version or manually restore the persona settings at line 280 (right below the line # some options: "en_sarcastic_neuro").
Just add this line back:
PERSONA_CHOICE: "en_sarcastic_neuro" # or if you rather edit persona prompt below, leave it blank ...
v0.4.1
Release v0.4.1
A day (or a couple of hours depending on your timezone) after the v0.4.0 release, here comes the v0.4.1 release with some quick fixes.
🚀 New Features
- Added persistence for user preferences:
- VAD confidence threshold settings
- Background image selection
These settings are now saved in the browser localStorage and persist across sessions.
🐛 Bug Fixes
- Fixed audio sentence tracking to prevent missing lines
- Implemented improved end-of-audio detection
- Reduced instances of AI skipping sentences
- Restored version number display at server launch
📦 Full Changelog
- View the complete list of changes: v0.4.0...v0.4.1
v0.4.0 Release
Release v0.4.0
I was going to add documentation for GPT-SoVITS, the upgrade script, and the installation scripts before releasing this version. I also planned to have the installation script detect whether the user needs a proxy to download models from Hugging Face before releasing v0.4.0. However, I realized that I would never release v0.4.0 if I chose to do those things, and v0.4.0 would get bigger and bigger every day.
So yeah, another 2 weeks have passed (after five pre-releases), and here is the v0.4.0 release.
🚀 What's New
💬 Text Input in the Browser
You can now interact with the AI directly by typing in the Browser.
🎉 GPT SoVITS Support
Added GPT SoVITS support by @YveMU in PR #40.
⚙️ Auto Installation Script (Experimental)
Introduced an experimental auto-installation script to simplify setup. This script:
- Is cross-platform (at least it's intended to be).
- Creates a miniconda environment in the project directory (and the miniconda is also installed to the project directory).
- Installs FFmpeg and the correct Python version in the miniconda environment.
- Automatically configures dependencies for FunASR, edgeTTS, and ollama (excluding the ollama installation itself).
⚡ ASR/TTS Preloading & Caching
ASR and TTS models now preload when the server launches (default but optional), significantly reducing the wait time when opening the webpage.
🖱️ Pointer Interaction Toggle
Added a Pointer Interactive Button to prevent Live2D from following your cursor.
🔧 Adjustable VAD Confidence Threshold
Introduced a Voice Activation Detection (VAD) Confidence Threshold field:
- Configure how confident the AI must be in detecting speech.
- Example: At 98%, the AI will only listen when it's 98% certain you're speaking.
✨ Special Character Filtering
By default, TTS will no longer vocalize special characters like emojis. (You can re-enable this in conf.yaml.)
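As a rough illustration of what this kind of filtering can look like, here is a small Python sketch that drops emojis and other symbols before the text reaches a TTS engine. It is not the project's actual filter: the helper name `remove_special_characters` and the rule of keeping only letters, marks, numbers, punctuation, and whitespace are assumptions made for this example.

```python
import unicodedata


def remove_special_characters(text: str) -> str:
    """Drop characters a TTS engine should not read aloud (hypothetical helper)."""
    kept = []
    for ch in text:
        # Unicode general category: L* letters, M* marks, N* numbers, P* punctuation.
        major_category = unicodedata.category(ch)[0]
        if major_category in ("L", "M", "N", "P") or ch.isspace():
            kept.append(ch)
    # Collapse the extra whitespace left behind by removed symbols.
    return " ".join("".join(kept).split())


cleaned = remove_special_characters("Hello 😀 world! 🎉")
```

Note that this heuristic also removes currency and math symbols such as `$` and `+`, which may or may not be what you want for your language and TTS engine.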
🔄 What's Changed
- Voice interruption turned off by default: You can turn it back on with the "Voice Interruption Button" button. This change is motivated by the following prevalent issues:
  - The AI got interrupted by background noise.
  - The system goes crazy when you interrupt yourself (interrupt before the AI says anything).
- Default ASR: FunASR is now the default ASR.
- ASR/TTS Visibility: The server shows the active ASR and TTS on launch.
- New Prompt: Added a fun English prompt for discussing nuclear proliferation.
🎉 New Contributors
Thanks to our new contributor:
📜 Full Changelog
View the complete list of changes: v0.3.1...v0.4.0
v0.3.1
Release Notes - Version 0.3.1
Well, yeah. I forgot to release v0.3.0.
In addition, I realized that I have always been doing the semantic versioning the wrong way, so from this release, we will do the semantic versioning the right way.
What's New
- Added Fish TTS API
- Add Claude API as LLM by @Y0oMu in #35
- Add initialXshift and initialYshift parameters to Live2D configurations in model_dict.json. These two parameters allow us to change the initial position of the Live2D model.
Improvements
- Improve the error message for edge-tts and other TTS.
- Remove python-dotenv as a requirement because it's not used anywhere.
- The upgrade script upgrade.py no longer has any dependency requirements.
Bug Fixes
- The GBK encoding fix now extends to the loading of model_dict.json.
New Contributors
Full Changelog: v0.2.5...v0.3.1
v0.2.5-beta
Release Notes - Version 0.2.5
It's been three weeks since the previous release, so here is a new one.
What's New
- AzureTTS Enhancements: Added customizable pitch and rate properties, allowing users to match the voice style of Neuro-sama.
- Experimental Mem0 Integration: Introduced support for Mem0 as an experimental feature.
- Real-Time Configuration Switching: Users can now switch configurations in real-time via the config_alts directory, enabling dynamic adjustments of Live2D, voice, LLM, and other settings directly from the frontend.
- Dynamic Background Switching: Allows users to change the background image in real-time on the frontend.
- CoquiTTS Support: Added support for CoquiTTS as an additional TTS option.
- Chinese Documentation: Comprehensive documentation is now available in Chinese.
- AI-Generated Favicon: Introduced a new favicon (favicon.ico), generated by Adobe Firefly Image 3 (trained exclusively on licensed content).
- Experimental Upgrade Script: Added an upgrade script for experimental testing.
Bug Fixes
- No Voice Input Mode: Reinstated support for no voice input mode in CLI mode.
- TTS Naming Issues: Resolved TTS naming inconsistencies (#29, #30).
- File Encoding: Improved file handling for persona prompt and conf.yaml files to support non-UTF-8 encodings, resolving issues related to GBK encoding.
- Bug #31: Addressed issue as detailed in GitHub (#31).
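For readers curious about what handling non-UTF-8 files can look like, here is a minimal Python sketch of one common approach: try UTF-8 first and fall back to GBK if decoding fails. This is an assumption about the general technique, not the project's actual code; the helper name `read_text_with_fallback` and the encoding order are made up for this example.

```python
from pathlib import Path
from typing import Sequence


def read_text_with_fallback(
    path: str,
    encodings: Sequence[str] = ("utf-8-sig", "gbk"),  # assumed order
) -> str:
    """Read a text file, trying each encoding in turn (hypothetical helper)."""
    data = Path(path).read_bytes()
    for encoding in encodings:
        try:
            # "utf-8-sig" decodes UTF-8 with or without a byte order mark.
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going with replacement characters instead of crashing.
    return data.decode("utf-8", errors="replace")
```

Trying UTF-8 first matters here because strict UTF-8 decoding rejects most GBK-encoded text, while the reverse is not reliably true.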
Improvements
- Version Tracking: Implemented __init__.py for version control.
- Memory Management: Fixed memory-related issues.
- Stability: Enhanced stability, especially regarding interruptions.
Changes
- API Key Configuration: Removed the api_keys.py file. Users should now add their AzureTTS API keys directly to conf.yaml.
- Default Settings Update: Updated the default LLM to qwen2.5 and modified the default language for some ASR components to auto.
Full Changelog: v0.2.4...v0.2.5
v0.2.4-beta
Release Notes - Version 0.2.4
It's been a week since the last release, so here is a new release.
What's New
- Feature: xTTSv2 TTS Engine Support
Added support for the xtts-api-server, which now integrates with the xTTSv2 text-to-speech engine. Thanks to @Eggze2 for contributing! #23
- Feature: Environment Variable Support in conf.yaml
You can now reference environment variables directly in the conf.yaml file using the ${ENV_VAR_NAME} syntax. This eliminates the need for explicit values in the configuration file by dynamically loading them from the environment.
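The `${ENV_VAR_NAME}` substitution can be pictured with a short Python sketch like the one below. It only illustrates the idea and is not the project's loader: the function name `expand_env_vars`, the variable name `DEMO_API_KEY`, and the choice to leave unknown variables untouched are all assumptions.

```python
import os
import re

# Matches ${NAME}, where NAME is made of letters, digits, and underscores.
_ENV_PATTERN = re.compile(r"\$\{(\w+)\}")


def expand_env_vars(text: str) -> str:
    """Replace ${NAME} placeholders with values from the environment."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        # Assumption: leave the placeholder as-is when the variable is unset.
        return os.environ.get(name, match.group(0))

    return _ENV_PATTERN.sub(replace, text)


os.environ["DEMO_API_KEY"] = "secret-value"  # hypothetical variable
expanded = expand_env_vars('api_key: "${DEMO_API_KEY}"')
```

A loader built this way would typically run the substitution on the raw file text before handing it to the YAML parser, so secrets never need to be written into the file itself.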
Bug Fixes
- Hands-Free Voice Interactions (CLI)
Restored hands-free voice interactions in the CLI, which were previously asking for key presses at the end of each conversation turn. This was not intended, and the functionality is now working as expected.
Improvements
- CLI Interruption Stability
Improved the stability of interruptions in CLI mode. The system now knows which sentence was interrupted, preventing sentences it didn't have a chance to say from being stored in the LLM's memory. This behavior is now consistent with the Live2D mode.
Changes
- Randomized Cache Audio Filenames
Cached audio files are now named with random UUIDs instead of sequential names like temp-1, improving uniqueness and preventing potential naming conflicts.
- PortAudio Dependency Update
PortAudio is no longer required if the local microphone is not in use (e.g., when running in a headless container with no mic). Previously, the program would throw an error even if all we needed was a web server and no local microphone. Now, it only throws an error if microphone functionality is explicitly needed locally (e.g., when running main.py).
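The UUID-based naming can be sketched in a few lines of Python. This is only an illustration of the idea: the helper name `new_cache_audio_path`, the `cache` directory, and the `wav` extension are assumptions, not the project's actual values.

```python
import uuid
from pathlib import Path


def new_cache_audio_path(cache_dir: str = "cache", extension: str = "wav") -> Path:
    """Build a cache file path with a random UUID as the file name."""
    # uuid4() is random, so concurrent requests will not reuse a name
    # the way a sequential counter such as temp-1, temp-2 could.
    return Path(cache_dir) / f"{uuid.uuid4()}.{extension}"


first = new_cache_audio_path()
second = new_cache_audio_path()
```

Each call returns a fresh path, so two sentences synthesized at the same time end up in different files.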
New Contributors
A big thanks to our newest contributor:
Full Changelog: Compare v0.2.3...v0.2.4
This release note was enhanced with GPT-4o, which is why it sounds so professional.