tts: add speaker file support #12048

dm4 · 2025-02-24T08:12:42Z

Adds support for TTS speaker files, including a new command-line option, --tts-speaker-file, that specifies the file path.
tts.cpp now uses JSON handling to load and parse the speaker data for use during audio generation.

ngxson · 2025-02-26T13:38:26Z

@edwko Could you please have a look at this PR?

edwko · 2025-02-27T09:58:46Z

@ngxson @dm4 Looks good! Just a couple of thoughts: this handles only v0.2, so it might make sense to do this more dynamically, perhaps by adding versioning logic similar to PR #11287.

Maybe get the version from common_get_builtin_chat_template, or I could add more metadata to the speaker files (such as a version field) so that the prompt can be constructed for the specific version.

// Something like this:

// Returns the speaker-file version, defaulting to 0.2 when no "version" field is present.
static double get_speaker_version(const json & speaker) {
    if (speaker.contains("version")) {
        return speaker["version"].get<double>();
    }
    // The version could also come from the model itself:
    // if (common_get_builtin_chat_template(model) == "outetts-0.3") {
    //     return 0.3;
    // }
    return 0.2;
}

// Builds the text part of the prompt from the speaker's word list.
static std::string audio_text_from_speaker(const json & speaker) {
    std::string audio_text = "<|text_start|>";
    const double version = get_speaker_version(speaker);

    if (version <= 0.3) {
        // 0.3 uses <|space|> between words; earlier versions use <|text_sep|>.
        // The exact == comparison relies on the file storing the literal 0.3.
        const std::string separator = (version == 0.3) ? "<|space|>" : "<|text_sep|>";
        for (const auto & word : speaker["words"]) {
            audio_text += word["word"].get<std::string>() + separator;
        }
    } else {
        // Future version support could be added here
    }

    return audio_text;
}

// static std::string audio_data_from_speaker(json speaker) would also need some adjustments to support different versions.

dm4 · 2025-03-01T12:41:46Z

Hello @ngxson and @edwko, I have already added support for version 0.3. Since common_get_builtin_chat_template() was removed in this commit, I have switched to using llama_model_chat_template() to obtain the model's tokenizer.chat_template metadata.
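The version lookup discussed in this thread could be sketched roughly as follows. This is a minimal, standalone illustration and not the code merged in this PR: the enum, the function names `resolve_tts_version` and `word_separator`, and the choice to pass the version as a string are all hypothetical. It assumes that the speaker file may carry an optional "version" field and that a 0.3 model reports the template string "outetts-0.3", as suggested in the comment above. In tts.cpp the template pointer would presumably come from llama_model_chat_template(); here it is passed in as a plain argument so that the sketch compiles without llama.cpp.

```cpp
#include <string>

// Hypothetical enum for illustration; not taken from the PR.
enum class outetts_version { v0_2, v0_3 };

// speaker_version: value of an optional "version" field in the speaker file,
//                  or an empty string when the field is absent.
// chat_template:   the model's tokenizer.chat_template string, or nullptr
//                  when the model carries no such metadata.
static outetts_version resolve_tts_version(const std::string & speaker_version,
                                           const char * chat_template) {
    // A recognized version in the speaker file takes priority.
    if (speaker_version == "0.3") {
        return outetts_version::v0_3;
    }
    if (speaker_version == "0.2") {
        return outetts_version::v0_2;
    }

    // Absent or unrecognized: fall back to the model metadata.
    // Assumption: a 0.3 model reports exactly "outetts-0.3".
    if (chat_template != nullptr && std::string(chat_template) == "outetts-0.3") {
        return outetts_version::v0_3;
    }

    // Default to 0.2, matching the fallback proposed earlier in the thread.
    return outetts_version::v0_2;
}

// Word separator used when building the text part of the prompt.
static const char * word_separator(outetts_version version) {
    return version == outetts_version::v0_3 ? "<|space|>" : "<|text_sep|>";
}
```

Comparing version strings rather than doubles sidesteps exact floating-point equality checks such as `version == 0.3`, and the null check covers models that have no chat template metadata.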

github-actions bot added the examples label Feb 24, 2025

ngxson reviewed Feb 24, 2025

common/arg.cpp (outdated, resolved)

dm4 force-pushed the dm4/tts-speaker-file branch from ea8711d to bf3f5ee Compare February 24, 2025 11:02

tts: add speaker file support

0b9db0a

Signed-off-by: dm4 <[email protected]>

dm4 force-pushed the dm4/tts-speaker-file branch 2 times, most recently from 57c3835 to 888f57e Compare March 1, 2025 12:40

ngxson reviewed Mar 1, 2025

examples/tts/tts.cpp (outdated, resolved)

tts: handle outetts-0.3

986ade7

dm4 force-pushed the dm4/tts-speaker-file branch from 888f57e to 986ade7 Compare March 1, 2025 13:19

ngxson approved these changes Mar 1, 2025

ngxson requested a review from ggerganov March 1, 2025 13:25
