Can you help me write the code to call Doubao TTS? #856

Open
Explorerlowi opened this issue Sep 26, 2024 · 4 comments

Comments

@Explorerlowi

My programming skills are so poor that I really can’t do it (灬ꈍ ꈍ灬).
Here is the relevant document: https://www.volcengine.com/docs/6561/79823
I am willing to pay for your help. Thank you very much if you can do it!

@Explorerlowi
Author

The response to a Doubao TTS request is not a raw binary audio stream. The audio is stored, base64-encoded, in the data field of a JSON structure, so the binary audio is only available after base64 decoding. How can it be played in that case?
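To make the decoding step concrete, here is a minimal sketch in plain C++ (no Arduino dependencies) that pulls the data field out of a response body and base64-decodes it. The helper names `extractDataField` and `base64Decode` are hypothetical, the field is located with a simple text search rather than a JSON parser, and the whole body is assumed to fit in memory, which may not hold for long utterances on an ESP32.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Map one base64 character to its 6-bit value, or -1 if it is not in the alphabet.
static int b64Value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Decode a complete base64 string. Decoding stops at '=' padding; characters
// outside the alphabet (line breaks, a JSON "\/" escape backslash) are skipped.
std::vector<uint8_t> base64Decode(const std::string &in) {
    std::vector<uint8_t> out;
    uint32_t acc = 0;  // bit accumulator; only the low bits are ever read
    int bits = 0;      // number of valid bits currently held in acc
    for (char c : in) {
        if (c == '=') break;
        int v = b64Value(c);
        if (v < 0) continue;
        acc = (acc << 6) | static_cast<uint32_t>(v);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<uint8_t>((acc >> bits) & 0xFF));
        }
    }
    return out;
}

// Hypothetical helper: return the string value of the first "data" key found
// by text search. Assumes the value is a quoted string without escaped quotes.
std::string extractDataField(const std::string &json) {
    const std::string key = "\"data\"";
    std::size_t k = json.find(key);
    if (k == std::string::npos) return "";
    std::size_t colon = json.find(':', k + key.size());
    if (colon == std::string::npos) return "";
    std::size_t open = json.find('"', colon + 1);
    if (open == std::string::npos) return "";
    std::size_t close = json.find('"', open + 1);
    if (close == std::string::npos) return "";
    return json.substr(open + 1, close - open - 1);
}
```

The decoded bytes would then be the MP3 data that the player expects. On the device itself, ArduinoJson could replace the text search, at the cost of holding the full document in RAM.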
[image attachment]

@Explorerlowi
Author

bool Audio2::connectToDoubaoTTS(const char *text) {
    xSemaphoreTakeRecursive(mutex_audio, portMAX_DELAY);

    setDefaults();

    const char *host = "openspeech.bytedance.com";
    const char *api_url = "/api/v1/tts";

    const char *appid = "82505*****";
    const char *access_token = "WZCBgLbSd-ltw5gDeKvEYX9M******";
    const char *cluster = "volcano_tts";
    const char *voice_type = "BV001_streaming";

    // Create JSON request
    DynamicJsonDocument doc(1024); // Adjust size as necessary
    JsonObject app = doc.createNestedObject("app");
    app["appid"] = appid;
    app["token"] = access_token;
    app["cluster"] = cluster;

    JsonObject user = doc.createNestedObject("user");
    user["uid"] = "388808087185088";

    JsonObject audio = doc.createNestedObject("audio");
    audio["voice_type"] = voice_type;
    audio["encoding"] = "mp3";
    audio["speed_ratio"] = 1.0;
    audio["volume_ratio"] = 1.0;
    audio["pitch_ratio"] = 1.0;

    JsonObject request = doc.createNestedObject("request");
    request["reqid"] = String(uuid()); // Generate UUID
    request["text"] = text;
    request["text_type"] = "plain";
    request["operation"] = "query";
    request["with_frontend"] = 1;
    request["frontend_type"] = "unitTson";

    // Prepare JSON payload
    String json_payload;
    serializeJson(doc, json_payload);

    // Connect to the server
    _client = static_cast<WiFiClientSecure *>(&clientsecure);
    if (!_client->connect(host, 443)) { // Use 443 for HTTPS
        log_e("Connection failed");
        xSemaphoreGiveRecursive(mutex_audio);
        return false;
    }

    // Create and send HTTP POST request
    _client->println("POST " + String(api_url) + " HTTP/1.1");
    _client->println("Host: " + String(host));
    _client->println("Authorization: Bearer; " + String(access_token));
    _client->println("Content-Type: application/json");
    _client->println("Content-Length: " + String(json_payload.length()));
    _client->println(); // End of headers

    // Send JSON payload
    _client->print(json_payload);

    Serial.println(json_payload);
    // Read the response
    /*String response = "";
    while (_client->connected() || _client->available()) {
        if (_client->available()) {
            char c = _client->read();
            response += c;
        }
    }

    // Process the response
    if (response.indexOf("\"data\"") != -1) {
        // Parse the JSON response to get the data
        DynamicJsonDocument responseDoc(1024); // Adjust size as needed
        deserializeJson(responseDoc, response);
        const char* data = responseDoc["data"];
        // Here you would base64 decode the data and handle the audio
        // Remember to consider the necessary libraries or methods to handle audio output
    } else {
        log_e("No data in response");
    }

    _client->stop();*/
    m_streamType = ST_WEBFILE;
    Serial.print("play speech: ");
    Serial.println(m_streamType);
    isplaying = 1;
    m_f_running = true;
    m_f_ssl = false;
    m_f_tts = true;
    setDatamode(HTTP_RESPONSE_HEADER);
    xSemaphoreGiveRecursive(mutex_audio);
    return true;
}

// Method to generate UUID (simple implementation)
String Audio2::uuid() {
    uint32_t uid = esp_random(); // Random number as a placeholder for UUID generation
    return String(uid, HEX);
}

This is my current code.
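The `uuid()` placeholder above returns only a 32-bit hex value. If a full-length unique request ID is wanted, a version 4 UUID in the usual 8-4-4-4-12 layout can be built from random bytes. The sketch below is plain C++ using `std::random_device`; the function name `uuidV4` is hypothetical, and on the ESP32 the random words could presumably come from `esp_random()` instead.

```cpp
#include <cstdint>
#include <cstdio>
#include <random>
#include <string>

// Build a random (version 4) UUID string such as
// "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx", where y is one of 8, 9, a, b.
std::string uuidV4() {
    std::random_device rd;  // assumption: replace with esp_random() on ESP32
    uint8_t b[16];
    for (int i = 0; i < 16; i += 4) {
        uint32_t r = static_cast<uint32_t>(rd());
        b[i]     = static_cast<uint8_t>(r & 0xFF);
        b[i + 1] = static_cast<uint8_t>((r >> 8) & 0xFF);
        b[i + 2] = static_cast<uint8_t>((r >> 16) & 0xFF);
        b[i + 3] = static_cast<uint8_t>((r >> 24) & 0xFF);
    }
    b[6] = static_cast<uint8_t>((b[6] & 0x0F) | 0x40);  // version 4
    b[8] = static_cast<uint8_t>((b[8] & 0x3F) | 0x80);  // RFC 4122 variant

    char buf[37];  // 36 characters plus terminating NUL
    std::snprintf(buf, sizeof(buf),
                  "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
                  "%02x%02x%02x%02x%02x%02x",
                  b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
                  b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
    return std::string(buf);
}
```

In the request above, the result could be assigned with something like `request["reqid"] = uuidV4().c_str();`.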

@schreibfaul1
Owner

With a "normal audio stream", the data would be written to the buffer here.
InBuff.getWritePtr() is the pointer to the position at which the data is written, and
bytesAddedToBuffer contains the number of bytes actually written.
The conversion from base64 would have to be done at this point.
[image attachment]

You don't need to worry about the rest. If it is an MP3 stream, for example, the ID3 header is loaded automatically once the buffer is full enough, and the file is then played.

@Explorerlowi
Author

Can you teach me how to parse and play the returned audio stream after sending an HTTP(S) request to a TTS service? For example, which functions run after the content is returned, and how is the returned content processed? So far I can only play Baidu TTS. When I send a request to Doubao TTS, Ali TTS, etc., I cannot parse and play the returned content correctly.
