Can you help me write the code to call Doubao TTS? #856

Explorerlowi · 2024-09-26T16:17:55Z

My programming skills are too weak for me to manage this myself (灬ꈍ ꈍ灬).
Here is the relevant documentation: https://www.volcengine.com/docs/6561/79823
I'm willing to pay you for your help. Thank you very much if you can do it!

Explorerlowi · 2024-10-07T19:20:59Z

The response to a Doubao TTS request is not a raw binary audio stream. The audio is stored, base64-encoded, in the "data" field of a JSON object, so the binary audio is only available after base64 decoding. How can it be played in this case?
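Once the "data" string has been extracted, the decoding itself is a plain base64 decode. Since the request asks for `"encoding": "mp3"`, the decoded bytes should be an ordinary MP3 file. A minimal sketch in portable C++, assuming the whole base64 string is already in memory; `base64Decode` is a hypothetical helper, not a function of the audio library.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Map one base64 character to its 6-bit value; -1 for characters outside
// the standard alphabet (the '=' padding is handled by the caller).
static int b64Value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Decode a complete base64 string (e.g. the "data" field of the TTS reply)
// into raw bytes. Stops at the first '=' padding character and skips any
// other non-alphabet characters such as line breaks.
std::vector<uint8_t> base64Decode(const std::string& in) {
    std::vector<uint8_t> out;
    out.reserve(in.size() * 3 / 4);
    uint32_t acc = 0;  // bit accumulator; only the low bits are used
    int bits = 0;      // number of not-yet-emitted bits in acc
    for (char c : in) {
        if (c == '=') break;
        const int v = b64Value(c);
        if (v < 0) continue;
        acc = (acc << 6) | static_cast<uint32_t>(v);
        bits += 6;
        if (bits >= 8) {  // a full byte is available
            bits -= 8;
            out.push_back(static_cast<uint8_t>((acc >> bits) & 0xFF));
        }
    }
    return out;
}
```

On the ESP32, mbedTLS (bundled with ESP-IDF) also provides `mbedtls_base64_decode()` for one-shot decoding of a buffer.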

Explorerlowi · 2024-10-07T19:23:42Z

bool Audio2::connectToDoubaoTTS(const char *text) {
    xSemaphoreTakeRecursive(mutex_audio, portMAX_DELAY);

    setDefaults();

    const char *host = "openspeech.bytedance.com";
    const char *api_url = "/api/v1/tts";

    const char *appid = "82505*****";
    const char *access_token = "WZCBgLbSd-ltw5gDeKvEYX9M******";
    const char *cluster = "volcano_tts";
    const char *voice_type = "BV001_streaming";

    // Build the JSON request body
    DynamicJsonDocument doc(1024); // Adjust size as necessary
    JsonObject app = doc.createNestedObject("app");
    app["appid"] = appid;
    app["token"] = access_token;
    app["cluster"] = cluster;

    JsonObject user = doc.createNestedObject("user");
    user["uid"] = "388808087185088";

    JsonObject audio = doc.createNestedObject("audio");
    audio["voice_type"] = voice_type;
    audio["encoding"] = "mp3";
    audio["speed_ratio"] = 1.0;
    audio["volume_ratio"] = 1.0;
    audio["pitch_ratio"] = 1.0;

    JsonObject request = doc.createNestedObject("request");
    request["reqid"] = uuid(); // new ID for every request
    request["text"] = text;
    request["text_type"] = "plain";
    request["operation"] = "query";
    request["with_frontend"] = 1;
    request["frontend_type"] = "unitTson";

    // Serialize the JSON payload
    String json_payload;
    serializeJson(doc, json_payload);

    // Connect to the server over TLS
    _client = static_cast<WiFiClientSecure *>(&clientsecure);
    if (!_client->connect(host, 443)) { // port 443 for HTTPS
        log_e("Connection failed");
        xSemaphoreGiveRecursive(mutex_audio);
        return false;
    }

    // Send the HTTP POST request headers
    _client->println("POST " + String(api_url) + " HTTP/1.1");
    _client->println("Host: " + String(host));
    _client->println("Authorization: Bearer; " + String(access_token));
    _client->println("Content-Type: application/json");
    _client->println("Content-Length: " + String(json_payload.length()));
    _client->println("Connection: close"); // ask the server to close the socket after the reply
    _client->println(); // end of headers

    // Send the JSON payload
    _client->print(json_payload);

    Serial.println(json_payload); // debug output (includes the access token)
    // Read the response
    /*String response = "";
    while (_client->connected() || _client->available()) {
        if (_client->available()) {
            char c = _client->read();
            response += c;
        }
    }

    // Process the response
    if (response.indexOf("\"data\"") != -1) {
        // Parse the JSON response to get the data
        DynamicJsonDocument responseDoc(1024); // too small for base64 audio; size it to the reply
        deserializeJson(responseDoc, response);
        const char* data = responseDoc["data"];
        // Here you would base64 decode the data and handle the audio
        // Remember to consider the necessary libraries or methods to handle audio output
    } else {
        log_e("No data in response");
    }

    _client->stop();*/
    m_streamType = ST_WEBFILE;
    Serial.print("play speech: ");
    Serial.println(m_streamType);
    isplaying = 1;
    m_f_running = true;
    m_f_ssl = true; // _client is the WiFiClientSecure connected above
    m_f_tts = true;
    setDatamode(HTTP_RESPONSE_HEADER);
    xSemaphoreGiveRecursive(mutex_audio);
    return true;
}

// Generate a request ID (simple placeholder, not a real UUID)
String Audio2::uuid() {
    uint32_t uid = esp_random(); // random 32-bit value as a stand-in for a UUID
    return String(uid, HEX);
}

This is my current code.
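The commented-out response handling would need the whole reply in a String and in a 1024-byte DynamicJsonDocument, which is usually far too small for base64 audio. One alternative is to scan the reply body one byte at a time and forward only the characters of the "data" value to a base64 decoder. A minimal sketch, assuming the HTTP header (and any chunk framing) has already been skipped, that "data" does not occur inside another string value, and that the value contains no JSON escapes (true for base64 text); `DataFieldExtractor` is a hypothetical name, not part of the audio library.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// The key we are looking for, including its quotes.
constexpr char kDataKey[] = "\"data\"";
constexpr std::size_t kDataKeyLen = sizeof(kDataKey) - 1;

// Forwards only the characters of the "data" string value of a JSON reply.
class DataFieldExtractor {
public:
    // Returns true if c is part of the "data" value and should be passed
    // on to the base64 decoder.
    bool feed(char c) {
        const bool space = std::isspace(static_cast<unsigned char>(c)) != 0;
        switch (state_) {
        case State::SeekKey:
            if (c == kDataKey[matched_]) {
                if (++matched_ == kDataKeyLen) {  // saw "data" with quotes
                    state_ = State::SeekColon;
                    matched_ = 0;
                }
            } else {
                matched_ = (c == '"') ? 1 : 0;  // only '"' can restart the match
            }
            return false;
        case State::SeekColon:
            if (space) return false;
            if (c == ':') state_ = State::SeekQuote;
            else restart(c);  // "data" was a value, not a key
            return false;
        case State::SeekQuote:
            if (space) return false;
            if (c == '"') state_ = State::InValue;
            else restart(c);  // the value is not a string
            return false;
        case State::InValue:
            if (c == '"') {  // closing quote ends the base64 text
                state_ = State::Done;
                return false;
            }
            return true;
        case State::Done:
            return false;
        }
        return false;
    }

    bool done() const { return state_ == State::Done; }

private:
    enum class State { SeekKey, SeekColon, SeekQuote, InValue, Done };

    void restart(char c) {
        state_ = State::SeekKey;
        matched_ = (c == '"') ? 1 : 0;
    }

    State state_ = State::SeekKey;
    std::size_t matched_ = 0;
};
```

Memory use stays constant no matter how long the audio is, which is the point of scanning instead of deserializing the whole document.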

schreibfaul1 · 2024-10-07T20:03:55Z

With a "normal" audio stream, the data is written to the input buffer here.
InBuff.getWritePtr() is the pointer to the position at which the data is written,
and bytesAddedToBuffer contains the number of bytes actually written.
The base64 decoding would have to be done at that point.

You don't need to worry about the rest: if it is an MP3 stream, for example, the ID3 header is read automatically once the buffer is full enough, and then the file is played.
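Because the reply arrives in arbitrary pieces, a 4-character base64 group can be split across two reads, so the decoder has to keep its state between calls. A minimal sketch, assuming only the base64 characters of the "data" value (not the surrounding JSON) have been written to the buffer, e.g. at InBuff.getWritePtr(), and that the decoded byte count would then be reported instead of bytesAddedToBuffer. `StreamingBase64Decoder` and `decodeInPlace` are hypothetical names; this is not how the library currently works. Decoding in place is safe because every 4 input characters yield at most 3 output bytes, so the write position never overtakes the read position.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

class StreamingBase64Decoder {
public:
    // Decode len base64 characters in buf and write the raw bytes back into
    // the same buffer, starting at index 0. Returns the number of bytes
    // produced. Leftover bits of an incomplete group are kept for the next call.
    std::size_t decodeInPlace(uint8_t* buf, std::size_t len) {
        std::size_t out = 0;
        for (std::size_t i = 0; i < len && !finished_; ++i) {
            const char c = static_cast<char>(buf[i]);
            if (c == '=') {  // padding: end of the base64 data
                finished_ = true;
                break;
            }
            const int v = value(c);
            if (v < 0) continue;  // skip line breaks and other noise
            acc_ = (acc_ << 6) | static_cast<uint32_t>(v);
            bits_ += 6;
            if (bits_ >= 8) {
                bits_ -= 8;
                buf[out++] = static_cast<uint8_t>((acc_ >> bits_) & 0xFF);
            }
        }
        return out;
    }

    bool finished() const { return finished_; }
    void reset() { acc_ = 0; bits_ = 0; finished_ = false; }

private:
    static int value(char c) {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 52;
        if (c == '+') return 62;
        if (c == '/') return 63;
        return -1;
    }

    uint32_t acc_ = 0;  // bit accumulator carried across calls
    int bits_ = 0;      // bits in acc_ not yet emitted
    bool finished_ = false;
};
```

Here "SGVsbG8=" split as "SGV" + "sbG8=" decodes to "He" from the first chunk and "llo" from the second, even though the split falls in the middle of a group.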

Explorerlowi · 2024-10-11T15:05:28Z

Can you explain how to parse and play the returned audio stream after sending an HTTP(S) request to a TTS service? For example, which functions run once the response arrives, and how is the returned content processed? At the moment I can only play Baidu TTS; when I send a request to Doubao TTS, Ali TTS, etc., I cannot parse and play the returned content.
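Whatever the provider, the reply consists of a status line, header lines, an empty line and then the body. The headers indicate how the body has to be treated: an audio Content-Type can go to the decoder directly, while a JSON reply such as Doubao's needs the "data" extraction and base64 decoding first, and Content-Length or "Transfer-Encoding: chunked" tells how the body is framed. A minimal sketch of reading those fields from the header block; `HttpResponseInfo` and `parseResponseHeader` are hypothetical names, not functions of the audio library.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>

// What the body-handling code needs to know about a reply.
struct HttpResponseInfo {
    int status = 0;            // e.g. 200
    long contentLength = -1;   // -1 if there is no Content-Length header
    bool chunked = false;      // true for "Transfer-Encoding: chunked"
    std::string contentType;   // lower-cased, e.g. "audio/mpeg" or "application/json"
};

static std::string toLower(std::string s) {
    for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

static std::string trim(const std::string& s) {
    const std::size_t b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

// Parse the header block of an HTTP response (status line up to the empty line).
HttpResponseInfo parseResponseHeader(const std::string& header) {
    HttpResponseInfo info;
    std::istringstream in(header);
    std::string line;

    // Status line, e.g. "HTTP/1.1 200 OK"
    if (std::getline(in, line)) {
        const std::size_t sp = line.find(' ');
        if (sp != std::string::npos) info.status = std::atoi(line.c_str() + sp + 1);
    }
    // Header lines "Name: value" until the empty line
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty()) break;
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;
        const std::string name = toLower(trim(line.substr(0, colon)));
        const std::string value = trim(line.substr(colon + 1));
        if (name == "content-length") {
            info.contentLength = std::atol(value.c_str());
        } else if (name == "transfer-encoding") {
            info.chunked = toLower(value).find("chunked") != std::string::npos;
        } else if (name == "content-type") {
            info.contentType = toLower(value);
        }
    }
    return info;
}
```

When `chunked` is true, the body also contains hexadecimal chunk-size lines that must be removed before the bytes reach a JSON scanner or the audio decoder.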
