Speech Recognizer
This page contains the instructions to set up and run the speech recognition module in Greta.
The Speech Recognizer component identifies full phrases in the spoken language as a person is speaking and converts them into a machine-readable format. The module mainly uses Google automatic speech recognition and relies on Selenium WebDriver, which follows the W3C Recommendation https://www.w3.org/TR/webdriver1/ .
The module can send the recognized utterance to other modules using ActiveMQ messages.
OpenSSL is required to generate the self-signed certificate used by the module. (Please go to https://kb.firedaemon.com/support/solutions/articles/4000121705-openssl-3-0-and-1-1-1-binary-distributions-for-microsoft-windows and use the Windows installer in the "Download OpenSSL 3.0 Windows Installer" section.)
The following command can be used to create the certificate. When asked for the certificate information, enter localhost for the Common Name/CN; the other values do not matter as much.
openssl req -new -newkey rsa:4096 -days 5000 -nodes -x509 -sha512 -out cert.crt -keyout cert.key
The resulting files must then be converted into a PKCS #12 file. This can be done by issuing the following command, which exports the file with an empty password.
openssl pkcs12 -export -in cert.crt -inkey cert.key -out $GRETA_HOME/bin/Common/Data/ASRResources/cert.p12 -passout pass:
Once the certificate is generated, Chrome must be configured to accept it.
This component communicates with other modules using ActiveMQ messaging. The ActiveMQ broker service must be running before the speech recognition component is instantiated. The broker can be launched through Modular by adding NetworkConnections->ActiveMQ->Broker.
Once the Speech Recognizer component is launched, the Chrome web browser will start automatically with the URL https://localhost:8088. It will then ask for permission to use the microphone, which must be granted.
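For reference, the snippet below is a minimal Java sketch of how such a page can be opened through Selenium WebDriver; it is not the exact code used by the Greta module, and it assumes a matching chromedriver is available on the system PATH. The setAcceptInsecureCerts option is what lets Chrome accept the self-signed certificate generated above.

import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

public class AsrPageLauncher {
    public static void main(String[] args) {
        ChromeOptions options = new ChromeOptions();
        // Accept the self-signed certificate created in the steps above.
        options.setAcceptInsecureCerts(true);

        // Open the speech recognition page served by the module.
        ChromeDriver driver = new ChromeDriver(options);
        driver.get("https://localhost:8088");
        // The microphone permission still has to be granted in the browser.
    }
}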
By default the module recognizes English (UK). It is possible to change the speech recognition language directly in the browser. For the moment, the available options are English (UK), English (US) and French.
Output Format
The speech recognizer produces the output in JSON string format.
jsonString = {
NumWords: transcript.trim().split(/\s+/).length, //Number of words
inputDuration: input_dur, //input duration
inputStartTime: 0, //input start time (default)
inputEndTime: input_dur, //input End time
TRANSCRIPT: transcript.toUpperCase() //transcript in upper case letters
};
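As an illustration, the sketch below reads such a message in Java using the org.json library (an assumption; any JSON library can be used). The payload values are made up for the example.

import org.json.JSONObject;

public class AsrResultExample {
    public static void main(String[] args) {
        // Illustrative payload following the format above (values are made up).
        String jsonString = "{\"NumWords\":2,\"inputDuration\":1.4,"
                + "\"inputStartTime\":0,\"inputEndTime\":1.4,\"TRANSCRIPT\":\"HELLO WORLD\"}";

        JSONObject result = new JSONObject(jsonString);
        String transcript = result.getString("TRANSCRIPT");
        int numWords = result.getInt("NumWords");
        double duration = result.getDouble("inputDuration");

        System.out.println(numWords + " word(s) in " + duration + "s: " + transcript);
    }
}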
This module communicates with the ActiveMQ server using the following default configuration:
Host : localhost
Port : 61616
Request Topic : GRETA/ASR/REQUEST
Response Topic : GRETA/ASR/RESPONSE
Thus, any module that wants to listen to the output of the speech recognizer must connect to the broker on port 61616 and subscribe to the topic GRETA/ASR/RESPONSE.
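For example, the following is a minimal subscriber sketch written against the plain JMS API with the ActiveMQ 5.x client library; it is not Greta's own connector class, but it uses the default host, port and topic listed above.

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.jms.Topic;
import org.apache.activemq.ActiveMQConnectionFactory;

public class AsrSubscriber {
    public static void main(String[] args) throws JMSException {
        // Default broker address used by the module.
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();

        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic("GRETA/ASR/RESPONSE");
        MessageConsumer consumer = session.createConsumer(topic);

        // Print each recognized utterance (the JSON string shown above) as it arrives.
        consumer.setMessageListener(message -> {
            try {
                if (message instanceof TextMessage) {
                    System.out.println(((TextMessage) message).getText());
                }
            } catch (JMSException e) {
                e.printStackTrace();
            }
        });
    }
}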