SSI Greta Integration
SSI is integrated in Greta via the SSITranslator module in modular.
The SSITranslator module listens to the ActiveMQ topic named "SSI" and, while SSI is running and publishing its output on that topic, receives SSI-XML inputs every 100ms (this interval can be changed in the SSI configuration).
The SSI-XML input is parsed and transformed by this module into an SSIFrame that can be used within Greta.
The SSIFrame class provides methods to access the SSI features it contains, depending on the feature type (i.e. integer, double or string).
The complete list of available features, their types, value ranges and access modes is documented in SSITypes.
In this document you can find information about:
- #Setup Modular: how to set up modular for using SSITranslator (i.e. receive inputs from SSI and translate those XML files into SSIFrames)
- #SSITranslator Documentation: the documentation of the SSITranslator module and the associated classes/types, in order to use the produced SSIFrames in your own project.
- #SSISender Documentation: SSISender is a helper module that can be used to test an SSI-XML input without running SSI.
There is a modular configuration that already includes the SSITranslator for testing purposes.
You can open this configuration from modular; it is located in: <GRETA_DIR>/bin/SSITest.xml
If you want to setup SSI in your own modular configuration you can add the SSITranslator module by selecting:
Add->NetworkConnections->ActiveMQ->Receivers->SSI XML-to-Frame Translator
You do not need to connect this module with any other component in modular.
In the options (click on the module in modular) you can set the Host, Port and Destination Topic to connect and listen to the correct ActiveMQ SSI topic (i.e. where SSI is sending the XML input files).
An SSISender module is also included in this configuration. See below for further details.
OPTIONAL: you can add an SSIFilter to selectively parse the SSI-XML file received via ActiveMQ.
In order to add this module select:
Add->NetworkConnections->ActiveMQ->Receivers->SSI XML Filter
Then connect it to the SSITranslator component in modular. In the options you can choose which part of the XML file should be parsed, among the following options: All | Prosody Only | Head Only | Body Only
In order to use the output of the SSITranslator module, your module must implement the interface SSIFramePerformer. In this way your module can receive the SSIFrames emitted by the SSITranslator module.
Once your module is ready, you need to connect (in modular) the SSITranslator module to yours. Once connected, your module will start to receive SSIFrames as long as they are produced (either when SSI is running or when you send a test SSI-XML file via the SSISender module).
The following snippet shows an example implementation of the SSIFramePerformer interface (wrapped here in an illustrative module class), including examples of access to SSI features.
As noted above, the complete list of available features, their types, value ranges and access modes is documented in SSITypes, and the SSIFrame class provides the typed getters (integer, double or string) used below.
// Hypothetical wrapper class implementing the SSIFramePerformer interface
// (imports of List and of SSIFrame, SSIFramePerformer, SSITypes and ID from the Greta packages are omitted)
public class MySSIModule implements SSIFramePerformer {

    public void performSSIFrames(List<SSIFrame> ssi_frames_list, ID requestId) {
        for (SSIFrame ssf : ssi_frames_list) {
            performSSIFrame(ssf, requestId);
        }
    }

    // In this method a single SSIFrame is performed (i.e. accessed and used)
    public void performSSIFrame(SSIFrame ssi_frame, ID requestId) {
        // Example of access to an integer feature
        int pulsesNumber = ssi_frame.getIntValue(SSITypes.SSIFeatureNames.prosody_praat_pulses_number);
        // Example of access to a double feature
        double bodyPostureLean = ssi_frame.getDoubleValue(SSITypes.SSIFeatureNames.body_posture_lean);
        // Example of access to a string feature
        String keyword = ssi_frame.getStringValue(SSITypes.SSIFeatureNames.prosody_msspeech_keyword);
    }
}
This module emulates a running instance of SSI and allows the developer to run tests without launching SSI.
It can be used for testing purposes to send SSI-XML input files to the ActiveMQ topic where the SSITranslator is listening, so that the XML file is parsed.
If not already loaded in your modular configuration, an SSISender module can be added by selecting:
Add->NetworkConnections->ActiveMQ->Senders->SSISender
There is no need to connect this component in modular with any other component. However, you need to have an SSITranslator module loaded, and in the options you need to set the same Host, Port and Destination Topic as in the SSITranslator module.
In order to test SSI-XML files you need to add another module named TextEditor and connect it to the SSISender module.
This module can be added by selecting:
Add->NetworkConnections->ActiveMQ->Text Editor
An example SSI-XML file is provided: you can open it in the TextEditor (File->Open) and then send it to the SSITranslator using the Send button on the right.
The sample file can be found in: <GRETA_DIR>/bin/Examples/SSI/SSI-Sample-Input.xml
This class represents an SSIFrame emitted by the SSITranslator.
The important methods to use are the getters. The SSI features described in SSITypes are typed differently, and depending on the type the appropriate method needs to be used to retrieve the respective value from the SSIFrame.
SSITypes defines the SSIFeatureNames values that must be passed as input to the following methods.
There are currently 3 supported types, which can be retrieved with the following getters:
- int: public int getIntValue(SSIFeatureNames which)
- double: public double getDoubleValue(SSIFeatureNames which)
- String: public String getStringValue(SSIFeatureNames which)
The corresponding setters are:
- int: public void applyValue(SSIFeatureNames which, int value)
- double: public void applyValue(SSIFeatureNames which, double value)
- String: public void applyValue(SSIFeatureNames which, String value)
This is an example usage of a getter:
int headNod = ssi_frame.getIntValue(SSITypes.SSIFeatureNames.head_nod_cat);
This is an example usage of a setter:
SSIFrame frame = new SSIFrame();
double confidence = 0.5d;
frame.applyValue(SSIFeatureNames.prosody_msspeech_confidence, confidence);
This interface defines all the features that are contained in an SSIFrame. The list of features is defined in the enum SSIFeatureNames in the SSITypes class.
See the complete list of features at the end of this page for a copy of this list (note that the copy might not be up to date).
Each name is commented with the respective type of the value. Possible types are int, double and String.
The comment also includes the range of values for each feature. Some values range over the continuous interval [0,1], while others are categorical (i.e. 0 or 1, corresponding to "no" or "yes").
The names of features follow a precise naming convention that recalls the SSI-XML input file structure. The file has 3 main parts: prosody, head and body. The prosody part has several sub-parts (e.g. voice, praat, opensmile, etc.).
Therefore, a feature name follows this naming convention:
<XML Main Part>_<XML Sub Part>_<Feature>_<Suffix>
The prefix is <XML Main Part>_<XML Sub Part>; it depends on the parts present in the SSI-XML input file.
The middle part is the feature name itself; there are currently 71 features, all listed in the SSITypes file.
For example, in prosody_praat_pitch_mean_hz the main part is prosody, the sub-part is praat, the feature is pitch_mean and the suffix hz indicates the unit of measure.
The suffix of a feature name contains information about the stored value. These are some examples:
- Categorical values: when a feature has a categorical value, the name ends with "cat".
- Percentages: the name ends with "100"
- Unit of measure: the name ends with "db" or "hz"
- Probability: the name ends with "prob"
The interface SSITypes offers two helper methods:
public static SSIFeatureNames getFeatureName(int ordinal)
public static boolean isStringFeature(int ordinal)
- getFeatureName: returns the feature name (as an SSIFeatureNames value) given the integer indicating its position in the enumeration.
- isStringFeature: takes the ordinal corresponding to a feature listed in the enum and returns true if that feature has a String value, false otherwise.
For example:
boolean isString = SSITypes.SSIFeatureNames.isStringFeature(SSITypes.SSIFeatureNames.prosody_msspeech_keyword.ordinal());
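As a further illustration, here is a minimal sketch of getFeatureName; it mirrors the call path of the isStringFeature example above (where exactly the helper is reachable should be checked in the code), and going through ordinal() avoids assuming any particular index:
// Recover the enum constant from its position in the enumeration
SSITypes.SSIFeatureNames name = SSITypes.SSIFeatureNames.getFeatureName(SSITypes.SSIFeatureNames.prosody_msspeech_keyword.ordinal());
// name now holds prosody_msspeech_keyword, which is a String feature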
The following constants are used when, respectively, an int, double or String value has not been set in an SSIFrame, or when an invalid value is read from the SSI-XML input file:
- int: INVALID_OR_EMPTY_INTEGER_FEATURE_VALUE = -1;
- double: INVALID_OR_EMPTY_DOUBLE_FEATURE_VALUE = -1.0d;
- String: INVALID_OR_EMPTY_STRING_FEATURE_VALUE = "N/D";
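As a minimal sketch, a read of an integer feature can be guarded as follows (the class on which the constant is defined is an assumption here and should be checked in the code):
// Read an integer feature and use it only if it was actually set in the SSI-XML input
int pulses = ssi_frame.getIntValue(SSITypes.SSIFeatureNames.prosody_praat_pulses_number);
if (pulses != SSITypes.INVALID_OR_EMPTY_INTEGER_FEATURE_VALUE) { // assumed location of the constant
    // the feature was present and valid: safe to use pulses here
}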
There are 3 SSI features that yield special values. These features are:
- Pitch: prosody_opensmile_pitch_cat
- Pitch Direction: prosody_opensmile_pitch_direction_cat
- Pitch Energy: prosody_opensmile_energy_cat
The following methods convert these categorical integer values into the corresponding named values (see the example below):
- SSIPitchValues getPitchValueName(int cat): takes the categorical value in input and returns the corresponding category among: none | low | normal | high
- SSIPitchDirectionValues getPitchDirectionValueName(int cat): takes the categorical value in input and returns the corresponding category among: none | rise | fall | rise_fall | fall_rise
- SSIVoiceEnergyValues getVoiceEnergyValueName(int cat): takes the categorical value in input and returns the corresponding category among: none | low | medium | high
// assuming that frame is the SSIFrame containing the values that we want to print
System.out.println(
    "Opensmile pitch: " + SSITypes.SSIPitchValues.getPitchValueName(frame.getIntValue(SSIFeatureNames.prosody_opensmile_pitch_cat)) + "\n" +
    "Opensmile pitch direction: " + SSITypes.SSIPitchDirectionValues.getPitchDirectionValueName(frame.getIntValue(SSIFeatureNames.prosody_opensmile_pitch_direction_cat)) + "\n" +
    "Opensmile voice energy: " + SSITypes.SSIVoiceEnergyValues.getVoiceEnergyValueName(frame.getIntValue(SSIFeatureNames.prosody_opensmile_energy_cat)));
These are the features currently defined in the enum SSIFeatureNames in the SSITypes class.
null_ssi, // This is a dummy value used when no feature is found (see method getFeatureName below)
prosody_voice_activity, // 1 // Integer, 0 = false, 1 = true
prosody_voice_systemtime, // Integer, not used at the moment
prosody_voice_duration, // Integer, not used at the moment
prosody_voice_speech_prob, // Double, probability voice activity is speech [0-1]
prosody_voice_laughter_prob, // 5 // Double, probability voice activity is laughter (1 – speech)
prosody_praat_pitch_median_hz, // Double
prosody_praat_pitch_mean_hz, // Double
prosody_praat_pitch_sd_hz, // Double
prosody_praat_pitch_min_hz, // Double
prosody_praat_pitch_max_hz, // 10 // Double
prosody_praat_pulses_number, // Integer
prosody_praat_pulses_per_sec, // Double
prosody_praat_periods_number, // Integer
prosody_praat_period_mean_sec, // Double
prosody_praat_period_sd_sec, // 15 // Double
prosody_praat_fraction_locally_unvoiced_frames_100, // Double (%)
prosody_praat_voice_breaks_number, // Integer
prosody_praat_voice_breaks_degree_100, // Double (%)
prosody_praat_jitter_local_100, // Double(%)
prosody_praat_jitter_local_abs_sec, // 20 // Double
prosody_praat_jitter_rap_100, // Double (%)
prosody_praat_jitter_ppq5_100, // Double (%)
prosody_praat_jitter_ddp_100, // Double (%)
prosody_praat_shimmer_local_100, // Double (%)
prosody_praat_shimmer_local_db, // 25 // Double
prosody_praat_shimmer_apq3_100, // Double (%)
prosody_praat_shimmer_apq5_100, // Double (%)
prosody_praat_shimmer_apq11_100, // Double (%)
prosody_praat_shimmer_dda_100, // Double (%)
prosody_praat_harmonicity_mean_autocor, // 30 // Double
prosody_praat_harmonicity_mean_noise_harmonics_ratio, // Double
prosody_praat_harmonicity_mean_harmonics_noise_ratio_db, // Double
prosody_praat_speechrate_duration_sec, // Double
prosody_praat_speechrate_voiced_count, // Integer
prosody_praat_speechrate_syllables_per_sec, // 35 // Double
prosody_praat_intensity_minimum_db, // Double
prosody_praat_intensity_maximum_db, // Double
prosody_praat_intensity_median_db, // Double
prosody_praat_intensity_average_db, // Double
prosody_opensmile_pitch_cat, // 40 // Integer, returns an item of the type SSIPitchValues (see below), use the method getPitchValueName to convert from int to enum value
prosody_opensmile_pitch_direction_cat, // Integer, returns an item of the type SSIPitchDirectionValues (see below), use the method getPitchDirectionValueName to convert from int to enum value
prosody_opensmile_energy_cat, // Integer, returns an item of the type SSIVoiceEnergyValues (see below), use the method getVoiceEnergyValueName to convert from int to enum value
prosody_geneva_F0semitoneFrom55Hz_sma3nz_a_mean, // Double
prosody_geneva_F0semitoneFrom55Hz_sma3nz_stddevNorm, // Double
prosody_geneva_F0semitoneFrom55Hz_sma3nz_percentile20, //45 // Double
prosody_geneva_F0semitoneFrom55Hz_sma3nz_percentile50, // Double
prosody_geneva_F0semitoneFrom55Hz_sma3nz_percentile80, // Double
prosody_geneva_F0semitoneFrom55Hz_sma3nz_pctlrange0_2, // Double
prosody_geneva_UnvoicedSegmentLength_stddev, // Double
prosody_msspeech_keyword, // 50 // String, the keyword (i.e. communicative function) recognized as defined in the SSI grammar for the language set in the options of SSI
// NOTE: use applyStringValue and getStringValue to store/retrieve this feature
prosody_msspeech_confidence, // Double, [0..1] representing the recognition confidence (if "semantics_prolog" option is used in SSI), otherwise -1 (if option "keyword" is used in SSI).
head_position_x, // Double, coordinates in relation to camera
head_position_y, // Double, coordinates in relation to camera
head_orientation_roll, // Double, coordinates from Kinect SDK [-90,90] -91 = invalid value
head_orientation_pitch, // 55 // Double, coordinates from Kinect SDK [-90,90] -91 = invalid value
head_orientation_yaw, // Double, coordinates from Kinect SDK [-90,90] -91 = invalid value
head_focus, // Double [0..1] where 1 = focused head position (centered), 0 = looking away, .. between values possible
head_tilt, // Double [0..1] where 1 = tilted head, 0 = straight head, .. between values possible
head_nod_cat, // Integer, 1 = yes head nod, 0 = no head nod
head_shake_cat, // 60 // Integer, 1 = yes head shake, 0 = no head shake
head_smile, // Double, values 0 - ~100
body_posture_lean, // Double, [0..1] where 1 = front, 0.5 = center, 0 = back
body_arms_openness, // Double [0..1] where 1 = open, 0 = closed, .. between values possible
body_overall_activity, // Double, [0..50?] where the movement is in 30 second timespan
body_hands_energy, // 65 // Double, [0..1?] where it represents the energy of hand movement
body_gesture_arms_open, // Integer, where 1 = present, 0 = not present
body_gesture_arms_crossed, // Integer, where 1 = present, 0 = not present
body_gesture_left_hand_head_touch, // Integer, where 1 = present, 0 = not present
body_gesture_right_hand_head_touch, // Integer, where 1 = present, 0 = not present
body_gesture_lean_front, // 70 // Integer, where 1 = present, 0 = not present
body_gesture_lean_back; // 71 // Integer, where 1 = present, 0 = not present