SSI Greta Integration
SSI is integrated in Greta via the SSITranslator module in modular.
The SSITranslator module listens to the ActiveMQ topic named "SSI" and, while SSI is running and publishing its output on that topic, receives SSI-XML inputs every 100ms (this interval can be changed in the SSI configuration).
The SSI-XML input is parsed and transformed by this module into an SSIFrame that can be used within Greta.
The SSIFrame class provides methods to access the SSI features it contains, depending on the feature type (i.e. integer, double or string).
The complete list of available features, their types, value ranges and access modes is documented in SSITypes.
In this document you can find information about:
- #Setup Modular: how to set up modular for using SSITranslator (i.e. receive inputs from SSI and translate those XML files into SSIFrames)
- #SSITranslator Documentation: the documentation of the SSITranslator module and the associated classes/types, in order to use the produced SSIFrames in your own project.
- #SSISender Documentation: SSISender is a helper module that can be used to test an SSI-XML input without running SSI.
There is a modular configuration that already includes the SSITranslator for testing purposes.
You can open this configuration from modular; it is located in: <GRETA_DIR>/bin/SSITest.xml
If you want to setup SSI in your own modular configuration you can add the SSITranslator module by selecting:
Add->NetworkConnections->ActiveMQ->Receivers->SSI XML-to-Frame Translator
You do not need to connect this module with any other component in modular.
In the options (click on the module in modular) you can set the Host, Port and Destination Topic to connect and listen to the correct ActiveMQ SSI topic (i.e. where SSI is sending the XML input files).
An SSISender module is also included in this configuration. See below for further details.
OPTIONAL: you can add an SSIFilter to selectively parse the SSI-XML file received via ActiveMQ.
In order to add this module select:
Add->NetworkConnections->ActiveMQ->Receivers->SSI XML Filter
Then connect it to the SSITranslator component in modular. In the options you can choose which part of the XML file should be parsed, among the following options: All | Prosody Only | Head Only | Body Only
In order to use the output of the SSITranslator module, your module must implement the interface SSIFramePerformer. In this way your module can receive the SSIFrames emitted by the SSITranslator module.
Once your module is ready, you need to connect (in modular) the SSITranslator module to yours. Once connected, your module will start to receive SSIFrames as long as they are produced (either when SSI is running or when you send a test SSI-XML file via the SSISender module).
The following snippet shows an example implementation of the SSIFramePerformer interface (wrapped here in an illustrative module class), including examples of access to SSI features.
As noted above, the complete list of available features, their types, value ranges and access modes is documented in SSITypes, and the SSIFrame class provides the typed getters (integer, double or string) used below.
// Hypothetical wrapper class implementing the SSIFramePerformer interface
// (imports of List and of SSIFrame, SSIFramePerformer, SSITypes and ID from the Greta packages are omitted)
public class MySSIModule implements SSIFramePerformer {

    public void performSSIFrames(List<SSIFrame> ssi_frames_list, ID requestId) {
        for (SSIFrame ssf : ssi_frames_list) {
            performSSIFrame(ssf, requestId);
        }
    }

    // In this method a single SSIFrame is performed (i.e. accessed and used)
    public void performSSIFrame(SSIFrame ssi_frame, ID requestId) {
        // Example of access to an integer feature
        int pulsesNumber = ssi_frame.getIntValue(SSITypes.SSIFeatureNames.prosody_praat_pulses_number);
        // Example of access to a double feature
        double bodyPostureLean = ssi_frame.getDoubleValue(SSITypes.SSIFeatureNames.body_posture_lean);
        // Example of access to a string feature
        String keyword = ssi_frame.getStringValue(SSITypes.SSIFeatureNames.prosody_msspeech_keyword);
    }
}
This module emulates a running instance of SSI and allows the developer to run tests without launching SSI.
It can be used for testing purposes to send SSI-XML input files to the ActiveMQ topic where the SSITranslator is listening, so that the XML file is parsed.
If not already loaded in your modular configuration, an SSISender module can be added by selecting:
Add->NetworkConnections->ActiveMQ->Senders->SSISender
There is no need to connect this component in modular with any other component. However, you need to have an SSITranslator module loaded, and in the options you need to set the same Host, Port and Destination Topic as in the SSITranslator module.
In order to test SSI-XML files you need to add another module named TextEditor and connect it to the SSISender module.
This module can be added by selecting:
Add->NetworkConnections->ActiveMQ->Text Editor
An example SSI-XML file is provided: you can open it in the TextEditor (File->Open) and then send it to the SSITranslator using the Send button on the right.
The sample file can be found in: <GRETA_DIR>/bin/Examples/SSI/SSI-Sample-Input.xml
This class represents an SSIFrame emitted by the SSITranslator.
The important methods to use are the getters. The SSI features described in SSITypes are typed differently, and depending on the type the appropriate method needs to be used to retrieve the respective value from the SSIFrame.
SSITypes defines the SSIFeatureNames values that must be passed as input to the following methods.
There are currently 3 supported types, which can be retrieved with the following getters:
- int: public int getIntValue(SSIFeatureNames which)
- double: public double getDoubleValue(SSIFeatureNames which)
- String: public String getStringValue(SSIFeatureNames which)
The corresponding setters are:
- int: public void applyValue(SSIFeatureNames which, int value)
- double: public void applyValue(SSIFeatureNames which, double value)
- String: public void applyValue(SSIFeatureNames which, String value)
This is an example usage of a getter:
int headNod = ssi_frame.getIntValue(SSITypes.SSIFeatureNames.head_nod_cat);
This is an example usage of a setter:
SSIFrame frame = new SSIFrame();
double confidence = 0.5d;
frame.applyValue(SSIFeatureNames.prosody_msspeech_confidence, confidence);
This interface defines all the features that are contained in an SSIFrame. The list of features is defined in the enum SSIFeatureNames in the SSITypes class.
See the complete list of features at the end of this page for a copy of this list (note that the copy might not be up to date).
Each name is commented with the respective type of the value. Possible types are int, double and String.
The comment also includes the range of values for each feature. Some values range over the continuous interval [0,1], while others are categorical (i.e. 0 or 1, corresponding to "no" or "yes").
The names of features follow a precise naming convention that recalls the SSI-XML input file structure. The file has 3 main parts: prosody, head and body. The prosody part has several sub-parts (e.g. voice, praat, opensmile, etc.).
Therefore, a feature name follows this naming convention:
<XML Main Part>_<XML Sub Part>_<Feature>_<Suffix>
The prefix is <XML Main Part>_<XML Sub Part>; it depends on the parts present in the SSI-XML input file.
The middle part is the feature name itself; there are currently 71 features, all listed in the SSITypes file.
For example, in prosody_praat_pitch_mean_hz the main part is prosody, the sub-part is praat, the feature is pitch_mean and the suffix hz indicates the unit of measure.
The suffix of a feature name contains information about the stored value. These are some examples:
- Categorical values: when a feature has a categorical value, the name ends with "cat".
- Percentages: the name ends with "100"
- Unit of measure: the name ends with "db" or "hz"
- Probability: the name ends with "prob"
The interface SSITypes offers two helper methods:
public static SSIFeatureNames getFeatureName(int ordinal)
public static boolean isStringFeature(int ordinal)
- getFeatureName: returns the feature name (as an SSIFeatureNames value) given the integer indicating its position in the enumeration.
- isStringFeature: takes the ordinal corresponding to a feature listed in the enum and returns true if that feature has a String value, false otherwise.
For example:
boolean isString = SSITypes.SSIFeatureNames.isStringFeature(SSITypes.SSIFeatureNames.prosody_msspeech_keyword.ordinal());
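As a further illustration, here is a minimal sketch of getFeatureName; it mirrors the call path of the isStringFeature example above (where exactly the helper is reachable should be checked in the code), and going through ordinal() avoids assuming any particular index:
// Recover the enum constant from its position in the enumeration
SSITypes.SSIFeatureNames name = SSITypes.SSIFeatureNames.getFeatureName(SSITypes.SSIFeatureNames.prosody_msspeech_keyword.ordinal());
// name now holds prosody_msspeech_keyword, which is a String feature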
The following constants are used when, respectively, an int, double or String value has not been set in an SSIFrame, or when an invalid value is read from the SSI-XML input file:
- int: INVALID_OR_EMPTY_INTEGER_FEATURE_VALUE = -1;
- double: INVALID_OR_EMPTY_DOUBLE_FEATURE_VALUE = -1.0d;
- String: INVALID_OR_EMPTY_STRING_FEATURE_VALUE = "N/D";
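As a minimal sketch, a read of an integer feature can be guarded as follows (the class on which the constant is defined is an assumption here and should be checked in the code):
// Read an integer feature and use it only if it was actually set in the SSI-XML input
int pulses = ssi_frame.getIntValue(SSITypes.SSIFeatureNames.prosody_praat_pulses_number);
if (pulses != SSITypes.INVALID_OR_EMPTY_INTEGER_FEATURE_VALUE) { // assumed location of the constant
    // the feature was present and valid: safe to use pulses here
}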
There are 3 SSI features that yield special values. These features are:
- Pitch: prosody_opensmile_pitch_cat
- Pitch Direction: prosody_opensmile_pitch_direction_cat
- Pitch Energy: prosody_opensmile_energy_cat
The following methods convert these categorical integer values into the corresponding named values (see the example below):
- SSIPitchValues getPitchValueName(int cat): takes the categorical value in input and returns the corresponding category among: none | low | normal | high
- SSIPitchDirectionValues getPitchDirectionValueName(int cat): takes the categorical value in input and returns the corresponding category among: none | rise | fall | rise_fall | fall_rise
- SSIVoiceEnergyValues getVoiceEnergyValueName(int cat): takes the categorical value in input and returns the corresponding category among: none | low | medium | high
// assuming that frame is the SSIFrame containing the values that we want to print
System.out.println(
    "Opensmile pitch: " + SSITypes.SSIPitchValues.getPitchValueName(frame.getIntValue(SSIFeatureNames.prosody_opensmile_pitch_cat)) + "\n" +
    "Opensmile pitch direction: " + SSITypes.SSIPitchDirectionValues.getPitchDirectionValueName(frame.getIntValue(SSIFeatureNames.prosody_opensmile_pitch_direction_cat)) + "\n" +
    "Opensmile voice energy: " + SSITypes.SSIVoiceEnergyValues.getVoiceEnergyValueName(frame.getIntValue(SSIFeatureNames.prosody_opensmile_energy_cat)));
These are the features currently defined in the enum SSIFeatureNames in the SSITypes class.
null_ssi, // This is a dummy value used when no feature is found (see method getFeatureName below)
prosody_voice_activity, // 1 // Integer, 0 = false, 1 = true
prosody_voice_systemtime, // Integer, not used at the moment
prosody_voice_duration, // Integer, not used at the moment
prosody_voice_speech_prob, // Double, probability voice activity is speech [0-1]
prosody_voice_laughter_prob, // 5 // Double, probability voice activity is laughter (1 – speech)
prosody_praat_pitch_median_hz, // Double
prosody_praat_pitch_mean_hz, // Double
prosody_praat_pitch_sd_hz, // Double
prosody_praat_pitch_min_hz, // Double
prosody_praat_pitch_max_hz, // 10 // Double
prosody_praat_pulses_number, // Integer
prosody_praat_pulses_per_sec, // Double
prosody_praat_periods_number, // Integer
prosody_praat_period_mean_sec, // Double
prosody_praat_period_sd_sec, // 15 // Double
prosody_praat_fraction_locally_unvoiced_frames_100, // Double (%)
prosody_praat_voice_breaks_number, // Integer
prosody_praat_voice_breaks_degree_100, // Double (%)
prosody_praat_jitter_local_100, // Double(%)
prosody_praat_jitter_local_abs_sec, // 20 // Double
prosody_praat_jitter_rap_100, // Double (%)
prosody_praat_jitter_ppq5_100, // Double (%)
prosody_praat_jitter_ddp_100, // Double (%)
prosody_praat_shimmer_local_100, // Double (%)
prosody_praat_shimmer_local_db, // 25 // Double
prosody_praat_shimmer_apq3_100, // Double (%)
prosody_praat_shimmer_apq5_100, // Double (%)
prosody_praat_shimmer_apq11_100, // Double (%)
prosody_praat_shimmer_dda_100, // Double (%)
prosody_praat_harmonicity_mean_autocor, // 30 // Double
prosody_praat_harmonicity_mean_noise_harmonics_ratio, // Double
prosody_praat_harmonicity_mean_harmonics_noise_ratio_db, // Double
prosody_praat_speechrate_duration_sec, // Double
prosody_praat_speechrate_voiced_count, // Integer
prosody_praat_speechrate_syllables_per_sec, // 35 // Double
prosody_praat_intensity_minimum_db, // Double
prosody_praat_intensity_maximum_db, // Double
prosody_praat_intensity_median_db, // Double
prosody_praat_intensity_average_db, // Double
prosody_opensmile_pitch_cat, // 40 // Integer, returns an item of the type SSIPitchValues (see below), use the method getPitchValueName to convert from int to enum value
prosody_opensmile_pitch_direction_cat, // Integer, returns an item of the type SSIPitchDirectionValues (see below), use the method getPitchDirectionValueName to convert from int to enum value
prosody_opensmile_energy_cat, // Integer, returns an item of the type SSIVoiceEnergyValues (see below), use the method getVoiceEnergyValueName to convert from int to enum value
prosody_geneva_F0semitoneFrom55Hz_sma3nz_a_mean, // Double
prosody_geneva_F0semitoneFrom55Hz_sma3nz_stddevNorm, // Double
prosody_geneva_F0semitoneFrom55Hz_sma3nz_percentile20, //45 // Double
prosody_geneva_F0semitoneFrom55Hz_sma3nz_percentile50, // Double
prosody_geneva_F0semitoneFrom55Hz_sma3nz_percentile80, // Double
prosody_geneva_F0semitoneFrom55Hz_sma3nz_pctlrange0_2, // Double
prosody_geneva_UnvoicedSegmentLength_stddev, // Double
prosody_msspeech_keyword, // 50 // String, the keyword (i.e. communicative function) recognized as defined in the SSI grammar for the language set in the options of SSI
// NOTE: use applyStringValue and getStringValue to store/retrieve this feature
prosody_msspeech_confidence, // Double, [0..1] representing the recognition confidence (if "semantics_prolog" option is used in SSI), otherwise -1 (if option "keyword" is used in SSI).
head_position_x, // Double, coordinates in relation to camera
head_position_y, // Double, coordinates in relation to camera
head_orientation_roll, // Double, coordinates from Kinect SDK [-90,90] -91 = invalid value
head_orientation_pitch, // 55 // Double, coordinates from Kinect SDK [-90,90] -91 = invalid value
head_orientation_yaw, // Double, coordinates from Kinect SDK [-90,90] -91 = invalid value
head_focus, // Double [0..1] where 1 = focused head position (centered), 0 = looking away, .. between values possible
head_tilt, // Double [0..1] where 1 = tilted head, 0 = straight head, .. between values possible
head_nod_cat, // Integer, 1 = yes head nod, 0 = no head nod
head_shake_cat, // 60 // Integer, 1 = yes head shake, 0 = no head shake
head_smile, // Double, values 0 - ~100
body_posture_lean, // Double, [0..1] where 1 = front, 0.5 = center, 0 = back
body_arms_openness, // Double [0..1] where 1 = open, 0 = closed, .. between values possible
body_overall_activity, // Double, [0..50?] where the movement is in 30 second timespan
body_hands_energy, // 65 // Double, [0..1?] where it represents the energy of hand movement
body_gesture_arms_open, // Integer, where 1 = present, 0 = not present
body_gesture_arms_crossed, // Integer, where 1 = present, 0 = not present
body_gesture_left_hand_head_touch, // Integer, where 1 = present, 0 = not present
body_gesture_right_hand_head_touch, // Integer, where 1 = present, 0 = not present
body_gesture_lean_front, // 70 // Integer, where 1 = present, 0 = not present
body_gesture_lean_back; // 71 // Integer, where 1 = present, 0 = not present