We want to integrate PicoVoice Rhino for intent recognition.
Our current NLU (Natural Language Understanding) pipeline is:
1. Ask a question
2. Speech to Text (on the robot)
3. Parse the text according to a grammar+target into 'semantics' (a mapping of parameters to values)
4. Take action based on the semantics (ask another question or go do something)
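The four steps above could be sketched as follows; every function name and the hard-coded transcript here are illustrative stubs, not existing APIs:

```python
# Rough sketch of the current NLU pipeline; all names are hypothetical.

def speech_to_text(audio):
    """On-robot speech recognition; stubbed with a fixed transcript."""
    return "bring me the coke from the kitchen"

def parse(text, grammar, target):
    """grammar_parser-style step: map the transcript to 'semantics',
    i.e. a mapping of parameters to values. Stubbed here; a real
    implementation walks the grammar rules for the given target."""
    return {"item": "coke", "source": "kitchen"}

def act(semantics):
    """Take action based on the semantics."""
    return "fetching {item} from the {source}".format(**semantics)

text = speech_to_text(audio=None)                           # step 2
semantics = parse(text, grammar=None, target="bring_item")  # step 3
print(act(semantics))                                       # step 4
```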
Out of these, PicoVoice Rhino handles:
- Speech to Text
- Parsing text into 'semantics': an 'intent' with some parameters (e.g. the intent `bringItem` with parameters/slots specifying which item to bring and from where to where to bring it)
So we no longer have to run our own speech recognition or the grammar parser. We still interpret this information and act on it, of course.
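With Rhino, the speech-to-text and parsing steps collapse into a single inference: an intent name plus a slot mapping. A sketch of adapting such an inference to the semantics the rest of our pipeline already consumes; the `Inference` mock and the adapter function are illustrative, not the real Rhino SDK types:

```python
from collections import namedtuple

# Mock of what Rhino reports after processing audio: whether the
# utterance was understood, the matched intent, and the filled slots.
Inference = namedtuple("Inference", ["is_understood", "intent", "slots"])

def inference_to_semantics(inference):
    """Adapt a Rhino-style inference to the (target, semantics) pair the
    rest of our pipeline already consumes. Hypothetical adapter."""
    if not inference.is_understood:
        return None
    # The intent plays the role of the grammar_parser 'target';
    # the slots play the role of the parsed 'semantics'.
    return inference.intent, dict(inference.slots)

result = inference_to_semantics(
    Inference(is_understood=True, intent="bringItem",
              slots={"item": "coke", "source": "kitchen"}))
print(result)  # ('bringItem', {'item': 'coke', 'source': 'kitchen'})
```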
There is a downside, though: the APIs we've developed around the NLU pipeline work with a grammar that specifies which sentences are acceptable and which words fill which parameters.
That grammar still exists, but it lives on the PicoVoice console.
## PicoVoice concepts
In PicoVoice, there are some concepts to know:
- **Expression**: An Intent can be expressed with different sentences and sentence structures. E.g. 'Get me item A from the kitchen' and 'Go to the kitchen, get me A and bring it to me' have the same meaning and intent, but a very different structure. These expressions are comparable to the grammar definitions the `grammar_parser` uses.
- **Intent**: A way to interpret a user's command, e.g. `bringItem` or `makeCoffee`. Comparable in function to the Target that the `grammar_parser` uses.
- **Slot**: An Intent can fill some slots, e.g. which item to bring from where to where, or what kind of coffee to make. These parametrize the command.
- **Context**: A collection of Intents that have some commonality and relation to each other. Roughly comparable to an overall grammar definition for the `grammar_parser`. Contexts are referred to via a `context_url`.
## TODO
We'll have to map the concepts we've used in conjunction with the `grammar_parser` to their PicoVoice counterparts.
We can't send a grammar and expect it to be recognized. Instead, we have to create Intents (with expressions, slots etc.), gather them into a Context and refer to that Context instead of sending a grammar.
Many of the grammars are not defined/hardcoded in the challenge state machines directly but are imported from `robocup_knowledge`, which could save some work here.

- Replace our use of grammars within the RoboCup challenges with `context_url`s and intents
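One possible shape for that mapping is a simple lookup from a `grammar_parser` target to the PicoVoice context and intent covering the same command. All URLs and target names below are placeholders for illustration, not real contexts:

```python
# Hypothetical lookup from a grammar_parser target to the PicoVoice
# (context_url, intent) pair that covers the same command.
TARGET_TO_PICOVOICE = {
    "bring_item": ("https://example.org/contexts/gpsr.rhn", "bringItem"),
    "make_coffee": ("https://example.org/contexts/kitchen.rhn", "makeCoffee"),
}

def resolve(target):
    """Return the (context_url, intent) pair for a grammar_parser target,
    or None if no PicoVoice context covers it yet."""
    return TARGET_TO_PICOVOICE.get(target)

print(resolve("bring_item"))
```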
## Integration with Challenges
Because the grammar-based HmiQuery API is still quite useful and is used with e.g. Telegram and other HMI servers, it may be better to create a second API that reflects that PicoVoice (and other similar services) take care of a larger part of the NLU pipeline.
Both APIs are useful at the same time. Ideally, we can use the `hmi` framework to query the user via both Telegram and PicoVoice simultaneously.
Since many of the grammars are already defined in `robocup_knowledge`, maybe we can make the connection between the grammar+target for the `grammar_parser` and the intent and `context_url` for PicoVoice?
We might even be able to generate the .yaml files that PicoVoice can import to define a `Context`. That would allow us to 'compile' a grammar for PicoVoice and thus have a single source of truth.
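A minimal sketch of such a 'compile' step, rendering a toy grammar description as YAML-like text. The output schema here is invented for illustration; the real file layout would have to match what the PicoVoice console actually imports and exports:

```python
def grammar_to_context_yaml(intents, slots):
    """Render a toy grammar description as YAML-ish text.
    `intents` maps intent name -> list of expression strings;
    `slots` maps slot type -> list of allowed values.
    The schema below is a guess, not the official Rhino format."""
    lines = ["context:", "  expressions:"]
    for intent, expressions in intents.items():
        lines.append("    %s:" % intent)
        lines += ['      - "%s"' % e for e in expressions]
    lines.append("  slots:")
    for slot, values in slots.items():
        lines.append("    %s:" % slot)
        lines += ["      - %s" % v for v in values]
    return "\n".join(lines)

print(grammar_to_context_yaml(
    {"bringItem": ["bring me $item:item from the $location:source"]},
    {"item": ["coke", "tea"], "location": ["kitchen", "living room"]}))
```

Generating this file from `robocup_knowledge` would keep the grammars there as the single source of truth.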