We want to integrate PicoVoice Rhino for intent recognition.
Our current NLU (Natural Language Understanding) pipeline is:
1. Ask a question
2. Speech to Text (on the robot)
3. Parse the text according to a grammar+target into 'semantics' (a mapping of parameters to values)
4. Take action based on the semantics (ask another question or go do something)
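The four steps above could be sketched as follows; every function name and the hard-coded transcript here are illustrative stubs, not existing APIs:

```python
# Rough sketch of the current NLU pipeline; all names are hypothetical.

def speech_to_text(audio):
    """On-robot speech recognition; stubbed with a fixed transcript."""
    return "bring me the coke from the kitchen"

def parse(text, grammar, target):
    """grammar_parser-style step: map the transcript to 'semantics',
    i.e. a mapping of parameters to values. Stubbed here; a real
    implementation walks the grammar rules for the given target."""
    return {"item": "coke", "source": "kitchen"}

def act(semantics):
    """Take action based on the semantics."""
    return "fetching {item} from the {source}".format(**semantics)

text = speech_to_text(audio=None)                           # step 2
semantics = parse(text, grammar=None, target="bring_item")  # step 3
print(act(semantics))                                       # step 4
```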
Out of these, PicoVoice Rhino handles:
- Speech to Text
- Parsing text into 'semantics': an 'intent' with some parameters (e.g. the intent `bringItem` with parameters/slots specifying which item to bring and from where to where to bring it)
So we no longer have to run our own speech recognition or the grammar parser. We still interpret this information and act on it, of course.
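With Rhino, the speech-to-text and parsing steps collapse into a single inference: an intent name plus a slot mapping. A sketch of adapting such an inference to the semantics the rest of our pipeline already consumes; the `Inference` mock and the adapter function are illustrative, not the real Rhino SDK types:

```python
from collections import namedtuple

# Mock of what Rhino reports after processing audio: whether the
# utterance was understood, the matched intent, and the filled slots.
Inference = namedtuple("Inference", ["is_understood", "intent", "slots"])

def inference_to_semantics(inference):
    """Adapt a Rhino-style inference to the (target, semantics) pair the
    rest of our pipeline already consumes. Hypothetical adapter."""
    if not inference.is_understood:
        return None
    # The intent plays the role of the grammar_parser 'target';
    # the slots play the role of the parsed 'semantics'.
    return inference.intent, dict(inference.slots)

result = inference_to_semantics(
    Inference(is_understood=True, intent="bringItem",
              slots={"item": "coke", "source": "kitchen"}))
print(result)  # ('bringItem', {'item': 'coke', 'source': 'kitchen'})
```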
There is a downside, though: the APIs we've developed around the NLU pipeline work with a grammar that specifies which sentences are acceptable and which words fill which parameters.
That grammar still exists, but it lives on the PicoVoice console.
## PicoVoice concepts
In PicoVoice, there are some concepts to know:
- **Expression**: An Intent can be expressed with different sentences and sentence structures. E.g. 'Get me item A from the kitchen' and 'Go to the kitchen, get me A and bring it to me' have the same meaning and intent, but a very different structure. These expressions are comparable to the grammar definitions the `grammar_parser` uses.
- **Intent**: A way to interpret a user's command, e.g. `bringItem` or `makeCoffee`. Comparable in function to the Target that the `grammar_parser` uses.
- **Slot**: An Intent can fill some slots, e.g. which item to bring from where to where, or what kind of coffee to make. These parametrize the command.
- **Context**: A collection of Intents that have some commonality and relation to each other. Roughly comparable to an overall grammar definition for the `grammar_parser`. Contexts are referred to via a `context_url`.
## TODO
We'll have to map the concepts we've used in conjunction with the `grammar_parser` to their PicoVoice counterparts.
We can't send a grammar and expect it to be recognized. Instead, we have to create Intents (with expressions, slots etc.), gather them into a Context and refer to that Context instead of sending a grammar.
Many of the grammars are not defined/hardcoded in the challenge state machines directly but are imported from `robocup_knowledge`, which could save some work here.

- Replace our use of grammars within the RoboCup challenges with `context_url`s and intents
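One possible shape for that mapping is a simple lookup from a `grammar_parser` target to the PicoVoice context and intent covering the same command. All URLs and target names below are placeholders for illustration, not real contexts:

```python
# Hypothetical lookup from a grammar_parser target to the PicoVoice
# (context_url, intent) pair that covers the same command.
TARGET_TO_PICOVOICE = {
    "bring_item": ("https://example.org/contexts/gpsr.rhn", "bringItem"),
    "make_coffee": ("https://example.org/contexts/kitchen.rhn", "makeCoffee"),
}

def resolve(target):
    """Return the (context_url, intent) pair for a grammar_parser target,
    or None if no PicoVoice context covers it yet."""
    return TARGET_TO_PICOVOICE.get(target)

print(resolve("bring_item"))
```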
## Integration with Challenges
Because the grammar-based HmiQuery API is still quite useful and is used with e.g. Telegram and other HMI servers, it may be better to create a second API that reflects that PicoVoice (and other similar services) take care of a larger part of the NLU pipeline.
Both APIs are useful at the same time. Ideally, we can use the `hmi` framework to query the user via both Telegram and PicoVoice simultaneously.
Since many of the grammars are already defined in `robocup_knowledge`, maybe we can make the connection between the grammar+target for the `grammar_parser` and the intent and `context_url` for PicoVoice?
We might even be able to generate the .yaml files that PicoVoice can import to define a `Context`. That would allow us to 'compile' a grammar for PicoVoice and thus have a single source of truth.
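A minimal sketch of such a 'compile' step, rendering a toy grammar description as YAML-like text. The output schema here is invented for illustration; the real file layout would have to match what the PicoVoice console actually imports and exports:

```python
def grammar_to_context_yaml(intents, slots):
    """Render a toy grammar description as YAML-ish text.
    `intents` maps intent name -> list of expression strings;
    `slots` maps slot type -> list of allowed values.
    The schema below is a guess, not the official Rhino format."""
    lines = ["context:", "  expressions:"]
    for intent, expressions in intents.items():
        lines.append("    %s:" % intent)
        lines += ['      - "%s"' % e for e in expressions]
    lines.append("  slots:")
    for slot, values in slots.items():
        lines.append("    %s:" % slot)
        lines += ["      - %s" % v for v in values]
    return "\n".join(lines)

print(grammar_to_context_yaml(
    {"bringItem": ["bring me $item:item from the $location:source"]},
    {"item": ["coke", "tea"], "location": ["kitchen", "living room"]}))
```

Generating this file from `robocup_knowledge` would keep the grammars there as the single source of truth.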