Use your computer with Voice and Joy-Con
- Replace Keyboard -> Voice Typing + Joy-Con Button (Video Demo)
- Replace Mouse -> Motion Control + Joy-Con Stick (Video Demo)
For those who want to:
- Avoid maintaining the typing posture.
- Play PC games with motion control.
1. The Mode concept
The idea comes from Vim: the tool can be configured with multiple modes, such as normal, speech and motion control. For example, hold down the shoulder button ZR to enter gyro mode and rotate the Joy-Con to move the mouse cursor; release it to get back to normal mode.
The Joy-Con has a very limited number of buttons, but a button can trigger different actions in different modes. With more than 20 buttons across both sides, 20 buttons x 20 modes gives, in theory, 400 different actions. Bind the most frequently used actions in the default mode; you could even bind all 26 letters of the alphabet across 2 modes and type with buttons without using voice at all.
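To make the mode idea concrete, here is a minimal Go sketch of a mode-aware button lookup: the same button resolves to a different action depending on the current mode, and holding a button (ZR in the text above) temporarily switches modes. The types `Mode`, `Button` and `Dispatcher` and the action strings are hypothetical illustrations, not joy-typing's actual code.

```go
package main

import "fmt"

// Mode and Button are plain strings in this hypothetical sketch.
type Mode string
type Button string

// Dispatcher looks up an action by (current mode, button), so the same
// button can mean different things in different modes.
type Dispatcher struct {
	current  Mode
	fallback Mode // mode to return to when the held button is released
	bindings map[Mode]map[Button]string
}

// Press returns the action bound to b in the current mode, or "" if none.
func (d *Dispatcher) Press(b Button) string {
	return d.bindings[d.current][b]
}

// Hold and Release model "hold ZR to enter gyro mode, release to go back".
func (d *Dispatcher) Hold(m Mode) { d.current = m }
func (d *Dispatcher) Release()    { d.current = d.fallback }

func main() {
	d := &Dispatcher{
		current:  "normal",
		fallback: "normal",
		bindings: map[Mode]map[Button]string{
			"normal": {"A": "type a", "B": "type b"},
			"gyro":   {"A": "left click", "B": "right click"},
		},
	}
	fmt.Println(d.Press("A")) // type a
	d.Hold("gyro")            // ZR held down
	fmt.Println(d.Press("A")) // left click
	d.Release()               // ZR released
	fmt.Println(d.Press("A")) // type a
}
```

A nested map keeps each lookup cheap, and the number of reachable actions grows as buttons x modes.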
2. Word Mapping
Speech words can be mapped to different actions like:
- Replacement:
  dash -> -
  twenty twenty two -> 2022
- Decoration:
  snake hello world -> hello_world
  camel hello world -> helloWorld
- Hotkey:
  control alt delete -> task explorer launched
- Run shell script:
  launch browser -> brave-browser --no-sandbox www.test.com
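A minimal sketch of the two decorations above, assuming ASCII input and that the decorator name is the first spoken word; `snake`, `camel` and `decorate` are hypothetical helpers, not the project's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// snake joins words with underscores: "hello world" -> "hello_world".
func snake(text string) string {
	return strings.Join(strings.Fields(text), "_")
}

// camel keeps the first word as is and capitalizes the first letter of each
// following word (ASCII only): "hello world" -> "helloWorld".
func camel(text string) string {
	words := strings.Fields(text)
	for i, w := range words {
		if i > 0 {
			words[i] = strings.ToUpper(w[:1]) + w[1:]
		}
	}
	return strings.Join(words, "")
}

// decorate applies a decorator when the first spoken word names one;
// otherwise the text is returned unchanged.
func decorate(speech string) string {
	decorators := map[string]func(string) string{"snake": snake, "camel": camel}
	first, rest, found := strings.Cut(speech, " ")
	if f, ok := decorators[first]; ok && found {
		return f(rest)
	}
	return speech
}

func main() {
	fmt.Println(decorate("snake hello world")) // hello_world
	fmt.Println(decorate("camel hello world")) // helloWorld
}
```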
3. Limit the dictionary for better accuracy
VOSK is used as the backend recognition engine. With its phrase_list parameter it's possible to use a small dictionary; for instance, when the dictionary is limited to the alphabet `a`~`z`, `c` will never be recognized as `sea` or `see`.
For programming, switch to a speech mode with a limited dictionary for typing keywords, punctuation and numbers, and use another mode with an unlimited dictionary for variables and comments. When you find a conflict, resolve it by adding a word mapping.
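As I understand vosk-server's websocket protocol, the client sends a JSON text frame with a `config` object (containing `sample_rate` and an optional `phrase_list`) before streaming any audio. The sketch below only builds that message; the 16000 Hz rate and the helper names are assumptions, and joy-typing may assemble it differently. Leaving `phrase_list` out should give the model's full vocabulary, i.e. the unlimited-dictionary case.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// voskConfig mirrors the "config" object that vosk-server's websocket
// endpoint reads from the first text message (field names assumed from the
// vosk-server example server; check the version you run).
type voskConfig struct {
	SampleRate float64  `json:"sample_rate"`
	PhraseList []string `json:"phrase_list,omitempty"` // nil = full vocabulary
}

// configMessage builds the JSON text frame that is sent before any audio.
func configMessage(rate float64, phrases []string) (string, error) {
	b, err := json.Marshal(map[string]voskConfig{
		"config": {SampleRate: rate, PhraseList: phrases},
	})
	return string(b), err
}

func main() {
	msg, err := configMessage(16000, []string{"a", "b", "c"})
	if err != nil {
		panic(err)
	}
	fmt.Println(msg)
	// {"config":{"sample_rate":16000,"phrase_list":["a","b","c"]}}
}
```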
1. Connect Joy-Con to PC via Bluetooth
- From the system Bluetooth manager, click `scan new device`.
- Hold the sync button for 2 seconds until the lights start blinking.
- Find -> Pair -> Connect it in the Bluetooth manager.
2. Install Docker
- Windows: Download and run the installer from https://docs.docker.com/desktop/install/windows-install/. If it prompts something like `download and install WSL 2 Linux kernel upgrade package`, just install it.
- Linux: `apt install docker.io`
- Mac: I have zero experience with Mac, and there is no prebuilt binary for it, but the steps should be similar to Windows/Linux. All the dependency packages claim to support Mac; check the next section to build it from source.
3. Run VOSK server
docker run -d -p 2701:2701 aj3423/vosk_lgraph:latest
4. Download the prebuilt binary from the release page
- Install the dependency hidapi
  - Windows: Download the release file from https://github.com/libusb/hidapi/releases, which contains the header, lib and dll.
  - Linux: `apt install libhidapi-dev`
- Install golang from https://go.dev/dl/
- Clone this repo:
  git clone https://github.com/aj3423/joy-typing
- Go to the 'main' directory and build:
  cd joy-typing/main
  go build .
- Keeps disconnecting right after connecting
The Joy-Con can ONLY be paired with one device at a time. Once you attach it back to the Switch console for charging, it is automatically re-paired with the console, and you'll have to remove it in the system Bluetooth manager and pair it again. I tried some hacky ways, like attaching it to the console while it was shutting down or powering up, to get it charged but not re-paired. I succeeded only once by accident and can't remember how, so I ended up using a dedicated charging cable.
- No sound input or inaccurate recognition
Diagnose with this tool: vosk-sound-test
It can play back your voice and save it to a .wav file, so you can verify that the sound quality is as expected.
Usage:
1) docker run -d -p 2701:2701 aj3423/vosk_lgraph:latest
2) Download the binary from the release page
3) ./vosk-sound-test -host "127.0.0.1:2701"
4) Say something, press Enter to play it back, and press Enter again to save it to a .wav file.
- Unexpected behavior of Joy-Con
When both side buttons (SL + SR) are pressed, the Joy-Con enters an interesting mode, maybe the mode it uses when attached. Check the lights: if the first light stays on and the other three are off, just reconnect it and make sure not to press both side buttons. BTW, in this mode the buttons are bound to some system operations, for example on the Joy-Con-Right:
- +: toggle the events below on/off
- R: mouse right click
- B: `Esc`
- Home: system volume down
- Stick button: system volume up
- Stick spin: mouse move
...
The file config.toml is generated at the first launch; the program monitors it for modifications and applies new changes on the fly. The sections:
1. Mode
A mode id must be assigned with the parameter -id; it can be any string as long as it doesn't conflict with another id. The first mode in the list is used as the default mode.
Mode Type | Description | Parameters |
---|---|---|
[idle] | do nothing, normally used as default mode | -id modeId |
[gyro] | enable/disable the gyroscope on enter/exit | -id modeId |
[speech] | start/stop capturing audio input on enter/exit | -id modeId<br>-host backend engine URL, default: 127.0.0.1:2701. This backend uses a 128M model; there is also a 1.8GB docker image that consumes more memory but gives better accuracy. Install it with docker run -d -p 2700:2700 alphacep/kaldi-en:latest and set this param as '-host 127.0.0.1:2700'. That model doesn't allow a dynamic phrase_list and should only be used in sentence mode.<br>-phrase array of phrase ids configured in the PhraseList section, e.g. '-phrase punctuation java cpp'<br>-flushonexit fire a flush event on mode exit to get the recognition result quicker, see the action [flush] below |
2. Mode Rule
A Mode does very little; the jobs are done by mode rules. There are two types of rules:
trigger -> action
trigger: a one-time event like a button press. When an input matches the trigger condition, the corresponding action is performed.
switch -> modifier
switch: it can be turned on and off; while it's on, the modifier is applied to the input signal.
e.g. [switch] button -id R -> [boost] -speed 3 means that while button R is down, the cursor moves 3 times faster.
Some examples:
- [trigger] stick -side Right -> [cursor] -speed 40
  Spin the right stick to move the mouse cursor.
- [trigger] button -id R-SR -> [hotkey] -keys t control alt
  Press button R-SR to trigger the hotkey "ctrl+alt+t".
- [switch] button -id A -> [prefix] -prefix "[camel] [title] "
  While button A is down, speech text is converted to camel+title case, e.g. "hello world again" -> "HelloWorldAgain".
- [switch] button -id R -> [mode] -id MouseMode
  Switch to MouseMode by holding R; release R to get back to the default mode.
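The difference between a trigger and a switch can be sketched like this: the stick trigger produces a cursor move for every sample, while the R switch only changes how that move is scaled for as long as R is held. `cursorSpeed` and its fields are hypothetical; the values 40 and 3 come from the examples above.

```go
package main

import "fmt"

// cursorSpeed models a [cursor] -speed action combined with a
// [switch] button -> [boost] rule (hypothetical names).
type cursorSpeed struct {
	base       float64         // [cursor] speed
	multiplier float64         // boost factor while the switch is on
	held       map[string]bool // current switch states, keyed by button id
	boostOn    string          // button that switches the boost on
}

// Button records a switch state change: down = on, up = off.
func (c *cursorSpeed) Button(id string, down bool) { c.held[id] = down }

// Step is the trigger side: one stick sample of magnitude v becomes a cursor
// move, scaled by the boost only while the switch is on.
func (c *cursorSpeed) Step(v float64) float64 {
	speed := c.base
	if c.held[c.boostOn] {
		speed *= c.multiplier
	}
	return v * speed
}

func main() {
	c := &cursorSpeed{base: 40, multiplier: 3, held: map[string]bool{}, boostOn: "R"}
	fmt.Println(c.Step(1)) // 40
	c.Button("R", true)
	fmt.Println(c.Step(1)) // 120
	c.Button("R", false)
	fmt.Println(c.Step(1)) // 40
}
```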
Note: most parameters are set with a single dash: -text hello; use a double dash for boolean parameters: --number=false; use space-separated strings for array types: -map a b c. Special characters must be wrapped in double quotes, such as "-".
trigger Type | Description | Parameters |
---|---|---|
[button] | button down/up event | -id buttonId: Y, X, B, A, R-SR, R-SL, R, ZR, -, +, RStick, LStick, Home, Capture, ChargingGrip, Down, Up, Right, Left, L-SR, L-SL, L, ZL<br>Note: a double quote is required for the button "-" |
[stick] | stick spinning event | -side which Joy-Con, "Left" or "Right" |
[gyro] | when the gyroscope is enabled | |
[speech] | when the voice is recognized and returned as text | |
action Type | Description | Parameters |
---|---|---|
[cursor] | move the mouse cursor | -speed cursor move speed, float: >1 to increase, <1 to decrease |
[click] | mouse click | -button "left", "center", "right", "wheelDown", "wheelUp", "wheelLeft", "wheelRight", default: "left"<br>-double double click, default: false |
[hotkey] | single key press or key combination | -keys array of keys, e.g. "-keys enter" or "-keys t control alt"<br>Note: "t" first, then the "control alt" key list |
[notify] | show a system notification | -title title string<br>-text text body<br>-icon path of the icon |
[speech] | execute a speech; words in a sentence can be executed in different ways, configured in the WordMapping section | -number convert number words to numeric digits, e.g. "twenty twenty two" -> "2022", default: true<br>-nospace remove spaces between words, for programming, default: true<br>-typing type the word if no other mapped executor (like a hotkey) handles it, default: true<br>-map an array of group ids in WordMapping; these mapping groups are used to handle the words, see that section for details, e.g. "-map desktop_hotkey golang python" |
[speak] | used for complex tasks that cannot be done in a single action; works by simulating a speech text, which is then handled by the [speech] action above | -text speech text to be executed |
[flush] | currently works by sending a chunk of zero data to the speech engine; the engine may treat the zeroes as a long period of silence, so it stops waiting for more voice input and returns the result quicker. Only use this with a limited phrase list; otherwise it can get stuck, as it doesn't return the result until the next speech. | |
[repeat] | repeat the last action | |
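The `-number` option of [speech] can be illustrated with a small converter that reads each group of 0-99 number words separately and concatenates adjacent groups, which is enough for the "twenty twenty two" -> "2022" example. This is a hypothetical sketch, not the project's converter: it ignores hundreds, thousands and decimals, and it turns "one two three" into "123" by design.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

var units = map[string]int{"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
	"five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
	"eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
	"sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19}

var tens = map[string]int{"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
	"sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}

// convertNumbers turns runs of number words into digits; other words pass through.
func convertNumbers(text string) string {
	words := strings.Fields(text)
	var out []string
	digits := "" // digits of the current run of number words
	flush := func() {
		if digits != "" {
			out = append(out, digits)
			digits = ""
		}
	}
	for i := 0; i < len(words); i++ {
		w := words[i]
		if t, ok := tens[w]; ok {
			// A tens word absorbs a following single digit: "twenty two" = 22.
			if i+1 < len(words) {
				if u, ok := units[words[i+1]]; ok && u >= 1 && u <= 9 {
					t += u
					i++
				}
			}
			digits += strconv.Itoa(t)
		} else if u, ok := units[w]; ok {
			digits += strconv.Itoa(u)
		} else {
			flush()
			out = append(out, w)
		}
	}
	flush()
	return strings.Join(out, " ")
}

func main() {
	fmt.Println(convertNumbers("twenty twenty two"))      // 2022
	fmt.Println(convertNumbers("version four point one")) // version 4 point 1
}
```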
switch Type | Description | Parameters |
---|---|---|
[button] | switched on when button down, off when button up | -id buttonId |
[stick] | switched on when the stick moves to the edge, off when it leaves that edge | -side "Left" or "Right"<br>-dir direction: Up/Down/Left/Right |
modifier Type | Description | Parameters |
---|---|---|
[mode] | switch to another mode | -id modeId |
[boost] | speed up/down cursor movement | -multiplier float number, >1 to speed up, <1 to slow down |
[camel] [title] [snake] [upper] | convert speech text to a different case by adding a prefix | |
[prefix] | add a custom prefix to the speech text | -prefix prefix string<br>-space add a space between the prefix and the original text, default: true |
3. Phrase List
A dictionary for limiting the speech model: only the words in the list are recognized. For example:
alphabet = ['a', 'b', 'c', ...'zed']
golang = ['package', 'switch', ...'']
Different groups can be used together for different speech modes. e.g. -phrase alphabet golang java
If you find that some words conflict a lot, like `4` and `for`, remove `for` from the phrase list and map `4 loop` -> `for`, or `forever` -> `for`, in the mapping section below. Then the conflict is avoided:
- when you say `4`, it types `4`
- when you say `4 loop` or `forever`, it types `for`
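This trick relies on trying longer word sequences first, so `4 loop` wins over a bare `4`. A minimal longest-match sketch; `applyMappings` is hypothetical, and whether the project matches before or after number conversion is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// applyMappings replaces word sequences longest-match-first, so a two-word
// key like "4 loop" wins over a bare "4". Unmapped words pass through.
func applyMappings(text string, mappings map[string]string) string {
	maxLen := 1
	for k := range mappings {
		if n := len(strings.Fields(k)); n > maxLen {
			maxLen = n
		}
	}
	words := strings.Fields(text)
	var out []string
	for i := 0; i < len(words); {
		n := maxLen
		if rest := len(words) - i; rest < n {
			n = rest
		}
		matched := false
		for ; n >= 1; n-- {
			if v, ok := mappings[strings.Join(words[i:i+n], " ")]; ok {
				out = append(out, v)
				i += n
				matched = true
				break
			}
		}
		if !matched {
			out = append(out, words[i])
			i++
		}
	}
	return strings.Join(out, " ")
}

func main() {
	m := map[string]string{"4 loop": "for", "forever": "for"}
	fmt.Println(applyMappings("4", m))      // 4
	fmt.Println(applyMappings("4 loop", m)) // for
	fmt.Println(applyMappings("forever", m)) // for
}
```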
4. Word Mapping
A sentence is split into words, which are executed by different executors registered in this WordMapping section. It can handle complex tasks like:
run calc.exe (shell) -> delay 1 second (delay) -> type "1+1" (typing) -> press enter (hotkey)
A list of executors:
word executor Type | Description | Parameters |
---|---|---|
normal words | if there is no tag ([...]), it's simply a word replacement. An empty or special word should be wrapped in double quotes "". | e.g. spring -> "fmt.Sprintf("<br>space -> " " |
[hotkey] | trigger a hotkey | the full key list of the combination, e.g. launch terminal -> [hotkey] t control alt |
[shell] | execute a shell command | command and arguments, e.g. launch brave -> [shell] "brave-browser" "--no-sandbox" "www.test.com" |
[delay] | delay for some period | duration string, e.g. sleep a while -> [delay] 1s |
[camel] [title] [upper] [snake] | case decorator | e.g. camel -> [camel]<br>elephant -> [camel] [title] ThisIsElephant |
[typing] | the word is typed if not handled by other executors | |
[repeat] | repeat the last speech | |
Mappings are grouped and can be used together like -map programming application go
- Automatically change mode when switching between applications
- Show the speech text directly on screen
- This is greatly inspired by Talon Voice
- VOSK
- The awesome dekuNukem/Nintendo_Switch_Reverse_Engineering
- All the Joy-Con protocol implementations: riking/joycon wazho/ns-joycon Davidobot/BetterJoy tomayac/joy-con-webhid looking-glass/joyconlib
MIT