Ai dictation mode #48

Closed
wants to merge 11 commits into from

C-Loftus
Owner

  • Use system accessibility APIs to dynamically get the surrounding context and automatically fix dictation and punctuation as you speak

@C-Loftus C-Loftus marked this pull request as draft March 28, 2024 18:11
@C-Loftus
Owner Author

@jaresty Opinions on this? The idea is to pass a lot of context to the model: Talon handles the baseline speech-to-text, and the model then fixes up things like proper nouns and punctuation, so we still get the more specific formatting we want.
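
A minimal sketch of the flow described above, not the code in this PR. It assumes the text before and after the cursor has already been obtained somehow (for example via accessibility APIs) and that the model is any callable taking a prompt string and returning a string. The names `build_fix_prompt`, `fix_dictation`, and `fake_model` are hypothetical and exist only for illustration.

```python
from typing import Callable


def build_fix_prompt(before: str, dictated: str, after: str) -> str:
    """Assemble a prompt asking the model to correct only the dictated span."""
    return (
        "Fix capitalization, punctuation, and proper nouns in DICTATED so it "
        "fits between BEFORE and AFTER. Return only the corrected text.\n"
        f"BEFORE: {before}\n"
        f"DICTATED: {dictated}\n"
        f"AFTER: {after}"
    )


def fix_dictation(
    before: str, dictated: str, after: str, model: Callable[[str], str]
) -> str:
    """Send the raw dictation plus surrounding context to the model."""
    reply = model(build_fix_prompt(before, dictated, after)).strip()
    # Fall back to the raw dictation if the model returns nothing usable.
    return reply or dictated


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption: a real backend goes here)."""
    line = next(l for l in prompt.splitlines() if l.startswith("DICTATED: "))
    text = line[len("DICTATED: "):]
    return text.replace("talon", "Talon") + "."


if __name__ == "__main__":
    print(fix_dictation("I have been using ", "talon for voice coding", "", fake_model))
```

Keeping the model behind a plain callable means the speech-to-text side stays unchanged and the fallback to raw dictation keeps typing responsive when the model fails.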

@C-Loftus
Owner Author

We could also create a "model select" command or something similar that selects a range in an editable text box: pass all the context to the model and have it return the range, so we wouldn't need to highlight it manually. I think there is a ton of potential with accessibility APIs in general, but unfortunately this does mean some fragmentation across operating systems and between beta and public Talon releases.
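
A rough sketch of how the range-returning idea above could work, under stated assumptions: the text box contents are already available as a string, the model is any callable from prompt to reply, and the reply format `START,END` is a convention invented here. The names `build_select_prompt`, `parse_range`, and `resolve_selection` are hypothetical. Applying the range to the real text box would still need the accessibility bindings, which this sketch leaves out.

```python
import re
from typing import Callable, Optional, Tuple


def build_select_prompt(text: str, request: str) -> str:
    """Ask the model for a character range instead of rewritten text."""
    return (
        "Return the character range to select as START,END "
        "(0-based, END exclusive) and nothing else.\n"
        f"TEXT: {text}\n"
        f"REQUEST: {request}"
    )


def parse_range(reply: str, text_length: int) -> Optional[Tuple[int, int]]:
    """Parse 'START,END' and reject anything malformed or out of bounds."""
    match = re.fullmatch(r"\s*(\d+)\s*,\s*(\d+)\s*", reply)
    if match is None:
        return None
    start, end = int(match.group(1)), int(match.group(2))
    if not (0 <= start < end <= text_length):
        return None
    return start, end


def resolve_selection(
    text: str, request: str, model: Callable[[str], str]
) -> Optional[Tuple[int, int]]:
    """Return the validated range the model picked, or None on a bad reply."""
    return parse_range(model(build_select_prompt(text, request)), len(text))


if __name__ == "__main__":
    text = "Please email Alice tomorrow"
    # Stand-in model that always answers with the span covering "Alice".
    selection = resolve_selection(text, "select the name", lambda prompt: "13,18")
    if selection is not None:
        start, end = selection
        print(text[start:end])
```

Validating the reply before acting on it matters here, since a model may return prose or an out-of-range span instead of the requested numbers.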

@jaresty
Collaborator

jaresty commented Mar 28, 2024

I think this is a great idea. One less step to correct dictation!

@4b11b4

4b11b4 commented Apr 28, 2024

This is a rough idea, but is there some way to leverage the work from https://github.com/OpenInterpreter/open-interpreter? @C-Loftus

@C-Loftus
Owner Author

> This is a rough idea, but is there some way to leverage the work from https://github.com/OpenInterpreter/open-interpreter? @C-Loftus

Just curious, do you have specific features in that repo you are looking for? @4b11b4 I am somewhat familiar with it, but not with the specifics. This repo should have many of the same features, but for voice. Since Talon packages are generally intended not to use external libraries, I've implemented most things from scratch.

For context (for you or anyone else viewing this): this PR is sort of blocked at the moment, since it relies on Talon's accessibility bindings, which aren't really documented and depend on an underlying Rust library that sometimes doesn't behave as intended. Without being able to use these APIs to pass additional surrounding context, real-time AI dictation fixes aren't particularly useful, and it is better to just use "model fix grammar" as it is currently implemented.

Let me know if you have other ideas, or if I am overlooking something you think could help with this situation.

@C-Loftus
Owner Author

Closing this since it isn't really practical imo. Better to just use Copilot or Codeium. And axkit handles simpler context-aware punctuation well on its own for macOS, which would've been a big use case for this.

@C-Loftus C-Loftus closed this Jul 19, 2024