Made by Jørgen Kristiansen Sandhaug and Henrik Skog
Whisper Assistant is a highly efficient and customizable voice-powered assistant designed to boost your productivity by seamlessly converting speech to text and performing a variety of tasks through voice commands. Built on the OpenAI Whisper API and the flexible LangChain framework, Whisper Assistant offers an easy-to-use interface for real-time voice transcription. It can also be integrated with large language models to carry out complex tasks conversationally by creating custom agents in LangChain.
- Highly Accurate Voice Transcription: Uses the OpenAI Whisper API for accurate, real-time speech-to-text conversion in a wide range of languages.
- Customizable Shortcuts: Easily set up your own keyboard shortcuts to activate the transcription feature or start talking to different assistants, making it effortlessly accessible anytime.
- Versatile Voice Commands: Execute tasks, manipulate text, or control your terminal directly with your voice using the LangChain framework.
- Optimized for Efficiency: Smartly stitches together audio segments to minimize costs, reduce transcription time, and improve accuracy, even in speech with gaps.
- Visual Feedback: A status icon in the menu bar shows the app's current state—whether it's recording, processing, or ready for your next command.
- Designed for macOS: Tailored for use on Apple Silicon Macs, ensuring smooth operation and compatibility.
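At its core, transcription is a single call to the Whisper API. As a rough illustration only (not the app's actual code, and assuming the official `openai` Python SDK), transcribing a recorded audio file looks something like this:

```python
# Illustration of a Whisper API transcription call; not the code used by
# Whisper Assistant itself.
from openai import OpenAI

client = OpenAI(api_key="sk-...")  # or let the SDK read its standard environment variable

with open("recording.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript.text)
```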
- Python 3.10 (possibly earlier versions also work; cchardet does not seem to work well with Python 3.11 on Apple Silicon Macs).
- An OpenAI API key for using the Whisper API.
- Clone the repository:

      git clone https://github.com/Futhark-AS/whisper_assistant.git
      cd whisper_assistant
- Setup Conda Environment: (This is the recommended way to set up the environment, as there have been many problems with the py2app build when not using conda.)
  - Create and activate a new conda environment:

        conda create -n whisper-assistant python=3.10
        conda activate whisper-assistant
        conda install pip
- Install Dependencies:

      pip install -r requirements.txt
- Resolve Dependencies:
  - For Apple Silicon Mac users encountering `libffi` issues:

        brew install libffi

    Follow the post-installation instructions from Homebrew to set up the necessary environment variables.
- Setup the environment:
  - Create a `.env` file in the project root.
  - Add your OpenAI API key:

        OPEN_AI_API_KEY=your_api_key_here
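  How the app reads this key is an implementation detail of the project; purely as an illustration, a `.env` file like this is commonly loaded with `python-dotenv` (an assumption, not necessarily what Whisper Assistant uses):

  ```python
  # Illustration only: loading the key from .env with python-dotenv.
  import os

  from dotenv import load_dotenv

  load_dotenv()                           # parses .env in the current working directory
  api_key = os.getenv("OPEN_AI_API_KEY")  # variable name as used in this README
  if not api_key:
      raise RuntimeError("OPEN_AI_API_KEY is not set in .env")
  ```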
- Configure Your Shortcut:
  - Copy the `shortcuts.py.template` file from the `config` folder.
  - Rename it to `shortcuts.py` and customize it with your preferred keyboard shortcut. Place this file in the `config` folder.
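  The exact format of `shortcuts.py` is defined by the template itself. Purely as a hypothetical illustration of binding a global key combination to an action on macOS (using `pynput` here, which is an assumption and not necessarily what the app uses):

  ```python
  # Hypothetical illustration of a global shortcut; the real configuration
  # format is whatever shortcuts.py.template in the config folder defines.
  from pynput import keyboard

  def start_recording():
      print("Recording...")  # in the real app this would begin audio capture

  hotkeys = keyboard.GlobalHotKeys({
      "<cmd>+<shift>+r": start_recording,  # hypothetical key combination
  })
  hotkeys.start()
  hotkeys.join()  # keep listening for the shortcut
  ```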
- Build the Application:

      python setup.py py2app -A
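  The `-A` flag builds the app in py2app's alias mode, which links to your source files instead of bundling everything. The repository ships its own `setup.py`; for context only, a minimal py2app setup script generally looks like the sketch below (the entry-point name is a placeholder):

  ```python
  # Generic py2app setup sketch for context; the repository's own setup.py
  # is what you actually run.
  from setuptools import setup

  setup(
      app=["main.py"],  # placeholder entry-point script name
      setup_requires=["py2app"],
      options={"py2app": {"argv_emulation": False}},
  )
  ```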
- Allow your terminal to monitor input and microphone:
  - If on Mac, go to `System Preferences` -> `Security & Privacy` -> `Privacy` -> `Accessibility`.
  - Add your terminal to the list of apps.
- Run Whisper Assistant:

      ./dist/whisperGPT.app/Contents/MacOS/whisperGPT
- Press one of your configured shortcuts to start recording.
- Whisper Assistant will transcribe your speech and copy the text to your clipboard.
- If the shortcut you pressed is bound to an action/assistant, that assistant will run with the transcribed text as input (see the sketch below).
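Assistants and actions are built on LangChain. As a rough sketch of the idea only (not the repository's actual agent code, and assuming the classic `langchain` agent API), a transcribed command could be routed to a custom agent roughly like this:

```python
# Rough sketch of handing a transcription to a LangChain agent; the app
# defines its own agents and tools, so treat this purely as an illustration.
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chat_models import ChatOpenAI

def run_shell_command(command: str) -> str:
    """Placeholder tool; a real tool would execute or display the command."""
    return f"Would run: {command}"

tools = [
    Tool(
        name="terminal",
        func=run_shell_command,
        description="Runs a shell command and returns its output.",
    )
]

llm = ChatOpenAI(temperature=0)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)

transcribed_text = "list the files in my home directory"  # output from Whisper
print(agent.run(transcribed_text))
```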
We welcome contributions and suggestions! Feel free to fork the repository, make your changes, and submit a pull request. For major changes or questions, please open an issue first to discuss what you would like to change.
Distributed under the MIT License. See `LICENSE` for more information.
- OpenAI Whisper API for the speech-to-text engine.
- LangChain framework for enabling complex command executions.