This project implements a client for the OpenAI Realtime API in Python 😎, built on WebSockets. It captures audio from the microphone, streams it to the API, and plays back the model's audio responses in real time, so developers can build interactive voice applications on top of it.
- 🎤 Real-time audio capture from microphone: Capture audio input directly from your microphone with minimal latency.
- 🔊 Real-time audio playback of AI responses: Play back audio responses generated by the AI in real-time, providing immediate feedback to users.
- 🧠 Integration with OpenAI's gpt-4o-realtime-preview-2024-10-01 model: Utilize the latest advancements in AI technology for enhanced conversational capabilities.
- 🔄 Bi-directional audio streaming: Support for both sending audio to the API and receiving audio responses, enabling interactive conversations.
- 🛠️ Configurable audio settings: Customize audio parameters such as sample rate and channels to suit your application's needs.
- 📝 Text transcription of audio input: Automatically transcribe spoken audio into text, facilitating easier interaction and data processing.
- 🔒 Secure API key handling: The OpenAI API key is supplied through the OPENAI_API_KEY environment variable, which keeps it out of the source code.
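To make the audio settings and streaming features more concrete, here is a minimal sketch of how a microphone chunk might be packaged for sending. The `AudioConfig` class, its field names, and its defaults are hypothetical and not taken from this project. The 24 kHz mono 16-bit PCM format and the `input_audio_buffer.append` event shape reflect my understanding of the Realtime API and should be checked against the official documentation.

```python
import base64
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioConfig:
    """Hypothetical settings container; names and defaults are assumptions."""
    sample_rate: int = 24000   # Hz (assumed to match the API's PCM16 format)
    channels: int = 1          # mono
    sample_width: int = 2      # bytes per sample (16-bit PCM)
    chunk_frames: int = 1024   # frames read from the microphone per chunk

    @property
    def chunk_bytes(self) -> int:
        """Size in bytes of one captured chunk."""
        return self.chunk_frames * self.channels * self.sample_width

    @property
    def chunk_ms(self) -> float:
        """Duration of one chunk in milliseconds."""
        return 1000.0 * self.chunk_frames / self.sample_rate


def make_append_event(pcm_chunk: bytes) -> str:
    """Wrap raw PCM bytes in an input_audio_buffer.append event (JSON text)."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_chunk).decode("ascii"),
    })
```

With these assumed defaults, each chunk covers roughly 43 ms of audio. A smaller `chunk_frames` value lowers latency at the cost of sending more messages.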
To run this project, you will need the following:
- Python 3.7+: Ensure you have a compatible version of Python installed on your system.
- PyAudio: A library for audio input and output.
- websocket-client: A library for creating WebSocket clients in Python.
- An OpenAI API key: You must have access to the Realtime API through your OpenAI account.
Follow these steps to set up the project on your local machine:
- Clone this repository:

      git clone https://github.com/yourusername/openai-realtime-python-client.git

- Install the required packages:

      pip install pyaudio websocket-client

- Set your OpenAI API key as an environment variable:

      export OPENAI_API_KEY='your-api-key-here'
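Once the environment variable is set, a script can read the key at startup and fail early if it is missing. The sketch below illustrates one way to do that. The function name `build_connection_params` is hypothetical, and the endpoint URL and `OpenAI-Beta` header are assumptions based on the public Realtime API documentation rather than on this project's code.

```python
import os

# Model named in the feature list of this README.
MODEL = "gpt-4o-realtime-preview-2024-10-01"


def build_connection_params(env=os.environ):
    """Return (url, headers) for the WebSocket handshake.

    Raises RuntimeError if OPENAI_API_KEY is missing or empty, so the
    problem surfaces before any audio device is opened.
    """
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    # Assumed endpoint and beta header; verify against the official docs.
    url = f"wss://api.openai.com/v1/realtime?model={MODEL}"
    headers = [
        f"Authorization: Bearer {api_key}",
        "OpenAI-Beta: realtime=v1",
    ]
    return url, headers
```

The returned list uses the `"Name: value"` string form that websocket-client accepts for its `header` argument. Avoid printing or logging the headers, since they contain the key.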
To run the script, execute the following command in your terminal:
python realtime_api.py
Once the script is running, it captures audio from your microphone and streams it to the OpenAI Realtime API. Press Ctrl+C to stop it at any time. Make sure your microphone is set up correctly and that the script has permission to access it. For best results, test your audio input settings before starting the application.
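Stopping with Ctrl+C raises `KeyboardInterrupt` in Python's main thread, so a streaming loop usually needs to catch it and release the audio streams and the socket. The following is a minimal sketch of that pattern, not the project's actual shutdown code. `run_until_interrupted`, `step`, and `cleanup` are hypothetical names.

```python
import threading


def run_until_interrupted(step, cleanup, stop_event=None):
    """Call step() repeatedly until Ctrl+C or stop_event is set, then clean up.

    Returns "interrupted" if stopped by Ctrl+C, "stopped" otherwise.
    """
    stop_event = stop_event or threading.Event()
    try:
        while not stop_event.is_set():
            step()  # e.g. read one audio chunk and send it
        return "stopped"
    except KeyboardInterrupt:
        stop_event.set()  # signal any worker threads sharing this event
        return "interrupted"
    finally:
        cleanup()  # e.g. close audio streams and the WebSocket
```

Sharing `stop_event` with playback or receive threads lets them exit their own loops once the main loop is interrupted.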
We welcome contributions to this project! If you have suggestions for improvements, new features, or bug fixes, please open an issue or submit a pull request. Your contributions help us enhance the functionality and usability of the client.
This project is licensed under the MIT License. See the LICENSE file for more details on usage and distribution rights.
If you encounter any issues or have questions, feel free to reach out via the project's GitHub page or open an issue. We are here to help you get the most out of the OpenAI Realtime API Python Client.
Happy coding! 🎉