diff --git a/Projects/3-Advanced/SpeechToText.md b/Projects/3-Advanced/SpeechToText.md
new file mode 100644
index 000000000..4a225ca17
--- /dev/null
+++ b/Projects/3-Advanced/SpeechToText.md
@@ -0,0 +1,48 @@
+# Speech To Text
+
+**Tier:** 3-Advanced
+
+**Description:**
+Speech To Text is a transcription and text-editing application for users who want to transcribe audio or video files, modify the transcript, and save the result directly to Google Docs. It pairs OpenAI's Whisper for high-accuracy transcription with a large language model served through the Groq API for instruction-based text modification. The app streamlines creating, customizing, and storing transcripts, which makes it useful for content creators, researchers, and anyone else who works with recorded speech.
+
+### Purpose
+The purpose of this application is to provide a single interface for transcribing audio, modifying the resulting text, and saving the edited transcript to Google Docs, reducing the time spent on manual transcription and editing.
+
+### Required Resources
+- OpenAI's Whisper model for transcription.
+- Groq API for text modification.
+- Google Docs API for saving transcripts directly to Google Docs.
+
+---
+
+## User Stories
+
+- [ ] User can upload an audio or video file for transcription.
+- [ ] User can view the transcription of the uploaded file in the web interface.
+- [ ] User can modify the transcription by entering natural-language instructions (e.g., "fix punctuation" or "summarize in three bullet points").
+- [ ] User can view the modified transcription.
+- [ ] User can save the modified transcription to Google Docs with a single click.
+- [ ] User can open a link to the Google Docs document once it is saved.
+
+## Bonus Features
+
+- [ ] User can edit the transcription in real time while the file is still being transcribed.
+- [ ] User can view a history of all transcriptions and modifications for reference.
+- [ ] User can choose between transcription models to favor either accuracy or speed.
+- [ ] User can download the transcription or modified text in multiple formats (e.g., .txt, .docx).
+
+---
+
+## Useful Links and Resources
+
+- [Whisper GitHub Repository](https://github.com/openai/whisper): For transcription.
+- [Groq API Documentation](https://docs.groq.com): For text modification.
+- [Google Docs API](https://developers.google.com/docs/api): For saving and managing transcriptions in Google Docs.
+
+## Example Projects
+
+- [Otter.ai](https://otter.ai): Provides real-time transcription and editing for audio.
+- [Descript](https://www.descript.com): A transcription-based audio and video editor.
+- [Rev](https://www.rev.com): A transcription service offering downloadable transcripts and an API.
+
+These example projects offer similar features but may focus on different user needs or industry-specific requirements. The [Textify - SaiAryan](https://github.com/SaiAryan1784/GenAi-hackathon-Blurock-textify) project stands out by combining instruction-based text modification with direct saving to Google Docs.