Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create SpeechToText.md #865

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 48 additions & 0 deletions Projects/3-Advanced/SpeechToText.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
# Speech To Text

**Tier:** 3-Advanced

**Description:**
Speech To Text is a transcription and text modification application designed for users who want to transcribe audio or video files, make modifications to the transcription, and save it directly to Google Docs. It combines the power of OpenAI's Whisper for high-accuracy transcription with Groq's model for flexible text modification. This app is a streamlined solution for creating, customizing, and storing transcriptions, making it useful for content creators, researchers, and more.

### Purpose
The purpose of this application is to provide a seamless interface for transcription, text modification, and saving edited transcriptions to Google Docs, reducing manual transcription and editing time.

### Required Resources
- OpenAI's Whisper model for transcription.
- Groq API for text modification.
- Google Docs API for saving transcriptions directly to Google Docs.

---

## User Stories

- [ ] User can upload an audio or video file for transcription.
- [ ] User can view the transcription of the uploaded file within the web interface.
- [ ] User can modify the transcription by entering specific instructions.
- [ ] User can view the modified transcription.
- [ ] User can save the modified transcription to Google Docs with a single click.
- [ ] User can access the link to the Google Docs document once it is saved.

## Bonus Features

- [ ] User can edit the transcription in real-time as the file is being transcribed.
- [ ] User can view a history of all transcriptions and modifications for reference.
- [ ] User can choose different transcription models for enhanced or faster transcriptions.
- [ ] User can download the transcription or modified text in multiple formats (e.g., .txt, .docx).

---

## Useful Links and Resources

- [Whisper GitHub Repository](https://github.com/openai/whisper): For transcription.
- [Groq API Documentation](https://docs.groq.com): For text modifications.
- [Google Docs API](https://developers.google.com/docs/api): For saving and managing transcriptions in Google Docs.

## Example Projects

- [Otter.ai](https://otter.ai): Provides real-time transcription and editing options for audio.
- [Descript](https://www.descript.com): A transcription tool with editing and video capabilities.
- [Rev](https://www.rev.com): A transcription service with options for downloadable transcripts and API support.

These example projects offer similar features but may focus on different user needs or industry-specific requirements. The [Textify - SaiAryan](https://github.com/SaiAryan1784/GenAi-hackathon-Blurock-textify) project stands out by integrating direct text modification and Google Docs integration.