This project features a QA system in which Princess Zelda from The Legend of Zelda series answers questions about "Tears of the Kingdom." The knowledge base is built from transcripts of the following videos by YouTube creator Zeltik:
- Zelda: Tears of the Kingdom - Story Explained part 1 Duration: 1:21:23
- Zelda: Tears of the Kingdom - Story Explained part 2 Duration: 1:02:31
- Zelda: Tears of the Kingdom - Story Explained part 3 Duration: 1:06:26
- 7 Secrets & Lore Details in Tears of the Kingdom Duration: 13:16
- Ganondorf’s Seal Explained - Zelda: Tears of the Kingdom Lore Duration: 7:29
- Tears of the Kingdom: A Disappointing Masterpiece Duration: 2:11:28
- Ganondorf in Tears of the Kingdom: Lore, History & Speculation Duration: 24:04
The bot is designed to stay in character, speaking in a regal tone and retrieving relevant information from stored YouTube transcripts. The project also offers multilingual support and voice synthesis, so audio responses are delivered in Zelda's voice.
- Multilingual Support: Responses available in multiple languages (English, Spanish, French, German, Portuguese, Italian).
- Audio Transcription: The Whisper model transcribes the user's spoken input (see the sketch after this list).
- Language Detection: Language detection via `langdetect` to tailor responses appropriately.
- Vector Database with ChromaDB: Embedding-based search for accurate, contextually relevant responses.
- Natural Language Processing with OpenAI API: Uses GPT-3.5 to generate Princess Zelda responses in a character-consistent style.
- Speech Synthesis with ElevenLabs: Generates audio responses using ElevenLabs API for a more immersive experience.
- Gradio Interface: User-friendly interface for both text and audio inputs, with an enhanced Hyrule-themed design.
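For reference, here is a minimal sketch of the audio-transcription step, assuming the legacy openai-python (pre-1.0) SDK that the snippets below also use; the helper name `transcribe_audio` is illustrative rather than taken from the project:

```python
import openai

def transcribe_audio(audio_path):
    # Gradio's microphone input (type="filepath") supplies a path to an audio file
    with open(audio_path, "rb") as audio_file:
        result = openai.Audio.transcribe("whisper-1", audio_file)  # hosted Whisper model
    return result["text"]
```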
- Clone the Repository:

  ```bash
  git clone <repository-url>
  cd <repository-name>
  ```

- Install Dependencies: Install the required libraries from the `requirements.txt` file:

  ```bash
  pip install -r requirements.txt
  ```
- API Keys: Store your OpenAI and ElevenLabs API keys in a `.env` file (a loading sketch follows this list):

  ```
  OPENAI_API_KEY=your_openai_key_here
  ELEVEN_LABS_API_KEY=your_elevenlabs_key_here
  ```
- Run the Application: Launch the application with Gradio:

  ```bash
  python app.py
  ```
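For the API Keys step above, one common way to load the `.env` file at startup is with the `python-dotenv` package; this is a minimal sketch, and the actual `app.py` may load the keys differently:

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the project root
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
ELEVEN_LABS_API_KEY = os.getenv("ELEVEN_LABS_API_KEY")
```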
- Language-Based Prompts: Custom prompts for each language to ensure accurate, in-character responses.
- Video Transcript Processing: Retrieves YouTube video transcripts to create a comprehensive, searchable database of text embeddings.
- Text Chunking and Embedding: Splits transcripts into manageable chunks and stores embeddings in a ChromaDB collection (a plausible chunking sketch follows this list).
- Multimodal QA System: Integrates Whisper transcription and ElevenLabs TTS to provide both text and audio responses.
- Evaluation Module: Includes Giskard evaluation for response accuracy.
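The `split_text` helper used in the "Embedding Storage in ChromaDB" snippet below is not shown in this README. A plausible sketch, assuming fixed-size character chunks with a small overlap (both sizes are illustrative):

```python
def split_text(text, chunk_size=1000, overlap=100):
    # Slice the transcript into overlapping chunks so sentences cut at a
    # boundary still appear intact in the neighbouring chunk
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```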
- Language Detection:

  ```python
  from langdetect import detect, LangDetectException

  def detect_language(text):
      try:
          return detect(text)
      except LangDetectException:
          return "en"  # Default to English if detection fails
  ```
- Transcript Collection and Preprocessing:

  ```python
  import re
  from youtube_transcript_api import YouTubeTranscriptApi

  def get_transcript(video_id):
      try:
          transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=['en-GB'])
          transcript_text = " ".join(item['text'] for item in transcript)
          # Collapse runs of whitespace into single spaces
          return re.sub(r'\s+', ' ', transcript_text).strip()
      except Exception as e:
          print(f"Error retrieving transcript for video {video_id}: {e}")
          return None
  ```
- Embedding Storage in ChromaDB:

  ```python
  def store_transcript_embeddings(video_id, transcript_text):
      text_chunks = split_text(transcript_text)
      for i, chunk in enumerate(text_chunks):
          # Embed each chunk with the OpenAI embeddings endpoint
          embedding = openai.Embedding.create(
              input=[chunk], model="text-embedding-ada-002"
          )['data'][0]['embedding']
          chunk_id = f"{video_id}_chunk_{i}"
          collection.add(
              ids=[chunk_id],
              embeddings=[embedding],
              metadatas=[{'video_id': video_id, 'chunk_index': i, 'text': chunk}],
          )
  ```
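The snippet above assumes module-level `openai` and `collection` objects. A minimal ChromaDB setup could look like the following; the collection name is a placeholder:

```python
import chromadb
import openai

client = chromadb.Client()  # in-memory; use chromadb.PersistentClient(path=...) to persist
collection = client.get_or_create_collection(name="zelda_transcripts")
```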
- QA Model and Retrieval System:

  ```python
  def generate_multi_query_response(user_query):
      detected_language = detect_language(user_query)
      context = multi_query_processing(user_query)  # retrieved transcript context
      zelda_formality = "Speak with reverence of the past, for the history of Hyrule is sacred..."
      enhanced_prompt = (
          f"{get_prompt(detected_language)}\n\n"
          f"Question: {user_query}\n\n"
          f"Database Context:\n{context}\n\n"
          f"{zelda_formality}"
      )
      # Stream the completion and concatenate the delta chunks
      response = openai.ChatCompletion.create(
          model="gpt-3.5-turbo",
          messages=[{"role": "user", "content": enhanced_prompt}],
          max_tokens=150,
          temperature=0.7,
          stream=True,
      )
      return "".join(
          chunk['choices'][0].get('delta', {}).get('content', "") for chunk in response
      )
  ```
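`multi_query_processing` is not shown here; at its core, its retrieval step presumably embeds the query and pulls the closest transcript chunks from ChromaDB, roughly like this sketch (the function name and `n_results` are illustrative):

```python
def retrieve_context(user_query, n_results=5):
    query_embedding = openai.Embedding.create(
        input=[user_query], model="text-embedding-ada-002"
    )['data'][0]['embedding']
    results = collection.query(query_embeddings=[query_embedding], n_results=n_results)
    # query() returns one list of metadata dicts per query embedding
    return "\n".join(meta['text'] for meta in results['metadatas'][0])
```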
- Gradio Interface Setup:

  ```python
  gr_interface = gr.Interface(
      fn=handle_input,
      inputs=[
          gr.Textbox(label="What brings you here, beloved Hyrulean?"),
          gr.Audio(source="microphone", type="filepath", label="Or record your question"),
      ],
      outputs=[
          gr.Textbox(label="Response Text"),
          gr.Audio(label="Response Audio"),
      ],
      title="Diaries of The Upheaval",
      description="Ask me anything about the events of Tears of the Kingdom, and hear a response in Princess Zelda's voice.",
  )
  ```
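The `handle_input` callback ultimately voices Zelda's reply through ElevenLabs. A hedged sketch of that call against the public REST endpoint, with a placeholder voice ID:

```python
import os
import requests

VOICE_ID = "your_voice_id_here"  # placeholder for the Zelda voice, not from the project

def synthesize_speech(text, out_path="response.mp3"):
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"
    headers = {
        "xi-api-key": os.environ["ELEVEN_LABS_API_KEY"],
        "Content-Type": "application/json",
    }
    payload = {"text": text, "model_id": "eleven_multilingual_v2"}
    resp = requests.post(url, headers=headers, json=payload)
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)  # the endpoint returns MP3 bytes
    return out_path
```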
- Ask Princess Zelda a Question: Input text or record audio to ask a question about "Tears of the Kingdom."
- Receive a Response: A text and audio response is generated from Zelda's perspective, drawing on contextually relevant transcript passages.
- Evaluation: Check responses with the built-in Giskard evaluation system to validate answers against expected responses (a hypothetical sketch follows this list).
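As a rough illustration of the evaluation step, the following sketch assumes Giskard 2.x; the model wrapper, sample question, and all names are hypothetical:

```python
import giskard
import pandas as pd

def predict(df: pd.DataFrame):
    # Run the QA function over a column of questions
    return [generate_multi_query_response(q) for q in df["question"]]

model = giskard.Model(
    model=predict,
    model_type="text_generation",
    name="Princess Zelda QA",
    description="Answers Tears of the Kingdom lore questions in character.",
    feature_names=["question"],
)
dataset = giskard.Dataset(pd.DataFrame({"question": ["Who is the Demon King?"]}))
report = giskard.scan(model, dataset)  # flags robustness/hallucination issues
```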
- Fork the Project.
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`).
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`).
- Push to the Branch (`git push origin feature/AmazingFeature`).
- Open a Pull Request.
Distributed under the MIT License. See `LICENSE` for more information.
Special thanks to:
- Zeltik for his great content on The Legend of Zelda (visit https://www.youtube.com/@Zeltik)
- OpenAI for API access.
- ElevenLabs for voice synthesis.
- Gradio for a user-friendly interface.