AI Tinkerers October Hackathon Show & Tell 🧠

A hackathon competition repository for developing an LLM-as-Judge solution that automates scoring of the output returned by any Large Language Model.

💫 Getting Started

Setting Up

Fork this repository.
Install dependencies with pip install -r requirements.txt.
Open a pull request (PR) to have your changes reviewed and approved.

Benchmarking

Open the notebook 03_benchmark_malaysian_mistral_llmasajudge_v2.ipynb in notebooks-benchmarking-exercises/.
Log in to Weights & Biases to set up the Weave logger. You will need your API key to set up the project.
Adapt the call_llm function to the model you are using (OpenAI, Hugging Face, Gemini, etc.).
Run the notebook to benchmark the model.
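
The call_llm step above can be pictured as a small dispatch layer. The following is a minimal sketch, not the notebook's actual code: the `call_llm(prompt, backend)` signature, the `register_backend` helper, and the `echo` placeholder backend are all assumptions made for illustration. In practice you would register a function that wraps your provider's client (OpenAI, Hugging Face, Gemini, etc.) and returns the reply text.

```python
from typing import Callable, Dict

# Hypothetical type: a backend takes a prompt and returns the model's reply text.
Backend = Callable[[str], str]

# Hypothetical registry of available backends, keyed by name.
_BACKENDS: Dict[str, Backend] = {}


def register_backend(name: str, fn: Backend) -> None:
    """Make a backend available to call_llm under the given name."""
    _BACKENDS[name] = fn


def call_llm(prompt: str, backend: str = "echo") -> str:
    """Send the prompt to the selected backend and return its reply.

    The signature is an assumption; the notebook's call_llm may differ.
    """
    try:
        fn = _BACKENDS[backend]
    except KeyError:
        raise ValueError(f"Unknown backend: {backend!r}") from None
    return fn(prompt)


# Placeholder backend so the sketch runs offline. Replace it with a function
# that calls your provider's client and returns the generated text.
register_backend("echo", lambda prompt: f"[echo] {prompt}")
```

Keeping the provider-specific code behind one function means the rest of the benchmarking notebook does not need to change when the model does.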

📂 File Structure

.
├── datasets
├── miscellaneous
├── notebooks-benchmarking-exercises
├── notebooks-data-preparation
├── notebooks-finetuning-models
├── .gitignore
└── requirements.txt

Detail

datasets/ : Datasets used for finetuning and benchmarking.
notebooks-benchmarking-exercises/ : Notebooks for benchmarking.
notebooks-data-preparation/ : Notebooks for preparing the finetuning data.
notebooks-finetuning-models/ : Notebooks for model finetuning.
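
The files in datasets/ use the .jsonl extension, which conventionally means one JSON object per line. As a minimal sketch of how such a file could be read with the standard library alone, consider the helper below. The `read_jsonl` name is hypothetical and not part of this repository, and no assumption is made about which fields each record contains.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, Union


def read_jsonl(path: Union[str, Path]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-empty line of a JSON Lines file.

    Hypothetical helper for illustration; not part of the repository.
    """
    with open(path, "r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                # Report which line is malformed to make debugging easier.
                raise ValueError(f"{path}:{line_number}: invalid JSON") from err
```

For example, `records = list(read_jsonl("datasets/boolq-english-train.jsonl"))` would load the whole file into memory; iterating over the generator directly avoids that for large files.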

📈 Progress

🤖 AI Tinkerers Hackathon - Supa Team WeRecooked 🤖

📋 Overview

The AI Tinkerers Hackathon Project - Supa Team WeRecooked is an initiative to build and benchmark AI models, with a focus on developing Large Language Model (LLM) judges. The project covers dataset preparation, benchmarking, model finetuning, and a user-friendly Streamlit interface for showcasing our results.

🗂️ Directory Structure

The project structure is as follows:

.
├── README.md
├── datasets
│   ├── boolq-english-train.jsonl
│   ├── fib-malay-openai.jsonl
│   └── for_presentation/
├── miscellaneous
│   └── AIT_Problemstatement2_SUPA.pdf
├── notebooks-benchmarking-exercises
│   ├── 03_benchmark_openaimini4_0_llmasajudge_v1_v2.ipynb
│   └── 03_benchmark_malaysian_mistral_llmasajudge_v2.ipynb
├── notebooks-data-preparation
│   ├── 01_dataset_prep_boolq_openai.ipynb
│   └── archive_01_dataset_prep_fib_t5.ipynb
├── notebooks-finetuning-models
│   ├── 02_finetune_v1_malaysian_debertav2_base.ipynb
│   └── 02_finetune_v2_malaysian_mistral_7b_32k_instructions_v4.ipynb
└── requirements.txt

📁 Key Components

🔍 Data Preparation: Includes notebooks and scripts for preparing datasets such as BoolQ and FIB for different models (e.g., OpenAI and T5).
📊 Benchmarking: Evaluates how well models such as OpenAI Mini 4.0 and Mistral perform as LLM judges.
🔧 Model Finetuning: Contains code to finetune models like Malaysian DeBERTaV2 and Mistral LLM.

📊 Visual Elements and Data

📁 Datasets: Processed and raw datasets are available for English and Malay human preferences.
📉 Benchmarking Notebooks: Jupyter notebooks that contain the evaluation of models using different datasets.
🖥️ Interactive Interface: Results will be visualized and shared using Streamlit.

🔄 Project Workflow

📂 Environment Setup:
- Set up a virtual environment and install required dependencies using requirements.txt.
🔨 Data Processing:
- Prepare datasets for specific tasks, including translation and task-specific formatting.
🚀 Model Finetuning:
- Fine-tune models on preprocessed datasets and evaluate performance.
📊 Benchmarking:
- Benchmark models using evaluation notebooks to assess effectiveness in judging tasks.
🌐 Deployment:
- Deploy results to an interactive Streamlit app for presentation.
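
One simple way to picture the benchmarking step is to compare a judge's verdicts with reference labels and report the fraction that agree. The sketch below assumes binary verdicts and plain accuracy as the metric. Both are assumptions made for illustration, and the `judge_accuracy` name is hypothetical. The benchmarking notebooks may use different labels or metrics.

```python
from typing import Sequence


def judge_accuracy(
    judge_verdicts: Sequence[bool],
    reference_labels: Sequence[bool],
) -> float:
    """Return the fraction of examples where the judge matches the reference.

    Hypothetical helper; assumes one boolean verdict per example.
    """
    if len(judge_verdicts) != len(reference_labels):
        raise ValueError("verdicts and labels must have the same length")
    if not judge_verdicts:
        raise ValueError("at least one example is required")
    matches = sum(j == r for j, r in zip(judge_verdicts, reference_labels))
    return matches / len(judge_verdicts)


# The judge agrees with the reference on the first and third examples only.
score = judge_accuracy([True, False, True, True], [True, True, True, False])
print(score)  # 0.5
```

Accuracy is easy to read but can mislead on imbalanced labels, so a fuller evaluation might also report per-class agreement.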

🎉 Conclusion

This project showcases how well models such as OpenAI Mini 4.0 and Mistral perform as LLM judges, and provides a platform for performance evaluation and model comparison. By fine-tuning models on human-preference datasets, we aim to develop more accurate AI models.

🔮 Future Enhancements

📈 Advanced Benchmarking: Expand model benchmarking to include more datasets and AI models.
🤖 Further Finetuning: Apply more advanced techniques for finetuning models to improve performance.
🌐 Enhanced UI: Improve the Streamlit UI for better interaction and visualization.

📚 References

📜 License

Supa Team WeRecooked License

wanadzhar913/aitinkerers-hackathon-supa-team-werecooked

Folders and files

Latest commit

History

Repository files navigation

AI Tinkerers October Hackathon Show & Tell 🧠

💫 Getting Started

Setting Up

Benchmarking

📂 File Structure

Detail

📈 Progress

🤖 AI Tinkerers Hackathon - Supa Team WeRecooked 🤖

📋 Overview

Table of Contents

🗂️ Directory Structure

📁 Key Components

📊 Visual Elements and Data

🔄 Project Workflow

🎉 Conclusion

🔮 Future Enhancements

📚 References

📜 License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 5

Languages

Packages