An AI-based system designed to assist you with your legal questions.
An example of the input:
Corresponding output generated by the model:
First, you need to install Python packages used in the project:
```
pip install -r requirements.txt
```
If you are running the project on Google Colab, you can skip this step, as the installation command is already included in the Jupyter Notebook.
Second, an OpenAI API key is needed. You can easily create one by going through this page.
A web interface is provided for interacting with our pipeline. The simplest way to use it is to run the example file in the source directory. The embedding model used for the retrieval part of the system is the default OpenAIEmbedding, and the language model is GPT-3.5. After you run the notebook, you are asked to enter your OpenAI API key. Once the cells finish running, a query box is presented. Type your question in the box and click the button; your answer is generated after a few seconds.
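Entering the key in the notebook simply stores it where OpenAI client libraries expect to find it. A minimal sketch of what that step amounts to (the `getpass` prompt and helper name are illustrative assumptions, not the notebook's exact code):

```python
import os
from getpass import getpass

def set_openai_key(key=None):
    """Store the OpenAI API key in the environment variable the client reads."""
    if key is None:
        # Prompt interactively without echoing the key to the screen.
        key = getpass("Enter your OpenAI API Key: ")
    os.environ["OPENAI_API_KEY"] = key

set_openai_key("sk-...")  # placeholder value for illustration
```

Storing the key in `OPENAI_API_KEY` means every downstream cell can create clients without passing the key around explicitly.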
- Chroma: Different ChromaDBs are stored in this folder. Main contains all laws, ArticleBased holds documents split at the level of individual law articles, and Naive is the one used in most RAG configurations: fixed-size chunking with no semantics involved.
- Datasets: Different domains of law taken from the official website of the laws of the Islamic Republic.
- Evaluation/Business_law: A dataset of business-law questions and answers written by our team. Three files are provided, corresponding to the difficulty levels of the questions.
- Results: Final results of the project.
  - 1_easy_labse: Evaluation of easy questions with LaBSE document embeddings.
  - 2_easy_labse_chunking: Same as the previous file, but with smart chunking.
  - 3_business_all_questions_openai: The generated output for all types of questions, without an LLM as the evaluator.
  - 4_medium_openai: Evaluation of medium questions with OpenAI embeddings.
  - 5_hard_openai_correctness: Evaluation of hard questions with the LLM, based only on correctness.
- Source: Source files of the project. For example, Chroma_Builder can be used to create a document DB for any law.
- Utils: Utilities used throughout the project.
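The difference between the Naive and ArticleBased stores comes down to how the law text is split before indexing. A hedged sketch of the two strategies (the article-heading pattern and chunk size here are illustrative assumptions, not the project's exact settings):

```python
import re

def naive_chunks(text, chunk_size=500):
    """Fixed-size chunking with no regard for semantic boundaries."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def article_chunks(text):
    """Split before each article heading, e.g. 'Article 12' (assumed pattern)."""
    parts = re.split(r"(?=Article \d+)", text)
    return [p.strip() for p in parts if p.strip()]

law = "Article 1 Companies must register. Article 2 Registration requires a name."
print(naive_chunks(law, chunk_size=40))  # may cut mid-sentence
print(article_chunks(law))               # one chunk per article
```

Article-based splitting keeps each legal provision intact, which is why it can retrieve more coherent context than fixed-size chunks that cut across article boundaries.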
Three different document embedding models have been implemented in this project: LaBSE, the default OpenAI embedding, and FastText as a baseline. Two different language models are also supported: MaralGPT 7B and GPT-3.5. All reported outputs are generated with GPT-3.5.
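Whichever embedding model is plugged in, the retrieval step ranks stored chunks by vector similarity to the query. A minimal sketch with a toy bag-of-words embedder standing in for LaBSE/OpenAI/FastText (the real models return dense vectors, but the ranking logic is the same):

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real model returns a dense vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

docs = ["Article 1: companies must register.",
        "Article 2: contracts require consent."]
print(retrieve("how do companies register?", docs))
```

Swapping the embedder is the only change needed to move between LaBSE, OpenAI, and FastText configurations; the ranking and generation stages stay the same.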
Our results for the different settings are shown below:

Model / Metric | Faithfulness | Answer Relevancy | Context Recall | Context Precision | Answer Correctness |
---|---|---|---|---|---|
Easy questions, LaBSE | 0.887 | 0.735 | 0.840 | 0.675 | 0.606 |
Easy questions, LaBSE with smart chunking | 0.737 | 0.804 | 0.811 | 0.546 | 0.619 |
Medium questions, OpenAI | 0.895 | 0.799 | 0.917 | 0.850 | 0.677 |
Hard questions, OpenAI | - | - | - | - | 0.645 |