This project aims to facilitate the retrieval of Land Matrix data through natural language queries.
This repository provides several resources:
- An end-to-end Streamlit application with the optimal configuration (see Explanations).
- A pipeline to reproduce our benchmark of models and methods (see below).
- Educational notebooks that describe every task needed for the entire pipeline (see Explanations).

To get started, clone the repository:

    git clone https://github.com/tetis-nlp/landmatrix-graphql-python.git

- Install the Python environment:

      conda create -n landmatrix python=3.9 pandas scikit-learn spacy streamlit
      conda activate landmatrix
      conda install -c conda-forge sentence-transformers
      pip install transformers faiss-cpu
      pip install ollama
      pip install langchain-openai
      pip install langchain-community
      pip install openpyxl

- Download the spaCy model:

      python -m spacy download en_core_web_sm

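Before going further, it can help to confirm that Python can locate every package installed above. The following is a minimal sketch and not part of the repository. It assumes the usual import names for these packages (for example `sklearn` for scikit-learn and `faiss` for faiss-cpu) and that the spaCy model is installed as the importable package `en_core_web_sm`; the helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Import names assumed for the packages installed above (an assumption,
# not something defined by this repository).
REQUIRED_MODULES = [
    "pandas", "sklearn", "spacy", "streamlit", "sentence_transformers",
    "transformers", "faiss", "ollama", "langchain_openai",
    "langchain_community", "openpyxl", "en_core_web_sm",
]


def missing_modules(names):
    """Return the names from `names` that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Run it inside the activated `landmatrix` environment; any name it reports can then be installed with the matching `conda` or `pip` command.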
- Install and launch Ollama:

      curl -fsSL https://ollama.com/install.sh | sh
      ollama serve
      ollama pull llama3:8b

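To check that the Ollama server is running and that the model has been pulled, one option is to query its local HTTP API. The sketch below is an illustration only and is not taken from the repository. It assumes the server listens on its default address `http://localhost:11434` and uses the `/api/tags` endpoint, which lists locally available models; the function names are hypothetical.

```python
import json
import urllib.request

# Default address of a local Ollama server (assumption: adjust if yours differs).
OLLAMA_URL = "http://localhost:11434"


def model_names(tags_payload):
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def fetch_tags(base_url=OLLAMA_URL, timeout=5):
    """Ask the running Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        names = model_names(fetch_tags())
    except OSError as exc:  # URLError and timeouts are subclasses of OSError
        print("Ollama server not reachable:", exc)
    else:
        print("llama3:8b available:", "llama3:8b" in names)
```

Keeping the parsing in `model_names` separate from the network call makes the check easy to reuse if the server runs on another host or port.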
- Configure API keys (only compatible with chat ISDM): add your own ISDM API keys, without quotation marks ("), to credentials.ini:

      cp credentials.ini.default credentials.ini
      vim credentials.ini

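The reason for leaving out the quotation marks can be illustrated with Python's standard `configparser`, which keeps quotes as part of the value instead of stripping them. This is a sketch under assumptions: the section name `ISDM` and the keys `api_key` and `base_url` are placeholders rather than the actual layout of credentials.ini, and the repository may read the file differently.

```python
import configparser

# Hypothetical file content: the section and key names below are
# placeholders, not the actual layout of credentials.ini.
EXAMPLE = """
[ISDM]
api_key = "abc123"
base_url = https://example.org/api
"""


def quoted_values(ini_text):
    """Return (section, key) pairs whose value is wrapped in quotes."""
    # interpolation=None so that a '%' inside a key is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    flagged = []
    for section in parser.sections():
        for key, value in parser.items(section):
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                flagged.append((section, key))
    return flagged


if __name__ == "__main__":
    # Lists the entries that still carry quotation marks.
    print(quoted_values(EXAMPLE))
```

To check a real file, pass the text of credentials.ini to `quoted_values`; any entry it returns would be sent to the API with the quotes included.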
- Run the pipeline:

      python src/experiments.py

- Monitor your pipeline:

      tail -f logs/pipeline.log

- Stop the pipeline by killing all of its subprocesses:

      pkill -f src/