mkdir c19_project
cd c19_project
git clone https://github.com/MrMimic/covid-19-applet
python3 --version
The version should be >= 3.6.1. Update Python if needed.
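The same check can be done from Python itself, which is handy in a setup script. This is only a minimal sketch: `MIN_VERSION` and `is_supported` are illustrative names, not part of either repository.

```python
import sys

# Minimum version stated in this guide.
MIN_VERSION = (3, 6, 1)


def is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the given interpreter version meets the minimum."""
    return tuple(version_info[:3]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    # str.format is used instead of f-strings so the script also runs
    # (and reports the problem) on interpreters older than 3.6.
    if is_supported():
        print("Python {} is recent enough.".format(current))
    else:
        print("Python {} is too old; 3.6.1 or later is needed.".format(current))
```

Tuples compare element by element, so `(3, 6, 0)` is rejected while `(3, 6, 1)` and `(3, 10, 0)` are accepted.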
python3 -m venv venv
source venv/bin/activate
Download the dataset with the Kaggle API (the kaggle command needs an API token in ~/.kaggle/kaggle.json):
pip install kaggle
kaggle datasets download allen-institute-for-ai/CORD-19-research-challenge
Clone the lib repository:
git clone https://github.com/MrMimic/covid-19-kaggle
Install the developed Kaggle lib:
pip install -q git+https://github.com/MrMimic/covid-19-kaggle
Edit the script covid-19-kaggle/src/main/scripts/create_db.py:
- L:17 local_path="articles_database.sqlite"
- L:18 kaggle_data_path="kaggle_data"
Replace these values with your local paths to the DB file to create and to the downloaded dataset.
For now, leave the run_on_kaggle and only_covid parameters set to True; otherwise the resulting DB will weigh 22 GB instead of 700 MB.
Run the database creation:
python3 covid-19-kaggle/src/main/scripts/create_db.py
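Once the script has finished, a quick look inside the file confirms that the DB was actually populated. The sketch below uses only the standard `sqlite3` module and makes no assumption about the table names, which it reads from the database itself; `list_tables` is an illustrative helper, and the default path is the one shown in create_db.py above.

```python
import os
import sqlite3


def list_tables(db_path):
    """Return (table_name, row_count) pairs for every table in a SQLite file."""
    # sqlite3.connect would silently create an empty file, so check first.
    if not os.path.isfile(db_path):
        raise FileNotFoundError(db_path)
    connection = sqlite3.connect(db_path)
    try:
        cursor = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        names = [row[0] for row in cursor.fetchall()]
        summary = []
        for name in names:
            # Names come from the database itself; quote them as identifiers.
            query = 'SELECT COUNT(*) FROM "{}"'.format(name)
            summary.append((name, connection.execute(query).fetchone()[0]))
        return summary
    finally:
        connection.close()


if __name__ == "__main__":
    path = "articles_database.sqlite"  # adapt to your local_path
    if os.path.isfile(path):
        for table, rows in list_tables(path):
            print("{}: {} rows".format(table, rows))
    else:
        print("No database found at {}".format(path))
```

An empty list, or tables with zero rows, suggests that kaggle_data_path did not point to the downloaded dataset.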
pip install poetry
poetry install
Edit server.py. It needs the DB you just created and the path to the pre-trained word2vec embeddings that ship with the library repository:
- L15: LOCAL_DB_PATH = "<path_to_trained_db>"
- L16: LOCAL_EMBEDDING_PATH = "<path_to_cloned_library>/covid-19-kaggle/resources/global_df_w2v_tfidf.parquet"
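Before starting the server, it can save time to verify that both paths resolve. This is a hedged sketch, not code from the applet: `missing_paths` is a hypothetical helper, and the two values below are the placeholders from this guide, to be replaced with your own paths.

```python
import os


def missing_paths(paths):
    """Return the entries of a name -> path mapping that do not exist on disk."""
    # os.path.exists accepts both files and directories, since a parquet
    # dataset may be stored either way.
    return {name: path for name, path in paths.items() if not os.path.exists(path)}


if __name__ == "__main__":
    # Placeholder values: substitute the paths you set in server.py.
    settings = {
        "LOCAL_DB_PATH": "<path_to_trained_db>",
        "LOCAL_EMBEDDING_PATH": "<path_to_cloned_library>/covid-19-kaggle/"
                                "resources/global_df_w2v_tfidf.parquet",
    }
    problems = missing_paths(settings)
    for name, path in problems.items():
        print("{} does not exist: {}".format(name, path))
    if not problems:
        print("Both paths exist.")
```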
Then run the server:
python3 server.py
Go to http://:5000.
Note that data is currently kept in cache for only 10 minutes. The first query after launching the server, and any query after 10 minutes of inactivity, will take about 30 seconds.
- Open the port if the server runs on a remote host
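To illustrate why queries are slow only after a period of inactivity, here is a minimal sketch of a cache whose entries expire 10 minutes after their last use. It is not the applet's actual caching code, which is not shown in this guide; `TTLCache`, `get` and `loader` are illustrative names.

```python
import time


class TTLCache:
    """Keep loaded values until they have been unused for `ttl_seconds`."""

    def __init__(self, ttl_seconds=600, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._entries = {}  # key -> (expiry_time, value)

    def get(self, key, loader):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            value = entry[1]  # fast path: entry is still fresh
        else:
            value = loader()  # slow path: reload the data
        # Every access pushes the expiry back, so only inactivity evicts.
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=600)

    def load_data():
        print("loading data (slow)...")
        return "data"

    cache.get("db", load_data)  # first call: loader runs
    cache.get("db", load_data)  # immediate second call: served from cache
```

With this design, a server that receives a query at least every 10 minutes never pays the reload cost again after the first query.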