Parse Pubmed XML into JSON and load it into Elasticsearch. See the Downloading Pubmed
documentation for details on obtaining the XML files.
pubmed_es
requires Python 3.7 or higher and Elasticsearch 7. Clone this repo, create a Python venv
, and install the requirements.txt
:
git clone https://github.com/paul-sud/pubmed-es.git
cd pubmed-es
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
From the root of this repo, run the following to read and index the XML, where DATA_DIR
points to a folder containing the XML files:
python -m pubmed_es -d $DATA_DIR