This repository contains all the necessary scripts to set up and access SNOMED via Snowstorm and the SNOMED CT Browser on your own machine with just one single line of code.
In the background, it creates and spins up the following four containerized applications:
- A container with an elasticsearch database for storing and effectively accessing the ontology;
- A container with an instance of the Snowstorm API to access the SNOMED ontology of the database in a standardized way;
- A container with a script to load the ontology files into the database;
- A container with the SNOMED CT Browser for graphical display.
0) Prerequisites:
- An up-to-date Docker instance (tested with
20.10.17)
. git
installed.- A stable internet connection.
- Your own copy of the SNOMED 'RF2' release files as .zip file.
- Only for Linux users: Please provide Docker and its corresponding user group(s) read AND write access to your file system (specifically: To the folders with the input data plus all its subdirectories, and to the folder you will be storing the elasticsearch database files in, cf.
PATH_TO_ELASTIC_DB_STORAGE_LOCATION
in this README and in the example.env file).
Set the environment variables in .env
; spin up the containers with
docker compose up --build
and access through port 80
(Snowstorm) or 8080
(SNOMED CT Browser).
1) Clone this repository:
Use either of the following two commands:
git clone https://github.com/hpi-dhc/containerized-snowstorm.git
git clone [email protected]:hpi-dhc/containerized-snowstorm.git
2) Set the environment variables:
Before starting, folder paths, filenames, and SNOMED release details have to be specified in a separate .env file. The accompanying example.env in this repository may serve as a template for it, however it should be renamed to .env thereafter.
The first two path variables refer to the paths used to store the input (RF2 .zip files) and database data. They must be specified as absolute or relative paths on the host machine, with neither brackets nor quotations marks needed:
PATH_TO_ELASTIC_DB_STORAGE_LOCATION
: The path to the folder where the elasticsearch database files will be stored. This folder must first be created on the host machine before continuing with the next step of the setup. It will be a read-and-write volume, as the database needs to write its files to it during the ETL process. Subsequently, Docker has to be given read-and-write access to it (cf. prerequisites section above).PATH_TO_RF2_FILES
: The path to the folder where the zipped (!) SNOMED release files are stored. This folder will be mounted just for the ETL of the database, not for production. As such, it will be read-only so that the SNOMED files will remain unchanged.
The following two variables should be adapted to the actual SNOMED version which is planned to be used; again with all variable names not enclosed by either brackets or quotations marks:
rf2_filename
: Should be the full name of the zipped RF2 files including the file extension.release
: Should refer to the specific release of SNOMED (e.g., SNOMED-INT for the international version or SNOMED-US for the US-specific release).
From now on, make sure that the working directory is your repository path.
3) Build and compose orchestrated containers:
Use docker compose to run all four services:
a) The elasticsearch
database service,
b) the snowstorm
API service,
c) the ingest-rf2
ETL service, and
d) the snomedct-browser
service:
docker compose up --build
Building and downloading the images, spinning up the containers and executing the ETL scripts will take a considerable amount of time (50-70 minutes on a non-representative set of regular notebooks).
Wait for the process to finish: The ingest-rf2
container should output an
**** IMPORT SUCCESS ****
followed by a section with of the loaded version, and indicate in the last line of the terminal output that it exited with code 0
(A few comments):
- Closing the lid of a laptop or, more generally, sending the host machine to sleep might compromise the successful completion of this step.
- Not detaching (would be
-d
in the command above) from the terminal output will ensure traceability of the import status, but be prepared for a lot of repeated output from theelasticsearch
service, especially regarding security features (cf. Notes section below). - The terminal output of the
ingest-rf2
container periodically shows anexited with code 1
at the beginning. This is intended, as this container needs to fail (to restart) before both theelasticsearch
andsnowstorm
containers are up and running. A more elegant way would be to use await-for
script in theingest-rf2
container, but the outcome would be the same.
4) Shut down the orchestrated containers:
After finishing, shut down the orchestrated containers by executing the following command:
docker compose down
If you are using the same window/tab of your terminal, you first need to Ctrl
+ c
to detach from the process and come back to your working directory OR you open a new window/tab with the same working directory and execute the above command there.
Either way will work.
In any case, the database files will be persisted (at the location specified earlier) and accessed whenever you will start the containers anew.
5) Uncomment the ingest-rf2
service from the docker compose file
Before you start your containers again, make sure you uncomment the entire ingest-rf2
service section (lines 3-12) from the docker-compose.yml (e.g., in VS Code "Ctrl" + "/"
on Windows/Linux ("Cmd" + "/"
on Mac respectively).
6) Back up your database files
Just to be on the safe side, back up the entire folder at PATH_TO_ELASTIC_DB_STORAGE_LOCATION
, e.g., by simply using
cp -r PATH_TO_ELASTIC_DB_STORAGE_LOCATION PATH_TO_BACKUP
and replacing the capitalized variables with the respective absolute or relative paths on the host machine.
7) OPTIONAL: Set Snowstorm to be a database read-only service
The entire data loading process is triggered by the ingest-rf2
service, but is executed through the snowstorm
service against the elasticsearch database.
In order to prevent inadvertent changes to the database, comment out line 37 and uncomment line 38 of the docker-compose.yml file (which includes the --snowstorm.rest-api.readonly=true
flag in the entrypoint).
This will only allow API endpoints with reading capabilities to be accessible.
1) Start the orchestrated containers
IMPORTANT: Make sure you have uncommented the entire ingest-rf2
service section in the docker-compose.yml file as indicated in step 5 above.
Not following this procedure will likely result in exiting containers.
With this modified docker-compose.yml, execute
docker compose up -d
2) Start querying with a) Snowstorm or b) SNOMED CT Browser
a) Snowstorm:
Open a browser on your local machine and type localhost:8080
to access the Swagger UI interface of Snowstorm in the same way as you can access the Swagger UI from SNOMED International.
The variable branch
(if required) is MAIN
.
b) SNOMED CT Browser:
Open a browser on your local machine and type localhost:80
to access the SNOMED CT Browser graphical user interface to SNOMED, allowing you to access the ontology in the same way as through the publicly available SNOMED CT Browser.
A minor issue still is the naming of the version in the UI, currently stating 'undefined'.
Nevertheless, all features are available.
Other variants of accessing the API exist, cf. Notes section below.
Happy querying! :)
3) Shut down services after usage
If you are done querying and want to spare resources on your computer, stop the containers with
docker compose down
- The memory estimates of 24 GB for the
snowstorm
service (lines 37 ff. in docker-compose.yml) will ensure an uninterrupted loading process, but are likely an overestimation that has not been systematically determined in experiments. On the other hand, the default values from the Snowstorm repository have turned out to be on the lower end (for a notebook). You should also experiment with the memory allocation in the Docker settings, which should be larger than the combined allocated memory for all three/four services. - Instead of using the Swagger UI to access the Snowstorm API, you can access the API automatically via any other script, e.g., this excellent set of functions implemented in Python.
- Any option accessing Snowstorm will need to go through port
8080
:- Through
localhost:8080
if you are accessing the API from your local machine (host port number can be changed in the ports section of thesnowstorm
service in the docker-compose.yml file, e.g., when the port is already taken on your host machine); - Through
snowstorm:8080
after attaching to the Docker networkelastic
(network name can be changed in line 57 of the same file) or after directly including the new service into the docker-compose.yml file (however in these cases changing ports of thesnowstorm
service to be exposed inside the Docker network is not straightforward, as the port settings would also need to be changed within the container).
- Through
- Any option accessing Snowstorm will need to go through port
- If not otherwise specified, all commands are executed on the host machine with the working directory being the cloned repository.
- This repository is intended for local use only. Deployment in a production setting would require additional security mechanisms.
- One of these mechanisms would be to pin versions (e.g., of the
snomedct-browser
service if provided by the publisher) to ensure exact reproducibility. - The files in the
ingest
folder are based on the folder with the same name in the Nictiz repository (https://github.com/Nictiz/Nictiz-Snowstorm), but have undergone a few modifications (traced in the ingest.py). - The
docker-compose.yml
file is a heavy modification of the identically named file from the IHTSDO GitHub repository at https://github.com/IHTSDO/snowstorm - This project gratefully builds upon the existing work from Nictiz and IHTSDO/SNOMED International, who kindly provide their code under an Apache 2.0 license. So does this project. Nevertheless, all users are kindly invited to contribute to the project, specifically to leave a note to the author if you find parts of the code to be broken or the explanations in this README ambiguous.
Thanks to Thomas Harris for investigating and suggesting how to modify the ingest-rf2
service such that all database content is also visible to the snomedct-browser
service.
(Written by Jan Philipp Sachs; updated on August 22, 2022)