This is a ckan extension that implements the Electronic Metadata Catalog for South Africa's Department of Agriculture, Land Reform and Rural Development (DALRRD). It also contains additional utilities, useful for running the full EMC.
Dataset fields are defined with the help of the ckan scheming extension.
The dataset schema file can be found in ckanext/dalrrd_emc_dcpr/scheming/dataset_schema.yaml. It defines the EMC dataset metadata fields, which conform to the South African spatial metadata standard (SANS1878).
This project is deployed onto the following environments:
- testing: https://testing.emc.kartoza.com
- staging: TBD
- production: TBD
Deployment details are kept elsewhere.
While this project can be installed standalone, it is primarily meant to be used together with docker.
Ideally, you should be able to pull prebuilt images from dockerhub:
https://hub.docker.com/r/kartoza/ckanext-dalrrd-emc-dcpr
docker pull kartoza/ckanext-dalrrd-emc-dcpr:main
Alternatively, you can also build the image locally by using the provided build script:
cd docker
./build.sh
After having the image, use it to create containers. In order to be properly recognized, your config files must be mounted at /home/appuser/ckan.ini and /home/appuser/who.ini. For example, when running standalone:
docker run \
--rm \
--volume=/home/myuser/my-ckan.ini:/home/appuser/ckan.ini \
--volume=/home/myuser/who.ini:/home/appuser/who.ini \
kartoza/ckanext-dalrrd-emc-dcpr:main
The provided Dockerfile has the following peculiarities:
- It requires you to mount the ckan configuration files in order to work;
- It uses poetry to install Python packages and manage their environment;
- It has a custom docker entrypoint script implemented in Python. The script has access to the poetry env and can be called by running poetry run docker_entrypoint;
- The entrypoint script waits for ckan's environment to be available, including waiting some time for the database, solr and redis services to become available. However, it performs neither automatic database migrations nor a static files refresh;
- It uses gunicorn as the Python app server.
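The waiting behaviour mentioned in the list above can be pictured with a small sketch. This is not the project's actual entrypoint code: the function name wait_for_service, its parameters, and the idea of probing a plain TCP port are assumptions made for illustration. It polls a port until a connection succeeds or a deadline passes.

```python
import socket
import time


def wait_for_service(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: the real entrypoint may check services differently.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: give up once the deadline has passed.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A caller could invoke this once per dependency (database, solr, redis) before starting the app server. A port being open does not guarantee the service is fully ready, so a real implementation would likely add service-specific checks.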
To install ckanext-dalrrd-emc-dcpr, make sure CKAN is already installed in your virtual environment. If it is not, follow the https://docs.ckan.org/en/2.9/maintaining/installing/install-from-source.html guide to install CKAN. Then follow the steps below:
- Activate your CKAN virtual environment, for example:
  . /usr/lib/ckan/default/bin/activate
- Clone the source and install it on the virtualenv:
  git clone https://github.com/Kartoza/ckanext-dalrrd-emc-dcpr.git
  cd ckanext-dalrrd-emc-dcpr
  pip install -e .
  pip install -r requirements.txt
- Add dalrrd-emc-dcpr to the ckan.plugins setting in your CKAN config file (by default the config file is located at /etc/ckan/default/ckan.ini).
- Start CKAN:
  ckan -c /etc/ckan/default/ckan.ini run
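Before starting CKAN it can be handy to confirm that the plugin really is listed in the config file. The sketch below is one way to do that with the standard library. It assumes the setting lives in the [app:main] section, as in a stock CKAN ini file; the helper name enabled_plugins and the other plugin names in the sample are made up for illustration.

```python
import configparser
from typing import List


def enabled_plugins(ini_text: str) -> List[str]:
    """Return the plugin names listed under ckan.plugins in [app:main]."""
    # Interpolation is disabled so values containing '%' do not raise errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    # ckan.plugins is a whitespace-separated list of plugin names.
    return parser.get("app:main", "ckan.plugins", fallback="").split()


# Illustrative config fragment, not the project's real settings.
sample = """
[app:main]
ckan.plugins = stats scheming_datasets dalrrd-emc-dcpr
"""
print("dalrrd-emc-dcpr" in enabled_plugins(sample))  # prints True
```

To check a real file, read its contents first, for example with pathlib.Path("/etc/ckan/default/ckan.ini").read_text().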
Run the extension's specific migrations using the following commands to upgrade and downgrade the ckan database respectively.
docker exec -t emc-dcpr_ckan-web_1 poetry run ckan db upgrade -p dalrrd_emc_dcpr
docker exec -t emc-dcpr_ckan-web_1 poetry run ckan db downgrade -p dalrrd_emc_dcpr
# check if there are any datasets that are not indexed
ckan search-index check
# re-index
ckan search-index rebuild
ckan spatial extents
ckan dalrrd-emc-dcpr bootstrap create-sasdi-themes
ckan dalrrd-emc-dcpr bootstrap create-iso-topic-categories
ckan dalrrd-emc-dcpr bootstrap create-sasdi-organizations
This needs to be run periodically (once per day is enough). Be sure to run both commands depicted below.
ckan tracking update
ckan search-index rebuild --refresh
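Since both commands must run, and in that order, a scheduled job could wrap them in a small script. The sketch below is only an illustration: it assumes the ckan executable is on the PATH and already knows which config file to use, and the names DAILY_COMMANDS and run_daily_tasks are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence

# The two commands from the text above, in the order they should run.
DAILY_COMMANDS: List[List[str]] = [
    ["ckan", "tracking", "update"],
    ["ckan", "search-index", "rebuild", "--refresh"],
]


def run_daily_tasks(run: Callable[[Sequence[str]], int] = subprocess.call) -> bool:
    """Run each command in order; stop and return False at the first failure."""
    for command in DAILY_COMMANDS:
        # A non-zero exit code means the command failed.
        if run(command) != 0:
            return False
    return True
```

The run parameter exists so the function can be exercised without a CKAN installation by passing a stand-in callable. In the docker setup each command would additionally need to be executed inside the relevant container.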
You may use the various ckan harvester <command> commands to operate existing harvesters.
Create a job:
docker exec -ti emc_dcpr-ckan_harvesting-runner poetry run ckan harvester job <source-id>
This needs to be run periodically (once per hour is likely enough).
ckan dalrrd-emc-dcpr send-email-notifications
Additionally, in order for notifications to work, some configuration is required:
- The CKAN settings must have ckan.activity_streams_email_notifications = true
- The CKAN settings must have the relevant email configuration (likely being passed as environment variables)
- Each user must manually choose to receive notification e-mails. This is done in the user's profile
- Each user must manually follow the entities (datasets, organizations, etc.) that they want to be notified about via email
There is a CLI command that opens a Python shell already configured with the CKAN environment. This is analogous to django's manage.py shell command. Start it up with:
ckan shell
This needs to be run periodically (once per hour is likely enough).
ckan dalrrd-emc-dcpr pycsw refresh-materialized-view
It is strongly suggested that you use the provided docker-compose related files for development. They set the following up:
- Bind mounts the code inside the relevant container(s) so that changes are instantly available inside them;
- Uses an automatically reloading web server, so that whenever the code changes, the server reloads too;
- Uses a common ckan configuration file with suitable settings for development. This file is located at docker/ckan-dev-settings.ini. It also includes the docker/who.ini file, which is another configuration file required by CKAN. The provided ckan-dev-settings.ini:
  - sets debug = True, which causes the ckan frontend to also load the flask debug toolbar
- Exposes the following ports to the host machine:
  - ckan web: 5000
  - ckan database: 55432
  - datastore database: 55433
  - solr: 8983
- Uses docker named volumes for storing the ckan database, datastore database, ckan storage and solr data
- Makes it straightforward to run tests
Additionally, we suggest you use the provided docker/compose.py helper script to stand up and wind down the stack.
cd docker
# bring the stack up
./compose.py --compose-file docker-compose.yml --compose-file docker-compose.dev.yml up
# shut it down
./compose.py --compose-file docker-compose.yml --compose-file docker-compose.dev.yml down
# restart services (for example the ckan-web service)
./compose.py --compose-file docker-compose.yml --compose-file docker-compose.dev.yml restart ckan-web
After starting the stack, the ckan web interface becomes available (after a few moments) on your local machine, on the ckan web port listed above (5000).
NOTE: The compose file does not try to build the images. You either build them yourself (with the provided build.sh script, as mentioned above) or they are pulled from the registry (if they exist remotely).
The first time you launch it you will need to set up the ckan database (since the ckan image's entrypoint explicitly does not take care of this, as mentioned above). Run the following command:
docker exec -ti emc-dcpr_ckan-web_1 poetry run ckan db init
Afterwards, proceed to run any migrations required by the ckanext-dalrrd-emc-dcpr extension:
docker exec -ti emc-dcpr_ckan-web_1 poetry run ckan db upgrade --plugin dalrrd_emc_dcpr
Now you should be able to go to http://localhost:5000 and see the ckan landing page. If not, you may need to reload the ckan web app after performing the DB initialization step. This can be done by sending the HUP signal to the gunicorn application server (which is running our ckan flask app):
docker exec -ti emc-dcpr_ckan-web_1 bash -c 'kill -HUP 1'
After having initialized the database you can now create the first CKAN sysadmin user.
docker exec -ti emc-dcpr_ckan-web_1 poetry run ckan sysadmin add admin
Answer the prompts in order to provide the details for this new user. After its successful creation you can log in to the CKAN site with the admin user.
The datastore DB requires the creation of a readonly user. The commands to do this are sent directly to the datastore-db service by means of mounting a custom script inside the container's /docker-entrypoint-initdb.d directory. This means that the DB is initialized automatically when the container is created.
As mentioned in the postgres docker docs, the DB initialization script is only run if the container's data directory is empty. This means that if there is already a pre-existing DB, the script will not be executed. If needed, remove the volume that has the DB's data directory and then initialize the container again. THIS WILL TRASH YOUR DB!
docker volume rm emc-dcpr_datastore-db-data
Run the following commands so that the additional extensions get their DB tables created correctly:
docker exec -ti emc-dcpr_ckan-web_1 poetry run ckan spatial initdb
docker exec -ti emc-dcpr_ckan-web_1 poetry run ckan harvester initdb
docker exec -ti emc-dcpr_ckan-web_1 poetry run ckan pages initdb
The ckanext-spatial extension takes care of its own bootstrapping and will create any database tables automatically. However, you may want to bootstrap explicitly. If so, run:
docker exec -ti emc-dcpr_ckan-web_1 poetry run ckan spatial initdb
Additionally, the spatial extension documentation seems to be outdated when it comes to running its custom CKAN CLI commands. Instead of the older paster-based incantation, they should rather be run like:
poetry run ckan spatial <command>
In order to be able to serve the system's datasets through various OGC standards, create a DB materialized view in order to integrate with pycsw:
docker exec -ti emc-dcpr_ckan-web_1 poetry run ckan dalrrd-emc-dcpr pycsw create-materialized-view
Create the default items required by the EMC/DCPR system by running the bootstrap commands as described in the Operations section
You can issue ckan commands inside the container by making sure they are run with poetry. This can be done with docker exec oneliners like this:
docker exec -ti emc-dcpr_ckan-web_1 poetry run ckan --help
Alternatively, you can open a bash shell inside the container and use poetry run there:
docker exec -ti emc-dcpr_ckan-web_1 bash
poetry run ckan --help
Lastly, you can open a bash shell, then open a poetry shell, which is similar to activating its virtualenv, and then run commands, like this:
docker exec -ti emc-dcpr_ckan-web_1 bash
poetry shell
ckan --help
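When these one-liners are scripted, it may help to build the argument list in one place. The sketch below assumes the container name used throughout this document (emc-dcpr_ckan-web_1); the helper name ckan_in_container is hypothetical and not part of the project.

```python
import shlex
from typing import List


def ckan_in_container(*ckan_args: str, container: str = "emc-dcpr_ckan-web_1") -> List[str]:
    """Build the argv for running a ckan subcommand through poetry inside a container."""
    return ["docker", "exec", "-ti", container, "poetry", "run", "ckan", *ckan_args]


# Show the equivalent shell command line.
print(shlex.join(ckan_in_container("--help")))
# prints: docker exec -ti emc-dcpr_ckan-web_1 poetry run ckan --help
```

The resulting list can be passed to subprocess.run. When the script runs without an interactive terminal, the -t flag will probably need to be dropped.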
The CKAN database is kept in a docker volume named emc-dcpr_ckan-db-data. If you need to recreate the DB you can remove this docker volume. Do the following:
- If needed, wind down the docker-compose stack;
- Remove the DB volume with
docker volume rm emc-dcpr_ckan-db-data
- Start the docker-compose stack again
- Run the DB initialization command
- Bootstrap the system again
CKAN, the base of the SASDI EMC stack, uses bootstrap version 3.4.1. The main
CSS file is generated with Less and what is distributed are the compiled .css
files.
In order to hook into those Less files and have an easier way to define global variables and styles we need to install some additional dependencies and set up a CSS building pipeline.
This is done by following the steps below:
- Start the docker-compose stack with the development files
- Run the provided docker/prepare-for-frontend-dev.sh script. This will install node.js inside the running container, then use npm to install the dependencies mentioned in the package.json file and immediately start watching for changes:
  docker exec -ti emc-dcpr_ckan-web_1 bash docker/prepare-for-frontend-dev.sh
- Now you may edit the public/base/less files and reload your web browser to see the changes
The assets/css/dalrrd-emc-dcpr.css file can be used to write custom CSS directly. Editing this file is appropriate when the changes do not involve modifying Less variables. It also does not require nodejs to be installed.
This project uses a Continuous Integration strategy whereby each commit (either to main or via some PR) is checked by an automated github workflow. This performs several checks:
- Lint the code with black
- Perform static analysis by running the code through mypy
- Build the docker image
- Run automated tests
Generally, in order for a PR to be accepted it must pass these automated checks.
In order to avoid waiting around for the pipeline to find issues, it is advisable to install pre-commit and use the provided .pre-commit-config.yaml file to ensure that at least the linting and static analysis checks are run as git pre-commit hooks. It is also advisable to run the tests locally, before pushing your changes to github (see the next section for instructions on running tests).
This project also uses a Continuous Deployment pipeline where each commit to the main branch results in the redeployment of our testing environment.
To install ckanext-dalrrd-emc-dcpr for development, activate your CKAN virtualenv and do:
git clone https://github.com/Kartoza/ckanext-dalrrd-emc-dcpr.git
cd ckanext-dalrrd-emc-dcpr
python setup.py develop
pip install -r dev-requirements.txt
Testing uses some additional configuration:
- The docker/docker-compose.dev.yml file has an additional ckan-tests-db service, with a DB that is used solely for automated testing.
- The docker/ckan-test-settings.ini file defines the test settings. It must be explicitly passed as the config file to use when running the tests.
To run the tests you will need to:
- Install the development dependencies beforehand, as the docker images do not have them. Run:
  docker exec -ti {container-name} poetry install
- Initialize the db. This is only needed the first time (the dev stack uses volumes to persist the DB):
  docker exec -ti emc-dcpr_ckan-web_1 poetry run ckan --config docker/ckan-test-settings.ini db init
  docker exec -ti emc-dcpr_ckan-web_1 poetry run ckan --config docker/ckan-test-settings.ini harvester initdb
  docker exec -ti emc-dcpr_ckan-web_1 poetry run ckan --config docker/ckan-test-settings.ini db upgrade -p dalrrd_emc_dcpr
- When there are model changes you will need to upgrade the DB too. Run this:
  docker exec -ti emc-dcpr_ckan-web_1 poetry run ckan --config docker/ckan-test-settings.ini db upgrade -p dalrrd_emc_dcpr
- Run the tests with pytest. We use markers to differentiate between unit and integration tests. Run them like this:
  # run all tests
  poetry run pytest --ckan-ini docker/ckan-test-settings.ini
  # run only unit tests
  poetry run pytest --ckan-ini docker/ckan-test-settings.ini -m unit
  # run only integration tests
  poetry run pytest --ckan-ini docker/ckan-test-settings.ini -m integration
- Using httpie to check for existing records on the local pycsw test service:
  http localhost:55436 \
      service==CSW \
      version==2.0.2 \
      request==GetRecords \
      resulttype==results \
      typenames==gmd:MD_Metadata \
      outputschema==http://www.isotc211.org/2005/gmd \
      elementsetname==brief
- Create a CKAN harvester for the local docker-based pycsw service:
  - URL: http://csw-harvest-target:8000
  - Source type: CSW Server
  - Update frequency: Manual
  - Configuration: {"default_tags": ["csw", "harvest"]}
  - Organization: test-org-1
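The GetRecords request from the httpie step above can also be expressed as a plain URL, which is convenient when httpie is not installed. The sketch below assumes that all seven items in that command are meant to be query-string parameters and that the test service listens on localhost:55436 as shown there. The function name build_getrecords_url is hypothetical.

```python
from urllib.parse import urlencode


def build_getrecords_url(base_url: str = "http://localhost:55436") -> str:
    """Build a CSW 2.0.2 GetRecords URL asking for brief ISO (gmd) records."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "resulttype": "results",
        "typenames": "gmd:MD_Metadata",
        "outputschema": "http://www.isotc211.org/2005/gmd",
        "elementsetname": "brief",
    }
    # urlencode percent-escapes characters such as ':' and '/' in the values.
    return f"{base_url}?{urlencode(params)}"


print(build_getrecords_url())
```

The printed URL can be opened in a browser or fetched with any HTTP client while the development stack is running.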
To run any of the above docker commands once this is deployed into Kubernetes you can use one of three ways:
- By accessing Rancher:
  - Go to Rancher2 > Shared (in header bar) > EMC-DCPR
  - Go to the "ckan" Workload
  - On a running pod click on the menu and select "Execute Shell"
- By using Lens:
  - Go to Workloads > Deployments.
  - Choose the correct namespace: "emc-dcpr"
  - Select the Deployment "ckan"
  - Scroll to Pods
  - Select a running one
  - Click on "Pod Shell".
- By using the Kubernetes CLI:
  - Get the pod name with kubectl get pods --namespace=emc-dcpr. It should look like ckan-<random string>
  - Run kubectl exec -it <pod name> --namespace=emc-dcpr -- bash
  - Or an all in one:
    kubectl exec -it "$(kubectl get pods --namespace=emc-dcpr | grep Running | grep ckan- | grep -v postgis | cut -d' ' -f1)" --namespace=emc-dcpr -- bash
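The filtering done by the all-in-one command (keep running pods whose line contains ckan-, skip the postgis ones, take the first column) can be written out step by step. The sketch below works on text that resembles kubectl get pods output. The sample listing and the function name pick_ckan_pod are invented for illustration, and unlike the shell pipeline it returns only the first match.

```python
from typing import Optional


def pick_ckan_pod(kubectl_output: str) -> Optional[str]:
    """Return the name of the first running ckan pod, or None if there is none."""
    for line in kubectl_output.splitlines():
        # Same filters as: grep Running | grep ckan- | grep -v postgis
        if "Running" in line and "ckan-" in line and "postgis" not in line:
            # Same as: cut -d' ' -f1 (the pod name is the first column)
            return line.split(" ")[0]
    return None


# Made-up listing in the general shape of `kubectl get pods` output.
sample_output = """\
NAME                   READY   STATUS    RESTARTS   AGE
ckan-postgis-0         1/1     Running   0          3d
ckan-7d9f8b6c5-x2k4q   1/1     Running   0          3d
ckan-7d9f8b6c5-zz9pl   0/1     Error     0          4d
"""
print(pick_ckan_pod(sample_output))  # prints ckan-7d9f8b6c5-x2k4q
```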
The system uses the crisp chatbox to gather feedback from users. Configure it at the crisp website.
There is some support for importing legacy SASDI EMC datasets. For now, it is available in the form of additional CLI commands:
- ckan legacy-sasdi saeon-odp import-records - These commands use the legacy SASDI EMC SAEON-ODP platform and rely on the records being in the DataCITE format. For now there is no provision for downloading records from some remote server
- ckan legacy-sasdi csw <command> - These commands use the legacy SASDI EMC CSW endpoint and retrieve records via CSW
  - ckan legacy-sasdi csw dowload-records
  - ckan legacy-sasdi csw import-records
  - ckan legacy-sasdi csw retrieve-thumbnails