InkVisitor uses rethinkdb for storing data. This no-SQL open-source database was considered mainly based on the nature of data models in the project (mostly JSON, schemaless structures with sets of in-app customized conditions). Currently, the database stores the following collections (tables):
- users
- user data: login, password, starred territories
- acl_permissions
- consists of rules for accessing resources (endpoints) by different requestors (groups)
- more fine-grained permissions (ownerships) over entities are defined in respective user entries
- entities
- holds data mentioned in [section](### Entity types).
- relations
- various implementations of logic between multiple (2-n) entities, ie. synonyms
- audits
- log entries for changes made to the entities table
- each entity entry has 0-n audit entries
- documents
- large blobs of text data with encoded tags for referencing entities
The project uses several environments (./env/X.env
) and each of them has a dedicated database namespace (inkvisitor
, inkvisitor_staging
etc). The database can be run in two possible ways - 1) inside the docker or 2) as a separate process / service.
Rethinkdb can be run as a containerized instance. Installed docker + [docker compose are requirements.
Run docker-compose up -d database
from the root directory.
Follow tutorials on the official page.
The database main script is built as CLI
application which will guide you through the import process.
If you are running the import for the first time, make sure all the packages are installed by typing pnpm i
. Then, run pnpm start
in this folder to run the app and by entering the respective key
from the menu choose the desired action.
Before you start, copy the .env.example
into your local .env file and fill variables inside (SSH*
variables are optional).
Import example (this will remove and import the database anew):
- if prompted whether to use the SSH connection, use
n
+<enter>
to stay in the local environment - choose dataset by entering
D
+<enter>
, then choose one of the datasets by entering the respective number or name (ie.1
), confirm with<enter>
- use
X
+<enter>
to run the import
To switch between local -> remote hosts, just provide SSH*
variables. If provided successfully, you will be prompted to confirm that you are connecting via an SSH tunnel.
You can run specialized jobs by typing J
. These jobs are single purposes actions, ie. fixing bad import dates.
pnpm import:remote
pnpm import:remote-data-import
- etc
In the datasets directory, you can find different groups of import data. Respective files are referenced in scripts. For common data (acl entries/users), you can use files stored in default directory.
Rethinkdb
comes with the rethinkdb-dump
tool, which creates a snapshot according to the provided arguments. Normally you would need to call this tool periodically in crontab
. You can use script backup.sh
, which does the following:
- deletes outdated backup files older than 3 days but keeps files for the first day of each month
- runs in cycle for each database (names are provided in the named array) and creates a snapshot with name
backup_YYYY_MM_DD_DBNAME.tar.gz
Cron can be set like this:
crontab -e
- add line
0 0 * * * <path to sh script> >> <path to logfile> 2>&1
To be sure our backup files are stored securely, we can use some cloud storage.
To keep it simple, we are using gcloud
in combination with a free tool - rclone.
Sync it like rclone sync archives remote:inkvisitor-backup
- see sync.sh script, which could be also called with cron.
TODO
Deployment files are stored in $APPS_DIR/neo4j
:
docker-compose.yml
for easier containerized deployment usingpodman-compose
utilityplugins
dir for optional plugins, namely APOC
Container uses 2 volumes by default:
- neo4jdata (
/data-disks/dbds/neo4j
) for core db data, mounted in/data
directory inside the container - neo4jplugins (
$APPS_DIR/neo4j/plugins
) for plugins, mounted in/plugins
directory inside the container
These volumes should not be removed, but if required, use podman volume rm <name>
, where name is hashed string (podman does not work with named volumes as required). To retrieve info about volume, use podman volume list
and podman volume inspect <name>
to retrieve details.
podman-compose up -d neo4j
podman-compose stop neo4j
+ podman container prune
(optional)
Don't remove
Use http://<machine ip>:7474/browser/
. Default user is neo4j
.