From f32ff19cb194a690efc4ebe2fa3e3282d7dcb95d Mon Sep 17 00:00:00 2001
From: imsdu

_incoming: address to query to obtain the list of incoming links
_outgoing: address to query to obtain the list of outgoing links

Custom file metadata

When creating file resources, users can optionally provide custom metadata to be indexed and therefore searchable. This takes the form of a metadata field containing a JSON Object with one or more of the following fields:
+
name: a string which is a descriptive name for the file. It will be indexed in the full-text search.
description: a string that describes the file. It will be indexed in the full-text search.
keywords: a JSON object with Label keys and string values. These keywords will be indexed and can be used to search for the file.
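For instance, a metadata object using all three fields could look like the following sketch (field values are illustrative, mirroring the examples later on this page):

    {
      "name": "My File",
      "description": "a description of the file",
      "keywords": {
        "key1": "value1",
        "key2": "value2"
      }
    }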
Indexing

All the API calls modifying a file (creation, update, tagging, deprecation) can specify whether the file should be indexed synchronously or in the background. This behaviour is controlled using the indexing query param, which can be one of two values:
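As an illustrative sketch, assuming sync is the synchronous value and async the background default (the value list itself is cut off by this hunk), a creation call that waits for indexing to complete might look like:

    curl -X PUT \
      "http://localhost:8080/v1/files/myorg/myproject/myfile?indexing=sync" \
      -F "file=@./myfile.png;type=image/png"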
@@ -359,10 +389,7 @@
custom file metadata.
x-nxs-file-content-length: the size of the uploaded file

{filename}: String - the name that will be given to the file during linking. This field is optional. When not specified, the original filename is retained.
{mediaType}: String - the MediaType of the file. This field is optional. When not specified, Nexus Delta will attempt to detect it.
{tagName}: an optional label given to the linked file resource on its first revision.
{metadata}: JSON Object - Optional, see custom file metadata.
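Putting these fields together, a link payload might look like the following sketch (assuming the link operation's usual path field; all values are illustrative):

    {
      "filename": "myfile.png",
      "path": "relative/path/to/myfile.png",
      "mediaType": "image/png",
      "metadata": {
        "name": "My File",
        "keywords": {
          "key1": "value1"
        }
      }
    }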
@@ -593,10 +617,7 @@
.
Delegation & Registration (S3 only)

To support files stored in the cloud, Delta allows users to register files already uploaded to S3. This is useful primarily for large files where uploading directly through Delta using HTTP is inefficient and expensive. There are two use cases: registering an already uploaded file by specifying its path, and asking Delta to generate a path in its standard format.
+Register external file

This endpoint accepts a path and creates a new file resource based on an existing S3 file.
+
+POST /v1/files/{org_label}/{project_label}/register/{file_id}?storage={storageId}&tag={tagName}
+ {
+ "path": "{path}",
+ "mediaType": "{mediaType}",
+ "metadata": {metadata}
+ }
+
+
+… where

{path}: String - the relative path to the file from the root of S3.
{mediaType}: String - Optional MIME type specifying the file type. If omitted, this will be inferred by S3.
{metadata}: JSON Object - Optional, see custom file metadata.
+
+source
curl -X POST \
+ -H "Content-Type: application/json" \
+ "http://localhost:8080/v1/files/myorg/myproject/register/myfile?storage=mys3storage" -d \
+ '{
+ "path": "relative/path/to/myfile.png",
+ "mediaType": "image/png",
+ "metadata": {
+ "name": "My File",
+ "description": "a description of the file",
+ "keywords": {
+ "key1": "value1",
+ "key2": "value2"
+ }
+ }
+ }'
source
{
+ "@context" : [
+ "https://bluebrain.github.io/nexus/contexts/files.json",
+ "https://bluebrain.github.io/nexus/contexts/metadata.json"
+ ],
+ "@id" : "http://delta:8080/v1/resources/hkrxvoxdyiiev1p/03vctrp70usfehq/_/iwa41vwspwke6nx",
+ "@type" : "File",
+ "_bytes" : 29625,
+ "_constrainedBy" : "https://bluebrain.github.io/nexus/schemas/files.json",
+ "_createdAt" : "2024-06-14T12:44:36.525177Z",
+ "_createdBy" : "http://delta:8080/v1/realms/test-vuaplrsvrbkpkhca/users/byfxikrrdlmmvsvv",
+ "_deprecated" : false,
+ "_digest" : {
+ "_algorithm" : "SHA-256",
+ "_value" : "05bf442810213b9e5fecd5242eefeff1f3d207913861c96658c75ccf58997e87"
+ },
+ "_filename" : "myfile.png",
+ "_incoming" : "http://delta:8080/v1/files/hkrxvoxdyiiev1p/03vctrp70usfehq/http:%2F%2Fdelta:8080%2Fv1%2Fresources%2Fhkrxvoxdyiiev1p%2F03vctrp70usfehq%2F_%2Fiwa41vwspwke6nx/incoming",
+ "_location" : "relative/path/to/myfile.png",
+ "_mediaType" : "image/png",
+ "_origin" : "External",
+ "_outgoing" : "http://delta:8080/v1/files/hkrxvoxdyiiev1p/03vctrp70usfehq/http:%2F%2Fdelta:8080%2Fv1%2Fresources%2Fhkrxvoxdyiiev1p%2F03vctrp70usfehq%2F_%2Fiwa41vwspwke6nx/outgoing",
+ "_project" : "http://delta:8080/v1/projects/hkrxvoxdyiiev1p/03vctrp70usfehq",
+ "_rev" : 1,
+ "_self" : "http://delta:8080/v1/files/hkrxvoxdyiiev1p/03vctrp70usfehq/http:%2F%2Fdelta:8080%2Fv1%2Fresources%2Fhkrxvoxdyiiev1p%2F03vctrp70usfehq%2F_%2Fiwa41vwspwke6nx",
+ "_storage" : {
+ "@id" : "https://bluebrain.github.io/nexus/vocabulary/mys3storage",
+ "@type" : "S3Storage",
+ "_rev" : 3
+ },
+ "_updatedAt" : "2024-06-14T12:44:36.525177Z",
+ "_updatedBy" : "http://delta:8080/v1/realms/test-vuaplrsvrbkpkhca/users/byfxikrrdlmmvsvv",
+ "_uuid" : "79695062-ecbb-42dc-a62d-61c2c02be129"
+}
Delegating file uploads

Users can use delegation for files that cannot be uploaded through Delta (e.g. large files). Here Delta will provide bucket and path details for the upload. Users are then expected to upload the file using other means, and call back to Delta to register this file with the same metadata that was initially validated. The three steps are outlined in detail below.
+1. Validate and generate path for file delegation

Delta accepts and validates the following payload.
+
+POST /v1/delegate/files/{org_label}/{project_label}/validate?storage={storageId}
+ {
+ "filename": "{filename}",
+ "mediaType": "{mediaType}",
+ "metadata": {metadata}
+ }
+
+
… where

{filename}: String - mandatory name given to the file within the generated path.
{mediaType}: String - Optional MIME type specifying the file type. If omitted, this will be inferred by S3.
{metadata}: JSON Object - Optional, see custom file metadata.

It then generates the following details for the file:
+ {
+ "bucket": "<s3 bucket>",
+ "id": "<file resource identifier>",
+ "path": "<path from s3 root>",
+ "mediaType": "<user provided mediaType>",
+ "metadata": {}
+ }
+
… where path is the location within bucket to which the user is expected to upload their file. The id is reserved for when the file resource is created. mediaType and metadata are what the user specified in the request.

This payload is then signed using the flattened JWS serialization format to ensure that Delta has validated and generated the correct data. This same payload will be passed when creating the file resource:
+ {
+ "payload": "<base64 encoded payload contents>",
+ "protected": "<integrity-protected header contents>",
+ "signature": "<signature contents>"
+ }
+
The payload field can be base64 decoded to access the generated file details. Note that protected contains an expiry field exp with the datetime at which this signature will expire (in epoch seconds); it can also be base64 decoded for inspection.
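As a sketch, both encoded fields can be inspected from a Unix shell. JWS uses unpadded base64url, so the alphabet is translated and the padding restored before decoding (the decode helper is illustrative; the protected value is taken from the example response below):

    # restore the standard base64 alphabet and padding, then decode
    decode() {
      local s
      s=$(printf '%s' "$1" | tr '_-' '/+')
      while [ $(( ${#s} % 4 )) -ne 0 ]; do s="${s}="; done
      printf '%s' "$s" | base64 -d
    }

    # the protected header reveals the expiry (exp, in epoch seconds)
    decode "eyJleHAiOjE3MTg2Mjg1NDUsImFsZyI6IlJTMjU2In0" | jq .
    # => { "exp": 1718628545, "alg": "RS256" }

    # the payload holds the generated bucket, path, id, mediaType and metadata
    decode "$PAYLOAD" | jq .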
+
+source
curl -X POST \
+ -H "Content-Type: application/json" \
+ "http://localhost:8080/v1/delegate/files/myorg/myproject/validate?storage=mys3storage" -d \
+ '{
+ "filename": "myfile.png",
+ "mediaType": "image/png",
+ "metadata": {
+ "name": "My File",
+ "description": "a description of the file",
+ "keywords": {
+ "key1": "value1",
+ "key2": "value2"
+ }
+ }
+ }'
source
{
+ "protected" : "eyJleHAiOjE3MTg2Mjg1NDUsImFsZyI6IlJTMjU2In0",
+ "payload" : "eyJidWNrZXQiOiJqNDc4M2NjNWFkY2F1a3UiLCJwYXRoIjoibXlwcmVmaXgvYTE0bTYxZWgzajVuNzVjLzhrbXlpb2QyMzM1MnhpZC9maWxlcy9iLzIvOS80LzMvYy9jLzYvaG5qeXR3b2x0ZnVyc2JlaCIsIm1lZGlhVHlwZSI6ImltYWdlL2RhbiIsIm1ldGFkYXRhIjp7ImRlc2NyaXB0aW9uIjoicnByZndzdWpveGJpaXllaSIsImtleXdvcmRzIjp7ImtvZXZvd3Bsc3Jld2NlZGIiOiJ2dnNlaW9wa3Z5Y3d1dG1oIn0sIm5hbWUiOiJmbW94dHh3a3JvdWhtZG90In0sImlkIjoiaHR0cDovL2RlbHRhOjgwODAvdjEvcmVzb3VyY2VzL2ExNG02MWVoM2o1bjc1Yy84a215aW9kMjMzNTJ4aWQvXy9hNGE5ODE2Mi1jYTNmLTQ2YmUtYWUxMC00MzJjYTZmOTdjNGYifQ",
+ "signature" : "htQLymfQIrks7MejErb4v3mnhQT2W6iPZdkd7LVBPJ0Ksybj8XG8dbTJH5pjZJF7-HXi848R14tquZ6iSeXpEqFGiZYge8obPQRLpJA0qbc9Mmhlq-CTbIdsy5OFpdzcDadSj6_k_kzuU2PR-Fli9GtH-34z2d4C9dWsBmnUo_IA3dvSFCF_PaQuajo7cJYa_0yc4VVGKG-xYi9yV4ylD5D2cxMUDFun78NOKDD_2upF-kuf9t5E-NjCl0DffkelbqYuH6nMop2zmwfu-cwHnChaDwKM7HLJGLomD5duU5sq-mVsunnMy58NgzMecLGDbER-27zk7w0TwxkXTKfxWg"
+}
2. Upload file to S3

Using the bucket and path from the previous step, the file should be uploaded to S3 by whatever means are appropriate. The only restriction is that this must be finished before the expiry datetime of the signed payload.
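For example, using the AWS CLI, with <bucket> and <path> taken from the decoded payload of step 1:

    aws s3 cp ./myfile.png "s3://<bucket>/<path>" --content-type "image/png"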
+3. Create delegated file resource

Once the file has been uploaded to S3 at the specified path, the file resource can be created. The payload can be passed back exactly as it was returned in the previous step. Delta will verify that the signature matches the payload and that the expiry date has not passed. Then the file will be registered as a resource. The usual file resource response will be returned with all the standard metadata and file location details.
+
+POST /v1/delegate/files/{org_label}/{project_label}?storage={storageId}
+ {
+ "payload": "<base64 encoded payload contents>",
+ "protected": "<integrity-protected header contents>",
+ "signature": "<signature contents>"
+ }
+
+
source
curl -X POST \
+ -H "Content-Type: application/json" \
+ "http://localhost:8080/v1/delegate/files/myorg/myproject?storage=mys3storage" -d \
+ '{
+ "protected" : "eyJleHAiOjE3MTg2Mjg1NDUsImFsZyI6IlJTMjU2In0",
+ "payload" : "eyJidWNrZXQiOiJqNDc4M2NjNWFkY2F1a3UiLCJwYXRoIjoibXlwcmVmaXgvYTE0bTYxZWgzajVuNzVjLzhrbXlpb2QyMzM1MnhpZC9maWxlcy9iLzIvOS80LzMvYy9jLzYvaG5qeXR3b2x0ZnVyc2JlaCIsIm1lZGlhVHlwZSI6ImltYWdlL2RhbiIsIm1ldGFkYXRhIjp7ImRlc2NyaXB0aW9uIjoicnByZndzdWpveGJpaXllaSIsImtleXdvcmRzIjp7ImtvZXZvd3Bsc3Jld2NlZGIiOiJ2dnNlaW9wa3Z5Y3d1dG1oIn0sIm5hbWUiOiJmbW94dHh3a3JvdWhtZG90In0sImlkIjoiaHR0cDovL2RlbHRhOjgwODAvdjEvcmVzb3VyY2VzL2ExNG02MWVoM2o1bjc1Yy84a215aW9kMjMzNTJ4aWQvXy9hNGE5ODE2Mi1jYTNmLTQ2YmUtYWUxMC00MzJjYTZmOTdjNGYifQ",
+ "signature" : "htQLymfQIrks7MejErb4v3mnhQT2W6iPZdkd7LVBPJ0Ksybj8XG8dbTJH5pjZJF7-HXi848R14tquZ6iSeXpEqFGiZYge8obPQRLpJA0qbc9Mmhlq-CTbIdsy5OFpdzcDadSj6_k_kzuU2PR-Fli9GtH-34z2d4C9dWsBmnUo_IA3dvSFCF_PaQuajo7cJYa_0yc4VVGKG-xYi9yV4ylD5D2cxMUDFun78NOKDD_2upF-kuf9t5E-NjCl0DffkelbqYuH6nMop2zmwfu-cwHnChaDwKM7HLJGLomD5duU5sq-mVsunnMy58NgzMecLGDbER-27zk7w0TwxkXTKfxWg"
+ }'
source
{
+ "@context" : [
+ "https://bluebrain.github.io/nexus/contexts/files.json",
+ "https://bluebrain.github.io/nexus/contexts/metadata.json"
+ ],
+ "@id" : "http://delta:8080/v1/resources/fcp5xcw67hittxm/loeu5kgu48i8tw7/_/1205b629-2a73-4891-abff-140a64e9c391",
+ "@type" : "File",
+ "description" : "a description of the file",
+ "name" : "My File",
+ "_bytes" : 29625,
+ "_constrainedBy" : "https://bluebrain.github.io/nexus/schemas/files.json",
+ "_createdAt" : "2024-06-14T12:53:38.516222Z",
+ "_createdBy" : "http://delta:8080/v1/realms/test-pmcfwjobxkuvebro/users/fwhbqlnnhqncpsqk",
+ "_deprecated" : false,
+ "_digest" : {
+ "_algorithm" : "SHA-256",
+ "_value" : "05bf442810213b9e5fecd5242eefeff1f3d207913861c96658c75ccf58997e87"
+ },
+ "_filename" : "myfile.png",
+ "_incoming" : "http://delta:8080/v1/files/fcp5xcw67hittxm/loeu5kgu48i8tw7/http:%2F%2Fdelta:8080%2Fv1%2Fresources%2Ffcp5xcw67hittxm%2Floeu5kgu48i8tw7%2F_%2F1205b629-2a73-4891-abff-140a64e9c391/incoming",
+ "_keywords" : {
+ "key1" : "value1",
+ "key2" : "value2"
+ },
+ "_location" : "myprefix/fcp5xcw67hittxm/loeu5kgu48i8tw7/files/f/f/9/5/3/4/c/2/ngbeydtjyibvcanj",
+ "_mediaType" : "image/png",
+ "_origin" : "External",
+ "_outgoing" : "http://delta:8080/v1/files/fcp5xcw67hittxm/loeu5kgu48i8tw7/http:%2F%2Fdelta:8080%2Fv1%2Fresources%2Ffcp5xcw67hittxm%2Floeu5kgu48i8tw7%2F_%2F1205b629-2a73-4891-abff-140a64e9c391/outgoing",
+ "_project" : "http://delta:8080/v1/projects/fcp5xcw67hittxm/loeu5kgu48i8tw7",
+ "_rev" : 1,
+ "_self" : "http://delta:8080/v1/files/fcp5xcw67hittxm/loeu5kgu48i8tw7/http:%2F%2Fdelta:8080%2Fv1%2Fresources%2Ffcp5xcw67hittxm%2Floeu5kgu48i8tw7%2F_%2F1205b629-2a73-4891-abff-140a64e9c391",
+ "_storage" : {
+ "@id" : "https://bluebrain.github.io/nexus/vocabulary/mys3storage",
+ "@type" : "S3Storage",
+ "_rev" : 3
+ },
+ "_updatedAt" : "2024-06-14T12:53:38.516222Z",
+ "_updatedBy" : "http://delta:8080/v1/realms/test-pmcfwjobxkuvebro/users/fwhbqlnnhqncpsqk",
+ "_uuid" : "c96a229c-8d1d-46a5-8f47-08555fd98cb3"
+}
Server Sent Events

From Delta 1.5, it is possible to fetch SSEs for all files or just files in the scope of an organization or a project.

GET /v1/files/events # for all file events in the application
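Such a stream can be consumed with curl; -N disables buffering so events are printed as they arrive:

    curl -N -H "Accept: text/event-stream" "http://localhost:8080/v1/files/events"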
diff --git a/snapshot/docs/delta/api/resources-api.html b/snapshot/docs/delta/api/resources-api.html
index 2cb95d2f80..cd372c2766 100644
--- a/snapshot/docs/delta/api/resources-api.html
+++ b/snapshot/docs/delta/api/resources-api.html
@@ -355,8 +355,9 @@
section to learn more about it.
Remote contexts are only resolved during creates and updates. That means that when those get updated, the resources importing them must be also updated to take them into account in a new version.
The json payload for create and update operations cannot contain keys beginning with underscore (_), as these fields are reserved for Nexus metadata
Remote contexts are only resolved during creates and updates. That means that when those get updated, the resources importing them must be also updated to take them into account in a new version.
A generic resource cannot have a type belonging to the Nexus vocabulary (https://bluebrain.github.io/nexus/vocabulary/).
+Moreover, it cannot include any field starting with underscore (_) at the root level, as these fields are reserved for Nexus metadata.
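As an illustration, a payload like the following sketch would be rejected on both counts: the type belongs to the Nexus vocabulary, and _rev is a reserved metadata field at the root level (values are hypothetical):

    {
      "@type": "https://bluebrain.github.io/nexus/vocabulary/MyType",
      "_rev": 2,
      "name": "rejected"
    }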
When using the endpoints described on this page, the responses will contain global metadata described on the Nexus Metadata page. In addition, the following resource specific metadata can be present
If the redirect to Fusion feature is enabled and if the Accept header is set to text/html, a redirection to the Fusion representation of the resource will be returned.
GET /v1/schemas/{org_label}/{project_label}/{schema_id}/source?rev={rev}&tag={tag}
+GET /v1/schemas/{org_label}/{project_label}/{schema_id}/source?rev={rev}&tag={tag}&annotate={annotate}
where …

{rev}: Number - the targeted revision to be fetched. This field is optional and defaults to the latest revision.
{tag}: String - the targeted tag to be fetched. This field is optional.
+{annotate}: Boolean - annotate the response with the schema metadata.

The {rev} and {tag} fields cannot be simultaneously present.
+If {annotate} is set, the metadata is injected alongside the original payload, with fields from the original payload taking precedence. The context in the original payload is also amended with the metadata context.
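A request using the new parameter might look like this (organization, project and schema names are hypothetical):

    curl "http://localhost:8080/v1/schemas/myorg/myproject/myschema/source?annotate=true"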
Example
- Request
diff --git a/snapshot/docs/delta/api/search-api.html b/snapshot/docs/delta/api/search-api.html
index 240ed4c6cd..98dc7defa9 100644
--- a/snapshot/docs/delta/api/search-api.html
+++ b/snapshot/docs/delta/api/search-api.html
@@ -313,12 +313,13 @@ … where {payload} is an Elasticsearch query and the response is forwarded from the underlying Elasticsearch indices.
Query a suite
Nexus Delta allows configuring multiple search suites under plugins.search.suites. Each suite is composed of one or more projects. When querying using a suite, the query is only performed on the underlying Elasticsearch indices of the projects in the suite.
-POST /v1/search/query/suite/{suiteName}
+POST /v1/search/query/suite/{suiteName}?addProject={project}
{payload}
… where:

{suiteName} is the name of the suite
+{project}: Project - can be used to extend the scope of the suite by providing other projects in the format org/project. This parameter can appear multiple times, further expanding the scope of the search.
{payload} is an Elasticsearch query and the response is forwarded from the underlying Elasticsearch indices.
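For example, querying a suite while extending its scope with two extra projects (suite and project names are illustrative):

    curl -X POST \
      -H "Content-Type: application/json" \
      "http://localhost:8080/v1/search/query/suite/mysuite?addProject=myorg/myproject&addProject=myorg/otherproject" \
      -d '{ "query": { "match_all": {} }, "size": 10 }'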
Configuration
diff --git a/snapshot/docs/delta/api/storages-api.html b/snapshot/docs/delta/api/storages-api.html
index ca49435bf3..a705c7b660 100644
--- a/snapshot/docs/delta/api/storages-api.html
+++ b/snapshot/docs/delta/api/storages-api.html
@@ -341,6 +341,8 @@ While typically not necessary, you can manage and create additional disk storages, provided you are aware of the local file-system structure and that Nexus has read and write access to the target folder.
{
"@type": "DiskStorage",
+ "name": "{name}",
+ "description": "{description}",
"default": "{default}",
"volume": "{volume}",
"readPermission": "{read_permission}",
@@ -350,6 +352,8 @@
…where

+{name}: String - A name for this storage. This field is optional.
+{description}: String - A description for this storage. This field is optional.
{default}: Boolean - the flag to decide whether this storage is going to become the default storage for the target project or not.
{volume}: String - the path to the local file-system volume where files using this storage will be stored. This field is optional, defaulting to the configuration flag plugins.storage.storages.disk.default-volume (/tmp).
{read_permission}: String - the permission a client must have in order to fetch files using this storage. This field is optional, defaulting to the configuration flag plugins.storage.storages.disk.default-read-permission (resources/read).
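Filling in the template above, a disk storage definition using the new name and description fields might look like this (all values are illustrative):

    {
      "@type": "DiskStorage",
      "name": "Local scratch storage",
      "description": "Files stored on the local scratch volume",
      "default": false,
      "volume": "/data/nexus",
      "readPermission": "resources/read",
      "writePermission": "files/write"
    }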
@@ -363,6 +367,8 @@ In order to be able to use this storage, the configuration flag plugins.storage.storages.remote-disk.enabled
should be set to true
. More information about configuration
{
"@type": "RemoteDiskStorage",
+ "name": "{name}",
+ "description": "{description}",
"default": "{default}",
"folder": "{folder}",
"readPermission": "{read_permission}",
@@ -372,6 +378,8 @@
…where

+{name}: String - A name for this storage. This field is optional.
+{description}: String - A description for this storage. This field is optional.
{default}: Boolean - the flag to decide whether this storage is going to become the default storage for the target project or not.
{folder}: String - the storage service bucket where files using this storage are going to be saved.
{read_permission}: String - the permission a client must have in order to fetch files using this storage. This field is optional, defaulting to the configuration flag plugins.storage.storages.remote-disk.default-read-permission (resources/read).
@@ -383,6 +391,9 @@ In order to be able to use this storage, the configuration flag plugins.storage.storages.amazon.enabled
should be set to true
.
{
"@type": "S3Storage",
+ "name": "{name}",
+ "description": "{description}",
+ "bucket": "{name}",
"default": "{default}",
"readPermission": "{read_permission}",
"writePermission": "{write_permission}",
@@ -391,6 +402,9 @@
…where

+{name}: String - A name for this storage. This field is optional.
+{description}: String - A description for this storage. This field is optional.
+{bucket}: String - The AWS S3 bucket this storage points to. This field is optional, defaulting to the configuration flag plugins.storage.storages.amazon.default-bucket.
{default}: Boolean - the flag to decide whether this storage is going to become the default storage for the target project or not.
{read_permission}: String - the permission a client must have in order to fetch files using this storage. This field is optional, defaulting to the configuration flag plugins.storage.storages.amazon.default-read-permission (resources/read).
{write_permission}: String - the permission a client must have in order to create files using this storage. This field is optional, defaulting to the configuration flag plugins.storage.storages.amazon.default-write-permission (files/write).
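As a sketch, such a storage could be created with the storages endpoint (assuming the usual POST /v1/storages/{org_label}/{project_label} route; all names are illustrative):

    curl -X POST \
      -H "Content-Type: application/json" \
      "http://localhost:8080/v1/storages/myorg/myproject" -d \
      '{
        "@type": "S3Storage",
        "name": "My S3 storage",
        "description": "Files stored in the team bucket",
        "bucket": "mybucket",
        "default": false,
        "readPermission": "resources/read",
        "writePermission": "files/write"
      }'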
diff --git a/snapshot/docs/delta/api/version.html b/snapshot/docs/delta/api/version.html
index 35c6c25a8d..785804600f 100644
--- a/snapshot/docs/delta/api/version.html
+++ b/snapshot/docs/delta/api/version.html
@@ -273,8 +273,8 @@
"delta": "1.10.0",
"dependencies": {
"blazegraph": "2.1.6-RC",
- "postgresql": "15.7",
- "elasticsearch": "8.13.3",
+ "postgresql": "16.3",
+ "elasticsearch": "8.14.1",
"remoteStorage": "1.10.0"
},
"plugins": {
diff --git a/snapshot/docs/getting-started/running-nexus/configuration/index.html b/snapshot/docs/getting-started/running-nexus/configuration/index.html
index 724b148e57..267aafcf30 100644
--- a/snapshot/docs/getting-started/running-nexus/configuration/index.html
+++ b/snapshot/docs/getting-started/running-nexus/configuration/index.html
@@ -444,6 +444,24 @@ S3 storage configuration
+
Delta’s S3 storage integration supports users uploading files to S3 independently and then registering them within Delta.
+However, Delta is still responsible for the structure of the bucket, so it issues a path to clients via the delegation validation endpoint (TODO link). This involves signing and later verifying a payload using JWS.
+To support this functionality, Delta must be configured with an RSA private key:
+ amazon {
+ enabled = true
+ default-endpoint = "http://s3.localhost.localstack.cloud:4566"
+ default-access-key = "MY_ACCESS_KEY"
+ default-secret-key = "CHUTCHUT"
+ default-bucket = "mydefaultbucket"
+ prefix = "myprefix"
+ delegation {
+ private-key = "${rsa-private-key-new-lines-removed}"
+ token-duration = "3 days"
+ }
+ }
+
+To generate such a key in the correct format, follow these steps:
1. Generate an RSA key: openssl genrsa -out private_key.pem 2048
2. Convert it to PKCS#8 format: openssl pkcs8 -topk8 -inform PEM -outform PEM -in private_key.pem -out private_key_pkcs8.pem -nocrypt
3. Remove line breaks and copy the secret: cat private_key_pkcs8.pem | tr -d '\n' | pbcopy
Archive plugin configuration
The archive plugin configuration can be found here.
Jira plugin configuration
diff --git a/snapshot/docs/getting-started/running-nexus/docker/docker-compose.yaml b/snapshot/docs/getting-started/running-nexus/docker/docker-compose.yaml
index dedb32ed98..a9aade7d81 100644
--- a/snapshot/docs/getting-started/running-nexus/docker/docker-compose.yaml
+++ b/snapshot/docs/getting-started/running-nexus/docker/docker-compose.yaml
@@ -1,11 +1,10 @@
-version: "3.3"
services:
delta:
depends_on:
- blazegraph
- elasticsearch
- postgres
- image: bluebrain/nexus-delta:1.10.0-M5
+ image: bluebrain/nexus-delta:1.10.0-M13
environment:
DELTA_PLUGINS: "/opt/docker/plugins/"
DELTA_EXTERNAL_CONF: "/config/delta.conf"
@@ -23,7 +22,7 @@ services:
memory: 1024M
elasticsearch:
- image: "docker.elastic.co/elasticsearch/elasticsearch:8.13.3"
+ image: "docker.elastic.co/elasticsearch/elasticsearch:8.14.1"
environment:
discovery.type: "single-node"
bootstrap.memory_lock: "true"
@@ -38,7 +37,7 @@ services:
memory: 512M
postgres:
- image: library/postgres:15.7
+ image: library/postgres:16.3
environment:
POSTGRES_USER: "postgres"
POSTGRES_PASSWORD: "postgres"
diff --git a/snapshot/docs/getting-started/running-nexus/index.html b/snapshot/docs/getting-started/running-nexus/index.html
index 413998d176..254a29d5bf 100644
--- a/snapshot/docs/getting-started/running-nexus/index.html
+++ b/snapshot/docs/getting-started/running-nexus/index.html
@@ -349,13 +349,13 @@ Run Nexus locally with Docker
Requirements
Docker
-Regardless of your OS, make sure to run a recent version of the Docker Engine. This was tested with version 20.10.23. The Docker Engine, along the Docker CLI, come with an installation of Docker Desktop. Visit the official Docker Desktop documentation for detailed installation steps.
+Regardless of your OS, make sure to run a recent version of the Docker Engine. This was tested with version 26.0.0. The Docker Engine, along with the Docker CLI, comes with an installation of Docker Desktop. Visit the official Docker Desktop documentation for detailed installation steps.
Command :
docker --version
Example :
$ docker --version
-Docker version 20.10.23, build 7155243
+Docker version 26.0.0, build 2ae903e86c
Memory and CPU limits
On macOS and Windows, Docker effectively runs containers inside a VM created by the system hypervisor. Nexus requires at least 2 CPUs and 8 GiB of memory in total. You can increase the limits in Docker settings in the menu Settings > Resources.
@@ -421,8 +421,8 @@ Object. In the example above, _:b0 would be a subject (in our case _:b0 is just an arbitrary ID for the payload), is a predicate, and Dataset an object.\nIf we want to query this graph and return only the name and description of Dataset data types, we can write a simple SPARQL query:\nSELECT ?name ?description\nWHERE {\n ?resource ;\n ?name ;\n ?description .\n}\nThis would result in the following table:\nname description Dataset My first dataset\nLet’s decompose this query. The SELECT statement lists all the fields (i.e. variables) that we want to display. The WHERE clause shows how to get to these fields: which graph paths (or traversals) do we need to do to get to that field. In the WHERE clause, semi-colons ; indicate that we are still talking about the same Subject but with different predicates. A full stop . indicates that the statement is finished, i.e. that we have finished talking about a specific Subject. A WHERE clause can have multiple statements. If an Object in the statement has a value, it has to match, otherwise it won’t give any results. If the Object is prefixed by a question mark ? it becomes a variable. Variables can have any names.\nIf not using semi-colons, we could have written the query as three separate statements:\nSELECT ?name ?description\nWHERE {\n ?resource .\n ?resource ?name .\n ?resource ?description .\n}\nLet’s imagine that we now want to also get the contentUrl, the query can be adapted to:\nSELECT ?name ?description ?contentUrl\nWHERE {\n ?resource ;\n ?name ;\n ?description ;\n ?distribution .\n ?distribution ?contentUrl .\n}\nThis would result in the following table:\nname description contentURL Dataset My first dataset http://example.com/cfb62f82-6d54-4e35-ab5e-3a3a164a04fb\nIf there were more resources that matched the same graph patterns, the query would have returned them as well.","title":"4.1. SPARQL and RDF"},{"location":"/docs/getting-started/try-nexus.html#4-1-1-optional-improving-json-ld-id-context-and-more","text":"The JSON-LD payload above is quite verbose. Let’s have a look at the different ways to improve it.\nThe first thing to notice is that if I want to reference my dataset, I don’t have an identifier. The node in the graph is a blank node _:b0. I can easily add an ID like this:\n{\n \"@id\": \"http://example.com/my_gradient_dataset\",\n \"@type\": [\"http://schema.org/Dataset\", \"http://www.w3.org/ns/prov#Entity\"],\n \"http://schema.org/description\": \"My first dataset\",\n \"http://schema.org/name\": \"Dataset\",\n \"http://schema.org/distribution\": [\n {\n \"@type\": \"http://schema.org/DataDownload\",\n \"http://schema.org/contentUrl\": {\n \"@id\": \"http://example.com/cfb62f82-6d54-4e35-ab5e-3a3a164a04fb\"\n },\n \"http://schema.org/name\": \"mesh-gradient.png\"\n }\n ]\n}\nInstead of _:b0, the node will be identified by http://example.com/my_gradient_dataset. The @id uniquely identifies node objects.\nCan we now make the JSON-LD less verbose and easier to read? Yes, by defining a context. A context defines the short-hand names used in the JSON-LD payload. 
In particular, a context can contain a vocab.\n{\n \"@context\": {\n \"@vocab\": \"http://schema.org/\"\n },\n \"@id\": \"http://example.com/my_gradient_dataset\",\n \"@type\": [\"Dataset\", \"http://www.w3.org/ns/prov#Entity\"],\n \"description\": \"My first dataset\",\n \"name\": \"Dataset\",\n \"distribution\": [\n {\n \"@type\": \"DataDownload\",\n \"contentUrl\": {\n \"@id\": \"http://example.com/cfb62f82-6d54-4e35-ab5e-3a3a164a04fb\"\n },\n \"name\": \"mesh-gradient.png\"\n }\n ]\n}\nIf you copy the above snippet to the JSON-LD Playground and look at the expanded form, you will notice that the properties all expand with the http://schema.org/ prefix. Don’t hesitate to do the same for the ones below. Play a little with the payload to see what happens to the expanded form.\nBut what if we want to shorten specific values? We can add them in the context as well.\n{\n \"@context\": {\n \"@vocab\": \"http://schema.org/\",\n \"Entity\": \"http://www.w3.org/ns/prov#Entity\"\n },\n \"@id\": \"http://example.com/my_gradient_dataset\",\n \"@type\": [\"Dataset\", \"Entity\"],\n \"description\": \"My first dataset\",\n \"name\": \"Dataset\",\n \"distribution\": [\n {\n \"@type\": \"DataDownload\",\n \"contentUrl\": {\n \"@id\": \"http://example.com/cfb62f82-6d54-4e35-ab5e-3a3a164a04fb\"\n },\n \"name\": \"mesh-gradient.png\"\n }\n ]\n}\nFinally, the last improvement would be to shorten our IDs. For this we can use a base.\n{\n \"@context\": {\n \"@base\": \"http://example.com/\",\n \"@vocab\": \"http://schema.org/\",\n \"Entity\": \"http://www.w3.org/ns/prov#Entity\"\n },\n \"@id\": \"my_gradient_dataset\",\n \"@type\": [\"Dataset\", \"Entity\"],\n \"description\": \"My first dataset\",\n \"name\": \"Dataset\",\n \"distribution\": [\n {\n \"@type\": \"DataDownload\",\n \"contentUrl\": {\n \"@id\": \"cfb62f82-6d54-4e35-ab5e-3a3a164a04fb\"\n },\n \"name\": \"mesh-gradient.png\"\n }\n ]\n}\nBy default, in Nexus, the base (resp. vocab) defaults to your project, e.g. https://bbp.epfl.ch/nexus/v1/resources/github-users/adulbrich/_/ (resp. https://bbp.epfl.ch/nexus/v1/vocabs/github-users/adulbrich/).\nThe context can also point to another resource so that it is defined once and can be re-used in multiple resources. In Nexus, a default context for the Nexus-specific metadata is defined.\nThere’s much more to the JSON-LD syntax. Don’t hesitate to have a look for a more detailed explanation.","title":"4.1.1. (Optional) Improving JSON-LD: ID, Context and More"},{"location":"/docs/getting-started/try-nexus.html#4-1-2-optional-improving-sparql-prefixes","text":"The SPARQL query above was quite verbose in the sense that we had to write the full URI for predicates, similarly to the full properties URIs in the JSON payload.\nThere is a way to shorten these with PREFIX statements. Prefixes are similar to the base and vocabs for JSON-LD in their use.\nPREFIX rdf: \nPREFIX schema: \nSELECT ?name ?description ?contentUrl\nWHERE {\n ?resource rdf:type schema:Dataset ;\n schema:name ?name ;\n schema:description ?description ;\n schema:distribution ?distribution .\n ?distribution schema:contentUrl ?contentUrl .\n}\nThe new query will yield the exact same results as the one defined earlier, but is much more readable. By defining prefixes, we can replace long URIs by the prefix and the actual property. This means that for example becomes schema:distribution.\nYou will often see that instead of rdf:type (the shortened version of ), people use a. This is an RDF specific keyword to point to types. 
In the example above, the line would then be ?resource a schema:Dataset.\nYou can learn more about SPARQL in the official documentation.","title":"4.1.2. (Optional) Improving SPARQL: Prefixes"},{"location":"/docs/getting-started/try-nexus.html#4-2-project-and-resource-views","text":"As you saw in the example above, we can use SPARQL to query the cells in our Nexus project.\nLet’s start by accessing your Nexus instance or the Sandbox. Go to your project by clicking on the “Projects” card and searching for your project.\nIn the Sandbox, the organization corresponds to the identity provider used, and the project to your username. For example, if you used GitHub, the organization will be github-users and your project will be your GitHub username.\nIn the Project view, you will have the list of all resources that you’ve registered within your project. You can filter by type or search for a specific term in the name, label, or description.\nClick on a resource to open the Resource view.\nDepending on the resource data type, you might see one or more “plugins”. Plugins are components that will show up for specific resources or properties. For example, if you registered a neuron morphology and the data is properly attached through the distribution, you will be able to see a 3D morphology browser plugin.\nMore importantly, you will find the Advanced view plugin at the bottom of the view. Expand it and you will see the actual resource payload stored by Nexus, and navigate the graph through links, or visualize the surrounding graph in the graph tab.\nHere’s an example of the JSON payload of the neuron morphology resource previously registered (context left out for clarity):\n{\n \"@context\": {...},\n \"@id\": \"https://bbp.epfl.ch/neurosciencegraph/data/neuronmorphologies/491ea474-34f1-4143-8e1d-9f077602d36e\",\n \"@type\": [\n \"Dataset\",\n \"NeuronMorphology\"\n ],\n \"apicalDendrite\": \"spiny\",\n \"brainLocation\": {\n \"@type\": \"BrainLocation\",\n \"brainRegion\": {\n \"@id\": \"mba:778\",\n \"label\": \"VISp5\"\n },\n \"coordinatesInBrainAtlas\": {\n \"valueX\": 8881,\n \"valueY\": 953.839501299405,\n \"valueZ\": 7768.22695782726\n },\n \"hemisphere\": \"right\",\n \"layer\": \"5\"\n },\n \"cell_reporter_status\": \"positive\",\n \"contribution\": {\n \"@type\": \"Contribution\",\n \"agent\": {\n \"@id\": \"https://www.grid.ac/institutes/grid.417881.3\",\n \"@type\": \"Organization\",\n \"label\": \"Allen Institute for Brain Science\"\n }\n },\n \"csl__normalized_depth\": 0.478343598387418,\n \"distribution\": {\n \"@type\": \"DataDownload\",\n \"atLocation\": {\n \"@type\": \"Location\",\n \"store\": {\n \"@id\": \"https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault\"\n }\n },\n \"contentSize\": {\n \"unitCode\": \"bytes\",\n \"value\": 83865\n },\n \"contentUrl\": \"https://sandbox.bluebrainnexus.io/v1/files/github-users/adulbrich/30704eaf-7c87-4d74-8d41-50f673961580\",\n \"digest\": {\n \"algorithm\": \"SHA-256\",\n \"value\": \"833a94e1b6de4b4bc5beef4ccc75f84cfec9d5e5f40752d7cefb0ed1f545bbda\"\n },\n \"encodingFormat\": \"application/swc\",\n \"name\": \"reconstruction.swc\"\n },\n \"generation\": {\n \"@type\": \"Generation\",\n \"activity\": {\n \"@type\": \"NeuronMorphologyReconstruction\",\n \"hadProtocol\": {}\n }\n },\n \"identifier\": 485909730,\n \"license\": {\n \"@id\": \"https://alleninstitute.org/legal/terms-use/\",\n \"@type\": \"License\"\n },\n \"name\": \"Cux2-CreERT2;Ai14-205530.03.02.01\",\n \"nr__average_contraction\": 0.891127894162828,\n 
\"nr__average_parent_daughter_ratio\": 0.844376941289302,\n \"nr__max_euclidean_distance\": 446.97383394351,\n \"nr__number_bifurcations\": 18,\n \"nr__number_stems\": 7,\n \"nr__reconstruction_type\": \"dendrite-only\",\n \"objectOfStudy\": {\n \"@id\": \"http://bbp.epfl.ch/neurosciencegraph/taxonomies/objectsofstudy/singlecells\",\n \"@type\": \"ObjectOfStudy\",\n \"label\": \"Single Cell\"\n },\n \"subject\": {\n \"@type\": \"Subject\",\n \"age\": {\n \"period\": \"Post-natal\"\n },\n \"identifier\": 485250100,\n \"name\": \"Cux2-CreERT2;Ai14-205530\",\n \"sex\": {},\n \"species\": {\n \"label\": \"Mus musculus\"\n },\n \"strain\": {\n \"label\": \"Cux2-CreERT2\"\n }\n },\n \"tag__apical\": \"intact\"\n}","title":"4.2. Project and Resource Views"},{"location":"/docs/getting-started/try-nexus.html#4-3-query-neuroscience-data","text":"Going back to the project view, you will notice a tab named ‘Query’. Let’s click on it and start experimenting. On clicking on the ‘Query tab’, you will see a sparql query editor.\nWe want to list the morphologies that we previously registered in our project. Let’s write some SPARQL to retrieve it.\nPREFIX nxv: \nPREFIX nsg: \nPREFIX schema: \nPREFIX prov: \nPREFIX rdf: \nPREFIX rdfs: \nPREFIX skos: \nSELECT DISTINCT ?self ?name ?brainRegion ?brainRegionLayer ?subjectSpecies ?subjectStrain ?registered_by ?registeredAt\nWHERE {\n?entity nxv:self ?self ;\n a nsg:NeuronMorphology ;\n schema:name ?name ;\n nxv:createdBy ?registered_by ;\n nxv:createdAt ?registeredAt ;\n nxv:deprecated false ;\n nsg:brainLocation / nsg:brainRegion / rdfs:label ?brainRegion ;\n nsg:brainLocation / nsg:layer ?brainRegionLayer ;\n nsg:subject / nsg:species / rdfs:label ?subjectSpecies ;\n nsg:subject / nsg:strain / rdfs:label ?subjectStrain.\n}\nLIMIT 1000\nWe list a couple more prefixes in this query. Even though we don’t use most of them, they are common ones.\nWe introduce a new notation to traverse the graph: slashes /. This helps us writing more succinct queries by not referencing a temporary variable every time we want to traverse a node.\nFinally, you will notice the self. This is an internal Nexus property (in addition to createdBy and createdAt, as illustrated by the use of the nxv prefix) that points to the actual source of the resource. We will need the self to open the resource view from a Studio (see next section).\nNote that we have have a LIMIT 1000 clause at the end. This will limit the number of returned results in case there are more than 1000.\nHere’s the result of the above query:\nself name brainRegion brainRegionLayer subjectSpecies subjectStrain registered_by registeredAt https://sandbox.bluebrainnexus.io/v1/resources/github-users/adulbrich/_/... Cux2-CreERT2;Ai14-205530.03.02.01 VISp5 5 Mus musculus Cux2-CreERT2 https://sandbox.bluebrainnexus.io/v1/realms/github/users/adulbrich 2021-07-27T18:58:45.238Z","title":"4.3. Query Neuroscience Data"},{"location":"/docs/getting-started/try-nexus.html#4-4-create-a-studio","text":"Go back to your project view. Click on the ‘Studios’ tab. It will open in a new window.\nYou will land on:\nOn this page you can create a new studio. A Studio is a dedicated web page in your project that you can organise into pages (workspaces, as horizontal tabs at the top) and sections (dashboards, as vertical tabs on the left side) and list your data in a logical way. The tables are powered by SPARQL queries and the data present in your project.\nStart by creating a Workspace. 
To create a workspace, click the Workspace button and in the menu that appears click Add Workspace. You will be presented with a dialog requesting the label of the workspace and optionally a description. Enter a label of your choosing and click Save.\nNext, create a Dashboard. Click the Dashboard button and choose Add from the pop-up menu. In the Dashboard creation dialog, specify the default Sparql Index view and specify the SPARQL query above. The ‘Configure Columns’ button will list the columns to be returned by the query and support specifying options on the columns such as enabling searching or sorting.\nBecause we are using the self, clicking on a row of the newly created table will open the resource view.\nIt’s your turn now to add a dashboard to list your Neuron Electrophysiology data. Create the dashboard and modify the SPARQL query above.\nCongratulations! You’ve created your very first studio, which completes this tutorial step.","title":"4.4. Create a Studio"},{"location":"/docs/getting-started/try-nexus.html#step-5-finding-similar-datasets-using-recommendations","text":"In this section, you will first learn about recommendation systems, then reuse the data you have integrated in Nexus in previous steps and build a recommendation system to find datasets that are similar to a chosen neuron morphology or electrophysiology recording.","title":"Step 5: Finding Similar Datasets using Recommendations"},{"location":"/docs/getting-started/try-nexus.html#5-1-introduction-to-recommendations","text":"Recommendation systems are widely used in many domains, for example, streaming services provide recommendations for movies or songs, online stores generate product recommendations, etc. Such systems allow selecting the most relevant entities from the vast space of all the available choices. This selection can be based on different criteria, for example, various features of target entities (movie genre, country, cast), user profiles, and interactions with the entities of interest (for example, previously watched movies).\nIn a similar way, there is a need for recommendation systems that help us to explore our Knowledge Graphs when they become overwhelmingly large. Given a node in a Knowledge Graph (corresponding to, for example, a neuron morphology dataset), we may want to recommend a set of most similar nodes according to some complex criteria.\nOne of the most common techniques for building a recommendation system is based on entity embedding that represents each entity with a numerical vector. Given a starting entity (a neuron morphology dataset), the task of finding similar entities can be reduced to a simple search for the nearest neighbors in the vector space of our embedding.\nOne of the first modern approaches to entity embedding reflecting their semantic similarity was developed by the Natural Language Processing (NLP) community and is called word2vec. To generate vector representations of words, it trains a neural network on a large text corpus from which word contexts are extracted. The resulting vector representation is able to capture the semantic similarity between different words.\nSimilarity to word2vec, node2vec builds vector representations of graph nodes. 
To generate ‘context’ for different nodes, this approach performs random walks and explores the neighborhood of a given node.\nFinally, another derivative of word2vec, adapted specifically for building node embedding on Knowledge Graphs, is called rdf2vec.\nIn this tutorial, we use rdf2vec in order to build a toy recommendation system for exploring similar neuron morphologies and electrophysiology recordings.\nFurther reads on graph embedding:\nGraph embedding techniques, applications, and performance: A survey Knowledge Graph Embedding: A Survey of Approaches and Applications","title":"5.1. Introduction to Recommendations"},{"location":"/docs/getting-started/try-nexus.html#5-2-running-the-notebook","text":"Github Google Colab Binder\nTo run the notebook locally, open your terminal, clone the Nexus repository, go to the notebook directory, and run Jupyter:\ngit clone https://github.com/BlueBrain/nexus.git\ncd nexus\ncd docs/src/main/paradox/docs/getting-started/notebooks\nNote: You will need gcc installed locally to run the following notebook.\njupyter notebook MOOC_Content_based_Recommender_System_using_Blue_Brain_Nexus.ipynb\nWell done! You have now completed the last part of this tutorial. To learn more, scroll down or navigate to our documentation, or start contributing to our Github repositories.\nIf you have reached this tutorial via the Simulation Neuroscience MOOC, you can now head back to the edX platform and complete this week assignment.","title":"5.2. Running the Notebook"},{"location":"/docs/getting-started/try-nexus.html#learn-more","text":"","title":"Learn More"},{"location":"/docs/getting-started/try-nexus.html#another-tutorial-with-the-movielens-dataset","text":"Nexus can be used to manage more than neuroscience data. If you want to try it, head over to our MovieLens Tutorial!","title":"Another Tutorial with the MovieLens Dataset"},{"location":"/docs/getting-started/try-nexus-movielens.html","text":"","title":"Try Nexus with the MovieLens Dataset"},{"location":"/docs/getting-started/try-nexus-movielens.html#try-nexus-with-the-movielens-dataset","text":"In this tutorial, you will use the core features of the Nexus ecosystem through our sandbox. This requires minimal technical knowledge but the ability to install a Python library and run a jupyter notebook.\nIn the first step, you’ll learn:\nto login into our Nexus Sandbox, create an organization and project, get your personal token.\nIn the second step, you’ll learn:\ninstall Nexus Forge, configure a Knowledge Graph forge, transform data, load the transformed data into the project, search for data using a SPARQL query.\nIn the third step, you’ll learn:\ncreate a Studio in Nexus Fusion, visualize and filter loaded data.\nFinally, check our Learn More section for more advanced tutorials based on the same datasets.","title":"Try Nexus with the MovieLens Dataset"},{"location":"/docs/getting-started/try-nexus-movielens.html#configuring-your-project-in-nexus-fusion","text":"The Nexus Sandbox is a deployment of Nexus Delta and Fusion publicly available to anybody. Please note that you should not store any sensitive data in this environment. Also, we do not offer guaranty as to how long the data will be kept, this is only for learning and testing purposes.\nNexus Fusion is the web interface that you will use in order to interact with Nexus Delta (the web services that manages the underlying knowledge graph).\nPlease bear in mind that the data stored in the Nexus Sandbox is being purged at regular intervals. 
We recommend you do not store any sensitive data in this environment since it is accessible to many other users.\nThe first step is to login, by clicking in the upper right corner of the screen. You can login with your Github credentials.\nThe Sandbox environment automatically provisions a project for you so you don’t have to. From the landing page, click on the “Organizations” card and you will see the list of organisations in Nexus. A project is contained in an organisation. The organisation where your project is created depends on your identity provider. If you logged in with GitHub for example, your project was created under the github-users organisation.\nNow open the github-users organisation and find your own project, which is named after your login. Once the project is created, you’ll land on the project view. There is no resources at first. Wait for it. You will quickly see that the project has finished indexing (top right corner).\nWhich means that the system has created default indices and storage for you.\nWe’re all set! We now have a project to host our resources and datasets. Let’s move on to the second part of this tutorial.","title":"Configuring your Project in Nexus Fusion"},{"location":"/docs/getting-started/try-nexus-movielens.html#working-with-data-in-nexus-forge","text":"We’ll load the MovieLens dataset into the created project within Nexus Delta using the python framework Nexus Forge.\nA jupyter notebook is available for this part of the tutorial and can be spawn easily using Google Colab, binder, or locally:\nGoogle Colab binder Github\nFor local execution, Nexus Forge can be installed using these instructions. Make sure that the jupyter notebook|lab is launched in the same virtual environment where Nexus Forge is installed. Alternatively, set up a specialized kernel.\nIf you want to try some other examples of Nexus Forge, you can use these notebooks.\nThe next step is to use this query to create a Studio view in Nexus Fusion.","title":"Working with Data in Nexus Forge"},{"location":"/docs/getting-started/try-nexus-movielens.html#exploring-the-graph-in-nexus-fusion","text":"Login the Sandbox and navigate your previously created project.\nClick on the studio tab.\nIn a new browser tab, you will see a list of all studios you have access to. Click on Create Studio.\nGive a name to your Studio and click Save.\nHere’s your empty Studio. Click the + icon to Add Workspace.\nGive a name to your Workspace and click Save.\nYou now have one Workspace configured. Click the + icon to Add Dashboard..\nIn order to query the graph in a Studio Dashboard, a small modification of the previous query is necessary. You can find more information about it in the Studio docs.\nPREFIX vocab: \nPREFIX nxv: \nSELECT DISTINCT ?self ?title\nWHERE {\n?id nxv:self ?self ;\n nxv:deprecated false ;\n vocab:title ?title ;\n ^vocab:movieId / vocab:tag \"thought-provoking\" .\n}\nLIMIT 20\nChoose a name for your Dashboard, copy the query. For the “View” select https://bluebrain.github.io/nexus/vocabulary/defaultSparqlIndex from the dropdown. Click on Configure Columns button to see a preview of all the columns the dashboard will have. Now click Save.\nAnd there are the results:\nGood job! You just finished this introductory course to Nexus using our Sandbox. 
You can now install Nexus locally or continue with the tutorials below.","title":"Exploring the Graph in Nexus Fusion"},{"location":"/docs/getting-started/try-nexus-movielens.html#learn-more","text":"","title":"Learn More"},{"location":"/docs/getting-started/try-nexus-movielens.html#querying-knowledge-graph-using-sparql","text":"This tutorial introduces the basics of SPARQL, a query language for querying RDF based knowledge graph. It also demonstrates how to query a Nexus SparqlView.\nYou will build queries to explore and navigate a knowledge graph using SPARQL and Nexus.\nYou will learn:\nthe basics of the SPARQL query language, how to connect to and query a SparqlView in Nexus.\nYou will need Python 3.5 or higher with support for Jupyter notebook.\nThis tutorial code is available on:\nGithub Google Colab","title":"Querying knowledge graph using SPARQL"},{"location":"/docs/getting-started/try-nexus-movielens.html#querying-a-knowledge-graph-using-elasticsearch","text":"The goal of this notebook is to learn how to connect to an Elasticsearch view and run queries against it.\nIt is not a tutorial about the Elasticsearch DSL language for which many well written learning resources are available.\nYou will build a simple python client to connect to a Nexus ElasticSearchView and query a knowledge graph using Elasticsearch DSL.\nYou will learn how to connect to and query a ElasticSearchView in Nexus.\nYou will need Python 3.5 or higher with support for Jupyter notebook.\nThe tutorial code is available on:\nGithub Google Colab","title":"Querying a Knowledge Graph using Elasticsearch"},{"location":"/docs/getting-started/try-nexus-movielens.html#linking-data-on-the-web","text":"In this tutorial, we demonstrate how to consume structured data published on the web according to the Linked data principles to extend and enrich a knowledge graph.\nYou will build a simple pipeline to query entities managed within Blue Brain Nexus, connect them with entities available on the web as structured data and extend and enrich their metadata.\nYou will learn:\nan understanding of linked data principles, how to query data stored in a Nexus SparqlView, how to query structured data on the web, how to extend the metadata of entities managed within Blue Brain Nexus with external structured data on the web: we target Wikidata as an example, how to update entities within Blue Brain Nexus using the SDK and enrich their metadata.\nYou will need Python 3.6 or higher with support for Jupyter notebook.\nThis tutorial code is available on:\nGithub Google Colab","title":"Linking data on the web"},{"location":"/docs/getting-started/running-nexus/index.html","text":"","title":"Running Nexus"},{"location":"/docs/getting-started/running-nexus/index.html#running-nexus","text":"If you wish to quickly try out Nexus, we provide a public sandbox. For a more in-depth test-drive of Nexus on your machine, we recommend the Docker Compose approach. For a production deployment on your in-house or cloud infrastructure, please refer to our deployment guide.","title":"Running Nexus"},{"location":"/docs/getting-started/running-nexus/index.html#using-the-public-sandbox","text":"A public instance of Nexus is running at https://sandbox.bluebrainnexus.io. You can log in with a GitHub account. It’s provided for evaluation purposes only, without any guarantees.\nThe API root is https://sandbox.bluebrainnexus.io/v1.\nNote Do not ingest any proprietary or sensitive data. 
The environment will be wiped regularly, your data and credentials can disappear anytime.","title":"Using the public sandbox"},{"location":"/docs/getting-started/running-nexus/index.html#run-nexus-locally-with-docker","text":"","title":"Run Nexus locally with Docker"},{"location":"/docs/getting-started/running-nexus/index.html#requirements","text":"","title":"Requirements"},{"location":"/docs/getting-started/running-nexus/index.html#docker","text":"Regardless of your OS, make sure to run a recent version of the Docker Engine. This was tested with version 20.10.23. The Docker Engine, along the Docker CLI, come with an installation of Docker Desktop. Visit the official Docker Desktop documentation for detailed installation steps.\nCommand :\ndocker --version\nExample :\n$ docker --version\nDocker version 20.10.23, build 7155243","title":"Docker"},{"location":"/docs/getting-started/running-nexus/index.html#memory-and-cpu-limits","text":"On macOS and Windows, Docker effectively runs containers inside a VM created by the system hypervisor. Nexus requires at least 2 CPUs and 8 GiB of memory in total. You can increase the limits in Docker settings in the menu Settings > Resources.\nFor a proper evaluation using Docker, we recommend allocating at least 16GiB of RAM to run the provided templates. Feel free to tweak memory limits in order to fit your hardware constraints. At the cost of a slower startup and a decreased overall performance, you should be able to go as low as:\nService Memory [MiB] PostgreSQL 512 Elasticsearch 512 Blazegraph 1024 Delta 1024","title":"Memory and CPU limits"},{"location":"/docs/getting-started/running-nexus/index.html#docker-compose","text":"","title":"Docker Compose"},{"location":"/docs/getting-started/running-nexus/index.html#set-up","text":"Download the Docker Compose template into a directory of your choice, for instance ~/docker/nexus/. Download the Delta configuration to the same directory. Download the http proxy configuration to the same directory.","title":"Set-up"},{"location":"/docs/getting-started/running-nexus/index.html#starting-nexus","text":"From within the directory that contains the docker-compose.yaml you downloaded, run the containers using Docker Compose:\nCommand :\ndocker compose --project-name nexus --file=docker-compose.yaml up --detach\nExample :\n$ cd ~/docker/nexus\n$ docker compose --project-name nexus --file=docker-compose.yaml up --detach\n...\n⠿ Network nexus_default Created 0.0s\n⠿ Container nexus-elasticsearch-1 Started 1.2s\n⠿ Container nexus-postgres-1 Started 1.2s\n⠿ Container nexus-blazegraph-1 Started 1.1s\n⠿ Container nexus-delta-1 Started 1.6s\n⠿ Container nexus-web-1 Started 1.8s\n⠿ Container nexus-router-1 Started 2.1s\nWhen running the command for the first time, Docker will pull all necessary images from Dockerhub if they are not available locally. 
Once all containers are running, wait one or two minutes and you should be able to access Nexus locally, on the port 80:\nCommand :\ncurl http://localhost/v1/version\nExample :\n$ curl http://localhost/v1/version | jq\n{\n \"@context\": \"https://bluebrain.github.io/nexus/contexts/version.json\",\n \"delta\": \"1.10.0\",\n \"dependencies\": {\n \"blazegraph\": \"2.1.6-SNAPSHOT\",\n \"elasticsearch\": \"8.13.3\",\n \"postgres\": \"15.7\"\n },\n \"environment\": \"dev\",\n \"plugins\": {\n \"archive\": \"1.10.0\",\n \"blazegraph\": \"1.10.0\",\n \"composite-views\": \"1.10.0\",\n \"elasticsearch\": \"1.10.0\",\n \"storage\": \"1.10.0\"\n }\n}","title":"Starting Nexus"},{"location":"/docs/getting-started/running-nexus/index.html#using-fusion","text":"Fusion can be accessed by opening http://localhost in your web browser. You can start by creating an organization from the http://localhost/admin page.\nNote This setup runs the Nexus ecosystem without an identity provider, and the anonymous user is given all default permissions; do not publicly expose the endpoints of such a deployment.","title":"Using Fusion"},{"location":"/docs/getting-started/running-nexus/index.html#administration","text":"To list running services or access logs, please refer to the official Docker documentation.","title":"Administration"},{"location":"/docs/getting-started/running-nexus/index.html#stopping-nexus","text":"You can stop and delete the entire deployment with:\nCommand :\ndocker compose --project-name nexus down --volumes\nExample :\n$ docker compose --project-name nexus down --volumes\n[+] Running 7/7\n⠿ Container nexus-router-1 Removed 0.2s\n⠿ Container nexus-web-1 Removed 0.3s\n⠿ Container nexus-delta-1 Removed 0.5s\n⠿ Container nexus-postgres-1 Removed 0.3s\n⠿ Container nexus-blazegraph-1 Removed 10.3s\n⠿ Container nexus-elasticsearch-1 Removed 1.0s\n⠿ Network nexus_default Removed 0.1s\nNote As no data is persisted outside the containers, everything will be lost once you shut down the Nexus deployment. If you’d like help with creating persistent volumes, feel free to contact us on Github Discussions.","title":"Stopping Nexus"},{"location":"/docs/getting-started/running-nexus/index.html#endpoints","text":"The provided reverse proxy (the nginx image) exposes several endpoints:\nroot: Nexus Fusion v1: Nexus Delta elasticsearch: Elasticsearch endpoint blazegraph: Blazegraph web interface\nIf you’d like to customize the listening port or remove unnecessary endpoints, you can simply modify the nginx.conf file.","title":"Endpoints"},{"location":"/docs/getting-started/running-nexus/index.html#postgresql-partitioning","text":"Nexus Delta takes advantage of PostgreSQL’s Table Partitioning feature. This allows for improved query performance, and facilitates loading, deleting, or transferring data.\nThe public.scoped_events and public.scoped_states are partitioned by organization, which is itself partitioned by the projects it contains; this follows the natural hierarchy that can be found in Nexus Delta.\nNexus Delta takes care of handling the creation and deletion of the partitions.\nIf the created project is the first one of a given organization, both the organization partition and the project subpartition will be created. 
If the organization partition already exists, then only the project subpartition will be created upon project creation.\nThe naming scheme of the (sub)partitions is as follows:\n{table_name}_{MD5_org_hash} for organization partitions\n{table_name}_{MD5_project_hash} for project partitions\nwhere\n{table_name} is either scoped_events or scoped_states\n{MD5_org_hash} is the MD5 hash of the organization name\n{MD5_project_hash} is the MD5 hash of the project reference (i.e. of the form {org_name}/{project_name})\nMD5 hashing is used in order to guarantee a constant partition name length (PostgreSQL table names are limited to 63 characters by default), as well as to avoid any special characters that might be allowed in project names but not in PostgreSQL table names (such as -).\nExample:\nYou create an organization called myorg, in which you create the myproject project. When the project is created, Nexus Delta will have created the following partitions:\nscoped_events_B665280652D01C4679777AFD9861170C, the partition of events from the myorg organization\nscoped_events_7922DA7049D5E38C83053EE145B27596, the subpartition of the events from the myorg/myproject project\nscoped_states_B665280652D01C4679777AFD9861170C, the partition of states from the myorg organization\nscoped_states_7922DA7049D5E38C83053EE145B27596, the subpartition of the states from the myorg/myproject project","title":"PostgreSQL partitioning"},
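If you ever need to map a project to its partition by hand, you can recompute the hash and look it up directly in PostgreSQL. A small sketch (md5sum is assumed to be available, e.g. from GNU coreutils; the psql connection parameters must be adapted to your deployment; note that md5sum prints lowercase hex while the partition names above use uppercase):
$ echo -n "myorg/myproject" | md5sum
7922da7049d5e38c83053ee145b27596  -
$ psql -U postgres -c "SELECT inhrelid::regclass FROM pg_inherits WHERE inhparent = 'public.scoped_events'::regclass;"
The second command lists the direct children of scoped_events (the organization partitions) using PostgreSQL's pg_inherits catalog.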
{"location":"/docs/getting-started/running-nexus/index.html#advanced-subpartitioning","text":"While Nexus Delta provides table partitioning out of the box, it primarily addresses the case where the data is more or less uniformly spread across multiple projects. If however one or more projects are very large, it is possible to add further subpartitions according to a custom rule. This custom subpartitioning must be decided on a case-by-case basis using your knowledge of the given project; the idea is to create uniform partitions of your project. Please refer to the PostgreSQL Table Partitioning documentation.","title":"Advanced subpartitioning"},{"location":"/docs/getting-started/running-nexus/index.html#on-premise-cloud-deployment","text":"There are several things to consider when preparing to deploy Nexus “on premise”, because the setup depends a lot on the various usage profiles, but the most important categories would be:\nAvailability\nLatency & throughput\nCapacity\nEfficient use of hardware resources\nBackup and restore\nMonitoring & alerting\nEach of the Nexus services and “off the shelf” products can be deployed as a single instance or as a cluster (with one exception at this point being Blazegraph, which doesn’t come with a clustering option). The advantages of deploying clusters are generally higher availability, capacity and throughput, at the cost of higher latency, weaker consistency and potentially having to deal with network instability.\nThe decision to go with single node deployments or clustered deployments can be revisited later on, and mixed setups (some services single node while others clustered) are also possible.\nThe Nexus distribution is made up of Docker images which can be run on any host operating system, and each of the “off the shelf” products also offers Docker as a deployment option. We would generally recommend using a container orchestration solution like Kubernetes as it offers good management capabilities, discovery, load balancing and self-healing. Such solutions also accommodate changes in hardware allocations for the deployments, changes that can occur due to evolving usage patterns, software updates etc. Currently, the largest Nexus deployment is at EPFL and runs on Kubernetes.","title":"On premise / cloud deployment"},{"location":"/docs/getting-started/running-nexus/index.html#choice-of-hardware","text":"Depending on the target throughput, usage profiles and data volume, the hardware specification can vary greatly; please take a look at the benchmarks section to get an idea of what you should expect in terms of throughput with various hardware configurations. When the usage profiles are unknown, a couple of rules of thumb should narrow the scope:\nNexus uses a collection of data stores (PostgreSQL, Elasticsearch, Blazegraph) whose performance depends on the underlying disk access, so: prefer local storage over network storage for lower latency when doing IO, and prefer SSDs over HDDs because random access speed is more important than sequential access. One exception is file storage (file resources are stored as binary blobs on the filesystem), where neither network disks nor random access speed should be a cause for concern; this assumes that accessing attachments is not at the top of the usage profile.\nAll of the Nexus services and most of the “off the shelf” products are built to run on top of the JVM, which usually requires more memory than computing power. A rough ratio of 2 CPU cores per 8GB of RAM is probably a good one (this of course depends on the CPU specification).\nDue to the design for scalability of the Nexus services and “off the shelf” products, the network is a very important characteristic of the deployment, as frequent dropped packets or network partitions can seriously affect the availability of the system. Clustered / distributed systems generally use some form of consensus, which is significantly affected by the reliability of the network. If the reliability of the network is a concern within the target deployment, then vertical scalability is preferable to horizontal scalability: fewer host nodes with better specifications are better than more commodity hardware host nodes.","title":"Choice of hardware"},{"location":"/docs/getting-started/running-nexus/index.html#postgresql","text":"Nexus uses PostgreSQL as its primary store for its strong reputation for performance, reliability and flexibility. It can also be run in different contexts, from integration testing to production.\nSince this is the primary store, it is the most important system to back up. All of the data that Nexus keeps in the other stores can be recomputed from the data stored in PostgreSQL, as the other stores are used as mere indexing systems.\n// TODO capacity planning + recommendations\nAs described in the architecture section, the generally adopted persistence model is an event sourced model in which the data store is used as an append only store. This has implications for the total amount of disk used by the primary store.\n// TODO formula computing disk space","title":"PostgreSQL"},
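Until those TODOs are filled in, a rough, unofficial estimate in the same spirit as the Elasticsearch and Blazegraph formulas below might look like:
total ≈ avg_event_size * count * (1 + updates_per_resource) + avg_state_size * count
… where each create plus each subsequent update appends one event to the store (an append only store never reclaims space on update), and avg_state_size accounts for the materialised states kept alongside the events (at minimum the latest revision of each resource). All names here are illustrative; they are not official configuration or schema terms.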
{"location":"/docs/getting-started/running-nexus/index.html#elasticsearch","text":"Nexus uses Elasticsearch to host several system indices and user defined ones. It offers sharding and replication out of the box. Deciding whether this system requires backup depends on the tolerated time for a restore. Nexus can be instructed to rebuild all indices using the data from the primary store, but being an incremental indexing process it can take longer than restoring from a backup. Since Elasticsearch can be configured to host a number of replicas for each shard, it can tolerate a corresponding number of node failures.\nThe Elasticsearch setup documentation contains the necessary information on how to install and configure it, but recommendations on sizing the nodes and cluster are scarce because it depends on usage.\nA formula for computing the required disk space:\ntotal = (resource_size * count * documents + lucene_index) * replication_factor\n… where the lucene_index can vary but should be less than twice the size of the original documents.\nAn example, assuming:\n10KB per resource\n1.000.000 distinct resources\n3 documents per resource (the number of documents depends on the configured views in the system)\n2 additional shard replicas (replication factor of 3)\n… the total required disk size would be:\n(10KB * 1.000.000 * 3 + 2 * (10KB * 1.000.000 * 3)) * 3 = 270.000.000KB ~= 260GB\nThe resulting size represents the total disk space of the data nodes in the cluster; a 5 data node cluster with the data volume in the example above would have to be configured with 60GB disks per node.","title":"Elasticsearch"},{"location":"/docs/getting-started/running-nexus/index.html#blazegraph","text":"Nexus uses Blazegraph as an RDF (triple) store to provide advanced querying capabilities on the hosted data. This store is treated as a specialized index on the data, so as with Elasticsearch, in case of failures the system can be fully restored from the primary store. While the technology is advertised to support High Availability and Scaleout deployment configurations, we have yet to be able to set up a deployment in this fashion.\nWe currently recommend deploying Blazegraph using the prepackaged tar.gz distribution available to download from GitHub.\nNote We’re looking at alternative technologies and possible application level (within Nexus) sharding and replicas.\nThe Hardware Configuration section in the documentation gives a couple of hints about the requirements to operate Blazegraph, and there are additional sections for optimizations in terms of Performance, IO and Query.\nBlazegraph stores data in an append only journal, which means updates will use additional disk space.\nA formula for computing the required disk space:\ntotal = (resource_triples + nexus_triples) * count * number_updates * triple_size + lucene_index\n… where the lucene_index can vary but should be less than twice the size of the original data (hence the trailing * 3 in the example below).\nAn example, assuming:\n100 triples (rough estimate for a 10KB json-ld resource representation)\n20 additional nexus triples on average\n1.000.000 distinct resources\n10 updates per resource\n200 bytes triple size (using quads mode)\n… the total required disk size would be:\n(100 + 20) * 1.000.000 * 10 * 200 / 1024 * 3 ~= 700.000.000KB ~= 670GB\nCompactions can be applied to the journal using the CompactJournalUtility to reduce the disk usage, but it takes quite a bit of time and requires taking the software offline during the process.","title":"Blazegraph"},
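Both worked examples above can be reproduced with plain shell arithmetic, which also makes it easy to plug in your own numbers (the first line is the Elasticsearch example, the second the Blazegraph one; results in KB):
$ echo $(( (10*1000000*3 + 2*(10*1000000*3)) * 3 ))
270000000
$ echo $(( (100 + 20) * 1000000 * 10 * 200 / 1024 * 3 ))
703125000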
{"location":"/docs/getting-started/running-nexus/configuration/index.html","text":"","title":"Nexus configuration"},{"location":"/docs/getting-started/running-nexus/configuration/index.html#nexus-configuration","text":"The Nexus Delta service can be highly customized using configuration file(s). Many things can be adapted to your deployment needs: the port where the service is running, timeouts, pagination defaults, etc.\nThere are 3 ways to modify the default configuration:\nSetting the env variable DELTA_EXTERNAL_CONF, which defines the path to a HOCON file. The configuration keys that are defined here can be overridden by the other methods.\nUsing JVM properties as arguments when running the service: -D{property}. For example: -Dapp.http.interface=\"127.0.0.1\".\nUsing CONFIG_FORCE_{property} environment variables. In order to enable this style of configuration, the JVM property -Dconfig.override_with_env_vars=true needs to be set. Once set, a configuration flag can be overridden. For example: CONFIG_FORCE_app_http_interface=\"127.0.0.1\".\nIn terms of JVM memory allocation, we recommend setting the following values in the JAVA_OPTS environment variable: -Xms4g -Xmx4g. The recommended values should be adjusted according to the usage of Nexus Delta, the number of projects, and the size of resources and schemas.\nIn order to successfully run Nexus Delta, there is a minimum set of configuration flags that need to be specified.","title":"Nexus configuration"},
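As a concrete sketch, the three styles can be combined when launching Delta; every property shown here appears on this page, while the HOCON file path is only an example:
$ export DELTA_EXTERNAL_CONF=/opt/delta/config/delta.conf
$ export JAVA_OPTS="-Xms4g -Xmx4g -Dconfig.override_with_env_vars=true"
$ export CONFIG_FORCE_app_http_interface="127.0.0.1"
Keys from the HOCON file are then overridden by the -D properties and the CONFIG_FORCE_ variables, as described above.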
{"location":"/docs/getting-started/running-nexus/configuration/index.html#http-configuration","text":"The http section of the configuration defines the binding address and port where the service will be listening.\nThe configuration flag akka.http.server.parsing.max-content-length can be used to control the maximum payload size allowed for Nexus Delta resources. This value applies to all posted resources except for files.","title":"Http configuration"},{"location":"/docs/getting-started/running-nexus/configuration/index.html#postgres-configuration","text":"The database section of the configuration defines the postgres specific configuration. As Nexus Delta uses three separate pools (‘read’, ‘write’, ‘streaming’), it is recommended to set the host, port, database name, username, and password via the app.defaults.database field, as it will apply to all pools. It is however possible to accommodate more advanced setups by configuring each pool separately through its respective app.database.{read|write|streaming} fields.\nThe pool size can be set using the app.defaults.database.access.pool-size setting for all pools, or individually for each pool (app.database.{read|write|streaming}.access.pool-size).\nNote A default Postgres deployment will limit the number of connections to 100, unless configured otherwise. See the Postgres Connection and Authentication documentation.\nBefore running Nexus Delta, the init scripts should be run in lexicographical order.\nIt is possible to let Nexus Delta create the tables automatically by setting the configuration parameter app.database.tables-autocreate=true.\nNote Auto creation of tables is included as a development convenience and should be avoided in production.","title":"Postgres configuration"},
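For instance, a small development set-up could size down the pools and let Delta create its tables, passing the flags as JVM properties (all property names are taken from this page; the values are examples):
$ export JAVA_OPTS="$JAVA_OPTS -Dapp.defaults.database.access.pool-size=5 -Dapp.database.streaming.access.pool-size=10 -Dapp.database.tables-autocreate=true"
Since there are three pools, keep the sum of the pool sizes below the Postgres connection limit mentioned in the note above.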
{"location":"/docs/getting-started/running-nexus/configuration/index.html#rdf-parser","text":"The underlying Apache Jena parser used to validate incoming data is configurable using the json-ld-api field to enable different levels of strictness.","title":"RDF parser"},{"location":"/docs/getting-started/running-nexus/configuration/index.html#service-account-configuration","text":"Nexus Delta uses a service account to perform automatic tasks under the hood. Examples of this are:\nGranting default ACLs to the user creating a project.\nCreating default views on project creation.\nThe service-account section of the configuration defines the service account configuration.","title":"Service account configuration"},{"location":"/docs/getting-started/running-nexus/configuration/index.html#automatic-project-provisioning","text":"Automatic project provisioning allows creating a dedicated project for users the first time they connect to Delta, that is to say the first time they query the project listing endpoints.\nThe generated project label will be the current username, where only non-diacritic alphabetic characters ([a-zA-Z]), numbers, dashes and underscores are preserved. The resulting string is then truncated to 64 characters if needed.\nThis feature can be turned on via the flag app.automatic-provisioning.enabled.\nThe automatic-provisioning section of the configuration defines the project provisioning configuration.","title":"Automatic project provisioning"},
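The sanitisation rule above can be approximated in one line of shell if you want to predict the project label a given username will produce (illustrative only, not the actual implementation):
$ echo "José the 3rd!" | tr -cd 'a-zA-Z0-9_-' | cut -c1-64
Josthe3rd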
{"location":"/docs/getting-started/running-nexus/configuration/index.html#fusion-configuration","text":"When fetching a resource, Nexus Delta allows returning a redirection to its representation in Fusion by providing text/html in the Accept header.\nThe fusion section of the configuration defines the Fusion configuration.","title":"Fusion configuration"},{"location":"/docs/getting-started/running-nexus/configuration/index.html#projections-configuration","text":"Projections in Nexus Delta are asynchronous processes that can replay the event log and process this information. For more information on projections, please refer to the Architecture page.\nThe projections section of the configuration allows configuring the projections.\nIn case of failure in a projection, Nexus Delta records the failure information inside the public.failed_elem_logs PostgreSQL table, which can be used for analysis, and ultimately resolution of the failures. The configuration allows setting how long the failure information is stored for (app.projections.failed-elem-ttl), and how often the projection deleting the expired failures is awoken (app.projections.delete-expired-every).","title":"Projections configuration"},{"location":"/docs/getting-started/running-nexus/configuration/index.html#plugins-configuration","text":"Since 1.5.0, Nexus Delta supports plugins. Jar files present inside the local directory defined by the DELTA_PLUGINS environment variable are loaded as plugins into the Delta service.\nEach plugin configuration is rooted under plugins.{plugin_name}. All plugins have a plugins.{plugin_name}.priority configuration flag used to determine the order in which the routes are handled in case of collisions.\nFor more information about plugins, please refer to the Plugins page.","title":"Plugins configuration"},{"location":"/docs/getting-started/running-nexus/configuration/index.html#elasticsearch-views-plugin-configuration","text":"The elasticsearch plugin configuration can be found here.\nThe most important flags are:\nplugins.elasticsearch.base, which defines the endpoint where the Elasticsearch service is running\nplugins.elasticsearch.credentials.username and plugins.elasticsearch.credentials.password, which allow access to a secured Elasticsearch cluster. The user provided should have the privileges to create/delete indices and read/index from them.\nPlease refer to the Elasticsearch configuration, which describes the different steps to achieve this.","title":"Elasticsearch views plugin configuration"},{"location":"/docs/getting-started/running-nexus/configuration/index.html#blazegraph-views-plugin-configuration","text":"The blazegraph plugin configuration can be found here.\nThe most important flag is plugins.blazegraph.base, which defines the endpoint where the Blazegraph service is running.\nThe plugins.blazegraph.slow-queries section of the Blazegraph configuration defines what is considered a slow Blazegraph query, which will get logged in the public.blazegraph_queries PostgreSQL table. This can be used to understand which Blazegraph queries can be improved.","title":"Blazegraph views plugin configuration"},{"location":"/docs/getting-started/running-nexus/configuration/index.html#composite-views-plugin-configuration","text":"The composite views plugin configuration can be found here.\nThere are several configuration flags related to tweaking the range of values allowed for sources, projections and the rebuild interval.\nAuthentication for remote sources can be specified in three different ways. The value of plugins.composite-views.remote-source-credentials should be specified in the same way as for remote storages, as shown here.","title":"Composite views plugin configuration"},{"location":"/docs/getting-started/running-nexus/configuration/index.html#storage-plugin-configuration","text":"The storage plugin configuration can be found here.\nNexus Delta supports three types of storage: ‘disk’, ‘amazon’ (S3 compatible) and ‘remote’.\nFor disk storages the most relevant configuration flag is plugins.storage.storages.disk.default-volume, which defines the default location in the Nexus Delta filesystem where the files using that storage are going to be saved. For S3 compatible storages the most relevant configuration flags are the ones related to the S3 settings: plugins.storage.storages.amazon.default-endpoint, plugins.storage.storages.amazon.default-access-key and plugins.storage.storages.amazon.default-secret-key. For remote disk storages the most relevant configuration flags are plugins.storage.storages.remote-disk.default-endpoint (the endpoint where the remote storage service is running) and plugins.storage.storages.remote-disk.credentials (the method to authenticate to the remote storage service).","title":"Storage plugin configuration"},
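Putting the disk and S3 flags together, a minimal storage set-up could be passed as JVM properties (all property names come from this page; the values are placeholders to adapt):
$ export JAVA_OPTS="$JAVA_OPTS \
 -Dplugins.storage.storages.disk.default-volume=/opt/delta/files \
 -Dplugins.storage.storages.amazon.default-endpoint=https://s3.example.com \
 -Dplugins.storage.storages.amazon.default-access-key=my-access-key \
 -Dplugins.storage.storages.amazon.default-secret-key=my-secret-key"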
{"location":"/docs/getting-started/running-nexus/configuration/index.html#file-configuration","text":"When the media type is not provided by the user, Delta relies on automatic detection based on the file extension in order to provide one.\nFrom 1.9, it is possible to provide a list of extensions with an associated media type to compute the media type.\nThis list can be defined at files.media-type-detector.extensions:\nfiles {\n # Allows to define default media types for the given file extensions\n media-type-detector {\n extensions {\n custom = \"application/custom\"\n ntriples = \"application/n-triples\"\n }\n }\n}\nThe media type resolution process follows this order, stopping at the first successful step:\nSelect the Content-Type header from the file creation/update request\nCompare the extension to the custom list provided in the configuration\nFall back on the Akka automatic detection\nFall back to the default value application/octet-stream","title":"File configuration"},{"location":"/docs/getting-started/running-nexus/configuration/index.html#remote-storage-configuration","text":"Authentication for remote storage can be specified in three different ways. The value of plugins.storage.storages.remote-disk.credentials can be:","title":"Remote storage configuration"},{"location":"/docs/getting-started/running-nexus/configuration/index.html#recommended-client-credentials-openid-authentication-","text":"{\n type: \"client-credentials\"\n user: \"username\"\n password: \"password\"\n realm: \"internal\"\n}\nThis configuration tells Delta to log into the internal realm (which should have already been defined) with the user and password credentials, which will give Delta an access token to use when making requests to the remote storage.","title":"Recommended: client credentials (OpenId authentication)"},{"location":"/docs/getting-started/running-nexus/configuration/index.html#anonymous","text":"{\n type: \"anonymous\"\n}","title":"Anonymous"},{"location":"/docs/getting-started/running-nexus/configuration/index.html#long-living-auth-token-legacy-","text":"{\n type: \"jwt-token\"\n token: \"long-living-auth-token\"\n}","title":"Long-living auth token (legacy)"},{"location":"/docs/getting-started/running-nexus/configuration/index.html#archive-plugin-configuration","text":"The archive plugin configuration can be found here.","title":"Archive plugin configuration"},{"location":"/docs/getting-started/running-nexus/configuration/index.html#jira-plugin-configuration","text":"The Jira plugin configuration can be found here.\nSetting up the Jira plugin requires setting up the endpoint of the Jira instance, but also the consumer key, the consumer secret and the private key required to interact with Jira (more details, including the configuration steps in Jira, here).","title":"Jira plugin configuration"},{"location":"/docs/getting-started/running-nexus/configuration/index.html#monitoring","text":"For monitoring, Nexus Delta relies on Kamon.\nKamon can be disabled by passing the environment variable KAMON_ENABLED set to false.\nDelta configuration for Kamon is provided in the monitoring section. For a more complete description of the different options available, please look at the Kamon website.","title":"Monitoring"},{"location":"/docs/getting-started/running-nexus/configuration/index.html#instrumentation","text":"Delta provides Kamon instrumentation for:\nExecutors\nScala futures\nLogback\nSystem metrics","title":"Instrumentation"},{"location":"/docs/getting-started/running-nexus/configuration/index.html#reporters","text":"Kamon reporters are also available for:\nJaeger\nPrometheus","title":"Reporters"},{"location":"/docs/getting-started/running-nexus/search-configuration.html","text":"","title":"Search configuration"},{"location":"/docs/getting-started/running-nexus/search-configuration.html#search-configuration","text":"Nexus provides global search functionality across all projects through the search plugin.\nWarning The search plugin is experimental and its functionality and API can change without notice.\nThere are several aspects that have been taken into consideration when adding global search capabilities to Nexus:\nglobal search requires a common (index) data model for correct analysis, indexing and querying\nit must obey the configured access control (search results should include only entries that the client has access to)\nclients must be able to discover the data model in order to be able to build appropriate queries\nproject administrators should be able to control, reconfigure or opt out of presenting information in global search","title":"Search configuration"},{"location":"/docs/getting-started/running-nexus/search-configuration.html#how-global-search-works","text":"Considering the requirements listed above, the implementation relies on existing Nexus features, namely:\nComposite Views as control resources for partitioning of indices, how data is indexed, what information to collect and what permissions are required for querying\nAutomatic provisioning of project resources after creation\nPlugins for orchestrating the behaviour and exposing specific endpoints\nWhen the search plugin is enabled and configured, it will automatically create within each project a CompositeView that controls what resources are indexed and what information is collected for each resource. The reasons for using one CompositeView per project are that resource shapes may differ between projects, but also that the indices must be partitioned such that, when consuming the query interface, the query is dispatched only to the indices that the client has access to. The CompositeView id is identical for each project: https://bluebrain.github.io/nexus/vocabulary/searchView.\nOnce the CompositeView is created by the plugin, it can be updated by each project administrator (specifically any client that demonstrates the views/write permission on the target project) to adjust the configuration based on the specifics of the project (different access control, different resource shapes, custom selection of resources etc.).\nCompositeViews have been chosen because they are quite versatile and support a wide range of configuration options:\nmultiple sources\nmultiple projections (indices)\nproject graph traversal for collecting the necessary fields\nsimple transformations\nMore information about CompositeViews can be found in the API Reference.\nThe search plugin introduces a new namespace (/v1/search) with two sub-resources (query and config).\nThe query endpoint accepts submitting an Elasticsearch query via POST, similar to other views based on Elasticsearch, like ElasticSearchView, AggregateElasticSearchView or CompositeView with configured Elasticsearch projections. The query will be dispatched to all Elasticsearch indices managed by the CompositeViews created by the search plugin (the ones that share the id mentioned above) which the client has access to. This ensures that access to information is restricted based on each project’s access control.
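For example, a match-all query restricted to a handful of hits could be submitted with curl (the base URL assumes the local Docker Compose deployment described earlier on this page; against a secured deployment an Authorization header would also be needed):
$ curl -X POST -H 'Content-Type: application/json' \
 http://localhost/v1/search/query \
 -d '{"query": {"match_all": {}}, "size": 3}'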
The config endpoint allows clients to discover the underlying index data model such that they can present users (as in the case of Fusion) with an appropriate interface for querying, filtering, sorting, aggregations etc. A minimal response for the config endpoint looks like the following example:\n{\n \"fields\": [\n {\n \"name\": \"project\",\n \"label\": \"Project\",\n \"array\": false,\n \"optional\": false,\n \"fields\": [\n {\n \"name\": \"identifier\",\n \"format\": [\n \"uri\"\n ],\n \"optional\": false\n },\n {\n \"name\": \"label\",\n \"format\": [\n \"keyword\",\n \"text\"\n ],\n \"optional\": false\n }\n ]\n },\n {\n \"name\": \"@type\",\n \"label\": \"Types\",\n \"array\": true,\n \"optional\": false,\n \"format\": [\n \"uri\"\n ]\n }\n ]\n}\n… where the returned document describes a set of fields to be expected in the indices:\nname: String - the name of the field in the Elasticsearch document\nlabel: String - a human-readable label to be presented to users\narray: Boolean - true if the field can have multiple values, false otherwise\noptional: Boolean - true if the field may not exist in certain documents, false otherwise\nformat: Array(String) - the expected formats of the field (e.g. uri, keyword, text, boolean, number etc.); format and fields cannot be present at the same time\nfields: Array(Object) - enumeration of nested fields; there are situations where a field value can (or should) be handled differently depending on the intent, for example ontological values that are represented by a Uri but also by a String (name or label). Clients should be aware of such cases to understand what to present to their users but also how to properly compute queries; format and fields cannot be present at the same time\nfields.name: String - the name of the sub-field in the Elasticsearch document\nfields.format: Array(String) - the expected formats of the field (e.g. uri, keyword, text, boolean, number etc.)\nfields.optional: Boolean - true if the field may not exist in certain documents, false otherwise\nThe config endpoint was created to allow clients to discover how resources are indexed and can be queried.
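The config document itself can be fetched with a plain GET on the other sub-resource (same assumptions about the base URL as above):
$ curl http://localhost/v1/search/config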
It is currently loaded as a static file (plugins.search.fields={pathToFile}) during Delta’s bootstrapping, and it must match the rest of the search configuration:\nplugins.search.indexing.resource-types={pathToFile} - the list of types which will be used to filter resources to be indexed in the Elasticsearch projection\nplugins.search.indexing.query={pathToFile} - the SPARQL construct query that will be used to create triples, which will then be compacted using the provided context and indexed in the Elasticsearch projection\nplugins.search.indexing.context={pathToFile} - the context which is used to transform the results of the SPARQL construct query into compacted JSON-LD which will be indexed in the Elasticsearch projection\nplugins.search.indexing.mapping={pathToFile} - the Elasticsearch mappings that will be used in the Elasticsearch projection\nplugins.search.indexing.settings={pathToFile} - additional Elasticsearch settings that will be used in the Elasticsearch projection\nThe search plugin must also be enabled using the plugins.search.enabled=true setting.\nThese additional settings pertain to the configuration of the CompositeViews that are automatically provisioned by the search plugin. The CompositeView API Reference provides a detailed explanation of how CompositeViews work and how these options affect the generation of the indices.","title":"How Global Search works"},{"location":"/docs/getting-started/running-nexus/search-configuration.html#example-use-case","text":"This section describes a search configuration for a hypothetical data model presented in the diagram below. The example uses four related data types (Dataset, DataDownload, Person and License), and the intent is to provide the ability to query resources of type Dataset along with information registered in related resources (of type DataDownload, Person or License).\nThere are a couple of things to notice in the data model diagram:\nschema:name and rdfs:label are both optional but mostly used for the same purpose; the information should be collected from one with a fallback on the other\nschema:license is also optional, as not all datasets may have a license\nwhen some fields are marked as optional, it means that a resource of type Dataset may not include those fields; it also does not mean that all related resources exist (a Dataset may refer to a Person that does not exist in the project)\nThe goal is that indexing will produce Elasticsearch documents that have the following structure:\n{\n \"@id\": \"...\",\n \"@type\": [\"http://schema.org/Dataset\", \"http://other...\"],\n \"name\": \"\",\n \"description\": \"...\",\n \"author\": \" if exists\",\n \"license\": \"