Merge pull request #2 from ebi-gdp/dev
Release 2.0.0
Showing 13 changed files with 454 additions and 65 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,7 +3,7 @@ | |
We needed a way to: | ||
|
||
1) Reliably download files from a Globus collection over HTTPS | ||
2) Decrypt them on the fly ([crypt4gh](https://github.com/EGA-archive/crypt4gh)) | ||
2) Optionally decrypt them on the fly ([crypt4gh](https://github.com/EGA-archive/crypt4gh)) | ||
3) Store the plaintext files in an object store (bucket), ready for cloud-based data science workflows | ||
|
||
The [file handler CLI](https://github.com/ebi-gdp/globus-file-handler-cli) takes care of 1) and 2). | ||
|
@@ -12,18 +12,35 @@ The [file handler CLI](https://github.com/ebi-gdp/globus-file-handler-cli) takes | |
|
||
Downloaded files can also be saved to a local filesystem. | ||
|
||
> [!NOTE] | ||
> This workflow grabs crypt4gh secret keys from the INTERVENE key handler service, but could be adapted to work with local crypt4gh key pairs | ||
### Table of Contents | ||
|
||
- [Parameters](#parameters) | ||
* [File input](#file-input) | ||
* [Secret key](#secret-key) | ||
* [Application properties](#application-properties) | ||
* [crypt4gh application properties](#crypt4gh-application-properties) | ||
- [Example use cases](#example-use-cases) | ||
* [Download files from a Globus collection over HTTPS](#download-files-from-a-globus-collection-over-https) | ||
* [Downloading files with crypt4gh decryption on the fly](#downloading-files-with-crypt4gh-decryption-on-the-fly) | ||
* [Downloading files to an object store (bucket)](#downloading-files-to-an-object-store) | ||
- [Helm support](#helm-support) | ||
|
||
## Parameters | ||
|
||
### File input | ||
|
||
> [!IMPORTANT] | ||
> This parameter is mandatory | ||
`--input` must be a JSON array with the following structure: | ||
|
||
``` | ||
{ | ||
"dir_path_on_guest_collection": "[email protected]/test_hapnest/", | ||
"files": [ | ||
{ | ||
"filename": "hapnest.pvar", | ||
"size": 278705850 | ||
}, | ||
{ | ||
"filename": "hapnest.pgen.crypt4gh", | ||
"size": 278825058 | ||
|
@@ -32,17 +49,39 @@ Downloaded files can also be saved to a local filesystem. | |
} | ||
``` | ||
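Before launching a transfer it can help to check that the `--input` file has the shape shown above. The following is a minimal sketch, not part of the workflow: the `validate_input` helper and its rules (non-empty `files`, non-negative integer `size`) are assumptions inferred from the example, and the collection path is a made-up placeholder.

```python
import json


def validate_input(doc):
    """Return a list of problems found in a parsed --input document."""
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]
    problems = []
    if not isinstance(doc.get("dir_path_on_guest_collection"), str):
        problems.append("dir_path_on_guest_collection must be a string")
    files = doc.get("files")
    if not isinstance(files, list) or not files:
        problems.append("files must be a non-empty list")
        return problems
    for i, entry in enumerate(files):
        if not isinstance(entry, dict):
            problems.append(f"files[{i}] must be an object")
            continue
        name = entry.get("filename")
        if not isinstance(name, str) or not name:
            problems.append(f"files[{i}].filename must be a non-empty string")
        size = entry.get("size")
        # bool is a subclass of int in Python, so exclude it explicitly
        if not isinstance(size, int) or isinstance(size, bool) or size < 0:
            problems.append(f"files[{i}].size must be a non-negative integer")
    return problems


# Hypothetical document; the directory path is a placeholder.
example = json.loads("""
{
  "dir_path_on_guest_collection": "example-collection/test_hapnest/",
  "files": [
    {"filename": "hapnest.pvar", "size": 278705850},
    {"filename": "hapnest.pgen.crypt4gh", "size": 278825058}
  ]
}
""")
print(validate_input(example))
```

An empty list means no problems were found under these assumed rules; the workflow itself may accept or reject other shapes.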
|
||
`--config_secrets` must be a path to a spring boot application properties file with the following structure: | ||
### Secret key | ||
|
||
> [!IMPORTANT] | ||
> This parameter is optional | ||
`--secret_key` must be a JSON file with the following structure: | ||
|
||
``` | ||
{"secretId": "77451C57-0FCC-460F-91A3-E0DED05B440F", "secretIdVersion": "1"} | ||
``` | ||
|
||
The secret key is used to contact the platform key handler service and grab the correct crypt4gh secret key. | ||
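A small sketch of how a `--secret_key` file could be generated is shown here. The `make_secret_key` helper is hypothetical, and the check that `secretId` parses as a UUID is an assumption based only on the example values in this README.

```python
import json
import uuid


def make_secret_key(secret_id, version):
    """Build the two-field document expected by --secret_key."""
    # Assumption: secret IDs are UUIDs, as in the README example.
    uuid.UUID(secret_id)  # raises ValueError if the ID is not a valid UUID
    return {"secretId": secret_id, "secretIdVersion": str(version)}


key = make_secret_key("77451C57-0FCC-460F-91A3-E0DED05B440F", 1)
print(json.dumps(key))
```

The printed JSON can be saved as `key.json` and passed with `--secret_key key.json`.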
|
||
### Application properties | ||
|
||
> [!IMPORTANT] | ||
> This parameter is mandatory | ||
> [!TIP] | ||
> Be careful of trailing whitespace in properties files | ||
`--config_application` must be a path to a spring boot application properties file with the following structure: | ||
|
||
``` | ||
##################################################################################### | ||
# Application config | ||
##################################################################################### | ||
spring.main.web-application-type=none | ||
data.copy.buffer-size=8192 | ||
##################################################################################### | ||
# Apache HttpClient connection config | ||
##################################################################################### | ||
webclient.connection.pipe-size=4096 | ||
webclient.connection.pipe-size=${data.copy.buffer-size} | ||
webclient.connection.connection-timeout=5 | ||
webclient.connection.socket-timeout=0 | ||
webclient.connection.read-write-timeout=30000 | ||
|
@@ -61,31 +100,43 @@ file.download.retry.attempts.back-off-period=2000 | |
##################################################################################### | ||
# Globus config | ||
##################################################################################### | ||
globus.guest-collection.domain=<url> | ||
globus.guest-collection.domain=@globus.guest-collection.url@ | ||
#Oauth | ||
globus.aai.access-token.uri=https://auth.globus.org/v2/oauth2/token | ||
globus.aai.client-id=<id> | ||
globus.aai.client-secret=<token> | ||
globus.aai.scopes=<url> | ||
##################################################################################### | ||
# Crypt4gh config | ||
##################################################################################### | ||
crypt4gh.binary-path=/opt/bin/crypt4gh | ||
crypt4gh.shell-path=/bin/bash -c | ||
[email protected]@ | ||
[email protected]@ | ||
globus.aai.scopes=https://auth.globus.org/scopes/c1e6310c-11d5-4e8a-9443-211884f04c6f/https | ||
##################################################################################### | ||
# Logging config | ||
##################################################################################### | ||
logging.level.uk.ac.ebi.intervene=INFO | ||
logging.level.org.springframework=WARN | ||
logging.level.org.apache.http=WARN | ||
logging.level.org.apache.http.wire=WARN | ||
``` | ||
|
||
See the [file handler CLI](https://github.com/ebi-gdp/globus-file-handler-cli) README for a description of the configuration. | ||
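The tip about trailing whitespace can be checked mechanically before a run. This is a minimal sketch with a hypothetical helper, `find_trailing_whitespace`, that reports lines ending in spaces or tabs; it does not parse the properties format and makes no claim about how the CLI treats such lines.

```python
def find_trailing_whitespace(text):
    """Return (line_number, line) pairs for lines that end in spaces or tabs."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line != line.rstrip(" \t"):
            hits.append((number, line))
    return hits


# The first line has a stray space after the value.
sample = "data.copy.buffer-size=8192 \nwebclient.connection.socket-timeout=0\n"
for number, line in find_trailing_whitespace(sample):
    print(f"line {number}: {line!r}")
```

To check a real file, read it with `open(path, encoding="utf-8").read()` and pass the text to the helper.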
|
||
### crypt4gh application properties | ||
|
||
> [!IMPORTANT] | ||
> This parameter is optional | ||
`--config_crypt4gh` must be a path to a spring boot application properties file with the following structure: | ||
|
||
``` | ||
##################################################################################### | ||
# key handler service config | ||
# Crypt4gh config | ||
##################################################################################### | ||
intervene.key-handler.basic-auth=Basic <token> | ||
intervene.key-handler.secret-key.password=<password> | ||
intervene.key-handler.base-url=https://<url>/key-handler | ||
crypt4gh.binary-path=/opt/bin/crypt4gh | ||
crypt4gh.shell-path=/bin/bash -c | ||
##################################################################################### | ||
# Intervene service config | ||
##################################################################################### | ||
intervene.key-handler.base-url=http://localhost:8040/bff/key-handler | ||
intervene.key-handler.keys.uri=/key/{secretId}/version/{secretIdVersion} | ||
intervene.key-handler.basic-auth=${KEY_HANDLER_BASIC_AUTH:basic-auth} | ||
intervene.key-handler.secret-key.password=${SEC_KEY_PASSWD:test-password} | ||
``` | ||
|
||
See the [file handler CLI](https://github.com/ebi-gdp/globus-file-handler-cli) README for a description of the configuration. | ||
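To illustrate how the `--secret_key` fields relate to the `intervene.key-handler.keys.uri` template, here is a sketch that fills the `{secretId}` and `{secretIdVersion}` placeholders and appends the result to the base URL. The `key_url` helper is hypothetical, and the way the CLI actually joins these values is an assumption.

```python
from urllib.parse import quote


def key_url(base_url, keys_uri, secret):
    """Fill the URI template with the secret fields and join it to the base URL."""
    path = keys_uri.format(
        secretId=quote(secret["secretId"], safe=""),
        secretIdVersion=quote(secret["secretIdVersion"], safe=""),
    )
    return base_url.rstrip("/") + path


secret = {"secretId": "77451C57-0FCC-460F-91A3-E0DED05B440F", "secretIdVersion": "1"}
url = key_url(
    "http://localhost:8040/bff/key-handler",
    "/key/{secretId}/version/{secretIdVersion}",
    secret,
)
print(url)
```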
|
@@ -103,29 +154,47 @@ which integrates with the key handler service. | |
|
||
## Example use cases | ||
|
||
> [!TIP] | ||
> `--debug` can be helpful to keep files containing sensitive data if you're having problems with a transfer (disabled by default) | ||
### Download files from a Globus collection over HTTPS | ||
|
||
``` | ||
$ nextflow run main.nf -profile docker \ | ||
--input input.json \ | ||
--config_application application.properties \ | ||
--outdir downloads | ||
``` | ||
|
||
### Downloading files with crypt4gh decryption on the fly | ||
|
||
It makes sense to submit these jobs to [a grid executor](https://www.nextflow.io/docs/latest/executor.html), like SLURM or cloud batch, because decryption on the fly will use ~1 CPU for each file: | ||
|
||
``` | ||
$ nextflow run main.nf -profile <docker/singularity> \ | ||
$ nextflow run main.nf -profile docker \ | ||
--input input.json \ | ||
--secret_key key.json \ | ||
--config_application application.properties \ | ||
--config_crypt4gh application-crypt4gh-secret-manager.properties \ | ||
--config_secrets assets/secret.properties \ | ||
--input assets/example_input.json \ | ||
--outdir downloads \ | ||
--secret_key key | ||
--decrypt | ||
``` | ||
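Since decryption on the fly uses roughly one CPU per file, a quick summary of the input can help when sizing executor resources. The sketch below is illustrative only: the `summarise` helper is hypothetical, and treating the `.crypt4gh` suffix as the marker for encrypted files is an assumption.

```python
def summarise(doc, suffix=".crypt4gh"):
    """Count files, files that look encrypted, and total bytes in an --input document."""
    files = doc["files"]
    # Assumption: encrypted files are recognisable by their filename suffix.
    encrypted = [f for f in files if f["filename"].endswith(suffix)]
    return {
        "n_files": len(files),
        "n_encrypted": len(encrypted),
        "total_bytes": sum(f["size"] for f in files),
    }


doc = {
    "dir_path_on_guest_collection": "example-collection/test_hapnest/",  # placeholder
    "files": [
        {"filename": "hapnest.pvar", "size": 278705850},
        {"filename": "hapnest.pgen.crypt4gh", "size": 278825058},
    ],
}
summary = summarise(doc)
print(summary)
```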
|
||
### Downloading files to an object store (bucket) | ||
### Downloading files to an object store | ||
|
||
It's possible to use nextflow's support for object storage to transfer files from Globus directly to a bucket: | ||
|
||
``` | ||
$ nextflow run main.nf -profile <docker/singularity> \ | ||
$ nextflow run main.nf -profile docker \ | ||
-c cloud.config \ | ||
--input input.json \ | ||
--secret_key key.json \ | ||
--config_application application.properties \ | ||
--config_crypt4gh application-crypt4gh-secret-manager.properties \ | ||
--config_secrets assets/secret.properties \ | ||
--input assets/example_input.json \ | ||
--secret_key key \ | ||
--outdir gs://test-bucket/downloads \ | ||
-w gs://test-bucket/work | ||
--outdir gs://pathtobucket/downloads \ | ||
-w gs://pathworkbucket/work | ||
``` | ||
|
||
For best performance use a cloud executor and enable fusion in the nextflow configuration: | ||
|
@@ -145,6 +214,7 @@ fusion { | |
tower { | ||
accessToken = 'token' | ||
workspaceId = 'work' | ||
enabled = true | ||
} | ||
|
@@ -156,3 +226,9 @@ google { | |
} | ||
} | ||
``` | ||
|
||
## Helm support | ||
|
||
`helm/` contains a [helm chart](https://helm.sh/docs/topics/charts/) which can install a [Job](https://kubernetes.io/docs/concepts/workloads/controllers/job/) to a Kubernetes cluster. | ||
|
||
In the helm chart, worker processes run in Cloud Batch by default, with crypt4gh decryption on the fly enabled. |
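The chart templates read a set of values (`globflowInput`, `keyHandlerSecret`, `globflowParams`, `nxfParams`, `secrets`, `baseImage`, `dockerTag`, `pullPolicy`). As a rough sketch, a values document could be assembled like this. Only the key names come from the templates; every value below (image name, bucket paths, project, region, tokens) is a hypothetical placeholder, and the parameter names under `globflowParams` are assumed to mirror the command line options.

```python
import json

values = {
    "baseImage": "example.org/nextflow",  # hypothetical image
    "dockerTag": "latest",                # hypothetical tag
    "pullPolicy": "IfNotPresent",
    # Rendered to input.json in the ConfigMap
    "globflowInput": {
        "dir_path_on_guest_collection": "example-collection/test_hapnest/",
        "files": [{"filename": "hapnest.pgen.crypt4gh", "size": 278825058}],
    },
    # Rendered to key.json in the ConfigMap
    "keyHandlerSecret": {
        "secretId": "8D705854-9EEA-44C5-9937-E4E5228B8457",
        "secretIdVersion": "1",
    },
    # Rendered line by line to params.yml (assumed parameter names)
    "globflowParams": {
        "input": "/opt/nxf/input.json",
        "secret_key": "/opt/nxf/key.json",
        "outdir": "gs://example-bucket/downloads",
    },
    # Rendered into nxf.config
    "nxfParams": {
        "workBucketPath": "gs://example-bucket/work",
        "gcpProject": "example-project",
        "location": "europe-west2",
        "spot": False,
        "wave": True,
        "fusion": True,
    },
    "secrets": {"towerToken": "token", "towerId": "work"},
}

print(json.dumps(values, indent=2))
```

Because JSON is also valid YAML, the printed document should be usable as a values file for the chart, although a hand-written `values.yaml` works just as well.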
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,10 +1,6 @@ | ||
{ | ||
"dir_path_on_guest_collection": "[email protected]/test_hapnest/", | ||
"files": [ | ||
{ | ||
"filename": "hapnest.pvar", | ||
"size": 278705850 | ||
}, | ||
{ | ||
"filename": "hapnest.pgen.crypt4gh", | ||
"size": 278825058 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
{ | ||
"secretId": "8D705854-9EEA-44C5-9937-E4E5228B8457", | ||
"secretIdVersion": "1" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
values.yaml |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
# Patterns to ignore when building packages. | ||
# This supports shell glob matching, relative path matching, and | ||
# negation (prefixed with !). Only one pattern per line. | ||
.DS_Store | ||
# Common VCS dirs | ||
.git/ | ||
.gitignore | ||
.bzr/ | ||
.bzrignore | ||
.hg/ | ||
.hgignore | ||
.svn/ | ||
# Common backup files | ||
*.swp | ||
*.bak | ||
*.tmp | ||
*.orig | ||
*~ | ||
# Various IDEs | ||
.project | ||
.idea/ | ||
*.tmproj | ||
.vscode/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
apiVersion: v2 | ||
name: globflow | ||
description: A Helm chart for a globflow file transfer with crypt4gh decryption on the fly | ||
|
||
# A chart can be either an 'application' or a 'library' chart. | ||
# | ||
# Application charts are a collection of templates that can be packaged into versioned archives | ||
# to be deployed. | ||
# | ||
# Library charts provide useful utilities or functions for the chart developer. They're included as | ||
# a dependency of application charts to inject those utilities and functions into the rendering | ||
# pipeline. Library charts do not define any templates and therefore cannot be deployed. | ||
type: application | ||
|
||
# This is the chart version. This version number should be incremented each time you make changes | ||
# to the chart and its templates, including the app version. | ||
# Versions are expected to follow Semantic Versioning (https://semver.org/) | ||
version: 0.1.0 | ||
|
||
# This is the version number of the application being deployed. This version number should be | ||
# incremented each time you make changes to the application. Versions are not expected to | ||
# follow Semantic Versioning. They should reflect the version the application is using. | ||
# It is recommended to use it with quotes. | ||
appVersion: "2.0.0" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
apiVersion: v1 | ||
kind: ConfigMap | ||
metadata: | ||
name: {{ .Release.Name }}-transfer-config | ||
data: | ||
input.json: {{ toJson .Values.globflowInput | quote }} | ||
key.json: {{ toJson .Values.keyHandlerSecret | quote }} | ||
params.yml: | | ||
{{- range $key, $value := .Values.globflowParams }} | ||
{{ $key }}: {{ $value }} | ||
{{- end }} | ||
nxf.config: | | ||
workDir = {{ .Values.nxfParams.workBucketPath | quote }} | ||
process { | ||
executor = 'google-batch' | ||
maxRetries = 1 | ||
} | ||
google { | ||
project = {{ .Values.nxfParams.gcpProject | quote }} | ||
location = {{ .Values.nxfParams.location | quote }} | ||
batch { | ||
spot = {{ .Values.nxfParams.spot }} | ||
} | ||
} | ||
wave { | ||
enabled = {{ .Values.nxfParams.wave }} | ||
} | ||
fusion { | ||
enabled = {{ .Values.nxfParams.fusion }} | ||
} | ||
tower { | ||
accessToken = {{ .Values.secrets.towerToken | quote }} | ||
workspaceId = {{ .Values.secrets.towerId | quote }} | ||
enabled = true | ||
} | ||
scm: | | ||
providers { | ||
ebi { | ||
server = 'https://gitlab.ebi.ac.uk' | ||
platform = 'gitlab' | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
apiVersion: batch/v1 | ||
kind: Job | ||
metadata: | ||
name: {{ .Release.Name }} | ||
spec: | ||
ttlSecondsAfterFinished: 3600 | ||
backoffLimit: 0 | ||
template: | ||
metadata: | ||
annotations: | ||
cluster-autoscaler.kubernetes.io/safe-to-evict: "false" | ||
spec: | ||
serviceAccountName: nextflow | ||
containers: | ||
- name: globflow | ||
image: {{ .Values.baseImage }}:{{ .Values.dockerTag }} | ||
imagePullPolicy: {{ .Values.pullPolicy }} | ||
command: ['sh', '-c', "nextflow run https://gitlab.ebi.ac.uk/gdp-public/globflow.git -params-file /opt/nxf/params.yml -c /opt/nxf/nxf.config --decrypt"] | ||
env: | ||
- name: NXF_SCM_FILE | ||
value: /opt/nxf/scm | ||
resources: | ||
requests: | ||
cpu: "1" | ||
memory: 2G | ||
ephemeral-storage: 10G | ||
volumeMounts: | ||
- name: transfer-config | ||
mountPath: /opt/nxf | ||
- name: globflow-secrets | ||
mountPath: /opt/globflow/ | ||
readOnly: true | ||
volumes: | ||
- name: transfer-config | ||
configMap: | ||
name: {{ .Release.Name }}-transfer-config | ||
items: | ||
- key: nxf.config | ||
path: nxf.config | ||
- key: scm | ||
path: scm | ||
- key: params.yml | ||
path: params.yml | ||
- key: input.json | ||
path: input.json | ||
- key: key.json | ||
path: key.json | ||
- name: globflow-secrets | ||
secret: | ||
secretName: {{ .Release.Name }}-transfer-secrets | ||
restartPolicy: Never |