Conventions for fs fetch-type plugins #25
jennydaman started this conversation in Ideas
Abstract
This document describes a convention for ChRIS fs-type plugins which can download files from network resources. Plugins conformant to this convention must have the name pull-{SCHEME} and must accept one required argument, --uri, which comes in the form {SCHEME}://{RESOURCE}. ChRIS clients such as the ChRIS_ui can benefit from this convention by being able to create feeds automatically given just a URI to some data.
Introduction
ChRIS fs-type plugins create ChRIS feeds by producing output data files to a given output directory. fs plugins are a means to import datasets into ChRIS.
Some existing fs plugins, which have been described as "data pack plugins," copy data from a hard-coded path inside their container image to the output directory. Some examples:
Distributing data inside Docker containers has advantages, such as versioning and immutability, but it is very inefficient. Moreover, pairing each of these datasets with its own custom Python copy-files script is repetitive.
A more efficient solution would be to make these datasets available from a public CDN and to have a single fs plugin which is able to "pull" them from there.
Convention
Here, the "ChRIS URI 'Pull'" plugin convention is defined.
A plugin conformant to the "URI Pull" convention must:
- be named pull-{SCHEME}
- accept one required argument, {"name": "uri", "long_flag": "--uri"}
The plugin should:
- check that uri is in the form {SCHEME}://{RESOURCE}
- download {SCHEME}://{RESOURCE} and write it to the output directory
Where:
- SCHEME is a network protocol, e.g. http, https, git, ftp, ipfs, rsync
- {SCHEME}://{RESOURCE} is a URI
Integration
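As a rough illustration of the "URI Pull" convention defined above, the argument handling of a conformant plugin might look like the sketch below. It is not taken from any existing plugin: the names split_uri and build_parser are hypothetical, the scheme pattern follows the generic URI scheme syntax (which the convention itself does not spell out), and the positional outputdir argument is an assumption about how an fs plugin receives its output directory.

```python
import argparse
import re
from pathlib import Path
from typing import Tuple

# Assumption: a scheme is a letter followed by letters, digits, "+", "-" or ".".
# The convention only states that the value has the form {SCHEME}://{RESOURCE}.
URI_PATTERN = re.compile(r"^(?P<scheme>[A-Za-z][A-Za-z0-9+.-]*)://(?P<resource>.+)$")


def split_uri(uri: str) -> Tuple[str, str]:
    """Return (scheme, resource), or raise ValueError if the form is wrong."""
    match = URI_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"expected {{SCHEME}}://{{RESOURCE}}, got {uri!r}")
    return match.group("scheme").lower(), match.group("resource")


def build_parser(scheme: str) -> argparse.ArgumentParser:
    """Build the command line of a hypothetical pull-{scheme} plugin."""
    parser = argparse.ArgumentParser(prog=f"pull-{scheme}")
    # The one required argument named by the convention.
    parser.add_argument("--uri", dest="uri", required=True,
                        help="resource to download, as {SCHEME}://{RESOURCE}")
    # Assumption: the output directory is passed as a positional argument.
    parser.add_argument("outputdir", type=Path, help="output directory")
    return parser
```

A plugin built this way would parse its arguments, call split_uri on the value of --uri, and then hand the resource to whatever download tool fits the scheme.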
Plugins To Create
We plan to create the following ChRIS plugins which will conform to the "URI Pull" convention:
- pull-http and pull-https, which do something like a wget --recursive
- pull-git for public git repositories, e.g. git://github.com/FNNDSC/SAG-anon-nii.git
- pull-ipfs: see https://ipfs.io/, https://en.wikipedia.org/wiki/InterPlanetary_File_System
- pull-datalad: see https://www.datalad.org/
Clients
ChRIS clients for end-users, such as the ChRIS_ui, can take advantage of the "URI Pull" convention and provide a feature where feeds can be created given a dataset URI.
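On the client side, the naming rule means the plugin to run can be derived from the URI alone. The following is a minimal sketch under that assumption; plugin_name_for is a hypothetical helper, not part of the ChRIS_ui or any ChRIS client library.

```python
from urllib.parse import urlsplit


def plugin_name_for(uri: str) -> str:
    """Derive the "URI Pull" plugin name for a dataset URI."""
    scheme = urlsplit(uri).scheme  # urlsplit lowercases the scheme
    if not scheme or "://" not in uri:
        raise ValueError(f"expected {{SCHEME}}://{{RESOURCE}}, got {uri!r}")
    return f"pull-{scheme}"
```

A client could then look up the plugin with that name and create a feed by passing the original URI as --uri.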
Issues
What’s a better way to support clients which support multiple protocols? For instance, we would only want to create one pull-http which supports both http:// and https://.
One option is to have a single code base registered with two different plugin names. Another option is for clients to hard-code a mapping of schemes to plugin names.
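The second option could be sketched as a small lookup table that overrides the default naming rule. The table contents and the name resolve_plugin are illustrative assumptions; no such mapping has been agreed on.

```python
from urllib.parse import urlsplit

# Hypothetical overrides: schemes served by a plugin whose name differs
# from pull-{SCHEME}. Here one pull-http plugin handles both schemes.
SCHEME_TO_PLUGIN = {
    "http": "pull-http",
    "https": "pull-http",
}


def resolve_plugin(uri: str) -> str:
    """Return the plugin name for a URI, falling back to pull-{SCHEME}."""
    scheme = urlsplit(uri).scheme
    if not scheme:
        raise ValueError(f"URI has no scheme: {uri!r}")
    return SCHEME_TO_PLUGIN.get(scheme, f"pull-{scheme}")
```

The trade-off is that every client has to carry and update this table, whereas registering one code base under two plugin names keeps clients free of special cases.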
A more difficult case would be for datalad and git repositories: a URI matching ^https://.+\.git$ could be any of: an HTTPS resource, a git repository, or a datalad repository.
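To make the ambiguity concrete, the sketch below flags URIs that match the pattern and lists the plugins that could plausibly handle them. The function name and the https:// form of the example repository URL are assumptions for illustration; how a client should choose among the candidates is left open.

```python
import re

# The pattern from the text: an https URI ending in ".git".
GIT_OVER_HTTPS = re.compile(r"^https://.+\.git$")

# Plugins that could all claim such a URI.
AMBIGUOUS_CANDIDATES = ["pull-https", "pull-git", "pull-datalad"]


def is_ambiguous(uri: str) -> bool:
    """True if the scheme alone cannot determine which plugin to use."""
    return GIT_OVER_HTTPS.match(uri) is not None
```

For example, is_ambiguous("https://github.com/FNNDSC/SAG-anon-nii.git") is true, so a client would need extra information, such as a user choice, before picking one of AMBIGUOUS_CANDIDATES.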