Commit

Merge pull request #1216 from datalad-handbook/facts
Add content from factsheets into FAQ
adswa authored Feb 28, 2024
2 parents 0db27ca + 330be64 commit 3abde85
Showing 1 changed file with 61 additions and 1 deletion.
62 changes: 61 additions & 1 deletion docs/basics/101-180-FAQ.rst
@@ -97,6 +97,22 @@ and functions:

more here.

What kind of data is compatible with DataLad datasets?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Any information that can be expressed in digital files.
This includes text files, tabular data, and images, in any file format and of any size or number.

What does a DataLad dataset contain? In what way is it “lightweight”?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Simply put, a DataLad dataset only contains metadata: information on the identity and availability of data.
On a computer with the DataLad software installed, a DataLad dataset looks like any other folder with files.
However, file content is only obtained on demand.
When file content needs to be accessed, it is downloaded from any of the known storage locations for that file.
It can only be downloaded if the requesting user has the necessary authorization to access the file.
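
The difference between a file being *known* and its content being *present* can be sketched with the Python standard library alone.
The sketch assumes git-annex's default symlink mode, in which an annexed file without local content appears as a dangling symbolic link; the file name ``scan.nii.gz``, the ``objects/KEY`` location, and the helper functions are hypothetical stand-ins, not DataLad's API.

```python
import os
import tempfile


def content_is_present(path):
    """Return True if `path` resolves to actual file content.

    Assumption: symlink mode, where an annexed file without local content
    is a dangling symbolic link.
    """
    # os.path.exists() follows symlinks, so it is False for a dangling link.
    return os.path.exists(path)


def demo():
    with tempfile.TemporaryDirectory() as dataset:
        # Hypothetical stand-in for an annexed file without local content:
        # a link to an object location that does not exist yet.
        link = os.path.join(dataset, "scan.nii.gz")
        target = os.path.join(dataset, "objects", "KEY")
        os.symlink(target, link)
        before = (os.path.islink(link), content_is_present(link))

        # Simulate content retrieval by creating the link target.
        os.makedirs(os.path.dirname(target))
        with open(target, "wb") as f:
            f.write(b"file content")
        after = (os.path.islink(link), content_is_present(link))
    return before, after
```

In a real dataset, the retrieval step is performed by ``datalad get`` rather than by writing the target file by hand.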


Does DataLad host my data?
^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -112,6 +128,17 @@ published dataset and its data are as easy as if it would lie on your own
machine.
You can find a typical workflow in the chapter :ref:`chapter_thirdparty`.

Is my data automatically "open" when I publish it with DataLad?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

How openly your data is published is your choice.
Combined with a fitting hosting provider, you can make all your data openly available, or available only partially, or to a specific audience, or only in the form of metadata.

Can I selectively publish only some data?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes. You can even publish just selected metadata.

How does GitHub relate to DataLad?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -135,7 +162,7 @@ Conceptually and technically, there is no difference between a dataset, a
subdataset, or a superdataset. The only aspect that makes a dataset a sub- or
superdataset is whether it is *registered* in another dataset (by means of an entry in the
``.gitmodules``, automatically performed upon an appropriate ``datalad
install -d`` or ``datalad create -d`` command) or contains registered datasets.
clone -d`` or ``datalad create -d`` command) or contains registered datasets.
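
Because registration is nothing more than an entry in ``.gitmodules``, the registered subdatasets of a dataset can be listed by reading that file.
The following is a minimal sketch using only the Python standard library; the submodule name, path, and URL are made up for illustration, and ``registered_subdatasets`` is a hypothetical helper, not part of DataLad.

```python
import configparser

# Hypothetical .gitmodules content; the name, path and URL are made up.
GITMODULES = """\
[submodule "inputs/rawdata"]
path = inputs/rawdata
url = https://example.com/rawdata.git
"""


def registered_subdatasets(gitmodules_text):
    """Return {path: url} for every submodule entry in a .gitmodules file."""
    # Disable interpolation so that '%' characters in URLs are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(gitmodules_text)
    return {
        parser[section]["path"]: parser[section]["url"]
        for section in parser.sections()
    }
```

A dataset without a ``.gitmodules`` file, or with an empty one, has no registered subdatasets and is therefore not a superdataset.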


How can I convert/import/transform an existing Git or git-annex repository into a DataLad dataset?
@@ -159,6 +186,17 @@ How can I convert an existing DataLad dataset with annexed data back to a plain
If you decide to stop using git-annex or DataLad, or if you want to turn an annex repository back into a plain Git repository, you can do so with the ``git annex uninit`` command.
The section :ref:`uninit` contains more details.

What does DataLad cost?
^^^^^^^^^^^^^^^^^^^^^^^

DataLad is free and open source software. There are no fees and no running costs.
The necessary investment is the time it takes to learn how to use the tool.

Who develops DataLad?
^^^^^^^^^^^^^^^^^^^^^

DataLad is an international academic open source project with more than a hundred contributors, spearheaded by a US-German collaboration between Dartmouth College and the Research Centre Jülich.

How can I cite DataLad?
^^^^^^^^^^^^^^^^^^^^^^^

@@ -507,6 +545,28 @@ or :dlcmd:`install`.

There is more info about this in the :ref:`OpenNeuro Quickstart Guide <openneuro>`.

How does DataLad process the data given to it?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

DataLad does not modify file content or enforce a particular data organization inside a dataset.
From a user's perspective, a DataLad dataset is a regular directory on a computer's file system.
This directory is populated with files that a user has placed into this directory.
DataLad manages identity information of these files over time (version controlled content identifiers, typically based on checksums).
DataLad also assists with file transport (upload/download) to and from this directory, and tracks the associated file content availability metadata.
Users may also associate arbitrary additional metadata with any file content or dataset version.
All metadata and DataLad-internal management information is kept separate from the managed content, but is located inside the managed directory (in a ``.git`` subdirectory).
No information is transmitted to a location outside this local directory, unless a user explicitly performs such an action.
DataLad is agnostic with respect to the content of a file it manages within a dataset.
DataLad reads all file content in binary form for the sole purpose of computing a content identifier, which is typically based on a checksum (e.g., MD5 or SHA1).
This content identifier is used to associate file content availability and other metadata.
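
As an illustration of this step, a checksum-based identifier can be computed by reading a file in binary mode and hashing its bytes.
This is a minimal sketch using Python's ``hashlib``, not DataLad's or git-annex's actual code; ``content_identifier`` and the file names are hypothetical.

```python
import hashlib
import os
import tempfile


def content_identifier(path, algorithm="md5", chunk_size=1 << 20):
    """Compute a checksum over the raw bytes of the file at `path`."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:  # binary mode: no decoding or interpretation
        # Read in chunks so that large files do not need to fit into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "a.txt")
        second = os.path.join(tmp, "copy_of_a.dat")
        for path in (first, second):
            with open(path, "wb") as f:
                f.write(b"hello\n")
        return (
            content_identifier(first),
            content_identifier(second),
            content_identifier(first, "sha1"),
        )
```

Two files with identical bytes yield the same identifier regardless of their names or formats, which is what allows availability information to be attached to content rather than to a path.
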
DataLad supports the execution of user-defined, user-provided metadata extractor algorithms.
These software components can process files of a particular format in order to derive metadata from them.
Volume, format and terminology of such metadata are determined by the provider of an extractor implementation, and a user’s parameterization.
DataLad also supports the execution of user-defined programs and scripts.
When a program is executed through DataLad, users can record the inputs and parameters of the process, and DataLad can capture the identity of any generated output files.
This enables metadata-based queries on the origin of files, and programmatic recomputation.
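
The idea behind such a record can be sketched in a few lines of standard-library Python.
This is a conceptual illustration only, not how DataLad implements it: in DataLad, recording and re-execution are done with ``datalad run`` and ``datalad rerun``, while ``run_and_record``, the record layout, and the file names below are hypothetical.

```python
import hashlib
import os
import subprocess
import sys
import tempfile


def sha1_of(path):
    """Return the SHA1 checksum of a file's bytes."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def run_and_record(cmd, inputs, outputs, cwd):
    """Run `cmd` in `cwd` and return a record of what went in and came out."""
    subprocess.run(cmd, cwd=cwd, check=True)
    return {
        "cmd": cmd,
        "inputs": {p: sha1_of(os.path.join(cwd, p)) for p in inputs},
        "outputs": {p: sha1_of(os.path.join(cwd, p)) for p in outputs},
    }


def demo():
    with tempfile.TemporaryDirectory() as cwd:
        with open(os.path.join(cwd, "raw.txt"), "w") as f:
            f.write("b\na\n")
        # Hypothetical processing step: sort the lines of raw.txt.
        script = (
            "lines = sorted(open('raw.txt').read().split());"
            "open('sorted.txt', 'w').write('\\n'.join(lines))"
        )
        cmd = [sys.executable, "-c", script]
        first = run_and_record(cmd, ["raw.txt"], ["sorted.txt"], cwd)
        # Re-executing the recorded command recomputes the output.
        second = run_and_record(first["cmd"], ["raw.txt"], ["sorted.txt"], cwd)
        return first, second
```

Comparing the output checksums of the first and second record shows whether the recomputation reproduced the original result.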


.. _bidsvalidator:

BIDS validator issues in datasets with missing file content