- Author: Mike D'Arcy, Kyle Chard, Ian Foster, Carl Kesselman, Ravi Madduri, Nickolaus Saint, Rick Wagner
- Title: Big Data Bags: A Scalable Packaging Format for Science
- Submission: https://doi.org/10.5281/zenodo.3338725
- Submitted as: RO2019 abstract
- Decision: accept talk (accepted for oral presentation)
- Reviewer: (anonymous)
Is the text easy to follow? Are core concepts defined or referenced? Is it clear what is the author's contribution?
- good
URL for a Research Object or Zenodo record provided? Guidelines followed? Open format (e.g. HTML)? Sufficient metadata, e.g. links to software? Some form of Data Package provided? Add text below if you need to clarify your score.
- basic (e.g. Zenodo with PDF and minimal metadata)
Please provide a brief review, including a justification for your scores. Both score and review text are required.
- weak accept
The abstract presents a data format called Big Data Bags (BDBags), which builds on BagIt to describe the contents of a dataset and uses research object annotations to describe complex resources. The contributions of this work also include a tool and GUI to simplify the use of BDBags. Three use cases are presented that illustrate the approach.
Rather than scientific novelty, this abstract shares with the research object community an in-use application of the research object models and principles, supported by ad-hoc tooling. Given this practical, in-use angle, it would be interesting to have further discussion during the workshop about lessons learnt and future work.
- Reviewer: Oscar Corcho https://orcid.org/0000-0002-9260-0753
Is the text easy to follow? Are core concepts defined or referenced? Is it clear what is the author's contribution?
- excellent
The abstract is clear on the contribution that will be presented, namely BDBag, and its characteristics.
URL for a Research Object or Zenodo record provided? Guidelines followed? Open format (e.g. HTML)? Sufficient metadata, e.g. links to software? Some form of Data Package provided? Add text below if you need to clarify your score.
- basic (e.g. Zenodo with PDF and minimal metadata)
I would have appreciated having at least one example of a BDBag associated with this abstract, for example a packaging of the software that the abstract references as a GitHub repository. In other words, an eat-your-own-dog-food example.
Please provide a brief review, including a justification for your scores. Both score and review text are required.
- accept
The features of the BDBags spec, including both the attributes being used and the extensions to BagIt, as well as the way in which contents can be packaged inside a BDBag and how they can be used, will surely attract attention and discussion among the workshop attendees. This makes the abstract highly relevant.
The second page of the abstract describes how BDBags have been used in different scenarios. I would have appreciated more descriptive text there, since that part of the paper reads somewhat "commercial" rather than focusing on the main challenges each domain faced, but this can be addressed in the published version.
- Reviewer: (anonymous)
Is the text easy to follow? Are core concepts defined or referenced? Is it clear what is the author's contribution?
- good
URL for a Research Object or Zenodo record provided? Guidelines followed? Open format (e.g. HTML)? Sufficient metadata, e.g. links to software? Some form of Data Package provided? Add text below if you need to clarify your score.
- good (followed guidelines, demonstrating own format, related resources included, but some details missing)
The project and use-cases are well described and referenced, and the text is easy to follow.
Please provide a brief review, including a justification for your scores. Both score and review text are required.
- accept
BDBags provide a concrete, real-world example of research objects in use. The pragmatic combination of (optionally 'holey') BagIt bags and Research Object metadata is compelling. While BDBags have been presented in prior work, the more recent applications outlined in "II. Use Cases" will be of interest to participants.