add support for Preston provenance tracking / archives #52

Open
jhpoelen opened this issue May 3, 2022 · 3 comments

@jhpoelen
Member

jhpoelen commented May 3, 2022

Currently, Elton uses its own built-in provenance tracking for biodiversity datasets.

The suggestion is to add support for Preston archives / provenance tracking.

@jhpoelen
Member Author

jhpoelen commented Mar 10, 2023

@seltmann suggested taking https://zenodo.org/record/7194486 [1] and translating it into a "clonable" publication, so that you don't have to click a hundred times to download the different versions.

References

[1] Poelen, Jorrit H., Seltmann, Katja C., Campbell, Mariel, Orlofske, Sarah A., Light, Jessica E., Tucker, Erika M., Demboski, John R, McElrath, Tommy, Grinter, Christopher C, Diaz-Bastin, Rachel, Bush, Sarah E, Delapena, Robin, Cook, Joseph, Gall, Lawrence F., Whiting, Michael F, Clark, Shawn M, Cameron, Stephen L, Replogle, Charla R, Rund, Samuel S.C., … Bailey, Colin. (2022). Terrestrial Parasite Tracker indexed biotic interactions and review summary (0.7) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7194486

@jhpoelen
Member Author

jhpoelen commented Nov 30, 2023

With the addition of:

elton log

and

elton cat

a first bridge from Elton to Preston has been built.

Example 1 - Calculate size of dataset resources.

# get template dataset
elton pull globalbioticinteractions/template-dataset
# calculate size of resources associated with the template dataset
elton log | elton cat | pv -b > /dev/null

yielding:

9.44KiB
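
pv -b reports a human-readable running total. If an exact byte count is needed (for example, to compare two pulls in a script), wc -c can stand in for pv. A minimal sketch; byte_count is a hypothetical helper name, not part of Elton or Preston:

```shell
#!/bin/sh
# byte_count: hypothetical helper that prints the exact number of bytes
# read from stdin (tr strips the padding that some wc versions add)
byte_count() {
  wc -c | tr -d ' '
}

# intended use, assuming elton is installed and a dataset has been pulled:
#   elton log | elton cat | byte_count

# stand-alone demonstration on a known input
printf 'hello' | byte_count   # prints 5
```

Unlike pv, this prints the count to stdout, so it can be captured in a variable.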

Example 2 - Retrieve resource assets via Software Heritage

Many GloBI index configurations are kept as GitHub repositories. These repositories can be (and often are) tracked by the Software Heritage Library (https://softwareheritage.org). Preston supports Software Heritage as a content remote, so you can retrieve content logged by Elton via Software Heritage, if it is available there.

# clone a species interaction dataset
elton pull globalbioticinteractions/template-dataset

# log the resources associated with the species interaction dataset,
# select the globi.json resource,
# and retrieve it via Software Heritage
elton log globalbioticinteractions/template-dataset \
| grep globi.json \
| preston cat --remote https://softwareheritage.org

yields:

{ 
  "_comment": "Sample GloBI dataset descriptor. See http://github.com/globalbioticinteractions for more information.",
  "citation": "Jorrit H. Poelen. 2014. Species associations manually extracted from literature."
}
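
Because Preston addresses content by its SHA-256 hash, content fetched from any remote can be checked independently of where it came from. A minimal sketch, assuming GNU sha256sum is available (on macOS, shasum -a 256 would be the substitute); content_id is a hypothetical helper name, not an Elton or Preston command:

```shell
#!/bin/sh
# content_id: hypothetical helper that prints the hash://sha256/... identifier
# of whatever bytes arrive on stdin
content_id() {
  printf 'hash://sha256/%s\n' "$(sha256sum | cut -d' ' -f1)"
}

# intended use, assuming elton and preston are installed:
#   elton log globalbioticinteractions/template-dataset \
#     | grep globi.json \
#     | preston cat --remote https://softwareheritage.org \
#     | content_id
# The printed identifier should then match the one on the logged globi.json line.

# stand-alone demonstration on a known input
printf 'abc' | content_id
```

If the identifier computed this way differs from the logged one, the remote returned different bytes than the ones Elton recorded.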

Example 3 - Package Elton dataset as a Preston archive

# clone a version of a species interaction dataset
elton pull globalbioticinteractions/template-dataset

# add version to preston 
elton log | preston append 

# move content from Elton into Preston
elton log |  parallel -N1 --pipe 'elton cat | preston track'

# zip up preston data dir
zip -r globalbioticinteractions_template-dataset.zip data 

yields the attached globalbioticinteractions_template-dataset.zip, with preston head hash://sha256/3cb3ac31bbb057d97090cfd067b531c343b7de77ba7a7226281072addf308a18
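
The steps of Example 3 could be wrapped up for reuse across datasets. A rough sketch only: archive_name and package_dataset are hypothetical helper names, the zip name follows the namespace-with-underscore convention of the attachment above, and the Preston data dir is assumed to be ./data as in the zip step. It uses only the commands already shown in this thread.

```shell
#!/bin/sh
# archive_name: hypothetical helper that turns a dataset namespace like
# "owner/repo" into "owner_repo.zip"
archive_name() {
  printf '%s.zip\n' "$(printf '%s' "$1" | tr '/' '_')"
}

# package_dataset: hypothetical wrapper around the Example 3 steps;
# requires elton, preston, GNU parallel and zip, and is not run here
package_dataset() {
  namespace="$1"
  # clone a version of the dataset
  elton pull "$namespace" &&
  # add the version to Preston
  elton log "$namespace" | preston append &&
  # move content from Elton into Preston, one log line at a time
  elton log "$namespace" | parallel -N1 --pipe 'elton cat | preston track' &&
  # zip up the Preston data dir (assumed to be ./data)
  zip -r "$(archive_name "$namespace")" data
}

# stand-alone demonstration of the naming convention
archive_name globalbioticinteractions/template-dataset
```

Running package_dataset globalbioticinteractions/template-dataset in an empty working directory would be the intended use; starting from an empty directory keeps content from other datasets out of the zip.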

@jhpoelen
Member Author

fyi @mielliott @zedomel - I just connected Elton and Preston with a creaky integration bridge via elton log and elton cat.
