
include original data that was used to populate the parsed interaction data #43

Open
jhpoelen opened this issue Nov 18, 2020 · 1 comment


GloBI parses versioned original datasets into a common model. Currently, there is no explicit link between the original raw data and their parsed counterparts; there is, however, an explicit link to the version of the dataset that was used to access the data.

Once the original data is available, we can include this in the "dataContext" of the data review.

E.g.,

"sourceDataHeader": "1.collectionobject.catalogNumber,\"1,7.accession.accessionNumber\",1.collectionobject.altCatalogNumber,1.collectionobject.catalogedDate,"1,5-cataloger.collectionobject.cataloger","1,9-determinations,4.taxon.Class","1,9-determinations,4.taxon.Subclass","1,9-determinations,4.taxon.Superorder","1,9-determinations,4.taxon.Order","1,9-determinations,4.taxon.Suborder","1,9-determinations,4.taxon.Infraorder","1,9-determinations,4.taxon.Parvorder","1,9-determinations,4.taxon.Superfamily","1,9-determinations,4.taxon.Family","1,9-determinations,4.taxon.Subfamily","1,9-determinations,4.taxon.Genus","1,9-determinations,4.taxon.Subgenus","1,9-determinations,4.taxon.Species","1,9-determinations,4.taxon.Subspecies","1,9-determinations,5-determiner.determination.determiner","1,9-determinations.determination.determinedDate","1,9-determinations.determination.typeStatusName","1,93.collectionobjectattribute.text4","1,93.collectionobjectattribute.text1","1,93.collectionobjectattribute.text8","1,10.collectingevent.startDate","1,10.collectingevent.endDate","1,10.collectingevent.verbatimDate","1,10,2.collectingevent.locality","1,10.collectingevent.stationFieldNumber","1,10.collectingevent.remarks","1,10.collectingevent.method","1,10,30-collectors.collectingevent.collectors","1,93.collectionobjectattribute.text10","1,93.collectionobjectattribute.text11","1,93.collectionobjectattribute.text12","1,93.collectionobjectattribute.text13","1,93.collectionobjectattribute.text14","1,93.collectionobjectattribute.text15","1,93.collectionobjectattribute.text16","1,93.collectionobjectattribute.text9","1,93.collectionobjectattribute.text17","1,93.collectionobjectattribute.remarks","1,63-preparations,65.preparation.prepType","1,63-preparations.preparation.countAmt",1.collectionobject.guid",
"sourceData": ",,,2016-06-01,Danielle Tanzer,Insecta,,,Odonata,ZYGOPTERA,,,,Coenagrionidae,,Argia ,,plana,,,,,adult,M,,,,07/08/1931,\"United States, Arizona, Cochise Co.; Huachuca Mts., Ramsey Canyon; 31.4587300000; -110.2968700000\",,,,\"Gloyd, Leonora\",,,,,,,,,,,Paper Triangle,1,dee674d7-219f-4f81-96a1-5dba4ee1726a"

Also include the mapping schema that translates the source data into the GloBI model (e.g., mapping source columns onto "GloBI" columns), as well as the value translations made / introduced by GloBI, i.e., translating values into values that GloBI understands (e.g., "host of" -> "RO:123455").
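
A rough sketch of how such a mapping and value-translation section might sit alongside the source data in the "dataContext"; the key names ("columnMapping", "valueMapping") and the GloBI-side column names are illustrative placeholders, not an existing GloBI schema:

```
{
  "dataContext": {
    "sourceDataHeader": "...",
    "sourceData": "...",
    // illustrative: source column -> GloBI column
    "columnMapping": {
      "1,10.collectingevent.startDate": "eventDate",
      "1,9-determinations,4.taxon.Genus": "targetTaxonGenusName"
    },
    // illustrative: source value -> term GloBI understands
    "valueMapping": {
      "host of": "RO:123455"
    }
  }
}
```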


jhpoelen commented Nov 18, 2020

So, the current dataContext in the review would consist of four parts (roughly sketched below):

  1. original raw source data (e.g., by reference: gz:hash://sha256/fc5d26b3cd96656bbaa8a2f19f6ae4914b9fbdc884e84d45744ad69db2874514!/b345-397)
  2. parsed raw source data (no translations/mappings applied yet)
  3. mappings / schemas used
  4. the mapped / translated data
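
Putting these four parts together, the dataContext for a single record might look roughly like this; the key names and the example values are placeholders for discussion, not a settled format:

```
{
  "dataContext": {
    // 1. original raw source data, by value or by reference
    "sourceDataReference": "gz:hash://sha256/fc5d26b3cd96656bbaa8a2f19f6ae4914b9fbdc884e84d45744ad69db2874514!/b345-397",
    // 2. parsed raw source data, no translations/mappings applied yet
    "parsedSourceData": { "1,10.collectingevent.startDate": "07/08/1931" },
    // 3. mappings / schemas used
    "mappingSchema": { "1,10.collectingevent.startDate": "eventDate" },
    // 4. the mapped / translated data (illustrative value normalization)
    "mappedData": { "eventDate": "1931-07-08" }
  }
}
```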
