marp | theme | paginate | license | title | author
---|---|---|---|---|---
true | marp-theme_dataplant-ceplas-ccby | true | [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/) | Metadata and ISA |
<style scoped> section { text-align: center; background: #F9CD69; } section::after { display: none; } footer { display: none; } </style>
Exercise: Take 5 minutes to note down the metadata
<style scoped> section { text-align: justify; } </style>

Viola investigates the effect of the plant circadian clock on sugar metabolism in W. mirabilis. For her PhD project, which is part of an EU-funded consortium in Prof. Beetroot's lab, she acquires seeds from a South African botanical society. Viola grows the plants under different light regimes, harvests leaves from a two-day time series experiment, extracts polar metabolites as well as RNA and submits the samples to nearby core facilities for metabolomics and transcriptomics measurements, respectively. After a few weeks of iterative consultation with the facilities' heads as well as technicians and computational biologists involved, Viola receives back a wealth of raw and processed data. From the data she produces figures and wraps everything up to publish the results in the Journal of Wonderful Plant Sciences.
<style scoped> .columns { display: grid; grid-template-columns: repeat(3, minmax(0, 1fr)); gap: 1rem; } ul { margin: 5; padding: 0; } </style>
- researcher
- institute and project
- biological context
- research question
- purpose of data collection
- ...
- origin and nature of the biological material
- lab protocols
- instrument model
- ...
<style scoped> .columns { display: grid; grid-template-columns: repeat(3, minmax(0, 1fr)); gap: 1rem; } ul { margin: 5; padding: 0; } </style>
<style scoped> .columns { display: grid; grid-template-columns: repeat(2, minmax(0, 1fr)); gap: 4rem; } </style>
Findable
- metadata names the content of the data
- basis for search engines
- makes it categorizable for people and machines
Accessible
- information about origin
- location of storage
- access rights
Interoperable
- metadata identifies software and file formats
- required conversions between file formats
Reusable
- obtain and reuse research data according to clear rules described in licenses
Examples from Minimum Information for Biological and Biomedical Investigations (MIBBI):
- MIAPPE | Minimum Information About a Plant Phenotyping Experiment https://www.miappe.org
- MIAME | Minimum Information About a Microarray Experiment https://www.fged.org/projects/miame/
- MIAPE | Minimum Information About a Proteomics Experiment https://www.psidev.info/miape
- MINSEQE | Minimum Information about a high-throughput SEQuencing Experiment https://www.fged.org/projects/minseqe
💡 Check out https://fairsharing.org/ for more examples
- Determine (minimal) required information
- Usually do not determine the format (i.e. shape or file type)
-> favorite Movie
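A minimum information standard fixes which information must be present, not how it is stored. The sketch below illustrates this with a made-up checklist applied to the same record read from JSON and from a tab-separated row. The field names and values are hypothetical and are not taken from MIAPPE, MIAME, MIAPE, or MINSEQE.

```python
import json

# Hypothetical checklist: these field names are illustrative only and do not
# come from any of the minimum information standards listed above.
REQUIRED_FIELDS = {"organism", "growth_conditions", "sampling_timepoint"}


def missing_fields(record: dict) -> set:
    """Return the required fields that are absent or empty in a record."""
    return {name for name in REQUIRED_FIELDS if not record.get(name)}


# The same (hypothetical) metadata in two formats: JSON and a TSV row.
as_json = '{"organism": "W. mirabilis", "growth_conditions": "12 h light"}'
tsv_header = "organism\tgrowth_conditions\tsampling_timepoint"
tsv_row = "W. mirabilis\t12 h light\t"

record_from_json = json.loads(as_json)
record_from_tsv = dict(zip(tsv_header.split("\t"), tsv_row.split("\t")))

# The checklist is applied to the content, regardless of the file format.
print(missing_fields(record_from_json))
print(missing_fields(record_from_tsv))
```

Both records fail the check for the same reason (no sampling time point), even though one is JSON and the other a spreadsheet-style row.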
Schema.org
- create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, ...
- Structured data can be used to mark up all kinds of items from products to events to recipes
- Communicate with search engines (-> SEO, search engine optimization)
- Enhance findability from search engine results
- Provide context to an ambiguous webpage
- Metadata interoperability and standardization across all websites using schema.org
<style scoped> code { display: inline-block; width: 700px; font-size: 18px; } </style>
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Person",
"address": {
"@type": "PostalAddress",
"addressLocality": "Seattle",
"addressRegion": "WA",
"postalCode": "98052",
"streetAddress": "20341 Whitworth Institute 405 N. Whitworth"
},
"colleague": [
"http://www.xyz.edu/students/alicejones.html",
"http://www.xyz.edu/students/bobsmith.html"
],
"email": "mailto:jane-doe@xyz.edu",
"image": "janedoe.jpg",
"jobTitle": "Professor",
"name": "Jane Doe",
"telephone": "(425) 123-4567",
"url": "http://www.janedoe.com"
}
</script>
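Markup like the Person example above can also be generated programmatically. The following is a minimal sketch using only Python's standard `json` module. The values are hypothetical: only the name "Viola", her PhD project, and Prof. Beetroot's lab come from the exercise text, and the choice of the schema.org properties `jobTitle` and `worksFor` is an assumption made for illustration.

```python
import json

# Hypothetical record based on the exercise text; the property selection
# (jobTitle, worksFor) is an illustrative choice, not a required set.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Viola",
    "jobTitle": "PhD student",
    "worksFor": {
        "@type": "Organization",
        "name": "Prof. Beetroot's lab",
    },
}

# Serialize the dictionary and wrap it in the script tag used for JSON-LD.
json_ld = json.dumps(person, indent=2)
html_snippet = f'<script type="application/ld+json">\n{json_ld}\n</script>'
print(html_snippet)
```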
<style scoped> code { display: inline-block; width: 700px; } </style>
JSON-LD = JavaScript Object Notation for Linked Data
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "SportsTeam",
"name": "San Francisco 49ers",
"member": {
"@type": "OrganizationRole",
"member": {
"@type": "Person",
"name": "Joe Montana"
},
"startDate": "1979",
"endDate": "1992",
"roleName": "Quarterback"
}
}
</script>
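Because JSON-LD is plain JSON, any JSON parser can read it. As a small sketch, the SportsTeam example above can be loaded with Python's `json` module and the nested `OrganizationRole` unwrapped to reach the person. The variable names are illustrative.

```python
import json

# The JSON-LD document from the example above, as a string.
team_json_ld = """
{
  "@context": "https://schema.org",
  "@type": "SportsTeam",
  "name": "San Francisco 49ers",
  "member": {
    "@type": "OrganizationRole",
    "member": {"@type": "Person", "name": "Joe Montana"},
    "startDate": "1979",
    "endDate": "1992",
    "roleName": "Quarterback"
  }
}
"""

team = json.loads(team_json_ld)
role = team["member"]    # the OrganizationRole describing the membership
player = role["member"]  # the Person who holds that role

summary = (
    f'{player["name"]} was {role["roleName"]} of {team["name"]} '
    f'from {role["startDate"]} to {role["endDate"]}'
)
print(summary)
```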
RDFa = Resource Description Framework in Attributes
<div vocab="http://schema.org/" typeof="SportsTeam">
<span property="name">San Francisco 49ers</span>
<div property="member" typeof="OrganizationRole">
<div property="member" typeof="http://schema.org/Person">
<span property="name">Joe Montana</span>
</div>
<span property="startDate">1979</span>
<span property="endDate">1992</span>
<span property="roleName">Quarterback</span>
</div>
</div>
https://www.dublincore.org/schemas/
- Schema: http://schema.datacite.org/meta/kernel-4.3/metadata.xsd
- Full Example: https://schema.datacite.org/meta/kernel-4.3/example/datacite-example-full-v4.xml
<style scoped> code { /*display: inline-block;*/ font-size: 12px; } </style>
...
<identifier identifierType="DOI">10.5072/D3P26Q35R-Test</identifier>
<creators>
<creator>
<creatorName nameType="Personal">Fosmire, Michael</creatorName>
<givenName>Michael</givenName>
<familyName>Fosmire</familyName>
</creator>
<creator>
<creatorName nameType="Personal">Wertz, Ruth</creatorName>
<givenName>Ruth</givenName>
<familyName>Wertz</familyName>
</creator>
<creator>
<creatorName nameType="Personal">Purzer, Senay</creatorName>
<givenName>Senay</givenName>
<familyName>Purzer</familyName>
</creator>
</creators>
<titles>
<title xml:lang="en">Critical Engineering Literacy Test (CELT)</title>
</titles>
<publisher xml:lang="en">Purdue University Research Repository (PURR)</publisher>
<publicationYear>2013</publicationYear>
<subjects>
<subject xml:lang="en">Assessment</subject>
<subject xml:lang="en">Information Literacy</subject>
<subject xml:lang="en">Engineering</subject>
<subject xml:lang="en">Undergraduate Students</subject>
<subject xml:lang="en">CELT</subject>
<subject xml:lang="en">Purdue University</subject>
</subjects>
<language>en</language>
<resourceType resourceTypeGeneral="Dataset">Dataset</resourceType>
...
https://schema.datacite.org/meta/kernel-4.3/example/datacite-example-dataset-v4.xml
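Structured metadata like the excerpt above is straightforward to read by machine. The sketch below parses a shortened copy of the excerpt with Python's standard `xml.etree.ElementTree` and builds a citation-like string. Note the assumptions: the `<resource>` wrapper is added so the fragment parses on its own, and the XML namespace declared in the full DataCite file is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the excerpt above. The <resource> wrapper is added here so
# the fragment is well-formed; the full file also declares an XML namespace,
# which this sketch omits.
fragment = """
<resource>
  <identifier identifierType="DOI">10.5072/D3P26Q35R-Test</identifier>
  <creators>
    <creator>
      <creatorName nameType="Personal">Fosmire, Michael</creatorName>
    </creator>
    <creator>
      <creatorName nameType="Personal">Wertz, Ruth</creatorName>
    </creator>
  </creators>
  <titles>
    <title xml:lang="en">Critical Engineering Literacy Test (CELT)</title>
  </titles>
  <publicationYear>2013</publicationYear>
</resource>
""".strip()

root = ET.fromstring(fragment)

# Pull out individual metadata fields by their element paths.
doi = root.find("identifier").text
creators = [name.text for name in root.findall("creators/creator/creatorName")]
title = root.find("titles/title").text
year = int(root.find("publicationYear").text)

print(f"{'; '.join(creators)} ({year}). {title}. doi:{doi}")
```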
(Sometimes also referred to as a "semantic model")
An ontology combines features of
- a dictionary,
- a taxonomy, and
- a thesaurus
Alphabetically lists terms and their definitions
Pizza: "a dish made typically of flattened bread dough spread with a savory mixture usually including tomatoes and cheese and often other toppings and baked"
Hierarchy or classification
Dictionary of synonyms and relations
Pizza ≈ Lahmacun ≈ Focaccia ≈ Flammkuchen
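How the three features come together in a single term can be sketched with a small data structure. This is an illustration only, not how ontology files are actually stored: the `Term` class, the parent term "Dish", and its definition are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    name: str
    definition: str                # dictionary: what the term means
    parent: Optional[str] = None   # taxonomy: where it sits in the hierarchy
    related: list = field(default_factory=list)  # thesaurus: similar terms


# "Dish" and its definition are hypothetical; the Pizza entry uses the
# definition and related terms from the slides above.
terms = {
    "Dish": Term("Dish", "food prepared in a particular way"),
    "Pizza": Term(
        "Pizza",
        "a dish made typically of flattened bread dough spread with a savory "
        "mixture usually including tomatoes and cheese and often other "
        "toppings and baked",
        parent="Dish",
        related=["Lahmacun", "Focaccia", "Flammkuchen"],
    ),
}


def ancestors(name: str) -> list:
    """Walk up the hierarchy from a term to its top-level parent."""
    chain = []
    parent = terms[name].parent
    while parent is not None:
        chain.append(parent)
        parent = terms[parent].parent
    return chain


print(ancestors("Pizza"))
print(terms["Pizza"].related)
```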
- Structures a set of concepts in a particular area and the relations between them in a graph-like manner
- Can be used for disambiguation, for defining hierarchies, and as a standard for defining terms
- Define a common vocabulary of concepts and their relationships to model a particular domain while making it machine-understandable
The object of one triple can be the subject of another
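The chaining of subject, predicate, and object statements can be sketched with plain tuples. The triples below are hypothetical, written in the spirit of the pizza example, and are not taken from an actual ontology file.

```python
# Hypothetical (subject, predicate, object) statements for illustration.
triples = [
    ("Margherita", "is_a", "Pizza"),
    ("Margherita", "has_topping", "Mozzarella"),
    ("Mozzarella", "is_a", "Cheese"),
    ("Cheese", "is_a", "Food"),
]

# Terms that appear as the object of one statement and the subject of another
# are what link individual statements into a graph.
subjects = {s for s, _, _ in triples}
linking_terms = sorted({o for _, _, o in triples if o in subjects})
print(linking_terms)


def follow(start: str, predicate: str) -> list:
    """Follow one predicate from a term for as long as the chain continues."""
    path = [start]
    while True:
        targets = [o for s, p, o in triples if s == path[-1] and p == predicate]
        if not targets or targets[0] in path:  # stop at the end or on a cycle
            return path
        path.append(targets[0])


print(follow("Mozzarella", "is_a"))
```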
An ontology can be queried:
- "name all pizzas with topping mushrooms"
- Example from Protégé: https://protege.stanford.edu/ontologies/pizza/pizza.owl
- Visualize via WebVOWL http://vowl.visualdataweb.org/webvowl.html
- Description: http://edamontology.org/page
- Browser: https://edamontology.github.io/edam-browser
- Human-readable: https://www.ebi.ac.uk/ols/ontologies/peco
- Raw (OWL): http://purl.obolibrary.org/obo/peco.owl
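The query mentioned earlier, "name all pizzas with topping mushrooms", can be sketched as a pattern match over triples. Real ontologies are typically queried with a dedicated language such as SPARQL. The plain-Python version below only illustrates the idea, and its triples are hypothetical rather than taken from pizza.owl.

```python
# Hypothetical statements; names are illustrative and not read from pizza.owl.
triples = [
    ("Funghi", "is_a", "Pizza"),
    ("Funghi", "has_topping", "Mushroom"),
    ("Funghi", "has_topping", "Tomato"),
    ("Margherita", "is_a", "Pizza"),
    ("Margherita", "has_topping", "Tomato"),
    ("Mushroom", "is_a", "Topping"),
]


def pizzas_with_topping(topping: str) -> list:
    """Return all terms that are a Pizza and have the given topping."""
    pizzas = {s for s, p, o in triples if p == "is_a" and o == "Pizza"}
    return sorted(
        s for s, p, o in triples
        if p == "has_topping" and o == topping and s in pizzas
    )


print(pizzas_with_topping("Mushroom"))
print(pizzas_with_topping("Tomato"))
```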
Explore more examples
https://isa-tools.org/format/specification.html
Output building blocks:
- Sample Name
- Raw Data File
- Derived Data File
- Low-friction metadata annotation
- Familiar spreadsheet, row/column-based environment
- Swate can be used for the annotation of isa.study.xlsx and isa.assay.xlsx files
<style scoped> section{ font-size: 25px } </style>
- Source Name (Input)
- Protocol Columns
- Protocol Type, Protocol Ref
- Characteristic
- Parameter
- Factor
- Component
- Output Columns
- Sample Name, Raw Data File, Derived Data File
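To make the building blocks above concrete, the sketch below assembles a small annotation table as tab-separated text. The bracketed header terms (for example `Factor [light regime]`) and all cell values are hypothetical, loosely based on Viola's experiment. The exact header syntax expected by a given tool is an assumption here and should be checked against that tool's documentation.

```python
import csv
import io

# One input column, protocol columns, and one output column.
# Bracketed terms and all values are hypothetical.
columns = [
    "Source Name",
    "Protocol Type",
    "Protocol Ref",
    "Characteristic [organism]",
    "Parameter [growth medium]",
    "Factor [light regime]",
    "Sample Name",
]
rows = [
    ["plant_01", "plant growth", "growth_protocol.md",
     "W. mirabilis", "soil", "long day", "leaf_01"],
    ["plant_02", "plant growth", "growth_protocol.md",
     "W. mirabilis", "soil", "short day", "leaf_02"],
]

# Write the header and rows as a tab-separated table, one row per sample.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(columns)
writer.writerows(rows)
table = buffer.getvalue()
print(table)

# Factor columns hold the variables that differ between samples by design.
factor_columns = [c for c in columns if c.startswith("Factor")]
print(factor_columns)
```

Each row maps one input (Source Name) to one output (Sample Name), with the columns in between describing how the output was produced.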
Let's take a detour on Annotation Principles | slides
<style scoped> h1{ text-align: left } section { text-align: center; } </style>
Enable related-term directed search to fill cells directly with child terms
Metadata standards or repository requirements can be represented as templates
<style scoped> h1{ text-align: left } section { text-align: center; } </style>

Facilities can define their most common workflows as templates
<style scoped> h1{ text-align: left } section { text-align: center; } </style>

- DataPLANT curated
- Community templates
Slides presented here include contributions by
- name: Dominik Brilhaus github: https://github.com/brilator orcid: https://orcid.org/0000-0001-9021-3197
- name: Martin Kuhl github: https://github.com/Martin-Kuhl orcid: https://orcid.org/0000-0002-8493-1077