marp: true
theme: marp-theme_dataplant-ceplas-ccby
paginate: true
license: [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/)
title: Metadata and ISA
author:
- name: Dominik Brilhaus github: https://github.com/brilator orcid: https://orcid.org/0000-0001-9021-3197
- name: Martin Kuhl github: https://github.com/Martin-Kuhl orcid: https://orcid.org/0000-0002-8493-1077

Metadata and ISA

What is metadata?

Viola's PhD Project

Exercise: Take 5 minutes to note down the metadata

Viola investigates the effect of the plant circadian clock on sugar metabolism in W. mirabilis. For her PhD project, which is part of an EU-funded consortium in Prof. Beetroot's lab, she acquires seeds from a South-African botanical society. Viola grows the plants under different light regimes, harvests leaves from a two-day time series experiment, extracts polar metabolites as well as RNA and submits the samples to nearby core facilities for metabolomics and transcriptomics measurements, respectively. After a few weeks of iterative consultation with the facilities' heads as well as technicians and computational biologists involved, Viola receives back a wealth of raw and processed data. From the data she produces figures and wraps everything up to publish the results in the Journal of Wonderful Plant Sciences.

Metadata everywhere


Project metadata

project design

researcher
institute and project
biological context
research question
purpose of data collection
...

experimental processes

origin and nature of the biological material
lab protocols
instrument model
...

data-analytical processes

algorithms
tools
software versions and dependencies employed
...

Other types of metadata

bibliographic

Title
Publication date and title
Description
Author
Contacts
Keywords
...

legal or administrative

data origin, ownership, provenance
licensing
ethical aspects
...

technical

expected data volume
storage location
file formats
...

Metadata from a FAIR perspective

Findable

metadata names the content of the data
basis for search engines
makes it categorizable for people and machines

Accessible

information about origin
location of storage
access rights

Interoperable

metadata identifies software and file formats
required conversions between file formats

Reusable

obtain and reuse research data according to clear rules described in licenses

Metadata "Standards"

Examples from Minimum Information for Biological and Biomedical Investigations (MIBBI):

MIAPPE | Minimum Information About a Plant Phenotyping Experiment https://www.miappe.org
MIAME | Minimum Information About a Microarray Experiment https://www.fged.org/projects/miame/
MIAPE | Minimum Information About a Proteomics Experiment https://www.psidev.info/miape
MINSEQE | Minimum Information about a high-throughput SEQuencing Experiment https://www.fged.org/projects/minseqe

💡 Check out https://fairsharing.org/ for more examples

Metadata standards ≈ Checklists

Determine (minimal) required information
Usually do not determine the format (i.e. shape or file type)
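
Because a checklist only names the required information, it can be checked mechanically regardless of file type. The following is a minimal sketch; the field names in `REQUIRED_FIELDS` are hypothetical and are not taken from MIAPPE or any other published standard.

```python
# Hypothetical minimal checklist; the field names are illustrative only
# and do not come from any published standard.
REQUIRED_FIELDS = {"organism", "growth_conditions", "instrument_model"}


def missing_fields(record: dict) -> set:
    """Return the required fields that are absent or empty in a record."""
    return {
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field, "")).strip()
    }


# A record with one field missing and one field left empty.
record = {"organism": "W. mirabilis", "instrument_model": ""}
print(sorted(missing_fields(record)))
```

The record could equally come from a spreadsheet, a JSON file, or a form; the checklist only cares that the information is present.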

A small interactive detour

-> favorite movie

How does Google "know"?!

Schemas and machine-readability

Structured data and the internet

Schema.org

Aims to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, ...
Structured data can be used to mark up all kinds of items from products to events to recipes
Communicate with search engines (-> SEO, search engine optimization)
Enhance findability in search engine results
Provide context to an ambiguous webpage
Metadata interoperability and standardization across all websites using schema.org

Structured data and the internet: Schema.org

https://schema.org/Person

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Seattle",
    "addressRegion": "WA",
    "postalCode": "98052",
    "streetAddress": "20341 Whitworth Institute 405 N. Whitworth"
  },
  "colleague": [
    "http://www.xyz.edu/students/alicejones.html",
    "http://www.xyz.edu/students/bobsmith.html"
  ],
  "email": "mailto:jane-doe@xyz.edu",
  "image": "janedoe.jpg",
  "jobTitle": "Professor",
  "name": "Jane Doe",
  "telephone": "(425) 123-4567",
  "url": "http://www.janedoe.com"
}
</script>

JSON-LD

JSON-LD = JavaScript Object Notation for Linked Data

<script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "SportsTeam",
    "name": "San Francisco 49ers",
    "member": {
      "@type": "OrganizationRole",
      "member": {
        "@type": "Person",
        "name": "Joe Montana"
      },
      "startDate": "1979",
      "endDate": "1992",
      "roleName": "Quarterback"
    }
  }
</script>
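
Since JSON-LD is ordinary JSON, any JSON parser can read it. The sketch below uses only Python's standard `json` module on the payload above (without the `<script>` wrapper). It treats the keys as plain strings and does not resolve `@context` the way a full JSON-LD processor would.

```python
import json

# The JSON-LD payload from the SportsTeam example, without the <script> wrapper.
doc = json.loads("""
{
  "@context": "https://schema.org",
  "@type": "SportsTeam",
  "name": "San Francisco 49ers",
  "member": {
    "@type": "OrganizationRole",
    "member": {"@type": "Person", "name": "Joe Montana"},
    "startDate": "1979",
    "endDate": "1992",
    "roleName": "Quarterback"
  }
}
""")

# The outer "member" is the role; the inner "member" is the person.
role = doc["member"]
summary = (
    f'{role["member"]["name"]} was {role["roleName"]} of {doc["name"]} '
    f'from {role["startDate"]} to {role["endDate"]}'
)
print(summary)
```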

RDFa

RDFa = Resource Description Framework in Attributes

<div vocab="http://schema.org/" typeof="SportsTeam">
  <span property="name">San Francisco 49ers</span>
  <div property="member" typeof="OrganizationRole">
    <div property="member" typeof="http://schema.org/Person">
      <span property="name">Joe Montana</span>
    </div>
    <span property="startDate">1979</span>
    <span property="endDate">1992</span>
    <span property="roleName">Quarterback</span>
  </div>
</div>
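
The same information can be read back out of the RDFa attributes. The sketch below is a deliberately flat listing of `property` attributes and their text, built on Python's standard `html.parser`. It is not a conforming RDFa processor: it ignores `vocab`, `typeof`, and the nesting that links the person to the role.

```python
from html.parser import HTMLParser

html = """
<div vocab="http://schema.org/" typeof="SportsTeam">
  <span property="name">San Francisco 49ers</span>
  <div property="member" typeof="OrganizationRole">
    <div property="member" typeof="http://schema.org/Person">
      <span property="name">Joe Montana</span>
    </div>
    <span property="startDate">1979</span>
    <span property="endDate">1992</span>
    <span property="roleName">Quarterback</span>
  </div>
</div>
"""


class PropertyCollector(HTMLParser):
    """Collect (property, text) pairs from elements that hold plain text."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # property of the most recently opened tag

    def handle_starttag(self, tag, attrs):
        self._current = dict(attrs).get("property")

    def handle_data(self, data):
        text = data.strip()
        if text and self._current:
            self.pairs.append((self._current, text))

    def handle_endtag(self, tag):
        self._current = None


collector = PropertyCollector()
collector.feed(html)
for prop, text in collector.pairs:
    print(prop, "=", text)
```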

Standards

Dublin Core

https://www.dublincore.org/schemas/

DataCite Schema

Schema: http://schema.datacite.org/meta/kernel-4.3/metadata.xsd
Full Example: https://schema.datacite.org/meta/kernel-4.3/example/datacite-example-full-v4.xml

DataCite Schema: Simple Example

...
  <identifier identifierType="DOI">10.5072/D3P26Q35R-Test</identifier>
  <creators>
    <creator>
      <creatorName nameType="Personal">Fosmire, Michael</creatorName>
      <givenName>Michael</givenName>
      <familyName>Fosmire</familyName>
    </creator>
    <creator>
      <creatorName nameType="Personal">Wertz, Ruth</creatorName>
      <givenName>Ruth</givenName>
      <familyName>Wertz</familyName>
    </creator>
    <creator>
      <creatorName nameType="Personal">Purzer, Senay</creatorName>
      <givenName>Senay</givenName>
      <familyName>Purzer</familyName>
    </creator>
  </creators>
  <titles>
    <title xml:lang="en">Critical Engineering Literacy Test (CELT)</title>
  </titles>
  <publisher xml:lang="en">Purdue University Research Repository (PURR)</publisher>
  <publicationYear>2013</publicationYear>
  <subjects>
    <subject xml:lang="en">Assessment</subject>
    <subject xml:lang="en">Information Literacy</subject>
    <subject xml:lang="en">Engineering</subject>
    <subject xml:lang="en">Undergraduate Students</subject>
    <subject xml:lang="en">CELT</subject>
    <subject xml:lang="en">Purdue University</subject>
  </subjects>
  <language>en</language>
  <resourceType resourceTypeGeneral="Dataset">Dataset</resourceType>
...

https://schema.datacite.org/meta/kernel-4.3/example/datacite-example-dataset-v4.xml
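
A schema-based record like this is straightforward to read by machine. The sketch below parses a shortened copy of the example with Python's standard `xml.etree.ElementTree`. As simplifying assumptions, the excerpt is wrapped in a bare `<resource>` root and the XML namespace declared in the real DataCite file is left out, so the element lookups would need namespace prefixes when run against the full file.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed excerpt of the example above. The XML namespace of
# the real file is omitted here to keep the sketch short (an assumption).
xml_text = """
<resource>
  <identifier identifierType="DOI">10.5072/D3P26Q35R-Test</identifier>
  <creators>
    <creator><creatorName nameType="Personal">Fosmire, Michael</creatorName></creator>
    <creator><creatorName nameType="Personal">Wertz, Ruth</creatorName></creator>
    <creator><creatorName nameType="Personal">Purzer, Senay</creatorName></creator>
  </creators>
  <publicationYear>2013</publicationYear>
</resource>
""".strip()

root = ET.fromstring(xml_text)
doi = root.findtext("identifier")
creators = [c.findtext("creatorName") for c in root.iter("creator")]
year = root.findtext("publicationYear")
print(doi, creators, year)
```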

Ontologies

Ontology

(Sometimes also referred to as a "semantic model")

An ontology combines features of

a dictionary,
a taxonomy, and
a thesaurus

Dictionary

Alphabetically lists terms and their definitions

Pizza: "a dish made typically of flattened bread dough spread with a savory mixture usually including tomatoes and cheese and often other toppings and baked"

Taxonomy

Hierarchy or classification

Thesaurus

Dictionary of synonyms and relations

Pizza ≈ Lahmacun ≈ Focaccia ≈ Flammkuchen

Ontology

Structures a set of concepts in a particular area and the relations between them in a graph-like manner
Can be used for disambiguation, for defining hierarchies, and as a standard for defining terms
Defines a common vocabulary of concepts and their relationships to model a particular domain while making it machine-understandable

The semantic triple

Modeling a pizza menu

Predicates have two directions

Looking at the menu from a different perspective

The object of one triple can be the subject of another
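
A minimal sketch of these ideas with plain Python tuples: each fact is a (subject, predicate, object) triple, a predicate can be read in the opposite direction through its inverse, and objects can be chained as subjects of further triples. The pizza names, the predicates, and the `INVERSE` table are hypothetical and chosen only for illustration.

```python
# Hypothetical menu facts as (subject, predicate, object) triples.
triples = [
    ("Margherita", "hasTopping", "Mozzarella"),
    ("Mozzarella", "isMadeFrom", "Milk"),
]

# Each predicate can be read in the opposite direction via an inverse.
INVERSE = {"hasTopping": "isToppingOf", "isMadeFrom": "isIngredientOf"}


def invert(triple):
    """Swap subject and object and replace the predicate by its inverse."""
    subject, predicate, obj = triple
    return (obj, INVERSE[predicate], subject)


def follow(start, triples):
    """Walk the chain in which each object becomes the next subject.

    Assumes at most one outgoing triple per subject and stops on cycles.
    """
    next_of = {s: o for s, _, o in triples}
    path = [start]
    while path[-1] in next_of and next_of[path[-1]] not in path:
        path.append(next_of[path[-1]])
    return path


print(invert(triples[0]))
print(follow("Margherita", triples))
```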

(Towards) a knowledge graph

Searching the menu

An ontology can be queried:

"name all pizzas with topping mushrooms"

The Pizza Ontology

Example from Protégé: https://protege.stanford.edu/ontologies/pizza/pizza.owl
Visualize via WebVOWL http://vowl.visualdataweb.org/webvowl.html
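
A query such as "name all pizzas with topping mushrooms" can be sketched as pattern matching over triples. The menu below is hypothetical and is not extracted from pizza.owl. Ontologies are usually queried with a dedicated language such as SPARQL, so the `match` helper here is only a stand-in for the idea.

```python
# Hypothetical menu; pizza and topping names are made up for illustration.
triples = [
    ("Margherita", "hasTopping", "Tomato"),
    ("Margherita", "hasTopping", "Mozzarella"),
    ("Funghi", "hasTopping", "Tomato"),
    ("Funghi", "hasTopping", "Mushrooms"),
    ("Capricciosa", "hasTopping", "Mushrooms"),
    ("Capricciosa", "hasTopping", "Ham"),
]


def match(pattern, triples):
    """Return the triples that match a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


# "Name all pizzas with topping mushrooms": the subject is the unknown.
pizzas = sorted(s for s, _, _ in match((None, "hasTopping", "Mushrooms"), triples))
print(pizzas)
```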

Example ontologies

EDAM ontology

Description: http://edamontology.org/page
Browser: https://edamontology.github.io/edam-browser

PECO ontology

Human-readable: https://www.ebi.ac.uk/ols/ontologies/peco
Raw (OWL): http://purl.obolibrary.org/obo/peco.owl

Explore more examples

https://www.ebi.ac.uk/ols/

https://bioportal.bioontology.org

ARC builds on ISA

https://isa-tools.org/format/specification.html

ARC builds on ISA

isa.<>.xlsx files within ARCs

Study and assay files are registered in the investigation file

The output of a study or assay file can function as input for a new isa.assay.xlsx

Output building blocks:

Sample Name
Raw Data File
Derived Data File
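
The chaining of study and assay can be sketched as a lookup between two tables. All rows, identifiers, and file names below are hypothetical, and the `Input` key is only a placeholder for whichever input column the assay sheet uses.

```python
# Hypothetical rows; the identifiers and file names are made up.
study = [  # Source Name -> Sample Name
    {"Source Name": "plant_01", "Sample Name": "leaf_01"},
    {"Source Name": "plant_02", "Sample Name": "leaf_02"},
]
assay = [  # the study's Sample Name is the assay's input
    {"Input": "leaf_01", "Raw Data File": "leaf_01.fastq.gz"},
    {"Input": "leaf_02", "Raw Data File": "leaf_02.fastq.gz"},
]

# Index the study by its output so that assay inputs can be looked up.
source_of = {row["Sample Name"]: row["Source Name"] for row in study}

# Trace every raw data file back to (source, sample).
provenance = {
    row["Raw Data File"]: (source_of[row["Input"]], row["Input"])
    for row in assay
}
print(provenance["leaf_01.fastq.gz"])
```

Because the output of one table is the input of the next, a raw data file can be traced back to the biological source it came from.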

Swate

Annotation by flattening the knowledge graph

Low-friction metadata annotation
Familiar spreadsheet, row/column-based environment
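
Flattening can be pictured as pivoting triples into a table: subjects become rows, predicates become columns, and objects become cell values. This is a minimal sketch with hypothetical sample names and values; it is not how Swate is implemented.

```python
# Hypothetical annotation triples: (sample, property, value).
triples = [
    ("leaf_01", "organism", "W. mirabilis"),
    ("leaf_01", "light regime", "long day"),
    ("leaf_02", "organism", "W. mirabilis"),
    ("leaf_02", "light regime", "short day"),
]

# Columns are the distinct predicates, in order of first appearance.
columns = list(dict.fromkeys(p for _, p, _ in triples))

# One row per subject; each predicate becomes a cell in that row.
rows = {}
for subject, predicate, value in triples:
    rows.setdefault(subject, {})[predicate] = value

table = [["sample", *columns]]
for subject, cells in rows.items():
    table.append([subject, *(cells.get(c, "") for c in columns)])

for line in table:
    print("\t".join(line))
```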

Annotation principle

Low-friction metadata annotation
Familiar spreadsheet, row/column-based environment

Adding new building blocks (columns)

Swate can be used for the annotation of isa.study.xlsx and isa.assay.xlsx files

Annotation Building Block types

Source Name (Input)
Protocol Columns
- Protocol Type, Protocol Ref
Characteristic
Parameter
Factor
Component
Output Columns
- Sample Name, Raw Data File, Derived Data File

Let's take a detour on Annotation Principles | slides

Ontology term search

Enable related-term directed search to fill cells directly with child terms

Fill your table with ontology terms

Hierarchical combination of ontologies

Swate templates

Checklists and Templates

Metadata standards or repository requirements can be represented as templates

Realization of lab-specific metadata templates

Facilities can define their most common workflows as templates

Directly import templates via Swate

DataPLANT curated
Community templates

Contributors

Slides presented here include contributions by

name: Dominik Brilhaus github: https://github.com/brilator orcid: https://orcid.org/0000-0001-9021-3197
name: Martin Kuhl github: https://github.com/Martin-Kuhl orcid: https://orcid.org/0000-0002-8493-1077
