Project 26: Literature Biocuration Practices and Guidelines

Abstract

Data resources, including ELIXIR Core Data Resources, commonly link to the scientific literature as the gateway to scientific evidence, in support of the curation effort. The granularity of annotation is often the whole article, identified by a PMID or DOI: a paper is cross-referenced from a database entry or from some sub-annotations of that entry.

Within the entry, this cross-reference is often described with terminological or ontological descriptors (e.g., molecular functions, cellular locations, diseases, traits). Only in rare cases is a finer granularity of annotation available, e.g., sentence-level annotations in GeneRIF.

In parallel, literature services (e.g., JATS), text mining communities (e.g., BioC), and specific biological communities (e.g., TaxPub for taxonomic treatments) have developed standards that attempt to capture annotations directly in the published text or as a supplement to it. While annotations such as accession numbers are trivially captured by literature services (e.g., Europe PMC, SIBiLS), more structured evidence (e.g., named entities or relationships between entities) remains challenging for both curation-support and text mining pipelines. Further, non-textual publication materials (e.g., supplementary data files) have been less used by both the curation and publication communities because exploration tools and standards for processing these files are lacking.
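To illustrate why accession numbers count as the easy case, the following minimal Python sketch finds candidate UniProtKB accessions in a sentence and records their character offsets, which is the kind of span information a sentence-level annotation needs. It assumes the accession pattern published in the UniProt help pages; the function name `find_accessions` and the example sentence are hypothetical, and this is not a description of how Europe PMC or SIBiLS actually implement their pipelines.

```python
import re

# UniProtKB accession pattern (6 or 10 characters), as documented by UniProt.
# A pattern match is only a candidate: other strings can share this shape,
# so a real pipeline would still validate hits against the database.
UNIPROT_ACCESSION = re.compile(
    r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
)


def find_accessions(text):
    """Return a list of (accession, start, end) tuples for each candidate."""
    return [
        (match.group(0), match.start(), match.end())
        for match in UNIPROT_ACCESSION.finditer(text)
    ]


if __name__ == "__main__":
    # Hypothetical example sentence, used only to exercise the function.
    sentence = "Mutations in BRCA1 (UniProtKB P38398) impair DNA repair."
    for accession, start, end in find_accessions(sentence):
        print(accession, start, end)
```

A single regular expression is enough here because accession numbers follow a fixed syntax. Named entities and relationships between entities have no such syntax, which is one reason they remain harder to capture.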

More information

Project plan

O1: to produce a landscape analysis of data resources and services;
O2: to enhance existing standards (e.g., JATS) to better capture curated evidence from the literature;
O3: to explore how literature and crediting services (e.g., APICURON) can benefit from these new standardization efforts.

O1 should be delivered by the end of the Hackathon. O2 should be mostly (~80%) completed by the end of the Hackathon. O3 will benefit from the Hackathon’s prototyping effort and could be completed within a year or in a future hackathon.

Timeline: A possible non-linear timeline: landscape analysis [2 days], standards development [2 days], prototyping [2 days].

Level of expertise and population: We expect balanced contributions from two types of profiles: biocurators (N=3-5) and data/software developers (N=3-5).

Methods: We plan to alternate between focus group meetings and rapid application development (RAD) phases, all coordinated by senior DevOps engineers (Mihail Anton/SciLifeLab and Alexandre Flament/SIB).

Lead(s)

Silvio Tosatto, Ulrike Wittig, Mihail Anton
