- BioHackRxiv Report, 2024-01-30
Bioschemas is a community-based effort providing specifications, tools and training on how to add structured markup to web pages in the Life Sciences. Although Bioschemas' main aim is to facilitate findability, it also provides an initial interoperability layer. To achieve this, Bioschemas recommends the use of EDAM terms and other ontologies wherever appropriate. Bioschemas has been successfully implemented across different communities inside and outside ELIXIR; the plant and chemistry communities are two examples.
In this project, we want to create a knowledge graph of plant resources and extract a resource index showing the use of ontology terms. This preliminary analysis will allow us to better understand how Bioschemas markup is currently used in these two communities, so that we can take action to improve the markup and therefore interoperability across resources. Because the markup is extracted with the FAIR Checker tool, we will also obtain a FAIR assessment of the Bioschemas markup and of its validity with respect to the Bioschemas profiles. We will also gain insight into how external vocabularies are used and whether that use is consistent, which is a further opportunity for improvement.
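As an illustration of what a resource index of ontology term usage could look like, the following minimal Python sketch counts, per resource, how many values in its JSON-LD markup fall under a known vocabulary namespace. It is not the project's pipeline: the page URLs, the markup content, the term identifiers and the `KNOWN_VOCABULARIES` mapping are all assumptions made for the example, and a real implementation would more likely query the knowledge graph directly.

```python
import json
from collections import Counter

# Assumed namespace prefixes; extend this mapping for the vocabularies of interest.
KNOWN_VOCABULARIES = {
    "http://edamontology.org/": "EDAM",
    "http://purl.obolibrary.org/obo/": "OBO",
}

# Hypothetical harvested markup: page URL -> JSON-LD string (illustrative values only).
HARVESTED = {
    "https://example.org/resource-a": json.dumps({
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": "Example dataset A",
        "about": [
            {"@id": "http://edamontology.org/topic_0780"},
            {"@id": "http://purl.obolibrary.org/obo/PO_0009011"},
        ],
    }),
    "https://example.org/resource-b": json.dumps({
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": "Example dataset B",
        "keywords": ["plants", "metabolomics"],
        "url": "https://example.org/resource-b",
    }),
}


def iter_strings(node):
    """Yield every string value in a JSON-LD structure, skipping @context."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key != "@context":
                yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)
    elif isinstance(node, str):
        yield node


def vocabulary_of(value):
    """Return the vocabulary name for a term URI, or None if it is not recognised."""
    for prefix, name in KNOWN_VOCABULARIES.items():
        if value.startswith(prefix):
            return name
    return None


def build_index(harvested):
    """Map each resource URL to a Counter of the vocabularies its markup uses."""
    index = {}
    for url, raw in harvested.items():
        counts = Counter()
        for value in iter_strings(json.loads(raw)):
            vocabulary = vocabulary_of(value)
            if vocabulary is not None:
                counts[vocabulary] += 1
        index[url] = counts
    return index


index = build_index(HARVESTED)
for url, counts in index.items():
    print(url, dict(counts))
```

In this made-up input, the second resource describes itself only with free-text keywords, so its entry in the index is empty. Resources of that kind are the ones where markup could be improved by adding ontology terms.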
This project builds upon previous work done by the IDP Community in aggregating Bioschemas markup and constructing the IDP knowledge graph, which served as the basis for the IDPcentral registry. Results will also be helpful for the Bioschemas community as a proof-of-concept approach on Bioschemas consumption.
The short-term goal during the hackathon is to assess the available markup and implement the pipeline, while the longer-term goal is to engage the communities in maintenance and increased adoption. Incremental prototyping will be performed on the data provider, harvesting and knowledge graph sides.
During the BioHackathon, participants will become familiar with knowledge graphs, such as those used by the FAIR Checker. Selected analyses will be implemented, including assessments of resource FAIRness, use of external ontologies, and consistency checks (e.g., a data catalog should contain at least one dataset).
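The consistency check given as an example above (a data catalog should contain at least one dataset) can be sketched in plain Python over a flat list of JSON-LD nodes. This is only a sketch under stated assumptions: the node identifiers and content are invented, the input is assumed to be already harvested and flattened, and in the project itself such a check would probably be written as a SPARQL query against the knowledge graph. The schema.org properties `dataset` (on `DataCatalog`) and `includedInDataCatalog` (on `Dataset`) are used to detect the link in either direction.

```python
# Hypothetical, already-flattened JSON-LD nodes (illustrative content only).
NODES = [
    {
        "@id": "https://example.org/catalog-a",
        "@type": "DataCatalog",
        "dataset": [{"@id": "https://example.org/dataset-1"}],
    },
    {
        "@id": "https://example.org/catalog-b",
        "@type": "DataCatalog",
    },
    {
        "@id": "https://example.org/catalog-c",
        "@type": "DataCatalog",
    },
    {
        "@id": "https://example.org/dataset-2",
        "@type": "Dataset",
        "includedInDataCatalog": {"@id": "https://example.org/catalog-b"},
    },
]


def as_list(value):
    """Normalise a JSON-LD value that may be missing, single, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def has_type(node, type_name):
    """True if the node declares the given type (single value or list)."""
    return type_name in as_list(node.get("@type"))


def reference_id(value):
    """Return the identifier of a reference given as {'@id': ...} or a plain string."""
    if isinstance(value, dict):
        return value.get("@id")
    if isinstance(value, str):
        return value
    return None


def catalogs_without_datasets(nodes):
    """Return the @id of every DataCatalog that no dataset is linked to."""
    # Catalogs referenced from the dataset side via includedInDataCatalog.
    linked = set()
    for node in nodes:
        if has_type(node, "Dataset"):
            for catalog in as_list(node.get("includedInDataCatalog")):
                catalog_id = reference_id(catalog)
                if catalog_id:
                    linked.add(catalog_id)

    failing = []
    for node in nodes:
        if not has_type(node, "DataCatalog"):
            continue
        lists_datasets = bool(as_list(node.get("dataset")))
        if not lists_datasets and node.get("@id") not in linked:
            failing.append(node.get("@id"))
    return failing


failing = catalogs_without_datasets(NODES)
print("Catalogs without any dataset:", failing)
```

Checking both directions matters because a provider may link datasets to the catalog from either side; a check that looks only at `dataset` would wrongly flag the second catalog in this example.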
The project requires a minimum of 4 participants, ideally 6 or more. Participants should have knowledge of one or more of the following areas: JSON-LD/Bioschemas markup, knowledge graphs, SPARQL, Python. At least one co-lead will be on-site and one online, with at least two developers and support for SPARQL queries. Minimal support from FAIR Checker developers is also sought.
Steffen Neumann, Daniel Arend, Ivan Mičetić