Machine-Actionable Data Interoperability for Chemical Sciences (MADICES): Bridging experiments, simulations, and machine learning for spectral data
Recent advances in the computational sciences allow us to simulate many spectra (e.g., X-ray absorption, infrared/Raman, NMR) in silico. In principle, this could open up unprecedented possibilities for the interpretation of experimental data. Experimental data, however, comes in many different and often undocumented or proprietary formats. Recent efforts record such data in electronic lab notebooks and archive it in open data formats, which helps automate the capture of crucial metadata. However, most of these lab notebooks offer no mechanism to exchange data with one another, let alone with our simulation tools, and exporting data from them typically requires yet another lossy conversion to some chosen file format.
Standardization is an arduous process, and for a sufficiently broad domain it is infeasible. Nevertheless, without significant effort, there is a danger that we will not escape the local minimum of three-star ("★★★/★★★★★") linked open data, as defined in Tim Berners-Lee's five-star scheme. For the interoperability between experimental and computational data, there is the additional difficulty that computational systems are completely described, idealized systems with implicit assumptions, whereas for experimental systems many parameters are ill-defined, unknown, or uncertain. Moreover, we often lack a link between the spectral data and the (meta)data that contextualise the sample and its history.
How and where can we be interoperable in this setting? How can we ensure, from the bottom up, that experimental data can readily be consumed by computational tools, and vice versa? How can we share, contextualise and disseminate analyses (e.g., post-processing, peak assignment) in a reproducible way on platforms such as Materials Cloud or Chemotion? What new paradigms could such interoperability enable?
At the CECAM MADICES 2022 workshop, we will bring together developers, scientists and data specialists to discuss the hurdles and opportunities of data interoperability in the chemical and materials sciences. We will aim to formulate general technical recommendations, using X-ray absorption spectroscopy as the first prototype use case. The goals of the workshop are:
- To prepare a perspective paper providing guidelines and recommendations for new projects in our field that work with data.
  - For example, how best to prepare and disseminate datasets, databases and APIs so that they are interoperable with existing projects.
  - We will focus on X-ray absorption spectroscopy as a concrete example of a domain where the science depends on the sample together with its context/metadata and its relations to other samples and to computational experiments (e.g., peak assignment).
- To identify particular challenges that can be overcome through new collaborative software libraries/infrastructure, and to motivate their creation in follow-up meetings/hackathons.
  - For example, services that aid the discoverability of apps/schemas, and ways of determining whether two apps are interoperable.
  - For example, strategies, accessible to non-experts, for progressively adopting linked open data principles.
- To motivate follow-up meetings and hackathons, and to foster new cross-initiative collaborations.
- Shyam Dwaraknath
- Matthew Evans
- Sebastiaan Huber
- Kevin Maik Jablonka
- Stefan Kuhn
- Carlo Pignedoli