This repository contains the code for the datalab data management system, targeted (broadly) at materials chemistry labs but with customisability and extensibility in mind.
The main aim of datalab is to provide a platform for capturing the significant amounts of long-tail experimental data and metadata produced in a typical lab, and to enable storage, filtering and future re-use of that data by humans and machines. The platform provides researchers with a way to record sample- and cell-specific metadata, attach and sync raw data from instruments, and analyse and visualise data from many characterisation techniques in the browser (XRD, NMR, electrochemical cycling, TEM, TGA, Mass Spec, Raman). Importantly, datalab stores a network of interconnected research objects in the lab, such that individual pieces of data are stored with the context needed to make them scientifically useful.
The system was originally developed in and is currently deployed for the Grey Group in the Department of Chemistry at the University of Cambridge.
datalab consists of two main components:
- a Flask-based Python web server (pydatalab) that communicates with a MongoDB database backend and can perform simple analysis and ETL of particular data types,
- a Vue 3 web application for a GUI that can be used to record information on samples alongside raw data files and analysis documents.
The main features of datalab are:
- A REST API for accessing data and analysis related to chemical samples, inventory and their connections, with ergonomic access provided via the datalab Python API.
- OAuth2-based user authentication via GitHub or ORCID and simple user role management.
- Real-time data streaming and syncing with remote data sources (e.g., instrumentation, archives and file stores).
- A simple, intuitive UI for recording sample-based metadata and relationships with other samples (batches, derivatives, etc.), alongside synthesis parameters and raw data.
- Basic analysis and plotting of live and archived data attached to a sample, e.g., characterisation via XRD or NMR, electrochemical cycling data and images (see "Data blocks" section for a complete list).
- Interactive network visualisation of the connections between samples and inventory.
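To make the idea of connected samples and inventory more concrete, here is a minimal sketch of how such a network could be traversed. It is not datalab's implementation: the record layout (`item_id`, `type`, `relationships`), the example identifiers and the helper functions `build_graph` and `connected_items` are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical records: the field names ("item_id", "type", "relationships")
# are illustrative assumptions, not datalab's actual data model.
ITEMS = [
    {"item_id": "precursor-1", "type": "starting_material", "relationships": []},
    {"item_id": "sample-A", "type": "sample", "relationships": ["precursor-1"]},
    {"item_id": "sample-B", "type": "sample", "relationships": ["sample-A"]},
    {"item_id": "cell-1", "type": "cell", "relationships": ["sample-B"]},
    {"item_id": "sample-C", "type": "sample", "relationships": []},
]


def build_graph(items):
    """Return an undirected adjacency map built from each item's relationships."""
    graph = {item["item_id"]: set() for item in items}
    for item in items:
        for other in item["relationships"]:
            graph[item["item_id"]].add(other)
            graph.setdefault(other, set()).add(item["item_id"])
    return graph


def connected_items(graph, start, max_depth=None):
    """Breadth-first search returning {item_id: distance} for items linked to start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        # Do not expand beyond the requested number of hops.
        if max_depth is not None and distances[current] >= max_depth:
            continue
        for neighbour in sorted(graph[current]):
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    del distances[start]  # report only the other items
    return distances


graph = build_graph(ITEMS)
print(connected_items(graph, "cell-1"))
print(connected_items(graph, "cell-1", max_depth=1))
```

Under these assumptions, starting from a cell the search walks back through the samples it was made from to the original starting material, while unrelated items (here `sample-C`) are left out. Limiting `max_depth` restricts the result to the immediate neighbourhood, which is the kind of view a network visualisation might show by default.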
datalab remains under active development, and the API, data models and UI may change significantly between versions without prior notice. Where possible, breaking changes will be listed in the release notes for every pre-v1 release.
Installation, usage and deployment instructions can be found in INSTALL.md and in the online documentation.
This software is released under the conditions of the MIT license. Please see LICENSE for the full text of the license.
This software was conceived and developed by:
- Prof Joshua Bocarsly (Department of Chemistry, University of Houston, previously Department of Chemistry, University of Cambridge)
- Dr Matthew Evans (MODL-IMCN, UCLouvain & Matgenix)
with contributions and testing performed by other members of the Grey Group.
A full list of code contributions can be found on GitHub.
We are available for consultations on setting up and managing datalab deployments, as well as for collaborating on or sponsoring the addition of new features and techniques. Please contact Josh or Matthew at their academic email addresses, or join the public datalab Slack workspace.
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement 957189 (DOI: 10.3030/957189), the Battery Interface Genome - Materials Acceleration Platform (BIG-MAP), as an external stakeholder project.