This repo will contain code, notebooks and documentation for the talk titled Reproducible Exploration of Neuroimaging Data, given at JupyterCon 2020 (October 2020). The video of this talk is here, and slides are here.
Work here is exploratory in nature, and can be used as a guide for implementing similar reproducibility and data visualization pipelines with your own data. We are not currently releasing the output data used for the dashboard (though we may do so at a later date).
Our goal is to release an end-to-end interactive dashboard creation tool called SurfBoard to encompass the features in this repo. SurfBoard would take commonly produced FreeSurfer brain MRI segmentation results and create a dashboard like the one linked above for users to examine and quality check results.
If you have questions or comments, or you're interested in a demo of the neuroimaging visualization dashboard, please leave an issue or reach out to me on Twitter (@ltirrell_)!
Code in this repo has been tested using Python 3.6, and should also work with Python 3.7 and 3.8. To set up a virtual environment and install all requirements, run the following commands:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
To activate the virtual environment in future sessions, run the following from the directory where you cloned the repo:
source venv/bin/activate
The LA5c dataset was used for the analyses in this repo, and was processed with CorticoMetrics' proprietary re:THINQ software (based on FreeSurfer version 6.0).
Quilt was used for version control of both input and output data. Some notes on our Quilt usage are available here.
As part of processing, we created JPEGs of each slice of the brain MRI with the segmentation overlaid, as well as interactive HTML files for a 3D view, using this script. These JPEGs and 3D views are used in creating interactive quality control plots, and were created using Nilearn. An example walking through some steps involved is in this notebook.
We provide interactive visualizations of our results.
The first notebook, data_exploration.ipynb, contains examples of loading the results of our FreeSurfer-based analysis and creating Altair charts directly in the notebook.
The second and third notebooks (voila.ipynb and voila_full.ipynb) are meant to be rendered as Voilà dashboards.
voila.ipynb renders results from the data_exploration notebook as a dashboard.
voila_full.ipynb is similar to what is rendered at our demo website, where JPEGs and 3D brain viewers are included. Note that this notebook will not run as-is because the necessary images are not included, but it serves as an example of how such a dashboard can be created.
To view the interactive dashboard, run the following command:
voila notebooks/voila.ipynb --template=flex
The dashboard will be available at http://localhost:8866/
Data versioning is an important and often overlooked part of creating reproducible results (summarized nicely in this article, where a main thesis is model := script(code, environment, data)).
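One way to make the thesis model := script(code, environment, data) concrete is to record a single fingerprint of the three inputs alongside each result. The sketch below is an illustration only, not something taken from this repo: the function name run_fingerprint and the example version strings are hypothetical, and in practice the three values might be a Git commit, a hash of requirements.txt, and a data package hash.

```python
import hashlib


def run_fingerprint(code_version: str, environment: str, data_version: str) -> str:
    """Hash the three inputs that together determine a result."""
    digest = hashlib.sha256()
    for part in (code_version, environment, data_version):
        encoded = part.encode("utf-8")
        # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
        digest.update(len(encoded).to_bytes(8, "big"))
        digest.update(encoded)
    return digest.hexdigest()


# Hypothetical identifiers, for illustration only.
baseline = run_fingerprint("commit-abc123", "python3.6-reqs-v1", "data-v1")
new_data = run_fingerprint("commit-abc123", "python3.6-reqs-v1", "data-v2")
```

With identical code and environment, the two fingerprints still differ because the data version changed, which is the situation that goes unnoticed when only code is versioned.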
We investigated available software packages/platforms for data versioning in May 2020, with an overview of what we found in this presentation.
Based on our use case, Quilt worked out best for us. Our data is already stored in AWS S3 buckets, and Quilt provides an easy way to directly version that data. Additionally, the Catalog created on top of these datasets would be useful in sharing results with collaborators. Their docs provide more information on setup, and how to try it out.
Other candidates that we particularly liked are DVC and datalad. These treat data more like a part of a Git repo, so it is more closely stored and versioned with your code. This may be useful for other projects, but we liked treating data as a separate repository, away from code.
We decided to use Nilearn for brain viewing within the notebook after being pointed there by a project we used previously called nbpapaya. Other useful Jupyter or web-based packages for neuroimaging viewing are niwidgets and Papaya, as well as a Jupyter kernel for the Slicer viewer. We are interested in learning about other platforms, so let us know if you are aware of any!
One disadvantage we found with Nilearn was that it takes several seconds to create the 3D viewer for a brain with our use case, so it wasn't quite "interactive" if we wanted to quickly explore a lot of different images. We got around this by saving HTML versions of these viewers during the initial image processing step, and loading them from disk in our interactive charts.
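The workaround described above amounts to "render once, then load from disk". A minimal sketch of that pattern follows, under stated assumptions: cached_viewer_html, fake_render, the file naming scheme, and the subject ID are all hypothetical, and the renderer is passed in as a function so the example runs without Nilearn. In a real pipeline the slow step would presumably be a Nilearn call such as plotting.view_img(...) followed by save_as_html(...).

```python
from pathlib import Path
from typing import Callable


def cached_viewer_html(
    subject_id: str,
    cache_dir: Path,
    render: Callable[[str], str],
) -> Path:
    """Return the path to a saved HTML viewer, rendering it only if missing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    html_path = cache_dir / f"{subject_id}_viewer.html"
    if not html_path.exists():
        # Slow step: meant to run once during image processing,
        # not each time a chart is displayed.
        html_path.write_text(render(subject_id), encoding="utf-8")
    return html_path


def fake_render(subject_id: str) -> str:
    """Hypothetical stand-in for a slow 3D viewer rendering call."""
    return f"<html><body>3D view for {subject_id}</body></html>"
```

A dashboard can then call cached_viewer_html for whichever subject is selected and embed the returned file, paying the rendering cost only the first time.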
While we normally use seaborn and matplotlib for static plots, we wanted something more interactive for quality control and exploration of results. After seeing The Python Visualization Landscape talk (slides, video) from Jake VanderPlas, Altair seemed to be a good tool for the job. There's also a Data Visualization Course taught using this package to help learn how to use it.
We found that Altair (and the Vega JSON output it is built on) has good support in our Jupyter/Voilà/web-based use cases, though there are some minor limitations (such as issues with resizing charts based on browser window size). There was also a learning curve (for us) in getting up to speed in creating charts, compared to the more established matplotlib-based approaches.
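To give a sense of the JSON that sits underneath an Altair chart, here is a hand-written Vega-Lite specification for a simple scatter plot, built with only the standard library. This is a sketch, not output copied from Altair or from this repo's notebooks: the helper scatter_spec, the field names, and the data rows are made up for illustration, and the spec Altair generates for an equivalent chart will contain additional keys.

```python
import json
from typing import Any, Dict, List

# Hypothetical rows, for illustration only (not real results).
rows: List[Dict[str, Any]] = [
    {"subject": "sub-01", "age": 25, "volume": 1.0, "group": "A"},
    {"subject": "sub-02", "age": 40, "volume": 2.0, "group": "B"},
]


def scatter_spec(data: List[Dict[str, Any]], x: str, y: str, color: str) -> Dict[str, Any]:
    """Build a minimal Vega-Lite scatter plot specification as a dict."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
        "data": {"values": data},
        "mark": "point",
        "encoding": {
            "x": {"field": x, "type": "quantitative"},
            "y": {"field": y, "type": "quantitative"},
            "color": {"field": color, "type": "nominal"},
            "tooltip": [{"field": "subject", "type": "nominal"}],
        },
    }


spec = scatter_spec(rows, x="age", y="volume", color="group")
spec_json = json.dumps(spec, indent=2)  # JSON text a Vega-Lite renderer can consume
```

Because the chart is plain JSON, the same specification can be rendered in a notebook, in a Voilà dashboard, or on a static web page.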
This work has been partially funded by the following NIH grants:
- R42CA183150
- R42AG062026