- Download the SEM/TEM/Other classifier weights and place them in classifier/SEM_TEM_Other_weights.
- Download the Particulate/Non-Particulate classifier weights and place them in classifier/Particulate_nonParticulate_weights.
- Download the figure separation weights and place them in figure-separator/data.
- Download the SRCNN weights and place them in label_scale_bar_detector/OCR/SRCNN-pytorch/weights/.
- Download the Darknet weights and place them in label_scale_bar_detector/localizer/darknet/backup.
- Download the Mask R-CNN weights and place them in particle_segmentation/Mask_RCNN/logs/tem.
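Before running anything, it can help to confirm that every weight directory listed above exists and is not empty. The following is a minimal sketch of such a check; the helper name missing_weight_dirs is hypothetical and not part of this repository. It assumes you run it from the repository root, and it only verifies that each directory contains at least one file, not that the files are the correct weights (figure-separator/data, for example, may already contain other files).

```python
from pathlib import Path

# Weight directories listed in this README, relative to the repository root.
WEIGHT_DIRS = [
    "classifier/SEM_TEM_Other_weights",
    "classifier/Particulate_nonParticulate_weights",
    "figure-separator/data",
    "label_scale_bar_detector/OCR/SRCNN-pytorch/weights",
    "label_scale_bar_detector/localizer/darknet/backup",
    "particle_segmentation/Mask_RCNN/logs/tem",
]


def missing_weight_dirs(repo_root="."):
    """Hypothetical helper: return weight directories that are absent or contain no files."""
    root = Path(repo_root)
    missing = []
    for rel in WEIGHT_DIRS:
        path = root / rel
        # A directory counts as populated if it holds at least one file at any depth.
        if not path.is_dir() or not any(p.is_file() for p in path.rglob("*")):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = missing_weight_dirs()
    if missing:
        print("Missing or empty weight directories:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All weight directories contain at least one file.")
```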
To run the entire pipeline, create the conda environment with:
conda env create -f environment/environment.yml
If you only want to download the datasets, create this environment instead:
conda env create -f environment/environment_dataset.yml
Note: These environments have been tested only on Linux.
- The JSON file containing all extracted size and shape information for the 4361 literature-mined images can be downloaded from Full_dataset.
- The JSON file containing segmentation annotations for the 131 images used to train the Mask R-CNN can be downloaded from Training_dataset.
Place both files at the root of the repository.
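To take a first look at either JSON file, a sketch like the one below loads it and reports its top-level size. The schema of these files is not described here, so the code deliberately assumes nothing beyond valid JSON; the filename full_dataset.json and the helper names are hypothetical placeholders.

```python
import json
from pathlib import Path


def load_json_dataset(path):
    """Parse one of the downloaded JSON files and return the resulting object."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def summarize(data):
    """Describe the top-level structure without assuming a particular schema."""
    if isinstance(data, (dict, list)):
        return f"{type(data).__name__} with {len(data)} entries"
    return type(data).__name__


if __name__ == "__main__":
    # Hypothetical filename: replace it with the name of the file you downloaded.
    dataset = load_json_dataset("full_dataset.json")
    print(summarize(dataset))
```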
- To download the full literature-mined dataset of 4365 images, run: python fetch_urls_full_dataset.py
- To download the annotated dataset of 131 images used to train the segmentation model, run: python fetch_urls_training_dataset.py
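If you want to fetch both datasets in one go, a small wrapper can run the two scripts in sequence and stop at the first failure. This is a sketch, not part of the repository: the helper name run_scripts is hypothetical, and it assumes you launch it from the repository root inside the activated dataset environment, with the JSON files already in place.

```python
import subprocess
import sys


def run_scripts(scripts):
    """Hypothetical helper: run each script with the current interpreter, stopping at the first failure."""
    for script in scripts:
        print(f"Running {script} ...")
        # check=True raises CalledProcessError if the script exits with a non-zero status.
        subprocess.run([sys.executable, script], check=True)


if __name__ == "__main__":
    run_scripts(["fetch_urls_full_dataset.py", "fetch_urls_training_dataset.py"])
```

Using sys.executable ensures the scripts run under the same Python interpreter (and therefore the same conda environment) as the wrapper itself.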
To test the pipeline, run:
python test_pipeline_single.py
The following is an illustration of the steps involved in the pipeline.
This code builds on the following repositories:
- https://github.com/AlexeyAB/darknet
- https://github.com/apple2373/figure-separator
- https://github.com/yjn870/SRCNN-pytorch
- https://github.com/matterport/Mask_RCNN
If you use this code, please cite the following manuscript:
@misc{subramanian2021dataset,
title={Dataset of gold nanoparticle sizes and morphologies extracted from literature-mined microscopy images},
author={Akshay Subramanian and Kevin Cruse and Amalie Trewartha and Xingzhi Wang and Paul Alivisatos and Gerbrand Ceder},
year={2021},
eprint={2112.01689},
archivePrefix={arXiv},
primaryClass={cond-mat.mtrl-sci}
}