Breast MRI dataset setup tutorial for the results of "Reverse Engineering Breast MRIs: Predicting Acquisition Parameters Directly from Images"
This document details how to set up the data used in the paper so that our results can be reproduced.
The paper involves training models on seven different radiological image datasets; the steps below set up these datasets for further experiments.
- Download the Duke-Breast-Cancer-MRI dataset from The Cancer Imaging Archive (TCIA). This takes a few steps; all files are found under "Data Access" near the bottom of the dataset page.
- First, download the annotation and file path list files, "File Path mapping tables (XLSX, 49.6 MB)" and "Annotation Boxes (XLSX, 49 kB)", into `data/dbc/maps`. You'll then need to convert these to `.csv` files manually (e.g. using Microsoft Excel).
- Download the DBC dataset "Images (DICOM, 368.4 GB)". Unfortunately this is large because the data is only available as DICOM files. You'll have to use TCIA's NBIA Data Retriever tool for this (open the downloaded `.tcia` manifest file with the tool). Make sure that you download the files with the "Classic Directory Name" directory type convention; otherwise certain files will be mislabeled in the downloaded annotation file, and you'll have to redownload all of the data from scratch. The annotation files from TCIA still contain certain typos, but the code that we provide includes fixes for these.
- Once all of the data is downloaded, it will be in a folder named `manifest-{...}`, where `{...}` is some autogenerated string of numbers, for example `manifest-1607053360376`. This folder may be within a subdirectory or two. Move this manifest folder into `data/dbc`.
- Open the Python file `data/dbc/png_extractor.py` and set the `data_path` variable to the name of your manifest folder, `manifest-{...}`.
- Run `data/dbc/png_extractor.py` to extract the .png images from the raw DICOM files into `data/dbc/png_out`. This will also create the subset of images that we'll use for experiments, and sort them by scanner manufacturer.
- Feel free to delete the original DICOM files in your manifest folder once this is complete, as well as `data/dbc/png_out`, to save space.
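
The spreadsheet conversion described in the steps above can also be scripted instead of done by hand. The following is a minimal sketch, not part of the provided code: it assumes `pandas` and `openpyxl` are installed, that the first sheet of each workbook holds the table, and that a `.csv` file saved next to each workbook under the same base name is acceptable. The function names `csv_path_for` and `convert_maps` are hypothetical, and the CSV file names expected by the provided code are not stated here, so check them before relying on the output.

```python
from __future__ import annotations

from pathlib import Path


def csv_path_for(xlsx_path: Path) -> Path:
    """Return the sibling .csv path for a given .xlsx file (hypothetical helper)."""
    return xlsx_path.with_suffix(".csv")


def convert_maps(maps_dir: str = "data/dbc/maps") -> list[Path]:
    """Convert every .xlsx file in maps_dir to a .csv file next to it."""
    # pandas (with openpyxl as its .xlsx reader) is imported here so that
    # the path helper above stays usable without these packages installed.
    import pandas as pd

    written = []
    for xlsx_path in sorted(Path(maps_dir).glob("*.xlsx")):
        # Assumption: the first sheet of each workbook holds the table.
        table = pd.read_excel(xlsx_path, sheet_name=0)
        out_path = csv_path_for(xlsx_path)
        # index=False avoids writing pandas' row index as an extra column.
        table.to_csv(out_path, index=False)
        written.append(out_path)
    return written
```

Calling `convert_maps()` from the repository root would write one `.csv` file per workbook in `data/dbc/maps`. It is worth opening the results once to confirm that the column headers match what a manual export from Excel produces.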
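
Locating the downloaded `manifest-{...}` folder and moving it into `data/dbc`, as described in the steps above, can be sketched with the standard library alone. This is an illustrative helper rather than part of the provided code: the function names and the `download_root` argument are hypothetical, and the search assumes the manifest folder sits at most two subdirectories below the download location, so it never has to walk the full DICOM tree.

```python
from __future__ import annotations

import re
import shutil
from pathlib import Path


def find_manifest_dirs(download_root: Path, max_depth: int = 2) -> list[Path]:
    """Return directories named manifest-<digits> up to max_depth levels down."""
    found = []
    for depth in range(max_depth + 1):
        # depth 0 -> "manifest-*", depth 1 -> "*/manifest-*", and so on.
        pattern = "/".join(["*"] * depth + ["manifest-*"])
        for candidate in download_root.glob(pattern):
            if candidate.is_dir() and re.fullmatch(r"manifest-\d+", candidate.name):
                found.append(candidate)
    return sorted(found)


def move_manifest(download_root: str, dest_dir: str = "data/dbc") -> Path:
    """Move the single manifest folder under download_root into dest_dir."""
    matches = find_manifest_dirs(Path(download_root))
    if len(matches) != 1:
        raise RuntimeError(
            f"expected exactly one manifest folder, found {len(matches)}"
        )
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / matches[0].name
    if target.exists():
        # Refuse to merge into or nest inside an existing folder.
        raise FileExistsError(f"{target} already exists")
    shutil.move(str(matches[0]), str(target))
    return target
```

The name of the returned folder is the value that `data_path` in `data/dbc/png_extractor.py` should be set to. Raising an error when zero or several manifest folders are found is a deliberate choice: with a download of this size, stopping early is cheaper than moving the wrong directory.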