The purpose of BadMedicine.Dicom is to generate large volumes of complex (in terms of tags) DICOM images for integration/stress testing of ETL and image management tools.
There are a number of public sources of DICOM clinical images, e.g. TCIA. The difficulty with using these for integration/stress testing is that they often:
- Are anonymised (majority of tags have been removed)
- Do not represent the breadth of Modalities/Tags found in a live clinical PACS.
- Take up a lot of space
BadMedicine.Dicom generates DICOM images on demand based on an anonymous aggregate model of tag data found in Scottish medical imaging. It is an extension of SynthEHR, which generates traditional EHR records.
BadDicom is available as a NuGet package for linking as a library.
The standalone CLI (BadDicom.exe) is available in the releases section of GitHub.
Usage is as follows:
BadDicom.exe c:\temp\testdicoms
Generates 10 dicom studies (~700MB)
BadDicom.exe c:\temp\testdicoms 5 10 --NoPixels
Generates 10 dicom studies from a pool of 5 patients without pixel data (~3MB)
You can pass -s to seed the random number generators. Seeding ensures that the same StudyDate, PatientID etc. are generated on each run, but it does not affect UIDs, which are always unique.
BadDicom.exe c:\temp\testdicoms 5 10 --NoPixels -s 100
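The effect of seeding can be illustrated with a minimal sketch built only on `System.Random` and `System.Guid`. This is not BadDicom's own code: `SeedingSketch`, `NextStudyDate` and `NewIdentifier` are hypothetical names, the date range is arbitrary, and a GUID string merely stands in for a DICOM UID. It shows why values drawn from a seeded generator repeat between runs while identifiers created independently of it do not.

```csharp
using System;

public static class SeedingSketch
{
    // Hypothetical helper: draws a date from the supplied generator.
    // The 20 year window starting at 2000-01-01 is an arbitrary assumption.
    public static DateTime NextStudyDate(Random r)
    {
        return new DateTime(2000, 1, 1).AddDays(r.Next(0, 365 * 20));
    }

    // Hypothetical stand-in for a UID: it does not use the seeded
    // generator, so the seed has no influence on it.
    public static string NewIdentifier()
    {
        return Guid.NewGuid().ToString();
    }
}
```

Two generators created with `new Random(100)` return the same date from `NextStudyDate`, whereas two calls to `NewIdentifier` return different strings regardless of the seed.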
You can generate DICOM metadata directly into a relational database (instead of onto disk). This can be done by downloading an image template or by creating one yourself.
To turn this mode on, rename the file BadDicom.template.yaml to BadDicom.yaml and provide the connection string to your database, e.g.:
Database:
  # The connection string to your database
  ConnectionString: server=127.0.0.1;Uid=root;Pwd=;Ssl-Mode=None
  # Your DBMS provider ('MySql', 'PostgreSql', 'Oracle' or 'MicrosoftSQL')
  DatabaseType: MySql
  # Contains the table schema (which dicom tags to use for which tables)
  Template: CT.it
  # Database to create/use on the server
  DatabaseName: SynthEHRTestData
  # Setting this deduplicates study/series level schemas (works only if tables do not already exist on server)
  MakeDistinct: true
If you want to generate EHR datasets that share a patient pool with the DICOM data (e.g. for linkage), provide the same -s (seed) value to both BadDicom and the main SynthEHR application.
BadDicom.exe c:\temp\testdicoms 12 10 -s 100
SynthEHR.exe c:\temp\testdicoms 12 100 -s 100
Generates a pool of 12 patients and 10 Studies (for random patients) then 100 rows of data for each EHR dataset
You can generate test data for your own program by referencing the NuGet package:
//create a test person
var r = new Random(23);
var person = new Person(r);
//create a generator
using (var generator = new DicomDataGenerator(r, null, "CT"))
{
    //create a dataset in memory
    DicomDataset dataset = generator.GenerateTestDataset(person, r);

    //values should match the patient details
    Assert.AreEqual(person.CHI, dataset.GetValue<string>(DicomTag.PatientID, 0));
    Assert.GreaterOrEqual(dataset.GetValue<DateTime>(DicomTag.StudyDate, 0), person.DateOfBirth);

    //should have a study description
    Assert.IsNotNull(dataset.GetValue<string>(DicomTag.StudyDescription, 0));
}
Building requires MSBuild 15 or later (or Visual Studio 2017 or later). You will also need to install the .NET Core 2.2 SDK.
The csproj files are in the 2017 format and require Visual Studio 2017 or later to open. The following projects are part of the solution:
Project | Runtime | Purpose | Build Output |
---|---|---|---|
BadDicom.csproj | Dot Net Core 2.2 | Command Line Tool | BadDicom.exe generated by dotnet publish* |
BadMedicine.Dicom.csproj | Dot Net Standard 2.0 | Library | BadMedicine.Dicom.dll. Use BadMedicine.Dicom.nuspec to upload to NuGet |
BadMedicine.Dicom.Tests | Dot Net Core 2.2 | Tests for library | None |
*Publish an OS specific binary by building BadDicom.csproj then running:
dotnet publish BadDicom.csproj -r win-x64 --self-contained
cd .\bin\Debug\netcoreapp2.2\win-x64\
For Linux, a few extra requirements are needed:
# For Ubuntu
$ sudo apt install libc6-dev libgdiplus
# For CentOS
$ sudo yum install libc6-devel libgdiplus
Basic random patient information (age, CHI etc.) is generated by SynthEHR.
The following tags are populated in the generated DICOM files:
Tag | Model |
---|---|
PatientID | The CHI number of the (random) patient |
StudyDate | A random date during the patient's lifetime |
StudyTime | Random time of day with hours favoured during the middle of the working day. |
SeriesDate | Same as StudyDate* |
PatientAge | Age of patient at SeriesDate e.g. "032Y" |
Modality | Random modality for which we have at least 1 image locally. (proportionate to modality popularity) |
StudyDescription | Random description that exists in the modality (proportionate to frequency seen) |
*SeriesDate is always the same as StudyDate (see the Series constructor). For secondary capture this should/could not be the case (we should look at how this corresponds in the PACS data we have).
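Two of the rules in the table can be sketched in a few lines. The code below is an illustration under stated assumptions, not the library's implementation: `TagModelSketch`, `ToDicomAge` and `PickWeighted` are hypothetical names, and the frequency counts passed to `PickWeighted` are supplied by the caller rather than taken from the real aggregate model. `ToDicomAge` formats an age in whole years as a three digit value followed by `Y`, and `PickWeighted` chooses an item with probability proportionate to its count, as described for Modality and StudyDescription.

```csharp
using System;
using System.Collections.Generic;

public static class TagModelSketch
{
    // Age in whole years at the series date, formatted like "032Y".
    public static string ToDicomAge(DateTime dateOfBirth, DateTime seriesDate)
    {
        int years = seriesDate.Year - dateOfBirth.Year;

        // Subtract a year if the birthday has not yet occurred in the series year
        if (seriesDate < dateOfBirth.AddYears(years))
            years--;

        return years.ToString("000") + "Y";
    }

    // Picks a key with probability proportionate to its count.
    public static string PickWeighted(IReadOnlyList<KeyValuePair<string, int>> counts, Random r)
    {
        int total = 0;
        foreach (var kvp in counts)
            total += kvp.Value;

        // roll is in the range 0 <= roll < total
        int roll = r.Next(total);

        foreach (var kvp in counts)
        {
            if (roll < kvp.Value)
                return kvp.Key;
            roll -= kvp.Value;
        }

        throw new InvalidOperationException("counts must contain at least one positive weight");
    }
}
```

For example, `TagModelSketch.ToDicomAge(new DateTime(1988, 6, 15), new DateTime(2020, 6, 15))` returns `"032Y"`, while a series date one day earlier gives `"031Y"`.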
Currently pixel data is written as a black square with the SOP Instance UID written in white.