Datasets

Introduction

Here you'll learn how to create your own dataset for training or evaluation. There are two main ways of labeling your custom data.

Completely manual - if your dataset is very heterogeneous and previous automated detection attempts have failed, annotate each spot by clicking on its position. For this, follow ✏️ Manual labeling and 📤 Manual label export.
TrackMate based - if your dataset is easier to detect automatically, you can use TrackMate to speed up the process. Follow the steps described in 🤖 TrackMate based labeling and export.

Both - no matter which method you choose above, the final step is described in 🗃️ Create dataset npz file. In general, the model can only be as good as the dataset it was trained on. Therefore, make sure to label as precisely as reasonably possible.

Manual labeling

Labeling is done in Fiji (download here) using the Multi-point Tool. To open this tool, right-click on the Point Tool. You should then have the view shown below. Double-click on the icon to configure the tool (for example, to remove label numbers or change the point size).

Multi-point Tool in Fiji

After opening an image, label each blob by clicking on the spot, which adds a point. If you aren't happy with your selection, click+drag to move a point (wait until the cursor turns into a hand), option(alt)+click to remove a point, or press shift+A to delete all current points.

Example image labeling in Fiji

Now all that's left to do is to save the file with its labels into an empty directory of your choice (see the next step).

Manual label export

After the previous step, you should have a directory with labelled images only. Please download our Fiji export macro, unzip, and open / drag it into Fiji and execute as shown below (if the download link does not work, save the raw file).

Run Fiji export macro

After execution, you should have a labels directory inside the selected image directory. You can now skip over 🤖 TrackMate based labeling and export and directly continue with 🗃️ Create dataset npz file.

TrackMate based labeling and export

There is one quick setup step that has to be done:

Install TrackMate.

Now that everything's set up, do the following:

Open up one image for labeling at a time.
If the image has a micron scale, remove it by opening Set Scale in Analyze>Set Scale... and pressing the button Click to Remove Scale.
Open TrackMate in Plugins>Tracking>TrackMate.
Follow the dialog prompt until reaching the Settings for detector screen.
Adjust the Estimated blob diameter and Threshold settings, checking the result with Preview (1), until most spots have been detected.
Missed spots can be added and falsely detected spots removed by hovering over the position and pressing A or D, respectively. Additionally, spots can be moved using Space. Additional commands for editing can be found in Section 3.2 "Creating spots one by one" of the TrackMate manual.
Once all desired spots have been detected, export spot coordinates by clicking on the 🔧 wrench icon (2) to open up Display Options and Shift+click on the Analysis (3) button.
This should open up a table titled All Spots statistics. Save this table using File>Save As... or Command+S (Control+S for Windows users). Make sure to rename the file to match the name of the image just labelled. Otherwise, images and labels won't match!
Repeat these steps for all images and save all labels in the same folder.
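
Because mismatched file names silently break the pairing of images and labels, it can help to check the names before moving on. The following is a minimal sketch, not part of deepBlink: the helper find_unmatched is hypothetical, and the .tif and .csv extensions are assumptions that you should adapt to your own files.

```python
from pathlib import Path


def find_unmatched(image_dir, label_dir, image_ext=".tif", label_ext=".csv"):
    """Compare file names (without extension) between two directories.

    Hypothetical helper: the default extensions are assumptions.
    Returns (images without a label file, label files without an image).
    """
    image_stems = {p.stem for p in Path(image_dir).glob(f"*{image_ext}")}
    label_stems = {p.stem for p in Path(label_dir).glob(f"*{label_ext}")}
    missing_labels = sorted(image_stems - label_stems)
    missing_images = sorted(label_stems - image_stems)
    return missing_labels, missing_images


# Example call (the directory names are placeholders):
# missing_labels, missing_images = find_unmatched("images", "images/labels")
```

If both returned lists are empty, every image has a label file with the same base name.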

Note: TrackMate differs a bit between versions. Newer releases no longer require you to Shift+click and use File>Save As... but have dedicated buttons to make your life easier :).

Congrats! Most of the work is done. Now all that's left is to create an npz file for training. Follow the instructions below.

Create dataset npz file

Now, we can use deepBlink to convert the raw files into a single npz file ready for use. Please run:

deepblink create --input INPUT --labels LABELS --name NAME --pixel-size PIXEL_SIZE

A quick explanation of what is going on:

The NAME is the name of your dataset. The generated file will have the name NAME.npz. Feel free to pass in a path to change the saving location.
The INPUT is the directory containing all images and, by default, a labels subdirectory (as previously created).
If you have a different file structure you can use the --labels LABELS flag to customize the path to the labels. The INPUT will then only be used as path to the images.
Change the ratio of the train/validation/test split by using the --validsplit VALIDSPLIT or --testsplit TESTSPLIT flags. Both are values between 0 and 1 giving the fraction of images used (e.g. a TESTSPLIT of 0.2 will use 20% of images for testing). First, TESTSPLIT is applied to the entire dataset. Then, VALIDSPLIT is applied to the remaining non-test data (i.e. a VALIDSPLIT of 0.2 corresponds to less than 20% of the entire dataset, depending on the TESTSPLIT).
To resize images uniformly, use the --size SIZE flag. Note that for deepBlink to work properly, training images have to be square with a side length that is a power of two (e.g. 256, 512, 1024). To avoid training on duplicate images, any crops that would overlap with existing images are ignored. Similarly, all images smaller than the specified size will not be included in the dataset.
Depending on the image metadata, deepBlink will automatically convert micron-based labels to pixel-based ones. However, you can change this behavior by specifying a pixel size using the --pixel-size PIXEL_SIZE flag. If your labels are already in pixels, just set PIXEL_SIZE to 1.
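
To make the order of the two splits concrete, here is a small worked example of the arithmetic described above. It is only a sketch: split_fractions is a hypothetical helper, and how deepBlink rounds these fractions to whole images is not covered here.

```python
def split_fractions(testsplit, validsplit):
    """Return (train, valid, test) as fractions of the entire dataset.

    Hypothetical helper: the test split is taken from the whole dataset
    first, then the validation split is taken from the non-test remainder.
    """
    test = testsplit
    valid = (1 - testsplit) * validsplit
    train = 1 - test - valid
    return train, valid, test


train, valid, test = split_fractions(testsplit=0.2, validsplit=0.2)
print(f"train={train:.2f}, valid={valid:.2f}, test={test:.2f}")
# prints: train=0.64, valid=0.16, test=0.20
```

With both flags set to 0.2, the validation set therefore holds 16% of all images rather than 20%.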

Additional insights

This dataset npz file is nothing more than six numpy arrays bundled together. These arrays are x_train, y_train, x_valid, y_valid, x_test, y_test, where x denotes the input / images and y the ground truth / labels. An npz file can be easily read in Python using our deepblink.io.load_npz function.
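
To illustrate the layout, the sketch below builds a toy file with the six array names listed above and reads it back using plain numpy. The array shapes, dtypes, and the file name toy.npz are assumptions chosen for illustration only. Real files may store a different number of spots per image, in which case np.load may need allow_pickle=True.

```python
import os
import tempfile

import numpy as np

keys = ["x_train", "y_train", "x_valid", "y_valid", "x_test", "y_test"]

# Toy stand-in for a real dataset (shapes are assumptions):
# images are 256x256, labels are five (row, column) pairs per image.
toy = {}
for split, n_images in [("train", 3), ("valid", 1), ("test", 1)]:
    toy[f"x_{split}"] = np.zeros((n_images, 256, 256), dtype=np.float32)
    toy[f"y_{split}"] = np.zeros((n_images, 5, 2), dtype=np.float32)

# Bundle the six arrays into one npz file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "toy.npz")
np.savez_compressed(path, **toy)

# Read the file back and collect the shape of every array.
with np.load(path) as data:
    shapes = {key: data[key].shape for key in keys}

for key, shape in shapes.items():
    print(key, shape)
```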

You can easily inspect the npz file using the deepblink visualize submodule. This is explained in more detail here.