Datasets
- ℹ️ Introduction
- ✏️ Manual labeling
- 📤 Manual label export
- 🤖 TrackMate based labeling and export
- 🗃️ Create dataset npz file
ℹ️ Introduction
Here you'll learn how to create your own dataset for training or evaluation. There are two main ways of labeling your custom data.
- Completely manual - if your dataset is very heterogeneous and previous automated detection attempts have failed, annotate each spot by clicking on its position. For this, please follow ✏️ Manual labeling and 📤 Manual label export.
- TrackMate based - if your dataset is slightly easier, you can use TrackMate to speed up the process. Please follow the steps described in 🤖 TrackMate based labeling and export.
- Both - no matter which method you choose above, the final step is described in 🗃️ Create dataset npz file. In general, the model can only be as good as the dataset. Therefore, make sure to label as precisely as reasonably possible.
✏️ Manual labeling
Labeling is done in Fiji (download here) using the Multi-point Tool. To open this tool, right-click on the Point Tool. You should have the view shown below. Double-click on the icon to configure it (if you want to remove label numbers, change the point size, etc.).
After opening an image, each blob is labelled by clicking on the spot, thereby adding a point. If you aren't happy with your selection, `click+drag` to move a point (wait until the cursor turns into a hand), `option(alt)+click` to remove a point, or `shift+A` to delete all current points.
Now all that's left is to save the file with its labels into an empty directory of your liking (see the next step).
📤 Manual label export
After the previous step, you should have a directory with labelled images only. Please download our Fiji export macro, unzip it, open / drag it into Fiji, and execute it as shown below (if the download link does not work, save the raw file).
After execution, you should have a labels directory inside the selected image directory (you can sanity-check the export with the sketch below). You can now skip over 🤖 TrackMate based labeling and export and directly continue with 🗃️ Create dataset npz file.
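If you want to verify that every image ended up with a matching label file, you can compare the two directories. This is a minimal sketch, not part of deepBlink itself; the paths and file extensions (`*.tif` images, `*.csv` labels) are assumptions that may differ in your setup:

```python
from pathlib import Path

image_dir = Path("path/to/images")   # directory with the labelled images (placeholder)
label_dir = image_dir / "labels"     # created by the export macro

# Compare file base names between images and exported labels.
images = {p.stem for p in image_dir.glob("*.tif")}
labels = {p.stem for p in label_dir.glob("*.csv")}

print("images without labels:", sorted(images - labels) or "none")
print("labels without images:", sorted(labels - images) or "none")
```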
🤖 TrackMate based labeling and export
There is one quick setup step that has to be done:
- Install TrackMate.
Now that everything's set up, do the following:
- Open up one image for labeling at a time.
- If the image has a micron scale, remove it by opening `Analyze>Set Scale...` and pressing the `Click to Remove Scale` button.
- Open TrackMate in `Plugins>Tracking>TrackMate`.
- Follow the dialog prompts until reaching the `Settings for detector` screen.
- Play around with the `Estimated blob diameter` and `Threshold` settings, visualizing with `Preview` (1), until most spots have been detected.
- Spots that were missed or falsely detected can be added or removed by hovering over them and pressing `A` or `D`, respectively. Additionally, spots can be moved using `Space`. Additional editing commands can be found in Section 3.2 "Creating spots one by one" of the TrackMate manual.
- Once all desired spots have been detected, export the spot coordinates by clicking on the 🔧 wrench icon (2) to open up `Display Options` and `Shift+click` on the `Analysis` (3) button.
- This should open up a table titled `All Spots statistics`. Save this table using `File>Save As...` or `Command+S` (`Control+S` for Windows users). Make sure to rename the file to match the name of the image just labelled; otherwise, images and labels won't match!
- Continue these steps for all images and save all labels in the same folder.
Note: TrackMate differs a bit between versions. Newer releases do not require you to `Shift+click` and use `File>Save As...` anymore but have dedicated buttons to make your life easier :).
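If you want to double-check one of the exported tables before building the dataset, you can load it and look at the coordinate columns. This is a minimal sketch, assuming the table was saved as CSV and uses TrackMate's standard `POSITION_X`/`POSITION_Y` column names; newer TrackMate versions add extra header rows, in which case you may need `skiprows` when reading:

```python
import pandas as pd

# One exported "All Spots statistics" table (file name is a placeholder).
spots = pd.read_csv("image_001.csv")

# Keep only the spot coordinates (column names follow TrackMate's convention).
coords = spots[["POSITION_X", "POSITION_Y"]]
print(f"{len(coords)} spots found")
print(coords.head())
```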
🗃️ Create dataset npz file
Congrats! Most of the work is done. Now all that's left is to create an npz file for training. Follow the instructions below.
Now, we can use deepBlink to convert the raw files into a single npz file ready for usage. Please run:
deepblink create --input INPUT --labels LABELS --name NAME --pixel-size PIXEL_SIZE
A quick explanation of what is going on:
- The `NAME` is the name of your dataset. The generated file will have the name `NAME.npz`. Feel free to pass in a path to change the saving location.
- The `INPUT` will, by default, take the directory with all images and a `labels` subdirectory (as previously created).
- If you have a different file structure, you can use the `--labels LABELS` flag to customize the path to the labels. The `INPUT` will then only be used as the path to the images.
- Change the ratio of the train/validation/test split by using the `--validsplit VALIDSPLIT` or `--testsplit TESTSPLIT` flags. Both are values between 0-1 corresponding to the fraction of images used (e.g. a `TESTSPLIT` of 0.2 will use 20% of images for testing). First, `TESTSPLIT` is applied to the entire dataset. Then, `VALIDSPLIT` is applied to the remaining non-test data (i.e. a `VALIDSPLIT` of 0.2 uses slightly less than 20% of all images, depending on the `TESTSPLIT`).
- To resize images uniformly, use the `--size SIZE` flag. Note that for deepBlink to work properly, training images have to be square with a side length that is a power of two (256, 512, 1024). To avoid training on duplicate data, any crops that would overlap with existing images are ignored. Similarly, all images smaller than the specified size will not be included in the dataset.
- Depending on the image metadata, deepBlink will automatically convert micron-based labels to pixel-based ones. However, you can change this behavior by specifying a pixel size using the `--pixel-size PIXEL_SIZE` flag. If your labels are already in pixels, just set `PIXEL_SIZE` to `1`.
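As a concrete example, a call might look like the following (the paths and dataset name are placeholders; adjust them to your setup, and add `--labels` only if your labels are not in the default `labels` subdirectory):
deepblink create --input /path/to/images --name my_dataset --pixel-size 1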
This dataset npz file is nothing else than six numpy arrays bundled together. These arrays are `x_train`, `y_train`, `x_valid`, `y_valid`, `x_test`, and `y_test`, where `x` denotes the input / images and `y` the ground truth / labels. An npz file can be easily read in python using our `deepblink.io.load_npz` function.
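As a quick check, you can also open the file directly with numpy and print the array shapes. This is a minimal sketch, assuming the arrays are stored under the key names listed above (the dataset name is a placeholder):

```python
import numpy as np

# Open the generated dataset file; allow_pickle may be needed if the
# label arrays have varying lengths per image and are stored as objects.
data = np.load("my_dataset.npz", allow_pickle=True)

# Print the shape of each of the six arrays described above.
for key in ["x_train", "y_train", "x_valid", "y_valid", "x_test", "y_test"]:
    print(key, data[key].shape)
```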
You can easily inspect the npz file using the `deepblink visualize` submodule. This is explained in more detail here.