Refactor `TiledDataset` into `TiledDataset` and `TiledMaskedDataset` #32

Wiebke · 2024-09-06T02:25:37Z

The previous iteration of TiledDataset used the boolean parameters is_full_segmentation and is_training, and set the attributes mask_client and mask_idx to None if no Tiled client with mask information was provided.

This refactor reduces TiledDataset to the functionality previously selected with is_full_segmentation. It is defined by a Tiled client with data. Optionally, a set of indices can be provided. This is useful for building datasets from arbitrary subsets of indices, and it is also applied when mask information is given during training.

TiledMaskedDataset is a subclass of TiledDataset that additionally requires a Tiled client with mask information. This client is expected to contain a list of integers in .metadata["mask_idx"] and the actual mask data under the key "mask". The presence of both is asserted.
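The two requirements on the mask client can be illustrated with a small standalone check. This is only a sketch: `FakeContainer` and `check_mask_client` are hypothetical names that do not come from this PR or from Tiled, and the stand-in assumes the mask client behaves like a mapping with a `.metadata` attribute.

```python
class FakeContainer(dict):
    """Hypothetical stand-in for a mask client: keyed children plus metadata."""

    def __init__(self, children=None, metadata=None):
        super().__init__(children or {})
        self.metadata = metadata or {}


def check_mask_client(mask_client):
    """Assert both requirements and return (mask_idx, mask_node).

    Assumption: membership tests work on the client and on its metadata.
    """
    assert "mask_idx" in mask_client.metadata, "mask_idx missing from metadata"
    assert "mask" in mask_client, "no node stored under the key 'mask'"
    # Normalize the indices to plain Python integers.
    mask_idx = [int(i) for i in mask_client.metadata["mask_idx"]]
    return mask_idx, mask_client["mask"]


valid_client = FakeContainer({"mask": ["m0", "m1"]}, {"mask_idx": [2, 5]})
mask_idx, mask_node = check_mask_client(valid_client)
```

A client that lacks either piece fails with an `AssertionError` before any dataset is built.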

The parameter is_training is still in use for the function initialize_tiled_datasets (which has been moved from utils.py to tiled_dataset.py), but this may change in upcoming refactoring of IOParameters.

To summarize:
- TiledDataset(data_client, mask_client=None, is_training=False, is_full_segmentation=True) → TiledDataset(data_client)
- TiledDataset(data_client, mask_client, is_training=False, is_full_segmentation=False) → TiledDataset(data_client, mask_client.metadata["mask_idx"])
- TiledDataset(data_client, mask_client, is_training=True, is_full_segmentation=False) → TiledMaskedDataset(data_client, mask_client)
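The three cases above can be sketched with stand-in clients. This is a minimal illustration, not the code merged in this PR: plain lists replace the Tiled data node, `MaskClientStub` is a hypothetical helper, and the pairing of mask `k` with data slice `mask_idx[k]` is an assumption.

```python
class MaskClientStub(dict):
    """Hypothetical mask client: a mapping with a .metadata attribute."""

    def __init__(self, children, metadata):
        super().__init__(children)
        self.metadata = metadata


class TiledDataset:
    """Iterates over all data slices, or over a chosen subset of indices."""

    def __init__(self, data_client, selected_indices=None):
        self.data_client = data_client
        if selected_indices is None:
            selected_indices = range(len(data_client))
        self.selected_indices = list(selected_indices)

    def __len__(self):
        return len(self.selected_indices)

    def __getitem__(self, idx):
        return self.data_client[self.selected_indices[idx]]


class TiledMaskedDataset(TiledDataset):
    """Yields (image, mask) pairs for the slices listed in mask_idx."""

    def __init__(self, data_client, mask_client):
        super().__init__(data_client, mask_client.metadata["mask_idx"])
        self.mask_node = mask_client["mask"]

    def __getitem__(self, idx):
        # Assumption: mask k belongs to data slice mask_idx[k].
        return super().__getitem__(idx), self.mask_node[idx]


data_client = ["img0", "img1", "img2", "img3"]
mask_client = MaskClientStub({"mask": ["maskA", "maskB"]}, {"mask_idx": [1, 3]})

full = TiledDataset(data_client)                                     # full inference
subset = TiledDataset(data_client, mask_client.metadata["mask_idx"])  # inference on masked slices
paired = TiledMaskedDataset(data_client, mask_client)                # training pairs
```

Only `paired` returns tuples; `full` and `subset` return bare images, so downstream code that indexes them needs no unpacking.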

We may need to consider bringing back the transform parameter that used to convert to torch.Tensor and adapt downstream procedures to operate on tensors rather than np.array.

TibbersHao

Thanks for the hard work. The redesigned classes do clearly separate the training and full-inference cases, and the pytests run properly on my end.

For the real pipeline to run properly, two minor changes are needed:

In train.py line 211

image = dataset[idx]

A further slice is needed to take only the image part from the (image, mask) tuple.

In utils.py line 38

assert io_parameters.mask_tiled_uri, "Mask URI not provided for training."

mask_tiled_uri should no longer be checked, since it won't be provided for full inference anymore, so this assertion should be taken out.

With that, I believe this PR is ready to be merged to the refactor branch.

Wiebke · 2024-09-07T00:53:28Z

Thanks for taking a look and testing the full pipeline on your end!
I fixed the issue in partial_inference by correcting the initialization of the dataset in that function. Because is_training=True was set accidentally, a TiledMaskedDataset was initialized. With is_training=False, the function gets a TiledDataset that uses the mask indices for iteration, so no tuple unpacking should be needed in train.py line 211.

I would like to defer the correct initialization of io_parameters and the changes to the validate_parameter function until the parameter setup and validation are refactored. This does indeed temporarily break the main function of segment.py (such that it runs partial inference), but this will be addressed then.

TibbersHao

Nice catch. Tested the train + quick inference and it runs properly on my end.

I agree that further refactoring of the full inference part should be deferred.

This PR is ready to be merged.

Wiebke added 5 commits August 30, 2024 16:20

Remove is_qlty parameter from TiledDataSet

e3b6f24

This is obsolete code that is never called, as `using_qlty` is set to `False` by default and is also set to `False` in the only place where it is specified.

Remove transform parameter from TiledDataSet

34c48d8

Obsolete code that is never applied.

Move initialize_tiled_datasets to tiled_datasets

7a7b52d

Refactor TiledDataset into TiledDataset and TiledMaskedDataset

4d0cfc7

New structure easily enables inference on a subset of the data.

🐛 TiledDataset for partial inference was not covered in initialization

36ddad9

Note: `partial_inference` function currently has no test.

Wiebke changed the title ~~Refactor TiledDataset into TiledDataset and `TiledMaskedDataset~~ Refactor TiledDataset into TiledDataset and TiledMaskedDataset Sep 6, 2024

Wiebke requested a review from TibbersHao September 6, 2024 02:26

Wiebke added 3 commits September 6, 2024 11:17

Temporarily pin starlette==0.38.2

bec321d

Circumvents `AttributeError: 'PatchedStreamingResponse' object has no attribute 'background'`, updating Tiled to `v0.1.0b8` will resolve this too.

Move mask node and mask_idx metadata access outside of datasets

f629120

Catching self-raised KeyError and from_uri Exceptions separately

0508a81

TibbersHao reviewed Sep 7, 2024

🐛 partial inference initialized TiledMaskedDataset

73d2fa8

instead of `TiledDataset`

Wiebke requested a review from TibbersHao September 7, 2024 00:53

TibbersHao approved these changes Sep 7, 2024

TibbersHao merged commit bd024fa into mlexchange:refactor-train Sep 7, 2024
2 checks passed
