Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor TiledDataset into TiledDataset and TiledMaskedDataset #32

Conversation

Wiebke
Copy link
Member

@Wiebke Wiebke commented Sep 6, 2024

The previous iteration of TiledDataset used boolean parameters is_full_segmentation and is_training, as well as the setting attributes mask_client and mask_idx to None if no Tiled client with mask information was provided.

This refactor converts TiledDataset to be equivalent to have the functionality previously used with is_full_segmentation. It is defined based on a Tiled client with data. Optionally a set of indices can be provided. This is useful for crafting data sets with any subsets of indices, but is also applied when mask information is given during training.

TiledMaskedDataset is a subclass of TiledDataset that additionally requires a Tiled client with mask information. This client is expected to contain a list of integers in within .metadata["mask_idx"], and contain actual mask data under the key "mask". Presence of both is asserted.

The parameter is_training is still in use for the function initialize_tiled_datasets (which has been moved from utils.py to tiled_dataset.py), but this may change in upcoming refactoring of IOParameters.

To summarize:
-TiledDataset(data_client, mask_client=None, is_training=False, is_full_segmentation=True)TiledDataset(data_client)
-TiledDataset(data_client, mask_client, is_training=False, is_full_segmentation=False)TiledDataset(data_client, mask_client.metadata["mask_idx"])
-TiledDataset(data_client, mask_client, is_training=True, is_full_segmentation=False)TiledMaskedDataset(data_client, mask_client)

We may need to consider bringing back the transform parameter that used to convert to torch.Tensor and adapt downstream procedures to operate on tensors rather than np.array.

Wiebke added 5 commits August 30, 2024 16:20
This is obsolete code that is never called, as `using_qlty` is set to `False` by default and and is also set to `False` in the only place where it is specified.
Obsolete code that is never applied.
New structure easily enables inference on a subset of the data.
Note: `partial_inference` function currently has no test.
@Wiebke Wiebke changed the title Refactor TiledDataset into TiledDataset and `TiledMaskedDataset Refactor TiledDataset into TiledDataset and TiledMaskedDataset Sep 6, 2024
@Wiebke Wiebke requested a review from TibbersHao September 6, 2024 02:26
Circumvents `AttributeError: 'PatchedStreamingResponse' object has no attribute 'background'`, updating Tiled to `v0.1.0b8` will resolve this too.
Copy link
Member

@TibbersHao TibbersHao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the hard work, the redesigned classes does clearly separate out cases for train vs full inference, and pytests run properly on my end.

In order to let the real pipeline run properly, one minor changes are needed:

In train.py line 211

image = dataset[idx]

a further slice is needed to take only the image part from the image mask pair tuple.

In utils.py line 38

assert io_parameters.mask_tiled_uri, "Mask URI not provided for training."

mask_tiled_uri should no longer be checked as this won't be provided for the full inference anymore, thus this should be taken out.

With that, I believe this PR is ready to be merged to the refactor branch.

@Wiebke
Copy link
Member Author

Wiebke commented Sep 7, 2024

Thanks for taking a look and testing the full pipeline on your end!
I fixed the issue in partial_inference by correcting the initialization of the dataset in that function. Due to accidentally setting is_training=True, a TiledMaskedDataset is initialized. With is_training=False, a TiledDataset that uses the mask indices for iteration, no tuple unpacking in train.py line 211 should be needed.

I would like to defer the correct initialization of io_parameters and changes to the validate_parameter function to refactoring the parameter setup and validation. This does indeed momentarily break the main function of segment.py (such that is runs partial inference), but this will be addressed then.

@Wiebke Wiebke requested a review from TibbersHao September 7, 2024 00:53
Copy link
Member

@TibbersHao TibbersHao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. Tested the train + quick inference and it runs properly on my end.

I agree that further refactor of the full inference part should be deferred.

This PR is ready to be merged.

@TibbersHao TibbersHao merged commit bd024fa into mlexchange:refactor-train Sep 7, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants