Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable metadata access for remote datasets. #1163

Merged
merged 17 commits into from
Sep 23, 2024
Merged

Conversation

markbader
Copy link
Contributor

@markbader markbader commented Aug 6, 2024

Description:

  • Implements getters and setters for the new metadata field in remote datasets and folders.

Issues:

Todos:

Make sure to delete unnecessary points or to check all before merging:

How to test:

Just run the example in accessing_metadata.py and enter your webknossos token when asked. After execution, the dataset l4_sample and the folder Datasets/Demo-Datasets should have an entry with metadata.

@markbader markbader self-assigned this Aug 6, 2024
@markbader markbader marked this pull request as ready for review August 8, 2024 09:37
@markbader markbader requested a review from fm3 August 22, 2024 13:30
Copy link
Member

@fm3 fm3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice that this is added so quickly :)

I added a couple of comments

webknossos/webknossos/client/api_client/models.py Outdated Show resolved Hide resolved
@@ -0,0 +1,32 @@
import webknossos as wk
from webknossos.client.api_client.models import ApiMetadata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The api_client.models classes should stay private to our own code, to enable future changes to the HTTP API without changing the user-exposed python API.

Please create an abstraction layer between ApiMetadata and the user-exposed metadata in Dataset and Folder classes. Maybe even without a type property, but using python types instead? The type property could be inferred in the conversion.
Maybe it could even be just a dict? Not sure about that, maybe @philippotto or @MichaelBuessemeyer have a preference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, a dict mapping from key to primitive would be cool, but I'm not sure it's worth the complexity. in the end, this api should be easy and clear to use. directly using primitives might make people believe that they can also put python objects there. so, it would definitely need thorough validation.
I don't have a strong opinion about this, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided against a dict so far because it seems that the key is not unique so far and as philipp already mentioned the value types must be validated somehow to allow conversion to TypeScript types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because it seems that the key is not unique so far

to which implementation are you referring? the wklibs one? because in wk, it should be unique (if not, we should fix it 😅)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wk one, it seems that the creation of duplicates is restricted by the frontend but not in the backend. I can send the same key-value pair multiple times with the API without an error. I just tested it with the l4_test_mark dataset.
Screenshot 2024-09-05 at 09-58-33 WEBKNOSSOS

@MichaelBuessemeyer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for entering the thread so late

because in wk, it should be unique (if not, we should fix it 😅)

The frontend does the validation that no key is duplicated. The backend just takes what the frontend sends and saves & serves it.

Do you think a validation would be helpful? I can certainly build that :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think a validation would be helpful?

I changed the wk-libs implementation to use a dict now. So when using the libs as intended, the key should be unique too. Nevertheless, a validation would be helpful to ensure that the API is used correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think a validation would be helpful? I can certainly build that :)

We should either fully support duplicates, or ensure that none arise. So with the current design, I’d say yes, the backend should assert this in the update routes. Could you create an issue for that? It’s not super urgent, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created an issue for this: scalableminds/webknossos#8068

And as I wanted a rather quick and simple task right now, I simply started the implementation 🙈

@markbader markbader requested a review from fm3 September 10, 2024 07:23
Copy link
Member

@fm3 fm3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good :)
I tested and the access in the style of the example worked for me. However, intuitively, I first tried dataset.metadata["test"] = "asdf", which did not error, but also did have any effect. Do you think it would be simple to support this case as well? (Or alternatively somehow freeze this object, so that an error is provided, suggesting the other access style?) If this is very hard to achieve, then this current version is also ok.

@markbader
Copy link
Contributor Author

I first tried dataset.metadata["test"] = "asdf", which did not error, but also did have any effect. Do you think it would be simple to support this case as well?

As it was currently just a dict neither freezing nor overwriting __setitem__ is super easy without unwanted side effects. I implemented a subclass of dict now, that fetches data on each access and updates the data when the dict is updated. This might be more intuitive, but it is also slower. I guess we don't want to update the metadata very often, so this might be okay. What is your opinion on this approach?

@fm3
Copy link
Member

fm3 commented Sep 16, 2024

Interesting idea! I like this access pattern from the user perspective. However, I don’t know enough about python and its conventions to fully know what unwanted side-effects this specialized dict may have 🤔 Also I’m not certain I can review its implementation. Maybe we can get a second opinion? Maybe @daniel-wer could also have a look?

_T = TypeVar("_T", bound="Metadata")


class Metadata(dict):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered implementing a MutableMapping directly? I think that would remove some trouble with the persistence that a dict does automatically.

Suggested change
class Metadata(dict):
class Metadata(MutableMapping):

Copy link
Member

@fm3 fm3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested again and it works for me now :)
@normanrz could you do a final review round please?

Copy link
Member

@normanrz normanrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine from my side!

@bulldozer-boy bulldozer-boy bot merged commit e237515 into master Sep 23, 2024
19 checks passed
@bulldozer-boy bulldozer-boy bot deleted the accessing-metadata branch September 23, 2024 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add dataset/folder metadata
5 participants