Allow renaming datasets & dataset with duplicate names #8075

Open
wants to merge 83 commits into
base: master
from

Conversation

MichaelBuessemeyer
Contributor

@MichaelBuessemeyer MichaelBuessemeyer commented Sep 12, 2024

Further Notes:

  • Many of the line changes result from moving the ObjectId class to the utils package so that all wk backend servers have access to it.

URL of deployed dev instance (used for testing):

  • https://___.webknossos.xyz

Steps to test:

  • Give two datasets the same name and check whether annotations and so on still work
  • Test whether the task system still works with duplicate dataset names
  • Check dataset upload
    • dataset upload
    • add remote
    • compose
  • ...

TODOs:

  • Add evolution and reversion
    • testing needed
  • Test uploading:
    • Report upload fails
  • Adjust worker to newest job arguments as the dataset name can no longer be used to uniquely identify a dataset
  • Rename organization_name in worker to organization_id (see "Rename organization_name to organization_id in worker args", #8038)
  • Dataset Name settings field has an unwanted spinner (see upload view)
  • Check the job list
  • Properly implement legacy searching for datasets when old URI param is used
  • Adjust legacy API routes to return dataset in old format
    • It is just an additional field. Thus, I would say it should be fine.
  • Datasets appear to be duplicated in the db
    • Maybe these are created by jobs with an output dataset
  • Fix dataset insert
  • Skeleton & VolumeTracings address a dataset via its name
    • Not really used, only during task / annotation creation
    • Use a heuristic upon upload and temporarily patch the Tracing case classes to carry the datasetId during the creation process once the dataset has been identified.
    • Task creation works
    • Needs testing
      • fix annotation upload
    • Needs to support old NMLs
  • Put datasetId into newly created nmls
  • In the backend LinkedLayerIdentifier still uses the datasetName as an identifier
    • Used in wklibs. Maybe just interpret the name as a path and work with that. If it cannot be found, the user needs to update wklibs. Add a comment for this!
  • [ ] The dataset C555_tps_demo has quite a few bucket loading errors. Unsure why some buckets do not work. The dataset seems to be broken. Could reproduce this on other branches.
  • Notion-style URLs are missing (i.e. -, but only the id part is actually used)
  • Maybe remove DatasetURIParser
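
The legacy lookup items above (old URI params, old NMLs without a datasetId) could be handled along the following lines. This is only a sketch under stated assumptions: DatasetRecord, Lookup and LegacyDatasetLookup are hypothetical names and not classes from this PR, and the in-memory list stands in for a database query. It prefers the id when one is present and falls back to the name within the organization, reporting ambiguity instead of guessing.

```scala
// Hypothetical types for illustration only, not the PR's actual classes.
final case class DatasetRecord(id: String, organizationId: String, name: String)

sealed trait Lookup
final case class Found(dataset: DatasetRecord) extends Lookup
final case class Ambiguous(candidates: List[DatasetRecord]) extends Lookup
case object NotFound extends Lookup

object LegacyDatasetLookup {
  // Prefer the id (new NMLs / new URLs). If no id is given, or the id is
  // unknown, fall back to the name within the organization (old NMLs / old URLs).
  def resolve(datasets: List[DatasetRecord],
              organizationId: String,
              datasetId: Option[String],
              datasetName: String): Lookup =
    datasetId.flatMap(id => datasets.find(_.id == id)) match {
      case Some(dataset) => Found(dataset)
      case None =>
        datasets.filter(d => d.organizationId == organizationId && d.name == datasetName) match {
          case single :: Nil => Found(single)
          case Nil           => NotFound
          // With duplicate names allowed, a name alone may match several datasets.
          case many          => Ambiguous(many)
        }
    }
}
```

Returning an explicit Ambiguous case leaves the decision (pick the oldest, ask the user, fail) to the caller, which seems safer than silently choosing one of several datasets with the same name.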

Issues:


(Please delete unneeded items, merge only when none are left open)

@MichaelBuessemeyer
Contributor Author

@coderabbitai pause

as the PR is too large anyway


coderabbitai bot commented Oct 29, 2024

@MichaelBuessemeyer: Understood, I will pause the automatic reviews for this PR.

✅ Actions performed

Reviews paused.

@MichaelBuessemeyer MichaelBuessemeyer marked this pull request as ready for review October 29, 2024 15:13
@MichaelBuessemeyer
Contributor Author

I think this should be ready to go now. If you notice any errors, please tell me 🙏

@MichaelBuessemeyer MichaelBuessemeyer changed the title from "WIP: Allow renaming datasets & dataset with duplicate names" to "Allow renaming datasets & dataset with duplicate names" Oct 29, 2024
Member

@fm3 fm3 left a comment

Really cool stuff! I had a look at the backend changes and added a couple of comments.
Most things are pretty small. Two bigger things I noticed:

  • I’m not really sure about the reserveUpload protocol. It seems pretty involved, with identical fields being sent back and forth, with some values filled in remotely. Maybe we can simplify this, possibly using multiple requests
  • Looks like URIs with normalized name and id are not yet used? At least the OpenGraphService doesn’t seem to parse URIs in that way. Did you build that into the frontend? If not, did you talk to Norman about this? (I believe he had wished for this feature?)
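
For reference, parsing a URI segment made of a normalized name plus id could look roughly like the sketch below. The assumptions are mine and are not confirmed by this PR: ids are taken to be 24-character lowercase hex strings, the segment is taken to have the id as its final dash-separated part, and DatasetUriSegment is a hypothetical helper name. Only the id part is used for identification, and the name part is purely cosmetic.

```scala
// Hypothetical helper for illustration, not part of the PR.
object DatasetUriSegment {
  // Assumption: ids are 24-character lowercase hex strings.
  // Group 1 is the optional name prefix, group 2 is the id.
  private val NameAndId = "^(?:(.*)-)?([0-9a-f]{24})$".r

  // Returns (optional name part, id), or None if the segment carries no id,
  // in which case a caller could fall back to a legacy name-based lookup.
  def parse(segment: String): Option[(Option[String], String)] =
    segment match {
      case NameAndId(name, id) => Some((Option(name), id)) // name is null when absent
      case _                   => None
    }
}
```

Because the name part is ignored for lookup, renaming a dataset would not break old links as long as the id suffix stays intact.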

I did not do any testing in this review round. I already tested a little earlier, and will do another round of testing in a later iteration.

app/controllers/DatasetController.scala (outdated, resolved)
app/controllers/DatasetController.scala (outdated, resolved)
app/controllers/DatasetController.scala (outdated, resolved)
app/models/dataset/Dataset.scala (resolved)
app/models/dataset/Dataset.scala (outdated, resolved)
}

object DataSourceId {
implicit val dataSourceIdFormat: Format[DataSourceId] = Json.format[DataSourceId]
object DataSourceId extends JsonImplicits {
Member

Is the "extends JsonImplicits" really needed? Do you know what is used from there?

@@ -51,23 +54,28 @@ object ReserveManualUploadInformation {

case class LinkedLayerIdentifier(organizationId: Option[String],
organizationName: Option[String],
// Filled by backend after identifying the dataset by name. Afterwards this updated value is stored in the redis database.
Member

I’m not really happy with this protocol of sending most of the reserve info back and forth. I feel like we need to be very careful now about what info (before or after) we are using in which spot. Maybe it would be clearer if we send multiple requests, so that each request has a clear concern (reserve, get unique names, etc). I don’t really have a clear plan in mind yet. Maybe let’s talk about this in person again.
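
One possible shape for "one concern per request" is sketched below, purely as a discussion aid. All names (LinkedLayerRef, ResolvedLayer, ReserveUpload, Reservation, UploadSteps) are hypothetical and do not match the PR's case classes, and the lookup and id generator are passed in as plain functions instead of real services. The point is that name-to-id resolution happens in its own step, so the reserve step only ever sees ids and never has to distinguish "before" from "after" values.

```scala
// Hypothetical types for illustration only.
final case class LinkedLayerRef(organizationId: String, datasetName: String, layerName: String)
final case class ResolvedLayer(datasetId: String, layerName: String)
final case class ReserveUpload(uploadId: String, datasetName: String, layers: List[ResolvedLayer])
final case class Reservation(uploadId: String, newDatasetId: String)

object UploadSteps {
  // Step 1: turn name-based references into ids. Fails with the first
  // reference that cannot be resolved, otherwise returns all resolved layers.
  def resolveLinkedLayers(refs: List[LinkedLayerRef],
                          lookup: LinkedLayerRef => Option[String]): Either[String, List[ResolvedLayer]] = {
    val results: List[Either[String, ResolvedLayer]] = refs.map { ref =>
      lookup(ref)
        .map(id => ResolvedLayer(id, ref.layerName))
        .toRight(s"Could not resolve dataset ${ref.organizationId}/${ref.datasetName}")
    }
    results
      .collectFirst { case Left(error) => error }
      .toLeft(results.collect { case Right(layer) => layer })
  }

  // Step 2: reserve the upload. The request already contains only ids,
  // and the response contains only what is new (the generated dataset id).
  def reserve(request: ReserveUpload, freshId: () => String): Reservation =
    Reservation(request.uploadId, freshId())
}
```

This costs one extra round trip per upload, which is the trade-off against the current single request whose fields are partly filled in remotely.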

conf/webknossos.versioned.routes (outdated, resolved)
app/controllers/UserTokenController.scala (outdated, resolved)
Development

Successfully merging this pull request may close these issues.

Allow to rename datasets
3 participants