Design doc - Publish Dandisets that contain Zarr archives #1833

Closed
wants to merge 27 commits into from

kabilar
Member

@kabilar kabilar commented Jan 26, 2024

@kabilar kabilar self-assigned this Jan 26, 2024
@kabilar kabilar changed the title from "Requirements doc - Support updates to Zarr archives after publishing the corresponding Dandiset" to "Design doc - Support updates to Zarr archives after publishing the corresponding Dandiset" Jan 26, 2024
@kabilar kabilar changed the title from "Design doc - Support updates to Zarr archives after publishing the corresponding Dandiset" to "Design doc - Publish Dandisets that contain Zarr archives, and support updates to the Zarr archive after publishing the Dandiset" Jan 31, 2024
@kabilar
Member Author

kabilar commented Feb 1, 2024

cc @aaronkanzer

@waxlamp waxlamp added the design-doc Involves creating or discussing a design document label Feb 8, 2024
.gitignore Outdated Show resolved Hide resolved
Member

@waxlamp waxlamp left a comment

Leaving this review as a comment because it seems this is still WIP. I will also throw the switch to convert this PR to a draft.

doc/design/zarr-publish-1.md Outdated Show resolved Hide resolved

## Current implementation

When a blob asset is updated, a new version (i.e. a copy) is uploaded to the S3 bucket. Zarr archives are too large so multiple copies should not be created. A Zarr archive is uploaded once and it is updated in place. This design means that the Zarr archive is immutable once the Dandiset is published, so that the published Dandiset is immutable. Currently, a Dandiset cannot be published if it contains a Zarr asset. For more details, see the [zarr-support-3 design doc](https://github.com/dandi/dandi-archive/blob/master/doc/design/zarr-support-3.md).
Member

Suggested change
When a blob asset is updated, a new version (i.e. a copy) is uploaded to the S3 bucket. Zarr archives are too large so multiple copies should not be created. A Zarr archive is uploaded once and it is updated in place. This design means that the Zarr archive is immutable once the Dandiset is published, so that the published Dandiset is immutable. Currently, a Dandiset cannot be published if it contains a Zarr asset. For more details, see the [zarr-support-3 design doc](https://github.com/dandi/dandi-archive/blob/master/doc/design/zarr-support-3.md).
When a non-Zarr asset blob is updated, a new copy of that file is uploaded to the S3 bucket. Zarr archives are too large so multiple copies should not be created. A Zarr archive is uploaded once and it is updated in place. This design means that the Zarr archive is immutable once the Dandiset is published, so that the published Dandiset is immutable. Currently, a Dandiset cannot be published if it contains a Zarr asset. For more details, see the [zarr-support-3 design doc](https://github.com/dandi/dandi-archive/blob/master/doc/design/zarr-support-3.md).

Updating to use more accurate domain language (there isn't such a thing as a "blob asset", and the word "version" has a specific meaning in the context of DANDI).

Member

Essentially, we disallow publishing of Zarr-containing Dandisets because we don't want to create copies of Zarrs if and when they are updated, and we're seeking a design that would allow us to do so. You might be able to condense this paragraph down to express that more directly; the sentence in the middle ("this design means...") seems a bit out of place in particular.
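
A minimal sketch of the rule described in this comment, for illustration only. The `Asset` and `DraftVersion` types and the `can_publish` function are hypothetical names, not the dandi-archive models or API; the sketch only assumes that each asset knows whether it is backed by a Zarr archive.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical stand-in for an asset in a draft version."""
    path: str
    is_zarr: bool = False


@dataclass
class DraftVersion:
    """Hypothetical stand-in for the draft version of a Dandiset."""
    assets: list = field(default_factory=list)


def can_publish(version: DraftVersion) -> bool:
    # Current behavior as described in the thread: a Dandiset cannot be
    # published if it contains any Zarr asset, because Zarr archives are
    # updated in place and are never copied.
    return not any(asset.is_zarr for asset in version.assets)
```

Under this rule, a draft that holds only non-Zarr assets is publishable, while adding a single Zarr asset blocks publishing. The design being sought would relax this check without requiring copies of the Zarr archive.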

Member

I would add a few requirements:

  • updating/publishing in a Dandiset
  • adding the asset to another Dandiset
  • being able to publish both

For implementation:

  • We currently don't have a read-only mode for an asset, and any modification of a Zarr added to another Dandiset should fail unless control is handed over (more complicated).
  • Adding should offer either a link (read-only) or a copy option. On the blob side we have copy-on-write, but for assets this could get complicated for large trees.
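
The link-versus-copy idea above could be sketched roughly as follows. All names here (`AddMode`, `ZarrArchive`, `ZarrAsset`, `add_zarr_to_dandiset`, `check_writable`) are hypothetical and not part of dandi-archive; the copy branch only creates a new record and does not model copying the underlying files, which is the expensive part for large trees.

```python
from dataclasses import dataclass
from enum import Enum


class AddMode(Enum):
    LINK = "link"  # share the existing Zarr, read-only in the target Dandiset
    COPY = "copy"  # create an independent Zarr owned by the target Dandiset


@dataclass
class ZarrArchive:
    zarr_id: str
    owner_dandiset: str


@dataclass
class ZarrAsset:
    dandiset: str
    zarr: ZarrArchive
    read_only: bool


def add_zarr_to_dandiset(zarr: ZarrArchive, target_dandiset: str, mode: AddMode) -> ZarrAsset:
    if mode is AddMode.LINK:
        # The target Dandiset references the same archive but may not modify it.
        return ZarrAsset(dandiset=target_dandiset, zarr=zarr, read_only=True)
    # Assumption: a copy gets a new identifier and is owned by the target.
    copied = ZarrArchive(zarr_id=f"{zarr.zarr_id}-copy", owner_dandiset=target_dandiset)
    return ZarrAsset(dandiset=target_dandiset, zarr=copied, read_only=False)


def check_writable(asset: ZarrAsset) -> None:
    # Any modification through a linked (read-only) asset should fail.
    if asset.read_only:
        raise PermissionError(
            f"Zarr {asset.zarr.zarr_id} is read-only in Dandiset {asset.dandiset}"
        )
```

In this sketch, handing over control would amount to changing `owner_dandiset` and flipping `read_only`, which is the more complicated case mentioned above and is left out.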

The publishing procedure would follow the description found in the [publish-1 design doc](https://github.com/dandi/dandi-archive/blob/master/doc/design/publish-1.md). A modified publishing procedure that includes Zarr archive(s) is summarized below.

1. User uploads a new Dandiset which includes a Zarr archive(s).
2. User uploads an updated Zarr archive(s) to the `Draft` version of the Dandiset.
Member

This step seems out of place in the use case description. We can already do this part (update a Zarr that is in a draft version), and the important novelty is in step 3 and beyond.

Member Author

I reordered steps 2 and 3, which would reflect functionality that is currently not in place (i.e. publishing a Dandiset with a Zarr archive and subsequently updating the Draft version). Please let me know if I misunderstood your comment.

doc/design/zarr-publish-1.md Outdated Show resolved Hide resolved
doc/design/zarr-publish-1.md Outdated Show resolved Hide resolved
doc/design/zarr-publish-1.md Outdated Show resolved Hide resolved
@waxlamp waxlamp marked this pull request as draft February 8, 2024 15:54
kabilar and others added 2 commits February 13, 2024 09:09
@kabilar kabilar changed the title from "Design doc - Publish Dandisets that contain Zarr archives, and support updates to the Zarr archive after publishing the Dandiset" to "Design doc - Publishing Dandisets that contain Zarr archives" Feb 14, 2024
@kabilar kabilar changed the title from "Design doc - Publishing Dandisets that contain Zarr archives" to "Design doc - Publish Dandisets that contain Zarr archives" Feb 14, 2024
@kabilar
Member Author

kabilar commented Oct 21, 2024

Closing this pull request as I am re-forking dandi/dandi-archive directly to kabilar. Will reopen a new pull request.

@kabilar kabilar closed this Oct 21, 2024
Labels
design-doc Involves creating or discussing a design document
Projects
Status: Done
4 participants