[DATALAD RUNCMD] Remove unnecessary **emphasis in section headers
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "sed -e 's,# \\*\\*\\(.*\\)\\*\\*,# \\1,g' -i doc/design/s3-trailing-delete.md",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
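For readers less familiar with sed, the substitution recorded in `cmd` above can be sketched in Python. This is an illustrative equivalent, not part of the commit; the name `strip_header_emphasis` is hypothetical.

```python
import re

# Python counterpart of the sed expression 's,# \*\*\(.*\)\*\*,# \1,g'.
# (The doubled backslashes in the run record are JSON escaping.)
HEADER_EMPHASIS = re.compile(r"# \*\*(.*)\*\*")


def strip_header_emphasis(text: str) -> str:
    """Drop the ** ... ** wrapper that directly follows '# ' on a line."""
    # '.' does not match newlines by default, so every match stays
    # within a single line, mirroring sed's line-by-line processing.
    return HEADER_EMPHASIS.sub(r"# \1", text)


print(strip_header_emphasis("## **Requirements**"))
```

Because the pattern requires a literal `# ` immediately before the opening `**`, bold text inside ordinary paragraphs (such as `**backup**`) is left alone.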
yarikoptic committed Sep 7, 2023
1 parent 7ae9f9e commit f53cc75
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions doc/design/s3-trailing-delete.md
@@ -1,22 +1,22 @@
# **S3 Trailing Delete**
# S3 Trailing Delete

## **Why is "trailing delete" necessary?**
## Why is "trailing delete" necessary?

The core value of the DANDI Archive comes from the data we host. The process for getting this data into DANDI often involves coordination between several people to get an extremely large volume of data annotated with useful metadata and uploaded to our system. Because of the amount of time and work involved in this process, we need to minimize the risk of accidental data loss to the greatest extent that is possible and reasonable. Additionally, we would like to implement “garbage collection” in the future, which involves programmatically clearing out stale asset blobs from S3. All of this leads to a desire to be able to recover an S3 object that has been deleted.

Our ultimate goal is to prevent data loss from application programming errors. With protection such as a trailing delete capability, we will be safer in implementing application features that involve intentional deletion of data. Any bugs we introduce while doing so are far less likely to destroy data that was not supposed to be deleted.

The original GitHub issue around this feature request can be found at [https://github.com/dandi/dandi-archive/issues/524](https://github.com/dandi/dandi-archive/issues/524). Although the issue asks for a Deep Glacier storage tier, the design in this document solves the underlying problem differently (and in a more robust way). Below we also address the possible usage of a Deep Glacier tiered bucket as a solution to the orthogonal problem of data **backup**, which is distinct from the trailing delete capability described in this document.

## **Requirements**
## Requirements

- After deletion of an asset blob, there needs to be a period of 30 days during which that blob can be restored.

## **Proposed Solution**
## Proposed Solution

What we want can be described as a “trailing delete” mechanism. Upon deletion of an asset from the bucket, we would like the object to remain recoverable for some amount of time. S3 already supports this in the form of Bucket Versioning.

### **S3 Bucket Versioning**
### S3 Bucket Versioning

Enabling bucket versioning will change what happens when an object in S3 is deleted. Instead of permanently deleting the object, S3 will simply place a delete marker on it. At that point, the object is hidden from view and appears to be deleted, but still exists and is recoverable.
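
The delete-marker behavior described above can be sketched with boto3. This is a minimal illustration under stated assumptions, not the archive's implementation: the function names, bucket name, and key are hypothetical, `s3_client` is assumed to be a `boto3.client("s3")` instance, and pagination of `list_object_versions` is ignored for brevity.

```python
def enable_versioning(s3_client, bucket: str) -> None:
    """Turn on versioning so that deletes create delete markers."""
    s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )


def undelete_object(s3_client, bucket: str, key: str) -> bool:
    """Restore a deleted object by removing its current delete marker.

    Returns True if a delete marker was removed, False if none was found.
    """
    response = s3_client.list_object_versions(Bucket=bucket, Prefix=key)
    for marker in response.get("DeleteMarkers", []):
        # Prefix matching can return other keys, so compare the key exactly.
        if marker["Key"] == key and marker["IsLatest"]:
            # Deleting the marker itself (by VersionId) makes the previous
            # version of the object the current one again.
            s3_client.delete_object(
                Bucket=bucket, Key=key, VersionId=marker["VersionId"]
            )
            return True
    return False
```

With a real client this might be called as `undelete_object(boto3.client("s3"), "example-bucket", "blobs/abc")`, where both the bucket and key are placeholders. Passing the client in as a parameter keeps the sketch easy to exercise against a stub.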
