Support destroying non-empty buckets #22

Open
StarTerrarium opened this issue Jul 20, 2021 · 13 comments
@StarTerrarium

StarTerrarium commented Jul 20, 2021

It would be nice if there were an option on the b2_bucket resource similar to the aws_s3_bucket resource to specify force_destroy = true to allow for destroying non-empty buckets.

The AWS provider docs explain this option as:

force_destroy - (Optional, Default:false) A boolean that indicates all objects (including any locked objects) should be deleted from the bucket so that the bucket can be destroyed without error. These objects are not recoverable.
@ppolewicz
Collaborator

I don't think the B2 API supports that yet. For this to be reliable, it needs to be a transactional operation that locks the bucket while the files are being deleted, so that even if you have a thread or two uploading small files all the time, the deleting thread can catch up with them, wipe the bucket AND run the delete before anyone can put anything new in again. Not sure if AWS actually supports this well, do you know?

@StarTerrarium
Author

Interesting point. I just had a glimpse at the implementation in the AWS provider and it does seem to just recursively call itself until all objects are deleted, although I'm very unfamiliar with Go myself so I can't answer confidently. At least to me it does seem potentially a little unreliable if you had something writing to the bucket constantly.

For reference I was looking at this function - https://github.com/hashicorp/terraform-provider-aws/blob/6e5d33805b2d313ffa58d5237e335e0f58f42c38/aws/resource_aws_s3_bucket.go#L1369

For my use case this is acceptable, since I know we won't (or at least shouldn't) have anything writing to the buckets when we want to delete them. As it stands, we just have to run a script to manually delete files from the buckets prior to a terraform destroy.

@ppolewicz
Collaborator

I am not 100% sure if having a very easy way to delete buckets with data in them, without explicitly erasing the data, is what B2 terraform provider should provide, but since AWS provides it, maybe we could too?

@nhoughto

It's actually surprisingly difficult to delete a bucket (which I guess is by design?), so it would be wonderful if it became a vendor problem and you solved it for me =)

Got to solve for

  • All the objects obviously, paginating the list-objects call with max 1000 objects per call
  • Versioned objects
  • Multipart uploads
  • Any of the three above that were created whilst you were deleting them since there is no way to block writes to a bucket.
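
The checklist above can be sketched as a bounded loop. This is only an illustration against a hypothetical in-memory `FakeBucket` (not the B2 or S3 API): each round cancels unfinished uploads, takes a paginated snapshot of all versions, deletes the snapshot, and then looks again for anything written in the meantime. The names `FakeBucket`, `snapshot`, `wipe_bucket` and the `max_rounds` parameter are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBucket:
    """Hypothetical in-memory stand-in for a bucket; not a real B2 or S3 client."""
    versions: list = field(default_factory=list)            # (file_name, version_id) pairs
    unfinished_uploads: list = field(default_factory=list)  # multipart upload ids

    def list_versions(self, after=None, page_size=1000):
        """Return up to page_size versions that sort after the cursor."""
        ordered = sorted(self.versions)
        if after is not None:
            ordered = [v for v in ordered if v > after]
        return ordered[:page_size]

    def delete_version(self, version):
        self.versions.remove(version)

    def cancel_upload(self, upload_id):
        self.unfinished_uploads.remove(upload_id)


def snapshot(bucket, page_size):
    """Collect every version visible right now, one page at a time."""
    found, cursor = [], None
    while True:
        page = bucket.list_versions(cursor, page_size)
        if not page:
            return found
        found.extend(page)
        cursor = page[-1]  # continue after the last item of this page


def wipe_bucket(bucket, page_size=1000, max_rounds=3):
    """Return True only if a complete pass found nothing left to delete."""
    for _ in range(max_rounds):
        uploads = list(bucket.unfinished_uploads)
        versions = snapshot(bucket, page_size)
        if not uploads and not versions:
            return True  # bucket looked empty; deleting it may now succeed
        for upload_id in uploads:
            bucket.cancel_upload(upload_id)
        for version in versions:
            bucket.delete_version(version)
    return False  # something kept writing; give up instead of looping forever
```

Even when `wipe_bucket` returns `True`, a writer can still slip an object in before the bucket itself is deleted, which is the race discussed earlier in the thread. The round limit only keeps the cleanup from running indefinitely.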

@ppolewicz
Collaborator

Generally you should almost never need to delete a bucket. Usually a workflow which deletes a bucket is where the problem is.

A common trick for wiping a bucket is to use the b2 command-line tool: run b2 sync with --delete to sync an empty directory to the bucket.

To prevent writing, you'd have to revoke keys with permissions to the target bucket.

@nhoughto

We delete buckets constantly; it's a regular hourly occurrence, and this annoys us every day.

@ppolewicz
Collaborator

Ok, but why do you delete buckets? Can't you just wipe them and reuse the old ones?

@nhoughto

Because we use Terraform (not surprising 😬 ) to spin up ephemeral CI and review app environments for automated and exploratory testing, and each bucket is named after its environment. The environments are generated and ephemeral, so provisioning their resources needs ephemeral buckets, which means deleting the buckets when the environment is removed. We can't leave them in place, as we would hit the 100 bucket maximum (which has happened a few times).

We could clean and reuse buckets, but they would have the wrong names, and injecting that state into Terraform during provisioning is painfully complex. So we just keep Terraform happy and let it provision and destroy the buckets, but at the moment it can't delete the bucket because of this issue =)

@ppolewicz
Collaborator

Ok so first, your model only works if there are no more than 100 test environments at the same time. There is a better way.

What I would suggest is a bucket called XXX-testing-environments, where XXX is the name of your company or something, and then in that bucket you'd have a directory for each environment. The authentication key for the ephemeral server should be restricted to that bucket AND path, and it should only have read/write permissions (so no update_bucket, no setting retention policy etc).

When spinning down an environment you can sync an empty directory to it and, just to be sure, you could have a cron job using b2 sync with --dryRun and filters to detect any FileVersions that are older than, say, a week. This would detect a situation where a cleanup procedure failed and storage leaked (though unless your jobs crash a lot, you can run that job yearly ;) )
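
The leak check described above could look roughly like this. It is a sketch under assumptions: `FileVersion` here is a hypothetical two-field record (name and upload time) rather than the b2sdk class, the listing is assumed to have been fetched already, and each environment is assumed to live under its own top-level directory.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FileVersion:
    """Hypothetical record; a real listing carries more fields."""
    name: str              # e.g. "env-1234/artifacts/build.log"
    uploaded_at: datetime  # timezone-aware upload time


def find_leaked(versions, now, max_age=timedelta(days=7)):
    """Group file names older than max_age by their top-level directory."""
    leaked = {}
    for version in versions:
        if now - version.uploaded_at > max_age:
            environment = version.name.split("/", 1)[0]
            leaked.setdefault(environment, []).append(version.name)
    return leaked


if __name__ == "__main__":
    now = datetime(2021, 8, 1, tzinfo=timezone.utc)
    listing = [
        FileVersion("env-a/old.bin", now - timedelta(days=30)),
        FileVersion("env-b/fresh.bin", now - timedelta(hours=2)),
    ]
    # Only env-a holds something older than a week.
    print(find_leaked(listing, now))
```

Any environment that shows up in the result points at a spin-down that did not finish its cleanup.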

@nhoughto

But our Terraform definition, which it is important to get right, has N b2_bucket resources, one for each bucket. That definition is used in all our environments to provision the environment before it is used. I don't want to have a 'production terraform definition' and an 'other environments terraform definition'. I have one definition and do everything as much like production as possible to ensure consistency, which means every environment gets N buckets, and that's how I want it.

If I hit the 100 limit regularly I will be requesting that the limit be increased =) This is definitely the right approach for us, so supporting it would be great.

@ppolewicz
Collaborator

@nhoughto after internal discussion, we'll implement both wiping buckets before deletion and a successful return in terraform when a bucket that should be deleted is already gone. It's going to take a moment because some of this functionality needs to be done in b2sdk and then used in the terraform provider, so we'll have to make the changes in b2sdk first, release it, and then use the new library version in the terraform provider. But we'll get it :)
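
A minimal sketch of the "already gone counts as success" behaviour, written against a hypothetical client. `FakeClient`, `BucketNotFound` and `delete_bucket_idempotent` are names invented for this example and are not the b2sdk or provider API.

```python
class BucketNotFound(Exception):
    """Hypothetical error raised when the bucket does not exist."""


class FakeClient:
    """Hypothetical in-memory client used only to make the sketch runnable."""

    def __init__(self, buckets):
        self.buckets = set(buckets)

    def delete_bucket(self, name):
        if name not in self.buckets:
            raise BucketNotFound(name)
        self.buckets.remove(name)


def delete_bucket_idempotent(client, name):
    """Delete the bucket; report success even if it was already gone.

    Returns True if this call removed the bucket, False if it was missing.
    """
    try:
        client.delete_bucket(name)
    except BucketNotFound:
        return False  # desired end state already reached, so not an error
    return True
```

Treating a missing bucket as success means a destroy that is retried after a partial failure converges instead of erroring out.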

@nhoughto

Excellent, thanks! I imagine the 'don't fail if bucket already destroyed' part is easier than cleaning and removing a bucket. 👍

@ppolewicz
Collaborator

Correct. I can argue that with the existence of keys which are authorized to operate on multiple buckets and without the ability to lock buckets for writing, reliably cleaning and removing a bucket is impossible :) We'll do our best though. Not sure if we'll go that far, but I'm thinking that if we run a couple of rounds of deletions and new objects still appear in the bucket, going through keys and deleting those which are restricted to the bucket we are about to delete might be a practical way of stopping the writer (though it won't work if the key is not limited to a single bucket).
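
The key-revocation idea can be illustrated with a small selection function. This is a sketch, not the provider's implementation: `AppKey` is a hypothetical record, and the model assumes each key is either restricted to exactly one bucket or unrestricted (`bucket_id` is `None`). `writeFiles` is used as the write capability name.

```python
from dataclasses import dataclass
from typing import Optional

WRITE_CAPABILITY = "writeFiles"


@dataclass(frozen=True)
class AppKey:
    """Hypothetical application key record used for this sketch."""
    key_id: str
    bucket_id: Optional[str]  # None means the key is not limited to one bucket
    capabilities: frozenset


def keys_to_revoke(keys, bucket_id):
    """Split write-capable keys into revocable and unrestricted ones.

    Returns (revoke, unrestricted): ids of keys restricted to bucket_id, and
    ids of keys that can still write because they cover several buckets.
    """
    revoke, unrestricted = [], []
    for key in keys:
        if WRITE_CAPABILITY not in key.capabilities:
            continue  # read-only keys cannot add new objects
        if key.bucket_id == bucket_id:
            revoke.append(key.key_id)
        elif key.bucket_id is None:
            unrestricted.append(key.key_id)
    return revoke, unrestricted
```

A non-empty `unrestricted` list is the case where this approach cannot stop the writer, since deleting those keys would also break access to other buckets.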
