
[Feature]: Add S3 backend capability for off-site copying of sparsebundle images. #134

Open
sbates130272 opened this issue May 22, 2023 · 12 comments

Comments


sbates130272 commented May 22, 2023

What problem are you looking to solve?

Off-site backup of sparsebundles via AWS S3 or similar.

Describe the solution that you have in mind

Can we use something inside the image, or a filesystem that leverages an S3 back-end, to ensure sparsebundles are copied to the cloud for off-prem security/safety? For example, it might be as simple as ensuring the container volume mount for the backup data resides on something like s3fs-fuse. What might be interesting about that particular approach is that it removes the need for a local copy of the data and thus allows very small devices with no external storage to act as a Time Machine target. Though I am not sure what this means for performance, and some of the notes on POSIX limitations could be "interesting".

I will do some research on this and see what I can get working. We might have to be careful not to accidentally leak AWS credentials, and I would prefer something that is not tied to AWS and allows other solutions (like MinIO).

Additional Context

No response

sbates130272 changed the title from "[Feature]:" to "[Feature]: Add S3 backend capability for off-site copying of sparsebundle images." on May 22, 2023
mbentley (Owner) commented May 22, 2023

I am not aware of any method for having Samba backed by anything S3-related, although I haven't looked. I doubt it, though: even if there were a backend that natively worked with both S3 and Samba, it would likely be extremely poor from a speed perspective, but I could be wrong. Otherwise, it would rely on other solutions to sync data to S3, but I also don't know the structure behind the sparsebundles well enough to know whether a sync to S3 would end up copying excessive amounts of data.

sbates130272 (Author) commented May 22, 2023

Thanks @mbentley for the quick response. Let me take a look at this and also do some performance testing to see just how feasible this is. I see two options:

  1. A FUSE-based filesystem like s3fs-fuse that removes the need for local storage and uses S3 objects for the filesystem storage. This could be slow and needs to be tested (see the mount sketch after this list).
  2. A process inside the container that syncs sparsebundle files to an S3 bucket at user-defined intervals.
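
For option 1, a minimal sketch of the host-side mount, assuming a placeholder bucket name, credentials file, and mount point (not something I have verified against this image yet):

# credentials file in the format ACCESS_KEY_ID:SECRET_ACCESS_KEY
echo "ACCESS_KEY_ID:SECRET_ACCESS_KEY" > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs

# mount the bucket; use_xattr matters because Samba/Time Machine relies on extended attributes
mkdir -p /mnt/timemachine
s3fs my-tm-bucket /mnt/timemachine -o passwd_file=${HOME}/.passwd-s3fs -o use_xattr -o url=https://s3.amazonaws.com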

@sbates130272 (Author)

So I am testing some time machine backups to a volume mount in the container, where that volume mount is also an s3fs FUSE mountpoint on the host, backed by an AWS S3 bucket. I will let you know how it goes.
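
For reference, a rough sketch of that setup, assuming the s3fs mount from above at /mnt/timemachine and the mbentley/timemachine image name (both placeholders for whatever is actually in use):

# bind-mount the s3fs-backed path into the container as the Time Machine data directory
docker run -d --restart unless-stopped \
  --name timemachine \
  -p 445:445 \
  -v /mnt/timemachine:/opt/timemachine \
  mbentley/timemachine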

@mbentley (Owner)

Good deal, thanks! That'll be interesting to see, as Samba + TM can be a bit picky about the underlying filesystem at times, particularly around extended attributes.

@sbates130272 (Author)

@mbentley, yes I am seeing some issues with xattr. Digging into that now to see if it is a showstopper or if we can do something to address it. Cheers.

@sbates130272 (Author)

@mbentley so the lack of appropriate xattr support in s3fs does seem to be a showstopper for now. There might be some clever way around it, but I am not sure what that would be. Another option is to add a cron job to your Dockerfile that uploads the sparsebundles to AWS at a user-defined interval. I could also do that outside your Docker image if I wanted, so the question is: would you consider such a feature in your Docker image an acceptable enhancement?

@mbentley (Owner)

I am curious about your thoughts on how you might include support for that, as I wouldn't necessarily be opposed to it. It seems like something that would be easy enough to have disabled by default and enabled with an env var, along with the appropriate keys and whatnot. Something like crond being added to the image and started through s6 like the other services wouldn't be difficult, and if someone doesn't enable it via the env var, the s6 run script would just skip starting crond.
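
As a rough sketch only (nothing that exists in the image today; the S3_SYNC_ENABLE variable and the /etc/s6/crond path are hypothetical), the opt-in gate could look something like this:

#!/bin/bash
# hypothetical /etc/s6/crond/run - only start crond if the user opted in via env var
if [ "${S3_SYNC_ENABLE:-false}" != "true" ]; then
  echo "INFO: S3 sync not enabled; crond will not be started"
  # keep the service idle so s6 doesn't respawn-loop it when disabled
  exec sleep 86400
fi
# BusyBox uses crond -f; a Debian-based image would exec cron -f instead
exec crond -f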

@sbates130272 (Author)

That's exactly what I was thinking, @mbentley! Off by default, then enabled via env variables, using a similar mechanism to provide the AWS credentials, plus another variable to set the desired backup schedule and some cron variant added to the image. Let me see if I can code up a prototype this week for you to take a look at.
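
On the cron side, a minimal sketch of what such a prototype might generate (variable names, schedule, bucket, and destination path are all hypothetical placeholders):

# hypothetical crontab entry written at container start from env vars such as
# S3_SYNC_SCHEDULE="0 3 * * *" and S3_SYNC_BUCKET="s3://my-tm-bucket"
0 3 * * * aws s3 sync /opt/timemachine s3://my-tm-bucket/timemachine --only-show-errors
# rclone would work just as well here for non-AWS endpoints like MinIO, e.g.:
# 0 3 * * * rclone sync /opt/timemachine minio:my-tm-bucket/timemachine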

bitte-ein-bit commented Sep 21, 2023

I'm currently just looking into this image, so I can't add anything specific yet. What I do for all my docker-compose environments so far is the following:

  • have it run on a btrfs volume
  • snapshot the volume daily
  • sync the snapshot to S3 using rclone, but only every four days. The main reason for only every four days is cost.
#!/bin/bash
# take a daily read-only btrfs snapshot, keyed by weekday, and sync it to S3 every fourth day
day=$(date +%a)
TARGET_DIR=/home/pi/backup/snapshot_$day
# delete last week's snapshot of the same name if it exists
btrfs subvolume show "$TARGET_DIR" &>/dev/null && btrfs subvolume delete "$TARGET_DIR"
btrfs subvolume snapshot -r /home "$TARGET_DIR"

# only sync when the day of the year is a multiple of four (to keep S3 costs down);
# force base 10 so leading zeros in %j (e.g. 008) aren't treated as octal
[[ $((10#$(date +%j) % 4)) != 0 ]] && exit 0
rclone sync "$TARGET_DIR/pi" aws:bucket-name -v --exclude='**/node_modules/**' --exclude='**/__pycache__/**' --update --use-server-modtime --links
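
A crontab line like the following (the script path and time are placeholders) could then drive it daily:

# run the snapshot + conditional rclone sync every night at 02:30
30 2 * * * /home/pi/backup/snapshot-and-sync.sh >> /var/log/tm-snapshot.log 2>&1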

I'd go with a similar approach here: use an external, battle-tested tool to sync to S3. If the volume is used by multiple Macs or the bandwidth is low, I'd suggest also implementing snapshotting at the volume level to ensure consistency.

EDIT: btrfs not ZFS... too tired I guess.

Alex1s (Contributor) commented Dec 18, 2023

@sbates130272 How was xattr a problem with s3fs? Currently the s3fs README.md advertises extended attribute support. Maybe extended attribute support was added after you did research into this? Or what exactly was the problem?
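
For what it's worth, a quick way to check basic xattr behaviour on an s3fs mount from the shell (mount point is a placeholder; setfattr/getfattr come from the attr package):

cd /mnt/timemachine                          # wherever the bucket is mounted
touch xattr-test
setfattr -n user.test -v hello xattr-test    # write an extended attribute
getfattr -n user.test xattr-test             # read it back; should print user.test="hello"
rm xattr-test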

@kaplan-michael

I have used this fstab config:

backups /srv/backups fuse.s3fs _netdev,passwd_file=/etc/passwd-s3fs,allow_other,use_path_request_style,use_xattr,nodev,nosuid,complement_stat,url=https://minio-address:9000 0 0

Note the use_xattr option. The xattr check passed fine:

[timemachine] | INFO: running test for xattr support on your time machine persistent storage location...
[timemachine] | INFO: xattr test successful - your persistent data store supports xattrs
[timemachine] | INFO: Detected filesystem for /opt/timemachine is fuse.s3fs
[timemachine] | INFO: entrypoint complete; executing 's6-svscan /etc/s6'

But when uploading/downloading files, it doesn't seem happy...

[timemachine] | ad_convert_xattr: SMB_VFS_CREATE_FILE failed
[timemachine] | fruit_freaddir_attr: ad_convert("tmp pics") failed
[timemachine] | ad_convert_xattr: SMB_VFS_CREATE_FILE failed
[timemachine] | fruit_create_file: ad_convert("tmp pics:AFP_AfpInfo") failed
[timemachine] | ad_convert_xattr: SMB_VFS_CREATE_FILE failed
[timemachine] | fruit_freaddir_attr: ad_convert("tmp pics") failed
[timemachine] | ad_convert_xattr: SMB_VFS_CREATE_FILE failed
[timemachine] | fruit_freaddir_attr: ad_convert("tmp pics") failed
[timemachine] | ad_convert_xattr: SMB_VFS_CREATE_FILE failed
[timemachine] | fruit_freaddir_attr: ad_convert("tmp pics") failed

Weirdly, I was able to upload some files through SMB, then download them raw from S3 elsewhere, and they were just fine, so I'm not sure.

I'll try to run some timemachine backups on it and see what happens.
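
Those ad_convert_xattr / fruit_* errors appear to come from Samba's fruit + streams_xattr modules; by default streams_xattr stores named streams as xattrs prefixed with user.DosStream., so one thing worth checking is whether s3fs accepts that style of attribute name at all (mount point and file name are placeholders):

cd /opt/timemachine
touch stream-test
# the prefix and the ':$DATA' suffix mimic what streams_xattr generates by default;
# some backends reject names like this or limit xattr size
setfattr -n 'user.DosStream.AFP_AfpInfo:$DATA' -v test stream-test \
  && echo "stream-style xattr accepted" \
  || echo "stream-style xattr rejected"
getfattr -d stream-test
rm stream-test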

@kaplan-michael

So it seems it is still missing some required attr functions.
[Screenshot attached: 2024-11-06 at 6:59:08]
