Cloud store support #129
Comments
We could consider obstore, which supports S3 and Azure: https://developmentseed.org/obstore/latest/ It's being integrated into Zarr in zarr-developers/zarr-python#1661, and I think that PR is very close to being completed. We might consider moving vcztools to Zarr Python v3, which would be low-risk since it only needs to read Zarr. (Bio2zarr writes Zarr, so it is best left on Zarr Python v2 for the time being.) Or we could at least require v3 for cloud support?
Yeah, v3 for vcztools does sound sensible, as we will want async chunk downloading too. Obstore looks promising. I guess we'd have to try out a few of these and see how well they work with the key clouds.
As a proof of concept I ran some of the unit tests using obstore (and Zarr #1661) for files on the local filesystem and they passed. I also uploaded a vcz to S3 and ran the following, using this modification to vcztools:

export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=...
vcztools view s3://sgkit-dev-data/sample.vcf.vcz/

This produced the expected output.
So it looks promising. It should be straightforward to use obstore only for cloud stores for the moment, and leave local files on the current code paths.
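A minimal sketch of that split, assuming the decision can be made from the URL scheme alone. The names `CLOUD_SCHEMES`, `is_cloud_path` and `choose_backend` are hypothetical and not part of vcztools; the set of schemes is an assumption too.

```python
from urllib.parse import urlparse

# Hypothetical set of URL schemes treated as cloud stores (assumption,
# not taken from vcztools): S3 and Azure are the targets discussed here.
CLOUD_SCHEMES = {"s3", "az"}


def is_cloud_path(path: str) -> bool:
    """Return True if the path looks like a cloud object-store URL."""
    scheme = urlparse(path).scheme.lower()
    return scheme in CLOUD_SCHEMES


def choose_backend(path: str) -> str:
    """Pick a backend name: obstore for cloud URLs, existing code path otherwise."""
    return "obstore" if is_cloud_path(path) else "local"
```

With this, `choose_backend("s3://sgkit-dev-data/sample.vcf.vcz/")` selects the obstore path, while a plain filesystem path keeps using the current local code.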
Amazing! Is this on Zarr 2 or Zarr 3?
Zarr-Python 3 - the obstore integration only works with that.
I did another test, and running the same command with Zarr-Python v2 and fsspec (specifically
Wow, literally no changes at all? Something to discuss, but I think it might be better to stick with one Zarr version across the different packages if we can, so the fsspec version is probably the easiest path for now.
I agree. We could release what we have now (which will work with S3), and then integrate obstore for Azure (and asyncio) later.
For this release we'll just document how to run on S3 using fsspec.
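For the documentation, the fsspec route could be sketched roughly as below. This assumes Zarr-Python v2 with `fsspec` and `s3fs` installed; `open_vcz_readonly` is a hypothetical helper for illustration, not the vcztools implementation.

```python
def open_vcz_readonly(url, **storage_options):
    """Open a Zarr group read-only through an fsspec mapper (Zarr-Python v2)."""
    # Imports are local so this module still imports when the optional
    # cloud dependencies are not installed.
    import fsspec
    import zarr

    # fsspec chooses the filesystem from the URL scheme; s3:// URLs
    # need the s3fs package to be available.
    store = fsspec.get_mapper(url, **storage_options)
    return zarr.open_group(store, mode="r")
```

With the `AWS_*` environment variables exported as in the proof of concept, something like `open_vcz_readonly("s3://sgkit-dev-data/sample.vcf.vcz/")` should pick up the credentials through the usual AWS credential chain.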
There are some good tools for mocking out S3 and Azure that we could use in CI.
We need to have cloud store support soon, so it would be good to spec out the options here.
From an interface perspective, I guess we'll need to support various side-channels for passing authentication tokens.
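One such side-channel is environment variables. A minimal sketch, assuming the standard `AWS_*` variable names and s3fs-style option keys (`key`, `secret`, `token`, `client_kwargs`); the helper `s3_storage_options` is hypothetical and only for illustration.

```python
import os


def s3_storage_options(env=None):
    """Build a storage-options dict from AWS environment variables.

    The key names follow s3fs conventions; treating them as the
    interface here is an assumption, not a decision made in this thread.
    """
    env = os.environ if env is None else env
    options = {}
    if env.get("AWS_ACCESS_KEY_ID"):
        options["key"] = env["AWS_ACCESS_KEY_ID"]
    if env.get("AWS_SECRET_ACCESS_KEY"):
        options["secret"] = env["AWS_SECRET_ACCESS_KEY"]
    if env.get("AWS_SESSION_TOKEN"):
        options["token"] = env["AWS_SESSION_TOKEN"]
    if env.get("AWS_DEFAULT_REGION"):
        options["client_kwargs"] = {"region_name": env["AWS_DEFAULT_REGION"]}
    return options
```

Other channels (command-line options, config files) could fill the same dict, so the store-opening code would not need to know where the credentials came from.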
I guess the first choice is whether to use fsspec or work directly with S3Map, ABStore, etc.
Fsspec would seem like less work, and should be performant enough for our purposes here?
Hmm, looks like fsspec doesn't support Azure directly though. As S3 and Azure are the most immediately important targets here, I wonder if there's much actual value in using fsspec.
@tomwhite any thoughts here?