
Allow hub access to production veda data store #53

Closed · 1 task done
anayeaye opened this issue Jul 30, 2024 · 12 comments · Fixed by 2i2c-org/infrastructure#4533

@anayeaye commented Jul 30, 2024

**What**

Allow hubs to read production objects in veda-data-store. We now have a stable production catalog and S3 data store, and we need to update our notebook examples to refer to the same data that users see in the dashboard.

**Notes**

In MCP I have updated the veda-data-store bucket policy to allow GetObject and ListBucket for these roles: "arn:aws:iam::444055461661:role/nasa-veda-prod", "arn:aws:iam::444055461661:role/nasa-veda-staging".
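
For reference, a minimal sketch of what that read-only statement could look like (the bucket name and role ARNs are the ones above; the Sid and the surrounding policy structure are illustrative, not the actual MCP policy):

```sh
# Illustrative sketch only -- NOT the actual MCP bucket policy.
# Grants read-only access (GetObject + ListBucket) to the two hub roles.
cat > /tmp/veda-read-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowVedaHubReadOnly",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::444055461661:role/nasa-veda-prod",
          "arn:aws:iam::444055461661:role/nasa-veda-staging"
        ]
      },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::veda-data-store",
        "arn:aws:s3:::veda-data-store/*"
      ]
    }
  ]
}
EOF
# Applying it requires admin credentials in the account that owns the bucket:
aws s3api put-bucket-policy --bucket veda-data-store --policy file:///tmp/veda-read-policy.json
```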

I think the hub already has full Get, List, and Put set up for staging, so the update might go here, even though we do not want hub users to be able to Put to production (the bucket policy will not allow that operation anyway): https://github.com/2i2c-org/infrastructure/blob/main/terraform/aws/projects/nasa-veda.tfvars#L47
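
From a hub terminal, it is worth confirming which of those roles the session actually assumes before testing; a quick check:

```sh
# The reported ARN should correspond to nasa-veda-staging or nasa-veda-prod.
aws sts get-caller-identity --query Arn --output text
```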

**AC**

- [x] hub users can access the production bucket

**Testable with**

The Download STAC Assets notebook should work using the production STAC_API_URL = "https://openveda.cloud/api/stac" when run in the hub.
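
As a lighter-weight check of the production catalog itself, the STAC API can be queried directly from the hub; `/collections` and `/items` are standard STAC endpoints, and the collection id below is illustrative (it matches an S3 prefix used in the tests further down):

```sh
# List collection ids in the production catalog:
curl -s "https://openveda.cloud/api/stac/collections" | jq '.collections[].id'
# Pull one item's assets to get an s3:// href to test against:
curl -s "https://openveda.cloud/api/stac/collections/barc-thomasfire/items?limit=1" \
  | jq '.features[0].assets'
```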

@wildintellect

Minor: I don't think that notebook is the best test.

An easier test on the hub:

```
$ rio cogeo info s3://veda-data-store/barc-thomasfire/thomas_fire_barc_201712.cog.tiff
WARNING:rasterio._env:CPLE_AppDefined in HTTP response code on https://veda-data-store.s3.amazonaws.com/barc-thomasfire/thomas_fire_barc_201712.cog.tiff: 403
s2n_init() failed: 402653198 (error opening urandom)
Fatal error condition occurred in /home/conda/feedstock_root/build_artifacts/tiledb_1708024446644/work/build/externals/src/ep_awssdk/crt/aws-crt-cpp/crt/aws-c-io/source/s2n/s2n_tls_channel_handler.c:203: 0 && "s2n_init() failed"
Exiting Application
################################################################################
Stack trace:
################################################################################
```

Another random item from a different collection:

```
$ rio cogeo info s3://veda-data-store/bangladesh-landcover-2001-2020/MODIS_LC_2020_BD.cog.tif
WARNING:rasterio._env:CPLE_AppDefined in HTTP response code on https://veda-data-store.s3.amazonaws.com/bangladesh-landcover-2001-2020/MODIS_LC_2020_BD.cog.tif: 403
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 310, in rasterio._base.DatasetBase.__init__
  File "rasterio/_base.pyx", line 221, in rasterio._base.open_dataset
  File "rasterio/_err.pyx", line 221, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_AWSAccessDeniedError: Access Denied
```

@anayeaye it appears the bucket policy is not correct. Can you please share the policy internally (not on this ticket) for review?
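
For anyone with admin credentials in the bucket's account, the live policy can be pulled for that internal review; a sketch:

```sh
# Requires s3:GetBucketPolicy in the account that owns veda-data-store.
aws s3api get-bucket-policy --bucket veda-data-store --query Policy --output text | jq .
```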

@wildintellect

Correction: the branch is https://github.com/NASA-IMPACT/veda-hub-infrastructure/tree/veda-data-store
2i2c-org/infrastructure#4533 just adds the bucket to "staging". I think it's worth verifying that all the permissions blocking writes/deletes are correct on the VEDA side before deploying more widely.

@anayeaye (Author) commented Jul 31, 2024

> Question: should the ESDIS and GHG instances get the same bucket access? They all currently only have staging.

All hubs in the VEDA universe should have GetObject and ListBucket permissions for veda-data-store. It is slowish, but we are still trying to encourage sharing rather than duplicating data to every environment. EDIT: we also need to add/confirm that those instances are covered by the bucket policy.

> it appears the bucket policy is not correct. Can you please share the policy internally (not on this ticket) for review?

I will share it with you internally. I would be surprised if it is not correct, because I granted the same permissions as the hubs currently have for the staging bucket, which can be accessed via the hub.

The rio cogeo info routine in the hub is easier to test than running the notebook example; thanks for the snippet!

```
## veda-data-store-staging accessible
(notebook) jovyan@jupyter-anayeaye:~$ rio cogeo info s3://veda-data-store-staging/EIS/COG/Fire-Hydro/bs_to_save.tif
Driver: GTiff
File: s3://veda-data-store-staging/EIS/COG/Fire-Hydro/bs_to_save.tif
COG: True
Compression: DEFLATE

## veda-data-store equivalent object is not accessible
(notebook) jovyan@jupyter-anayeaye:~$ rio cogeo info s3://veda-data-store/caldor-fire-burn-severity/bs_to_save.tif
WARNING:rasterio._env:CPLE_AppDefined in HTTP response code on https://veda-data-store.s3.amazonaws.com/caldor-fire-burn-severity/bs_to_save.tif: 403
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 310, in rasterio._base.DatasetBase.__init__
  File "rasterio/_base.pyx", line 221, in rasterio._base.open_dataset
  File "rasterio/_err.pyx", line 221, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_AWSAccessDeniedError: Access Denied
```

@wildintellect

@anayeaye yes, I spoke too soon; the blocker is actually on the hub side right now. Once you approve, 2i2c will deploy to the staging hub, we can test, and then do a 2nd PR pushing that bucket to all the VEDA-related hubs.

@wildintellect commented Aug 1, 2024

@anayeaye I've tested on staging that read access works. How would you like to test that other actions are blocked? Do you want to try making a file in the bucket? Is there a safe object to test removing? etc.
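
For example, a sketch of such negative tests, with made-up keys; both commands should fail with AccessDenied if writes and deletes are blocked:

```sh
# Attempt a write -- expect "upload failed ... AccessDenied":
echo "write test" > /tmp/should-fail.txt
aws s3 cp /tmp/should-fail.txt s3://veda-data-store/_access-test/should-fail.txt

# Attempt a delete -- an AccessDenied here proves DeleteObject is blocked
# (with permission, deleting a nonexistent key would succeed silently):
aws s3 rm s3://veda-data-store/_access-test/should-fail.txt
```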

Then, when you're happy, I can open another PR to apply the fix to all the hubs/production.

```
(notebook) jovyan@jupyter-wildintellect:~$ rio cogeo info s3://veda-data-store/barc-thomasfire/thomas_fire_barc_201712.cog.tiff
Driver: GTiff
File: s3://veda-data-store/barc-thomasfire/thomas_fire_barc_201712.cog.tiff
COG: True
Compression: DEFLATE
ColorSpace: None
...
```

@anayeaye (Author) commented Aug 9, 2024

@wildintellect I'm comfortable with the MCP bucket policy doing the blocking. It would be nice for the hub role to be more specific, but it doesn't need to be. So I'd say we are ready for the PR to apply the fix to production. Thanks!

@wildintellect

PR completed: 2i2c-org/infrastructure#4609 (comment)
TODO: verify with a quick test.

@smohiudd

I ran a few of the veda-docs quickstart notebooks just now and I'm no longer getting any Access Denied errors.

@wildintellect

If it all looks good, please comment on 2i2c-org/infrastructure#4535 (comment) and then we can close this.

@anayeaye (Author)

Currently having a pydantic v2 version-conflict problem in the hub, so I used a new test :(

BUT I can read prod from hub.openveda.cloud ✅

```
aws s3api head-object --bucket veda-data-store --key caldor-fire-burn-severity/bs_to_save.tif
{
    "AcceptRanges": "bytes",
    "LastModified": "2024-03-15T21:13:17+00:00",
    "ContentLength": 324771,
    "ETag": "\"e3a43004c765f8e69794228258c0c579\"",
    "ContentType": "image/tiff",
    "ServerSideEncryption": "AES256",
    ...
```
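
ListBucket can be spot-checked the same way, e.g.:

```sh
# Uses the same prefix as the head-object test above:
aws s3 ls s3://veda-data-store/caldor-fire-burn-severity/
```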

@wildintellect

@anayeaye can we close this ticket?

anayeaye closed this as completed Sep 4, 2024