Use potentialAction to describe Arrow dataset access #27

Open
valentinedwv opened this issue Sep 26, 2024 · 3 comments
Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request), question (Further information is requested)

Comments

@valentinedwv
Contributor

The STAC catalog has a database access element on its page. We need a way to replicate/communicate the same information.

Also, collections may have assets; if so, we render them as a dataset with a potential action.
Datasets can have S3 links that are better described with code; add a potential action for them.
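A minimal Python sketch of the rendering rule for collections, assuming the STAC Collection has already been loaded as a dict: a collection is rendered as a schema.org `Dataset` only when its `assets` object is non-empty, and each asset becomes a `DataDownload`. The function name and the field mapping (`href` to `contentUrl`, `type` to `encodingFormat`, `title` to `name`) are illustrative assumptions, not the project's implementation; how the potential action is attached is left out of this sketch.

```python
import json
from typing import Optional


def stac_collection_to_dataset(collection: dict) -> Optional[dict]:
    """Render a STAC Collection as a schema.org Dataset if it has assets.

    Hypothetical helper. Returns None for collections without assets.
    """
    assets = collection.get("assets") or {}
    if not assets:
        return None  # no assets: do not render this collection as a Dataset
    distribution = []
    for key, asset in assets.items():
        entry = {
            "@type": "DataDownload",
            "name": asset.get("title", key),  # fall back to the asset key
            "contentUrl": asset["href"],
        }
        if "description" in asset:
            entry["description"] = asset["description"]
        if "type" in asset:
            entry["encodingFormat"] = asset["type"]  # STAC media type
        distribution.append(entry)
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": collection.get("title", collection["id"]),
        "description": collection.get("description", ""),
        "distribution": distribution,
    }


if __name__ == "__main__":
    # Hypothetical collection, trimmed to the fields the function reads.
    example = {
        "id": "example-collection",
        "description": "Illustrative collection",
        "assets": {"data": {"href": "https://example.org/data.json",
                            "type": "application/json"}},
    }
    print(json.dumps(stac_collection_to_dataset(example), indent=2))
```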

So, for the S3 link, we might consider using potential actions rather than data distributions.

Not sure whether this is a CreateAction or a ConsumeAction, and not really sure it matters; we just pick one.
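One way to use potential actions rather than data distributions for S3 links is sketched below in Python. It assumes the Dataset JSON-LD is available as a dict, treats any `contentUrl` starting with `s3://` as a link to move, and wraps it in a schema.org action whose `target` is an `EntryPoint`. The helper name, the use of `EntryPoint.urlTemplate` and `contentType`, and the `ConsumeAction` default are all assumptions; because the choice between `CreateAction` and `ConsumeAction` is still open, the action type is a parameter.

```python
import copy

S3_SCHEME = "s3://"


def move_s3_to_potential_action(dataset: dict, action_type: str = "ConsumeAction") -> dict:
    """Return a copy of a schema.org Dataset in which s3:// DataDownload
    entries are moved from `distribution` into `potentialAction`.

    Hypothetical helper; `action_type` defaults to a placeholder choice.
    """
    result = copy.deepcopy(dataset)  # leave the caller's dict untouched
    kept, actions = [], []
    for entry in result.get("distribution", []):
        url = entry.get("contentUrl", "")
        if not url.startswith(S3_SCHEME):
            kept.append(entry)
            continue
        # EntryPoint says where the action is carried out; a plain URL is a
        # valid urlTemplate that simply has no variables.
        target = {"@type": "EntryPoint", "urlTemplate": url}
        if "encodingFormat" in entry:
            target["contentType"] = entry["encodingFormat"]
        action = {"@type": action_type, "target": target}
        for key in ("name", "description"):
            if key in entry:
                action[key] = entry[key]  # keeps any embedded code snippet
        actions.append(action)
    if "distribution" in result:
        result["distribution"] = kept
    if actions:
        existing = result.get("potentialAction", [])
        if isinstance(existing, dict):  # a single action object is also allowed
            existing = [existing]
        result["potentialAction"] = existing + actions
    return result
```

Copying `description` onto the action keeps the R/Python access snippet next to the S3 target instead of leaving it on a `DataDownload` that cannot really be downloaded over HTTP.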

valentinedwv added the documentation, enhancement, and question labels on Sep 26, 2024
@ylyangtw
Contributor

Close, since #12 is the same issue.

@valentinedwv
Contributor Author

valentinedwv commented Jan 23, 2025

Only partially; #12 makes it available.

It seems some parts of the Croissant spec are missing. @fils, do you think we should follow it?

I also notice the first one has a code snippet. Does the faceted search make that snippet accessible too?

Snippet from the JSON-LD:

"distribution": [
  {
    "@type": "DataDownload",
    "contentUrl": "https://sdsc.osn.xsede.org/bio230014-bucket01/challenges/metadata/model_id/cb_prophet.json",
    "description": "Use `jsonlite::fromJSON()` to download the model metadata JSON file. This R code will return metadata provided during the model registration.\n\n### R\n\n```{r}\n# Use code below\nmodel_metadata <- jsonlite::fromJSON(\"https://sdsc.osn.xsede.org/bio230014-bucket01/challenges/metadata/model_id/cb_prophet.json\")\n```",
    "encodingFormat": "application/json",
    "name": "Model Metadata"
  },
  {
    "@type": "DataDownload",
    "contentUrl": "https://github.com/cboettig/forecasts-darts-framework",
    "description": "The link to the model code provided by the model submission team",
    "encodingFormat": "text/html",
    "name": "Link for Model Code"
  },
  {
    "@type": "DataDownload",
    "contentUrl": "s3://anonymous@bio230014-bucket01/challenges/forecasts/bundled-parquet//project_id=neon4cast/duration=P1D/variable=temperature/model_id=cb_prophet?endpoint_override=sdsc.osn.xsede.org",
    "description": "Use `R` or `Python` code for remote access to the database. This code will return results for this variable and model combination.\n\n### R\n\n```{r}\n# Use code below\nall_results <- arrow::open_dataset(\"s3://anonymous@bio230014-bucket01/challenges/forecasts/bundled-parquet//project_id=neon4cast/duration=P1D/variable=temperature/model_id=cb_prophet?endpoint_override=sdsc.osn.xsede.org\")\ndf <- all_results |> dplyr::collect()\n```\n\nYou can use dplyr operations before calling `dplyr::collect()` to `summarise`, `select` columns, and/or `filter` rows before pulling the data into a local `data.frame`. Reducing the amount of data pulled locally speeds up the download and reduces memory usage.\n\n### Python\n\n```python\n# Use code below\nimport ibis\ncon = ibis.duckdb.connect()\ncon.raw_sql('''\nCREATE OR REPLACE SECRET secret (\n    TYPE S3,\n    ENDPOINT 'sdsc.osn.xsede.org',\n    URL_STYLE 'path'\n);\n''')\npath = \"s3://bio230014-bucket01/challenges/forecasts/bundled-parquet//project_id=neon4cast/duration=P1D/variable=temperature/model_id=cb_prophet\"\ncon.read_parquet(path + \"/**\")\n```",
    "encodingFormat": "application/x-parquet",
    "name": "Database Access for Daily Water_temperature"
  }
],
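On whether search can reach these snippets: the code currently lives inside the Markdown of each `description`, so a consumer would have to pull it back out. A hedged Python sketch of that step, assuming descriptions use triple-backtick fences with an optional `{r}`-style or bare language tag, as in the snippet above. `extract_code_blocks` is a hypothetical helper and its language detection is a heuristic, not part of any spec.

```python
import re
from typing import Dict, List, Optional

# Language tag right after an opening fence: "{r}" (R Markdown style),
# or a bare word such as "python" that ends its own line.
_LANG = re.compile(r"^(?:\{(\w+)\}|(\w+)(?=\n))")


def extract_code_blocks(description: str) -> List[Dict[str, Optional[str]]]:
    """Return the fenced code blocks found in a Markdown description.

    Splitting on the fence marker puts code at the odd indices, which also
    tolerates a final fence that was never closed.
    """
    blocks = []
    for part in description.split("```")[1::2]:
        match = _LANG.match(part)
        lang = (match.group(1) or match.group(2)) if match else None
        code = part[match.end():] if match else part
        blocks.append({"language": lang, "code": code.strip()})
    return blocks


if __name__ == "__main__":
    # Hypothetical, shortened description in the same shape as the snippet.
    text = "Intro ### R ```{r}\ndf <- 1\n``` ### Python ```python\nimport ibis"
    for block in extract_code_blocks(text):
        print(block["language"], "|", block["code"])
```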

Projects
Status: Done
Development

No branches or pull requests

2 participants