Make the s3 url accessible to webpage #12

Open
ylyangtw opened this issue Jul 29, 2024 · 2 comments

ylyangtw commented Jul 29, 2024

The tech team needs to revise the S3 URLs so that they are accessible from the webpage.

Look into Cyberduck to make sure the buckets and objects exist,

or use Croissant.
Could we add an additionalProperty to the distribution element?
Some examples from the Hugging Face Croissant docs:
https://huggingface.co/docs/dataset-viewer/en/croissant (see distributions)
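
One way to picture the "accessible to webpage" part: a browser cannot follow an s3:// link, so the page would need an HTTPS form of the same object. The snippet below is only a minimal sketch of that translation, using the AWS virtual-hosted-style URL pattern. The function name s3_to_https is hypothetical, the bucket and key are the placeholders used elsewhere in this thread, and it assumes the object is publicly readable, which is not confirmed here.

```python
from typing import Optional
from urllib.parse import quote


def s3_to_https(s3_url: str, region: Optional[str] = None) -> str:
    """Translate s3://bucket/key into a virtual-hosted-style HTTPS URL.

    Hypothetical helper for discussion. It only rewrites the string; it does
    not check that the bucket or object exists or is publicly readable.
    """
    prefix = "s3://"
    if not s3_url.startswith(prefix):
        raise ValueError(f"not an s3:// URL: {s3_url!r}")

    # Everything up to the first "/" is the bucket, the rest is the object key.
    bucket, _, key = s3_url[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {s3_url!r}")

    # Use the regional endpoint when a region is given, else the global one.
    host = f"{bucket}.s3.{region}.amazonaws.com" if region else f"{bucket}.s3.amazonaws.com"

    # Percent-encode the key but keep "/" so the path structure survives.
    return f"https://{host}/{quote(key)}"


if __name__ == "__main__":
    print(s3_to_https("s3://your-bucket-name/path/to/file"))
```

Whether the resulting link actually resolves still depends on the bucket policy, so checking in Cyberduck that the bucket and object exist remains a separate step.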


@valentinedwv

ESIPFed/science-on-schema.org#240

fils commented Aug 22, 2024

Just a bit of JSON as a strawman for discussion.

Here I played with the potentialAction and distribution properties, adding some typed properties to express the s3 protocol. Not solutions, just ideas for discussion.

{
  "@context": {
    "@vocab": "https://schema.org/"
  },
  "@type": "Dataset",
  "@id": "https://registry.org/permanentUrlToThisJsonDoc",
  "name": "A concise but descriptive name of the dataset",
  "description": "An extended, free-text description of what's in the dataset, who created it, and other attributes",
  "url": "https://urlToTheDatasetOrLandingPage.org/",
  "keywords": ["Keyword 1", "Keyword 2", "Keyword 3"],
  "potentialAction": {
    "@type": "DownloadAction",
    "target": {
      "@type": "EntryPoint",
      "urlTemplate": "s3://your-bucket-name/path/to/file",
      "encodingType": "application/octet-stream",
      "additionalProperty": {
        "@type": "PropertyValue",
        "name": "protocol",
        "value": "s3"
      }
    }
  },
  "distribution": [
    {
      "@type": "DataDownload",
      "contentUrl": "http://urlToDirectDownloadOfThisDataset.org/",
      "encodingFormat": "text/csv"
    },
    {
      "@type": "DataDownload",
      "contentUrl": "s3://urlToDirectDownloadOfThisDataset.org/prefix/object",
      "encodingFormat": "text/csv",
      "additionalProperty": [
        {
          "@type": "PropertyValue",
          "name": "protocol",
          "value": "s3"
        },
        {
          "@type": "PropertyValue",
          "name": "s3Url",
          "value": "s3://your-bucket-name/path/to/file"
        }
      ]
    }
  ],
  "spatialCoverage": {
    "@type": "Place",
    "geo": {
      "@type": "GeoShape",
      "description": "schema.org expects lat long (Y X) coordinate order",
      "polygon": "10.161667 142.014,18.033833 142.014,18.033833 147.997833,10.161667 147.997833,10.161667 142.014"
    },
    "additionalProperty": {
      "@type": "PropertyValue",
      "propertyID": "https://dbpedia.org/page/Spatial_reference_system",
      "value": "https://www.w3.org/2003/01/geo/wgs84_pos"
    }
  },
  "provider": [
    {
      "@type": "Organization",
      "legalName": "Legal Name of Organisation which generated the dataset",
      "name": "Other Name of Organisation which generated the dataset",
      "url": "https://organisationWebsite.org/"
    }
  ]
}
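
To make the strawman a bit more concrete, here is a rough sketch of how a consumer might pick out the S3 entries from a document shaped like the one above. It is an illustration only: the function name s3_distributions is hypothetical, and the rule it applies (a "protocol" PropertyValue equal to "s3", or a contentUrl starting with s3://) is an assumption taken from this example, not an agreed convention.

```python
import json


def as_list(value):
    """JSON-LD allows a single object where a list is expected; normalise it."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def s3_distributions(dataset: dict) -> list:
    """Return the contentUrl of every distribution that looks like S3.

    Hypothetical helper: a distribution counts as S3 if it carries an
    additionalProperty named "protocol" with value "s3", or if its
    contentUrl uses the s3:// scheme.
    """
    urls = []
    for dist in as_list(dataset.get("distribution")):
        props = as_list(dist.get("additionalProperty"))
        flagged = any(
            p.get("name") == "protocol" and p.get("value") == "s3" for p in props
        )
        url = dist.get("contentUrl", "")
        if flagged or url.startswith("s3://"):
            urls.append(url)
    return urls


# A trimmed copy of the distribution block from the JSON above.
EXAMPLE = json.loads("""
{
  "@type": "Dataset",
  "distribution": [
    {"@type": "DataDownload",
     "contentUrl": "http://urlToDirectDownloadOfThisDataset.org/",
     "encodingFormat": "text/csv"},
    {"@type": "DataDownload",
     "contentUrl": "s3://urlToDirectDownloadOfThisDataset.org/prefix/object",
     "encodingFormat": "text/csv",
     "additionalProperty": [
       {"@type": "PropertyValue", "name": "protocol", "value": "s3"}
     ]}
  ]
}
""")

if __name__ == "__main__":
    print(s3_distributions(EXAMPLE))
```

Checking the URL scheme as a fallback means the s3Url property in the strawman would be redundant for this purpose; whether to keep it is part of what is up for discussion.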
