You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
@web3-storage/capabilities exposes Ucanto Capability objects that capture schema information for nb fields, so we should have everything we need to generate Glue table schemas automatically, which would make it easier to make data available to Athena for querying.
conststoreAddTableName=getCdkNames('store-add-table',app.stage)conststoreAddTable=newglue.CfnTable(stack,storeAddTableName,{catalogId: Aws.ACCOUNT_ID,
databaseName,tableInput: {name: storeAddTableName,partitionKeys: [{name: 'day',type: 'date'}],parameters: {classification: "json",typeOfData: "file",// @see https://docs.aws.amazon.com/athena/latest/ug/partition-projection-kinesis-firehose-example.html for more information on projection// configuration - this should match the "day" parameter and S3 prefix configured in the delivery stream"projection.enabled": "true","projection.day.type": "date","projection.day.format": "yyyy-MM-dd","projection.day.range": "2023-01-01,NOW","projection.day.interval": "1","projection.day.interval.unit": "DAYS","storage.location.template": `s3://${streamLogBucket.bucketName}/logs/receipt/store_add/\${day}/`},storageDescriptor: {location: `s3://${streamLogBucket.bucketName}/logs/receipt/store_add/`,columns: [{name: 'carcid',type: 'string'},// STRUCT here refers to the Apache Hive STRUCT datatype - see https://aws.amazon.com/blogs/big-data/create-tables-in-amazon-athena-from-nested-json-and-mappings-using-jsonserde/{name: 'value',type: 'STRUCT<att:ARRAY<struct<can:STRING,with:STRING,nb:STRUCT<size:BIGINT,link:STRUCT<_cid_slash:STRING>>>>,iss:STRING,aud:STRING>'},{name: "out",type: "STRUCT<error:STRUCT<name:STRING>,ok:STRUCT<status:STRING,link:STRUCT<_cid_slash:STRING>>>"},{name: "ts",type: "timestamp"}],inputFormat: 'org.apache.hadoop.mapred.TextInputFormat',outputFormat: 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat',serdeInfo: {serializationLibrary: 'org.openx.data.jsonserde.JsonSerDe',parameters: {// see https://aws.amazon.com/blogs/big-data/create-tables-in-amazon-athena-from-nested-json-and-mappings-using-jsonserde/'mapping._cid_slash': '/'}}}}})storeAddTable.addDependsOn(glueDatabase)
where the Ucanto Schema of Store.add is used to generate the type field of the out column of the storageDescriptor.
@web3-storage/capabilities does also expose a comprehensive list of the names of capabilities it defines (as abilitiesAsStrings) so we could even automatically generate tables for all known capabilities. I don't think this would be too costly on the AWS side but it would be a good idea to verify that.
The text was updated successfully, but these errors were encountered:
Add two more capability tables to Glue so Athena can query them.
I'd like to start doing this in an automated way at some point, but that doesn't feel high priority enough for the moment so I've filed an issue here:
#243
Add two more capability tables to Glue so Athena can query them.
I'd like to start doing this in an automated way at some point, but that
doesn't feel high priority enough for the moment so I've filed an issue
here:
#243
---------
Co-authored-by: Vasco Santos <[email protected]>
@web3-storage/capabilities
exposes UcantoCapability
objects that capture schema information fornb
fields, so we should have everything we need to generate Glue table schemas automatically, which would make it easier to make data available to Athena for querying.The idea is that something like:
would result in something like the following SST code:
where the Ucanto Schema of
Store.add
is used to generate thetype
field of theout
column of thestorageDescriptor
.@web3-storage/capabilities
does also expose a comprehensive list of the names of capabilities it defines (asabilitiesAsStrings
) so we could even automatically generate tables for all known capabilities. I don't think this would be too costly on the AWS side but it would be a good idea to verify that.The text was updated successfully, but these errors were encountered: