Trino returning zero records for Hudi tables in AWS Glue #24654

Open
drautela-scwx opened this issue Jan 8, 2025 · 1 comment

@drautela-scwx

We are trying to set up the following in AWS:

Trino on EKS: version 465
HMS: Glue. We have databases with both Hudi and non-Hudi external tables backed by Parquet files.

Currently we are able to query non-Hudi tables successfully.

For Hudi tables, the query executes without any error but returns zero records. We have confirmed that records should be returned by running the same query in Athena.

We have two connectors set up as follows:

(1) /etc/trino/catalog/awsdatacatalog.properties

connector.name=hive
hive.metastore=glue
hive.hive-views.enabled=true
hive.partition-projection-enabled=true
fs.native-s3.enabled=true
hive.hudi-catalog-name=hudi

(2) /etc/trino/catalog/hudi.properties

connector.name=hudi
hive.metastore=glue
fs.native-s3.enabled=true

We do have partition projection enabled for Hudi tables.

Zero records are returned for Hudi tables whether we use the awsdatacatalog catalog or the hudi catalog:

select *
from awsdatacatalog.hudi_db.test_table_ro
where partition1 = '123'
and partition2 = 'abc'
limit 10;

select *
from hudi.hudi_db.test_table_ro
where partition1 = '123'
and partition2 = 'abc'
limit 10;

Any suggestions on how to figure out why zero records are being returned? Is there any specific logging that could be turned on to help debug this issue?
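
For reference, Trino log levels can be raised per logger in etc/log.properties. A minimal sketch, assuming the plugin logger names below apply to this deployment (verify them against your Trino 465 installation):

# etc/log.properties -- raise verbosity for the connectors involved (assumed logger names)
io.trino.plugin.hudi=DEBUG
io.trino.plugin.hive=DEBUG
io.trino.plugin.hive.metastore.glue=DEBUG

This should surface how the table is resolved against the Glue metastore and whether any splits are generated for the Hudi table.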

@drautela-scwx
Author

Does the Hudi connector for Trino support partition projection?

Below is a sample DDL for the Hudi table in the Glue catalog:

CREATE EXTERNAL TABLE `test_table_ro`(
  `_hoodie_commit_time` string COMMENT '', 
  `_hoodie_commit_seqno` string COMMENT '', 
  `_hoodie_record_key` string COMMENT '', 
  `_hoodie_partition_path` string COMMENT '', 
  `_hoodie_file_name` string COMMENT '', 
  `resource_id` string COMMENT '')
PARTITIONED BY ( 
  `tenant` string COMMENT '', 
  `date` string COMMENT '')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
WITH SERDEPROPERTIES ( 
  'hoodie.query.as.ro.table'='true', 
  'path'='s3://datalake') 
STORED AS INPUTFORMAT 
  'org.apache.hudi.hadoop.HoodieParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://datalake'
TBLPROPERTIES (
  'hudi.metadata-listing-enabled'='FALSE', 
  'last_commit_time_sync'='20240821120712520', 
  'projection.date.format'='yyyyMMdd', 
  'projection.date.range'='19700101,99990101', 
  'projection.date.type'='date', 
  'projection.enabled'='true', 
  'projection.tenant.range'='-1,8675309', 
  'projection.tenant.type'='integer', 
  'spark.sql.sources.provider'='hudi', 
  'spark.sql.sources.schema.numPartCols'='2', 
  'spark.sql.sources.schema.numParts'='4', 
  'spark.sql.sources.schema.part.0'='{...}', 
  'spark.sql.sources.schema.partCol.0'='tenant', 
  'spark.sql.sources.schema.partCol.1'='date')
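
As a debugging aid, it may help to compare what Trino resolves for this table with what Athena reports. A hedged sketch, reusing the catalog, schema, and table names from the queries above; the $timeline metadata table is an assumption about the Hudi connector in this Trino version:

-- What Trino thinks the table definition is, per catalog
SHOW CREATE TABLE awsdatacatalog.hudi_db.test_table_ro;
SHOW CREATE TABLE hudi.hudi_db.test_table_ro;

-- Hive connector: list the partitions Trino can see (projected or registered);
-- an empty result here would point at partition resolution rather than data reading
SELECT * FROM awsdatacatalog.hudi_db."test_table_ro$partitions";

-- Hudi connector: inspect the commit timeline, if the $timeline metadata table is available
SELECT * FROM hudi.hudi_db."test_table_ro$timeline";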
