You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the bug
In the type_string.sql line 22 to 31 you do a cast to a varchar based on the hashtype. This is not appropriate for databricks as databricks does not know the type varchar. The comparable type in databricks would be string and for it no length is defined.
This leads then so an output in the hash algorithm like this:
UPPER(SHA2(NULLIF(CONCAT(
IFNULL(NULLIF(UPPER(TRIM(CAST(NaturalKey AS VARCHAR(32)))), ''), '^^')
), '^^'), 256)) AS NaturalKey_HK,
This is bad, since the cast to 32 bit could potentially truncate the to be hashed column and lead to unwanted hashing collisions. Luckily, databricks does not truncate the input column. Nevertheless, it should be fixed as it has the potential to corrupt all hash keys.
Environment
dbt version: 1.8.3
automate_dv version: tested on 10.2 but most likely also on 11.0
Database/Platform: Databricks
hash algorithm: SHA
enable_binary_hash: false
Expected behavior
Replace Cast to Varchar with Cast to String
Hi @ambullo, thanks for this thorough and detailed report.
I believe we fixed this as part of v0.11.0. v0.11.0 was explicitly an update to resolve hashing and handling of hash types correctly in both Databricks and BigQuery. Please can you upgrade and try it out?
If it's still an issue then we'd be happy to take a look, however 😄 Thanks!
Describe the bug
In the type_string.sql line 22 to 31 you do a cast to a varchar based on the hashtype. This is not appropriate for databricks as databricks does not know the type varchar. The comparable type in databricks would be string and for it no length is defined.
This leads then so an output in the hash algorithm like this:
UPPER(SHA2(NULLIF(CONCAT(
IFNULL(NULLIF(UPPER(TRIM(CAST(NaturalKey AS VARCHAR(32)))), ''), '^^')
), '^^'), 256)) AS NaturalKey_HK,
This is bad, since the cast to 32 bit could potentially truncate the to be hashed column and lead to unwanted hashing collisions. Luckily, databricks does not truncate the input column. Nevertheless, it should be fixed as it has the potential to corrupt all hash keys.
Environment
dbt version: 1.8.3
automate_dv version: tested on 10.2 but most likely also on 11.0
Database/Platform: Databricks
hash algorithm: SHA
enable_binary_hash: false
Expected behavior
Replace Cast to Varchar with Cast to String
AB#5567
The text was updated successfully, but these errors were encountered: