Wrong input column for exploded blocking columns when expand_length not set #145

Open
riley-harper opened this issue Aug 14, 2024 · 0 comments
Labels: component: matching; type: bug (Something isn't working)

riley-harper commented Aug 14, 2024

When working on #142, I noticed that in hlink/linking/matching/link_step_explode.py, if expand_length is not set for a blocking column, we run the following code:

explode_col_expr = explode(col(exploding_column_name))

However, the rest of the code treats exploding_column_name as the output column name and derived_from_column as the input column name. So I think there is a bug here. This should be

explode_col_expr = explode(col(derived_from_column))

instead, unless I am misunderstanding something. This is probably a low-impact bug, since you only hit it when blocking on an input column that is an array type. I believe that most exploded columns are integer columns with expand_length set.
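
To make the input/output distinction concrete, here is a minimal plain-Python sketch. It is not hlink's code and does not use Spark: `explode_rows` is a hypothetical stand-in that models an explode over lists of dict rows, and the column names `birthyr_exploded` and `birthyr_candidates` are made up for illustration. It assumes a blocking column whose output name differs from its array-typed input column. What actually happens in Spark depends on whether a column with the output name already exists in the input.

```python
from typing import Any

Row = dict[str, Any]


def explode_rows(rows: list[Row], input_col: str, output_col: str) -> list[Row]:
    """Model of an explode: emit one output row per element of the array in input_col."""
    exploded = []
    for row in rows:
        for element in row[input_col]:  # raises KeyError if input_col is absent
            exploded.append({**row, output_col: element})
    return exploded


# Hypothetical blocking column with no expand_length set.
exploding_column_name = "birthyr_exploded"  # output column (assumed name)
derived_from_column = "birthyr_candidates"  # input array column (assumed name)

rows = [
    {"id": 1, "birthyr_candidates": [1899, 1900, 1901]},
    {"id": 2, "birthyr_candidates": [1910]},
]

# Proposed behavior: read from derived_from_column, write to exploding_column_name.
fixed = explode_rows(rows, derived_from_column, exploding_column_name)

# Behavior described in this issue: read from the output name, which is not
# an input column in this model, so the lookup fails.
try:
    explode_rows(rows, exploding_column_name, exploding_column_name)
    current_failed = False
except KeyError:
    current_failed = True
```

In this model the proposed version yields one row per candidate value, while reading from the output name fails because no such input column exists.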

riley-harper added the type: bug label Aug 14, 2024