-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Inspect columns #386
Conversation
c50ce03
to
64e33d5
Compare
hey @gastlich! this is looking great, really appreciate your work here! I was able to run this locally and is seems to be working great! the CI check seems to be on our side, not related to your changes as far as I can tell. I think this is a great compromise to provide folks with a base to build column level checks into their projects without creaing unnecessary noise for those who don't want or need column level checks! Couple small things:
Let me know if you need a hand with either of these! |
2716ff6
to
3572103
Compare
3572103
to
8b660f4
Compare
8b660f4
to
4a327b2
Compare
hey @dave-connors-3 Thanks for you feedback and suggestions! I've pushed some changes including:
I still need to figure out what's needed for the integration tests, but this is a task for another evening :) Could you review what has been recently published, please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is looking really good! Given that this package runs on some really large customer projects, and therefore there's a risk that this model might be rather non-performant, I think I'd prefer to have the columns models disabled by default, and allow users to turn them on if they want to run them. Would you mind adding those configs to the dbt_project.yml
?
Other than that, just a couple small suggestions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work on this one!
I added a couple of comments about small changes.
I just ran some performance tests on our Internal Analytics project wit a significant amount of models The memory footprint was checked with Running DPE with the new models
Running DPE without the new models
Running DPE just for the new models
Conclusions
@dave-connors-3 , what do you think about disabling the model by default. Does it still make sense? I'd be OK with having it enabled by default, especially if it unlocks further use cases (e.g. checking constraints + tests) |
I haven't disabled the model yet. Once you make the final call, I will implement whatever is suggested. :) Let me know what the next steps are. Thanks a lot for looking into this! 🙇 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the changes. One last comment from me.
@dave-connors-3 @b-per any other thoughts. I think I still need to add something to the integration tests? 🤔 |
Hi @gastlich ! Yes, I think that if you add an integration test on the table we can do a final review and get it merged! |
hey @gastlich! any update on adding tests here? would love to get this in! |
Hey @dave-connors-3, Happy New Year! Apologies for the delay; I was quite busy during the Christmas break. I've managed to find some time to work on integration tests. Just to ensure we're aligned: as I haven't created any I hope this meets your requirements. Looking forward to your review! 🙇 |
integration_tests/models/dbt_project_evaluator_schema_tests/core.yml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for your patience @gastlich! a couple things:
- i think we can get rid of the
core_seeds
stuff -- apologies if i misled you, but we generally just use those seeds to assert the output of the fact models. doing a uniqueness test is probably enough. - i think we should rething the join in
int_all_columns
-- we are excluding source oclumn in the current implementation, and I think we can get away with relatively little additional context in that column table, and people can join back on unique_id as needed. I can talk with the team and see if they agree on that last point, but i think the join will still need to be rethought!
@dave-connors-3 I've removed the test seed file and |
macros/unpack/get_column_values.sql
Outdated
wrap_string_with_quotes(node.unique_id), | ||
wrap_string_with_quotes(dbt.escape_single_quotes(column.name)), | ||
wrap_string_with_quotes(dbt.escape_single_quotes(column.description)), | ||
'null' if not column.data_type else wrap_string_with_quotes(dbt.escape_single_quotes(column.data_type)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we can remove the 'null' if not column.data_type else
because the wrap_string_with_quotes macro automatically handles NULLs
'null' if not column.data_type else wrap_string_with_quotes(dbt.escape_single_quotes(column.data_type)), | |
wrap_string_with_quotes(dbt.escape_single_quotes(column.data_type)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge it ! Thanks for all the hard work here, I know folks are going to be stoked !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
neat
This is a:
Link to Issue
Closes #385 #199
Description & motivation
As mentioned in #385, it would be great if this tool could create a model using all the information gathered for columns. This would enable us to begin implementing basic validation rules for columns, such as:
Furthermore, in the future, we can expand its use for more advanced validation purposes.
This is my first PR in this project and it would be good if you shared with me some hints/suggestions how to deliver the best outcome.
TODO
sources.tables
changes data structure fromgraph
Integration Test Screenshot
Checklist