-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CT-2468] [Feature] For versioned models, automatically create view/clone of latest version in unsuffixed database location #7442
Comments
We have three implementation options. Each has trade-offs around user limitations and complexity of implementation. Next step: Let's spike each of the options, and make a call. |
Hi All, I would like to propose a fourth option, that is a little bit more verbose but adds flexibility: Option 4The main model itself remains unchanged. It is expected to be implemented in
This would result in a second table being created called
And adding the corresponding
If a team decides to have One remaining challenge I can think of is that a developer could introduce a drift between the main model and |
this is concerning to me.. the whole purpose of versioning an api is to NOT introduce breaking changes. unless every model is postfixed with i need to give it more thought but it seems like the pointer object that routes a query to the latest version should instead be postfixed with |
I think I still would expect dim_customers to behave just like like dim_customers_latest. Nevertheless would the proposal for option 4 handle different ways of implementation: dim_customers unchanged & latest materialized as table
dim_customers unchanged & latest as view
and
dim_customers like latest & v2 + latest as view
and
Happy to clarify any questions around this idea and receive some more feedback. |
@nicohein I guess I should have been more specific. I don't agree with |
This is still just a placeholder "YMMV" solution until we decide on a real implementation: - dbt-labs/dbt-core#7442
This is still just a placeholder "YMMV" solution until we decide on a real implementation: - dbt-labs/dbt-core#7442 --- If you're versioning `some_model` for the first time: - that model is materialized as a `table` - `my_db.my_schema.some_model` will exist as a table in the database - needs to be dropped before running `create or replace view` (NOTE: I need to actually test this code) Co-authored-by: Joel Labes <[email protected]>
From discussion yesterday:
Next step: |
My preference is Option 2.5, where it happens automatically but the logic is implemented in an overridable macro in the vein of models:
- name: some_model
versions:
- v: 1
...
- v: 2
config:
generate_pointer_view: false if you wanted model-by-model control, right? |
Link to a Slack message with the way I think about this. https://getdbt.slack.com/archives/C2JRRQDTL/p1692185439979629?thread_ts=1692185332.030239&cid=C2JRRQDTL Specifically:
I also have to make sure foobar(view/MV), foobar_v1(table/MV), and foobar(table/MV) are all getting refreshed data. Assume there are some objects that I could be maintaining 3 versions of at the same time for months. @Spince is 💯 correct - you can't have an object name change version. 1 object name == 1 version == 1 model. If I'm reading correctly, I think that means that Option 4 from @nicohein with @Spince's requirements is the closest thinking. I'm not sure I understand @joellabes's latest comment. Right now, I'm handling this in naming convention as opposed to anything in dbt syntax because I need explicit control and dbt doesn't support Think of this like API version control. I have to give downstream(e.g. BI) a chance to update. The downstream customer is pinned to a version, it's just a matter if they know that explicitly( If I want to have |
@Spince @nicohein @alison985 Thank you for the feedback on this! We've been working on the technical spiking that might make this mechanism possible. I'm coming back around to the questions about the desirable product / end user experience. After reading through the comments above, this is what I'd propose:
versions:
- v: latest # this is an actual version of the model called 'latest'
defined_in: dim_customers_latest # it needs a custom definition
config:
alias: dim_customers_latest # instead of default, which would be '_vlatest' In this case, we are defining another real version of the model — not just a pointer to an existing (latest) version. It has all the same flexibility as any other model version. How should users disable dbt's automatic creation of the "latest" view/clone? Ideas:
models:
- name: dim_customers
versions:
- v: latest
config: {enabled: false} I will give that last piece a bit more thought - but I'm curious to hear how this lands overall! |
Sounds right to me. I would vote for disabling via configuration as @joellabes suggested, but I don't think it can be specified at the version level(see below). New item: you're repeatedly saying "view/clone" instead of just view. But I don't see a proposed spec on where to define which it is in YML. I learned about models:
- name: some_model
config:
materialized: table
########
generate_pointer: false
# OR # both can not be specified at the same time
materialized_pointer: [view | clone] # default to view if not specified
########
versions:
- v: 1
...
- v: 2
# How would we know what period of time an automatic alias of a version should be disabled?
#config:
# generate_pointer_view: false
- v: 3
#config:
# generate_pointer_view: true I thought about doing (Option B) |
@alison985 Thanks for taking another look! Re: "view/clone," I had been thinking that:
The benefit of this approach is that metadata queries expecting this object to be a table would still work, if the conditions for (1) are met. But it is more complex as an approach. I get that it would be simpler to just say: This is always a pointer view. So to answer all the outstanding questions here:
|
Hey everyone 👋 very much in love with this feature. Question - with the pointer, would we be able to pass documentation? The goal here would be to have the pointer have the same documentation as the latest table (through the |
Just had a production bug this morning because the new |
*upvote |
1 similar comment
*upvote |
upvote |
upvote! |
upvote (is there a better way to do this?) |
Yes, use 👍 on the issue |
Regarding the implementation of post-hooks for triggering the view generation of latest model. I think that on-run end hook is better than project-wide post-hook as any model specific post-hooks will run after the generation of a view. This means that if you have a post-hook that is manipulating the data of the model, the view on top will not be reflected by these changes. Consider changing the documentation here https://docs.getdbt.com/docs/collaborate/govern/model-versions#configuring-database-location-with-alias |
Upvoting this as well 👍 |
Up doot 👍🏼 |
*upvote |
Thanks @dbeatty10 @joellabes for working through this with me!
As the producer of a versioned model, I want the unsuffixed database relation (
<db>.<schema>.<model_name>
) to always point to the latest version of that model. This provides downstream consumers with the same optionality that's available toref
: "pin" to a specific version (suffix), or "unpin" (no suffix) and always get the latest.dbt should handle this for me, by creating an additional "pointer" view (
select * from <latest>
) or table clone (depending on the relation type and data platform support) whenever I run the latest version of my model.Implementation options
I'm not in love with any of these options:
post-hook
Of those, (3) is the least magical, and it's what we'll propose to users in the short term (v1.5). We still don't like:
The text was updated successfully, but these errors were encountered: