[Bug] Saved query exports run out of DAG order if they depend on dimensions in other Semantic Models for filters / group-bys #10016

jtcohen6 · 2024-04-23T11:33:18Z

Is this a new bug in dbt-core?

I believe this is a new bug in dbt-core
I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

metrics and saved_queries can depend on semantic_models directly (aggregating over their measures) or indirectly (grouping by / filtering on their dimensions). Today, dbt can detect and understand only the first type of dependency at parse time.

Previous discussion of this limitation:

[CT-2691] Populate MetricNode depends_on #7854 (comment)
Metric nodes depends_on will include semantic_models based on measures, but not based on filters docs.getdbt.com#3682

This has two negative consequences:

- [Cosmetic] In dbt list and manifest.json, there are no links between metrics / saved_queries and the semantic_models they depend on indirectly (via dimensions), so we show an incomplete picture of the DAG
- [Functional] When dbt Cloud customers are "exporting" saved queries, those need to run in DAG order relative to the models they depend on: [dbt] model -> semantic_model -> metric -> saved_query. Without the indirect edges, an export can be scheduled before a model it reads from has been built
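
As a rough illustration of the two dependency types described above, here is a toy sketch in Python. It is not dbt-core's parser: the `SemanticModel` and `SavedQuery` classes, the `direct_deps` / `indirect_deps` helpers, and the `order_total` measure name are all hypothetical, and matching dimensions by bare name skips the entity-join resolution a real implementation would need.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticModel:
    # Hypothetical, heavily simplified stand-in for a semantic model.
    name: str
    measures: set[str] = field(default_factory=set)
    dimensions: set[str] = field(default_factory=set)


@dataclass
class SavedQuery:
    # Hypothetical stand-in: measures reached via metrics, plus dimension refs.
    name: str
    measures: set[str]
    group_by: set[str] = field(default_factory=set)
    where: set[str] = field(default_factory=set)


def direct_deps(query, models):
    """Semantic models whose measures the query aggregates (detected today)."""
    return {m.name for m in models if m.measures & query.measures}


def indirect_deps(query, models):
    """Semantic models reached only via group-by / filter dimensions (missed today)."""
    wanted = query.group_by | query.where
    by_dimension = {m.name for m in models if m.dimensions & wanted}
    return by_dimension - direct_deps(query, models)


models = [
    SemanticModel("orders", measures={"order_total"}),  # measure name is made up
    SemanticModel("customers", dimensions={"customer_name", "customer_type"}),
]
query = SavedQuery(
    "new_customer_orders",
    measures={"order_total"},
    group_by={"customer_name"},
    where={"customer_type"},
)

print(direct_deps(query, models))    # only the measure-based dependency
print(indirect_deps(query, models))  # the dependency that is currently invisible
```

In this toy setup, `orders` is the only direct dependency, while `customers` shows up only once dimensions are considered.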

Expected Behavior

dbt list -s +<my_saved_query> should include all semantic models that the saved query depends on
dbt build -s +<my_saved_query> should execute resources in appropriate order, with all requisite models upstream of the saved query (no-op / export)

In an ideal world, we'd capture these dependencies at parsing time, so that they would also be reflected in the saved query's depends_on.nodes. IMO it's sufficient to add these edges when we're constructing + linking the DAG, similar to how dbt build adds edges for tests (model -> test -> model).
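
A minimal sketch of what "adding edges at link time" could look like, assuming the extra dimension-based dependencies are already known from some resolver. This is not dbt-core's graph code: the node names, the `metric.order_total` node, and the `link_graph` / `ancestors` helpers are hypothetical, and the standard-library `graphlib` stands in for dbt's own graph handling.

```python
from graphlib import TopologicalSorter

SAVED_QUERY = "saved_query.new_customer_orders"

# node -> set of nodes it depends on, as known at parse time (measures only)
parse_time_deps = {
    "model.orders": set(),
    "model.customers": set(),
    "semantic_model.orders": {"model.orders"},
    "semantic_model.customers": {"model.customers"},
    "metric.order_total": {"semantic_model.orders"},  # hypothetical metric
    SAVED_QUERY: {"metric.order_total"},
}

# Hypothetical output of a dimension-aware resolver, applied only when linking.
extra_edges = {SAVED_QUERY: {"semantic_model.customers"}}


def link_graph(deps, extra):
    """Return a copy of deps with the extra edges merged in (deps is untouched)."""
    linked = {node: set(parents) for node, parents in deps.items()}
    for node, parents in extra.items():
        linked.setdefault(node, set()).update(parents)
    return linked


def ancestors(deps, node):
    """All transitive upstream nodes of `node`."""
    seen = set()
    stack = list(deps.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps.get(current, ()))
    return seen


linked = link_graph(parse_time_deps, extra_edges)

# Execution order in which every node comes after everything it depends on.
order = list(TopologicalSorter(linked).static_order())
print(order)
```

Because the extra edges live only in the linked copy, the saved query's parse-time `depends_on` stays as it is, while selection (`ancestors`) and execution order both see `model.customers` as upstream.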

Steps To Reproduce

1. Check out jaffle-sl-template
2. Run dbt list --select +saved_query:new_customer_orders --resource-type semantic_model: the saved query new_customer_orders is reported as depending only on semantic_model:jaffle_shop.orders, even though it needs to be grouped by customers.customer_name and filtered on customers.customer_type
3. Run dbt build: it runs the new_customer_orders export (NO-OP in dbt-core) before building customers

Relevant log output

$ dbt list -q -s +saved_query:new_customer_orders --resource-type semantic_model
semantic_model:jaffle_shop.orders

$ dbt build
...
11:20:09  29 of 37 NO-OP saved query new_customer_orders ................................. [NO-OP in 0.00s]
...
11:20:09  34 of 37 START sql table model dbt_jcohen.customers ............................ [RUN]
11:20:09  34 of 37 OK created sql table model dbt_jcohen.customers ....................... [SELECT 939 in 0.41s]

Environment

- Python: 3.10.11
- dbt: main

Which database adapter are you using with dbt?

No response

Additional Context

DSI PR:

Interface for SavedQueryDependencyResolver dbt-semantic-interfaces#278

Spike of dbt-core implementation:

https://github.com/dbt-labs/dbt-core/compare/jerco/spike-sq-edges

"It works" (though the current implementation is very naïve):

$ dbt list -q -s +saved_query:new_customer_orders --resource-type semantic_model --profile garage-postgres
semantic_model:jaffle_shop.customers
semantic_model:jaffle_shop.locations
semantic_model:jaffle_shop.order_item
semantic_model:jaffle_shop.orders
semantic_model:jaffle_shop.stg_products
$ dbt build
...


jtcohen6 · 2024-04-29T21:33:02Z

I opened this issue in the dbt-core repository in the hopes that we could find a naïve way forward. After syncing with lots of folks (@plypaul @marcodamore @QMalcolm @tlento @ChenyuLInx ...):

There is no way to have a satisfying naïve implementation of this logic in dbt-semantic-interfaces; our best attempt still leaves tons of edge cases
Fully capturing those edge cases requires the full foundational logic of MetricFlow
Incorporating that logic is out of scope for dbt-core

Our path forward will be via direct integration between dbt + MetricFlow in dbt Cloud. That's already where we've documented support for the functionality (saved query exports / caching) that this bug is blocking.

In the meantime, we can choose between:

1. Leaving the functionality as-is: saved queries depend directly on semantic models for measures (used in metrics), but not for dimensions (used in group by / filter)
2. As naïve as it gets: every saved query depends on every semantic model

We're opting for (1) and leaving this as is.
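
To make the trade-off between the two options concrete, here is a small hedged sketch. The `option_1_edges` and `option_2_edges` helpers are hypothetical, and the semantic model names are simply borrowed from the log output earlier in this issue; nothing here reflects actual dbt-core code.

```python
def option_1_edges(saved_queries, measure_deps):
    """Option 1 (as-is): each saved query depends only on its measure-based models."""
    return {sq: set(measure_deps.get(sq, ())) for sq in saved_queries}


def option_2_edges(saved_queries, semantic_models):
    """Option 2 (naive): each saved query depends on every semantic model."""
    return {sq: set(semantic_models) for sq in saved_queries}


semantic_models = ["customers", "locations", "order_item", "orders", "stg_products"]
measure_deps = {"new_customer_orders": {"orders"}}
saved_queries = ["new_customer_orders"]

as_is = option_1_edges(saved_queries, measure_deps)
naive = option_2_edges(saved_queries, semantic_models)

print(as_is)  # misses customers, so ordering is not guaranteed
print(naive)  # includes customers, but also every unrelated semantic model
```

Option 2 would always order exports correctly, at the cost of making every saved query select (and wait on) semantic models it never touches.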

jtcohen6 added the bug, semantic, and High Severity labels Apr 23, 2024

graciegoheen added this to the v1.8 milestone Apr 23, 2024

jtcohen6 added the wontfix label Apr 29, 2024

jtcohen6 closed this as not planned Apr 29, 2024
