[Bug] Saved query exports run out of DAG order if they depend on dimensions in other Semantic Models for filters / group-bys #10016

Closed
2 tasks done
jtcohen6 opened this issue Apr 23, 2024 · 1 comment
Labels
bug (Something isn't working), High Severity (bug with significant impact that should be resolved in a reasonable timeframe), semantic (Issues related to the semantic layer), wontfix (Not a bug or out of scope for dbt-core)
Milestone

Comments

@jtcohen6
Contributor

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

metrics and saved_queries can depend on semantic_models directly (aggregating over their measures) or indirectly (grouping by / filtering on their dimensions). Today, dbt can detect and understand only the first type of dependency at parse time.

Previous discussion of this limitation:

This has two negative consequences:

  • [Cosmetic] In dbt list and manifest.json, there are no links between metrics / saved_queries and the semantic_models they depend on indirectly (via dimensions), so we show an incomplete picture of the DAG
  • [Functional] When dbt Cloud customers are "exporting" saved queries, those need to run in DAG order relative to the models they depend on: [dbt] model -> semantic_model -> metric -> saved_query

Expected Behavior

  1. dbt list -s +<my_saved_query> should include all semantic models that the saved query depends on
  2. dbt build -s +<my_saved_query> should execute resources in appropriate order, with all requisite models upstream of the saved query (no-op / export)

In an ideal world, we'd capture these dependencies at parsing time, so that they would also be reflected in the saved query's depends_on.nodes. IMO it's sufficient to add these edges when we're constructing + linking the DAG, similar to how dbt build adds edges for tests (model -> test -> model).
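
To illustrate the link-time approach described above, here is a minimal sketch rather than dbt-core's actual graph code. It assumes the dimension-based dependencies have already been resolved somewhere else and are handed over as a plain mapping. The metric name, the `dimension_deps` mapping, and the helper functions are all hypothetical; only the saved query, semantic model, and model names come from the reproduction in this issue.

```python
from graphlib import TopologicalSorter

# Parse-time dependencies: node -> set of direct upstream nodes.
# "metric:orders_metric" is a hypothetical name used only for illustration.
parsed_deps = {
    "model:orders": set(),
    "model:customers": set(),
    "semantic_model:orders": {"model:orders"},
    "semantic_model:customers": {"model:customers"},
    "metric:orders_metric": {"semantic_model:orders"},
    "saved_query:new_customer_orders": {"metric:orders_metric"},
}

# Assumption: edges implied by group-bys / filters, resolved outside this sketch.
dimension_deps = {
    "saved_query:new_customer_orders": {"semantic_model:customers"},
}


def link_graph(parsed, extra):
    """Return a copy of the parse-time graph with the extra edges added."""
    linked = {node: set(upstream) for node, upstream in parsed.items()}
    for node, upstream in extra.items():
        linked.setdefault(node, set()).update(upstream)
    return linked


def upstream_of(graph, node):
    """Collect every transitive upstream node of `node`."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in graph.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


linked_deps = link_graph(parsed_deps, dimension_deps)
# static_order() yields upstream nodes before the nodes that depend on them.
run_order = list(TopologicalSorter(linked_deps).static_order())
print(run_order)
```

In the parse-time graph nothing forces `model:customers` to run before the saved query, which matches the out-of-order export in the logs. Once the extra edge is present, any topological order has to place `model:customers` first. The hard part, which this sketch skips entirely, is computing `dimension_deps` correctly.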

Steps To Reproduce

  1. Check out jaffle-sl-template
  2. dbt list --select +saved_query:new_customer_orders --resource-type semantic_model reports that the saved query new_customer_orders depends only on semantic_model:jaffle_shop.orders, even though it needs to be grouped by customers.customer_name and filtered on customers.customer_type
  3. dbt build runs the new_customer_orders export (a NO-OP in dbt-core) before the customers model

Relevant log output

$ dbt list -q -s +saved_query:new_customer_orders --resource-type semantic_model
semantic_model:jaffle_shop.orders

$ dbt build
...
11:20:09  29 of 37 NO-OP saved query new_customer_orders ................................. [NO-OP in 0.00s]
...
11:20:09  34 of 37 START sql table model dbt_jcohen.customers ............................ [RUN]
11:20:09  34 of 37 OK created sql table model dbt_jcohen.customers ....................... [SELECT 939 in 0.41s]

Environment

- Python: 3.10.11
- dbt: main

Which database adapter are you using with dbt?

No response

Additional Context

DSI PR:

Spike of dbt-core implementation:

"It works" (though the current implementation is very naïve):

$ dbt list -q -s +saved_query:new_customer_orders --resource-type semantic_model --profile garage-postgres
semantic_model:jaffle_shop.customers
semantic_model:jaffle_shop.locations
semantic_model:jaffle_shop.order_item
semantic_model:jaffle_shop.orders
semantic_model:jaffle_shop.stg_products
$ dbt build
...
@jtcohen6 added the bug, semantic, and High Severity labels on Apr 23, 2024
@graciegoheen added this to the v1.8 milestone on Apr 23, 2024
@jtcohen6 added the wontfix label on Apr 29, 2024
@jtcohen6
Contributor Author

I opened this issue in the dbt-core repository in the hopes that we could find a naïve way forward. After syncing with lots of folks (@plypaul @marcodamore @QMalcolm @tlento @ChenyuLInx ...):

  • There is no way to have a satisfying naïve implementation of this logic in dbt-semantic-interfaces; our best attempt still leaves tons of edge cases
  • Fully capturing those edge cases requires the full foundational logic of MetricFlow
  • Incorporating that logic is out of scope for dbt-core

Our path forward will be via direct integration between dbt + MetricFlow in dbt Cloud. That's already where we've documented support for the functionality (saved query exports / caching) that this bug is blocking.

In the meantime, we can choose between:

  1. Leaving the functionality as-is: saved queries depend directly on semantic models for measures (used in metrics), but not for dimensions (used in group by / filter)
  2. As naïve as it gets: every saved query depends on every semantic model

We're opting for (1) and leaving this as-is.
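
For comparison, option (2) could be sketched as below. This is an illustrative toy, not code from dbt-core or dbt-semantic-interfaces. The function name, the `resource_type:name` node-naming convention, and the metric name are assumptions made for the example.

```python
def add_blanket_edges(deps):
    """Make every saved query depend on every semantic model in the graph.

    `deps` maps node -> set of direct upstream nodes. Node names are assumed
    to be prefixed with their resource type, e.g. "semantic_model:orders".
    """
    semantic_models = {n for n in deps if n.startswith("semantic_model:")}
    linked = {node: set(upstream) for node, upstream in deps.items()}
    for node in linked:
        if node.startswith("saved_query:"):
            linked[node] |= semantic_models
    return linked


# Hypothetical project with one saved query and three semantic models.
example_deps = {
    "semantic_model:orders": set(),
    "semantic_model:customers": set(),
    "semantic_model:locations": set(),
    "metric:orders_metric": {"semantic_model:orders"},
    "saved_query:new_customer_orders": {"metric:orders_metric"},
}

blanket = add_blanket_edges(example_deps)
print(sorted(blanket["saved_query:new_customer_orders"]))
```

The saved query now runs after `semantic_model:customers`, but it also picks up `semantic_model:locations`, which it never uses. That over-selection is the cost of the blanket approach: ordering is always safe, but `+saved_query` selection pulls in every semantic model in the project.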

@jtcohen6 closed this as not planned on Apr 29, 2024