Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an option to use INFORMATION_SCHEMA for partition info retrieval #866

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

github-christophe-oudar
Copy link
Contributor

@github-christophe-oudar github-christophe-oudar commented Aug 7, 2023

resolves #867

Problem

Current approach to retrieve partition information is expensive and slow.

Solution

Providing a way to use INFORMATION_SCHEMA metadata would speed up the partition info retrieval as well for a fraction of the cost (almost constant to 10 MB)

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

@cla-bot cla-bot bot added the cla:yes label Aug 7, 2023
@Kayrnt Kayrnt force-pushed the max-partition-information-schema branch 4 times, most recently from 19bd9b8 to 33167c1 Compare August 9, 2023 08:43
@Kayrnt Kayrnt force-pushed the max-partition-information-schema branch 7 times, most recently from 6cc2b6c to 9614b1c Compare August 19, 2023 01:11
The output contains the result from partitions information for your input table.
The details of the retrieved columns can be found on https://cloud.google.com/bigquery/docs/managing-partitioned-tables
It will leverage the INFORMATION_SCHEMA.PARTITIONS table.
#}
{%- macro get_partitions_metadata(table) -%}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping to add an option to keep the previous approach but I can't since the prototype of adapter.get_partitions_metadata is set in dbt-core and I don't think we want to leak the BQ specific option.
I think that, overall, it's acceptable.

@Kayrnt Kayrnt force-pushed the max-partition-information-schema branch from 9614b1c to 4806a5d Compare August 19, 2023 02:32
@github-christophe-oudar github-christophe-oudar force-pushed the max-partition-information-schema branch from 4806a5d to d62d551 Compare August 22, 2023 15:48
@github-christophe-oudar github-christophe-oudar marked this pull request as ready for review August 22, 2023 19:23
@github-christophe-oudar github-christophe-oudar requested a review from a team as a code owner August 22, 2023 19:23
@Kayrnt Kayrnt force-pushed the max-partition-information-schema branch from d62d551 to 717c275 Compare November 4, 2023 20:27
@nathaniel-may nathaniel-may removed the request for review from VersusFacit December 5, 2023 21:51
@nathaniel-may
Copy link
Contributor

Hi @github-christophe-oudar, we appreciate the effort you took to implement this optimization and we thank you for your patience. I've made sure this is properly filed in our queue of work but it may still take some time for us to give it a proper review.

@mikealfare mikealfare added community This PR is from a community member and removed support labels Apr 24, 2024
Copy link
Contributor

This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Oct 22, 2024
@HbirdJ
Copy link

HbirdJ commented Oct 22, 2024

I don't think this should be closed, there's a long history of requests for review here and in the dbt bq slack channel.

@github-christophe-oudar
Copy link
Contributor Author

#867 got closed but I'm not sure it should be canceled.
If @amychen1776 referred to microbatch feature, I think we should proceed as it's not the same need.
I would probably prefer merging #1351 before because this one needs a rebase anyway and I refactored a bit in that PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ADAP-781] [Feature] Add an option to use INFORMATION_SCHEMA for partition info retrieval
6 participants