Add support for inserts via GCP cloud function and pub/sub #670
Added support for `handle_tests_results` and `insert_rows` to use a new `insert_rows_method` called `gcp-cloud-function`, which calls a UDF to push results into BigQuery via Pub/Sub rather than via direct INSERT queries. This significantly increases Elementary's capacity to insert records/test results when using BigQuery.
Here is an example configuration in dbt_project.yml. You specify a UDF that publishes to a topic (the first argument of the SELECT), and then define three Pub/Sub topics to send data to. Those topics are configured to pass records straight to BigQuery, which is a simple option. I also defined schemas for those tables (all of this is created in Terraform), which is not ideal: it couples this feature to schema/code changes for those tables.
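A minimal sketch of what that dbt_project.yml configuration could look like. The `insert_rows_method` and `publish_to_pubsub_function` names come from this PR; the project path and the per-table topic variable names are illustrative assumptions:

```yaml
vars:
  # Route inserts through Pub/Sub instead of direct INSERT queries
  insert_rows_method: gcp-cloud-function
  # Fully qualified UDF that publishes to a topic (first argument of the SELECT)
  publish_to_pubsub_function: project.dataset.publish_to_pubsub_function
  # Hypothetical topic variables, one per destination table
  elementary_test_results_topic: projects/my-project/topics/elementary-test-results
  elementary_test_result_rows_topic: projects/my-project/topics/elementary-test-result-rows
  elementary_dbt_run_results_topic: projects/my-project/topics/elementary-dbt-run-results
```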
Here is an example definition of `publish_to_pubsub_function`:

```sql
CREATE OR REPLACE FUNCTION project.dataset.publish_to_pubsub_function(
  pubsub_topic STRING,
  json_data STRING,
  attributes STRING
)
RETURNS STRING
REMOTE WITH CONNECTION ....
OPTIONS (endpoint = ......, max_batching_rows = 500);
```
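For illustration, the remote function would then be invoked from BigQuery like this (the topic name and payload here are made up, not part of the PR):

```sql
SELECT project.dataset.publish_to_pubsub_function(
  'projects/my-project/topics/elementary-test-results',  -- pubsub_topic
  TO_JSON_STRING(STRUCT('some_test' AS test_name)),      -- json_data, serialized record
  '{}'                                                   -- attributes, none in this case
);
```

Because the function is declared with `max_batching_rows = 500`, BigQuery batches up to 500 rows per call to the Cloud Function endpoint, which is what avoids the per-query limits of direct inserts.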