From 4204d686fcf28f54b052eacd3a44212fcbf09ee5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Edgar=20Ram=C3=ADrez=20Mondrag=C3=B3n?=
Date: Fri, 4 Aug 2023 12:24:41 -0600
Subject: [PATCH 1/4] Add WIP document

---
 specs/dynamic-pipelines.md | 61 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)
 create mode 100644 specs/dynamic-pipelines.md

diff --git a/specs/dynamic-pipelines.md b/specs/dynamic-pipelines.md
new file mode 100644
index 0000000..b32d75e
--- /dev/null
+++ b/specs/dynamic-pipelines.md
@@ -0,0 +1,61 @@
+# Dynamically create N number of pipelines from Singer output
+
+## `meltano.yml`
+
+```yaml
+plugins:
+  extractors:
+    # Third-party data source
+    - name: tap-shopify
+
+    # Configuration sources
+    - name: tap-postgres--shopify_configs
+      inherit_from: tap-postgres
+      select:
+      - public-shopify_configs.*
+
+  # Third-party destination
+  loaders:
+    - name: target-s3-parquet
+
+schedules:
+- name: sync-all-shopify
+  interval: '@hourly'
+  config_source:
+    tap-shopify: tap-postgres--shopify_configs
+  extractor: tap-shopify
+  loader: target-s3-parquet
+```
+
+Where each key in the `config_source` mapping is a plugin used in the schedule and each value is a **tap** name.
+
+> [!NOTE]
+> TBD: how well the above configuration spec plays with _jobs_ since a job name can be referenced in a schedule.
+
+> [!NOTE]
+> TBD: how to reference a _pipeline_ instead of a plain tap, in case a mapper is required, for example.
+
+> [!NOTE]
+> TBD: do we want to support config sources for loaders too? If so, how could we ensure bot collections of configs have the same cardinality and order?
+
+## Under the hood
+
+By writing dynamic configurations at the _schedule_ level, `meltano run` would be able to invoke the respective plugin with each of the configurations, but it is TBD whether this is expected of `meltano run`, or if it's the responsibility of the orchestrator (e.g. Meltano Cloud).
+
+## Alternatives
+
+* [Annotations](https://docs.meltano.com/concepts/project/#annotations) could be used in the schedule definition:
+
+  ```yaml
+  plugins: ... # same as above
+  schedules:
+  - name: sync-all-shopify
+    interval: '@hourly'
+    annotations:
+      config_source:
+        tap-shopify: tap-postgres--shopify_configs
+    extractor: tap-shopify
+    loader: target-s3-parquet
+  ```
+
+  This would clearly make it the responsibility of the orchestrator to generate _N_ pipelines.

From 2e2974a52ccebb94e98d00d602fe1874685dacff Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Edgar=20Ram=C3=ADrez=20Mondrag=C3=B3n?=
Date: Fri, 4 Aug 2023 12:25:10 -0600
Subject: [PATCH 2/4] Fix typo

---
 specs/dynamic-pipelines.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/specs/dynamic-pipelines.md b/specs/dynamic-pipelines.md
index b32d75e..b1528da 100644
--- a/specs/dynamic-pipelines.md
+++ b/specs/dynamic-pipelines.md
@@ -36,7 +36,7 @@ Where each key in the `config_source` mapping is a plugin used in the schedule a
 > TBD: how to reference a _pipeline_ instead of a plain tap, in case a mapper is required, for example.
 
 > [!NOTE]
-> TBD: do we want to support config sources for loaders too? If so, how could we ensure bot collections of configs have the same cardinality and order?
+> TBD: do we want to support config sources for loaders too? If so, how could we ensure both collections of configs have the same cardinality and order?
 
 ## Under the hood
 
From df036093aa394ce3522180dc8e45c1685ed5160c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Edgar=20Ram=C3=ADrez=20Mondrag=C3=B3n?=
Date: Wed, 9 Aug 2023 17:14:15 -0600
Subject: [PATCH 3/4] Clarify design makes sense for extractors

---
 specs/dynamic-pipelines.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/specs/dynamic-pipelines.md b/specs/dynamic-pipelines.md
index b1528da..041d26d 100644
--- a/specs/dynamic-pipelines.md
+++ b/specs/dynamic-pipelines.md
@@ -36,7 +36,7 @@ Where each key in the `config_source` mapping is a plugin used in the schedule a
 > TBD: how to reference a _pipeline_ instead of a plain tap, in case a mapper is required, for example.
 
 > [!NOTE]
-> TBD: do we want to support config sources for loaders too? If so, how could we ensure both collections of configs have the same cardinality and order?
+> TBD: This design makes sense for extractors. Do we want to support config sources for loaders too? If so, how could we ensure both collections of configs have the same cardinality and order?
 
 ## Under the hood
 

From e4b7106e1dad6750e54c4fa3bb89b3d5e5020a66 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Edgar=20Ram=C3=ADrez=20Mondrag=C3=B3n?=
Date: Thu, 17 Aug 2023 11:17:53 -0600
Subject: [PATCH 4/4] Add problem section

---
 specs/dynamic-pipelines.md | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/specs/dynamic-pipelines.md b/specs/dynamic-pipelines.md
index 041d26d..948ab80 100644
--- a/specs/dynamic-pipelines.md
+++ b/specs/dynamic-pipelines.md
@@ -1,5 +1,38 @@
 # Dynamically create N number of pipelines from Singer output
 
+## Problem
+
+Singer taps are designed to be used in a single pipeline, with a single configuration. However, there are cases where a single tap can be used to extract data from different instances of the same type of source. There are two ways to achieve this today:
+
+1. Use a single plugin, and pass different configurations to it at runtime:
+
+   ```sh
+   # Set env vars for the tap
+   export TAP_POSTGRES_HOST=...
+   export TAP_POSTGRES_PORT=...
+   export TAP_POSTGRES_USER=...
+   export TAP_POSTGRES_PASSWORD=...
+   export TAP_POSTGRES_DATABASE=...
+   # Run the tap
+   meltano run tap-postgres target-snowflake
+   ```
+
+   This process can be automated by using a Secret Ops provider like [Infisical](https://github.com/Infisical/infisical) or [chamber](https://github.com/segmentio/chamber), or by writing a script that generates the required env vars for each configuration.
+
+2. Use inheritance to create a new plugin for each configuration:
+
+   ```yaml
+   plugins:
+     extractors:
+       - name: tap-postgres
+       - name: tap-postgres--tenant1
+         inherit_from: tap-postgres
+       - name: tap-postgres--tenant2
+         inherit_from: tap-postgres
+   ```
+
+   This approach does not scale to 100s or 1000s of instances, as it requires a new plugin to be created for each configuration.
+
 ## `meltano.yml`
 
 ```yaml
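
To make the "Under the hood" section of the proposed spec more concrete, the following is a minimal sketch of the expansion an orchestrator (or `meltano run` itself) would have to perform: read the rows from the configuration source and run the pipeline once per row, mapping each row onto the tap's settings through environment variables, as in the env-var approach described in the Problem section. The `fetch_configs` helper, the JSON column names, and the `TAP_SHOPIFY_*` setting names are illustrative assumptions, not part of the proposed spec.

```sh
# Sketch only: expand one schedule's config_source into N pipeline runs.
# `fetch_configs` is a hypothetical helper that prints one JSON object per
# line for each row of the configuration source; the setting names below
# are placeholders and depend on the actual tap variant.
fetch_configs public-shopify_configs | while read -r row; do
  # Map the row's columns onto the tap's settings via environment variables.
  export TAP_SHOPIFY_SHOP="$(echo "$row" | jq -r '.shop')"
  export TAP_SHOPIFY_ACCESS_TOKEN="$(echo "$row" | jq -r '.access_token')"
  meltano run tap-shopify target-s3-parquet
done
```

Whether this loop lives inside `meltano run` or in the orchestrator (e.g. Meltano Cloud) is exactly the open question raised in that section.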