From 1123dd051e91c77133a9ed91720305b4be6f77ce Mon Sep 17 00:00:00 2001
From: Pete Hampton
Date: Mon, 8 Apr 2024 14:53:50 +0100
Subject: [PATCH 1/3] [ClickPipes] Document S3/GCS Pipe limitations.

---
 .../data-ingestion/clickpipes/index.md | 27 ++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/docs/en/integrations/data-ingestion/clickpipes/index.md b/docs/en/integrations/data-ingestion/clickpipes/index.md
index ed5c47a536a..852f5382dc9 100644
--- a/docs/en/integrations/data-ingestion/clickpipes/index.md
+++ b/docs/en/integrations/data-ingestion/clickpipes/index.md
@@ -174,11 +174,32 @@ The following rules are applied to the mapping between the retrieved Avro schema
 - If the Avro schema is missing a field defined in the ClickHouse destination mapping, the ClickHouse column will be populated with a "zero" value, such as 0 or an empty string. Note that [DEFAULT](https://clickhouse.com/docs/en/sql-reference/statements/create/table#default) expressions are not currently evaluated for ClickPipes inserts (this is temporary limitation pending updates to the ClickHouse server default processing).
 - If the Avro schema field and the ClickHouse column are incompatible, inserts of that row/message will fail, and the failure will be recorded in the ClickPipes errors table. Note that several implicit conversions are supported (like between numeric types), but not all (for example, an Avro `record` field can not be inserted into an `Int32` ClickHouse column).
 
-## Current Limitations
+## ClickPipes Limitations
 
 - Private Link support isn't currently available for ClickPipes but will be released in the near future. Please contact us to express interest.
 - [DEFAULT](https://clickhouse.com/docs/en/sql-reference/statements/create/table#default) is not supported.
 
+
+### S3 / GCS ClickPipe Limitations
+
+ - ClickPipes will only attempt to ingest objects that are 1 GB or smaller in size.
+ - S3 / GCS ClickPipes **do not** share a listing syntax with the S3 Table Function. The supported wildcards are:
+   - `?` — Substitutes any single character
+   - `*` — Substitutes any number of any characters except `/`, including the empty string
+   - `**` — Substitutes any number of any characters, including `/` and the empty string
+
+:::note
+This is a valid path:
+
+https://datasets-documentation.s3.eu-west-3.amazonaws.com/http/**.ndjson.gz
+
+
+This is not a valid path, because brace patterns such as `{N..M}` and `{a,b}` are not supported in ClickPipes:
+
+https://datasets-documentation.s3.eu-west-3.amazonaws.com/http/{documents-01,documents-02}.ndjson.gz
+:::
+
+
 ## List of Static IPs
 
 The following are the static NAT IPs that ClickPipes uses to connect to your Kafka brokers separated by region.
@@ -278,3 +299,7 @@ No. For interoprability reasons we ask you to replace your `gs://` bucket prefix
 - **Does ClickPipes support continuous ingestion from object storage?**
 
 No, not currently. It is on our roadmap. Please feel free to express interest to us if you would like to be notified.
+
+- **Is there a maximum file size for S3 / GCS ClickPipes?**
+
+Yes, there is an upper bound of 1 GB per file. If a file is larger than 1 GB, an error is appended to the ClickPipes dedicated error table.

From d1ff35fc3295052b29f75ac76b0194f03480917b Mon Sep 17 00:00:00 2001
From: Pete Hampton
Date: Mon, 8 Apr 2024 14:55:33 +0100
Subject: [PATCH 2/3] Add hyperlink to S3 table function.

---
 docs/en/integrations/data-ingestion/clickpipes/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/en/integrations/data-ingestion/clickpipes/index.md b/docs/en/integrations/data-ingestion/clickpipes/index.md
index 852f5382dc9..477982c9ec5 100644
--- a/docs/en/integrations/data-ingestion/clickpipes/index.md
+++ b/docs/en/integrations/data-ingestion/clickpipes/index.md
@@ -183,7 +183,7 @@ The following rules are applied to the mapping between the retrieved Avro schema
 ### S3 / GCS ClickPipe Limitations
 
  - ClickPipes will only attempt to ingest objects that are 1 GB or smaller in size.
- - S3 / GCS ClickPipes **do not** share a listing syntax with the S3 Table Function. The supported wildcards are:
+ - S3 / GCS ClickPipes **do not** share a listing syntax with the [S3 Table Function](../../../sql-reference/table-functions/file#globs_in_path). The supported wildcards are:
    - `?` — Substitutes any single character
    - `*` — Substitutes any number of any characters except `/`, including the empty string
    - `**` — Substitutes any number of any characters, including `/` and the empty string

From e1087b6934b91793e6890ca78e68e628ad59c8ad Mon Sep 17 00:00:00 2001
From: Pete Hampton
Date: Mon, 8 Apr 2024 15:13:26 +0100
Subject: [PATCH 3/3] Fix link

---
 docs/en/integrations/data-ingestion/clickpipes/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/en/integrations/data-ingestion/clickpipes/index.md b/docs/en/integrations/data-ingestion/clickpipes/index.md
index 477982c9ec5..f9a74c87755 100644
--- a/docs/en/integrations/data-ingestion/clickpipes/index.md
+++ b/docs/en/integrations/data-ingestion/clickpipes/index.md
@@ -183,7 +183,7 @@ The following rules are applied to the mapping between the retrieved Avro schema
 ### S3 / GCS ClickPipe Limitations
 
  - ClickPipes will only attempt to ingest objects that are 1 GB or smaller in size.
- - S3 / GCS ClickPipes **do not** share a listing syntax with the [S3 Table Function](../../../sql-reference/table-functions/file#globs_in_path). The supported wildcards are:
+ - S3 / GCS ClickPipes **do not** share a listing syntax with the [S3 Table Function](https://clickhouse.com/docs/en/sql-reference/table-functions/file#globs_in_path). The supported wildcards are:
    - `?` — Substitutes any single character
    - `*` — Substitutes any number of any characters except `/`, including the empty string
    - `**` — Substitutes any number of any characters, including `/` and the empty string
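The wildcard rules and the 1 GB per-file limit documented in these patches can be illustrated with a short Python sketch. It is not ClickPipes' implementation, only an illustration under stated assumptions. The helper names `glob_to_regex`, `should_ingest` and the constant `MAX_OBJECT_BYTES` are hypothetical. Brace patterns are rejected because the note lists the `{...}` path as invalid. `?` is assumed to match any single character including `/`, which the text does not specify. "1 GB" is assumed to mean 1024**3 bytes.

```python
import re

# Assumption: "1 GB" is read as 1024**3 bytes; the docs do not state the unit base.
MAX_OBJECT_BYTES = 1024 ** 3


def glob_to_regex(pattern: str) -> re.Pattern:
    """Hypothetical helper: translate a glob using ?, * and ** into a regex."""
    # Brace patterns ({a,b}, {N..M}) are documented as unsupported, so reject them.
    if "{" in pattern or "}" in pattern:
        raise ValueError(f"brace patterns are not supported: {pattern!r}")
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")  # any characters, including '/', or none
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # any characters except '/', or none
            i += 1
        elif pattern[i] == "?":
            # Exactly one character. Whether '/' may match is not documented;
            # this sketch assumes it can.
            parts.append(".")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))  # literal character
            i += 1
    return re.compile("".join(parts))


def should_ingest(pattern: str, url: str, size_bytes: int) -> bool:
    """Hypothetical helper: True if the URL matches the glob and fits the size limit."""
    matches = glob_to_regex(pattern).fullmatch(url) is not None
    return matches and size_bytes <= MAX_OBJECT_BYTES


if __name__ == "__main__":
    base = "https://datasets-documentation.s3.eu-west-3.amazonaws.com/http/"
    print(should_ingest(base + "**.ndjson.gz", base + "documents-01.ndjson.gz", 500 * 1024 ** 2))
```

Under these assumptions, the valid path from the note matches `.../http/documents-01.ndjson.gz` and the script prints `True`. Passing the brace-pattern path from the note to `glob_to_regex` raises `ValueError`. `*` stops at `/`, while `**` crosses directory boundaries. This is why `**.ndjson.gz` is the form used in the valid example.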