From ed9b973139d9994811e489843b6d3862b75e2f8b Mon Sep 17 00:00:00 2001 From: Ryadh DAHIMENE Date: Fri, 20 Oct 2023 12:46:50 +0200 Subject: [PATCH] Refactor the Kafka Docs --- .../data-ingestion/kafka/confluent/index.md | 13 ++++++ .../data-ingestion/kafka/index.md | 46 ++++++++----------- .../data-ingestion/kafka/msk/index.md | 1 - sidebars.js | 7 ++- 4 files changed, 36 insertions(+), 31 deletions(-) create mode 100644 docs/en/integrations/data-ingestion/kafka/confluent/index.md diff --git a/docs/en/integrations/data-ingestion/kafka/confluent/index.md b/docs/en/integrations/data-ingestion/kafka/confluent/index.md new file mode 100644 index 00000000000..a054b1d7c5a --- /dev/null +++ b/docs/en/integrations/data-ingestion/kafka/confluent/index.md @@ -0,0 +1,13 @@ +--- +sidebar_label: Confluent Platform +sidebar_position: 1 +slug: /en/integrations/kafka/cloud/confluent +description: Kafka Connectivity with Confluent Cloud +--- + +# Integrating Confluent Cloud with ClickHouse + +Confluent Platform provides two options to integrate with ClickHouse: + +* [ClickHouse Connect Sink on Confluent Cloud](./custom-connector.md) using the custom connectors feature +* [HTTP Sink Connector for Confluent Platform](./kafka-connect-http.md), which integrates Apache Kafka with an API via HTTP or HTTPS \ No newline at end of file diff --git a/docs/en/integrations/data-ingestion/kafka/index.md b/docs/en/integrations/data-ingestion/kafka/index.md index cb7de64fee5..657fb316acf 100644 --- a/docs/en/integrations/data-ingestion/kafka/index.md +++ b/docs/en/integrations/data-ingestion/kafka/index.md @@ -7,47 +7,30 @@ description: Introduction to Kafka with ClickHouse # Integrating Kafka with ClickHouse -[Apache Kafka](https://kafka.apache.org/) is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. 
In most cases involving Kafka and ClickHouse, users will wish to insert Kafka based data into ClickHouse - although the reverse is supported. Below we outline several options for both use cases, identifying the pros and cons of each approach. +[Apache Kafka](https://kafka.apache.org/) is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. In most cases involving Kafka and ClickHouse, users will wish to insert Kafka-based data into ClickHouse. Below we outline several options for this use case, identifying the pros and cons of each approach. -For those who do not have a Kafka instance to hand, we recommend [Confluent Cloud](https://www.confluent.io/get-started/), which offers a free tier adequate for testing these examples. For self-managed alternatives, consider the [Confluent for Kubernetes](https://docs.confluent.io/operator/current/overview.html) or [here](https://docs.confluent.io/platform/current/installation/installing_cp/overview.html) for non-Kubernetes environments. - -## Assumptions - -* You are familiar with the Kafka fundamentals, such as producers, consumers and topics. -* You have a topic prepared for these examples. We assume all data is stored in Kafka as JSON, although the principles remain the same if using Avro. -* We utilise the excellent [kcat](https://github.com/edenhill/kcat) (formerly kafkacat) in our examples to publish and consume Kafka data. -* Whilst we reference some python scripts for loading sample data, feel free to adapt the examples to your dataset. -* You are broadly familiar with ClickHouse materialized views. - -# Choosing an option +## Choosing an option When integrating Kafka with ClickHouse, you will need to make early architectural decisions about the high-level approach used. 
We outline the most common strategies below: -### ClickPipes for Kafka (new) -* [ClickPipes](../clickpipes/index.md) offers the easiest and most intuitive way to ingest data into ClickHouse Cloud. With support for Apache Kafka and Confluent today, and many more data sources coming soon. +### ClickPipes for Kafka (ClickHouse Cloud) +* [**ClickPipes**](../clickpipes/index.md) offers the easiest and most intuitive way to ingest data into ClickHouse Cloud. It supports Apache Kafka, Confluent Cloud, and Amazon MSK today, with many more data sources coming soon. -:::note -ClickPipes is a native capability of [ClickHouse Cloud](https://clickhouse.com/cloud) currently under private preview. -::: -### Cloud-based Kafka Connectivity -* [**Confluent Cloud**](https://confluent.cloud) - Confluent platform provides an option to upload and [run ClickHouse Connector Sink on Confluent Cloud](./confluent/custom-connector.md) or use [HTTP Sink Connector for Confluent Platform](./confluent/kafka-connect-http.md) that integrates Apache Kafka with an API via HTTP or HTTPS. +### Third-party Cloud-based Kafka Connectivity +* [**Confluent Cloud**](./confluent/index.md) - The Confluent Platform provides an option to upload and [run the ClickHouse Connector Sink on Confluent Cloud](./confluent/custom-connector.md), or to use the [HTTP Sink Connector for Confluent Platform](./confluent/kafka-connect-http.md), which integrates Apache Kafka with an API via HTTP or HTTPS. -* [**Amazon MSK**](./msk/index.md) - support Amazon MSK Connect framework to forward data from Apache Kafka clusters to external systems such as ClickHouse. You can install **ClickHouse Kafka Connect** on Amazon MSK. +* [**Amazon MSK**](./msk/index.md) - supports the Amazon MSK Connect framework for forwarding data from Apache Kafka clusters to external systems such as ClickHouse. You can install ClickHouse Kafka Connect on Amazon MSK. 
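To make concrete what the HTTP-based sinks above are doing, here is a minimal Python sketch, not part of any connector, that renders a batch of Kafka JSON messages in ClickHouse's JSONEachRow format and posts it to the ClickHouse HTTP interface. The host, port, and table name are hypothetical placeholders:

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def to_jsoneachrow(messages):
    """Render a batch of dict messages in ClickHouse's JSONEachRow
    format: one compact JSON object per line."""
    return "\n".join(json.dumps(m, separators=(",", ":")) for m in messages)


def insert_batch(messages, host="http://localhost:8123", table="kafka_events"):
    """POST a batch to the ClickHouse HTTP interface.

    The host and table are illustrative placeholders; a production sink
    (such as the connectors listed above) additionally handles retries,
    authentication, batch sizing, and delivery guarantees.
    """
    query = quote(f"INSERT INTO {table} FORMAT JSONEachRow")
    body = to_jsoneachrow(messages).encode("utf-8")
    return urlopen(Request(f"{host}/?query={query}", data=body))


# Two Kafka messages become two JSONEachRow lines.
rows = to_jsoneachrow([{"id": 1, "msg": "a"}, {"id": 2, "msg": "b"}])
```

This is only a sketch of the mechanism; in practice, prefer one of the managed or self-managed connectors described on this page.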
### Self-managed Kafka Connectivity -* [**Kafka Connect**](./kafka-clickhouse-connect-sink.md) - Kafka Connect is a free, open-source component of Apache Kafka® that works as a centralized data hub for simple data integration between Kafka and other data systems. Connectors provide a simple means of scalably and reliably streaming data to and from Kafka. Source Connectors inserts data to Kafka topics from other systems, whilst Sink Connectors delivers data from Kafka topics into other data stores such as ClickHouse. +* [**Kafka Connect**](./kafka-clickhouse-connect-sink.md) - Kafka Connect is a free, open-source component of Apache Kafka that works as a centralized data hub for simple data integration between Kafka and other data systems. Connectors provide a simple means of scalably and reliably streaming data to and from Kafka. Source Connectors insert data into Kafka topics from other systems, whilst Sink Connectors deliver data from Kafka topics into other data stores such as ClickHouse. * [**Vector**](./kafka-vector.md) - Vector is a vendor agnostic data pipeline. With the ability to read from Kafka, and send events to ClickHouse, this represents a robust integration option. * [**JDBC Connect Sink**](./kafka-connect-jdbc.md) - The Kafka Connect JDBC Sink connector allows you to export data from Kafka topics to any relational database with a JDBC driver * **Custom code** - Custom code using respective client libraries for Kafka and ClickHouse may be appropriate cases where custom processing of events is required. This is beyond the scope of this documentation. +* [**Kafka table engine**](./kafka-table-engine.md) provides a native ClickHouse integration (not available on ClickHouse Cloud). This table engine **pulls** data from the source system, which requires ClickHouse to have direct access to Kafka. -### Kafka table engine -* The [Kafka table engine](./kafka-table-engine.md) provides a Native ClickHouse integration. 
This table engine **pulls** data from the source system. This requires ClickHouse to have direct access to Kafka. -:::note -Kafka table engine is not supported on [ClickHouse Cloud](https://clickhouse.com/cloud). Please consider one of the alternatives listed on the page. -::: ### Choosing an approach It comes down to a few decision points: @@ -58,3 +41,14 @@ It comes down to a few decision points: * **External enrichment** - Whilst messages can be manipulated before insertion into ClickHouse, through the use of functions in the select statement of the materialized view, users may prefer to move complex enrichment external to ClickHouse. * **Data flow direction** - Vector only supports the transfer of data from Kafka to ClickHouse. + + +## Assumptions + +The user guides linked above assume the following: + +* You are familiar with Kafka fundamentals, such as producers, consumers, and topics. +* You have a topic prepared for these examples. We assume all data is stored in Kafka as JSON, although the principles remain the same if using Avro. +* We utilise the excellent [kcat](https://github.com/edenhill/kcat) (formerly kafkacat) in our examples to publish and consume Kafka data. +* Whilst we reference some Python scripts for loading sample data, feel free to adapt the examples to your dataset. +* You are broadly familiar with ClickHouse materialized views. \ No newline at end of file diff --git a/docs/en/integrations/data-ingestion/kafka/msk/index.md b/docs/en/integrations/data-ingestion/kafka/msk/index.md index 129c2958476..8b1bceb6823 100644 --- a/docs/en/integrations/data-ingestion/kafka/msk/index.md +++ b/docs/en/integrations/data-ingestion/kafka/msk/index.md @@ -12,7 +12,6 @@ import ConnectionDetails from '@site/docs/en/_snippets/_gather_your_details_http We assume: * you are familiar with [ClickHouse Connector Sink](../kafka-clickhouse-connect-sink.md),Amazon MSK and MSK Connectors. 
We recommend the Amazon MSK [Getting Started guide](https://docs.aws.amazon.com/msk/latest/developerguide/getting-started.html) and [MSK Connect guide](https://docs.aws.amazon.com/msk/latest/developerguide/msk-connect.html). * The MSK broker is publicly accessible. See the [Public Access](https://docs.aws.amazon.com/msk/latest/developerguide/public-access.html) section of the Developer Guide. - * If you wish to allow-list the static IPs for ClickPipes, they can be found [here](../clickpipes/index.md#list-of-static-ips). ## The official Kafka connector from ClickHouse with Amazon MSK diff --git a/sidebars.js b/sidebars.js index 9ba32fde019..0e4b55530d9 100644 --- a/sidebars.js +++ b/sidebars.js @@ -91,15 +91,15 @@ const sidebars = { items: [ 'en/integrations/data-ingestion/s3/index', 'en/integrations/data-ingestion/gcs/index', + 'en/integrations/data-ingestion/kafka/index', 'en/integrations/data-ingestion/clickpipes/index', - 'en/integrations/data-ingestion/dbms/jdbc-with-clickhouse', - 'en/integrations/data-ingestion/dbms/odbc-with-clickhouse', 'en/integrations/data-ingestion/dbms/postgresql/index', 'en/integrations/data-ingestion/dbms/mysql/index', - 'en/integrations/data-ingestion/kafka/index', 'en/integrations/data-ingestion/etl-tools/dbt/index', 'en/integrations/data-ingestion/insert-local-files', 'en/integrations/data-ingestion/redshift/index', + 'en/integrations/data-ingestion/dbms/jdbc-with-clickhouse', + 'en/integrations/data-ingestion/dbms/odbc-with-clickhouse', { type: 'category', label: 'More...', @@ -108,7 +108,6 @@ const sidebars = { collapsible: true, items: [ 'en/integrations/data-ingestion/etl-tools/airbyte-and-clickhouse', - 'en/integrations/data-ingestion/kafka/msk/index', 'en/integrations/data-ingestion/emqx/index', { type: 'link',