Enhancing and standardizing Clickhouse documentation #604

Open
wants to merge 5 commits into
base: trunk
Changes from 3 commits
78 changes: 59 additions & 19 deletions spiceaidocs/docs/components/data-connectors/clickhouse.md
@@ -4,36 +4,64 @@ sidebar_label: 'Clickhouse Data Connector'
description: 'Clickhouse Data Connector Documentation'
---

## Federated SQL query

To connect to any Clickhouse database as connector for federated SQL query, specify `clickhouse` as the selector in the `from` value for the dataset.
ClickHouse is a fast, open-source columnar database management system designed for online analytical processing (OLAP) and real-time analytics. This connector enables federated and accelerated SQL queries on top of a ClickHouse server.

```yaml
datasets:
  - from: clickhouse:path.to.my_dataset
  - from: clickhouse:my.dataset
    name: my_dataset
```

## Configuration

### `from`

The `from` field for the Clickhouse connector takes the form `clickhouse:path_to_my_dataset`, where `path_to_my_dataset` is the path to the dataset within Clickhouse. In the example above, it would be `my.dataset`.
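
As a rough illustration of how such a value breaks down, the sketch below splits a `from` value into its connector selector and dataset path. `split_from` is a hypothetical helper written for this example; it is not Spice's actual parsing code.

```python
def split_from(value):
    """Split a `from` value such as 'clickhouse:my.dataset'.

    Hypothetical helper for illustration only; not part of Spice.
    Returns a (connector, path) tuple.
    """
    # Split on the first colon only; everything after it is the dataset path.
    connector, sep, path = value.partition(":")
    if not sep or not connector or not path:
        raise ValueError(f"expected '<connector>:<path>', got {value!r}")
    return connector, path
```

For the example above, `split_from("clickhouse:my.dataset")` yields the selector `clickhouse` and the path `my.dataset`.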

### `name`

The dataset name. This will be used as the table name within Spice.

```yaml
datasets:
  - from: clickhouse:my.dataset
    name: cool_dataset
```

```sql
SELECT COUNT(*) FROM cool_dataset;
```

```shell
+----------+
| count(*) |
+----------+
| 6001215  |
+----------+
```

### `params`

The Clickhouse data connector can be configured by providing the following `params`:

- `clickhouse_connection_string`: The connection string to use to connect to the Clickhouse server. This can be used instead of providing individual connection parameters. Use the [secret replacement syntax](../secret-stores/index.md) to load the password from a secret store, e.g. `${secrets:my_clickhouse_conn_string}`.
- `clickhouse_host`: The hostname of the Clickhouse server.
- `clickhouse_tcp_port`: The port of the Clickhouse server.
- `clickhouse_db`: The name of the database to connect to.
- `clickhouse_user`: The username to connect with.
- `clickhouse_pass`: The password to connect with. Use the [secret replacement syntax](../secret-stores/index.md) to load the password from a secret store, e.g. `${secrets:my_clickhouse_pass}`.
- `clickhouse_secure`: Optional. Specifies the SSL/TLS behavior for the connection. Supported values:
  - `true`: (default) This mode requires an SSL connection. If a secure connection cannot be established, the connection fails.
  - `false`: This mode will not attempt to use an SSL connection, even if the server supports it.
- `connection_timeout`: Optional. Specifies the connection timeout in milliseconds.
| Parameter Name | Definition |
| ------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `clickhouse_connection_string` | The connection string to use to connect to the Clickhouse server. This can be used instead of providing individual connection parameters. |
| `clickhouse_host` | The hostname of the Clickhouse server. |
| `clickhouse_tcp_port` | The port of the Clickhouse server. |
| `clickhouse_db` | The name of the database to connect to. |
| `clickhouse_user` | The username to connect with. |
| `clickhouse_pass` | The password to connect with. |
| `clickhouse_secure` | Optional. Specifies the SSL/TLS behavior for the connection. Supported values:<br /> <ul><li>`true`: (default) This mode requires an SSL connection. If a secure connection cannot be established, the connection fails.</li><li>`false`: This mode will not attempt to use an SSL connection, even if the server supports it.</li></ul> |
| `connection_timeout` | Optional. Specifies the connection timeout in milliseconds. |

Contributor:

This should support specifying the timeout in the same format as our other timeout parameters, in a human-readable format like `10s` or `1m`.

Contributor:

I don't think we need to do this now, but we should at least create an issue to track it, since it's inconsistent with our other connectors.
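
To make the suggestion concrete, here is a minimal sketch of converting a human-readable timeout such as `10s` or `1m` into the milliseconds that `connection_timeout` currently expects. The function name `parse_timeout_ms` and the accepted suffixes (`ms`, `s`, `m`, `h`) are assumptions made for this example; they do not describe the format used by the other connectors or any existing Spice code.

```python
import re

# Milliseconds per unit suffix. The accepted suffixes are an assumption.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}


def parse_timeout_ms(value):
    """Convert a string such as '10s' or '1m' to milliseconds.

    Hypothetical helper for illustration only.
    """
    # Require a whole number immediately followed by a known unit suffix.
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", value.strip())
    if match is None:
        raise ValueError(f"unrecognized timeout: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_MS[unit]
```

Under these assumptions, `parse_timeout_ms("10s")` gives the same `10000` used in the examples in this file. A bare number without a suffix is rejected in this sketch; whether to keep accepting plain milliseconds is a separate design decision.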


Configuration `params` are provided in the top level `dataset` for a dataset source and federated SQL query.
## Examples

### Connecting to localhost

```yaml
datasets:
  - from: clickhouse:path.to.my_dataset
  - from: clickhouse:my.dataset
    name: my_dataset
    params:
      clickhouse_host: localhost
@@ -42,23 +70,35 @@ datasets:
      clickhouse_user: my_user
      clickhouse_pass: ${secrets:my_clickhouse_pass}
      connection_timeout: 10000
      clickhouse_secure: true
      clickhouse_secure: false
```

### Specifying a connection timeout

```yaml
datasets:
  - from: clickhouse:path.to.my_dataset
  - from: clickhouse:my.dataset
    name: my_dataset
    params:
      clickhouse_connection_string: tcp://my_user:${secrets:my_clickhouse_pass}@localhost:9000/my_database
      connection_timeout: 10000
      clickhouse_secure: true
```

### Using a connection string

```yaml
datasets:
  - from: clickhouse:path.to.my_dataset
  - from: clickhouse:my.dataset
    name: my_dataset
    params:
      clickhouse_connection_string: tcp://my_user:${secrets:my_clickhouse_pass}@localhost:9000/my_database?connection_timeout=10000&secure=true
```
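
The connection string in this example carries the same information as the individual parameters. The sketch below shows how the two forms line up by assembling a string of that shape from separate values. `build_connection_string` is a hypothetical helper, and the layout is inferred only from the example above; it is not a statement of everything the connector accepts.

```python
from urllib.parse import urlencode


def build_connection_string(host, port, db, user, password,
                            connection_timeout=None, secure=None):
    """Assemble a tcp:// connection string shaped like the example above.

    Hypothetical helper for illustration only; not part of Spice or ClickHouse.
    Credentials are inserted as given, so values containing reserved URL
    characters (for example '@') are not handled by this sketch.
    """
    url = f"tcp://{user}:{password}@{host}:{port}/{db}"

    # Optional settings go in the query string, as in the example above.
    query = {}
    if connection_timeout is not None:
        query["connection_timeout"] = connection_timeout
    if secure is not None:
        query["secure"] = "true" if secure else "false"
    if query:
        url += "?" + urlencode(query)
    return url
```

Calling it with `localhost`, `9000`, `my_database`, `my_user`, the `${secrets:my_clickhouse_pass}` placeholder, a timeout of `10000`, and `secure=True` reproduces the string in the YAML above.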

## Using secrets

There are currently three supported [secret stores](/components/secret-stores/index.md):

* [Environment variables](/components/secret-stores/env)
* [Kubernetes Secret Store](/components/secret-stores/kubernetes)
* [Keyring Secret Store](/components/secret-stores/keyring)
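
The examples in this file reference secrets with the `${secrets:name}` syntax. As a conceptual sketch only, the substitution can be pictured as replacing each reference with a value looked up by name. The function `resolve_secrets`, the plain dictionary standing in for a secret store, and the set of characters allowed in a secret name are all assumptions for illustration; how Spice actually resolves secrets from each store is not shown in this document.

```python
import re

# Matches references such as ${secrets:my_clickhouse_pass}.
# The characters allowed in a secret name are an assumption.
_SECRET_REF = re.compile(r"\$\{secrets:([A-Za-z0-9_]+)\}")


def resolve_secrets(value, secrets):
    """Replace each ${secrets:name} reference in `value`.

    Hypothetical helper for illustration only. `secrets` is a plain dict
    standing in for a real secret store.
    """
    def lookup(match):
        name = match.group(1)
        if name not in secrets:
            raise KeyError(f"secret not found: {name}")
        return secrets[name]

    # The callable's return value is inserted literally, with no escape processing.
    return _SECRET_REF.sub(lookup, value)
```

A missing secret raises an error in this sketch rather than leaving the placeholder in place, so a misconfigured password fails loudly instead of being sent to the server verbatim.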