Chore/update upstream #1

Open
wants to merge 29 commits into
base: main
8a2731f
Distributed incremental materialization (#172)
gladkikhtutu Jul 27, 2023
3a28a66
Update version and tweak docs
genzgd Jul 27, 2023
1a0649e
Lw delete set fix (#174)
genzgd Jul 27, 2023
4b8a202
Fix legacy incremental materialization (#178)
genzgd Aug 9, 2023
9c8139f
fix: distributed_table materialization issue (#184)
zli06160 Aug 22, 2023
80cba25
Bump version and changelog (#185)
genzgd Aug 22, 2023
b79669a
cluster names containing dash characters (#198) (#200)
the4thamigo-uk Oct 21, 2023
d63285a
Add basic error test, fix minor merge conflict (#202)
genzgd Oct 26, 2023
96474f1
Cluster setting and Distributed Table tests (#186)
gfunc Oct 26, 2023
af72593
Update version and CHANGELOG, incorporate cluster name fix (#203)
genzgd Oct 26, 2023
9edb554
Release 1 5 0 (#210)
genzgd Nov 23, 2023
5e8e54b
Update test and dependency versions. (#211)
genzgd Nov 23, 2023
588e5ca
Adjust the wrapper parenthesis around the table materialization sql c…
kris947 Nov 27, 2023
1f17ec2
Update for 1.5.1 bug fix
genzgd Nov 27, 2023
9997b82
Fix creation of replicated tables when using legacy materialization (…
StevenReitsma Nov 28, 2023
3fec9a4
On cluster sync cleanup
genzgd Nov 28, 2023
bf11cbe
Bug fixes related to model settings. (#214)
genzgd Nov 29, 2023
8561210
Add materialization macro for materialized view (#207)
SoryRawyer Nov 29, 2023
246a4d8
Release 1 6 0 (#215)
genzgd Nov 30, 2023
08bbbf9
Release 1 6 1 (#217)
genzgd Dec 5, 2023
e6e74e4
Release 1 6 2 (#219)
genzgd Dec 6, 2023
2e72a00
Release 1 7 0 (#220)
genzgd Dec 7, 2023
5ccdad5
Correctly warn or error if light weight deletes not available
genzgd Dec 8, 2023
2d5c675
Wrap columns_in_query query in select statement (#222)
ptemarvelde Dec 13, 2023
ca9da0b
Update changelog
genzgd Dec 13, 2023
36ae17e
fix: incremental on cluster clause
Savid Jul 12, 2023
c522eb3
feat(icremental): add distrubted table
Savid Jul 18, 2023
6125272
fix(incremental): replicated delete_insert
Savid Jul 24, 2023
665f184
feat(incremental): distrubted append
Savid Jul 25, 2023
2 changes: 2 additions & 0 deletions .github/workflows/pypi.yml
@@ -1,12 +1,14 @@
---
name: "PyPI Release"

# yamllint disable-line rule:truthy
on:
push:
tags:
- 'v*'
workflow_dispatch:


jobs:
publish:
name: PyPI Release
5 changes: 3 additions & 2 deletions .github/workflows/test_cloud.yml
@@ -17,15 +17,16 @@ jobs:
DBT_CH_TEST_HOST: ${{ secrets.INTEGRATIONS_TEAM_TESTS_CLOUD_HOST }}
DBT_CH_TEST_PASSWORD: ${{ secrets.INTEGRATIONS_TEAM_TESTS_CLOUD_PASSWORD }}
DBT_CH_TEST_CLUSTER_MODE: true
DBT_CH_TEST_CLOUD: true

steps:
- name: Checkout
uses: actions/checkout@v3

- name: Setup Python 3.10
- name: Setup Python 3.11
uses: actions/setup-python@v4
with:
python-version: '3.10'
python-version: '3.11'

- name: Install requirements
run: pip3 install -r dev_requirements.txt
24 changes: 10 additions & 14 deletions .github/workflows/test_matrix.yml
@@ -23,15 +23,14 @@ jobs:
strategy:
matrix:
python-version:
- '3.8'
- '3.9'
- '3.10'
- '3.11'
clickhouse-version:
- '22.8'
- '23.3'
- '23.5'
- '23.6'
- '23.8'
- '23.9'
- '23.10'
- latest

steps:
@@ -44,16 +43,10 @@ jobs:
echo "TEST_SETTINGS_FILE=22_3" >> $GITHUB_ENV
echo "DBT_CH_TEST_CH_VERSION=22.3" >> $GITHUB_ENV

- name: Run ClickHouse Container
run: docker run
-d
-p 8123:8123
-p 9000:9000
--name clickhouse
-v /var/lib/clickhouse
-v ${{ github.workspace }}/tests/integration/test_settings_$TEST_SETTINGS_FILE.xml:/etc/clickhouse-server/users.d/test_settings.xml
--ulimit nofile=262144:262144
clickhouse/clickhouse-server:${{ matrix.clickhouse-version }}
- name: Run ClickHouse Cluster Containers
env:
PROJECT_ROOT: ${{ github.workspace }}/tests/integration
run: REPLICA_NUM=1 docker-compose -f ${{ github.workspace }}/tests/integration/docker-compose.yml up -d

- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
@@ -64,13 +57,16 @@ jobs:
run: pip3 install -r dev_requirements.txt

- name: Run HTTP tests
env:
DBT_CH_TEST_CLUSTER: test_shard
run: |
PYTHONPATH="${PYTHONPATH}:dbt"
pytest tests

- name: Run Native tests
env:
DBT_CH_TEST_PORT: 9000
DBT_CH_TEST_CLUSTER: test_shard
run: |
PYTHONPATH="${PYTHONPATH}:dbt"
pytest tests
1 change: 1 addition & 0 deletions .gitignore
@@ -96,3 +96,4 @@ dbt-tut
# local development stuff
dev/
.python-version
*_project/
117 changes: 117 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,120 @@
### Release [1.7.1], 2023-12-13
#### Bug Fixes
- Some models with LIMIT clauses were broken in recent releases. This has been fixed. Thanks to
[ptemarvelde](https://github.com/ptemarvelde) for the PR!
- It was possible for incremental models with the delete+insert strategy to fail if ClickHouse "light weight deletes" were
not enabled or the required setting `allow_nondeterministic_mutations` was not enabled and the user did not have permission
to apply it. This condition is now detected on startup, and an exception will be thrown if `use_lw_deletes` is configured
in the profile. Otherwise, a warning will be logged that incremental models will be slower (because such models will
be downgraded to use the `legacy` incremental strategy). This should prevent the confusing behavior in
https://github.com/ClickHouse/dbt-clickhouse/issues/197 by throwing an early exception for an unsupported configuration.
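
The startup behavior described in this fix can be sketched roughly as follows. This is an illustration only, not the adapter's code: the function names, the exception class, and the logger name are hypothetical, and how the adapter actually probes the server for lightweight delete support is not shown here.

```python
import logging

logger = logging.getLogger("lw_delete_sketch")  # hypothetical logger name


class UnsupportedConfiguration(Exception):
    """The profile asks for lightweight deletes but the server cannot provide them."""


def check_lightweight_deletes(available: bool, use_lw_deletes: bool) -> bool:
    """Startup check: return True if lightweight deletes can be used."""
    if available:
        return True
    if use_lw_deletes:
        # Explicitly configured but unavailable: fail early instead of at run time.
        raise UnsupportedConfiguration(
            "use_lw_deletes is set, but lightweight deletes are not available"
        )
    logger.warning("Lightweight deletes unavailable; incremental models will be slower")
    return False


def downgrade_if_needed(strategy: str, lw_deletes_ok: bool) -> str:
    """Fall back to the legacy strategy when delete+insert cannot run."""
    if strategy == "delete+insert" and not lw_deletes_ok:
        return "legacy"
    return strategy
```

With `use_lw_deletes` unset and no server support, `check_lightweight_deletes(False, False)` logs the warning and returns `False`, so a delete+insert model is downgraded to `legacy`.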

### Release [1.7.0], 2023-12-07
#### Improvements
- Minimal compatibility with dbt 1.7.x. The date_spine macro and additional automated tests have not been implemented,
but are planned for a future patch release.
- dbt 1.7 introduces a (complex) optimization mechanism for retrieving a dbt catalog which is overkill for ClickHouse
(which has no separate schema/database level), so this release includes some internal catalog changes to simplify that process.

### Release [1.6.2], 2023-12-06
#### Bug Fix
- The dbt `on_schema_change` configuration value for incremental models was effectively being ignored. This has been fixed
with a very limited implementation. Closes https://github.com/ClickHouse/dbt-clickhouse/issues/199. Because of the way that
ORDER BY/SORT BY/PARTITION BY/PRIMARY KEYS work in ClickHouse, plus the complexities of correctly transforming ClickHouse data types,
`sync_all_columns` is not currently supported (although an implementation that works for non-key columns is theoretically possible,
such an enhancement is not currently planned). Accordingly, only `ignore`, `fail`, and `append_new_columns` values are supported
for `on_schema_change`. It is also not currently supported for Distributed tables.

Note that actually appending new columns requires a fallback to the `legacy` incremental strategy, which is quite inefficient,
so while theoretically possible, using `append_new_columns` is not recommended except for very small data volumes.
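
As a rough sketch of the decision described above, assuming column lists are compared by name only: `plan_schema_change`, `SchemaChanged`, and the returned labels are hypothetical names chosen for illustration, and the adapter's real macros may structure this differently.

```python
SUPPORTED_VALUES = ("ignore", "fail", "append_new_columns")


class SchemaChanged(Exception):
    """Raised for on_schema_change='fail' when the column sets differ."""


def plan_schema_change(on_schema_change: str, table_columns: list, model_columns: list) -> str:
    """Decide what an incremental run should do about column differences."""
    if on_schema_change not in SUPPORTED_VALUES:
        # sync_all_columns lands here: it is not supported in this release.
        raise ValueError(f"unsupported on_schema_change value: {on_schema_change!r}")
    added = [c for c in model_columns if c not in table_columns]
    removed = [c for c in table_columns if c not in model_columns]
    if not added and not removed:
        return "unchanged"
    if on_schema_change == "ignore":
        return "ignored"
    if on_schema_change == "fail":
        raise SchemaChanged(f"added={added}, removed={removed}")
    # append_new_columns: only additions are applied, and the run then
    # falls back to the (slow) legacy incremental strategy.
    return "append_columns_then_legacy" if added else "ignored"
```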

### Release [1.6.1], 2023-12-04
#### Bug Fixes
- Identifier quoting was disabled for tables/databases etc. This would cause failures for schemas or tables using reserved words
or containing special characters. This has been fixed and some macros have been updated to correctly handle such identifiers.
Note that there still may be untested edge cases where nonstandard identifiers cause issues, so they are still not recommended.
Closes https://github.com/ClickHouse/dbt-clickhouse/issues/144. Thanks to [Alexandru Pisarenco](https://github.com/apisarenco) for the
report and initial PR!
- The new `allow_automatic_deduplication` setting was not being correctly propagated to the adapter, so setting it to `True`
did not have the intended effect. In addition, this setting is now ignored for older ClickHouse versions that
do not support `CREATE TABLE AS SELECT ... EMPTY`, since the automatic deduplication window is required to allow correct
inserts in Replicated tables on those older versions. Fixes https://github.com/ClickHouse/dbt-clickhouse/issues/216.
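
To illustrate the identifier quoting fix, here is a minimal sketch of backtick quoting. The helper names are hypothetical, and the escaping rule (backslash before an embedded backtick or backslash) is an assumption about ClickHouse's lexer rather than something stated in this changelog.

```python
def quote_identifier(name: str) -> str:
    """Wrap an identifier in backticks so reserved words and special characters survive."""
    # Assumed escaping rule: backslash-escape backslashes and backticks.
    escaped = name.replace("\\", "\\\\").replace("`", "\\`")
    return f"`{escaped}`"


def qualified_name(database: str, table: str) -> str:
    """Quote each part separately; the dot between them stays unquoted."""
    return f"{quote_identifier(database)}.{quote_identifier(table)}"
```

For example, `qualified_name("default", "order")` yields a name that is safe to use even though `order` is a reserved word.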

### Release [1.6.0], 2023-11-30
#### Improvements
- Compatible with dbt 1.6.x. Note that the new dbt `clone` feature is not supported, as ClickHouse has no native "light weight"
clone functionality, and copying tables without actual data transfer is not possible in ClickHouse (barring file manipulation
outside ClickHouse itself).
- A new ClickHouse-specific materialized view materialization, contributed by [Rory Sawyer](https://github.com/SoryRawyer).
This creates a ClickHouse materialized view using the `TO` form with the name `<model_name>_mv` and the associated target
table `<model_name>`. It's highly recommended to fully understand how ClickHouse materialized views work before using
this materialization.
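
The naming convention above can be sketched as follows. The function is hypothetical and only shows the shape of the `TO` form; it assumes the target table `<model_name>` already exists and ignores quoting and `ON CLUSTER` handling, which the real macro would need.

```python
def materialized_view_statement(model_name: str, select_sql: str) -> str:
    """Build a CREATE MATERIALIZED VIEW ... TO ... statement for a model."""
    view_name = f"{model_name}_mv"  # the view gets the _mv suffix
    # Rows produced by select_sql are written into the target table model_name.
    return f"CREATE MATERIALIZED VIEW {view_name} TO {model_name} AS {select_sql}"
```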

### Release [1.5.2], 2023-11-28
#### Bug Fixes
- The `ON CLUSTER` clause was in the incorrect place for legacy incremental materializations. This has been fixed. Thanks to
[Steven Reitsma](https://github.com/StevenReitsma) for the fix!
- The `ON CLUSTER` DDL for drop tables did not include a SYNC modifier, which might be the cause of some "table already exists"
errors. The `SYNC` modifier has been added to the `on_cluster` macro when dropping relations.
- Fixed a bug where using table settings such as `allow_nullable_key` would break "legacy" incremental materializations. Closes
https://github.com/ClickHouse/dbt-clickhouse/issues/209. Also see the new model `config` property `query_settings` described
below.
- Fixed an issue where incremental materializations would incorrectly exclude duplicated inserted elements due to "automatic"
ClickHouse deduplication on replicated tables. Closes https://github.com/ClickHouse/dbt-clickhouse/issues/213. The fix consists
of always sending a `replicated_deduplication_window=0` table setting when creating the incremental relations. This
behavior can be overridden by setting the new profile parameter `allow_automatic_deduplication` to `True`, although for
general dbt operations this is probably not necessary and not recommended. Finally, thanks to [Andy](https://github.com/andy-miracl)
for the report and debugging help!

#### Improvements
- Added a new profile property `allow_automatic_deduplication`, which defaults to `False`. ClickHouse Replicated deduplication is
now disabled for incremental inserts, but this property can be set to true if for some reason the default ClickHouse behavior
for inserted blocks is desired.
- Added a new model `config` property `query_settings` for any ClickHouse settings that should be sent with the `INSERT INTO`
or `DELETE FROM` queries used with materializations. Note this is distinct from the existing property `settings` which is
used for ClickHouse "table" settings in DDL statements like `CREATE TABLE ... AS`.
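
A small sketch of the deduplication default described in this release, under the assumption that settings are simple name/number pairs. Both helper names are hypothetical. The same `settings_clause` rendering would apply to `query_settings`, which accompany the data-modifying queries rather than the DDL.

```python
def table_settings(model_settings: dict, allow_automatic_deduplication: bool = False) -> dict:
    """Table settings sent when creating an incremental relation."""
    merged = dict(model_settings)
    if not allow_automatic_deduplication:
        # Disable the Replicated deduplication window so repeated blocks are kept.
        merged["replicated_deduplication_window"] = 0
    return merged


def settings_clause(settings: dict) -> str:
    """Render a SETTINGS clause; numeric values only in this sketch."""
    if not settings:
        return ""
    return "SETTINGS " + ", ".join(f"{key} = {value}" for key, value in settings.items())
```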

### Release [1.5.1], 2023-11-27
#### Bug Fix
- Fix table materialization for compatibility with SQLFluff. Thanks to [Kristof Szaloki](https://github.com/kris947) for the PR!

### Release [1.5.0], 2023-11-23
#### Improvements
- Compatible with dbt 1.5.x
- Contract support (using exact column data types)

#### Bug Fix
- Fix s3 macro when bucket includes `https://` prefix. Closes https://github.com/ClickHouse/dbt-clickhouse/issues/192.

### Release [1.4.9], 2023-10-27
#### Improvement
- Lots of work on Distributed table materializations. Big thanks to [gfunc](https://github.com/gfunc) for the additional PR
and [Zhenbang](https://github.com/zli06160) for code review and suggestions. See the README for details on how to
use the new functionality.
#### Bug Fix
- dbt would fail if a cluster name contained a dash. This has been fixed. Thanks to [Andy](https://github.com/the4thamigo-uk)
for the PR!

### Release [1.4.8], 2023-08-22
#### Bug Fix
- Fixed issues with experimental Distributed table materializations. Closes https://github.com/ClickHouse/dbt-clickhouse/issues/179.
Thanks to [Zhenbang](https://github.com/zli06160) for the report and for contributing to the fix with [gfunc](https://github.com/gfunc).

### Release [1.4.7], 2023-08-09
#### Bug Fix
- Fixed an exception in "legacy" incremental materializations that are not distributed.

### Release [1.4.6], 2023-07-27
#### Bug Fix
- Lightweight deletes could fail in environments where the HTTP session was not preserved (such as clusters behind a non-sticky
load balancer). This has been fixed by sending the required settings with every request instead of relying on a SET statement.
A similar approach has been used to persist the 'insert_distributed_sync' setting for Distributed table materializations.
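
The difference between a session-level `SET` and per-request settings can be sketched with the ClickHouse HTTP interface, which accepts settings as URL query parameters. The helper below is hypothetical and uses only the standard library; the adapter itself goes through its client drivers rather than building URLs by hand.

```python
from urllib.parse import urlencode


def request_url(base_url: str, query: str, settings: dict) -> str:
    """Attach settings to a single request so no server-side session is needed."""
    params = {"query": query, **settings}
    return f"{base_url}/?{urlencode(params)}"


# Every request carries its own settings, so a non-sticky load balancer
# can route each one to a different node without losing them.
url = request_url(
    "http://localhost:8123",
    "SELECT 1",
    {"allow_nondeterministic_mutations": 1, "insert_distributed_sync": 1},
)
```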

### Release [1.4.5], 2023-07-27
#### Improvement
- Adds additional experimental support for Distributed table engine models and incremental materialization. See the README for
details. Thanks to [gladkikhtutu](https://github.com/gladkikhtutu) for the contribution!

### Release [1.4.4], 2023-07-19
#### Bug Fixes
- Fixed two logging/exception handling issues that would cause exception on startup or when handling some exceptions