Add doc for APPROX_DISTINCT_COUNT aggregate function (#19732) #19779

Merged
28 changes: 27 additions & 1 deletion functions-and-operators/aggregate-group-by-functions.md
@@ -63,7 +63,33 @@ TiDB supports the following MySQL `GROUP BY` aggregate functions:
1 row in set (0.00 sec)
```

All of the above aggregate functions, except `GROUP_CONCAT()` and `APPROX_PERCENTILE()`, can be used as [window functions](/functions-and-operators/window-functions.md).
+ `APPROX_COUNT_DISTINCT(expr, [expr...])`

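This function is similar to `COUNT(DISTINCT)` in that it counts the number of distinct values, but it returns an approximate result. It uses the `BJKST` algorithm, which significantly reduces memory consumption when processing large datasets with a power-law distribution. In addition, for low-cardinality data, the function is highly accurate while using CPU efficiently.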

The following example shows how to use this function:

```sql
DROP TABLE IF EXISTS t;
CREATE TABLE t(a INT, b INT, c INT);
INSERT INTO t VALUES(1, 1, 1), (2, 1, 1), (2, 2, 1), (3, 1, 1), (5, 1, 2), (5, 1, 2), (6, 1, 2), (7, 1, 2);
```

```sql
SELECT APPROX_COUNT_DISTINCT(a, b) FROM t GROUP BY c;
```

```
+-----------------------------+
| approx_count_distinct(a, b) |
+-----------------------------+
|                           3 |
|                           4 |
+-----------------------------+
2 rows in set (0.00 sec)
```
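
On a table this small, the approximate result can be checked against the exact count. The query below is a minimal sketch that reuses table `t` from the example above; the aliases `exact_cnt` and `approx_cnt` are illustrative names, not part of the original document. `ORDER BY c` is added so that the row order is deterministic. For this data the exact distinct count of `(a, b)` pairs is 4 for `c = 1` and 3 for `c = 2`.

```sql
-- Compare the exact and approximate distinct counts of (a, b) per group.
-- exact_cnt and approx_cnt are illustrative aliases.
SELECT c,
       COUNT(DISTINCT a, b)        AS exact_cnt,
       APPROX_COUNT_DISTINCT(a, b) AS approx_cnt
FROM t
GROUP BY c
ORDER BY c;
```

On large, skewed datasets the two columns may differ, which is the trade-off the function makes for lower memory use.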

All of the above aggregate functions, except `GROUP_CONCAT()`, `APPROX_PERCENTILE()`, and `APPROX_COUNT_DISTINCT()`, can be used as [window functions](/functions-and-operators/window-functions.md).

## GROUP BY modifiers
