Update statistics related sql manual.
Jibing-Li committed Feb 5, 2025
1 parent b16bf89 commit 4f2e184
Showing 4 changed files with 236 additions and 129 deletions.
@@ -24,38 +24,74 @@ specific language governing permissions and limitations
under the License.
-->




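## Description

This statement collects statistics for each column.
This statement collects statistics. Column statistics can be collected for a table (optionally restricted to specific columns) or for an entire database.

## Syntax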

```sql
ANALYZE < TABLE | DATABASE table_name | db_name >
[ (column_name [, ...]) ]
[ [ WITH SYNC ] [ WITH SAMPLE PERCENT | ROWS ] ];
ANALYZE {TABLE <table_name> [ (<column_name> [, <column_name>...]) ] | DATABASE <database_name> }
[ [ WITH SYNC ] [ WITH SAMPLE {PERCENT | ROWS} <sample_rate> ] ];
```

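- table_name: The target table. It can be given in `db_name.table_name` form.
- column_name: The target columns. Each must be a column that exists in `table_name`; separate multiple column names with commas.
- sync: Collect statistics synchronously and return once collection has finished. If omitted, the statement runs asynchronously and returns a JOB ID.
- sample percent | rows: Collect statistics by sampling. Either a sampling percentage or a number of rows to sample can be specified.
## Required Parameters

**1. `<table_name>`**

> The target table. Exactly one of this parameter and `<database_name>` must be specified.
**2. `<database_name>`**

> The target database. Exactly one of this parameter and `<table_name>` must be specified.
## Optional Parameters

**1. `<column_name>`**

> The target columns of the table. Each must be a column that exists in `table_name`; separate multiple column names with commas.
**2. `WITH SYNC`**

## Example
> Run this ANALYZE statement synchronously. If omitted, it runs asynchronously in the background by default.
Collect statistics for a table by sampling 10% of its rows:
**3. `WITH SAMPLE {PERCENT | ROWS} <sample_rate>`**

> Collect statistics by sampling. If omitted, a full collection is performed by default. `<sample_rate>` is the sampling parameter: the sampling percentage when sampling by PERCENT, or the number of rows to sample when sampling by ROWS.
## Return Value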

| Column | Description |
| -- |--------------|
| Job_Id | Unique ID of the collection job |
| Catalog_Name | Catalog name |
| DB_Name | Database name |
| Columns | List of collected columns |

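## Access Control

The user executing this SQL command must have at least the following privileges:

| Privilege | Object | Notes |
|:--------------| :------------- |:------------------------------------------------|
| SELECT_PRIV | Table | Executing ANALYZE requires the SELECT_PRIV privilege on the table being queried |

## Usage Notes
- If WITH SYNC is specified, the statement returns no value. By the time the statement finishes, the collection job has already completed.
- If WITH SYNC is not specified, the statement ends as soon as it returns the job ID and related information. The collection job runs in the background, and its status can be checked with the SHOW ANALYZE statement.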
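
The two modes described in these notes can be sketched as follows. This is a minimal illustration, not part of the commit: it reuses the `lineitem` table from this page's examples, and the job ID passed to `SHOW ANALYZE` is a placeholder that must be replaced with the `Job_Id` actually returned.

```sql
-- Asynchronous (default): returns Job_Id, Catalog_Name, DB_Name and Columns,
-- then the collection job continues in the background.
ANALYZE TABLE lineitem;

-- Check the background job. 245073 is a placeholder; use the Job_Id returned above.
SHOW ANALYZE 245073;

-- Synchronous: no return value; the job is complete when the statement finishes.
ANALYZE TABLE lineitem WITH SYNC;
```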


## Examples

1. Collect statistics for a table by sampling 10% of its rows:

```sql
ANALYZE TABLE lineitem WITH SAMPLE PERCENT 10;
```

Collect statistics for a table by sampling 100,000 rows
2. Collect statistics for a table by sampling 100,000 rows

```sql
ANALYZE TABLE lineitem WITH SAMPLE ROWS 100000;
```
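
The syntax above also allows a column list, a database target, and `WITH SYNC` combined with sampling. The statements below are a sketch derived from that grammar only. The column names `l_tax` and `l_shipdate` and the database name `tpch` are assumptions chosen for illustration.

```sql
-- Collect statistics for two columns only, synchronously, sampling 10% of the rows.
-- Column names are assumed to exist in lineitem.
ANALYZE TABLE lineitem (l_tax, l_shipdate) WITH SYNC WITH SAMPLE PERCENT 10;

-- Collect column statistics for every table in a database (database name is assumed).
ANALYZE DATABASE tpch;
```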

## Keywords

ANALYZE
@@ -24,83 +24,95 @@ specific language governing permissions and limitations
under the License.
-->


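## Description

Use `SHOW ANALYZE` to view information about statistics collection jobs
This statement shows the status of statistics collection jobs

The syntax is as follows:
## Syntax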

```SQL
SHOW [AUTO] ANALYZE < table_name | job_id >
[ WHERE [ STATE = [ "PENDING" | "RUNNING" | "FINISHED" | "FAILED" ] ] ];
SHOW [AUTO] ANALYZE [ < table_name > | < job_id > ]
[ WHERE STATE = { "PENDING" | "RUNNING" | "FINISHED" | "FAILED" } ];
```

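- AUTO: Show only the history of automatic collection jobs. Note that by default only the status of the most recent 20000 completed automatic collection jobs is retained.
- table_name: Table name. When specified, shows the statistics job information for that table. It can be given in `db_name.table_name` form. If omitted, information for all statistics jobs is returned.
- job_id: Statistics job ID, obtained when running `ANALYZE` asynchronously. If no ID is given, this command returns information for all statistics jobs.
## Required Parameters

Output: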
****

| Column | Description |
| :--------------------- | :----------- |
| `job_id` | Statistics job ID |
| `catalog_name` | Catalog name |
| `db_name` | Database name |
| `tbl_name` | Table name |
| `col_name` | List of column names |
| `job_type` | Job type |
| `analysis_type` | Statistics type |
| `message` | Job message |
| `last_exec_time_in_ms` | Last execution time |
| `state` | Job state |
| `schedule_type` | Scheduling method |
## Optional Parameters

Here is an example:
**1. `AUTO `**

```sql
mysql> show analyze 245073\G;
*************************** 1. row ***************************
job_id: 245073
catalog_name: internal
db_name: default_cluster:tpch
tbl_name: lineitem
col_name: [l_returnflag,l_receiptdate,l_tax,l_shipmode,l_suppkey,l_shipdate,l_commitdate,l_partkey,l_orderkey,l_quantity,l_linestatus,l_comment,l_extendedprice,l_linenumber,l_discount,l_shipinstruct]
job_type: MANUAL
analysis_type: FUNDAMENTALS
message:
last_exec_time_in_ms: 2023-11-07 11:00:52
state: FINISHED
progress: 16 Finished | 0 Failed | 0 In Progress | 16 Total
schedule_type: ONCE
```
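> Show information for automatic jobs. If omitted, information for manual jobs is shown by default.
**2. `< table_name > `**

> Table name. When specified, shows the job information for that table. If omitted, job information for all tables is returned by default.
**3. `< job_id > `**

> Statistics job ID, obtained when running `ANALYZE` asynchronously. If no ID is given, this command returns information for all jobs.
<br/>
## Return Value

Each collection job can contain one or more tasks, and each task corresponds to the collection for one column. Users can check the completion status of statistics collection for each column with the following command.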
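| Column | Description |
| -- |--------------|
| job_id | Unique ID of the collection job |
| catalog_name | Catalog name |
| db_name | Database name |
| tbl_name | Table name |
| col_name | List of collected columns |
| job_type | Job type |
| analysis_type | Analysis type |
| message | Error message |
| last_exec_time_in_ms | Time the last collection finished |
| state | Job state |
| progress | Job progress |
| schedule_type | Schedule type |
| start_time | Job start time |
| end_time | Job end time |
| priority | Job priority |
| enable_partition | Whether partition-level collection is enabled |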

Syntax:
## Access Control

The user executing this SQL command must have at least the following privileges:

| Privilege | Object | Notes |
|:--------------| :------------- |:------------------------------------------------|
| SELECT_PRIV | Table | Executing SHOW requires the SELECT_PRIV privilege on the table being queried |

## Examples

1. Show jobs by table name

```sql
SHOW ANALYZE TASK STATUS [job_id]
SHOW ANALYZE test1 WHERE STATE="FINISHED";
```

Here is an example:

```text
+---------------+--------------+---------+----------+-----------------------+----------+---------------+---------+----------------------+----------+-------------------------------------------------------+---------------+---------------------+---------------------+----------+------------------+
| job_id | catalog_name | db_name | tbl_name | col_name | job_type | analysis_type | message | last_exec_time_in_ms | state | progress | schedule_type | start_time | end_time | priority | enable_partition |
+---------------+--------------+---------+----------+-----------------------+----------+---------------+---------+----------------------+----------+-------------------------------------------------------+---------------+---------------------+---------------------+----------+------------------+
| 1737454119144 | internal | test | test1 | [test1:name,test1:id] | MANUAL | FUNDAMENTALS | | 2025-01-21 18:10:11 | FINISHED | 2 Finished | 0 Failed | 0 In Progress | 2 Total | ONCE | 2025-01-21 18:10:10 | 2025-01-21 18:10:11 | MANUAL | false |
| 1738725887879 | internal | test | test1 | [test1:name,test1:id] | MANUAL | FUNDAMENTALS | | 2025-02-05 11:26:15 | FINISHED | 2 Finished | 0 Failed | 0 In Progress | 2 Total | ONCE | 2025-02-05 11:26:15 | 2025-02-05 11:26:15 | MANUAL | false |
| 1738725887890 | internal | test | test1 | [test1:name,test1:id] | MANUAL | FUNDAMENTALS | | 2025-02-05 12:17:09 | FINISHED | 2 Finished | 0 Failed | 0 In Progress | 2 Total | ONCE | 2025-02-05 12:17:08 | 2025-02-05 12:17:09 | MANUAL | false |
| 1738725887895 | internal | test | test1 | [test1:id] | MANUAL | FUNDAMENTALS | | 2025-02-05 12:17:24 | FINISHED | 1 Finished | 0 Failed | 0 In Progress | 1 Total | ONCE | 2025-02-05 12:17:23 | 2025-02-05 12:17:24 | MANUAL | false |
| 1738725887903 | internal | test | test1 | [test1:id] | MANUAL | FUNDAMENTALS | | 2025-02-05 12:17:42 | FINISHED | 1 Finished | 0 Failed | 0 In Progress | 1 Total | ONCE | 2025-02-05 12:17:41 | 2025-02-05 12:17:42 | MANUAL | false |
+---------------+--------------+---------+----------+-----------------------+----------+---------------+---------+----------------------+----------+-------------------------------------------------------+---------------+---------------------+---------------------+----------+------------------+
```
mysql> show analyze task status 20038 ;
+---------+----------+---------+----------------------+----------+
| task_id | col_name | message | last_exec_time_in_ms | state |
+---------+----------+---------+----------------------+----------+
| 20039 | col4 | | 2023-06-01 17:22:15 | FINISHED |
| 20040 | col2 | | 2023-06-01 17:22:15 | FINISHED |
| 20041 | col3 | | 2023-06-01 17:22:15 | FINISHED |
| 20042 | col1 | | 2023-06-01 17:22:15 | FINISHED |
+---------+----------+---------+----------------------+----------+

2. Show a job by job ID

```sql
show analyze 1738725887903;
```

## Keywords
```text
+---------------+--------------+---------+----------+------------+----------+---------------+---------+----------------------+----------+-------------------------------------------------------+---------------+---------------------+---------------------+----------+------------------+
| job_id | catalog_name | db_name | tbl_name | col_name | job_type | analysis_type | message | last_exec_time_in_ms | state | progress | schedule_type | start_time | end_time | priority | enable_partition |
+---------------+--------------+---------+----------+------------+----------+---------------+---------+----------------------+----------+-------------------------------------------------------+---------------+---------------------+---------------------+----------+------------------+
| 1738725887903 | internal | test | test1 | [test1:id] | MANUAL | FUNDAMENTALS | | 2025-02-05 12:17:42 | FINISHED | 1 Finished | 0 Failed | 0 In Progress | 1 Total | ONCE | 2025-02-05 12:17:41 | 2025-02-05 12:17:42 | MANUAL | false |
+---------------+--------------+---------+----------+------------+----------+---------------+---------+----------------------+----------+-------------------------------------------------------+---------------+---------------------+---------------------+----------+------------------+
```
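
The `AUTO` keyword and the `STATE` filter are not covered by the examples above. The statements below are a sketch derived from the syntax section only; the table name `test1` is reused from the examples, and no output is shown because none is given in the source.

```sql
-- Show automatic collection jobs instead of manual ones.
SHOW AUTO ANALYZE;

-- Show automatic collection jobs for one table that ended in failure.
SHOW AUTO ANALYZE test1 WHERE STATE = "FAILED";
```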

SHOW, ANALYZE
@@ -26,41 +26,88 @@ under the License.

## Description

Use `SHOW COLUMN STATS` to view the statistics of columns
This statement shows the column statistics of a table

The syntax is as follows:
## Syntax

```SQL
SHOW COLUMN [cached] STATS table_name [ (column_name [, ...]) ];
SHOW COLUMN [CACHED] STATS < table_name > [ (<column_name> [, <column_name>...]) ];
```

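Where:
## Required Parameters

- cached: Show the statistics held in the current FE's in-memory cache.
- table_name: The target table whose statistics were collected. It can be given in `db_name.table_name` form.
- column_name: The target columns. Each must be a column that exists in `table_name`; separate multiple column names with commas.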
**1. `<table_name>`**

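Here is an example:
> Name of the table whose column statistics are to be shown.
## Optional Parameters

**1. `CACHED `**

> Show the statistics held in the FE cache. If omitted, the information persisted in the statistics table is shown by default.
**2. `<column_name>`**

> The names of the columns to show. Each column must exist in the table; separate multiple column names with commas. If omitted, information for all columns is shown by default.
## Return Value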

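| Column | Description |
| -- |--------------|
| column_name | Column name |
| index_name | Name of the index the column belongs to |
| count | Row count of the column |
| ndv | Cardinality (number of distinct values) of the column |
| num_null | Number of null values in the column |
| data_size | Total data size of the column |
| avg_size_byte | Average size of the column in bytes |
| min | Minimum value of the column |
| max | Maximum value of the column |
| method | Collection method |
| type | Collection type |
| trigger | Trigger method |
| query_times | Number of times the statistics have been queried |
| updated_time | Time the statistics were last updated |
| update_rows | Number of updated rows at the last collection |
| last_analyze_row_count | Total row count of the table at the last collection |
| last_analyze_version | Version of the table at the last collection |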

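## Access Control

The user executing this SQL command must have at least the following privileges:

| Privilege | Object | Notes |
|:--------------| :------------- |:------------------------------------------------|
| SELECT_PRIV | Table | Executing SHOW requires the SELECT_PRIV privilege on the table being queried |

## Examples

1. Show statistics for all columns of table test1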

```sql
mysql> show column stats lineitem(l_tax)\G;
*************************** 1. row ***************************
column_name: l_tax
count: 6001215.0
ndv: 9.0
num_null: 0.0
data_size: 4.800972E7
avg_size_byte: 8.0
min: 0.00
max: 0.08
method: FULL
type: FUNDAMENTALS
trigger: MANUAL
query_times: 0
updated_time: 2023-11-07 11:00:46
SHOW COLUMN STATS test1;
```

```text
+-------------+------------+----------+---------+----------+-----------+---------------+--------+--------+--------+--------------+---------+-------------+---------------------+-------------+------------------------+----------------------+
| column_name | index_name | count | ndv | num_null | data_size | avg_size_byte | min | max | method | type | trigger | query_times | updated_time | update_rows | last_analyze_row_count | last_analyze_version |
+-------------+------------+----------+---------+----------+-----------+---------------+--------+--------+--------+--------------+---------+-------------+---------------------+-------------+------------------------+----------------------+
| name | test1 | 87775.0 | 48824.0 | 0.0 | 351100.0 | 4.0 | '0001' | 'ffff' | FULL | FUNDAMENTALS | MANUAL | 0 | 2025-02-05 12:17:08 | 0 | 100000 | 3 |
| id | test1 | 100000.0 | 8965.0 | 0.0 | 351400.0 | 3.514 | 1000 | 9999 | SAMPLE | FUNDAMENTALS | MANUAL | 0 | 2025-02-05 12:17:41 | 0 | 100000 | 3 |
+-------------+------------+----------+---------+----------+-----------+---------------+--------+--------+--------+--------------+---------+-------------+---------------------+-------------+------------------------+----------------------+
```

## Keywords
2. Show the statistics for all columns of table test1 held in the current FE cache

SHOW, COLUMN, STATS
```sql
SHOW COLUMN CACHED STATS test1;
```

```text
+-------------+------------+----------+---------+----------+-----------+---------------+--------+--------+--------+--------------+---------+-------------+---------------------+-------------+------------------------+----------------------+
| column_name | index_name | count | ndv | num_null | data_size | avg_size_byte | min | max | method | type | trigger | query_times | updated_time | update_rows | last_analyze_row_count | last_analyze_version |
+-------------+------------+----------+---------+----------+-----------+---------------+--------+--------+--------+--------------+---------+-------------+---------------------+-------------+------------------------+----------------------+
| name | test1 | 87775.0 | 48824.0 | 0.0 | 351100.0 | 4.0 | '0001' | 'ffff' | FULL | FUNDAMENTALS | MANUAL | 0 | 2025-02-05 12:17:08 | 0 | 100000 | 3 |
| id | test1 | 100000.0 | 8965.0 | 0.0 | 351400.0 | 3.514 | 1000 | 9999 | SAMPLE | FUNDAMENTALS | MANUAL | 0 | 2025-02-05 12:17:41 | 0 | 100000 | 3 |
+-------------+------------+----------+---------+----------+-----------+---------------+--------+--------+--------+--------------+---------+-------------+---------------------+-------------+------------------------+----------------------+
```
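
Both examples above omit the optional column list. As a sketch based only on the syntax section, restricting the output to specific columns might look like the following. The table `test1` and its columns `id` and `name` are taken from the sample output above.

```sql
-- Show the persisted statistics for a single column.
SHOW COLUMN STATS test1 (id);

-- Show the FE-cached statistics for two columns.
SHOW COLUMN CACHED STATS test1 (id, name);
```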