Update statistics related sql manual.
Jibing-Li committed Feb 5, 2025
1 parent b16bf89 commit f42f8fc
Showing 14 changed files with 824 additions and 465 deletions.
63 changes: 48 additions & 15 deletions docs/sql-manual/sql-statements/statistics/ANALYZE.md
@@ -27,33 +27,66 @@ under the License.

## Description

This statement is used to collect statistical information for various columns.
This statement is used to collect column statistics. Statistics can be collected for a single table (optionally restricted to specific columns) or for an entire database.

## Syntax

```sql
ANALYZE < TABLE | DATABASE table_name | db_name >
[ (column_name [, ...]) ]
[ [ WITH SYNC ] [ WITH SAMPLE PERCENT | ROWS ] ];
ANALYZE {TABLE <table_name> [ (<column_name> [, <column_name>...]) ] | DATABASE <database_name> }
[ [ WITH SYNC ] [ WITH SAMPLE {PERCENT | ROWS} <sample_rate> ] ];
```

- `table_name`: The specified target table. It can be in the format `db_name.table_name`.
- `column_name`: The specified target column. It must be an existing column in `table_name`. You can specify multiple column names separated by commas.
- `sync`: Collect statistics synchronously. Returns after collection. If not specified, it executes asynchronously and returns a JOB ID.
- `sample percent | rows`: Collect statistics with sampling. You can specify a sampling percentage or a number of sampling rows.
## Required Parameters

**1. `<table_name>`**

> The specified target table. Exactly one of this parameter and `<database_name>` must be specified.
**2. `<database_name>`**

> The specified target database. Exactly one of this parameter and `<table_name>` must be specified.
## Optional Parameters

**1. `<column_name>`**

> The specified target column. It must be an existing column in `table_name`. You can specify multiple column names separated by commas.
**2. `WITH SYNC`**

> Collect statistics synchronously; the statement returns only after collection completes. If not specified, the statement executes asynchronously.
## Example
**3. `WITH SAMPLE {PERCENT | ROWS} <sample_rate>`**

Collect statistical data for a table with a 10% sampling rate:
> Collect statistics by sampling. If not specified, full collection is the default. `<sample_rate>` is the sampling parameter: with PERCENT it specifies the sampling percentage, and with ROWS it specifies the number of rows to sample.
## Return Value

| Column | Note |
| -- |--------------|
| Job_Id | Unique job ID |
| Catalog_Name | Catalog name |
| DB_Name | Database name |
| Columns | Column name list |
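
As a rough sketch of how these return columns are typically used: an asynchronous `ANALYZE` returns a `Job_Id`, which can then be passed to `SHOW ANALYZE` to check on the job. The job ID below is borrowed from the SHOW ANALYZE sample output in this commit purely for illustration; a real run returns its own value.

```sql
-- Start an asynchronous collection job (no WITH SYNC); the result set includes Job_Id.
ANALYZE TABLE lineitem;

-- Look up that job by the returned ID (illustrative value; substitute your own).
SHOW ANALYZE 1738725887903;
```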

## Access Control Requirements

The user who executes this SQL must have at least the following permissions:

| Privilege | Object | Notes |
|:--------------| :------------- |:------------------------------------------------|
| SELECT_PRIV | Table | When executing ANALYZE, the SELECT_PRIV privilege for the queried table is required. |

## Examples

1. Collect statistics by sampling 10% of table lineitem.

```sql
ANALYZE TABLE lineitem WITH SAMPLE PERCENT 10;
```

Collect statistical data for a table with a sample of 100,000 rows:
2. Collect statistics by sampling 100,000 rows from table lineitem.

```sql
ANALYZE TABLE lineitem WITH SAMPLE ROWS 100000;
```
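
The syntax also allows a column list, synchronous execution, and database-wide collection. The statements below are a sketch derived from the syntax section only; the column names `l_orderkey` and `l_partkey` and the database name `tpch` are assumptions chosen for illustration.

```sql
-- Collect statistics for two columns only, waiting until collection finishes.
ANALYZE TABLE lineitem (l_orderkey, l_partkey) WITH SYNC;

-- Combine synchronous execution with row sampling.
ANALYZE TABLE lineitem WITH SYNC WITH SAMPLE ROWS 100000;

-- Collect column statistics for every table in a database (asynchronous by default).
ANALYZE DATABASE tpch;
```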

## Keywords

ANALYZE
129 changes: 71 additions & 58 deletions docs/sql-manual/sql-statements/statistics/SHOW-ANALYZE.md
@@ -1,7 +1,7 @@
---
{
"title": "SHOW ANALYZE",
"language": "en"
"language": "zh-CN"
}
---

@@ -24,82 +24,95 @@ specific language governing permissions and limitations
under the License.
-->



## Description

Use `SHOW ANALYZE` to view information about statistics collection jobs.
This statement is used to view the status of statistics collection jobs.

Syntax:
## Syntax

```SQL
SHOW [AUTO] ANALYZE < table_name | job_id >
[ WHERE [ STATE = [ "PENDING" | "RUNNING" | "FINISHED" | "FAILED" ] ] ];
SHOW [AUTO] ANALYZE [ < table_name > | < job_id > ]
[ WHERE STATE = { "PENDING" | "RUNNING" | "FINISHED" | "FAILED" } ];
```

- AUTO: Show historical information for automatic collection jobs only. Note that, by default, the status of only the last 20,000 completed automatic collection jobs is retained.
- table_name: Table name, specify to view statistics job information for that table. It can be in the format `db_name.table_name`. When not specified, it returns information for all statistics jobs.
- job_id: Job ID for statistics collection, obtained when executing `ANALYZE`. When not specified, this command returns information for all statistics jobs.
## Required Parameters

Output:
**None**

| Column Name | Description |
| :--------------------- | :--------------- |
| `job_id` | Job ID |
| `catalog_name` | Catalog Name |
| `db_name` | Database Name |
| `tbl_name` | Table Name |
| `col_name` | Column Name List |
| `job_type` | Job Type |
| `analysis_type` | Analysis Type |
| `message` | Job Information |
| `last_exec_time_in_ms` | Last Execution Time |
| `state` | Job Status |
| `schedule_type` | Scheduling Method |
## Optional Parameters

Here's an example:
**1. `AUTO`**

```sql
mysql> show analyze 245073\G;
*************************** 1. row ***************************
job_id: 245073
catalog_name: internal
db_name: default_cluster:tpch
tbl_name: lineitem
col_name: [l_returnflag,l_receiptdate,l_tax,l_shipmode,l_suppkey,l_shipdate,l_commitdate,l_partkey,l_orderkey,l_quantity,l_linestatus,l_comment,l_extendedprice,l_linenumber,l_discount,l_shipinstruct]
job_type: MANUAL
analysis_type: FUNDAMENTALS
message:
last_exec_time_in_ms: 2023-11-07 11:00:52
state: FINISHED
progress: 16 Finished | 0 Failed | 0 In Progress | 16 Total
schedule_type: ONCE
```
> Show information about automatic jobs. If not specified, information about manual jobs will be displayed by default.
**2. `<table_name>`**

> Table name. When specified, only the jobs for this table are shown. When not specified, jobs for all tables are returned.
**3. `<job_id>`**

> Statistics job ID, obtained when performing asynchronous collection with ANALYZE. When the ID is not specified, this command returns information about all jobs.
## Return Value

<br/>
| Column | Notes |
| -- |--------------|
| job_id | Unique statistics job ID |
| catalog_name | Catalog name |
| db_name | Database name |
| tbl_name | Table name |
| col_name | Column name list |
| job_type | Job type |
| analysis_type | Analysis type |
| message | Error message |
| last_exec_time_in_ms | Last analyze time |
| state | Job state |
| progress | Job progress |
| schedule_type | Schedule type |
| start_time | Job start time |
| end_time | Job end time |
| priority | Job priority |
| enable_partition | Whether partition-level collection is enabled |

Each collection job can contain one or more tasks, with each task corresponding to the collection of a column. Users can use the following command to view the completion status of statistics collection for each column.
## Access Control Requirements

Syntax:
The user who executes this SQL must have at least the following permissions:

| Privilege | Object | Notes |
|:--------------| :------------- |:------------------------------------------------|
| SELECT_PRIV | Table | When executing SHOW, the SELECT_PRIV privilege for the queried table is required. |

## Examples

1. Show finished jobs for table test1.

```sql
SHOW ANALYZE TASK STATUS [job_id]
SHOW ANALYZE test1 WHERE STATE="FINISHED";
```

Here's an example:

```text
+---------------+--------------+---------+----------+-----------------------+----------+---------------+---------+----------------------+----------+-------------------------------------------------------+---------------+---------------------+---------------------+----------+------------------+
| job_id | catalog_name | db_name | tbl_name | col_name | job_type | analysis_type | message | last_exec_time_in_ms | state | progress | schedule_type | start_time | end_time | priority | enable_partition |
+---------------+--------------+---------+----------+-----------------------+----------+---------------+---------+----------------------+----------+-------------------------------------------------------+---------------+---------------------+---------------------+----------+------------------+
| 1737454119144 | internal | test | test1 | [test1:name,test1:id] | MANUAL | FUNDAMENTALS | | 2025-01-21 18:10:11 | FINISHED | 2 Finished | 0 Failed | 0 In Progress | 2 Total | ONCE | 2025-01-21 18:10:10 | 2025-01-21 18:10:11 | MANUAL | false |
| 1738725887879 | internal | test | test1 | [test1:name,test1:id] | MANUAL | FUNDAMENTALS | | 2025-02-05 11:26:15 | FINISHED | 2 Finished | 0 Failed | 0 In Progress | 2 Total | ONCE | 2025-02-05 11:26:15 | 2025-02-05 11:26:15 | MANUAL | false |
| 1738725887890 | internal | test | test1 | [test1:name,test1:id] | MANUAL | FUNDAMENTALS | | 2025-02-05 12:17:09 | FINISHED | 2 Finished | 0 Failed | 0 In Progress | 2 Total | ONCE | 2025-02-05 12:17:08 | 2025-02-05 12:17:09 | MANUAL | false |
| 1738725887895 | internal | test | test1 | [test1:id] | MANUAL | FUNDAMENTALS | | 2025-02-05 12:17:24 | FINISHED | 1 Finished | 0 Failed | 0 In Progress | 1 Total | ONCE | 2025-02-05 12:17:23 | 2025-02-05 12:17:24 | MANUAL | false |
| 1738725887903 | internal | test | test1 | [test1:id] | MANUAL | FUNDAMENTALS | | 2025-02-05 12:17:42 | FINISHED | 1 Finished | 0 Failed | 0 In Progress | 1 Total | ONCE | 2025-02-05 12:17:41 | 2025-02-05 12:17:42 | MANUAL | false |
+---------------+--------------+---------+----------+-----------------------+----------+---------------+---------+----------------------+----------+-------------------------------------------------------+---------------+---------------------+---------------------+----------+------------------+
```
mysql> show analyze task status 20038 ;
+---------+----------+---------+----------------------+----------+
| task_id | col_name | message | last_exec_time_in_ms | state |
+---------+----------+---------+----------------------+----------+
| 20039 | col4 | | 2023-06-01 17:22:15 | FINISHED |
| 20040 | col2 | | 2023-06-01 17:22:15 | FINISHED |
| 20041 | col3 | | 2023-06-01 17:22:15 | FINISHED |
| 20042 | col1 | | 2023-06-01 17:22:15 | FINISHED |
+---------+----------+---------+----------------------+----------+

2. Show a job by job ID.

```sql
show analyze 1738725887903;
```

## Keywords
```text
+---------------+--------------+---------+----------+------------+----------+---------------+---------+----------------------+----------+-------------------------------------------------------+---------------+---------------------+---------------------+----------+------------------+
| job_id | catalog_name | db_name | tbl_name | col_name | job_type | analysis_type | message | last_exec_time_in_ms | state | progress | schedule_type | start_time | end_time | priority | enable_partition |
+---------------+--------------+---------+----------+------------+----------+---------------+---------+----------------------+----------+-------------------------------------------------------+---------------+---------------------+---------------------+----------+------------------+
| 1738725887903 | internal | test | test1 | [test1:id] | MANUAL | FUNDAMENTALS | | 2025-02-05 12:17:42 | FINISHED | 1 Finished | 0 Failed | 0 In Progress | 1 Total | ONCE | 2025-02-05 12:17:41 | 2025-02-05 12:17:42 | MANUAL | false |
+---------------+--------------+---------+----------+------------+----------+---------------+---------+----------------------+----------+-------------------------------------------------------+---------------+---------------------+---------------------+----------+------------------+
```
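
The `AUTO` keyword and the `WHERE STATE` filter can be combined in the same way. This is a minimal sketch based on the syntax section, reusing the `test1` table from the first example; output is omitted because it depends on the jobs present in the cluster.

```sql
-- List automatic collection jobs instead of manual ones.
SHOW AUTO ANALYZE;

-- List manual jobs, for any table, that ended in failure.
SHOW ANALYZE WHERE STATE = "FAILED";

-- List automatic jobs for table test1 that are still running.
SHOW AUTO ANALYZE test1 WHERE STATE = "RUNNING";
```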

SHOW, ANALYZE
@@ -24,38 +24,69 @@ specific language governing permissions and limitations
under the License.
-->




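## Description

This statement is used to collect statistics for each column.
This statement is used to collect statistics. Column statistics can be collected for a table (specific columns can be specified) or for an entire database.

## Syntax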

```sql
ANALYZE < TABLE | DATABASE table_name | db_name >
[ (column_name [, ...]) ]
[ [ WITH SYNC ] [ WITH SAMPLE PERCENT | ROWS ] ];
ANALYZE {TABLE <table_name> [ (<column_name> [, <column_name>...]) ] | DATABASE <database_name> }
[ [ WITH SYNC ] [ WITH SAMPLE {PERCENT | ROWS} <sample_rate> ] ];
```

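- table_name: The specified target table. It can be in the form `db_name.table_name`.
- column_name: The specified target column. It must be an existing column in `table_name`; multiple column names are separated by commas.
- sync: Collect statistics synchronously and return after collection completes. If not specified, the statement executes asynchronously and returns a JOB ID.
- sample percent | rows: Collect statistics by sampling. You can specify a sampling percentage or a number of rows to sample.
## Required Parameters

**1. `<table_name>`**

> The specified target table. Exactly one of this parameter and `<database_name>` must be specified.
**2. `<database_name>`**

> The specified target database. Exactly one of this parameter and `<table_name>` must be specified.
## Optional Parameters

**1. `<column_name>`**

> The target column of the specified table. It must be an existing column in `table_name`; multiple column names are separated by commas.
## Examples
**2. `WITH SYNC`**

Collect statistics for a table by sampling 10% of its rows:
> Execute this ANALYZE statement synchronously. If not specified, it runs asynchronously in the background by default.
**3. `WITH SAMPLE {PERCENT | ROWS} <sample_rate>`**

> Collect statistics by sampling. If not specified, full collection is the default. `<sample_rate>` is the sampling parameter: with PERCENT it specifies the sampling percentage, and with ROWS it specifies the number of rows to sample.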
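## Return Value

| Column | Description |
| -- |--------------|
| Job_Id | Unique ID of the collection job |
| Catalog_Name | Catalog name |
| DB_Name | Database name |
| Columns | List of collected columns |

## Access Control Requirements

The user who executes this SQL command must have at least the following privileges:

| Privilege | Object | Notes |
|:--------------| :------------- |:------------------------------------------------|
| SELECT_PRIV | Table | When executing ANALYZE, the SELECT_PRIV privilege on the queried table is required. |

## Examples

1. Collect statistics for table lineitem by sampling 10% of its rows: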

```sql
ANALYZE TABLE lineitem WITH SAMPLE PERCENT 10;
```

Collect statistics for a table by sampling 100,000 rows:
2. Collect statistics for table lineitem by sampling 100,000 rows:

```sql
ANALYZE TABLE lineitem WITH SAMPLE ROWS 100000;
```

## Keywords

ANALYZE