Skip to content

Commit

Permalink
Update statistics related sql manual.
Browse files Browse the repository at this point in the history
  • Loading branch information
Jibing-Li committed Feb 5, 2025
1 parent b16bf89 commit 5653eb5
Show file tree
Hide file tree
Showing 16 changed files with 944 additions and 523 deletions.
63 changes: 48 additions & 15 deletions docs/sql-manual/sql-statements/statistics/ANALYZE.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,33 +27,66 @@ under the License.

## Description

This statement is used to collect statistical information for various columns.
This statement is used to collect column statistics. Statistics of columns can be collected for a table (specific columns can be specified) or for the entire database.

## Syntax

```sql
ANALYZE < TABLE | DATABASE table_name | db_name >
[ (column_name [, ...]) ]
[ [ WITH SYNC ] [ WITH SAMPLE PERCENT | ROWS ] ];
ANALYZE {TABLE <table_name> [ (<column_name> [, <column_name>...]) ] | DATABASE <database_name> }
[ [ WITH SYNC ] [ WITH SAMPLE {PERCENT | ROWS} <sample_rate> ] ];
```

- `table_name`: The specified target table. It can be in the format `db_name.table_name`.
- `column_name`: The specified target column. It must be an existing column in `table_name`. You can specify multiple column names separated by commas.
- `sync`: Collect statistics synchronously. Returns after collection. If not specified, it executes asynchronously and returns a JOB ID.
- `sample percent | rows`: Collect statistics with sampling. You can specify a sampling percentage or a number of sampling rows.
## Required Parameters

**1. `<table_name>`**

> The specified target table. This parameter and the <database_name> parameter must have and can only have one of them specified.
**2. `<database_name>`**

> The specified target database. This parameter and the <table_name> parameter must have and can only have one of them specified.
## Optional Parameters

**1. `<column_name>`**

> The specified target column. It must be an existing column in `table_name`. You can specify multiple column names separated by commas.
**2. `WITH SYNC`**

> Collect statistics synchronously. Returns after collection. If not specified, it executes asynchronously.
## Example
**3. `WITH SAMPLE {PERCENT | ROWS} <sample_rate>`**

Collect statistical data for a table with a 10% sampling rate:
> Specify to use the sampling method for collection. When not specified, full collection is the default. <sample_rate> is the sampling parameter. When using PERCENT sampling, it specifies the sampling percentage; when using ROWS sampling, it specifies the number of sampled rows.
## Return Value

| Column | Note |
| -- |--------------|
| Job_Id | Uniq Job Id |
| Catalog_Name | Catalog name |
| DB_Name | database name |
| Columns | column name list |

## Access Control Requirements

The user who executes this SQL must have at least the following permissions:

| Privilege | Object | Notes |
|:--------------| :------------- |:------------------------------------------------|
| SELECT_PRIV | Table | When executing ANALYZE, the SELECT_PRIV privilege for the queried table is required. |

## Examples

1. Collect statistics by sampling 10% of table lineitem.

```sql
ANALYZE TABLE lineitem WITH SAMPLE PERCENT 10;
```

Collect statistical data for a table with a sample of 100,000 rows:
2. Collect statistics by sampling 100,000 rows from table lineitem.

```sql
ANALYZE TABLE lineitem WITH SAMPLE ROWS 100000;
```

## Keywords

ANALYZE
127 changes: 70 additions & 57 deletions docs/sql-manual/sql-statements/statistics/SHOW-ANALYZE.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,82 +24,95 @@ specific language governing permissions and limitations
under the License.
-->



## Description

Use `SHOW ANALYZE` to view information about statistics collection jobs.
This statement is used to view the status of the statistics collection job.

Syntax:
## Syntax

```SQL
SHOW [AUTO] ANALYZE < table_name | job_id >
[ WHERE [ STATE = [ "PENDING" | "RUNNING" | "FINISHED" | "FAILED" ] ] ];
SHOW [AUTO] ANALYZE [ < table_name > | < job_id > ]
[ WHERE STATE = { "PENDING" | "RUNNING" | "FINISHED" | "FAILED" } ];
```

- AUTO: Show historical information for automatic collection jobs only. Note that, by default, the status of only the last 20,000 completed automatic collection jobs is retained.
- table_name: Table name, specify to view statistics job information for that table. It can be in the format `db_name.table_name`. When not specified, it returns information for all statistics jobs.
- job_id: Job ID for statistics collection, obtained when executing `ANALYZE`. When not specified, this command returns information for all statistics jobs.
## Required Parameters

Output:
**None**

| Column Name | Description |
| :--------------------- | :--------------- |
| `job_id` | Job ID |
| `catalog_name` | Catalog Name |
| `db_name` | Database Name |
| `tbl_name` | Table Name |
| `col_name` | Column Name List |
| `job_type` | Job Type |
| `analysis_type` | Analysis Type |
| `message` | Job Information |
| `last_exec_time_in_ms` | Last Execution Time |
| `state` | Job Status |
| `schedule_type` | Scheduling Method |
## Optional Parameters

Here's an example:
**1. `AUTO `**

```sql
mysql> show analyze 245073\G;
*************************** 1. row ***************************
job_id: 245073
catalog_name: internal
db_name: default_cluster:tpch
tbl_name: lineitem
col_name: [l_returnflag,l_receiptdate,l_tax,l_shipmode,l_suppkey,l_shipdate,l_commitdate,l_partkey,l_orderkey,l_quantity,l_linestatus,l_comment,l_extendedprice,l_linenumber,l_discount,l_shipinstruct]
job_type: MANUAL
analysis_type: FUNDAMENTALS
message:
last_exec_time_in_ms: 2023-11-07 11:00:52
state: FINISHED
progress: 16 Finished | 0 Failed | 0 In Progress | 16 Total
schedule_type: ONCE
```
> Show information about automatic jobs. If not specified, information about manual jobs will be displayed by default.
**2. `< table_name > `**

> Table name. After specifying it, you can view the job information corresponding to this table. When not specified, the job information of all tables will be returned by default.
**3. `< job_id > `**

> Statistics Job ID,Obtained when performing asynchronous collection with ANALYZE. When the ID is not specified, this command returns information about all jobs.
## Return Value

<br/>
| Column | Note |
| -- |--------------|
| job_id | Uniq statistics job id |
| catalog_name | Catalog name |
| db_name | database name |
| tbl_name | table name |
| col_name | column name list |
| job_type | job type |
| analysis_type | analysis type |
| message | error message |
| last_exec_time_in_ms | last analyze time |
| state | job state |
| progress | job progress |
| schedule_type | schedule type |
| start_time | job start time |
| end_time | job end time |
| priority | job priority |
| enable_partition | enable partition collection flag |

Each collection job can contain one or more tasks, with each task corresponding to the collection of a column. Users can use the following command to view the completion status of statistics collection for each column.
## Access Control Requirements

Syntax:
The user who executes this SQL must have at least the following permissions:

| Privilege | Object | Notes |
|:--------------| :------------- |:------------------------------------------------|
| SELECT_PRIV | Table | When executing SHOW, the SELECT_PRIV privilege for the queried table is required. |

## Examples

1. Show jobs by table name.

```sql
SHOW ANALYZE TASK STATUS [job_id]
SHOW ANALYZE test1 WHERE STATE="FINISHED";
```

Here's an example:

```text
+---------------+--------------+---------+----------+-----------------------+----------+---------------+---------+----------------------+----------+-------------------------------------------------------+---------------+---------------------+---------------------+----------+------------------+
| job_id | catalog_name | db_name | tbl_name | col_name | job_type | analysis_type | message | last_exec_time_in_ms | state | progress | schedule_type | start_time | end_time | priority | enable_partition |
+---------------+--------------+---------+----------+-----------------------+----------+---------------+---------+----------------------+----------+-------------------------------------------------------+---------------+---------------------+---------------------+----------+------------------+
| 1737454119144 | internal | test | test1 | [test1:name,test1:id] | MANUAL | FUNDAMENTALS | | 2025-01-21 18:10:11 | FINISHED | 2 Finished | 0 Failed | 0 In Progress | 2 Total | ONCE | 2025-01-21 18:10:10 | 2025-01-21 18:10:11 | MANUAL | false |
| 1738725887879 | internal | test | test1 | [test1:name,test1:id] | MANUAL | FUNDAMENTALS | | 2025-02-05 11:26:15 | FINISHED | 2 Finished | 0 Failed | 0 In Progress | 2 Total | ONCE | 2025-02-05 11:26:15 | 2025-02-05 11:26:15 | MANUAL | false |
| 1738725887890 | internal | test | test1 | [test1:name,test1:id] | MANUAL | FUNDAMENTALS | | 2025-02-05 12:17:09 | FINISHED | 2 Finished | 0 Failed | 0 In Progress | 2 Total | ONCE | 2025-02-05 12:17:08 | 2025-02-05 12:17:09 | MANUAL | false |
| 1738725887895 | internal | test | test1 | [test1:id] | MANUAL | FUNDAMENTALS | | 2025-02-05 12:17:24 | FINISHED | 1 Finished | 0 Failed | 0 In Progress | 1 Total | ONCE | 2025-02-05 12:17:23 | 2025-02-05 12:17:24 | MANUAL | false |
| 1738725887903 | internal | test | test1 | [test1:id] | MANUAL | FUNDAMENTALS | | 2025-02-05 12:17:42 | FINISHED | 1 Finished | 0 Failed | 0 In Progress | 1 Total | ONCE | 2025-02-05 12:17:41 | 2025-02-05 12:17:42 | MANUAL | false |
+---------------+--------------+---------+----------+-----------------------+----------+---------------+---------+----------------------+----------+-------------------------------------------------------+---------------+---------------------+---------------------+----------+------------------+
```
mysql> show analyze task status 20038 ;
+---------+----------+---------+----------------------+----------+
| task_id | col_name | message | last_exec_time_in_ms | state |
+---------+----------+---------+----------------------+----------+
| 20039 | col4 | | 2023-06-01 17:22:15 | FINISHED |
| 20040 | col2 | | 2023-06-01 17:22:15 | FINISHED |
| 20041 | col3 | | 2023-06-01 17:22:15 | FINISHED |
| 20042 | col1 | | 2023-06-01 17:22:15 | FINISHED |
+---------+----------+---------+----------------------+----------+

2. Show job by job id.

```sql
show analyze 1738725887903;
```

## Keywords
```text
+---------------+--------------+---------+----------+------------+----------+---------------+---------+----------------------+----------+-------------------------------------------------------+---------------+---------------------+---------------------+----------+------------------+
| job_id | catalog_name | db_name | tbl_name | col_name | job_type | analysis_type | message | last_exec_time_in_ms | state | progress | schedule_type | start_time | end_time | priority | enable_partition |
+---------------+--------------+---------+----------+------------+----------+---------------+---------+----------------------+----------+-------------------------------------------------------+---------------+---------------------+---------------------+----------+------------------+
| 1738725887903 | internal | test | test1 | [test1:id] | MANUAL | FUNDAMENTALS | | 2025-02-05 12:17:42 | FINISHED | 1 Finished | 0 Failed | 0 In Progress | 1 Total | ONCE | 2025-02-05 12:17:41 | 2025-02-05 12:17:42 | MANUAL | false |
+---------------+--------------+---------+----------+------------+----------+---------------+---------+----------------------+----------+-------------------------------------------------------+---------------+---------------------+---------------------+----------+------------------+
```

SHOW, ANALYZE
101 changes: 74 additions & 27 deletions docs/sql-manual/sql-statements/statistics/SHOW-STATS.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,43 +24,90 @@ specific language governing permissions and limitations
under the License.
-->


## Description

Use `SHOW COLUMN STATS` to view various statistics data for columns.
This statement is used to show the column statistics of a table.

Syntax:
## Syntax

```sql
SHOW COLUMN [cached] STATS table_name [ (column_name [, ...]) ];
```SQL
SHOW COLUMN [CACHED] STATS < table_name > [ (<column_name> [, <column_name>...]) ];
```

Where:
## Required Parameters

**1. `<table_name>`**

> The name of the table for which column statistics needs to be displayed.
## Optional Parameters

**1. `CACHED `**

> Show the column statistics in FE cache. When not specified, the information persisted in the statistics table is displayed by default.
**2. `<column_name>`**

> Specify the column names that need to be displayed. The column names must exist in the table, and multiple column names are separated by commas. If not specified, the information of all columns will be displayed by default.
## Return Value

- cached: Show statistics information in the current FE memory cache.
- table_name: The target table for collecting statistics. It can be in the format `db_name.table_name`.
- column_name: Specifies the target column, which must be an existing column in `table_name`. You can specify multiple column names separated by commas.
| Column | Note |
| -- |--------------|
| column_name | column name |
| index_name | index name |
| count | column row count |
| ndv | column distinct value |
| num_null | column null count |
| data_size | column total data size |
| avg_size_byte | column average size |
| min | min value |
| max | max value |
| method | collect method |
| type | collect type |
| trigger | job trigger method |
| query_times | query times |
| updated_time | update time |
| update_rows | update rows when last analyze |
| last_analyze_row_count | table row count when last analyze |
| last_analyze_version | table version when last analyze |

Here's an example:
## Access Control Requirements

The user who executes this SQL must have at least the following permissions:

| Privilege | Object | Notes |
|:--------------| :------------- |:------------------------------------------------|
| SELECT_PRIV | Table | When executing SHOW, the SELECT_PRIV privilege for the queried table is required. |

## Examples

1. Show the statistics of all columns in table test1.

```sql
mysql> show column stats lineitem(l_tax)\G;
*************************** 1. row ***************************
column_name: l_tax
count: 6001215.0
ndv: 9.0
num_null: 0.0
data_size: 4.800972E7
avg_size_byte: 8.0
min: 0.00
max: 0.08
method: FULL
type: FUNDAMENTALS
trigger: MANUAL
query_times: 0
updated_time: 2023-11-07 11:00:46
SHOW COLUMN STATS test1;
```

```text
+-------------+------------+----------+---------+----------+-----------+---------------+--------+--------+--------+--------------+---------+-------------+---------------------+-------------+------------------------+----------------------+
| column_name | index_name | count | ndv | num_null | data_size | avg_size_byte | min | max | method | type | trigger | query_times | updated_time | update_rows | last_analyze_row_count | last_analyze_version |
+-------------+------------+----------+---------+----------+-----------+---------------+--------+--------+--------+--------------+---------+-------------+---------------------+-------------+------------------------+----------------------+
| name | test1 | 87775.0 | 48824.0 | 0.0 | 351100.0 | 4.0 | '0001' | 'ffff' | FULL | FUNDAMENTALS | MANUAL | 0 | 2025-02-05 12:17:08 | 0 | 100000 | 3 |
| id | test1 | 100000.0 | 8965.0 | 0.0 | 351400.0 | 3.514 | 1000 | 9999 | SAMPLE | FUNDAMENTALS | MANUAL | 0 | 2025-02-05 12:17:41 | 0 | 100000 | 3 |
+-------------+------------+----------+---------+----------+-----------+---------------+--------+--------+--------+--------------+---------+-------------+---------------------+-------------+------------------------+----------------------+
```

## Keywords
2. Show the statistics of all columns in the test1 in the current FE cache.

SHOW, TABLE, STATS
```sql
SHOW COLUMN CACHED STATS test1;
```

```text
+-------------+------------+----------+---------+----------+-----------+---------------+--------+--------+--------+--------------+---------+-------------+---------------------+-------------+------------------------+----------------------+
| column_name | index_name | count | ndv | num_null | data_size | avg_size_byte | min | max | method | type | trigger | query_times | updated_time | update_rows | last_analyze_row_count | last_analyze_version |
+-------------+------------+----------+---------+----------+-----------+---------------+--------+--------+--------+--------------+---------+-------------+---------------------+-------------+------------------------+----------------------+
| name | test1 | 87775.0 | 48824.0 | 0.0 | 351100.0 | 4.0 | '0001' | 'ffff' | FULL | FUNDAMENTALS | MANUAL | 0 | 2025-02-05 12:17:08 | 0 | 100000 | 3 |
| id | test1 | 100000.0 | 8965.0 | 0.0 | 351400.0 | 3.514 | 1000 | 9999 | SAMPLE | FUNDAMENTALS | MANUAL | 0 | 2025-02-05 12:17:41 | 0 | 100000 | 3 |
+-------------+------------+----------+---------+----------+-----------+---------------+--------+--------+--------+--------------+---------+-------------+---------------------+-------------+------------------------+----------------------+
```
Loading

0 comments on commit 5653eb5

Please sign in to comment.