Merge branch 'apache:dev' into dev-1
dzygoon authored Mar 6, 2024
2 parents c36ec0a + 6ec16ac commit ba5d430
Showing 25 changed files with 1,927 additions and 39 deletions.
27 changes: 16 additions & 11 deletions docs/en/connector-v2/sink/Hive.md
@@ -30,17 +30,18 @@ By default, we use 2PC commit to ensure `exactly-once`

## Options

| name | type | required | default value |
|----------------------|--------|----------|----------------|
| table_name | string | yes | - |
| metastore_uri | string | yes | - |
| compress_codec | string | no | none |
| hdfs_site_path | string | no | - |
| hive_site_path | string | no | - |
| krb5_path | string | no | /etc/krb5.conf |
| kerberos_principal | string | no | - |
| kerberos_keytab_path | string | no | - |
| common-options | | no | - |
| name | type | required | default value |
|-------------------------------|---------|----------|----------------|
| table_name | string | yes | - |
| metastore_uri | string | yes | - |
| compress_codec | string | no | none |
| hdfs_site_path | string | no | - |
| hive_site_path | string | no | - |
| krb5_path | string | no | /etc/krb5.conf |
| kerberos_principal | string | no | - |
| kerberos_keytab_path | string | no | - |
| abort_drop_partition_metadata | boolean | no | true |
| common-options | | no | - |

### table_name [string]

@@ -70,6 +71,10 @@ The principal of kerberos

The keytab path of kerberos
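
A minimal sketch of a kerberos-enabled Hive sink, assuming a keytab login; the principal, keytab path, table name, and metastore URI below are placeholders, not values from this change:

```
sink {
  Hive {
    table_name = "test_db.test_table"
    metastore_uri = "thrift://localhost:9083"
    # placeholders: use the principal and keytab provisioned for your cluster
    kerberos_principal = "hive/127.0.0.1@EXAMPLE.COM"
    kerberos_keytab_path = "/path/to/hive.keytab"
  }
}
```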

### abort_drop_partition_metadata [boolean]

Flag to decide whether to drop partition metadata from the Hive Metastore during an abort operation. Note: this only affects the metadata in the metastore; the data written into the partition during synchronization is always deleted.
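
A minimal sketch of a Hive sink block that overrides this flag; the table name and metastore URI are placeholders:

```
sink {
  Hive {
    table_name = "test_db.test_table"
    metastore_uri = "thrift://localhost:9083"
    # keep the partition metadata in the Hive Metastore on abort;
    # the partition data written during synchronization is still deleted
    abort_drop_partition_metadata = false
  }
}
```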

### common options

Sink plugin common parameters, please refer to [Sink Common Options](common-options.md) for details
31 changes: 13 additions & 18 deletions docs/en/connector-v2/source/Hive.md
@@ -33,20 +33,19 @@ Read all the data in a split in a pollNext call. What splits are read will be saved in snapshot.

## Options

| name | type | required | default value |
|-------------------------------|---------|----------|----------------|
| table_name | string | yes | - |
| metastore_uri | string | yes | - |
| krb5_path | string | no | /etc/krb5.conf |
| kerberos_principal | string | no | - |
| kerberos_keytab_path | string | no | - |
| hdfs_site_path | string | no | - |
| hive_site_path | string | no | - |
| read_partitions | list | no | - |
| read_columns | list | no | - |
| abort_drop_partition_metadata | boolean | no | true |
| compress_codec | string | no | none |
| common-options | | no | - |
| name | type | required | default value |
|----------------------|--------|----------|----------------|
| table_name | string | yes | - |
| metastore_uri | string | yes | - |
| krb5_path | string | no | /etc/krb5.conf |
| kerberos_principal | string | no | - |
| kerberos_keytab_path | string | no | - |
| hdfs_site_path | string | no | - |
| hive_site_path | string | no | - |
| read_partitions | list | no | - |
| read_columns | list | no | - |
| compress_codec | string | no | none |
| common-options | | no | - |

### table_name [string]

@@ -87,10 +86,6 @@ The keytab file path of kerberos authentication

The read column list of the data source; users can use it to implement field projection.
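
A short sketch of field projection on the Hive source, assuming a table with `name` and `age` columns; the connection values are placeholders:

```
source {
  Hive {
    table_name = "test_db.test_table"
    metastore_uri = "thrift://localhost:9083"
    # read only these two columns instead of the full row
    read_columns = ["name", "age"]
  }
}
```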

### abort_drop_partition_metadata [boolean]

Flag to decide whether to drop partition metadata from the Hive Metastore during an abort operation. Note: this only affects the metadata in the metastore; the data written into the partition during synchronization is always deleted.

### compress_codec [string]

The compress codec of files; the supported codecs are shown below:
2 changes: 1 addition & 1 deletion docs/en/seatunnel-engine/checkpoint-storage.md
@@ -143,7 +143,7 @@ seatunnel:
fs.defaultFS: hdfs://localhost:9000
// if you use kerberos, you can configure it like this:
kerberosPrincipal: your-kerberos-principal
kerberosKeytab: your-kerberos-keytab
kerberosKeytabFilePath: your-kerberos-keytab
```

If HDFS is in HA mode, you can configure it like this:
2 changes: 1 addition & 1 deletion docs/en/start-v2/locally/quick-start-flink.md
@@ -61,7 +61,7 @@ sink {

For more information about config, please check [config concept](../../concept/config.md)

## Step 3: Run SeaTunnel Application
## Step 4: Run SeaTunnel Application

You can start the application with the following commands:

2 changes: 1 addition & 1 deletion docs/en/start-v2/locally/quick-start-spark.md
@@ -62,7 +62,7 @@ sink {

For more information about config, please check [config concept](../../concept/config.md)

## Step 3: Run SeaTunnel Application
## Step 4: Run SeaTunnel Application

You can start the application with the following commands:

2 changes: 1 addition & 1 deletion docs/zh/faq.md
@@ -293,7 +293,7 @@ log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{

References:

https://stackoverflow.com/questions/27781187how-to-stop-info-messages-displaying-on-spark-console
https://stackoverflow.com/questions/27781187/how-to-stop-info-messages-displaying-on-spark-console

http://spark.apache.org/docs/latest/configuration.html#configuring-logging

23 changes: 23 additions & 0 deletions docs/zh/transform-v2/common-options.md
@@ -0,0 +1,23 @@
# Transform Common Options

> Common parameters of transform plugins

| name              | type   | required | default value |
|-------------------|--------|----------|---------------|
| result_table_name | string | no       | -             |
| source_table_name | string | no       | -             |

### source_table_name [string]

When `source_table_name` is not specified, the current plugin processes the dataset output by the previous plugin in the config file.

When `source_table_name` is specified, the current plugin processes the dataset corresponding to that parameter.

### result_table_name [string]

When `result_table_name` is not specified, the data processed by this plugin is not registered as a dataset that other plugins can directly access, nor is it called a temporary table.

When `result_table_name` is specified, the data processed by this plugin is registered as a dataset that other plugins can directly access, also called a temporary table. The dataset registered here can be directly accessed by other plugins by specifying `source_table_name`.

## Example
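
A minimal sketch of how the two options chain plugins together; the plugin names and fields are illustrative, borrowed from the Copy transform documented alongside this page:

```
env {
  job.mode = "BATCH"
}

source {
  FakeSource {
    result_table_name = "fake"   # register the source output as temporary table "fake"
    row.num = 5
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

transform {
  Copy {
    source_table_name = "fake"   # consume the dataset registered by the source
    result_table_name = "fake1"  # register the transformed dataset for downstream plugins
    fields {
      name1 = name
    }
  }
}

sink {
  Console {
    source_table_name = "fake1"  # print the transform's output
  }
}
```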

65 changes: 65 additions & 0 deletions docs/zh/transform-v2/copy.md
@@ -0,0 +1,65 @@
# Copy

> Copy transform plugin

## Description

Copy fields to new fields.

## Attributes

| name   | type   | required | default value |
|--------|--------|----------|---------------|
| fields | Object | yes      |               |

### fields [config]

Specify the field copy relationship between input and output.

### common options [string]

Transform plugin common parameters, please refer to [Transform Plugin](common-options.md) for details.

## Example

The data read from source is a table like this:

| name | age | card |
|----------|-----|------|
| Joy Ding | 20 | 123 |
| May Ding | 20 | 123 |
| Kin Dom | 20 | 123 |
| Joy Dom | 20 | 123 |

If we want to copy the fields `name` and `age` to the new fields `name1`, `name2` and `age1`, we can add a `Copy` transform like this:

```
transform {
Copy {
source_table_name = "fake"
result_table_name = "fake1"
fields {
name1 = name
name2 = name
age1 = age
}
}
}
```

Then the data in result table `fake1` will be like this:

| name | age | card | name1 | name2 | age1 |
|----------|-----|------|----------|----------|------|
| Joy Ding | 20 | 123 | Joy Ding | Joy Ding | 20 |
| May Ding | 20 | 123 | May Ding | May Ding | 20 |
| Kin Dom | 20 | 123 | Kin Dom | Kin Dom | 20 |
| Joy Dom | 20 | 123 | Joy Dom | Joy Dom | 20 |

## Changelog

### new version

- Add Copy transform connector
- Support copying fields to new fields

64 changes: 64 additions & 0 deletions docs/zh/transform-v2/field-mapper.md
@@ -0,0 +1,64 @@
# FieldMapper

> FieldMapper transform plugin

## Description

Add input schema and output schema mapping.

## Attributes

| name         | type   | required | default value |
|--------------|--------|----------|---------------|
| field_mapper | Object | yes      |               |

### field_mapper [config]

Specify the field mapping relationship between input and output.

### common options [config]

Transform plugin common parameters, please refer to [Transform Plugin](common-options.md) for details.

## Example

The data read from source is a table like this:

| id | name | age | card |
|----|----------|-----|------|
| 1 | Joy Ding | 20 | 123 |
| 2 | May Ding | 20 | 123 |
| 3 | Kin Dom | 20 | 123 |
| 4 | Joy Dom | 20 | 123 |

We want to delete the `age` field, update the field order to `id`, `card`, `name`, and rename `name` to `new_name`. We can add a `FieldMapper` transform like this:

```
transform {
FieldMapper {
source_table_name = "fake"
result_table_name = "fake1"
field_mapper = {
id = id
card = card
name = new_name
}
}
}
```

Then the data in result table `fake1` will be like this:

| id | card | new_name |
|----|------|----------|
| 1 | 123 | Joy Ding |
| 2 | 123 | May Ding |
| 3 | 123 | Kin Dom |
| 4 | 123 | Joy Dom |

## Changelog

### new version

- Add FieldMapper transform connector

68 changes: 68 additions & 0 deletions docs/zh/transform-v2/filter-rowkind.md
@@ -0,0 +1,68 @@
# FilterRowKind

> FilterRowKind transform plugin

## Description

Filter the data by row kind.

## Options

| name          | type  | required | default value |
|---------------|-------|----------|---------------|
| include_kinds | array | yes      |               |
| exclude_kinds | array | yes      |               |

### include_kinds [array]

The row kinds to include.

### exclude_kinds [array]

The row kinds to exclude.

You can only configure one of `include_kinds` and `exclude_kinds`.

### common options [string]

Transform plugin common parameters, please refer to [Transform Plugin](common-options.md) for details.

## Example

The row kind of the data generated by FakeSource is `INSERT`. If we use the `FilterRowKind` transform and exclude `INSERT` data, no rows will be written to the sink.

```

env {
job.mode = "BATCH"
}

source {
FakeSource {
result_table_name = "fake"
row.num = 100
schema = {
fields {
id = "int"
name = "string"
age = "int"
}
}
}
}

transform {
FilterRowKind {
source_table_name = "fake"
result_table_name = "fake1"
exclude_kinds = ["INSERT"]
}
}

sink {
Console {
source_table_name = "fake1"
}
}
```
