Merge branch 'apache:dev' into dev-1
dzygoon authored Mar 6, 2024
2 parents c36ec0a + 6ec16ac commit ba5d430
Showing 25 changed files with 1,927 additions and 39 deletions.
27 changes: 16 additions & 11 deletions docs/en/connector-v2/sink/Hive.md
@@ -30,17 +30,18 @@ By default, we use 2PC commit to ensure `exactly-once`

## Options

| name | type | required | default value |
|----------------------|--------|----------|----------------|
| table_name | string | yes | - |
| metastore_uri | string | yes | - |
| compress_codec | string | no | none |
| hdfs_site_path | string | no | - |
| hive_site_path | string | no | - |
| krb5_path | string | no | /etc/krb5.conf |
| kerberos_principal | string | no | - |
| kerberos_keytab_path | string | no | - |
| common-options | | no | - |
| name | type | required | default value |
|-------------------------------|---------|----------|----------------|
| table_name | string | yes | - |
| metastore_uri | string | yes | - |
| compress_codec | string | no | none |
| hdfs_site_path | string | no | - |
| hive_site_path | string | no | - |
| krb5_path | string | no | /etc/krb5.conf |
| kerberos_principal | string | no | - |
| kerberos_keytab_path | string | no | - |
| abort_drop_partition_metadata | boolean | no | true |
| common-options | | no | - |

### table_name [string]

@@ -70,6 +71,10 @@ The principal of kerberos

The keytab path of kerberos
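
A minimal sketch of a kerberos-enabled Hive sink, assuming a keytab login; the principal, keytab path, table name, and metastore URI below are placeholders, not values from this change:

```
sink {
  Hive {
    table_name = "test_db.test_table"
    metastore_uri = "thrift://localhost:9083"
    # placeholders: use the principal and keytab provisioned for your cluster
    kerberos_principal = "hive/127.0.0.1@EXAMPLE.COM"
    kerberos_keytab_path = "/path/to/hive.keytab"
  }
}
```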

### abort_drop_partition_metadata [boolean]

Flag to decide whether to drop partition metadata from the Hive Metastore during an abort operation. Note: this only affects the metadata in the metastore; the data written into the partition during synchronization is always deleted.
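
A minimal sketch of a Hive sink block that overrides this flag; the table name and metastore URI are placeholders:

```
sink {
  Hive {
    table_name = "test_db.test_table"
    metastore_uri = "thrift://localhost:9083"
    # keep the partition metadata in the Hive Metastore on abort;
    # the partition data written during synchronization is still deleted
    abort_drop_partition_metadata = false
  }
}
```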

### common options

Sink plugin common parameters, please refer to [Sink Common Options](common-options.md) for details
31 changes: 13 additions & 18 deletions docs/en/connector-v2/source/Hive.md
@@ -33,20 +33,19 @@ Read all the data in a split in a pollNext call. What splits are read will be saved in snapshot.

## Options

| name | type | required | default value |
|-------------------------------|---------|----------|----------------|
| table_name | string | yes | - |
| metastore_uri | string | yes | - |
| krb5_path | string | no | /etc/krb5.conf |
| kerberos_principal | string | no | - |
| kerberos_keytab_path | string | no | - |
| hdfs_site_path | string | no | - |
| hive_site_path | string | no | - |
| read_partitions | list | no | - |
| read_columns | list | no | - |
| abort_drop_partition_metadata | boolean | no | true |
| compress_codec | string | no | none |
| common-options | | no | - |
| name | type | required | default value |
|----------------------|--------|----------|----------------|
| table_name | string | yes | - |
| metastore_uri | string | yes | - |
| krb5_path | string | no | /etc/krb5.conf |
| kerberos_principal | string | no | - |
| kerberos_keytab_path | string | no | - |
| hdfs_site_path | string | no | - |
| hive_site_path | string | no | - |
| read_partitions | list | no | - |
| read_columns | list | no | - |
| compress_codec | string | no | none |
| common-options | | no | - |

### table_name [string]

@@ -87,10 +86,6 @@ The keytab file path of kerberos authentication

The read column list of the data source; users can use it to implement field projection.
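
A short sketch of field projection on the Hive source, assuming a table with `name` and `age` columns; the connection values are placeholders:

```
source {
  Hive {
    table_name = "test_db.test_table"
    metastore_uri = "thrift://localhost:9083"
    # read only these two columns instead of the full row
    read_columns = ["name", "age"]
  }
}
```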

### abort_drop_partition_metadata [boolean]

Flag to decide whether to drop partition metadata from the Hive Metastore during an abort operation. Note: this only affects the metadata in the metastore; the data written into the partition during synchronization is always deleted.

### compress_codec [string]

The compress codec of files; the supported codecs are shown below:
2 changes: 1 addition & 1 deletion docs/en/seatunnel-engine/checkpoint-storage.md
@@ -143,7 +143,7 @@ seatunnel:
fs.defaultFS: hdfs://localhost:9000
// if you use kerberos, you can configure it like this:
kerberosPrincipal: your-kerberos-principal
kerberosKeytab: your-kerberos-keytab
kerberosKeytabFilePath: your-kerberos-keytab
```

If HDFS is in HA mode, you can configure it like this:
2 changes: 1 addition & 1 deletion docs/en/start-v2/locally/quick-start-flink.md
@@ -61,7 +61,7 @@ sink {

For more information about config, please check [config concept](../../concept/config.md)

## Step 3: Run SeaTunnel Application
## Step 4: Run SeaTunnel Application

You can start the application with the following commands:

2 changes: 1 addition & 1 deletion docs/en/start-v2/locally/quick-start-spark.md
@@ -62,7 +62,7 @@ sink {

For more information about config, please check [config concept](../../concept/config.md)

## Step 3: Run SeaTunnel Application
## Step 4: Run SeaTunnel Application

You can start the application with the following commands:

2 changes: 1 addition & 1 deletion docs/zh/faq.md
@@ -293,7 +293,7 @@ log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{

References:

https://stackoverflow.com/questions/27781187how-to-stop-info-messages-displaying-on-spark-console
https://stackoverflow.com/questions/27781187/how-to-stop-info-messages-displaying-on-spark-console

http://spark.apache.org/docs/latest/configuration.html#configuring-logging

23 changes: 23 additions & 0 deletions docs/zh/transform-v2/common-options.md
@@ -0,0 +1,23 @@
# Transform Common Options

> Common parameters of transform plugins

| name              | type   | required | default value |
|-------------------|--------|----------|---------------|
| result_table_name | string | no       | -             |
| source_table_name | string | no       | -             |

### source_table_name [string]

When `source_table_name` is not specified, the current plugin processes the dataset output by the previous plugin in the config file.

When `source_table_name` is specified, the current plugin processes the dataset corresponding to that parameter.

### result_table_name [string]

When `result_table_name` is not specified, the data processed by this plugin is not registered as a dataset that other plugins can directly access, nor is it called a temporary table.

When `result_table_name` is specified, the data processed by this plugin is registered as a dataset that other plugins can directly access, also called a temporary table. The dataset registered here can be directly accessed by other plugins by specifying `source_table_name`.

## Example
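
A minimal sketch of how the two options chain plugins together; the plugin names and fields are illustrative, borrowed from the Copy transform documented alongside this page:

```
env {
  job.mode = "BATCH"
}

source {
  FakeSource {
    result_table_name = "fake"   # register the source output as temporary table "fake"
    row.num = 5
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

transform {
  Copy {
    source_table_name = "fake"   # consume the dataset registered by the source
    result_table_name = "fake1"  # register the transformed dataset for downstream plugins
    fields {
      name1 = name
    }
  }
}

sink {
  Console {
    source_table_name = "fake1"  # print the transform's output
  }
}
```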

65 changes: 65 additions & 0 deletions docs/zh/transform-v2/copy.md
@@ -0,0 +1,65 @@
# Copy

> Copy transform plugin

## Description

Copy fields to new fields.

## Attributes

| name   | type   | required | default value |
|--------|--------|----------|---------------|
| fields | Object | yes      |               |

### fields [config]

Specify the field copy relationship between input and output.

### common options [string]

Transform plugin common parameters, please refer to [Transform Plugin](common-options.md) for details.

## Example

The data read from source is a table like this:

| name | age | card |
|----------|-----|------|
| Joy Ding | 20 | 123 |
| May Ding | 20 | 123 |
| Kin Dom | 20 | 123 |
| Joy Dom | 20 | 123 |

If we want to copy the fields `name` and `age` to the new fields `name1`, `name2` and `age1`, we can add a `Copy` transform like this:

```
transform {
Copy {
source_table_name = "fake"
result_table_name = "fake1"
fields {
name1 = name
name2 = name
age1 = age
}
}
}
```

Then the data in result table `fake1` will be like this:

| name | age | card | name1 | name2 | age1 |
|----------|-----|------|----------|----------|------|
| Joy Ding | 20 | 123 | Joy Ding | Joy Ding | 20 |
| May Ding | 20 | 123 | May Ding | May Ding | 20 |
| Kin Dom | 20 | 123 | Kin Dom | Kin Dom | 20 |
| Joy Dom | 20 | 123 | Joy Dom | Joy Dom | 20 |

## Changelog

### new version

- Add Copy transform connector
- Support copying fields to new fields

64 changes: 64 additions & 0 deletions docs/zh/transform-v2/field-mapper.md
@@ -0,0 +1,64 @@
# FieldMapper

> FieldMapper transform plugin

## Description

Add input schema and output schema mapping.

## Attributes

| name         | type   | required | default value |
|--------------|--------|----------|---------------|
| field_mapper | Object | yes      |               |

### field_mapper [config]

Specify the field mapping relationship between input and output.

### common options [config]

Transform plugin common parameters, please refer to [Transform Plugin](common-options.md) for details.

## Example

The data read from source is a table like this:

| id | name | age | card |
|----|----------|-----|------|
| 1 | Joy Ding | 20 | 123 |
| 2 | May Ding | 20 | 123 |
| 3 | Kin Dom | 20 | 123 |
| 4 | Joy Dom | 20 | 123 |

We want to delete the `age` field, update the field order to `id`, `card`, `name`, and rename `name` to `new_name`. We can add a `FieldMapper` transform like this:

```
transform {
FieldMapper {
source_table_name = "fake"
result_table_name = "fake1"
field_mapper = {
id = id
card = card
name = new_name
}
}
}
```

Then the data in result table `fake1` will be like this:

| id | card | new_name |
|----|------|----------|
| 1 | 123 | Joy Ding |
| 2 | 123 | May Ding |
| 3 | 123 | Kin Dom |
| 4 | 123 | Joy Dom |

## Changelog

### new version

- Add FieldMapper transform connector

68 changes: 68 additions & 0 deletions docs/zh/transform-v2/filter-rowkind.md
@@ -0,0 +1,68 @@
# FilterRowKind

> FilterRowKind transform plugin

## Description

Filter the data by row kind.

## Options

| name          | type  | required | default value |
|---------------|-------|----------|---------------|
| include_kinds | array | yes      |               |
| exclude_kinds | array | yes      |               |

### include_kinds [array]

The row kinds to include.

### exclude_kinds [array]

The row kinds to exclude.

You can only configure one of `include_kinds` and `exclude_kinds`.

### common options [string]

Transform plugin common parameters, please refer to [Transform Plugin](common-options.md) for details.

## Example

The row kind of the data generated by FakeSource is `INSERT`. If we use the `FilterRowKind` transform and exclude `INSERT` data, no rows will be written to the sink.

```

env {
job.mode = "BATCH"
}

source {
FakeSource {
result_table_name = "fake"
row.num = 100
schema = {
fields {
id = "int"
name = "string"
age = "int"
}
}
}
}

transform {
FilterRowKind {
source_table_name = "fake"
result_table_name = "fake1"
exclude_kinds = ["INSERT"]
}
}

sink {
Console {
source_table_name = "fake1"
}
}
```
