Merge branch 'apache:dev' into dev
ZhangWeike2000 authored Sep 10, 2024
2 parents d4fb854 + caea8a1 commit c03325c
Showing 103 changed files with 6,361 additions and 737 deletions.
3 changes: 3 additions & 0 deletions config/seatunnel.yaml
@@ -34,3 +34,6 @@ seatunnel:
namespace: /tmp/seatunnel/checkpoint_snapshot
storage.type: hdfs
fs.defaultFS: file:///tmp/ # Ensure that the directory has write permission
telemetry:
metric:
enabled: false
5 changes: 4 additions & 1 deletion docs/en/concept/sql-config.md
@@ -120,7 +120,10 @@ CREATE TABLE sink_table WITH (
INSERT INTO sink_table SELECT id, name, age, email FROM source_table;
```

* The `SELECT FROM` part is the table name of the source-mapped table.
* The `SELECT FROM` part is the table name of the source-mapped table. If a selected field name is a keyword ([reference](https://github.com/JSQLParser/JSqlParser/blob/master/src/main/jjtree/net/sf/jsqlparser/parser/JSqlParserCC.jjt)), wrap it in backticks, like \`fieldName\`.
```sql
INSERT INTO sink_table SELECT id, name, age, email,`output` FROM source_table;
```
* The `INSERT INTO` part is the table name of the target-mapped table.
* Note: This syntax does **not support** specifying fields in `INSERT`, like this: `INSERT INTO sink_table (id, name, age, email) SELECT id, name, age, email FROM source_table;`
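
The backtick rule for keyword field names can be sketched in Python. This is only an illustration: `quote_field`, `build_insert`, and the `ASSUMED_KEYWORDS` set are hypothetical names, and the keyword set is a tiny assumed subset. The authoritative list lives in the JSqlParser grammar linked above.

```python
# Assumed, illustrative subset of keywords; NOT the full JSqlParser list.
ASSUMED_KEYWORDS = {"output", "select", "from", "order", "group"}


def quote_field(name, keywords=ASSUMED_KEYWORDS):
    """Wrap a field name in backticks when it collides with a keyword."""
    if name.lower() in keywords:
        return "`" + name + "`"
    return name


def build_insert(sink, source, fields):
    """Build an INSERT ... SELECT statement without a column list on INSERT."""
    columns = ", ".join(quote_field(field) for field in fields)
    return "INSERT INTO " + sink + " SELECT " + columns + " FROM " + source + ";"


if __name__ == "__main__":
    print(build_insert("sink_table", "source_table",
                       ["id", "name", "age", "email", "output"]))
```

The statement is built without a column list after `INSERT INTO`, since the note above says that form is not supported.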

2 changes: 1 addition & 1 deletion docs/en/connector-v2/source/PostgreSQL-CDC.md
@@ -157,7 +157,7 @@ sink {
generate_sink_sql = true
# You need to configure both database and table
database = postgres_cdc
chema = "inventory"
schema = "inventory"
tablePrefix = "sink_"
primary_keys = ["id"]
}
4 changes: 4 additions & 0 deletions docs/en/seatunnel-engine/resource-isolation.md
@@ -82,3 +82,7 @@ sink {

![img.png](../../images/resource-isolation.png)

3. Update the tags of a running node via the REST API (optional)

For more information, see [Update the tags of running node](https://seatunnel.apache.org/docs/seatunnel-engine/rest-api/).

62 changes: 62 additions & 0 deletions docs/en/seatunnel-engine/rest-api.md
@@ -637,3 +637,65 @@ For more information about customize encryption, please refer to the documentati

</details>


------------------------------------------------------------------------------------------

### Update the tags of running node

<details><summary><code>POST</code><code><b>/hazelcast/rest/maps/update-tags</b></code><code>Because the update can only target a specific node, the current node's `ip:port` needs to be used for the update</code><code>(If the update is successful, return a success message)</code></summary>


#### Update node tags
##### Body
If the request body is a `Map` object, the tags of the current node are updated to the given values.
```json
{
"tag1": "dev_1",
"tag2": "dev_2"
}
```
##### Responses

```json
{
"status": "success",
"message": "update node tags done."
}
```
#### Remove node tags
##### Body
If the request body is an empty `Map` object, the tags of the current node are cleared.
```json
{}
```
##### Responses

```json
{
"status": "success",
"message": "update node tags done."
}
```

#### Request parameter exception
- If the request body is empty

##### Responses

```json
{
"status": "fail",
"message": "Request body is empty."
}
```
- If the request body is not a `Map` object
##### Responses

```json
{
"status": "fail",
"message": "Invalid JSON format in request body."
}
```
</details>
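
As a rough client-side sketch of the endpoint described above, the following Python builds the `POST` request and checks the `status` field of a response. The helper names are hypothetical, and the host, the port `5801`, and the `Content-Type` header are assumptions for illustration. Use the `ip:port` of the node whose tags you want to change.

```python
import json
import urllib.request


def build_update_tags_request(host, port, tags):
    """Build (but do not send) the POST request that updates a node's tags.

    An empty dict clears the tags, matching the "remove node tags" case.
    """
    if not isinstance(tags, dict):
        raise TypeError("tags must be a dict (a JSON object)")
    url = "http://{}:{}/hazelcast/rest/maps/update-tags".format(host, port)
    body = json.dumps(tags).encode("utf-8")
    # The Content-Type header is an assumption; the source does not state it.
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def is_success(response_text):
    """Return True when the response body reports status == "success"."""
    return json.loads(response_text).get("status") == "success"


if __name__ == "__main__":
    # Host and port are placeholders, not values taken from the source.
    request = build_update_tags_request(
        "127.0.0.1", 5801, {"tag1": "dev_1", "tag2": "dev_2"})
    print(request.get_method(), request.full_url)
    # To send it against a live node:
    # with urllib.request.urlopen(request) as response:
    #     print(is_success(response.read().decode("utf-8")))
```

Building the request separately from sending it keeps the sketch runnable without a live cluster.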

152 changes: 152 additions & 0 deletions docs/en/seatunnel-engine/telemetry.md

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions docs/en/seatunnel-engine/telemetry/grafana-dashboard.json

Large diffs are not rendered by default.

296 changes: 296 additions & 0 deletions docs/en/seatunnel-engine/telemetry/metrics.txt

Large diffs are not rendered by default.

295 changes: 295 additions & 0 deletions docs/en/seatunnel-engine/telemetry/openmetrics.txt

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/en/transform-v2/embedding.md
@@ -320,7 +320,7 @@ transform {
inputx = ["${input}"]
}
}
result_table_name = "embedding_output_3"
result_table_name = "embedding_output_1"
}
}
156 changes: 147 additions & 9 deletions docs/en/transform-v2/llm.md
@@ -10,19 +10,23 @@ more.

## Options

| name | type | required | default value |
|------------------|--------|----------|---------------|
| model_provider | enum | yes | |
| output_data_type | enum | no | String |
| prompt | string | yes | |
| model | string | yes | |
| api_key | string | yes | |
| api_path | string | no | |
| name | type | required | default value |
|------------------------|--------|----------|---------------|
| model_provider | enum | yes | |
| output_data_type | enum | no | String |
| prompt | string | yes | |
| model | string | yes | |
| api_key | string | yes | |
| api_path | string | no | |
| custom_config | map | no | |
| custom_response_parse | string | no | |
| custom_request_headers | map | no | |
| custom_request_body | map | no | |

### model_provider

The model provider to use. The available options are:
OPENAI
OPENAI, DOUBAO, CUSTOM

### output_data_type

Expand Down Expand Up @@ -74,6 +78,61 @@ If you use OpenAI model, please refer https://platform.openai.com/docs/api-refer
The API path to use for the model provider. In most cases, you do not need to change this configuration. If you
are using an API agent's service, you may need to configure it to the agent's API address.

### custom_config

The `custom_config` option allows you to provide additional custom configurations for the model. This is a map where you
can define various settings that might be required by the specific model you're using.

### custom_response_parse

The `custom_response_parse` option allows you to specify how to parse the model's response. You can use JsonPath to
extract the specific data you need from the response. For example, by using `$.choices[*].message.content`, you can
extract the `content` field values from the following JSON. For more details on using JsonPath, please refer to
the [JsonPath Getting Started guide](https://github.com/json-path/JsonPath?tab=readme-ov-file#getting-started).

```json
{
"id": "chatcmpl-9s4hoBNGV0d9Mudkhvgzg64DAWPnx",
"object": "chat.completion",
"created": 1722674828,
"model": "gpt-4o-mini",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "[\"Chinese\"]"
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 107,
"completion_tokens": 3,
"total_tokens": 110
},
"system_fingerprint": "fp_0f03d4f0ee",
"code": 0,
"msg": "ok"
}
```
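
To make the effect of `$.choices[*].message.content` concrete, here is a plain-Python equivalent that walks the same path. It is a sketch for illustration only and does not use or reproduce SeaTunnel's JsonPath handling. `extract_contents` is a hypothetical helper, and `sample` is a trimmed copy of the response above.

```python
import json


def extract_contents(response_text):
    """Collect message.content from every element of the choices array."""
    document = json.loads(response_text)
    contents = []
    for choice in document.get("choices", []):
        message = choice.get("message") or {}
        if "content" in message:
            contents.append(message["content"])
    return contents


# Trimmed version of the sample response shown above.
sample = json.dumps({
    "model": "gpt-4o-mini",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": '["Chinese"]'},
            "finish_reason": "stop",
        }
    ],
})

if __name__ == "__main__":
    for content in extract_contents(sample):
        # The content is itself a JSON-encoded string in this sample.
        print(json.loads(content))
```

Note that the extracted `content` value is a string that happens to contain JSON, so a second decoding step is needed to get the list.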

### custom_request_headers

The `custom_request_headers` option allows you to define custom headers that should be included in the request sent to
the model's API. This is useful if the API requires additional headers beyond the standard ones, such as authorization
tokens, content types, etc.

### custom_request_body

The `custom_request_body` option supports placeholders:

- `${model}`: Placeholder for the model name.
- `${input}`: Placeholder for the input value. The request body type is defined by the type of the body
value. Example: `"${input}"` -> "input"
- `${prompt}`: Placeholder for the LLM prompt.
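
One way to picture the placeholders is a recursive substitution over the request body. The sketch below is an assumption about the general technique, not SeaTunnel's actual implementation. `render` and `substitute` are hypothetical helpers, and the input value is treated as a plain string for simplicity.

```python
import re

# Matches only the three placeholders listed above.
PLACEHOLDER = re.compile(r"\$\{(model|prompt|input)\}")


def substitute(text, values):
    """Replace each placeholder in a string in a single pass."""
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], text)


def render(template, values):
    """Return a copy of a nested dict/list/str with placeholders filled in."""
    if isinstance(template, str):
        return substitute(template, values)
    if isinstance(template, dict):
        return {key: render(item, values) for key, item in template.items()}
    if isinstance(template, list):
        return [render(item, values) for item in template]
    return template


if __name__ == "__main__":
    body = {
        "model": "${model}",
        "messages": [
            {"role": "system", "content": "${prompt}"},
            {"role": "user", "content": "${input}"},
        ],
    }
    print(render(body, {
        "model": "gpt-4o-mini",
        "prompt": "Determine whether someone is Chinese or American by their name",
        "input": "Jia Fan",
    }))
```

The single-pass regex avoids re-expanding placeholder-like text that appears inside a substituted value, and the template itself is left unmodified.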

### common options [string]

Transform plugin common parameters, please refer to [Transform Plugin](common-options.md) for details
@@ -122,3 +181,82 @@ sink {
}
```

### Customize the LLM model

```hocon
env {
job.mode = "BATCH"
}
source {
FakeSource {
row.num = 5
schema = {
fields {
id = "int"
name = "string"
}
}
rows = [
{fields = [1, "Jia Fan"], kind = INSERT}
{fields = [2, "Hailin Wang"], kind = INSERT}
{fields = [3, "Tomas"], kind = INSERT}
{fields = [4, "Eric"], kind = INSERT}
{fields = [5, "Guangdong Liu"], kind = INSERT}
]
result_table_name = "fake"
}
}
transform {
LLM {
source_table_name = "fake"
model_provider = CUSTOM
model = gpt-4o-mini
api_key = sk-xxx
prompt = "Determine whether someone is Chinese or American by their name"
openai.api_path = "http://mockserver:1080/v1/chat/completions"
custom_config={
custom_response_parse = "$.choices[*].message.content"
custom_request_headers = {
Content-Type = "application/json"
Authorization = "Bearer xxxxxxxx"
}
custom_request_body ={
model = "${model}"
messages = [
{
role = "system"
content = "${prompt}"
},
{
role = "user"
content = "${input}"
}]
}
}
result_table_name = "llm_output"
}
}
sink {
Assert {
source_table_name = "llm_output"
rules =
{
field_rules = [
{
field_name = llm_output
field_type = string
field_value = [
{
rule_type = NOT_NULL
}
]
}
]
}
}
}
```

Binary file added docs/images/grafana.png
3 changes: 2 additions & 1 deletion docs/sidebars.js
@@ -202,7 +202,8 @@ const sidebars = {
"seatunnel-engine/tcp",
"seatunnel-engine/resource-isolation",
"seatunnel-engine/rest-api",
"seatunnel-engine/user-command"
"seatunnel-engine/user-command",
"seatunnel-engine/telemetry"
]
},
{
5 changes: 4 additions & 1 deletion docs/zh/concept/sql-config.md
@@ -120,7 +120,10 @@ CREATE TABLE sink_table WITH (
INSERT INTO sink_table SELECT id, name, age, email FROM source_table;
```

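* The `SELECT FROM` part is the table name of the source-mapped table. For the syntax of the `SELECT` part, see the `query` option of [SQL-transform](../transform-v2/sql.md)
* The `SELECT FROM` part is the table name of the source-mapped table. For the syntax of the `SELECT` part, see the `query` option of [SQL-transform](../transform-v2/sql.md). If a selected field name is a keyword ([reference](https://github.com/JSQLParser/JSqlParser/blob/master/src/main/jjtree/net/sf/jsqlparser/parser/JSqlParserCC.jjt)), wrap it in backticks, like \`fieldName\`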
```sql
INSERT INTO sink_table SELECT id, name, age, email,`output` FROM source_table;
```
* The `INSERT INTO` part is the table name of the target-mapped table
* Note: This syntax does **not support** specifying fields in `INSERT`, like this: `INSERT INTO sink_table (id, name, age, email) SELECT id, name, age, email FROM source_table;`

4 changes: 4 additions & 0 deletions docs/zh/seatunnel-engine/resource-isolation.md
@@ -82,3 +82,7 @@ sink {

![img.png](../../images/resource-isolation.png)

3. Update the tags of a running node (optional)

For usage details, see [Update the tags of running node](https://seatunnel.apache.org/zh-CN/docs/seatunnel-engine/rest-api/)

61 changes: 61 additions & 0 deletions docs/zh/seatunnel-engine/rest-api.md
@@ -642,3 +642,64 @@ network:

</details>

------------------------------------------------------------------------------------------

### Update the tags of running node

<details>
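<summary><code>POST</code><code><b>/hazelcast/rest/maps/update-tags</b></code><code>Because the update can only target a specific node, the current node's ip:port must be used for the update</code><code>(If the update succeeds, a "success" message is returned)</code></summary>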


#### Update node tags
##### Body
If the request body is a `Map` object, the tags of the current node are updated
```json
{
"tag1": "dev_1",
"tag2": "dev_2"
}
```
##### Responses

```json
{
"status": "success",
"message": "update node tags done."
}
```
#### Remove node tags
##### Body
If the request body is an empty `Map` object, the tags of the current node are cleared
```json
{}
```
##### Responses
The response body will be:
```json
{
"status": "success",
"message": "update node tags done."
}
```

#### Request parameter exception
- If the request body is empty

##### Responses

```json
{
"status": "fail",
"message": "Request body is empty."
}
```
- If the request body is not a `Map` object
##### Responses

```json
{
"status": "fail",
"message": "Invalid JSON format in request body."
}
```
</details>