[Docs] Update engine related docs info (apache#7228)
tcodehuber authored Jul 17, 2024
1 parent 0e61faf commit f40f11a
Showing 22 changed files with 162 additions and 162 deletions.
14 changes: 7 additions & 7 deletions docs/en/other-engine/flink.md
@@ -1,8 +1,8 @@
- # Seatunnel runs on Flink
+ # SeaTunnel Runs On Flink

- Flink is a powerful high-performance distributed stream processing engine,More information about it you can,You can search for `Apache Flink`
+ Flink is a powerful, high-performance distributed stream processing engine. For more information, you can search for `Apache Flink`.

- ### Set Flink configuration information in the job
+ ### Set Flink Configuration Information In The Job

Begin with `flink.`

@@ -19,9 +19,9 @@ env {
Enumeration types are not currently supported; you need to specify them in the Flink conf file. Only these types of settings are supported for the time being:<br/>
Integer/Boolean/String/Duration
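
For illustration, here is a minimal sketch of what such typed settings can look like inside the `env` block. The option names below are ordinary Flink configuration keys chosen as examples, not options required by SeaTunnel:

```
env {
  # Integer
  flink.pipeline.max-parallelism = 128
  # Boolean
  flink.pipeline.object-reuse = true
  # String
  flink.pipeline.name = "SeaTunnel Job"
  # Duration
  flink.execution.checkpointing.timeout = 10min
}
```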

- ### How to set up a simple Flink job
+ ### How To Set Up A Simple Flink Job

- This is a simple job that runs on Flink Randomly generated data is printed to the console
+ This is a simple job that runs on Flink. Randomly generated data is printed to the console.

```
env {
@@ -79,6 +79,6 @@ sink{
}
```

- ### How to run a job in a project
+ ### How To Run A Job In A Project

- After you pull the code to the local, go to the `seatunnel-examples/seatunnel-flink-connector-v2-example` module find `org.apache.seatunnel.example.flink.v2.SeaTunnelApiExample` To complete the operation of the job
+ After you pull the code to your local machine, go to the `seatunnel-examples/seatunnel-flink-connector-v2-example` module and find `org.apache.seatunnel.example.flink.v2.SeaTunnelApiExample` to complete the operation of the job.
12 changes: 6 additions & 6 deletions docs/en/seatunnel-engine/about.md
@@ -18,21 +18,21 @@ In the future, SeaTunnel Engine will further optimize its functions to support f

### Cluster Management

- - Support stand-alone operation;
+ - Support standalone operation;
- Support cluster operation;
- Support autonomous cluster (decentralized), which saves users from specifying a master node for the SeaTunnel Engine cluster, because it can select a master node by itself during operation, and a new master node will be chosen automatically when the master node fails.
- Autonomous cluster node discovery: nodes with the same `cluster_name` will automatically form a cluster (see the configuration sketch after this list).
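
A minimal sketch of how a cluster name is typically set in the Hazelcast-style `hazelcast.yaml` (the name and member addresses are placeholders; refer to the deployment docs for the authoritative layout):

```yaml
hazelcast:
  cluster-name: seatunnel
  network:
    join:
      tcp-ip:
        enabled: true
        member-list:
          - 192.168.1.1
          - 192.168.1.2
```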

### Core Functions

- - Supports running jobs in local mode, and the cluster is automatically destroyed after the job once completed;
- - Supports running jobs in Cluster mode (single machine or cluster), submitting jobs to the SeaTunnel Engine service through the SeaTunnel Client, and the service continues to run after the job is completed and waits for the next job submission;
+ - Support running jobs in local mode, and the cluster is automatically destroyed after the job is completed;
+ - Support running jobs in cluster mode (single machine or cluster), submitting jobs to the SeaTunnel Engine service through the SeaTunnel client; the service continues to run after the job is completed, waiting for the next job submission (see the submission sketch after this list);
- Support offline batch synchronization;
- Support real-time synchronization;
- Batch-stream integration, all SeaTunnel V2 connectors can run in SeaTunnel Engine;
- - Supports distributed snapshot algorithm, and supports two-stage submission with SeaTunnel V2 connector, ensuring that data is executed only once.
- - Support job invocation at the Pipeline level to ensure that it can be started even when resources are limited;
- - Supports fault tolerance for jobs at the Pipeline level. Task failure only affects the Pipeline where it is located, and only the task under the Pipeline needs to be rolled back;
+ - Support the distributed snapshot algorithm and two-phase commit with SeaTunnel V2 connectors, ensuring that data is processed exactly once;
+ - Support job invocation at the pipeline level to ensure that it can be started even when resources are limited;
+ - Support fault tolerance for jobs at the pipeline level; a task failure only affects the pipeline it belongs to, and only the tasks under that pipeline need to be rolled back;
- Support dynamic thread sharing to synchronize a large number of small data sets in real-time.
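
As a hedged illustration of the local vs. cluster distinction above (the script and `-m` flag follow the usual SeaTunnel launcher convention; the config path is a placeholder):

```bash
# Local mode: starts a throwaway engine just for this job
sh bin/seatunnel.sh --config config/v2.batch.config.template -m local

# Cluster mode: submits the job to a running SeaTunnel Engine cluster
sh bin/seatunnel.sh --config config/v2.batch.config.template
```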

### Quick Start
28 changes: 14 additions & 14 deletions docs/en/seatunnel-engine/checkpoint-storage.md
@@ -18,11 +18,11 @@ SeaTunnel Engine supports the following checkpoint storage types:
- HDFS (OSS, S3, HDFS, LocalFile)
- LocalFile (native) (deprecated: use HDFS (LocalFile) instead)

- We used the microkernel design pattern to separate the checkpoint storage module from the engine. This allows users to implement their own checkpoint storage modules.
+ We use the microkernel design pattern to separate the checkpoint storage module from the engine. This allows users to implement their own checkpoint storage modules.

`checkpoint-storage-api` is the checkpoint storage module API, which defines the interface of the checkpoint storage module.

- if you want to implement your own checkpoint storage module, you need to implement the `CheckpointStorage` and provide the corresponding `CheckpointStorageFactory` implementation.
+ If you want to implement your own checkpoint storage module, you need to implement the `CheckpointStorage` interface and provide the corresponding `CheckpointStorageFactory` implementation.

### Checkpoint Storage Configuration

@@ -46,12 +46,12 @@ Notice: namespace must end with "/".
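
For orientation, a minimal sketch of where `namespace` sits in the checkpoint storage configuration (the path is a placeholder; note the trailing "/"):

```yaml
seatunnel:
  engine:
    checkpoint:
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          namespace: /tmp/seatunnel/checkpoint_snapshot/
```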

#### OSS

- Aliyun oss base on hdfs-file, so you can refer [hadoop oss docs](https://hadoop.apache.org/docs/stable/hadoop-aliyun/tools/hadoop-aliyun/index.html) to config oss.
+ Aliyun OSS is based on hdfs-file, so you can refer to the [Hadoop OSS Docs](https://hadoop.apache.org/docs/stable/hadoop-aliyun/tools/hadoop-aliyun/index.html) to configure OSS.

When interacting with OSS buckets, the OSS client needs credentials to access the buckets.
The client supports multiple authentication mechanisms and can be configured to control which mechanisms to use and their order of use. Custom implementations of `org.apache.hadoop.fs.aliyun.oss.AliyunCredentialsProvider` may also be used.
- if you used AliyunCredentialsProvider (can be obtained from the Aliyun Access Key Management), these consist of an access key, a secret key.
- you can config like this:
+ If you use `AliyunCredentialsProvider`, the credentials (which can be obtained from Aliyun Access Key Management) consist of an access key and a secret key.
+ You can configure it like this:

```yaml
seatunnel:
@@ -71,18 +71,18 @@
fs.oss.credentials.provider: org.apache.hadoop.fs.aliyun.oss.AliyunCredentialsProvider
```
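
Pieced together, a minimal sketch of a complete OSS checkpoint-storage block might look like the following (the bucket, keys, and endpoint are placeholders, and the exact key names should be verified against the full document):

```yaml
seatunnel:
  engine:
    checkpoint:
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          storage.type: oss
          oss.bucket: your-bucket
          fs.oss.accessKeyId: your-access-key
          fs.oss.accessKeySecret: your-secret-key
          fs.oss.endpoint: your-endpoint
          fs.oss.credentials.provider: org.apache.hadoop.fs.aliyun.oss.AliyunCredentialsProvider
```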

- For additional reading on the Hadoop Credential Provider API see: [Credential Provider API](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html).
+ For additional reading on the Hadoop Credential Provider API, see: [Credential Provider API](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html).

- Aliyun oss Credential Provider implements see: [Auth Credential Providers](https://github.com/aliyun/aliyun-oss-java-sdk/tree/master/src/main/java/com/aliyun/oss/common/auth)
+ For Aliyun OSS Credential Provider implementations, see: [Auth Credential Providers](https://github.com/aliyun/aliyun-oss-java-sdk/tree/master/src/main/java/com/aliyun/oss/common/auth)

#### S3

- S3 base on hdfs-file, so you can refer [hadoop s3 docs](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html) to config s3.
+ S3 is based on hdfs-file, so you can refer to the [Hadoop S3 Docs](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html) to configure S3.

Except when interacting with public S3 buckets, the S3A client needs credentials to interact with buckets.
The client supports multiple authentication mechanisms and can be configured to control which mechanisms to use and their order of use. Custom implementations of `com.amazonaws.auth.AWSCredentialsProvider` may also be used.
- if you used SimpleAWSCredentialsProvider (can be obtained from the Amazon Security Token Service), these consist of an access key, a secret key.
- you can config like this:
+ If you use `SimpleAWSCredentialsProvider`, the credentials (which can be obtained from the Amazon Security Token Service) consist of an access key and a secret key.
+ You can configure it like this:

```yaml
@@ -104,8 +104,8 @@
```
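
A minimal sketch of what a complete S3 checkpoint-storage block using `SimpleAWSCredentialsProvider` might look like (the bucket and keys are placeholders; verify the exact key names against the full document):

```yaml
seatunnel:
  engine:
    checkpoint:
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          storage.type: s3
          s3.bucket: your-bucket
          fs.s3a.access.key: your-access-key
          fs.s3a.secret.key: your-secret-key
          fs.s3a.aws.credentials.provider: org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
```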

- if you used `InstanceProfileCredentialsProvider`, this supports use of instance profile credentials if running in an EC2 VM, you could check [iam-roles-for-amazon-ec2](https://docs.aws.amazon.com/zh_cn/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html).
- you can config like this:
+ If you use `InstanceProfileCredentialsProvider`, which supports instance profile credentials when running in an EC2 VM, you can check [iam-roles-for-amazon-ec2](https://docs.aws.amazon.com/zh_cn/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html).
+ You can configure it like this:

```yaml
@@ -146,11 +146,11 @@ seatunnel:
# important: The user of this key needs to have write permission for the bucket, otherwise a 403 error will be returned
```

- For additional reading on the Hadoop Credential Provider API see: [Credential Provider API](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html).
+ For additional reading on the Hadoop Credential Provider API, see: [Credential Provider API](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html).

#### HDFS

- if you used HDFS, you can config like this:
+ If you use HDFS, you can configure it like this:

```yaml
seatunnel:
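  engine:
    checkpoint:
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          # Hedged sketch of a typical HDFS setup: the key names follow the
          # HDFS storage plugin convention, and the namenode address below is
          # a placeholder to replace with your own.
          storage.type: hdfs
          fs.defaultFS: hdfs://localhost:9000
```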
10 changes: 5 additions & 5 deletions docs/en/seatunnel-engine/deployment.md
@@ -7,18 +7,18 @@ sidebar_position: 3

SeaTunnel Engine (Zeta) supports three different deployment modes: local mode, hybrid cluster mode, and separated cluster mode.

- Each deployment mode has different usage scenarios, advantages, and disadvantages. When choosing a deployment mode, you should choose according to your needs and environment.
+ Each deployment mode has different usage scenarios, advantages, and disadvantages. You should choose a deployment mode according to your needs and environment.

**Local mode:** Only used for testing, each task will start an independent process, and the process will exit after the task is completed.

**Hybrid cluster mode:** The Master service and Worker service of SeaTunnel Engine are mixed in the same process. All nodes can run jobs and participate in the election to become the master; that is, the master node also runs synchronization tasks. In this mode, Imap data (which saves the state information of the task to support fault tolerance) will be distributed among all nodes.

**Separated cluster mode (experimental feature):** The Master service and Worker service of SeaTunnel Engine are separated, and each service is a single process. The Master node is only responsible for job scheduling, the REST API, task submission, etc., and Imap data is only stored in the Master node. The Worker node is only responsible for the execution of the task, does not participate in the election to become the master, and does not store Imap data.
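
As a hedged sketch, starting the two roles of a separated cluster typically looks like this (the `-r` role flag follows the separated-cluster deployment document; `-d` runs the process as a daemon):

```bash
# On master nodes
sh bin/seatunnel-cluster.sh -d -r master

# On worker nodes
sh bin/seatunnel-cluster.sh -d -r worker
```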

- **Usage suggestion:** Although [separated cluster mode](separated-cluster-deployment.md) is an experimental feature, the first recommended usage will be made in the future. In the hybrid cluster mode, the Master node needs to run tasks synchronously. When the task scale is large, it will affect the stability of the Master node. Once the Master node crashes or the heartbeat times out, it will lead to the switch of the Master node, and the switch of the Master node will cause fault tolerance of all running tasks, which will further increase the load of the cluster. Therefore, we recommend using the separated mode more.
+ **Usage suggestion:** Although [Separated Cluster Mode](separated-cluster-deployment.md) is an experimental feature, it will become the first recommended option in the future. In the hybrid cluster mode, the Master node needs to run tasks synchronously. When the task scale is large, this affects the stability of the Master node. Once the Master node crashes or its heartbeat times out, the Master node will switch over, and the switchover will trigger fault-tolerance recovery for all running tasks, further increasing the load on the cluster. Therefore, we recommend the separated mode.

- [Local mode deployment](local-mode-deployment.md)
+ [Local Mode Deployment](local-mode-deployment.md)

- [Hybrid cluster mode deployment](hybrid-cluster-deployment.md)
+ [Hybrid Cluster Mode Deployment](hybrid-cluster-deployment.md)

- [Separated cluster mode deployment](separated-cluster-deployment.md)
+ [Separated Cluster Mode Deployment](separated-cluster-deployment.md)
12 changes: 6 additions & 6 deletions docs/en/seatunnel-engine/download-seatunnel.md
@@ -6,7 +6,7 @@ sidebar_position: 2
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

- # Download and Make Installation Packages
+ # Download And Make Installation Packages

## Step 1: Preparation

@@ -16,7 +16,7 @@ Before starting to download SeaTunnel, you need to ensure that you have installed

## Step 2: Download SeaTunnel

- Go to the [seatunnel download page](https://seatunnel.apache.org/download) to download the latest version of the release version installation package `seatunnel-<version>-bin.tar.gz`.
+ Go to the [SeaTunnel Download Page](https://seatunnel.apache.org/download) to download the latest release installation package `seatunnel-<version>-bin.tar.gz`.

Or you can also download it through the terminal:

@@ -26,12 +26,12 @@

```bash
wget "https://archive.apache.org/dist/seatunnel/${version}/apache-seatunnel-${version}-bin.tar.gz"
tar -xzvf "apache-seatunnel-${version}-bin.tar.gz"
```

- ## Step 3: Download the connector plug-in
+ ## Step 3: Download The Connector Plugin

Starting from the 2.2.0-beta version, the binary package no longer provides the connector dependency by default. Therefore, when using it for the first time, you need to execute the following command to install the connector: (Of course, you can also manually download the connector from the [Apache Maven Repository](https://repo.maven.apache.org/maven2/org/apache/seatunnel/), and then move it to the `connectors/seatunnel` directory).

```bash
- sh bin/install-plugin.sh 2.3.6
+ sh bin/install-plugin.sh
```

If you need a specific connector version, taking 2.3.6 as an example, you need to execute the following command.
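
For example, mirroring the versioned form of the command shown above:

```bash
sh bin/install-plugin.sh 2.3.6
```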
@@ -65,6 +65,6 @@ If you want to install connector plugins by manually downloading connectors, you

:::

- Now you have completed the download of the SeaTunnel installation package and the download of the connector plug-in. Next, you can choose different running modes according to your needs to run or deploy SeaTunnel.
+ Now you have completed the download of the SeaTunnel installation package and the connector plugin. Next, you can choose different running modes according to your needs to run or deploy SeaTunnel.

- If you use the SeaTunnel Engine (Zeta) that comes with SeaTunnel to run tasks, you need to deploy the SeaTunnel Engine service first. Refer to [Deployment of SeaTunnel Engine (Zeta) Service](deployment.md).
+ If you use the SeaTunnel Engine (Zeta) that comes with SeaTunnel to run tasks, you need to deploy the SeaTunnel Engine service first. Refer to [Deployment Of SeaTunnel Engine (Zeta) Service](deployment.md).
