[Docs]: Quick start docs and fix problem in docker/build.sh #830

Merged · 9 commits · Dec 5, 2022
4 changes: 2 additions & 2 deletions docker/README.md
@@ -22,13 +22,13 @@ We provide a bash script to help you build docker image easier.
You can build all image via script in current dir.

```shell
-./build all
+./build.sh all
```

or just build only one image.

```shell
-./build ams
+./build.sh ams
```

- NOTICE: The ams image and flink image required the project had been packaged.
2 changes: 1 addition & 1 deletion docker/ams/Dockerfile
@@ -42,7 +42,7 @@ RUN unzip arctic-${ARCTIC_VERSION}-bin.zip \
WORKDIR /usr/local/ams/arctic-${ARCTIC_VERSION}

COPY config.sh ./bin/config.sh
-RUN find ./bin -name "*.sh" | dos2unix
+RUN find ./bin -name "*.sh" | xargs dos2unix


EXPOSE 1630/tcp
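The original `find … | dos2unix` piped the script *names* into dos2unix's standard input, so nothing on disk was converted; `xargs` turns those names into command-line arguments instead. A minimal sketch of the distinction, using `wc -l` as a stand-in filter and a hypothetical temp directory:

```shell
# Illustrative only: /tmp/xargs-demo and f.sh are made up for the demo.
mkdir -p /tmp/xargs-demo
printf 'a\nb\n' > /tmp/xargs-demo/f.sh

# Piped directly, the filter reads the file *name* as its input:
# it sees exactly one line (the path), not the file's contents.
find /tmp/xargs-demo -name "*.sh" | wc -l

# With xargs, the filter runs *on the file*, so it sees its two lines.
find /tmp/xargs-demo -name "*.sh" | xargs wc -l
```

The same logic applies to dos2unix: without `xargs` it would "convert" the stream of file names and leave the scripts' CRLF endings untouched.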
5 changes: 3 additions & 2 deletions docker/build.sh
@@ -21,7 +21,7 @@ CURRENT_DIR="$( cd "$(dirname "$0")" ; pwd -P )"
ARCTIC_HOME="$( cd "$CURRENT_DIR/../" ; pwd -P )"
export ARCTIC_HOME

-ARCTIC_VERSION=`cat $ARCTIC_HOME/pom.xml |grep 'arctic-parent' -C 3 |grep -oP '(?<=<version>).*(?=</version>)'`
+ARCTIC_VERSION=`cat $ARCTIC_HOME/pom.xml | grep 'arctic-parent' -C 3 | grep -Eo '<version>.*</version>' | awk -F'[><]' '{print $3}'`
ARCTIC_BINARY_PACKAGE=${ARCTIC_HOME}/dist/target/arctic-${ARCTIC_VERSION}-bin.zip
FLINK_VERSION=1.15.3
HADOOP_VERSION=2.10.2
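The rewritten pipeline avoids `grep -P` (PCRE look-arounds), which is missing from some grep builds (e.g. macOS or BusyBox), by matching the whole tag with plain ERE and letting awk split out the value. A sketch against a hypothetical pom fragment:

```shell
# Hypothetical pom.xml fragment containing the parent version tag.
cat > /tmp/pom-demo.xml <<'EOF'
  <artifactId>arctic-parent</artifactId>
  <version>0.4.0</version>
EOF

# grep -Eo keeps only the <version>…</version> tag; awk splits on < and >,
# so the version value lands in field 3.
ARCTIC_VERSION=$(grep 'arctic-parent' -C 3 /tmp/pom-demo.xml \
  | grep -Eo '<version>.*</version>' | awk -F'[><]' '{print $3}')
echo "$ARCTIC_VERSION"   # 0.4.0
```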
@@ -127,6 +127,7 @@ function build_ams() {
set -x
AMS_IMAGE_RELEASE_PACKAGE=${CURRENT_DIR}/ams/arctic-${ARCTIC_VERSION}-bin.zip
cp ${ARCTIC_BINARY_PACKAGE} ${AMS_IMAGE_RELEASE_PACKAGE}
+# dos2unix ${CURRENT_DIR}/ams/config.sh
docker build -t arctic163/ams --build-arg ARCTIC_VERSION=${ARCTIC_VERSION} \
--build-arg DEBIAN_MIRROR=${DEBIAN_MIRROR} \
ams/.
@@ -142,7 +142,7 @@ function build_flink() {
echo "=============================================="
echo " arctic163/flink "
echo "=============================================="
-FLINK_MAJOR_VERSION=`echo $FLINK_VERSION| grep -oP '\d+.\d+'`
+FLINK_MAJOR_VERSION=`echo $FLINK_VERSION| grep -oE '\d+.\d+'`
FLINK_CONNECTOR_BINARY=${ARCTIC_HOME}/flink/v${FLINK_MAJOR_VERSION}/flink-runtime/target/arctic-flink-runtime-${FLINK_MAJOR_VERSION}-${ARCTIC_VERSION}.jar

echo "Start Build arctic163/flink Image, Flink Version: ${FLINK_VERSION}"
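One caveat on the major-version line: `\d` is a PCRE escape, not part of POSIX ERE, so `grep -oE '\d+.\d+'` may not behave the same across grep implementations. A portable sketch using an explicit digit class (an assumption about intent, not the PR's exact text):

```shell
FLINK_VERSION=1.15.3

# [0-9] is valid in any ERE; -o prints each match on its own line and
# head keeps the first one ("1.15" out of "1.15.3").
FLINK_MAJOR_VERSION=$(echo "$FLINK_VERSION" | grep -oE '[0-9]+\.[0-9]+' | head -n 1)
echo "$FLINK_MAJOR_VERSION"   # 1.15
```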
2 changes: 1 addition & 1 deletion docker/demo-cluster.sh
@@ -28,7 +28,7 @@ ARCTIC_POM=${ARCTIC_HOME}/pom.xml

if [ -f "${ARCTIC_POM}" ];then
echo "Current dir in arctic project. parse version from ${ARCTIC_POM}"
-PROJECT_VERSION=`cat ${ARCTIC_POM} |grep 'arctic-parent' -C 3 |grep -oP '(?<=<version>).*(?=</version>)'`
+PROJECT_VERSION=`cat ${ARCTIC_POM} | grep 'arctic-parent' -C 3 | grep -Eo '<version>.*</version>' | awk -F'[><]' '{print $3}'`
fi


8 changes: 5 additions & 3 deletions site/config/ch/mkdocs.yml
@@ -56,9 +56,11 @@ nav:
    - Self-optimizing: concepts/self-optimizing.md
    - Table watermark: concepts/table-watermark.md
  - Quick Start:
-    - Setup: empty.md
-    - Quick demo: empty.md
-    - CDC ingestion: empty.md
+    - Setup:
+      - Setup from docker: quickstart/setup/setup-from-docker.md
+      - Setup from binary release: quickstart/setup/setup-from-binary-release.md
+    - Quick demo: quickstart/quick-demo.md
+    - CDC ingestion: quickstart/cdc-ingestion.md
  - Admin guide:
    - Deployment: empty.md
    - Managing catalogs: empty.md
Binary file added site/docs/ch/images/quickstart/upsert-result.png
Binary file added site/docs/ch/images/quickstart/upsert-result2.png
70 changes: 70 additions & 0 deletions site/docs/ch/quickstart/cdc-ingestion.md
@@ -0,0 +1,70 @@
# CDC Ingestion

This section demonstrates syncing MySQL data changes into Arctic with Flink CDC. It uses the TPCC dataset provided by the
[lakehouse-benchmark](https://github.com/NetEase/lakehouse-benchmark) project to simulate reads and writes against MySQL under a realistic business workload, and the
[lakehouse-benchmark-ingestion](https://github.com/NetEase/lakehouse-benchmark-ingestion) tool to replicate the data from MySQL to Arctic through the binlog.
Before starting, deploy the cluster following [Setup from docker](./setup/setup-from-docker.md),
and complete the parts of [Quick Demo](./quick-demo.md) that create a Catalog and start an Optimizer.


# Step 1. Initialize tables

Initialize the test data with the following command:

```shell
docker exec -it lakehouse-benchmark java \
  -jar lakehouse-benchmark.jar -b tpcc,chbenchmark \
  -c config/mysql/sample_chbenchmark_config.xml \
  --create=true --load=true
```

Wait for the command to finish. It initializes an `oltpbench` database in MySQL, creates a set of business tables, and loads their initial data.

# Step 2. Start streaming ingestion

Log in to the ingestion container and start the ingestion job with the following commands:

```shell
docker exec -it lakehouse-benchmark-ingestion bash

java -jar lakehouse-benchmark-ingestion-1.0-SNAPSHOT.jar \
-confDir /usr/lib/lakehouse_benchmark_ingestion/conf \
-sinkType arctic \
-sinkDatabase oltpbench
```

Once the job is running, the AMS Tables page shows the table metadata synced to Arctic, and you can run SQL in the Terminal to query the synced data.
The ingestion container also runs a Flink job; open the [Flink Dashboard](http://localhost:8082) to inspect it in the Flink Web UI.


# Step 3. Start the TPCC benchmark

Go back to the benchmark container; the following commands continuously execute business-like OLTP operations against the test database:

```shell
docker exec -it lakehouse-benchmark bash

java -jar lakehouse-benchmark.jar -b tpcc,chbenchmark \
  -c config/mysql/sample_chbenchmark_config.xml \
  --execute=true
```

This command keeps executing OLTP operations against the test database until the program exits.
Meanwhile, back on the AMS Terminal page, you can use Spark SQL to watch the data changes from MySQL being continuously synced to the Arctic table by the ingestion job.

???+note "The ingestion job's checkpoint interval is 60s, so data changes in the Arctic lakehouse lag MySQL by up to 60s."


# Step 4. Check table results

The whole TPCC benchmark runs for about 10 minutes. After it completes, log in to the MySQL container with:

```shell
docker exec -it mysql mysql -ppassword oltpbench
```

Then run the same SELECT statements in both MySQL and AMS and compare the results to verify the synced data is correct.
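One way to make that final comparison mechanical is to export the same query's result from each side to a file and `diff` the files. A sketch (the two result files are hypothetical exports from MySQL and the AMS Terminal):

```shell
# Hypothetical exported results of the same SELECT from both systems.
printf 'row_count 10\n' > /tmp/mysql_result.txt
printf 'row_count 10\n' > /tmp/arctic_result.txt

# diff exits 0 only when the two result sets are identical.
if diff -q /tmp/mysql_result.txt /tmp/arctic_result.txt > /dev/null; then
  echo "results match"
fi
```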