[Improvement] Improve docs for releasing 0.7.0 #2931

Merged 5 commits on Jun 19, 2024

@@ -29,15 +29,12 @@ public class TableProperties {

private TableProperties() {}

public static final String SELF_OPTIMIZING_MIN_PLAN_INTERVAL =
"self-optimizing.min-plan-interval";
public static final long SELF_OPTIMIZING_MIN_PLAN_INTERVAL_DEFAULT = 60000;

public static final String TABLE_PARTITION_PROPERTIES = "table.partition-properties";

public static final String BASE_TABLE_MAX_TRANSACTION_ID = "base.table.max-transaction-id";

public static final String PARTITION_OPTIMIZED_SEQUENCE = "max-txId";

public static final String PARTITION_BASE_OPTIMIZED_TIME = "base-op-time";

public static final String LOCATION = "location";
@@ -118,6 +115,10 @@ private TableProperties() {}
"self-optimizing.full.rewrite-all-files";
public static final boolean SELF_OPTIMIZING_FULL_REWRITE_ALL_FILES_DEFAULT = true;

public static final String SELF_OPTIMIZING_MIN_PLAN_INTERVAL =
"self-optimizing.min-plan-interval";
public static final long SELF_OPTIMIZING_MIN_PLAN_INTERVAL_DEFAULT = 60000;

/** deprecated table optimize related properties */
@Deprecated public static final String ENABLE_OPTIMIZE = "optimize.enable";

@@ -145,49 +146,49 @@ private TableProperties() {}
public static final String ENABLE_TABLE_EXPIRE = "table-expire.enabled";

public static final boolean ENABLE_TABLE_EXPIRE_DEFAULT = true;
@Deprecated public static final String ENABLE_TABLE_EXPIRE_LEGACY = "table-expire.enable";

public static final String CHANGE_DATA_TTL = "change.data.ttl.minutes";
public static final long CHANGE_DATA_TTL_DEFAULT = 10080; // 7 Days

public static final String BASE_SNAPSHOT_KEEP_MINUTES = "snapshot.base.keep.minutes";
public static final long BASE_SNAPSHOT_KEEP_MINUTES_DEFAULT = 720; // 12 Hours

public static final String ENABLE_ORPHAN_CLEAN = "clean-orphan-file.enabled";
public static final boolean ENABLE_ORPHAN_CLEAN_DEFAULT = false;

public static final String MIN_ORPHAN_FILE_EXISTING_TIME =
"clean-orphan-file.min-existing-time-minutes";
public static final long MIN_ORPHAN_FILE_EXISTING_TIME_DEFAULT = 2880; // 2 Days
public static final String ENABLE_DANGLING_DELETE_FILES_CLEAN =
"clean-dangling-delete-files.enabled";
public static final boolean ENABLE_DANGLING_DELETE_FILES_CLEAN_DEFAULT = true;

public static final String ENABLE_DATA_EXPIRATION = "data-expire.enabled";
public static final boolean ENABLE_DATA_EXPIRATION_DEFAULT = false;

public static final String DATA_EXPIRATION_LEVEL = "data-expire.level";
public static final String DATA_EXPIRATION_LEVEL_DEFAULT = "partition";

public static final String DATA_EXPIRATION_FIELD = "data-expire.field";

public static final String DATA_EXPIRATION_DATE_STRING_PATTERN =
"data-expire.datetime-string-pattern";
public static final String DATA_EXPIRATION_DATE_STRING_PATTERN_DEFAULT = "yyyy-MM-dd";

public static final String DATA_EXPIRATION_DATE_NUMBER_FORMAT =
"data-expire.datetime-number-format";
public static final String DATA_EXPIRATION_DATE_NUMBER_FORMAT_DEFAULT = "TIMESTAMP_MS";

public static final String DATA_EXPIRATION_RETENTION_TIME = "data-expire.retention-time";

public static final String DATA_EXPIRATION_BASE_ON_RULE = "data-expire.base-on-rule";
public static final String DATA_EXPIRATION_BASE_ON_RULE_DEFAULT = "LAST_COMMIT_TIME";

public static final String ENABLE_DANGLING_DELETE_FILES_CLEAN =
"clean-dangling-delete-files.enabled";
public static final boolean ENABLE_DANGLING_DELETE_FILES_CLEAN_DEFAULT = true;

public static final String ENABLE_ORPHAN_CLEAN = "clean-orphan-file.enabled";
public static final boolean ENABLE_ORPHAN_CLEAN_DEFAULT = false;
@Deprecated public static final String ENABLE_ORPHAN_CLEAN_LEGACY = "clean-orphan-file.enable";

public static final String MIN_ORPHAN_FILE_EXISTING_TIME =
"clean-orphan-file.min-existing-time-minutes";
public static final long MIN_ORPHAN_FILE_EXISTING_TIME_DEFAULT = 2880; // 2 Days

public static final String ENABLE_TABLE_TRASH = "table-trash.enabled";
public static final boolean ENABLE_TABLE_TRASH_DEFAULT = false;

public static final String TABLE_TRASH_CUSTOM_ROOT_LOCATION = "table-trash.custom-root-location";

public static final String TABLE_TRASH_KEEP_DAYS = "table-trash.keep.days";
public static final int TABLE_TRASH_KEEP_DAYS_DEFAULT = 7; // 7 Days

public static final String TABLE_TRASH_FILE_PATTERN = "table-trash.file-pattern";
public static final String TABLE_TRASH_FILE_PATTERN_DEFAULT =
".+\\.parquet"
@@ -199,6 +200,9 @@ private TableProperties() {}
+ // v123.metadata.json
"|.*[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}-m[0-9]+\\.avro"; // UUID-m0.avro

@Deprecated public static final String ENABLE_TABLE_EXPIRE_LEGACY = "table-expire.enable";
@Deprecated public static final String ENABLE_ORPHAN_CLEAN_LEGACY = "clean-orphan-file.enable";

/** table tag management related properties */
public static final String ENABLE_AUTO_CREATE_TAG = "tag.auto-create.enabled";

@@ -226,15 +230,18 @@ private TableProperties() {}
public static final String FILE_FORMAT_PARQUET = "parquet";

public static final String FILE_FORMAT_ORC = "orc";

public static final String BASE_FILE_FORMAT = "base.write.format";
public static final String DELTA_FILE_FORMAT = "delta.write.format";
public static final String BASE_FILE_FORMAT_DEFAULT = FILE_FORMAT_PARQUET;

public static final String DELTA_FILE_FORMAT = "delta.write.format";

public static final String CHANGE_FILE_FORMAT = "change.write.format";
public static final String CHANGE_FILE_FORMAT_DEFAULT = FILE_FORMAT_PARQUET;

public static final String DEFAULT_FILE_FORMAT =
org.apache.iceberg.TableProperties.DEFAULT_FILE_FORMAT;

public static final String DEFAULT_FILE_FORMAT_DEFAULT =
org.apache.iceberg.TableProperties.DEFAULT_FILE_FORMAT_DEFAULT;

@@ -280,11 +287,11 @@ private TableProperties() {}
public static final String SPLIT_OPEN_FILE_COST =
org.apache.iceberg.TableProperties.SPLIT_OPEN_FILE_COST;
public static final long SPLIT_OPEN_FILE_COST_DEFAULT = 4 * 1024 * 1024; // 4MB

/** log store related properties */
public static final String ENABLE_LOG_STORE = "log-store.enabled";

public static final boolean ENABLE_LOG_STORE_DEFAULT = false;
@Deprecated public static final String ENABLE_LOG_STORE_LEGACY = "log-store.enable";

public static final String LOG_STORE_TYPE = "log-store.type";
public static final String LOG_STORE_STORAGE_TYPE_KAFKA = "kafka";
@@ -305,13 +312,19 @@ private TableProperties() {}

public static final String OWNER = "owner";

@Deprecated public static final String ENABLE_LOG_STORE_LEGACY = "log-store.enable";

/** table format related properties */
public static final String TABLE_FORMAT = "table-format";

public static final String MIXED_FORMAT_PRIMARY_KEY_FIELDS = "mixed-format.primary-key-fields";

public static final String MIXED_FORMAT_TABLE_STORE = "mixed-format.table-store";

public static final String MIXED_FORMAT_TABLE_STORE_BASE = "base";

public static final String MIXED_FORMAT_TABLE_STORE_CHANGE = "change";

public static final String MIXED_FORMAT_CHANGE_STORE_IDENTIFIER =
"mixed-format.change.identifier";

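For orientation, here is a hedged sketch of how a key/default constant pair such as `SELF_OPTIMIZING_MIN_PLAN_INTERVAL` is typically resolved against a table's property map. The helper below is hypothetical and not Amoro's actual API; real call sites may use Iceberg's `PropertyUtil` or similar utilities.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: resolving a table property against its default.
// The helper is illustrative only; Amoro's real call sites may differ.
public class PropertyLookupSketch {

  static long propertyAsLong(Map<String, String> props, String key, long defaultValue) {
    String value = props.get(key);
    return value == null ? defaultValue : Long.parseLong(value);
  }

  public static void main(String[] args) {
    Map<String, String> tableProps = new HashMap<>(); // stand-in for table.properties()
    tableProps.put("self-optimizing.min-plan-interval", "120000");

    long minPlanInterval =
        propertyAsLong(tableProps, "self-optimizing.min-plan-interval", 60000L);
    System.out.println(minPlanInterval); // prints 120000; falls back to 60000 if unset
  }
}
```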
119 changes: 49 additions & 70 deletions docs/admin-guides/deployment.md
@@ -14,12 +14,10 @@
You can choose to download the stable release package from [download page](../..

## System requirements

- Java 8 is required. Java 17 is required for Trino.
- Java 8 is required.
- Optional: MySQL 5.5 or higher
- Optional: PostgreSQL 14.x or higher
- Optional: ZooKeeper 3.4.x or higher
- Optional: Hive (2.x or 3.x)
- Optional: Hadoop (2.9.x or 3.x)

## Download the distribution

@@ -32,54 +30,24 @@
Unzip it to create the amoro-x.y.z directory in the same directory, and then go
You can build from the master branch without compiling Trino. The build commands and the directories of the resulting artifacts are described below:

```shell
git clone https://github.com/apache/amoro.git
cd amoro
base_dir=$(pwd)
mvn clean package -DskipTests
cd amoro-ams/dist/target/
ls
apache-amoro-x.y.z-bin.tar.gz # AMS release package
dist-x.y.z-tests.jar
dist-x.y.z.jar
archive-tmp/
maven-archiver/

cd ${base_dir}/amoro-mixed-format/amoro-mixed-format-flink/v1.15/amoro-mixed-format-flink-runtime-1.15/target
ls
amoro-mixed-format-flink-runtime-1.15-x.y.z-tests.jar
$ git clone https://github.com/apache/amoro.git
$ cd amoro
$ base_dir=$(pwd)
$ mvn clean package -DskipTests
$ cd amoro-ams/dist/target/
$ ls
amoro-x.y.z-bin.zip # AMS release package

$ cd ${base_dir}/amoro-mixed-format/amoro-mixed-format-flink/v1.15/amoro-mixed-format-flink-runtime-1.15/target
$ ls
amoro-mixed-format-flink-runtime-1.15-x.y.z.jar # Flink 1.15 runtime package
original-amoro-mixed-format-flink-runtime-1.15-x.y.z.jar
maven-archiver/

cd ${base_dir}/amoro-mixed-format/amoro-mixed-format-spark/v3.2/amoro-mixed-format-spark-runtime-3.2/target
ls
$ cd ${base_dir}/amoro-mixed-format/amoro-mixed-format-spark/v3.2/amoro-mixed-format-spark-runtime-3.2/target
$ ls
amoro-mixed-format-spark-runtime-3.2-x.y.z.jar # Spark v3.2 runtime package
amoro-mixed-format-spark-runtime-3.2-x.y.z-tests.jar
amoro-mixed-format-spark-runtime-3.2-x.y.z-sources.jar
original-amoro-mixed-format-spark-runtime-3.2-x.y.z.jar
```

If the Flink version in the amoro-ams/amoro-ams-optimizer/amoro-optimizer-flink module you compiled is lower than 1.15, you must add the `-Pflink-pre-1.15` profile to the mvn command,
for example `mvn clean package -Pflink-pre-1.15 -Dflink-optimizer.flink-version=1.14.6 -DskipTests`.

If you also need to compile the Trino module, install JDK 17 locally and configure `toolchains.xml` in the user's `${user.home}/.m2/` directory,
then run `mvn package -Ptoolchain,build-mixed-format-trino` to compile the entire project.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<toolchains>
<toolchain>
<type>jdk</type>
<provides>
<version>17</version>
<vendor>sun</vendor>
</provides>
<configuration>
<jdkHome>${YourJDK17Home}</jdkHome>
</configuration>
</toolchain>
</toolchains>
```
More build guidance can be found in the project's [README](https://github.com/apache/amoro?tab=readme-ov-file#building).

## Configuration

@@ -120,13 +88,13 @@
You can use MySQL/PostgreSQL as the system database instead of the default Derby
If you would like to use MySQL as the system database, you need to manually download the [MySQL JDBC Connector](https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.1.0/mysql-connector-j-8.1.0.jar)
and move it into the `{AMORO_HOME}/lib/` directory. You can use the following command to complete these operations:
```shell
cd ${AMORO_HOME}
MYSQL_JDBC_DRIVER_VERSION=8.0.30
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/${MYSQL_JDBC_DRIVER_VERSION}/mysql-connector-java-${MYSQL_JDBC_DRIVER_VERSION}.jar
mv mysql-connector-java-${MYSQL_JDBC_DRIVER_VERSION}.jar lib
$ cd ${AMORO_HOME}
$ MYSQL_JDBC_DRIVER_VERSION=8.0.30
$ wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/${MYSQL_JDBC_DRIVER_VERSION}/mysql-connector-java-${MYSQL_JDBC_DRIVER_VERSION}.jar
$ mv mysql-connector-java-${MYSQL_JDBC_DRIVER_VERSION}.jar lib
```

Create an empty database in MySQL/PostgreSQL, then AMS will automatically create table structures in this MySQL/PostgreSQL database when it first started.
Create an empty database in MySQL/PostgreSQL, then AMS will automatically create tables in this MySQL/PostgreSQL database when it first starts.

One more thing you need to do is add the MySQL/PostgreSQL configuration to the `config.yaml` of AMS:
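A minimal sketch of that configuration, assuming MySQL and the `ams.database` key layout used by recent versions; the exact keys in your release may differ:

```yaml
ams:
  database:
    type: mysql  # or postgres
    jdbc-driver-class: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://127.0.0.1:3306/amoro?useUnicode=true&characterEncoding=UTF8
    username: root       # illustrative credentials
    password: password
```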

@@ -172,20 +140,33 @@
AMS provides implementations of `LocalContainer` and `FlinkContainer` by default
```yaml
containers:
- name: localContainer
container-impl: org.apache.amoro.optimizer.LocalOptimizerContainer
container-impl: org.apache.amoro.server.manager.LocalOptimizerContainer
properties:
export.JAVA_HOME: "/opt/java" # JDK environment

- name: flinkContainer
container-impl: org.apache.amoro.optimizer.FlinkOptimizerContainer
container-impl: org.apache.amoro.server.manager.FlinkOptimizerContainer
properties:
flink-home: "/opt/flink/" # The installation directory of Flink
export.JVM_ARGS: "-Djava.security.krb5.conf=/opt/krb5.conf" # Submitting Flink jobs with Java parameters, such as Kerberos parameters.
export.HADOOP_CONF_DIR: "/etc/hadoop/conf/" # Hadoop configuration file directory
export.HADOOP_USER_NAME: "hadoop" # Hadoop user
export.FLINK_CONF_DIR: "/etc/hadoop/conf/" # Flink configuration file directory

- name: sparkContainer
container-impl: org.apache.amoro.server.manager.SparkOptimizerContainer
properties:
spark-home: /opt/spark/ # Spark install home
master: yarn # The cluster manager to connect to. See the list at https://spark.apache.org/docs/latest/submitting-applications.html#master-urls.
deploy-mode: cluster # Spark deploy mode, client or cluster
export.JVM_ARGS: -Djava.security.krb5.conf=/opt/krb5.conf # JVM args for launching Spark jobs, e.g. Kerberos config when using Kerberos
export.HADOOP_CONF_DIR: /etc/hadoop/conf/ # Hadoop config dir
export.HADOOP_USER_NAME: hadoop # Hadoop user for submitting jobs on YARN
export.SPARK_CONF_DIR: /opt/spark/conf/ # Spark config dir
```

More optimizer container configurations can be found in [managing optimizers](../managing-optimizers/).

### Configure terminal

The Terminal module in the AMS Dashboard allows users to execute SQL directly on the platform. Currently, the Terminal backend supports two implementations: `local` and `kyuubi`.
@@ -201,6 +182,16 @@
ams:
local.using-session-catalog-for-hive: true
```

More properties supported by the terminal include (a configuration sketch follows the table):

| Key | Default | Description |
|--------------------------|---------|---------------------------------------------------------------------------------------------------|
| terminal.backend | local | Terminal backend implementation. local, kyuubi and custom are valid values. |
| terminal.factory | - | Session factory implementation of the terminal; `terminal.backend` must be `custom` if this is set. |
| terminal.result.limit | 1000 | Row limit of the result set |
| terminal.stop-on-error | false | When a statement fails to execute, stop execution or continue executing the remaining statements. |
| terminal.session.timeout | 30 | Session timeout in minutes. |
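
A hedged sketch of how these keys could sit under `ams.terminal` in `config.yaml`, extrapolating from the snippet above; verify the nesting against your version:

```yaml
ams:
  terminal:
    backend: local
    result.limit: 1000
    stop-on-error: false
    session.timeout: 30
```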

### Configure metric reporter

Amoro provides metric reporters through a plugin mechanism to connect to external metric systems.
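For illustration only, a hypothetical reporter entry, assuming a Prometheus exporter plugin and a list-style plugin configuration; the real schema lives in your version's plugin documentation:

```yaml
metric-reporters:
  - name: prometheus-exporter  # hypothetical plugin name
    enabled: true
    properties:
      port: 7001               # illustrative exporter port
```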
@@ -266,26 +257,13 @@
The following JVM options could be set in `${AMORO_CONF_DIR}/jvm.properties`.
| jmx.remote.port | -Dcom.sun.management.jmxremote.port=${value} | Enable remote JMX debugging |
| extra.options | JAVA_OPTS="${JAVA_OPTS} ${JVM_EXTRA_CONFIG}" | Additional JVM options |
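
A short sketch of `${AMORO_CONF_DIR}/jvm.properties` using the two keys above; the values are illustrative:

```properties
jmx.remote.port=9999
extra.options=-XX:+UseG1GC -Dfile.encoding=UTF-8
```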

### Terminal configurations

The terminal supports `local` and `kyuubi` backends; the default is `local`. If you use `local`, which runs in a Spark local context, you can set **spark.*** configurations, and if you use `kyuubi`, you can set **kyuubi.*** configurations.

| Key | Default | Description |
|--------------------------|---------|---------------------------------------------------------------------------------------------------|
| terminal.backend | local | Terminal backend implementation. local, kyuubi and custom are valid values. |
| terminal.factory | - | Session factory implementation of the terminal; `terminal.backend` must be `custom` if this is set. |
| terminal.result.limit | 1000 | Row limit of the result set |
| terminal.stop-on-error | false | When a statement fails to execute, stop execution or continue executing the remaining statements. |
| terminal.session.timeout | 30 | Session timeout in minutes. |


## Start AMS

Enter the directory amoro-x.y.z and execute `bin/ams.sh start` to start AMS.

```shell
cd amoro-x.y.z
bin/ams.sh start
$ cd amoro-x.y.z
$ bin/ams.sh start
```

Comment (Contributor): apache-amoro-x.y.z-bin

Reply (Contributor, author): The extracted folder name is still amoro-x.y.z, reference: https://github.com/apache/amoro/blob/master/amoro-ams/dist/src/main/assemblies/bin.xml#L28

Then, access http://localhost:1630 through a browser to see the login page. If it appears, it means that the startup is
@@ -294,7 +272,8 @@
successful. The default username and password for login are both "admin".
You can also restart/stop AMS with the following command:

```shell
bin/ams.sh restart/stop
$ bin/ams.sh restart
$ bin/ams.sh stop
```

## Upgrade AMS
@@ -313,7 +292,7 @@
Replace all contents in the original `{AMORO_HOME}/plugin` directory with the co
Back up the old content before replacing it, so that you can roll back the upgrade operation if necessary.
{{< /hint >}}

### Configure new parameters
### Configure new properties

The old configuration file `{AMORO_HOME}/conf/config.yaml` is usually compatible with the new version, but the new version may introduce new parameters. Try to compare the configuration files of the old and new versions, and reconfigure the parameters if necessary.
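
One hedged way to spot newly introduced parameters, assuming the old and new installations sit side by side (the paths are illustrative):

```shell
$ diff ${AMORO_HOME_OLD}/conf/config.yaml ${AMORO_HOME_NEW}/conf/config.yaml
```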
