Qubership Hive-Metastore is a comprehensive solution for deploying Hive Metastore in Kubernetes.
In Kubernetes, Qubership Hive-Metastore uses MinIO S3 storage to store data and PostgreSQL to store metadata.
The table below shows the services in Kubernetes that can potentially replace services in a Hadoop cluster.
Note: Apache Hive consists of two large parts, Hive Metastore and HiveServer2. This project contains only Hive Metastore and does not include HiveServer2 libraries. Hive Metastore libraries can be found at https://repo1.maven.org/maven2/org/apache/hive/hive-standalone-metastore-server .
- `chart` - Helm charts for Qubership Hive-Metastore.
- `docker` - files for building the Qubership Hive-Metastore Docker image.
- `docs` - Qubership Hive-Metastore documentation.
After deploying to Kubernetes, a `log4j2-properties` ConfigMap is created, in which the Hive logging level can be changed.
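As a rough illustration, a fragment of such a ConfigMap might look like the sketch below. The data key and logger names here are assumptions following standard Log4j2 properties conventions; check the generated ConfigMap in your cluster for the actual layout.

```yaml
# Hypothetical fragment of the log4j2-properties ConfigMap;
# only the logging level line would typically need to change.
data:
  log4j2.properties: |
    # Raise or lower the Hive logging level here, e.g. INFO, WARN, DEBUG
    rootLogger.level = INFO
```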
It is possible to connect to the deployed Qubership Hive-Metastore using Trino or Spark. An example PySpark configuration can be found below:
```python
from pyspark.sql import SparkSession

...

spark = (
    SparkSession.builder.master("local")
    .appName("MyApp.com")
    # Warehouse location in S3 and the metastore version/jars to use
    .config("spark.sql.warehouse.dir", "s3a://hive/warehouse")
    .config("spark.sql.hive.metastore.version", "3.1.3")
    .config("spark.sql.hive.metastore.jars", "maven")
    # Thrift endpoint of the deployed Hive-Metastore
    .config("spark.hadoop.hive.metastore.uris", "thrift://hive_address")
    .config("spark.hadoop.hive.metastore.schema.verification", "false")
    .config("spark.hadoop.hive.metastore.schema.verification.record.version", "false")
    .config("spark.hadoop.hive.metastore.use.SSL", "false")
    # S3 (MinIO) connection settings
    .config("spark.hadoop.fs.s3.buckets.create.enabled", "true")
    .config("spark.hadoop.fs.s3a.endpoint", "https://s3.endpoint.address.com")
    .config("spark.hadoop.fs.s3a.access.key", "s3accesskey")
    .config("spark.hadoop.fs.s3a.secret.key", "s3secretkey")
    .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    # Disable AWS SDK certificate checking (for self-signed S3 endpoints)
    .config("spark.driver.extraJavaOptions", "-Dcom.amazonaws.sdk.disableCertChecking")
    .config("spark.executor.extraJavaOptions", "-Dcom.amazonaws.sdk.disableCertChecking")
    .enableHiveSupport()
    .getOrCreate()
)
```
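For Trino, the connection is configured through a Hive catalog properties file rather than session options. A minimal sketch is shown below; the file name, port, and all endpoint/credential values are assumptions to be replaced with your deployment's actual addresses.

```properties
# etc/catalog/hive.properties -- hypothetical example; substitute the
# metastore and S3 endpoints of your Qubership Hive-Metastore deployment
connector.name=hive
hive.metastore.uri=thrift://hive_address:9083
hive.s3.endpoint=https://s3.endpoint.address.com
hive.s3.aws-access-key=s3accesskey
hive.s3.aws-secret-key=s3secretkey
hive.s3.path-style-access=true
```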