[Bug report] Updable to create fileset with minio #6156

Open
gesaleh opened this issue Jan 8, 2025 · 7 comments · May be fixed by #5406
Labels: 0.8.0 (Release v0.8.0), bug (Something isn't working)

Comments

@gesaleh

gesaleh commented Jan 8, 2025

Version

main branch

Describe what's wrong

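I'm testing the S3 integration of the Hadoop catalog with this setup:

MinIO is running in a Docker container.
A bucket named work has been created.
The Gravitino playground is deployed with Docker on the same network as the MinIO container.

I try to create a fileset catalog named catalog_s3 (the bucket name is work):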

# create a S3 catalog
curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
  "name": "catalog",
  "type": "FILESET",
  "comment": "comment",
  "provider": "hadoop",
  "properties": {
    "location": "s3a://work/root",
    "s3-access-key-id": "access_key",
    "s3-secret-access-key": "secret_key",
    "s3-endpoint": "http://minio:9000",
    "filesystem-providers": "s3"
  }
}' http://localhost:8090/api/metalakes/metalake_demo/catalogs 

This works, but when I try to create a schema, the request just hangs:

curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
  "name": "s3_schema",
  "comment": "comment",
  "properties": {
    "location": "s3a://work/schemaS3"
  }
}' http://localhost:8090/api/metalakes/metalake_demo/catalogs/catalog_s3/schemas 

I get a deadlock warning in the log. I'm probably doing something wrong; I suspect the location value.

root@7c2e4f9a4c6e:~/gravitino/conf# curl -I http://minio:9000//minio/health/live
HTTP/1.1 400 Bad Request
Accept-Ranges: bytes
Content-Length: 261
Content-Type: application/xml
Server: MinIO
Vary: Origin
Date: Tue, 07 Jan 2025 17:26:33 GMT

root@7c2e4f9a4c6e:~/gravitino/conf# curl -I minio:9000//minio/health/live
HTTP/1.1 400 Bad Request
Accept-Ranges: bytes
Content-Length: 261
Content-Type: application/xml
Server: MinIO
Vary: Origin
Date: Tue, 07 Jan 2025 17:26:40 GMT
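
MinIO's liveness endpoint is `/minio/health/live`, while the requests above use `//minio/health/live`. The doubled slash may be what triggers the 400 response rather than a connectivity problem; that is an assumption, since nothing in this thread confirms it. A minimal sketch of a probe that builds the URL without the extra slash (the helper names `minio_health_url` and `minio_is_live` are hypothetical):

```shell
# Build the MinIO liveness URL from an endpoint, avoiding a doubled slash.
minio_health_url() {
  endpoint="${1%/}"   # drop one trailing slash, if present
  printf '%s/minio/health/live\n' "$endpoint"
}

# Probe the endpoint and print only the HTTP status code.
# Needs curl and network access to the MinIO host.
minio_is_live() {
  curl -s -o /dev/null -w '%{http_code}\n' -I "$(minio_health_url "$1")"
}

# Example (not run here):
#   minio_is_live http://minio:9000
```

Running the probe from inside the Gravitino container checks both name resolution and reachability in one step.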

I added the jar file aws-bundle-0.7.0-incubating.jar to /root/gravitino/catalogs/hadoop/libs. I didn't change any values in the conf files.

Error message and/or stacktrace

2025-01-07 15:00:56 2025-01-07 14:00:56.133 WARN [tree-lock-dead-lock-checker-0] [org.apache.gravitino.lock.LockManager.lambda$checkDeadLock$1(LockManager.java:138)] - Dead lock detected for thread with identifier ThreadIdentifier{thread=Thread[Gravitino-webserver-57,5,main], ident=metalake_demo.catalog_s3} on node TreeLockNode{ident=/,hashCode=78}, threads that holding the node: {ThreadIdentifier{thread=Thread[Gravitino-webserver-65,5,main], ident=metalake_demo.catalog_s3}=1736258146570, ThreadIdentifier{thread=Thread[Gravitino-webserver-51,5,main], ident=metalake_demo.catalog_s3}=1736258261725, ThreadIdentifier{thread=Thread[Gravitino-webserver-53,5,main], ident=metalake_demo.catalog_s3}=1736258145174, ThreadIdentifier{thread=Thread[Gravitino-webserver-71,5,main], ident=metalake_demo.catalog_s3}=1736258388375, ThreadIdentifier{thread=Thread[Gravitino-webserver-52,5,main], ident=metalake_demo.catalog_s3}=1736258343241, ThreadIdentifier{thread=Thread[Gravitino-webserver-59,5,main], ident=metalake_demo.catalog_s3}=1736258261725, ThreadIdentifier{thread=Thread[Gravitino-webserver-60,5,main], ident=metalake_demo.catalog_s3}=1736258385578, ThreadIdentifier{thread=Thread[Gravitino-webserver-68,5,main], ident=metalake_demo.catalog_s3}=1736258385578, ThreadIdentifier{thread=Thread[Gravitino-webserver-57,5,main], ident=metalake_demo.catalog_s3}=1736258336696, ThreadIdentifier{thread=Thread[Gravitino-webserver-70,5,main], ident=metalake_demo.catalog_s3}=1736258343242} 
2025-01-07 15:00:56 2025-01-07 14:00:56.133 WARN [tree-lock-dead-lock-checker-0] [org.apache.gravitino.lock.LockManager.lambda$checkDeadLock$1(LockManager.java:138)] - Dead lock detected for thread with identifier ThreadIdentifier{thread=Thread[Gravitino-webserver-70,5,main], ident=metalake_demo.catalog_s3} on node TreeLockNode{ident=/,hashCode=78}, threads that holding the node: {ThreadIdentifier{thread=Thread[Gravitino-webserver-65,5,main], ident=metalake_demo.catalog_s3}=1736258146570, ThreadIdentifier{thread=Thread[Gravitino-webserver-51,5,main], ident=metalake_demo.catalog_s3}=1736258261725, ThreadIdentifier{thread=Thread[Gravitino-webserver-53,5,main], ident=metalake_demo.catalog_s3}=1736258145174, ThreadIdentifier{thread=Thread[Gravitino-webserver-71,5,main], ident=metalake_demo.catalog_s3}=1736258388375, ThreadIdentifier{thread=Thread[Gravitino-webserver-52,5,main], ident=metalake_demo.catalog_s3}=1736258343241, ThreadIdentifier{thread=Thread[Gravitino-webserver-59,5,main], ident=metalake_demo.catalog_s3}=1736258261725, ThreadIdentifier{thread=Thread[Gravitino-webserver-60,5,main], ident=metalake_demo.catalog_s3}=1736258385578, ThreadIdentifier{thread=Thread[Gravitino-webserver-68,5,main], ident=metalake_demo.catalog_s3}=1736258385578, ThreadIdentifier{thread=Thread[Gravitino-webserver-57,5,main], ident=metalake_demo.catalog_s3}=1736258336696, ThreadIdentifier{thread=Thread[Gravitino-webserver-70,5,main], ident=metalake_demo.catalog_s3}=1736258343242} 
2025-01-07 13:55:45.209 INFO [Gravitino-webserver-53] [org.apache.gravitino.gcp.shaded.com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.<clinit>(GoogleHadoopFileSystemBase.java:611)] - GHFS version: 1.9.4-hadoop3
2025-01-07 13:55:45.308 INFO [Gravitino-webserver-53] [org.apache.commons.beanutils.FluentPropertyBeanIntrospector.introspect(FluentPropertyBeanIntrospector.java:147)] - Error when creating PropertyDescriptor for public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! Ignoring this property.
2025-01-07 13:55:45.313 WARN [Gravitino-webserver-53] [org.apache.hadoop.metrics2.impl.MetricsConfig.loadFirst(MetricsConfig.java:134)] - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
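
The deadlock warnings above are long, so it can help to reduce them to the waiting thread and the identifier it is blocked on. A rough sketch, assuming the log lines keep exactly the format shown above; the function name `summarize_deadlocks` and the log path in the example are hypothetical:

```shell
# Count "Dead lock detected" warnings per waiting thread and identifier.
# Argument: path to a Gravitino server log file.
summarize_deadlocks() {
  sed -nE 's/.*Dead lock detected for thread with identifier ThreadIdentifier\{thread=Thread\[([^,]+),[^]]*\], ident=([^}]+)\} on node.*/\1 \2/p' "$1" \
    | sort | uniq -c
}

# Example (not run here; the path is a placeholder):
#   summarize_deadlocks /path/to/gravitino-server.log
```

Each output row shows a count, the thread name, and the identifier it was waiting on.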
# ls gravitino*
gravitino-api-0.7.0-incubating.jar             gravitino-core-0.7.0-incubating.jar  gravitino-server-0.7.0-incubating.jar
gravitino-catalog-common-0.7.0-incubating.jar  gravitino-docs-0.7.0-incubating.jar  gravitino-server-common-0.7.0-incubating.jar
gravitino-common-0.7.0-incubating.jar          gravitino-meta-0.7.0-incubating.jar
# pwd
/root/gravitino/libs 
root@7c2e4f9a4c6e:~/gravitino/conf# more gravitino.conf 
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#  http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#

# THE CONFIGURATION FOR Gravitino SERVER
gravitino.server.shutdown.timeout = 3000

# THE CONFIGURATION FOR Gravitino WEB SERVER
# The host name of the built-in web server
gravitino.server.webserver.host = 0.0.0.0
# The http port number of the built-in web server
gravitino.server.webserver.httpPort = 8090
# The min thread size of the built-in web server
gravitino.server.webserver.minThreads = 24
# The max thread size of the built-in web server
gravitino.server.webserver.maxThreads = 200
# The stop timeout of the built-in web server
gravitino.server.webserver.stopTimeout = 30000
# The timeout of idle connections
gravitino.server.webserver.idleTimeout = 30000
# The executor thread pool work queue size of the built-in web server
gravitino.server.webserver.threadPoolWorkQueueSize = 100
# The request header size of the built-in web server
gravitino.server.webserver.requestHeaderSize = 131072
# The response header size of the built-in web server
gravitino.server.webserver.responseHeaderSize = 131072

# THE CONFIGURATION FOR Gravitino ENTITY STORE
# The entity store to use
gravitino.entity.store = relational
# The backend for the entity store, we only supports JDBC
gravitino.entity.store.relational = JDBCBackend

# The JDBC URL for the entity store
gravitino.entity.store.relational.jdbcUrl = jdbc:h2
# The JDBC driver class name
gravitino.entity.store.relational.jdbcDriver = org.h2.Driver
# The JDBC user name
gravitino.entity.store.relational.jdbcUser = gravitino
# The JDBC password
gravitino.entity.store.relational.jdbcPassword = gravitino

# THE CONFIGURATION FOR Gravitino CATALOG
# The interval in milliseconds to evict the catalog cache
gravitino.catalog.cache.evictionIntervalMs = 3600000

# THE CONFIGURATION FOR AUXILIARY SERVICE
# Auxiliary service names, separate by ','
gravitino.auxService.names = iceberg-rest
# Iceberg REST service classpath
gravitino.auxService.iceberg-rest.classpath = iceberg-rest-server/libs, iceberg-rest-server/conf
# Iceberg REST service host
gravitino.auxService.iceberg-rest.host = 0.0.0.0
# Iceberg REST service http port
gravitino.auxService.iceberg-rest.httpPort = 9001
gravitino.auxService.iceberg-rest.catalog-backend = jdbc
gravitino.auxService.iceberg-rest.uri = jdbc:mysql://:3306/db
gravitino.auxService.iceberg-rest.warehouse = hdfs://:9000/user/iceberg/warehouse/
gravitino.auxService.iceberg-rest.jdbc.user = mysql
gravitino.auxService.iceberg-rest.jdbc.password = xxxxxxxxxxxx
gravitino.auxService.iceberg-rest.jdbc-driver = com.mysql.cj.jdbc.Driver
root@7c2e4f9a4c6e:~/gravitino/conf# 

How to reproduce

  1. Run a single-node MinIO in Docker, reachable as minio:9000.

  2. Create a bucket.

  3. Add access keys.

  4. Run the Gravitino playground on the same network as MinIO, or use localhost as the server name.

  5. Add the S3 Hadoop fileset catalog using the UI or the API.

  6. Add a schema to the catalog using the API.

Additional context

No response

@gesaleh gesaleh added the bug Something isn't working label Jan 8, 2025
@yuqi1129
Contributor

yuqi1129 commented Jan 8, 2025

@gesaleh Thank you for reporting this issue. We will promptly follow up.

@yuqi1129
Contributor

yuqi1129 commented Jan 8, 2025

@gesaleh

Connecting to MinIO works for me:

(venv-spark-3.1) ➜  [/Users/yuqi] curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
  "name": "minio_catalog_test",
  "type": "FILESET",
  "comment": "comment",
  "provider": "hadoop",
  "properties": {
    "location": "s3a://my-bucket/catalog",
    "s3-access-key-id": "ViLxBltqiqHecTEhKJZS",
    "s3-secret-access-key": "FCBhVC7tmUD9t0KyKrKo5oKvHak2LgSsADsb38Ip",
    "s3-endpoint": "http://192.168.215.4:9000",
    "filesystem-providers": "s3"
  }
}' http://localhost:8090/api/metalakes/test/catalogs
{"code":0,"catalog":{"name":"minio_catalog_test","type":"fileset","provider":"hadoop","comment":"comment","properties":{"s3-access-key-id":"ViLxBltqiqHecTEhKJZS","s3-secret-access-key":"FCBhVC7tmUD9t0KyKrKo5oKvHak2LgSsADsb38Ip","filesystem-providers":"s3","location":"s3a://my-bucket/catalog","s3-endpoint":"http://192.168.215.4:9000","in-use":"true"},"audit":{"creator":"anonymous","createTime":"2025-01-08T10:31:24.762Z","lastModifier":"anonymous","lastModifiedTime":"2025-01-08T10:31:24.762Z"}}}
(venv-spark-3.1) ➜  [/Users/yuqi] curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
  "name": "s3_schema",
  "comment": "comment",
  "properties": {
    "location": "s3a://my-bucket/schemaS3"
  }
}' http://localhost:8090/api/metalakes/test/catalogs/minio_catalog_test/schemas
{"code":0,"schema":{"name":"s3_schema","comment":"comment","properties":{"location":"s3a://my-bucket/schemaS3"},"audit":{"creator":"anonymous","createTime":"2025-01-08T10:32:24.189Z"}}}

The version I used: the latest main branch.

This is how I ran the test:

  1. Start MinIO locally:
docker run -p 9000:9000 \
    -e "MINIO_ROOT_USER=admin" \
    -e "MINIO_ROOT_PASSWORD=12345678" \
    -e "MINIO_BUCKET=my-bucket" \
    --name minio \
    minio/minio server /data

Then log in to MinIO and obtain an access key and secret key (AK/SK).

  2. Copy the aws-bundle jar to the Hadoop catalog classpath and start the Gravitino server.
  3. Create a metalake named test.
  4. Run the commands above.
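
The step of copying the aws-bundle jar to the Hadoop catalog classpath could look roughly like the sketch below. The jar name and the libs path are taken from directory listings elsewhere in this thread; the helper name `install_bundle` is hypothetical, and the restart command is an assumption about the server script, so check it against your installation.

```shell
# Copy a bundle jar into a catalog's libs directory; fail loudly on bad paths.
install_bundle() {
  jar="$1"
  libs_dir="$2"
  [ -f "$jar" ] || { echo "jar not found: $jar" >&2; return 1; }
  [ -d "$libs_dir" ] || { echo "libs dir not found: $libs_dir" >&2; return 1; }
  cp "$jar" "$libs_dir/"
}

# Example (not run here; the restart command is an assumption):
#   install_bundle gravitino-aws-bundle-0.7.0-incubating.jar /root/gravitino/catalogs/hadoop/libs
#   /root/gravitino/bin/gravitino.sh restart
```

The server has to be restarted after the copy so that the catalog picks up the new jar.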

@gesaleh
Author

gesaleh commented Jan 8, 2025

Can you give me more details about copying aws-bundle to the Hadoop catalog classpath and starting the Gravitino server? Here is what I have:

# ls gravitino*
gravitino-api-0.7.0-incubating.jar             gravitino-core-0.7.0-incubating.jar  gravitino-server-0.7.0-incubating.jar
gravitino-catalog-common-0.7.0-incubating.jar  gravitino-docs-0.7.0-incubating.jar  gravitino-server-common-0.7.0-incubating.jar
gravitino-common-0.7.0-incubating.jar          gravitino-meta-0.7.0-incubating.jar
# pwd
/root/gravitino/libs 
root@0d6f17357280:~/gravitino/catalogs/hadoop/libs# ls
accessors-smart-1.2.jar		  commons-lang-2.6.jar				      gravitino-common-0.7.0-incubating.jar	       httpclient-4.5.2.jar	      jsp-api-2.1.jar
asm-5.0.4.jar			  commons-lang3-3.4.jar				      gravitino-core-0.7.0-incubating.jar	       httpcore-4.4.4.jar	      jsr305-3.0.0.jar
avro-1.7.7.jar			  commons-logging-1.2.jar			      gravitino-gcp-bundle-0.7.0-incubating-empty.jar  jackson-annotations-2.7.8.jar  log4j-1.2.17.jar
aws-bundle-0.7.0-incubating.jar   commons-math3-3.1.1.jar			      gravitino-gcp-bundle-0.7.0-incubating.jar        jackson-core-2.7.8.jar	      nimbus-jose-jwt-4.41.1.jar
commons-beanutils-1.9.3.jar	  commons-net-3.6.jar				      gson-2.2.4.jar				       jackson-core-asl-1.9.13.jar    paranamer-2.3.jar
commons-cli-1.2.jar		  gravitino-aliyun-bundle-0.7.0-incubating-empty.jar  hadoop-annotations-3.1.0.jar		       jackson-databind-2.7.8.jar     protobuf-java-2.5.0.jar
commons-codec-1.11.jar		  gravitino-aliyun-bundle-0.7.0-incubating.jar	      hadoop-auth-3.1.0.jar			       jackson-mapper-asl-1.9.13.jar  re2j-1.1.jar
commons-collections-3.2.2.jar	  gravitino-api-0.7.0-incubating.jar		      hadoop-client-3.1.0.jar			       jcip-annotations-1.0-1.jar     stax2-api-3.1.4.jar
commons-compress-1.4.1.jar	  gravitino-aws-bundle-0.7.0-incubating-empty.jar     hadoop-common-3.1.0.jar			       jersey-servlet-1.19.jar	      token-provider-1.0.1.jar
commons-configuration2-2.1.1.jar  gravitino-aws-bundle-0.7.0-incubating.jar	      hadoop-hdfs-3.1.0.jar			       jline-0.9.94.jar		      woodstox-core-5.0.3.jar
commons-daemon-1.0.13.jar	  gravitino-catalog-common-0.7.0-incubating.jar       hadoop-hdfs-client-3.1.0.jar		       jsch-0.1.54.jar		      xz-1.0.jar
commons-io-2.5.jar		  gravitino-catalog-hadoop-0.7.0-incubating.jar       htrace-core4-4.1.0-incubating.jar		       json-smart-2.3.jar
root@0d6f17357280:~/gravitino/catalogs/hadoop/libs# 

Or should the jar go in another path?

@gesaleh
Author

gesaleh commented Jan 8, 2025

I found out that it works if I give the IP address of the container itself rather than its Docker name.
Can you also help with the fileset, and how to list the files in the bucket afterwards?

Thanks
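
The name-versus-IP behaviour reported above suggests checking whether the container name resolves from inside the Gravitino container. A sketch, assuming the MinIO container is named `minio` (as in the `docker run` example earlier in the thread) and that `docker` and `getent` are available. The helper names are hypothetical.

```shell
# Print the IP address of a container on each network it is attached to.
container_ips() {
  docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' "$1"
}

# Succeed if a host name resolves from the current machine or container.
name_resolves() {
  getent hosts "$1" >/dev/null
}

# Build an s3-endpoint value from a host (name or IP) and an optional port.
s3_endpoint() {
  printf 'http://%s:%s\n' "$1" "${2:-9000}"
}

# Example (not run here):
#   container_ips minio                                  # on the Docker host
#   name_resolves minio || echo "minio does not resolve" # inside the Gravitino container
#   s3_endpoint "$(container_ips minio | awk '{print $1}')"
```

If the name does not resolve, the two containers are probably not attached to the same user-defined Docker network, and using the IP address in `s3-endpoint` is a workaround rather than a fix.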

@yuqi1129
Contributor

yuqi1129 commented Jan 8, 2025

Can you also help with the fileset, and how to list the files in the bucket afterwards?

I'm having trouble understanding your question. Could you provide more details?

@justinmclean justinmclean changed the title [Bug report] [Bug report] Updable to create fileset with minio Jan 8, 2025
@jerryshao jerryshao added the 0.8.0 Release v0.8.0 label Jan 10, 2025
@dataageek

MinIO might require gravitino.bypass.fs.s3a.path.style.access=true config to work from Gravitino. Ref: https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/connecting.html#Third_party_stores

@yuqi1129
Contributor

MinIO might require gravitino.bypass.fs.s3a.path.style.access=true config to work from Gravitino. Ref: hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/connecting.html#Third_party_stores

In my test and the example above, this setting appears to be optional for MinIO. @dataageek, have you hit an issue related to this setting when connecting Gravitino to MinIO?
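
For anyone who wants to try the path-style suggestion, the sketch below emits a catalog payload with that property added. It assumes the `gravitino.bypass.` property is accepted as an ordinary catalog property, which this thread does not confirm. The function name `catalog_payload` is hypothetical, and the other fields mirror the request in the original report.

```shell
# Emit a fileset catalog payload with S3A path-style access enabled.
# Arguments: location, access key, secret key, endpoint.
catalog_payload() {
  cat <<EOF
{
  "name": "catalog_s3",
  "type": "FILESET",
  "comment": "comment",
  "provider": "hadoop",
  "properties": {
    "location": "$1",
    "s3-access-key-id": "$2",
    "s3-secret-access-key": "$3",
    "s3-endpoint": "$4",
    "filesystem-providers": "s3",
    "gravitino.bypass.fs.s3a.path.style.access": "true"
  }
}
EOF
}

# Example (not run here):
#   catalog_payload s3a://work/root access_key secret_key http://minio:9000 |
#     curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
#       -H "Content-Type: application/json" -d @- \
#       http://localhost:8090/api/metalakes/metalake_demo/catalogs
```

Keeping the payload in a function makes it easy to compare runs with and without the extra property.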
