
[BUG] Securityadmin error: exits with node reported failures #1898

Open
prudhvigodithi opened this issue Jun 21, 2022 · 37 comments
Labels
bug Something isn't working good first issue These are recommended starting points for newcomers looking to make their first contributions. help wanted Community contributions are especially encouraged for these issues. triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable.

Comments

@prudhvigodithi
Member

What is the bug?
Executing /usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh fails with FAIL: Expected 2 nodes to return response, but got 0.
Full error

**************************************************************************
** This tool will be deprecated in the next major release of OpenSearch **
** https://github.com/opensearch-project/security/issues/1755           **
**************************************************************************
Security Admin v7
Will connect to my-first-cluster.default.svc.cluster.local:9200 ... done
Connected as "CN=admin,OU=my-first-cluster"
OpenSearch Version: 2.0.1
Contacting opensearch cluster 'opensearch' and wait for YELLOW clusterstate ...
Clustername: my-first-cluster
Clusterstate: GREEN
Number of nodes: 2
Number of data nodes: 1
.opendistro_security index already exists, so we do not need to create one.
Legacy index '.opendistro_security' (ES 6) detected (or forced). You should migrate the configuration!
Populate config from /usr/share/opensearch/config/opensearch-security/
Will update '/config' with /usr/share/opensearch/config/opensearch-security/config.yml (legacy mode)
   SUCC: Configuration for 'config' created or updated
Will update '/roles' with /usr/share/opensearch/config/opensearch-security/roles.yml (legacy mode)
   SUCC: Configuration for 'roles' created or updated
Will update '/rolesmapping' with /usr/share/opensearch/config/opensearch-security/roles_mapping.yml (legacy mode)
   SUCC: Configuration for 'rolesmapping' created or updated
Will update '/internalusers' with /usr/share/opensearch/config/opensearch-security/internal_users.yml (legacy mode)
   SUCC: Configuration for 'internalusers' created or updated
Will update '/actiongroups' with /usr/share/opensearch/config/opensearch-security/action_groups.yml (legacy mode)
   SUCC: Configuration for 'actiongroups' created or updated
Will update '/nodesdn' with /usr/share/opensearch/config/opensearch-security/nodes_dn.yml (legacy mode)
   SUCC: Configuration for 'nodesdn' created or updated
Will update '/whitelist' with /usr/share/opensearch/config/opensearch-security/whitelist.yml (legacy mode)
   SUCC: Configuration for 'whitelist' created or updated
Will update '/audit' with /usr/share/opensearch/config/opensearch-security/audit.yml (legacy mode)
   SUCC: Configuration for 'audit' created or updated
FAIL: 2 nodes reported failures. Failure is /{"_nodes":{"total":2,"successful":0,"failed":2,"failures":[{"type":"failed_node_exception","reason":"Failed node [E_Dyk7VUR_ee4wykVYJSoA]","node_id":"E_Dyk7VUR_ee4wykVYJSoA","caused_by":{"type":"static_resource_exception","reason":"static_resource_exception: Unable to load static tenants"}},{"type":"failed_node_exception","reason":"Failed node [G4U098vuRCGF8RTI3KPRPA]","node_id":"G4U098vuRCGF8RTI3KPRPA","caused_by":{"type":"static_resource_exception","reason":"Unable to load static tenants"}}]},"cluster_name":"my-first-cluster","configupdate_response":{"nodes":{},"node_size":0,"has_failures":true,"failures_size":2}}
FAIL: Expected 2 nodes to return response, but got 0
Done with failures

How can one reproduce the bug?
Start the Docker container with persistent storage; executing /usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh then throws this error.

What is the expected behavior?
Executing the securityadmin script should create the security index as expected. When it works, it logs a success message such as:

**************************************************************************
** This tool will be deprecated in the next major release of OpenSearch **
** https://github.com/opensearch-project/security/issues/1755           **
**************************************************************************
Security Admin v7
Will connect to my-first-cluster.default.svc.cluster.local:9200 ... done
Connected as "CN=admin,OU=my-first-cluster"
OpenSearch Version: 2.0.1
Contacting opensearch cluster 'opensearch' and wait for YELLOW clusterstate ...
Clustername: my-first-cluster
Clusterstate: YELLOW
Number of nodes: 2
Number of data nodes: 1
.opendistro_security index already exists, so we do not need to create one.
Legacy index '.opendistro_security' (ES 6) detected (or forced). You should migrate the configuration!
Populate config from /usr/share/opensearch/config/opensearch-security/
Will update '/config' with /usr/share/opensearch/config/opensearch-security/config.yml (legacy mode)
   SUCC: Configuration for 'config' created or updated
Will update '/roles' with /usr/share/opensearch/config/opensearch-security/roles.yml (legacy mode)
   SUCC: Configuration for 'roles' created or updated
Will update '/rolesmapping' with /usr/share/opensearch/config/opensearch-security/roles_mapping.yml (legacy mode)
   SUCC: Configuration for 'rolesmapping' created or updated
Will update '/internalusers' with /usr/share/opensearch/config/opensearch-security/internal_users.yml (legacy mode)
   SUCC: Configuration for 'internalusers' created or updated
Will update '/actiongroups' with /usr/share/opensearch/config/opensearch-security/action_groups.yml (legacy mode)
   SUCC: Configuration for 'actiongroups' created or updated
Will update '/nodesdn' with /usr/share/opensearch/config/opensearch-security/nodes_dn.yml (legacy mode)
   SUCC: Configuration for 'nodesdn' created or updated
Will update '/whitelist' with /usr/share/opensearch/config/opensearch-security/whitelist.yml (legacy mode)
   SUCC: Configuration for 'whitelist' created or updated
Will update '/audit' with /usr/share/opensearch/config/opensearch-security/audit.yml (legacy mode)
   SUCC: Configuration for 'audit' created or updated
SUCC: Expected 7 config types for node {"updated_config_types":["config","roles","rolesmapping","internalusers","actiongroups","nodesdn","audit"],"updated_config_size":7,"message":null} is 7 (["config","roles","rolesmapping","internalusers","actiongroups","nodesdn","audit"]) due to: null
SUCC: Expected 7 config types for node {"updated_config_types":["config","roles","rolesmapping","internalusers","actiongroups","nodesdn","audit"],"updated_config_size":7,"message":null} is 7 (["config","roles","rolesmapping","internalusers","actiongroups","nodesdn","audit"]) due to: null
Done with success

What is your host/environment?

  • OpenSearch version: 2.0.1
  • Environment: Docker container docker.io/opensearchproject/opensearch:2.0.1

Do you have any additional context?
Following the earlier issue opensearch-project/helm-charts#158, this was not resolved by setting config_version: 2 in action_groups.yml. Is there a correlation with config_version: 2?

This issue is raised to help make the OpenSearch Kubernetes Operator compatible with the 2.0.x series of OpenSearch.
opensearch-project/opensearch-k8s-operator#176

@prudhvigodithi prudhvigodithi added bug Something isn't working untriaged Require the attention of the repository maintainers and may need to be prioritized labels Jun 21, 2022
@prudhvigodithi
Member Author

This could be the issue:

.opendistro_security index does not exists, attempt to create it ... ERR: An unexpected SocketTimeoutException occured: 30,000 milliseconds timeout on connection http-outgoing-5 [ACTIVE]
Trace:
java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-5 [ACTIVE]
	at org.opensearch.client.RestClient.extractAndWrapCause(RestClient.java:905)
	at org.opensearch.client.RestClient.performRequest(RestClient.java:307)
	at org.opensearch.client.RestClient.performRequest(RestClient.java:295)
	at org.opensearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1762)
	at org.opensearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1745)
	at org.opensearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1709)
	at org.opensearch.client.IndicesClient.create(IndicesClient.java:159)
	at org.opensearch.security.tools.SecurityAdmin.createConfigIndex(SecurityAdmin.java:1171)
	at org.opensearch.security.tools.SecurityAdmin.execute(SecurityAdmin.java:677)
	at org.opensearch.security.tools.SecurityAdmin.main(SecurityAdmin.java:161)
Caused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-5 [ACTIVE]
	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387)
	at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92)
	at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39)
	at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175)
	at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:261)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:502)
	at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:211)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280)
	at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
	at java.base/java.lang.Thread.run(Thread.java:833)

Trying to create the index too early causes this error; after a certain period, I then get Done with success. It would be helpful if this could be handled by the /usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh script itself.

@smlx

smlx commented Jun 24, 2022

Can you please share a full docker-compose.yaml reproducing this error?

@prudhvigodithi
Member Author

Can you please share a full docker-compose.yaml reproducing this error?

Hey @smlx, I can also try with a docker-compose.yaml, but I have seen this with a k8s StatefulSet when executing /usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh -cacert /certs/ca.crt -cert /certs/tls.crt -key /certs/tls.key -cd /usr/share/opensearch/config/opensearch-security -icl -nhnv -h my-first-cluster.default.svc.cluster.local -p 9200 immediately after the cluster is launched. That said, it works with a sleep 60 followed by re-running the command. So, from the error below, I think .opendistro_security was not properly created, hence it fails with:

FAIL: 2 nodes reported failures. Failure is /{"_nodes":{"total":2,"successful":0,"failed":2,"failures":[{"type":"failed_node_exception","reason":"Failed node [E_Dyk7VUR_ee4wykVYJSoA]","node_id":"E_Dyk7VUR_ee4wykVYJSoA","caused_by":{"type":"static_resource_exception","reason":"static_resource_exception: Unable to load static tenants"}},{"type":"failed_node_exception","reason":"Failed node [G4U098vuRCGF8RTI3KPRPA]","node_id":"G4U098vuRCGF8RTI3KPRPA","caused_by":{"type":"static_resource_exception","reason":"Unable to load static tenants"}}]},"cluster_name":"my-first-cluster","configupdate_response":{"nodes":{},"node_size":0,"has_failures":true,"failures_size":2}}
FAIL: Expected 2 nodes to return response, but got 0
Done with failures

Additionally, even without the sleep 60 I have not seen this when using emptyDir: {}, only with persistent storage, since provisioning the persistent storage and allowing the contents to be written to it (pre-warming) adds a little more time.
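
For reference, the workaround boils down to something like the following retry loop (a sketch of the hack only, not a proper fix; paths and host are taken from the command above, and the cap of 20 attempts is arbitrary):

ADMIN=/usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh
count=0
# Keep retrying securityadmin so slow persistent volumes have time to warm up.
until $ADMIN -cacert /certs/ca.crt -cert /certs/tls.crt -key /certs/tls.key \
  -cd /usr/share/opensearch/config/opensearch-security -icl -nhnv \
  -h my-first-cluster.default.svc.cluster.local -p 9200 || (( count++ >= 20 )); do
  sleep 60
done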

@prudhvigodithi
Member Author

Additionally, once I get the java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-5 [ACTIVE], the next time I run securityadmin.sh I get the message .opendistro_security index already exists, so we do not need to create one. rather than .opendistro_security index does not exists, attempt to create it ... done (0-all replicas)

@daxxog

daxxog commented Jun 27, 2022

I ran into this same issue with securityadmin.sh timing out (java.net.SocketTimeoutException: 30,000 milliseconds) when trying to deploy an OpenSearch 2.0.0 cluster on Kubernetes. I run securityadmin.sh as a k8s Job, which fails with the timeout, re-creates the pod, and yields the same .opendistro_security index already exists, so we do not need to create one. and Done with failures on the next run(s). My cluster health was green and I was running three manager nodes in a StatefulSet. I deleted the namespace I was running the OpenSearch cluster in and rebuilt the cluster using emptyDir: {} as @prudhvigodithi mentioned, which yielded a successful result when running securityadmin.sh! My suspicion is that the StorageClass I am using in my Kubernetes environment is too slow to run OpenSearch. I do know that the StorageClass is backed by slow spinning disks, whereas the nodes are on SSDs, which would explain why emptyDir: {} worked without timeouts while an actual persistent volume failed.

@prudhvigodithi
Member Author

Hey @daxxog, just to add: as a hack, the sleep approach should work temporarily, but there is still a chance this can re-occur. And yes, this happens with slow disk startups when using a StorageClass.
CC. @bbarani @peterzhuamazon

@daxxog

daxxog commented Jun 27, 2022

Even with the sleep I was unable to get it to work, as my cluster storage was just slow in general (not just at startup). What I ended up doing:

  • Created a 3-node manager cluster using emptyDir: {}
  • Ran securityadmin.sh, without errors
  • Added 6 manager-eligible data nodes to the cluster, using my slow StorageClass
  • Waited for cluster to be green
  • Deleted the original 3 manager nodes
  • Waited for cluster to be green
  • Added 3 manager nodes back, with slow StorageClass
  • Waited for cluster to be green
  • Deleted one data node at a time, waiting for cluster to be green in-between each deletion

@DarshitChanpura
Member

[Triage] Hey @prudhvigodithi, looks like you got this resolved. For anyone with an issue, please open a new GitHub issue. For anyone looking for support, please open an issue on the forum so we can track it.

@DarshitChanpura DarshitChanpura removed the untriaged Require the attention of the repository maintainers and may need to be prioritized label Jun 28, 2022
@prudhvigodithi
Member Author

Hey @DarshitChanpura, I used a hack to make it work, but it looks like it's not yet resolved for @daxxog; he ended up with another hack. The idea here is for securityadmin to handle the connection to the cluster and create the .opendistro_security index once the cluster is fully ready. Could we consider reopening this so it can be tracked here?
Thank you

@DarshitChanpura
Member

Hey @prudhvigodithi, would you mind providing more details on how to reproduce this, as it seems like a migration issue from an older version? What steps need to be followed to reproduce the issue independently, including which previous version you were using?

@prudhvigodithi
Member Author

Hey @DarshitChanpura, this is an installation of version 2.0.1 from scratch; here are the steps to reproduce it in Kubernetes. For 1.x versions this error somehow does not show up, as the connection is initiated via the transport client on port 9300; I have seen it with the HTTP connection on port 9200 (or another port set via http.port).

  1. Create securityadmin.sh as a k8s Job that can connect to the cluster that has just started.
apiVersion: batch/v1
kind: Job
metadata:
  generation: 1
  labels:
    controller-uid: 0881c5cd-a44a-4d34-938f-2f05a14807de
    job-name: my-first-cluster-securityconfig-update
  name: my-first-cluster-securityconfig-update
  namespace: default
spec:
  backoffLimit: 0
  completionMode: NonIndexed
  completions: 1
  parallelism: 1
  selector:
    matchLabels:
      controller-uid: 0881c5cd-a44a-4d34-938f-2f05a14807de
  suspend: false
  template:
    metadata:
      creationTimestamp: null
      labels:
        controller-uid: 0881c5cd-a44a-4d34-938f-2f05a14807de
        job-name: my-first-cluster-securityconfig-update
      name: my-first-cluster-securityconfig-update
    spec:
      containers:
      - args:
        - ADMIN=/usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh;chmod
          +x $ADMIN; count=0; until $ADMIN -cacert /certs/ca.crt -cert /certs/tls.crt -key /certs/tls.key
          -cd /usr/share/opensearch/config/opensearch-security -icl -nhnv -h my-first-cluster.default.svc.cluster.local
          -p 9200 || (( count++ >= 20 )); do  sleep 20; done
        command:
        - /bin/bash
        - -c
        image: docker.io/opensearchproject/opensearch:2.0.1
        imagePullPolicy: IfNotPresent
        name: updater
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/opensearch/config/tls-transport
          name: transport-cert
        - mountPath: /usr/share/opensearch/config/tls-http
          name: http-cert
        - mountPath: /usr/share/opensearch/config/opensearch-security/action_groups.yml
          name: securityconfig
          readOnly: true
          subPath: action_groups.yml
        - mountPath: /usr/share/opensearch/config/opensearch-security/config.yml
          name: securityconfig
          readOnly: true
          subPath: config.yml
        - mountPath: /usr/share/opensearch/config/opensearch-security/internal_users.yml
          name: securityconfig
          readOnly: true
          subPath: internal_users.yml
        - mountPath: /usr/share/opensearch/config/opensearch-security/nodes_dn.yml
          name: securityconfig
          readOnly: true
          subPath: nodes_dn.yml
        - mountPath: /usr/share/opensearch/config/opensearch-security/roles.yml
          name: securityconfig
          readOnly: true
          subPath: roles.yml
        - mountPath: /usr/share/opensearch/config/opensearch-security/roles_mapping.yml
          name: securityconfig
          readOnly: true
          subPath: roles_mapping.yml
        - mountPath: /usr/share/opensearch/config/opensearch-security/tenants.yml
          name: securityconfig
          readOnly: true
          subPath: tenants.yml
        - mountPath: /usr/share/opensearch/config/opensearch-security/whitelist.yml
          name: securityconfig
          readOnly: true
          subPath: whitelist.yml
        - mountPath: /certs
          name: admin-cert
      dnsPolicy: ClusterFirst
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 5
      volumes:
      - name: transport-cert
        secret:
          defaultMode: 420
          secretName: my-first-cluster-transport-cert
      - name: http-cert
        secret:
          defaultMode: 420
          secretName: my-first-cluster-http-cert
      - name: securityconfig
        secret:
          defaultMode: 420
          secretName: securityconfig-secret
      - name: admin-cert
        secret:
          defaultMode: 420
          secretName: my-first-cluster-admin-cert
  2. The cluster uses a persistent storage backend (StorageClass); the backend can be AWS EBS or any cloud provider's storage.

  3. While the cluster is being initialized, the job above invokes securityadmin.sh as:

/usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh -cacert /certs/ca.crt -cert /certs/tls.crt -key /certs/tls.key -cd /usr/share/opensearch/config/opensearch-security -icl -nhnv -h my-first-cluster.default.svc.cluster.local -p 9200
  4. The logs of the job are:
    Will connect to my-first-cluster.default.svc.cluster.local:9300 ... done
    Then it errors out as follows:
.opendistro_security index does not exists, attempt to create it ... ERR: An unexpected SocketTimeoutException occured: 30,000 milliseconds timeout on connection http-outgoing-5 [ACTIVE]
Trace:
java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-5 [ACTIVE]
	at org.opensearch.client.RestClient.extractAndWrapCause(RestClient.java:905)
	at org.opensearch.client.RestClient.performRequest(RestClient.java:307)
	at org.opensearch.client.RestClient.performRequest(RestClient.java:295)
	at org.opensearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1762)
	at org.opensearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1745)
	at org.opensearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1709)
	at org.opensearch.client.IndicesClient.create(IndicesClient.java:159)
	at org.opensearch.security.tools.SecurityAdmin.createConfigIndex(SecurityAdmin.java:1171)
	at org.opensearch.security.tools.SecurityAdmin.execute(SecurityAdmin.java:677)
	at org.opensearch.security.tools.SecurityAdmin.main(SecurityAdmin.java:161)
Caused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-5 [ACTIVE]

After a few minutes, upon retry:

FAIL: 2 nodes reported failures. Failure is /{"_nodes":{"total":2,"successful":0,"failed":2,"failures":[{"type":"failed_node_exception","reason":"Failed node [tFlDyOnUTv2jmIxX7ZT3Gw]","node_id":"tFlDyOnUTv2jmIxX7ZT3Gw","caused_by":{"type":"static_resource_exception","reason":"static_resource_exception: Unable to load static tenants"}},{"type":"failed_node_exception","reason":"Failed node [-5cpR334Sm-4GGtL7f46yQ]","node_id":"-5cpR334Sm-4GGtL7f46yQ","caused_by":{"type":"static_resource_exception","reason":"Unable to load static tenants"}}]},"cluster_name":"my-first-cluster","configupdate_response":{"nodes":{},"node_size":0,"has_failures":true,"failures_size":2}}

Meanwhile the cluster logs show a catch-22: the cluster won't start because the security config is not initialized, but the job responsible for running securityadmin fails with java.net.SocketTimeoutException: 30,000.
logs: [2022-06-28T19:51:24,580][ERROR][o.o.s.a.BackendRegistry ] [my-first-cluster-bootstrap-0] Not yet initialized (you may need to run securityadmin)

@cliu123
Member

cliu123 commented Jun 28, 2022

@prudhvigodithi The scenario that you have does not seem to be exactly the same as the one originally filed. Looking at the issue description, I see the following in the logs, which indicates that there was an existing .opendistro_security index in the cluster. So there is more than one use case that needs to be investigated here.

.opendistro_security index already exists, so we do not need to create one.
Legacy index '.opendistro_security' (ES 6) detected (or forced). You should migrate the configuration!

@prudhvigodithi
Member Author

prudhvigodithi commented Jun 28, 2022

Hey @cliu123, it starts with java.net.SocketTimeoutException: 30,000, then upon retry it says .opendistro_security index already exists and errors with FAIL: 2 nodes reported failures. Failure is /{"_nodes":{"total":2,"successful":0,"failed":2,"failures":[{"type":"failed_node_exception","reason":"Failed node
That is what I raised this issue about.

I suspect .opendistro_security gets created even with the java.net.SocketTimeoutException: 30,000 error.
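
One way to verify that suspicion (a sketch; it assumes the admin certificate mounted at /certs is accepted for REST calls even while security is uninitialized) is to query the index directly:

curl -ks --cert /certs/tls.crt --key /certs/tls.key --cacert /certs/ca.crt \
  "https://my-first-cluster.default.svc.cluster.local:9200/_cat/indices/.opendistro_security?v"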

Full Error log

k logs my-first-cluster-securityconfig-update--1-9z55w -f
Waiting to connect to the cluster
OpenSearch Security not initialized.**************************************************************************
** This tool will be deprecated in the next major release of OpenSearch **
** https://github.com/opensearch-project/security/issues/1755           **
**************************************************************************
Security Admin v7
Will connect to my-first-cluster.default.svc.cluster.local:9400 ... done
Connected as "CN=admin,OU=my-first-cluster"
OpenSearch Version: 2.0.1
Contacting opensearch cluster 'opensearch' and wait for YELLOW clusterstate ...
Clustername: my-first-cluster
Clusterstate: GREEN
Number of nodes: 1
Number of data nodes: 0
.opendistro_security index does not exists, attempt to create it ... ERR: An unexpected SocketTimeoutException occured: 30,000 milliseconds timeout on connection http-outgoing-5 [ACTIVE]
Trace:
java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-5 [ACTIVE]
	at org.opensearch.client.RestClient.extractAndWrapCause(RestClient.java:905)
	at org.opensearch.client.RestClient.performRequest(RestClient.java:307)
	at org.opensearch.client.RestClient.performRequest(RestClient.java:295)
	at org.opensearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1762)
	at org.opensearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1745)
	at org.opensearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1709)
	at org.opensearch.client.IndicesClient.create(IndicesClient.java:159)
	at org.opensearch.security.tools.SecurityAdmin.createConfigIndex(SecurityAdmin.java:1171)
	at org.opensearch.security.tools.SecurityAdmin.execute(SecurityAdmin.java:677)
	at org.opensearch.security.tools.SecurityAdmin.main(SecurityAdmin.java:161)
Caused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-5 [ACTIVE]
	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387)
	at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92)
	at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39)
	at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175)
	at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:261)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:502)
	at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:211)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280)
	at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
	at java.base/java.lang.Thread.run(Thread.java:833)


**************************************************************************
** This tool will be deprecated in the next major release of OpenSearch **
** https://github.com/opensearch-project/security/issues/1755           **
**************************************************************************
Security Admin v7
Will connect to my-first-cluster.default.svc.cluster.local:9400 ... done
Connected as "CN=admin,OU=my-first-cluster"
OpenSearch Version: 2.0.1
Contacting opensearch cluster 'opensearch' and wait for YELLOW clusterstate ...
Clustername: my-first-cluster
Clusterstate: GREEN
Number of nodes: 2
Number of data nodes: 1
.opendistro_security index already exists, so we do not need to create one.
Legacy index '.opendistro_security' (ES 6) detected (or forced). You should migrate the configuration!
Populate config from /usr/share/opensearch/config/opensearch-security/
Will update '/config' with /usr/share/opensearch/config/opensearch-security/config.yml (legacy mode)
   SUCC: Configuration for 'config' created or updated
Will update '/roles' with /usr/share/opensearch/config/opensearch-security/roles.yml (legacy mode)
   SUCC: Configuration for 'roles' created or updated
Will update '/rolesmapping' with /usr/share/opensearch/config/opensearch-security/roles_mapping.yml (legacy mode)
   SUCC: Configuration for 'rolesmapping' created or updated
Will update '/internalusers' with /usr/share/opensearch/config/opensearch-security/internal_users.yml (legacy mode)
   SUCC: Configuration for 'internalusers' created or updated
Will update '/actiongroups' with /usr/share/opensearch/config/opensearch-security/action_groups.yml (legacy mode)
   SUCC: Configuration for 'actiongroups' created or updated
Will update '/nodesdn' with /usr/share/opensearch/config/opensearch-security/nodes_dn.yml (legacy mode)
   SUCC: Configuration for 'nodesdn' created or updated
Will update '/whitelist' with /usr/share/opensearch/config/opensearch-security/whitelist.yml (legacy mode)
   SUCC: Configuration for 'whitelist' created or updated
Will update '/audit' with /usr/share/opensearch/config/opensearch-security/audit.yml (legacy mode)
   SUCC: Configuration for 'audit' created or updated
FAIL: 2 nodes reported failures. Failure is /{"_nodes":{"total":2,"successful":0,"failed":2,"failures":[{"type":"failed_node_exception","reason":"Failed node [tFlDyOnUTv2jmIxX7ZT3Gw]","node_id":"tFlDyOnUTv2jmIxX7ZT3Gw","caused_by":{"type":"static_resource_exception","reason":"static_resource_exception: Unable to load static tenants"}},{"type":"failed_node_exception","reason":"Failed node [-5cpR334Sm-4GGtL7f46yQ]","node_id":"-5cpR334Sm-4GGtL7f46yQ","caused_by":{"type":"static_resource_exception","reason":"Unable to load static tenants"}}]},"cluster_name":"my-first-cluster","configupdate_response":{"nodes":{},"node_size":0,"has_failures":true,"failures_size":2}}
FAIL: Expected 2 nodes to return response, but got 0
Done with failures
**************************************************************************
** This tool will be deprecated in the next major release of OpenSearch **
** https://github.com/opensearch-project/security/issues/1755           **
**************************************************************************
Security Admin v7
Will connect to my-first-cluster.default.svc.cluster.local:9400 ... done
Connected as "CN=admin,OU=my-first-cluster"
OpenSearch Version: 2.0.1
Contacting opensearch cluster 'opensearch' and wait for YELLOW clusterstate ...
Clustername: my-first-cluster
Clusterstate: GREEN
Number of nodes: 2
Number of data nodes: 1
.opendistro_security index already exists, so we do not need to create one.
Legacy index '.opendistro_security' (ES 6) detected (or forced). You should migrate the configuration!
Populate config from /usr/share/opensearch/config/opensearch-security/
Will update '/config' with /usr/share/opensearch/config/opensearch-security/config.yml (legacy mode)
   SUCC: Configuration for 'config' created or updated
Will update '/roles' with /usr/share/opensearch/config/opensearch-security/roles.yml (legacy mode)
   SUCC: Configuration for 'roles' created or updated
Will update '/rolesmapping' with /usr/share/opensearch/config/opensearch-security/roles_mapping.yml (legacy mode)
   SUCC: Configuration for 'rolesmapping' created or updated
Will update '/internalusers' with /usr/share/opensearch/config/opensearch-security/internal_users.yml (legacy mode)
   SUCC: Configuration for 'internalusers' created or updated
Will update '/actiongroups' with /usr/share/opensearch/config/opensearch-security/action_groups.yml (legacy mode)
   SUCC: Configuration for 'actiongroups' created or updated
Will update '/nodesdn' with /usr/share/opensearch/config/opensearch-security/nodes_dn.yml (legacy mode)
   SUCC: Configuration for 'nodesdn' created or updated
Will update '/whitelist' with /usr/share/opensearch/config/opensearch-security/whitelist.yml (legacy mode)
   SUCC: Configuration for 'whitelist' created or updated
Will update '/audit' with /usr/share/opensearch/config/opensearch-security/audit.yml (legacy mode)
   SUCC: Configuration for 'audit' created or updated
FAIL: 2 nodes reported failures. Failure is /{"_nodes":{"total":2,"successful":0,"failed":2,"failures":[{"type":"failed_node_exception","reason":"Failed node [tFlDyOnUTv2jmIxX7ZT3Gw]","node_id":"tFlDyOnUTv2jmIxX7ZT3Gw","caused_by":{"type":"static_resource_exception","reason":"static_resource_exception: Unable to load static tenants"}},{"type":"failed_node_exception","reason":"Failed node [-5cpR334Sm-4GGtL7f46yQ]","node_id":"-5cpR334Sm-4GGtL7f46yQ","caused_by":{"type":"static_resource_exception","reason":"Unable to load static tenants"}}]},"cluster_name":"my-first-cluster","configupdate_response":{"nodes":{},"node_size":0,"has_failures":true,"failures_size":2}}
FAIL: Expected 2 nodes to return response, but got 0
Done with failures
**************************************************************************
** This tool will be deprecated in the next major release of OpenSearch **
** https://github.com/opensearch-project/security/issues/1755           **
**************************************************************************
Security Admin v7
Will connect to my-first-cluster.default.svc.cluster.local:9400 ... done
Connected as "CN=admin,OU=my-first-cluster"
OpenSearch Version: 2.0.1
Contacting opensearch cluster 'opensearch' and wait for YELLOW clusterstate ...
Clustername: my-first-cluster
Clusterstate: GREEN
Number of nodes: 2
Number of data nodes: 1
.opendistro_security index already exists, so we do not need to create one.
Legacy index '.opendistro_security' (ES 6) detected (or forced). You should migrate the configuration!
Populate config from /usr/share/opensearch/config/opensearch-security/
Will update '/config' with /usr/share/opensearch/config/opensearch-security/config.yml (legacy mode)
   SUCC: Configuration for 'config' created or updated
Will update '/roles' with /usr/share/opensearch/config/opensearch-security/roles.yml (legacy mode)
   SUCC: Configuration for 'roles' created or updated
Will update '/rolesmapping' with /usr/share/opensearch/config/opensearch-security/roles_mapping.yml (legacy mode)
   SUCC: Configuration for 'rolesmapping' created or updated
Will update '/internalusers' with /usr/share/opensearch/config/opensearch-security/internal_users.yml (legacy mode)
   SUCC: Configuration for 'internalusers' created or updated
Will update '/actiongroups' with /usr/share/opensearch/config/opensearch-security/action_groups.yml (legacy mode)
   SUCC: Configuration for 'actiongroups' created or updated
Will update '/nodesdn' with /usr/share/opensearch/config/opensearch-security/nodes_dn.yml (legacy mode)
   SUCC: Configuration for 'nodesdn' created or updated
Will update '/whitelist' with /usr/share/opensearch/config/opensearch-security/whitelist.yml (legacy mode)
   SUCC: Configuration for 'whitelist' created or updated
Will update '/audit' with /usr/share/opensearch/config/opensearch-security/audit.yml (legacy mode)
   SUCC: Configuration for 'audit' created or updated
FAIL: 2 nodes reported failures. Failure is /{"_nodes":{"total":2,"successful":0,"failed":2,"failures":[{"type":"failed_node_exception","reason":"Failed node [tFlDyOnUTv2jmIxX7ZT3Gw]","node_id":"tFlDyOnUTv2jmIxX7ZT3Gw","caused_by":{"type":"static_resource_exception","reason":"static_resource_exception: Unable to load static tenants"}},{"type":"failed_node_exception","reason":"Failed node [-5cpR334Sm-4GGtL7f46yQ]","node_id":"-5cpR334Sm-4GGtL7f46yQ","caused_by":{"type":"static_resource_exception","reason":"Unable to load static tenants"}}]},"cluster_name":"my-first-cluster","configupdate_response":{"nodes":{},"node_size":0,"has_failures":true,"failures_size":2}}
FAIL: Expected 2 nodes to return response, but got 0

@daxxog

daxxog commented Jun 30, 2022

Hey @prudhvigodithi Would you mind providing more details on how to reproduce this as it seems like a migration issue from an older version. What are the steps needed to follow to reproduce the issue independently, including which previous version you were using?

My issue was also encountered during a from-scratch installation, not an upgrade.

@prudhvigodithi
Member Author

I have done some research and have findings for this: the securityadmin client should set .setSocketTimeout on the RestHighLevelClient. Initially I assumed it was net.ipv4.tcp_keepalive_time dropping the connections, but that's not true; I see the same behavior even after passing that sysctl via PodSecurityContext. So the socket timeout should be set during client creation.

@peternied
Member

[Triage] @cliu123 could you look into this and let us know what your findings are?

@cliu123
Member

cliu123 commented Jul 12, 2022

The error does not happen when using emptyDir as @prudhvigodithi mentioned.

@prudhvigodithi
Member Author

I have some findings for the securityadmin error: the fix has to be on the client end (which in our case is securityadmin) to use .setSocketTimeout for the RestHighLevelClient.
So the socket timeout should be set during client creation.
Something like .setSocketTimeout(OpenSearchConfig().getClientSocketTimeout());
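
A rough sketch of what I mean (OpenSearchConfig().getClientSocketTimeout() above is a hypothetical config hook; the builder callback below is the standard opensearch-rest-client API, with TLS setup omitted for brevity):

import org.apache.http.HttpHost;
import org.opensearch.client.RestClient;
import org.opensearch.client.RestClientBuilder;
import org.opensearch.client.RestHighLevelClient;

public class AdminClientFactory {
    // Hypothetical configurable value; today securityadmin effectively gets the 30 s default.
    private static final int SOCKET_TIMEOUT_MS = 120_000;

    public static RestHighLevelClient create(String host, int port) {
        RestClientBuilder builder = RestClient.builder(new HttpHost(host, port, "https"))
            // The socket timeout lives on the underlying Apache RequestConfig,
            // not on RestHighLevelClient itself.
            .setRequestConfigCallback(requestConfig -> requestConfig
                .setConnectTimeout(5_000)
                .setSocketTimeout(SOCKET_TIMEOUT_MS));
        return new RestHighLevelClient(builder);
    }
}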

@cliu123
Member

cliu123 commented Jul 12, 2022

@prudhvigodithi Good finding! If you'd like to PR the fix, we'd love to review. Thanks!

@elkh510

elkh510 commented Jul 19, 2022

Hi, any updates? Thank you!

@cliu123 cliu123 added the help wanted Community contributions are especially encouraged for these issues. label Jul 19, 2022
@Freeaqingme

I'm having a similar problem as well. I'm trying to bootstrap a 2.2.0 cluster from scratch, and when running securityadmin I get:

OPENSEARCH_JAVA_HOME=/opt/opensearch/opensearch-2.2.0/jdk/ bash -x /opt/opensearch/opensearch-2.2.0/plugins/opensearch-security/tools/securityadmin.sh -cd /etc/opensearch/opensearch-security/ -icl -nhnv -cacert /etc/opensearch-pki/root-ca.pem -cert /etc/opensearch-pki/client-CLUSTERADMIN.pem -key /etc/opensearch-pki/client-CLUSTERADMIN-key.pem
+ echo '**************************************************************************'
**************************************************************************
+ echo '** This tool will be deprecated in the next major release of OpenSearch **'
** This tool will be deprecated in the next major release of OpenSearch **
+ echo '** https://github.com/opensearch-project/security/issues/1755           **'
** https://github.com/opensearch-project/security/issues/1755           **
+ echo '**************************************************************************'
**************************************************************************
+ SCRIPT_PATH=/opt/opensearch/opensearch-2.2.0/plugins/opensearch-security/tools/securityadmin.sh
++ command -v realpath
+ '[' -x /usr/bin/realpath ']'
++++ realpath /opt/opensearch/opensearch-2.2.0/plugins/opensearch-security/tools/securityadmin.sh
+++ dirname /opt/opensearch/opensearch-2.2.0/plugins/opensearch-security/tools/securityadmin.sh
++ cd /opt/opensearch/opensearch-2.2.0/plugins/opensearch-security/tools
++ pwd -P
+ DIR=/opt/opensearch/opensearch-2.2.0/plugins/opensearch-security/tools
+ BIN_PATH=java
+ '[' '!' -z /opt/opensearch/opensearch-2.2.0/jdk/ ']'
+ BIN_PATH=/opt/opensearch/opensearch-2.2.0/jdk//bin/java
+ /opt/opensearch/opensearch-2.2.0/jdk//bin/java -Dorg.apache.logging.log4j.simplelog.StatusLogger.level=OFF -cp '/opt/opensearch/opensearch-2.2.0/plugins/opensearch-security/tools/../*:/opt/opensearch/opensearch-2.2.0/plugins/opensearch-security/tools/../../../lib/*:/opt/opensearch/opensearch-2.2.0/plugins/opensearch-security/tools/../deps/*' org.opensearch.security.tools.SecurityAdmin -cd /etc/opensearch/opensearch-security/ -icl -nhnv -cacert /etc/opensearch-pki/root-ca.pem -cert /etc/opensearch-pki/client-CLUSTERADMIN.pem -key /etc/opensearch-pki/client-CLUSTERADMIN-key.pem
Security Admin v7
Will connect to localhost:9200 ... done
Connected as "CN=CLIENT-CLUSTERADMIN"
OpenSearch Version: 2.2.0
Contacting opensearch cluster 'opensearch' and wait for YELLOW clusterstate ...
Clustername: foobar-ote
Clusterstate: GREEN
Number of nodes: 1
Number of data nodes: 1
.opendistro_security index already exists, so we do not need to create one.
Legacy index '.opendistro_security' (ES 6) detected (or forced). You should migrate the configuration!
Populate config from /etc/opensearch/opensearch-security/
Will update '/config' with /etc/opensearch/opensearch-security/config.yml (legacy mode)
   SUCC: Configuration for 'config' created or updated
Will update '/roles' with /etc/opensearch/opensearch-security/roles.yml (legacy mode)
   SUCC: Configuration for 'roles' created or updated
Will update '/rolesmapping' with /etc/opensearch/opensearch-security/roles_mapping.yml (legacy mode)
   SUCC: Configuration for 'rolesmapping' created or updated
Will update '/internalusers' with /etc/opensearch/opensearch-security/internal_users.yml (legacy mode)
   SUCC: Configuration for 'internalusers' created or updated
Will update '/actiongroups' with /etc/opensearch/opensearch-security/action_groups.yml (legacy mode)
   SUCC: Configuration for 'actiongroups' created or updated
Will update '/nodesdn' with /etc/opensearch/opensearch-security/nodes_dn.yml (legacy mode)
   SUCC: Configuration for 'nodesdn' created or updated
Will update '/whitelist' with /etc/opensearch/opensearch-security/whitelist.yml (legacy mode)
   SUCC: Configuration for 'whitelist' created or updated
Will update '/audit' with /etc/opensearch/opensearch-security/audit.yml (legacy mode)
   SUCC: Configuration for 'audit' created or updated
FAIL: 1 nodes reported failures. Failure is /{"_nodes":{"total":1,"successful":0,"failed":1,"failures":[{"type":"failed_node_exception","reason":"Failed node [MerZFlm7TM-pyIlvp2QKwA]","node_id":"MerZFlm7TM-pyIlvp2QKwA","caused_by":{"type":"static_resource_exception","reason":"Unable to load static tenants"}}]},"cluster_name":"foobar-ote","configupdate_response":{"nodes":{},"node_size":0,"has_failures":true,"failures_size":1}}
FAIL: Expected 1 nodes to return response, but got 0
Done with failures
sh -x foo  3.35s user 0.17s system 156% cpu 2.252 total

The script finishes in ~3 seconds, so I'm not sure if it's related to a timeout. In my case it's run on Ubuntu, without k8s or anything. Is there an obvious fix I'm missing here?

@peternied
Member

@Freeaqingme Have you examined the local opensearch log? Exceptions/errors should be visible to help you troubleshoot what is at issue.

@Freeaqingme

Actually, I was about to update this. It's possible that this specific node was installed with 2.1.0 and upgraded to 2.2.0 right after. I did follow the logs, but the only exception I noticed was "Unable to load static tenants". Having said that, I removed its data dir, and afterwards everything ran just fine. So in this case, that was one way of solving the problem...

@prudhvigodithi
Member Author

For me, I got this with a fresh installation itself, so it has nothing to do with old-version nodes ;). Just following up to see if there is any update on this issue?
@peternied @cliu123
Thank you

@kcharikrish

kcharikrish commented Sep 16, 2022

Hello all,
I have tried to explain my scenario in points:

  1. Installed OpenSearch and OpenSearch Dashboards (version 2.2.1) using yum.
  2. Was able to start both OpenSearch and OpenSearch Dashboards without any issues.
  3. Accessed OpenSearch Dashboards using Chrome; even data ingestion from Metricbeat was successful.
  4. As it was a single-node cluster, I tried building a cluster formation with one cluster-master node and two data nodes.
  5. After making the required changes as listed in the documentation, I was able to start the cluster-master.
  6. But when issuing a curl command against the cluster, I was getting "OpenSearch Security not initialized".
  7. So, per my research, I am following this issue to sort out the securityadmin problem.

I request the group's help to see whether this is a known issue and to confirm whether any workaround is in place.

Execution of securityadmin leads to the following errors:
./securityadmin.sh -cd /etc/opensearch/opensearch-security -rev -icl -nhnv -cacert ../../../config/root-ca.pem -cert ../../../config/kirk.pem -key ../../../config/kirk-key.pem -h <private-IP> -p 9200 --accept-red-cluster

Logs from security admin execution:

Security Admin v7
Will connect to 10.1.4.117:9200 ... done
Connected as "CN=kirk,OU=client,O=client,L=test,C=de"
OpenSearch Version: 2.2.1
Contacting opensearch cluster 'opensearch' ...
Clustername: opensearch-cluster
Clusterstate: RED
Number of nodes: 1
Number of data nodes: 0
.opendistro_security index already exists, so we do not need to create one.
ERR: .opendistro_security index state is RED.
Legacy index '.opendistro_security' (ES 6) detected (or forced). You should migrate the configuration!
Populate config from /etc/opensearch/opensearch-security/
Will update '/config' with /etc/opensearch/opensearch-security/config.yml (legacy mode)
   FAIL: Configuration for 'config' failed because of java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-6 [ACTIVE]
Will update '/roles' with /etc/opensearch/opensearch-security/roles.yml (legacy mode)
   FAIL: Configuration for 'roles' failed because of java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-7 [ACTIVE]
Will update '/rolesmapping' with /etc/opensearch/opensearch-security/roles_mapping.yml (legacy mode)
   FAIL: Configuration for 'rolesmapping' failed because of java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-8 [ACTIVE]
Will update '/internalusers' with /etc/opensearch/opensearch-security/internal_users.yml (legacy mode)
   FAIL: Configuration for 'internalusers' failed because of java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-9 [ACTIVE]
Will update '/actiongroups' with /etc/opensearch/opensearch-security/action_groups.yml (legacy mode)
   FAIL: Configuration for 'actiongroups' failed because of java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-10 [ACTIVE]
Will update '/nodesdn' with /etc/opensearch/opensearch-security/nodes_dn.yml (legacy mode)
   FAIL: Configuration for 'nodesdn' failed because of java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-11 [ACTIVE]
Will update '/whitelist' with /etc/opensearch/opensearch-security/whitelist.yml (legacy mode)
   FAIL: Configuration for 'whitelist' failed because of java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-12 [ACTIVE]
Will update '/audit' with /etc/opensearch/opensearch-security/audit.yml (legacy mode)
   FAIL: Configuration for 'audit' failed because of java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-13 [ACTIVE]
ERR: cannot upload configuration, see errors above

@davidlago davidlago added the triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable. label Oct 10, 2022
@stephen-crawford
Contributor

[Triage] @peternied, can you follow up on this issue to make sure it still remains? Thank you.

@peternied
Member

@kcharikrish To capture the discussion about your issue, I've created #2173 so we can close out this older issue.

@prudhvigodithi
Member Author

Hey @peternied, I would suggest keeping this issue open and tracking progress here, as there are a lot of good points and discussions from multiple users. Also, the concern raised by @kcharikrish is the same one raised earlier by myself and others; the only difference I see is that the installation is via yum rather than Docker or k8s, but the error is the same. Long story short, let's keep this issue open until we have a fix.
Thank you

@peternied
Member

@prudhvigodithi There is a lot of discussion on this issue, making it hard to understand what is a problem that needs resolution, what was an incidental issue, and what remains. What scenario is broken that you think needs to be resolved?

@prudhvigodithi
Member Author

prudhvigodithi commented Oct 18, 2022

@peternied the issue is still the same as stated in #1898 (comment), #1898 (comment), #1898 (comment), and #1898 (comment). I have proposed a solution in #1898 (comment); this could help fix the issue.

There is a lot of discussion on this issue making it hard to understand

In one way it's good to have a lot of discussion; it helps to validate the scenarios once we have the solution. Ultimately, I'm just trying to keep this issue open so that everyone facing this problem is aware once it is fixed. That said, I'm fine with however you want to proceed.
Thanks

@peternied
Member

@prudhvigodithi Thanks for reviewing the issue. To me, the key part that needs to be followed up on is:

I have some findings for the securityadmin error: the fix has to be on the client end (which in our case is securityadmin) to use .setSocketTimeout for the RestHighLevelClient. So the socket timeout should be set during client creation.
Something like .setSocketTimeout(OpenSearchConfig().getClientSocketTimeout());

It looks like it might be useful to set this to a larger timeout for environments where the cluster is slower to start up, whereas today it's fixed at the default of 30 seconds.

@stephen-crawford stephen-crawford added good first issue These are recommended starting points for newcomers looking to make their first contributions. hacktoberfest Global event that encourages people to contribute to open-source. labels Oct 24, 2022
@davidlago davidlago removed the hacktoberfest Global event that encourages people to contribute to open-source. label Nov 2, 2022
@stephen-crawford
Contributor

stephen-crawford commented Jan 30, 2023

[Triage] This issue remains a great first issue for a contributor. The ideal solution would be for the timeout to be configurable, so that systems where 30 seconds is already long can maintain speed while other users can extend it. Testing will be required to make sure the changes are safe and function with different configuration options.

@kannanvr

We are also facing a similar issue now. Has any fix happened recently?

@aggarwalShivani

Hi @peternied, @prudhvigodithi
I had a look at the RestHighLevelClient class, and it does not have a way to set the socket timeout.

Any suggestions on how this could be solved?
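
(For reference, the closest hook I could find is not on RestHighLevelClient itself but on the low-level RestClientBuilder it wraps; a sketch, assuming the standard client API:)

// The socket timeout is set via the builder's request-config callback,
// then the high-level client is constructed around it.
String host = "localhost"; // wherever the cluster's HTTP endpoint is
RestClientBuilder builder = RestClient.builder(new HttpHost(host, 9200, "https"))
    .setRequestConfigCallback(rc -> rc.setSocketTimeout(120_000)); // e.g. larger than the 30 s default
RestHighLevelClient client = new RestHighLevelClient(builder);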

@prudhvigodithi
Member Author

prudhvigodithi commented Jan 22, 2024

Hey @scrawfor99, following up on your comment above: an even better approach would be to have ./securityadmin.sh poll the cluster status, wait until it is ready, and only then create the security index. An option like --wait-for-cluster could be passed to ./securityadmin.sh to check whether the cluster is fully up and running before invoking the create-index operation; when not passed, it would keep the existing behavior of directly invoking the security index creation.
Adding @dblock @bbarani @peternied

@aggarwalShivani, at this point, to overcome this, we added a sleep in a while loop as a quick hack.
Example as https://github.com/opensearch-project/opensearch-k8s-operator/blob/main/opensearch-operator/pkg/reconcilers/securityconfig.go#L32-L37
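
A rough sketch of what the proposed --wait-for-cluster behavior could look like as an external wrapper in the meantime (the flag itself does not exist today; this assumes the admin certificate is accepted for the health API while security is still uninitialized, as securityadmin's own YELLOW-state check suggests):

HOST=my-first-cluster.default.svc.cluster.local
ADMIN=/usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh
# Poll cluster health until it stops timing out, then run securityadmin once.
until curl -ks --cert /certs/tls.crt --key /certs/tls.key --cacert /certs/ca.crt \
  "https://$HOST:9200/_cluster/health?wait_for_status=yellow&timeout=10s" \
  | grep -q '"timed_out":false'; do
  echo "cluster not ready yet, retrying..."
  sleep 10
done
$ADMIN -cacert /certs/ca.crt -cert /certs/tls.crt -key /certs/tls.key \
  -cd /usr/share/opensearch/config/opensearch-security -icl -nhnv -h "$HOST" -p 9200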

@aggarwalShivani

Hi @prudhvigodithi,
Securityadmin already polls for cluster status here and waits for the health to be at least yellow before running further steps (unless we are running with the --accept-red-cluster flag).

Are you suggesting to invoke this check again before creating the security index (e.g. here), or somewhere else perhaps?

@prudhvigodithi
Member Author

Hey @aggarwalShivani, if you check this error log in my comment, even though the cluster status is green (Clusterstate: GREEN) the error still appears; so along with the green check, there has to be another check to avoid this error.
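
For example, an additional pre-flight check could wait on the security index itself rather than only the overall cluster state (a sketch of the idea, not existing securityadmin behavior):

curl -ks --cert /certs/tls.crt --key /certs/tls.key --cacert /certs/ca.crt \
  "https://my-first-cluster.default.svc.cluster.local:9200/_cluster/health/.opendistro_security?wait_for_status=yellow&timeout=30s"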
