[BUG] Securityadmin error: exits with node reported failures #1898
Comments
This could be an issue
Trying to create an index early, causing this error, after a certain period, then I get |
Can you please share a full |
Hey @smlx, I can even try with
Additionally even without sleep 60's I have not seen when using |
Additionally, once I get this |
I ran into this same issue with |
Hey @daxxog, just to add: as a hack, the sleep should work temporarily, but there is still a chance the error can recur. Also, yes, this is caused by slow disk startups when using |
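A minimal sketch of the alternative to a fixed sleep discussed above: retrying a command a bounded number of times with a growing delay. The helper name `run_with_retries` and its parameters are hypothetical and not part of securityadmin. This only works around slow startup and does not address the underlying cause.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=5, initial_delay=1.0, backoff=2.0, sleep=time.sleep):
    """Run cmd until it exits with status 0, waiting longer after each failure.

    Returns the 1-based number of the attempt that succeeded and raises
    RuntimeError once every attempt has failed. All names are illustrative.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Wait before the next attempt, then grow the delay.
            sleep(delay)
            delay *= backoff
    raise RuntimeError(f"command failed after {attempts} attempts: {cmd}")
```

The `cmd` argument would be a list starting with the securityadmin.sh path from the issue description, followed by whatever arguments you already pass. Bounding the attempts keeps a job from looping forever on a cluster that never becomes ready.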
Even with the sleep I was unable to get it to work, as my cluster storage was just slow in general (not just at startup). What I ended up doing:
|
[Triage] Hey @prudhvigodithi, it looks like you got this resolved. For anyone with an issue, please open a new GitHub issue. For anyone looking for support, please open an issue on the forum so we can track it. |
Hey @DarshitChanpura, I have used a hack to make it work, but it looks like it is not yet resolved for @daxxog, who ended up with another hack. The idea here is for |
Hey @prudhvigodithi, would you mind providing more details on how to reproduce this, as it seems like a migration issue from an older version? What are the steps needed to reproduce the issue independently, including which previous version you were using? |
Hey @DarshitChanpura, this is an installation from scratch of version 2.0.1; here are the steps to reproduce it in Kubernetes. For 1.x versions this error somehow does not show up, as the connection is initiated via the transport client using port 9300; I have seen this with port 9200 (or with another port from
After few mins Upon retry
The cluster logs meanwhile: This is a catch-22. The cluster won't start because securityadmin has not been initialized, but the job that is responsible for running securityadmin fails with |
@prudhvigodithi The scenario that you have does not seem to be exactly the same as the one originally filed. Looking at the issue description, I see the following in the logs. That indicates that there was an existing
|
Hey @cliu123, it starts with I suspect the Full Error log
|
My issue was also encountered during a from-scratch installation, not an upgrade. |
I have done some research and have some findings for this: the securityadmin client should have |
[Triage] @cliu123 could you look into this and let us know what your findings are? |
The error does not happen when using |
I have some findings for the securityadmin error: the fix has to come from the client end (which in our case is securityadmin) to use |
@prudhvigodithi Good finding! If you'd like to PR the fix, we'd love to review. Thanks! |
hi |
I'm having a similar problem as well. Am trying to bootstrap a 2.2.0 cluster from scratch, and when running securityadmin I get:
The script finishes in ~3 seconds, so I'm not sure if it's related to a timeout. In my case it's run on Ubuntu, without k8s or anything. Is there an obvious fix I'm missing here? |
@Freeaqingme Have you examined the local opensearch log? Exceptions/errors should be visible to help you troubleshoot what is at issue. |
Actually I was about to update this. It's possible that this specific node was installed with 2.1.0, and right after upgraded to 2.2.0. I did follow the logs, but the only exception I noticed was "Unable to load static tenants". Having said that, I removed its datadir and afterwards everything ran just fine. So in this case, that was one way of solving the problem... |
I got this with a fresh installation, so it has nothing to do with nodes from an older version ;). Just following up to see if there is any update on this issue? |
Hello All,
Requesting the group's help to see if this is a known issue and to confirm whether any workaround is in place. Execution of securityadmin is leading to the following errors: Logs from securityadmin execution:
|
[TRIAGE] @peternied can you follow up with this issue to make sure the issue remains. Thank you. |
@kcharikrish To capture the discussion about your issue, I've created #2173 so we can close out this older issue. |
Hey @peternied, I would suggest keeping this issue open and tracking progress here, as there are a lot of good points and discussions from multiple users. Also, the concern raised by @kcharikrish is the same one that was raised earlier by myself and others; the only difference I see is that the installation is by yum and not by docker or k8s, but the error is the same. Long story short, let's keep this issue open until we have a fix. |
@prudhvigodithi There is a lot of discussion on this issue making it hard to understand what is a problem that needs resolution, what was an incidental issue, and what remains. What scenario is broken that you think needs to be resolved? |
@peternied the issue is still the same as stated in #1898 (comment), #1898 (comment), #1898 (comment), #1898 (comment). I have proposed a solution here #1898 (comment); this could help fix the issue.
In one way it's good to have a lot of discussion, as it helps to validate the scenarios once we have the solution. Finally, I'm just trying to keep this issue open so that everyone facing this problem is aware once it is fixed. I'm fine with however you want to proceed. |
@prudhvigodithi Thanks for reviewing the issue, to me the key part that needs to be followed up on is:
It looks like it might be useful to set this to a larger timeout for environments where the cluster is slower to start up, whereas today it's fixed at the default of 30 seconds. |
[Triage] This issue remains a great first issue for a contributor. The ideal solution would be for the startup time to be configurable, so that systems where 30 seconds is already long could maintain speed and other users could extend the startup time. Testing will be required to make sure that the changes are safe and function with different configuration options. |
We are also facing a similar issue now. Has there been any fix recently? |
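To illustrate the configurable startup time suggested above, here is a rough sketch, not the plugin's actual implementation. The environment variable `SECURITYADMIN_STARTUP_TIMEOUT` and both function names are hypothetical, chosen only for this example; the default of 30 seconds mirrors the fixed value mentioned in the thread.

```python
import os
import time


def startup_timeout_seconds(env=os.environ, default=30.0):
    """Read the timeout from a hypothetical SECURITYADMIN_STARTUP_TIMEOUT variable.

    Falls back to the 30-second default when the variable is unset.
    """
    raw = env.get("SECURITYADMIN_STARTUP_TIMEOUT")
    if raw is None:
        return default
    value = float(raw)
    if value <= 0:
        raise ValueError("timeout must be positive")
    return value


def wait_until(is_ready, timeout, interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or timeout seconds have elapsed.

    Returns True on readiness and False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would pass `wait_until` a readiness check of its own choosing together with `startup_timeout_seconds()`. Keeping the default at 30 seconds leaves fast environments unchanged, while slower clusters can opt in to a longer wait.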
Hi @peternied, @prudhvigodithi Any suggestions on how this could be solved? |
Hey @scrawfor99 from your comment above, even better way is having @aggarwalShivani at this point to overcome this added sleep in a while loop as quick hack. |
Hi @prudhvigodithi, Are you suggesting to again invoke this check before creating the security index (for ex. here) or somewhere else perhaps? |
Hey @aggarwalShivani if you check this error log in my comment, even though the cluster status is green ( |
What is the bug?
Executing /usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh throws the error:
FAIL: Expected 2 nodes to return response, but got 0
Full error
How can one reproduce the bug?
Start the docker container with some persistent storage; executing /usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh then throws this error.
What is the expected behavior?
Executing the securityadmin script should create a security index as expected. When it works, it logs a success message as
What is your host/environment?
docker.io/opensearchproject/opensearch:2.0.1
Do you have any additional context?
Following the past issue opensearch-project/helm-charts#158, this was not resolved with config_version: 2 in action_groups.yml. Is there a correlation with config_version: 2?
This issue is raised to help make the OpenSearch Kubernetes Operator compatible with the 2.0.0 series of OpenSearch.
opensearch-project/opensearch-k8s-operator#176