Monitoring Mediator Service #748
Bells did not go off because there was no issue with the mediator agent's health endpoint; switching to HA did not adversely affect the HTTP endpoints, only the WebSocket connections. We are currently monitoring the agent's admin health endpoints.

In particular, the mobile apps communicate through the mediator using WebSockets rather than HTTP. WebSockets are stateful, and they were being disrupted by the default routing scheme of the OpenShift routes and services: traffic between the client and server was not being routed over the same path, so the connections were breaking.

There are services available that can monitor WebSockets (see the resources section below); however, they appear to test only the connection, not long-term communication over the socket. For the mediators, the initial connection is not the issue; the long-term communication path over that connection is. The route the communications take is governed by the OpenShift routing policies mentioned above.

We are still testing updates to the mediators' HA configuration. Since WebSockets are stateful, the routing of their traffic also needs to be stateful: communication must take the same path back and forth between client and server. In an attempt to accomplish this, we have set Routes to use the

So we need two things: a reliable way to monitor the stability of the WebSocket communication channel(s), and a way to reliably scale up and down based on the number of WebSocket connections.

Resources:
Adding some additional comments I had regarding monitoring while we were troubleshooting the issues. The

Circling back to the service-monitoring conversation: there are ways to monitor the ability to establish a WebSocket connection with the mediator, but I don't think that would give us anything we don't already have with the HTTP-based monitoring. The current HTTP-based monitoring reaches through the proxy to the agent's health endpoints (specifically

If so, does anyone have thoughts on how we could test the WebSocket communications without putting a lot of overhead on the agent? I'm assuming we'd have to set up another agent to accomplish this: an agent-monitoring service. We could then integrate it with standard uptime-monitoring tools via an API. Does this sound too complicated, or is there a simple way to implement it, or an even simpler way to accomplish the monitoring?
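One way to test the long-term communication path, rather than only the initial handshake, is a probe that keeps a single connection open and exchanges a message on it at intervals, failing if any round trip goes unanswered. The Python sketch below is a hypothetical, transport-agnostic outline, not an existing tool or agent API: `send`, `receive`, and `is_reply` are assumed callables you would bind to a real WebSocket client and to whatever message the far end answers (for example a ping sent by the separate monitoring agent suggested above). `to_http_status` shows one assumed way to hand the result to an uptime tool as an HTTP status code.

```python
"""Sketch of a long-lived WebSocket communication probe.

Assumptions (hypothetical, not an existing agent or uptime-tool API):
- `send` / `receive` are async callables bound to ONE already-open connection.
- The far side answers each probe message; `is_reply(i, reply)` decides
  whether a received message answers probe number `i`.
"""
import asyncio
import json
from dataclasses import dataclass, field


@dataclass
class ProbeResult:
    rounds_attempted: int = 0
    rounds_ok: int = 0
    failures: list = field(default_factory=list)

    @property
    def healthy(self) -> bool:
        # Healthy only if at least one round ran and every round succeeded.
        return self.rounds_attempted > 0 and self.rounds_ok == self.rounds_attempted


async def probe_connection(send, receive, is_reply, rounds=5, interval=30.0, timeout=10.0):
    """Exchange `rounds` messages over the SAME connection, `interval` seconds apart.

    Unlike a connect-only check, this fails when the connection opens fine
    but later messages stop making the round trip.
    """
    result = ProbeResult()
    for i in range(rounds):
        if i:
            await asyncio.sleep(interval)  # keep the same socket open between rounds
        result.rounds_attempted += 1
        try:
            await send({"probe": i})
            reply = await asyncio.wait_for(receive(), timeout)
            if is_reply(i, reply):
                result.rounds_ok += 1
            else:
                result.failures.append(f"round {i}: unexpected reply {reply!r}")
        except asyncio.TimeoutError:
            result.failures.append(f"round {i}: no reply within {timeout}s")
        except ConnectionError as exc:
            # The socket itself is gone; further rounds on it cannot succeed.
            result.failures.append(f"round {i}: connection error: {exc}")
            break
    return result


def to_http_status(result: ProbeResult):
    """Map a probe result to (status code, JSON body) for an HTTP uptime monitor."""
    body = json.dumps({
        "healthy": result.healthy,
        "ok": result.rounds_ok,
        "attempted": result.rounds_attempted,
        "failures": result.failures,
    })
    return (200 if result.healthy else 503), body


async def demo_echo_probe():
    """Usage example: an in-memory echo 'connection' stands in for a real socket."""
    inbox = asyncio.Queue()

    async def send(message):
        await inbox.put(message)  # the fake peer echoes every message straight back

    result = await probe_connection(
        send, inbox.get, lambda i, reply: reply == {"probe": i},
        rounds=3, interval=0.0, timeout=1.0,
    )
    return to_http_status(result)


if __name__ == "__main__":
    print(asyncio.run(demo_echo_probe()))
```

Keeping the probe outside the mediator means the only extra load on the agent is a handful of messages per interval on one connection, which is in line with the goal of testing without a lot of overhead.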
@esune, is there a ticket somewhere else so we can close this one?
@cvarjao, ask Emiliano; if they have it, we can close this one.
I created bcgov/DITP-DevOps#145 to replace this issue. An alternative would be to move this issue to the DITP-DevOps repo if we want to keep the thread as-is rather than referencing it; either way works for me.
When we turned on HPA in OpenShift, something broke, and no bells went off.
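The thread also asks for a way to scale up and down based on the number of WebSocket connections. As a minimal sketch, assuming a per-pod count of open WebSocket connections were exposed to the autoscaler as a custom metric (hypothetical; we have not confirmed the mediator publishes such a metric), the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler would look like this in Python. It deliberately ignores HPA's stabilization windows and pod-readiness handling.

```python
"""Sketch of the HPA replica calculation applied to WebSocket connection counts.

Mirrors the documented Kubernetes HPA formula:
    desired = ceil(current_replicas * current_metric / target_metric)
where current_metric is the average across pods, and no change is made when
the ratio is within the tolerance (0.1 by default). The per-pod connection
metric itself is a hypothetical input here.
"""
import math


def desired_replicas(connections_per_pod, target_per_pod, min_replicas=1,
                     max_replicas=10, tolerance=0.1):
    """Return the replica count HPA would aim for, clamped to [min, max]."""
    current = len(connections_per_pod)
    if current == 0:
        raise ValueError("need the connection count of at least one running pod")
    average = sum(connections_per_pod) / current
    ratio = average / target_per_pod
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # close enough to the target: leave replicas alone
    else:
        desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Two pods holding 120 and 80 sockets against a target of 50 per pod.
    print(desired_replicas([120, 80], target_per_pod=50))
```

For example, two pods holding 120 and 80 connections against a target of 50 per pod gives an average of 100, a ratio of 2.0, and ceil(2 × 2.0) = 4 replicas. Keep in mind that scaling down terminates pods, and with them any WebSockets they hold, so clients connected to those pods have to reconnect.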