new status page of all public services #3774
Comments
@LeoHChen as well as checking endpoint uptime, what do you say we also use synthetic monitoring to inspect the response payload of key API methods and ensure the data structure is valid? |
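To make the payload-inspection idea concrete, here is a minimal sketch of a structural check on a JSON-RPC 2.0 response. The function name `validate_jsonrpc_response` and the `result_type` parameter are hypothetical, chosen for illustration; the actual monitoring tool may express this differently.

```python
def validate_jsonrpc_response(payload, expected_id, result_type=dict):
    """Return a list of structural problems; an empty list means the payload looks valid."""
    if not isinstance(payload, dict):
        return ["response is not a JSON object"]

    problems = []
    if payload.get("jsonrpc") != "2.0":
        problems.append("missing or wrong 'jsonrpc' version")
    if payload.get("id") != expected_id:
        problems.append("response id does not match request id")

    if "error" in payload:
        # The endpoint answered, but the method call itself failed.
        problems.append(f"error returned: {payload['error']}")
    elif "result" not in payload:
        problems.append("neither 'result' nor 'error' present")
    elif not isinstance(payload["result"], result_type):
        # result_type is an assumption about what the method should return.
        problems.append(f"'result' is not of type {result_type.__name__}")
    return problems
```

A monitor could mark a check as failed whenever the returned list is non-empty, even if the HTTP status was 200.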
Agreed. It would be better to monitor the uptime and response time of a few key APIs. A more systematic way of monitoring RPC calls would require instrumenting the node to track the number and response time of all RPC calls. For now, however, we can just add a list of key APIs that we need to monitor. @gupadhyaya, we need your input on which APIs we should monitor in our status page/dashboard. |
There is already a request to track the response time of the following call:
curl --request POST 'http://54.189.61.183:9500' --header 'Content-Type: application/json' --data-raw '{
"jsonrpc": "2.0",
"method": "trace_block",
"params": ["0xd6739e"],
"id": 1
}' |
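As a rough sketch of how the response time of that call could be tracked from a script, the snippet below builds the same JSON-RPC request with the Python standard library and times it. The helper names `build_rpc_request` and `timed_rpc_call` and the injectable `send` parameter are hypothetical and exist only for illustration; the 10-second timeout is an assumption.

```python
import json
import time
import urllib.request


def build_rpc_request(url, method, params, request_id=1):
    """Build a JSON-RPC 2.0 POST request equivalent to the curl command above."""
    body = json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def timed_rpc_call(request, send=None, timeout=10.0):
    """Send the request and return (decoded_response, elapsed_seconds)."""
    if send is None:
        # Default transport: a plain HTTP call; the timeout value is an assumption.
        def send(req):
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.read()

    start = time.monotonic()
    raw = send(request)
    elapsed = time.monotonic() - start
    return json.loads(raw), elapsed
```

For example, `timed_rpc_call(build_rpc_request("http://54.189.61.183:9500", "trace_block", ["0xd6739e"]))` would time the request quoted above. Passing a custom `send` callable allows the timing logic to be exercised without network access.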
We need |
Maybe it is also worthwhile to add the following APIs:
|
What will be the check frequency for these critical APIs in the monitoring system? If we can extend the list a bit more, we could also include
|
It's totally up to us. Currently, critical APIs are checked once a minute. I will add the API method checks shortly. |
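With checks running once a minute, uptime over a window is simply the share of successful checks in that window. The sketch below is one hypothetical way to summarize a batch of results; `CheckResult` and `summarize` are illustrative names, not part of any existing tool.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class CheckResult:
    ok: bool                 # True if the check passed
    response_seconds: float  # measured response time of the check


def summarize(results):
    """Return (uptime_percent, median_response_seconds) or None for an empty window."""
    if not results:
        return None
    successes = [r for r in results if r.ok]
    uptime_percent = 100.0 * len(successes) / len(results)
    # Only successful checks contribute to the response-time figure.
    median_seconds = median([r.response_seconds for r in successes]) if successes else None
    return uptime_percent, median_seconds
```

At one check per minute, a 24-hour window holds 1440 results, so a single failed check lowers the daily uptime by roughly 0.07 percentage points.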
@LeoHChen I've added all the metrics as per your list. Please confirm at https://status.harmony.one/. Note that all of these monitors are automated and will reflect outages and recoveries. Outstanding items:
|
@givp we also need to implement @gupadhyaya's specific RPC tests to make sure specific features of the RPC are working correctly. @gupadhyaya, do you have the actual tests (which params to use) and the expected behavior? Since our recent issue was caused by missing recent transactions, we might need to implement some logic to detect whether the RPCs are healthy or not. |
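One possible form of such health logic is a freshness check on the latest block returned by an endpoint. This is only a sketch under stated assumptions: it assumes the block object carries a hex-encoded `timestamp` field in Unix seconds, as in Ethereum-compatible JSON-RPC responses, and the 30-second threshold is an arbitrary placeholder. It flags an endpoint that has stopped advancing; it does not by itself prove that every recent transaction is present.

```python
def block_lag_seconds(block, now):
    """Seconds between the block's timestamp and `now` (both in Unix seconds)."""
    # Assumption: `timestamp` is a hex string such as "0x64".
    return now - int(block["timestamp"], 16)


def is_endpoint_fresh(block, now, max_lag_seconds=30):
    """Treat the endpoint as healthy only if its latest block is recent enough."""
    return block_lag_seconds(block, now) <= max_lag_seconds
```

A monitor could fetch the latest block from each RPC endpoint, pass in the current time, and raise an incident when `is_endpoint_fresh` returns `False` for several consecutive checks.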
Summary
New public status page for the Harmony blockchain and services
Current Design
We currently have a status page, https://status.harmony.one/, that mostly displays the status of the internal bootnodes, validator nodes, and explorer nodes.
Problems
The current status page does not capture all the public services that may impact users, such as the uptime of the RPC endpoints. We need to improve it and also provide a single source of truth for incidents and responses.
Proposal
In the new status page, we shall display the uptime/availability of the following services on the mainnet. We may consider adding a similar page to display the status of the testnet later.
dig txt _dnsaddr.bootstrap.t.hmny.io
api.harmony.one, api.s0.t.hmny.io
ws.s0.t.hmny.io
Please add a link to the network metrics page as well. https://monitor.hmny.io/status
Reference
https://status.slack.com/