new status page of all public services #3774

Open
LeoHChen opened this issue Jun 15, 2021 · 10 comments
Assignees
Labels
design Design and architectural plans/issues

Comments

@LeoHChen
Contributor

LeoHChen commented Jun 15, 2021

Summary

A new public status page for the Harmony blockchain and its services.

Current Design

We currently have a status page, https://status.harmony.one/, that mostly displays the status of the internal bootnodes, validator nodes, and explorer nodes.

Problems

The current status page doesn't capture all the public services, such as the uptime of the RPC endpoints, whose outages may impact users. We need to improve it and also provide a single source of truth for incidents and the response to them.

Proposal

On the new status page, we shall display the uptime/availability of the following mainnet services. We may consider adding a similar page for the testnet later.

  • bootstrap nodes uptime, checking the connectivity of the specific port
    dig txt _dnsaddr.bootstrap.t.hmny.io
  • uptime of all API RPC endpoints, checking the connectivity of the RPC port
    api.harmony.one, api.s0.t.hmny.io
  • uptime of the WSS endpoints, using WebSocket connectivity check
    ws.s0.t.hmny.io
  • explorer, the frontend and backend services that serve https://explorer.harmony.one
  • staking dashboard, the frontend and backend services that serve https://staking.harmony.one
  • graph nodes backend
  • bridge service, the frontend and backend services that serve https://bridge.harmony.one, for both the ETH and BSC bridges
  • multi-sig service, the frontend and backend services that serve https://multisig.harmony.one
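
A minimal Python sketch of the port-connectivity checks in the list above. It assumes the bootstrap TXT records follow the libp2p `dnsaddr=` multiaddr convention (for example `dnsaddr=/ip4/<ip>/tcp/<port>/p2p/<peer-id>`), which this thread does not confirm. The helper names `parse_dnsaddr` and `tcp_port_open` are hypothetical.

```python
import socket
from typing import Optional, Tuple


def parse_dnsaddr(txt_record: str) -> Optional[Tuple[str, int]]:
    """Extract (host, port) from a TXT record in the assumed dnsaddr format.

    Returns None when the record is not a dnsaddr entry or lacks a host
    or a TCP port.
    """
    prefix = "dnsaddr="
    if not txt_record.startswith(prefix):
        return None
    parts = txt_record[len(prefix):].strip("/").split("/")
    host = None
    port = None
    # A multiaddr is a flat sequence of protocol/value pairs.
    for proto, value in zip(parts[0::2], parts[1::2]):
        if proto in ("ip4", "ip6", "dns4", "dns6"):
            host = value
        elif proto == "tcp" and value.isdigit():
            port = int(value)
    if host is None or port is None:
        return None
    return host, port


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

The same `tcp_port_open` call could cover the RPC endpoints, e.g. `tcp_port_open("api.harmony.one", 443)`, assuming the endpoint is served over HTTPS on port 443. The WebSocket check needs a WebSocket client library and is not sketched here.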

Please also add a link to the network metrics page: https://monitor.hmny.io/status

Reference

https://status.slack.com/

@LeoHChen LeoHChen added the design Design and architectural plans/issues label Jun 15, 2021
@LeoHChen LeoHChen pinned this issue Jun 15, 2021
@givp

givp commented Jun 15, 2021

@LeoHChen in addition to checking endpoint uptime, what do you say we also use synthetic monitoring to inspect the response payload of key API methods and ensure the data structure is valid?
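
As a rough illustration of such a payload check, the sketch below validates only the JSON-RPC 2.0 envelope of a response body. It is not the monitoring tool's actual logic; the function name `validate_jsonrpc_response` is hypothetical.

```python
import json
from typing import Any, List


def validate_jsonrpc_response(raw: str, expected_id: Any) -> List[str]:
    """Return the problems found in a JSON-RPC 2.0 response body (empty if none)."""
    try:
        body = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(body, dict):
        return ["response is not a JSON object"]

    problems = []
    if body.get("jsonrpc") != "2.0":
        problems.append("missing or wrong 'jsonrpc' version")
    if body.get("id") != expected_id:
        problems.append("response id does not match request id")
    has_result = "result" in body
    has_error = "error" in body
    if has_result == has_error:
        # The spec requires exactly one of the two members.
        problems.append("exactly one of 'result' or 'error' must be present")
    elif has_error:
        problems.append(f"node returned an error: {body['error']}")
    return problems
```

A monitor could mark an endpoint as degraded whenever the returned list is non-empty, even though the port itself is reachable.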

@LeoHChen
Contributor Author

Agreed. It would be better to monitor the uptime/response time of a few key APIs. A more systematic way of monitoring RPC calls would require instrumenting the node to keep track of the number and response time of all RPC calls. However, for now, we can just add a list of key APIs that we need to monitor.
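
To make "number and response time of all RPC calls" concrete, here is a small client-side sketch in Python. It is not the node instrumentation itself, which would live in the node's own code. The class `RpcStats` and its methods are hypothetical names.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


class RpcStats:
    """Collects call counts and response times per RPC method name."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self._durations: Dict[str, List[float]] = defaultdict(list)

    def timed(self, method: str, call: Callable[[], T]) -> T:
        """Run call() and record how long it took under the method name."""
        start = self._clock()
        try:
            return call()
        finally:
            # Record the duration even when the call raises.
            self._durations[method].append(self._clock() - start)

    def summary(self) -> Dict[str, Dict[str, float]]:
        """Return count, mean, and max response time (seconds) per method."""
        return {
            method: {
                "count": len(times),
                "mean": sum(times) / len(times),
                "max": max(times),
            }
            for method, times in self._durations.items()
        }
```

The clock is injectable so the aggregation can be tested without real network calls.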

@gupadhyaya, we need your input on which APIs we should monitor on our status page/dashboard.

@LeoHChen
Contributor Author

There is already a request from @hypnagonia to track the response time of trace_block: #3780

curl --request POST 'http://54.189.61.183:9500' --header 'Content-Type: application/json' --data-raw '{
    "jsonrpc": "2.0",
    "method": "trace_block",
    "params": ["0xd6739e"],
    "id": 1
}'
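
For a monitor that needs the response time rather than just the output, the same request could be issued from Python. This is a sketch using only the standard library; `build_rpc_request` and `timed_rpc_call` are hypothetical helper names, and the 30-second default timeout is an assumption.

```python
import json
import time
import urllib.request
from typing import Any, List, Tuple


def build_rpc_request(url: str, method: str, params: List[Any],
                      request_id: int = 1) -> urllib.request.Request:
    """Build the same JSON-RPC POST request that the curl command sends."""
    payload = {"jsonrpc": "2.0", "method": method,
               "params": params, "id": request_id}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def timed_rpc_call(request: urllib.request.Request,
                   timeout: float = 30.0) -> Tuple[Any, float]:
    """Send the request; return (decoded JSON body, elapsed seconds)."""
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body, time.perf_counter() - start
```

Usage against the node from the curl example would be `timed_rpc_call(build_rpc_request("http://54.189.61.183:9500", "trace_block", ["0xd6739e"]))`.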

@gupadhyaya
Contributor

gupadhyaya commented Jun 15, 2021

We need hmy_getTransactionReceipt, hmyv2_getTransactionReceipt, and the WebSocket subscription to logs, which I think calls hmy_getLogs. These are the two key APIs for the bridge.

@gupadhyaya
Contributor

gupadhyaya commented Jun 15, 2021

Maybe it is also worthwhile to add the following APIs:

  • hmy_getTransactionsHistory & hmyv2_getTransactionsHistory - related to account page loading
  • hmy_call & hmyv2_call - for smart contract calls.

@gupadhyaya
Contributor

What will be the frequency for these critical APIs in the monitoring system? If we can extend the list a bit more, we could also include

  • hmy_getTransactionByHash & hmyv2_getTransactionByHash - indicates whether a tx exists in the blockchain
  • hmy_sendRawTransaction & hmyv2_sendRawTransaction - for normal transfers and any simple smart contract execution

@givp

givp commented Jun 15, 2021

> What will be the frequency for these critical APIs in the monitoring system? If we can extend the list a bit more, we could also include
>
>   • hmy_getTransactionByHash & hmyv2_getTransactionByHash - indicates whether a tx exists in the blockchain
>   • hmy_sendRawTransaction & hmyv2_sendRawTransaction - for normal transfers and any simple smart contract execution

It's totally up to us. Currently, critical APIs are checked once a minute. I will add the API method checks shortly.
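
With a once-a-minute check, a day yields 1440 probes per endpoint, so a single failed probe costs roughly 0.07 percentage points of daily availability. Below is a minimal sketch of that arithmetic; the function name is hypothetical, and how the status page actually computes uptime is not stated in this thread.

```python
from typing import Sequence


def availability_percent(check_results: Sequence[bool]) -> float:
    """Percentage of successful probes; each entry is one check result."""
    if not check_results:
        raise ValueError("no check results to summarize")
    return 100.0 * sum(check_results) / len(check_results)
```

Checking more often than once a minute would shrink the cost of a single failed probe but add load on the endpoints.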

@givp

givp commented Jun 15, 2021

@LeoHChen I've added all the metrics from your list. Please confirm at https://status.harmony.one/. Note that all of these monitors are automated and will reflect outages and recoveries.

Outstanding items:

  • Adding custom HTML link to Metrics page
  • Adding synthetic monitoring to analyze RPC method responses

@sophoah
Contributor

sophoah commented Jun 16, 2021

@givp we also need to implement @gupadhyaya's specific RPC tests to make sure specific features of the RPC are working.

@gupadhyaya would you have the actual tests (which params to use) and the expected behavior? Since our recent issue was due to missing recent transactions, we might need to implement some logic to detect whether the RPCs are healthy or not.
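
One possible form of such health logic, sketched in Python: an endpoint can accept connections while serving stale data, so a monitor could compare the latest block height reported by each endpoint and flag the ones that fall behind. This assumes the heights have already been fetched with a block-number RPC call. The function name and the 10-block threshold are hypothetical, and this is not a diagnosis of the incident mentioned above.

```python
from typing import Dict, List


def lagging_endpoints(block_heights: Dict[str, int],
                      max_lag_blocks: int = 10) -> List[str]:
    """Return endpoints whose latest block trails the highest one by more
    than max_lag_blocks, sorted by name."""
    if not block_heights:
        return []
    head = max(block_heights.values())
    return sorted(
        endpoint
        for endpoint, height in block_heights.items()
        if head - height > max_lag_blocks
    )
```

For example, `lagging_endpoints({"api.harmony.one": 14054302, "api.s0.t.hmny.io": 14054250})` flags the second endpoint; the heights here are made up for illustration.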

@givp

givp commented Jun 16, 2021

@sophoah yes, that's what I'm working on right now. I'm going to use the default parameters from the docs for all the methods to create the initial tests. We can then iterate and improve over time, but I want to make sure we get back consistent data schemas for every test.
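
A consistent-schema test of this kind could look roughly like the Python sketch below, which compares the `result` object of a response against an expected set of field names and types. The function name is hypothetical, and the expected fields in the usage note are illustrative only, not taken from the Harmony docs.

```python
from typing import Any, Dict, List


def schema_mismatches(result: Any, expected: Dict[str, type],
                      path: str = "result") -> List[str]:
    """List the fields that are missing or have an unexpected type."""
    if not isinstance(result, dict):
        return [f"{path}: expected an object, got {type(result).__name__}"]
    problems = []
    for key, expected_type in expected.items():
        if key not in result:
            problems.append(f"{path}.{key}: missing")
        elif not isinstance(result[key], expected_type):
            problems.append(
                f"{path}.{key}: expected {expected_type.__name__}, "
                f"got {type(result[key]).__name__}"
            )
    return problems
```

For a receipt-style method, a test might pass `expected={"transactionHash": str, "logs": list}` and fail the check when the returned list is non-empty. Extra fields in the response are ignored, so the check tolerates additive API changes.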

4 participants