
[tracer][core] call tracer module when insert blocks #3799

Merged: 26 commits into harmony-one:main from traceDB, Apr 26, 2022

Conversation

@peekpi (Contributor) commented Jun 22, 2021

Issue

#3780: calculate the trace result when the block is first executed, and store it in LevelDB.

2021-06-27:

The trace result is now stored in the Explorer DB and can be queried via RPC.
The remaining issue is that the trace data can be large and needs to be optimized.
The current implementation stores the JSON data in the DB, so a more compact serialization method is needed.

2021-07-01:

Storage for trace results is complete, but I still need to write some unit tests.
The storage splits the data into separately keyed values, so duplicate data only needs to be stored once (a sketch of the idea follows).
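
As an illustration of the deduplication idea (not the actual implementation in this PR), trace fragments can be keyed by their content hash so that identical fragments are only written once and traces reference them by key; the type and function names below are hypothetical:

```go
// Hypothetical sketch of content-addressed trace storage: values are keyed by
// their hash, so a fragment shared by many traces is stored only once.
package main

import (
	"crypto/sha256"
	"fmt"
)

// kvStore stands in for the LevelDB instance used by the explorer node.
type kvStore map[string][]byte

// putDeduped writes data under its content hash and returns the key.
// If the same data was written before, the existing entry is reused.
func putDeduped(db kvStore, data []byte) []byte {
	key := sha256.Sum256(data)
	if _, ok := db[string(key[:])]; !ok {
		db[string(key[:])] = data
	}
	return key[:]
}

func main() {
	db := kvStore{}
	k1 := putDeduped(db, []byte(`{"type":"call","from":"0x..."}`))
	k2 := putDeduped(db, []byte(`{"type":"call","from":"0x..."}`)) // duplicate fragment
	fmt.Printf("same key: %v, entries stored: %d\n", string(k1) == string(k2), len(db))
}
```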

2021-07-02:

Development and testing are complete. I tested it with a mainnet node, and it works correctly and stably.
When querying trace data via the trace_block RPC, the node first tries to read from the DB; if the data is not found, it re-executes the block (see the sketch below).
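
A minimal sketch of that read-or-replay fallback; the type and function names are illustrative stand-ins, not the actual code in this PR:

```go
// Hypothetical sketch of the trace_block lookup path: return the stored trace
// if present, otherwise re-execute the block and cache the result.
package tracedb

import "errors"

var errNotFound = errors.New("trace not found")

// TraceStorage and the replayBlock callback stand in for the real storage
// layer and block re-execution path; their names are illustrative only.
type TraceStorage interface {
	Read(blockNum uint64) ([]byte, error)
	Write(blockNum uint64, trace []byte) error
}

func traceBlock(store TraceStorage, blockNum uint64, replayBlock func(uint64) ([]byte, error)) ([]byte, error) {
	// Fast path: the trace was stored when the block was first inserted.
	if trace, err := store.Read(blockNum); err == nil {
		return trace, nil
	} else if !errors.Is(err, errNotFound) {
		return nil, err
	}
	// Slow path: re-execute the block with the tracer attached.
	trace, err := replayBlock(blockNum)
	if err != nil {
		return nil, err
	}
	// Best-effort cache so the next query hits the fast path.
	_ = store.Write(blockNum, trace)
	return trace, nil
}
```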

Enabling Tracing

  1. CLI flag:
    --tracing=true

  2. config file:

[General]
  TraceEnable = true

Test

Test with 11 blocks:

| block | json size | storage data size (accumulative) | db query time (ms) | replay time (ms) |
| -------- | --- | ---- | -- | ---- |
| 14598791 | 9M | 462K | 17 | 1158 |
| 14598792 | 9M | 864K | 27 | 711 |
| 14598793 | 9M | 1.2M | 39 | 804 |
| 14598794 | 9M | 1.6M | 28 | 793 |
| 14598795 | 9M | 2.0M | 28 | 914 |
| 14598796 | 9M | 2.4M | 55 | 751 |
| 14598797 | 9M | 2.8M | 20 | 1253 |
| 14598798 | 9M | 3.2M | 16 | 1146 |
| 14598799 | 9M | 3.6M | 17 | 935 |
| 14598800 | 9M | 4.0M | 17 | 892 |
| 14598801 | 9M | 4.4M | 17 | 723 |

4.4M of data (keys + values) was written to the DB for these 11 blocks. Compressing this data with gzip reduces the total size to 225K; compressing it with snappy (used by LevelDB by default) reduces it to 1.1M. The comparison can be reproduced as shown below.
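
A small sketch of how such a size comparison can be made; it assumes github.com/golang/snappy as the snappy implementation, and the dump file path is illustrative:

```go
// Sketch of the gzip-vs-snappy size comparison on a blob of trace data.
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"os"

	"github.com/golang/snappy"
)

// gzipSize returns the gzip-compressed length of data.
func gzipSize(data []byte) int {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	w.Write(data)
	w.Close()
	return buf.Len()
}

func main() {
	// Raw key+value bytes dumped from the trace DB (path is illustrative).
	data, err := os.ReadFile("trace_dump.bin")
	if err != nil {
		panic(err)
	}
	fmt.Printf("raw: %d, gzip: %d, snappy: %d\n",
		len(data), gzipSize(data), len(snappy.Encode(nil, data)))
}
```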

Unit Test Coverage

Before:

<!-- copy/paste 'go test -cover' result in the directory you made change -->

After:

<!-- copy/paste 'go test -cover' result in the directory you made change -->

Test/Run Logs

Operational Checklist

  1. Does this PR introduce backward-incompatible changes to the on-disk data structure and/or the over-the-wire protocol? (If no, skip to question 8.)

    NO

  2. Does this PR introduce backward-incompatible changes NOT related to on-disk data structure and/or over-the-wire protocol? (If no, continue to question 11.)

    NO

  3. Does this PR introduce significant changes to the operational requirements of the node software, such as >20% increase in CPU, memory, and/or disk usage?
    **YES**
    If an explorer node has the tracing flag enabled:

    • each block adds approximately 110K+ of disk usage.
    • block execution time increases by 20% during block insertion.

@gupadhyaya (Contributor)

@peekpi can you add some experimental results on the improvements before and after this optimization?

@gupadhyaya (Contributor)

@peekpi generally looks good. please add some experimental results. thanks.

@peekpi (Contributor Author) commented Jun 25, 2021

@peekpi generally looks good. please add some experimental results. thanks.

I'll test it soon. I'm worried about the disk usage, and will post some data about the size of the trace result in #3780.

@gupadhyaya (Contributor)

@peekpi do you have some benchmarking results?

@peekpi (Contributor Author) commented Jul 20, 2021

@peekpi do you have some benchmarking results?

Added a comparison of execution times. The times in the table do not include network transfer time, although it can be large.

@@ -50,6 +51,110 @@ func (c *action) push(ac *action) {
c.subCalls = append(c.subCalls, ac)
}

func (c *action) fromStorage(blockStorage *TraceBlockStorage, acStorage *ActionStorage) {
Contributor:

Is this code from the existing codebase, or is it new code?

Contributor Author:

new code

// You should have received a copy of the GNU Lesser General Public License
// along with the go-ethereum library. If not, see <http://www.gnu.org/licenses/>.

package tracers
Contributor:

Same question for this file: is it just code reused from another repo?

Contributor Author:

also new code

@peekpi (Contributor Author) commented Dec 31, 2021

is there a block that is the correct output?

Just tested one

I tested this RPC, and it worked fine.
But it looks like the trace RPC has been moved to port 9501; I guess you accessed it via 9500 and were rejected by the node.
Try: curl -H 'Content-Type:application/json' --data '{"method":"trace_block","params":["0x13C36A6"],"id":1,"jsonrpc":"2.0"}' http://testnodexxx:9501

@peekpi (Contributor Author) commented Jan 1, 2022

Here's another issue: when inserting a new block, the node uses the HMY tx hash, and the tracer stores this hash in the DB, but the output requires the ETH tx hash. Changing the tx hash in the block insertion function is a dangerous operation; I need to think about how to do it.

@peekpi (Contributor Author) commented Jan 1, 2022

I think it works fine now, but because some incorrect historical trace data has already been written to the DB, the node can only return correct trace results for blocks >= 21212710. @sophoah

@sophoah (Contributor) commented Jan 4, 2022

@peekpi as discussed, you mentioned the trace data is only saved into the DB when the node inserts a new block. Can we measure how much CPU/memory this will take? Can this cause slow block sync or longer consensus? cc @gupadhyaya @rlan35

@sophoah (Contributor) commented Jan 4, 2022

Tested the latest build, and the HTTP RPC call returns the same output:

$ curl --location --request POST 'http://internalnode' --header 'Content-Type: application/json' --data-raw '{"method":"trace_block","params":["0x1455A52"],"id":1,"jsonrpc":"2.0"}' > internal.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3665k    0 3665k  100    70   9.8M    193 --:--:-- --:--:-- --:--:--  9.8M
$ curl --location --request POST 'http://testnode/' --header 'Content-Type: application/json' --data-raw '{"method":"trace_block","params":["0x1455A52"],"id":1,"jsonrpc":"2.0"}' > test.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3665k    0 3665k  100    70  5210k     99 --:--:-- --:--:-- --:--:-- 5206k
$ diff internal.txt test.txt 
$ curl --location --request POST 'http://internalnode' --header 'Content-Type: application/json' --data-raw '{"method":"trace_block","params":["0x1455AD9"],"id":1,"jsonrpc":"2.0"}' > internal.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4807k    0 4807k  100    70  12.5M    187 --:--:-- --:--:-- --:--:-- 12.5M
$ curl --location --request POST 'http://testnode' --header 'Content-Type: application/json' --data-raw '{"method":"trace_block","params":["0x1455AD9"],"id":1,"jsonrpc":"2.0"}' > test.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4807k    0 4807k  100    70  12.8M    192 --:--:-- --:--:-- --:--:-- 12.8M
$ diff internal.txt test.txt
$ 

@ArtemKolodko commented Jan 4, 2022

Tested the HTTP RPC API with Postman; interesting results for big blocks (900+ transactions).

Note:
Transfer started = time required to get the first byte of the response

1) Internal node

| Block number | RPC Request | Transfer started (ms) |
| ------------ | ----------- | --------------------- |
| 21338207 | "method": "trace_block", "params": ["0x145985F"] | 2016 |
| 21341194 | "method": "trace_block", "params": ["0x145A40A"] | 1078 |
| 21341195 | "method": "trace_block", "params": ["0x145A40B"] | 1463 |
| 21341521 | "method": "trace_block", "params": ["0x145A551"] | 1463 |

2) Test node

| Block number | RPC Request | Transfer started (ms) |
| ------------ | ----------- | --------------------- |
| 21338207 | "method": "trace_block", "params": ["0x145985F"] | 357 |
| 21341194 | "method": "trace_block", "params": ["0x145A40A"] | 367 |
| 21341195 | "method": "trace_block", "params": ["0x145A40B"] | 537 |
| 21341521 | "method": "trace_block", "params": ["0x145A551"] | 335 |

Returned data content is equal

@peekpi (Contributor Author) commented Jan 5, 2022

@peekpi as discussed, you mentioned the trace data is only saved into the DB when the node inserts a new block. Can we measure how much CPU/memory this will take? Can this cause slow block sync or longer consensus? cc @gupadhyaya @rlan35

According to my test results in July, the block execution time increases by about 20%.
The memory required depends on the number of transactions in the block and should be no larger than the size of the block's trace result.

This overhead occurs only when tracing is enabled.

@peekpi (Contributor Author) commented Jan 5, 2022

Transfer started = time required to get the first byte of the response

2) Test node

| Block number | RPC Request | Transfer started (ms) |
| ------------ | ----------- | --------------------- |
| 21338207 | "method": "trace_block", "params": ["0x145985F"] | 357 |
| 21341194 | "method": "trace_block", "params": ["0x145A40A"] | 367 |
| 21341195 | "method": "trace_block", "params": ["0x145A40B"] | 537 |
| 21341521 | "method": "trace_block", "params": ["0x145A551"] | 335 |

Returned data content is equal

I think most of the time is spent on data transfer; data generation should take less than 100 ms.

For faster access, consider enabling gzip compression and accessing the node over the AWS internal network (a client-side sketch follows).
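
A minimal client-side sketch of a gzip-compressed trace_block request, assuming the RPC server honors Accept-Encoding: gzip; the endpoint URL is illustrative:

```go
// Sketch: request trace_block over HTTP and ask for a gzip-compressed body.
// Setting Accept-Encoding manually means we must decompress the response ourselves.
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"strings"
)

func main() {
	body := `{"method":"trace_block","params":["0x145A40A"],"id":1,"jsonrpc":"2.0"}`
	// The URL below is an illustrative placeholder for a trace-enabled node.
	req, _ := http.NewRequest("POST", "http://testnodexxx:9501", strings.NewReader(body))
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Accept-Encoding", "gzip")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var r io.Reader = resp.Body
	if resp.Header.Get("Content-Encoding") == "gzip" {
		gz, err := gzip.NewReader(resp.Body)
		if err != nil {
			panic(err)
		}
		defer gz.Close()
		r = gz
	}
	data, _ := io.ReadAll(r)
	fmt.Printf("received %d bytes of trace JSON\n", len(data))
}
```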

@rlan35 (Contributor) commented Jan 5, 2022

@peekpi as discussed, you mentioned the trace data is only saved into the DB when the node inserts a new block. Can we measure how much CPU/memory this will take? Can this cause slow block sync or longer consensus? cc @gupadhyaya @rlan35

Since this only happens on archival nodes, it won't impact consensus. It will only make the archival node slower to process blocks (causing syncing slowness).

@sophoah (Contributor) commented Jan 5, 2022

Since this only happens on archival nodes, it won't impact consensus. It will only make the archival node slower to process blocks (causing syncing slowness).

Ah right, so only the slower sync.

@peekpi (Contributor Author) commented Feb 21, 2022

There is another issue with the Go version of trace_block; #4064 should be merged first.

@sophoah (Contributor) commented Feb 23, 2022

@peekpi is it possible to have this feature enabled on demand (through the harmony config and CLI args)? We know it will only be used by explorer archival nodes, but considering the additional disk space it takes, having an option to enable or disable it would be useful.

@peekpi (Contributor Author) commented Feb 24, 2022

@peekpi is it possible to have this feature enabled on demand (through the harmony config and CLI args)? We know it will only be used by explorer archival nodes, but considering the additional disk space it takes, having an option to enable or disable it would be useful.

Yes. Use --tracing=true on the CLI to enable this feature, or set it in the harmony config file:

[General]
  TraceEnable = true

This feature does not require an archival node, but only archival nodes can trace historical blocks/txs.

@peekpi peekpi requested review from LuttyYang and gupadhyaya and removed request for JackyWYX and gupadhyaya March 4, 2022 10:00
@LuttyYang (Contributor) left a comment:

LGTM

@LeoHChen LeoHChen merged commit df0a56b into harmony-one:main Apr 26, 2022
@peekpi peekpi deleted the traceDB branch June 13, 2022 10:19