Bridge continuously sending transactions with 'eth_sendTransaction timed out' message during stress testing #33
Comments
The first of the transactions arrived on Ropsten: https://ropsten.etherscan.io/tx/0x42d204dbf87b8b53c00a1ce11c79ed6eee91e189ebb1106211db7f24ba20f4b6. The last transaction: https://ropsten.etherscan.io/tx/0xb53ea50e62bfcfda6ba1441d243a4df56f472b2e7d9b895943dbd8dcb8257d47. In total, the bridge sent about 17000 txs before it was killed.
On it
Thank you. Do you have a script that reproduces this issue?
I think the root cause of the issue could be related to the hardware of my testbed: it is an ordinary laptop with an Intel i5-3317U CPU at 1.70GHz, 4 cores, an HDD (not SSD) and a WiFi connection to the Internet.
So, from the log above we can see that Parity is actually sending a response to those timed-out requests (the bridge doesn't lose the connection; it simply abandons the request once a timeout occurs). The way the bridge is structured, it effectively considers timed-out transactions to have "never happened". There are perhaps a couple of measures we can take here:
I think the most efficient first step on my end here would be (3). After that, we can do (2).
I completely agree with your thoughts. #1 is too platform-specific and does not provide any guarantee that the system will not, at some point, get into a state where the timeout is too short again.
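The failure mode discussed above (the client gives up on a request that the node still completes) can be reproduced in a small sketch. This is only an illustration: `slow_node_send` is a hypothetical stand-in for a slow node handling `eth_sendTransaction`, and the delays are arbitrary. It is not the bridge's code.

```python
import asyncio


async def slow_node_send(tx_id, processed, delay):
    """Hypothetical stand-in for a node that answers eth_sendTransaction slowly."""
    await asyncio.sleep(delay)
    processed.append(tx_id)  # the node still accepts the transaction
    return f"hash-{tx_id}"


async def demo():
    processed = []
    node_side = asyncio.create_task(slow_node_send(1, processed, delay=0.05))
    try:
        # shield() keeps the node-side work running when the client times out,
        # mimicking a remote node that does not know the caller gave up.
        await asyncio.wait_for(asyncio.shield(node_side), timeout=0.01)
        client_saw_reply = True
    except asyncio.TimeoutError:
        client_saw_reply = False
    await node_side  # the "late" response still arrives
    return client_saw_reply, processed
```

In this sketch the client reports a timeout while the transaction is processed anyway, which is why treating a timed-out request as one that never happened leads to re-sending.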
When too many transactions are being sent out, the response from the node comes after the operation has timed out. This is particularly noticeable on heavy loads on slower computers.

Solution: chunk transactions into batches.

By default, the size of the batch is 2. However, it is important to note that since there's no coordination between different parties, there might be more than one batch at a time (but they should be within a single digit since the number of operations performed by bridge is limited).

Addresses omni#33
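The batching idea in this commit message can be sketched as follows. This is a minimal Python illustration, not the code from the commit: `FakeNode`, `chunked` and `send_in_batches` are hypothetical names, and the only detail taken from the message is the default batch size of 2.

```python
import asyncio


class FakeNode:
    """Hypothetical node stub that records how many requests are in flight."""

    def __init__(self):
        self.in_flight = 0
        self.peak = 0

    async def send(self, tx):
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0)  # yield so requests in the same batch overlap
        self.in_flight -= 1
        return f"hash-{tx}"


def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def send_in_batches(txs, send, batch_size=2):
    """Send `txs` batch by batch; a batch starts only after the previous one finishes."""
    results = []
    for batch in chunked(txs, batch_size):
        results.extend(await asyncio.gather(*(send(tx) for tx in batch)))
    return results
```

With a single sender, at most `batch_size` requests are outstanding at any time. As the message notes, several uncoordinated senders can each have a batch in flight, so the node may still see more than one batch at once.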
Would you mind trying yrashk@304a843 out on your hardware setup? This is a first draft. This change limits the size of the batch. The integration tests pass. Let me know if this helps or not.
I have tested the changes with 1K and 2K deposits (transactions). Here is the difference in the bridge logs:
2K deposits:
So, the bridge combined two sequential blocks into one batch in the second test. This means it makes sense to understand why Parity behaves differently in these two cases. Most probably we will see a proper fix for the issue once that is understood.
Do we still experience this issue in any severe form?
I did not test it with the new version of the bridge that supports RPC. Are you able to generate traffic (a dozen transactions in one block) and test it yourself?
This issue is the same as paritytech/parity-bridge#149, but the behavior is even worse due to the automatic bridge restart implemented in the POA bridge.
Network setup:
1200 deposit transactions were sent successfully to the HomeBridge contract by a special Python script. It took 8 blocks to validate all transactions:
https://sokol-explorer.poa.network/block/1418808
...
https://sokol-explorer.poa.network/block/1418815
The bridge discovered part of these transactions, tried to relay some of them, lost the connection and restarted:
The database file was not updated so the restart of the bridge thread caused the same error.
So, the bridge is continuing to send transactions forever.
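One way to picture why the restart repeats the same work: if the last processed block is never persisted, every restart begins from the same point. The sketch below shows a checkpoint written after each fully relayed block. All names and the JSON file layout are hypothetical and are not the bridge's actual database format.

```python
import json
import os
import tempfile


def save_checkpoint(path, block_number):
    """Write the checkpoint to a temp file, then swap it in with os.replace."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        json.dump({"checked_block": block_number}, tmp)
    os.replace(tmp_path, path)


def load_checkpoint(path, default=0):
    """Return the last saved block number, or `default` if nothing was saved."""
    try:
        with open(path) as f:
            return json.load(f)["checked_block"]
    except FileNotFoundError:
        return default


def relay_blocks(path, blocks, relay):
    """Relay deposits block by block, skipping blocks at or below the checkpoint.

    `blocks` maps block number -> list of deposits; `relay` may raise.
    """
    start = load_checkpoint(path)
    for number in sorted(n for n in blocks if n > start):
        for deposit in blocks[number]:
            relay(deposit)
        save_checkpoint(path, number)  # persisted only after the whole block
```

In this sketch a failure in the middle of a block would still cause that one block's deposits to be re-sent after a restart, so it narrows the problem rather than removing it.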
Even if the bridge process is killed manually, the database has to be modified by hand, and this locks funds on the HomeBridge contract side, since the ForeignBridge contract has transferred only part of the tokens.