
Improve debug_traceBlock calls performance #8076

Closed

Conversation

ahamlat
Contributor

@ahamlat ahamlat commented Jan 3, 2025

PR description

This PR introduces several key improvements to the debug_traceBlock, debug_traceBlockByNumber, and debug_traceBlockByHash methods, significantly improving performance and reducing memory usage:

  • The core change is the adoption of a pipeline architecture, where:

    • One thread handles transaction execution.
    • Other threads process the execution results and generate the JSON output. This parallelization ensures better resource utilization and reduces overall execution time (a minimal sketch of this producer/consumer split follows the list).
  • Memory usage is optimized by the pipeline processing approach. Instead of retaining all trace frames for every transaction, the pipeline now keeps only the trace frames relevant to the current transaction, significantly reducing memory consumption.

  • The transformation of trace frames into Java JSON representations now occurs in parallel.

  • Default trace options have been adjusted to align with the behavior of Geth and Nethermind:

    • disableStorage is set to false (enabling storage tracing).
    • disableMemory is set to true (disabling memory tracing).
    • disableStack is set to false (enabling stack tracing).
  • The structure of the JSON output has been modified to match Geth's format:

    • Hexadecimal values are now displayed as short hex strings (e.g., 0x1, 0xabc), which is more compact and efficient.
    • The stack array is always displayed when the option is enabled, even when empty, to ensure consistency with other Ethereum clients. However, memory and storage are omitted if they are empty, which reduces unnecessary data.
    • Storage keys and values are now displayed without the 0x prefix, in line with Geth's behavior.
  • Change the implementation of toShortHexString by creating a new method, toCompactHex. The main improvement is to rely on byte[] rather than Bytes, avoiding the dynamic dispatch overhead of the get(i) and size() methods. Those methods are no longer needed thanks to the switch to StringBuilder, even if it has its own drawbacks. This method will replace the toFastHex implementation in the future, once we have finished the work on Tuweni.
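
For illustration only, here is a minimal, self-contained Java sketch of the producer/consumer split described above. It uses standard java.util.concurrent primitives and hypothetical names; the PR itself uses Besu's internal pipeline framework, started through ethScheduler.startPipeline (see the diff snippets quoted in the review below).

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TracePipelineSketch {
  public static void main(final String[] args) throws Exception {
    final int txCount = 100;
    // Bounded queue: only a handful of per-transaction results are retained at once,
    // instead of keeping every transaction's trace frames in memory.
    final BlockingQueue<String> traceResults = new ArrayBlockingQueue<>(16);
    final List<String> jsonTraces = Collections.synchronizedList(new ArrayList<>());
    final ExecutorService jsonWorkers = Executors.newFixedThreadPool(4);

    // One thread "executes" the transactions and publishes each result as soon as it is ready.
    final Thread executionThread =
        new Thread(
            () -> {
              for (int tx = 0; tx < txCount; tx++) {
                try {
                  traceResults.put("trace-frames-of-tx-" + tx); // stand-in for real trace frames
                } catch (final InterruptedException e) {
                  Thread.currentThread().interrupt();
                  return;
                }
              }
            });
    executionThread.start();

    // Other threads turn each result into its JSON representation in parallel.
    for (int i = 0; i < txCount; i++) {
      final String frames = traceResults.take();
      jsonWorkers.submit(() -> jsonTraces.add("{\"structLogs\":\"" + frames + "\"}"));
    }

    executionThread.join();
    jsonWorkers.shutdown();
    jsonWorkers.awaitTermination(10, TimeUnit.SECONDS);
    System.out.println(jsonTraces.size() + " traces serialized");
  }
}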

These changes have resulted in a 3X performance improvement and reduce the size of the output by 2 to 3 times, making previously unresponsive calls fast enough to offer performance comparable to other clients. The optimizations not only improve response times but also allow for smoother operation with lower memory usage, even when processing large traces.

Overall, the new approach ensures better performance, reduced resource usage, and more consistent output with other Ethereum clients, particularly Geth and Nethermind.

Fixed Issue(s)

Fixes #5322

Thanks for sending a pull request! Have you done the following?

  • Checked out our contribution guidelines?
  • Considered documentation and added the doc-change-required label to this PR if updates are required.
  • Considered the changelog and included an update if required.
  • For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

  • unit tests: ./gradlew build
  • acceptance tests: ./gradlew acceptanceTest
  • integration tests: ./gradlew integrationTest
  • reference tests: ./gradlew ethereum:referenceTests:referenceTests

@ahamlat ahamlat marked this pull request as ready for review January 3, 2025 17:07
@ahamlat ahamlat added the doc-change-required and performance labels Jan 5, 2025
Contributor

@Matilda-Clerke Matilda-Clerke left a comment

LGTM, just a few suggestions in comments. I'll leave unapproved to make sure someone more experienced in the area gets to review too.

}
return Optional.of(tracesList);
}))
.orElse(null);
Contributor

Might be nicer to return an empty collection rather than null

Contributor Author

This would change the rpc call output from

{"jsonrpc":"2.0","id":1,"result":null}

to

{"jsonrpc":"2.0","id":1,"result":[]}

This was the original behaviour before this PR, and I would like to keep it as it is.

block.getHash(),
traceableState -> {
Collection<DebugTraceTransactionResult> tracesList =
new CopyOnWriteArrayList<>();
Contributor

Is there a reason we're using CopyOnWriteArrayList? It seems wasteful to copy the entire array every time we add an element, rather than growing the backing array by a larger amount less often, as ArrayList does.

If it's just because CopyOnWriteArrayList is threadsafe, why not use Collections.synchronizedList to produce a synchronized ArrayList?

Contributor Author

@ahamlat ahamlat Jan 7, 2025

Yes, you're right, Collections.synchronizedList is a better choice here, as CopyOnWriteArrayList is not suitable for heavy write loads, which is the case here. I tend not to use Collections.synchronized*, but in this case it is a better choice, even if CopyOnWriteArrayList doesn't lock on reads; the Collections.synchronized* wrappers synchronize (lock) on each method call.
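
A minimal sketch of the suggested replacement for the snippet quoted above (it assumes the usual java.util imports and the surrounding method from the PR):

// A plain ArrayList wrapped so that every call is synchronized. Unlike
// CopyOnWriteArrayList, adding an element does not copy the whole backing
// array, which suits this write-heavy traces collection.
Collection<DebugTraceTransactionResult> tracesList =
    Collections.synchronizedList(new ArrayList<>());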

@@ -153,4 +148,39 @@ public int hashCode() {
result = 31 * result + Arrays.hashCode(stack);
return result;
}

public static String toCompactHex(final Bytes abytes, final boolean prefix) {
Contributor

Any reason we can't use Bytes.toHexString or Bytes.toFastHex?

Contributor Author

@ahamlat ahamlat Jan 7, 2025

The reason is explained in the description. We were using toShortHexString and it was one of the biggest bottlenecks. As we're currently migrating Tuweni, I decided to create this method directly in the StructLog Java class. It should replace the Bytes.toFastHex implementation in the future.

Change the implementation of toShortHexString by creating a new method, toCompactHex. The main improvement is to rely on byte[] rather than Bytes, avoiding the dynamic dispatch overhead of the get(i) and size() methods. Those methods are no longer needed thanks to the switch to StringBuilder, even if it has its own drawbacks. This method will replace the toFastHex implementation in the future, once we have finished the work on Tuweni.
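
For illustration only, a hypothetical standalone variant of such a conversion, operating on a raw byte[] with a StringBuilder; the PR's actual method takes a Bytes argument (as the diff above shows) and may differ in details:

// Builds a compact hex string ("0x1", "0xabc") in a single pass over the raw
// bytes, skipping leading zero nibbles and avoiding per-element virtual calls.
public static String toCompactHexSketch(final byte[] bytes, final boolean prefix) {
  final char[] hex = "0123456789abcdef".toCharArray();
  final StringBuilder sb = new StringBuilder(bytes.length * 2 + 2);
  if (prefix) {
    sb.append("0x");
  }
  boolean leadingZero = true;
  for (final byte b : bytes) {
    final int hi = (b >> 4) & 0x0f;
    final int lo = b & 0x0f;
    if (leadingZero && hi == 0 && lo == 0) {
      continue; // skip leading zero bytes entirely
    }
    if (!(leadingZero && hi == 0)) {
      sb.append(hex[hi]);
    }
    sb.append(hex[lo]);
    leadingZero = false;
  }
  if (leadingZero) {
    sb.append('0'); // an all-zero (or empty) input renders as "0x0"
  }
  return sb.toString();
}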

.getAndMapWorldState(any(), any());
when(blockchainQueries.getBlockHeaderByHash(any(Hash.class)))
.thenReturn(Optional.of(blockHeader));
MockitoAnnotations.openMocks(this);
Contributor

Might be nicer to just add the @ExtendWith(MockitoExtension.class) annotation to the class (and optionally @MockitoSettings(strictness = Strictness.LENIENT) if desired).
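
For reference, a sketch of the annotation-based setup (class and mock names are placeholders; Besu types imported as in the existing test):

import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;
import org.mockito.junit.jupiter.MockitoSettings;
import org.mockito.quality.Strictness;

// The extension initialises the @Mock fields, so no explicit
// MockitoAnnotations.openMocks(this) call is needed in a @BeforeEach method.
@ExtendWith(MockitoExtension.class)
@MockitoSettings(strictness = Strictness.LENIENT) // optional, only if lenient stubbing is wanted
class DebugTraceBlockSomeTest {
  @Mock private BlockchainQueries blockchainQueries; // placeholder mock field
}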


@BeforeEach
public void setUp() {
MockitoAnnotations.openMocks(this);
Contributor

Here too


@BeforeEach
public void setUp() {
MockitoAnnotations.openMocks(this);
Contributor

Here too


@BeforeEach
public void setUp() {
MockitoAnnotations.openMocks(this);
Contributor

Here too

Contributor

@siladu siladu left a comment

Just an initial pass. Would like to dive deeper into the pipeline changes but feel free to merge if someone else approves

this.outputCounter =
metricsSystem.createLabelledCounter(
BesuMetricCategory.BLOCKCHAIN,
"transactions_debugTraceblock_pipeline_processed_total",
Contributor

nit: is the casing correct here?

.andFinishWith("collect_results", tracesList::add);

try {
if (getBlockchainQueries().getEthScheduler().isPresent()) {
Contributor

Is there a realistic scenario where EthScheduler isn't available?

Contributor Author

@ahamlat ahamlat Jan 7, 2025

This was inherited from the Trace_block implementation. I changed both Trace* and DebugTrace* implementations to use the Besu ethScheduler (6a541e5), which is created when Besu starts. Now we no longer need to check whether it is null or not.

Comment on lines 146 to 147
EthScheduler ethScheduler =
new EthScheduler(1, 1, 1, 1, new NoOpMetricsSystem());
Contributor

This is a little mysterious, might be good to name the variable something like "singleThreadedEthScheduler" perhaps? Don't we want metrics too?

Contributor Author

I changed the implementation and deleted this code in 6a541e5.

.flatMap(
hash ->
block ->
Tracer.processTracing(
Contributor

Why can't this reuse the same call made in AbstractDebugTraceBlock (as ByHash appears to)?

Contributor Author

Good question, this is because DebugTraceBlockByNumber already inherits from AbstractBlockParameterMethod, to handle flags related to the block number, like latest, safe or finalized.

Contributor

Can you explain what's happening with the changes in this file please?

Contributor Author

The changes made in DebugOperationTracer are related to not displaying storage and memory in the structLog JSON document if they are empty. This is to match Geth's behaviour and reduce the JSON output size.
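
As an illustration only of the resulting JSON shape (this is not the PR's mechanism, which instead changes what DebugOperationTracer captures), the same effect could be expressed with Jackson annotations along these lines:

import com.fasterxml.jackson.annotation.JsonInclude;
import java.util.List;
import java.util.Map;

public class StructLogEntryView {
  // Omitted from the serialized struct log entry when empty, matching Geth.
  @JsonInclude(JsonInclude.Include.NON_EMPTY)
  public Map<String, String> storage;

  @JsonInclude(JsonInclude.Include.NON_EMPTY)
  public List<String> memory;

  // Always serialized, even when empty, for consistency with other clients.
  public List<String> stack;
}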

@ahamlat
Contributor Author

ahamlat commented Jan 7, 2025

Just referencing here https://github.com/Consensys/protocol-misc/issues/1010, to be sure the toCompactHex method will be migrated to Tuweni to replace toFastHex.

@ahamlat
Contributor Author

ahamlat commented Jan 7, 2025

Just an initial pass. Would like to dive deeper into the pipeline changes but feel free to merge if someone else approves

To validate the pipeline changes, during the testing phase I created a new RPC method, debug_traceBlock_with_pipeline, without changing the existing one, and checked on different examples that the output of both implementations is the same.

Contributor

@Matilda-Clerke Matilda-Clerke left a comment

Definitely some good changes here

new EthScheduler(1, 1, 1, 1, new NoOpMetricsSystem());
ethScheduler.startPipeline(traceBlockPipeline).get();
}
ethScheduler.startPipeline(traceBlockPipeline).get();
Contributor

Much cleaner, very nice

@@ -140,7 +141,8 @@ public void initServerAndClient() throws Exception {
vertx,
mock(ApiConfiguration.class),
Optional.empty(),
mock(TransactionSimulator.class));
mock(TransactionSimulator.class),
new EthScheduler(1, 1, 1, new NoOpMetricsSystem()));
Contributor

We should consider using DeterministicEthScheduler as done in most other unit tests requiring an EthScheduler.

Contributor Author

I noticed that DeterministicEthScheduler caused a performance issue in some trace-call unit tests for trace_block, trace_filter and trace_replayBlockTransactions. The unit tests were failing because of a timeout, so I rolled back that change for this specific test.

@ahamlat
Contributor Author

ahamlat commented Jan 9, 2025

@macfarla @siladu @Gabriel-Trintinalia Would you mind having a look at this PR?

@ahamlat ahamlat closed this Jan 10, 2025
@ahamlat ahamlat force-pushed the debug-trace-with-pipeline branch from 9a5330d to 8cddcfd on January 10, 2025 14:16
@ahamlat
Contributor Author

ahamlat commented Jan 10, 2025

I closed this PR by mistake and couldn't reopen it, even by adding more commits.
I created another one: #8103

Labels
doc-change-required, performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

implement pipeline on debug_ tracing endpoints
3 participants