Make sure all small allocations (<= 512 bytes) are batched together #54

Open · wants to merge 20 commits into main

Conversation

@greg7mdp (Contributor) commented Dec 11, 2024

Resolves #58.

Currently, many nodeos allocations hit the segment manager, which is costly both in time (each allocation updates and rebalances an rbtree) and in space (16 to 40 bytes of overhead, depending on the allocated size).

This PR adds a small size allocator, which internally maintains 64 free lists (for allocation sizes from 8 to 512 bytes, in 8-byte increments). The first allocation of a given size class allocates a batch of 512 units (so 511 remain on the free list), reducing hits on the segment manager allocator by a factor of 512.
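As an illustration of the free-list layout, here is a minimal sketch of the size-to-bucket mapping such an allocator implies (the helper and its names are hypothetical, not the actual chainbase identifiers):

```cpp
#include <cstddef>

// Hypothetical sketch: map a requested size (1..512 bytes) to one of
// 64 free-list buckets, each bucket covering an 8-byte size class.
constexpr std::size_t num_buckets = 64;
constexpr std::size_t granularity = 8;   // size classes: 8, 16, ..., 512

constexpr std::size_t bucket_index(std::size_t sz) {
   // round up to the next multiple of 8, then index from 0
   return (sz + granularity - 1) / granularity - 1;
}

static_assert(bucket_index(1)   == 0);   // served from the 8-byte list
static_assert(bucket_index(8)   == 0);
static_assert(bucket_index(9)   == 1);   // served from the 16-byte list
static_assert(bucket_index(512) == 63);  // largest batched size class
```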

In addition, the chainbase node allocator, which already batches allocations for undo_index tree nodes, now uses the small size allocator whenever it allocates more than one value to be pushed onto the undo_stack deque.
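As a rough sketch of that dispatch, under the assumption that the node allocator simply forwards multi-value requests (all names below are hypothetical):

```cpp
#include <cstddef>

// Hypothetical sketch of the dispatch described above. Single-node
// requests stay on the node allocator's own (mutex-free) free list;
// multi-value requests, e.g. when the undo_stack deque grows, are
// routed to the shared small size allocator.
template<class T, class SmallSizeAllocator>
struct node_allocator_sketch {
   T* allocate(std::size_t num) {
      if (num == 1)
         return allocate_single_node();   // exact size, no mutex needed
      return static_cast<T*>(_small_alloc->allocate(num * sizeof(T)));
   }

   T* allocate_single_node();             // batched free-list path (not shown)
   SmallSizeAllocator* _small_alloc;
};
```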

preallocate

Some tables loaded from the snapshot have a very large number of rows (I saw one with 95 million rows). I thought that, instead of allocating in batches of 512 tree nodes, it might be faster to allocate space for all 95 million rows in a single allocation, which is what I tried with preallocate.

However, my testing didn't show any significant difference, so the call to preallocate is commented out in controller.cpp. I have left the implementation in place in case we want to experiment further later.

batch size, number of allocators

When experimenting with the allocation batch size, it was clear that larger batch sizes provide better performance when loading the snapshot. The same goes for the number of allocators: 128 allocators (batching allocations up to 1024 bytes) perform better than 64 allocators (batching allocations up to 512 bytes).

However, with 128 allocators and a batch size of 512, some tests fail because the chainbase memory segment configured in those tests is too small.

So I added code that grows the allocation batch size progressively from 32 to 512:

```cpp
if (_allocation_batch_size < max_allocation_batch_size)
   _allocation_batch_size *= 2;
```
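Starting at 32 and doubling on each refill, the batch size reaches the 512 maximum after four refills (32 → 64 → 128 → 256 → 512).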

I should note that I originally started at 4 (instead of the current 32), but my testing showed a slowdown compared to starting at 32, which I found somewhat surprising.

Why keep the chainbase_node_allocator?

The chainbase_node_allocator still has some benefits:

  • it allocates the exact size required for the node
  • it doesn't need mutex protection (unlike the small_size_allocator, which is used from multiple threads when loading the snapshot).

Why not link into the free list in get_some?

My experimentation shows that not linking newly allocated blocks into the free list is faster (even though it makes the code in allocate() slightly longer):

```cpp
void get_some(size_t num_to_alloc) {
   static_assert(sizeof(T) >= sizeof(list_item), "Too small for free list");
   static_assert(sizeof(T) % alignof(list_item) == 0, "Bad alignment for free list");

   // Grab a fresh batch from the segment manager; allocate() carves
   // individual nodes out of [_block_start, _block_end) on demand.
   _block_start = static_cast<char*>(_manager->allocate(sizeof(T) * num_to_alloc));
   _block_end   = _block_start + sizeof(T) * num_to_alloc;
   if (_allocation_batch_size < max_allocation_batch_size)
      _allocation_batch_size *= 2;
}
```
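For context, here is a sketch of the allocate() fast path this implies: check the free list, then carve from the current block, and only call get_some() when the block is exhausted. Member names mirror get_some() above, but this is an assumed reconstruction, not the exact PR code:

```cpp
#include <cstddef>

// Hypothetical sketch of the allocate() fast path described above;
// the actual PR code may differ in details.
template<class T>
struct batched_allocator_sketch {
   struct list_item { list_item* next; };

   T* allocate() {
      if (_freelist) {                      // 1. reuse a freed node if any
         list_item* item = _freelist;
         _freelist = item->next;
         return reinterpret_cast<T*>(item);
      }
      if (_block_start == _block_end)       // 2. current block exhausted:
         get_some(_allocation_batch_size);  //    grab a fresh batch
      T* result = reinterpret_cast<T*>(_block_start);
      _block_start += sizeof(T);            // 3. carve one T from the block
      return result;
   }

   void get_some(std::size_t num_to_alloc); // as shown above
   // deallocate() (not shown) pushes freed nodes onto _freelist

   list_item*  _freelist = nullptr;
   char*       _block_start = nullptr;
   char*       _block_end = nullptr;
   std::size_t _allocation_batch_size = 32;
};
```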

greg7mdp marked this pull request as draft on December 11, 2024, and as ready for review on December 16, 2024.