Make sure all small allocations (<= 512 bytes) are batched together #54

Open · wants to merge 20 commits into main

Conversation

@greg7mdp (Contributor) commented Dec 11, 2024

Resolves #58.

Currently, many nodeos allocations hit the segment manager, which is costly both in time (each allocation updates and rebalances an rbtree) and in space (16 to 40 bytes of overhead, depending on the allocated size).

This PR adds a small size allocator, which internally maintains 64 free lists (for allocation sizes from 8 to 512 bytes, in 8-byte increments). The first allocation of a given size class allocates a batch of 512 units (so 511 remain on the free list), reducing hits on the segment manager allocator by a factor of 512.
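As an illustration of the free-list layout, here is a minimal sketch of the size-to-bucket mapping such an allocator implies (the helper and its names are hypothetical, not the actual chainbase identifiers):

```cpp
#include <cstddef>

// Hypothetical sketch: map a requested size (1..512 bytes) to one of
// 64 free-list buckets, each bucket covering an 8-byte size class.
constexpr std::size_t num_buckets = 64;
constexpr std::size_t granularity = 8;   // size classes: 8, 16, ..., 512

constexpr std::size_t bucket_index(std::size_t sz) {
   // round up to the next multiple of 8, then index from 0
   return (sz + granularity - 1) / granularity - 1;
}

static_assert(bucket_index(1)   == 0);   // served from the 8-byte list
static_assert(bucket_index(8)   == 0);
static_assert(bucket_index(9)   == 1);   // served from the 16-byte list
static_assert(bucket_index(512) == 63);  // largest batched size class
```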

In addition, the chainbase node allocator, which already batches allocations for undo_index tree nodes, now uses the small size allocator whenever it allocates more than one value to be pushed onto the undo_stack deque.
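As a rough sketch of that dispatch, under the assumption that the node allocator simply forwards multi-value requests (all names below are hypothetical):

```cpp
#include <cstddef>

// Hypothetical sketch of the dispatch described above. Single-node
// requests stay on the node allocator's own (mutex-free) free list;
// multi-value requests, e.g. when the undo_stack deque grows, are
// routed to the shared small size allocator.
template<class T, class SmallSizeAllocator>
struct node_allocator_sketch {
   T* allocate(std::size_t num) {
      if (num == 1)
         return allocate_single_node();   // exact size, no mutex needed
      return static_cast<T*>(_small_alloc->allocate(num * sizeof(T)));
   }

   T* allocate_single_node();             // batched free-list path (not shown)
   SmallSizeAllocator* _small_alloc;
};
```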

preallocate

Some tables loaded from the snapshot have a very large number of rows (I saw one with 95 million rows). I thought that, instead of allocating in batches of 512 tree nodes, it might be faster to allocate space for all 95 million rows in a single allocation, which is what I tried with preallocate.

However, my testing didn't show any significant difference, so the call to preallocate is commented out in controller.cpp. I have left the implementation in place in case we want to experiment further later.

batch size, number of allocators

When experimenting with the allocation batch size, it was clear that larger batch sizes provide better performance when loading the snapshot. The same goes for the number of allocators: 128 allocators (batching allocations up to 1024 bytes) perform better than 64 allocators (batching allocations up to 512 bytes).

However, with 128 allocators and a batch size of 512, some tests fail because the chainbase memory segment configured in those tests is too small.

So I added code that grows the allocation batch size progressively from 32 to 512:

```cpp
if (_allocation_batch_size < max_allocation_batch_size)
   _allocation_batch_size *= 2;
```
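Starting at 32 and doubling on each refill, the batch size reaches the 512 maximum after four refills (32 → 64 → 128 → 256 → 512).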

I should note that I originally started at 4 (instead of the current 32), but my testing showed a slowdown compared to starting at 32, which I found somewhat surprising.

Why keep the chainbase_node_allocator?

The chainbase_node_allocator still has some benefits:

  • it allocates the exact size required for the node
  • it doesn't need mutex protection (unlike the small_size_allocator, which is used from multiple threads when loading the snapshot).

Why not link into the free list in get_some?

My experimentation shows that not linking newly allocated blocks into the free list is faster (even though it makes the code in allocate() slightly longer):

```cpp
void get_some(size_t num_to_alloc) {
   static_assert(sizeof(T) >= sizeof(list_item), "Too small for free list");
   static_assert(sizeof(T) % alignof(list_item) == 0, "Bad alignment for free list");

   // Grab a fresh batch from the segment manager; allocate() carves
   // individual nodes out of [_block_start, _block_end) on demand.
   _block_start = static_cast<char*>(_manager->allocate(sizeof(T) * num_to_alloc));
   _block_end   = _block_start + sizeof(T) * num_to_alloc;
   if (_allocation_batch_size < max_allocation_batch_size)
      _allocation_batch_size *= 2;
}
```
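For context, here is a sketch of the allocate() fast path this implies: check the free list, then carve from the current block, and only call get_some() when the block is exhausted. Member names mirror get_some() above, but this is an assumed reconstruction, not the exact PR code:

```cpp
#include <cstddef>

// Hypothetical sketch of the allocate() fast path described above;
// the actual PR code may differ in details.
template<class T>
struct batched_allocator_sketch {
   struct list_item { list_item* next; };

   T* allocate() {
      if (_freelist) {                      // 1. reuse a freed node if any
         list_item* item = _freelist;
         _freelist = item->next;
         return reinterpret_cast<T*>(item);
      }
      if (_block_start == _block_end)       // 2. current block exhausted:
         get_some(_allocation_batch_size);  //    grab a fresh batch
      T* result = reinterpret_cast<T*>(_block_start);
      _block_start += sizeof(T);            // 3. carve one T from the block
      return result;
   }

   void get_some(std::size_t num_to_alloc); // as shown above
   // deallocate() (not shown) pushes freed nodes onto _freelist

   list_item*  _freelist = nullptr;
   char*       _block_start = nullptr;
   char*       _block_end = nullptr;
   std::size_t _allocation_batch_size = 32;
};
```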

greg7mdp marked this pull request as draft on December 11, 2024, and as ready for review on December 16, 2024.