Additional benchmarks #62
…e charting; introduce Set vs Fill benchmarks; report bytes/entry and blocks/entry.
Do flush explicitly; it is necessary in this library's API.

Started measuring byte size via another, more direct mechanism. There's a significant difference between the two forms of measurement (the checkSize method gives much flatter results than looking at the blockstore does!); I don't yet understand why this is or what significance it might (or might not) have.

BenchmarkSet now has two variants: bulk flush and individual flush (see the sketch below). The behavior in terms of how many new blocks are created varies markedly, as you'd expect.

Include more bitsizes. I want to gather enough datapoints to draw interesting charts. Surely we'll expect to see some predictable change in the scaling curves across this dimension?

Histograms of the sizes of blocks that appear in storage are now available. I've commented them back out after making some observations; they produce quite a lot of noise in the output.
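To make the two variants concrete, here is a minimal sketch of the flush patterns being compared. The `node` interface is a hypothetical stand-in for this library's node type, and the real benchmark code in the diff may be shaped differently.

```go
package hamtbench

import "context"

// node is a hypothetical stand-in for the HAMT node type; only the two
// methods relevant to the flush patterns are modeled here.
type node interface {
	Set(ctx context.Context, key string, value []byte) error
	Flush(ctx context.Context) error
}

// setBulk inserts all entries, then flushes once at the end.
func setBulk(ctx context.Context, n node, keys []string) error {
	for _, k := range keys {
		if err := n.Set(ctx, k, []byte("value")); err != nil {
			return err
		}
	}
	return n.Flush(ctx) // one flush for the whole batch
}

// setIndividual flushes after every insertion, so each Set pays the full
// serialize-and-store cost on its own.
func setIndividual(ctx context.Context, n node, keys []string) error {
	for _, k := range keys {
		if err := n.Set(ctx, k, []byte("value")); err != nil {
			return err
		}
		if err := n.Flush(ctx); err != nil {
			return err
		}
	}
	return nil
}
```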
…arting. The largest scale entry in the table is commented out because it's an OOM-causer on some of my hardware, and I'd expect to get some impression of progression from these four orders of magnitude (base 10) anyway. I added (roughly) halfway points on the scale table just to try to fill things out a bit more. (You'll see why in the results discussion; some things are noisier than expected.)

I'm leaving the charting tools to be handled out of band; I don't think adding them as transitive dependencies of this repo would be good.

Results? I'm a little perplexed, actually.

- BenchmarkFill-blocks-per-entry-vs-scale.svg ...
  - is completely all over the map. There appears to be *no* correlation between the bitwidth parameter and the growth rate for block counts.
- BenchmarkFill-totalBytes-per-entry-vs-scale.svg ...
  - is also completely all over. No correlations.
- BenchmarkFind-speed-vs-scale.svg ...
  - does sort of trend up as one would expect;
  - the noise is almost bigger than the signal, which is certainly interesting (but not wrong).
  - There's oddness for bitwidth=8: unlike all the others, it's actually slower at the smallest scales. Maybe worth questioning?
- BenchmarkSetBulk-addntlBlocks-per-addntlEntry-vs-scale.svg and BenchmarkSetIndividual-addntlBlocks-per-addntlEntry-vs-scale.svg ...
  - these both have reasonable up-and-to-the-right trends.
  - With batched flush, results are somewhat noisier; per-insertion flush is fairly smooth.
  - It's a clear progression: higher bitwidth -> fewer new blocks created per insertion.

Note that due to the way our randomization is seeded (a small sketch of this follows below), even though all these keys and values are "random", they're deterministic between benchmark runs, and they're also the same on each benchmark between each parameter set. (Larger b.N adds noise, but I've been running with fixed iteration counts, e.g. -benchtime=10x or -benchtime=30x, because using time as the stopping condition causes the larger values of "n" for scale to get only a single run.)

I'm very curious about the blocks and bytes per entry being apparently completely unaffected by scale across multiple orders of magnitude. Certainly we want those things to scale "well", for some definition of "well", but... complete noise is... not what I expected. Wondering if there's a measurement bug here.

These are some interesting observations, but more numbers are needed; I'm not certain these are the right ones, and they certainly aren't the only interesting ones. In particular, I'm not sure there's enough probing of the layers of caching and flushing and how they affect observations; there may be a lot more to do in order to produce useful insights thereabout.
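For illustration, fixed-seed key generation of the kind described above might look like this; the function name and key format are made up, and the benchmarks' actual helper may differ.

```go
package hamtbench

import (
	"math/rand"
	"strconv"
)

// deterministicKeys generates pseudo-random keys from a constant seed, so
// the "random" keys are identical on every benchmark run and across
// parameter sets.
func deterministicKeys(n int) []string {
	rng := rand.New(rand.NewSource(42)) // fixed seed
	keys := make([]string, n)
	for i := range keys {
		keys[i] = strconv.FormatUint(rng.Uint64(), 16)
	}
	return keys
}
```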
These are both reported as metrics, and also actively logged if repeated puts are detected in any number. (It is necessary to do both, because repeated puts could otherwise end up rounded down to zero, and I'd in fact like to see whether the number is at all nonzero.)

I also wanted to try to make some use of these numbers within the addAndRemoveKeys function... however, at the moment, that goes over like a lead balloon. Immediately after the first Set operations, and before the Flush, there may already be some Put operations! (In practice, I saw both 0 and 1 in the tests that already call the addAndRemoveKeys test helper function; just enough variance to make asserting on it not fly.)

The reason for these Puts-without-Flush is that `modifyValue` does a `store.Put`... but only in the path where the KV array is full and it creates a new subshard: it then puts the new subshard (and not, arguably surprisingly, itself). (A rough illustration of this path follows below.) I'm not sure I entirely grok the high-level reasoning behind this, and these Puts are a tad difficult to predict, short of evaluating the full hamt algorithm itself.

These Puts-without-Flush also explain why the graphs for BenchmarkFill-blocks-per-entry-vs-scale.svg were so uncorrelated, as remarked on in the previous commit: the number of blocks generated is largely affected by how many changes are accumulated before Put starts to occur, and if things aren't *actually* being buffered up until Flush is called, then of course we get plenty of Put operations no matter what.

This may indicate a problem: if someone was expecting batching to really work for garbage-avoidance reasons, it... doesn't look like it really does. (It's possible this hasn't actually been evident in any of the workflows people are using this hamt for, though; if the workload is doing various point changes, the effectiveness of this batching doesn't matter very much.)

The countSize method gains some docs, because in the first writing of this, I reported these blockstore event stats *after* countSize... and yeah, oof, that's a measurement error alright. (You'll see many, many, many duplicate puts if you do things in this order.)
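Here is that rough illustration. This is not the library's actual `modifyValue` code; all the type and helper names are invented for the sketch, which only aims to show why Puts can land in the blockstore before Flush is ever called.

```go
package hamtsketch

import "context"

// putter is a hypothetical stand-in for the blockstore: Put serializes a
// node and returns some reference to it.
type putter interface {
	Put(ctx context.Context, node interface{}) (ref string, err error)
}

type kv struct {
	key string
	val []byte
}

// shard models a node pointer that either holds an inline bucket of KVs
// or a reference to a child shard.
type shard struct {
	kvs  []kv
	link string
}

// insert shows the asymmetry: the common path only mutates the in-memory
// node (persisted later by Flush), but the bucket-overflow path writes the
// new subshard to the store immediately.
func insert(ctx context.Context, store putter, s *shard, item kv, bucketMax int) error {
	if len(s.kvs) < bucketMax {
		s.kvs = append(s.kvs, item) // in-memory only; no blockstore traffic yet
		return nil
	}
	// Bucket is full: spill it (plus the new entry) into a new subshard...
	sub := &shard{kvs: append(append([]kv{}, s.kvs...), item)}
	// ...and Put that subshard right away, without waiting for Flush.
	ref, err := store.Put(ctx, sub)
	if err != nil {
		return err
	}
	s.kvs, s.link = nil, ref
	return nil
}
```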
This new graph tracks BenchmarkSetIndividual-addntlBlocks-per-addntlEntry-vs-scale.svg pretty darn closely, just with a scaling of the Y axis (it more or less shifts up by one! Which I suppose would make sense).
I belatedly noticed that https://github.com/ipfs/go-hamt-ipld/pull/33 also adds more benchmarks, in a PR that has gone unmerged :( There may be some duplicate work between the two that might need manual resolution; I have not yet inspected it deeply.
Nice. I don't see a comparison that includes
Here's the visualization for the bulk-set: it's the same general shape as the individual-sets graph, but the Y axis does have a different scale -- individual setting causes about twice as many garbage blocks. (It's also just a bit noisier of a graph.) I think these would be drastically more distinct from each other if it weren't for the somewhat weird behavior of Sets that cause a new subshard immediately issuing a Put to the blockstore -- that causes a lot of writes that would otherwise be avoidable, I suspect.
IIRC, the failure reported by CI here is just that
5,
6,
7,
8,
@warpfork do you know why we don't try things out at higher bitwidths? For context I'm working on fitting particular HAMT sizes and workloads to optimal bit widths.
It has no effect! There's a fence around the config initialization that just brings anything larger back down to 8.

There are also casts from an `int` down into a `byte` in some of the internal logic, which effectively limits things to 8 bits. I think these can all be changed to `int` fairly trivially though (I did this briefly out of curiosity, but then dropped the diff again because it didn't seem like a priority), and then presumably we could also drop the config range fence, and then larger bitwidth numbers would work.
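A hypothetical sketch of the two limits just described; the names and exact shape here are illustrative rather than the repo's actual code.

```go
package hamtsketch

// effectiveBitWidth illustrates the config "fence": a requested bitwidth
// above 8 is quietly brought back down, so larger values have no effect.
func effectiveBitWidth(requested int) int {
	if requested > 8 {
		return 8
	}
	return requested
}

// narrowIndex illustrates the second limit: index math that narrows an int
// down to a byte can only represent 0..255, i.e. at most 2^8 child slots,
// so a bitwidth above 8 would silently truncate indexes here.
func narrowIndex(idx int) byte {
	return byte(idx)
}
```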
Also relevant: larger bitwidths mean a larger bitfield byte sequence in the serial format (currently this is somewhat dynamic, but not necessarily usefully so; #54 / #63 discusses changing some of the serial structures to use a fixed number of bytes). So, you will probably also want to watch the serial size graphs when increasing bitwidths to see if the size of this field becomes impactful.
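To put a rough number on that: the per-node bitfield has one bit per possible child slot, i.e. 2^bitwidth bits, so its maximum serialized size grows exponentially with bitwidth. A small back-of-the-envelope helper (illustrative only; the actual encoding is the somewhat-dynamic one mentioned above):

```go
package hamtsketch

// bitfieldBytes returns how many bytes a fully-populated bitmap needs for a
// given bitwidth: 2^bitWidth bits, rounded up to whole bytes.
// e.g. bitwidth 5 -> 4 bytes, 8 -> 32 bytes, 10 -> 128 bytes.
func bitfieldBytes(bitWidth int) int {
	bits := 1 << uint(bitWidth)
	return (bits + 7) / 8
}
```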
This PR contains a variety of new and extended benchmarks.
Existing benchmarks were moved to standardized benchmark names and standardized parameter tables -- this is useful if we want to do charting (which will indeed be discussed later in this PR comment! Just hang on).
Set vs Fill benchmarks are introduced. Fill benchmarks (which already existed, but are renamed) measure how long it takes to create an entire hamt of some size; the Set benchmarks do an additional 1000 operations on top of an already-created hamt, which offers more insight into the costs of operations at particular scales.
Many new stats are tracked, using the now-standard ReportMetric feature.

The `mockBlocks` type was extended to add accounting for many of these statistics. (We can imagine using this in the future to make more tests asserting, for example, the number of get and put events expected for some operation. That's left for future work, however.)
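As a rough sketch of what the per-entry reporting looks like (the helper and metric names below are invented; the real benchmarks in the diff read their counters off the accounting blockstore):

```go
package hamtbench

import "testing"

// runOneFill is a hypothetical stand-in for building a HAMT of some scale
// against an accounting blockstore and returning what was observed.
func runOneFill() (blocksCreated, bytesWritten, entries int64) {
	// ... build the hamt, flush, and read counters off the blockstore ...
	return 0, 0, 1
}

func BenchmarkFillSketch(b *testing.B) {
	var blocks, byteCount, entries int64
	for i := 0; i < b.N; i++ {
		bl, by, en := runOneFill()
		blocks, byteCount, entries = blocks+bl, byteCount+by, entries+en
	}
	// ReportMetric attaches custom units to the benchmark output, which is
	// what makes per-entry charting across the parameter tables possible.
	b.ReportMetric(float64(blocks)/float64(entries), "blocks/entry")
	b.ReportMetric(float64(byteCount)/float64(entries), "bytes/entry")
}
```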
Now, for some selected results!
I charted these using https://github.com/cep21/benchdraw. Instructions for reproducing these can be found in the comments in the diff body.
First of all: the number of blocks created in the blockstore per entry placed into a hamt during the Fill test is... random?
The reason for this appears to be that creation of subshards during value insertion will unconditionally and immediately Put to the blockstore. (This might be a bit surprising, since almost every other logical branch that value insertion and update can take will mutate cache nodes, and not immediately do a Put.)
This seems like it could represent a lot of unnecessary garbage block creation. However, this may vary by workloads: if a workload can't bulk its writes anyway, this may simply not matter in practice. (I don't have a lot of information about workloads right now.)
Get events to perform a lookup: larger bitwidth parameters need fewer get events. Larger hamts need slowly more gets, of course. No big surprises here. (I like no surprises.)
The number of additional blocks created while inserting entries is also higher for smaller bitwidths. Larger hamts also need slowly more blocks. This is also simply good confirmatory news: it scales reasonably nicely.
More benchmarks could be useful in the future. In particular, I'm very aware that these are benchmarking synthetic operations; more information about actual workloads could result in much more interesting benchmarks. But hopefully these are useful, and offer a workable outline of how to do yet more in the future.