Large tuples are a performance footgun #237

Open
epage opened this issue Apr 27, 2023 · 11 comments
Labels
A-combinator Area: combinators C-bug Category: Things not working as expected M-breaking-change Meta: Implementing or merging this will introduce a breaking change.

Comments

@epage
Collaborator

epage commented Apr 27, 2023


rust version

1.68

winnow version

0.4.0

Minimal reproducible code

(a, b).map(...).parse_next(input)

Steps to reproduce the bug with the above code

TBD

Actual Behaviour

Slow

Expected Behaviour

Fast

Additional Context

See #230


@epage epage added A-combinator Area: combinators C-bug Category: Things not working as expected labels Apr 27, 2023
@epage epage modified the milestones: 0.4.x, 0.5.x Apr 27, 2023
@epage epage added the M-breaking-change Meta: Implementing or merging this will introduce a breaking change. label Apr 27, 2023
@epage
Collaborator Author

epage commented Jun 16, 2023

One thing that further decreases performance is large output types. Surprisingly, I've seen quite big performance improvements by Box-ing large types that are passed through multiple layers of parser code. Of course, boxing by default is a performance footgun and usually makes the parser substantially slower.

In addition, tuple parsers should be used carefully, as they can run into the same performance issues when multiple large output types are involved or when they simply have too many items in them.

E.g., I replaced something like

(a, b).map(...).parse_next(input)

with

let (input, a) = a.parse_next(input)?;
...
let (input, b) = b.parse_next(input)?;
...
Ok((input, ...))

in quite a few places to resolve performance issues.
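
For illustration, here is a minimal sketch of the Box-ing idea, written against winnow 0.5's &mut-input style; Big and big are hypothetical stand-ins for the real large AST types:

use winnow::ascii::dec_int;
use winnow::prelude::*;

// Hypothetical "large" output type (256 bytes) standing in for a real AST node.
struct Big {
    fields: [i64; 32],
}

// Returning Box<Big> keeps the Ok variant at pointer size, so the Result is
// far more likely to stay in registers instead of spilling to the stack;
// the cost is one heap allocation per parsed node.
fn big(input: &mut &str) -> PResult<Box<Big>> {
    let n: i64 = dec_int.parse_next(input)?;
    Ok(Box::new(Big { fields: [n; 32] }))
}

fn main() {
    let mut input = "7";
    let parsed = big(&mut input).unwrap();
    assert_eq!(parsed.fields[0], 7);
}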

I wonder if #251 improved the situation as the compiler is now more likely to optimize the tuple version into the imperative version.

@epage
Collaborator Author

epage commented Jun 16, 2023

@martinohmann when you have more time, could you create a branch where hcl-edit is using tuples so I can do some more analysis of this? I'd like to see how #251 or tuple alternatives may be able to help improve things. Because of the chance of this being fixed in #251, I'm deprioritizing this for now, so no rush.

@martinohmann
Contributor

Good idea! I'll try to find some tuple cases. The parser is now built in a way that makes bringing these back a bit more involved, but I think I see 1-3 cases where it would be "easy" to negatively impact performance. Not sure if I can get to it this month or next. Will ping you once I have a branch.

@epage
Collaborator Author

epage commented Oct 3, 2024

@praveenperera in #191 (comment):

> > Specifically cover the cost of large return types which can show up in surprising ways like just using a tuple (#230)
>
> @epage This is a question I had: when you say watch for large return types, is it better to call parse_next multiple times and use regular if/else branching, instead of trying to do it all purely with combinators?
>
> Examples and more context: https://www.perplexity.ai/search/rust-nom-and-winnow-parsing-is-SaJuUbDxSfu09wjpXcOctA
>
> Though in my example, I'm not actually returning the tuple.

@epage
Collaborator Author

epage commented Oct 3, 2024

> Examples and more context: https://www.perplexity.ai/search/rust-nom-and-winnow-parsing-is-SaJuUbDxSfu09wjpXcOctA

That AI answer has a semblance of sounding correct but is completely wrong.

  • (...).parse_next() will just call parse_next() on everything inside of the tuple; there is no difference (see the sketch below). Also, in cases like calling parse_next on a function, the overhead would only be the creation of a stack frame, but we #[inline(always)] that call, so there should be no overhead.
  • Sometimes a large combination of parsers is easier to read, sometimes imperative calling of parsers is easier to read. It's dependent on context. In particular, if you force the use of combinators for cases they aren't really designed for, the complexity will be high, making the readability low.
  • One large set of combinators does not make error handling easier. In fact, combinators limit the expressiveness of error reporting (e.g. verify, map_res, etc. don't compose with cut_err; see #180).
  • Combinator or imperative should not affect optimizations the library can apply to your parser.
    • For GAT-based libraries, like nom v8, what can make a difference is having your own parser functions that you compose into bigger parsers, because that stops nom v8 from propagating the GAT through, which is used for optimizations. This surprising performance cliff is one reason we've chosen not to leverage GATs.

The answer for when to make individual parse_next() calls is reasonable, though.
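
To make the first bullet concrete, here is a minimal sketch (winnow 0.5-style API; alpha1 and digit1 stand in for real parsers) showing that the tuple form and the hand-unrolled form are the same sequence of parse_next calls:

use winnow::ascii::{alpha1, digit1};
use winnow::prelude::*;

// Tuple form: winnow's generated impl runs each parser in order.
fn tuple_form<'a>(input: &mut &'a str) -> PResult<(&'a str, &'a str)> {
    (alpha1, digit1).parse_next(input)
}

// Hand-unrolled form: exactly what the tuple impl does internally,
// so there is no inherent per-call overhead in the tuple version.
fn imperative_form<'a>(input: &mut &'a str) -> PResult<(&'a str, &'a str)> {
    let letters = alpha1.parse_next(input)?;
    let digits = digit1.parse_next(input)?;
    Ok((letters, digits))
}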

@epage
Collaborator Author

epage commented Oct 3, 2024

> This is a question I had: when you say watch for large return types, is it better to call parse_next multiple times and use regular if/else branching, instead of trying to do it all purely with combinators?
>
> Though in my example, I'm not actually returning the tuple.

If you use (...).parse_next(), a tuple is being returned, even if it's not by your function. Ultimately the performance hit is from large return types, but it is difficult to create one without (...).parse_next(), so the topics are closely related.

If you have a function like

fn add(left: usize, right: usize) -> usize {
    left + right
}

the compiler will effectively transform this to

// Pseudocode: the return value becomes a hidden out-parameter
fn add(left: usize, right: usize, result: &mut usize) {
    *result = left + right;
}

The fastest form of memory is registers, and parameters and return values go through these where possible. However, if a parameter or return type becomes too big, the compiler will instead return through the stack, which means going through memory (cushioned by the data cache), turning it into

fn add(left: &Large, right: &Large, result: &mut Large) {
    *result = *left + *right;
}

Before v0.5, winnow's return type was

Result<
    (I, O),
    ErrMode<InputError<I>>,
>

Meaning that Winnow <=v0.4 added size_of::<I>() overhead to the return type, making it more likely we'd "spill over into the stack".

Now that we use &mut I as a parameter, our return type is

Result<
    O,
    ErrMode<ContextError>,
>

Winnow has a fixed overhead for error reporting (reducible by specifying a custom error type), but the I overhead is gone. I am considering making ErrMode optional, allowing the overhead to be dropped even further.
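
As a rough way to see this, one can compare the sizes of the two result shapes directly; a minimal sketch (exact byte counts depend on platform and winnow version):

use winnow::error::{ContextError, ErrMode, InputError};

// v0.4-style return type: the input is threaded through the Ok variant.
type Old<'a> = Result<(&'a str, u64), ErrMode<InputError<&'a str>>>;

// v0.5-style return type: the input is a &mut parameter instead, so only
// the output and the error contribute to the size.
type New = Result<u64, ErrMode<ContextError>>;

fn main() {
    // The point is that size_of::<I>() no longer inflates every parser's
    // return type, not the specific numbers printed here.
    println!("old: {} bytes", std::mem::size_of::<Old<'static>>());
    println!("new: {} bytes", std::mem::size_of::<New>());
}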

However, if you do (parser1, parser2, parser3, parser4, parser5, parser6).parse_next(), you could still spill over into the stack. I've been hoping we could give rustc enough hints to rewrite that into the imperative form, but so far it has not worked in enough cases.

In general, write your code for readability and, if it's too slow, experiment. It can be surprising what speeds things up or slows things down, as there are ripple effects in optimizations. I once tried to shuffle things around so I had fewer large return types and instead made performance worse.

@praveenperera

praveenperera commented Oct 3, 2024

@epage very detailed and very helpful, thank you! You should add these comments to the docs.

One more question:

> Before v0.5, winnow's return type was
>
> Result<(I, O), ErrMode<InputError<I>>>

That is still the return type for parse_peek, so I'm assuming that if I'm using that function I should still be careful with tuples?

Thanks again

Update:

For anyone reading this, here is an example of switching from parse_peek to parse_next: bitcoinppl/cove@67c98fb
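
As a hypothetical illustration of that kind of switch (not the linked commit itself; assumes winnow 0.5/0.6's API):

use winnow::ascii::digit1;
use winnow::error::InputError;
use winnow::prelude::*;

fn main() {
    // parse_peek keeps the v0.4 shape: Ok((remaining_input, output)),
    // so its return type still carries the size_of::<I>() overhead.
    let (remaining, digits) =
        digit1::<_, InputError<_>>.parse_peek("123abc").unwrap();
    assert_eq!((remaining, digits), ("abc", "123"));

    // parse_next mutates the input in place and returns only the output.
    let mut input = "123abc";
    let digits = digit1::<_, InputError<_>>
        .parse_next(&mut input)
        .unwrap();
    assert_eq!((input, digits), ("abc", "123"));
}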

@epage
Collaborator Author

epage commented Oct 3, 2024

As mentioned on the other thread, parse_peek should really only be used for testing at this point.

@praveenperera

This comment was marked as off-topic.

@epage

This comment was marked as off-topic.

@epage
Collaborator Author

epage commented Jan 3, 2025

#505 moves seq! off of the impl Parser for tuples to unrolling the parse_next calls, allowing the tuple to be smaller when fields are skipped.
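
A minimal sketch of the seq! tuple form with a skipped field (the parsers here are placeholders):

use winnow::ascii::{alpha1, digit1, space1};
use winnow::combinator::seq;
use winnow::prelude::*;

// Fields marked `_:` are parsed for their effect on the input but never
// stored, so the value seq! builds stays smaller than the full
// (O1, O2, O3) tuple would be.
fn key_value<'a>(input: &mut &'a str) -> PResult<(&'a str, &'a str)> {
    seq!(alpha1, _: space1, digit1).parse_next(input)
}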
