Large tuples are a performance footgun #237

Open
epage opened this issue Apr 27, 2023 · 11 comments
Labels
A-combinator Area: combinators C-bug Category: Things not working as expected M-breaking-change Meta: Implementing or merging this will introduce a breaking change.

Comments

@epage
Collaborator

epage commented Apr 27, 2023


rust version

1.68

winnow version

0.4.0

Minimal reproducible code

(a, b).map(...).parse_next(input)

Steps to reproduce the bug with the above code

TBD

Actual Behaviour

Slow

Expected Behaviour

Fast

Additional Context

See #230


@epage epage added A-combinator Area: combinators C-bug Category: Things not working as expected labels Apr 27, 2023
@epage epage modified the milestones: 0.4.x, 0.5.x Apr 27, 2023
@epage epage added the M-breaking-change Meta: Implementing or merging this will introduce a breaking change. label Apr 27, 2023
@epage
Collaborator Author

epage commented Jun 16, 2023

One thing that further decreases performance is large output types. Surprisingly, I've seen quite big performance improvements by Box-ing large types that are passed through multiple layers of parser code. Of course, boxing by default is a performance footgun and usually makes the parser substantially slower.

In addition, tuple parsers should be used carefully, as they can run into the same performance issues when multiple large output types are involved or when they simply have too many items in them.

E.g., I replaced something like

(a, b).map(...).parse_next(input)

with

let (input, a) = a.parse_next(input)?;
...
let (input, b) = b.parse_next(input)?;
...
Ok((input, ...))

in quite a few places to resolve performance issues.
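
For illustration, here is a minimal sketch of the Box-ing idea, written against winnow 0.5's &mut-input style; Big and big are hypothetical stand-ins for the real large AST types:

use winnow::ascii::dec_int;
use winnow::prelude::*;

// Hypothetical "large" output type (256 bytes) standing in for a real AST node.
struct Big {
    fields: [i64; 32],
}

// Returning Box<Big> keeps the Ok variant at pointer size, so the Result is
// far more likely to stay in registers instead of spilling to the stack;
// the cost is one heap allocation per parsed node.
fn big(input: &mut &str) -> PResult<Box<Big>> {
    let n: i64 = dec_int.parse_next(input)?;
    Ok(Box::new(Big { fields: [n; 32] }))
}

fn main() {
    let mut input = "7";
    let parsed = big(&mut input).unwrap();
    assert_eq!(parsed.fields[0], 7);
}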

I wonder if #251 improved the situation as the compiler is now more likely to optimize the tuple version into the imperative version.

@epage
Collaborator Author

epage commented Jun 16, 2023

@martinohmann when you have more time, could you create a branch where hcl-edit is using tuples so I can do some more analysis of this? I'd like to see how #251 or tuple alternatives may be able to help improve things. Because of the chance of this being fixed in #251, I'm deprioritizing this for now, so no rush.

@martinohmann
Contributor

Good idea! I'll try to find some tuple cases. The parser is now built in a way that makes bringing these back a bit more involved, but I think I see 1-3 cases where it would be "easy" to negatively impact performance. Not sure if I can get to it this month or next. Will ping you once I have a branch.

@epage
Collaborator Author

epage commented Oct 3, 2024

@praveenperera in #191 (comment):

> > Specifically cover the cost of large return types which can show up in surprising ways like just using a tuple (#230)
>
> @epage This is a question I had: when you say watch for large return types, is it better to call parse_next multiple times and use regular if/else branching, instead of trying to do it all purely with combinators?
>
> Examples and more context: https://www.perplexity.ai/search/rust-nom-and-winnow-parsing-is-SaJuUbDxSfu09wjpXcOctA
>
> Though in my example, I'm not actually returning the tuple.

@epage
Collaborator Author

epage commented Oct 3, 2024

> Examples and more context: https://www.perplexity.ai/search/rust-nom-and-winnow-parsing-is-SaJuUbDxSfu09wjpXcOctA

That AI answer has a semblance of sounding correct but is completely wrong.

  • (...).parse_next() will just call parse_next() on everything inside of the tuple; there is no difference (see the sketch below). Also, in cases like calling parse_next on a function, the overhead would only be the creation of a stack frame, but we #[inline(always)] that call, so there should be no overhead.
  • Sometimes a large combination of parsers is easier to read, sometimes imperative calling of parsers is easier to read. It's dependent on context. In particular, if you force the use of combinators for cases they aren't really designed for, the complexity will be high, making the readability low.
  • One large set of combinators does not make error handling easier. In fact, combinators limit the expressiveness of error reporting (e.g. verify, map_res, etc. don't compose with cut_err; see #180).
  • Combinator or imperative should not affect optimizations the library can apply to your parser.
    • For GAT-based libraries, like nom v8, what can make a difference is having your own parser functions that you compose into bigger parsers, because that stops nom v8 from propagating the GAT through, which is used for optimizations. This surprising performance cliff is one reason we've chosen not to leverage GATs.

The answer for when to make individual parse_next() calls is reasonable, though.
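
To make the first bullet concrete, here is a minimal sketch (winnow 0.5-style API; alpha1 and digit1 stand in for real parsers) showing that the tuple form and the hand-unrolled form are the same sequence of parse_next calls:

use winnow::ascii::{alpha1, digit1};
use winnow::prelude::*;

// Tuple form: winnow's generated impl runs each parser in order.
fn tuple_form<'a>(input: &mut &'a str) -> PResult<(&'a str, &'a str)> {
    (alpha1, digit1).parse_next(input)
}

// Hand-unrolled form: exactly what the tuple impl does internally,
// so there is no inherent per-call overhead in the tuple version.
fn imperative_form<'a>(input: &mut &'a str) -> PResult<(&'a str, &'a str)> {
    let letters = alpha1.parse_next(input)?;
    let digits = digit1.parse_next(input)?;
    Ok((letters, digits))
}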

@epage
Collaborator Author

epage commented Oct 3, 2024

> This is a question I had: when you say watch for large return types, is it better to call parse_next multiple times and use regular if/else branching, instead of trying to do it all purely with combinators?
>
> Though in my example, I'm not actually returning the tuple.

If you use (...).parse_next(), a tuple is being returned, even if it's not by your function. Ultimately the performance hit is from large return types, but it is difficult to create one without (...).parse_next(), so the topics are closely related.

If you have a function like

fn add(left: usize, right: usize) -> usize {
    left + right
}

the compiler will effectively transform this to

// Pseudocode: the return value becomes a hidden out-parameter
fn add(left: usize, right: usize, result: &mut usize) {
    *result = left + right;
}

The fastest form of memory is registers, and parameters and return values go through these where possible. However, if a parameter or return type becomes too big, the compiler will instead return through the stack, which means going through memory (cushioned by the data cache), turning it into

fn add(left: &Large, right: &Large, result: &mut Large) {
    *result = *left + *right;
}

Before v0.5, winnow's return type was

Result<
    (I, O),
    ErrMode<InputError<I>>,
>

Meaning that Winnow <=v0.4 added size_of::<I>() overhead to the return type, making it more likely we'd "spill over into the stack".

Now that we use &mut I as a parameter, our return type is

Result<
    O,
    ErrMode<ContextError>,
>

Winnow has a fixed overhead for error reporting (reducible by specifying a custom error type), but the I overhead is gone. I am considering making ErrMode optional, allowing the overhead to be dropped even further.
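
As a rough way to see this, one can compare the sizes of the two result shapes directly; a minimal sketch (exact byte counts depend on platform and winnow version):

use winnow::error::{ContextError, ErrMode, InputError};

// v0.4-style return type: the input is threaded through the Ok variant.
type Old<'a> = Result<(&'a str, u64), ErrMode<InputError<&'a str>>>;

// v0.5-style return type: the input is a &mut parameter instead, so only
// the output and the error contribute to the size.
type New = Result<u64, ErrMode<ContextError>>;

fn main() {
    // The point is that size_of::<I>() no longer inflates every parser's
    // return type, not the specific numbers printed here.
    println!("old: {} bytes", std::mem::size_of::<Old<'static>>());
    println!("new: {} bytes", std::mem::size_of::<New>());
}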

However, if you do (parser1, parser2, parser3, parser4, parser5, parser6).parse_next(), you could still spill over into the stack. I've been hoping we could give rustc enough hints to rewrite that into the imperative form, but so far it has not worked in enough cases.

In general, write your code for readability and, if it's too slow, experiment. It can be surprising what speeds things up or slows things down, as there are ripple effects in optimizations. I once tried to shuffle things around so I had fewer large return types and instead made performance worse.

@praveenperera

praveenperera commented Oct 3, 2024

@epage very detailed and very helpful, thank you! You should add these comments to the docs.

One more question:

> Before v0.5, winnow's return type was
>
> Result<(I, O), ErrMode<InputError<I>>>

That is still the return type for parse_peek, so I'm assuming that if I'm using that function I should still be careful with tuples?

Thanks again

Update:

For anyone reading this, here is an example of switching from parse_peek to parse_next: bitcoinppl/cove@67c98fb
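
As a hypothetical illustration of that kind of switch (not the linked commit itself; assumes winnow 0.5/0.6's API):

use winnow::ascii::digit1;
use winnow::error::InputError;
use winnow::prelude::*;

fn main() {
    // parse_peek keeps the v0.4 shape: Ok((remaining_input, output)),
    // so its return type still carries the size_of::<I>() overhead.
    let (remaining, digits) =
        digit1::<_, InputError<_>>.parse_peek("123abc").unwrap();
    assert_eq!((remaining, digits), ("abc", "123"));

    // parse_next mutates the input in place and returns only the output.
    let mut input = "123abc";
    let digits = digit1::<_, InputError<_>>
        .parse_next(&mut input)
        .unwrap();
    assert_eq!((input, digits), ("abc", "123"));
}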

@epage
Collaborator Author

epage commented Oct 3, 2024

As mentioned on the other thread, parse_peek should really only be used for testing at this point.

@praveenperera

This comment was marked as off-topic.

@epage

This comment was marked as off-topic.

@epage
Collaborator Author

epage commented Jan 3, 2025

#505 moves seq! off of the impl Parser for tuples to unrolling the parse_next calls, allowing the tuple to be smaller when fields are skipped.
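
A minimal sketch of the seq! tuple form with a skipped field (the parsers here are placeholders):

use winnow::ascii::{alpha1, digit1, space1};
use winnow::combinator::seq;
use winnow::prelude::*;

// Fields marked `_:` are parsed for their effect on the input but never
// stored, so the value seq! builds stays smaller than the full
// (O1, O2, O3) tuple would be.
fn key_value<'a>(input: &mut &'a str) -> PResult<(&'a str, &'a str)> {
    seq!(alpha1, _: space1, digit1).parse_next(input)
}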
