Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate parser ranged variants #98

Closed
2 tasks done
epage opened this issue Jan 30, 2023 · 19 comments
Closed
2 tasks done

Consolidate parser ranged variants #98

epage opened this issue Jan 30, 2023 · 19 comments
Labels
A-combinator Area: combinators C-enhancement Category: Raise on the bar on expectations
Milestone

Comments

@epage
Copy link
Collaborator

epage commented Jan 30, 2023

Please complete the following tasks

winnow version

0.2.0

Describe your use case

Nom has many1, many0, and many_m_n functions. Similar with other parts of the API.

Describe the solution you'd like

What if instead we had a IntoRange trait that took in the different range types and single numbers.

Example:

many(tag("abc"), 0..) -> many0(tag("abc")) 
many(tag("abc"), 1..) -> many1(tag("abc")) 
many(tag("abc"), 1) -> many_m_n(tag("abc"), 1, 1) 

Alternatives, if applicable

No response

Additional Context

No response

@epage epage added A-combinator Area: combinators C-enhancement Category: Raise on the bar on expectations labels Jan 30, 2023
@epage epage added this to the 0.1 milestone Jan 30, 2023
@epage
Copy link
Collaborator Author

epage commented Jan 30, 2023

nom merged this as rust-bakery/nom#1608 with rust-bakery/nom#1613 remaining

Differences we want from that

  • We want this for every applicable parser
  • We want to deprecate the other versions

The main questions remaining

  • if we should keep 0 and 1 variants for convenience?
  • Are there cases where we only want a 0/1 or a ranged?

@epage epage changed the title Consolidate parser variants Consolidate parser ranged variants Feb 1, 2023
@epage epage modified the milestones: 0.3, 0.4 Feb 1, 2023
@epage
Copy link
Collaborator Author

epage commented Feb 1, 2023

Postponed as I want to remove the deprecated functions first

@epage
Copy link
Collaborator Author

epage commented Mar 22, 2023

Potential parsers:

  • many0 / many1 / count
  • fold_many0
  • take, take_while0, take_while1, take_while_m_m
  • take_till0., take_till1
  • take_until0, take_until1
  • many_till0, many_till1
  • separated0

For winnow::character parsers like digit0 / digit1, I'd like to find a different way of solving those.

@epage
Copy link
Collaborator Author

epage commented Mar 22, 2023

chumsky has Parser::repeated which returns an object with at_least, etc. Not seeing a way to handle this for capturing slices unless its through their version of recognize which they can consolidate these as using a parser for slices since GATs can remove some of the overhead.

The problem with that approach is I feel like Parser methods are de-emphasized when reading the code. An alternative might be a Many(<range>) * <parser>. We could offer something similar with Take(<range>) * <contains token> This might have problems determining the output container though...

@epage
Copy link
Collaborator Author

epage commented Apr 28, 2023

I'm running into the dreaded

error[E0207]: the type parameter `I` is not constrained by the impl trait, self type, or predicates
  --> src/combinator/multi.rs:39:9
   |
39 | impl<P, I, O, E> crate::lib::std::ops::Mul<P> for Repeat
   |         ^ unconstrained type parameter

when using Mul

I was already thinking of having a fallback function. Looks like it'll be required

@epage
Copy link
Collaborator Author

epage commented Apr 28, 2023

Hmm, I feel like using functions for this, making the API fluent, is one part verbosity and one part being clever. Maybe I should just fallback to doing this the original way

epage added a commit to epage/winnow that referenced this issue Apr 28, 2023
When working on winnow-rs#98, I realized `repeat` makes the intent a lot
clearer.  Thanks `chumsky`!

This is a part of winnow-rs#95
epage added a commit to epage/winnow that referenced this issue Apr 28, 2023
When working on winnow-rs#98, I realized `repeat` makes the intent a lot
clearer.  Thanks `chumsky`!

This is a part of winnow-rs#95
epage added a commit to epage/winnow that referenced this issue Apr 28, 2023
When working on winnow-rs#98, I realized `repeat` makes the intent a lot
clearer.  Thanks `chumsky`!

This is a part of winnow-rs#95
epage added a commit to epage/winnow that referenced this issue Apr 28, 2023
When working on winnow-rs#98, I realized `repeat` makes the intent a lot
clearer.  Thanks `chumsky`!

This is a part of winnow-rs#95
@epage epage removed this from the 0.4.x milestone Apr 28, 2023
@epage
Copy link
Collaborator Author

epage commented May 1, 2023

We now have

  • take_while
  • repeat
  • fold_repeat

Still remaining

That still leaves ascii parsers like digit0 / digit1.

epage added a commit to epage/winnow that referenced this issue May 2, 2023
This is a follow up to winnow-rs#241 for merging in the 0/1 variants

This is a step towards winnow-rs#98
epage added a commit to epage/winnow that referenced this issue May 2, 2023
This is a follow up to winnow-rs#241 for merging in the 0/1 variants

This is a step towards winnow-rs#98
epage added a commit to epage/winnow that referenced this issue May 2, 2023
This is a follow up to winnow-rs#241 for merging in the 0/1 variants

This is a step towards winnow-rs#98
epage added a commit to epage/winnow that referenced this issue May 2, 2023
This is a follow up to winnow-rs#241 for merging in the 0/1 variants

This is a step towards winnow-rs#98
@martinohmann
Copy link
Contributor

The new consolidated ranged parsers seem to cause performance regressions. I just tried to upgrade hcl-edit to use winnow 0.4.6 and my benchmarks do not look too great (rustc 1.69.0):

cargo criterion hcl-edit --bench parse --features hcl-edit/perf
   Compiling hcl-edit v0.4.8 (/home/mohmann/repos/github.com/martinohmann/hcl-rs/crates/hcl-edit)
   Compiling benchmarks v0.0.17 (/home/mohmann/repos/github.com/martinohmann/hcl-rs/crates/benchmarks)
    Finished bench [optimized] target(s) in 6.73s
parse/hcl-edit/deeply_nested.tf
                        time:   [26.688 µs 26.709 µs 26.736 µs]
                        thrpt:  [26.431 MiB/s 26.458 MiB/s 26.479 MiB/s]
                 change:
                        time:   [+15.679% +18.777% +22.903%] (p = 0.00 < 0.05)
                        thrpt:  [-18.635% -15.809% -13.554%]
                        Performance has regressed.
parse/hcl-edit/large.tf time:   [2.4641 ms 2.4679 ms 2.4731 ms]
                        thrpt:  [32.963 MiB/s 33.034 MiB/s 33.084 MiB/s]
                 change:
                        time:   [+9.0822% +9.4130% +9.7655%] (p = 0.00 < 0.05)
                        thrpt:  [-8.8967% -8.6032% -8.3260%]
                        Performance has regressed.
parse/hcl-edit/medium.tf
                        time:   [528.16 µs 529.96 µs 532.59 µs]
                        thrpt:  [26.960 MiB/s 27.093 MiB/s 27.186 MiB/s]
                 change:
                        time:   [+10.027% +10.435% +11.105%] (p = 0.00 < 0.05)
                        thrpt:  [-9.9951% -9.4492% -9.1132%]
                        Performance has regressed.
parse/hcl-edit/small.tf time:   [34.133 µs 34.157 µs 34.199 µs]
                        thrpt:  [27.691 MiB/s 27.725 MiB/s 27.744 MiB/s]
                 change:
                        time:   [+10.457% +10.928% +11.365%] (p = 0.00 < 0.05)
                        thrpt:  [-10.205% -9.8514% -9.4673%]
                        Performance has regressed.

I can open a separate issue for this, maybe I can try helping to find the root cause.

@epage
Copy link
Collaborator Author

epage commented May 4, 2023

I tried to keep the changes very small so hopefully you can isolate the root cause

(and of course, it seems like we need more benchmarks since our existing ones didn't catch this)

@epage
Copy link
Collaborator Author

epage commented Jun 9, 2023

I've narrowed down the slowdown for parse/hcl-edit/large.tf to 98f5e03

@epage
Copy link
Collaborator Author

epage commented Jun 9, 2023

Same for parse/hcl-edit/deeply_nested.tf

@epage
Copy link
Collaborator Author

epage commented Jun 9, 2023

The slowdown is specifically when using the ascii parsers that wrap take_while0 / take_while1. If I just change take_while0 / take_while1 to the original code, no performance difference is noticed

@epage
Copy link
Collaborator Author

epage commented Jun 13, 2023

If I'm reading cg_diff correctly, the slowdowns are in

  • _ZN4core3ptr42drop_in_place$LT$hcl..expr..Expression$GT$17hecca4285dbbc95fcE.llvm.1197827693854984100
  • _ZN4core3ptr95drop_in_place$LT$$u5b$vecmap..Slot$LT$hcl..expr..ObjectKey$C$hcl..expr..Expression$GT$$u5d$$G T$17h5cab70b7c57bdf45E.llvm.1197827693854984100
  • _ZN4core3ptr42drop_in_place$LT$hcl..expr..Expression$GT$17hecca4285dbbc95fcE.llvm.1197827693854984100
  • _ZN4core3ptr46drop_in_place$LT$hcl..structure..Structure$GT$17hb0bfdd3dbcd5dce7E.llvm.1197827693854984100
  • _ZN4core3ptr95drop_in_place$LT$$u5b$vecmap..Slot$LT$hcl..expr..ObjectKey$C$hcl..expr..Expression$GT$$u5d$$G T$17h5cab70b7c57bdf45E.llvm.1197827693854984100
  • ...

@epage
Copy link
Collaborator Author

epage commented Jun 13, 2023

That is confusing that the slowdowns would be on drops.

@epage
Copy link
Collaborator Author

epage commented Jun 14, 2023

Problem found, see #251

Now performance is better than it was before

@martinohmann
Copy link
Contributor

Woa, thanks for digging! Although I promised, I didn't find the time to investigate this myself due to personal obligations which keep me away from my computer for most of the day right now.

Using 0.4.7 I can see a 15-25% performance improvement over 0.4.6. 👍

@vwkd
Copy link
Contributor

vwkd commented Oct 25, 2023

Still remaining

  • take_till0., take_till1
  • take_until0, take_until1
  • repeat_till0, repeat_till1
  • separated0

That still leaves ascii parsers like digit0 / digit1.

What's the status for these? Any known challenges?

Ranges for separated and co. would make winnow's API very nice!

@epage
Copy link
Collaborator Author

epage commented Oct 25, 2023

Unimplemented.

The initial batch was opportunistic in that we already had the m_n variants (or were a high priority for writing). The m_n variant is the highest priority and we can always reuse 0 / 1 variants if they already exist or add them if they are popular enough and benchmarking shows that it offers improvements.

epage added a commit to epage/winnow that referenced this issue Dec 4, 2023
@epage epage added this to the 0.6.0 milestone Dec 18, 2023
@epage
Copy link
Collaborator Author

epage commented Jan 26, 2024

At this point, ascii and separated_foldl1 are some of the remaining parsers. Both of those need more focused conversations on what the APIs should be, so closing this as resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-combinator Area: combinators C-enhancement Category: Raise on the bar on expectations
Projects
None yet
Development

No branches or pull requests

3 participants