regexp matching searches for floating substr too often #17427

Open
hvds opened this issue Jan 17, 2020 · 0 comments

I don't think I've reported this before, but want it here so I don't forget to look into it:

If we find a floating substr, pick a starting point, and then fail to match, it appears we often search for the floating substr again even when we already know exactly where we'll find it. Here's an example adapted from some C parsing I've been doing:

% perl -Mre=debug -wle '
  $s= "    foo\n" x 10
    . "    bar long_string\n";
  $s =~ m{^ [ \t]* bar \s+ long_string }xm;
' 2>&1 | grep floating
floating "long_string" at 4..9223372036854775807 (checking floating) stclass ANYOF[\t f] anchored(MBOL) minlen 15 
  Found floating substr "long_string" at offset 88 (rx_origin now 0)...
  Found floating substr "long_string" at offset 88 (rx_origin now 8)...
  Found floating substr "long_string" at offset 88 (rx_origin now 16)...
  Found floating substr "long_string" at offset 88 (rx_origin now 24)...
  Found floating substr "long_string" at offset 88 (rx_origin now 32)...
  Found floating substr "long_string" at offset 88 (rx_origin now 40)...
  Found floating substr "long_string" at offset 88 (rx_origin now 48)...
  Found floating substr "long_string" at offset 88 (rx_origin now 56)...
  Found floating substr "long_string" at offset 88 (rx_origin now 64)...
  Found floating substr "long_string" at offset 88 (rx_origin now 72)...
  Found floating substr "long_string" at offset 88 (rx_origin now 80)...

I think that once we have found the floating substr we should skip this check until (roughly) rx_origin + min_offset > offset; certainly when max_offset is infinity, but probably always.
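To make the proposed skip condition concrete, here is a toy model in Python. It is not the engine's code: `count_searches` and its parameters are hypothetical names, `max_offset` is assumed to be infinite, and the start class and the actual match attempt are ignored (every attempt is assumed to fail, so each line start is tried in turn). It relies on one fact only: if the first occurrence at or after position `a` is at `p`, then for any `b` with `a <= b <= p` the first occurrence at or after `b` is also at `p`.

```python
def count_searches(s, needle, min_offset, origins, use_cache):
    """Return (number of real substring searches, offset seen for each origin)."""
    searches = 0
    last_hit = None  # offset of the most recent successful search, if any
    found = []
    for rx_origin in origins:
        start = rx_origin + min_offset
        if use_cache and last_hit is not None and start <= last_hit:
            # The earlier hit is still the first occurrence at or after `start`.
            offset = last_hit
        else:
            searches += 1
            offset = s.find(needle, start)
            if offset < 0:
                break  # no later origin can succeed either
            last_hit = offset
        found.append(offset)
    return searches, found


# Same subject string as the report: ten "foo" lines, then the "bar" line.
s = "    foo\n" * 10 + "    bar long_string\n"
# With /m and a leading ^, candidate origins are the starts of lines.
origins = [i for i in range(len(s)) if i == 0 or s[i - 1] == "\n"]

naive_result = count_searches(s, "long_string", 4, origins, use_cache=False)
cached_result = count_searches(s, "long_string", 4, origins, use_cache=True)

print(naive_result[0])   # 11 searches, one per line start, as in the debug output
print(cached_result[0])  # 1 search; the remaining ten origins reuse offset 88
```

Both runs see offset 88 for every origin; only the number of substring searches differs. A finite max_offset would need an extra check that the remembered offset is still within range of the new rx_origin, which this sketch does not model.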

Hugo
