Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature] Support multiline regex in text filtering #2857

Open
MoshiMoshi0 opened this issue Dec 23, 2024 · 6 comments · May be fixed by #2889
Open

[feature] Support multiline regex in text filtering #2857

MoshiMoshi0 opened this issue Dec 23, 2024 · 6 comments · May be fixed by #2889
Labels
enhancement New feature or request

Comments

@MoshiMoshi0
Copy link
Contributor

It seems that Trigger/wait for text, Ignore lines containing, Block change-detection while text matches in Text filtering section do not support multiline regex.

Narrowed it down to:

def strip_ignore_text(content, wordlist, mode="content"):
i = 0
output = []
ignore_text = []
ignore_regex = []
ignored_line_numbers = []
for k in wordlist:
# Is it a regex?
res = re.search(PERL_STYLE_REGEX, k, re.IGNORECASE)
if res:
ignore_regex.append(re.compile(perl_style_slash_enclosed_regex_to_options(k)))
else:
ignore_text.append(k.strip())
for line in content.splitlines(keepends=True):
i += 1
# Always ignore blank lines in this mode. (when this function gets called)
got_match = False
for l in ignore_text:
if l.lower() in line.lower():
got_match = True
if not got_match:
for r in ignore_regex:
if r.search(line):
got_match = True
if not got_match:
# Not ignored, and should preserve "keepends"
output.append(line)
else:
ignored_line_numbers.append(i)
# Used for finding out what to highlight
if mode == "line numbers":
return ignored_line_numbers
return ''.join(output)

The function iterates over the content line by line and matches each regex to each line:

for line in content.splitlines(keepends=True):

The function could be reworked to use re.finditer/re.findall on the whole content instead.

@MoshiMoshi0 MoshiMoshi0 added the enhancement New feature or request label Dec 23, 2024
@dgtlmoon
Copy link
Owner

it COULD be reworked, but then it would maybe break all existing filters, whats your thoughts on how to handle that?

@MoshiMoshi0
Copy link
Contributor Author

Unless I'm missing something it would only break regex filters that have s or m flags set (currently those flags have no effect) or regex that captures \n in the middle of the pattern (currently such regex matches nothing). Everything else should behave the same.

Other option is to match on the whole content only when the s or m flag is set, otherwise use the current implementation.

@dgtlmoon
Copy link
Owner

dgtlmoon commented Dec 24, 2024

Other option is to match on the whole content only when the s or m flag is set, otherwise use the current implementation.

yes! what i'm thinking.. any downsides?

@MoshiMoshi0
Copy link
Contributor Author

Downsides are that you have to supports two versions of text filtering, and that filters that already set s or m flags could break.

@dgtlmoon
Copy link
Owner

There's a few tasks to achieve this

https://github.com/dgtlmoon/changedetection.io/blob/master/changedetectionio/html_tools.py#L365 supports this "line number" mode which is used to render the 'preview' diffs of the filter section

also lots of small references for checking if the content changed or not uses the line number mode and needs to be refactored

@MoshiMoshi0
Copy link
Contributor Author

Yes, but it should not be hard to support line mode with multiline regex, you just count the number of new lines up to each match start/end. And since re.finditer returns matches in order you can optimize the new lines count by starting from the previous match instead of from the content start.

@MoshiMoshi0 MoshiMoshi0 linked a pull request Jan 7, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants