[feature] Support multiline regex in text filtering #2857

MoshiMoshi0 · 2024-12-23T19:02:49Z

It seems that Trigger/wait for text, Ignore lines containing, Block change-detection while text matches in Text filtering section do not support multiline regex.

Narrowed it down to:

changedetection.io/changedetectionio/html_tools.py

Lines 365 to 403 in 5dea5e1

    
           def strip_ignore_text(content, wordlist, mode="content"): 
        
               i = 0 
        
               output = [] 
        
               ignore_text = [] 
        
               ignore_regex = [] 
        
               ignored_line_numbers = [] 
        
               for k in wordlist: 
        
                   # Is it a regex? 
        
                   res = re.search(PERL_STYLE_REGEX, k, re.IGNORECASE) 
        
                   if res: 
        
                       ignore_regex.append(re.compile(perl_style_slash_enclosed_regex_to_options(k))) 
        
                   else: 
        
                       ignore_text.append(k.strip()) 
        
               for line in content.splitlines(keepends=True): 
        
                   i += 1 
        
                   # Always ignore blank lines in this mode. (when this function gets called) 
        
                   got_match = False 
        
                   for l in ignore_text: 
        
                       if l.lower() in line.lower(): 
        
                           got_match = True 
        
                   if not got_match: 
        
                       for r in ignore_regex: 
        
                           if r.search(line): 
        
                               got_match = True 
        
                   if not got_match: 
        
                       # Not ignored, and should preserve "keepends" 
        
                       output.append(line) 
        
                   else: 
        
                       ignored_line_numbers.append(i) 
        
               # Used for finding out what to highlight 
        
               if mode == "line numbers": 
        
                   return ignored_line_numbers 
        
               return ''.join(output)

The function iterates over the content line by line and matches each regex to each line:

for line in content.splitlines(keepends=True):

The function could be reworked to use re.finditer/re.findall on the whole content instead.

The text was updated successfully, but these errors were encountered:

dgtlmoon · 2024-12-23T21:01:31Z

it COULD be reworked, but then it would maybe break all existing filters, whats your thoughts on how to handle that?

MoshiMoshi0 · 2024-12-23T21:24:54Z

Unless I'm missing something it would only break regex filters that have s or m flags set (currently those flags have no effect) or regex that captures \n in the middle of the pattern (currently such regex matches nothing). Everything else should behave the same.

Other option is to match on the whole content only when the s or m flag is set, otherwise use the current implementation.

dgtlmoon · 2024-12-24T10:39:25Z

Other option is to match on the whole content only when the s or m flag is set, otherwise use the current implementation.

yes! what i'm thinking.. any downsides?

MoshiMoshi0 · 2024-12-24T20:02:12Z

Downsides are that you have to supports two versions of text filtering, and that filters that already set s or m flags could break.

dgtlmoon · 2024-12-27T09:52:26Z

There's a few tasks to achieve this

https://github.com/dgtlmoon/changedetection.io/blob/master/changedetectionio/html_tools.py#L365 supports this "line number" mode which is used to render the 'preview' diffs of the filter section

also lots of small references for checking if the content changed or not uses the line number mode and needs to be refactored

MoshiMoshi0 · 2024-12-27T15:19:34Z

Yes, but it should not be hard to support line mode with multiline regex, you just count the number of new lines up to each match start/end. And since re.finditer returns matches in order you can optimize the new lines count by starting from the previous match instead of from the content start.

MoshiMoshi0 added the enhancement New feature or request label Dec 23, 2024

MoshiMoshi0 linked a pull request Jan 7, 2025 that will close this issue

Support multiline regex in text filtering (#2857) #2889

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[feature] Support multiline regex in text filtering #2857

[feature] Support multiline regex in text filtering #2857

MoshiMoshi0 commented Dec 23, 2024

dgtlmoon commented Dec 23, 2024

MoshiMoshi0 commented Dec 23, 2024

dgtlmoon commented Dec 24, 2024 •

edited

Loading

MoshiMoshi0 commented Dec 24, 2024

dgtlmoon commented Dec 27, 2024

MoshiMoshi0 commented Dec 27, 2024

[feature] Support multiline regex in text filtering #2857

[feature] Support multiline regex in text filtering #2857

Comments

MoshiMoshi0 commented Dec 23, 2024

dgtlmoon commented Dec 23, 2024

MoshiMoshi0 commented Dec 23, 2024

dgtlmoon commented Dec 24, 2024 • edited Loading

MoshiMoshi0 commented Dec 24, 2024

dgtlmoon commented Dec 27, 2024

MoshiMoshi0 commented Dec 27, 2024

dgtlmoon commented Dec 24, 2024 •

edited

Loading