Add re.findall to pick out re matches
Signed-off-by: James Hewitt <[email protected]>
Jamstah committed Mar 9, 2024
1 parent e7a4d00 commit 213d3a1
Showing 3 changed files with 47 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/source/filters.rst
@@ -77,6 +77,7 @@ At the moment, the following filters are built-in:
- **ical2text**: Convert `iCalendar`_ to plaintext
- **ocr**: Convert text in images to plaintext using Tesseract OCR
- **re.sub**: Replace text with regular expressions using Python's re.sub
- **re.findall**: Find all non-overlapping matches using Python's re.findall
- **reverse**: Reverse input items
- **sha1sum**: Calculate the SHA-1 checksum of the content
- **shellpipe**: Filter using a shell command
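The list describes re.findall in terms of Python's re.findall. However, the filter code in lib/urlwatch/filters.py iterates with re.finditer and calls Match.expand on each match. The standard-library sketch below (not urlwatch code; the sample data is made up) shows why that distinction matters. Once the pattern has capture groups, re.findall returns the captured groups rather than the whole match, whereas expanding `\g<0>` always yields the full matched text. It also shows what "non-overlapping" means in practice: a hyphen consumed by one match is not available to start the next one.

```python
import re

# Hypothetical sample input: "-abc-" and "-def-" share the hyphen between them
data = "x-abc-def-y"
pattern = "-([a-z])([a-z])([a-z])-"

# re.findall returns one tuple of captured groups per match
findall_result = re.findall(pattern, data)

# re.finditer + Match.expand(r"\g<0>") returns the whole matched text instead
expand_result = [m.expand(r"\g<0>") for m in re.finditer(pattern, data)]

print(findall_result)  # [('a', 'b', 'c')]
print(expand_result)   # ['-abc-']
```

Only one match is found: after "-abc-" is consumed, "def-" no longer starts with a hyphen.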
20 changes: 20 additions & 0 deletions lib/urlwatch/filters.py
@@ -848,6 +848,26 @@ def filter(self, data, subfilter):
        return re.sub(subfilter['pattern'], subfilter.get('repl', ''), data)


class RegexFindall(FilterBase):
    """Pick out regular expression matches, similar to Python's re.findall"""

    __kind__ = 're.findall'

    __supported_subfilters__ = {
        'pattern': 'Regular expression to search for (required)',
        'repl': 'Template expanded for each match (default: \\g<0>, the whole match)',
    }

    __default_subfilter__ = 'pattern'

    def filter(self, data, subfilter):
        if 'pattern' not in subfilter:
            raise ValueError('{} needs a pattern'.format(self.__kind__))

        # Expand "repl" for every non-overlapping match (default: \g<0>, the
        # whole match) and put each result on its own line
        repl = subfilter.get('repl', '\\g<0>')
        return '\n'.join(match.expand(repl)
                         for match in re.finditer(subfilter['pattern'], data))


class SortFilter(FilterBase):
    """Sort input items"""
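To see what the new filter produces without running urlwatch, here is a standalone sketch of the same logic. `findall_filter` is a hypothetical helper, not part of urlwatch's API. It assumes only that the filter expands `repl` once per re.finditer match and joins the results with newlines, as the method above does, and it reuses the data from the two new test cases.

```python
import re


def findall_filter(data, pattern, repl=r"\g<0>"):
    """Hypothetical standalone version of RegexFindall.filter.

    Expands ``repl`` once per non-overlapping match of ``pattern`` and
    joins the results with newlines (empty string if nothing matches).
    """
    return "\n".join(m.expand(repl) for m in re.finditer(pattern, data))


data = "Some-abc-things-def-on-ghi-this-line-and\nsome-jkl-more-mno-here"

# Mirrors the re_findall test case: the default repl keeps the whole match
print(findall_filter(data, "-[a-z][a-z][a-z]-"))

# Mirrors the re_findall_repl test case: reverse the three captured letters
print(findall_filter(data, "-([a-z])([a-z])([a-z])-", r"\3\2\1"))
```

When nothing matches, the join is over an empty sequence, so the output is an empty string rather than an error.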

26 changes: 26 additions & 0 deletions lib/urlwatch/tests/data/filter_tests.yaml
@@ -326,6 +326,32 @@ re_sub_multiline:
    One Line
    Another Line
re_findall:
  filter:
    - re.findall: '-[a-z][a-z][a-z]-'
  data: |-
    Some-abc-things-def-on-ghi-this-line-and
    some-jkl-more-mno-here
  expected_result: |-
    -abc-
    -def-
    -ghi-
    -jkl-
    -mno-
re_findall_repl:
  filter:
    - re.findall:
        pattern: '-([a-z])([a-z])([a-z])-'
        repl: '\3\2\1'
  data: |-
    Some-abc-things-def-on-ghi-this-line-and
    some-jkl-more-mno-here
  expected_result: |-
    cba
    fed
    ihg
    lkj
    onm
strip:
  filter: strip
  data: " The rose is red; \n\nthe violet's blue.\nSugar is sweet, \nand so are you. "
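The re_findall_repl case writes repl as '\3\2\1' in a single-quoted YAML scalar, where backslashes are kept literally. If the same template is written as a Python string literal (say, when experimenting with the filter in a REPL), it needs to be a raw string. In an ordinary literal, `\3` is the octal escape for a control character, not a group reference. A stdlib-only sketch, with illustrative variable names:

```python
import re

match = re.search("-([a-z])([a-z])([a-z])-", "Some-abc-things")

# Raw string: backslash-digit sequences reach Match.expand as group references
raw_template = r"\3\2\1"

# Ordinary literal: "\3" is the octal escape for chr(3), so no group references remain
plain_template = "\3\2\1"

print(match.expand(raw_template))        # cba
print(repr(match.expand(plain_template)))  # '\x03\x02\x01'
```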