Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multiline regex in text filtering (#2857) #2889

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

MoshiMoshi0
Copy link
Contributor

Fixes #2857.
Hopefully this does not break some obscure edge case.

else:
ignore_text.append(k.strip())

for line in content.splitlines(keepends=True):
i += 1
for r in ignore_regex_multiline:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what the approach is here so i dont have to manually compile it in my brain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It finds all matches over the whole content using finditer, without splitting it into lines.
For each match it calculates the start/end line number by using splitlines on part of the content before the match end, and the number of lines in the matched content.

Instead of accumulating output lines to an array, the line numbers are added to ignored_lines array which is then converted to a set to remove duplicates. This way multiple filtering methods can mark lines as ignored, and the output is constructed as the last step.

@@ -42,6 +41,46 @@ def test_strip_regex_text_func():
# Check line number reporting
stripped_content = html_tools.strip_ignore_text(test_content, ignore_lines, mode="line numbers")
assert stripped_content == [2, 5, 6, 7, 8, 10]

stripped_content = html_tools.strip_ignore_text(test_content, ['/but 1.+5 lines/s'])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bumping the tests!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature] Support multiline regex in text filtering
2 participants