Enriched NoContentFilter #33

Contributor

@otakumesi otakumesi commented Dec 18, 2023

I enriched NoContentFilter.
The DOM elements, classes, and ids to filter were chosen by analyzing the filter results for filter=LargeFreqParagrap and filter=null.

  • Split the DOM names variable into three variables: the DOM element list, the full-match classes and ids, and the partial-match classes and ids.
  • Rewrote the class and id matching logic to distinguish full matches from partial matches.
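
The split described above can be pictured with a minimal sketch. It is not the code from this PR: `Segment` is a hypothetical stand-in for the project's `PathSegment`, the list entries are made-up examples, and every name except `filteringPartialMatchClassOrIdNames` and `lowerClasses` is an assumption for illustration.

```scala
// Hypothetical stand-in for the project's PathSegment; only the fields used here.
final case class Segment(tag: String, id: Option[String], classes: Seq[String]) {
  val lowerClasses: Seq[String] = classes.map(_.toLowerCase)
  val lowerId: Option[String] = id.map(_.toLowerCase)
}

object NoContentSketch {
  // Illustrative entries only; the real lists are defined in the PR diff.
  val filteringDomNames: Set[String] = Set("header", "footer", "nav", "aside")
  val filteringFullMatchClassOrIdNames: Set[String] = Set("left-box", "blog-title-inner")
  val filteringPartialMatchClassOrIdNames: Seq[String] = Seq("sidebar", "breadcrumb", "widget")

  // Full match: the class or id must equal a listed name exactly (one hash lookup each).
  def fullMatch(seg: Segment): Boolean =
    seg.lowerClasses.exists(filteringFullMatchClassOrIdNames.contains) ||
      seg.lowerId.exists(filteringFullMatchClassOrIdNames.contains)

  // Partial match: a listed name only needs to occur somewhere inside the class or id.
  def partialMatch(seg: Segment): Boolean =
    filteringPartialMatchClassOrIdNames.exists { name =>
      seg.lowerClasses.exists(_.contains(name)) || seg.lowerId.exists(_.contains(name))
    }

  def shouldFilter(seg: Segment): Boolean =
    filteringDomNames.contains(seg.tag.toLowerCase) || fullMatch(seg) || partialMatch(seg)
}
```

In this sketch the full-match list is a `Set`, so its cost does not depend on the list size, while the partial-match list is scanned linearly.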

Collaborator

@eiennohito eiennohito left a comment

Are there plans for these lists to grow further? If so, it would probably be better to move to algorithms/data structures for partial-match selection whose cost does not depend linearly on the size of the word list (e.g. tries).

At the current list length, the current version is still feasible, though.
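
As one possible reading of the trie suggestion, here is a minimal sketch, not code from this repository. `SubstringTrie` and `TrieDemo` are hypothetical names and the patterns are made-up examples. The trie is walked from every start position of the input, so the cost per class name is bounded by its length times the longest pattern length, regardless of how many patterns are stored.

```scala
import scala.collection.mutable

// Hypothetical helper: answers "does any pattern occur as a substring of text?".
// Empty patterns are effectively ignored, unlike String.contains("").
final class SubstringTrie(patterns: Iterable[String]) {
  private final class Node {
    val children = mutable.HashMap.empty[Char, Node]
    var terminal = false
  }

  private val root = new Node

  // Insert every pattern character by character.
  patterns.foreach { p =>
    var node = root
    p.foreach { c => node = node.children.getOrElseUpdate(c, new Node) }
    node.terminal = true
  }

  /** True if any stored pattern occurs as a substring of `text`. */
  def containsAny(text: String): Boolean = {
    var start = 0
    var found = false
    while (!found && start < text.length) {
      var node: Option[Node] = Some(root)
      var i = start
      // Follow the trie for as long as the characters keep matching.
      while (!found && node.isDefined && i < text.length) {
        node = node.get.children.get(text.charAt(i))
        found = node.exists(_.terminal)
        i += 1
      }
      start += 1
    }
    found
  }
}

object TrieDemo {
  // Illustrative patterns only.
  val trie = new SubstringTrie(Seq("sidebar", "breadcrumb", "widget"))

  def partialMatchClasses(lowerClasses: Seq[String]): Boolean =
    lowerClasses.exists(trie.containsAny)
}
```

Adding failure links (the Aho-Corasick construction) would make the scan linear in the input length, at the price of more code; for short class names the plain trie walk may already be enough.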

Contributor Author

otakumesi commented Dec 19, 2023

> Are there plans for these lists to grow further? If so, it would probably be better to move to algorithms/data structures for partial-match selection whose cost does not depend linearly on the size of the word list (e.g. tries).
>
> At the current list length, the current version is still feasible, though.

The class/id-based partial-match list is probably close to complete.
If more entries need to be added, I will handle it.

}

def partialMatchClasses(css: PathSegment): Boolean = {
  filteringPartialMatchClassOrIdNames.exists(name => css.lowerClasses.exists(_.contains(name)))
Contributor Author

This code may be computationally expensive, but I can't think of a better approach.
Please suggest a better solution if you have one.
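
One low-effort option, sketched here under assumptions and not taken from the merged code, is to cache the verdict per distinct class name, since the same class names tend to recur across many elements and pages. `CachedPartialMatcher` is a hypothetical name. The cache is unbounded and not thread-safe, so it would need adjusting if the filter runs concurrently.

```scala
import scala.collection.mutable

// Hypothetical wrapper: remembers the result for each distinct lowercased class name,
// so the linear scan over `names` runs at most once per distinct class.
final class CachedPartialMatcher(names: Seq[String]) {
  private val cache = mutable.HashMap.empty[String, Boolean]

  private def matchesOne(cls: String): Boolean =
    cache.getOrElseUpdate(cls, names.exists(n => cls.contains(n)))

  // Same contract as the original check: true if any class contains any listed name.
  def partialMatchClasses(lowerClasses: Seq[String]): Boolean =
    lowerClasses.exists(matchesOne)
}
```

This keeps the existing matching semantics unchanged and only avoids repeated work; it does not remove the linear dependence on the list size for class names seen for the first time.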

@otakumesi otakumesi requested a review from eiennohito December 19, 2023 09:27
@eiennohito eiennohito merged commit fe38174 into WorksApplications:main Jan 18, 2024
2 checks passed