
Commit

fixes stopwords implementation...
guipenedo committed Jan 20, 2025
1 parent cd61018 commit f8e78f5
Showing 1 changed file with 1 addition and 1 deletion.
src/datatrove/pipeline/filters/gopher_quality_filter.py (2 changes: 1 addition & 1 deletion)
@@ -119,7 +119,7 @@ def filter(self, doc: Document) -> bool | tuple[bool, str]:
             return False, "gopher_below_alpha_threshold"

         # stop word filter
-        if self.min_stop_words and sum(w in self.stop_words for w in words) < self.min_stop_words:
+        if self.min_stop_words and len(self.stop_words.intersection(set(words))) < self.min_stop_words:
             return False, "gopher_enough_stop_words"

         return True
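The practical effect of the one-line change can be seen in a small standalone sketch. The old expression counts every occurrence of a stop word, duplicates included, so a document that just repeats "the" three times clears a threshold of 2. The new expression counts only distinct stop words, which matches the Gopher rule of requiring "at least two of" the listed words. The eight words below are the ones listed in the Gopher paper; the threshold of 2, the helper names, and the example sentence are hypothetical and chosen only for illustration, not taken from datatrove.

```python
# Stop words as listed in the Gopher paper's rule (assumed here; datatrove's
# configured list may differ).
STOP_WORDS = {"the", "be", "to", "of", "and", "that", "have", "with"}
MIN_STOP_WORDS = 2  # hypothetical threshold for this example


def count_occurrences(words, stop_words):
    """Old behaviour: every stop-word occurrence counts, duplicates included."""
    return sum(w in stop_words for w in words)


def count_distinct(words, stop_words):
    """New behaviour: only distinct stop words present in the text count."""
    return len(stop_words.intersection(set(words)))


def passes_stop_word_filter(words, stop_words=STOP_WORDS, min_stop_words=MIN_STOP_WORDS):
    # Hypothetical helper mirroring the patched condition: reject when fewer
    # than `min_stop_words` distinct stop words appear (a falsy threshold disables the check).
    return not (min_stop_words and count_distinct(words, stop_words) < min_stop_words)


words = "the the the cat sat".split()
print(count_occurrences(words, STOP_WORDS))  # 3: would have passed under the old check
print(count_distinct(words, STOP_WORDS))     # 1: only "the" is present
print(passes_stop_word_filter(words))        # False
```

Note that `set.intersection` requires `stop_words` to be a set (or frozenset), which the sketch assumes; converting `words` to a set first also removes duplicates before the count.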
