Replies: 7 comments 4 replies
-
Good idea! I'm truly surprised so many of them are already down. A sign that this business is not viable and that these will taper off? 🤞 I first thought about checking the domain expiration date, but we can probably expect a lot of them to fall into the hands of domain parking scum after they expire.

Checking a few of them, they are still listed in Google, so filtering them is even more useful: these are even lower-value results now that the links are dead, but they might stay in the index for several months. Some articles say broken links can stay in the Google search index for up to three months, and other engines might have an even longer retention period.

Ideally, we would monitor the number of hits for these searches and remove the filter once the result count drops below a certain threshold (10 results, to account for parking?). The Google search API has a 100 searches/day free tier, so we could get one data point a week for now, fewer as the corpus grows.
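A minimal sketch of what that monitoring could look like, assuming the Google Custom Search JSON API is used; the API key, search engine ID, threshold and example domain below are placeholders, not anything the project actually uses:

```python
import requests

API_KEY = "YOUR_API_KEY"           # placeholder
SEARCH_ENGINE_ID = "YOUR_CX_ID"    # placeholder: a Custom Search Engine configured to search the web
THRESHOLD = 10                     # below this, the domain is considered effectively gone from the index

def indexed_result_count(domain: str) -> int:
    """Return Google's estimated number of indexed results for a domain."""
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": API_KEY, "cx": SEARCH_ENGINE_ID, "q": f"site:{domain}"},
        timeout=30,
    )
    resp.raise_for_status()
    return int(resp.json()["searchInformation"]["totalResults"])

# Example: decide whether the filter rule for a (hypothetical) dead mirror can be dropped.
if indexed_result_count("dead-mirror.example") < THRESHOLD:
    print("Domain has mostly dropped out of the index; its filter rule could be removed.")
```

With the 100 queries/day free tier, one such query per tracked domain per week should stay well within quota while the list is small.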
-
I've seen some people using https://github.com/funilrys/PyFunceble to filter out NXDOMAIN entries from blacklists.
I've noticed that many of the clones or their mirrors have similar structure, at least visually. It's not improbable that they have some code in common, at least between clones from the same author. One idea would be to use the search engine to query for mirrors, extract the links, and pass these to a program that fetches each page and tests it against some pattern (maybe with XPaths). The program could disguise itself as one of the common crawlers to avoid being detected and blocked early on. Another idea could be to perform some mathemagics/statistics on the search results, in a similar way to email antispam filters.
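A rough sketch of that fetch-and-test idea, assuming `requests` and `lxml`; the XPath expression, user-agent string and URL are illustrative placeholders, not fingerprints of any actual clone:

```python
import requests
from lxml import html

# Present ourselves as a well-known crawler, on the assumption that clones are
# less likely to block crawlers than unknown scrapers.
CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Hypothetical structural fingerprint shared by clones from the same author.
CLONE_XPATH = "//div[@class='mirror-banner']"

def looks_like_clone(url: str) -> bool:
    """Fetch a candidate URL and test it against the structural pattern."""
    resp = requests.get(url, headers={"User-Agent": CRAWLER_UA}, timeout=30)
    resp.raise_for_status()
    tree = html.fromstring(resp.content)
    return len(tree.xpath(CLONE_XPATH)) > 0

# The candidate links would come from the search-engine query described above.
for url in ["https://suspect-mirror.example/some-page"]:
    print(url, "->", looks_like_clone(url))
```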
-
To help me remove down websites and review big lists of domains, I made a Tampermonkey UserScript helper. Here is the documentation: https://github.com/quenhus/uBlock-Origin-dev-filter/wiki/Helper-to-Review-a-List-of-Domains

Example of the UI: [screenshot not reproduced here]
-
@quenhus Wouldn't it be better to generate `clean_report.md` every month using workflows?
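A minimal sketch of what such a scheduled workflow could look like, assuming the report is produced by the existing `/src/clean_data/main.py` tool; the job layout, Python version, exact invocation and commit step are all guesses, not the project's actual setup:

```yaml
name: Monthly clean report

on:
  schedule:
    - cron: "0 0 1 * *"    # first day of every month, 00:00 UTC
  workflow_dispatch:        # also allow manual runs

permissions:
  contents: write           # needed so the job can commit the regenerated report

jobs:
  clean-report:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: python src/clean_data/main.py > clean_report.md   # assumed invocation
      - run: |
          git config user.name "github-actions"
          git config user.email "github-actions@users.noreply.github.com"
          git add clean_report.md
          git commit -m "chore: regenerate clean_report.md" || echo "Nothing to commit"
          git push
```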
-
I noticed a few minutes ago that three domains which were blocked before (…) have been removed.

Is it really wise to remove these domains from blocklists? The domain …

Here's an example of why I think these domains should be kept blocked: I'm using the libredirect extension on Firefox Android and, maybe because of a bug, the domain …
-
I don't want to keep useless block rules in the filter. I created a tool to help detect domains that are down (`/src/clean_data/main.py`). If anyone wants to help me with that :D

I think we can remove domains without an A/AAAA DNS response. However, it would be great to use a Google `site:DOMAIN linux` search to determine whether each domain is still used as a mirror. Don't forget to disable uBlock-Origin-dev-filter while doing so, otherwise you will only get empty responses.