nitter over IPFS or torrent #1188
Comments
That is very clever. Hopefully the "Instance has been rate limited." issue goes away once this is implemented. I've been unable to save posts through nitter for over a month. Even the instances that return 403 to bots kept showing the aforementioned error message after a human being visited 2-3 pages. From the look of it, this is a cache system shared across nitter instances. |
It's exactly a shared cache system, like a Wayback Machine just for tweets, running on IPFS. Yes, displaying outdated tweets is much better than displaying no tweets at all. If, instead of just reading the RSS feeds, a torrent file of the latest tweets were created, this could certainly alleviate the load on the RSS system, which demands a lot from the instances, and speed up viewing through seeds/peers. Few people tend to seed IPFS, but torrenting is easy and makes it easier to create new instances. The instances could actually be divided into 4 groups:
Using IPFS and torrent, it would no longer be necessary for all instances to have valid Twitter accounts to display messages. The more instances and users seed messages, the faster and stronger nitter will become 💪 |
That means when the user/WBM loads a nitter page, this happens:
Nothing negative has changed on the client side; the only difference is that it no longer re-requests and triggers its god-awful rate limit, and it is faster. Hopefully in the future we will also have such an archivable frontend system for bsky, notion and misskey, since the WBM's playback (viewing an archived page) isn't capable of saving complex JS pages and ends up with blank pages. |
Using IPFS/torrent is a good idea, but:
The current dilemma of nitter is that instances cannot handle a large number of requests. |
some sort of eventual consensus of x nitter instances maybe? |
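A minimal sketch of what "consensus of x instances" could look like, assuming each instance reports a SHA-256 hash of the tweet JSON it fetched. The function names (`content_hash`, `quorum_hash`) and the report format are made up for illustration and are not part of nitter.

```python
import hashlib
from collections import Counter
from typing import Dict, Optional


def content_hash(tweet_json: str) -> str:
    """Hash the serialized tweet so instances can compare what they fetched."""
    return hashlib.sha256(tweet_json.encode("utf-8")).hexdigest()


def quorum_hash(reports: Dict[str, str], x: int) -> Optional[str]:
    """Return the hash reported by at least x instances, or None.

    reports maps a (hypothetical) instance name to the hash it reported.
    """
    if not reports:
        return None
    digest, count = Counter(reports.values()).most_common(1)[0]
    return digest if count >= x else None


good = content_hash('{"id": 1, "text": "hello"}')
bad = content_hash('{"id": 1, "text": "tampered"}')
reports = {"instance-a": good, "instance-b": good, "instance-c": bad}
print(quorum_hash(reports, 2) == good)  # two instances agree
print(quorum_hash(reports, 3))          # no hash reaches x = 3
```

For this to mean anything, x would probably have to be more than half of the instances asked, otherwise a few colluding instances could reach the threshold on their own.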
through certified instances ex:
If the index server detects that one of the instances has shared corrupted tweets, its IPFS hashes are hidden
yes, but in this case the instance will be marked as untrusted by the IPFS index server
just check the tweets individually (like on fxtwitter, vxtwitter, etc.) if any suspicion of tampering arises |
yes! the index servers would act as a DNS server network for tweets by converting queries like |
I like this. This would effectively pool each Nitter instance's accounts and request budgets, without disadvantaging any instance operator or requiring them to sacrifice request budget to help others. |
Yes it's a good idea. The tampering issue isn't really a problem, plus it's not the biggest preoccupation right now. |
@libreddit, @iv-org, @Booteille Alternative frontend services in general should be doing this. I'd expect platforms to go even harder on the rate limits, either by further reducing the number of requests per time period, increasing the length of the cooldown (most rate limits, like GitHub's, are 1-4 minutes; Twitter's is a whopping full day), placing more stuff behind a login wall, and/or otherwise hindering users from using the site in any way (it's not just scrapers who are affected, but also ordinary users viewing content). If AFEs (alternative front ends) don't do this, they'll become useless. As long as enshittification continues on free sites, AFEs will become increasingly important. Making sites easier to use despite designed-in problems and annoyances started with browser extensions, since the internet is an open web and the browser is a software agent acting on behalf of the user rather than the website. Site owners want their websites to be like television, where they are the ones who decide how you experience them, especially when it comes to intrusive advertisements and disabling right-click, text selection and inspect element. And now, in the modern era where the internet is mostly used on mobile devices, a lot of our abilities to use the web are hindered. Seriously, we shouldn't need, or be pestered or nudged into having, another browser (with even fewer features) just to see text and images. I'm very thankful to whoever invented alternative frontend sites, as well as for the decentralized nature of @ipfs. This restores our ability to browse the net annoyance-free, which the mobile web especially has taken away. |
Great question. Welcome to the world of decentralized consensus. This is a hard problem to solve. But if solved, the potential gains are huge: Nitter working again with pretty much no rate limiting, while making low traffic to Twitter (and not getting banned). The tampering problem has always existed for Nitter and similar front-ends. You never know if the Nitter/Invidious/Libreddit/etc server is showing you unmodified data from the source, or fake/tampered data. We are just lucky to have so many honest and altruistic front-end operators who provide us with a great service for free.
It is really a good question. Creating a decentralized anti-tampering system is a lot of R&D work, and we should be asking what a reasonable amount of time/energy to invest in it is (compared to other efforts to keep Nitter running in some capacity). I don't see a good solution yet, but one thing is immediately clear. If Nitter instances start sharing data (to solve the problem of making too many requests to Twitter), a malicious instance will be able to do more damage than a single malicious instance can in the current architecture. In the current architecture, if one instance starts serving fake data, people can just stop using it, switch to an honest instance, and notify each other to avoid that bad instance. The damage is limited to that one instance, and "poisoning" the data on other well-known and trusted instances requires hacking them. In a distributed/decentralized architecture, one instance could poison the whole network if there is no good anti-tampering system in place. Therefore, anti-tampering is more important there than in the current architecture of independent individual instances. |
The authenticity problem has been discussed in #919 (comment) (Starting from this comment) and there is also an individual issue for it (#931). Quoting from @12joan (#919 (comment)):
It seems possible to cryptographically verify that some data originated from Twitter, untampered. But someone has to research the inner details of TLS for it. |
Yes, it is important to have network IPFS file indexes that function as anti-tamper systems. To illustrate, I used the DNS system, which translates website names into IPs (addresses of servers on the internet), as an example. Without checking the origin, a DNS server is extremely dangerous, which is why so many people use the same services (Google, Cloudflare, OpenDNS, etc.). Trust in the servers is essential in both cases; in DNS, for example, there are 13 root servers that feed thousands of other servers. Currently, the listing of public nitter instances already works in a more or less structured and centralized way: instance indexes already exist, and their quality is assessed according to availability, resources and response time. Simply add a system for evaluating the integrity of tweets and this list becomes a decentralized anti-tamper system. This system can be automatic (through sampling verification scripts) or collaborative (public API), as it is not possible to do it manually. Only instances linked to this integrity assessment network would be shown in the index; instances with tampered tweets could simply be left out. Thus the quality of a considerable group of instances can be guaranteed. |
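A rough sketch of the "sampling verification script" idea, assuming the index operator has some trusted way to fetch a reference copy of a tweet. Here that is the hypothetical `fetch_reference` callable; `sample_integrity`, `trusted_instances` and the threshold are likewise illustrative names, not an existing API.

```python
import random
from typing import Callable, Dict, List


def sample_integrity(instance_tweets: Dict[str, str],
                     fetch_reference: Callable[[str], str],
                     sample_size: int,
                     rng: random.Random) -> float:
    """Return the fraction of sampled tweets that match the reference copy."""
    ids = sorted(instance_tweets)
    if not ids:
        raise ValueError("instance shared no tweets to sample")
    sample = rng.sample(ids, min(sample_size, len(ids)))
    matches = sum(1 for tid in sample
                  if instance_tweets[tid] == fetch_reference(tid))
    return matches / len(sample)


def trusted_instances(scores: Dict[str, float],
                      threshold: float = 1.0) -> List[str]:
    """List only the instances whose integrity score meets the threshold."""
    return sorted(name for name, score in scores.items() if score >= threshold)


reference = {"1": "first tweet", "2": "second tweet", "3": "third tweet"}
shared = {
    "honest.example": dict(reference),
    "tampered.example": {**reference, "2": "something never said"},
}
rng = random.Random(0)
scores = {name: sample_integrity(tweets, reference.__getitem__, 3, rng)
          for name, tweets in shared.items()}
print(trusted_instances(scores))
```

With a sample smaller than the full set, a tampering instance can slip through a single round, so the check would have to be repeated over time.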
@devgaucho Good comparison between indexing and DNS. Both map a query to valid, untampered information. We don't want a nitter page containing tweets "by users" saying things they didn't actually say on Twitter. Worse, as an archivist, saving tweets that are false could perpetuate the false information. |
The problem is that the list is centralized and could go down or get poisoned. But yeah, if we had a decentralized system for data integrity verification, that would be huge. Ideally tweets could be verified at the end user's client.
It says it was deprecated in 2023 due to a vulnerability that allowed fake proofs to be created. But other projects in that organization are still active and the website looks promising https://tlsnotary.org/ |
@xaur We already have a decentralized system for verifying data integrity. The challenge is not to verify the integrity of individual messages, but to index the message lists on users' pages only once and share the tweet IDs with all other instances through IPFS. |
Verification of tweets should probably happen on a different layer. All we need to do is for each individual instance to mark/sign its contributions to the collective database with an instance-specific identifier. Then instances which read the database can analyze contributions for correctness and blacklist as necessary. There's not really any case where we want contributions from an instance that sometimes manipulates tweets. The current system of "if an instance supplies fake data, use another instance" is good enough. We can replicate this model by allowing people to ignore tweets contributed to the DB by the one or the other instance. For the case where someone spams the DB with fake tweets each pretending to come from different (nonexistent) instances, it's easy enough for the retrieving instance to have a whitelist of "real" instances whose signature or hash is verified the same way LetsEncrypt verifies you control a given web service: in order to claim that contribution X is originated by Once the list of public keys has been retrieved, they can be cached at the consumer. Rate-limiting the creation of spam instances is easy enough, since the consumer can see that |
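To make the "mark contributions per instance, then filter on the consumer side" idea concrete, here is a small sketch. It assumes signature checking is done by some crypto library and passed in as the `verify` callable; `Contribution`, `accept_contributions`, the allowlist and the ignore list are all hypothetical names.

```python
from typing import Callable, Iterable, List, NamedTuple, Set


class Contribution(NamedTuple):
    instance_id: str   # identifier of the instance that contributed the tweet
    tweet_id: str
    payload: str       # serialized tweet data
    signature: str     # signature over the payload (format is an assumption)


def accept_contributions(contributions: Iterable[Contribution],
                         allowlist: Set[str],
                         ignored: Set[str],
                         verify: Callable[[Contribution], bool]) -> List[Contribution]:
    """Keep contributions from known, non-ignored instances with a valid signature."""
    kept = []
    for c in contributions:
        if c.instance_id not in allowlist:
            continue  # unknown (possibly nonexistent) instance
        if c.instance_id in ignored:
            continue  # instance the consumer has chosen to distrust
        if not verify(c):
            continue  # signature does not match the claimed instance
        kept.append(c)
    return kept


# Stand-in for real signature verification, for demonstration only.
def fake_verify(c: Contribution) -> bool:
    return c.signature == "sig-" + c.instance_id


db = [
    Contribution("a.example", "1", "hello", "sig-a.example"),
    Contribution("b.example", "2", "edited", "sig-b.example"),
    Contribution("ghost.example", "3", "spam", "sig-ghost.example"),
    Contribution("a.example", "4", "forged", "sig-wrong"),
]
kept = accept_contributions(db, {"a.example", "b.example"}, {"b.example"}, fake_verify)
print([c.tweet_id for c in kept])
```

Ignoring an instance here has the same effect as "use another instance" today: its contributions simply disappear from what the consumer displays.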
OrbitDB may be suitable for this. |
Why not use ActivityPub? I think that protocol is perfect for Nitter's usecase. |
with the current limitations of private instances it could be interesting to think about sharing data between instances
it doesn't make sense for each instance to make dozens of requests for the same profile at all times
files with the latest tweets could be created and shared via IPFS with a centralized index that would function as an official tracker
the name used in the index of these files could be <username>/<unix epoch>/tweets.json, something similar to web.archive.org
this official tracker could have official mirrors protected by a hash system, like the secure apt mirrors on debian/ubuntu
so if there is a file in IPFS created in the last 60 seconds, the instance could get the data from IPFS instead of Twitter
doing it this way:
the main problem with IPFS is the lack of seeds, this does not seem to be a problem when we are dealing with an already consolidated network of decentralized instances
to increase the number of seeds, torrents could be used so that any visitor with a BitTorrent client can download the latest tweets and contribute to maintaining the network with seeds (archive.org has been doing this for years)
this would certainly help a lot to reduce the number of requests made by instances to Twitter
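a small sketch of the lookup described above, assuming the index is just a list of names in the <username>/<unix epoch>/tweets.json form and that 60 seconds is the freshness window. the helper names (`index_name`, `parse_index_name`, `freshest_snapshot`) are made up for illustration, and fetching the file from IPFS itself is left out:

```python
import time
from typing import Iterable, Optional, Tuple

MAX_AGE_SECONDS = 60  # freshness window from the proposal


def index_name(username: str, epoch: int) -> str:
    """Build the index entry name for a snapshot of a user's latest tweets."""
    return f"{username}/{epoch}/tweets.json"


def parse_index_name(name: str) -> Tuple[str, int]:
    """Split an index entry name back into (username, unix epoch)."""
    username, epoch, filename = name.split("/")
    if filename != "tweets.json":
        raise ValueError(f"unexpected index entry: {name}")
    return username, int(epoch)


def freshest_snapshot(names: Iterable[str], username: str,
                      now: Optional[int] = None) -> Optional[str]:
    """Return the newest entry for username created within the window, else None."""
    now = int(time.time()) if now is None else now
    best, best_epoch = None, -1
    for name in names:
        user, epoch = parse_index_name(name)
        if user != username:
            continue
        if 0 <= now - epoch <= MAX_AGE_SECONDS and epoch > best_epoch:
            best, best_epoch = name, epoch
    return best


index = [index_name("jack", 1000), index_name("jack", 1050), index_name("alice", 1055)]
print(freshest_snapshot(index, "jack", now=1100))  # a snapshot 50 seconds old exists
print(freshest_snapshot(index, "jack", now=1200))  # nothing fresh: fall back to Twitter
```

when the function returns None, the instance would request the profile from Twitter as it does today and publish a new snapshot for the others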