A busy torrust tracker, works really well #908
Hi @lestercrest99, I'm glad to have user feedback, especially good feedback :-). We are also running a demo of the Tracker and Index on a small virtual machine to ensure the admin experience is as good as possible.

Regarding your problem with the timeout, I think you are probably getting a timeout because of this line: https://github.com/torrust/torrust-tracker/blob/develop/src/servers/apis/routes.rs#L31

We have set a hardcoded value (5 seconds) for any request to the API. If that's OK for your case, we could extract that value into a configuration option so you can increase it. Maybe you prefer to receive a response after 15 seconds rather than the timeout (see the configuration sketch at the end of this comment).

Anyway, we should find out why the API does not have time to process the request. I guess it's because of the server's high load. I would guess the problem is that the trackers (UDP and HTTP) compete to acquire the lock to update the statistics object in memory. There can be only one writer or many readers for the statistics. Whenever the server receives a connect/announce/scrape request, the thread handling the request must lock the object for writing. I think that contention could be what starves the API's read.

### How to confirm the problem

We could enable debug logging to confirm it. If that's not feasible for you, I can do it in our demo instance. It's a small server, and we could have the same problem. In fact, it's restarted more or less once a week because it runs out of memory: it starts going really slow, and the Docker container healthcheck rule makes the container restart.

### How to solve the problem

#### Proposal 1: add a cache for stats

The API could load the stats not from the core source in the tracker but from a copy in the API context. We could copy the stats to the cache every X seconds (an interval with a config option) and make sure the thread that makes the copy can acquire the lock. A sketch of this idea follows below.

Pros:

Cons:
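Here is a minimal sketch of what such a cache could look like, assuming tokio's `RwLock`; the `TrackerMetrics` and `StatsCache` names and fields are made up for illustration:

```rust
use std::sync::Arc;
use std::time::Duration;

use tokio::sync::RwLock;

/// Hypothetical aggregate counters kept by the core tracker.
#[derive(Clone, Default)]
struct TrackerMetrics {
    udp_connections_handled: u64,
    udp_announces_handled: u64,
    http_announces_handled: u64,
}

/// Cache owned by the API context. API handlers read this copy and
/// never touch the core tracker's lock.
struct StatsCache {
    snapshot: RwLock<TrackerMetrics>,
}

/// Background task: on every tick it holds the core tracker's read
/// lock just long enough to clone the metrics, then updates the
/// API-side copy. `interval` would come from the new config option.
async fn refresh_stats_cache(
    core_metrics: Arc<RwLock<TrackerMetrics>>,
    cache: Arc<StatsCache>,
    interval: Duration,
) {
    let mut ticker = tokio::time::interval(interval);
    loop {
        ticker.tick().await;
        let copy = {
            let metrics = core_metrics.read().await;
            (*metrics).clone()
        };
        *cache.snapshot.write().await = copy;
    }
}
```

With this shape, the worst case for `/api/v1/stats` is serving data that is up to `interval` seconds old, but the endpoint never competes with the announce/scrape hot path for the core lock.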
Regarding this solution, I've been thinking that maybe we should isolate the API data from the core data. I mean, the API service (API context) should collect the data from the core tracker (core context) and make a copy of whatever the API needs. For example, API endpoints returning torrents' info currently get the data directly from the Torrent Repository. In that case, the API must also compete with the core tracker: intensive API use can affect the tracker's performance, and vice versa. Maybe this solution does not make sense under high-load pressure, because you introduce a new task: copying the data. Maybe the only solution is just to scale up the server. This problem could become even more complex in the future if we add more stats, like the ones we are discussing here.

#### Proposal 2: introduce events

The core tracker does not maintain stats. It only emits events. Events can be consumed externally, for example, by the API. The API could build a projection with the aggregate stats. A sketch follows below.
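A rough sketch of the event-based approach, assuming a tokio mpsc channel; the `TrackerEvent` and `StatsProjection` types are hypothetical:

```rust
use tokio::sync::mpsc;

/// Hypothetical events the core tracker would emit instead of
/// updating shared counters in place.
#[derive(Debug)]
enum TrackerEvent {
    UdpConnect,
    UdpAnnounce,
    HttpAnnounce,
    Scrape,
}

/// Aggregate stats built by the API as a projection of the event stream.
#[derive(Debug, Default)]
struct StatsProjection {
    connects: u64,
    announces: u64,
    scrapes: u64,
}

/// Consumer task owned by the API context. The core tracker's hot path
/// only sends an event (likely with `try_send`, so a full channel never
/// blocks request handling); all aggregation happens here, and no lock
/// is shared between the trackers and the API.
async fn run_stats_projection(mut rx: mpsc::Receiver<TrackerEvent>) {
    let mut stats = StatsProjection::default();
    while let Some(event) = rx.recv().await {
        match event {
            TrackerEvent::UdpConnect => stats.connects += 1,
            TrackerEvent::UdpAnnounce | TrackerEvent::HttpAnnounce => {
                stats.announces += 1;
            }
            TrackerEvent::Scrape => stats.scrapes += 1,
        }
    }
}
```

The API handler would still need a way to read the projection, for example by publishing it periodically on a `tokio::sync::watch` channel.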
It would be helpful to know whether losing the aggregate data at a given moment is a problem for you. I mean, maybe you absolutely need the stats every 15 minutes, and if the endpoint keeps timing out for 30 minutes, there is no way to fill in the data for the missing interval. I suppose that's not your problem; I don't think tracker stats are that important.

### Conclusion

For the time being, I would add logs to confirm the cause, and either increase the timeout or scale up the server, depending on what's better for you. I would also research whether there are other ways to avoid reader starvation, if that's the problem. For example: https://docs.rs/parking_lot/latest/parking_lot/type.RwLock.html

cc @da2ce7
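As a footnote, here is a hedged sketch of what extracting the hardcoded timeout into a configuration option could look like, assuming the API router is built with axum and tower-http; `HttpApiConfig` and `request_timeout_secs` are invented names for illustration:

```rust
use std::time::Duration;

use axum::{routing::get, Router};
use tower_http::timeout::TimeoutLayer;

/// Hypothetical API config section; today the 5-second value is
/// hardcoded in src/servers/apis/routes.rs.
struct HttpApiConfig {
    /// Maximum time the API may spend on a single request, in seconds.
    request_timeout_secs: u64,
}

fn router(config: &HttpApiConfig) -> Router {
    Router::new()
        .route("/api/v1/stats", get(|| async { "stats handler goes here" }))
        // The same kind of timeout middleware as today, but the duration
        // comes from the config instead of a hardcoded value.
        .layer(TimeoutLayer::new(Duration::from_secs(
            config.request_timeout_secs,
        )))
}
```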
My tracker is serving 22.72 billion requests every month and transferring 22.88 TB of data.
Uptime is 215 days.
Memory and CPU usage are very good: it doesn't exceed 400% CPU and 4 GB of memory under heavy load.
It can handle 9,000 req/s behind nginx.
api/v1/stats sometimes times out, though. I wonder why. Maybe I'll create an issue some time.
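A quick back-of-the-envelope check shows those two figures are consistent: averaged over a 30-day month, the monthly total works out to roughly the quoted per-second rate:

$$\frac{22.72 \times 10^{9}\ \text{requests}}{30 \times 86400\ \text{s}} \approx 8{,}766\ \text{requests/s}$$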