Investigate memory usage #147
Spike findings

Context, complexity of the Issue & recap on previous work

The memory leak was initially reported in January of this year. Its cause was particularly hard to find, since it could have been in any of the many components underlying Subgraph Radio.

From the start, the most likely culprit was the C bindings and how they interact with the Go and Rust code, since C is the most prone to memory leaks, whereas Go (with its garbage collector) and Rust (with its ownership and borrowing system) are memory safe. It's also worth noting that the leak is/was only happening on Linux (our Dockerfile uses the […]).

After thorough investigation, the leak was traced to a discv5 issue in the waku-rust-bindings. In simple terms, the Radio's memory usage spiked often, in batches of 20-30MB at a time, which ultimately caused it to rise to 10GB over a few days without ever going down.

The Waku team deployed a fix for the issue on their side, and straight away we saw a notable improvement: the big spikes were gone, and memory now reached 4.5GB after a few days. We saw a huge initial spike after the Radio started, and then usage settled into a fairly stable plateau. But again, the memory usage never went down, and despite it rising far more slowly than before, we still had 2 big problems: […]
Investigation

As mentioned above, it's very hard to trace memory issues and profile memory usage in a system that jumps through as many hoops as Subgraph Radio does. Three languages are involved in the different layers of the application (Rust, Go and C), and all of them interact with the host system in various ways. Ultimately, only heaptrack proved useful (I tried […]).

Profiling the Radio's memory usage

I profile the Radio's memory on a macOS machine that runs the Radio in a Docker container. For this, I just added heaptrack to the packages installed in the Dockerfile:

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
    wget \
    ...
    heaptrack \
    ...

To run heaptrack in the container, I point it to the Radio binary:

docker exec -it subgraph-radio heaptrack /usr/local/bin/subgraph-radio

I let the Radio run for 10-15 minutes and then stop the container, which leaves me with a heaptrack report file ([…]).

The culprit

Right after opening the heaptrack report, a few issues stand out. Raw sample from the report: […]
From this we can already see that the issue is coming from the […]. In the short time that the Radio ran (10-15 minutes), it called the […]. The Radio uses OpenSSL when it's running on Linux (Debian) environments because both the SDK and the Radio use […].

Solution

History of […]
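
The profiling steps described in this report (run the Radio binary under heaptrack inside the container, then collect the report) could be wrapped in a few small shell helpers. This is only a sketch: the function names are hypothetical, the report path is deliberately left as a parameter because it depends on where heaptrack writes its output, and `summarize_report` assumes `heaptrack_print` is installed wherever the report is analyzed.

```shell
#!/bin/sh
# Sketch of the profiling workflow. Function names are hypothetical helpers,
# not part of Subgraph Radio or heaptrack.

# Run a binary under heaptrack inside an already-running container.
profile_radio() {
  container="$1"   # e.g. subgraph-radio
  binary="$2"      # e.g. /usr/local/bin/subgraph-radio
  docker exec -it "${container}" heaptrack "${binary}"
}

# Copy a heaptrack report out of the container (works on stopped containers too).
# report_path is an assumption: pass the path heaptrack reported when it exited.
fetch_report() {
  container="$1"
  report_path="$2"
  dest_dir="$3"
  docker cp "${container}:${report_path}" "${dest_dir}"
}

# Print a text summary of a report file (assumes heaptrack_print is available).
summarize_report() {
  heaptrack_print "$1"
}
```

Keeping the report path as an argument avoids hard-coding a file name that varies with the process ID and the heaptrack version.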
Wow, this is an awesome report! Thanks for sharing your methodology, I didn't know about heaptrack.
Could heaptrack help for graphprotocol/indexer#41? 😁
hopefully! 🤞🏻
This Issue comes off the back of the memory leak issue that we tackled in the Subgraph Radio 1.0.2 release. While the initial issue has been fixed and the Radio's RAM usage is no longer increasing all the time, it still appears to be consuming too much RAM (~4.7GB), and it still looks like it's slowly growing over time (although much less evidently than before).
A basic instance of the Graphcast SDK only takes about 250MB of RAM when running continuously, so the Radio's usage is roughly 19x that (4.7GB / 250MB ≈ 18.8). We need to understand why it's using so much RAM and whether we can improve this somehow.
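
To check whether memory is still slowly growing, one option is to log the process's resident set size at a fixed interval and compare the readings over a few days. The sketch below is an assumption, not something taken from the Radio's tooling: it is Linux-only (it reads `VmRSS` from `/proc`), and the function names are hypothetical.

```shell
#!/bin/sh
# Print the resident set size (VmRSS, in kB) of a process. Linux only.
rss_kb() {
  awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Log "<unix timestamp> <rss in kB>" for a PID, `count` times,
# waiting `interval` seconds between samples.
sample_rss() {
  pid="$1"
  interval="$2"
  count="$3"
  i=0
  while [ "$i" -lt "$count" ]; do
    echo "$(date +%s) $(rss_kb "$pid")"
    i=$((i + 1))
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}
```

For example, `sample_rss <pid> 60 1440 > rss.log` would record one sample per minute for a day; a steadily rising second column would confirm the slow growth described above.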