-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving DirectFileStore for performance and scalability #281
Comments
I like this idea to make it easy to port from one machine to another for analysis etc. If we're open to discussing adjustments to the binary format, would it be useful to consider a field in the header of the file to specify some sort of version of the binary format to allow for easier upgrade? I think it might make it easier to evolve the schema of the binary format over time and reduce the risk of upgrades. |
Yes!!! So there are two reasons I haven't done this... One is "fear": maybe that'll actually be slower / worse in some way for some case or someone. It would be worse if you have a lot of thread contention, basically, which was the logic behind "file per metric": if 2 threads are incrementing at the same time, if they're not the same metric, they don't block on each other because it's separate files. In retrospect, I consider this was a mistake. But this fear is why I tried making it configurable in the PR you sent, which didn't quite work. Having interchangeable stores kinda solves that, though, so i'm not too fussed, if we can convince ourselves that, for most cases, the default (file per process) is better, and if someone has that high thread contention, they can switch to the "file per metric" store, which would be in a separate accessory gem, for example. The second reason is honestly I haven't had much time to work on this, and this is a not insubstantial change, so I've been putting it off (and feel appropriately bad about that!). So i'm really happy to hear you're interested in helping with it!
At the time I refactored the stores, I did see that you were doing "file per metric type", but I didn't quite understand why (or if I did, I no longer remember). Can you tell me a bit more about why this is useful / advantageous?
Again, if this is also a separate store, particularly in a separate gem (i'll expand on why) I would love to see this! The main challenge I had with making the exporting faster was that I was letting purity of the interfaces get in the way. It's been a while, so I may be getting details wrong here, but it's something like the exporter does a "foreach metric, get the data", and so for each of the metrics you're going to the files and reading them and processing them, and then for the next metric you do that again. Which reads very elegant in the code, but the performance is terrible. You'd want to read all the files, munch them all at once, and dump back a giant result with all the numbers for all the metrics, and then the exporter can deal with that. That'd be faster, and from what you're saying, I gather you could do that even faster in C/Rust. The "problem" is the way the code is currently structured doesn't play nice with that; we need to refactor it, and probably accept a "elegance/performance" tradeoff (which i'm happy to do) I'd say let's have a sketch of what your idea is before sinking a lot of time and effort in that, and once we know roughly the direction, that'd be awesome! The reason i'm saying this Rust solution should be in a separate gem, however, is we want the main gem to be pure-Ruby, because we want to support platforms like JRuby, which can't run native code gems, and which don't need these solutions (because they don't fork). So ideally the Rust solution would be a separate gem one includes and points at in configuration.
I have no opinion on that. I don't intuitively understand why that's better (and i'd like to know just for my own learning), but i'm happy either way :)
We absolutely are. The binary format is what I found in a proof of concept someone (Julius?) had done and I blatantly stole it. I have no attachment to it, and as per the answer below, it doesn't matter what it is :)
This has been explicitly not considered. These files are intended to be ephemeral. You're supposed to delete all of them every time you restart your web server. There shouldn't be any situations where one version of the code needs to read files generated by another version. So we didn't worry about these types of headers / migrations / versions / etc, but also, we almost want to actively discourage that kind of thing. There shouldn't be a reason for doing that :)
Ok, fine you found a reason ;-) Tbh, i'm skeptical, but i'm not going to particularly stand in the way, assuming "no serious trade-offs". I would worry about introducing code complexity (having more involved code that can read different versions of files), or any sort of performance hit, I don't think pretty much any tradeoff is worth this. But again, i'm open to changing my mind on this one! Thank you for this, i'm really excited to see this move forward. I've had it in my "wish i had time to do" list for years now! |
IIUC I think you're suggesting creating a new gem to offer a custom data store that can store and process the data however it wants as long as it implements the Store interface. Does that sound about right? |
@stanhu One thing I've been thinking about is how we could make a multi-process shared file store that is consistent across different client library implementations. The main reason for this is we've been working on a new multi-process worker controller to replace things like unicorn/puma/gunicorn. Kinda like Phusion Passenger, but our controller is written in Go. I'm in the process of getting approval to open source this new worker controller. One feature I've been thinking about is that we implement the metrics gathering read path in Go so we can take Ruby/Python/Node out of the loop. |
Exactly that!
Would you be able to call this Go "web server" from the Ruby/Python/Node code? If so, you can do the incrementing / observing there too, and you don't need files anymore... |
Yes, that's possible, but I think it would be less efficient all-around because each increment would require network packets. We'd be back to the inefficiencies of statsd. 😁
I'm sure it will catch on once I show the numbers on how much performance improvement we got. Although we're actually replacing Einhorn, not Unicorn. Also this is not for GitLab/Ruby. It's actually a Python service. My only request is that we make a "Shared filesystem metrics format" that is portable across implementations. This way we could share functionality across client_ruby, client_python, etc. Same thing goes with the idea of "works with pure ruby" vs "Ruby with a Rust plugin". |
@SuperQ As I mentioned earlier, we actually tried this approach in our staging environments (see parsing code). As @mttkay said in https://gitlab.com/groups/gitlab-org/-/epics/9433#note_1271623080:
That's why we're looking at Rust now, and we're quite encouraged that our benchmarks show that the Rust implementation for the read path is already faster than the C implementation for a representative test case with thousands of metrics:
We're doing some work to validate that Rust memory usage is also in line, but we don't really expect that many surprises since there's no garbage collection.
I like that idea. That suggests to me that all the metric data--including docstrings, metric types--should be serialized to a file so that any client can read and aggregate the data just from the files themselves.
@dmagliola That makes sense. I wonder if we want to follow the model of
That goes to my previous point. I believe with Going forward, I'd propose:
If I were to propose a new binary format, I might suggest something like: < ----- 4 bytes ----- > | < ----- 4 bytes ----- > | Total used bytes | | Total length of key 1 | Key 1 metric type | | Key 1 docstring length | Key 1 docstring | | Key 1 docstring (continued with padding) | | Key 1 name length | Key 1 name | | Key 1 name (continued with padding) | | Key 1 value (double) | | Total length of key 2 | Key 2 metric type | ... Some notes:
This is serialized as (ignore the redundant metric name): ["cache_misses_total", "cache_misses_total", ["method","feature_category"], ["GET", "build_artifacts"]] In
The second approach is 3x slower for decoding than the JSON approach in Ruby. I think we stuck with JSON in the C implementation because we had a fast JSON parser, and parsing CGI strings would be more complicated. I'd favor sticking with JSON for performance reasons. What do you think? |
@slai11 pointed out that we may also have to store the aggregation mode for gauges (sum/max/min). |
That would be my preference too, at least from a convenience point of view. Currently, we have to parse the file name (which contains the aggregation mode) to understand which aggregation is required, and it also leads to asymmetric naming because the aggregation mode is meaningless for anything but gauges (a counter you usually want to OTOH, it also means we should account for this to be optional in the binary format since we would not want to waste storage for this in a counter file, when we know counters are always summed up? Or do we want to support all aggregation modes on all metric types? Maybe there are use cases for this, but I believe this is currently unsupported in |
Right now we support all aggregation modes in all metric types. The one exception is we only support "most_recent" for gauges. I don't remember why but it's gotta be either in a code comment or a commit message |
Makes sense. Another reason not to make aggregation part of the binary format is that we probably wouldn't want to store it redundantly as each entry denotes a sample, not a metric, but we only need to store aggregation mode once per metric. |
To clarify, I'm not saying not to make aggregation part of the binary format. I'm happy to do that if we find a good way. I'm just suggesting that we don't "force" sum for non-gauges |
w.r.t Stan's proposal, I was thinking we could store both the metric type (0 for counter, 1 for gauge, and 2 for histograms would only take up 2 bits) and aggregation mode within the 32 bits allocated for metric type information? |
I may be way off here, but I'd measure before going too hard on how many bits/bytes this takes... Compared to the space taken up by the labels, this is probably not going to make a difference to performance... Which by the way reminds me of something... It was mentioned above in this thread we should use JSON instead of CGI encoding for the labels... I'm pretty sure I benchmarked that and CGI encoding was faster in Ruby. We should retest that... It's been a couple years, maybe JSON is much faster now... But I wouldn't switch to JSON unless benchmarks show it's not slower |
I think this might make a good topic for the Prometheus Dev summit that's on 2023-03-23. This discussion may also interest client_python. |
Out of curiosity, I see that
|
Hey, first off: sorry for only chiming in now. I was finishing off and recording a conference talk when the bulk of this discussion happened, and I've only just got to it on my to-do list. I'd like to address everything that's come up, but that's a lot, so I'm going to break my response into sections so it doesn't get too unwieldy. Reducing number of open filesRight, let's start with the least contentious part of this. I completely agree with us building a store that does file-per-process rather than file-per-metric-per-process. The original idea of minimising lock contention was a reasonable one to pursue, but has demonstrably not worked well in practice for many of our users. Whatever else we do, I think we should definitely build a store that does this, and perhaps even make it the default to avoid people running into a sharp edge. Changing the binary formatIn the general case, I'm entirely in favour of doing this if it helps us meet a goal. The files produced and consumed by If we need to do something which implies making a change to this format, or if a change to this format means a significant performance win, then let's do it. That said... Switching to little endianI'm fine with us switching to little endian, but I'm not sure it buys us anything beyond a little internal consistency. I would be in favour of this if the format weren't so internal and the files weren't so transient, but they are and I'd like to keep them that way. Introducing a header for versioningI'm pretty strongly opposed to this, and it was a very deliberate choice not to have something like this in the current implementation. These files aren't meant for external consumption, and are meant to be thrown away once the server process is finished. I don't want to do anything that encourages people to treat them any other way, and I don't want the complexity of handling multiple versions of these files in the store code. Changing the store interfaceI'll echo @dmagliola here and say that I'm totally fine with us revisiting the performance vs interface cleanliness trade-off. Having a "just get me all the metrics in one go so that I can serialise them" method feels pretty inevitable as a performance win. Where does metric type information live?I feel fairly neutral about this one. There hasn't been a strong reason to externalise this information into the files themselves (either in the file name or as part of the storage format within the file), but I can believe it would be a useful optimisation to have this information alongside the metrics for that big "aggregate everything together" method. I'm totally down to introduce this if it gets us a performance win. I don't mind too much whether we do that via making the storage file-per-process-per-metric-type or by encoding it in the metric data. Rust extensionThis has been on my list of side-projects since forever, and I didn't exactly get far before getting distracted by something else. I'm extremely keen to have a really performant version of the data store we end up creating in pure Ruby. I do think we should have a pure Ruby version that makes all of the choices we've talked about (e.g. file-per-process), and then have an opt-in high-performance version that uses Rust to do the heavy lifting. In
|
+1 for the most part. The only reason I would want this is to support external reading by an external aggregator. For example, having a Go server that does this to avoid having Ruby do any work in doing the aggregations. But if there's a rust HTTP |
At GitLab, we've been running prometheus-client-mmap to record and report Prometheus metrics. It's an early fork of this project and uses a C extension with
mmap()
to optimize both the read and write paths.We'd like to switch to
DirectFileStore
, but we see a number of issues:DirectFileStore
recording one metric per file is going to work at scale. With thousands of metrics, that's a lot of file descriptors that may have to be opened to read and write metrics. Inprometheus-client-mmap
we use one file metric per metric type (counter, histogram, gauge). I see there was an attempt to use one file per process in One file per process instead of per metric/process for DirectFileStore [DOESNT WORK] #161.DirectFileStore
creates too many files for long running applications #143, slow response on metrics http endpoint #194): Aggregating thousands of metrics in Ruby is pretty CPU and memory intensive. We found that this was even difficult to do efficiently with a Go exporter that reads theprometheus-client-mmap
metrics since garbage collection becomes an issue due to memory allocations. We have a prototype Rust port that handles reading of metrics that is faster than the C implementation.prometheus-client-mmap
can be aggregated by a separate process since the metric type is in the filename. WithDirectFileStore
, I believe the types are registered in the registry, so there's no way an outside process can determine the metric type just by scanning the.bin
files.I would like to propose a path forward:
DirectFileStore
to use file per process, or one file per metric type. The latter is simpler from our experience. If we do the former, we'd probably want to encode the metric type in the file.DirectFileStore
. We can easily adapt the work in our Rust port forDirectFileStore
.A side point: I believe the metrics are always written in the native endian format. I propose that we enforce little endian (which is what x86 and ARM64 use) to avoid cross-platform confusion.
@dmagliola What do you think?
The text was updated successfully, but these errors were encountered: