Split up content databases (kvstores) per network #1086
I thought about it a bit and I like option
Disadvantages are:
I don't think any of those reasons apply only to having a separate database, do they? At first sight, it looks like that could also be abstracted away with a separate object for each network (call it ContentStore, or so). Good point on how pruning across different networks does, in general, get more complicated when a specific network is added. Anyway, these are exactly the things that I think need to be sketched out better before we make too many changes.
FYI, similar (albeit not the same) technical question: https://github.com/status-im/nimbus-eth2/blob/039bece9175104b5c87a8c2ff6b1eafae731b05e/beacon_chain/validators/slashing_protection_v2.nim#L119
It may be more complex to have one global radius, as different networks have different sizes, so to adjust the global radius we would need to take that into account somehow so that one type of data does not monopolize the node's storage. With a radius per network we keep the same size-proportional logic everywhere.
Having a db per network probably incurs the fewest changes for now (as we have one working network); it is just a question of initialising the db in the history network constructor instead of in fluffy's main. With multiple kvstores we would also need to update the queries and the calculation of db sizes. Configs would need to be updated in both approaches, as in both of them the user should configure different sizes for different networks (at least I think so). I wonder, maybe we should delay making the decision until we have another network, and some endpoint which gets data from both of them, then implement a proof of concept for both approaches and see which one we like more?
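To make the "same size-proportional logic everywhere" idea concrete, here is a minimal Python sketch of the radius-per-network option. The class and field names, the capacity numbers, and the prune strategy are illustrative assumptions, not Fluffy's actual implementation; the point is only that each sub-network reuses the same capacity/prune/radius logic on its own slice of storage, so a large network cannot crowd out a small one.

```python
# Hypothetical sketch of "radius per network": every sub-network gets its own
# store with the same size-proportional logic (capacity, prune furthest,
# derive radius from what is kept). Names and numbers are assumptions.
from dataclasses import dataclass, field

UINT256_MAX = 2**256 - 1  # full content id space

@dataclass
class ContentStore:
    capacity_bytes: int
    radius: int = UINT256_MAX
    # (distance to local node id, payload size) per stored item
    items: list[tuple[int, int]] = field(default_factory=list)

    def used_bytes(self) -> int:
        return sum(size for _, size in self.items)

    def put(self, distance: int, size: int) -> None:
        self.items.append((distance, size))
        if self.used_bytes() <= self.capacity_bytes:
            return
        # Over capacity: drop the furthest content until we fit again and
        # shrink the radius to the furthest item we still keep.
        self.items.sort(key=lambda item: item[0])
        while self.items and self.used_bytes() > self.capacity_bytes:
            self.items.pop()
        if self.items:
            self.radius = self.items[-1][0]

# One store per network: pruning the (much larger) history network never
# touches state content, and each network keeps its own radius.
stores = {
    "history": ContentStore(capacity_bytes=900_000_000),
    "state": ContentStore(capacity_bytes=100_000_000),
}
```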
Different networks will indeed have different sizes, but I think that is fine. A node's storage ratios per network would ideally represent the networks' total storage ratios (e.g. if one network holds 80% of all content in the id space and another 20%, a node keeping everything within a single radius would naturally end up with roughly that same 80/20 split). This shouldn't be an issue for the global radius as long as content on each network is evenly distributed over the id space (which it should be).
Sure, but with different databases you also add some complexity, unless the idea is to just split the total storage evenly over the number of networks, which would probably not be correct, see the comment above.
Sure, we can hold off on this. I actually want to add a small second database for the accumulator data, as we can't access this data over the network yet, and I don't want it to be pruned along with the other data. (This will be behind an optional flag at runtime.)
Related discussion in Portal discord raised some interesting points:
So while dealing with the removal of Headers without a proof and the removal of the Union altogether (ethereum/portal-network-specs#341 and ethereum/portal-network-specs#362), it became clear that pruning & migrating the old data, while possible, is not ideal in the current situation. See the solution at #3019 and #3053. There are several possibilities to improve this (some of which are already mentioned here), but some come with drawbacks due to added complexity, e.g. in pruning. It would be good to think of solutions that add value but don't make other parts too complex.
For this task I don't see any good reason to have more than one database. Splitting the data can be done by putting the content for each sub-network into separate tables. As a general rule, content that never needs to be queried together should be put into separate tables, so that we don't need to rely on indexes as much to get decent query performance on larger databases.

We also shouldn't need to decode the content to determine what type of content is stored. To solve this we should start adding some metadata fields to the tables, such as a type field. We can also make this type part of the index to improve the performance of content lookups. Creating a separate table per content type might be overkill and would unnecessarily complicate the codebase.

Pruning can be done per sub-network and the storage capacity assigned per sub-network. Perhaps the user just sees this as a single storage capacity that is divided among the sub-networks automatically.
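For illustration, here is a minimal SQLite sketch of that layout (written in Python; the table names, content type values and the 80/20 capacity split are assumptions, not Fluffy's actual schema): one database file, a separate table per sub-network, a content type metadata column that is part of an index, and pruning done per sub-network against its automatically assigned share of a single user-configured capacity.

```python
# Hypothetical sketch: one SQLite file, one table per Portal sub-network,
# a content_type metadata column (so payloads never need to be decoded to
# know what they are) and per-sub-network pruning. Not Fluffy's real schema.
import sqlite3

TOTAL_CAPACITY_BYTES = 1_000_000_000
SUBNETWORKS = {"history": 0.8, "state": 0.2}  # assumed automatic split
CAPACITY = {net: int(TOTAL_CAPACITY_BYTES * share)
            for net, share in SUBNETWORKS.items()}

def open_db(path: str) -> sqlite3.Connection:
    db = sqlite3.connect(path)
    for net in SUBNETWORKS:
        db.execute(f"""
            CREATE TABLE IF NOT EXISTS {net}_content (
                content_id   BLOB PRIMARY KEY,   -- 32-byte content id
                content_type INTEGER NOT NULL,   -- metadata field, e.g. header/body/receipts
                distance     BLOB NOT NULL,      -- distance to local node id, precomputed
                payload      BLOB NOT NULL
            )""")
        # Putting the type in an index speeds up typed lookups on large tables.
        db.execute(f"CREATE INDEX IF NOT EXISTS {net}_type_idx "
                   f"ON {net}_content (content_type, content_id)")
    return db

def used_bytes(db: sqlite3.Connection, net: str) -> int:
    return db.execute(
        f"SELECT COALESCE(SUM(LENGTH(payload)), 0) FROM {net}_content").fetchone()[0]

def prune(db: sqlite3.Connection, net: str) -> None:
    """Delete the furthest-away content of one sub-network until it is back
    under its assigned capacity, without touching the other tables."""
    while used_bytes(db, net) > CAPACITY[net]:
        db.execute(f"""
            DELETE FROM {net}_content WHERE content_id IN (
                SELECT content_id FROM {net}_content
                ORDER BY distance DESC LIMIT 50)""")
    db.commit()
```

Whether the per-sub-network capacities come from a fixed split like the one assumed above or from something adaptive is exactly the part that would still need the most thought.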
Currently the quickest, simplest approach is taken and all is stored in one table / kvstore. However, this will not be scalable once we are dealing with lots of data.
This issue is about how to split this storage, basically this comment: https://github.com/status-im/nimbus-eth1/blob/master/fluffy/content_db.nim#L24
I think approach 1 mentioned there is probably the most straightforward path to take, but some investigation to better understand the implications of the other approaches is allowed ;-).