Add request limit cache #1275
Conversation
The signing stress test is great ⭐
And the cache is definitely an improvement, since this stuff does not need to be stored on disk.
But I would have done it a bit differently: rather than a generalised cache, I would have separate data structures for the different things we need to store. So for the request limit, I would have the property of `AppState` be:

`request_limit: Arc<RwLock<HashMap<VerifyingKey, RequestLimitStorage>>>`

and have similar properties for the other things we need to store (we could put them in a sub-struct if `AppState` is starting to get messy).
The advantages:
- Less time spent waiting for a lock to become free: with everything behind a single `RwLock`, the chances of it being locked are much higher. I think putting them all behind the same `Arc` is fine.
- Faster, because you don't need to serialize and deserialize values.
- The type system guarantees that entries are well formed - it is impossible to have a failed lookup or a corrupt value.
```rust
impl AppState {
    /// Setup AppState with given secret keys
    pub fn new(
        configuration: Configuration,
        kv_store: KvManager,
        pair: sr25519::Pair,
        x25519_secret: StaticSecret,
        cache: Cache,
```
nit: It looks like we always call this constructor with the cache being empty - so it would be slightly less code to create a new cache here rather than have it be passed in.
Ya, but if we want to test some pre-state it would be easier... however, I did create the unsafe calls for that.
I would also initialize it to `Default` internally instead of passing it in. There are methods to write into the cache which can be used for testing.
```rust
self.clear_poisioned_chache();
let cache =
    self.cache.read().map_err(|_| anyhow!("Error getting read read_from_cache lock"))?;
Ok(cache[key].clone())
```
We should definitely use `.get` here and return an `Option`, as this will panic if the key does not exist.
Also, it gives us a way to check if the key exists and get it if it does in one go. Otherwise we end up locking the `RwLock` once for the existence check, unlocking it, then immediately locking again to get the value.
```
@@ -344,6 +352,46 @@ impl AppState {
    ))
}

pub fn write_to_cache(&self, key: String, value: Vec<u8>) -> anyhow::Result<()> {
```
nit: I'm not super keen on using anyhow here, as strictly speaking this is library code. Probably it doesn't really matter, but in the best case we'd have a `CacheError`.
I'm ok with that. I saw your method; I think I was thinking of a method with multiple errors and then never ended up writing one.
Hmmmm, I see the benefit. The issue being it will be writing a lot more code: instead of a generalized get call or write call, it becomes multiple functions for each storage unit. I'm on the fence. Actually, don't respond here, let's talk about it on #1282.
Co-authored-by: peg <[email protected]>
I'm with Peg here, we should be leveraging the type system here more + also defining the cache a bit better (but I guess that's being discussed in #1282).
```
@@ -344,6 +352,46 @@ impl AppState {
    ))
}

pub fn write_to_cache(&self, key: String, value: Vec<u8>) -> anyhow::Result<()> {
```
All these public methods should be documented
```rust
    self.cache.clear_poison()
}
}
// TODO delete from cache
```
Leftover TODO
```
@@ -344,6 +352,46 @@ impl AppState {
    ))
}

pub fn write_to_cache(&self, key: String, value: Vec<u8>) -> anyhow::Result<()> {
    self.clear_poisioned_cache();
```
Not sure we should just be bypassing this check every time we try and write.
For example, let's say there's a thread that's updating the request limit and it keeps panicking when trying to update the limit. By bypassing this we could end up with inconsistent data here, potentially leading to the limit just being ignored.
It would only be off for one block, which would be better than a poisoned cache forever making it unable to sign.
```rust
/// A global cache type for the TSS
pub type Cache = Arc<RwLock<HashMap<String, Vec<u8>>>>;
```
Kinda strange to have this between the struct definition and implementation block
```rust
/// A global cache type for the TSS
pub type Cache = Arc<RwLock<HashMap<String, Vec<u8>>>>;
```
By using a `Vec<u8>` value we lose type guarantees here for no real benefit. We know exactly what type(s) we want to store, so imo we should just use them directly.
```diff
@@ -602,7 +601,7 @@ pub async fn check_for_key(account: &str, kv: &KvManager) -> Result<bool, UserEr
 /// Checks the request limit
 pub async fn request_limit_check(
     rpc: &LegacyRpcMethods<EntropyConfig>,
-    kv_store: &KvManager,
+    app_state: &AppState,
```
I'd try and limit the scope of what we pass into here. We don't need the whole of `AppState`, just the cache.
Currently all helper functions are in app state; #1282 should handle that by creating a sub impl of cache.
```
@@ -429,6 +440,84 @@ async fn signature_request_with_derived_account_works() {
    clean_tests();
}

#[tokio::test]
#[serial]
async fn signature_request_overload() {
```
What are the actual assertions needed for this test? I don't see anything (either success or failure)
@dvdplm suggested that we can probably get away with a non-poisoning lock here instead of explicitly ignoring the poisoned lock.

From the docs: "No poisoning, the lock is released normally on panic." Which doesn't help the initial concern of the data not being updated... which again is mostly being off by one, which isn't a big deal; however, locking the sign flow is. Anyways, I can look into it more on Monday and make an issue.

One of the benefits here is that we lean into the fact that we're not offering any guarantees around a non-poisoned lock state, as opposed to hiding that from the caller by clearing the poisoned state internally.

Right, but that means we either do this for all the RwLocks here or just this one? I'm writing this one. I know that a poisoned lock should not lock the request limit; I'm not sure about the other RwLocks in app_state. Would you rather be explicit about this or have two different RwLock libraries?

Not sure - depends on the other data. Let's leave it as is for now, and make a concrete decision in #1282?

Lol, pretty much what I was thinking. Also, to be honest, talking about this mutex problem is fun; I'd like to bring it up in our next call... just cuz, idk, it is fun.
Originally this started off as writing a threaded test for multiple txs. I found out that in the `increment_or_wipe_request_limit` function there was a `reserve_key` kvdb call that did not handle the overloading txs well and caused failures. This was solved by adding a cache and moving the request limit there.

The cache probably should have been used before, and I will open an issue about moving things to it that were stored in the kvdb but shouldn't have been. #1280

I will also open an issue to create a deletion strategy for the request limit, as it is O(n) (this problem existed with the kvdb too). #1281

Also, maybe we should merge the other app_state hashmaps into the cache. #1282