Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Confusing behavior of caching mechanism in 4.1.2 for parametrized selectors #544

Open
ViktorQvarfordt opened this issue Nov 8, 2021 · 6 comments
Labels

Comments

@ViktorQvarfordt
Copy link

ViktorQvarfordt commented Nov 8, 2021

The call signature of equalityCheck is hard to use and there are excessive calls to equalityCheck.

In the following example, think of arg1 as state and arg2 as some parameter, I've just kept it general for brevity.

import { createSelectorCreator, defaultMemoize } from 'reselect'

const createFooSelector = createSelectorCreator(defaultMemoize, {
  // How do we know if `a` and `b` represent arg1 or arg2? I think we should get
  // all args at once in justs one call, so that both `a` and `b` are on the form [arg1, arg2].
  equalityCheck: (a, b) => {
    const equal = a === b
    console.log('equalityCheck', { a, b, equal })
    return equal
  },
  maxSize: 1000
})

const selectFoo = createFooSelector(
  [
    (arg1, arg2) => arg1,
    (arg1, arg2) => arg2,
  ],
  (arg1, arg2) => {
    console.log('selectFoo', { arg1, arg2 })
    return 0
  }
)

selectFoo('arg1', 'arg2:1')
selectFoo('arg1', 'arg2:2')
selectFoo('arg1', 'arg2:3')
console.log('\nThis will not trigger recompute (but excessive equalityChecks):')
selectFoo('arg1', 'arg2:1')

Outputs

selectFoo { arg1: 'arg1', arg2: 'arg2:1' }
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:2', b: 'arg2:1', equal: false }
selectFoo { arg1: 'arg1', arg2: 'arg2:2' }
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:2', b: 'arg2:1', equal: false }
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:3', b: 'arg2:2', equal: false }
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:3', b: 'arg2:1', equal: false }
selectFoo { arg1: 'arg1', arg2: 'arg2:3' }
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:3', b: 'arg2:2', equal: false }
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:3', b: 'arg2:1', equal: false }

This will not trigger recompute (but excessive equalityChecks):
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:1', b: 'arg2:3', equal: false }
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:1', b: 'arg2:2', equal: false }
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:1', b: 'arg2:1', equal: true }

It looks like the caching mechanism is doing a search. It doesn't feel like a great caching mechanism - we must be missing something? We will revert back to re-reselect for parameterized selectors case for now. Are we using reselect incorrectly or is this a bug?

@markerikson
Copy link
Contributor

Can you clarify what you mean by "call signature is hard to use"? Equality functions are meant to be trivial: "are these two things considered the same, yes or no". It's irrelevant which argument is first and which is second - they should be considered equal or not regardless.

Internally, the equalityCheck function will get called multiple times:

  • Outer level: once for each input selector that was passed in
  • Inside of defaultMemoize with a cache size > 1: once for each cache entry that is stored (a standard .findIndex() loop)

You can see the LRU cache implementation here:

https://github.com/reduxjs/reselect/blob/v4.1.2/src/defaultMemoize.ts#L21-L25

If you've got suggestions for better options we're certainly interested, but this straight out of the implementation of https://github.com/erikras/lru-memoize, and that library consistently ranked reasonably well in benchmarks of caching libraries when I was researching how to add a larger cache to Reselect.

So, no, I don't think this is a bug as far as I can tell. Are you seeing some kind of actual performance problems, or are you just wondering how it behaves in general? At first glance this feels like worrying over perf when there isn't actually a problem.

@ViktorQvarfordt
Copy link
Author

Thanks for clarifying. The use case I’m considering is sharing memoization of parametrized selectors across components. re-reselect solves this with cache keys which gives constant lookups.

How would one achieve the same behavior as re-reselect using reselect 4.1? This is what I mean with the call signature; just like with re-reselect I would like to cherry pick out the cache key in reselect and have the cache check equality on that (and of courser always check reference equlity on the first argument that is the state).

@ViktorQvarfordt
Copy link
Author

I understand the cache has to compare across all cache entries since the user defines a comparator and not a cache key, resulting in O(N) instead of O(1). Perhaps not a big deal. But I encountered this when I was experimenting with deep equal equality checks like in some of the examples in the readme. And in this case the additional equality checks become a significant performance impact. One has to use it very sparingly.

Perhaps the taleaway from this should be that I don’t understand how to achieve the same functionality as re-reselect. Perhaps this is different by design and I’m missing something. But the answer here is now ”Yes” but I don’t see how so perhaps I’m missing something.

@markerikson
Copy link
Contributor

markerikson commented Nov 9, 2021

I don't think you're "missing" anything. Setting maxSize > 1 is the rough equivalent in behavior. But, as you've noted, there are differences in how the two libraries implement that behavior.

Can you give an example of why you need to do deep comparisons? What's the use case? While that's always been possible given that you can pass your own equality function, Reselect was specifically built around the intent that comparisons are done either by reference or shallow checks, since those are fast and cheap.

@ViktorQvarfordt
Copy link
Author

ViktorQvarfordt commented Nov 9, 2021

We actually don't need to do deep comparison. That was just me experimenting. The need we have is precisely that which re-reselect caters too. I.e. sharing cache based on cache key for parametrized selectors across multiple components.

Now that I have a better understanding of the difference between re-reselect and reselect 4.1, and that reselect 4.1 doesn't cover the exact same functionality, I feel my confusion has been resolved. The root cause was an (incorrect) belief that reselect 4.1 would replace re-reselect.

I'd suggest to clarify this in the documentation.

Also, before finally dropping the questions around excessive calls to checkEquality, should the equality directly after the line selectFoo { arg1: 'arg1', arg2: 'arg2:3' } really be there? What is their purpose? It seems the same checks happen both before and after the selector was called. That is, these four lines:

equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:3', b: 'arg2:2', equal: false }
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:3', b: 'arg2:1', equal: false }

@markerikson
Copy link
Contributor

I'm actually a bit confused myself. I haven't fully gone through all of re-reselect's docs, but the intent here is that the maxSize option can replace it for the common use case of "I want to share one selector instance between many components that will pass in different parameters", because now that one selector instance will keep cached values for multiple different input sets. Yes, the implementation is different in that re-reselect uses a "cache key extraction" approach, but the use case should be basically the same here.

Are there other aspects of re-reselect that the maxSize behavior does not cover, or that the O(n) cache lookup implementation is unsuitable for?

The comparisons that you're seeing are because Reselect's original logic does a double-memoization. It memoizes both an outer function that accepts the overall arguments, and an inner function that accepts the results of the input selectors:

https://github.com/reduxjs/reselect/blob/v4.1.2/src/index.ts#L105-L128

Both functions are memoized using the provided memoizer, although only one gets the additional customization arguments atm (see other open issues on that topic). Tbh I'm not entirely sure why this is necessary - that code has been there for a long time, well before I started actively working on this codebase.

So, I believe two of those comparisons that you're seeing are from the memoize() call on line 115, as we check the actual arguments being passed in. The other two comparisons are from the memoize() call on line 105, where we have confirmed that "yes, the extracted inputs really changed", and now it's time to calculate a new result.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants