-
-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vector caching. #3292
base: main
Are you sure you want to change the base?
Vector caching. #3292
Conversation
@@ -628,9 +657,33 @@ pgVector_NEW(Py_ssize_t dim) | |||
static void | |||
vector_dealloc(pgVector *self) | |||
{ | |||
if (pgVector_CheckExact(self) && | |||
PyObject_CallFinalizerFromDealloc((PyObject *)self) < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no mention of this function in the C API docs. What does it do exactly?
Also, I think there is a possibility of introducing a memory leak here. What happens when the cache is full? I would expect vectors to be dealloc'd normally in that case, but it looks like those vectors will be leaked in this implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no mention of this function in the C API docs. What does it do exactly?
Also, I think there is a possibility of introducing a memory leak here. What happens when the cache is full?
PyObject_CallFinalizerFromDealloc
returns non-zero value only when object is resurrected [1] (the ref-count is increased while finalizing, which only happens when cache is not full - _VEC_CACHE_ADD
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit skeptical on using undocumented Python C API, but for now I think it's okay
@@ -111,6 +115,31 @@ typedef struct { | |||
PyObject_HEAD pgVector *vec; | |||
} vector_elementwiseproxy; | |||
|
|||
#define VECTOR_CACHE_SIZE 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why this is 100, I mean, I would expect it to be bigger.
To be fair, these numbers are hard to get right though. I would expect numbers like this to be backed up by empirical test data. A simple test app that uses vectors extensively would be a good start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's true. I don't have experience in that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if you could write a simple benchmark that demonstrates the exact performance enhancements this approach can give.
Py_TYPE(self)->tp_free((PyObject *)self); | ||
} | ||
|
||
static void | ||
vector_finalize(PyObject *self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this logic kept in a separate finalize function, is there a particular reason it should not be a part of dealloc
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used for tp_finalize
, not tp_dealloc
in PyTypeObject
struct. "Resurrecting" an objects seems to be allowed only in tp_finalize
and not tp_dealloc
. I have to explicitly call PyObject_CallFinalizerFromDealloc
due to quirkiness of default CPython deallocator (for static types - heap types and python types don't have this issue), which just doesn't call it (PyObject_CallFinalizerFromDealloc
handles the logic behind tp_finalize
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh okay, yeah that makes sense
As in the title. Benchmarks would be nice if anyone has time for doing them.