-
Notifications
You must be signed in to change notification settings - Fork 996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't revalidate Python interpreter cache entry with --upgrade
#10361
base: main
Are you sure you want to change the base?
Conversation
We were always revalidating the Python cache entry when using `--upgrade`, since the cache timestamp is too recent. Instead, we ignore the usual caching semantics for packages and only compare the timestamp of the interpreter with the recorded timestamp. This avoids the cost for querying the Python interpreter, and the expense of being slightly inconsistent with `Cache` behaving different for python alone (which is imho acceptable since interpreter metadata is already different from package metadata).
} | ||
|
||
// We ignore the freshness of the cache entry and use only our own logic, otherwise | ||
// `--upgrade` always implies querying Python. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to respect --refresh
though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is --refresh
intended to also revalidate the python interpreter querying? Querying Python is kinda costly, so I'd like to find a way to not do it unless we have, e.g. not query python with just --upgrade
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I think it's the only way to revalidate it, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have the internal timestamp check, so when the file changes we revalidate. It's different from packages because python is local, so we're always performing the "revalidation request" by checking the timestamp on the file. Are there scenarios where the python interpreter metadata changes while the timestamp would indicate it's fresh?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I... guess not. I mean, for symlinks yes, but we already don't cache for symlinks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not avoid revalidation with --refresh
, since we're also revalidating the python interpreter by checking the timestamp. It does however not update if the python interpreter changes without the timestamp changing (i don't know how that would be possible except for the user manually changing timestamps, breaking our caching assumptions). This is mirroring the behavior for packages, where we're also trusting the remote with If-Modified-Since
and only reload the full information with --refresh
when either the revalidation request fails or the user clears the cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused. On main, --refresh
will force us to re-query here always. On this branch, --refresh
will no longer re-query here unless the timestamp has changed. So it does avoid refetching the interpreter data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't the interpreter depend on something else that changes?
This is mirroring the behavior for packages, where we're also trusting the remote with If-Modified-Since and only reload the full information with --refresh when either the revalidation request fails or the user clears the cache.
I think this is a reasonable argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't the interpreter depend on something else that changes?
If something like that exists, it would break this PR and i'd change strategies to more aggressively invalidating the python interpreter cache instead. I'm not aware of any such cases, if you have any let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a great example #10832
We were always revalidating the Python interpreter cache entry when using
--upgrade
, since the cache timestamp is too recent. Instead, we ignore the usual caching semantics for packages and only compare the timestamp of the interpreter with the recorded timestamp. This avoids the cost for querying the Python interpreter, at the expense of being slightly inconsistent withCache
behaving different for python alone (which is imho acceptable since interpreter metadata is already different from package metadata).Most of the diff is an indentation change.