-
Notifications
You must be signed in to change notification settings - Fork 288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
File locking module data store #3139
Conversation
This adds a test for concurrent cache reads/writes from _different_ caches. This is meant to capture the race condition that occurs when concurrent, independent invocations of the CLI interacts with the cache.
The latest Buf updates on your PR. Results from workflow Buf CI / buf (pull_request).
|
private/bufpkg/bufmodule/bufmodulecache/module_data_provider.go
Outdated
Show resolved
Hide resolved
We might consider landing #3123 before this so we can update to the latest filelock version. It requires Go 1.21+ in all of the more recent releases. |
This adds some module.lock files to tests that read `testdata` directly as a cache.
.../v3/modules/lock/b5/bufbuild.test/bufbot/people/fc7d540124fd42db92511c19a60a1d98/module.lock
Outdated
Show resolved
Hide resolved
Co-authored-by: Philip K. Warren <[email protected]>
data, err := storage.ReadPath(ctx, moduleCacheBucket, externalModuleDataFileName) | ||
p.logDebugModuleKey( | ||
moduleKey, | ||
fmt.Sprintf("module data store put read check %s", externalModuleDataFileName), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this copied from previous code? I don't understand what this is, perhaps it was my fault though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was not -- this is part of the comment above:
// Before writing to the module directory, first get a shared lock and check module.yaml
We are reading the module.yaml
preemptively before attempting to write new data to the cache.
} | ||
return p.bucket.DeleteAll(ctx, dirPath) | ||
} | ||
|
||
func (p *moduleDataStore) putModuleData( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's really difficult to understand what the code in here is trying to accomplish now. We need specific documentation as we go, and comment within explaining what this code block is trying to accomplish. Something like "we're going to first check for X, if that's not the case, we're going to do this overwrite like this, etc etc".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So getModuleDataForModuleKey
currently only does read operations, and each step of that read is documented... we can add more documentation, but I need some clarification on which parts.
As the last comment in getModuleDataForModuleKey
addresses, validity of content is always determined by module.yaml
, and that is always expected to be written last.
Any error and or invalid module.yaml
is returned as an error, and then handled by fetching new data and putting it into the cache. There is nothing being overwritten/changed by getModuleDataForModuleKey
, it is only reading the cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you point me to the documentation? I can't follow what is going on here.
} | ||
return p.bucket.DeleteAll(ctx, dirPath) | ||
} | ||
|
||
func (p *moduleDataStore) putModuleData( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anywhere where invalid data is cleaned up anymore. Perhaps I missed it. What happens if I have invalid data, say file "a.proto", and the valid data is only the file "b.proto", both written to same cache directory...will "a.proto" not be there anymore? Basically, I don't understand how the deleteInvalidModuleData case is now handled. It appears to only be handled for tar files (getModuleDataForModuleKey
calls bucket.Delete
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the validity of the contents of v3/modules/<digest_type>/<module_key>
is determined by the presence of a valid module.yaml
file. As documented at the end of getModuleDataForModuleKey
, we rely on module.yaml
to be written last.
So the case you're describing, where there is an invalid a.proto
but a valid b.proto
, this occurs because of some interruption to the process while writing a.proto
. In which case, no module.yaml
would be written, this would not be considered valid, and then new data is fetched, etc.
private/buf/bufcli/cache.go
Outdated
// This directory is used to store lock files for synchronizing reading and writing module data from the cache. | ||
// | ||
// Normalized. | ||
v3CacheModuleLockRelDirPath = normalpath.Join("v3", "module_locks") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wellknowntypes
doesn't use underscores, this shouldn't either, for consistency - modulelocks
if err := p.bucket.Delete(ctx, tarPath); err != nil { | ||
return nil, err | ||
} | ||
// Return a path error indicating the module data was not found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be a PR comment - this should explained in the code for future readers who may question it. It should be clear to a reader at first glance why we are returning a fs.PathError
instead of the error itself from reading the code comments - perhaps this wasn't the case in the past, but it needs to be going forward.
} | ||
}() | ||
// Only attempt to get a file lock when storing individual files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what this means - what is an "individual file"? It should be clear from the code comments as to what this code does. This may take a paragraph or two of comments.
} | ||
return p.bucket.DeleteAll(ctx, dirPath) | ||
} | ||
|
||
func (p *moduleDataStore) putModuleData( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you point me to the documentation? I can't follow what is going on here.
…odule data store layer.
This adds file locking to the cache implementation of
ModuleDataStore
, using amodule.lock
file. This fixes a race condition we had with concurrentreads/writes for module data by concurrent invocations of the
buf
CLI.In this implementation, we are taking a shared lock on
<module_key>/module.lock
when reading from the cache, and taking an exclusive lock for writes.
When clearing out the module directory, the
module.lock
file is deleted lastto ensure that we hold the lock while everything else is being cleared out.
When writing data to the cache, we first take a shared lock and check
module.yaml
. Ifmodule.yaml
contains valid data, then we do not need to do an extraneous write. If not,then we can upgrade to an exclusive lock, check
module.yaml
again, and if it is valid, thenwe do not write, otherwise we proceed to writing the data.
A separate
module.lock
file is used instead of locking on an existing file, e.g.module.yaml
since that is the intended use of the
flock
package, since the file handler is not exposedfrom the lock and Windows prevents the same process from accessing a region on a file it
has taken an exclusive lock on: (https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-lockfileex)
This PR also adds support for setting a default lock timeout and retry delay on
filelock.Locker
usingLockerOption
s when constructing theLocker
. If any timeouts and/or retry delays are passed toLock
throughLockOption
s, then those will take precedence.