-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Engines now return FileMeta with correct millisecond timestamps #565
Engines now return FileMeta with correct millisecond timestamps #565
Conversation
Since I'm using the filesystem to get a ground truth for the tests, should I:
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #565 +/- ##
==========================================
- Coverage 82.32% 82.26% -0.06%
==========================================
Files 71 71
Lines 15734 15777 +43
Branches 15734 15777 +43
==========================================
+ Hits 12953 12979 +26
- Misses 2164 2167 +3
- Partials 617 631 +14 ☔ View full report in Codecov by Sentry. |
It seems like we should either make the tests OS-independent, or have OS-specific code for each test so we verify it works? But then again this is only for local filesystem which is of little interest to prod code. The really useful test would validate that object store returns millisecond resolution timestamps, but that would require tests that access cloud storage. |
Does that mean I should remove the tests for now? |
kernel/src/engine/sync/fs_client.rs
Outdated
.ok() | ||
.and_then(|modified| { | ||
modified.duration_since(SystemTime::UNIX_EPOCH).ok() | ||
}) | ||
.and_then(|modified| modified.as_millis().try_into().ok()) | ||
.unwrap_or(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we converting Result to Option three times here? Why is that needed?
.ok() | |
.and_then(|modified| { | |
modified.duration_since(SystemTime::UNIX_EPOCH).ok() | |
}) | |
.and_then(|modified| modified.as_millis().try_into().ok()) | |
.unwrap_or(0); | |
.and_then(|modified| modified.duration_since(SystemTime::UNIX_EPOCH)) | |
.and_then(|modified| modified.as_millis().try_into()) | |
.unwrap_or(0); |
Or do types not match in some annoying way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, all this monadic chaining feels like a poor substitute for a missing helper method...
impl TryFrom<DirEntry> for FileMeta {
type Error = Error;
fn try_from(ent: DirEntry) -> DeltaResult<FileMeta> {
let metadata = ent.metadata()?;
let last_modified = metadata.modified()?.duration_since(SystemTime::UNIX_EPOCH)?;
Ok(FileMeta {
location: Url::from_file_path(ent.path())?,
last_modified: last_modified.as_millis().try_into()?,
size: metadata.len() as usize,
})
}
}
and then
let it = all_ents
.into_iter()
.sorted_by_key(|ent| ent.path())
.map(TryFrom::try_from);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woah that's clean 👌 I'm doing a lot of map_err, and I'm wondering if we should do From
conversions for common things like Url::from_*
and TryInto
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea seems reasonable? make an issue for us to think about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have the From
conversion for url::ParseError
, so ?
should Just Work? The problem with TryInto
is the error type is parametrized, so there's no way to know in advance what might be needed. We'd have to go case by case (when in doubt, try ?
and hopefully there's already an impl From
for it to use).
kernel/src/engine/sync/fs_client.rs
Outdated
let expected_timestamp = metadata | ||
.modified()? | ||
.duration_since(UNIX_EPOCH)? | ||
.as_millis() | ||
.try_into()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the exact same code the iterator uses... so it's not actually testing for correctness but rather testing that the code is equivalent. Not sure what the better test would be? Maybe we write a file and manually set its mtime, then verify the FileMeta has the same timestamp?
(same criticism applies to the default client test)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe create the file then just assert that the timestamp we get back is within 10s or 60s or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented a test that checks that the filemeta is from the last minute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems good for now :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
// The [`FileMeta`]s must be greater than 1 minute ago | ||
let allowed_time = begin_time - Duration::from_secs(60); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qq: Why do we need such a big safety margin, out of curiosity? All files are PUT after this timestamp, so they should not have smaller timestamps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I did come across a timing issue where it failed even though we put it before. This was when I'd set the bound to 0. I could make the bounds tighter, but I didn't want to make the test remotely flaky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also just assert the file_meta time is < current time (and could include a little safety margin as well)? Right now we only have a bound on one end (I don't think this would catch the case of us accidentally returning nanos instead of millis for example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending one fix!
// The [`FileMeta`]s must be greater than 1 minute ago | ||
let allowed_time = begin_time - Duration::from_secs(60); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also just assert the file_meta time is < current time (and could include a little safety margin as well)? Right now we only have a bound on one end (I don't think this would catch the case of us accidentally returning nanos instead of millis for example)
What changes are proposed in this pull request?
This PR fixes the
list_from
method used by both theDefaultEngine
and theSyncEngine
to return the correctFileMeta
struct. Previously, both the engines would return the timestamp in number of seconds from Unix Epoch. However,FileMeta
specifies that we should expect the modification time to be in milliseconds since Unix Epoch:This PR affects the following public APIs
This changes the behaviour of the following:
SyncFilesystemClient::list_from
ObjectStoreFileSystemClient::list_from
How was this change tested?
I compare the
FileMeta
returned by each of the engines to expectedFileMeta
within 10 seconds of creation time.