-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Partial Checksum Extension #172
base: 1.1
Are you sure you want to change the base?
Changes from all commits
36e8ad8
e968566
73e4f7e
c5ef4f3
0892d13
32a812f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -568,6 +568,127 @@ Tus-Resumable: 1.0.0 | |
Upload-Offset: 11 | ||
``` | ||
|
||
### Partial Checksum | ||
|
||
This extension allows Clients that use the Checksum extension to verify and confirm the | ||
integrity of chunks that were not received in full by the Server - thereby avoiding that | ||
intact data needs to be sent again. | ||
|
||
Servers that support this extension MUST add `partial-checksum` to the `Tus-Extension` header. | ||
|
||
If a Server receives an incomplete `PATCH` or `POST` request with `Upload-Checksum` header, it | ||
SHOULD store the incompletely received chunk's data (hereafter referred to as *partial data*) in | ||
a separate location instead of discarding it. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Another idea for implementation here (this is nitpicking but anyway) is that the server stores the data in the actual file but keeps some kind of marker which bytes are "partial" or "not validated" or similar. This would reduce reduce the number of writes needed for copying the data from one location to another. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The text only says Thinking about it for a bit, the way to implement this with the least filesystem / copy overhead would likely be to simply keep writing all received data to a file, use file truncation as needed to discard data or partial data with bad checksums, and keep track of offsets and partial segments through a separate (JSON) file. Writing that metadata to a chunk at the end of the file could be even more efficient, but could leave you with invalid data if there's a power outage or crash before the metadata chunk could be written to disk. An outdated JSON file, OTOH, would just prompt the client to upload the possibly imcompletely written part of the data again. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's why I said "nitpicking" :) It was meant to just bring other solutions for the record in case people read this and interpret is as "store in a different file on disk" which was my first interpretation. |
||
|
||
If the Server receives a `HEAD` request on the upload thereafter, it SHOULD compute a checksum | ||
on the partial data and then return it, alongside the partial data's start and end offsets relative to | ||
the beginning of the file, in the `Upload-Partial-Checksum` and `Upload-Partial-Checksum-Range` | ||
response headers respectively. | ||
|
||
A Client can then use the range returned by the Server, compute its own checksum for the | ||
same range and compare it against the checksum returned by the Server. | ||
|
||
If the checksums are identical, the Client SHOULD confirm the partial data's integrity to the | ||
Server by sending the checksum and range in the `Upload-Partial-Checksum` and | ||
`Upload-Partial-Checksum-Range` headers of the next `PATCH` request. The `Upload-Offset` | ||
header of the `PATCH` request MUST take the confirmed range into account and represent | ||
the actual offset from where the upload should be resumed. | ||
|
||
If the checksums don't match, the Client MUST continue the upload from the `Upload-Offset` | ||
returned by the `HEAD` request, at which point the Server MAY discard any partial data it may | ||
still be holding. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this section here is very clear. Nicely done! Would this be correct from a servers perspective just to clarify that I understand this correctly? The pretense is that the client started a PATCH request with a Upload-Checksum header and then disconnected:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
If you mean that these headers would be contained in the response to the HEAD request, then: yes, that's exactly how it is meant to work. (The HEAD request itself would not contain those headers.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes sorry! Obviously meant the HEAD response :) |
||
|
||
If a Server receives `Upload-Partial-Checksum` and `Upload-Partial-Checksum-Range` | ||
headers as part of a `PATCH` request and they match the Server's computed results for its stored | ||
partial data for the upload, the Server MUST append the partial data to the upload and move its | ||
internal upload offset forward accordingly, before processing the request any further. | ||
|
||
If the Server has disposed of the partial data or if the Client sends range and checksum values | ||
that do not match those of the Server for the partial data, the Server MUST dispose of the | ||
partial data (if any) and MUST respond with a `409 Conflict` status. | ||
|
||
If the Server receives a `PATCH` request with an `Upload-Offset` other than the end of | ||
the partial data's range, it SHOULD dispose of the partial data. | ||
|
||
Clients receiving a `409 Conflict` status in response SHOULD send a new `HEAD` request to the | ||
Server to determine the upload's status before sending the next `PATCH` request. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just to clarify: This second HEAD request would also include the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, the server would already have to have disposed of the partial data at that point:
With partial data gone, the Server also shouldn't respond with any There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍 Makes sense. Thanks for clarifying! |
||
|
||
#### Headers | ||
|
||
##### Upload-Partial-Checksum-Range | ||
|
||
The `Upload-Partial-Checksum-Range` header MUST describe the byte range of the partial | ||
data within the upload, relative to the start of the upload, using this format: `[first byte offset]-[last byte offset]` | ||
|
||
For example `0-499` for the first 500 bytes, `500-999` for the second 500 bytes. | ||
|
||
##### Upload-Partial-Checksum | ||
|
||
The `Upload-Partial-Checksum` header uses the same format as the `Upload-Checksum` | ||
header, but MUST contain the checksum of the partial data. | ||
|
||
#### Example | ||
|
||
The Client tries to append 11 bytes using a `PATCH` request: | ||
|
||
**Request** | ||
|
||
``` | ||
PATCH /files/17f44dbe1c4bace0e18ab850cf2b3a83 HTTP/1.1 | ||
Content-Length: 11 | ||
Upload-Offset: 0 | ||
Tus-Resumable: 1.0.0 | ||
Upload-Checksum: sha1 Kq5sNclPz7QV2+lfQIuc6R7oRu0= | ||
|
||
hello w[connection breaks down] | ||
``` | ||
|
||
Since the connection broke down before the request completed, the Client sends a `HEAD` request to determine from which offset it should resume: | ||
|
||
**Request** | ||
|
||
``` | ||
HEAD /files/17f44dbe1c4bace0e18ab850cf2b3a83 HTTP/1.1 | ||
Tus-Resumable: 1.0.0 | ||
``` | ||
|
||
The Server has kept the partially received data and adds range and checksum information about it to the response: | ||
|
||
**Response** | ||
|
||
``` | ||
HTTP/1.1 200 OK | ||
Upload-Offset: 0 | ||
Upload-Partial-Checksum-Range: 0-6 | ||
Upload-Partial-Checksum: sha1 V2uc6R7+lKq5sfQINclPz7QoRu0= | ||
Tus-Resumable: 1.0.0 | ||
``` | ||
|
||
The Client verifies that the checksum provided by the Server is valid for the returned range, confirms they are identical and resumes the upload from the end of the confirmed range: | ||
|
||
**Request** | ||
|
||
``` | ||
PATCH /files/17f44dbe1c4bace0e18ab850cf2b3a83 HTTP/1.1 | ||
Upload-Offset: 7 | ||
Upload-Partial-Checksum-Range: 0-7 | ||
Upload-Partial-Checksum: sha1 V2uc6R7+lKq5sfQINclPz7QoRu0= | ||
Content-Length: 4 | ||
Tus-Resumable: 1.0.0 | ||
|
||
orld | ||
``` | ||
|
||
The request completes successfully and the Server responds with the new `Upload-Offset`: | ||
|
||
**Response** | ||
|
||
``` | ||
HTTP/1.1 204 No Content | ||
Tus-Resumable: 1.0.0 | ||
Upload-Offset: 11 | ||
``` | ||
|
||
### Termination | ||
|
||
This extension defines a way for the Client to terminate completed and unfinished | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be changed to also allow for requests that are received in full but where the checksum does not match? E.g. if the file has a Upload-Length of 100 bytes with the first 50 already written and the client uploads the next 50 but with an invalid checksum, should we include the headers to let the client re-calculate the checksum and not have to resend the 50? I would argue that this makes the server side implementation easier as we do not have to differentiate between if the client disconnected or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Partial Checksum extension builds on top of the Checksum extension, which stays intact for completed requests.
In your example the
PATCH
request for bytes 50-100 would - to satisfy the Checksum extension requirements - already include anUpload-Checksum
header with the checksum of those 50 bytes.Per the Checksum extension, the Server would check if the checksum for those 50 bytes would match what the Client sent in the
Upload-Checksum
header. If it does not, the Server would respond with a460 Checksum Mismatch
error and discard the 50 bytes. Which I'd argue is the right thing to do when the Checksum for the fully received data doesn't match: the Client should still come up with the same checksum as the one it sent originally - and if it is tolerant to checksum errors, it could probably skip using the Checksum and Partial Checksum extensions altogether.From the server-side, I'd think differentiating between partial/interrupted and fully received requests should be relatively simple, as the server can assume a partial/interrupted request when the received request body's byte count is less than the number in the request's
Content-Length
header - and a fully received request if byte count and header value match.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do now how the checksum extension work. ;) My comment was more to highlight a different angle on how it could be implemented. As long as it's clear how this extension behaves when both this and checksum is supported I don't think there is an issue here. The use case proposed (for invalid checksum) is very niche and just an idea.