Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for creation-defer-length extension #96

Merged
merged 5 commits into from
Dec 11, 2024
Merged

Conversation

Deimvis
Copy link
Contributor

@Deimvis Deimvis commented Aug 3, 2024

Currently, there is no way to declare length to tus server unless it has been declared on upload start (POST request).
So when upload was started somewhere else with "Upload-Defer-Length": "1" header it becomes impossible to continue upload using python client since it will never send "Upload-Length" header to the server.

I attempt to solve it by adding declare_length method to Uploader class that will first HEAD upload_url and if "Upload-Defer-Length": "1" is returned then send empty PATCH request with "Upload-Length" header. Orecautions are needed since the second and subsequent length declarations raise an error.
I also tried to make it efficient as possible and added length_declared keyword argument to BaseUploader that is True by default and can be set to False in order to hint that length should be declared as soon as possible and it will be declared on next PATCH request. I agree, that the code is a bit of a mess, but I haven't found a better way to implement this crucial feature so far.

I also investigated other approaches. For example, I found out that "Upload-Complete" IETF header can be used in order to hint server to finish upload even though length wasn't declared, but I decided that it's a bad idea to mess tus client with IETF headers.

I think that in the future it would be great to support uploading byte stream without seeking to the end to determine its size and declare its size only when last read operation happened and eof occured.

Looking forward to your to your opinion on this issue and its solution @Acconut

@Deimvis Deimvis changed the title Add support to declare length Add support to declare upload length Aug 3, 2024
@Acconut
Copy link
Member

Acconut commented Aug 23, 2024

Apologies for not responding sooner. I hope to look into this in two weeks. I had a brief first look at this and I would prefer a more complete solution. For example, tus-js-client does not have a dedicated declare_length method, but instead allows the user to pass a readable stream. tus-js-client then takes care of setting Upload-Defer-Length: 1 when creating the upload and then later setting Upload-Length once the stream closes and the entire length is known. The end user does not have to get involved with these lower level mechanics. Plus, the risk is removed that the user accidentally declares the wrong length (i.e. more or less than is actually uploaded). What do you think?

@Acconut
Copy link
Member

Acconut commented Aug 23, 2024

I also investigated other approaches. For example, I found out that "Upload-Complete" IETF header can be used in order to hint server to finish upload even though length wasn't declared, but I decided that it's a bad idea to mess tus client with IETF headers.

This IETF work will actually be the basis for a future tus 2.0 (if it's actually needed). But in tus 1.0 we don't have the Upload-Complete header available :)

@Deimvis
Copy link
Contributor Author

Deimvis commented Aug 25, 2024

Apologies for not responding sooner. I hope to look into this in two weeks. I had a brief first look at this and I would prefer a more complete solution. For example, tus-js-client does not have a dedicated declare_length method, but instead allows the user to pass a readable stream. tus-js-client then takes care of setting Upload-Defer-Length: 1 when creating the upload and then later setting Upload-Length once the stream closes and the entire length is known. The end user does not have to get involved with these lower level mechanics. Plus, the risk is removed that the user accidentally declares the wrong length (i.e. more or less than is actually uploaded). What do you think?

Sounds nice! I took a look into js client implementation. It's really what I think is the best approach and what I mentioned in the end of my proposal about not seeking to the end of the reader and calculating upload size only when we are done reading from the given stream. I just hesitated to implement this since it requires a little more changes to the source code, but I will happy to do this. Will be back soon with more js client like implementation. Thanks!

@Deimvis
Copy link
Contributor Author

Deimvis commented Aug 28, 2024

I imlemented upload_length_deferred algorithm similar to the one used in js client. Though, I decided to keep declare_length method in BaseUploader. The rationale is that user may want to declare the length as soon as possible in order to hint the server beforehand about the total upload size and to enable the server to work more efficiently. For example, in my S3 storage implementation for tusd server I have configured the best size for parts in multipart upload and unless I know the total upload size I use heuristics to calculate the part size and it always works less efficiently than in case the server knows about the total upload size in advance.
Using declare_length method will force upload size calculation and will set upload_length_deferred to False.

@Deimvis
Copy link
Contributor Author

Deimvis commented Sep 26, 2024

Looking forward to your opinion :) @Acconut

@Acconut
Copy link
Member

Acconut commented Nov 26, 2024

Though, I decided to keep declare_length method in BaseUploader. The rationale is that user may want to declare the length as soon as possible in order to hint the server beforehand about the total upload size and to enable the server to work more efficiently. For example, in my S3 storage implementation for tusd server I have configured the best size for parts in multipart upload and unless I know the total upload size I use heuristics to calculate the part size and it always works less efficiently than in case the server knows about the total upload size in advance.

In my experience, it is rare that the upload size becomes known while the upload is running. Usually the size is either known upfront (e.g. with a discrete file living on disk) or becomes known when the input stream delivering the file data closes, which is typically also the end of the upload. Is this not the case in your situation?

Just to double check that I understand the code here correctly:

  • If upload_length_deferred is enabled, tus-py-client will include the Upload-Length header in the last PATCH request. It is not necessary to call declare_length
  • A user may call declare_length while the upload is running to tell the server the upload length before the last PATCH request. How does the user actually specify the length since the function doesn't take an argument? Is the user expected to set the property in the class?

To be honest, I am still doubtful whether declare_length is actually useful and worth the additional logic and complexity it adds.

@Deimvis
Copy link
Contributor Author

Deimvis commented Dec 4, 2024

I used this declare_length functionality in case when I gave the client upload_url before knowing anything about the data that will be uploaded, so the initial POST request is made with Upload-Defer-Length header, and as soon as client starts the actual uploading they declare length in order to use efficient part splitting on server side. Combining with that upload_length_deferred parameter assumes that stream is large and won't calculate the total upload size upfront, declare_length is the only option to force calculating the size of the stream and sending it to the server. Honestly, I agree that it looks messy, but I haven't found better solution yet. Maybe you have some ideas? Otherwise, we can remove declare_length function and finally close the PR

To be honest, I am still doubtful whether declare_length is actually useful and worth the additional logic and complexity it adds.

@Acconut
Copy link
Member

Acconut commented Dec 9, 2024

That's an interesting use case, I have never encountered and don't really have an better option. I would suggest the following: Let's remove declare_length from the PR for now and discuss how we can serve this use case in a separate ticket if you want to. Then we can design a good API for this and implement it across all our clients. What do you think?

@Deimvis
Copy link
Contributor Author

Deimvis commented Dec 11, 2024

Sure, thanks

@Acconut Acconut changed the title Add support to declare upload length Add support for creation-defer-length extension Dec 11, 2024
Copy link
Member

@Acconut Acconut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much!

@Acconut Acconut merged commit 0579580 into tus:main Dec 11, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants