-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lost Packet handling #41
Comments
Hmm. So we received the ACK already and sent the DAT, but perhaps the ACK from the client was lost. The reason we don't send a DAT in response to a duplicate ACK is the sorceror's apprentice problem. So how can I safely send a DAT there without introducing that problem? |
I've proposed a pull request for this. "Implement retransmission of packet on peer request". The reported issue is actually due to an incomplete implementation of the protocol as per RFC 1350. The pull request handle this. I've tested it on other clients such as u-boot tftp client and it fixes the issue. |
FWIW, I believe I have seen this same problem when doing U-Boot tftp transfers. Have not tested the PR but could probably give it a whirl to see if it increases robustness when on a congested network. |
Indeed, u-boot tftp client, implements a timeout policy waiting for peer packets, and when a timeout occurs, ask for the expected packet to be resent as described in the protocol. This unimplemented tftpy_server behavior (packet retransmissoin) will not occur when testing the tftpy_server with tftpy own client (tftpy_client) as tftpy_client is simpler and just waits peer packet forever. More precisely, as soon as the client timeout is lower then the tftpy_server timeout, the client will keep sending requests for resending the previous packet. As the server sees traffic on the socket (for packet it wrongly drops), then it itself never gets into a timeout and never resend by itself the packet. At the end the client never receives any more packet from the server and the transmission is completly stucked. |
See: #67 commit: Improve lost DAT handling This fixes the problem without re-introducing the sorceror's apprentice problem. The solution is not to update the timeout counter when a duplicate ACK packet is received, so the server timeout waits for "progress" not "any traffic". |
Hi, It seems to be that I have the U-Boot issue describe by @guillon is tftpy 0.8.1 fix this problem? |
@msoulier please help us getting this issue fixed, it is causing quite a lot of trouble in our CI pipeline. If the fix by @BuhtigithuB is fine, please merge it. |
With this cherry-pick : https://github.com/BuhtigithuB/tftpy/tree/navlrac-windowsize-rfc-7440-support-only Tests pass... So cherry pick succeeded... Although when I add the navlrac tests for it rfc 7440 feature there is 3 tests that start to fail which are not even navlrac's tests... I am a bit puzzled... I think I didn't fetch upstream before attempt the cherry pick :( so I am not up to date... Will try to update my repo and rebase... |
Fetched from Github UI and it didn't end in merge conflict 😅 Can you gave a look? I will push a clone of this branch with the failing tests navlrac tests so you can also have look at the failing tests |
Here the branch with navlrac tests https://github.com/BuhtigithuB/tftpy/tree/navlrac-windowsize-rfc-7440-support-only-with-failling-tests |
Here you have all commits but 2 that seems dated : https://github.com/BuhtigithuB/tftpy/tree/navlrac-windowsize-all-cherry-picks @msoulier : Waiting for your input to make a proper PR that could be merged |
UP! |
Let me know if I can help you any further... |
UP! |
@msoulier do you intent to include navlrac code into tftpy? Is it going to solve the instability issue identified with ubtoo?? Can we expect progress here? I am happy to help more if I can, but please give us some feedback... |
Frankly I need to set aside some time to reproduce this problem in a test case, and then evaluate the fix. I've been very busy. I will try to find some time. |
I can help with testing and printout if I can... I will try the resolved
merge conflict version with our setup if it helps fixing the instability
issue that we have currently with tftpy and uboot...
Thanks for the feedback.
…On Fri, Nov 12, 2021 at 10:07 AM Michael P. Soulier < ***@***.***> wrote:
Frankly I need to set aside some time to reproduce this problem in a test
case, and then evaluate the fix. I've been very busy.
I will try to find some time.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#41 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAMVNRYSENCA5R5EDC4RCR3ULUUSXANCNFSM4ALXJNGQ>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
I need to take a little time to see if I can automate a test for this. I
think we do something similarly already, so I think so.
Mike
…On 2021-11-12 4:10 p.m., BuhtigithuB wrote:
I can help with testing and printout if I can... I will try the resolved
merge conflict version with our setup if it helps fixing the instability
issue that we have currently with tftpy and uboot...
Thanks for the feedback.
|
When tftpy runs as a server and a data packet is lost, all further requests for the lost packet are just logged as duplicate ACKs by tftpy. If I add
right after
the transfer continues properly after packet loss. I don't know if it is an error of tftpy or of my tftp client (u-boot bootloader), but the problem does not occur with other tftp servers.
Regards,
Thomas
The text was updated successfully, but these errors were encountered: