Add timeout for long pooling connections and reconnect if connection breaks #32
The likely cause is the browser timing out the connection if it sits idle too long. When this happens you can reconnect at any time just by calling skewer(). Automatically reconnecting would be nice so that the user doesn't need to worry about it. However, I'm definitely not satisfied with your changes for a few reasons:

What I'd like to see instead is an option for getJSON() and postJSON(), probably as a config object argument, that tells these functions to attempt the request again and again until it finally succeeds. The retry should happen pretty much immediately the first time, then there should be a back-off factor that slows down the retries until it's a minute or more in between (maxing out).

skewer.getJSON(skewer.host + '/skewer/get', callback, {retry: true});

The retry option could also accept a number instead that limits the number of retries. More properties could control the back-off factor and such. Maybe the callback should even be rolled into the config object as a … For detecting when a retry should occur, definitely don't just abort and retry. Rely on events for this, like onreadystatechange, onerror, or maybe even ontimeout (not sure if this ever fires with …).
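The back-off policy described above could be sketched roughly like this. Names such as `retryDelay` and `retriesAllowed` are hypothetical helpers for illustration, not part of Skewer's actual API; the base delay, factor, and cap are assumed values:

```javascript
// Hypothetical back-off helper: immediate first retry, then
// exponentially longer waits, capped at one minute.
function retryDelay(attempt, opts) {
    opts = opts || {};
    var base = opts.base || 1000;       // 1 second after the first retry
    var factor = opts.factor || 2;      // doubling back-off
    var max = opts.max || 60000;        // cap at one minute
    if (attempt === 0) return 0;        // first retry is immediate
    return Math.min(base * Math.pow(factor, attempt - 1), max);
}

// Hypothetical interpretation of the {retry: ...} option: `true` means
// retry forever, a number limits the retry count, absent means no retry.
function retriesAllowed(retry) {
    if (retry === true) return Infinity;
    if (typeof retry === 'number') return retry;
    return 0;
}
```

The actual retry loop would rearm an XMLHttpRequest from an `onerror`/`onreadystatechange` handler after waiting `retryDelay(attempt)` milliseconds, per the event-driven approach suggested above.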
Oh yeah, I also want to say thanks for looking into this!
I see the point of having the error handling logic outside of the main skewer() function. I still have some concerns about handling only connection errors. It was working fine for me when I was using an Apache proxy on top of Emacs; it was throwing 502 (Proxy Timeout) responses after 3 minutes of idle connection. With the changes I just made, it effectively behaves like the algorithm before, except maybe with a longer timeout. This will not work if you restart your HTTP server, or if you are connecting through NAT gateways which silently drop inactive connections. I can imagine that in some situations you may get stalled TCP connections on busy Wi-Fi when using Skewer. I am mostly concerned about mobile device usage, as it is difficult to have a console there and manually call skewer().

I was thinking that on top of retrying on errors, we could have a server-initiated ping every 2-3 minutes. With that we would get a sort of keep-alive packet for routers and proxies along the way. It would also be useful info for the server, so the client list would be more up to date. That alone may improve connection reliability. After having a systematic ping, we could also add a watchdog on the client, which would reconnect if there was no response within some reasonable time. Alternatively, we could have the ping initiated by the client to keep the server side simple. The effect on connection reliability would be the same. The only drawback would be some overhead on the wire, but with reasonable timeouts it should not be a big issue. Something between 2-3 minutes seems like a reasonable compromise. What are your opinions on implementing that approach? I would be happy to prepare a proof of concept and run some tests in different setups.

If you don't like that, the only better solution I can come up with right now is implementing the communication channel via WebSocket. With WebSocket the keep-alive payload could be kept to a minimum, so the network overhead would be negligible. The only problem with WebSocket is that it is not supported on older browsers and Android versions. It would mean that we would have to support falling back to long polling as described above, which in turn means more code to support.

Oh, there is one more thing. You have suggested retrying the last request. I am not sure if that is always the right thing to do. On POST requests I was usually getting the error after the data had already been delivered, while we were waiting for a command from the server. If we keep the current logic, where we re-use POST for long polling as well, we would need smarter retry logic: if the error was a timeout, a 502, or similar, retry only the GET request; for anything else, retry the original request. Alternatively, we could keep them separate and use GET for long polling and POST for delivering messages to the server. In that case, after each command from the server the client would send a new GET request to resume the long-polling connection, plus a separate POST request with just the payload, and POSTs would always be re-sent on errors. I know it adds significant network overhead, but it would keep the client logic simpler.

I also think that we should retry on 50x errors, but not on 40x. If access is forbidden, we most probably will not be allowed any time soon, contrary to when the server is temporarily down; in the second case there is some point in retrying after a while. Maybe on 30x we should follow the redirect to the other location, for full compatibility with HTTP? What do you think?
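The status-code policy proposed above (retry 5xx, give up on 4xx, follow 3xx) could be captured in one small decision function. This is a hypothetical sketch, not Skewer code; status 0 is how XMLHttpRequest reports a dropped or failed connection, which is the case that should also trigger a retry:

```javascript
// Hypothetical retry-decision helper for the policy discussed above.
function retryAction(status) {
    if (status === 0) return 'retry';            // network error / dropped connection
    if (status >= 500) return 'retry';           // server may recover (e.g. 502 from a proxy)
    if (status >= 400) return 'give-up';         // forbidden now, likely forbidden later
    if (status >= 300) return 'follow-redirect'; // honor HTTP redirects
    return 'done';                               // 2xx: request succeeded
}
```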
Tomasz Szarstuk [email protected] writes:
What happens to the XHR when the TCP connection is silently dropped?
Good point.
Some months ago I also considered adding WebSocket support, making use of https://github.com/ahyatt/emacs-websocket. I've hesitated to do it because it complicates the end-user …
That's a really good point. I hadn't thought of that.
Good idea, I think this is the right direction to go. Requiring two TCP …
This is a good idea. When talking to Emacs it should only get a 40x …

Implementing any of these changes will be risky for Skewer's stability …

Moving forward I'd like to go with your idea of long-polling only with …
…t amount of keep-alive connections to 2
Sorry, I've been busy so I didn't have a chance to look into this until tonight. Should I take a look at long-polling-reconnect again now?
Maybe give me a few days to finish it up. I got it working well in my environment. I would like to spend some time testing it in different configurations and add the original post and get methods for API compatibility. I think it would be best for you to look into the proposal once it is confirmed to perform well.
Conflicts: skewer-mode.el skewer.js
…stJSON and getJSON communication methods for compatibility. Queue up messages on JavaScript side and send them in bulk
Sorry for taking a long time with those changes; I was busy with other projects. I have refactored a few bits of my original proposal. Following your suggestion, I have decided to leave the original skewer.getJSON and skewer.postJSON for compatibility and implemented a separate, reliable, queued message channel for communicating between the JS and Emacs parts of Skewer.

The solution is highly inspired by the BOSH protocol. It keeps 2 persistent connections, of which one is almost always kept in a polling state, so the server can send messages as soon as possible. At the same time, the other connection is for the client to send things as soon as it has anything to send. To keep it efficient, the polling uses 5-minute timeouts to force a change of tokens. From my experience these are reliable values; otherwise proxies and browsers may kill the pending connection. Also, when the connection is broken, the browser will re-establish it as soon as it becomes available.

I had to change the structure inside skewer-clients to carry a connection id, which is unique per client session (page visit). Only with this can we track and close unnecessary pending connections on the server. A side effect is that skewer-list-clients now shows more reliable info. I guess that, because of the massive change since my first proposal, you may find it more convenient to review all the changes compared to master, rather than to the previous ones.

In my tests the solution seems to perform very well on desktop and mobile devices, even over unreliable networks like 3G. I have already implemented bulk message sending from browser to server. If you like the idea, I could add it to messages sent from Emacs to the browser as well. Being able to combine many messages into a single request improves performance a lot if you have many skewer.log statements in your code, though I have not noticed Emacs sending many messages in a short period of time.

What do you think? Do you like this approach? Any suggestions for further improvements?
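The bulk-sending behavior described above (queue messages while a request is in flight, then flush them together in one request) can be sketched as a small queue. This is an illustrative sketch with hypothetical names, not the actual skewer.js implementation; `send` stands in for whatever function performs the HTTP POST of a batch:

```javascript
// Hypothetical queued channel: messages pushed while a send is in
// flight accumulate and go out together in the next request.
function MessageQueue(send) {
    this.pending = [];
    this.inFlight = false;
    this.send = send;   // send(messages, done): delivers one batch, calls done() when finished
}

MessageQueue.prototype.push = function (msg) {
    this.pending.push(msg);
    this.flush();
};

MessageQueue.prototype.flush = function () {
    if (this.inFlight || this.pending.length === 0) return;
    var batch = this.pending;   // take everything queued so far as one batch
    this.pending = [];
    this.inFlight = true;
    var self = this;
    this.send(batch, function () {
        self.inFlight = false;  // channel is free again
        self.flush();           // flush anything queued meanwhile
    });
};
```

With many rapid skewer.log calls, everything logged while one request is pending collapses into a single follow-up request, which is where the performance win comes from.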
Thanks for working on this! It will take some time for me to review/test all of it.
Why did you set a Keep-Alive header?
I really like this new protocol, especially having a client ID like that. It's looking pretty obvious I didn't know what the heck I was doing before! This is much cleaner now.

I took your code and ran with it in my own branch (of the same name). I refactored the client interface into generic functions (EIEIO) and extracted all the BOSH-specific code into skewer-bosh.el. My long-term goal is to add seamless support for WebSocket clients -- and potentially even node.js clients. The core skewer functions will remain unaware of what kind of client they're handling.

One significant change was moving the request queue from a global queue to a per-BOSH-client queue. A websocket client doesn't need an explicit queue, just as other implementations probably don't, plus this eliminates some race conditions that were around before. Client keys (which I've renamed to IDs on Emacs' side of things) allow this per-client queue, since it's much better tracked. The client ID maps to a specific EIEIO object, so the channel servlet can always get the associated client object.

One issue that's sort of unresolved is handling disconnects. I made a change that drops the client if it sees its process has disconnected.

Later on I'm going to run it through the gauntlet of browsers I have available to make sure they all work properly, then merge it into master when you've signed off on this. When I eventually add websocket support it will prefer websocket connections, falling back onto BOSH if needed (or if explicitly configured to prefer BOSH). With the BOSH code separated it should be easier to follow the protocol implementation and ensure it's working properly, containing all that stateful logic.

Could you take a look at my changes to your code, specifically in skewer-bosh.el?
One problem I've noticed is that IE8 doesn't support setRequestHeader(), so the client key doesn't work. However, it may just be time to drop IE8 support anyway with XP going away in April. |
I like the way you have separated each communication channel's logic very much.

Answering your previous question: the Keep-Alive header is required to instruct browsers not to close the TCP connection after a request. The goal is to keep 2 sockets open for a long time, to avoid the overhead of establishing a TCP connection with every request. The timeout is the default for most browsers, so we may expect that most proxies and NAT gateways will honor it. It can be further tuned, possibly subject to customization. We also want to have 2 active connections, so the browser is able to send an answer (make a 2nd request) while Emacs is still holding the first one (the one waiting for a command from the server).

I need to investigate more the idea of Emacs terminating clients which disconnected. We may need to be more tolerant here. What I liked in the original solution was that requests from Emacs remained in a queue for a long time, so even when the connection was broken, the client could pick up all the queued tasks and execute them once it resumed. This was very useful when working with mobiles or devices over dodgy Wi-Fi networks, where disconnects were quite frequent. I am not even sure if that is still going to be the case. Let me play around with it to see how it performs.

You may have noticed that I have added support for connection .close() from the client side. I found it quite useful when I was working with a few browsers at once but wanted to execute some commands in only some of them.

I like the idea of having WebSocket support. It could be an even better channel for low-latency communication. I guess having a BOSH fallback makes sense, because even if you have support in the browser and in Emacs, some proxies may still block you from using it. If you like, I am very happy to help you out with the WebSocket implementation, especially the part in the browser. I guess that is a matter for a separate discussion/ticket.

I am afraid there is no easy workaround for IE8, because even if we pass the client key somewhere in the body, there will still be a problem with cross-domain requests. Are you OK with dropping IE8 support, or should we consider moving the client identifier to the body?
How did you arrive at a timeout of 300 seconds? Is that a standard value? I noticed the clients hang around properly for much longer than 5 minutes, so is this automatically retried as needed? I'm curious about what exact problem this header solves -- i.e. what would go wrong if it wasn't present.
How about we extend that timeout to apply here? I'm now keeping track of each client's last-seen time. If the client is disconnected and we haven't seen it within the configurable timeout, then drop the client. I don't want to keep it around forever, because it builds up a queue of commands that will never get processed.
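The drop rule described above (disconnected AND not seen within the timeout) is a simple predicate. The real logic lives on the Emacs side; this is just a hypothetical JavaScript sketch of the same decision, with illustrative field names:

```javascript
// Hypothetical version of the client-drop rule: only drop a client
// whose process is disconnected AND whose last-seen time is older
// than the configurable timeout. Times are in milliseconds.
function shouldDropClient(client, now, timeoutMs) {
    return client.disconnected && (now - client.lastSeen) > timeoutMs;
}
```

A merely-disconnected client within the timeout keeps its command queue, so a flaky mobile connection can resume and pick up queued work, while a long-gone client is eventually reaped.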
Yup, I did. You'll see I continued this in the websockets backend. Do you think it's worth trying to call this when the page is closing?
I actually did most of this last night. You can find it in the websockets branch. If the websocket ever disconnects or fails to connect for any reason, the BOSH method is immediately attempted instead. I was testing both it and BOSH out last night. One issue I noticed is that the Android 4.1 stock browser pretends to support websockets, but does not. The websocket sits in a "connecting" state forever (never triggering the BOSH fallback), making it hard to detect misbehavior. Safari 5 (the latest version Apple actually offers) also has broken websockets and, fortunately, it fails quickly, but it's noisy on Emacs' side from emacs-websocket's warnings.
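The Android 4.1 failure mode above (a websocket stuck in "connecting" forever, so neither onopen nor onerror ever fires) suggests guarding the connection attempt with an explicit deadline. This is a hypothetical sketch, not the websockets-branch code; the deadline check would be driven by a timer, but is written here with an injected clock so the logic is testable:

```javascript
// Hypothetical guard: if the websocket hasn't opened by the deadline,
// give up and invoke the BOSH fallback, even though the socket itself
// never reported an error.
function ConnectAttempt(deadlineMs, fallback) {
    this.deadlineMs = deadlineMs;
    this.fallback = fallback;    // called when we give up on websockets
    this.start = null;
    this.state = 'idle';         // idle -> connecting -> open | fallback
}

ConnectAttempt.prototype.begin = function (now) {
    this.start = now;            // would coincide with `new WebSocket(url)`
    this.state = 'connecting';
};

ConnectAttempt.prototype.onOpen = function () {
    if (this.state === 'connecting') this.state = 'open';
};

ConnectAttempt.prototype.check = function (now) {
    // Would be run from a setTimeout scheduled at begin() time.
    if (this.state === 'connecting' && now - this.start >= this.deadlineMs) {
        this.state = 'fallback';
        this.fallback();
    }
};
```

A browser with genuinely broken-but-silent websockets then falls back to BOSH after the deadline instead of hanging; a fast failure like Safari 5's can trigger the fallback directly from onerror.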
Yeah, I think it's probably time to do so. While it's officially supported by MS for another 6 years, the most trafficked parts of the Internet don't work with it anymore.
I have added polling reconnect on periodic timeouts and errors.

When running long sessions I ran into situations where the connection would break. In those situations the browser would still be able to send messages via skewer.log to the skewer-repl buffer, but commands wouldn't make it the other way. I am not entirely sure if this fixes all issues with dropped connections, but it seemed to help in my cases. What do you think about these changes?
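The client-side watchdog floated earlier in this thread pairs naturally with this reconnect change: record the time of each server response and, on a periodic check, reconnect if the channel has been silent too long. This is an illustrative sketch with hypothetical names, written with an injected clock (a real client would call `touch` from its response handler and `check` from a setInterval using Date.now()):

```javascript
// Hypothetical watchdog: reconnect when no response has arrived
// within timeoutMs. Times are in milliseconds.
function Watchdog(timeoutMs, reconnect) {
    this.timeoutMs = timeoutMs;
    this.reconnect = reconnect;
    this.lastActivity = null;
}

Watchdog.prototype.touch = function (now) {
    this.lastActivity = now;        // call on every server response
};

Watchdog.prototype.check = function (now) {
    // Call periodically; fires reconnect at most once per silent period.
    if (this.lastActivity !== null && now - this.lastActivity > this.timeoutMs) {
        this.lastActivity = now;    // reset to avoid a reconnect storm
        this.reconnect();
    }
};
```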