-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Edge Browser Scheduled Job timeout #1752
base: master
Are you sure you want to change the base?
Conversation
7d16ca0
to
8195777
Compare
I would additionally reduce the timeout introduced in #1677 to 10 seconds (+5 seconds from your new timeout) just to make sure that your change here actually covers all failing tests and use cases. If it does then I would expect to see no timeouts hitting 10s, only timeouts after 5s. WDYT? |
Test Results 493 files - 1 493 suites - 1 9m 34s ⏱️ +10s For more details on these errors, see this check. Results for commit 98da9eb. ± Comparison against base commit 081eb9b. This pull request removes 37 tests.
♻️ This comment has been updated with latest results. |
8195777
to
b93ae44
Compare
Alright, I'll add it here. |
04d2500
to
ae4fcac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to have another thorough look at what may cause Edge initialization to block the UI thread. When trying to test the timeout logic, I came across several places where I can introduce blocking operations that will not be caught by this proposal.
When systematically going through the public methods of Edge, I see that the calls to getWebView*()
and isWebView*Availabile()
are the ones that block in case initialization or any of the subsequently scheduled futures do not finish timely.
Then, taking a look at which calls may potentially block inside those methods, I see these two:
- waitForFutureToFinish()
- future.join()
The latter will be captured for the initialization future by the timeout added in this PR. The former will only partly be resolved, as inside waitForFutureToFinish()
there is a call to processNextOSMessage()
, which spins a loop that may be blocking as well. Thus, I think we need to break that potentially blocking operation as well. Maybe we can simply spin Display.readAndDisplay()
in the loop checking for future completion (inside waitForFutureToFinish()
) instead of having the additional loop inside processNextOSMessage()
?
Then, what about the other callers of processNextOSMessage()
? In particular, createEnvironment()
is called synchronously when creating the first Edge instance. May that one block as well? I also see further calls of that method of which I am not sure if they are correct. We need to check them as well.
webViewFuture.orTimeout(1, TimeUnit.MILLISECONDS).exceptionally(exception -> { | ||
// Throw exception on the Display thread directly to prevent CompletableFuture | ||
// to wrap the exception and throw it silently | ||
browser.getDisplay().execute(() -> SWT.error(SWT.ERROR_UNSPECIFIED, exception, "Edge Browser initialization timed out")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case we run into a timeout, the initialization needs to be properly rolled back in terms of the state of Edge
being cleaned up and potentially OS resources to be freed, like in the abortion logic of createControllerInitializationCallback()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have extracted the environment release method and added that here as now we have a webview wrapper which makes sure that we dont have multiple futures dealing with multiple instances of webView but one future initializing everything as a whole. And I have verified, the initialization of webview* happens very quickly after obtaining the webview instance which gives the initialization an atomic behaviour. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally sounds good, but I am sure whether it works like it is implemented now. You will run into the timeout handler and release the environment while the webview initialization is still running asynchronously at the OS. Is that the right thing to do? I would rather expect that to eventually be done when the initialization callback is executed (at some point in time after the callback) or, maybe, after a rather high amount of further waiting time.
webViewFuture.orTimeout(1, TimeUnit.MILLISECONDS).exceptionally(exception -> { | ||
// Throw exception on the Display thread directly to prevent CompletableFuture | ||
// to wrap the exception and throw it silently | ||
browser.getDisplay().execute(() -> SWT.error(SWT.ERROR_UNSPECIFIED, exception, "Edge Browser initialization timed out")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of only throwing an error, should we maybe hide the (uninitialized) browser widget and add a label informing the user about the failed initialization instead? Calling something like this inside the execution on display might do the trick:
private void replaceWithErrorLabel() {
browser.setVisible(false);
Label errorLabel = new Label(browser.getParent(), SWT.WRAP);
errorLabel.setForeground(browser.getDisplay().getSystemColor(SWT.COLOR_RED));
errorLabel.setText("Edge browser initialization failed");
errorLabel.setLocation(0, 0);
errorLabel.setSize(browser.getSize());
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like his idea. But how about we combine this with also throwing an error even though it doesn't pop up in a dialog. The error is atleast visible in the Error Log of the IDE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also if we don't use this execute method here, it will sill throw the error but the error will be wrapped in CompletionException and not SWTException since, the errors in the CompletableFuture are caught implicitly and wrapped with CompletionException. In case of SWT.error as well, I see the exception to be silent and there's no dialog coming up. What do you prefer? Is that okay to not wrap the exception with SWT.error and let it throw the CompletableFuture CompletionException and just replace the browser with a label?
The image above is for when I dont throw a wrapped SWT exception and let the future handle it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still be in favor of logging some kind of error. Even though the user will primarily see the information in the UI, someone debugging issues may only have a look at the logs, which is why having such an error documented there is reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so replaceBrowser with label and log an error? the error is logged automatically by the future of exception. Do you mean that or should we throw an exception explicitly using, browser.getDisplay().execute(() -> SWT.error(SWT.ERROR_UNSPECIFIED, exception, "Edge Browser initialization timed out"));
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no strong opinion this, but I see two things to consider:
- If possible, we should avoid throwing a future-specific exception to consumers of SWT/Edge and rather provide proper information inside an SWTException to them.
- We should have a look at the where consumers may face that exception and how that conforms to the existing APIs they are using (usually those of
Browser
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if one is interested in the exception, get()
is better than join()
8b86b9b
to
893b4dd
Compare
I evaluated your points and i have implemented everything into a webviewWrapper which is responsible for containing all the webviews and we need just one future (webViewWrapperFuture) now which provides us this wrapper from where all the webview* can be obtained, so now the dependency flow is like this: webViewWrapperFuture -> lastWebViewTask. for every webView*, it will be responsible - in case of a webViewWrapperFuture timeout, everything times out. |
I am wondering if addressing processNextOSMessage is necessary after this, since we are looping until the future is done eventually and it can happen either if the initialization completes or the timeout, so there always exists a path with which future will be done. In terms of temporal logic, this is verified: A F webViewWrapperFuture.isDone() The question about if processNextOSMessage is a blocking anything: what kind of messages are processed in there and what control do we have? Looks like, we are just calling display.readAndDispatch(), which is just processing the messages on the display thread from the OS, if something unrelated to Edge was blocking it here, it could have been blocked anywhere in the platform. And We have all the handlers well defined to handling the Edge messages in the callbacks properly, my concern is about the block:
Other than that, I think we should retest the new implementation and try to see if we still have tests / workspcaes freezing. If no freezes, at least we will have logs for the timeout. |
That sounds good. There are several good proposals in this PR now. Can we please separate into different smaller PRs to ease reviewing and testing? In particular, putting the web views inside a wrapper captured by one future is something that we might do as a simple "cleanup" and preparatory PR for this. |
This code will put the thread to sleep until some OS message is received. If for some reason no message arrives anymore, this will block the whole application. Not sure how likely it is, but I would be in favor of avoiding the risk. As mentioned above, it might be possible to simplify this code anway. |
893b4dd
to
70f68cd
Compare
70f68cd
to
98da9eb
Compare
Can it be that this already happened? I'm looking at the failing tests and I see this thread-dump:
Maybe because the tests produce less OS messages and therefore the issue with the empty queue is more likely to happen? What I'm thinking is: an environment without a mouse (e.g. the test environment) probably generates way less OS events. |
@fedejeanne |
@@ -316,10 +316,23 @@ ICoreWebView2 initializeWebView(ICoreWebView2Controller controller) { | |||
return webView; | |||
} | |||
|
|||
private CompletableFuture<WebViewWrapper> initializeWebViewFutureWithTimeOut() { | |||
CompletableFuture<WebViewWrapper> webViewWrapperFuture = new CompletableFuture<>(); | |||
webViewWrapperFuture.orTimeout(3, TimeUnit.SECONDS).exceptionally(exception -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
webViewWrapperFuture.orTimeout(3, TimeUnit.SECONDS).exceptionally(exception -> { | |
webViewWrapperFuture.orTimeout(3, TimeUnit.SECONDS).exceptionallyAsync(exception -> {...}, browser.getDisplay()) |
Just a general remark here:The usage of Instead one should probably better use a Task-List design here, so that different task are processed one after the other, this could be for example a (single thread) If Beside that, if a browser control fails to initialize, I think it is fine to just show nothing / something broken / whatever to the user so no need to replace it with something else. |
I think the key is actually: What public API methods really require blocking operations. These methods should not use Lines 111 to 121 in 5d535ce
Lines 136 to 140 in 5d535ce
the wake is important to break out the sleep in a timely way! |
Thanks for the feedback! The proposals make sense and revising the design again might be a good thing. But we should move that discussion into a separate issue. This PR is part of efforts to mitigate the risk of freezes / UI thread blocks by using Edge for the upcoming release. For that reason, we need to improve the current design instead of revising the design, which we may defer to the next development cycle. |
@HeikoKlare as a workaround replace all parts that do a
Then you don't need special timeout handling and won't see UI freeze. |
it looks odd, but I think private <T> void waitForFutureToFinish(CompletableFuture<T> future) {
while(!future.isDone()) {
Display display = Display.getCurrent();
future.handle((nil1, nil2) -> {
if (!display.isDisposed()) {
try {
display.wake();
} catch (SWTException e) {
// ignore then, this can happen due to the async nature between our check for
// disposed and the actual call to wake the display can be disposed
}
}
return null;
});
MSG msg = new MSG();
while (!OS.PeekMessage (msg, 0, 0, 0, OS.PM_NOREMOVE)) {
if (!future.isDone()) {
display.sleep();
}
}
display.readAndDispatch();
}
} would probably prevent blocking (future itself does not use (a)sync exec). |
No description provided.