WIP: Edge Browser Scheduled Job timeout #1752

amartya4256 · 2025-01-22T12:00:04Z

No description provided.

fedejeanne · 2025-01-22T12:08:03Z

I would additionally reduce the timeout introduced in #1677 to 10 seconds (+5 seconds from your new timeout) just to make sure that your change here actually covers all failing tests and use cases. If it does then I would expect to see no timeouts hitting 10s, only timeouts after 5s.

WDYT?

github-actions · 2025-01-22T12:12:27Z

Test Results

493 files (-1), 493 suites (-1), 9m 34s ⏱️ (+10s)
4 296 tests (-37): 4 282 ✅ (-38), 10 💤 (-3), 0 ❌ (±0), 4 🔥 (+4)
16 539 runs (-35): 16 429 ✅ (-37), 104 💤 (-4), 0 ❌ (±0), 6 🔥 (+6)

For more details on these errors, see this check.

Results for commit 98da9eb. ± Comparison against base commit 081eb9b.

This pull request removes 37 tests.

org.eclipse.swt.graphics.ImageWin32Tests ‑ testImageDataForDifferentFractionalZoomsShouldBeDifferent
org.eclipse.swt.graphics.ImageWin32Tests ‑ testImageShouldHaveDimesionAsPerZoomLevel
org.eclipse.swt.tests.win32.Test_org_eclipse_swt_dnd_DND ‑ testByteArrayTransfer
org.eclipse.swt.tests.win32.Test_org_eclipse_swt_dnd_DND ‑ testFileTransfer
org.eclipse.swt.tests.win32.Test_org_eclipse_swt_dnd_DND ‑ testHtmlTransfer
org.eclipse.swt.tests.win32.Test_org_eclipse_swt_dnd_DND ‑ testImageTransfer_fromCopiedImage
org.eclipse.swt.tests.win32.Test_org_eclipse_swt_dnd_DND ‑ testImageTransfer_fromImage
org.eclipse.swt.tests.win32.Test_org_eclipse_swt_dnd_DND ‑ testImageTransfer_fromImageData
org.eclipse.swt.tests.win32.Test_org_eclipse_swt_dnd_DND ‑ testImageTransfer_fromImageDataFromImage
org.eclipse.swt.tests.win32.Test_org_eclipse_swt_dnd_DND ‑ testRtfTransfer
…

♻️ This comment has been updated with latest results.

amartya4256 · 2025-01-22T12:40:35Z

I would additionally reduce the timeout introduced in #1677 to 10 seconds (+5 seconds from your new timeout) just to make sure that your change here actually covers all failing tests and use cases. If it does then I would expect to see no timeouts hitting 10s, only timeouts after 5s.

WDYT?

Alright, I'll add it here.

HeikoKlare

I think we need to have another thorough look at what may cause Edge initialization to block the UI thread. When trying to test the timeout logic, I came across several places where I can introduce blocking operations that will not be caught by this proposal.

When systematically going through the public methods of Edge, I see that the calls to getWebView*() and isWebView*Available() are the ones that block in case initialization or any of the subsequently scheduled futures do not finish in time.
Then, taking a look at which calls may potentially block inside those methods, I see these two:

waitForFutureToFinish()
future.join()

The latter will be captured for the initialization future by the timeout added in this PR. The former will only partly be resolved, as inside waitForFutureToFinish() there is a call to processNextOSMessage(), which spins a loop that may block as well. Thus, I think we need to break that potentially blocking operation too. Maybe we can simply spin Display.readAndDispatch() in the loop checking for future completion (inside waitForFutureToFinish()) instead of having the additional loop inside processNextOSMessage()?

Then, what about the other callers of processNextOSMessage()? In particular, createEnvironment() is called synchronously when creating the first Edge instance. May that one block as well? I also see further calls of that method of which I am not sure if they are correct. We need to check them as well.
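
For illustration, a minimal JDK-only sketch of why the timeout covers the future.join() case: orTimeout() completes the very future that join() is waiting on, so the wait is bounded even if nobody else ever completes it. The class and method names are hypothetical and not part of Edge.java.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class JoinTimeoutSketch {

	/** Joins a future that nobody ever completes; only orTimeout() can end the wait. */
	static String joinWithTimeout(long timeoutMillis) {
		CompletableFuture<String> initialization = new CompletableFuture<>();
		// orTimeout() completes this same future exceptionally once the time is up.
		initialization.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
		try {
			return initialization.join();
		} catch (CompletionException e) {
			// join() wraps the TimeoutException in an unchecked CompletionException.
			return e.getCause() instanceof TimeoutException ? "timed out" : "failed";
		}
	}

	public static void main(String[] args) {
		System.out.println(joinWithTimeout(100));
	}
}
```

The sketch says nothing about the loop in processNextOSMessage(), which is a separate wait that the timeout does not interrupt by itself.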

HeikoKlare · 2025-01-26T15:33:46Z

bundles/org.eclipse.swt/Eclipse SWT Browser/win32/org/eclipse/swt/browser/Edge.java

+		webViewFuture.orTimeout(1, TimeUnit.MILLISECONDS).exceptionally(exception -> {
+			// Throw exception on the Display thread directly to prevent CompletableFuture
+			// to wrap the exception and throw it silently
+			browser.getDisplay().execute(() -> SWT.error(SWT.ERROR_UNSPECIFIED, exception, "Edge Browser initialization timed out"));


If we run into a timeout, the initialization needs to be rolled back properly: the state of Edge has to be cleaned up and any OS resources freed, as in the abort logic of createControllerInitializationCallback().

I have extracted the environment release method and call it here. We now have a webview wrapper, which makes sure that we don't have multiple futures dealing with multiple webView instances but a single future initializing everything as a whole. I have also verified that the initialization of webview* happens very quickly after obtaining the webview instance, which makes the initialization behave atomically. What do you think?

Generally sounds good, but I am not sure whether it works as it is implemented now. You will run into the timeout handler and release the environment while the webview initialization is still running asynchronously at the OS. Is that the right thing to do? I would rather expect the release to eventually be done when the initialization callback is executed (at some point in time after the timeout) or, maybe, after a rather long additional waiting time.
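
One way to picture the "release when the callback eventually arrives" option is the JDK-only sketch below. It relies on CompletableFuture.complete() returning false when the future was already completed (here by the timeout), so the late callback knows that nobody will receive the instance and frees it itself. The Resource class is a hypothetical stand-in for the environment; this is an assumption-laden sketch, not the PR's implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class LateInitializationSketch {

	/** Hypothetical stand-in for an OS resource such as the WebView environment. */
	static final class Resource {
		final AtomicBoolean released = new AtomicBoolean();

		void release() {
			released.set(true);
		}
	}

	/** Simulates an initialization callback that arrives after the timeout has fired. */
	static boolean lateCallbackReleasesResource() {
		CompletableFuture<Resource> initialization = new CompletableFuture<>();
		initialization.orTimeout(20, TimeUnit.MILLISECONDS);
		// Wait until the timeout has completed the future exceptionally.
		initialization.handle((value, error) -> null).join();

		// The simulated OS callback arrives late with a fully created resource.
		Resource created = new Resource();
		boolean accepted = initialization.complete(created);
		if (!accepted) {
			// Nobody will ever receive this instance, so the callback frees it itself.
			created.release();
		}
		return created.released.get();
	}

	public static void main(String[] args) {
		System.out.println(lateCallbackReleasesResource());
	}
}
```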

HeikoKlare · 2025-01-26T15:34:59Z

bundles/org.eclipse.swt/Eclipse SWT Browser/win32/org/eclipse/swt/browser/Edge.java

+		webViewFuture.orTimeout(1, TimeUnit.MILLISECONDS).exceptionally(exception -> {
+			// Throw exception on the Display thread directly to prevent CompletableFuture
+			// to wrap the exception and throw it silently
+			browser.getDisplay().execute(() -> SWT.error(SWT.ERROR_UNSPECIFIED, exception, "Edge Browser initialization timed out"));


Instead of only throwing an error, should we maybe hide the (uninitialized) browser widget and add a label informing the user about the failed initialization instead? Calling something like this inside the execution on display might do the trick:

private void replaceWithErrorLabel() {
	browser.setVisible(false);
	Label errorLabel = new Label(browser.getParent(), SWT.WRAP);
	errorLabel.setForeground(browser.getDisplay().getSystemColor(SWT.COLOR_RED));
	errorLabel.setText("Edge browser initialization failed");
	errorLabel.setLocation(0, 0);
	errorLabel.setSize(browser.getSize());
}

I really like this idea. But how about we combine it with also throwing an error, even though it doesn't pop up in a dialog? The error is at least visible in the Error Log of the IDE.

Also, if we don't use the execute method here, the error will still be thrown, but it will be wrapped in a CompletionException and not an SWTException, since errors inside a CompletableFuture are caught implicitly and wrapped in a CompletionException. With SWT.error as well, I see the exception being silent, and no dialog comes up. What do you prefer? Is it okay to not wrap the exception with SWT.error, let the CompletableFuture throw its CompletionException, and just replace the browser with a label?

The image above is for when I don't throw a wrapped SWT exception and let the future handle it.

I would still be in favor of logging some kind of error. Even though the user will primarily see the information in the UI, someone debugging issues may only have a look at the logs, which is why having such an error documented there is reasonable.

So replace the browser with a label and log an error? The error is logged automatically when the future fails with the exception. Do you mean that, or should we throw an exception explicitly using browser.getDisplay().execute(() -> SWT.error(SWT.ERROR_UNSPECIFIED, exception, "Edge Browser initialization timed out"));?

I have no strong opinion on this, but I see two things to consider:

If possible, we should avoid throwing a future-specific exception to consumers of SWT/Edge and rather provide proper information inside an SWTException to them.

We should have a look at where consumers may face that exception and how that conforms to the existing APIs they are using (usually those of Browser).

If one is interested in the exception, get() is better than join().
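
The difference between the two calls can be seen in a small JDK-only sketch: join() reports a failure as an unchecked CompletionException, while get() reports it as a checked ExecutionException that callers are forced to handle. In both cases the original exception is available via getCause(). The class name and the IllegalStateException are illustrative assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

final class JoinVersusGetSketch {

	static CompletableFuture<String> failedFuture() {
		return CompletableFuture.failedFuture(new IllegalStateException("initialization failed"));
	}

	/** join() reports failures as an unchecked CompletionException. */
	static String wrapperThrownByJoin() {
		try {
			failedFuture().join();
			return "none";
		} catch (CompletionException e) {
			return e.getClass().getSimpleName() + " <- " + e.getCause().getClass().getSimpleName();
		}
	}

	/** get() reports failures as a checked ExecutionException. */
	static String wrapperThrownByGet() {
		try {
			failedFuture().get();
			return "none";
		} catch (ExecutionException e) {
			return e.getClass().getSimpleName() + " <- " + e.getCause().getClass().getSimpleName();
		} catch (InterruptedException e) {
			Thread.currentThread().interrupt();
			return "interrupted";
		}
	}

	public static void main(String[] args) {
		System.out.println(wrapperThrownByJoin()); // CompletionException <- IllegalStateException
		System.out.println(wrapperThrownByGet()); // ExecutionException <- IllegalStateException
	}
}
```

Either wrapper could be unwrapped at the API boundary and rethrown as a domain exception, which is the point made above about not leaking future-specific exceptions to consumers.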

amartya4256 · 2025-01-27T13:34:03Z

When systematically going through the public methods of Edge, I see that the calls to getWebView*() and isWebView*Available() are the ones that block in case initialization or any of the subsequently scheduled futures do not finish in time. Then, taking a look at which calls may potentially block inside those methods, I see these two:

waitForFutureToFinish()

future.join()

I evaluated your points and have moved everything into a webviewWrapper, which contains all the webviews. We now need just one future (webViewWrapperFuture), which provides this wrapper from which all the webview* instances can be obtained. The dependency flow is now webViewWrapperFuture -> lastWebViewTask for every webView*, so if webViewWrapperFuture times out, everything times out.

amartya4256 · 2025-01-27T14:00:12Z

The latter will be captured for the initialization future by the timeout added in this PR. The former will only partly be resolved, as inside waitForFutureToFinish() there is a call to processNextOSMessage(), which spins a loop that may block as well. Thus, I think we need to break that potentially blocking operation too. Maybe we can simply spin Display.readAndDispatch() in the loop checking for future completion (inside waitForFutureToFinish()) instead of having the additional loop inside processNextOSMessage()?

Then, what about the other callers of processNextOSMessage()? In particular, createEnvironment() is called synchronously when creating the first Edge instance. May that one block as well? I also see further calls of that method of which I am not sure if they are correct. We need to check them as well.

I am wondering whether addressing processNextOSMessage is still necessary after this. We loop until the future is done, and that happens either when the initialization completes or when the timeout fires, so there is always a path on which the future gets done. In terms of temporal logic, this is verified: A F webViewWrapperFuture.isDone()

As for whether processNextOSMessage can block anything: what kind of messages are processed in there, and what control do we have? It looks like we are just calling display.readAndDispatch(), which only processes the OS messages on the display thread. If something unrelated to Edge were blocking here, it could block anywhere in the platform. We also have all the handlers for the Edge messages well defined in the callbacks. My concern is about this block:

while (!OS.PeekMessage (msg, 0, 0, 0, OS.PM_NOREMOVE)) {
	display.sleep();
}

Other than that, I think we should retest the new implementation and see whether tests / workspaces still freeze. If there are no freezes, we will at least have logs for the timeout.

HeikoKlare · 2025-01-27T14:00:42Z

I evaluated your points and have moved everything into a webviewWrapper, which contains all the webviews. We now need just one future (webViewWrapperFuture), which provides this wrapper from which all the webview* instances can be obtained. The dependency flow is now webViewWrapperFuture -> lastWebViewTask for every webView*, so if webViewWrapperFuture times out, everything times out.

That sounds good. There are several good proposals in this PR now. Can we please separate into different smaller PRs to ease reviewing and testing? In particular, putting the web views inside a wrapper captured by one future is something that we might do as a simple "cleanup" and preparatory PR for this.

HeikoKlare · 2025-01-27T14:07:20Z

As for whether processNextOSMessage can block anything: what kind of messages are processed in there, and what control do we have? It looks like we are just calling display.readAndDispatch(), which only processes the OS messages on the display thread. If something unrelated to Edge were blocking here, it could block anywhere in the platform. We also have all the handlers for the Edge messages well defined in the callbacks. My concern is about this block:
while (!OS.PeekMessage (msg, 0, 0, 0, OS.PM_NOREMOVE)) {
	display.sleep();
}

This code will put the thread to sleep until some OS message is received. If for some reason no further message arrives, this will block the whole application. I am not sure how likely that is, but I would be in favor of avoiding the risk. As mentioned above, it might be possible to simplify this code anyway.

fedejeanne · 2025-01-29T12:34:52Z

As for whether processNextOSMessage can block anything: what kind of messages are processed in there, and what control do we have? It looks like we are just calling display.readAndDispatch(), which only processes the OS messages on the display thread. If something unrelated to Edge were blocking here, it could block anywhere in the platform. We also have all the handlers for the Edge messages well defined in the callbacks. My concern is about this block:
while (!OS.PeekMessage (msg, 0, 0, 0, OS.PM_NOREMOVE)) {
	display.sleep();
}
This code will put the thread to sleep until some OS message is received. If for some reason no further message arrives, this will block the whole application. I am not sure how likely that is, but I would be in favor of avoiding the risk. As mentioned above, it might be possible to simplify this code anyway.

Can it be that this already happened? I'm looking at the failing tests and I see this thread-dump:

"main" prio=5 Id=1 RUNNABLE
	at app//org.eclipse.swt.internal.win32.OS.PeekMessage(Native Method)
	at app//org.eclipse.swt.browser.Edge.processNextOSMessage(Edge.java:481)
	at app//org.eclipse.swt.browser.Edge$WebViewProvider.waitForFutureToFinish(Edge.java:460)
	at app//org.eclipse.swt.browser.Edge$WebViewProvider.getWebView(Edge.java:383)
	at app//org.eclipse.swt.browser.Edge.getUrl(Edge.java:910)
	at app//org.eclipse.swt.browser.Browser.getUrl(Browser.java:771)
	at app//org.eclipse.swt.tests.junit.Test_org_eclipse_swt_browser_Browser.createBrowser(Test_org_eclipse_swt_browser_Browser.java:309)
	at app//org.eclipse.swt.tests.junit.Test_org_eclipse_swt_browser_Browser.setUp(Test_org_eclipse_swt_browser_Browser.java:185)
	...

Maybe because the tests produce fewer OS messages, which makes the empty-queue issue more likely? What I'm thinking is: an environment without a mouse (e.g. the test environment) probably generates far fewer OS events.

laeubi · 2025-01-29T12:38:22Z

@fedejeanne Display.sleep() is very dangerous, and you are correct that it can sleep "forever" (many seconds on Linux if you don't move the mouse).

laeubi · 2025-01-29T12:40:40Z

bundles/org.eclipse.swt/Eclipse SWT Browser/win32/org/eclipse/swt/browser/Edge.java

@@ -316,10 +316,23 @@ ICoreWebView2 initializeWebView(ICoreWebView2Controller controller) {
 		return webView;
 	}

+	private CompletableFuture<WebViewWrapper> initializeWebViewFutureWithTimeOut() {
+		CompletableFuture<WebViewWrapper> webViewWrapperFuture = new CompletableFuture<>();
+		webViewWrapperFuture.orTimeout(3, TimeUnit.SECONDS).exceptionally(exception -> {


Suggested change

webViewWrapperFuture.orTimeout(3, TimeUnit.SECONDS).exceptionally(exception -> {

webViewWrapperFuture.orTimeout(3, TimeUnit.SECONDS).exceptionallyAsync(exception -> {...}, browser.getDisplay())
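
A JDK-only sketch of what the suggested exceptionallyAsync(..., executor) variant changes: the handler runs on the given executor instead of on whichever thread completed the future. A single-threaded executor named "ui-thread" stands in for browser.getDisplay() here, which is an assumption for illustration only. exceptionallyAsync requires Java 12 or later.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ExceptionallyAsyncSketch {

	/** Returns the name of the thread that ran the timeout handler. */
	static String handlerThreadName() {
		// Hypothetical stand-in for the display, which is passed as the executor in the suggestion.
		ExecutorService uiThread = Executors.newSingleThreadExecutor(r -> new Thread(r, "ui-thread"));
		try {
			CompletableFuture<String> initialization = new CompletableFuture<>();
			return initialization
					.orTimeout(50, TimeUnit.MILLISECONDS)
					// The handler is submitted to uiThread once the timeout fires.
					.exceptionallyAsync(error -> Thread.currentThread().getName(), uiThread)
					.join();
		} finally {
			uiThread.shutdown();
		}
	}

	public static void main(String[] args) {
		System.out.println(handlerThreadName()); // ui-thread
	}
}
```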

laeubi · 2025-01-29T12:50:39Z

Just a general remark here:

The usage of CompletableFuture here (even though very convenient and powerful) does not seem to be the proper approach when looking at the overall design.

Instead, one should probably use a task-list design, so that different tasks are processed one after the other. This could be, for example, a (single-thread) ExecutorService to which one submit(...)s tasks, or a Queue that is processed by a special thread (e.g. if the SWT thread is needed) as long as it is not empty.

If an ExecutorService is used, one can also combine it with CompletableFuture, but then it should be used in an event-driven way without ever calling join().

Besides that, if a browser control fails to initialize, I think it is fine to just show nothing / something broken / whatever to the user, so there is no need to replace it with something else.
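
A rough JDK-only sketch of the task-list idea mentioned above: a single-threaded ExecutorService runs tasks strictly one after the other, and follow-up work is attached as a continuation instead of blocking on join(). The task names are hypothetical; the final join() exists only so the demonstration can return a result, and it runs outside the executor thread.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class TaskListSketch {

	/** Submits tasks to a single-threaded executor and returns their execution order. */
	static List<String> runTasksInOrder() {
		ExecutorService tasks = Executors.newSingleThreadExecutor();
		List<String> log = new CopyOnWriteArrayList<>();
		try {
			CompletableFuture<Void> init = CompletableFuture.runAsync(() -> log.add("initialize"), tasks);
			CompletableFuture<Void> navigate = CompletableFuture.runAsync(() -> log.add("navigate"), tasks);
			// Event-driven continuation: it is scheduled once "navigate" is done, nothing waits for it.
			CompletableFuture<Void> notified = navigate.thenRunAsync(() -> log.add("notify listener"), tasks);
			// Joining here is only for the demonstration and happens on the calling thread.
			CompletableFuture.allOf(init, notified).join();
			return List.copyOf(log);
		} finally {
			tasks.shutdown();
		}
	}

	public static void main(String[] args) {
		System.out.println(runTasksInOrder()); // [initialize, navigate, notify listener]
	}
}
```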

laeubi · 2025-01-29T12:57:22Z

When systematically going through the public methods of Edge

I think the key question is actually: which public API methods really require blocking operations?

These methods should not use join() at all. They must use a loop that drives the event queue while waiting, similar to what we do here (stage == future!):

eclipse.platform.swt/bundles/org.eclipse.swt/Eclipse SWT Custom Widgets/common/org/eclipse/swt/custom/BusyIndicator.java

Lines 111 to 121 in 5d535ce

    
stage.handle((nil1, nil2) -> {
	if (!display.isDisposed()) {
		try {
			display.wake();
		} catch (SWTException e) {
			// ignore then, this can happen due to the async nature between our check for
			// disposed and the actual call to wake the display can be disposed
		}
	}
	return null;
});

eclipse.platform.swt/bundles/org.eclipse.swt/Eclipse SWT Custom Widgets/common/org/eclipse/swt/custom/BusyIndicator.java

Lines 136 to 140 in 5d535ce

    
while (!future.isDone() && !display.isDisposed()) {
	if (!display.readAndDispatch()) {
		display.sleep();
	}
}

The wake is important to break out of the sleep in a timely way!

HeikoKlare · 2025-01-29T13:09:38Z

Thanks for the feedback! The proposals make sense and revising the design again might be a good thing. But we should move that discussion into a separate issue. This PR is part of efforts to mitigate the risk of freezes / UI thread blocks by using Edge for the upcoming release. For that reason, we need to improve the current design instead of revising the design, which we may defer to the next development cycle.

laeubi · 2025-01-29T13:13:57Z

@HeikoKlare as a workaround replace all parts that do a join() on the event thread with this code:

public static void waitForFuture(CompletableFuture<?> future, Display display) {
	if (!future.isDone()) {
		future.handle((nil1, nil2) -> {
			if (!display.isDisposed()) {
				try {
					display.wake();
				} catch (SWTException e) {
					// ignore then, this can happen due to the async nature between our check for
					// disposed and the actual call to wake the display can be disposed
				}
			}
			return null;
		});
		while (!future.isDone() && !display.isDisposed()) {
			if (!display.readAndDispatch()) {
				display.sleep();
			}
		}
	}
}

Then you don't need special timeout handling and won't see UI freeze.
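
To see why the wake-up matters without pulling in SWT, here is a JDK-only simulation of the same pattern. MiniEventLoop is a hypothetical, heavily simplified stand-in for Display (post/wake/readAndDispatch/sleep); it is not SWT code. The future is completed from a background thread while no other event is pending, so only the wake-up posted on completion lets sleep() return.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

final class WakeOnCompletionSketch {

	/** Hypothetical, simplified stand-in for an event loop such as the SWT Display. */
	static final class MiniEventLoop {
		private final Deque<Runnable> queue = new ArrayDeque<>();

		synchronized void post(Runnable event) {
			queue.add(event);
			notifyAll();
		}

		/** Posts an empty event whose only purpose is to end sleep(). */
		void wake() {
			post(() -> { });
		}

		boolean readAndDispatch() {
			Runnable event;
			synchronized (this) {
				event = queue.poll();
			}
			if (event == null) {
				return false;
			}
			event.run();
			return true;
		}

		/** Blocks until at least one event is queued, without dispatching it. */
		synchronized void sleep() throws InterruptedException {
			while (queue.isEmpty()) {
				wait();
			}
		}
	}

	static <T> T waitForFuture(CompletableFuture<T> future, MiniEventLoop loop) throws InterruptedException {
		// Without this wake-up, sleep() would block until an unrelated event arrives.
		future.whenComplete((value, error) -> loop.wake());
		while (!future.isDone()) {
			if (!loop.readAndDispatch()) {
				loop.sleep();
			}
		}
		return future.join();
	}

	static String demo() {
		MiniEventLoop loop = new MiniEventLoop();
		CompletableFuture<String> future = new CompletableFuture<>();
		// Complete the future from a background thread after a short delay.
		CompletableFuture.runAsync(() -> future.complete("initialized"),
				CompletableFuture.delayedExecutor(50, TimeUnit.MILLISECONDS));
		try {
			return waitForFuture(future, loop);
		} catch (InterruptedException e) {
			Thread.currentThread().interrupt();
			return "interrupted";
		}
	}

	public static void main(String[] args) {
		System.out.println(demo());
	}
}
```

Removing the whenComplete() line in this simulation makes demo() hang, which mirrors the empty-queue scenario discussed earlier in the thread.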

laeubi · 2025-01-29T13:37:15Z

it looks odd, but I think

private <T> void waitForFutureToFinish(CompletableFuture<T> future) {
	Display display = Display.getCurrent();
	// Register the wake-up once so that completing the future interrupts display.sleep()
	future.handle((nil1, nil2) -> {
		if (!display.isDisposed()) {
			try {
				display.wake();
			} catch (SWTException e) {
				// ignore then, this can happen due to the async nature between our check for
				// disposed and the actual call to wake the display can be disposed
			}
		}
		return null;
	});
	MSG msg = new MSG();
	while (!future.isDone()) {
		// Sleep only while the future is pending and no OS message is queued;
		// checking isDone() in the loop condition avoids spinning once it is done
		while (!future.isDone() && !OS.PeekMessage (msg, 0, 0, 0, OS.PM_NOREMOVE)) {
			display.sleep();
		}
		display.readAndDispatch();
	}
}

would probably prevent blocking (future itself does not use (a)sync exec).

amartya4256 force-pushed the edge_timeout branch 2 times, most recently from 7d16ca0 to 8195777 Compare January 22, 2025 12:01

amartya4256 force-pushed the edge_timeout branch from 8195777 to b93ae44 Compare January 22, 2025 12:39

amartya4256 force-pushed the edge_timeout branch 2 times, most recently from 04d2500 to ae4fcac Compare January 24, 2025 15:34

HeikoKlare reviewed Jan 26, 2025


amartya4256 linked an issue Jan 26, 2025 that may be closed by this pull request

Application freezes due to using Edge vi-eclipse/Eclipse-Platform#187

Open

amartya4256 force-pushed the edge_timeout branch 2 times, most recently from 8b86b9b to 893b4dd Compare January 27, 2025 13:25

amartya4256 force-pushed the edge_timeout branch from 893b4dd to 70f68cd Compare January 27, 2025 16:53

This was referenced Jan 27, 2025

Edge: Freeze in Browser#evaluate() if evaluated script calls into SWT #1771

Open

[Win32] Summarize ICoreWebView2* instances in WebViewWrapper #1770

Merged

WIP: Edge Browser Scheduled Job timeout

98da9eb

HeikoKlare force-pushed the edge_timeout branch from 70f68cd to 98da9eb Compare January 28, 2025 09:38

laeubi reviewed Jan 29, 2025


HeikoKlare mentioned this pull request Jan 29, 2025

Application freezes due to using Edge vi-eclipse/Eclipse-Platform#187

Open
