EGL X11 support #3
I think it should use double buffering so we don't wait until the X server finishes working with the current buffer. But I am not sure how to implement the synchronisation that signals the buffer has been processed and can actually be swapped. Btw Android uses triple buffering. |
X11 also has a sync protocol, and the Present protocol has support for it, but it's just a bit more work to actually implement that. |
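For reference, a minimal sketch of what hooking into the Present extension's event stream could look like; the function name and structure are illustrative, not the wrapper's actual code:

```c
#include <xcb/xcb.h>
#include <xcb/present.h>

/* Subscribe to Present events for a window so the wrapper can learn when
 * a presented pixmap is idle again or when a present completed, instead
 * of busy-waiting. Error handling omitted for brevity. */
static void subscribe_present_events(xcb_connection_t *conn,
                                     xcb_window_t window)
{
    xcb_present_event_t eid = xcb_generate_id(conn);
    xcb_present_select_input(conn, eid, window,
                             XCB_PRESENT_EVENT_MASK_IDLE_NOTIFY |
                             XCB_PRESENT_EVENT_MASK_COMPLETE_NOTIFY);
    xcb_flush(conn);
}
```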
Currently everything seems to be better, but performance is still close to llvmpipe. (logs: GFX wrapper, llvmpipe) |
And glmark2 running on llvmpipe and on virpipe on the same device. (logs: llvmpipe, virpipe) |
Interesting thing: without glReadPixels it has performance comparable to virpipe (actually higher).
But this is contrary to logic; performance with glReadPixels should be close to virpipe performance. |
Maybe it is somehow related to waiting for rendering. After disabling both glReadPixels and sending xcb_present_pixmap_checked (replaced with …)
And for comparison I've tested my device with Android's native glmark2 and got this log:
So as far as I can see, busy waiting is a BIG problem here, so we should use double buffering and switch buffers on … Thank you in advance. |
Also I think we should call … |
Also (as a fallback) we should not do … |
I implemented double buffering now, so we only have to wait for X if the last present took longer than rendering the next frame. Also I'm waiting for pixmap idle events instead now, so the present operation doesn't have to be complete. With that I get ~800 fps in es2gears in the emulator. I'll see if I can modify it to only present if the last present finished, so we don't bother X with too many images; maybe that improves it further. |
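A rough sketch of that double-buffering policy, assuming Present idle events are already selected; the types and names here are illustrative, not the actual wrapper code:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <xcb/xcb.h>
#include <xcb/present.h>

/* Two pixmaps: each is marked busy when presented and idle again when
 * the server delivers PresentIdleNotify, so a swap only blocks if both
 * are still in flight. */
typedef struct {
    xcb_pixmap_t pixmap;
    bool busy;
} Buffer;

static Buffer bufs[2];

static void on_idle_notify(const xcb_present_idle_notify_event_t *ev)
{
    for (int i = 0; i < 2; i++)
        if (bufs[i].pixmap == ev->pixmap)
            bufs[i].busy = false; /* server is done; safe to render into */
}

static Buffer *next_free_buffer(xcb_connection_t *conn)
{
    for (;;) {
        for (int i = 0; i < 2; i++)
            if (!bufs[i].busy)
                return &bufs[i];
        /* Both buffers in flight: block on the X event queue until the
         * server reports one of them idle. */
        xcb_generic_event_t *ev = xcb_wait_for_event(conn);
        /* ...dispatch Generic Events here, calling on_idle_notify()... */
        free(ev);
    }
}
```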
Yep, not processing the buffer on the CPU when it's not needed has really improved things, now ~85000 fps in the emulator. |
Next I'll try to create a shared pixmap from a DMABUF fd. Can you try glmark again? In the emulator it gives me this error: …
Ok. Now glmark2-es2 reports the highest FPS I've seen, but the rendering is not smooth. (glmark2-es2 log attached)
Videos attached: video_2023-05-28_21-04-35.mp4, video_2023-05-28_21-04-38.mp4, video_2023-05-28_21-04-40.mp4, video_2023-05-28_21-04-43.mp4. Rendering was much smoother with the old method, but it did not report high FPS. |
Also, about the renderer name: Mesa's virpipe uses … |
Ok, I think I'll keep the new method as an option. Is there some way to predict the time window in which a new frame has to be submitted to X11? Maybe that could be used to select an appropriate frame. The problem is that when we aren't bombarding X with frames, the time between the last frame completing and the next one being processed on the CPU could be too high with all the format conversion. With a bit of prediction you could select approximately the last frame that could still make it to X in time and render that. Or maybe a proper triple buffering implementation would be better, to fully decouple the display timing from the rendering.
The other question is: do you even need 1000s of frames when X can only display 60 of them a second? It's reasonable that eglSwapBuffers can block, and that would also save CPU and GPU resources for everything else. So should VSync be an option, e.g. via the old PresentNotify system? If an application needs rendering decoupled from the window system timings, it can use PBuffers or GLES FBOs. |
Fixed that. |
Next I'll work on HardwareBuffer rendering, which can eliminate the copy to the pixmap. And maybe also the format change, depending on available HardwareBuffer formats. But doing the format change in a shader instead should also be possible, and may be faster, depending on whether the memory bandwidth is the bottleneck or not. |
Maybe timers. Currently the X server draws the image (or tries to) every 17 milliseconds, i.e. at roughly 60 Hz (when possible).
I thought about triple buffering, with the following roles:
So when eglSwapBuffers receives …
I think that is the target of the SwapInterval feature. Currently we have SwapInterval = 0 for tests.
I think you did not. |
Maybe we can ask the Mesa people how it should work? I am pretty sure they know what's best. |
Oh, you meant for GLES, not for EGL. According to the EGL spec a swap interval of 1 is the default, so normal vsync. I'll make that the default when the testing is complete and properly implement eglSwapInterval for 0 and 1. I also made the env variable … Approximating the time to the next frame could work; finally an opportunity to apply some sliding average function I learned in university lol. |
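A minimal sketch of such a sliding average, here as an exponential moving average of frame times; the constant and names are made up for illustration:

```c
#include <stdbool.h>

typedef struct {
    double avg_frame_ms; /* smoothed frame time */
    bool initialized;
} FrameTimer;

/* Fold the latest frame time into the running average. A higher alpha
 * reacts faster to changes but is noisier. */
static void frame_timer_update(FrameTimer *t, double frame_ms)
{
    const double alpha = 0.1;
    if (!t->initialized) {
        t->avg_frame_ms = frame_ms;
        t->initialized = true;
    } else {
        t->avg_frame_ms = alpha * frame_ms
                        + (1.0 - alpha) * t->avg_frame_ms;
    }
}

/* True if rendering one more frame should finish before the estimated
 * next refresh (~16.7 ms at 60 Hz). */
static bool frame_fits(const FrameTimer *t, double ms_to_vblank)
{
    return t->initialized && t->avg_frame_ms < ms_to_vblank;
}
```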
Wait. I thought … |
For some reason eglinfo segfaults. |
Can you check which commit is the last one that works for you? In the emulator it still works. Maybe it's the HardwareBuffer fd extraction test at the start? If it can crash, it would be better to just perform it at install time and cache the result (and provide a way to re-run the test, in case a system update changes the behaviour). |
Hmm...
It fails after this line. |
Interesting thing. Actually, a few things.
But I see you are returning modified values from the wrapper. Another thing:
So it somehow links to termux's libEGL (which is glvnd), then it links to vendor libraries, but in the end it links to … Also it dlsyms to … |
As far as I can understand eglinfo never worked with gfx-wrapper. I tried to build every single version I found and eglinfo segfaults with all of them. |
The library loading is expected. libglvnd loads all vendor EGL libraries upfront. The wrapper then loads the system EGL, which in turn loads the system and vendor GLES libs. After that Mesa gets loaded. Libglvnd provides all EGL and GLES core functions, which it then dispatches to the current vendor. |
The termux-packages repo does not contain meson (or I simply did not find it), so I've built eglinfo like this: … |
glad_glGetStringi returns some invalid pointer. |
I think it's because EGL returns an ES3 context, since that's backwards compatible with ES2; glad recognizes that and tries to use glGetStringi, which is an ES3 function. I set the reported version to 2.0 now, you can try that fix. For me eglinfo just never tried to display GLES info. |
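To illustrate the failure mode, a version-aware extension query sketch, assuming ES3 headers are available; on an ES2-only context glGetStringi does not exist, which is presumably why glad ended up calling through an invalid pointer:

```c
#include <GLES3/gl3.h>
#include <stdio.h>

/* Query extensions safely on both ES2 and ES3 contexts. On ES2,
 * GL_MAJOR_VERSION is an invalid enum and glGetIntegerv leaves the
 * value untouched, so the initializer acts as the fallback. */
static void print_extensions(void)
{
    GLint major = 2;
    glGetIntegerv(GL_MAJOR_VERSION, &major);
    if (major >= 3) {
        GLint n = 0;
        glGetIntegerv(GL_NUM_EXTENSIONS, &n);
        for (GLint i = 0; i < n; i++)
            printf("%s\n",
                   (const char *)glGetStringi(GL_EXTENSIONS, (GLuint)i));
    } else {
        printf("%s\n", (const char *)glGetString(GL_EXTENSIONS));
    }
}
```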
Oh, and the EGL strings aren't wrapped, because eglinfo is specifically requesting the Android platform (saw that in your output just now), not X11. The Android platform is designed to be a passthrough as much as possible, though I guess wrapping these strings wouldn't hurt. |
@twaik I finished HardwareBuffer surfaces now but the rendered content won't show up in X11, even when using glFlush. Using memset to fill the DMABUF fd with 0xff displays a white image correctly though. Is it just not working in the emulator, or do you get the same on hardware? |
It works, but the glmark score is very low.
With idle mode it is lower too.
I'll check what can be wrong here. |
Ok, I do not know what can be wrong here. And can you please make eglSwapBuffers report success, to suppress this warning? … |
I added an … |
I can give you remote access to my test device if you need. |
I found something: if I lock and unlock the HardwareBuffer, it also works in the emulator. That probably forces all changes to be applied to the mapped buffer. The performance is still low, and interestingly it goes down over time. |
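The workaround, roughly, as a sketch assuming API level 26+; the function name is made up:

```c
#include <android/hardware_buffer.h>
#include <stddef.h>

/* Lock the buffer for (rare) CPU reads and immediately unlock it again.
 * The lock/unlock pair appears to force pending writes through to the
 * mapped memory, which is what makes the content visible in the
 * emulator. */
static int flush_hardware_buffer(AHardwareBuffer *buf)
{
    void *addr = NULL;
    int r = AHardwareBuffer_lock(buf,
                                 AHARDWAREBUFFER_USAGE_CPU_READ_RARELY,
                                 -1 /* no acquire fence */,
                                 NULL /* whole buffer */, &addr);
    if (r != 0)
        return r;
    return AHardwareBuffer_unlock(buf, NULL /* no release fence */);
}
```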
Actually, in the emulator GraphicBuffer (and AHardwareBuffer) are implemented with a memfd or ashmem file descriptor. So the content of the texture is copied back and forth all the time. That is relevant for the emulator, but not for real devices. |
Interesting, I thought the emulator would also use DMABUF; that explains it. I added the env variable … I also added a simple frame time estimation for PBuffer rendering. Could you try glmark again and see if the rendering is smoother? |
I also tried to optimize the HardwareBuffer surfaces a bit; you can see if that helped. They should still be upside-down in X, but at least the color should be right with the BGR format. |
Maybe I'm missing something, but in block mode performance seems to be the same. |
That was just a guess. I made it such that the GL framebuffer and renderbuffer objects get reused if possible, instead of being recreated on every …
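Something like the following reuse pattern, as a sketch; the struct and names are illustrative. GL_RGBA4 is used here because it's a core ES2 renderbuffer format (GL_RGBA8 needs OES_rgb8_rgba8):

```c
#include <GLES2/gl2.h>

typedef struct {
    GLuint fbo, rbo;
    GLsizei width, height;
} CachedTarget;

/* Create the FBO/renderbuffer once, and only reallocate the storage
 * when the surface size changes, instead of a gen/delete cycle per
 * frame. */
static void ensure_target(CachedTarget *t, GLsizei w, GLsizei h)
{
    if (t->fbo == 0) {
        glGenFramebuffers(1, &t->fbo);
        glGenRenderbuffers(1, &t->rbo);
    }
    if (w != t->width || h != t->height) {
        glBindRenderbuffer(GL_RENDERBUFFER, t->rbo);
        glRenderbufferStorage(GL_RENDERBUFFER, GL_RGBA4, w, h);
        glBindFramebuffer(GL_FRAMEBUFFER, t->fbo);
        glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                                  GL_RENDERBUFFER, t->rbo);
        t->width = w;
        t->height = h;
    }
}
```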
What exactly is broken? The same thing as before? |
I mean idle mode has a bit better performance than block mode. It was much better in earlier commits. |
Maybe in this case it would be better to use … |
@tareksander I think you should check how Mesa's … |
Also there is interesting code in … |
@tareksander maybe you can implement something like a buffer queue? I can integrate it into … |
What kind of buffer queue? |
Like in Android. Surfaces in Android have buffer queues with Consumers and Producers. I am not sure I can describe it correctly: https://source.android.com/docs/core/graphics/arch-bq-gralloc . If you can implement using multiple buffers at once, with buffers swapped on demand, it will make everything a bit faster; see the sketch below. |
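A minimal sketch of the producer/consumer handoff behind a BufferQueue; Android's real implementation adds fences, slot states, and async modes on top of this idea, and all names here are hypothetical:

```c
#include <pthread.h>

#define SLOTS 3

/* A tiny BufferQueue-like ring: the producer (GL renderer) publishes
 * finished slots, the consumer (the X11 presenter) acquires them. */
typedef struct {
    void *buffers[SLOTS];  /* e.g. one AHardwareBuffer* per slot */
    int queued[SLOTS];     /* ring of slot indices ready to present */
    int head, count;       /* ring state, guarded by the mutex */
    pthread_mutex_t mtx;
    pthread_cond_t cv;
} BufferQueue;

/* Producer side: publish a finished buffer slot. */
static void bq_queue(BufferQueue *q, int slot)
{
    pthread_mutex_lock(&q->mtx);
    q->queued[(q->head + q->count) % SLOTS] = slot;
    q->count++;
    pthread_cond_signal(&q->cv);
    pthread_mutex_unlock(&q->mtx);
}

/* Consumer side: block until a buffer is ready, then take it. */
static int bq_acquire(BufferQueue *q)
{
    pthread_mutex_lock(&q->mtx);
    while (q->count == 0)
        pthread_cond_wait(&q->cv, &q->mtx);
    int slot = q->queued[q->head];
    q->head = (q->head + 1) % SLOTS;
    q->count--;
    pthread_mutex_unlock(&q->mtx);
    return slot;
}
```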
There is a mechanism called SurfaceTexture which implements both consumers and producers. I see a way to use it without JNI (or to emulate JNI without a JVM), but there is a problem with interacting with the server. It would require connecting directly to the X server's Binder, but there is no documented way to register a broadcast/intent receiver without a real Context. |
This fork of libhybris implements X11: https://github.com/gemian/libhybris Hope it is helpful for you... |
Libhybris implements WSI using a custom ANativeWindow implementation. We are avoiding this. |
Hello. Is there any progress? |
I have the dispatch function generator finished now, including EGL extensions. Now I need to override the reported EGL extensions to the actually supported ones, and the GLES version to a fixed 2.0 for now, and then I can start testing. I'll copy the X11 implementation from the C version pretty much exactly, but without hardware buffers for now. I'll also add some performance improvements at the cost of GLES spec non-compliance, which can be tuned via env vars. Optimizations: …
That may interfere with programs that use glReadPixels, so I want to use env vars to control this. However, doing the format conversion directly on the graphics chip should perform better, and better than using several buffers and doing it in stages via post-processing. That is what I'll have to do in Vulkan, since SPIR-V shaders aren't easily modifiable. |
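For the on-GPU conversion idea, a sketch of a GLSL ES fragment shader (as a C string) that does the RGBA/BGRA channel swap with a swizzle instead of a CPU pass; this is illustrative, not the wrapper's actual shader:

```c
/* Fragment shader: sample the rendered frame and swap R and B on
 * output, so the pixmap ends up in the byte order X11 expects. */
static const char *swizzle_frag_src =
    "#version 100\n"
    "precision mediump float;\n"
    "uniform sampler2D u_tex;\n"
    "varying vec2 v_uv;\n"
    "void main() {\n"
    "    gl_FragColor = texture2D(u_tex, v_uv).bgra;\n"
    "}\n";
```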
I'm testing if the Android EGL passthrough works, but the emulator keeps crashing lol. Let's see if restarting my system helps, Nvidia drivers on Linux are a bit flaky sometimes. |
Any crashlogs? |
First: …
Seems like I have to fix whatever is wrong with the EGL calls, because it causes the host implementation to crash lol. |
Probably that is not quite right, since in that case you must flip bytes. But you can try to allocate an AHardwareBuffer with format 5 (which stands for the non-SDK AHARDWAREBUFFER_FORMAT_B8G8R8A8_UNORM). I use it in Termux:X11 and it seems to be supported on all devices. It's the best choice for our case since it is compatible with GLES textures and does not need a separate conversion pass. |
I am not so sure, but combining the new ability of Termux:X11 to use AHardwareBuffers with the AHARDWAREBUFFER_USAGE_CPU_READ_RARELY + AHARDWAREBUFFER_USAGE_CPU_WRITE_NEVER flags (instead of AHARDWAREBUFFER_USAGE_CPU_READ_OFTEN or AHARDWAREBUFFER_USAGE_CPU_WRITE_OFTEN) may improve performance; see the sketch below. |
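Combining both suggestions, a sketch of the allocation; width/height are placeholders, the function name is hypothetical, and the fallback is only hinted at:

```c
#include <android/hardware_buffer.h>
#include <stddef.h>
#include <stdint.h>

/* Raw value of the non-SDK BGRA format mentioned above; it is not part
 * of the public NDK enum. */
#ifndef AHARDWAREBUFFER_FORMAT_B8G8R8A8_UNORM
#define AHARDWAREBUFFER_FORMAT_B8G8R8A8_UNORM 5
#endif

static AHardwareBuffer *alloc_bgra_buffer(uint32_t w, uint32_t h)
{
    AHardwareBuffer_Desc desc = {
        .width = w,
        .height = h,
        .layers = 1,
        .format = AHARDWAREBUFFER_FORMAT_B8G8R8A8_UNORM,
        /* GPU-centric usage; deliberately no *_CPU_*_OFTEN flags. */
        .usage = AHARDWAREBUFFER_USAGE_GPU_SAMPLED_IMAGE |
                 AHARDWAREBUFFER_USAGE_GPU_COLOR_OUTPUT |
                 AHARDWAREBUFFER_USAGE_CPU_READ_RARELY |
                 AHARDWAREBUFFER_USAGE_CPU_WRITE_NEVER,
    };
    AHardwareBuffer *buf = NULL;
    if (AHardwareBuffer_allocate(&desc, &buf) != 0)
        return NULL; /* format unsupported: fall back to RGBA_8888 */
    return buf;
}
```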