OpenGL rendering & display in different processes [duplicate]

OpenGL rendering & display in different processes [duplicate] - windows

Let's say I have an application A which is responsible for painting stuff on-screen via OpenGL library. For tight integration purposes I would like to let this application A do its job, but render in a FBO or directly in a render buffer and allow an application B to have read-only access to this buffer to handle the display on-screen (basically rendering it as a 2D texture).
It seems FBOs belong to OpenGL contexts and contexts are not shareable between processes. I definitely understand that allowing several processes two mess with the same context is evil. But in my particular case, I think it's reasonable to think it could be pretty safe.
EDIT:
Render size is near full screen, I was thinking of a 2048x2048 32bits buffer (I don't use the alpha channel for now but why not later).

Framebuffer Objects can not be shared between OpenGL contexts, be it that they belong to the same process or not. But textures can be shared and textures can be used as color buffer attachment to a framebuffer objects.
Sharing OpenGL contexts between processes it actually possible if the graphics system provides the API for this job. In the case of X11/GLX it is possible to share indirect rendering contexts between multiple processes. It may be possible in Windows by emplyoing a few really, really crude hacks. MacOS X, no idea how to do this.
So what's probably the easiest to do is using a Pixel Buffer Object to gain performant access to the rendered picture. Then send it over to the other application through shared memory and upload it into a texture there (again through pixel buffer object).

In MacOS,you can use IOSurface to share framebuffer between two application.

In my understanding, you won't be able to share the objects between the process under Windows, unless it's a kernel mode object. Even the shared textures and contexts can create performance hits also it has give you the additional responsibility of syncing the SwapBuffer() calls. Especially under windows platform the OpenGL implementation is notorious.
In my opinion, you can relay on inter-process communication mechanisms like Events, mutex, window messages, pipes to sync the rendering. but just realize that there's a performance consideration on approaching in this way. Kernel mode objects are good but the transition to kernel each time has a cost of 100ms. Which is damns costly for a high performance rendering application. In my opinion you have to reconsider the multi-process rendering design.

On Linux, a solution is to use DMABUF, as explained in this blog: https://blaztinn.gitlab.io/post/dmabuf-texture-sharing/

Related

Rendering in DirectX 11

When frame starts, I do my logical update and render after that.
In my render code I do usual stuff. I set few states, buffors, textures, and end by calling Draw.
m_deviceContext->Draw(
nbVertices,
0);
At frame end I call present to show rendered frame.
// Present the back buffer to the screen since rendering is complete.
if(m_vsync_enabled)
{
// Lock to screen refresh rate.
m_swapChain->Present(1, 0);
}
else
{
// Present as fast as possible.
m_swapChain->Present(0, 0);
}
Usual stuff. Now, when I call Draw, according to MSDN
Draw submits work to the rendering pipeline.
Does it mean that data is send to GPU and main thread (the one called Draw) continues? Or does it wait for rendering to finish?
In my opinion, only Present function should make main thread wait for rendering to finish.

There are a number of calls which can trigger the GPU to start working, Draw being one. Other's include Dispatch, CopyResource, etc. What the MSDN docs are trying to say is that stuff like PSSetShader. IASetPrimitiveTopology, etc. doesn't really do anything until you call Draw.
When you call Present that is taken as an implicit indicator of 'end of frame' but your program can often continue on with setting up rendering calls for the next frame well before the first frame is done and showing. By default, Windows will let you queue up to 3 frames ahead before blocking your CPU thread on the Present call to let the GPU catch-up--in real-time rendering you usually don't want the latency between input and render to be really high.
The fact is, however, that GPU/CPU synchronization is complicated and the Direct3D runtime is also batcning up requests to minimize kernel-call overhead so the actual work could be happing after many Draws are submitted to the command-queue. This old article gives you the flavor of how this works. On modern GPUs, you can also have various memory operations for paging in memory, setting up physical video memory areas, etc.
BTW, all this 'magic' doesn't exist with Direct3D 12 but that means the application has to do everything at the 'right' time to ensure it is both efficient and functional. The programmer is much more directly building up command-queues, triggering work on various pixel and compute GPU engines, and doing all the messy stuff that is handled a little more abstracted and automatically by Direct3 11's runtime. Even still, ultimately the video driver is the one actually talking to the hardware so they can do other kinds of optimizations as well.
The general rules of thumb here to keep in mind:
Creating resources is expensive, especially runtime shader compilation (by HLSL complier) and runtime shader blob optimization (by driver)
Copying resources to the GPU (i.e. loading texture data from the CPU memory) requires bus bandwidth that is limited in supply: Prefer to keep textures, VB, and IB data in Static buffers you reuse.
Copying resources from the GPU (i.e. moving GPU memory to CPU memory) uses a backchannel that is slower than going to the GPU: try to avoid the need for readback from the GPU
Submitting larger chunks of geometry per Draw call helps to amortize overhead (i.e. calling draw once for 10,000 triangles with the same state/shader is much faster than calling draw 10 times for a 1000 triangles each with changing state/shaders between).

Multiple contexts per application vs multiple applications per context

I was wondering whether it is a good idea to create a "system" wide rendering server that is responsible for the rendering of all application elements. Currently, applications usually have their own context, meaning whatever data might be identical across different applications, it will be duplicated in GPU memory and the more frequent resource management calls only decrease the count of usable render calls. From what I understand, the OpenGL execution engine/server itself is sequential/single threaded in design. So technically, everything that might be reused across applications, and especially heavy stuff like bitmap or geometry caches for text and UI, is just clogging the server with unnecessary transfers and memory usage.
Are there any downsides to having a scenegraph shared across multiple applications? Naturally, assuming the correct handling of clients which accidentally freeze.

I was wondering whether it is a good idea to create a "system" wide rendering server that is responsible for the rendering of all application elements.
That depends on the task at hand. A small detour: Take a webbrowser for example, where JavaScript performs manipulations on the DOM; CSS transform and SVG elements define graphical elements. Each JavaScript called in response to an event may run as a separate thread/lighweight process. In a matter of sense the webbrowser is a rendering engine (heck they're internally even called rendering engines) for a whole bunch of applications.
And for that it's a good idea.
And in general display servers are a very good thing. Just have a look at X11, which has an incredible track record. These days Wayland is all the hype, and a lot of people drank the Kool-Aid, but you actually want the abstraction of a display server. However not for the reasons you thought. The main reason to have a display server is to avoid redundant code (not redundant data) and to have only a single entity to deal with the dirty details (color spaces, device physical properties) and provide optimized higher order drawing primitives.
But in regard with the direct use of OpenGL none of those considerations matter:
Currently, applications usually have their own context, meaning whatever data might be identical across different applications,
So? Memory is cheap. And you don't gain performance by coalescing duplicate data, because the only thing that matters for performance is the memory bandwidth required to process this data. But that bandwidth doesn't change because it only depends on the internal structure of the data, which however is unchanged by coalescing.
In fact deduplication creates significant overhead, since when one application made changes, that are not to affect the other application a copy-on-write operation has to be invoked which is not for free, usually means a full copy, which however means that while making the whole copy the memory bandwidth is consumed.
However for a small, selected change in the data of one application, with each application having its own copy the memory bus is blocked for much shorter time.
it will be duplicated in GPU memory and the more frequent resource management calls only decrease the count of usable render calls.
Resource management and rendering normally do not interfere with each other. While the GPU is busy turning scalar values into points, lines and triangles, the driver on the CPU can do the housekeeping. In fact a lot of performance is gained by keeping making the CPU do non-rendering related work while the GPU is busy rendering.
From what I understand, the OpenGL execution engine/server itself is sequential/single threaded in design
Where did you read that? There's no such constraint/requirement on this in the OpenGL specifications and real OpenGL implementations (=drivers) are free to parallelize as much as they want.
just clogging the server with unnecessary transfers and memory usage.
Transfer happens only once, when the data gets loaded. Memory bandwidth consumption is unchanged by deduplication. And memory is so cheap these days, that data deduplication simply isn't worth the effort.
Are there any downsides to having a scenegraph shared across multiple applications? Naturally, assuming the correct handling of clients which accidentally freeze.
I think you completely misunderstand the nature of OpenGL. OpenGL is not a scene graph. There's no scene, there are mo models in OpenGL. Each applications has its own layout of data and eventually this data gets passed into OpenGL to draw pixels onto the screen.
To OpenGL however there are just drawing commands to turn arrays of scalar values into points, lines and triangles on the screen. There's nothing more to it.

EGL/OpenGL ES/switching context is slow

I am developing an OpenGL ES 2.0 application (using angleproject on Windows for developement) that is made up of multiple 'frames'.
Each frame is an isolated application that should not interfere with the surrounding frames. The frames are drawn using OpenGL ES 2.0, by the code running inside of that frame.
My first attempt was to assign a frame buffer to each frame. But there was a problem - OpenGL's internal states are changed while one frame is drawing, and if the next frame doesn't comprehensively reset every known OpenGL state, there could be possible side effects. This defeats my requirement that each frame should be isolated and not affect one another.
My next attempt was to use a context per frame. I created a unique context for each frame. I'm using sharing resources, so that I can eglMakeCurrent to each frame, render each to their own frame buffer/texture, then eglMakeCurrent back to globally, to compose each texture to the final screen.
This does a great job at isolating the instances, however.. eglMakeCurrent is very slow. As little as 4 of them can make it take a second or more to render the screen.
What approach can I take? Is there a way I can either speed up context switching, or avoid context switching by somehow saving the OpenGL state per frame?

I have a suggestion that may eliminate the overhead of eglMakeCurrent while allowing you to use your current approach.
The concept of current EGLContext is thread-local. I suggest creating all contexts in your process's master thread, then create one thread per context, passing one context to each thread. During each thread's initialization, it will call eglMakeCurrent on the context it owns, and never call eglMakeCurrent again. Hopefully, in ANGLE's implementation, the thread-local storage for contexts is implemented efficiently and does not have unnecessary synchronization overhead.

The problem here is trying to do this in a generic platform and OS independent way. If you choose a specific platform, there are good solutions. On Windows, there are the wgl and glut libraries that will give you multiple windows with completely independent OpenGL contexts running concurrently. They are called Windows, not Frames. You could also use DirectX instead of OpenGL. Angle uses DirectX. On linux, the solution is X11 for OpenGL. In either case, it's critical to have quality OpenGL drivers. No Intel Extreme chipset drivers. If you want to do this on Android or iOS, then those require different solutions. There was a recent thread on the Khronos.org OpenGL ES forum about the Android case.

Thread effectiveness for graphical applications

Let's say I'm creating lighting for a scene, using my own shaders. It is a good example of a thing that might be divided between many threads, for example by dividing scene into smaller scenes and rendering them in separate threads.
Should I divide it into threads manually, or is graphic library going to somehow automatically divide such operations? Or is it library dependent (i'm using libgdx, which appears to be using OpenGL). Or maybe there is other reason why I should leave it alone in one thread?
If I should take care of dividing workload between threads manually, how many threads should I use? Is the number of threads in such situation graphic card dependent or processor dependent?

OpenGL does not support multi-threaded rendering since an OpenGL context is only valid on the thread which it is created.
What you could do to potentially gain some performance is separate your update logic and your rendering logic into separate threads. However, you can not leverage multiple threads for OpenGL rendering.

OpenGL multihead vsync with different refresh rates

How do I drive multiple displays (multihead) at different resolutions and refresh rates with OpenGL (on Windows 7) and still be able to share textures between the devices?
I have one multi-head gpu. It drives 4 heads. (It happens to be an AMD FirePro V7900 in case it matters.) The heads all share a "scene" (vertex and texture data, etc.), but I want render this scene each time a vsync occurs on the display (each head is essentially a different viewport). But the catch is that the different heads may be at different refresh rates. For example, some displays may be at 60Hz and some may be at 30Hz and some may be at 24Hz.
When I call SwapBuffers the call blocks, so I can't tell which head needs to be rendered to next. I was hoping for something like Direct3D9's IDirect3DSwapChain9::Present with D3DPRESENT_DONOTWAIT flag, and the associated D3DERR_WASSTILLDRAWING return value. Using that approach, I could determine which head to render to next. By round-robin polling the different heads until one succeeded. But I don't know what the equivalent approach is in OpenGL.
I've already discovered wglSwapIntervalEXT(1) to use vsync. And I can switch between HDC's to render to the different windows with a single HGLRC. But the refresh rate difference is messing me up.
I'm not sure what I can do to have a single HGLRC render all these displays at different refresh rates. I assume it has to be a single HGLRC to make efficient use of shared textures (and other resources)...correct me if I'm wrong. It's not interesting to me if the resources are duplicated with multiple HGLRC's because I would expect that would cut my memory down to 25% (4 heads on 1 GPU: so I don't want 4 copies of any resource.)
I'm open to the idea of using multiple threads, if that's what it takes.
Can someone tell me how to structure my main loop so that I can share resources but still drive the displays at their own refresh rates and resolutions?

You can share OpenGL buffers carrying data by sharing the contexts. The call in Windows is names wglShareLists.
Using that you can give each window it's own rendering context running in a own thread, while all of the contexts share their data. Multiple window V-Sync in fact one of the few cases where multithreaded OpenGL makes sense.

I have not done anything like this before.
Looks like you actaully need multiple threads to get independent refresh rates.
An OpenGL render context can be active in only one thread. One thread can only have one active render context. Therfore with multiple threads you will need multiple render contexts.
It is possible to share resources between OpenGL contexts. With this it is not necessary to store resources multiple times.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio