Vulkan/OpenGL subpasses that fetch more than a single fragment - opengl-es

So, Vulkan introduced subpasses, and OpenGL implements similar behaviour with ARM_shader_framebuffer_fetch.
In the past, I have used framebuffer_fetch successfully for tonemapping post-effect shaders.
Back then the limitation was that one could only read the contents of the framebuffer at the location of the currently rendered fragment.
Now, what I wonder is whether there is by now any way in Vulkan (or even OpenGL ES) to read from multiple locations (for example, to implement a blur kernel) without the tiled hardware having to store the tile to RAM and load it back.
In theory I guess it should be possible: the first pass would just need to render a slightly larger area than the blur subpass, based on the kernel size (so, for example, if the kernel size were 4 pixels, the resolved tile would need to be 4 pixels smaller than the in-tile buffer size), and some pixels would have to be rendered redundantly (on the overlaps of tiles).
Now, is there a way to do that?
I seem to recall having seen some Vulkan instruction related to subpasses that would allow defining the support size (which sounded like what I'm looking for now), but I can't recall where I saw it.
So my questions:
With Vulkan on a mobile tiled-renderer architecture, is it possible to forward-render some geometry and then render a full-screen blur over it, all within a single in-tile pass (without the hardware having to store the result of the intermediate pass to RAM first and then load the texture back from RAM when blurring)? If so, how?
If the answer to 1 is yes, can it also be done in OpenGL ES?

Short answer: no. Vulkan subpasses still have the 1:1 fragment-to-pixel association requirement; an input attachment can only be read at the current fragment's own location.
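To make the restriction concrete, here is a minimal sketch, assuming the standard Vulkan C API, of subpass 1 consuming subpass 0's color output as an input attachment. It is only the subpass-description fragment of a render-pass setup; everything else is elided. The shader comment shows why a blur kernel cannot be expressed here.

```cpp
#include <vulkan/vulkan.h>

// Subpass 0 writes color attachment 0; subpass 1 reads it back as an input
// attachment. Attachment descriptions, subpass dependencies, and the actual
// vkCreateRenderPass call are elided.
void describeSubpasses(VkSubpassDescription subpasses[2])
{
    static const VkAttachmentReference colorRef =
        { 0, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };
    static const VkAttachmentReference inputRef =
        { 0, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL };

    subpasses[0] = {};
    subpasses[0].pipelineBindPoint    = VK_PIPELINE_BIND_POINT_GRAPHICS;
    subpasses[0].colorAttachmentCount = 1;
    subpasses[0].pColorAttachments    = &colorRef;

    subpasses[1] = {};
    subpasses[1].pipelineBindPoint    = VK_PIPELINE_BIND_POINT_GRAPHICS;
    subpasses[1].inputAttachmentCount = 1;
    subpasses[1].pInputAttachments    = &inputRef;
}

// In subpass 1's fragment shader (GLSL for SPIR-V):
//   layout (input_attachment_index = 0, set = 0, binding = 0)
//       uniform subpassInput prevColor;
//   vec4 c = subpassLoad(prevColor);
//
// subpassLoad() takes no coordinate argument: it always returns the value at
// the current fragment's own pixel, so neighboring texels (and hence a blur
// kernel) are unreachable from inside a subpass.
```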

Related

pow2 textures in FBO

Non power of two textures are very slow in OpenGL ES 2.0.
But in every "render-to-texture" tutorial I've seen, people just take the screen size (which is never a power of two) and make a texture from it.
Should I render to a POT texture (with a projection matrix correction), or is there some kind of magic with FBOs?
I don't buy into the "non power of two textures are very slow" premise in your question. First of all, these kinds of performance characteristics can be highly hardware dependent. So saying that this is true for ES 2.0 in general does not really make sense.
I also doubt that any GPU architectures developed within the last 5 to 10 years would be significantly slower when rendering to NPOT textures. If there's data that shows otherwise, I would be very interested in seeing it.
Unless you have conclusive data that shows POT textures to be faster for your target platform, I would simply use the natural size for your render targets.
If you're really convinced that you want to use POT textures, you can use glViewport() to render to part of them, as @MaticOblak also points out in a comment; a sketch follows below.
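A minimal ES 2.0 sketch of that approach, assuming a 1280x720 frame rendered into the corner of a 2048x2048 POT texture (FBO completeness checks and error handling are elided):

```cpp
#include <GLES2/gl2.h>

// Create a 2048x2048 POT render target but draw only into a 1280x720 corner.
GLuint createPotTarget(GLuint* fboOut)
{
    GLuint tex, fbo;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 2048, 2048, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, tex, 0);

    glViewport(0, 0, 1280, 720);   // render only into the lower-left corner
    // ... draw the scene here ...

    // When sampling later, rescale texcoords to address just the used region:
    //   u' = u * (1280.0f / 2048.0f);  v' = v * (720.0f / 2048.0f);
    *fboOut = fbo;
    return tex;
}
```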
There's one slight caveat to the above: ES 2.0 has some limitations on how NPOT textures can be used. According to the standard, they do not support mipmapping, and only clamp-to-edge wrapping. The GL_OES_texture_npot extension, which is supported on many devices, removes these limitations.

Many or few textures (performance 3D engine)

I have two approaches to choose from for the hexagonal map I'm programming at the moment, and I don't know which way is better. Maybe you can help me :)
I use a texture to represent the "grid", so the quad with this texture is static and is not moved or edited at runtime.
On the one hand, I have a texture of 7700x6736 pixels whose file size is only 3,131 KB; when I run it in an engine (Unity in this case) the frame rate is fine (a constant 60 fps with VSync and 100+ without VSync).
This texture is applied to the quad (2 triangles) through one transparent material.
With the second approach, I have 14 textures of 550x496 pixels and 21 KB each. But with this approach I need 14 quads (28 triangles against 2) and 14 materials with different textures, against 1 the other way.
Also, with this second approach, I need to check the distance of every quad to decide whether to hide it or not (a simple occlusion culling).
Which way is better, in your opinion?
While your 7k texture works on your dev machine, it may not be supported on some of the platforms you'll target. I'd use 2048^2 as a safe maximum, or even 1024^2.
The second problem is that it may take 3 MB as a JPG/PNG compressed file, but in video memory it will be stored uncompressed (unless you use some texture-specific compression, but then you may have platform-support problems again); at 4 bytes per pixel, a 7700x6736 RGBA texture is roughly 200 MB.
Additionally - you should consider if you really need the Non Power Of Two textures, officially they should be supported ATM, but you can still get into problems on some older hardware.
In general your solution depends on the platforms that you want to target, and especially if you plan to target mobile devices (and which ones).

How to create textures from large images in opengl (bigger than the MAX_TEXTURE_SIZE)

I've found that the maximum texture size my OpenGL implementation can support is 8192, but the image that I'm working with is 16997x15931. As you can see in this link, I've completed the class COpenGLControl and customized it for my own use, to work with a smaller 7697x7309 image, and activated different navigation tasks for it:
Render an outlined red rectangle on top a 2D texture in OpenGL
But now, in the last stages of the work, I've decided to change the part that applies the texture and enable it to handle images bigger than 8192.
Questions:
Is it possible with my OpenGL version?
What concept should I study: mipmaps, multiple texturing?
Will it improve the performance of the code?
Right now my program uses 271 MB of RAM just to show the smaller image (7697x7309), and I'm going to add a task to it (image-processing filters) that, even after all my optimization effort, uses 376 MB of RAM for the same 7697x7309 image (that code, already written as a console application, will be combined with this project). So I think the final project would use up to 700 MB of RAM for images near 7000x7000. Obviously, for the bigger image (16997x15931) the RAM usage will be a lot higher!
So I'm looking for a concept to handle images bigger than GL_MAX_TEXTURE_SIZE and also optimize the performance of the program.
More Questions:
What concept should I study in OpenGL to achieve the above goal?
Could you explain a little about the concept that you suggest?
I've asked the question on Game Development too, but decided to repeat it here, where it may get more viewers. As soon as I get an answer, I will delete the question from one of the two sites, so don't worry about the duplicate.
I will try to sum up my comments for the original question.
Know your OpenGL version properly: maybe you can load some modern extensions and work with even the most recent OpenGL features.
If possible, you can take a look at sparse textures (megatextures): ARB_sparse_texture or AMD_sparse_texture.
To reduce memory you can use some texture compression:
How to: load DDS files in OpenGL.
Another simple idea: you can split the huge texture and create four smaller textures (from 16k x 16k into four 8k x 8k tiles) and render four quads; a sketch of this follows below.
Maybe you can use OpenCL or CUDA to do the work?
Regarding mipmaps: a mipmap chain is a set of smaller versions of your input texture. Mipmaps improve performance and final filtering quality, but you need another 33% of memory for a texture with a full mipmap chain. In your case they could be very helpful: for instance, when you look at a wall from a huge distance, you do not need the full (large) texture; a small version of it is enough. See g-truc on mipmaps.
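As an illustration of the tile-splitting idea from the list above, a hedged sketch for desktop OpenGL: `pixels`, `imgW`, and `imgH` are assumed inputs holding the tightly packed RGBA8 image, and the GL_UNPACK_SKIP_*/GL_UNPACK_ROW_LENGTH pixel-store state does the sub-rectangle addressing.

```cpp
#include <algorithm>
#include <vector>
#include <GL/gl.h>

// Split an image larger than GL_MAX_TEXTURE_SIZE into tiles that fit,
// uploading each tile as its own texture (draw one quad per tile afterwards).
std::vector<GLuint> uploadTiles(const unsigned char* pixels, int imgW, int imgH)
{
    GLint tile = 0;
    glGetIntegerv(GL_MAX_TEXTURE_SIZE, &tile);     // e.g. 8192 here

    std::vector<GLuint> tiles;
    glPixelStorei(GL_UNPACK_ROW_LENGTH, imgW);     // address sub-rectangles
    for (int y = 0; y < imgH; y += tile) {
        for (int x = 0; x < imgW; x += tile) {
            int w = std::min((int)tile, imgW - x);
            int h = std::min((int)tile, imgH - y);
            glPixelStorei(GL_UNPACK_SKIP_PIXELS, x);
            glPixelStorei(GL_UNPACK_SKIP_ROWS, y);

            GLuint tex;
            glGenTextures(1, &tex);
            glBindTexture(GL_TEXTURE_2D, tex);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
            glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0,
                         GL_RGBA, GL_UNSIGNED_BYTE, pixels);
            tiles.push_back(tex);
        }
    }
    glPixelStorei(GL_UNPACK_ROW_LENGTH, 0);        // restore defaults
    glPixelStorei(GL_UNPACK_SKIP_PIXELS, 0);
    glPixelStorei(GL_UNPACK_SKIP_ROWS, 0);
    return tiles;
}
```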
In general there are a lot of options, but which is simplest and fastest to implement depends on your experience.

Issue with NPOT Atlas (C++/iOS) using glTexCoordPointer

My app uses an atlas and reaches parts of it to display items using glTexCoordPointer.
It works well with power-of-two textures, but I wanted to use NPOT to reduce the amount of memory used.
Actually, the picture itself is loaded correctly with the linear filter and clamp-to-edge wrapping (the displayed content comes from the picture, even the alpha), but the display is deformed.
The coordinates are not the correct ones, and the "shape" is more of a trapezoid than a rectangle.
I guessed I had to play with glEnable(), passing GL_TEXTURE_2D in the case of a POT texture and GL_APPLE_texture_2D_limited_npot in the other case, but I cannot find a way to do so.
Also, I do not have GL_TEXTURE_RECTANGLE_ARB; I don't know if that is an issue...
Has anyone had the same kind of problem?
Since OpenGL 2 (i.e., for about 10 years) there have been no power-of-two constraints on the size of a regular texture. You can use whatever image size you want, and it will just work.
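For completeness, a small sketch of the parameter state NPOT textures need on GLES parts without full NPOT support (clamp-to-edge wrapping, non-mipmapped filtering); `atlasTex` is an assumed texture id. If the atlas still renders skewed after this, the usual culprit is texture coordinates that were pre-scaled for a padded POT size: with an NPOT texture they should stay in plain [0, 1].

```cpp
#include <OpenGLES/ES1/gl.h>

// Texture state that limited-NPOT paths (e.g. GL_APPLE_texture_2D_limited_npot)
// require: no mipmap filters, clamp-to-edge wrapping on both axes.
void setNpotSafeParams(GLuint atlasTex)
{
    glBindTexture(GL_TEXTURE_2D, atlasTex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR); // no mips
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
}
```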

How to work with pixels using Direct2D

Could somebody provide an example of an efficient way to work with pixels using Direct2D?
For example, how can I swap all green pixels (RGB = 0x00FF00) with red pixels (RGB = 0xFF0000) on a render target? What is the standard approach? Is it possible to use ID2D1HwndRenderTarget for that? Here I assume some kind of hardware acceleration is used. Should I create a different object for direct pixel manipulation?
Using DirectDraw I would use BltFast method on the IDirectDrawSurface7 with logical operation. Is there something similar with Direct2D?
Another task is to generate complex images dynamically, where each point's location and color is the result of a mathematical function. For the sake of an example, let's simplify everything and draw Y = X ^ 2. How do I do that with Direct2D? Ultimately I'm going to need to draw complex functions, but a simple example for Y = X ^ 2 would get me started.
First, it helps to think of ID2D1Bitmap as a "device bitmap". It may or may not live in local, CPU-addressable memory, and it doesn't give you any convenient (or at least fast) way to read/write the pixels from the CPU side of the bus. So approaching from that angle is probably the wrong approach.
What I think you want is a regular WIC bitmap, IWICBitmap, which you can create with IWICImagingFactory::CreateBitmap(). From there you can call Lock() to get at the buffer, and then read/write using pointers and do whatever you want. Then, when you need to draw it on-screen with Direct2D, use ID2D1RenderTarget::CreateBitmap() to create a new device bitmap, or ID2D1Bitmap::CopyFromMemory() to update an existing device bitmap. You can also render into an IWICBitmap by making use of ID2D1Factory::CreateWicBitmapRenderTarget() (not hardware accelerated).
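A hedged sketch of that flow, swapping pure-green pixels for red on the CPU. The interfaces are the documented WIC/Direct2D APIs; error handling and factory creation are elided, and opaque pixels are assumed (the format is premultiplied BGRA, so translucent pixels would need unpremultiplying first).

```cpp
#include <wincodec.h>
#include <d2d1.h>

// Lock an IWICBitmap, rewrite its pixels, then wrap it in a device bitmap.
void SwapGreenForRed(IWICImagingFactory* wic, ID2D1RenderTarget* rt,
                     UINT w, UINT h)
{
    IWICBitmap* bmp = nullptr;
    wic->CreateBitmap(w, h, GUID_WICPixelFormat32bppPBGRA,
                      WICBitmapCacheOnDemand, &bmp);

    WICRect rect = { 0, 0, (INT)w, (INT)h };
    IWICBitmapLock* lock = nullptr;
    bmp->Lock(&rect, WICBitmapLockRead | WICBitmapLockWrite, &lock);

    UINT size = 0, stride = 0;
    BYTE* data = nullptr;
    lock->GetStride(&stride);
    lock->GetDataPointer(&size, &data);

    for (UINT y = 0; y < h; ++y) {
        UINT32* row = reinterpret_cast<UINT32*>(data + y * stride);
        for (UINT x = 0; x < w; ++x)
            if ((row[x] & 0x00FFFFFF) == 0x0000FF00)          // pure green
                row[x] = (row[x] & 0xFF000000) | 0x00FF0000;  // -> red
    }
    lock->Release();   // unlock before Direct2D touches the bitmap

    ID2D1Bitmap* d2dBmp = nullptr;
    rt->CreateBitmapFromWicBitmap(bmp, nullptr, &d2dBmp);     // device bitmap
    // ... rt->DrawBitmap(d2dBmp, ...) between BeginDraw()/EndDraw() ...
    d2dBmp->Release();
    bmp->Release();
}
```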
You will not get hardware acceleration for these types of operations. The updated Direct2D in Win8 (should also be available for Win7 eventually) has some spiffy stuff for this but it's rather complex looking.
Rick's answer talks about the methods you can use if you don't care about losing hardware acceleration. I'm focusing on how to accomplish this using a substantial amount of GPU acceleration.
In order to keep your rendering hardware accelerated and to get the best performance, you are going to want to switch from ID2D1HwndRenderTarget to the newer ID2D1Device and ID2D1DeviceContext interfaces. It honestly doesn't add that much more logic to your code, and the performance benefits are substantial. It also works on Windows 7 with the Platform Update. To summarize the process (a code sketch follows the list):
1. Create a DXGI factory when you create your D2D factory.
2. Create a D3D11 device and a D2D device to match.
3. Create a swap chain using your DXGI factory and the D3D device.
4. Ask the swap chain for its back buffer and wrap it in a D2D bitmap.
5. Render like before, between calls to BeginDraw() and EndDraw(). Remember to unbind the back buffer and destroy the D2D bitmap wrapping it!
6. Call Present() on the swap chain to see the results.
7. Repeat from 4.
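Here is a hedged sketch of steps 2 through 6 (Direct2D 1.1, so Windows 8 or Windows 7 with the Platform Update). Error handling, the CreateSwapChainForHwnd call, resource cleanup for the device objects, and resize handling are elided; the D3D11 device is assumed to have been created with D3D11_CREATE_DEVICE_BGRA_SUPPORT.

```cpp
#include <d3d11.h>
#include <d2d1_1.h>
#include <d2d1_1helper.h>
#include <dxgi1_2.h>

// `factory` is an ID2D1Factory1, `backBuffer` the swap chain's buffer 0
// (an IDXGISurface from IDXGISwapChain1::GetBuffer), all assumed to exist.
void RenderFrame(ID2D1Factory1* factory, ID3D11Device* d3d,
                 IDXGISurface* backBuffer, IDXGISwapChain1* swapChain)
{
    // Step 2: a D2D device matching the D3D11 device, plus a device context.
    IDXGIDevice* dxgiDev = nullptr;
    d3d->QueryInterface(&dxgiDev);

    ID2D1Device* d2dDev = nullptr;
    factory->CreateDevice(dxgiDev, &d2dDev);
    ID2D1DeviceContext* ctx = nullptr;
    d2dDev->CreateDeviceContext(D2D1_DEVICE_CONTEXT_OPTIONS_NONE, &ctx);
    dxgiDev->Release();

    // Step 4: wrap the swap chain's back buffer in a D2D target bitmap.
    D2D1_BITMAP_PROPERTIES1 props = D2D1::BitmapProperties1(
        D2D1_BITMAP_OPTIONS_TARGET | D2D1_BITMAP_OPTIONS_CANNOT_DRAW,
        D2D1::PixelFormat(DXGI_FORMAT_B8G8R8A8_UNORM,
                          D2D1_ALPHA_MODE_PREMULTIPLIED));
    ID2D1Bitmap1* target = nullptr;
    ctx->CreateBitmapFromDxgiSurface(backBuffer, &props, &target);
    ctx->SetTarget(target);

    // Steps 5-6: render between BeginDraw()/EndDraw(), then present.
    ctx->BeginDraw();
    // ... draw calls, e.g. ctx->DrawImage(someEffect) ...
    ctx->EndDraw();
    swapChain->Present(1, 0);

    // The caveat from step 5: unbind and release the wrapper bitmap.
    ctx->SetTarget(nullptr);
    target->Release();
}
```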
Once you've done that, you have unlocked a number of possible solutions. Probably the simplest and most performant way to solve your exact problem (swapping color channels) is to use the color matrix effect, as one of the other answers mentions. It's important to recognize that you need to use the newer ID2D1DeviceContext interface rather than ID2D1HwndRenderTarget to get this, however. There are lots of other effects that can do more complicated operations if you so choose. Here are some of the most useful ones for simple pixel manipulation:
Color matrix effect
Arithmetic operation
Blend operation
For generally solving the problem of manipulating the pixels directly without dropping hardware acceleration or doing tons of copying, there are two options. The first is to write a pixel shader and wrap it in a completely custom D2D effect. It's more work than just getting the pixel buffer on the CPU and doing old-fashioned bit mashing, but doing it all on the GPU is substantially faster. The D2D effects framework also makes it super simple to reuse your effect for other purposes, combine it with other effects, etc.
For those times when you absolutely have to do CPU pixel manipulation but still want a substantial degree of acceleration, you can manage your own mappable D3D11 textures. For example, you can use staging textures if you want to asynchronously manipulate your texture resources from the CPU. There is another answer that goes into more detail. See ID3D11Texture2D for more information.
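A short sketch of that staging-texture pattern, assuming `dev`, `dctx`, and `gpuTex` already exist; the calls are the documented D3D11 API, with error handling elided.

```cpp
#include <d3d11.h>

// Copy a GPU texture into a CPU-mappable staging texture, touch the pixels,
// and unmap. `dev`/`dctx` are the D3D11 device and immediate context.
void TouchPixels(ID3D11Device* dev, ID3D11DeviceContext* dctx,
                 ID3D11Texture2D* gpuTex)
{
    D3D11_TEXTURE2D_DESC desc;
    gpuTex->GetDesc(&desc);
    desc.Usage          = D3D11_USAGE_STAGING;   // CPU-accessible memory pool
    desc.BindFlags      = 0;                     // staging can't be bound
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ | D3D11_CPU_ACCESS_WRITE;
    desc.MiscFlags      = 0;

    ID3D11Texture2D* staging = nullptr;
    dev->CreateTexture2D(&desc, nullptr, &staging);
    dctx->CopyResource(staging, gpuTex);         // GPU -> staging copy

    D3D11_MAPPED_SUBRESOURCE mapped;
    dctx->Map(staging, 0, D3D11_MAP_READ_WRITE, 0, &mapped);
    // mapped.pData / mapped.RowPitch now give raw, CPU-addressable pixel rows.
    dctx->Unmap(staging, 0);
    staging->Release();
}
```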
The specific issue of swapping all green pixels with red pixels can be addressed via ID2D1Effect as of Windows 8 and Platform Update for Windows 7.
More specifically, the Color matrix effect.
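A hedged sketch of exactly that: CLSID_D2D1ColorMatrix is the built-in effect, while `ctx` (an ID2D1DeviceContext) and `bitmap` (an existing ID2D1Bitmap) are assumed. Note that a matrix swap also turns red pixels green; for the strict one-way green-to-red replacement, the CPU sketch earlier applies.

```cpp
#include <d2d1_1.h>
#include <d2d1_1helper.h>
#include <d2d1effects.h>

// Swap the red and green channels, turning 0x00FF00 pixels into 0xFF0000.
void DrawWithSwappedChannels(ID2D1DeviceContext* ctx, ID2D1Bitmap* bitmap)
{
    ID2D1Effect* colorMatrix = nullptr;
    ctx->CreateEffect(CLSID_D2D1ColorMatrix, &colorMatrix);
    colorMatrix->SetInput(0, bitmap);

    // Rows are the r, g, b, a inputs plus a constant offset; columns are the
    // r, g, b, a outputs. Input red feeds output green and vice versa.
    D2D1_MATRIX_5X4_F swapRG = D2D1::Matrix5x4F(
        0, 1, 0, 0,    // input red   -> output green
        1, 0, 0, 0,    // input green -> output red
        0, 0, 1, 0,    // blue passes through
        0, 0, 0, 1,    // alpha passes through
        0, 0, 0, 0);   // no additive offset
    colorMatrix->SetValue(D2D1_COLORMATRIX_PROP_COLOR_MATRIX, swapRG);

    ctx->BeginDraw();
    ctx->DrawImage(colorMatrix);
    ctx->EndDraw();
    colorMatrix->Release();
}
```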
