I created a 1024*1024 texture with
glCompressedTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA_PVRTC_4BPPV1_IMG, 1024, 1024, 0, nDataLen*4, pData1);
then updated its first 512*512 region like this:
glCompressedTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 512, 512, GL_COMPRESSED_RGBA_PVRTC_4BPPV1_IMG, nDataLen, pData2);
This update generates GL error 1282 (GL_INVALID_OPERATION). If I update the whole 1024*1024 region, everything is fine, so it seems that PVRTC textures cannot be partially updated.
Is it possible to partially update a PVRTC texture, and if so, how?
Sounds to me like you can't on GLES2 (see the spec, section 3.7.3):
Calling CompressedTexSubImage2D will result in an INVALID_OPERATION error if xoffset or yoffset is not equal to zero, or if width and height do not match the width and height of the texture, respectively. The contents of any texel outside the region modified by the call are undefined. These restrictions may be relaxed for specific compressed internal formats whose images are easily modified
Makes glCompressedTexSubImage2D sound a bit useless to me, tbh, but I guess it's for updating individual mips or texture array levels.
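So, per that quote, the only valid call for a PVRTC level on GLES2 replaces the whole level; a minimal sketch (the 4bpp size formula follows the IMG PVRTC extension, as I understand it):

/* Minimal sketch: on GLES2, per the quoted spec text, the only valid
   glCompressedTexSubImage2D call for a PVRTC level replaces the whole level. */
static void updateWholePvrtcLevel(GLsizei w, GLsizei h, const void *pFullData)
{
    /* PVRTC1 4bpp data size: (max(w,8) * max(h,8) * 4 + 7) / 8 bytes */
    GLsizei wb = w < 8 ? 8 : w;
    GLsizei hb = h < 8 ? 8 : h;
    GLsizei dataSize = (wb * hb * 4 + 7) / 8;

    glCompressedTexSubImage2D(GL_TEXTURE_2D, 0,
                              0, 0,      /* xoffset/yoffset must be 0    */
                              w, h,      /* must match the level's size  */
                              GL_COMPRESSED_RGBA_PVRTC_4BPPV1_IMG,
                              dataSize, pFullData);
}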
Surprisingly, I copied a small PVRTC texture's data into a large one and it works just like glCompressedTexSubImage2D. But I'm not sure whether it's safe to use this solution in my engine.
Rightly or wrongly, the reason PVRTC1 does not have CompressedTexSubImage2D support is that, unlike, say, ETC* or S3TC, the texture data is not compressed as independent 4x4 squares of texels which, in turn, get represented as either 64 or 128 bits of data depending on the format. With ETC*/S3TC, any aligned 4x4 block of texels can be replaced without affecting any other region of the texture simply by replacing its corresponding 64- or 128-bit data block.
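To make that concrete, replacing one block in such a format is just an offset calculation and a copy; a rough sketch (the helper name and parameters are mine):

#include <string.h>   /* memcpy */
#include <stddef.h>   /* size_t */

/* Rough sketch: in ETC- or S3TC-style formats every aligned 4x4 texel block
   owns one fixed-size record, so a single block can be swapped on the CPU
   without touching its neighbours (blockBytes is e.g. 8 for ETC1/DXT1,
   16 for DXT5). */
static void replaceBlock(unsigned char *compressedData, int texWidth,
                         int blockX, int blockY, size_t blockBytes,
                         const unsigned char *newBlock)
{
    int blocksPerRow = (texWidth + 3) / 4;
    size_t offset = ((size_t)blockY * blocksPerRow + blockX) * blockBytes;
    memcpy(compressedData + offset, newBlock, blockBytes);
}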
With PVRTC1, two aims were to avoid block artifacts and to take advantage of the fact that neighbouring areas are usually very similar and thus can share information. Although the compressed data is grouped into 64-bit units, these affect overlapping areas of texels. In the case of 4bpp they are ~7x7 and for 2bpp, 15x7.
As you later point out, you could copy the data yourself, but there may be a fuzzy boundary. For example, I took these 64x64 and 32x32 textures (which have been compressed and decompressed with PVRTC1 4bpp) ...
and then did the equivalent of "TexSubImage" to get:
As you should be able to see, the border of the smaller texture has smudged as the colour information is shared across the boundaries.
In practice it might not matter but since it doesn't strictly match the requirements of TexSubImage, it's not supported.
PVRTC2 has facilities to do better subimage replacement but is not exposed on at least one well-known platform.
< Unsubtle plug > BTW if you want some more info on texture compression, there is a thread on the Stack Exchange Computer Graphics site < /Unsubtle plug >
I recently found out that one can render alpha-blended primitives correctly not just back-to-front but also front-to-back (http://hacksoflife.blogspot.com/2010/02/alpha-blending-back-to-front-front-to.html) by using GL_ONE_MINUS_DST_ALPHA, GL_ONE, premultiplying the fragment's alpha in the fragment shader and clearing destination alpha to black before rendering.
It occurred to me that it would then be great if one could combine this with EITHER early-z rejection OR some kind of early "destination-alpha testing" in order to discard fragments that won't contribute to the final pixel color.
When rendering with front-to-back alpha-blending, a fragment can be skipped if the destination-alpha at this location already contains the value 1.0.
I prototyped that by using GL_EXT_shader_framebuffer_fetch to test the destination alpha at the start of the fragment shader and manually discard the fragment if the value is above a certain threshold (sketched below). That works, but it actually made things slower on my test hardware (Snapdragon XR2) - so I wonder:
1. Is it somehow possible to not even have the fragment shader execute if the destination alpha is already above a certain threshold?
2. Alternatively, would it be possible to write to the depth buffer only for fragments that are completely opaque and leave the current depth-buffer value unchanged for all fragments with an alpha value of less than 1 (but still depth-test every fragment)? That should allow the hardware to use early-z rejection for occluded fragments. So: is this possible somehow (i.e. use depth testing, but update the depth-buffer value only for opaque fragments and leave it unchanged for others)?
Bottom line: this would reduce the overdraw of alpha-blended sprites to only those fragments that actually contribute to the final pixel color, and I wonder whether there is a performant way of doing it.
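For reference, the discard-based prototype mentioned above looks roughly like this (a sketch only, assuming GL_EXT_shader_framebuffer_fetch and GLSL ES 1.00; the names and threshold are illustrative):

/* Roughly what the prototype does: fetch destination alpha, bail out early
   if the pixel is already fully covered. */
static const char *kFrontToBackFragShader =
    "#extension GL_EXT_shader_framebuffer_fetch : require\n"
    "precision mediump float;\n"
    "uniform sampler2D u_tex;\n"
    "varying vec2 v_uv;\n"
    "void main() {\n"
    "    // With glBlendFunc(GL_ONE_MINUS_DST_ALPHA, GL_ONE), destination alpha\n"
    "    // accumulates coverage, so ~1.0 means the pixel is already opaque.\n"
    "    if (gl_LastFragData[0].a > 0.995) {\n"
    "        discard; // this fragment cannot contribute any more\n"
    "    }\n"
    "    vec4 c = texture2D(u_tex, v_uv);\n"
    "    gl_FragColor = vec4(c.rgb * c.a, c.a); // premultiply alpha\n"
    "}\n";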
For number 2, I think you could modify gl_FragDepth in the fragment shader to achieve something close, but doing so would disable early-z rejection so wouldn't really help.
I think one viable way to reduce overdraw would be to create a tool to generate a mesh for each sprite which aims to cover a decent proportion of the opaque part of the sprite without using too many verts. I imagine for a typical sprite, even just a well placed quad could cover 80%+.
You'd render the generated opaque geometry of your sprites with depth write enabled, and do a second pass the ordinary way with depth testing enabled to cover the transparent parts.
You would massively reduce overdraw, but significantly increase the complexity of your code and the number of verts rendered. You would double your draw calls, but if you're atlasing and using texture arrays, you might be doubling from 1 to 2 draw calls, which is fine. I've never tried it, so I can't say whether it's worth all the effort involved.
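A minimal sketch of the render-state side of that two-pass approach (GLES2-style calls; the draw helpers are hypothetical placeholders):

/* Pass 1: the generated opaque interior meshes, front-to-back, writing depth. */
glDisable(GL_BLEND);
glEnable(GL_DEPTH_TEST);
glDepthMask(GL_TRUE);
drawOpaqueSpriteMeshes();

/* Pass 2: the full sprites, back-to-front, depth-tested but not depth-written,
   so opaque interiors drawn in pass 1 occlude everything behind them. */
glEnable(GL_BLEND);
glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);   /* premultiplied alpha */
glDepthMask(GL_FALSE);
drawSpriteQuads();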
I have this 2D raster upon which are layered from 1 to, say, 20 other 2D rasters (with random size and offset). I'm searching for a fast way to access a sub-rectangle view (with random size and offset). The view should return all the layered pixels for each X and Y coordinate.
I guess this is kind of how, say, GIMP or other 2D paint apps draw layers on top of each other, except that I want all the pixels on top of each other, not just the projection where the top pixel hides the ones below it.
I have run into this problem before and I still do now. I have already spent a lot of time searching the internet and this site for similar issues, but couldn't find any. I will describe two possible solutions, neither of which I'm satisfied with:
Have what is basically a 3D array of pre-allocated size. This is easy to manage, but the wasted storage and memory overhead are huge: a 4K raster with, say, 16 slots of 4 bytes each comes to about 1 GiB of memory, and in my application most of that space would never be used.
The solution I implemented before: have two 2D arrays, one with indices and the other with the actual values. Each "pixel" of the first array says in which range of the second array you can find the actual pixels contributed by all layers (see the sketch below). This is compact, but every request bounces between two memory regions, and it is a bit of a hassle to set up, not to mention to update (a nice-to-have feature, but not mandatory).
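For reference, that second option can be laid out like a CSR (compressed sparse row) structure; a minimal sketch, with illustrative names:

#include <stdint.h>

typedef struct {
    uint32_t value;   /* packed grey+alpha (or RGBA) sample */
    uint8_t  layer;   /* which layer contributed this sample */
} LayerSample;

typedef struct {
    int          width, height;
    uint32_t    *offsets;   /* width*height + 1 entries; start index per pixel */
    LayerSample *samples;   /* all samples, grouped by pixel, layers in order  */
} LayeredRaster;

/* All samples covering pixel (x, y) live in samples[offsets[i] .. offsets[i+1]). */
static void pixelSamples(const LayeredRaster *r, int x, int y,
                         const LayerSample **out, uint32_t *count)
{
    uint32_t i = (uint32_t)y * (uint32_t)r->width + (uint32_t)x;
    *out   = r->samples + r->offsets[i];
    *count = r->offsets[i + 1] - r->offsets[i];
}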
So... any know-how on this kind of problem? Thanks in advance!
Forgot to add that I'm targeting a self-contained, preferably single-threaded, CPU solution. The layers will most likely be greyscale with alpha (that is, some pixel data will simply not exist). Lookups are the priority; updates like adding/removing a layer can be slower.
Added by Mark (see comment):
In that image, a lookup at the top-left corner of the red rectangle should report red, green, blue and black. At the bottom-right corner, it should report red and black only.
I would store the offsets and sizes in a data structure separate from the pixel data. That way you do not jump around in memory while you calculate the relative coordinates for each layer (or work out whether you can skip some layers entirely).
If you want to access single pixels or small areas rather than iterating over big areas, a quad-tree might be a good way to store your data, giving more local memory access when reading pixels or areas that are near each other (in the x or y direction).
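A minimal sketch of the separate-metadata idea, with illustrative names: each layer keeps its placement rectangle in a small array, so a lookup only touches the layers that actually cover the point.

#include <stdint.h>

/* Per-layer placement kept separate from the pixel data, as suggested above. */
typedef struct {
    int x, y, w, h;          /* placement of this layer in raster coordinates */
    const uint8_t *pixels;   /* e.g. grey+alpha, w*h*2 bytes */
} Layer;

/* Collects pointers to every layer covering point (x, y); returns the count. */
static int layersAt(const Layer *layers, int layerCount, int x, int y,
                    const Layer **hits, int maxHits)
{
    int n = 0;
    for (int i = 0; i < layerCount && n < maxHits; ++i) {
        const Layer *l = &layers[i];
        if (x >= l->x && x < l->x + l->w &&
            y >= l->y && y < l->y + l->h)
            hits[n++] = l;   /* read its pixel at (x - l->x, y - l->y) later */
    }
    return n;
}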
I just started learning OpenCL and it seems that Images are sort of Buffers + added goodies like free bilinear sampling.
Is this right, or are there other differences too?
Right, you get things like (hopefully hardware-accelerated) bilinear interpolation, plus addressing modes such as edge clamping, mirroring, or a fixed border color (transparent black). You also get pixel type conversions (the image data can be, say, half-float or 8-bit, but you read back float values; the reverse on write). In exchange, you give up direct access using pointer or array syntax like you have with a buffer - all image access goes through built-in functions. You might also get texture caching with images, which can speed up access to nearby pixels.
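To make the trade-off concrete, a small sketch contrasting the two access styles (OpenCL C kernel source kept in a C string; names are illustrative):

/* Buffer access vs. image access in OpenCL C; illustrative kernels only. */
static const char *kBufferVsImageKernels =
    "__kernel void from_buffer(__global const float4 *src,\n"
    "                          __global float4 *dst, int width) {\n"
    "    int x = get_global_id(0), y = get_global_id(1);\n"
    "    dst[y * width + x] = src[y * width + x]; /* plain pointer/array access */\n"
    "}\n"
    "\n"
    "__constant sampler_t smp = CLK_NORMALIZED_COORDS_TRUE |\n"
    "                           CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_LINEAR;\n"
    "\n"
    "__kernel void from_image(__read_only image2d_t src,\n"
    "                         __global float4 *dst, int width, int height) {\n"
    "    int x = get_global_id(0), y = get_global_id(1);\n"
    "    float2 uv = (float2)((x + 0.5f) / width, (y + 0.5f) / height);\n"
    "    /* Built-in fetch: filtering, edge handling and format conversion */\n"
    "    dst[y * width + x] = read_imagef(src, smp, uv);\n"
    "}\n";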
This question is for OpenGL ES 2.0 (on Android) but may be more general to OpenGL.
Ultimately all performance questions are implementation-dependent, but if anyone can answer this question in general or based on their experience that would be helpful. I'm writing some test code as well.
I have a YUV (12bpp) image I'm loading into a texture and color-converting in my fragment shader. Everything works fine but I'd like to see where I can improve performance (in terms of frames per second).
Currently I'm actually loading three textures for each image - one for the Y component (of type GL_LUMINANCE), one for the U component (of type GL_LUMINANCE and of course 1/4 the size of the Y component), and one for the V component (of type GL_LUMINANCE and of course 1/4 the size of the Y component).
Assuming I can get the YUV pixels in any arrangement (e.g. the U and V in separate planes or interspersed), would it be better to consolidate the three textures into only two or only one? Obviously it's the same number of bytes to push to the GPU no matter how you do it, but maybe with fewer textures there would be less overhead. At the very least, it would use fewer texture units. My ideas:
If the U and V pixels were interleaved with each other, I could load them in a single texture of type GL_LUMINANCE_ALPHA, which has two components (see the sketch below).
I could load the entire YUV image as a single texture (of type GL_LUMINANCE but 3/2 the size of the image) and then, in the fragment shader, call texture2D() three times on the same texture, doing a bit of arithmetic to compute the correct texture coordinates for the Y, U and V components.
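For illustration, idea 1 might look roughly like this (GLSL ES 1.00 held in a C string; the full-range BT.601 coefficients and uniform names are my assumptions):

/* Sketch of idea 1: Y in a GL_LUMINANCE texture, interleaved UV in a half-size
   GL_LUMINANCE_ALPHA texture. */
static const char *kYuvFragmentShader =
    "precision mediump float;\n"
    "uniform sampler2D u_texY;   // GL_LUMINANCE, full size\n"
    "uniform sampler2D u_texUV;  // GL_LUMINANCE_ALPHA, half size in x and y\n"
    "varying vec2 v_uv;\n"
    "void main() {\n"
    "    float y  = texture2D(u_texY,  v_uv).r;\n"
    "    vec2  uv = texture2D(u_texUV, v_uv).ra - vec2(0.5); // (U, V) centred\n"
    "    gl_FragColor = vec4(y + 1.402 * uv.y,\n"
    "                        y - 0.344 * uv.x - 0.714 * uv.y,\n"
    "                        y + 1.772 * uv.x,\n"
    "                        1.0);\n"
    "}\n";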
I would combine the data into as few textures as possible. Fewer textures are usually the better option, for a few reasons:
Fewer state changes to setup the draw call.
The fewer texture fetches in a fragment shader the better.
Less upload time.
Sources:
I understand some of these are focused on more specific hardware, but the principles apply to most Mobile graphics architectures.
Best Practices for Working with Texture Data
Optimize OpenGL for Tegra
Optimizing performance of a heavy fragment shader
"Binding to a texture takes time for OpenGL ES to process. Apps that reduce the number of changes they make to OpenGL ES state perform better. "
"In my experience mobile GPU performance is roughly proportional to the number of texture2D calls." "There are two texture loads, so the minimum cycle count for the texture sub-unit is two." (Tegra has a texture unit which has to run a cycle for reach texture read)
"making calls to the glTexSubImage and glCopyTexSubImage functions particularly expensive" - upload operations must stall the pipeline until textures are uploaded. It is faster to batch these into a single upload than block a bunch of separate times.
My app uses a texture atlas and accesses parts of it to display items using glTexCoordPointer.
It works well with power-of-two textures, but I wanted to use NPOT to reduce the amount of memory used.
The picture itself loads fine with linear filtering and clamp-to-edge wrapping (the content displayed does come from the picture, alpha included), but the display is deformed.
The coordinates are not correct, and the "shape" is more of a trapezoid than a rectangle.
I guessed I had to play with glEnable(), passing GL_TEXTURE_2D for a POT texture and GL_APPLE_texture_2D_limited_npot otherwise, but I cannot find a way to do so.
Also, I do not have GL_TEXTURE_RECTANGLE_ARB; I don't know if that is an issue...
Has anyone had the same kind of problem?
Since OpenGL 2.0 (i.e. for about 10 years) there have been no constraints on the size of a regular texture. You can use whatever image size you want; it will just work.
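As an aside on the GL_APPLE_texture_2D_limited_npot mention in the question: on OpenGL ES 1.x it is not something you glEnable(); support is advertised via the extension string, roughly like this (a sketch; the include path assumes iOS). Note that the extension only covers NPOT textures without mipmaps and with CLAMP_TO_EDGE wrapping.

#include <string.h>
#include <OpenGLES/ES1/gl.h>   /* assumed iOS header path */

/* Returns non-zero if limited NPOT support is advertised by the driver. */
static int hasLimitedNPOT(void)
{
    const char *ext = (const char *)glGetString(GL_EXTENSIONS);
    return ext != NULL && strstr(ext, "GL_APPLE_texture_2D_limited_npot") != NULL;
}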