GL_REPEAT vs a High Resolution Image? - performance

If I have a low-resolution texture containing a pattern of dots that tiles seamlessly, so that repeating it with GL_REPEAT would be unnoticeable, is it advisable to repeat that texture by specifying texture coordinates greater than 1.0, or should I just use a high-resolution texture containing all the dots I need? (This is a question about GPU performance.)

You will always get better performance with a smaller texture. If the texture repeats itself and there are no architectural reasons to make it bigger, use the smaller version. Sampling a smaller texture accesses less memory so it is more likely that most (or all) accesses will fall in the GPU's texture cache.
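To make the repeat setup concrete, here is a minimal WebGL sketch of the small-texture approach (a sketch only: `gl` and `dotImage` are assumed to exist, and the equivalent parameters exist in desktop OpenGL as GL_REPEAT, GL_TEXTURE_WRAP_S, and so on):

```typescript
// Assumed to exist in the surrounding application:
declare const gl: WebGLRenderingContext;
declare const dotImage: HTMLImageElement; // small, seamlessly tiling dot pattern

const tex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, tex);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, dotImage);
gl.generateMipmap(gl.TEXTURE_2D); // WebGL 1 requires power-of-two dimensions here

// REPEAT wraps texture coordinates outside [0, 1] back into the image.
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.REPEAT);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.REPEAT);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR_MIPMAP_LINEAR);

// In the quad's vertex data, scale the UVs past 1.0 to tile the texture:
// a corner UV of (8.0, 8.0) repeats a 64x64 image 8 times in each direction.
```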

Related

Is a large MTLTexture more performant than using multiple smaller MTLTextures?

I'm using Metal to display an image on a 2D plane. The image is rendered as tiles to an IOSurface using Core Image. After each tile is rendered, the IOSurface is sent via XPC to the app, wrapped in an MTLTexture, and its contents are copied to a master texture using a blit encoder. The IOSurface is then reused if need be.
I have full control over the tile sizes and texture sizes and I'm wondering if Metal prefers having a small number of large textures, a large number of small textures or if it just doesn't really matter.
There are some tradeoffs that I've already come across, the most notable being that if I use smaller textures, the cost to (re)generate the mipmaps can be smaller. There's also the issue of textures having a maximum size of 16384, which implies that any image larger than that needs to be tiled anyway.
Consider the following:
Image dimensions used below are just for easy math; in real life I'm working with DSLR images, panoramas, stitched images, etc.
#1 - A single texture that covers the entire image:
Texture Type: MTLTextureType2D
Texture Size: 400 x 300 px
When Core Image renders a tile and I copy it into the texture, I have to regenerate the mipmaps for the entire texture, even though most of the texture's content did not change.
#2 - Using two textures:
Texture Type: MTLTextureType2DArray
Texture Array Length: 2
Texture Size: 200 x 300 px (x2)
With two tiles covering the image's content, I only have to regenerate the mipmaps for the "dirty" tile.
#3 - Using many textures:
Texture Type: MTLTextureType2DArray
Texture Array Length: 12
Texture Size: 100 x 100 px (x12)
In this scenario, I can minimize mipmap generation by matching the texture tile size to my rendering tile size, but it will require a large number of MTLTextures.
MTLTextureDescriptor.arrayLength is documented as being able to hold values between 1..2048, which suggests to me that using a large number of textures isn't such a bad thing.
A single texture is passed to my fragment shader and all it does is sample the color at the appropriate coordinates.
Using smaller-sized textures gives me a lot more fidelity in marking the "dirty" regions of the image that need invalidating, but I'm curious if the large number of textures is to be avoided or not.
My current attempts at measuring the performance have been somewhat inconclusive, and I wonder if that's because this doesn't matter at all from Metal's perspective. Ultimately, the same amount of memory is needed (from my perspective), but I'd be interested to know if there are performance trade-offs I'm not aware of.
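The dirty-tile bookkeeping weighed in options #2 and #3 can be sketched independently of Metal. The following is a hypothetical illustration, not Metal API code; `regenerateMipmaps` stands in for a blit-encoder mipmap pass over a single array slice:

```typescript
// Hypothetical dirty-tile tracker: one entry per array slice.
interface TileGrid {
  tileSize: number;   // e.g. 100 px tiles, as in option #3
  columns: number;    // tiles per row across the full image
  dirty: Set<number>; // indices of slices whose contents changed
}

// Record that the pixel at (x, y) was overwritten by a newly rendered tile.
function markDirty(grid: TileGrid, x: number, y: number): void {
  const col = Math.floor(x / grid.tileSize);
  const row = Math.floor(y / grid.tileSize);
  grid.dirty.add(row * grid.columns + col);
}

// Regenerate mipmaps only for the slices touched since the last flush.
function flush(grid: TileGrid, regenerateMipmaps: (slice: number) => void): void {
  for (const slice of grid.dirty) regenerateMipmaps(slice);
  grid.dirty.clear();
}
```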

THREE.js: How to increase texture quality

What are the possible and good ways/best practices/etc to improve texture quality in THREE.js?
I have a scene with planes (cards) that use 512x512px textures. You can see how it looks in the images below. My problem is that the textures look blurred. I have tried changing the filters and the anisotropy value, and that helps, but only a little; the textures still look blurred. The only way I've found to make the textures look the way I want is to double the render size while keeping the canvas size the same. That's a bad approach because of the performance cost, but I haven't found another way to get good texture quality.
The best quality - render size x2
Normal quality - magFilter = minFilter = THREE.LinearMipMapLinearFilter /anisotropy = 16
Bad quality - no filters
Any help is appreciated, thanks in advance.
You can hardly do better than trilinear filtering with 16x anisotropy (and not all hardware can achieve 16x anisotropic filtering).
However, you say your textures are 512x512, while (if your snapshots are real size) two things appear clear:
They are rendered much smaller than 512x512, which means a lower mipmap level, generated by WebGL, is currently being used to render your cards.
Your cards are rectangular while your textures are square. Depending on how you mapped the texture onto your shape, the aspect ratio may change, so the sampler needs to do more interpolation (i.e. more filtering, meaning more blur).
So what you can try to do is the following (sketched in code after the list):
Use a smaller base texture, 256x256 for example, authored yourself with the best sharpness you can, so that no minification filtering is needed when WebGL samples the texture.
Adapt the mesh texture coordinates to your texture or vice versa to avoid aspect-ratio changes during texture sampling.
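A rough three.js sketch of these settings (assumptions: `card.png` stands in for one of the 512x512 card textures, and the mipmap-filter constant is spelled LinearMipmapLinearFilter in newer three.js releases, with LinearMipMapLinearFilter kept as a legacy alias):

```typescript
import * as THREE from 'three';

const renderer = new THREE.WebGLRenderer({ antialias: true });

const texture = new THREE.TextureLoader().load('card.png'); // placeholder path
texture.minFilter = THREE.LinearMipMapLinearFilter; // trilinear minification
texture.magFilter = THREE.LinearFilter;
// Query the maximum anisotropy the hardware supports instead of hard-coding 16.
texture.anisotropy = renderer.capabilities.getMaxAnisotropy();

// Match the plane's aspect ratio to the texture (or adjust the UVs) so the
// sampler doesn't stretch the image, per the second suggestion above.
const material = new THREE.MeshBasicMaterial({ map: texture });
const card = new THREE.Mesh(new THREE.PlaneGeometry(1, 1), material);
```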

Copying a non-multisampled FBO to a multisampled one

I have been trying to implement a render-to-texture approach in our application, which uses GLES 3, and I have it working, but I am a little disappointed with the frame rate drop.
So far we have been rendering directly to the main FBO, which has been a multisampled FBO created using EGL_SAMPLES=8.
What I want basically is to be able to get a hold of the pixels that have been already drawn, while I'm still drawing. So I thought a render to texture approach should do it. Then I'd just read a section of the off-screen FBO's texture whenever I want and when I'm done rendering to it I'll blit the whole thing to the main FBO.
Digging into this, I found I had to implement a system with a multisampled FBO as well as a non-multisampled textured FBO into which I resolve the multisampled one, then blit the resolved FBO to the main FBO.
This all works but the problem is that by using the above system and a non-multisampled main FBO (EGL_SAMPLES=0) I get quite a big frame rate drop compared to the frame rate I get when I use just the main FBO with EGL_SAMPLES=8.
Digging a bit more into this, I found people reporting online, as well as a post here (https://community.arm.com/thread/6925), saying that the fastest approach to multisampling is to use EGL_SAMPLES. And indeed that's what it looks like on the Jetson TK1 too, which is our target board.
Which finally leads me to the question, and apologies for the long introduction:
Is there any way that I can design this to use a non-multisampled off-screen fbo for all the rendering that eventually is blitted to a main multisampled FBO that uses EGL_SAMPLES?
The only point of MSAA is to anti-alias geometry edges. It only provides benefit if multiple triangle edges appear in the same pixel. For rendering pipelines which are doing multiple off-screen passes you want to enable multiple samples for the off-screen passes which contain your geometry (normally one of the early passes in the pipeline, before any post-processing effects).
Applying MSAA at the end of the pipeline on the final blit will provide zero benefit, and probably isn't free (it will be close to free on tile-based renderers like IMG Series 6 and Mali (the blog you linked), less free on immediate-mode renderers like the Nvidia GPU in your Jetson board).
Note for off-screen anti-aliasing the "standard" approach is rendering to an MSAA framebuffer, and then resolving as a second pass (e.g. using glBlitFramebuffer to blit into a single sampled buffer). This bounce is inefficient on many architectures, so this extension exists to help:
https://www.khronos.org/registry/gles/extensions/EXT/EXT_multisampled_render_to_texture.txt
Effectively this provides the same implicit resolve as the EGL window surface functionality.
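For reference, the two-pass pattern described above looks roughly like this in WebGL2, whose framebuffer API mirrors GLES 3 (a sketch under the assumption of an RGBA8 color-only target; depth attachments are elided):

```typescript
function createMsaaTarget(gl: WebGL2RenderingContext, w: number, h: number) {
  // Multisampled FBO backed by a renderbuffer; clamp to the hardware maximum.
  const samples = Math.min(8, gl.getParameter(gl.MAX_SAMPLES));
  const msaaFbo = gl.createFramebuffer();
  const color = gl.createRenderbuffer();
  gl.bindRenderbuffer(gl.RENDERBUFFER, color);
  gl.renderbufferStorageMultisample(gl.RENDERBUFFER, samples, gl.RGBA8, w, h);
  gl.bindFramebuffer(gl.FRAMEBUFFER, msaaFbo);
  gl.framebufferRenderbuffer(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.RENDERBUFFER, color);

  // Single-sampled FBO wrapping a texture that can be sampled later.
  const resolveFbo = gl.createFramebuffer();
  const tex = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, tex);
  gl.texStorage2D(gl.TEXTURE_2D, 1, gl.RGBA8, w, h);
  gl.bindFramebuffer(gl.FRAMEBUFFER, resolveFbo);
  gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, tex, 0);

  return { msaaFbo, resolveFbo, tex };
}

// The resolve pass: this extra bounce is what the extension avoids.
function resolveMsaa(gl: WebGL2RenderingContext,
                     target: ReturnType<typeof createMsaaTarget>,
                     w: number, h: number): void {
  gl.bindFramebuffer(gl.READ_FRAMEBUFFER, target.msaaFbo);
  gl.bindFramebuffer(gl.DRAW_FRAMEBUFFER, target.resolveFbo);
  gl.blitFramebuffer(0, 0, w, h, 0, 0, w, h, gl.COLOR_BUFFER_BIT, gl.NEAREST);
}
```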
Answers to your questions in the comments.
Is the resulting texture a multisampled texture in that case?
From the application point of view, no. The multisampled data is inside an implicitly allocated buffer, allocated by the driver. See this bit of the spec:
"The implementation allocates an implicit multisample buffer with TEXTURE_SAMPLES_EXT samples and the same internalformat, width, and height as the specified texture level."
This may require a real MSAA buffer allocation in main memory on some GPU architectures (and so be no faster than the manual glBlitFramebuffer approach without the extension), but is known to be effectively free on others (i.e. tile-based GPUs where the implicit "buffer" is a small RAM inside the GPU, and not in main memory at all).
The goal is to blur the background behind widgets
MSAA is not in any way a general-purpose blur; it only anti-aliases pixels that are coincident with triangle edges. If you want to blur triangle faces, you'd be better off using a separable Gaussian blur implemented as a pair of fragment-shader passes in a 2D post-processing step.
Is there any way that I can design this to use a non-multisampled off-screen fbo for all the rendering that eventually is blitted to a main multisampled FBO that uses EGL_SAMPLES?
Not in any way which is genuinely useful.
Framebuffer blitting does allow blits from single-sampled buffers to multisample buffers. But all that does is give every sample within a pixel the same value as the one from the source.
Blitting cannot generate new information. So you won't get any actual antialiasing. All you will get is the same data stored in a much less efficient way.

Is drawing outside the viewport in OpenGL expensive?

I have several thousand quads to draw, some of which might fall entirely outside the viewport. I could write code which will detect which quads fall wholly outside viewport and ask OpenGL to draw only those which will be at least partially visible. Alternatively, I could simply have OpenGL draw all of the quads, regardless of whether they intersect with the viewport.
I don't have enough experience with OpenGL to know if one of these is obviously better (or if OpenGL offers some quick viewport intersection test I can use). Are draws outside the viewport close to being no-ops, or are they expensive enough that I should try to avoid them?
It depends on your circumstances.
Drawing is best done in batches, preferably batches that are static in structure (ie: each batch is drawn in its entirety). So you shouldn't be culling down at the quad level. But doing some culling of large groups of quads is not unwelcome.
The primary performance you'll lose is vertex transform (aka: your vertex shader). A vertex shader has to be run on every vertex you provide, regardless of anything else. However, hardware will discard triangles that are trivially outside of the viewport, so you won't soak up any fillrate or other performance.
However, that doesn't mean it's fine so long as your vertex T&L is cheap. Rendering large blocks of triangles that aren't visible may very well starve the rasterizer: while the GPU churns through geometry that gets culled for being off-screen, fillrate that you might have spent on actually visible triangles may be lost.
So it's not a good idea to just hurl geometry at the GPU willy-nilly.
In any case, if you're doing 2D rendering, coarse culling of discrete groups of quads is really all you need. You could divide your tilemap into screen-sized portions and draw up to four of them based on the position of the camera, as sketched below.
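A minimal sketch of that chunk selection (the row-major indexing and all names are illustrative assumptions; each returned index would map to one static batch of quads):

```typescript
interface Rect { x: number; y: number; w: number; h: number; }

// Return the indices of all screen-sized chunks overlapping the camera rect.
function visibleChunks(camera: Rect, chunkW: number, chunkH: number,
                       cols: number, rows: number): number[] {
  const firstCol = Math.max(0, Math.floor(camera.x / chunkW));
  const firstRow = Math.max(0, Math.floor(camera.y / chunkH));
  const lastCol = Math.min(cols - 1, Math.floor((camera.x + camera.w - 1) / chunkW));
  const lastRow = Math.min(rows - 1, Math.floor((camera.y + camera.h - 1) / chunkH));

  const chunks: number[] = [];
  for (let r = firstRow; r <= lastRow; r++)
    for (let c = firstCol; c <= lastCol; c++)
      chunks.push(r * cols + c); // when chunks match the screen size, at most 4
  return chunks;
}
```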

WebGL and rectangular (power of two) textures

WebGL is known to have poor support for NPOT (non-power-of-two) textures. But what about rectangular textures where both width and height are powers of two? Specifically, I'm trying to draw to a rectangular framebuffer as part of a render-to-texture scheme to generate some UI elements. The framebuffer would need to be 512x64 or thereabouts.
How much less efficient would this be in terms of drawing? If framerate is a concern, would I do better to allocate a 512x512 power-of-two-sized buffer and only render to the top 64 pixels, sacrificing memory for speed?
There has never been a constraint that width must equal height.
More specifically: 2D textures are not at all required to be square; a 512x64 texture is not only allowed but should also be implemented efficiently by the driver. Cube maps, on the other hand, must be square.
For 2D textures, you can use NPOT textures provided both wrap modes are CLAMP_TO_EDGE and your minification filter does not require a mipmap. The efficiency of NPOT textures may vary depending on your driver.
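In WebGL1 terms, that NPOT-safe combination looks like this (a sketch; `gl` is an assumed existing context, and the 512x64 rectangle from the question is power-of-two in both dimensions, so it needs none of this):

```typescript
declare const gl: WebGLRenderingContext; // assumed existing context

// Both wrap modes CLAMP_TO_EDGE plus a minification filter that needs no
// mipmap: the combination WebGL1 requires for NPOT textures.
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
```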
