Image downsampling performance - algorithm

I'm working with image recognition (marker detection) and was looking into downsampling the image before recognition to boost performance. The reasoning is that I'll downsample the image, run the detection algorithm on the smaller version, and then scale the marker coordinates back up by the downsampling factor. I assumed the downsampling cost would be trivial, since it's something the GPU does all the time.
So I tried using OpenCV to downsample and found that not only did I get no improvement, the whole thing actually took longer. I then figured it was because I was making the CPU do it, so I looked into downsampling with OpenGL mipmaps or even shaders, but from what I've read it still remains a costly task, taking tens or even hundreds of milliseconds to halve common image resolutions.
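(For reference, the CPU-side resize in question looks roughly like the sketch below in OpenCV; the scale factor, interpolation flag and timing code are illustrative only, not taken from the original code.)

#include <cstdio>
#include <opencv2/core/utility.hpp>   // cv::TickMeter
#include <opencv2/imgproc.hpp>        // cv::resize

cv::Mat downsample(const cv::Mat& src, double factor)
{
    cv::Mat dst;
    cv::TickMeter tm;
    tm.start();
    // INTER_AREA is the usual choice for shrinking; INTER_LINEAR is cheaper.
    cv::resize(src, dst, cv::Size(), 1.0 / factor, 1.0 / factor, cv::INTER_AREA);
    tm.stop();
    // On large frames this single CPU call can easily cost more time
    // than it saves in the detection step.
    std::printf("resize took %.2f ms\n", tm.getTimeMilli());
    return dst;
}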
My question is: if downsampling is being done continuously with apparent ease (think of resizing an image in any image viewer, or of any texture in a video game), why is it so slow with the most common methods? Is there some secret technique, or am I just not understanding something?

You can upload your image as a texture and draw that texture onto a quad. By changing the texture coordinates (and the quad size) you can apply any such transformation to your image, and it's a very fast method. The bottleneck is copying the image from host to device and back.
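For example, a compact way to get this GPU-side downscale without writing the full quad-and-shader setup is a framebuffer blit (OpenGL 3.0+). The sketch below assumes the image is already uploaded as srcTex (srcW x srcH) and that dstTex (dstW x dstH) is an allocated, smaller, empty texture; error checks are omitted.

// GPU downscale: blit a texture-backed FBO into a smaller one.
GLuint fbo[2];
glGenFramebuffers(2, fbo);

glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo[0]);
glFramebufferTexture2D(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, srcTex, 0);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo[1]);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, dstTex, 0);

// Linear filtering gives a reasonable reduction for a 2x downscale;
// for larger factors, repeat the halving or use mipmaps.
glBlitFramebuffer(0, 0, srcW, srcH, 0, 0, dstW, dstH,
                  GL_COLOR_BUFFER_BIT, GL_LINEAR);

// The slow parts are the upload (glTexImage2D / glTexSubImage2D) and,
// above all, reading the result back with glReadPixels -- not the blit.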

Related

Performance trade off of using a rectangular shaped texture vs. square shaped texture in Three.js material? Specifically a .basis texture

I'm wondering what the trade off is between using a texture that's 128x512 vs. a texture that's 512x512.
The texture is a skateboard deck (naturally rectangular) so I initially made the texture have an aspect ratio that made the deck appear correctly.
I'd like to use a .basis texture and read "Transcoding to PVRTC1 (for iOS) requires square power-of-two textures." on the Three.js BasisTextureLoader documentation.
So I'm trying to weigh the loading time + performance trade off between using the 128x512 as a JPG or PNG vs. a 512x512 basis texture.
My best guess is that the 128x512 would take up less memory because it has fewer texels, but I've also read that GPUs like square textures and that Basis is much more GPU-optimized, so I'm torn between which route to take here.
Any knowledge of the performance trade-offs between these two options would be highly appreciated, especially an explanation of the benefits of Basis textures in general.
Three.js only really needs power-of-two textures when you're asking the texture's .minFilter to perform mip-mapping. In that case, the GPU makes several copies of the texture, each at half the resolution of the previous one (512, 256, 128, 64, etc.), which is why it asks for a power of two. The default value does perform mip-mapping; you can see alternative .minFilter values on that page under "Minification Filters". Nearest and Linear do not require P.O.T. textures, but you'll get pixelation artifacts when the texture is scaled down.
In WebGL you can use a 512x128 texture without problems, since both dimensions are a power of two. The performance trade-off is that you save a bunch of pixels that would have been stretched-out duplicates anyway.
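Purely to illustrate the halving (the arithmetic is the same in any language; C++ is used here only as a sketch), the complete chain halves each dimension until 1x1, and it exists for a 512x128 texture just as it does for 512x512:

#include <algorithm>
#include <cstdio>

// Print the mip chain of a texture: each level is half the previous one,
// clamped at 1, until the 1x1 level is reached.
void printMipChain(int w, int h)
{
    int level = 0;
    while (true) {
        std::printf("level %d: %dx%d\n", level++, w, h);
        if (w == 1 && h == 1) break;
        w = std::max(1, w / 2);
        h = std::max(1, h / 2);
    }
    // 512x512 -> 10 levels (512 ... 1); 512x128 -> also 10 levels.
}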

Rendering 2D image using Direct3D

I am trying to replace my GDIPlus rendering with Direct3D. I am rendering some large images, on the order of 10K x 10K, and it gets really slow with GDI. I am now rendering each image as a texture onto a quad using Direct3D. The image does render, but the quality is really poor when the image is zoomed out.
I am using the following filters.
m_pDevice3D->SetSamplerState(0, D3DSAMP_MAGFILTER, D3DTEXF_LINEAR);
m_pDevice3D->SetSamplerState(0, D3DSAMP_MINFILTER, D3DTEXF_LINEAR);
//m_pDevice3D->SetSamplerState(0, D3DSAMP_MAXANISOTROPY, 4);
m_pDevice3D->SetSamplerState(0, D3DSAMP_MIPFILTER, D3DTEXF_LINEAR);
I have already tried rendering with anisotropic filtering, with no significant improvement.
You cannot render a large image onto a small surface with good quality without paying the price of extensive filtering.
This is a signal-processing issue called aliasing. To reproduce a signal you need to sample it at at least twice its highest frequency, otherwise the spectrum folds back on itself.
In 3D rendering, the typical way to handle this is to generate a mip chain: a set of pre-filtered versions of the image, each half the resolution of the previous one, down to 1x1. The GPU is then able to pick the proper version.
If your image is dynamic and you know the display area, you will prefer runtime filtering, but to do that on the GPU you will need a recent DirectX version with shaders, or you will have to work with temporary offscreen surfaces and ping-pong the reduction step by step.
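For instance, if the image comes from a file, D3DX can build that pre-filtered chain for you at load time. The sketch below shows the idea; the file name is a placeholder, and if you fill the texture yourself you would instead create it with a full mip chain (or the D3DUSAGE_AUTOGENMIPMAP usage flag) and generate the levels after updating it.

// Ask D3DX for a complete mip chain when creating the texture.
IDirect3DTexture9* pTexture = nullptr;
HRESULT hr = D3DXCreateTextureFromFileEx(
    m_pDevice3D, L"image.png",
    D3DX_DEFAULT, D3DX_DEFAULT,   // keep the source width/height
    0,                            // MipLevels = 0 -> generate the full chain
    0, D3DFMT_UNKNOWN, D3DPOOL_MANAGED,
    D3DX_FILTER_TRIANGLE,         // filter used when loading the top level
    D3DX_FILTER_BOX,              // filter used to build each smaller level
    0, NULL, NULL, &pTexture);

// With the sampler states above (MIN/MAG/MIPFILTER = LINEAR) the GPU then
// picks and blends the appropriate mip level when the image is zoomed out.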

Querying graphics capabilities for deciding whether to apply GPU-intensive effects (through SpriteKit)

I have a game written with SpriteKit which uses an SKEffectNode with a blur effect to blur a set of sprites, one of which has a fairly large texture, and which together cover a fairly large area of the screen. An iMac and a MacBook Pro cope quite happily with this, but on a more humble MacBook there is a notable drop in frame rate with the effect node added in. Since the effect isn't crucial to the game's functionality, I could simply not add the SKEffectNode on machines with less powerful graphics capabilities.
So then the question: what would be a good programmatic check that I could make to determine the "power of the GPU" or "performance when applying texture effects" or [suggest better metric here] and via what API? Thanks for your suggestions!
You'll have to create a performance test using your actual blurring processes and some sample content to get an accurate idea of the time cost of it on each generation of hardware.
Blurs are really weird things, programmatically. A Box Blur can give you most of the appearance of a nice, soft gaussian blur for much less processing cost. A zoom or motion blur (that looks good) is surprisingly expensive, even on strong hardware.
And there are some amazingly effective "cheats" when doing blurs. Because there's no need for detail, you can heavily optimise the operations, particularly if the blurs are strong.
Apple, it's believed, does something like this with its blurs:
Massively shrink the target image
Do a gaussian blur on this tiny image
Scale it back up, somewhat
Apply a cheap Box Blur to soften it
Fully scale back to the desired size
By way of a rough example that benefits from good scaling (with filtering set for high-quality scaling):
This is the full sized image blurred:
And here's a version of the same image, scaled to a 16th of its original size, blurred, and then the blurred image scaled back up. As you can see, due to the good scaling and lack of detail, there's hardly any difference in the blurred image, but the blur takes MUCH less processing energy and time:
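To make that recipe concrete, here is the same cheat sketched with OpenCV standing in for whatever image pipeline you actually use (Core Image, SpriteKit shaders, etc.); the scale factor and kernel sizes are illustrative, not tuned values.

#include <opencv2/imgproc.hpp>

cv::Mat cheapBlur(const cv::Mat& src, int scale = 4)
{
    cv::Mat tiny, tinyBlur, mid, midSoft, out;

    // 1. Massively shrink the target image.
    cv::resize(src, tiny, cv::Size(), 1.0 / scale, 1.0 / scale, cv::INTER_AREA);
    // 2. Gaussian blur on the tiny image (cheap: far fewer pixels).
    cv::GaussianBlur(tiny, tinyBlur, cv::Size(9, 9), 0);
    // 3. Scale it back up, somewhat.
    cv::resize(tinyBlur, mid, cv::Size(src.cols / 2, src.rows / 2), 0, 0, cv::INTER_LINEAR);
    // 4. Apply a cheap box blur to soften the upscaling blockiness.
    cv::blur(mid, midSoft, cv::Size(5, 5));
    // 5. Fully scale back to the desired size.
    cv::resize(midSoft, out, src.size(), 0, 0, cv::INTER_LINEAR);
    return out;
}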

Smooth Lower Resolution Voxel Noise

After reading the blog post at n0tch.tumblr.com/post/4231184692/terrain-generation-part-1, I was interested in Notch's solution of sampling at lower resolutions. I implemented this solution in my engine, but instantly noticed he didn't go into detail about what he interpolated between to smooth out the noise.
From the blog:
Unfortunately, I immediately ran into both performance issues and
playability issues. Performance issues because of the huge amount of
sampling needed to be done, and playability issues because there were
no flat areas or smooth hills. The solution to both problems turned
out to be just sampling at a lower resolution (scaled 8x along the
horizontals, 4x along the vertical) and doing a linear interpolation.
This is the result of the low-res method without smoothing:
low-res voxel
I attempted to smooth out the noise in the chunk noise array and instantly noticed a problem:
attempt at smoothing
The noise also looks less random now.
As you can see, there is an obvious transition between chunks. How exactly do I use interpolation to smooth out the low resolution noise map so that the border between chunks smoothly connect while still appearing random?
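For reference, the quoted "sample at a lower resolution and do a linear interpolation" step can be sketched as below. noise3(x, y, z) stands for whatever noise function the engine already uses, per-chunk caching of the coarse samples is omitted, and world coordinates are assumed non-negative for brevity.

// Coarse sampling every 8 blocks horizontally and 4 vertically, then
// trilinear interpolation inside each coarse cell. Because the coarse
// samples are taken at world positions, two chunks sharing a border
// evaluate identical samples there, so the terrain lines up.
const int STEP_H = 8, STEP_V = 4;

float lerp(float a, float b, float t) { return a + (b - a) * t; }

float density(int wx, int wy, int wz, float (*noise3)(float, float, float))
{
    int x0 = (wx / STEP_H) * STEP_H, x1 = x0 + STEP_H;
    int y0 = (wy / STEP_V) * STEP_V, y1 = y0 + STEP_V;
    int z0 = (wz / STEP_H) * STEP_H, z1 = z0 + STEP_H;
    float tx = float(wx - x0) / STEP_H;
    float ty = float(wy - y0) / STEP_V;
    float tz = float(wz - z0) / STEP_H;

    float c000 = noise3(x0, y0, z0), c100 = noise3(x1, y0, z0);
    float c010 = noise3(x0, y1, z0), c110 = noise3(x1, y1, z0);
    float c001 = noise3(x0, y0, z1), c101 = noise3(x1, y0, z1);
    float c011 = noise3(x0, y1, z1), c111 = noise3(x1, y1, z1);

    float x00 = lerp(c000, c100, tx), x10 = lerp(c010, c110, tx);
    float x01 = lerp(c001, c101, tx), x11 = lerp(c011, c111, tx);
    return lerp(lerp(x00, x10, ty), lerp(x01, x11, ty), tz);
}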

How to get good performance on the gfx card with images larger than the max texture size?

At work, I work with very large images.
I currently do my rendering via SDL2.
The max texture size on the graphics card my machine uses is 8192x8192.
Because my data sets are larger than what will fit in a single texture, I split my image into multiple textures after it is loaded, and tile them.
However, I have found that this comes at a very steep cost. Rendering only 4 textures around 5K by 5K (pixels) each completely tanks the framerate!
Conventional wisdom tells me that the fewer texture swaps the better, but with such large images I've found myself between a rock and a hard place.
One thing I've considered is that perhaps if I were to chunk the images up into many small textures, I could take advantage of culling, which would hopefully be a net win. But there's a big problem with that approach - I need to be able to zoom out.
Another option would be to downscale the images. This seems promising, as the analysis I am doing on the images does not require the high resolution that the images provide.
I know that OpenGL has mipmapping, but I am inexperienced with OpenGL and am wary of diving into it for a work project. I am not aware of a good way to downscale the images within the confines of SDL2, and for reasons specific to the work I am doing, scaling the images down offline (before I load them) is not appealing.
What is the best approach for me to get the highest framerate in this situation?
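For what it's worth, downscaling at load time purely within SDL2 can be sketched as below. SDL_BlitScaled only does a fast, low-quality stretch, which may or may not be acceptable for the analysis; IMG_Load assumes SDL_image is available (SDL_LoadBMP and the like work the same way), and the choice of factor is arbitrary.

#include <SDL.h>
#include <SDL_image.h>

// Load an image and shrink it before any textures are created, so the
// result fits comfortably under the 8192x8192 texture limit.
SDL_Surface* loadDownscaled(const char* path, int factor)
{
    SDL_Surface* full = IMG_Load(path);
    if (!full) return NULL;

    SDL_Surface* small = SDL_CreateRGBSurfaceWithFormat(
        0, full->w / factor, full->h / factor, 32, SDL_PIXELFORMAT_RGBA32);
    // NULL rects = scale the whole source onto the whole destination.
    SDL_BlitScaled(full, NULL, small, NULL);
    SDL_FreeSurface(full);
    return small;   // small enough to tile, or even to fit in one texture
}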
