Gaussian Mixture Model for Background Subtraction - algorithm

This is more of something I would like to discuss with the community than a question for which I am seeking an absolute answer.
I am trying to implement the GMM-based background subtraction algorithm from scratch. OpenCV already has a solid implementation (MOG2), but I am still implementing it myself because I want to experiment with some parameters that OpenCV does not expose. However, my implementation is extremely slow on 4K images and takes a huge amount of memory, while OpenCV achieves about 5-10 frames per second or even faster and does not take much memory. I am NOT surprised that OpenCV is much faster than mine, but I am still curious how that speed is achieved.
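For reference, this is roughly how I benchmarked the built-in MOG2 against my own code (the video file name is just a placeholder; the parameters shown are the ones OpenCV does expose):

import time
import cv2

cap = cv2.VideoCapture("my_4k_clip.mp4")  # hypothetical 3840x2160 source
mog2 = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                          detectShadows=False)

frames, t0 = 0, time.time()
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = mog2.apply(frame)  # 0/255 foreground mask, one pass of the model update
    frames += 1
print(frames / (time.time() - t0), "frames per second")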
So here are my thoughts:
The GMM approach builds a mixture of Gaussians to describe the background/foreground for each pixel. That said, each pixel has 3-5 associated 3-dimensional Gaussian components. We can simplify the computation by using a single shared variance across channels instead of a full covariance. Each Gaussian component then needs at least 3 means, 1 variance, and 1 weight. If we assume each pixel maintains 3 components, that is roughly 4000*2000*3*(3+1+1) parameters per image.
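Laid out as flat arrays rather than per-pixel objects, that parameter count is concrete; a sketch of the layout I have in mind, assuming 3 components per pixel and float32 storage:

import numpy as np

H, W, K = 2160, 3840, 3                          # 4K frame, 3 Gaussians per pixel
means  = np.zeros((H, W, K, 3), np.float32)      # one RGB mean per component
var    = np.ones((H, W, K), np.float32)          # shared variance per component
weight = np.full((H, W, K), 1.0 / K, np.float32)

total = means.nbytes + var.nbytes + weight.nbytes
print(total / 2**20, "MiB")                      # roughly 475 MiB at float32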
Updating the GMM is not very complex for a single pixel, but doing it for all 4000*2000 pixels should still be very expensive.
I don't think OpenCV's MOG2 is accelerated by CUDA, since I tested it on my Mac, which has no discrete graphics card, and the speed was still fast.
So my question is:
Does OpenCV compress the image before feeding it into the model and decompress the result on return?
Is it possible to achieve near real-time processing of 4K images (without image compression) with parallelization on the CPU?
My implementation used 4000*2000 doubly linked lists to maintain the Gaussian components for the 4K image. I was expecting this to save me some memory, but memory usage still exploded when I tested it on the 4K image.
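One direction I am considering is dropping the linked lists entirely and updating every pixel at once with array operations. A minimal sketch of one vectorized update step (a simplified Stauffer-Grimson style update, not OpenCV's exact MOG2 logic, reusing the means/var/weight arrays from the sketch above):

import numpy as np

def gmm_update(frame, means, var, weight, lr=0.005, thresh=2.5 ** 2):
    # frame: (H, W, 3) float32; means: (H, W, K, 3); var, weight: (H, W, K)
    diff = frame[:, :, None, :] - means              # offset from every component mean
    d2 = (diff ** 2).sum(axis=-1)                    # squared distance per component
    matched = d2 < thresh * var                      # components that explain the pixel
    m = matched.astype(np.float32)
    weight += lr * (m - weight)                      # w <- (1 - lr) * w + lr * match
    rho = lr * m
    means += rho[..., None] * diff                   # pull matched means toward the pixel
    var += rho * (d2 - var)                          # and adapt their variances
    return ~matched.any(axis=2)                      # True where nothing matched -> foreground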
Plus:
I did test OpenCV's MOG2 on a resized image (3840x2160 downscaled to 384x216) and the detection seems acceptable.
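That downscale/upscale workaround looks roughly like this (384x216 is simply the factor-of-10 reduction I tried):

import cv2

mog2 = cv2.createBackgroundSubtractorMOG2()

def subtract_downscaled(frame):
    # run the model at 384x216, then stretch the mask back to the input size
    small = cv2.resize(frame, (384, 216), interpolation=cv2.INTER_AREA)
    mask_small = mog2.apply(small)
    return cv2.resize(mask_small, (frame.shape[1], frame.shape[0]),
                      interpolation=cv2.INTER_NEAREST)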
This might be a weird question... But I would appreciate any opinions on it.

Related

How do I see the GPU's bottleneck in a complex algorithm?

I'm using GLSL fragment shaders for GPGPU calculations (I have my reasons).
In Nsight I see that I'm doing 1600 draw calls per frame.
There could be 3 bottlenecks:
Fillrate
Just too many drawcalls
GPU stalls due to my GPU->CPU downloads and CPU->GPU uploads
How do I find which one it is?
If my algorithm were simple (e.g. a Gaussian blur or something), I could force the viewport of each draw call to be 1x1, and depending on the speed change, I could rule out a fillrate problem.
In my case, though, that would require changing the entire algorithm.
Since you're mentioning the Nvidia Nsight tool, you could try to follow the procedure explained in the following Nvidia blog post.
It explains how to read and understand hardware performance counters to interpret performance bottlenecks.
The Peak-Performance-Percentage Analysis Method for Optimizing Any GPU Workload :
https://devblogs.nvidia.com/the-peak-performance-analysis-method-for-optimizing-any-gpu-workload/
Instead of hunting for the one bottleneck, consider changing the way you calculate.
I'm using GLSL fragment shaders for GPGPU calculations (I have my reasons).
I am not sure what your OpenGL version is, but using a compute shader instead of a fragment shader should solve the issue.
In Nsight I see that I'm doing 1600 draw calls per frame.
Do you mean actual OpenGL draw calls? That must be one of the reasons for sure. With fragment shaders you have to draw something into FBOs to make the GPU do the calculation; that is the big difference between a compute shader and a fragment shader. Draw calls always slow the program down, whereas a compute shader dispatch avoids that overhead.
An architectural advantage of compute shaders for image processing is that they skip the ROP (render output unit) step. It's very likely that writes from pixel shaders go through all the regular blending hardware even if you don't use it.
If you have to use fragment shaders somehow, then:
try to reduce the number of draw calls;
find a way to store the data that is being calculated.
That would be like using render textures as memory: if you need to change vertices using RTTs, you would load textures holding position, velocity, or whatever else you need to change the vertices or their attributes such as normals/colors.
To find the actual reason, it is better to use CPU & GPU profilers appropriate to your chipset and OS.

How to get good performance on the gfx card with images larger than the max texture size?

At work, I work with very large images.
I currently do my rendering via SDL2.
The max texture size on the graphics card my machine uses is 8192x8192.
Because my data sets are larger than what will fit in a single texture, I split my image into multiple textures after it is loaded, and tile them.
However, I have found that this comes at a very steep cost. Rendering only 4 textures around 5K by 5K (pixels) each completely tanks the framerate!
Conventional wisdom tells me that the fewer texture swaps the better, but with such large images I've found myself between a rock and a hard place.
One thing I've considered is that perhaps if I were to chunk the images up into many small textures, I could take advantage of culling, which would hopefully be a net win. But there's a big problem with that approach - I need to be able to zoom out.
Another option would be to downscale the images. This seems promising, as the analysis I am doing on the images does not require the high resolution that the images provide.
I know that OpenGL has mipmapping, but I am inexperienced with OpenGL and am wary of diving into it for a work project. I am not aware of a good way to downscale the images within the confines of SDL2, and for reasons specific to the work I am doing, scaling the images down offline (before I load them) is not appealing.
What is the best approach for me to get the highest framerate in this situation?

How to create textures from large images in opengl (bigger than the MAX_TEXTURE_SIZE)

I've found that the maximum texture size my OpenGL implementation can support is 8192, but the image that I'm working with is 16997x15931. As you can see in this link, I've completed the class COpenGLControl and customized it for my own use to work with a smaller 7697x7309 image, and activated different navigation tasks for it.
Render an outlined red rectangle on top a 2D texture in OpenGL
but now, in the last stages of work, I've decided to change the part that applies the texture and enable it to handle images bigger than 8192.
Questions:
Is it possible with my OpenGL version?
What concepts should I study: mipmaps, multiple texturing?
Will it improve the performance of the code?
Right now my program uses 271 MB of RAM just to show the small image (7697x7309), and I'm going to add an image-processing/filtering task to it. I have put all my effort into optimizing that code, but it still uses 376 MB of RAM for the (7697x7309) image (it is already written as a console application and will be combined with this project). So I think the final project would use up to 700 MB of RAM for images around 7000x7000. Obviously for the bigger image (16997x15931) the RAM usage will be a lot higher!
So I'm looking for a concept to handle images bigger than the MAX_TEXTURE_SIZE and also optimize the performance of the program
More Questions:
What concept should I study in OpenGL to achieve the above goal?
explain a little about the concept that you suggest?
I've asked the question on Game Development too, but decided to repeat it here since it may get more viewers. As soon as I get an answer, I will delete the question from one of the two sites, so don't worry about the duplicate posting.
I will try to sum up my comments for the original question.
Know your actual OpenGL version: maybe you can load some modern extensions and effectively work with a more recent version of OpenGL.
If it is possible, you can take a look at sparse textures (megatextures): ARB_sparse_texture or AMD_sparse_texture.
To reduce memory you can use some texture compression:
How to: load DDS files in OpenGL.
Another simple idea: you can split the huge texture into 4 smaller textures (from 16k x 16k into four 8k x 8k tiles) and render four quads (see the sketch after this answer).
Maybe you can use OpenCL or CUDA to do the work?
Regarding mipmaps: a mipmap chain is a set of progressively smaller versions of your input texture. Mipmaps improve performance and the final quality of the filtering, but a full mipmap chain needs about 33% more memory (each level is a quarter of the previous one, and 1/4 + 1/16 + 1/64 + ... = 1/3). In your case they could be very helpful. For instance, when you look at a wall from a huge distance you do not have to use the full (large) texture; only a small version of it is enough. g-truc on mipmaps
In general there are a lot of options, but which is simplest and fastest to implement depends on your experience.
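To make the tile-splitting item above concrete, a minimal sketch of the CPU-side split with numpy, before each tile is uploaded as its own texture (the 8192 limit and the upload step are placeholders for whatever your GL wrapper provides):

import numpy as np

MAX_TEX = 8192  # query GL_MAX_TEXTURE_SIZE on the target machine instead of hard-coding

def split_into_tiles(image):
    # yield (x, y, tile) triples, each tile no larger than MAX_TEX on a side
    h, w = image.shape[:2]
    for y in range(0, h, MAX_TEX):
        for x in range(0, w, MAX_TEX):
            yield x, y, image[y:y + MAX_TEX, x:x + MAX_TEX]

# each tile would then be uploaded as its own texture and drawn as a quad
# placed at offset (x, y) in image space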

graphics: best performance with floating point accumulation images

I need to speed up some particle system eye candy I'm working on. The eye candy involves additive blending, accumulation, and trails and glow on the particles. At the moment I'm rendering by hand into a floating point image buffer, converting to unsigned chars at the last minute then uploading to an OpenGL texture. To simulate glow I'm rendering the same texture multiple times at different resolutions and different offsets. This is proving to be too slow, so I'm looking at changing something. The problem is, my dev hardware is an Intel GMA950, but the target machine has an Nvidia GeForce 8800, so it is difficult to profile OpenGL stuff at this stage.
I did some very unscientific profiling and found that most of the slow down is coming from dealing with the float image: scaling all the pixels by a constant to fade them out, and converting the float image to unsigned chars and uploading to the graphics hardware. So, I'm looking at the following options for optimization:
Replace floats with uint32's in a fixed point 16.16 configuration
Optimize float operations using SSE2 assembly (image buffer is a 1024*768*3 array of floats)
Use OpenGL Accumulation Buffer instead of float array
Use OpenGL floating-point FBO's instead of float array
Use OpenGL pixel/vertex shaders
Have you any experience with any of these possibilities? Any thoughts, advice? Something else I haven't thought of?
The problem is simply the sheer amount of data you have to process.
Your float buffer is 9 megabytes in size, and you touch the data more than once. Most likely your rendering loop looks somewhat like this:
Clear the buffer
Render something on it (uses reads and writes)
Convert to unsigned bytes
Upload to OpenGL
That's a lot of data to move around, and the cache can't help you much because the image is much larger than your cache. Let's assume you touch every pixel five times. If so, you move 45 MB of data in and out of slow main memory. 45 MB does not sound like much, but consider that almost every memory access will be a cache miss. The CPU will spend most of its time waiting for the data to arrive.
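Spelled out, the arithmetic behind those numbers is simply (the five touches per pixel are the assumption above):

buffer_bytes = 1024 * 768 * 3 * 4   # three float32 channels, 4 bytes each -> about 9 MB
traffic = 5 * buffer_bytes          # five touches per pixel -> about 45 MB per frame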
If you want to stay on the CPU to do the rendering there's not much you can do. Some ideas:
Using SSE non-temporal loads and stores may help, but they will complicate your task quite a bit (you have to align your reads and writes).
Try breaking your rendering up into tiles, e.g. do everything on smaller rectangles (256*256 or so). The idea is that you actually get a benefit from the cache: after you've cleared a rectangle, for example, the entire tile will be in the cache, so rendering it and converting it to bytes will be a lot faster because there is no need to fetch the data from the relatively slow main memory anymore. (A sketch of this follows the list.)
Last resort: Reduce the resolution of your particle effect. This will give you a good bang for the buck at the cost of visual quality.
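A rough sketch of the tiling idea above, applied to the fade-and-convert pass (Python/numpy stands in for the hand-written loops; the tile size, fade factor, and the assumption that the buffer already stores 0..255 values are mine):

import numpy as np

H, W, TILE = 768, 1024, 256
fbuf = np.zeros((H, W, 3), np.float32)            # the floating point accumulation buffer

def fade_and_convert(fade=0.95):
    out = np.empty((H, W, 3), np.uint8)
    for y in range(0, H, TILE):
        for x in range(0, W, TILE):
            tile = fbuf[y:y + TILE, x:x + TILE]   # a view, so the fade is written back
            tile *= fade                          # fade while the tile is hot in the cache
            out[y:y + TILE, x:x + TILE] = np.clip(tile, 0, 255).astype(np.uint8)
    return out                                    # ready to upload as a texture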
The best solution is to move the rendering onto the graphics card. Render-to-texture functionality is standard these days. It's a bit tricky to get it working with OpenGL because you have to decide which extension to use, but once you have it working the performance is not an issue anymore.
Btw - do you really need floating point render targets? If you can get away with 3 bytes per pixel you will see a nice performance improvement.
It's best to move the rendering calculation for massive particle systems like this over to the GPU, which has hardware optimized to do exactly this job as fast as possible.
Aaron is right: represent each individual particle with a sprite. You can calculate the movement of the sprites in space (eg, accumulate their position per frame) on the CPU using SSE2, but do all the additive blending and accumulation on the GPU via OpenGL. (Drawing sprites additively is easy enough.) You can handle your trails and blur either by doing it in shaders (the "pro" way), rendering to an accumulation buffer and back, or simply generate a bunch of additional sprites on the CPU representing the trail and throw them at the rasterizer.
Try to replace the manual code with sprites: An OpenGL texture with an alpha of, say, 10%. Then draw lots of them on the screen (ten of them in the same place to get the full glow).
If you by "manual" mean that you are using the CPU to poke pixels, I think pretty much anything you can do where you draw textured polygons using OpenGL instead will represent a huge speedup.

Detecting if two images are visually identical

Sometimes two image files may be different on a file level, but a human would consider them perceptively identical. Given that, now suppose you have a huge database of images, and you wish to know if a human would think some image X is present in the database or not. If all images had a perceptive hash / fingerprint, then one could hash image X and it would be a simple matter to see if it is in the database or not.
I know there is research around this issue, and some algorithms exist, but is there any tool, like a UNIX command line tool or a library I could use to compute such a hash without implementing some algorithm from scratch?
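To make "perceptive hash" concrete, here is a minimal average-hash sketch with Pillow; it only illustrates the flavour of such a fingerprint (it is not what findimagedupes computes, and I would still prefer an existing tool):

from PIL import Image

def average_hash(path, size=8):
    # shrink, desaturate, then record which pixels are brighter than the mean
    img = Image.open(path).convert("L").resize((size, size), Image.LANCZOS)
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = "".join("1" if p > mean else "0" for p in pixels)
    return int(bits, 2)  # a 64-bit fingerprint; compare two hashes by Hamming distance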
edit: relevant code from findimagedupes, using ImageMagick
try $image->Sample("160x160!");          # shrink, ignoring aspect ratio
try $image->Modulate(saturation=>-100);  # strip colour saturation
try $image->Blur(radius=>3,sigma=>99);   # strong blur to wash out fine detail and noise
try $image->Normalize();                 # stretch contrast to the full range
try $image->Equalize();                  # equalize the histogram
try $image->Sample("16x16");             # shrink again, down to 16x16
try $image->Threshold();                 # binarize to black and white
try $image->Set(magick=>'mono');         # output as 1-bit mono
($blob) = $image->ImageToBlob();         # the raw mono blob is the fingerprint
edit: Warning! ImageMagick $image object seems to contain information about the creation time of an image file that was read in. This means that the blob you get will be different even for the same image, if it was retrieved at a different time. To make sure the fingerprint stays the same, use $image->getImageSignature() as the last step.
findimagedupes is pretty good. You can run "findimagedupes -v fingerprint images" to let it print "perceptive hash", for example.
Cross-correlation or phase correlation will tell you if the images are the same, even with noise, degradation, and horizontal or vertical offsets. Using the FFT-based methods will make it much faster than the algorithm described in the question.
The usual algorithm doesn't work for images that are not the same scale or rotation, though. You could pre-rotate or pre-scale them, but that's really processor intensive. Apparently you can also do the correlation in a log-polar space and it will be invariant to rotation, translation, and scale, but I don't know the details well enough to explain that.
MATLAB example: Registering an Image Using Normalized Cross-Correlation
Wikipedia calls this "phase correlation" and also describes making it scale- and rotation-invariant:
The method can be extended to determine rotation and scaling differences between two images by first converting the images to log-polar coordinates. Due to properties of the Fourier transform, the rotation and scaling parameters can be determined in a manner invariant to translation.
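For the translation-only case, the FFT-based method is only a few lines; a sketch with numpy (grayscale, equal-sized inputs and the epsilon guard are my assumptions):

import numpy as np

def phase_correlate(a, b):
    # returns the (dy, dx) shift between two same-sized grayscale images
    A, B = np.fft.fft2(a), np.fft.fft2(b)
    R = A * np.conj(B)
    R /= np.abs(R) + 1e-12                     # keep only the phase information
    corr = np.fft.ifft2(R).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    return dy, dx, corr.max()                  # a sharp peak means "same image, just shifted"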
A colour histogram is good for matching the same image after it has been resized, resampled, etc.
If you want to match different people's photos of the same landmark it's trickier - look at Haar classifiers. OpenCV is a great free library for image processing.
I don't know the algorithm behind it, but Microsoft Live Image Search just added this capability. Picasa also has the ability to identify faces in images, and groups faces that look similar. Most of the time, it's the same person.
Some machine learning technology like a support vector machine, neural network, naive Bayes classifier or Bayesian network would be best at this type of problem. I've written one each of the first three to classify handwritten digits, which is essentially image pattern recognition.
Resize the image to a 1x1 pixel... if they are exact, there is a small probability they are the same picture.
Now resize it to a 2x2 pixel image; if all 4 pixels are exact, there is a larger probability they are the same.
Then 3x3; if all 9 pixels are exact... good chance, etc.
Then 4x4; if all 16 pixels are exact... better chance.
Etc...
Doing it this way, you can make efficiency improvements: if the 1x1 pixel grid is off by a lot, why bother checking the 2x2 grid? Etc.
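A sketch of that coarse-to-fine check with Pillow (the ladder of sizes and the per-channel tolerance are arbitrary choices):

from PIL import Image

def probably_same(path_a, path_b, sizes=(1, 2, 4, 8, 16), tol=8):
    a = Image.open(path_a).convert("RGB")
    b = Image.open(path_b).convert("RGB")
    for s in sizes:
        sa, sb = a.resize((s, s), Image.LANCZOS), b.resize((s, s), Image.LANCZOS)
        # bail out as soon as a coarse level already disagrees
        if any(abs(pa - pb) > tol
               for ta, tb in zip(sa.getdata(), sb.getdata())
               for pa, pb in zip(ta, tb)):
            return False
    return True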
If you have lots of images, a color histogram could be used to get a rough closeness of images before doing a full comparison of each image against every other one (i.e. O(n^2)).
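A sketch of that histogram prefilter with OpenCV (the bin count and the 0.9 correlation cutoff are arbitrary):

import cv2

def hist_signature(path, bins=32):
    img = cv2.imread(path)
    hist = cv2.calcHist([img], [0, 1, 2], None, [bins] * 3, [0, 256] * 3)
    return cv2.normalize(hist, hist).flatten()

def roughly_similar(sig_a, sig_b, cutoff=0.9):
    # correlation near 1.0 means the colour distributions match closely
    return cv2.compareHist(sig_a, sig_b, cv2.HISTCMP_CORREL) > cutoff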
There is DPEG, "The" Duplicate Media Manager, but its code is not open. It's a very old tool - I remember using it in 2003.
You could use diff to see if they are REALLY different; I guess it would remove lots of useless comparisons. Then, for the algorithm, I would use a probabilistic approach: what are the chances that they look the same? I'd base that on the amount of RGB in each pixel. You could also use some other metrics such as luminosity and things like that.
