I'm trying to load a large dataset of a million points in 3D space in MATLAB, but whenever I try to plot it (scatter or plot3), it takes forever. This is on a laptop with an Intel Graphics Media Accelerator 950 with up to 224 MB of shared system memory. It also sometimes causes MATLAB 2008a to crash. Is there a way to let MATLAB use an Nvidia GPU for plotting this dataset? I have another laptop with an Nvidia Go 6150. I'm on Windows XP and Windows 7.
OpenGL
You can set the renderer used for figures in MATLAB.
http://www.mathworks.com/support/tech-notes/1200/1201.html
To take advantage of the GPU, you can set the renderer to OpenGL:
set(0,'DefaultFigureRenderer','opengl')
which enables MATLAB to access graphics hardware if it is available on your machine. It provides object transparency, lighting, and accelerated performance.
Other ways
Also, the following link shows some ideas about optimizing graphics performance:
http://www.mathworks.com/access/helpdesk/help/techdoc/creating_plots/f7-60415.html
However, these techniques apply to cases where you are creating many graphs of similar data; they can improve rendering speed by preventing MATLAB from performing unnecessary operations.
If you want to use CUDA, the minimum card required is a G80; your 6150 is sadly too old.
List of compatible cards.
There is Jacket, a commercial product that brings GPU power to MATLAB:
http://www.accelereyes.com/products/jacket
You can download the trial version (30 days, if I remember correctly).
This is more like something that I would like to discuss with the community rather than something for which I am seeking an absolute answer.
I am trying to implement the GMM-based background subtraction algorithm from scratch. Apparently, OpenCV already has it well implemented (MOG2). I am still implementing it from scratch because I would like to test some parameters that OpenCV does not provide access to. However, my implementation was very slow when running on 4K images and took a huge amount of memory, while OpenCV can achieve about 5-10 images per second or even faster and does not take much memory. I am NOT surprised that OpenCV is much faster than mine, but I am still curious about how that is achieved.
So here are my thoughts:
The GMM approach builds a mixture of Gaussians to describe the background/foreground for each pixel. That being said, each pixel will have 3-5 associated 3-dimensional Gaussian components. We can simplify the computation by using a shared variance across channels instead of a full covariance, so we have at least 3 means, 1 variance, and 1 weight parameter per Gaussian component. If we assume each pixel maintains 3 components, that is roughly 4000*2000*3*(3+1+1) parameters per image (see the storage sketch below).
Although updating the GMM is not very complex for a single pixel, the total amount of computation across all 4000*2000 pixels should still be very expensive.
I don't think the OpenCV MOG2 is accelerated by CUDA, since I tested it on my Mac without a discrete graphics card and it was still fast.
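To make the memory arithmetic above concrete, here is a rough C++ sketch (not OpenCV's actual internal layout; all names are placeholders) that stores the per-pixel Gaussian parameters in flat contiguous arrays:

#include <cstddef>
#include <vector>

// One fixed slot per (pixel, component): 3 means + 1 variance + 1 weight,
// i.e. the 4000*2000*3*(3+1+1) parameters estimated above
// (~480 MB as float, roughly double that as double).
struct GmmModel {
    int width, height, components;
    std::vector<float> mean;      // size = width * height * components * 3
    std::vector<float> variance;  // size = width * height * components
    std::vector<float> weight;    // size = width * height * components

    GmmModel(int w, int h, int k)
        : width(w), height(h), components(k),
          mean(static_cast<std::size_t>(w) * h * k * 3, 0.0f),
          variance(static_cast<std::size_t>(w) * h * k, 15.0f),  // arbitrary initial variance
          weight(static_cast<std::size_t>(w) * h * k, 0.0f) {}
};

// GmmModel model(3840, 2160, 3);  // one 4K model, allocated up front

A contiguous layout like this also avoids the per-node pointer and allocation overhead of linked lists, which is one plausible contributor to the memory blow-up mentioned below.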
So my question is:
Does OpenCV compress the image before feeding it into the model and decompress the result on return?
Is it possible to achieve near real-time processing for 4k images (without image compression) with parallelization on CPU?
My implementation used 4000*2000 doubly linked lists to maintain the Gaussian components for the 4K images. I was expecting that to save me some memory, but memory usage still exploded when I tested it on a 4K image.
Plus:
I did test the OpenCV MOG2 on a resized image (downscaled from 3840x2160 to 384x216) and the detection seems acceptable.
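For reference, a minimal C++ sketch of that downscale-then-MOG2 test (the video path is a placeholder; the Python cv2 calls are analogous):

#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture cap("input_4k.mp4");              // placeholder input path
    auto mog2 = cv::createBackgroundSubtractorMOG2();

    cv::Mat frame, small, fgMask, fgFull;
    while (cap.read(frame)) {                          // frame: 3840x2160
        cv::resize(frame, small, cv::Size(384, 216));  // the 10x downscale tested above
        mog2->apply(small, fgMask);                    // foreground mask at low resolution
        cv::resize(fgMask, fgFull, frame.size(), 0, 0, cv::INTER_NEAREST);  // back to 4K if needed
        // ... use fgFull ...
    }
    return 0;
}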
This might be a weird question... But I would appreciate any opinions on it.
I'm using GLSL fragment shaders for GPGPU calculations (I have my reasons).
In Nsight I see that I'm doing 1600 drawcalls per frame.
There could be 3 bottlenecks:
Fillrate
Just too many drawcalls
GPU stalls due to my GPU->CPU downloads and CPU->GPU uploads
How do I find which one it is?
If my algorithm were simple (e.g. a Gaussian blur or something), I could force the viewport of each drawcall to be 1x1, and depending on the speed change, I could rule out a fillrate problem.
In my case, though, that would require changing the entire algorithm.
Since you mention the Nvidia Nsight tool, you could try to follow the procedure explained in the following Nvidia blog post.
It explains how to read and understand hardware performance counters in order to interpret performance bottlenecks.
The Peak-Performance-Percentage Analysis Method for Optimizing Any GPU Workload:
https://devblogs.nvidia.com/the-peak-performance-analysis-method-for-optimizing-any-gpu-workload/
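Not from that blog post, but as a complementary and much coarser check, you could wrap individual passes in OpenGL timer queries; this at least separates GPU execution time from CPU-side stalls around the up/downloads. A sketch (assumes GL 3.3+ / ARB_timer_query; runPass() stands in for one of your draw calls):

GLuint query = 0;
glGenQueries(1, &query);

glBeginQuery(GL_TIME_ELAPSED, query);
runPass();                                           // the pass you want to measure
glEndQuery(GL_TIME_ELAPSED);

GLuint64 ns = 0;
glGetQueryObjectui64v(query, GL_QUERY_RESULT, &ns);  // note: blocks until the GPU is done
double ms = ns / 1.0e6;
// If the summed GPU time of all passes is far below the frame time, the frame is
// probably dominated by CPU work or by stalls around glReadPixels / uploads
// rather than by fill rate.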
Instead of trying to find the one bottleneck, consider changing the way you calculate.
I'm using GLSL fragment shaders for GPGPU calculations (I have my reasons).
I am not sure what your OpenGL version is, but using a compute shader instead of a fragment shader may solve the issue.
In Nsight I see that I'm doing 1600 drawcalls per frame.
Do you mean actual OpenGL drawcalls? That is almost certainly one of the reasons. With fragment shaders you have to draw something into FBOs just to run the calculation on the GPU; that is the big difference between a compute shader and a fragment shader. Draw calls always slow the program down, but a compute shader does not need them.
An architectural advantage of compute shaders for image processing is that they skip the ROP (render output unit) step. It's very likely that writes from pixel shaders go through all the regular blending hardware even if you don't use it.
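If your context is OpenGL 4.3 or newer, a rough C++ sketch of replacing one fullscreen-quad pass with a compute dispatch could look like this (computeProgram, width, height and the two textures are placeholders; the program itself is built with glCreateShader(GL_COMPUTE_SHADER) as usual):

glUseProgram(computeProgram);

// Bind the input/output data textures as images instead of FBO attachments.
glBindImageTexture(0, srcTex, 0, GL_FALSE, 0, GL_READ_ONLY,  GL_RGBA32F);
glBindImageTexture(1, dstTex, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA32F);

// One work group per 16x16 tile: no FBO, no fullscreen quad, no draw call.
glDispatchCompute((width + 15) / 16, (height + 15) / 16, 1);

// Make the image writes visible to whatever reads dstTex next.
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT | GL_TEXTURE_FETCH_BARRIER_BIT);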
If you have to use fragment shaders somehow, then:
try to reduce the drawcalls;
find a way to store the data that is being calculated.
That means using render textures as memory: if you need to change vertices using RTTs, you would load textures holding position, velocity, or whatever else you need in order to change the vertices or their attributes such as normals/colors.
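A minimal sketch of that render-texture-as-memory setup (width, height and the shader are placeholders); rendering a fullscreen quad with the GPGPU fragment shader then writes its results into dataTex, which the next pass can sample instead of downloading anything to the CPU:

GLuint dataTex = 0, fbo = 0;
glGenTextures(1, &dataTex);
glBindTexture(GL_TEXTURE_2D, dataTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0,
             GL_RGBA, GL_FLOAT, nullptr);             // float texture used as "memory"
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, dataTex, 0);
// render a fullscreen quad with the GPGPU fragment shader here;
// in the next pass, bind dataTex as an input sampler instead of reading it back.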
To find the actual reason, it is better to use CPU and GPU profilers suited to your chipset and OS.
I am using Pascal and SDL 2.0.5 for 2D game development on Windows, and as development progresses it is starting to show frame-rate drops, especially since I use particles every frame. I don't use graphics acceleration; I'm just using the SDL2 API. I want to ask you:
For Windows, should I choose OpenGL or Direct3D?
Will I see a significant change in my game's performance?
I don't use graphics acceleration.
Are you sure? SDL2 uses hardware acceleration by default unless you create your renderer explicitly with SDL_CreateSoftwareRenderer, or if you don't use SDL_Renderer in the first place.
For Windows, should I choose OpenGL or Direct3D?
If you do use SDL_Renderer, make sure you're using SDL_CreateRenderer, and SDL will use what it thinks is best for the platform it's running on.
Otherwise, it doesn't really matter, especially for a 2D game. While on one system, one API might outperform the other, on another system the opposite might be true. When you look at the big picture, both average out to be roughly the same in terms of performance. Since you don't care about portability, pick the one that looks easiest to you.
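Since you're in Pascal, treat the following as pseudocode, but the SDL2 calls map one-to-one to the Pascal headers. A quick way to see which backend SDL actually picked and whether it is accelerated:

#include <SDL.h>
#include <cstdio>

int main(int argc, char* argv[])
{
    if (SDL_Init(SDL_INIT_VIDEO) != 0) return 1;

    SDL_Window* window = SDL_CreateWindow("renderer check",
        SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED, 800, 600, 0);

    // -1 lets SDL pick what it thinks is the best driver for this platform.
    SDL_Renderer* renderer =
        SDL_CreateRenderer(window, -1, SDL_RENDERER_ACCELERATED);

    SDL_RendererInfo info;
    if (renderer && SDL_GetRendererInfo(renderer, &info) == 0) {
        std::printf("backend: %s, accelerated: %s\n", info.name,
                    (info.flags & SDL_RENDERER_ACCELERATED) ? "yes" : "no");
    }

    SDL_DestroyRenderer(renderer);
    SDL_DestroyWindow(window);
    SDL_Quit();
    return 0;
}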
Will I see a significant change in my game's performance?
Yes. Hardware acceleration is faster in almost every case. The only exception that comes to mind is when you need to render frames pixel by pixel.
I've found that the maximum texture size my OpenGL implementation can support is 8192, but the image that I'm working with is 16997x15931. As you can see in this link, I've completed the class COpenGLControl and customized it for my own use to work with a smaller 7697x7309 image, and activated different navigation tasks for it.
Render an outlined red rectangle on top a 2D texture in OpenGL
But now, in the last stages of the work, I've decided to change the part that applies the texture and enable it to handle images bigger than 8192.
Questions:
Is it possible with my OpenGL implementation?
Which concept should I study: mipmaps, multiple texturing?
Will it improve the performance of the code?
Right now my program uses 271 MB of RAM just to show this small image (7697x7309), and I'm going to add a task to it (for image-processing filtering) that, despite all my effort to optimize the code, uses 376 MB of RAM for the 7697x7309 image (that code is already written as a console application and will be combined with this project). So I think the final project would use up to 700 MB of RAM for images near the 7000x7000 size. Obviously, for the bigger image (16997x15931) the RAM usage will be a lot higher!
So I'm looking for a concept to handle images bigger than MAX_TEXTURE_SIZE and also to optimize the performance of the program.
More Questions:
What concept should I study in OpenGL to achieve the above goal?
Could you explain a little about the concept that you suggest?
I've asked the question on Game Development too, but decided to repeat it here since it may get more viewers. As soon as I get an answer, I will delete the question from one of the two sites, so don't worry about the cross-posting.
I will try to sum up my comments for the original question.
Know your actual OpenGL version: maybe you can load some modern extensions and work with even a recent version of OpenGL.
If it is possible, you can take a look at Sparse Textures (Mega Textures): ARB_sparse_texture or AMD_sparse_texture.
To reduce memory you can use some texture compression:
How to: load DDS files in OpenGL.
Another simple idea: you can split the huge texture and create 4 smaller textures (from 16k x 16k into four 8k x 8k) and render four quads (see the tiling sketch below).
Maybe you can use OpenCL or CUDA to do the work?
Regarding mipmaps: a mipmap chain is a set of smaller versions of your input texture. Mipmaps improve performance and the final quality of the filtering, but you need about 33% more memory for a texture with a full mipmap chain. In your case they could be very helpful: for instance, when you look at a wall from a huge distance you do not have to use the full (large) texture; a small version of it is enough. g-truc on mipmaps
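As an illustration of the splitting idea, here is a rough C++ sketch (the image pointer and dimensions are placeholders, and the GL header depends on your loader) that cuts one big RGB image into tiles no larger than GL_MAX_TEXTURE_SIZE; at draw time you render one textured quad per tile at the tile's offset:

#include <GL/gl.h>      // or your GL loader header
#include <algorithm>
#include <vector>

// Splits one big RGB image (already in system memory) into GPU tiles.
std::vector<GLuint> uploadTiled(const unsigned char* pixels, int imgW, int imgH)
{
    GLint maxTex = 0;
    glGetIntegerv(GL_MAX_TEXTURE_SIZE, &maxTex);      // e.g. 8192 on your card

    const int tilesX = (imgW + maxTex - 1) / maxTex;
    const int tilesY = (imgH + maxTex - 1) / maxTex;
    std::vector<GLuint> tiles(tilesX * tilesY);
    glGenTextures((GLsizei)tiles.size(), tiles.data());

    glPixelStorei(GL_UNPACK_ROW_LENGTH, imgW);        // lets us upload sub-rectangles
    for (int ty = 0; ty < tilesY; ++ty)
        for (int tx = 0; tx < tilesX; ++tx) {
            const int w = std::min((int)maxTex, imgW - tx * maxTex);
            const int h = std::min((int)maxTex, imgH - ty * maxTex);
            glBindTexture(GL_TEXTURE_2D, tiles[ty * tilesX + tx]);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
            glPixelStorei(GL_UNPACK_SKIP_PIXELS, tx * maxTex);
            glPixelStorei(GL_UNPACK_SKIP_ROWS, ty * maxTex);
            glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, w, h, 0,
                         GL_RGB, GL_UNSIGNED_BYTE, pixels);
        }
    // restore pixel-store defaults
    glPixelStorei(GL_UNPACK_ROW_LENGTH, 0);
    glPixelStorei(GL_UNPACK_SKIP_PIXELS, 0);
    glPixelStorei(GL_UNPACK_SKIP_ROWS, 0);
    return tiles;
}

The GL_UNPACK_ROW_LENGTH / GL_UNPACK_SKIP_* settings let each glTexImage2D call read its sub-rectangle directly from the full image in RAM, without copying every tile into a separate buffer first.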
In general there are a lot of options, but which is simpler and fastest to implement depends on your experience.
I am curious about how a graphics card works in general. Please enlighten me.
If we don't make calls to a graphics library such as DirectX or OpenGL, does the graphics card still render everything else on the screen? Or do all the rendering calculations depend on, and get done by, the CPU?
For instance, if I create a simple program that loads an image and renders it in a window, without using DirectX or OpenGL, does having a faster graphics card render the image faster in this case? Or does it depend solely on the CPU if we don't use DirectX or OpenGL?
The simple answer is "yes": in a modern OS the graphics card does render most everything on the screen. This isn't quite a 'graphics card' question, but rather an OS question. The card has been able to do this since the 3dfx days, but the OS did not use it for things like window compositing until recently.
For your example, the answer really depends on the API you use to render your window. One could imagine an API that is far removed from the OS and chooses to always keep the image data in CPU memory. If every frame is displayed by blitting the visible portion from CPU to GPU, the GPU would likely not be the bottleneck (PCIe probably would be). But other APIs (hopefully the one you use) could store the image data in GPU memory, and the visible portion could be displayed from GPU memory without crossing PCIe every frame. That said, the 'decoration' part of the window is likely drawn by a series of OpenGL or DX calls.
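To make that contrast concrete, a hypothetical OpenGL-flavored sketch of the two paths (pixels, w, h and the texture are placeholders; assumes a GL header/loader is already set up):

// Slow path: image stays in CPU memory and crosses the bus every frame.
void drawFromCpuEveryFrame(const unsigned char* pixels, int w, int h, GLuint tex)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
                    GL_RGB, GL_UNSIGNED_BYTE, pixels);  // re-upload each frame
    // ... then draw a textured quad ...
}

// Fast path: upload once, keep the texture in GPU memory, just draw afterwards.
GLuint uploadOnce(const unsigned char* pixels, int w, int h)
{
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, w, h, 0,
                 GL_RGB, GL_UNSIGNED_BYTE, pixels);     // one-time upload
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    return tex;
}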
Hopefully that answers things well enough?