Rendering of a large MATLAB figure is slow - performance

I'm using MATLAB R2014b on Windows 8.1. I have a figure with two subplots. The data for the first subplot is about 700,000 points; the second is about 50,000 points. When I display it or manipulate it in any way (zoom, say), there's a huge lag, up to about 30 seconds. Obviously I'd like to improve performance. Here's what I know:
If I break it into 4 plots, each covering 1/4 of the data, performance is fast, much more than 4 times as fast. The slowdown seems to grow far faster than linearly with the number of points.
A colleague (running R2014a, I believe) has a machine that should be slower, but on it the figure displays at near real-time speed.
The problem is perhaps how the figure is being rendered. I ran MATLAB's "opengl info" and it reports that the Software flag is false, which should mean it's using the display adapter's hardware rendering.
So maybe the display adapter isn't set quite right. My machine (a Lenovo laptop) has two display adapters: Intel HD Graphics 3000 and NVIDIA NVS 4200M. I don't know why there are two, or whether there are any relevant settings.
Any thoughts on how to proceed?

It could be that you're rendering through your integrated graphics processor (Intel HD Graphics 3000) rather than your dedicated graphics processor (NVIDIA NVS 4200M). If your Lenovo has "switchable graphics" enabled, you should be able to switch to the NVIDIA, or at least check that you are indeed rendering through it. Right-click on your power manager in the taskbar; if you see a menu item that says "switchable graphics," you can change it to the NVIDIA. Note that you'll have to close MATLAB to make the switch.
It does sound like a slowdown caused by the rendering configuration. When you run opengl info in MATLAB, what device is listed as "Renderer"?
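If you want to check this from within MATLAB, something along these lines should report it (a minimal sketch; the fields mirror what "opengl info" prints):

d = opengl('data');   % struct describing the OpenGL implementation in use
disp(d.Renderer)      % which adapter is actually rendering (Intel vs. NVIDIA)
disp(d.Vendor)
disp(d.Software)      % 1 would mean MATLAB has fallen back to software rendering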

If you don't need to manipulate it (say you only want an image file), you can always create your figure with figure('Visible','Off') and save it without actually showing the figure on screen.
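For example, a minimal sketch (t and y are placeholders for your own data):

fig = figure('Visible', 'off');   % build the figure off-screen
plot(t, y)                        % e.g. the 700,000-point series
saveas(fig, 'large_plot.png')     % write it straight to an image file
close(fig)                        % clean up without ever showing it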

MATLAB releases ever since R2014b use a new graphics engine which is known to be extremely slow with large data sets; see for example
http://www.mathworks.com/matlabcentral/newsreader/view_thread/337755
The solution has nothing to do with graphics drivers and the like: revert to MATLAB R2014a and stay there.

I wrote a function, plotECG, that enables you to show plots with millions of samples. It includes sliders for quick scrolling and zooming.
If you have multiple time series and want them displayed in a synchronized way, you can pass them as a matrix all at once and specify the key 'AutoStackSignals', followed by a cell array of strings with the signals' names. The signals are then shown one below the other in the same axes, with the corresponding name as the YTickLabel.
https://de.mathworks.com/matlabcentral/fileexchange/59296
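Based on that description, a call might look roughly like this (the argument order is a guess on my part; check the File Exchange page for the exact signature):

% signals: one column per channel, with names supplied for stacking
plotECG(t, signals, 'AutoStackSignals', {'Channel 1', 'Channel 2', 'Channel 3'})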

Related

Which library/code is responsible for rendering the terminal in retro computers?

For example, as you type, which library tells the computer screen to display the corresponding ASCII character and to move the cursor accordingly?
Imagine something like old-school computers (with no GUI) running DOS or BASIC... what/which library is responsible for the UI?
Links to source code would be great for understanding how said library (or libraries) works.
The photo you have posted is of a BBC Micro running in Mode 7. This was an exception to most rules. Mode 7 was a low-memory mode in which there were no pixels, just 256 text characters. 1 KB of RAM was reserved to contain what was displayed on the screen at that moment. A special chip on the circuit board, called the Video ULA (Uncommitted Logic Array), read the contents of that memory and encoded it to the output. The ULA's logic was fixed and could not be changed by the programmer.
The ZX81 worked in a similar way: 256 possible text characters and no pixels. However, the ZX81 had fewer dedicated chips, and the main CPU did most of the work.
A more common setup was for every pixel to be represented by a number of bits in memory (often more than one bit per pixel, because colours had to be indicated). Examples are the BBC in modes 1-6, the Acorn Electron, the Spectrum, the C64, and many others. When the user placed text on the screen, the computer's ROM would convert it to the correct pixels. Graphics could often be written directly to the RAM, or 'plotted' via BASIC. Once again, dedicated ROM chips and circuitry would then render this memory to the output. This approach required much more memory for the display.
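To illustrate why the bitmapped approach costs so much more than a 1 KB text buffer, here is a rough sketch with made-up mode parameters (not those of any particular machine):

#include <cstdio>

int main() {
    const int width = 320, height = 256;  // hypothetical pixel resolution
    const int bitsPerPixel = 2;           // 4 colours => 2 bits per pixel
    const int screenBytes = width * height * bitsPerPixel / 8;
    std::printf("Bitmapped screen memory: %d bytes (%d KB)\n",
                screenBytes, screenBytes / 1024);  // 20480 bytes, i.e. 20 KB
    return 0;
}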
Every 8-bit computer had its own way of representing the display in RAM. You need to get the manuals for the machine you are trying to program (easy to find on the internet for the better-known micros).
Many emulators are open source, if you want to see the internals. For example: https://github.com/stardot/beebem
If you're interested in seeing the internals of a terminal to better understand how it works and renders input/output, Bash is completely open source; you can download its latest source code from the GNU project's Bash page.

Maximum number of addressable LEDs to be controlled by an Arduino

I'm working on a project that involves building a big 'light wall' (3.5 m high by 7 m wide). The wall is covered in semi-transparent fabric, and behind it strips of addressable RGB LEDs will be mounted. The strips will be mounted vertically and function as a big display for showing primitive graphics and visuals. I plan to control the diodes using an Arduino. The reason for choosing the Arduino as the controller is that it must be easily programmable, (relatively) cheap, and able to work as part of different interaction-design setups.
[Image: current version of the light wall, without addressable LEDs]
My ultimate concern is now this:
How many individual diodes can I control before the Arduino becomes inadequate?
As I see it, this comes down to two points:
When does the update rate of the diodes become too slow? I want to update the picture at a minimum rate of 10 Hz.
When will the Arduino run out of memory? The state of all the diodes on the light wall has to be stored for the next frame, so at some point the memory will run out (the Arduino MEGA 2560 has 256 KB of flash, but only 8 KB of SRAM, which is where the frame state would have to live). How many diodes can I control before this happens?
I realize that both questions are related to how the Arduino is programmed. I am very open to any tips regarding speed optimization and memory conservation (a rough back-of-the-envelope estimate is sketched below).
Thanks in advance!
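As a rough check of the two constraints above, assuming WS2812-style strips at 3 bytes of state per LED and about 30 microseconds of data time per LED per frame (those figures are assumptions, not from the question):

#include <cstdio>

int main() {
    const long sramBytes    = 8L * 1024;  // ATmega2560: 8 KB SRAM (the 256 KB figure is flash)
    const int  bytesPerLed  = 3;          // one byte each for R, G, B
    const long ledsByMemory = sramBytes / bytesPerLed;          // ignores all other RAM use

    const double usPerLed      = 30.0;     // WS2812-style serial timing per LED
    const double framePeriodUs = 100000.0; // 10 Hz refresh => 100 ms per frame
    const long ledsByRate = (long)(framePeriodUs / usPerLed);   // per data pin

    std::printf("Memory-limited: ~%ld LEDs\n", ledsByMemory);             // ~2730
    std::printf("Rate-limited:   ~%ld LEDs per output pin\n", ledsByRate); // ~3333
    return 0;
}

Splitting the wall across several data pins relaxes the rate limit, but the SRAM limit still applies to the whole frame buffer on a single board.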

What image processing library should I use?

I have been reading https://stackoverflow.com/questions/158756/what-is-the-best-image-manipulation-library and have tried a few libraries, and I'm now looking for input on what best fits our needs. I'll start by describing our current setup and problems.
We have a system that needs to resize and crop a large number of images from big originals. We handle 50,000+ images every day on 2 powerful servers. Today we use ImageGlue from WebSupergoo, but we don't like it at all: it is slow and hangs the service now and then (that's the subject of another, unanswered Stack Overflow question). We have a threaded Windows service that uses the ThreadPool to resize as many images as possible on the 8-core machines.
I have tried AForge and it went very well: it was much faster and never crashed. But I had quality problems with a few images. That is of course due to the algorithms I used, so it can be tweaked, but I want to look around to see whether that's the right way to go.
so:
It needs to be C#/.NET and run in a Windows service (since we won't change the rest of the service, only the image handling).
It needs to handle a threaded environment well.
It needs to be fast, since today's solution is too slow. But we also want good quality and small file sizes, since the images are later displayed on a web page with lots of visitors and need to look good.
So the main demand is good quality at a fast pace, and a secondary one is keeping file sizes down, even if that can be adjusted a bit with compression.
Any comments or suggestions on what library to use?
I understand you said that you want to stick with C#, but I'm offering an alternative.
Depending on the amount of work you are doing, the fastest way to manipulate images is to do it entirely on a GPU (that would offload most of the pixel work). You can interoperate with CUDA from Managed C++, which you can call from your service. Or use DirectX surfaces and render targets (you get antialiasing and all the high-quality stuff out of the box).
However, before doing anything, make sure your workload is dominated by the trilinear/bilinear resizing and not by the encoding/decoding of the images. By the way, you will need at least one fast NVIDIA video card in each server to do the offloading (a cheap GTX 460 would be more than enough).
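Before committing to GPU offloading, it may be worth timing where the milliseconds actually go. A minimal sketch using System.Drawing and Stopwatch (file names and target size are placeholders):

using System;
using System.Diagnostics;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;

class ResizeProfiler
{
    static void Main()
    {
        var sw = Stopwatch.StartNew();
        using (var original = new Bitmap("input.jpg"))            // decode
        {
            Console.WriteLine("Decode: " + sw.ElapsedMilliseconds + " ms");

            sw.Restart();
            using (var resized = new Bitmap(800, 600))
            using (var g = Graphics.FromImage(resized))
            {
                g.InterpolationMode = InterpolationMode.HighQualityBicubic;
                g.DrawImage(original, 0, 0, 800, 600);            // resize
                Console.WriteLine("Resize: " + sw.ElapsedMilliseconds + " ms");

                sw.Restart();
                resized.Save("output.jpg", ImageFormat.Jpeg);     // encode
                Console.WriteLine("Encode: " + sw.ElapsedMilliseconds + " ms");
            }
        }
    }
}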

How do I ensure GUI responsiveness when using OpenCL on the display GPU?

In my relatively short time learning OpenCL, I frequently see my application cause the operating system UI to become significantly less responsive (several seconds for a window to respond to a drag, for example). I have encountered this problem on Windows Vista and Mac OS X, both with NVIDIA GPUs.
What can I do when using OpenCL on the same GPU as the display to ensure that my application does not significantly degrade UI responsiveness like this? Also, can this be done without taking needless performance losses within my application? (That is, if the user is not doing some UI-intensive task, I would not expect my application to run any slower than it does now.)
I understand that any answers will be very platform specific (where platform includes OS/GPU/driver combo).
As described in Dr. David Gohara's OpenCL Tutorial Episode 6 (beginning at 43:49), graphics cards cannot be preemptively scheduled at this time. As a result, using the same graphics card both for an intensive OpenCL kernel and for the UI (or other GPU-using operations) will result in clunkiness or the visual appearance of freezing. Until graphics cards get preemptive multitasking (if ever), there's no way to do exactly what you want with just a single graphics card. I don't believe this is a platform-specific issue at all.
However, this problem might be solvable by dividing the work up. Given the relative speed of whatever single GPU is available (you'll have to test to find the right setup), split your OpenCL problem so the kernel runs multiple times on different parts of the input data, and combine the output data once all of the kernel runs are complete. I would recommend creating kernel runs that each take less than 100 milliseconds (on a given GPU), so that the lag is, if not unnoticeable, at least not significantly annoying (100 milliseconds is a good rule of thumb according to this paper).
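A minimal host-side sketch of that chunking idea (assuming an already-created command queue and kernel; error checking omitted, and the chunk size is something you would tune by timing on the target GPU):

#include <CL/cl.h>   /* <OpenCL/opencl.h> on Mac OS X */

/* Run a 1-D kernel over total_items work-items in chunks small enough that
   each enqueue finishes well under ~100 ms, letting the driver service the
   UI between chunks. Non-NULL global offsets need OpenCL 1.1 or later; on
   1.0 you would pass the offset as a kernel argument instead. */
void run_in_chunks(cl_command_queue queue, cl_kernel kernel,
                   size_t total_items, size_t chunk_items)
{
    for (size_t offset = 0; offset < total_items; offset += chunk_items) {
        size_t remaining = total_items - offset;
        size_t global = (remaining < chunk_items) ? remaining : chunk_items;

        clEnqueueNDRangeKernel(queue, kernel, 1,
                               &offset,  /* where this chunk starts */
                               &global,  /* how many items in this chunk */
                               NULL,     /* let the runtime pick the local size */
                               0, NULL, NULL);
        clFinish(queue);  /* wait, so the GPU is free for the UI between chunks */
    }
}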
Based on your comment about your program being a command-line application, I assume your application will only run once at any given time, versus being a continuously running application with real-time output, as a lot of OpenCL demos are. My above answer is only satisfactory for non-continuous applications, since real-time performance isn't inherently expected. However, if your application is supposed to be continuous, the only solution currently available is to add a second, simpler graphics card that will only be used for UI.

Qt 4.6.x under MacOS/X: widget update performance mystery

I'm working on a Qt-based MacOS/X audio metering application, which contains audio-metering widgets (potentially a lot of them), each of which is supposed to be updated every 50ms (i.e. at 20Hz).
The program works, but when lots of meters are being updated at once, it uses up lots of CPU time and can bog down (spinny-color-wheel, oh no!).
The strange thing is this: originally the app would just call update() on the meter widget whenever the meter value changed, so the entire meter widget was redrawn every 50 ms. Then I thought I'd be clever and compute just the area of the meter that actually needs to be redrawn, and only redraw that portion of the widget (e.g. update(x,y,w,h), where y and h are computed from the old and new values of the meter). But when I implemented that, it actually made CPU usage four times higher(!)... even though the app was drawing 50% fewer pixels per second.
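A sketch of the kind of partial-update logic described above (the class and member names are illustrative, not from the real app):

#include <QWidget>
#include <QPaintEvent>
#include <QtGlobal>

// Illustrative meter widget: only the band between the old and new fill
// levels is invalidated, instead of calling update() on the whole widget.
class MeterWidget : public QWidget
{
public:
    explicit MeterWidget(QWidget *parent = 0) : QWidget(parent), m_levelPx(0) {}

    void setLevel(int newLevelPx)   // fill height in widget pixels, 0..height()
    {
        const int lo = qMin(m_levelPx, newLevelPx);
        const int hi = qMax(m_levelPx, newLevelPx);
        m_levelPx = newLevelPx;
        // The meter fills from the bottom, so only the band between the old
        // and new levels has changed.
        update(0, height() - hi, width(), hi - lo);   // vs. plain update()
    }

protected:
    void paintEvent(QPaintEvent *)
    {
        // ... draw the lit and unlit portions (two rectangular fills) ...
    }

private:
    int m_levelPx;
};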
Can anyone explain why this optimization actually turns out to be a pessimization? I've posted a trivial example application that demonstrates the effect, here:
http://www.lcscanada.com/jaf/meter_test.zip
When I compile (qmake;make) the above app and run it like this:
$ ./meter.app/Contents/MacOS/meter 72
Meter: Using numMeters=72 (partial updates ENABLED)
... top shows the process using ~50% CPU.
When I disable the clever-partial-updates logic, by running it like this:
$ ./meter.app/Contents/MacOS/meter 72 disable_partial_updates
Meter: Using numMeters=72 (partial updates DISABLED)
... top shows the process using only ~12% CPU. Huh? Shouldn't this case take more CPU, not less?
I tried profiling the app using Shark, but the results didn't mean much to me. FWIW, I'm running Snow Leopard on an 8-core Xeon Mac Pro.
GPU drawing is a lot faster than having the CPU calculate which part to redraw (at least for OpenGL this applies; the OpenGL SuperBible states that OpenGL is built to redraw everything rather than draw deltas, since computing deltas is potentially a lot more work). Even if you use software rendering, the libraries are highly optimized to do their job properly and quickly. So simply redrawing everything is the state of the art.
FWIW top on my Linux box shows ~10-11% without partial updates and 12% using partial updates. I had to request 400 meters to get that CPU usage though.
Perhaps it's just that the overhead of Qt setting up a paint region actually dwarfs your paint time? After all, your painting is really simple: it's just two rectangular fills.
