Profile CPU Usage of DSP objects in Realtime - performance

Description
Our application is a DSP synthesizer, written in C and mostly used to create music. I want to add a system-wide feature that gives the user visual feedback so they can find out which DSP objects are the most CPU-hungry.
I have researched a lot, but I can't find a way to implement this feature.
Can anyone guide me on how I can implement it?
I just want someone to point me in the right direction!
Thanks in Advance
I have tried to understand how the Windows Task Manager works and how the ps command works on Linux...
I also looked into the Win32 API, but those all just show currently running processes, and my task is to find the CPU usage of the DSP objects currently in use...
My naive approach would involve counting CPU cycles in each method of the object, but I have no idea if that's even the right place to start thinking about it.

How about this: measure the time each block takes to do its work?
The scheduler calls each perform-routine in the DSP graph, so you just need to measure the time it takes for the perform-routine to return.
The longer it takes, the more CPU-hungry the object is (optionally, scale the values by the block size).

Related

Analyze Linux real-time characteristics

Is there any way to analyze the RT characteristics of a Linux kernel?
Just for fun I plan to study the behavior of a RT system on Raspberry Pi. I want to add events at each task swap, around each ISR etc. Those events shall contain the exact jiffy time, the processor, and the pid. The event information shall be stored on file. After the run I want to study the timing characteristics.
Of course I want those measurements to disturb the system as little as possible.
Is there some kind of framework for doing this? Is it even possible to put events around ISRs (in a generic way)? I see this as a stackoverflow question as I'm willing to modify the code if necessary.
NB, I'm not looking for some kind of statistics view of aggregated data. I want it all! ;)
Have a look at SystemTap and DTrace. They do what you want and more.
https://sourceware.org/systemtap/
http://dtrace.org/blogs/about/

MATLAB: GUI progressively getting slower

I've been programming some MATLAB GUIs (not using GUIDE), mainly for viewing images and some other simple operations (such as selecting points and plotting some data from the images).
When the GUI starts, all the operations are performed quickly.
However, as the GUI is used (showing different frames from 3D/4D volumes and performing the operations mentioned above), it gets progressively slower, reaching a point where it is too slow for common usage.
I would like to hear some input regarding:
Possible strategies to find out why the GUI is getting slower;
Good MATLAB GUI programming practices to avoid this;
Possible references that address these issues.
I'm using set/getappdata to save variables in the main figure of the GUI and communicate between functions.
(I wish I could provide a minimal working example, but I don't think it is suitable in this case because this only happens in somewhat more complex GUIs.)
Thanks a lot.
EDIT: (Reporting back some findings using the profiler:)
I used the profiler in two occasions:
immediately after starting the GUI;
after playing around with it for some time, until it started getting too slow.
I performed the exact same procedure in both profiling operations, which was simply moving the mouse around the GUI (same "path" both times).
The profiler results are as follows:
I am having difficulties in interpreting these results...
Why is the number of calls to certain functions (such as impixelinfo) so much higher in the second case?
Any opinions?
Thanks a lot.
The single best way I have found around this problem was hinted at above: forced garbage collection. Great advice, though the command forceGarbageCollection is not recognized in MATLAB. The command you want is java.lang.System.gc()... such a beast.
I was working on a project wherein I was reading 2 serial ports at 40Hz (using a timer) and one NIDAQ at 1000Hz (using startBackground()) and graphing them all in real-time. MATLAB's parallel processing limitations ensured that one of those processes would cause a buffer choke at any given time. Animations would not be able to keep up, and eventually freeze, etc. I gained some initial success by making sure that I was defining a single plot and only updating parameters that changed inside my animation loop with the set command. (ex. figure, subplot(311), axis([...]),hold on, p1 = plot(x1,y1,'erasemode','xor',...); etc. then --> tic, while (toc<8) set(p1,'xdata',x1,'ydata',y1)...
Using set will make your animations MUCH faster and more fluid. However, you will still run into the buffer wall if you animate long enough with too much going on in the background, especially real-time data inputs. Garbage collection is your answer. It isn't instantaneous, so you don't want it to execute on every loop cycle unless your loop is extremely long. My solution is to set up a counter variable outside the while loop and use a mod function so that it only executes every n cycles (ex. counter = 0; while ()... counter = counter + 1; if (~mod(counter,n)), java.lang.System.gc(); end) and so on.
This will save you (and hopefully others) loads of time and headache, trust me, and you will have MATLAB executing real-time data acq and animation on par with LabVIEW.
A good strategy to find out why anything is slow in Matlab is to use the profiler. Here is the basic way to use the profiler:
profile on
% do stuff now that you want to measure
profile off
profile viewer
I would suggest profiling a freshly opened GUI, and also one that has been open for a while and is noticeably slow. Then compare results and look for functions that have a significant increase in "Self Time" or "Total Time" for clues as to what is causing the slowdown.

Man-in-the-middle: Intercepting an applications function/library calls

Basically, what I want to achieve is to run a program in an environment that will supply any value the program asks for, based on criteria decided by me. For example, games regularly query the system for the time in order to pace their animations; if I have control over the time values passed to the program, I can control the speed of the animation (and perhaps clear some difficult rounds easily :P).
On similar grounds, keystrokes and mouse movements could also be controlled (I have tried Java's Robot class, but did not find it satisfactory: it slows the system {or perhaps my implementation was bad}, and its commands are executed on the currently focused program and can't be targeted at a specific one).
Any graceful way of doing this, or some pointers on achieving it, would be highly appreciated.

Single-Tasking for programming competitions

I will start with the question and then proceed to explain the need:
Given a single C++ source code file which compiles well in modern g++ and uses nothing more than the standard library, can I set up a single-task operating system and run it there?
EDIT: Following the comment by Chris Lively, I should have asked: what is the easiest way to tweak Linux into effectively giving me single-tasking behavior?
Nevertheless, it seems like I did get a good answer although I did not phrase my question well enough. See the second paragraph in sarnold's answer regarding the scheduler.
Motivation:
In some programming competitions the communication between a contestant's program and the grading program involves a huge number of very short interactions.
Thus, using getrusage to measure the time spent by a contestant's program is inaccurate, because getrusage works by sampling a process at constant intervals (usually around once per 10 ms), which are too long compared to the duration of each interaction.
Another approach to timing would be to measure time before and after the program is run using something like clock_gettime and then subtract the two values. We should also subtract the amount of time spent on I/O; this can be done by intercepting printf and scanf using something like LD_PRELOAD, accumulating the time spent in each of these functions by checking the time just before and just after each call to printf/scanf (it's OK to require contestants to use these functions only for I/O).
The method proposed in the last paragraph is of course only valid assuming that the contestant's program is the only program running, which is why I want a single-tasking OS.
To run both the contestant's program and the grading program at the same time, I would need a mechanism that, when one of these programs tries to read input and blocks, runs the other program until it writes enough output. I still consider this to be single-tasking because the programs are not going to run at the same time. The "context switch" would occur only when it is needed.
Note: I am aware of the fact that there are additional issues to timing such as CPU power management, but I would like to start by addressing the issue of context switches and multitasking.
First things first: what I think would suit your needs best would actually be a language interpreter -- one that you can instrument to keep track of the "execution time" of the program in purpose-made units, like "mems" to represent memory accesses or "cycles" to represent the speeds of different instructions. Knuth's MMIX in his The Art of Computer Programming may provide exactly that functionality, though Python, Ruby, Java, and Erlang are all reasonable interpreters that can provide some instruction-count / cost / memory-access accounting if you do enough rewriting. (But you lose C++, obviously.)
Another approach that might work well for you -- if used with extreme care -- is to run your programming problems in the SCHED_FIFO or SCHED_RR real-time scheduling class. Programs running in one of these real-time priority classes will not yield to other processes on the system, allowing them to dominate all other tasks. (Be sure to run your sshd(8) and sh(1) at a higher real-time priority so you can kill runaway tasks.)

What Simple Changes Made the Biggest Improvements to Your Delphi Programs [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have a Delphi 2009 program that handles a lot of data and needs to be as fast as possible and not use too much memory.
What small simple changes have you made to your Delphi code that had the biggest impact on the performance of your program by noticeably reducing execution time or memory use?
Thanks everyone for all your answers. Many great tips.
For completeness, I'll post a few important articles on Delphi optimization that I found.
Before you start optimizing Delphi code at About.com
Speed and Size: Top 10 Tricks also at About.com
Code Optimization Fundamentals and Delphi Optimization Guidelines at High Performance Delphi, relating to Delphi 7 but still very pertinent.
.BeginUpdate;
.EndUpdate;
;)
Use a Delphi profiling tool (some here or here) and discover your own bottlenecks. Optimizing the wrong bottlenecks is a waste of time. In other words, applying all of the suggestions here while ignoring a sleep(1000) (or similar) that someone put in some very important code is a waste of your time. Fix your actual bottlenecks first.
Stop using TStringList for everything.
TStringList is not a general-purpose data structure for effective storage and handling of everything from simple to complex types. Look for alternatives. I use the Delphi Container and Algorithm Library (DeCAL, formerly known as SDL). Julian's EZDSL should also be a good alternative.
Pre-allocating lists and arrays, rather than growing them with each iteration.
This has probably had the biggest impact for me in terms of speed.
If you need to call Application.ProcessMessages (or similar) in a loop, try calling it only every Nth iteration.
Similarly, if updating a progress bar, don't update it on every iteration. Instead, increment it by x units every x iterations, or scale the updates according to time or as a percentage of overall task length.
FastMM
FastCode (lib)
Use high-performance data structures, like hash tables. In many places it is faster to make one pass that builds a hash-table lookup for your data. It uses quite a lot of memory, but it surely is fast. (This may be the most important one, but the first two are dead simple and need very little effort.)
Reduce disk operations. If there's enough memory, load the file entirely to RAM and do all operations in memory.
Consider the careful use of threads. If you are not using threads now, then consider adding a couple. If you are, make sure you are not using too many. If you are running on a dual- or quad-core computer (as most are these days), then proper thread tuning is very important.
You could look at the OmniThreadLibrary by gabr, but there are a number of thread libraries in development for Delphi. You could easily implement your own parallel-for using anonymous methods.
Before you do anything, identify slow parts. Do not touch working code which performs fast enough.
The biggest improvement came when I started using AsyncCalls to convert single-threaded applications that used to freeze the UI into (sort of) multi-threaded apps.
Although AsyncCalls can do a lot more, I've found it useful for this very simple purpose. Let's say you have a block of code like this: disable button, do work, enable button.
You move the 'Do Work' part to a local function (call it AsyncDoWork), and add four lines of code:
var a: IAsyncCall;
a := LocalAsyncCall(@AsyncDoWork);
while (not a.Finished) do
  Application.ProcessMessages;
a.Sync;
What this does for you is run AsyncDoWork in a separate thread, while your main thread remains available to respond to the UI (like dragging the window or clicking Abort). When AsyncDoWork is finished, the code continues. Because I moved it to a local function, all local vars are available, and the code does not need to be changed.
This is a very limited type of 'multi-threading'. Specifically, it's dual threading. You must ensure that your Async function and the UI do not both access the same VCL components or data structures. (I disable all controls except the stop button.)
I don't use this to write new programs. It's just a really quick & easy way to make old programs more responsive.
When working with a TStringList (or similar), set Sorted := False until needed (if at all). Seems like a no-brainer...
Create unit tests
Verify tests all pass
Profile your application
Refactor looking for bottlenecks and memory
Repeat from Step 2 (comparing to previous pass)
Make intelligent use of SetLength() for strings and arrays. Optimise initialisation with FillChar or ZeroMemory.
Local variables created on stack (e.g. record types) are faster than heap allocated (objects and New()) variables.
Reuse objects rather than destroying and creating them. But make sure the management code for this is faster than the memory manager!
Check heavily-used loops for calculations that could be (at least partially) pre-calculated or handled with a lookup table. Trig functions are a classic for this, but it applies to many others.
If you have a list, use a dynamic array of anything, even of records, as follows:
This needs no classes, no freeing, and access to it is very fast. Even if it needs to grow, you can do that - see below. Only use TList or TStringList if you need lots of size-changing flexibility.
type
TMyRec = record
SomeString : string;
SomeValue : double;
end;
var
Data : array of TMyRec;
I : integer;
..begin
SetLength( Data, 100 ); // defines the length and CLEARS ALL DATA
Data[32].SomeString := 'Hello';
ShowMessage( Data[32].SomeString );
// Grow the list by 1 item.
I := Length( Data );
SetLength( Data, I+1 );
..end;
Separating the program logic from user interface, refactoring, then optimizing the most-used, most resource-intensive elements independently.
Turn debugging OFF
Turn optimizations ON
Remove all references to units that you don't actually use
Look for memory leaks
Use a lot of assertions to debug, then turn them off in shipping code.
Turn off range and overflow checking after you have tested extensively.
If you really, really, really need to be lightweight then you can shed the VCL. Take a look at KOL & MCK. Granted, if you do that, then you are trading features for a reduced footprint.
Use the full FastMM and study the documentation and source and see if you can tweak it to your specifications.
For an old BDE project when I first started Delphi, I was using lots of TQuery components. Someone told me to use TTable master-detail after I explained to him what I was doing, and that made the program run much faster.
Calling DisableControls can omit unnecessary UI updates.
When identifying records, use integers if at all possible for record comparison. While a primary key of "company name" might seem logical, the time spent generating and storing an integer hash of it instead will pay off in greatly improved overall search times.
You might consider using runtime packages. This could reduce your memory footprint if there is more than one running program written using the same packages.
If you use threads, set their processor affinity. If you don't use threads yet, consider using them, or look into asynchronous I/O (completion ports) if your application does lots of I/O.
Consider whether a DBMS database is really the perfect choice. If you are only reading data and never changing it, then a flat fixed-record file could work faster, especially if the path to the data can be easily mapped (i.e., one index). A trivial binary search on a fixed-record file is still extremely fast.
BeginUpdate ... EndUpdate
ShortString vs. String
Use arrays instead of TStrings and TList
But the sad answer is that tuning and optimization will give you maybe 10% improvement (and it's dangerous); redesign can give you 90%. Once you really understand the goal, you can often restate the problem (and therefore the solution) in much better terms.
Cheers
Examine all loops and look for ways to short-circuit. If you're looking for something specific and find it in a loop, use the Break command to bail out immediately; there's no sense looping through the rest. If you know an iteration can't match, use Continue as quickly as possible.
Take advantage of some of the FastCode project code. Parts of it were incorporated into the VCL/RTL proper (as FastMM was), but there is more out there you can use!
Note: they have a new site they are moving to, but it seems to be a bit inactive.
Consider hardware issues. If you really need performance then consider the type of hard drive(s) your program and your databases are running on. There are a lot of variables especially if you are running a database. RAID is not always the best answer either.
