I need a way to find out the amount of time taken by a function, and by a section of my code inside a function, to execute.
Does Visual Studio provide any mechanism for doing this, or is it possible to do it from the program using MFC functions? I am new to MFC, so I am not sure how this can be done. I thought this would be a pretty straightforward operation, but I cannot find any examples of how it might be done either.
A quick way, but quite imprecise, is by using GetTickCount():
DWORD time1 = GetTickCount();
// Code to profile
DWORD time2 = GetTickCount();
DWORD timeElapsed = time2-time1;
The problem is that GetTickCount() uses the system timer, which has a typical resolution of 10-15 ms, so it is only useful for longer computations.
It can't tell the difference between a function that takes 2 ms to run and one that takes 9 ms. But if you are in the seconds range, it may well be enough.
If you need more resolution, you can use the performance counter, as RedEye explains.
Or you can try a profiler (maybe this is what you were looking for?). See this question.
There may be better ways, but I do it like this:
// At the start of the function
LARGE_INTEGER lStart;
QueryPerformanceCounter(&lStart);
LARGE_INTEGER lFreq;
QueryPerformanceFrequency(&lFreq);
// At the end of the function
LARGE_INTEGER lEnd;
QueryPerformanceCounter(&lEnd);
// Use QuadPart (the full 64-bit value) so the arithmetic doesn't overflow or wrap
TRACE("FunctionName t = %dms\n", (int)((1000 * (lEnd.QuadPart - lStart.QuadPart)) / lFreq.QuadPart));
I use this method quite a lot for optimising graphics code, finding time taken for screen updates etc. There are other methods of doing the same or similar, but this is quick and simple.
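If you end up sprinkling this around a lot, it can be convenient to wrap the two calls in a small RAII helper so the TRACE happens automatically when the scope ends. This is just a sketch: the class name and usage are mine, and it assumes an MFC build where TRACE is available.
#include <afx.h>   // MFC core; provides TRACE in debug builds

// Times everything from construction to destruction and traces the result in ms.
class ScopedTimer
{
public:
    explicit ScopedTimer(LPCTSTR label) : m_label(label)
    {
        QueryPerformanceFrequency(&m_freq);
        QueryPerformanceCounter(&m_start);
    }
    ~ScopedTimer()
    {
        LARGE_INTEGER end;
        QueryPerformanceCounter(&end);
        double ms = 1000.0 * (end.QuadPart - m_start.QuadPart) / m_freq.QuadPart;
        TRACE(_T("%s t = %.3fms\n"), m_label, ms);
    }
private:
    LPCTSTR       m_label;
    LARGE_INTEGER m_start, m_freq;
};

// Usage: put ScopedTimer t(_T("OnDraw")); at the top of the function to profile.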
I have a utility function that I suspect is eating up a large portion of my application's execution time. Using Time Profiler to look at the call stack, this function takes up a large portion of the execution time of any function from which it is called. However, since this utility function is called from many different sources, I am having trouble determining if, overall, this is the best use of my optimization time.
How can I look at total time spent in this function during program execution, regardless of who called it?
For clarity, I want to combine the selected entries with all other calls to that function into a single entry.
For me, what did the trick was ticking "Invert Call Tree". It seems to sort the "leaf" functions in the call tree by how much time they accumulate, and lets you see what calls them.
The checkbox can be found in the right panel, called "Display Settings" (If hidden: ⌘2 or View->Inspectors->Show Display Settings)
I am not aware of an Instruments-based solution, but here is something you can do from code. Hopefully somebody will provide an Instruments solution, but until then this should get you going.
#include <time.h>
//have this as a global variable to track time taken by the culprit function
static double time_consumed = 0;
void myTimeConsumingFunction(){
//add these lines in the function
clock_t start, end;
start = clock();
//main body of the function taking up time
end = clock();
//add this at the bottom and keep accumulating time spent across all calls
time_consumed += (double)(end - start) / CLOCKS_PER_SEC;
}
//at termination/end-of-program log time_consumed.
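One caveat with the snippet above: clock() measures CPU time, so any time the function spends blocked (file I/O, waiting on locks, sleeping) will not be counted. If wall-clock time is what you are after, the same accumulate-per-call pattern could be written with std::chrono (C++11). This is only a sketch, and the names are mine:
#include <chrono>

// Accumulates wall-clock seconds spent in the function across all calls.
static double wall_time_consumed = 0;

void myTimeConsumingFunction()
{
    auto start = std::chrono::steady_clock::now();

    // ... main body of the function taking up time ...

    auto end = std::chrono::steady_clock::now();
    wall_time_consumed += std::chrono::duration<double>(end - start).count();
}

// As before, log wall_time_consumed at termination/end-of-program.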
To see the totals for a particular function, follow these steps:
Profile your program with Time Profiler
Find and select any mention of the function of interest in the Call Tree view (you can use Edit->Find)
Summon the context menu over the selected function and choose 'Focus on calls made by' (or use Instrument->Call Tree Data Mining->Focus on Calls Made By)
If your program is multi-threaded and you want a total across all threads, make sure 'Separate by Thread' is not checked.
I can offer the makings of the answer you're looking for but haven't got this working within Instruments yet...
Instruments uses dtrace under the hood. dtrace allows you to respond to events in your program such as a function being entered or returned from. The response to each event can be scripted.
You can create a custom instrument with scripting in Instruments.
Here is a noddy shell script that launches dtrace outside of Instruments and records the time spent in a certain function.
#!/bin/sh
dtrace -c <yourprogram> -n '
unsigned long long totalTime;
self uint64_t lastEntry;
dtrace:::BEGIN
{
totalTime = 0;
}
pid$target:<yourprogram>:*<yourfunction>*:entry
{
self->lastEntry = vtimestamp;
}
pid$target:<yourprogram>:*<yourfunction>*:return
{
totalTime = totalTime + (vtimestamp - self->lastEntry);
/*#timeByThread[tid] = sum(vtimestamp - self->lastEntry);*/
}
dtrace:::END
{
printf( "\n\nTotal time %dms\n" , totalTime/1000000 )
}
'
What I haven't figured out yet is how to transfer this into instruments and get the results to appear in a useful way in the GUI.
I think you can call system("time ls"); twice and it will just work for you. The output will be printed on debug console.
I have some kind of "main loop" using glut. I'd like to be able to measure how much time it takes to render a frame; the time used to render a frame might be used for other calculations. Using the time function isn't adequate:
(time (procedure))
I found out that there is a function called current-time. I had to import some package to get it.
(define ct (current-time))
Which defines ct as a time object. Unfortunately, I couldn't find any arithmetic packages for dates in Scheme. I saw that Racket has something called current-inexact-milliseconds, which is exactly what I'm looking for because it offers fractional (sub-millisecond) precision.
Using the time object, there is a way to convert it to nanoseconds using
(time->nanoseconds ct)
This lets me do something like this
(let ((newTime (current-time)))
(block)
(print (- (time->nanoseconds newTime) (time->nanoseconds oldTime)))
(set! oldTime newTime))
This seems good enough for me, except that for some reason it was printing things like this:
0
10000
0
0
10000
0
10000
I'm rendering things using OpenGL, and I find it hard to believe that some iterations of the rendering loop take 0 nanoseconds, or that each iteration is stable enough to always take exactly the same number of nanoseconds.
After all, your results are not so surprising, because you have to consider the limited timer resolution of each system. In practice, there are limits that depend on the processor and on the OS, so timers cannot count as accurately as we might expect, even though a quartz oscillator can reach and exceed a period of a nanosecond. You are also limited by the accuracy and resolution of the functions you used. I had a look at the documentation of CHICKEN Scheme, but there is nothing similar to Racket's (current-inexact-milliseconds) → real?
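To get a feel for the granularity you are actually seeing, a quick C probe (my own illustration, not from the answer) is to spin until the reported time changes and print the step:
#include <stdio.h>
#include <sys/time.h>

/* Spin until gettimeofday() reports a new value, then print the observed step.
   This only shows the granularity of this particular call on this machine. */
int main(void)
{
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    do {
        gettimeofday(&t1, NULL);
    } while (t1.tv_sec == t0.tv_sec && t1.tv_usec == t0.tv_usec);

    long step_us = (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec);
    printf("observed step: %ld microseconds\n", step_us);
    return 0;
}
A step in the microsecond-or-worse range would explain output that jumps between 0 and some fixed number of nanoseconds.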
After digging around, I came up with the solution of writing it in C and binding it to Scheme using the bind extension.
(require-extension bind)
(bind-rename "getTime" "current-microseconds")
(bind* #<<EOF
uint64_t getTime();
#ifndef CHICKEN
#include <sys/time.h>
uint64_t getTime() {
struct timeval tim;
gettimeofday(&tim, NULL);
return 1000000 * tim.tv_sec + tim.tv_usec;
}
#endif
EOF
)
Unfortunately this solution isn't ideal, because it is CHICKEN-only. It could be implemented as a library, but a library that wraps a single function which doesn't exist in any other Scheme doesn't make much sense.
Since nanoseconds don't actually make much sense here after all, I get microseconds instead.
Note the trick: the declaration above is what bind wraps, while the #ifndef keeps the include and the function body from being parsed by bind. When the file is compiled by GCC, it is built with the include and the function definition.
CHICKEN has current-milliseconds: http://api.call-cc.org/doc/library/current-milliseconds
I'm planning on making a clock. An actual clock, not something for Windows. However, I would like to be able to write most of the code now. I'll be using a PIC16F628A to drive the clock, and it has a timer I can access (actually, it has 3, in addition to the clock it has built in). Windows, however, does not appear to have this function. Which makes making a clock a bit hard, since I need to know how long it's been so I can update the current time. So I need to know how I can get a pulse (1Hz, 1KHz, doesn't really matter as long as I know how fast it is) in Windows.
There are many timer objects available in Windows. Probably the easiest to use for your purposes would be the Multimedia Timer, but that's been deprecated. It would still work, but Microsoft recommends using one of the new timer types.
I'd recommend using a threadpool timer if you know your application will be running under Windows Vista, Server 2008, or later. If you have to support Windows XP, use a Timer Queue timer.
There's a lot to those APIs, but general use is pretty simple. I showed how to use them (in C#) in my article Using the Windows Timer Queue API. The code is mostly API calls, so I figure you won't have trouble understanding and converting it.
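To give a rough idea of what the converted code would look like in C, here is a minimal sketch of a timer-queue timer firing once a second (error handling omitted; the callback is my own illustration, not taken from the article):
#include <windows.h>
#include <stdio.h>

// Runs on a thread-pool thread each time the timer fires.
VOID CALLBACK TickCallback(PVOID param, BOOLEAN timerOrWaitFired)
{
    printf("tick\n");   // advance/update the clock here
}

int main(void)
{
    HANDLE hTimer = NULL;
    // NULL = default timer queue; first fire after 1000 ms, then every 1000 ms.
    CreateTimerQueueTimer(&hTimer, NULL, TickCallback, NULL, 1000, 1000, WT_EXECUTEDEFAULT);

    Sleep(10 * 1000);   // let it tick for ten seconds

    // INVALID_HANDLE_VALUE = wait for any in-flight callback to finish.
    DeleteTimerQueueTimer(NULL, hTimer, INVALID_HANDLE_VALUE);
    return 0;
}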
The LARGE_INTEGER is just an 8-byte block of memory that's split into a high part and a low part. In assembly, you can define it as:
MyLargeInt equ $
MyLargeIntLow dd 0
MyLargeIntHigh dd 0
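If it helps to see the same layout from the C side, winnt.h declares LARGE_INTEGER as (roughly) a union of the two 32-bit halves and the full 64-bit value. This is a simplified sketch rather than the exact header text:
/* Simplified view; the real definition also carries an anonymous struct
   and uses Windows typedefs (DWORD = unsigned 32-bit, LONG = signed 32-bit). */
typedef union {
    struct {
        unsigned int LowPart;    /* low 32 bits  */
        int          HighPart;   /* high 32 bits */
    } u;
    long long QuadPart;          /* the whole 64-bit value */
} MY_LARGE_INTEGER;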
If you're looking to learn ASM, just do a Google search for [x86 assembly language tutorial]. That'll get you a whole lot of good information.
You could use a waitable timer object. Since Windows is not a real-time OS, you'll need to make sure you set the period long enough that you won't miss pulses. A tenth of a second should be safe most of the time.
Additional:
The const LARGE_INTEGER you need to pass to SetWaitableTimer is easy to define in NASM; it's just an eight-byte constant. Bear in mind that the due time is expressed in 100-nanosecond units, and a negative value means relative to now:
duetime: dq -1000000 ; relative due time of 100ms = ten times a second
Pass the address of duetime as the second argument to SetWaitableTimer; the repeat period, in milliseconds, goes in the third argument.
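For reference, the same setup in C might look like this (a sketch with error handling omitted; a 100 ms pulse on an auto-reset waitable timer):
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE hTimer = CreateWaitableTimer(NULL, FALSE, NULL); // auto-reset timer

    LARGE_INTEGER due;
    due.QuadPart = -1000000LL;  // first fire in 100 ms (negative = relative, 100-ns units)

    // Third argument is the repeat period in milliseconds.
    SetWaitableTimer(hTimer, &due, 100, NULL, NULL, FALSE);

    for (;;)
    {
        WaitForSingleObject(hTimer, INFINITE);  // returns once per pulse
        printf("pulse\n");                      // update the clock here
    }
}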
Let's say I have a contrived program:
#include <Windows.h>
void useless_function()
{
Sleep(5000);
}
void useful_function()
{
// ... do some work
useless_function();
// ... do some more work
}
int main()
{
useful_function();
return 0;
}
Objective: I want the profiler to tell me that useful_function() is needlessly calling useless_function(), which waits for no obvious reason. Under XPerf, this doesn't show up in any of the graphs I have, because the call to WaitForMultipleObjects() seems to be accounted to Idle.exe instead of my own program.
And here's the xperf command line that I currently run:
xperf -on Latency -stackwalk Profile
Any ideas?
(This is not restricted to wait functions. The above might have been solved by placing breakpoints at NtWaitForMultipleObjects. Ideally there could be a way to see the stack sample that's taking up a lot of wall-clock time as opposed to only CPU time)
I think what you are looking for is the Wait analysis with Ready Thread functionality in Xperf. It captures every context switch and gives you the call stack of the thread once it wakes up from a sleep (or an otherwise blocked operation). In your case, you would see the stack just after the call to Sleep(5000), as well as the time spent sleeping.
The functionality is a bit obscure to use. But it is fortunately well described here:
Use Xperf's Wait Analysis for Application-Performance Troubleshooting
Wait Analysis is the way to do this. You should:
Record the CSWITCH provider, in order to get all context switches
Record call stacks on context switches by adding +CSWITCH to your -stackwalk argument
Probably record call stacks on the ready thread to get more information on who readied you (i.e., who released the mutex, critical section, or semaphore, and where) by adding +READYTHREAD to your -stackwalk (an example command follows)
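For instance, the capture could look something like this (the stackwalk keywords are from memory; check xperf -help stackwalk if they are rejected):
xperf -on Latency -stackwalk Profile+CSwitch+ReadyThread
... run your scenario ...
xperf -d trace.etl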
Then you use CPU Usage (Precise) in WPA (or xperfview, but that's ancient) to look at the context switches and find where your TimeSinceLast is high on a thread that shouldn't be going idle. You'll typically want the columns in CPU Usage (Precise) in this sort of order:
NewProcess (your process being switched in)
NewThreadId
NewThreadStack
ReadyingProcess (who made your thread ready to run)
ReadyingThreadId (optional)
ReadyThreadStack (optional, requires +ReadyThread on -stackwalk)
Orange bar
Count
TimeSinceLast (us) - sort by this column, usually
Whatever other columns you want
For details see these particular articles from my blog:
- https://randomascii.wordpress.com/2014/08/19/etw-training-videos-available-now/
- https://randomascii.wordpress.com/2012/06/19/wpaxperf-trace-analysis-reimagined/
This "profiler" will tell you - just randomly pause it a few times and look at the stack. If do some work takes 5 seconds, and do some more work takes 5 seconds, then 33% of the time the stack will look like this
main: calling useful_function
useful_function: calling useless_function
useless_function: calling Sleep
So roughly 33% of your stack samples will show exactly that. Any line of code that's costing some fraction of wall-clock time will appear on roughly that fraction of samples.
On the rest of the samples you will see it doing the other things.
There are automated profilers that do the same thing in a more pretty way, such as Zoom and LTProf, although they don't actually show you the samples.
I looked at the xperf doc, trying to figure out if you could get stack samples on wall-clock time and get percents at line-level resolution. It seems you have to be on Windows 7 or Vista. They only bother with functions, not lines, which matters if you have realistically big functions. I couldn't figure out how to get access to the individual samples, which I think is important for seeing why the program is spending its time.
After noticing some timing discrepancies with events in my code, I boiled the problem all the way down to my Windows message loop.
Basically, unless I'm doing something strange, I'm experiencing this behaviour:
MSG message;
while (PeekMessage(&message, _applicationWindow.Handle, 0, 0, PM_REMOVE))
{
int timestamp = timeGetTime();
bool strange = message.time > timestamp; //strange == true!!!
TranslateMessage(&message);
DispatchMessage(&message);
}
The only rational conclusion I can draw is that MSG::time uses a different timing mechanism than timeGetTime() and is therefore free to produce differing results.
Is this the case, or am I missing something fundamental?
Could this be a signed unsigned issue? You are comparing a signed int (timestamp) to an unsigned DWORD (msg.time).
Also, the tick count wraps around roughly every 49 days - when that happens, strange could well be true.
As an aside, if you don't have a great reason to use timeGetTime, you can use GetTickCount here - it saves you bringing in winmm.
The code below shows how you should go about using times - you should never compare the times directly, because clock wrapping messes that up. Instead you should always subtract the start time from the current time and look at the interval.
// This is roughly equivalent code, however strange should never be true
// here: the unsigned subtraction yields the elapsed interval and handles
// wrap-around correctly
DWORD timestamp = GetTickCount();
DWORD interval = timestamp - msg.time; // ms since the message was queued
bool strange = ((int)interval < 0);    // only true for an absurdly large interval
I don't think it's advisable to expect or rely on any particular relationship between the absolute values of timestamps returned from different sources. For one thing, the multimedia timer may have a different resolution from the system timer. For another, the multimedia timer runs in a separate thread, so you may encounter synchronisation issues. (I don't know if each CPU maintains its own independent tick count.) Furthermore, if you are running any sort of time synchronisation service, it may be making its own adjustments to your local clock and affecting the timestamps you are seeing.
Are you by any chance running an AMD dual core? There is an issue where since each core has a separate timer and can run at different speeds, the timers can diverge from each other. This can manifest itself in negative ping times, for example.
I had similar issues when measuring timeouts in different threads using GetTickCount().
Install this driver (IIRC) to resolve the issue.
MSG.time is based on GetTickCount(), and timeGetTime() uses the multimedia timer, which is completely independent of GetTickCount(). I would not be surprised to see that one timer has 'ticked' before the other.