Does gfortran's cpu_time() return user time, system time, or the sum of both?

I need to do some timing to compare the performance of some Fortran vs. C code.
In C I can get both user time and system time independently.
When using gfortran's cpu_time(), what does it represent?
With IBM's Fortran compiler one can choose what to report by setting an environment variable (see its CPU_TIME() documentation).
I found no reference to anything similar in gfortran's documentation.
So, does anybody know if gfortran's cpu_time() returns user time, system time, or the sum of both?

Gfortran's CPU_TIME returns the sum of the user and system time.
On MinGW it uses GetProcessTimes(); on other platforms it uses getrusage(), or times() if getrusage() is not available.
See
http://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libgfortran/intrinsics/cpu_time.c;h=619f8d25246409e0f32c96299db724213aa62b45;hb=refs/heads/master
and
http://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libgfortran/intrinsics/time_1.h;h=12d79ebc12fecf52baa0895c7ab8accc41dab500;hb=refs/heads/master
FWIW, if you wish to measure the wallclock time rather than CPU time, please use the SYSTEM_CLOCK intrinsic instead of CPU_TIME.
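For illustration only, here is a minimal C sketch of the POSIX approach described above (summing user and system time from getrusage()); this is not the libgfortran source, just the same idea:
#include <stdio.h>
#include <sys/resource.h>

/* Return the CPU time (user + system) of the calling process in seconds,
 * which is the quantity a getrusage()-based CPU_TIME reports. */
static double cpu_seconds(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1.0;
    double user = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
    double sys  = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
    return user + sys;   /* sum of both */
}

int main(void)
{
    double t0 = cpu_seconds();
    volatile double x = 0.0;
    for (long i = 0; i < 100000000L; ++i)   /* burn some CPU */
        x += 1.0 / (double)(i + 1);
    printf("CPU time used: %.3f s\n", cpu_seconds() - t0);
    return 0;
}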

My guess: the total of user and system time, otherwise it would be mentioned? It probably depends on the OS anyway; maybe not all of them make the distinction. As far as I know, CPU time is the time which the OS assigns to your process, be it in user mode or in kernel mode executed on behalf of the process.
Is it important for you to have that distinction?
For performance comparison, I would probably go for wall-time anyway, and use CPU time to guess how much I/O it is doing by subtracting it from the wall-time.
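To make that last point concrete, here is a rough C sketch (the workload is a placeholder) that subtracts CPU time from wall-clock time to estimate how long the process spent waiting rather than computing:
#include <stdio.h>
#include <time.h>

/* Wall-clock time minus CPU time roughly approximates time spent
 * waiting (e.g. on I/O) instead of computing. */
int main(void)
{
    struct timespec w0, w1;
    clock_gettime(CLOCK_MONOTONIC, &w0);
    clock_t c0 = clock();

    /* ... workload under test ... */

    clock_t c1 = clock();
    clock_gettime(CLOCK_MONOTONIC, &w1);

    double wall = (w1.tv_sec - w0.tv_sec) + (w1.tv_nsec - w0.tv_nsec) / 1e9;
    double cpu  = (double)(c1 - c0) / CLOCKS_PER_SEC;
    printf("wall %.3f s, cpu %.3f s, ~waiting %.3f s\n", wall, cpu, wall - cpu);
    return 0;
}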

If you need wallclock time, you may use date_and_time, http://gcc.gnu.org/onlinedocs/gcc-4.0.2/gfortran/DATE_005fAND_005fTIME.html
I'm not sure how standard it is, but in my experience it works on at least four different platforms, including exotic Cray designs.
One gotcha here is to take care of midnight rollover, like this:
character*8 :: date
character*10 :: time
character*5 :: zone
integer :: tvalues(8)
real*8 :: time_prev, time_curr, time_elapsed, time_limit, dt
integer :: hr_curr, hr_prev
! set the clock
call date_and_time(date, time, zone, tvalues)
time_curr = tvalues(5)*3600.d0 + tvalues(6)*60.d0 + tvalues(7) ! seconds
hr_curr = tvalues(5)
time_prev=0.d0; time_elapsed = 0.d0; hr_prev = 0
!... do something...
time_prev = time_curr; hr_prev = hr_curr
call date_and_time(date, time, zone, tvalues)
time_curr = tvalues(5)*3600.d0 + tvalues(6)*60.d0 + tvalues(7) ! seconds
hr_curr = tvalues(5)
dt = time_curr - time_prev
if( hr_curr < hr_prev ) dt = dt + 24*3600.d0 ! across midnight
time_elapsed = time_elapsed + dt

#Emanual Ey - in continuation of your comment on #steabert's post (what follows applies to Intel's compiler; I don't know whether other compilers differ): user CPU time + system CPU time should equal total CPU time. Elapsed, real, or "wall clock" time should be greater than the total charged CPU time. To measure wall-clock time, it is best to put the timing calls before and after the tricky part. Rather than make this more complicated than it should be, have a look at the "Timing your application" section of Intel's manual (you'll have to find it in the index); it should clear up a few things.
As I said, that applies to Intel's compiler; I don't have access to gfortran's.

Related

Modelica total time calculation of simulation and equation initialization

I would like to measure the total simulation and initialization time of a system of DAEs. I am interested in the wall-clock time (like the one given in Matlab by the tic/toc functions).
I noticed that in Modelica there are different flags for the simulation time, but the time I get is very small compared to the time that elapses from when I press the simulation button to the end of the simulation (approximately measured with the clock on my phone).
I guess this short time is just the time required for the simulation and does not include the initialization of the system of equations.
Is there a way to calculate this total time?
Thank you so much in advance,
Gabriele
Dear Marco,
Thank you so much for your extremely detailed and useful reply!
I am actually using OpenModelica and not Dymola, so unfortunately I have to build the function that does this myself, and I am very new to the OpenModelica language.
So far, I have a model that simulates the physical behavior based on a system of DAEs. Now, I am trying to build what you suggest here:
"With getTime() you can build a function that: reads the system time as t_start, translates the model and simulates it for 0 seconds, reads the system time again as t_stop, and computes the difference between t_start and t_stop."
Could you please give me more details: which command can I use to read the system time at t_start and to simulate for 0 seconds? To read the time at both t_start and t_stop, do I need two different functions?
Once I have done this, do I have to call the function (or functions) inside the OpenModelica model whose time I want to measure?
Thank you so much again for your precious help!
Very best regards, Gabriele
Depending on the tool you have, this could mean a lot of work.
The first problem is that the MSL allows you to retrieve the system time, but there is nothing included to easily compute time deltas. Therefore, the Testing library in Dymola features the operator records DateTime and Duration. Note that it is planned to integrate them into future MSL versions, but at the moment they are only available via the Testing library for Dymola users.
The second problem is that there is no standardized way to translate and simulate models. Every tool has its own way to do that from a script. So without knowing what tool you are using, it's not possible to give an exact answer.
What Modelica offers in the MSL
In the current Modelica Standard Library version 3.2.3 you can read the current system time via Modelica.Utilities.System.getTime().
This small example shows how to use it:
function printSystemTime
protected
  Integer ms, s, min, h, d, mon, a;
algorithm
  (ms, s, min, h, d, mon, a) := Modelica.Utilities.System.getTime();
  Modelica.Utilities.Streams.print("Current time is: "+String(h)+":"+String(min)+":"+String(s));
end printSystemTime;
You see it gives the current system date and time via 7 return values. These variables are not very nice to deal with if you want to compute a time delta, as you will end up with 14 variables, each with its own value range.
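As a rough C analogy of that problem (not Modelica; the dates here are made up), the usual trick is to fold the calendar components into a single value so that two timestamps can simply be subtracted; mktime() and difftime() do that folding:
#include <stdio.h>
#include <time.h>

int main(void)
{
    /* Two timestamps given as calendar components, like getTime() returns. */
    struct tm start = { .tm_year = 2023 - 1900, .tm_mon = 0, .tm_mday = 31,
                        .tm_hour = 23, .tm_min = 59, .tm_sec = 30 };
    struct tm stop  = { .tm_year = 2023 - 1900, .tm_mon = 1, .tm_mday = 1,
                        .tm_hour = 0,  .tm_min = 0,  .tm_sec = 45 };
    /* mktime() folds the components into one value; difftime() subtracts. */
    double delta = difftime(mktime(&stop), mktime(&start));
    printf("elapsed: %.0f s\n", delta);   /* 75 s; day/month rollover handled */
    return 0;
}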
How to measure translation and simulation time in general
With getTime() you can build a function that:
reads the system time as t_start
translates the model and simulates it for 0 seconds
reads the system time again as t_stop
computes the difference between t_start and t_stop.
Step 2 depends on the tool. In Dymola you would call
DymolaCommands.SimulatorAPI.simulateModel("path-to-model", 0, 0);
which translates your model and simulates it for 0 seconds, so it only runs the initialization section.
For Dymola users
The Testing library contains the function Testing.Utilities.Simulation.timing, which does almost exactly what you want.
To translate and simulate your model call it as follows:
Testing.Utilities.Simulation.timing(
"Modelica.Blocks.Examples.PID_Controller",
task=Testing.Utilities.Simulation.timing.Task.fullTranslate_simulate,
loops=3);
This will translate your model and simulate for 1 second three times and compute the average.
To simulate for 0s, duplicate the function and change this
if simulate then
_ :=simulateModel(c);
end if;
to
if simulate then
_ :=simulateModel(c, 0, 0);
end if;

Estimate performance gain based on application profiling (math)

This question is rather "math"-related, but it should be of interest to any software developer.
I have done some profiling of my application, and I have observed a huge performance difference that is environment-specific.
There is a "fast" environment and a "slow" environment.
The overall application performance on "fast" is 5 times faster than on "slow".
A particular function call on "fast" is 18 times faster than on "slow".
So let's assume I will be able to reduce the number of invocations of this particular function by 50 percent.
How do I calculate the estimated performance improvement on the "slow" environment?
Is there an approximate formula for calculating the expected performance gain?
Apologies: I'm not good at this kind of math (or rather, I never was). I thought about where best to ask this question and didn't come up with a more suitable place; I also couldn't come up with an optimal subject line or tags.
Let's make an assumption (questionable but we have nothing else to go on).
Let's assume all of the 5:1 reduction in time is due to a single function, foo, speeding up by 18:1.
That means everything else in the program takes the same amount of time.
So suppose in the fast environment the total time is f + x, where f is the time that foo takes in the fast environment, and x is everything else.
In the slow environment, the time is 18f+x, which equals 5(f+x).
OK, solve for x.
18f+x = 5f+5x
13f = 4x
x = 13/4 f
OK, now on the slow environment you want to call foo half as much.
So then the time would be 9f+x, which is:
9f + 13/4 f = 49/4 f
The original time was 18f+x = (18+13/4)f = 85/4 f
So the time goes from 85/4 f to 49/4 f.
That's a speed ratio of 85/49 = 1.73
In other words, that's a speedup of 73%.
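For anyone who wants to plug in their own numbers, here is a tiny C sketch of the same arithmetic, generalized; the variable names are mine, not from the post:
#include <stdio.h>

int main(void)
{
    /* Measured ratios (slow vs. fast), taken from the question above. */
    double r_total = 5.0;   /* whole application: slow is 5x slower  */
    double r_foo   = 18.0;  /* function foo:      slow is 18x slower */
    double keep    = 0.5;   /* fraction of foo's work that remains   */

    /* Take foo's time in the fast environment as the unit, f = 1.
     * From r_foo*f + x = r_total*(f + x):  x = f*(r_foo - r_total)/(r_total - 1). */
    double f = 1.0;
    double x = f * (r_foo - r_total) / (r_total - 1.0);

    double old_slow = r_foo * f + x;          /* 85/4 f in the example */
    double new_slow = r_foo * f * keep + x;   /* 49/4 f in the example */

    printf("estimated speedup on slow: %.2fx (%.0f%% faster)\n",
           old_slow / new_slow, (old_slow / new_slow - 1.0) * 100.0);
    return 0;
}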

Resolution of a nanoTime() Timer

I have searched the postings for an answer related to how long a call to System.nanoTime() takes to process.
Consider the following code:
long lastTime = System.nanoTime();
long currentTime = System.nanoTime();
long deltaTime = currentTime - lastTime;
If you run this, currentTime - lastTime will evaluate to 0. The only way for this to happen is if the computer processed that second method call in under a nanosecond, i.e. below the timer's resolution. Logically this makes sense, because a computer can (on average) perform multiple operations in a single nanosecond.
Is this correct? If not, where is my logic wrong?
Well, theoretically it will print out 0. In some cases there may be a slight variation if you have a lot of other tasks executing.
This is theoretically correct, assuming that there are absolutely no other processes running and that the calls to the Java methods have no delay. On a real system this is impossible to achieve, as the calls to the Java methods will introduce significant delays. As I stated in my comment, I measure around a 0.015 millisecond difference on a dual-core Android 4.0.4 phone.

TickCount() deprecated in Mac OS X 10.8

I am using TickCount() to determine the time difference between events or the time required to run a certain piece of code, but it is deprecated in OS X 10.8.
Therefore, I need an alternative.
If you want to measure absolute time, use gettimeofday(). This gives you the date, e.g., "Thu Nov 22 07:48:52 UTC 2012". This is not always suitable for measuring differences between events because the time reported by gettimeofday() can jump forwards or backwards if the user changes the clock.
If you want to measure relative time, use mach_absolute_time(). This lets you measure the difference between two events, e.g., "15.410 s". It does not give absolute times, but it is always monotonic.
If you want to measure CPU time, use clock(). This is often, but not always, the way you measure the performance of a piece of code. It doesn't count time spent on I/O or the impact on overall system speed, so it should only be used when you know you are measuring something CPU-bound.
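For the relative-time case, here is a minimal macOS-specific sketch using mach_absolute_time() together with mach_timebase_info() to convert the difference to seconds (the workload placeholder is mine):
#include <stdio.h>
#include <stdint.h>
#include <mach/mach_time.h>

int main(void)
{
    mach_timebase_info_data_t tb;
    mach_timebase_info(&tb);            /* ratio converting ticks to nanoseconds */

    uint64_t t0 = mach_absolute_time();
    /* ... code to measure ... */
    uint64_t t1 = mach_absolute_time();

    double ns = (double)(t1 - t0) * tb.numer / tb.denom;
    printf("elapsed: %.6f s\n", ns / 1e9);
    return 0;
}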
I'm surprised that TickCount() wasn't deprecated earlier. It's really an OS 9 and earlier thing.
While this API may not be suitable for new development, if you find yourself in need of an identical API, it can be re-implemented as follows:
uint32_t TickCount() {
    uint64_t mat = mach_absolute_time();
    /* 0x80d9594e / 2^55 is roughly 60e-9, so this scales nanoseconds to
       1/60-second ticks (assuming a 1:1 mach timebase). */
    uint32_t mul = 0x80d9594e;
    /* 64x32-bit multiply and shift, done in two 32-bit halves. */
    return ((((0xffffffff & mat) * mul) >> 32) + (mat >> 32) * mul) >> 23;
}
The above implementation was created through analysis of /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/CarbonCore.framework/Versions/A/CarbonCore, and was briefly unit-tested against the deprecated TickCount with LLDB by altering the registers returned by mach_absolute_time.

TCL - how to know how much time a function has worked?

Say I have a proc, and the proc consists of several statements and function calls. How can I know how much time the function has taken so far?
a very crude example would be something like:
set TIME_start [clock clicks -milliseconds]
...do something...
set TIME_taken [expr [clock clicks -milliseconds] - $TIME_start]
Using the time proc, you can do the following:
% set tt [time {set x [expr 23 * 34]}]
38 microseconds per iteration
To measure the time some code has taken, you either use time or clock.
The time command will run its script argument and return a description of how long the script took, in milliseconds (plus some descriptive text, which is trivial to chop off with lindex). If you're really doing performance analysis work, you can supply an optional count argument that makes the script be run repeatedly, but for just general monitoring you can ignore that.
The clock command lets you get various sorts of timestamps (as well as doing formatting, parsing and arithmetic with times). The coarsest is obtained with clock seconds, which returns the amount of time since the beginning of the Unix epoch (in seconds computed with civil time; that's what you want unless you're doing something specialized). If you need more detail, you should use clock milliseconds or clock microseconds. There's also clock clicks, but it's not typically defined what unit that's counting in (unless you pass the -milliseconds or -microseconds option). It's up to you to turn the timestamps into something useful to you.
If you're timing things on Tcl 8.4 (or before!) then you're constrained to using time, clock seconds or clock clicks (and even the -microseconds option is absent; there's no microsecond-resolution timer exposed in 8.4). In that case, you should consider upgrading to 8.5, as it's generally faster. Faster is Good! (If you're using pre-8.4, definitely upgrade as you're enormously behind on the support front.)
To tell how long a function has taken, you can either use the time command (wrapped around the function call) or use clock clicks to get the current time before and then during the function. Using time is simple but can only time a whole function (and will only give you a time when the function returns). Using clock clicks can be done several times, but you will need to subtract the starting time from the current time yourself.
In case you're really looking for some kind of profiler, have a look at the profiler package in Tcllib:
http://tcllib.sourceforge.net/doc/profiler.html
