Timing a Fortran multithreaded program

I have a Fortran 90 program calling a multi-threaded routine. I would like to time this program from the calling routine. If I use cpu_time(), I end up getting the cpu_time for all the threads (8 in my case) added together, not the actual wall-clock time the program takes to run. The etime() routine seems to do the same. Any idea how I can time this program (without using a stopwatch)?

Try omp_get_wtime(); see http://gcc.gnu.org/onlinedocs/libgomp/omp_005fget_005fwtime.html for the signature.

If this is a one-off thing, then I agree with larsmans that using gprof or some other profiler is probably the way to go; but I also agree that it is very handy to have coarser timers in your code for timing different phases of the computation. The best timing information you have is the stuff you actually use, and it's hard to beat numbers that are output every single time you run your code.
Jeremia Wilcock's pointing out omp_get_wtime() is very useful; it's standards-compliant, so it should work with any OpenMP compiler - but it only has second resolution, which may or may not be enough, depending on what you're doing. (Edited: the above was completely wrong.)
Fortran 90 defines system_clock(), which can also be used with any standards-compliant compiler; the standard doesn't specify the time resolution, but with gfortran it seems to be milliseconds and with ifort microseconds. I usually use it in something like this:
subroutine tick(t)
    integer, intent(out) :: t
    call system_clock(t)
end subroutine tick

! returns time in seconds from now to the time described by t
real function tock(t)
    integer, intent(in) :: t
    integer :: now, clock_rate
    call system_clock(now, clock_rate)
    tock = real(now - t) / real(clock_rate)
end function tock
And using them:
call tick(calc)
! do big calculation
calctime = tock(calc)
print *,'Timing summary'
print *,'Calc: ', calctime

Related

Lua and conditional "compilation" : need clarification

I understand that there is no preprocessor in Lua, so there is nothing like #define and so on.
But I'd like to have "debug" options. For example, I'd like optional console debugging, something like:
if do_debug then
    function msg(s)
        print(s)
    end
else
    function msg(s)
    end
end
msg(string.format(".............",v1,v2,......))
It works, but I wonder what the CPU cost is in "no debug" mode.
The fact is that I call a lot of these msg() functions with large strings, sometimes built and formatted from many variables, so I would like to avoid that extra work. But I suppose Lua is not clever enough to see that my function is empty and that there's no need to build its parameters...
So is there a workaround to avoid these extra costs in Lua?
NB: you may say that the CPU cost is negligible, but I'm using this in a realtime audio process, and CPU does matter in this case.
You can't avoid the creation and unwinding of the stack frame for the msg function.
But what you can improve, at least in the snippet shown, is moving the string.format call into msg:
if do_debug then
    function msg(...)
        print(string.format(...))
    end
else
    function msg() end
end
msg(".............",v1,v2,......)
Another approach, trading readability for performance, would be to do the if do_debug check right where you want to print the debug message; a conditional check is much faster than a function call.
But the only way I know of to truly avoid the function call entirely would be to compile your own version of the Lua interpreter, with an (at least rudimentary) preprocessor added to the parser.

Segmentation Fault (Core Dumped) in Fortran 90

I'm writing a Fortran 90 program (below) and I get a segmentation fault (core dumped) error. What does "core dumped" mean, and how do I fix it?
program make_pict
    IMPLICIT NONE
    INTEGER, PARAMETER :: REAL8=SELECTED_REAL_KIND(15,300)
    INTEGER, SAVE :: nstp,npr,step
    REAL(REAL8), SAVE :: r
    REAL(REAL8), DIMENSION(:,:), ALLOCATABLE, SAVE :: f,fa
    INTEGER :: xw,yw,x,y
    REAL :: ax,ay
    INTEGER, DIMENSION(250000) :: pxa
    REAL(REAL8) :: s,s2
    LOGICAL, SAVE :: initialized=.FALSE.
    WRITE(*,*) 'give values ax,ay'
    READ(*,*) ax,ay
    xw = 256
    yw = 256
    OPEN(1,FILE='picture.pxa')
    do x=0, xw-1
        do y=0, yw-1
            f(x,y)=(765./2)*(ax*(1-cos(2*3.14159*x*(1.0/xw)))+ay(1+cos(2*3.14159*y*(1.0/yw))))
        end do
    end do
    WRITE(1,'(2I6)') xw,yw
    ALLOCATE(f(0:xw-1,0:yw-1),fa(0:xw-1,0:yw-1))
    DO y=0,yw-1
        WRITE(1,'(256I4)') (f(x,y),x=0,xw-1)
    END DO
    CLOSE(1)
    initialized=.TRUE.
    step=0
    nstp=100
end program make_pict
You are attempting to set f before it's allocated; you need the ALLOCATE statement before the double loop that sets it! One way to pinpoint this kind of error yourself is to put output statements everywhere, which will narrow down where the program dies.
Some other problems I noticed:
You're missing a * in ay(. I'm surprised this code compiled for you, actually.
Why are you using such a low-precision value for pi? You're requesting precision to the 15th decimal but your value for pi only goes to 6?
What is the purpose of step, nstp, and initialized? I guess they're for features to be implemented? You should strive to provide a minimal, complete, and verifiable example.
Adding the save attribute doesn't do anything here. You should read about what it actually does, but it's typically not needed, and in a main program it definitely does nothing.
To answer your second question: segfaults can occur for many reasons, and "core dumped" refers only to the system's handling of the segmentation fault (it writes an image of the process's memory to disk for later debugging). There are many causes of segmentation faults; attempting to access an unallocated array is one of them.

Is there a way to remove "getKey"'s input lag?

I've recently decided to try TI-BASIC programming, and while I was playing with getKey I noticed that it has a ~1 s input lag after the first input. Is this built into the calculator, or can it be changed?
I recognize that "Quick Key" code above ;) (I'm the original author and very glad to see it spread around!).
Anyway, here is my low-level knowledge of the subject:
The operating system uses what is known as an interrupt to handle reading the keyboard, link port, USB port, and the run indicator, among other things. The interrupt is just software; nothing is implemented in hardware, so the behavior is hardwired into the OS, not the calculator.
The gist of TI's code is that once it reads that a key press occurred, it resets a counter to 50 and decrements it for as long as the user holds the key down. Once the counter reaches zero, it tells getKey to recognize a new keypress and resets the counter to 10. This causes the initial delay to be longer than subsequent delays.
The TI-OS allows third party "hooks" to jump in and modify the getkey process and I used such a hook in another more complicated program (Speedy Keys). However, this hook is never called during BASIC program execution except at a Pause or Menu( command, where it isn't too helpful.
Instead, what we can do is set up a parser hook that modifies the getKey counters. Alternatively, you can use the Quick Key code above, or you can use hybrid BASIC, which requires downloading a third-party app. A few of these apps (BatLib [by me], Celtic 3, DoorsCS7, and xLIB) offer a very fast getKey alternative as well as many other powerful functions.
The following is the code for setting up the parser hook. It works very well in my tests! See notes below:
#include "ti83plus.inc" ; ~~This column is the stuff for manually
_EnableParserHook = 5026h ; creating the code on calc. ~~
.db $BB,$6D ;AsmPrgm
.org $9D95 ;
ld hl,hookcode ;21A89D
ld de,appbackupscreen ;117298
ld bc,hookend-hookcode ;010A00
ldir ;EDB0
ld hl,appbackupscreen ;217298
ld a,l ;7D
bcall(_EnableParserHook);EF2650
ret ;C9
hookcode: ;
.db 83h ;83
push af ;F5
ld a,1 ;3E01
ld (8442h),a ;324284
pop af ;F1
cp a ;BF
ret ;C9
hookend: ;
Notes: other apps or programs may use parser hooks. Using this program will disable those hooks, and you will need to reinstall them afterwards; this is pretty easy.
Finally, if you are entering this manually on your calculator, use the right-hand column of the code above.
You will need to run the program once either on the homescreen or at the start of your main program. After this, all getKeys will have no delay.
I figured this out myself too when I was experimenting with my TI-84 during the summer. This lag cannot be changed; it is built into the calculator. I think this is because the microprocessor used in the TI-84 is a Zilog Z80, a design dating back to 1976.
This is, unfortunately, simply the inefficiency of the calculator. TI-BASIC is a fairly high-level language meant to be easy to use, and is thus not very efficient or fast, especially with respect to input and output, i.e. printing messages and reading input.
Quick Key
:AsmPrgm3A3F84EF8C47EFBF4AC9
This is a getKey routine that makes all keys repeat, not just the arrows, with no delay between repeats. The key codes are different, so you might need to experiment.

getting system time in Vxworks

Is there any way to get the system time in VxWorks besides tickGet() and tickAnnounce()? I want to measure the time between the task switches of a specified task, but I think the precision of tickGet() is not good enough: the two tickGet() values at the beginning and end of my taskSwitchHookAdd function are always the same!
If you are trying to time task switches, I would assume you need a timer with at least microsecond (µs) resolution.
Usually, timers/clocks this fine-grained are only provided by the platform you are running on. If you are working on an embedded system, you can read through the manuals for your board support package (if there is one) to see whether it provides any functions for accessing the various timers on the board.
A lower-level solution would be to figure out which processor your system runs on and then write some simple assembly code to poll the processor's internal timebase register (TBR). This might require a bit of research on your particular processor, but it is easily done.
If you are running on a PPC based processor, you can use the code below to read the TBR:
loop: mftbu rx #load most significant half from TBU
mftbl ry #load least significant half from TBL
mftbu rz #load from TBU again
cmpw rz,rx #see if 'old' = 'new'
bne loop #repeat if two values read from TBU are unequal
On an x86-based processor, you might consider using the RDTSC assembly instruction to read the Time Stamp Counter (TSC). On VxWorks, pentiumALib has library functions (pentiumTscGet64() and pentiumTscGet32()) that make reading the TSC easier from C.
source: http://www-inteng.fnal.gov/Integrated_Eng/GoodwinDocs/pdf/Sys%20docs/PowerPC/PowerPC%20Elapsed%20Time.pdf
Good luck!
It depends on what platform you are on, but if it is x86 then you can use:
pentiumTscGet64();

Best way to measure elapsed time in Scheme

I have some kind of "main loop" using GLUT. I'd like to be able to measure how long it takes to render a frame, since the time spent rendering a frame may be used in other calculations. The time function isn't adequate:
(time (procedure))
I found out that there is a function called current-time; I had to import a package to get it.
(define ct (current-time))
This defines ct as a time object. Unfortunately, I couldn't find any date-arithmetic packages for Scheme. I saw that Racket has current-inexact-milliseconds, which is exactly what I'm looking for because of its sub-millisecond precision.
Using the time object, there is a way to convert it to nanoseconds using
(time->nanoseconds ct)
This lets me do something like this
(let ((newTime (current-time)))
  (block)
  (print (- (time->nanoseconds newTime) (time->nanoseconds oldTime)))
  (set! oldTime newTime))
This seems good enough, except that for some reason it prints things like this:
0
10000
0
0
10000
0
10000
I'm rendering things using OpenGL, and I find it hard to believe that some rendering loops take 0 nanoseconds, or that every loop is stable enough to always take exactly the same number of nanoseconds.
After all, your results are not so surprising, because we have to consider the limited timer resolution of each system. There are limits that depend on the processor and on the OS: timers cannot count as accurately as we might expect, even though a quartz oscillator can run at a very short period. You are also limited by the accuracy and resolution of the functions you use. I had a look at the Chicken Scheme documentation, but there is nothing similar to Racket's (current-inexact-milliseconds) → real?.
After digging around, I came up with the solution of writing it in C and binding it to Scheme:
(require-extension bind)

(bind-rename "getTime" "current-microseconds")

(bind* #<<EOF
uint64_t getTime();
#ifndef CHICKEN
#include <sys/time.h>
uint64_t getTime() {
    struct timeval tim;
    gettimeofday(&tim, NULL);
    return 1000000 * tim.tv_sec + tim.tv_usec;
}
#endif
EOF
)
Unfortunately this solution isn't the best, because it is Chicken-Scheme-only. It could be implemented as a library, but a library wrapping a single function that doesn't exist in any other Scheme doesn't make much sense.
Since nanoseconds don't actually make much sense here, I get microseconds instead.
Note the trick here: the function to be wrapped is declared above, and the #ifndef prevents the include from being parsed by bind. When the file is compiled by gcc, it is built with the include and the function definition.
CHICKEN has current-milliseconds: http://api.call-cc.org/doc/library/current-milliseconds
