TL;DR
My program calls a subroutine twice. The subroutine performs basically identical operations both times, but takes significantly longer to run the second time. The only difference between the two calls is that data produced in the first call is used as an input in the second call. This data is stored in allocatable arrays declared in a module before the first call to the subroutine.
Full Story
The following shows the relevant portions of my code for the problem:
program Economy
!! Declarations !!
use Modern_mod, only: Modern
use Globals, only: Na, Ny, Ne, Vimp, Xmp, Pmp, FCp, Vimu,& ! Globals is a module containing vbles.
& Xmu, Pmu, FCu
implicit none
real(kind=nag_wp) :: param(4)
!! Execution !!
! First call to modern !
param = (/1.0d0,2.0d0,3.0d0,4.0d0/)
allocate(Vimp(Na,Ne),FCp(4,Na*Ne),Pmp(Ny,Ne),Xmp(Ny,Ne))
call Modern(param,Vimp,FCp,Pmp,Xmp)
! Second call to modern !
param = (/5.0d0,6.0d0,7.0d0,8.0d0/)
allocate(Vimu(Na,Ne),FCu(4,Na*Ne),Pmu(Ny,Ne),Xmu(Ny,Ne))
call Modern(param,Vimu,FCu,Pmu,Xmu)
end program Economy
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
module Modern_mod
implicit none
contains
subroutine Modern(param,Vim,FCm,Pm,Xm)
!! Declarations !!
! Modules !
use Globals, only: Na, Ne, Ny
use FindVmp_mod, only: FindVmp
use FindVmu_mod, only: FindVmu
implicit none
! Declaring other variables !
real(kind=nag_wp), intent(in) :: param(4) ! param-Parameters specific to one of the modern sectors;
real(kind=nag_wp), intent(out), allocatable :: FCm(:,:), Xm(:,:),& ! FCm-Firm choices; Xm-Policy fun;
& Pm(:,:), Vim(:,:) ! Pm-Price of a share; Vim-Start of period value function;
real(kind=nag_wp), allocatable :: Vm1(:,:), Vim1(:,:), Pm1(:,:),& ! Vm1-Vm next guess; Pm1-Next share price guess;
& Vm(:,:) ! Vm-End of period value function; Vim1-Next Vim guess;
!! Execution !!
! Allocating and initializing functions !
allocate(Vim(Na,Ne),FCm(4,Na*Ne),Vm(Ny,Ne),Pm(Ny,Ne),Xm(Ny,Ne))
allocate(Vim1(Na,Ne),Vm1(Ny,Ne),Pm1(Ny,Ne))
! Initializing arrays !
Vm = ...
Vim = ...
...
! Doing calculations !
if(param(1) .eq. 1.0d0) then
call FindVmp(FCm,Vim,Pm,Vm1,Pm1,Xm) ! New value function guess for productive guys
else
call FindVmu(FCm,Vim,Pm,Vm1,Pm1,Xm) ! New value function guess for unproductive guys
end if
end subroutine Modern
end module Modern_mod
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
module FindVmp_mod
implicit none
contains
subroutine FindVmp(FCm,Vim0,P0,Vm,P,x)
!! Declarations !!
use VmFp_mod, only: Calculations ! Contains the operations computing the final values of the outputs to FindVmp
implicit none
real(kind=nag_wp), allocatable, intent(out) :: Vm(:,:), x(:,:), P(:,:) ! Vm-New value function; x-Policy function; P-Share price;
real(kind=nag_wp), intent(in) :: P0(:,:), Vim0(:,:), FCm(:,:) ! P0-Initial share price guess; Vim-Initial guess for beginning of period value function;
! FCm-Firm choices;
!! Execution !!
! Allocate matrices !
allocate(Vm(Ny,Ne), x(Ny,Ne), P(Ny,Ne))
! Compute results !
call Calculations(FCm,Vim0,P0,Vm,P,x)
end subroutine FindVmp
end module FindVmp_mod
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
module FindVmu_mod
implicit none
contains
subroutine FindVmu(FCm,Vim0,P0,Vm,P,x)
!! Declarations !!
use Globals, only: Vmp, Pmp
use VmFu_mod, only: Calculations ! Contains the operations computing the final values of the outputs to FindVmu
implicit none
real(kind=nag_wp), allocatable, intent(out) :: Vm(:,:), x(:,:), P(:,:) ! Vm-New value function; x-Policy function; P-Share price;
real(kind=nag_wp), intent(in) :: P0(:,:), Vim0(:,:), FCm(:,:) ! P0-Initial share price guess; Vim-Initial guess for beginning of period value function;
! FCm-Firm choices;
!! Execution !!
! Allocate matrices !
allocate(Vm(Ny,Ne), x(Ny,Ne), P(Ny,Ne))
! Compute results !
call Calculations(FCm,Vim0,P0,Vm,P,x,Vmp,Pmp) ! Using the values of Vmp and Pmp computes in the first call to Modern
end subroutine FindVmu
end module FindVmu_mod
Each run, Modern is fed different arrays of the same size and type (*p and *u respectively) which are declared in the module Globals. Modern similarly calls one of two very similar subroutines FindVm?, feeding them the corresponding arrays. FindVmp and FindVmu compute almost identical operations, only that the latter uses the values of Vimp, Pmp (computed in FindVmp) as inputs.
I've been trying to figure out why the second call to Modern takes up to an order of magnitude longer to complete compared to the first one.
My first guess was that maybe by allocating Vimp and Pmp at the beginning of the program, and allocating a bunch of other arrays afterwards, each reference to the former arrays might be costly because their memory addresses were far away from the arrays currently being computed in FindVmu (for reference, Na = 101, Ny = 91, Ne = 9). But then I read that allocated arrays are stored in the heap, and that data in the heap isn't stacked (no pun intended) so that this was not necessarily the origin of my problem. As a matter of fact, I've tried allocating all matrices at different points and in different orders in Modern, but I get roughly the same execution times.
In the same spirit, I've tried to vary how I declare some of the arrays in different subroutines (e.g. making some automatic instead of allocatable and using the compiler (ifort18) option to force them on the stack) and although I do get overall performance variations throughout the code, the relative performance of the two calls to Modern does not change.
Finally, I read in this thread that the more arrays you have in memory, the slower your code generally becomes. Although the explanation does make sense to me, this is the first time I have experienced such a significant performance loss in a Fortran program. If this were actually the problem I'm facing, I would have expected to run into it in any number of my previous projects. Is this nonetheless a plausible cause of what is happening here?
And I'm basically out of ideas...
Bonus Question
While we're at it, I've found that leaving out the following lines in Economy surprisingly (to me) does not lead to a segfault:
allocate(Vimp(Na,Ne),FCp(4,Na*Ne),Pmp(Ny,Ne),Xmp(Ny,Ne))
allocate(Vimu(Na,Ne),FCu(4,Na*Ne),Pmu(Ny,Ne),Xmu(Ny,Ne))
In words: if I don't manually allocate the arrays declared in Globals, it seems like the program does so automatically once I pass them to Modern. Is this standard behavior or was I just lucky when I was initially not allocating them myself?
The famous linear congruential random number generator, also known as the minimal standard, uses the formula
x(i+1)=16807*x(i) mod (2^31-1)
I want to implement this using Fortran.
However, as pointed out by "Numerical Recipes", directly implementing the formula with the default Integer type (32-bit) will cause 16807*x(i) to overflow.
So the book recommends Schrage's algorithm, which is based on an approximate factorization of m. This method can still be implemented with the default integer type.
However, Fortran actually has an Integer(8) type whose range is -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807, which is much bigger than 16807*x(i) could ever be.
But the book even says the following:
It is not possible to implement equations (7.1.2) and (7.1.3) directly
in a high-level language, since the product of a and m − 1 exceeds the
maximum value for a 32-bit integer.
So why can't we just use Integer(8) type to implement the formula directly?
Whether or not you can have 8-byte integers depends on your compiler and your system. What's worse is that the actual value to pass to kind to get a specific precision is not standardized. While most Fortran compilers I know use the number of bytes (so 8 would be 64 bit), this is not guaranteed.
You can use the selected_int_kind function to get an integer kind that covers a certain range. This code compiles on my 64-bit computer and works fine:
program ran
implicit none
integer, parameter :: i8 = selected_int_kind(R=18)
integer(kind=i8) :: x
integer :: i
x = 100
do i = 1, 100
x = my_rand(x)
write(*, *) x
end do
contains
function my_rand(x)
implicit none
integer(kind=i8), intent(in) :: x
integer(kind=i8) :: my_rand
my_rand = mod(16807_i8 * x, 2_i8**31 - 1)
end function my_rand
end program ran
Update and explanation of @VladimirF's comment below
Modern Fortran delivers an intrinsic module called iso_fortran_env that supplies constants that reference the standard variable types. In your case, one would use this:
program ran
use, intrinsic :: iso_fortran_env, only: int64
implicit none
integer(kind=int64) :: x
and then as above. This code is easier to read than the old selected_int_kind. (Why did R have to be 18 again?)
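To make the overflow argument concrete, here is a minimal sketch in C (rather than Fortran, purely for illustration) that puts the direct 64-bit formula next to Schrage's 32-bit form. The function names minstd_direct and minstd_schrage are hypothetical; the constants 16807 and 2^31-1 come from the question, and q and r are the quotient and remainder of m divided by a.

```c
#include <stdint.h>

/* Minimal standard generator constants from the question. */
#define MINSTD_A 16807
#define MINSTD_M 2147483647 /* 2^31 - 1 */

/* Direct form: the product is formed in 64 bits, so it cannot overflow. */
int32_t minstd_direct(int32_t x)
{
    return (int32_t)(((int64_t)MINSTD_A * x) % MINSTD_M);
}

/* Schrage's form: every intermediate value fits in a signed 32-bit integer.
   Valid for 0 < x < MINSTD_M. */
int32_t minstd_schrage(int32_t x)
{
    const int32_t q = MINSTD_M / MINSTD_A; /* 127773 */
    const int32_t r = MINSTD_M % MINSTD_A; /* 2836 */
    int32_t next = MINSTD_A * (x % q) - r * (x / q);
    return next < 0 ? next + MINSTD_M : next;
}
```

Both functions should produce the same sequence for any seed between 1 and 2^31-2, so the 64-bit version is simply the shorter way to write it when a 64-bit integer type is available.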
Yes. The simplest thing is to append _8 to the integer constants to make them 8 bytes. I know it is "old style" Fortran, but it is unambiguous on the compilers that map kind 8 to 8-byte integers.
By the way, when you write:
16807*x mod (2^31-1)
be careful: this is not the same as taking the result of 16807*x and AND-ing it with a mask in which every bit is set except the sign bit. That mask reduces modulo 2^31, not modulo 2^31-1, so it defines a different generator.
If a modulus of 2^31 is acceptable, the mask avoids the expensive mod function:
iand(16807_8*x, Z'7FFFFFFF')
Update after comment :
or
iand(16807_8*x, 2147483647_8)
if your super modern compiler does not have backwards compatibility.
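The difference between the mask and the mod can be checked with a small C sketch (the names step_mod and step_mask are hypothetical). The two forms agree while 16807*x stays below 2^31-1 and diverge once the product exceeds it, because the mask reduces modulo 2^31 and the mod reduces modulo 2^31-1.

```c
#include <stdint.h>

/* One step of the generator, reduced modulo 2^31 - 1. */
int64_t step_mod(int64_t x)
{
    return (16807 * x) % 2147483647;
}

/* The same product, but with the low 31 bits kept (reduction modulo 2^31). */
int64_t step_mask(int64_t x)
{
    return (16807 * x) & 0x7FFFFFFF;
}
```

For x = 100 both return 1680700. For x = 200000 the product is 3361400000, and step_mod gives 1213916353 while step_mask gives 1213916352.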
This question is about a comment in the question "Recommended way to initialize srand?". The first comment says that srand() should be called only ONCE in an application. Why is that so?
That depends on what you are trying to achieve.
Randomization is performed as a function that has a starting value, namely the seed.
So, for the same seed, you will always get the same sequence of values.
If you try to set the seed every time you need a random value, and the seed is the same number, you will always get the same "random" value.
The seed is usually taken from the current time in seconds, as in time(NULL), so if you always set the seed before taking the random number, you will get the same number whenever you call the srand/rand combo multiple times within the same second.
To avoid this problem, srand is set only once per application, because it is doubtful that two of the application instances will be initialized in the same second, so each instance will then have a different sequence of random numbers.
However, there is a slight possibility that you will run your app (especially if it's a short one, or a command line tool or something like that) many times in a second, then you will have to resort to some other way of choosing a seed (unless the same sequence in different application instances is ok by you). But like I said, that depends on your application context of usage.
Also, you may want to increase the precision to microseconds (minimizing the chance of the same seed); this requires <sys/time.h>:
struct timeval t1;
gettimeofday(&t1, NULL);
srand(t1.tv_usec * t1.tv_sec);
Random numbers are actually pseudo-random. A seed is set first; each call to rand then returns a number and modifies the internal state, and that new state is used by the next rand call to produce another number. Because a fixed formula generates these "random numbers", setting the seed to the same value before every call to rand makes every call return the same number. For example, srand(1234); rand(); will always return the same value. Seeding the initial state once and then leaving the internal state alone (no further srand calls) lets the generator run through its sequence, which makes the numbers look more random.
Generally we use the seconds value returned by time(NULL) to initialize the seed. Say srand(time(NULL)); is in a loop. The loop can iterate more than once within one second, so every rand call made in iterations that fall within the same second returns the same "random number", which is not desired. Initializing at program start sets the seed once; each rand call then generates a new number and modifies the internal state, so the next rand call returns a number that is random enough.
For example this code from http://linux.die.net/man/3/rand:
static unsigned long next = 1;
/* RAND_MAX assumed to be 32767 */
int myrand(void) {
next = next * 1103515245 + 12345;
return((unsigned)(next/65536) % 32768);
}
void mysrand(unsigned seed) {
next = seed;
}
The internal state next is declared as global. Each myrand call modifies and updates the internal state and returns a random number. Every call to myrand sees a different value of next, so the function returns a different number on each call.
Look at the mysrand implementation; it simply assigns the seed value you pass to next. Therefore, if you set next to the same value every time before calling rand, it will return the same random value, because the identical formula is applied to it. That is not desirable, as the function is meant to be random.
But depending on your needs you can set the seed to some certain value to generate the same "random sequence" each run, say for some benchmark or others.
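As a rough illustration of that reproducibility, the sketch below repeats the man-page generator so that it compiles on its own and adds one helper, fill_sequence, which is a hypothetical name and not part of any library. Reseeding with the same value replays the same sequence from the start.

```c
/* Generator state, as in the man-page example above. */
static unsigned long next = 1;

/* RAND_MAX assumed to be 32767 */
int myrand(void)
{
    next = next * 1103515245 + 12345;
    return (unsigned)(next / 65536) % 32768;
}

void mysrand(unsigned seed)
{
    next = seed;
}

/* Hypothetical helper: reseed, then store the first n outputs in out. */
void fill_sequence(unsigned seed, int *out, int n)
{
    mysrand(seed);
    for (int i = 0; i < n; ++i)
        out[i] = myrand();
}
```

Calling fill_sequence twice with the same seed fills both arrays with identical values, which is what you want for a repeatable benchmark and what you want to avoid when the numbers should differ between runs.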
Short answer: calling srand() is not like "rolling the dice" for the random number generator. Nor is it like shuffling a deck of cards. If anything, it's more like just cutting a deck of cards.
Think of it like this. rand() deals from a big deck of cards, and every time you call it, all it does is pick the next card off the top of the deck, give you the value, and return that card to the bottom of the deck. (Yes, that means the "random" sequence will repeat after a while. It's a very big deck, though: typically 4,294,967,296 cards.)
Furthermore, every time your program runs, a brand-new pack of cards is bought from the game shop, and every brand-new pack of cards always has the same sequence. So unless you do something special, every time your program runs, it will get exactly the same "random" numbers back from rand().
Now, you might say, "Okay, so how do I shuffle the deck?" And the answer -- at least as far as rand and srand are concerned -- is that there is no way of shuffling the deck.
So what does srand do? Based on the analogy I've been building here, calling srand(n) is basically like saying, "cut the deck n cards from the top". But wait, one more thing: it actually means start with another brand-new deck and cut it n cards from the top.
So if you call srand(n), rand(), srand(n), rand(), ..., with the same n every time, you won't just get a not-very-random sequence, you'll actually get the same number back from rand() every time. (Probably not the same number you handed to srand, but the same number back from rand over and over.)
So the best you can do is to cut the deck once, that is, call srand() once, at the beginning of your program, with an n that's reasonably random, so that you'll start at a different random place in the big deck each time your program runs. With rand(), that really is the best you can do.
[P.S. Yes, I know, in real life, when you buy a brand-new deck of cards it's typically in order, not in random order. For the analogy here to work, I'm imagining that each deck you buy from the game shop is in a seemingly random order, but the exact same seemingly-random order as every other deck of cards you buy from that same shop. Sort of like the identically shuffled decks of cards they use in bridge tournaments.]
Addendum: For a very cute demonstration of the fact that for a given PRNG algorithm and a given seed value, you always get the same sequence, see this question (which is about Java, not C, but anyway).
The reason is that srand() sets the initial state of the random generator, and all the values that generator produces are only "random enough" if you don't touch the state yourself in between.
For example you could do:
int getRandomValue()
{
srand(time(0));
return rand();
}
and then if you call that function repeatedly so that time() returns the same values in adjacent calls you just get the same value generated - that's by design.
A simpler solution for giving application instances started in the same second different seeds is the following:
srand(time(NULL)-getpid());
This makes your seed hard to predict, since it is difficult to guess the exact time at which your process started, and the pid will differ as well.
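One way to guarantee that the seed is set only once, however often the seeding code is reached, is to guard it with a static flag. This is only a sketch: seed_once is a hypothetical helper name, and it assumes a POSIX system where getpid() is available.

```c
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Seeds rand() on the first call only.
   Returns 1 if this call did the seeding, 0 if it was already done. */
int seed_once(void)
{
    static int seeded = 0;
    if (seeded)
        return 0;
    /* Same seed expression as above: current time minus process id. */
    srand((unsigned)(time(NULL) - getpid()));
    seeded = 1;
    return 1;
}
```

Any function that needs random numbers can call seed_once() before rand() without restarting the sequence.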
srand seeds the pseudorandom number generator. If you call it more than once, you will reseed the RNG. And if you call it with the same argument, it will restart the same sequence.
To prove it, if you do something simple like this, you will see the same number printed 100 times:
#include <stdlib.h>
#include <stdio.h>
int main() {
for(int i = 0; i != 100; ++i) {
srand(0);
printf("%d\n", rand());
}
}
It seems that every time rand() runs, it updates the internal state, which acts as a new seed for the next rand().
If srand() runs multiple times, the problem is that when two calls happen within the same second (time(NULL) does not change), the rand() after the second srand() returns the same value as the rand() right after the first srand().
I want to learn why my code is not working as I expect. I want to generate a double between 0 and 1, and I have learnt that (double)rand() / RAND_MAX works well for that. I also read that srand(time(NULL)) changes the generated random numbers every time I compile. However, when I use them together, the program generates the same random number all the time. Why does this happen? Thanks.
Here is my code:
//srand(time(NULL));
number = (double)rand() / (double)RAND_MAX;
The srand() function initializes the pseudo-random number generator. You can think of it as giving rand() a number from which to start its 'calculations'. Every time you run your program, srand() gives rand() the seed time(NULL) (which, by the way, is a very big number that changes every second). If you don't call srand(), rand() will always return the same sequence of numbers, because by default it is given a standard, non-changing seed (the number to start the 'calculations'). You can try giving srand() a fixed parameter, like srand(1500). You will see that it returns different numbers, but their sequence will again be the same every time you compile and run.
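A minimal sketch of the intended usage: wrap the division from the question in a small function (uniform01 is a hypothetical name) and seed the generator once, for instance with srand(time(NULL)) at the top of main, before the first call.

```c
#include <stdlib.h>

/* Returns a pseudo-random double in the closed interval [0, 1].
   Assumes srand() has been called once earlier in the program. */
double uniform01(void)
{
    return (double)rand() / (double)RAND_MAX;
}
```

With a fixed seed such as srand(1500) the values stay in the same range but repeat identically from one run to the next.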
For more info read here:
http://www.cplusplus.com/reference/cstdlib/srand/
http://www.cplusplus.com/reference/cstdlib/rand/
I am coding using Fortran MPI and I need to get the run time of the program. Therefore I tried to use the MPI_WTIME() function, but I am getting some strange results.
Part of the code is like this:
program heat_transfer_1D_parallel
implicit none
include 'mpif.h'
integer myid,np,rc,ierror,status(MPI_STATUS_SIZE)
integer :: N,N_loc,i,k,e !e = number extra points (filled with 0s)
real :: time,tmax,start,finish,dt,dx,xmax,xmin,T_in1,T_in2,T_out1,T_out2,calc_T,t1,t2
real,allocatable,dimension(:) :: T,T_prev,T_loc,T_loc_prev
call MPI_INIT(ierror)
call MPI_COMM_SIZE(MPI_COMM_WORLD,np,ierror)
call MPI_COMM_RANK(MPI_COMM_WORLD,myid,ierror)
...
t1 = MPI_WTIME()
time = 0.
do while (time.le.tmax)
...
end do
...
call MPI_BARRIER(MPI_COMM_WORLD,ierror)
t2 = MPI_WTIME()
call MPI_FINALIZE(ierror)
if(myid.eq.0) then
write(*,"(8E15.7)") T(1:N-e)
write(*,*)t2
write(*,*)t1
end if
And the output values for t1 and t2 are the same and very big: 1.4240656E+09
Any ideas why? Thank you so much.
From the documentation: Return value: Time in seconds since an arbitrary time in the past. They didn't specify how far back ;-) Only t2-t1 is meaningful here...
Also, the return value of MPI_Wtime() is double precision! t1 and t2 are declared as single precision floats.
Building on the answer from Alexander Vogt, I'd like to add that many Unix implementations of MPI_WTIME use gettimeofday(2) (or similar) to retrieve the system time and then convert the returned struct timeval into a floating-point value. Timekeeping in Unix is done by tracking the number of seconds elapsed since the Epoch (00:00 UTC on 01.01.1970). While writing this, the value is 1424165330.897136 seconds and counting.
With many Fortran compilers, REAL defaults to single-precision floating point representation that can only hold 7.22 decimal digits, while you need at least 9 (more if subsecond precision is needed). The high-precision time value above becomes 1.42416538E9 when stored in a REAL variable. The next nearest value that can be represented by the type is 1.4241655E9. Therefore you cannot measure time periods shorter than (1.4241655 - 1.42416538)·10^9, or roughly 120 seconds.
Use double precision.
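The effect is easy to reproduce outside MPI. The C sketch below (hypothetical function names, assuming IEEE 754 single and double precision) stores two timestamps of the magnitude quoted above and subtracts them, once as float and once as double.

```c
/* Elapsed time when both timestamps are first stored in single precision.
   volatile forces the values to be rounded to float before subtracting. */
float elapsed_single(double t1, double t2)
{
    volatile float a = (float)t1;
    volatile float b = (float)t2;
    return b - a;
}

/* Elapsed time when the timestamps stay in double precision. */
double elapsed_double(double t1, double t2)
{
    return t2 - t1;
}
```

With t1 = 1424165330.897136 and t2 = t1 + 60, elapsed_single returns 0 because both values round to the same float, while elapsed_double returns about 60. With t2 = t1 + 120, elapsed_single returns 128, the spacing between neighbouring single-precision values at this magnitude.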