debugging old fortran code - debugging

I have an old Fortran code for calculation of Lyapunov exponent which I tried converting to modern Fortran syntax.
PROGRAM ODE
integer, PARAMETER :: N=3
integer, PARAMETER :: NN=12
EXTERNAL FCN
DIMENSION Y(NN),ZNORM(N),GSC(N),CUM(N),C(24),W(NN,9)
Y(1) = 10.0
Y(2) = 1.0
Y(3) = 0.0
! INITIAL CONDITIONS FOR LINEAR SYSTEM (ORTHONORMAL FRAME)
DO 10 I = N+1,NN
Y(I) = 0.0
10 CONTINUE
DO 20 I = 1,N
Y((N+1)*I) = 1.0
CUM(I) = 0.0
20 CONTINUE
! INTEGRATION TOLERANCE, # OF INTEGRATION STEPS,
! TIME PER STEP, AND I/O RATE
write (*,*) "TOL, NSTEP, STPSZE, IO ?"
read (*,*) TOL, NSTEP, STPSZE, IO
! INITIALIZATION FOR INTEGRATOR
NEQ = NN
X=0.0
IND = 1
DO 100 I = 1,NSTEP
XEND = STPSZE*FLOAT(I)
! CALL ANY ODE INTEGRATOR - THIS IS AN LMSL ROUTINE
CALL DVERK (NEQ,FCN,X,Y,XEND,TOL, IND,C,NEQ,W,IER)
! CONSTRUCT A NEW ORTHONORMAL BASIS BY GRAM-SCHMIDT METHOD
! NORMALIZE FIRST VECTOR
ZNORM(1) = 0.0
DO 30 J = 1,N
ZNORM(1) = ZNORM(1)+Y(N*J+1)**2
30 CONTINUE
ZNORM(1) = SQRT(ZNORM(1))
DO 40 J = 1,N
Y(N*J+1) = Y(N*J+1)/ZNORM(1)
40 CONTINUE
! GENERATE THE NEW ORTHONORMAL SET OF VECTORS.
DO 80 J = 2,N
! GENERATE J-1 GSR COEFFICIENTS.
DO 50 K = l,(J-l)
GSC(K) = 0.0
DO 50 L = 1,N
GSC(K) = GSC(K)+Y(N*L+J)*Y(N*L+K)
50 CONTINUE
! CONSTRUCT A NEW VECTOR.
DO 60 K = 1,N
DO 60 L = l,(J-l)
Y(N*K+J) = Y(N*K+J)-GSC(L)*Y(N*K+L)
60 CONTINUE
! CALCULATE THE VECTOR'S NORM
ZNORM(J) = 0.0
DO 70 K = I,N
ZNORM(J) = ZNORM(J)+Y(N*K+J)**2
70 CONTINUE
ZNORM(J) = SQRT(ZNORM(J))
! NORMALIZE THE NEW VECTOR.
DO 80 K = 1,N
Y(N*K+J) = Y(N*K+J)/ZNORM(J)
80 CONTINUE
! UPDATE RUNNING VECTORMAGNITUDES
DO 90 K = 1,N
CUM(K) = CUM(K)+ALOG(ZNORM(K) )/ALOG(2. )
90 CONTINUE
! NORMALIZE EXPONENT AND PRINT EVERY IO ITERATIONS
IF (MOD(I,IO).EQ.0) write (*,*) X,(CUM(K)/X,K = I,N)
100 CONTINUE
CALL EXIT
END
SUBROUTINE FCN (N,X,Y,YPRIME)
! USER DEFINED ROUTINE CALLED BY IMSL INTEGRATOR.
DIMENSION Y(12),YPRIME(12)
! LORENZ EQUATIONS OF MOTION
YPRIME(1) = 16.*(Y(2)-Y(1))
YPRIME(2) = -Y(1)*Y(3)+45.92*Y(1)-Y(2)
YPRIME(3) = Y(1)*Y(2)-4.*Y(3)
! 3 COPIES OF LINEARIZED EQUATIONS OF MOTION.
DO 10 I = 0,2
YPRIME(4+I) = 16.*(Y(7+I)-Y(4+I))
YPRIME(7+I) = (45.92-Y(3))*Y(4+I)-Y(7+I)-Y(1)*Y(10+I)
YPRIME(10+I) = Y(2)*Y(4+I)+Y(1)*Y(7+I)-4.*Y(10+I)
10 CONTINUE
RETURN
END
I have debugged most of this, but I am still left with a few errors that I am unable to get around. The error log says:
main.f95:44.14:
DO 50 L = 1,N
1
Warning: Obsolescent feature: Shared DO termination label 50 at (1)
main.f95:49.18:
DO 60 L = l,(J-l)
1
Warning: Obsolescent feature: Shared DO termination label 60 at (1)
main.f95:59.14:
DO 80 K = 1,N
1
Warning: Obsolescent feature: Shared DO termination label 80 at (1)
/tmp/ccfI69Sj.o: In function `MAIN__':
main.f95:(.text+0x296): undefined reference to `dverk_'
main.f95:(.text+0x844): undefined reference to `exit_'
collect2: error: ld returned 1 exit status
Could someone please help me out in resolving the errors?
Thanks.

It's just what the compiler states:
Shared DO termination label
The nested loop 50 uses the same termination label:
DO 50 K = l,(J-l)
GSC(K) = 0.0
DO 50 L = 1,N
GSC(K) = GSC(K)+Y(N*L+J)*Y(N*L+K)
50 CONTINUE
In modern Fortran, you should use separate enddo statements:
DO K = l,(J-l)
GSC(K) = 0.0
DO L = 1,N
GSC(K) = GSC(K)+Y(N*L+J)*Y(N*L+K)
ENDDO
ENDDO
This omits the loop-label, but in your code you don't need it (I guess).
The same needs to be done with loops 60 and 80
The real errors are the undefined references to dverk and exit. These subroutines are missing in your code, so I assume they are contained in external objects/libraries. You need to tell the compiler where to find them, or include them in your code (after the end of the program or inside a module).

Related

OpenMP on fortran 90 lasts almost the same(if not more) as non parallelized program

I'm trying to parallelize a simulation of an Ising 2D model to get some expected values as a function of the temperature of the system. For L=48, the one-threaded version takes about 240 seconds to run 20 temperatures and 1 seed each, but the parallelized version takes about 268 seconds, which is similar.
If you take the time per seed per temperature, it results in 12 seconds for the one-threaded version and 13.4 seconds for the parallelized version. I'm looking for help with my code because I don't understand these durations. I thought that the parallelized version would split one temperature among all threads and therefore should take about 30 seconds to complete.
I need to run the simulation for 50 temperatures and 200 seeds each, for 5 values of L. It would be helpful to reduce the compute time, because otherwise it could take 20 hours for L=48 and some days for L=72.
I'm using an i7-10700KF (8 cores, 16 logical threads).
program Ising
use omp_lib
implicit none
integer L, seed, i, j, seed0, nseed,k
parameter (L=48)
integer s(1:L, 1:L)
integer*4 pbc(0:L+1), mctot, N, mcd, mcini, difE
real*8 genrand_real2, magne, energ, energia, temp, temp1, DE
real*8 mag, w(-8:8)
real*8 start, finish
real*8 sum, sume, sume2, summ, summ2, sumam, vare, varm, maxcv, maxx
real*8 cv, x, Tmaxcv, Tmaxx
integer irand, jrand
11 format(10(f20.6))
! Initialize variables
mctot = 80000
mcd = 20
mcini = 8000
N = L*L
seed0 = 20347880
nseed = 20
maxcv=0.d0
maxx=0.d0
! Initialize vector pbc
pbc(0) = L
pbc(L+1) = 1
do i = 1, L
pbc(i) = i
end do
! Initialize matrix s with random values
do i = 1, L
do j = 1, L
if (genrand_real2() < 0.5) then
s(i,j) = 1
else
s(i,j) = -1
endif
end do
end do
! Metropolis algorithm
open(1, file='Expectation values.dat')
start = omp_get_wtime()
write(1,*) '#Temp, ','E, ','E2, ','M, ','M2, ','|M|, ','VarE, ','VarM, ',&
'Cv, ','X, '
!Start loop to calculate for different temperatures
!$OMP PARALLEL PRIVATE(s,seed,w,energia,difE,irand,jrand,temp,mag,sum,sume,sume2,summ,summ2,sumam,vare,varm,cv,x)
temp1 = 1.59d0
!$OMP DO ordered schedule(dynamic)
do k = 1, 10
temp = temp1 + (0.01d0*k)
!Define the matrix w, which contains the values of the Boltzmann function for each temperature, so as not to have to calculate them each iteration
do i = -8, 8
w(i) = dexp(-i/temp)
end do
write(*,*) "Temperature: ", temp, "Thread", omp_get_thread_num()
sum = 0.d0
sume = 0.d0
sume2 = 0.d0
summ = 0.d0
summ2 = 0.d0
sumam = 0.d0
do seed = seed0, seed0 + nseed-1, 1
call init_genrand(seed)
call reinicia(s,l)
energia = energ(s,l,pbc)
do i = 1, mctot
do j = 1, N
irand = int(genrand_real2()*L) + 1
jrand = int(genrand_real2()*L) + 1
difE = int(DE(s,l,irand,jrand,pbc))
if (difE < 0) then
s(irand,jrand) = -s(irand,jrand)
energia = energia + difE
else if (genrand_real2() < w(int(difE))) then
s(irand,jrand) = -s(irand,jrand)
energia = energia + difE
endif
end do
if ((i > mcini).and.(mcd*(i/mcd)==i)) then
mag= magne(s,l)
sum = sum + 1.d0
sume = sume + energia
sume2 = sume2 + energia**2
summ = summ + mag
summ2 = summ2 + mag**2
sumam = sumam + abs(mag)
endif
end do
end do
!Energy
sume=sume/(sum*N)
sume2=sume2/(sum*N*N)
!Magnetitzation
summ = summ/(sum*N)
sumam=sumam/(sum*N)
summ2=summ2/(sum*N*N)
!Variances
vare = dsqrt(sume2-sume*sume)/dsqrt(sum)
varm = dsqrt(summ2-summ*summ)/dsqrt(sum)
!Cv
cv = (N*(sume2-sume*sume))/temp**2
if (cv.gt.maxcv) then
maxcv=cv
Tmaxcv=temp
endif
!X
x = (N*(summ2-summ*summ))/temp
if (x.gt.maxx) then
maxx=x
Tmaxx=temp
endif
write(1,11) temp,sume,sume2,summ,summ2,sumam,vare,varm,cv,x
end do
!$OMP END DO
!$OMP END PARALLEL
finish = omp_get_wtime()
close(1)
print*, "Time: ",(finish-start),"Seconds"
end program Ising
! Functions
!Function that calculates the energy of the matrix s
real*8 function energ(S,L, pbc)
implicit none
integer s(1:L, 1:L), i, j, L
integer*4 pbc(0:L+1)
real*8 ene
ene = 0.0d0
do i = 1, L
do j = 1, L
ene = ene - s(i,j) * s(pbc(i+1),j) - s(i,j) * s(i,pbc(j+1))
end do
end do
energ = ene
return
end function energ
!Function that calculates the difference in energy that occurs when the spin of position (i, j) is changed
real*8 function DE(S,L,i,j,pbc)
implicit none
integer s(1:L, 1:L), i, j, L, difE
integer*4 pbc(0:L+1)
real*8 suma
difE = 0
suma = 0.0d0
suma = suma + s(pbc(i-1),j) + s(pbc(i+1),j) + s(i,pbc(j-1)) + s(i,pbc(j+1))
difE = difE + int(2 * s(i,j) * suma)
DE = difE
return
end function DE
!Function that calculates the magnetization of the matrix s
real*8 function magne(S,L)
implicit none
integer s(1:L, 1:L),L
magne = sum(s)
return
end function magne
! SUBRUTINES
!Subroutine that resets the matrix s with random values
subroutine reinicia(S,L)
implicit none
integer s(1:L, 1:L), i,j,L
real*8 genrand_real2
do i = 1, L
do j = 1, L
if (genrand_real2() < 0.5) then
s(i,j) = 1
else
s(i,j) = -1
endif
end do
end do
return
end subroutine
I have tried parallelizing the seeds loop instead of the temperatures, but it lasts almost the same, so i think i'm not parallelizing it correctly, because it looks a nice code to parallelize.
The other option I thought of is to manually parallelize the simulation. I could do this by compiling 16 programs, each of which handles a different range of temperatures. Then I could run all of the programs concurrently, so each program would get its own thread. However, this approach would require a lot of extra RAM.

How to reduce the allocations in Julia?

I am starting to use Julia mainly because of its speed. Currently, I am solving a fixed point problem. Although the current version of my code runs fast I would like to know some methods to improve its speed.
First of all, let me summarize the algorithm.
There is an initial seed called C0 that maps from the space (b,y) into an action space c, then we have C0(b,y)
There is a formula that generates a rule Ct from C0.
Then, using an additional restriction, I can obtain an updating of b [let's called it bt]. Thus,it generates a rule Ct(bt,y)
I need to interpolate the previous rule to move from the grid bt into the original grid b. It gives me an update for C0 [let's called that C1]
I will iterate until the distance between C1 and C0 is below a convergence threshold.
To implement it I created two structures:
struct Parm
lC::Array{Float64, 2} # Lower limit
uC::Array{Float64, 2} # Upper limit
γ::Float64 # CRRA coefficient
δ::Float64 # factor in the euler
γ1::Float64 #
r1::Float64 # inverse of the gross interest rate
yb1::Array{Float64, 2} # y - b(t+1)
P::Array{Float64, 2} # Transpose of transition matrix
end
mutable struct Upd1
pol::Array{Float64,2} # policy function
b::Array{Float64, 1} # exogenous grid for interpolation
dif::Float64 # updating difference
end
The first one is a set of parameters while the second one stores the decision rule C1. I also define some functions:
function eulerm(x::Upd1,p::Parm)
ct = p.δ *(x.pol.^(-p.γ)*p.P).^(-p.γ1); #Euler equation
bt = p.r1.*(ct .+ p.yb1); #Endeogenous grid for bonds
return ct,bt
end
function interp0!(bt::Array{Float64},ct::Array{Float64},x::Upd1, p::Parm)
polold = x.pol;
polnew = similar(x.pol);
#inbounds #simd for col in 1:size(bt,2)
F1 = LinearInterpolation(bt[:,col], ct[:,col],extrapolation_bc=Line());
polnew[:,col] = F1(x.b);
end
polnew[polnew .< p.lC] .= p.lC[polnew .< p.lC];
polnew[polnew .> p.uC] .= p.uC[polnew .> p.uC];
dif = maximum(abs.(polnew - polold));
return polnew,dif
end
function updating!(x::Upd1,p::Parm)
ct, bt = eulerm(x,p); # endogeneous grid
x.pol, x.dif = interp0!(bt,ct,x,p);
end
function conver(x::Upd1,p::Parm)
while x.dif>1e-8
updating!(x,p);
end
end
The first formula implements steps 2 and 3. The third one makes the updating (last part of step 4), and the last one iterates until convergence (step 5).
The most important function is the second one. It makes the interpolation. While I was running the function #time and #btime I realized that the largest number of allocations are in the loop inside this function. I tried to reduce it by not defining polnew and goes directly to x.pol but in this case, the results are not correct since it only need two iterations to converge (I think that Julia is thinking that polold is exactly the same than x.pol and it is updating both at the same time).
Any advice is well received.
To anyone that wants to run it by themselves, I add the rest of the required code:
function rouwen(ρ::Float64, σ2::Float64, N::Int64)
if (N % 2 != 1)
return "N should be an odd number"
end
sigz = sqrt(σ2/(1-ρ^2));
zn = sigz*sqrt(N-1);
z = range(-zn,zn,N);
p = (1+ρ)/2;
q = p;
Rho = [p 1-p;1-q q];
for i = 3:N
zz = zeros(i-1,1);
Rho = p*[Rho zz; zz' 0] + (1-p)*[zz Rho; 0 zz'] + (1-q)*[zz' 0; Rho zz] + q *[0 zz'; zz Rho];
Rho[2:end-1,:] = Rho[2:end-1,:]/2;
end
return z,Rho;
end
#############################################################
# Parameters of the model
############################################################
lb = 0; ub = 1000; pivb = 0.25; nb = 500;
ρ = 0.988; σz = 0.0439; μz =-σz/2; nz = 7;
ϕ = 0.0; σe = 0.6376; μe =-σe/2; ne = 7;
β = 0.98; r = 1/400; γ = 1;
b = exp10.(range(start=log10(lb+pivb), stop=log10(ub+pivb), length=nb)) .- pivb;
#=========================================================
Algorithm
======================================================== =#
(z,Pz) = rouwen(ρ,σz, nz);
μZ = μz/(1-ρ);
z = z .+ μZ;
(ee,Pe) = rouwen(ϕ,σe,ne);
ee = ee .+ μe;
y = exp.(vec((z .+ ee')'));
P = kron(Pz,Pe);
R = 1 + r;
r1 = R^(-1);
γ1 = 1/γ;
δ = (β*R)^(-γ1);
m = R*b .+ y';
lC = max.(m .- ub,0);
uC = m .- lb;
by1 = b .- y';
# initial guess for C0
c0 = 0.1*(m);
# Set of parameters
pp = Parm(lC,uC,γ,δ,γ1,r1,by1,P');
# Container of results
up1 = Upd1(c0,b,1);
# Fixed point problem
conver(up1,pp)
UPDATE As it was reccomend, I made the following changes to the third function
function interp0!(bt::Array{Float64},ct::Array{Float64},x::Upd1, p::Parm)
polold = x.pol;
polnew = similar(x.pol);
#inbounds for col in 1:size(bt,2)
F1 = LinearInterpolation(#view(bt[:,col]), #view(ct[:,col]),extrapolation_bc=Line());
polnew[:,col] = F1(x.b);
end
for j in eachindex(polnew)
polnew[j] < p.lC[j] ? polnew[j] = p.lC[j] : nothing
polnew[j] > p.uC[j] ? polnew[j] = p.uC[j] : nothing
end
dif = maximum(abs.(polnew - polold));
return polnew,dif
end
This leads to an improvement in the speed (from ~1.5 to ~1.3 seconds). And a reduction in the number of allocations. Somethings that I noted were:
Changing from polnew[:,col] = F1(x.b) to polnew[:,col] .= F1(x.b) can reduce the total allocations but the time is slower, why is that?
How should I understand the difference between #time and #btime. For this case, I have:
up1 = Upd1(c0,b,1);
#time conver(up1,pp)
1.338042 seconds (385.72 k allocations: 1.157 GiB, 3.37% gc time)
up1 = Upd1(c0,b,1);
#btime conver(up1,pp)
4.200 ns (0 allocations: 0 bytes)
Just to be precise, in both cases, I run it several times and I choose representative numbers for each line.
Does it mean that all the time is due allocations during the compilation?
Start going through the "performance tips" as advised by #DNF but below you will find most important comments for your code.
Vectorize vector assignments - a small dot makes big difference
julia> julia> a = rand(3,4);
julia> #btime $a[3,:] = $a[3,:] ./ 2;
40.726 ns (2 allocations: 192 bytes)
julia> #btime $a[3,:] .= $a[3,:] ./ 2;
20.562 ns (1 allocation: 96 bytes)
Use views when doing something with subarrays:
julia> #btime sum($a[3,:]);
18.719 ns (1 allocation: 96 bytes)
julia> #btime sum(#view($a[3,:]));
5.600 ns (0 allocations: 0 bytes)
Your code around a lines polnew[polnew .< p.lC] .= p.lC[polnew .< p.lC]; will make much less allocations when you do it with a for loop over each element of polnew
#simd will have no effect on conditionals (point 3) neither when code is calling complex external functions
I want to give an update about this problem. I made two main changes to my code: (i) I define my own linear interpolation function and (ii) I include the check of bounds in the interpolation.
With this the new function three is
function interp0!(bt::Array{Float64},ct::Array{Float64},x::Upd1, p::Parm)
polold = x.pol;
polnew = similar(x.pol);
#inbounds #views for col in 1:size(bt,2)
polnew[:,col] = myint(bt[:,col], ct[:,col],x.b[:],p.lC[:,col],p.uC[:,col]);
end
dif = maximum(abs.(polnew - polold));
return polnew,dif
end
And the interpolation is now:
function myint(x0,y0,x1,ly,uy)
y1 = similar(x1);
n = size(x0,1);
j = 1;
#simd for i in eachindex(x1)
while (j <= n) && (x1[i] > x0[j])
j+=1;
end
if j == 1
y1[i] = y0[1] + ((y0[2]-y0[1])/(x0[2]-x0[1]))*(x1[i]-x0[1]) ;
elseif j == n+1
y1[i] = y0[n] + ((y0[n]-y0[n-1])/(x0[n]-x0[n-1]))*(x1[i]-x0[n]);
else
y1[i] = y0[j-1]+ ((x1[i]-x0[j-1])/(x0[j]-x0[j-1]))*(y0[j]-y0[j-1]);
end
y1[i] > uy[i] ? y1[i] = uy[i] : nothing;
y1[i] < ly[i] ? y1[i] = ly[i] : nothing;
end
return y1;
end
As you can see, I am taking advantage (and assuming) that both vectors that we use as basis are ordered while the two last lines in the outer loops checks the bounds imposed by lC and uC.
With that I get the following total time
up1 = Upd1(c0,b,1);
#time conver(up1,pp)
0.734630 seconds (28.93 k allocations: 752.214 MiB, 3.82% gc time)
up1 = Upd1(c0,b,1);
#btime conver(up1,pp)
4.200 ns (0 allocations: 0 bytes)
which is almost as twice faster with ~8% of the total allocations. the use of views in the loop of the function interp0! also helps a lot.

Issues with setting random seed [duplicate]

This question already has an answer here:
Random numbers keep coming out the same, despite random seed being used
(1 answer)
Closed last year.
I am attempting to write a Montecarlo algorithm to simulate interaction between agents in a population. This algorithm needs to call two random numbers at each iteration (say, 10^9 iterations).
My issue here is that everytime I change the seed (to obtain different realizations), the RNG is giving me the same output (same Montecarlo events). I have tried different ways of implementing it with to no avail.
I am completely new to Fortran and copying this code from MATLAB. Am I doing something wrong in the way I'm implementing this code?
Below is what I tried:
program Gillespie
implicit none
integer*8, parameter :: n_max = 10.0**8 ! max. number of iterations
integer*8 :: t_ext, I_init, S_init, jump, S_now, I_now, i, u
real*8 :: t, N, a0, tau, st, r1, r2
real, parameter :: beta = 1000
real, parameter :: gammma = 99.98
real, parameter :: mu = 0.02
real, parameter :: R0 = beta/(gammma+mu)
integer :: seed = 11
real, dimension(n_max) :: S_new ! susceptible pop. array
real, dimension(n_max) :: I_new ! infected pop. array
real, dimension(n_max) :: t_new ! time array
real, dimension(5) :: events ! events array
open(unit=3, file='SIS_output.dat')
t = 0 ! initial time
N = 40 ! initial population size
jump = 1 ! time increment (save in uniform increments)
u = 2
t_ext = 0 ! extiction time
I_init = 2 ! initial infected pop.
S_init = N-I_init ! initial susceptible pop.
S_now = S_init
I_now = I_init
S_new(1) = S_init ! initialize susceptibles array
I_new(1) = I_init ! initialize infected array
t_new(1) = t ! initialize time array
write(3,*) t_new(1), S_new(1), I_new(1) ! write i.c. to array
call random_seed(seed)
do i=2, n_max
call random_number(r1)
call random_number(r2)
events(1) = mu*N ! Birth(S)
events(2) = mu*S_now ! Death(S)
events(3) = mu*I_now ! Death(I)
events(4) = (beta*S_now*I_now)/N ! Infection
events(5) = gammma*I_now ! Recovery
a0 = events(1)+events(2)+events(3)+events(4)+events(5)
tau = LOG(1/r1)*(1/a0) ! time increment
t = t + tau ! update time
st = r2*a0 ! stochastic time???
! update the populations
if (st .le. events(1)) then
S_now = S_now + 1
else if (st .gt. events(1) .AND. st .le.
#(events(1) + events(2))) then
S_now = S_now - 1
else if (st .gt. (events(1) + events(2)) .AND. st .le.
#(events(1) + events(2) + events(3))) then
I_now = I_now - 1
else if (st .gt. (events(1) + events(2) + events(3)) .AND.
#st .le. (events(1) + events(2) + events(3) + events(4))) then
S_now = S_now - 1
I_now = I_now + 1
else
S_now = S_now + 1
I_now = I_now - 1
end if
! save time in uniform increments
if(t .ge. jump) then
t_new(u) = floor(t)
S_new(u) = S_now
I_new(u) = I_now
write(3,*) t_new(u), S_new(u), I_new(u)
jump = jump+1
u = u+1
end if
! write(3,*) t_new(i), S_new(i), I_new(i)
!N = S_now + I_now ! update population post event
if(I_now .le. 0) then ! if extinct, terminate
print *, "extinct"
goto 2
end if
end do
2 end program Gillespie
I appreciate all input. Thanks.
Your call
call random_seed(seed)
is strange. I thought it should not be allowed without a keyword argument, but it actually is inquiring for the size of the random seed array.
For a proper way of initializing seed see the example in
https://gcc.gnu.org/onlinedocs/gfortran/RANDOM_005fSEED.html

How to speed up a double loop in matlab

This is a follow-up question of this question.
The following code takes an enormous amount of time to loop through. Do you have any recommendations for speeding up the process? The variable z has a size of 479x1672 and others will be around 479x12000.
z = HongKongPrices;
zmat = false(size(z));
r = size(z,1);
c = size(z,2);
for k = 1:c
for i = 5:r
if z(i,k) == z(i-4,k) && z(i,k) == z(i-3,k) && z(i,k) == z(end,k)
zmat(i-3:i,k) = 1
end
end
end
z(zmat) = NaN
I am currently running this with MatLab R2014b on an iMac with 3.2 Intel i5 and 16 GB DDR3.
You can use logical indexing here to your advantage to replace the IF-conditional statement and have a small-loop -
%// Get size parameters
[r,c] = size(z);
%// Get logical mask with ones for each column at places that satisfy the condition
%// mentioned as the IF conditional statement in the problem code
mask = z(1:r-4,:) == z(5:r,:) & z(2:r-3,:) == z(5:r,:) & ...
bsxfun(#eq,z(end,:),z(5:r,:));
%// Use logical indexing to map entire z array and set mask elements as NaNs
for k = 1:4
z([false(k,c) ; mask ; false(4-k,c)]) = NaN;
end
Benchmarking
%// Size parameters
nrows = 479;
ncols = 12000;
max_num = 10;
num_iter = 10; %// number of iterations to run each approach,
%// so that runtimes are over 1 sec mark
z_org = randi(max_num,nrows,ncols); %// random input data of specified size
disp('--------------------------------- With proposed approach')
tic
for iter = 1:num_iter
z = z_org;
[..... code from the proposed approach ...]
end
toc, clear z k mask r c
disp('--------------------------------- With original approach')
tic
for iter = 1:num_iter
z = z_org;
[..... code from the problem ...]
end
toc
Results
Case # 1: z as 479 x 1672 (num_iter = 50)
--------------------------------- With proposed approach
Elapsed time is 1.285337 seconds.
--------------------------------- With original approach
Elapsed time is 2.008256 seconds.
Case # 2: z as 479 x 12000 (num_iter = 10)
--------------------------------- With proposed approach
Elapsed time is 1.941858 seconds.
--------------------------------- With original approach
Elapsed time is 2.897006 seconds.

Checking for termination when converting real to rational

Recently I found this in some code I wrote a few years ago. It was used to rationalize a real value (within a tolerance) by determining a suitable denominator and then checking if the difference between the original real and the rational was small enough.
Edit to clarify : I actually don't want to convert all real values. For instance I could choose a max denominator of 14, and a real value that equals 7/15 would stay as-is. It's not as clear that as it's an outside variable in the algorithms I wrote here.
The algorithm to get the denominator was this (pseudocode):
denominator(x)
frac = fractional part of x
recip = 1/frac
if (frac < tol)
return 1
else
return recip * denominator(recip)
end
end
Seems to be based on continued fractions although it became clear on looking at it again that it was wrong. (It worked for me because it would eventually just spit out infinity, which I handled outside, but it would be often really slow.) The value for tol doesn't really do anything except in the case of termination or for numbers that end up close. I don't think it's relatable to the tolerance for the real - rational conversion.
I've replaced it with an iterative version that is not only faster but I'm pretty sure it won't fail theoretically (d = 1 to start with and fractional part returns a positive, so recip is always >= 1) :
denom_iter(x d)
return d if d > maxd
frac = fractional part of x
recip = 1/frac
if (frac = 0)
return d
else
return denom_iter(recip d*recip)
end
end
What I'm curious to know if there's a way to pick the maxd that will ensure that it converts all values that are possible for a given tolerance. I'm assuming 1/tol but don't want to miss something. I'm also wondering if there's an way in this approach to actually limit the denominator size - this allows some denominators larger than maxd.
This can be considered a 2D minimization problem on error:
ArgMin ( r - q / p ), where r is real, q and p are integers
I suggest the use of Gradient Descent algorithm . The gradient in this objective function is:
f'(q, p) = (-1/p, q/p^2)
The initial guess r_o can be q being the closest integer to r, and p being 1.
The stopping condition can be thresholding of the error.
The pseudo-code of GD can be found in wiki: http://en.wikipedia.org/wiki/Gradient_descent
If the initial guess is close enough, the objective function should be convex.
As Jacob suggested, this problem can be better solved by minimizing the following error function:
ArgMin ( p * r - q ), where r is real, q and p are integers
This is linear programming, which can be efficiently solved by any ILP (Integer Linear Programming) solvers. GD works on non-linear cases, but lack efficiency in linear problems.
Initial guesses and stopping condition can be similar to stated above. Better choice can be obtained for individual choice of solver.
I suggest you should still assume convexity near the local minimum, which can greatly reduce cost. You can also try Simplex method, which is great on linear programming problem.
I give credit to Jacob on this.
A problem similar to this is solved in the Approximations section beginning ca. page 28 of Bill Gosper's Continued Fraction Arithmetic document. (Ref: postscript file; also see text version, from line 1984.) The general idea is to compute continued-fraction approximations of the low-end and high-end range limiting numbers, until the two fractions differ, and then choose a value in the range of those two approximations. This is guaranteed to give a simplest fraction, using Gosper's terminology.
The python code below (program "simpleden") implements a similar process. (It probably is not as good as Gosper's suggested implementation, but is good enough that you can see what kind of results the method produces.) The amount of work done is similar to that for Euclid's algorithm, ie O(n) for numbers with n bits, so the program is reasonably fast. Some example test cases (ie the program's output) are shown after the code itself. Note, function simpleratio(vlo, vhi) as shown here returns -1 if vhi is smaller than vlo.
#!/usr/bin/env python
def simpleratio(vlo, vhi):
rlo, rhi, eps = vlo, vhi, 0.0000001
if vhi < vlo: return -1
num = denp = 1
nump = den = 0
while 1:
klo, khi = int(rlo), int(rhi)
if klo != khi or rlo-klo < eps or rhi-khi < eps:
tlo = denp + klo * den
thi = denp + khi * den
if tlo < thi:
return tlo + (rlo-klo > eps)*den
elif thi < tlo:
return thi + (rhi-khi > eps)*den
else:
return tlo
nump, num = num, nump + klo * num
denp, den = den, denp + klo * den
rlo, rhi = 1/(rlo-klo), 1/(rhi-khi)
def test(vlo, vhi):
den = simpleratio(vlo, vhi);
fden = float(den)
ilo, ihi = int(vlo*den), int(vhi*den)
rlo, rhi = ilo/fden, ihi/fden;
izok = 'ok' if rlo <= vlo <= rhi <= vhi else 'wrong'
print '{:4d}/{:4d} = {:0.8f} vlo:{:0.8f} {:4d}/{:4d} = {:0.8f} vhi:{:0.8f} {}'.format(ilo,den,rlo,vlo, ihi,den,rhi,vhi, izok)
test (0.685, 0.695)
test (0.685, 0.7)
test (0.685, 0.71)
test (0.685, 0.75)
test (0.685, 0.76)
test (0.75, 0.76)
test (2.173, 2.177)
test (2.373, 2.377)
test (3.484, 3.487)
test (4.0, 4.87)
test (4.0, 8.0)
test (5.5, 5.6)
test (5.5, 6.5)
test (7.5, 7.3)
test (7.5, 7.5)
test (8.534537, 8.534538)
test (9.343221, 9.343222)
Output from program:
> ./simpleden
8/ 13 = 0.61538462 vlo:0.68500000 9/ 13 = 0.69230769 vhi:0.69500000 ok
6/ 10 = 0.60000000 vlo:0.68500000 7/ 10 = 0.70000000 vhi:0.70000000 ok
6/ 10 = 0.60000000 vlo:0.68500000 7/ 10 = 0.70000000 vhi:0.71000000 ok
2/ 4 = 0.50000000 vlo:0.68500000 3/ 4 = 0.75000000 vhi:0.75000000 ok
2/ 4 = 0.50000000 vlo:0.68500000 3/ 4 = 0.75000000 vhi:0.76000000 ok
3/ 4 = 0.75000000 vlo:0.75000000 3/ 4 = 0.75000000 vhi:0.76000000 ok
36/ 17 = 2.11764706 vlo:2.17300000 37/ 17 = 2.17647059 vhi:2.17700000 ok
18/ 8 = 2.25000000 vlo:2.37300000 19/ 8 = 2.37500000 vhi:2.37700000 ok
114/ 33 = 3.45454545 vlo:3.48400000 115/ 33 = 3.48484848 vhi:3.48700000 ok
4/ 1 = 4.00000000 vlo:4.00000000 4/ 1 = 4.00000000 vhi:4.87000000 ok
4/ 1 = 4.00000000 vlo:4.00000000 8/ 1 = 8.00000000 vhi:8.00000000 ok
11/ 2 = 5.50000000 vlo:5.50000000 11/ 2 = 5.50000000 vhi:5.60000000 ok
5/ 1 = 5.00000000 vlo:5.50000000 6/ 1 = 6.00000000 vhi:6.50000000 ok
-7/ -1 = 7.00000000 vlo:7.50000000 -7/ -1 = 7.00000000 vhi:7.30000000 wrong
15/ 2 = 7.50000000 vlo:7.50000000 15/ 2 = 7.50000000 vhi:7.50000000 ok
8030/ 941 = 8.53347503 vlo:8.53453700 8031/ 941 = 8.53453773 vhi:8.53453800 ok
24880/2663 = 9.34284641 vlo:9.34322100 24881/2663 = 9.34322193 vhi:9.34322200 ok
If, rather than the simplest fraction in a range, you seek the best approximation given some upper limit on denominator size, consider code like the following, which replaces all the code from def test(vlo, vhi) forward.
def smallden(target, maxden):
global pas
pas = 0
tol = 1/float(maxden)**2
while 1:
den = simpleratio(target-tol, target+tol);
if den <= maxden: return den
tol *= 2
pas += 1
# Test driver for smallden(target, maxden) routine
import random
totalpass, trials, passes = 0, 20, [0 for i in range(20)]
print 'Maxden Num Den Num/Den Target Error Passes'
for i in range(trials):
target = random.random()
maxden = 10 + round(10000*random.random())
den = smallden(target, maxden)
num = int(round(target*den))
got = float(num)/den
print '{:4d} {:4d}/{:4d} = {:10.8f} = {:10.8f} + {:12.9f} {:2}'.format(
int(maxden), num, den, got, target, got - target, pas)
totalpass += pas
passes[pas-1] += 1
print 'Average pass count: {:0.3}\nPass histo: {}'.format(
float(totalpass)/trials, passes)
In production code, drop out all the references to pas (etc.), ie, drop out pass-counting code.
The routine smallden is given a target value and a maximum value for allowed denominators. Given maxden possible choices of denominators, it's reasonable to suppose that a tolerance on the order of 1/maxden² can be achieved. The pass-counts shown in the following typical output (where target and maxden were set via random numbers) illustrate that such a tolerance was reached immediately more than half the time, but in other cases tolerances 2 or 4 or 8 times as large were used, requiring extra calls to simpleratio. Note, the last two lines of output from a 10000-number test run are shown following the complete output of a 20-number test run.
Maxden Num Den Num/Den Target Error Passes
1198 32/ 509 = 0.06286837 = 0.06286798 + 0.000000392 1
2136 115/ 427 = 0.26932084 = 0.26932103 + -0.000000185 1
4257 839/2670 = 0.31423221 = 0.31423223 + -0.000000025 1
2680 449/ 509 = 0.88212181 = 0.88212132 + 0.000000486 3
2935 440/1853 = 0.23745278 = 0.23745287 + -0.000000095 1
6128 347/1285 = 0.27003891 = 0.27003899 + -0.000000077 3
8041 1780/4243 = 0.41951449 = 0.41951447 + 0.000000020 2
7637 3926/7127 = 0.55086292 = 0.55086293 + -0.000000010 1
3422 27/ 469 = 0.05756930 = 0.05756918 + 0.000000113 2
1616 168/1507 = 0.11147976 = 0.11147982 + -0.000000061 1
260 62/ 123 = 0.50406504 = 0.50406378 + 0.000001264 1
3775 52/3327 = 0.01562970 = 0.01562750 + 0.000002195 6
233 6/ 13 = 0.46153846 = 0.46172772 + -0.000189254 5
3650 3151/3514 = 0.89669892 = 0.89669890 + 0.000000020 1
9307 2943/7528 = 0.39094049 = 0.39094048 + 0.000000013 2
962 206/ 225 = 0.91555556 = 0.91555496 + 0.000000594 1
2080 564/1975 = 0.28556962 = 0.28556943 + 0.000000190 1
6505 1971/2347 = 0.83979548 = 0.83979551 + -0.000000022 1
1944 472/ 833 = 0.56662665 = 0.56662696 + -0.000000305 2
3244 291/1447 = 0.20110574 = 0.20110579 + -0.000000051 1
Average pass count: 1.85
Pass histo: [12, 4, 2, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
The last two lines of output from a 10000-number test run:
Average pass count: 1.77
Pass histo: [56659, 25227, 10020, 4146, 2072, 931, 497, 233, 125, 39, 33, 17, 1, 0, 0, 0, 0, 0, 0, 0]

Resources