Improve Fortran computation time [closed]

I have this snippet of code that I originally posted for review. Now I am wondering whether it could be optimised. In particular, the question that came up is:
"Would it be better to pre-initialise all the single terms of that big computation instead of computing them directly inside it (i.e. the sqrt() calls)?"
In other words, is there a trade-off between memory allocation and CPU time when calculations are executed within other calculations?
Here is the code snippet (EDIT):
do id = 1, 3
   BF(idx1,idx2,idx3) = BF(idx1,idx2,idx3) +&
      ! i-j, i-k
      2*bsa_wind_coeff(ilib,4+id,inode)*bsa_wind_coeff(jlib,1+id,jnode)*bsa_wind_coeff(klib,1+id,knode)* &
      (nodCorr(id,p1)* sqrt(SU(i1,id,inode)*SU(i1,id,jnode))) * (nodCorr(id,p2)* sqrt(SU(i2,id,inode)*SU(i2,id,knode)) )+&
      ! i-j, j-k
      2*bsa_wind_coeff(ilib,1+id,inode)*bsa_wind_coeff(jlib,4+id,jnode)*bsa_wind_coeff(klib,1+id,knode)* &
      (nodCorr(id,p1)* sqrt(SU12(id,inode)*SU12(id,jnode))) * (nodCorr(id,p3)* sqrt(SU(i2,id,jnode)*SU(i2,id,knode)) )+&
      ! i-k, j-k
      2*bsa_wind_coeff(ilib,1+id,inode)*bsa_wind_coeff(jlib,1+id,jnode)*bsa_wind_coeff(klib,4+id,knode)* &
      (nodCorr(id,p2)* sqrt(SU12(id,inode)*SU12(id,knode))) * (nodCorr(id,p3)* sqrt(SU(i1,id,jnode)*SU(i1,id,knode)) )
enddo
! dir 1-2
BF(idx1,idx2,idx3) = BF(idx1,idx2,idx3) +&
! i-k, i-j
bsa_wind_coeff(ilib,8,inode)*bsa_wind_coeff(jlib,3,jnode)*bsa_wind_coeff(klib,2,knode)* &
(nodCorr(1,p2)* sqrt( SU(i2,1,inode)*SU(i2,1,knode) )) * (nodCorr(2,p1)* sqrt( SU(i1,2,inode)*SU(i1,2,jnode) ) )+&
! i-j, j-k
bsa_wind_coeff(ilib,2,inode)*bsa_wind_coeff(jlib,8,jnode)*bsa_wind_coeff(klib,3,knode)* &
(nodCorr(1,p2)* sqrt( SU12(1,inode)*SU12(1,jnode) )) * (nodCorr(2,p3)* sqrt( SU(i2,2,jnode)*SU(i2,2,knode) ) )+&
! j-k, i-k
bsa_wind_coeff(ilib,3,inode)*bsa_wind_coeff(jlib,2,jnode)*bsa_wind_coeff(klib,8,knode)* &
(nodCorr(1,p3)* sqrt( SU(i1,1,jnode)*SU(i1,1,knode) )) * (nodCorr(2,p2)* sqrt( SU12(2,inode)*SU12(2,knode) ) )+&
! i-j, i-k
bsa_wind_coeff(ilib,8,inode)*bsa_wind_coeff(jlib,2,jnode)*bsa_wind_coeff(klib,3,knode)* &
(nodCorr(1,p1)* sqrt( SU(i1,1,inode)*SU(i1,1,jnode) )) * (nodCorr(2,p2)* sqrt( SU(i2,2,inode)*SU(i2,2,knode) ) )+&
! j-k, i-j
bsa_wind_coeff(ilib,3,inode)*bsa_wind_coeff(jlib,8,jnode)*bsa_wind_coeff(klib,2,knode)* &
(nodCorr(1,p3)* sqrt( SU(i2,1,jnode)*SU(i2,1,knode) )) * (nodCorr(2,p1)* sqrt( SU12(2,inode)*SU12(2,jnode) ) )+&
! i-k, j-k
bsa_wind_coeff(ilib,2,inode)*bsa_wind_coeff(jlib,3,jnode)*bsa_wind_coeff(klib,8,knode)* &
(nodCorr(1,p2)* sqrt( SU12(1,inode)*SU12(1,knode) )) * (nodCorr(2,p3)* sqrt( SU(i1,2,jnode)*SU(i1,2,knode) ) )
! dir 1-3
BF(idx1,idx2,idx3) = BF(idx1,idx2,idx3) +&
! i-k, i-j
bsa_wind_coeff(ilib,9,inode)*bsa_wind_coeff(jlib,4,jnode)*bsa_wind_coeff(klib,2,knode)* &
(nodCorr(1,p2)* sqrt( SU(i2,1,inode)*SU(i2,1,knode) )) * (nodCorr(3,p1)* sqrt( SU(i1,3,inode)*SU(i1,3,jnode) ) )+&
! i-j, j-k
bsa_wind_coeff(ilib,2,inode)*bsa_wind_coeff(jlib,9,jnode)*bsa_wind_coeff(klib,4,knode)* &
(nodCorr(1,p2)* sqrt( SU12(1,inode)*SU12(1,jnode) )) * (nodCorr(3,p3)* sqrt( SU(i2,3,jnode)*SU(i2,3,knode) ) )+&
! j-k, i-k
bsa_wind_coeff(ilib,4,inode)*bsa_wind_coeff(jlib,2,jnode)*bsa_wind_coeff(klib,9,knode)* &
(nodCorr(1,p3)* sqrt( SU(i1,1,jnode)*SU(i1,1,knode) )) * (nodCorr(3,p2)* sqrt( SU12(3,inode)*SU12(3,knode) ) )+&
! i-j, i-k
bsa_wind_coeff(ilib,9,inode)*bsa_wind_coeff(jlib,2,jnode)*bsa_wind_coeff(klib,4,knode)* &
(nodCorr(1,p1)* sqrt( SU(i1,1,inode)*SU(i1,1,jnode) )) * (nodCorr(3,p2)* sqrt( SU(i2,3,inode)*SU(i2,3,knode) ) )+&
! j-k, i-j
bsa_wind_coeff(ilib,4,inode)*bsa_wind_coeff(jlib,9,jnode)*bsa_wind_coeff(klib,2,knode)* &
(nodCorr(1,p3)* sqrt( SU(i2,1,jnode)*SU(i2,1,knode) )) * (nodCorr(3,p1)* sqrt( SU12(3,inode)*SU12(3,jnode) ) )+&
! i-k, j-k
bsa_wind_coeff(ilib,2,inode)*bsa_wind_coeff(jlib,4,jnode)*bsa_wind_coeff(klib,9,knode)* &
(nodCorr(1,p2)* sqrt( SU12(1,inode)*SU12(1,knode) )) * (nodCorr(3,p3)* sqrt( SU(i1,3,jnode)*SU(i1,3,knode) ) )
! dir 2-3
BF(idx1,idx2,idx3) = BF(idx1,idx2,idx3) +&
bsa_wind_coeff(ilib,10,inode)*bsa_wind_coeff(jlib,4,jnode)*bsa_wind_coeff(klib,3,knode)* &
(nodCorr(2,p2)* sqrt( SU(i2,2,inode)*SU(i2,2,knode) )) * (nodCorr(3,p1)* sqrt( SU(i1,3,inode)*SU(i1,3,jnode) ) )+&
bsa_wind_coeff(ilib,3,inode)*bsa_wind_coeff(jlib,10,jnode)*bsa_wind_coeff(klib,4,knode)* &
(nodCorr(2,p2)* sqrt( SU12(2,inode)*SU12(2,jnode) )) * (nodCorr(3,p3)* sqrt( SU(i2,3,jnode)*SU(i2,3,knode) ) )+&
bsa_wind_coeff(ilib,4,inode)*bsa_wind_coeff(jlib,3,jnode)*bsa_wind_coeff(klib,10,knode)* &
(nodCorr(2,p3)* sqrt( SU(i1,2,jnode)*SU(i1,2,knode) )) * (nodCorr(3,p2)* sqrt( SU12(3,inode)*SU12(3,knode) ) )+&
bsa_wind_coeff(ilib,10,inode)*bsa_wind_coeff(jlib,3,jnode)*bsa_wind_coeff(klib,4,knode)* &
(nodCorr(2,p1)* sqrt( SU(i1,2,inode)*SU(i1,2,jnode) )) * (nodCorr(3,p2)* sqrt( SU(i2,3,inode)*SU(i2,3,knode) ) )+&
bsa_wind_coeff(ilib,4,inode)*bsa_wind_coeff(jlib,10,jnode)*bsa_wind_coeff(klib,3,knode)* &
(nodCorr(2,p3)* sqrt( SU(i2,2,jnode)*SU(i2,2,knode) )) * (nodCorr(3,p1)* sqrt( SU12(3,inode)*SU12(3,jnode) ) )+&
bsa_wind_coeff(ilib,3,inode)*bsa_wind_coeff(jlib,4,jnode)*bsa_wind_coeff(klib,10,knode)* &
(nodCorr(2,p2)* sqrt( SU12(2,inode)*SU12(2,knode) )) * (nodCorr(3,p3)* sqrt( SU(i1,3,jnode)*SU(i1,3,knode) ) )
Some NOTES:
bsa_wind_coeff is a common variable (declared in a module), while all the others appearing in the snippet are passed directly as arguments (i.e. I assume a local copy is created). Within the procedure itself, the indexes ilib, jlib, klib, inode, jnode, knode are looped over. This computation is called many times from an external procedure.
EDIT:
All variables here are static in the sense that they are computed once (before this procedure is called) and then only indexed into, as you can see. That is, the same arrays are passed at each call.

In the comments you say you want a "conceptual" answer as to when pre-computing is faster than not pre-computing.
Unfortunately, statically determining which of two methods is better is often difficult, for a couple of reasons. Firstly, optimising compilers are very clever and do a lot of fairly unpredictable things behind the scenes; in particular, a repeated subexpression such as one of your sqrt() factors may already be evaluated only once. Secondly, there are a lot of external variables: the details of the compiler, the CPU, the memory hierarchy and so on will all change the results.
The answer to the question "which of these two algorithms is faster" is usually "try both and see". You should at least time your code, but using a profiling tool is ideal. Not only will a profiling tool tell you how long the different approaches take, it will also help narrow down which parts of the two approaches are taking the most time. This lets you know which parts of the code you should optimise for the best results.
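If you do want to experiment, the natural first variant to try is hoisting the repeated nodCorr(...)*sqrt(...) factors out of the big expression into named temporaries. A minimal sketch of that restructuring for the do id loop, using the names from your snippet (the cs_* temporaries and their kind are my own, not part of your code):

real(8) :: cs_a, cs_b, cs_c, cs_d, cs_e, cs_f   ! declared with the other locals; kind is an assumption

do id = 1, 3
   ! Each factor below appears verbatim in the original expression.
   cs_a = nodCorr(id,p1) * sqrt(SU(i1,id,inode)*SU(i1,id,jnode))
   cs_b = nodCorr(id,p2) * sqrt(SU(i2,id,inode)*SU(i2,id,knode))
   cs_c = nodCorr(id,p1) * sqrt(SU12(id,inode)*SU12(id,jnode))
   cs_d = nodCorr(id,p3) * sqrt(SU(i2,id,jnode)*SU(i2,id,knode))
   cs_e = nodCorr(id,p2) * sqrt(SU12(id,inode)*SU12(id,knode))
   cs_f = nodCorr(id,p3) * sqrt(SU(i1,id,jnode)*SU(i1,id,knode))
   ! Same arithmetic as the original statement; only the grouping changes.
   BF(idx1,idx2,idx3) = BF(idx1,idx2,idx3) + 2 * ( &
      bsa_wind_coeff(ilib,4+id,inode)*bsa_wind_coeff(jlib,1+id,jnode)*bsa_wind_coeff(klib,1+id,knode)*cs_a*cs_b + &
      bsa_wind_coeff(ilib,1+id,inode)*bsa_wind_coeff(jlib,4+id,jnode)*bsa_wind_coeff(klib,1+id,knode)*cs_c*cs_d + &
      bsa_wind_coeff(ilib,1+id,inode)*bsa_wind_coeff(jlib,1+id,jnode)*bsa_wind_coeff(klib,4+id,knode)*cs_e*cs_f )
enddo

Then time the two variants with the cpu_time or system_clock intrinsics (or a profiler such as gprof) under your usual optimisation flags. At -O2 and above, compilers like gfortran and ifort typically perform this common-subexpression elimination themselves, in which case the measured difference will be negligible.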

Related

Finite difference method for solving the Klein-Gordon equation in Matlab

I am trying to numerically solve the Klein-Gordon equation that can be found here. To make sure I solved it correctly, I am comparing it with an analytical solution that can be found at the same link. I am using the finite difference method and Matlab. The initial spatial conditions are known, not the initial time conditions.
I start off by initializing the constants and the space-time coordinate system:
close all
clear
clc
%% Constant parameters
A = 2;
B = 3;
lambda = 2;
mu = 3;
a = 4;
b = - (lambda^2 / a^2) + mu^2;
%% Coordinate system
number_of_discrete_time_steps = 300;
t = linspace(0, 2, number_of_discrete_time_steps);
dt = t(2) - t(1);
number_of_discrete_space_steps = 100;
x = transpose( linspace(0, 1, number_of_discrete_space_steps) );
dx = x(2) - x(1);
Next, I define and plot the analytical solution:
%% Analytical solution
Wa = cos(lambda * x) * ( A * cos(mu * t) + B * sin(mu * t) );
figure('Name', 'Analytical solution');
surface(t, x, Wa, 'edgecolor', 'none');
colormap(jet(256));
colorbar;
xlabel('t');
ylabel('x');
title('Wa(x, t) - analytical solution');
The plot of the analytical solution is shown here.
Finally, I define the initial spatial conditions, run the finite difference algorithm, and plot the solution:
%% Numerical solution
Wn = zeros(number_of_discrete_space_steps, number_of_discrete_time_steps);
Wn(1, :) = Wa(1, :);
Wn(2, :) = Wa(2, :);
for j = 2 : (number_of_discrete_time_steps - 1)
    for i = 2 : (number_of_discrete_space_steps - 1)
        Wn(i + 1, j) = dx^2 / a^2 ...
            * ( ( Wn(i, j + 1) - 2 * Wn(i, j) + Wn(i, j - 1) ) / dt^2 + b * Wn(i - 1, j - 1) ) ...
            + 2 * Wn(i, j) - Wn(i - 1, j);
    end
end
figure('Name', 'Numerical solution');
surface(t, x, Wn, 'edgecolor', 'none');
colormap(jet(256));
colorbar;
xlabel('t');
ylabel('x');
title('Wn(x, t) - numerical solution');
The plot of the numerical solution is shown here.
The two plotted graphs are not the same, which is proof that I did something wrong in the algorithm. The problem is, I can't find the errors. Please help me find them.
To summarize, please help me change the code so that the two plotted graphs become approximately the same. Thank you for your time.
The finite difference discretization of w_tt = a^2 * w_xx - b*w is
( w(i,j+1) - 2*w(i,j) + w(i,j-1) ) / dt^2
= a^2 * ( w(i+1,j) - 2*w(i,j) + w(i-1,j) ) / dx^2 - b*w(i,j)
Solving for w(i,j+1) in your indexing, this gives the recursion equation
w(i,j+1) = dt^2 * ( (a/dx)^2 * ( w(i+1,j) - 2*w(i,j) + w(i-1,j) ) - b*w(i,j) )
+2*w(i,j) - w(i,j-1)
The stability condition requires at least a*dt/dx < 1 (the CFL condition). For the present parameters this is not satisfied; they give this ratio as about 2.6. Increasing the time discretization to 1000 points is sufficient.
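For reference, with the values in the question (a = 4, dt = 2/299, dx = 1/99) the ratio works out as

a*dt/dx = 4 * (2/299) / (1/99) = 792/299 ≈ 2.65

so on the order of 800 or more time points are needed to bring it below 1; 1000 points give a*dt/dx ≈ 0.79.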
Next up are the boundary conditions. Besides the two leading columns for times 0 and dt, one also needs to set the values at the boundaries x=0 and x=1. Copy them from the exact solution as well:
Wn(:,1:2) = Wa(:,1:2);
Wn(1,:)=Wa(1,:);
Wn(end,:)=Wa(end,:);
Then also correct the definition (and use) of b to that in the source
b = - (lambda^2 * a^2) + mu^2;
and the resulting numerical image looks identical to the analytical image in the color plot. The difference plot confirms the closeness.

How to convert rotation matrix to quaternion

Can I convert a rotation matrix to a quaternion?
I know how to convert a quaternion to a rotation matrix, but I can't find a way to do the opposite.
I can show you the code I use to convert a quaternion to a rotation matrix below.
Example (C++): Quaterniond quat; MatrixXd t; t = quat.matrix();
I want to know how to convert a rotation matrix to a quaternion like this.
A numerically stable algorithm for converting a direction cosine matrix D into a quaternion q is as follows:
T = D(1,1) + D(2,2) + D(3,3)
M = max( D(1,1), D(2,2), D(3,3), T )
qmax = (1/2) * sqrt( 1 - T + 2*M )
if( M == D(1,1) )
    qx = qmax
    qy = ( D(1,2) + D(2,1) ) / ( 4*qmax )
    qz = ( D(1,3) + D(3,1) ) / ( 4*qmax )
    qw = ±( D(3,2) - D(2,3) ) / ( 4*qmax )
elseif( M == D(2,2) )
    qx = ( D(1,2) + D(2,1) ) / ( 4*qmax )
    qy = qmax
    qz = ( D(2,3) + D(3,2) ) / ( 4*qmax )
    qw = ±( D(1,3) - D(3,1) ) / ( 4*qmax )
elseif( M == D(3,3) )
    qx = ( D(1,3) + D(3,1) ) / ( 4*qmax )
    qy = ( D(2,3) + D(3,2) ) / ( 4*qmax )
    qz = qmax
    qw = ±( D(2,1) - D(1,2) ) / ( 4*qmax )
else
    qx = ±( D(3,2) - D(2,3) ) / ( 4*qmax )
    qy = ±( D(1,3) - D(3,1) ) / ( 4*qmax )
    qz = ±( D(2,1) - D(1,2) ) / ( 4*qmax )
    qw = qmax
endif
Note that there is a sign ambiguity inherent in quaternions. The algorithm above arbitrarily picks the sign of the largest element qmax to be positive, but it is equally valid to pick this sign as negative (i.e., essentially flipping all of the signs of the result). It is up to the user to determine which is the more appropriate selection based on the application.
The ± selection is made based on the quaternion convention you are using:
Choose + for Hamilton Left Chain Convention or JPL Right Chain Convention
Choose - for Hamilton Right Chain Convention or JPL Left Chain Convention
Hamilton Convention means the quaternion elements i, j, k behave in a right-handed manner for multiplication (like cross products):
i * j = k , j * k = i , k * i = j
JPL Convention means the quaternion elements i, j, k behave in a left-handed manner for multiplication (negative of cross products):
i * j = -k , j * k = -i , k * i = -j
Right Chain means the quaternion rotation operation on a vector has the unmodified quaternion on the right side:
D * v1 = v2 = q^-1 * v1 * q
Left Chain means the quaternion rotation operation on a vector has the unmodified quaternion on the left side:
D * v1 = v2 = q * v1 * q^-1
For completeness, here is the algorithm for the other direction, converting a quaternion to a direction cosine matrix:
D = (qw^2 - dot(qv,qv))*I3 + 2*qv*qv^T ± 2*qw*Skew(qv)
where ^T means transpose (for outer product in that term) and
qv = [qx]
     [qy]
     [qz]
I3 = [1 0 0]
     [0 1 0]
     [0 0 1]
Skew(qv) = [  0  -qz   qy]
           [ qz    0  -qx]
           [-qy   qx    0]
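For a concrete illustration, here is a sketch of the matrix-to-quaternion algorithm above in Fortran (the language of the main question on this page), with the ± slots resolved for the Hamilton Right Chain convention, i.e. the "-" choice. The subroutine name and the real kind are my own assumptions:

subroutine dcm_to_quat(D, qx, qy, qz, qw)
    ! Numerically stable DCM -> quaternion conversion, per the algorithm above.
    ! Sign convention: Hamilton Right Chain (the "-" branch of the ± slots).
    real(8), intent(in)  :: D(3,3)
    real(8), intent(out) :: qx, qy, qz, qw
    real(8) :: T, M, qmax
    T = D(1,1) + D(2,2) + D(3,3)
    M = max(D(1,1), D(2,2), D(3,3), T)
    qmax = 0.5d0 * sqrt(1.0d0 - T + 2.0d0*M)   ! magnitude of the largest element
    if (M == D(1,1)) then
        qx = qmax
        qy = (D(1,2) + D(2,1)) / (4.0d0*qmax)
        qz = (D(1,3) + D(3,1)) / (4.0d0*qmax)
        qw = -(D(3,2) - D(2,3)) / (4.0d0*qmax)
    else if (M == D(2,2)) then
        qx = (D(1,2) + D(2,1)) / (4.0d0*qmax)
        qy = qmax
        qz = (D(2,3) + D(3,2)) / (4.0d0*qmax)
        qw = -(D(1,3) - D(3,1)) / (4.0d0*qmax)
    else if (M == D(3,3)) then
        qx = (D(1,3) + D(3,1)) / (4.0d0*qmax)
        qy = (D(2,3) + D(3,2)) / (4.0d0*qmax)
        qz = qmax
        qw = -(D(2,1) - D(1,2)) / (4.0d0*qmax)
    else   ! M == T: the trace dominates, so qw is the largest element
        qx = -(D(3,2) - D(2,3)) / (4.0d0*qmax)
        qy = -(D(1,3) - D(3,1)) / (4.0d0*qmax)
        qz = -(D(2,1) - D(1,2)) / (4.0d0*qmax)
        qw = qmax
    end if
end subroutine dcm_to_quat

The exact equality tests are safe here because M was assigned from one of the four candidates, so exactly one comparison holds without round-off issues. For the other conventions, flip the signs on the lines marked with ± in the pseudocode above.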

Mutual recursion analysis

I'm trying to analyze these functions, but I am getting a bit lost. For function f, T(n) = c if n < 1e-5;
if n >= 1e-5 I get T(n) = c2 + T(n/2) + T2(n/2), where T2 is the running time of function g. I'm confused about expanding it: should it be something like
T(n) = ( T(n/2) + T2(n/2) ) * c2 + c
or should I be expanding T2 inside of that?
Here is the code I am trying to analyze.
#include <math.h>

float g(float x);   /* forward declaration, needed for the mutual recursion */

float f(float x) {
    if (fabsf(x) < 1e-5f) {   /* fabsf, not abs: abs() truncates floats to int */
        return x + ((x * x * x) / 2);
    }
    float y = f(x / 2);
    float z = g(x / 2);
    return 2 * y * z;
}

float g(float x) {
    if (fabsf(x) < 1e-5f) {
        return 1 + ((x * x) / 2);
    }
    float y = f(x / 2);
    float z = g(x / 2);
    return (z * z) + (y * y);
}
T1(n) = T1(n / 2) + T2(n / 2) + c1
T2(n) = T1(n / 2)+T2(n / 2) + c2
so we have
T1(n) = O(T2(n))
T1(n) = 2T1(n / 2) + c1
since c1 = O(n^(log2(2) - ε)), the master theorem implies that
T(n) = Θ(n^(log2(2))) = O(n)
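Spelled out, the same conclusion follows by expanding the recurrence directly:

T1(n) = 2*T1(n/2) + c1
      = 4*T1(n/4) + 3*c1
      = ...
      = 2^k * T1(n/2^k) + (2^k - 1)*c1

and taking k = log2(n) gives T1(n) = n*T1(1) + (n-1)*c1 = O(n).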
Even though we are calling two different functions in this code, there is something about them that makes finding the complexity of this recursion easy.
What's happening is that at the top level, if you enter f(), you evaluate x and then call two functions - f() itself and g(). Even if you enter g() first, the same thing happens, i.e. g() calls itself and f().
Since the value of x halves at every level down the tree, the number of levels in this tree is log2(n). Also, every node has 2 children, viz. f(x/2) and g(x/2).
So this is a complete binary tree of height log2(n).
The work done at each node is constant: if the node represents a call to f(), you do 2 * y * z, which is constant; if the node represents a call to g(), you do y*y + z*z, which is also constant.
Hence, all we need to do is find the total number of nodes in a complete binary tree of height log2(n) and we have our complexity.
A perfect binary tree of height h has 2^(h+1) - 1 nodes in total.
In this case that is 2^(log2(n)+1) - 1 nodes.
Also, a^(loga(b)) = b (by a property of logarithms)1.
Hence, the complexity is O(2^(log2(n))) = O(n).
1 See first property in "Cancelling Exponentials" section.

How to calculate the Spatial frequency in Gabor filter?

This question describes the Gabor filter family and its application pretty well, but it says nothing about the wavelength (spatial frequency) of the filter. The Gabor wavelets are created in the following for loop:
for v = 0 : 4
    for u = 1 : 8
        GW = GaborWavelet ( R, C, Kmax, f, u, v, Delt2 ); % Create the Gabor wavelets
        figure( 2 );
        subplot( 5, 8, v * 8 + u ),imshow ( real( GW ) ,[]); % Show the real part of Gabor wavelets
        GW_ALL( v*8+u, :) = GW(:);
    end
    figure ( 3 );
    subplot( 1, 5, v + 1 ),imshow ( abs( GW ),[]); % Show the magnitude of Gabor wavelets
end
I know that the second loop variable, u, is the orientation, in steps of pi/8. However, I don't know how the first loop variable, v, is linked to the spatial frequency (wavelength) of the filter, in pixels per cycle, in this code and its function. Can anyone help?
I found the answer finally. The GaborWavelet function is defined as follows:
function GW = GaborWavelet (R, C, Kmax, f, u, v, Delt2)
    k = ( Kmax / ( f ^ v ) ) * exp( 1i * u * pi / 8 ); % Wave vector
    kn2 = ( abs( k ) ) ^ 2;
    GW = zeros ( R , C );
    for m = -R/2 + 1 : R/2
        for n = -C/2 + 1 : C/2
            GW(m+R/2,n+C/2) = ( kn2 / Delt2 ) * exp( -0.5 * kn2 * ( m ^ 2 + n ^ 2 ) / Delt2) * ( exp( 1i * ( real( k ) * m + imag ( k ) * n ) ) - exp ( -0.5 * Delt2 ) );
        end
    end
Kmax is the maximum frequency, f is the spacing factor and v is the resolution (scale) index. The spacing factor f is usually taken as sqrt(2).
Based on this paper, k = 2*pi*f*exp(i*ϑ), and in the code Kmax = fmax*2*pi, which is not described and is the key to finding the wavelength of the filter. I also read this implementation and noticed that the wavelength can easily be found using f = 1/lambda, where lambda is the wavelength of the sinusoid.
So, for example, if Kmax = pi/2 and v = 0, then k = Kmax*exp(1i*u*pi/8), and considering the above formula, lambda = 2*pi/Kmax = 4 [pixels/cycle].
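More generally, since the code sets |k| = Kmax / f^v, the wavelength at scale v is

lambda(v) = 2*pi / |k| = 2*pi * f^v / Kmax

so with Kmax = pi/2 and f = sqrt(2), the five scales v = 0..4 give wavelengths of 4, 4*sqrt(2), 8, 8*sqrt(2) and 16 pixels/cycle.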

Jacobi method converging then diverging

I am working to solve Poisson's equation (in 2d axisymmetric cylindrical coordinates) using the Jacobi method. The L2 norm decreases from ~1E3 on the first iteration (I have a really bad guess) to ~0.2 very slowly. Then, the L2 norm begins to increase over many iterations.
My geometry is parallel plates with sharp points at r = 0 on both plates. (If that matters).
Is there some error in my code? Do I need to use a different algorithm? (I have a DADI algorithm that is not yet working.)
Here is my Jacobi method algorithm; it is then just wrapped in a while loop.
subroutine Jacobi(PoissonRHS, V, resid)
    implicit none
    real, dimension(0:,0:) :: PoissonRHS, V
    real :: resid
    integer :: i, j, lb, ub
    real, dimension(0:size(V,1)-1, 0:size(V,2)-1) :: oldV
    real :: dr = delta(1)
    real :: dz = delta(2)
    real :: dr2 = (delta(1))**(-2)
    real :: dz2 = (delta(2))**(-2)
    integer :: M = cells(1)
    integer :: N = cells(2)

    oldV = V

    !Note: all of the equations are second-order accurate

    !If at r = 0 and in the computational domain
    ! This is the smoothness condition, dV(r=0)/dr = 0
    V(0,:) = (4.0*oldV(1,:)-oldV(2,:))/3.0

    !If at r = rMax and in the computational domain
    ! This is an approximation and should be fixed to improve accuracy; it should be
    ! lim r->inf V' = 0, while this is V'(r = R) = 0
    V(M, 1:N-1) = 0.5 / (dr2 + dz2) * ( &
        (2.0*dr2)*oldV(M-1,1:N-1) + &
        dz2 * (oldV(M,2:N) + oldV(M,0:N-2)) &
        - PoissonRHS(M,1:N-1))

    do i = 1, M-1
        lb = max(0, nint(lowerBoundary(i * dr) / dz)) + 1
        ub = min(N, nint(upperBoundary(i * dr) / dz)) - 1
        V(i,lb:ub) = 0.5 / (dr2 + dz2) * ( &
            ((1.0 - 0.5/dble(i))*dr2)*oldV(i-1,lb:ub) + &
            ((1.0 + 0.5/dble(i))*dr2)*oldV(i+1,lb:ub) + &
            dz2 * (oldV(i,lb+1:ub+1) + oldV(i,lb-1:ub-1)) &
            - PoissonRHS(i,lb:ub))
        V(i, 0:lb-1) = V0
        V(i, ub+1:N) = VL
    enddo

    !compare to old V values to check for convergence
    resid = sqrt(sum((oldV-V)**2))
    return
end subroutine Jacobi
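For reference, the surrounding while loop mentioned above might look like the following sketch (tol, maxIter and the iteration counter are my own names and values, not the original code):

real    :: resid
real, parameter :: tol = 1.0e-6        ! convergence tolerance (assumed value)
integer :: iter
integer, parameter :: maxIter = 100000 ! safety cap on iterations (assumed value)

resid = huge(1.0)
iter  = 0
do while (resid > tol .and. iter < maxIter)
    call Jacobi(PoissonRHS, V, resid)
    iter = iter + 1
enddo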
Based on additional reading, it seems it was a precision problem, because (for example) I had the expression
V(i,lb:ub) = 0.5 / (dr2 + dz2) * ( &
((1.0 - 0.5/dble(i))*dr2)*oldV(i-1,lb:ub) + &
((1.0 + 0.5/dble(i))*dr2)*oldV(i+1,lb:ub) + &
dz2 * (oldV(i,lb+1:ub+1) + oldV(i,lb-1:ub-1)) &
- PoissonRHS(i,lb:ub))
where dr2 and dz2 (the inverse squared grid spacings) are very large. By distributing these factors I got terms that were ~1, and the code converges (slowly, but that's a function of the mathematics).
So my new code is
subroutine Preconditioned_Jacobi(PoissonRHS, V, resid)
    implicit none
    real, dimension(0:,0:) :: PoissonRHS, V
    real :: resid
    integer :: i, j, lb, ub
    real, dimension(0:size(V,1)-1, 0:size(V,2)-1) :: oldV
    real :: dr = delta(1)
    real :: dz = delta(2)
    real :: dr2 = (delta(1))**(-2)
    real :: dz2 = (delta(2))**(-2)
    real :: b, c, d
    integer :: M = cells(1)
    integer :: N = cells(2)

    b = 0.5*(dr**2)/((dr**2) + (dz**2))
    c = 0.5*(dz**2)/((dr**2) + (dz**2))
    d = -0.5 / (dr2 + dz2)

    oldV = V

    !Note: all of the equations are second-order accurate

    !If at r = 0 and in the computational domain
    ! This is the smoothness condition, dV(r=0)/dr = 0
    V(0,:) = (4.0*oldV(1,:)-oldV(2,:))/3.0 !same as: oldV(0,:) - 2.0/3.0 * (1.5 * oldV(0,:) - 2.0 * oldV(1,:) + 0.5 * oldV(2,:) - 0)

    !If at r = rMax and in the computational domain
    ! This is an approximation and should be fixed to improve accuracy; it should be
    ! lim r->inf V' = 0, while this is V'(r = R) = 0
    V(M,1:N-1) = d*PoissonRHS(M,1:N-1) &
        + 2.0*c * oldV(M-1,1:N-1) &
        + b * ( oldV(M,0:N-2) + oldV(M,2:N) )

    do i = 1, M-1
        lb = max(0, nint(lowerBoundary(i * dr) / dz)) + 1
        ub = min(N, nint(upperBoundary(i * dr) / dz)) - 1
        V(i,lb:ub) = d*PoissonRHS(i,lb:ub) &
            + (c * (1.0-0.5/dble(i)) * oldV(i-1,lb:ub)) &
            + (c * (1.0+0.5/dble(i)) * oldV(i+1,lb:ub)) &
            + b * (oldV(i,lb-1:ub-1) + oldV(i,lb+1:ub+1))
        V(i, 0:lb-1) = V0
        V(i, ub+1:N) = VL
    enddo

    !compare to old V values to check for convergence
    resid = sum(abs(oldV-V))
    return
end subroutine Preconditioned_Jacobi
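On the precision point: if these arrays use the default real kind, another mitigation worth trying (an assumption on my part, not part of the fix above) is to carry out the whole computation in double precision, e.g.

use, intrinsic :: iso_fortran_env, only: wp => real64  ! wp is an assumed kind alias
real(wp), dimension(0:,0:) :: PoissonRHS, V
real(wp) :: resid

With single precision's ~1e-7 relative rounding, a residual that stalls near 0.2 and then drifts upward is consistent with round-off dominating once the iteration error reaches the rounding floor.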
