OpenMP and array summation with Fortran 90 - openmp

I'm trying to to compute pressure tensor of a crystal structure.
To do so, I have to go throught all pair of particle like in the simplify code below
do i=1, atom_number ! sum over atoms i
type1 = ATOMS(i)
do nj=POINT(i), POINT(i+1)-1 ! sum over atoms j of i's atoms list
j = LIST(nj)
type2 = ATOMS(j)
call get_ff_param(type1,type2,Aab,Bab,Cab,Dab)
call distance_sqr2(i,j,r,VECT_R)
call gettensor_rij(i,j,T)
r = sqrt(r)
if (r<coub_cutoff) then
local_sum_real(id+1) = local_sum_real(id+1) + ( erfc(alpha_ewald*r)
derivative = -( erfc(alpha_ewald*r)
ff = - (1.0d0/r)*derivative
STRESS_EWALDD = STRESS_EWALDD + ff * T ! A 3x3 matrix
FDX(i) = FDX(i) + VECT_R(1) * ff
FDY(i) = FDY(i) + VECT_R(2) * ff
FDZ(i) = FDZ(i) + VECT_R(3) * ff
FDX(j) = FDX(j) - VECT_R(1) * ff
FDY(j) = FDY(j) - VECT_R(2) * ff
FDZ(j) = FDZ(j) - VECT_R(3) * ff
end if
end do ! sum over atoms j
sum_kin = sum_kin + ATMMASS(i) * (VX(i)**2 + VY(i)**2 + VZ(i)**2)
call gettensor_v(i,Q)
STRESS_KINETIC = STRESS_KINETIC + ATMMASS(i) * Q
end do
I tried to parallelized this double loop with the "paralle do" directive but got some issue with the tensor STRESS_EWALDD and the forces FDX ....
I therefore tried to assign manualy to each thread a number of particles like in the follwing code but still get the wrong tensor value.
!$omp parallel private(id,j,i,nj,type1,type2,T,Q,r,VECT_R,derivative,Aab,Bab,Cab,Dab,ff)
id = omp_get_thread_num()
do i=id+1, atom_number,nthreads ! sum over atoms i
type1 = ATOMS(i)
do nj=POINT(i), POINT(i+1)-1 ! sum over atoms j of i's atoms list
j = LIST(nj)
type2 = ATOMS(j)
call get_ff_param(type1,type2,Aab,Bab,Cab,Dab)
call distance_sqr2(i,j,r,VECT_R)
call gettensor_rij(i,j,T)
r = sqrt(r)
if (r<coub_cutoff) then
local_sum_real(id+1) = local_sum_real(id+1) + ( erfc(alpha_ewald*r)
derivative = -( erfc(alpha_ewald*r)
ff = - (1.0d0/r)*derivative
local_STRESS_EWALDD(id+1,:,:) = local_STRESS_EWALDD(id+1,:,:) + ff * T
FDX(i) = FDX(i) + VECT_R(1) * ff
FDY(i) = FDY(i) + VECT_R(2) * ff
FDZ(i) = FDZ(i) + VECT_R(3) * ff
FDX(j) = FDX(j) - VECT_R(1) * ff
FDY(j) = FDY(j) - VECT_R(2) * ff
FDZ(j) = FDZ(j) - VECT_R(3) * ff
end if
end do ! sum over atoms j
local_sum_kin(id+1) = local_sum_kin(id+1) + ATMMASS(i) * (VX(i)**2 + VY(i)**2 + VZ(i)**2)
call gettensor_v(i,Q)
local_STRESS_KINETIC(id+1,:,:) = local_STRESS_KINETIC(id+1,:,:) + ATMMASS(i) * Q
end do ! sum over atoms i
!$omp end parallel
do i=1,nthreads
sum_real = sum_real + local_sum_real(i)
sum_virial_buck = sum_virial_buck + local_sum_virial_buck(i)
STRESS_BUCK = STRESS_BUCK + local_STRESS_BUCK(i,:,:)
STRESS_EWALDD = STRESS_EWALDD + local_STRESS_EWALDD(i,:,:)
sum_buck = sum_buck + local_sum_buck(i)
sum_kin = sum_kin + local_sum_kin(i)
STRESS_KINETIC = STRESS_KINETIC + local_STRESS_KINETIC(i,:,:)
end do
Scalars and STRESS_KINETIC values are correct but the STRESS_EWALDD is wrong and I can't figure out why.
I've no idea about forces so far.
So I'd really appreciate some hit here.
Thanks,
Éric.

You have taken a somewhat unorthodox approach to using OpenMP.
OpenMP provides the reduction(op:vars) clause that performs reduction with op over local values of the variables in vars and you should use it for STRESS_EWALD, sum_kin, sum_real and STRESS_KINETIC. Force accumulation for the i-th atom should work since atoms in the Verlet list POINT are ordered (you took the code that builds it from the book from Allen & Tildesley, right?) but not for the j-th atom. That's why you should perform reduction on them too. Don't worry if you read in some older OpenMP manual that reduction variables have to be scalar - newer OpenMP versions support reduction over array variables in Fortran 90+.
!$OMP PARALLEL DO PRIVATE(....)
!$OMP& REDUCTION(+:FDX,FDY,FDZ,sum_kin,sum_real,STRESS_EWALDD,STRESS_KINETIC)
!$OMP& SCHEDULE(DYNAMIC)
do i=1, atom_number ! sum over atoms i
type1 = ATOMS(i)
do nj=POINT(i), POINT(i+1)-1 ! sum over atoms j of i's atoms list
j = LIST(nj)
type2 = ATOMS(j)
call get_ff_param(type1,type2,Aab,Bab,Cab,Dab)
call distance_sqr2(i,j,r,VECT_R)
call gettensor_rij(i,j,T)
r = sqrt(r)
if (r<coub_cutoff) then
sum_real = sum_real + ( erfc(alpha_ewald*r)
derivative = -( erfc(alpha_ewald*r)
ff = - (1.0d0/r)*derivative
STRESS_EWALDD = STRESS_EWALDD + ff * T ! A 3x3 matrix
FDX(i) = FDX(i) + VECT_R(1) * ff
FDY(i) = FDY(i) + VECT_R(2) * ff
FDZ(i) = FDZ(i) + VECT_R(3) * ff
FDX(j) = FDX(j) - VECT_R(1) * ff
FDY(j) = FDY(j) - VECT_R(2) * ff
FDZ(j) = FDZ(j) - VECT_R(3) * ff
end if
end do ! sum over atoms j
sum_kin = sum_kin + ATMMASS(i) * (VX(i)**2 + VY(i)**2 + VZ(i)**2)
call gettensor_v(i,Q)
STRESS_KINETIC = STRESS_KINETIC + ATMMASS(i) * Q
end do
!$OMP END PARALLEL DO
Using dynamic loop scheduling will reduce the load imbalance when the number of neighbours to each atom varies wildly.

Related

Local infeasibility in MATLAB

I need help with a dynamic optimization problem that consist in a consumed energy optimization of a UAV with this optimal control problem.
enter image description here
enter image description here
My code is this
Ecuations:
Parameters
tf
#Velocidad de rotores rad/s
#Las condiciones iniciales permiten igualar la acción de la gravedad
#Se tomo 4000rad/s como la velocidad maxima de los rotores
w1 = 912.32, >=0, <=3000
w2 = 912.32, >=0, <=3000
w3 = 912.32, >=0, <=3000
w4 = 912.32, >=0, <=3000
t1 = 0, >=0
t2 = 0, >=0
t3 = 0, >=0
t4 = 0, >=0
Constants
!----------------COEFICIENTES DEL MODELO-----------------!
#Gravedad
g = 9.81 !m/s^2
pi = 3.14159265359
#Motor Coefficients
J = 4.1904e-5 !kg*m^2
kt = 0.0104e-3 !N*m/A
kv = 96.342 !rad/s/volt
Dv = 0.2e-3 !N*m*s/rad
R = 0.2 !Ohms
#Battery parameters
Q = 1.55 !Ah
Rint = 0.02 !Ohms
E0 = 1.24 !volt
K = 2.92e-3 !volt
A = 0.156
B =2.35
#Quadrotor parameters
l = 0.175 !m
m = 1.3 !kg
Ix = 0.081 !kg*m^2
Iy = 0.081 !kg*m^2
Iz = 0.142 !kg*m^2
kb = 3.8305e-6 !N/rad/s
ktau = 2.2518e-8 !(N*m)/rad/s
#Parametrizacion del polinomio
a1 = -1.72e-5
a2 = 1.95e-5
a3 = -6.98e-6
a4 = 4.09e-7
b1 = 0.014
b2 = -0.0157
b3 = 5.656e-3
b4 = -3.908e-4
c1 = -0.8796
c2 = 0.3385
c3 = 0.2890
c4 = 0.1626
Variables
!------------------CONDICONES INICIALES------------------!
x = 0
xp = 0
y = 0
yp = 0
z = 0
zp = 0
pitch = 0, >=-pi/2, <=pi/2 !theta - restricciones
pitchp = 0
roll = 0, >=-pi/2, <=pi/2 !phi - restricciones
rollp = 0
yaw = 0 !psi
yawp = 0%, >=-200/180, <=200/180
#Función objetivo
of = 0 !condición inicial de la función objetivo
Intermediates
#Motor 1
aw1 = a1*w1^2 + b1*w1 + c1
bw1 = a2*w1^2 + b2*w1 + c2
cw1 = a3*w1^2 + b3*w1 + c3
dw1 = a4*w1^2 + b4*w1 + c4
#Motor 2
aw2 = a1*w2^2 + b1*w2 + c1
bw2 = a2*w2^2 + b2*w2 + c2
cw2 = a3*w2^2 + b3*w2 + c3
dw2 = a4*w2^2 + b4*w2 + c4
#Motor 3
aw3 = a1*w3^2 + b1*w3 + c1
bw3 = a2*w3^2 + b2*w3 + c2
cw3 = a3*w3^2 + b3*w3 + c3
dw3 = a4*w3^2 + b4*w3 + c4
#Motor 4
aw4 = a1*w4^2 + b1*w4 + c1
bw4 = a2*w4^2 + b2*w4 + c2
cw4 = a3*w4^2 + b3*w4 + c3
dw4 = a4*w4^2 + b4*w4 + c4
#frj(wj(t),Tj(t))
fr1=aw1*t1^3 + bw1*t1^2 + cw1*t1 + dw1
fr2=aw2*t2^3 + bw2*t2^2 + cw2*t2 + dw2
fr3=aw3*t3^3 + bw3*t3^2 + cw3*t3 + dw3
fr4=aw4*t4^3 + bw4*t4^2 + cw4*t4 + dw4
!---------------------CONTROL INPUTS---------------------!
T = kb * (w1^2 + w2^2 + w3^2 + w4^2)
u1 = kb * (w2^2 - w4^2)
u2 = kb * (w3^2 - w1^2)
u3 = ktau * (w1^2 - w2^2 + w3^2 - w4^2)
wline = w1 - w2 + w3 - w4
!-------------------ENERGIA POR ROTOR--------------------!
Ec1 = ((J*$w1 + ktau*w1^2 + Dv*w1)/fr1)*w1
Ec2 = ((J*$w2 + ktau*w2^2 + Dv*w2)/fr2)*w2
Ec3 = ((J*$w3 + ktau*w3^2 + Dv*w3)/fr3)*w3
Ec4 = ((J*$w4 + ktau*w4^2 + Dv*w4)/fr4)*w4
Ectotal = Ec1 + Ec2 + Ec3 + Ec4
Equations
!---------------MINIMIZAR FUNCIÓN OBJETIVO---------------!
minimize tf * of
!-----------------RELACION DE VARIABLES------------------!
xp = $x
yp = $y
zp = $z
pitchp = $pitch
rollp = $roll
yawp = $yaw
!-----------------CONDICONES DE FRONTERA-----------------!
#Condiciones finales del modelo
tf * x = 4
tf * y = 5
tf * z = 6
tf * xp = 0
tf * yp = 0
tf * zp = 0
tf * roll = 0
tf * pitch = 0
tf * yaw = 0
!-----------------TORQUE DE LOS MOTORES------------------!
t1 = J*$w1 + ktau*w1^2 + Dv*w1
t2 = J*$w2 + ktau*w2^2 + Dv*w2
t3 = J*$w3 + ktau*w3^2 + Dv*w3
t4 = J*$w4 + ktau*w4^2 + Dv*w4
!------------------------SUJETO A------------------------!
#Modelo aerodinámico del UAV
m*$xp = (cos(roll)*sin(pitch)*cos(yaw) + sin(roll)*sin(yaw))*T
m*$yp = (cos(roll)*sin(pitch)*sin(yaw) - sin(roll)*cos(yaw))*T
m*$zp = (cos(roll)*cos(pitch))*T-m*g
Ix*$rollp = ((Iy - Iz)*pitchp*yawp + J*pitchp*wline + l*u1)
Iy*$pitchp = ((Iz - Ix)*rollp*yawp - J*rollp*wline + l*u2)
Iz*$yawp = ((Ix - Iy)*rollp*pitchp + u3)
!--------------------FUNCIÓN OBJETIVO--------------------!
$of = Ectotal
MATLAB:
clear all; close all; clc
server = 'http://127.0.0.1';
app = 'traj_optima';
addpath('C:/Program Files/MATLAB/apm_matlab_v0.7.2/apm')
apm(server,app,'clear all');
apm_load(server,app,'ecuaciones_mod.apm');
csv_load(server,app,'tiempo2.csv');
apm_option(server,app,'apm.max_iter',200);
apm_option(server,app,'nlc.nodes',3);
apm_option(server,app,'apm.rtol',1);
apm_option(server,app,'apm.otol',1);
apm_option(server,app,'nlc.solver',3);
apm_option(server,app,'nlc.imode',6);
apm_option(server,app,'nlc.mv_type',1);
costo=1e-5;%1e-5
%VARIABLES CONTROLADAS
%Velocidades angulares
apm_info(server,app,'MV','w1');
apm_option(server,app,'w1.status',1);
apm_info(server,app,'MV','w2');
apm_option(server,app,'w2.status',1);
apm_info(server,app,'MV','w3');
apm_option(server,app,'w3.status',1);
apm_info(server,app,'MV','w4');
apm_option(server,app,'w4.status',1);
% Torques
apm_info(server,app,'MV','t1');
apm_option(server,app,'t1.status',1);
apm_info(server,app,'MV','t2');
apm_option(server,app,'t2.status',1);
apm_info(server,app,'MV','t3');
apm_option(server,app,'t3.status',1);
apm_info(server,app,'MV','t4');
apm_option(server,app,'t4.status',1);
%Salida
output = apm(server,app,'solve');
disp(output)
y = apm_sol(server,app);
z = y.x;
tiempo2.csv
time,tf
0,0
0.001,0
0.2,0
0.4,0
0.6,0
0.8,0
1,0
1.2,0
1.4,0
1.6,0
1.8,0
2,0
2.2,0
2.4,0
2.6,0
2.8,0
3,0
3.2,0
3.4,0
3.6,0
3.8,0
4,0
4.2,0
4.4,0
4.6,0
4.8,0
5,0
5.2,0
5.4,0
5.6,0
5.8,0
6,0
6.2,0
6.4,0
6.6,0
6.8,0
7,0
7.2,0
7.4,0
7.6,0
7.8,0
8,0
8.2,0
8.4,0
8.6,0
8.8,0
9,0
9.2,0
9.4,0
9.6,0
9.8,0
10,1
Finally the answer obtained is:
enter image description here
I need help with this local infeasibility problem, please.
The infeasible solution is caused by the terminal constraints:
tf * z = 4
tf * z = 5
tf * z = 6
When tf=0, the constraints are evaluated to 0=4, 0=5, 0=6 and the solver reports that these can not be satisfied by the solver. Instead, you can pose the constraints as:
tf * (x-4) = 0
tf * (y-5) = 0
tf * (z-6) = 0
That way, the constraint is valid when tf=0 and when tf=1 at the final time. A potential better way yet is to convert the terminal constraints to objective terms with f=1000 such as:
minimize f*tf*((x-4)^2 + (y-5)^2 + (z-6)^2)
minimize f*tf*(xp^2 + yp^2 + zp^2)
minimize f*tf*(roll^2 + pitch^2 + yaw^2)
That way, the optimizer won't report an infeasible solution if it can't reach the terminal constraints as discussed in the pendulum problem. I made a few other modifications to your model and script to achieve a successful solution. Here is a summary:
Converted terminal constraints to objective function (soft constraints)
Parameters t1-t4 should be variables
Fixed degree of freedom issue by making w1-w4 variables and w1p-w4p variables. w1-w4 are differential states.
Added constraints to w1p-w4p between -10 and 10 to help the solver converge
Added initialization step to simulate the model before optimizing. There are more details on initialization strategies in this paper: Safdarnejad, S.M., Hedengren, J.D., Lewis, N.R., Haseltine, E., Initialization Strategies for Optimization of Dynamic Systems, Computers and Chemical Engineering, 2015, Vol. 78, pp. 39-50, DOI: 10.1016/j.compchemeng.2015.04.016
Model
Parameters
tf
w1p = 0 > -10 < 10
w2p = 0 > -10 < 10
w3p = 0 > -10 < 10
w4p = 0 > -10 < 10
Constants
!----------------COEFICIENTES DEL MODELO-----------------!
#Gravedad
g = 9.81 !m/s^2
pi = 3.14159265359
#Motor Coefficients
J = 4.1904e-5 !kg*m^2
kt = 0.0104e-3 !N*m/A
kv = 96.342 !rad/s/volt
Dv = 0.2e-3 !N*m*s/rad
R = 0.2 !Ohms
#Battery parameters
Q = 1.55 !Ah
Rint = 0.02 !Ohms
E0 = 1.24 !volt
K = 2.92e-3 !volt
A = 0.156
B =2.35
#Quadrotor parameters
l = 0.175 !m
m = 1.3 !kg
Ix = 0.081 !kg*m^2
Iy = 0.081 !kg*m^2
Iz = 0.142 !kg*m^2
kb = 3.8305e-6 !N/rad/s
ktau = 2.2518e-8 !(N*m)/rad/s
#Parametrizacion del polinomio
a1 = -1.72e-5
a2 = 1.95e-5
a3 = -6.98e-6
a4 = 4.09e-7
b1 = 0.014
b2 = -0.0157
b3 = 5.656e-3
b4 = -3.908e-4
c1 = -0.8796
c2 = 0.3385
c3 = 0.2890
c4 = 0.1626
Variables
!------------------CONDICONES INICIALES------------------!
x = 0
xp = 0
y = 0
yp = 0
z = 0
zp = 0
pitch = 0, >=-pi/2, <=pi/2 !theta - restricciones
pitchp = 0
roll = 0, >=-pi/2, <=pi/2 !phi - restricciones
rollp = 0
yaw = 0 !psi
yawp = 0 %, >=-200/180, <=200/180
#Velocidad de rotores rad/s
#Las condiciones iniciales permiten igualar la acción de la gravedad
#Se tomo 4000rad/s como la velocidad maxima de los rotores
w1 = 912.32, >=0, <=3000
w2 = 912.32, >=0, <=3000
w3 = 912.32, >=0, <=3000
w4 = 912.32, >=0, <=3000
t1 = 0, >=0
t2 = 0, >=0
t3 = 0, >=0
t4 = 0, >=0
#Función objetivo
of = 0 !condición inicial de la función objetivo
Intermediates
#Motor 1
aw1 = a1*w1^2 + b1*w1 + c1
bw1 = a2*w1^2 + b2*w1 + c2
cw1 = a3*w1^2 + b3*w1 + c3
dw1 = a4*w1^2 + b4*w1 + c4
#Motor 2
aw2 = a1*w2^2 + b1*w2 + c1
bw2 = a2*w2^2 + b2*w2 + c2
cw2 = a3*w2^2 + b3*w2 + c3
dw2 = a4*w2^2 + b4*w2 + c4
#Motor 3
aw3 = a1*w3^2 + b1*w3 + c1
bw3 = a2*w3^2 + b2*w3 + c2
cw3 = a3*w3^2 + b3*w3 + c3
dw3 = a4*w3^2 + b4*w3 + c4
#Motor 4
aw4 = a1*w4^2 + b1*w4 + c1
bw4 = a2*w4^2 + b2*w4 + c2
cw4 = a3*w4^2 + b3*w4 + c3
dw4 = a4*w4^2 + b4*w4 + c4
#frj(wj(t),Tj(t))
fr1=aw1*t1^3 + bw1*t1^2 + cw1*t1 + dw1
fr2=aw2*t2^3 + bw2*t2^2 + cw2*t2 + dw2
fr3=aw3*t3^3 + bw3*t3^2 + cw3*t3 + dw3
fr4=aw4*t4^3 + bw4*t4^2 + cw4*t4 + dw4
!---------------------CONTROL INPUTS---------------------!
T = kb * (w1^2 + w2^2 + w3^2 + w4^2)
u1 = kb * (w2^2 - w4^2)
u2 = kb * (w3^2 - w1^2)
u3 = ktau * (w1^2 - w2^2 + w3^2 - w4^2)
wline = w1 - w2 + w3 - w4
!-------------------ENERGIA POR ROTOR--------------------!
Ec1 = ((J*$w1 + ktau*w1^2 + Dv*w1)/fr1)*w1
Ec2 = ((J*$w2 + ktau*w2^2 + Dv*w2)/fr2)*w2
Ec3 = ((J*$w3 + ktau*w3^2 + Dv*w3)/fr3)*w3
Ec4 = ((J*$w4 + ktau*w4^2 + Dv*w4)/fr4)*w4
Ectotal = Ec1 + Ec2 + Ec3 + Ec4
! scaling factor for terminal constraint
f = 1000
Equations
!---------------MINIMIZAR FUNCIÓN OBJETIVO---------------!
minimize tf * of
!-----------------RELACION DE VARIABLES------------------!
xp = $x
yp = $y
zp = $z
pitchp = $pitch
rollp = $roll
yawp = $yaw
w1p = $w1
w2p = $w2
w3p = $w3
w4p = $w4
!-----------------CONDICONES DE FRONTERA-----------------!
#Condiciones finales del modelo
#tf * (x-4) = 0
#tf * (y-5) = 0
#tf * (z-6) = 0
#tf * xp = 0
#tf * yp = 0
#tf * zp = 0
#tf * roll = 0
#tf * pitch = 0
#tf * yaw = 0
minimize f*tf*((x-4)^2 + (y-5)^2 + (z-6)^2)
minimize f*tf*(xp^2 + yp^2 + zp^2)
minimize f*tf*(roll^2 + pitch^2 + yaw^2)
!-----------------TORQUE DE LOS MOTORES------------------!
t1 = J*w1p + ktau*w1^2 + Dv*w1
t2 = J*w2p + ktau*w2^2 + Dv*w2
t3 = J*w3p + ktau*w3^2 + Dv*w3
t4 = J*w4p + ktau*w4^2 + Dv*w4
!------------------------SUJETO A------------------------!
#Modelo aerodinámico del UAV
m*$xp = (cos(roll)*sin(pitch)*cos(yaw) + sin(roll)*sin(yaw))*T
m*$yp = (cos(roll)*sin(pitch)*sin(yaw) - sin(roll)*cos(yaw))*T
m*$zp = (cos(roll)*cos(pitch))*T-m*g
Ix*$rollp = ((Iy - Iz)*pitchp*yawp + J*pitchp*wline + l*u1)
Iy*$pitchp = ((Iz - Ix)*rollp*yawp - J*rollp*wline + l*u2)
Iz*$yawp = ((Ix - Iy)*rollp*pitchp + u3)
!--------------------FUNCIÓN OBJETIVO--------------------!
$of = Ectotal
MATLAB Script
clear all; close all; clc
server = 'http://byu.apmonitor.com';
app = 'traj_optima';
addpath('apm')
apm(server,app,'clear all');
apm_load(server,app,'ecuaciones_mod.apm');
csv_load(server,app,'tiempo2.csv');
apm_option(server,app,'apm.max_iter',1000);
apm_option(server,app,'apm.nodes',3);
apm_option(server,app,'apm.rtol',1e-6);
apm_option(server,app,'apm.otol',1e-6);
apm_option(server,app,'apm.solver',3);
apm_option(server,app,'apm.imode',6);
apm_option(server,app,'apm.mv_type',1);
costo=1e-5;%1e-5
%VARIABLES CONTROLADAS
%Velocidades angulares
apm_info(server,app,'MV','w1p');
apm_option(server,app,'w1p.status',1);
apm_info(server,app,'MV','w2p');
apm_option(server,app,'w2p.status',1);
apm_info(server,app,'MV','w3p');
apm_option(server,app,'w3p.status',1);
apm_info(server,app,'MV','w4p');
apm_option(server,app,'w4p.status',1);
%Salida
disp('')
disp('------------- Initialize ----------------')
apm_option(server,app,'apm.coldstart',1);
output = apm(server,app,'solve');
disp(output)
disp('')
disp('-------------- Optimize -----------------')
apm_option(server,app,'apm.time_shift',0);
apm_option(server,app,'apm.coldstart',0);
output = apm(server,app,'solve');
disp(output)
y = apm_sol(server,app);
z = y.x;
This gives a successful solution but the terminal constraints are not met. The solver optimizes the use of w1p-w4p to minimize the objective but there is no solution that makes it to the terminal constraints.
The solution was found.
The final value of the objective function is 50477.4537378181
---------------------------------------------------
Solver : IPOPT (v3.12)
Solution time : 3.06940000000759 sec
Objective : 50477.4537378181
Successful solution
---------------------------------------------------
As a next step, I recommend that you increase the number of time points or allow the final time to change to meet the terminal constraints. You may also want to consider switching to Python Gekko that uses the same underlying engine as APM MATLAB. In this case, the modeling language is fully integrated with Python.

Matlab visualization

I used this link to walk through FDTD code and write it for myself to practice.
https://www.youtube.com/watch?v=y7hJAhKp2d8
This is the code that I wrote (its not verbatim, but extremely similar). When I run the program, I am told that "Field amplitude is too small to visualize properly." I don't understand why. As far as I understand Matlab (I'm very new to it), I am scaling the graph on my own. There is a function called draw1d that is in a protected file here: http://emlab.utep.edu/ee5390fdtd.htm
How should I fix this?
%FDTD1D
%Initailize
close all;
clc;
clear all;
% Units
meters = 1;
seconds = 1;
%Fundamental constants
c0 = 3e8 * meters/seconds;
e0 = 8.85e-12 * 1/meters;
u0 = 1.26e-6 * 1/meters;
%Figure Window
figure('Color', 'b');
%%%%%%%
%% FUNAMENTAL CONSTANTS
%%%%%%%%%
%Simple parameters
dz = 5 * meters;
Nz = 200;
dt = 1e-3 * seconds;
STEPS = 1000;
%%%%%%%
%% BUILD DEVICE ON GRID
%%%%%%%%%
%Grid Device - let it be air
ER = ones(1, Nz);
UR = ones(1, Nz);
%%%%%%%
%% Initialize FDTD Parameters
%%%%%%%%%
%Initialize Vectors
mEy = (c0*dt)./ER; %unique update coefficient for each place on the grid
mHx = (c0*dt)./UR; %unique update coefficent again
%%%%%%%
%% Initialize Fields
%%%%%%%%%
%Initialize Fields
Ey = zeros(1, Nz);
Hx = zeros(1, Nz);
%%%%%%%
%% IFDTD LOOP ANALYSIS
%%%%%%%%%
%Actually doing FDTD
for T = 1 : STEPS
%Update H from E
for nz = 1 : Nz - 1
Hx(nz) = Hx(nz) + mHx(nz)*(Ey(nz+1) - Ey(nz))/dz;
end
Hx(Nz) = Hx(Nz) + mHx(Nz)*(0 - Ey(Nz))/dz;
%Update E from H
Ey(1) = Ey(1) + mEy(1) * ( Hx(1) - 0 )/dz;
for nz = 2 : Nz
Ey(nz) = Ey(nz) + mEy(nz) * ( Hx(nz) - Hx(nz-1) )/dz;
end
%Show Status
if ~mod(T, 10)
draw1d(ER, Ey, Hx, dz);
xlim([dz Nz*dz]);
xlabel('z');
title(['Field at step ' num2str(T) ' of ' num2str(STEPS)]);
end
end

Vectorizing nested loops in matlab using bsxfun and with GPU

For loops seem to be extremely slow, so I was wondering if the nested loops in the code shown next could be vectorized using bsxfun and maybe GPU could be introduced too.
Code
%// Paramaters
i = 1;
j = 3;
n1 = 1500;
n2 = 1500;
%// Pre-allocate for output
LInc(n1+n2,n1+n2)=0;
%// Nested Loops - I
for x = 1:n1
for y = 1:n1
num = ((n2 ^ 2) * (L1(i, i) + L2(j, j) + 1)) - (n2 * n * (L1(x,i) + L1(y,i)));
LInc(x, y) = L1(x, y) + (num/denom);
LInc(y, x) = LInc(x, y);
end
end
%// Nested Loops - II
for x = 1:n1
for y = 1:n2
num = (n1 * n * L1(x,i)) + (n2 * n * L2(y,j)) - ((n1 * n2 * (L1(i, i) + L2(j, j) + 1)));
LInc(x, n1+y) = num/denom;
LInc(n1+y, x) = LInc(x, n1+y);
end
end
Edit 1: n and denom could be assumed as constants too.
Here are vectorized CPU and GPU codes and I am hoping that I am using at least good practices for the GPU code and the benchmarking later on.
CPU Code
%// Pre-allocate for output
LInc(n1+n2,n1+n2)=0;
%// Calculate num/denom value for stage 1 and 2
nd1 = L1 + (((n2 ^ 2) * (L1(i, i) + L2(j, j) + 1)) - n2*n*bsxfun(#plus,L1(:,i),L1(:,i).'))./denom; %//'
nd2 = (bsxfun(#plus,n1*n*L1(:,i),n2*n*L2(:,j).') - ((n1 * n2 * (L1(i, i) + L2(j, j) + 1))))./denom; %//'
%// Plug in the values in the output matrix
LInc(1:n1,1:n1) = tril(nd1) + tril(nd1,-1).'; %//'
LInc(n1+1:end,1:n1) = nd2.'; %//'
LInc(1:n1,n1+1:end) = nd2;
GPU Code
%// Pre-allocate for output
gLInc = zeros(n1+n2,n1+n2,'gpuArray');
%// Convert to gpu arrays
gL1 = gpuArray(L1);
gL2 = gpuArray(L2);
%// Calculate num/denom value for stage 1 and 2
nd1 = gL1 + (((n2 ^ 2) * (gL1(i, i) + gL2(j, j) + 1)) - n2*n*bsxfun(#plus,gL1(:,i),gL1(:,i).'))./denom; %//'
nd2 = (bsxfun(#plus,n1*n*gL1(:,i),n2*n*gL2(:,j).') - ((n1 * n2 * (gL1(i, i) + gL2(j, j) + 1))))./denom; %//'
%// Plug in the values in the output matrix
gLInc(1:n1,1:n1) = tril(nd1) + tril(nd1,-1).'; %//'
gLInc(n1+1:end,1:n1) = nd2.'; %//'
gLInc(1:n1,n1+1:end) = nd2;
%// Gather data from GPU back to CPU
LInc = gather(gLInc);
Benchmarking
GPU benchmarking tips were taken from Measure and Improve GPU Performance.
%// Warm up GPU call with insignificant small scalar inputs, just in case
%// gputimeit doesn't do the same
temp1 = modp2(1,1,1,1,1,1,1,1); %// This is vectorized GPU code
i = 1;
j = 3;
n = 1000; %// Assumed
denom = 1e6; %// Assumed
N_arr = [50 100 200 500 1000 1500]; %// array elements for N (datasize)
timeall = zeros(3,numel(N_arr));
for k1 = 1:numel(N_arr)
N = N_arr(k1);
n1 = N; %// n1, n2 are assumed identical for less-complicated benchmarking
n2 = N;
L1 = rand(n1,n1);
L2 = rand(n2,j);
f = #() modp0(i,j,n1,n2,L1,L2,n,denom);%// Original CPU w/ preallocation
timeall(1,k1) = timeit(f);
clear f
f = #() modp1(i,j,n1,n2,L1,L2,n,denom);%// Vectorzied CPU code
timeall(2,k1) = timeit(f);
clear f
f = #() modp2(i,j,n1,n2,L1,L2,n,denom);%// Vectorized GPU(GTX 750Ti) code
timeall(3,k1) = gputimeit(f);
clear f
end
%// Display benchmark results
figure,hold on, grid on
plot(N_arr,timeall(1,:),'-b.')
plot(N_arr,timeall(2,:),'-ro')
plot(N_arr,timeall(3,:),'-kx')
legend('Original CPU','Vectorized CPU','Vectorized GPU (GTX 750 Ti)')
xlabel('Datasize (N) ->'),ylabel('Time(sec) ->')
Results
Conclusions
Results show that the vectorized GPU code performs really well with higher datasize and goes from slower than both the vectorized CPU and original code to being twice as fast as the vectorized CPU code.
If you have not done so, you should preallocate LInc.
LInc = zeros(n1,n2);
If you want to vectorize it, you don't need to use bsxfun to vectorize your code. I think you can do something like
x = 1:n1;
y = 1:n1;
num = ((n2 ^ 2) * (L1(i, i) + L2(j, j) + 1)) - (n2 * n * (L1(x,i) + L1(y,i)));
LInc(x, y) = L1(x, y) + (num/denom);
However, this code is confusing to me because as it is, you are overwriting the value of LInc several times. Without knowing what your goal is its hard for me to help more. The above code probably will not return the same values as your function.

MATLAB code running slow on MacBookPro, triple while loop

I have been running a MATLAB program for almost six hours now, and it is still not complete. It is cycling through three while loops (the outer two loops are n=855, the inner loop is n=500). Is this a surprise that it is taking this long? Is there anything I can do to increase the speed? I am including the code below, as well as the variable data types underneath that.
while i < (numAtoms + 1)
pointAccessible = ones(numPoints,1);
j = 1;
while j <(numAtoms + 1)
if (i ~= j)
k=1;
while k < (numPoints + 1)
if (pointAccessible(k) == 1)
sphereCoord = [cell2mat(atomX(i)) + p + sphereX(k), cell2mat(atomY(i)) + p + sphereY(k), cell2mat(atomZ(i)) + p + sphereZ(k)];
neighborCoord = [cell2mat(atomX(j)), cell2mat(atomY(j)), cell2mat(atomZ(j))];
coords(1,:) = [sphereCoord];
coords(2,:) = [neighborCoord];
if (pdist(coords) < (atomRadius(j) + p))
pointAccessible(k)=0;
end
end
k = k + 1;
end
end
j = j+1;
end
remainingPoints(i) = sum(pointAccessible);
i = i +1;
end
Variable Data Types:
numAtoms = 855
numPoints = 500
p = 1.4
atomRadius = <855 * 1 double>
pointAccessible = <500 * 1 double>
atomX, atomY, atomZ = <1 * 855 cell>
sphereX, sphereY, sphereZ = <500 * 1 double>
remainingPoints = <855 * 1 double>

How to accelerate matlab code?

I'm using matlab to implement a multilayer neural network. In the code I represent
the value of each node AS netValue{k}
the weight between layer k and k + 1 AS weight{k}
etc.
Since these data is three-dimensional, I have to use cell to hold a 2-D matrix to enable matrix multiply.
So it becomes really really slow to train the model, which I expect to have resulted from the usage of cell.
Can anyone tell me how to accelerate this code? Thanks
clc;
close all;
clear all;
input = [-2 : 0.4 : 2;-2:0.4:2];
ican = 4;
depth = 4; % total layer - 1, by convension
[featureNum , sampleNum] = size(input);
levelNum(1) = featureNum;
levelNum(2) = 5;
levelNum(3) = 5;
levelNum(4) = 5;
levelNum(5) = 2;
weight = cell(0);
for k = 1 : depth
weight{k} = rand(levelNum(k+1), levelNum(k)) - 2 * rand(levelNum(k+1) , levelNum(k));
threshold{k} = rand(levelNum(k+1) , 1) - 2 * rand(levelNum(k+1) , 1);
end
runCount = 0;
sumMSE = 1; % init MSE
minError = 1e-5;
afa = 0.1; % step of "gradient ascendence"
% training loop
while(runCount < 100000 & sumMSE > minError)
sumMSE = 0; % sum of MSE
for i = 1 : sampleNum % sample loop
netValue{1} = input(:,i);
for k = 2 : depth
netValue{k} = weight{k-1} * netValue{k-1} + threshold{k-1}; %calculate each layer
netValue{k} = 1 ./ (1 + exp(-netValue{k})); %apply logistic function
end
netValue{depth+1} = weight{depth} * netValue{depth} + threshold{depth}; %output layer
e = 1 + sin((pi / 4) * ican * netValue{1}) - netValue{depth + 1}; %calc error
assistS{depth} = diag(ones(size(netValue{depth+1})));
s{depth} = -2 * assistS{depth} * e;
for k = depth - 1 : -1 : 1
assistS{k} = diag((1-netValue{k+1}).*netValue{k+1});
s{k} = assistS{k} * weight{k+1}' * s{k+1};
end
for k = 1 : depth
weight{k} = weight{k} - afa * s{k} * netValue{k}';
threshold{k} = threshold{k} - afa * s{k};
end
sumMSE = sumMSE + e' * e;
end
sumMSE = sqrt(sumMSE) / sampleNum;
runCount = runCount + 1;
end
x = [-2 : 0.1 : 2;-2:0.1:2];
y = zeros(size(x));
z = 1 + sin((pi / 4) * ican .* x);
% test
for i = 1 : length(x)
netValue{1} = x(:,i);
for k = 2 : depth
netValue{k} = weight{k-1} * netValue{k-1} + threshold{k-1};
netValue{k} = 1 ./ ( 1 + exp(-netValue{k}));
end
y(:, i) = weight{depth} * netValue{depth} + threshold{depth};
end
plot(x(1,:) , y(1,:) , 'r');
hold on;
plot(x(1,:) , z(1,:) , 'g');
hold off;
Have you used the profiler to find out what functions are actually slowing down your code? It shows what lines take the most time to execute.

Resources