Speed up multiple cross product and dot product operations in numpy - python performance

I have a loop in one part of my code. To keep things simple, I replaced all the vectors with example values, as shown in the sample below. This loop runs 230,000 times inside another loop, and this part alone takes about 26.36 seconds.
Is there any way to speed it up or tune it for better performance?
import time
import numpy as np

trr = time.time()
for i in range(230000):
    print(+1 * 0.0001 * 1 * 1000 * (
        1 * np.dot(np.subtract([2, 1], [4, 3]), [1, 2])
        + 1 * np.dot(np.cross(np.array([0, 0, 0.5]),
                              np.array([1, 2, 3])),
                     np.array([1, 0, 0]))
        - 1 * np.dot(np.cross(np.array([0, 0, -0.5]),
                              np.array([2, 4, 1])),
                     np.array([0, 1, 0]))))
print(time.time() - trr)
The actual code, with variables:
for i in range(230000):
    .......
    .....
    else:
        delta_fs = +1 * dt * 1 * ks_smoot * A_2d * (
            np.dot(np.subtract(grains[p].v, grains[w].v), vti) * sign
            + np.dot(np.cross(np.array([0, 0, grains[p].rotational_speed]),
                              np.array(np.array(xj_c) - np.array(xj_p))),
                     np.array([vti[0], vti[1], 0])) * sign
            - np.dot(np.cross(np.array([0, 0, grains[w].rotational_speed]),
                              np.array(np.array(xj_c) - np.array(xj_w))),
                     np.array([vti[0], vti[1], 0])) * sign)

It would've been better if you kept your examples in variables, since your code is very difficult to read. Ignoring the fact that the loop in your example just computes the same constant value over and over again, I am working under the assumption that you need to run a specific set of numpy operations many times on various numpy arrays/vectors. You may find it useful to spend some time looking into the documentation for numba. Here's a very basic example:
import numpy as np
import numba as nb

CONST = 1 * 0.0001 * 1 * 1000

a0 = np.array([2., 1.])
a1 = np.array([4., 3.])
a2 = np.array([1., 2.])
b0 = np.array([0., 0., 0.5])
b1 = np.array([1., 2., 3.])
b2 = np.array([1., 0., 0.])
c0 = np.array([0., 0., -0.5])
c1 = np.array([2., 4., 1.])
c2 = np.array([0., 1., 0.])

@nb.jit()
def op1(iters):
    for i in range(iters):
        op = CONST * (1 * np.dot(a0 - a1, a2)
                      + 1 * np.dot(np.cross(b0, b1), b2)
                      - 1 * np.dot(np.cross(c0, c1), c2))

op1(1)  # Initial call to trigger JIT compilation

# Identical function, left un-jitted for comparison
def op2(iters):
    for i in range(iters):
        op = CONST * (1 * np.dot(a0 - a1, a2)
                      + 1 * np.dot(np.cross(b0, b1), b2)
                      - 1 * np.dot(np.cross(c0, c1), c2))
%timeit -n 100 op1(100)
# 54 µs ± 2.49 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit -n 100 op2(100)
# 15.5 ms ± 313 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Seems like it'll be multiple orders of magnitude faster, which should easily bring your time down to a fraction of a second.
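As a further micro-optimization (a sketch, not part of the original answer): because the first operand of each cross product has the form [0, 0, w], the cross product and the following dot product collapse to a single scalar expression, avoiding temporary array allocations entirely. The helper name planar_cross_dot below is hypothetical:

import numpy as np

# Hypothetical helper: np.cross([0, 0, w], r) equals [-w*r[1], w*r[0], 0],
# so its dot product with [v0, v1, 0] collapses to w*(r[0]*v1 - r[1]*v0).
def planar_cross_dot(w, r, v):
    return w * (r[0] * v[1] - r[1] * v[0])

# Quick check against the numpy version from the question:
print(planar_cross_dot(0.5, [1, 2, 3], [1, 0, 0]))          # -1.0
print(np.dot(np.cross([0, 0, 0.5], [1, 2, 3]), [1, 0, 0]))  # -1.0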

Related

Why does the code terminate with a "Solution Not Found" error and "EXIT: Converged to a point of local infeasibility. Problem may be infeasible"?

I cannot seem to figure out why IPOPT cannot find a solution to this. Initially I thought the problem was totally infeasible, but when I reduce the value of col_total to any number below 161000, or comment out the last constraint equation that contains col_total, it solves and exits with "Optimal Solution Found" and a final objective value of -161775.256826753. I have solved the same maximization problem using Artificial Bee Colony and Particle Swarm Optimization techniques, and they return optimal objective values of at least 225000 and 226000 respectively. Could it be that another solver is required? I have also tried APOPT, BPOPT, and IPOPT and have tinkered with the tolerance values, but no combination seems to work just yet. The code is posted below. Any guidance will be hugely appreciated.
from gekko import GEKKO
import numpy as np

distances = np.array([[[0, 0], [0, 0], [0, 0], [0, 0]],
                      [[155, 0], [0, 0], [0, 0], [0, 0]],
                      [[310, 0], [155, 0], [0, 0], [0, 0]],
                      [[465, 0], [310, 0], [155, 0], [0, 0]],
                      [[620, 0], [465, 0], [310, 0], [155, 0]]])
alpha = 0.5 / np.log(30/0.075)
diam = 31
free = 7
rho = 1.2253
area = np.pi * (diam / 2)**2
min_v = 5.5
axi_max = 0.32485226746
col_total = 176542.96546512868
rat = 14
nn = 5
u_hub_lowerbound = 5.777777777777778
c_pow = 0.59230249
p_max = 0.5 * rho * area * c_pow * free**3

# Initialize model
m = GEKKO(remote=True)

# Initialize variables, set lower and upper bounds
x = [m.Var(value=0.03902278, lb=0, ub=axi_max)
     for i in range(nn)]

# i = 0
b = 1
c = 0
v_s = list()
for i in range(nn-1):  # Loop runs for nn-1 times
    # print(i)
    # print(i, b, c)
    squared_defs = list()
    while i < b:
        d = distances[b][c][0]
        r = distances[b][c][1]
        ss = (2 * (alpha * d) / diam)
        tt = r / ((diam/2) + (alpha * d))
        squared_defs.append((2 * x[i] / (1 + ss**2)) * np.exp(-(tt**2)) ** 2)
        i += 1
        c += 1
    # Equations
    m.Equation((free * (1 - (sum(squared_defs))**0.5)) - rat <= 0)
    m.Equation((free * (1 - (sum(squared_defs))**0.5)) - u_hub_lowerbound >= 0)
    v_s.append(free * (1 - (sum(squared_defs))**0.5))
    squared_defs.clear()
    b += 1
    c = 0

# Insert free as the first item on the v_s list to
# increase len(v_s) to nn, so that 'v_s' and 'x'
# are of the same length
v_s.insert(0, free)

gamma = list()
for i in range(len(x)):
    bet = (4*x[i]*((1-x[i])**2) * rho * area) / 2
    gam = bet * v_s[i]**3
    gamma.append(gam)
    # Equations
    m.Equation(x[i] - axi_max <= 0)
    m.Equation((((4*x[i]*((1-x[i])**2) * rho * area) / 2)
                * v_s[i]**3) - p_max <= 0)
    m.Equation((((4*x[i]*((1-x[i])**2) * rho * area) / 2)
                * v_s[i]**3) > 0)

# Equation
m.Equation(col_total - sum(gamma) <= 0)

# Objective
y = sum(gamma)
m.Maximize(y)  # Maximize

# Set global options
m.options.IMODE = 3  # steady-state optimization

# Solve
m.options.SOLVER = 3
m.solver_options = ['linear_solver ma27', 'mu_strategy adaptive',
                    'max_iter 2500', 'tol 1.0e-5']
m.solve()
Build the equations without .value in the expressions. The x[i].value is only needed at the end, to view the solution after the solve is complete, or to initialize the value of x[i]. The expression m.Maximize(y) is more readable than m.Obj(-y), although they are equivalent.
from gekko import GEKKO
import numpy as np

distances = np.array([[[0, 0], [0, 0], [0, 0], [0, 0]],
                      [[155, 0], [0, 0], [0, 0], [0, 0]],
                      [[310, 0], [155, 0], [0, 0], [0, 0]],
                      [[465, 0], [310, 0], [155, 0], [0, 0]],
                      [[620, 0], [465, 0], [310, 0], [155, 0]]])
alpha = 0.5 / np.log(30/0.075)
diam = 31
free = 7
rho = 1.2253
area = np.pi * (diam / 2)**2
min_v = 5.5
axi_max = 0.069262150781
col_total = 20000
p_max = 4000
rat = 14
nn = 5

# Initialize model
m = GEKKO(remote=True)

# Initialize variables, set lower and upper bounds
x = [m.Var(value=0.03902278, lb=0, ub=axi_max)
     for i in range(nn)]

i = 0
b = 1
c = 0
v_s = list()
for turbs in range(nn-1):  # Loop runs for nn-1 times
    squared_defs = list()
    while i < b:
        d = distances[b][c][0]
        r = distances[b][c][1]
        ss = (2 * (alpha * d) / diam)
        tt = r / ((diam/2) + (alpha * d))
        squared_defs.append((2 * x[i] / (1 + ss**2))
                            * m.exp(-(tt**2)) ** 2)
        i += 1
        c += 1
    # Equations
    m.Equation((free * (1 - (sum(squared_defs))**0.5)) - rat <= 0)
    m.Equation(min_v - (free * (1 - (sum(squared_defs))**0.5)) <= 0)
    v_s.append(free * (1 - (sum(squared_defs))**0.5))
    squared_defs.clear()
    b += 1
    a = 0
    c = 0

# Insert free as the first item on the v_s list to
# increase len(v_s) to nn, so that 'v_s' and 'x'
# are of the same length
v_s.insert(0, free)

beta = list()
gamma = list()
for i in range(len(x)):
    bet = (4*x[i]*((1-x[i])**2) * rho * area) / 2
    gam = bet * v_s[i]**3
    # Equations
    m.Equation((((4*x[i]*((1-x[i])**2) * rho * area) / 2)
                * v_s[i]**3) - p_max <= 0)
    m.Equation((((4*x[i]*((1-x[i])**2) * rho * area) / 2)
                * v_s[i]**3) > 0)
    gamma.append(gam)

# Equation
m.Equation(col_total - sum(gamma) <= 0)

# Objective
y = sum(gamma)
m.Maximize(y)  # Maximize

# Set global options
m.options.IMODE = 3  # steady-state optimization

# Solve
m.options.SOLVER = 3
m.solve()
This gives a successful solution with maximized objective 20,000:
Number of Iterations....: 12
(scaled) (unscaled)
Objective...............: -4.7394814741924645e+00 -1.9999999999929641e+04
Dual infeasibility......: 4.4698510326511536e-07 1.8862194343304290e-03
Constraint violation....: 3.8275766582203308e-11 1.2941979026166479e-07
Complementarity.........: 2.1543608536533588e-09 9.0911246952931704e-06
Overall NLP error.......: 4.6245685940749926e-10 1.8862194343304290e-03
Number of objective function evaluations = 80
Number of objective gradient evaluations = 13
Number of equality constraint evaluations = 80
Number of inequality constraint evaluations = 0
Number of equality constraint Jacobian evaluations = 13
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations = 12
Total CPU secs in IPOPT (w/o function evaluations) = 0.010
Total CPU secs in NLP function evaluations = 0.011
EXIT: Optimal Solution Found.
The solution was found.
The final value of the objective function is -19999.9999999296
---------------------------------------------------
Solver : IPOPT (v3.12)
Solution time : 3.210000000399305E-002 sec
Objective : -19999.9999999296
Successful solution
---------------------------------------------------
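For completeness, a short sketch (an assumed usage pattern, not part of the original answer) of reading the results back after the solve, which is the only place .value is needed, as noted above:

# Hypothetical post-solve inspection of the GEKKO results from the code above
for i in range(nn):
    print('x[%d] = %f' % (i, x[i].value[0]))
# Sign flipped because Maximize(y) is solved internally as minimizing -y
print('objective = %f' % (-m.options.OBJFCNVAL))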

how to vectorize this function

The following code works, but I would like to create Z by vectorization. How to achieve that?
import numpy as np
from numpy import sqrt
from math import fsum

points = np.array([[0, 0],
                   [5, -1],
                   [4, 6],
                   [1, 3]])

d = lambda x: fsum([sqrt((x[0]-z[0])**2 + (x[1]-z[1])**2) for z in points])

x = np.linspace(min(points[:,0]), max(points[:,0]), 100)
y = np.linspace(min(points[:,1]), max(points[:,1]), 100)
X, Y = np.meshgrid(x, y)
Z = np.zeros(np.shape(X))
for (i, j), _ in np.ndenumerate(Z):
    Z[i, j] = d([X[i, j], Y[i, j]])
# Z = d([X, Y])  # this fails
We can leverage broadcasting to work directly with the 1D versions, which is more memory efficient and gives us a vectorized one-liner, like so -
Z = np.sqrt((x[:,None] - points[:,0])**2 + (y[:,None,None] - points[:,1])**2).sum(2)
Here (x[:,None] - points[:,0]) has shape (100, 4) and (y[:,None,None] - points[:,1]) has shape (100, 1, 4); they broadcast to a (100, 100, 4) array of per-point distances, and .sum(2) reduces over the four points.
Timings on posted sample data -
In [80]: %%timeit
...: X, Y = np.meshgrid(x,y)
...: Z = np.zeros(np.shape(X))
...: for (i,j),_ in np.ndenumerate(Z):
...: Z[i,j] = d([X[i,j],Y[i,j]])
10 loops, best of 3: 101 ms per loop
In [81]: %timeit np.sqrt((x[:,None] - points[:,0])**2 + (y[:,None,None] - points[:,1])**2).sum(2)
1000 loops, best of 3: 246 µs per loop
400x speedup there!
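As an alternative sketch (not from the original answer): if SciPy is available, the same grid of summed distances can be computed with scipy.spatial.distance.cdist, which builds the (grid, points) distance matrix in one call -

import numpy as np
from scipy.spatial.distance import cdist

points = np.array([[0, 0], [5, -1], [4, 6], [1, 3]])
x = np.linspace(points[:, 0].min(), points[:, 0].max(), 100)
y = np.linspace(points[:, 1].min(), points[:, 1].max(), 100)
X, Y = np.meshgrid(x, y)

# Flatten the grid to (10000, 2), take all pairwise distances to the four
# points, sum per grid point, and reshape back to the (100, 100) grid.
grid = np.column_stack([X.ravel(), Y.ravel()])
Z = cdist(grid, points).sum(axis=1).reshape(X.shape)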

OpenCV 3.1 optimization

I'm currently trying to implement an algorithm from a paper with OpenCV 3.1 on Python 2.7, but the process is taking way too long.
The section of my code that's giving me trouble looks something like this:
width, height = mr.shape[:2]
Pm = []
for i in d:
    M = np.float32([[1, 0, -d[i]], [0, 1, 1]])
    mrd = cv2.warpAffine(mr, M, (height, width))
    C = cv2.subtract(ml, mrd)
    C = cv2.pow(C, 2)
    C = np.divide(C, sigma_m)
    C = p0 + (1-p0)**(-C)
    Pm.append(C)
Where ml, mr and mrd are cv2 objects and d, p0 and sigma_m are integers.
The division and final equation in the last 3 lines are the real troublemakers here. Every iteration of this cycle is independent, so in theory I could just split the for loop across a few processes, but that seems like a lazy approach where I would just bypass the problem instead of fixing it.
Does anyone know a way to perform those computations faster?
We can leverage the numexpr module to efficiently perform all of those latter arithmetic operations as one evaluate expression.
Thus, these steps:
C = cv2.subtract(ml, mrd)
C = cv2.pow(C,2)
C = np.divide(C, sigma_m)
C = p0 + (1-p0)**(-C)
could be replaced by one expression -
import numexpr as ne
C = ne.evaluate('p0 +(1-p0)**(-((ml-mrd)**2)/sigma_m)')
Let's verify things. The original approach as a function -
def original_app(ml, mrd, sigma_m, p0):
    C = cv2.subtract(ml, mrd)
    C = cv2.pow(C, 2)
    C = np.divide(C, sigma_m)
    C = p0 + (1-p0)**(-C)
    return C
Verification -
In [28]: # Setup inputs
...: S = 1024 # Size parameter
...: ml = np.random.randint(0,255,(S,S))/255.0
...: mrd = np.random.randint(0,255,(S,S))/255.0
...: sigma_m = 0.45
...: p0 = 0.56
...:
In [29]: out1 = original_app(ml, mrd, sigma_m, p0)
In [30]: out2 = ne.evaluate('p0 +(1-p0)**(-((ml-mrd)**2)/sigma_m)')
In [31]: np.allclose(out1, out2)
Out[31]: True
Timings across various sizes of datasets -
In [19]: # Setup inputs
...: S = 1024 # Size parameter
...: ml = np.random.randint(0,255,(S,S))/255.0
...: mrd = np.random.randint(0,255,(S,S))/255.0
...: sigma_m = 0.45
...: p0 = 0.56
...:
In [20]: %timeit original_app(ml, mrd, sigma_m, p0)
10 loops, best of 3: 67.1 ms per loop
In [21]: %timeit ne.evaluate('p0 +(1-p0)**(-((ml-mrd)**2)/sigma_m)')
100 loops, best of 3: 12.9 ms per loop
In [22]: # Setup inputs
...: S = 512 # Size parameter
In [23]: %timeit original_app(ml, mrd, sigma_m, p0)
100 loops, best of 3: 15.3 ms per loop
In [24]: %timeit ne.evaluate('p0 +(1-p0)**(-((ml-mrd)**2)/sigma_m)')
100 loops, best of 3: 3.39 ms per loop
In [25]: # Setup inputs
...: S = 256 # Size parameter
In [26]: %timeit original_app(ml, mrd, sigma_m, p0)
100 loops, best of 3: 3.65 ms per loop
In [27]: %timeit ne.evaluate('p0 +(1-p0)**(-((ml-mrd)**2)/sigma_m)')
1000 loops, best of 3: 878 µs per loop
Around 5x speedup across various sizes with better speedups for larger arrays!
Also, as a side note, I would advise using a preallocated output array instead of appending at the final step. We could initialize it before the loop with something like out = np.zeros((len(d), width, height)) (or np.empty), and then assign into it at the final step with out[iteration_ID] = C.
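A minimal sketch of how the loop could look with both suggestions combined, using dummy inputs in place of the question's ml, mr, d, sigma_m and p0 (the names are kept from the question):

import numpy as np
import cv2
import numexpr as ne

# Dummy stand-ins for the question's inputs
ml = np.random.rand(480, 640).astype(np.float32)
mr = np.random.rand(480, 640).astype(np.float32)
d = [1, 2, 3, 4]          # example shift values
sigma_m, p0 = 0.45, 0.56

width, height = mr.shape[:2]
out = np.empty((len(d), width, height))  # preallocate instead of appending
for k, shift in enumerate(d):
    M = np.float32([[1, 0, -shift], [0, 1, 1]])
    mrd = cv2.warpAffine(mr, M, (height, width))
    out[k] = ne.evaluate('p0 + (1-p0)**(-((ml-mrd)**2)/sigma_m)')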

speeding up some for loops in matlab

Basically I am trying to solve a 2nd-order differential equation with the forward Euler method. I have some for loops inside my code which take considerable time to solve, and I would like to speed things up a bit. Does anyone have any suggestions for how I could do this?
Also, when looking at the timings, I notice that my end at line 14 takes 45% of my total time. What is end actually doing, and why is it taking so much time?
Here is my simplified code:
t = 0:0.01:100;
dt = t(2)-t(1);
B = 3.5 * t;
F0 = 2 * t;
BB = zeros(1,length(t)); % Preallocation
x = 2; % Initial value
u = 0; % Initial value
for ii = 1:length(t)
    for kk = 1:ii
        BB(ii) = BB(ii) + B(kk) * u(ii-kk+1)*dt; % This line takes the most time
    end % This end takes 45% of the other time
    x(ii+1) = x(ii) + dt*u(ii);
    u(ii+1) = u(ii) + dt * (F0(ii) - BB(ii));
end
Running the code takes me 8.552 seconds.
You can remove the inner loop, I think:
for ii = 1:length(t)
    for kk = 1:ii
        BB(ii) = BB(ii) + B(kk) * u(ii-kk+1)*dt; % This line takes the most time
    end % This end takes 45% of the other time
    x(ii+1) = x(ii) + dt*u(ii);
    u(ii+1) = u(ii) + dt * (F0(ii) - BB(ii));
end
So BB(ii) = BB(ii) (zero at initialisation) + the sum, for kk = 1 to ii, of B(kk) * u(ii-kk+1) * dt.
But kk = 1:ii, so for a given ii, the index ii-kk+1 runs through ii-(1:ii)+1, i.e. ii:-1:1.
So I think this is equivalent to:
for ii = 1:length(t)
    BB(ii) = sum(B(1:ii).*u(ii:-1:1)*dt);
    x(ii+1) = x(ii) + dt*u(ii);
    u(ii+1) = u(ii) + dt * (F0(ii) - BB(ii));
end
It doesn't take as long as 8 seconds for me using either method, but the version with only one loop is about 2x as fast (the output of BB appears to be the same).
Is the sum loop of B(kk) * u(ii-kk+1) just conv(B(1:ii), u(1:ii), 'same')?
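A quick numpy check (outside MATLAB, as a sketch) supports the convolution reading of the inner sum; the variable names below are hypothetical Python mirrors of the MATLAB ones:

import numpy as np

rng = np.random.default_rng(0)
B = rng.random(50)
u = rng.random(50)
dt = 0.01

# 0-based equivalent of the MATLAB sum: BB(ii) = sum_k B(k)*u(ii-k+1)*dt
ii = 30
loop_sum = sum(B[k] * u[ii - k] * dt for k in range(ii + 1))
conv_val = dt * np.convolve(B, u)[ii]  # ii-th element of the full convolution
print(np.isclose(loop_sum, conv_val))  # True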
The best way to speed up loops in MATLAB is to try to avoid them. See whether you can perform a matrix operation instead of the inner loop. For example, try to break the calculation you do there into small parts, then decide whether there are parts you can perform in advance, without knowing the results of the next iteration of the loop.
As to the second part of your question, my guess: the end contains the check of whether the loop runs for another round, and this check by itself is not that long, but it is called 50,015,001 times!

Non-linear counter

So I have a counter. It is supposed to calculate the current amount of something. To calculate this, I know the start date, and start amount, and the amount to increment the counter by each second. Easy peasy. The tricky part is that the growth is not quite linear. Every day, the increment amount increases by a set amount. I need to recreate this algorithmically - basically figure out the exact value at the current date based on the starting value, the amount incremented over time, and the amount the increment has increased over time.
My target language is Javascript, but pseudocode is fine too.
Based on AB's solution:
var now = new Date();
var startDate1 = new Date("January 1 2010");
var days1 = (now - startDate1) / 1000 / 60 / 60 / 24;
var startNumber1 = 9344747520;
var startIncrement1 = 463;
var dailyIncrementAdjustment1 = .506;
var currentIncrement = startIncrement1 + (dailyIncrementAdjustment1 * days1);
startNumber1 = startNumber1 + (days1 / 2) * (2 * startIncrement1 + (days1 - 1) * dailyIncrementAdjustment1);
Does that look reasonable to you guys?
It's a quadratic function. If t is the time passed, then it's the usual at^2 + bt + c, and you can figure out a, b, c by substituting the results for the first 3 seconds.
Or: use the formula for the arithmetic progression sum, where a1 is the initial increment, and d is the "set amount" you refer to. Just don't forget to add your "start amount" to what the formula gives you.
If x0 is the initial amount, d is the initial increment, and e is the "set amount" by which the increment increases, it comes to
x0 + (t/2)*(2d + (t-1)*e)
If I understand your question correctly, you have an initial value x_0, an initial increment per second of d_0 and an increment adjustment of e per day. That is, on day one the increment per second is d_0, on day two the increment per second is d_0 + e, etc.
Then, we note that the increment per second at time t is
d(t) = d_0 + floor(t / S) * e
where S is the number of seconds per day and t is the number of seconds that have elapsed since t = t_0. Then
x = x_0 + sum_{k < floor(t / S)} S * (d_0 + k * e) + S * (t / S - floor(t / S)) * d(t)
is the formula that you are seeking. From here, you can simplify this to
x = x_0 + S * floor(t / S) * d_0 + S * e * (floor(t / S) - 1) * floor(t / S) / 2 + S * (t / S - floor(t / S)) * d(t).
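A small Python sketch (not from the original answer) that checks this closed form against a brute-force second-by-second accumulation, using the symbols x_0, d_0, e, S and t defined above:

import math

def counter_value(x0, d0, e, t, S=86400):
    # Closed form: full days at increasing increments, plus the partial day
    n = math.floor(t / S)
    d_t = d0 + n * e
    return x0 + S*n*d0 + S*e*(n - 1)*n/2 + (t - S*n)*d_t

def brute_force(x0, d0, e, t, S=86400):
    # Second-by-second accumulation for verification (t in whole seconds)
    return x0 + sum(d0 + (s // S) * e for s in range(t))

print(counter_value(100.0, 2.0, 0.5, 200000))  # 470500.0
print(brute_force(100.0, 2.0, 0.5, 200000))    # 470500.0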
use strict;
use warnings;

my $start = 0;
my $stop = 100;
my $current = $start;

for my $day ( 1 .. 100 ) {
    $current += ($day / 10);
    last unless $current < $stop;
    printf "Day: %d\tLeft %.2f\n", $day, (1 - $current/$stop);
}
Output:
Day: 1 Left 1.00
Day: 2 Left 1.00
Day: 3 Left 0.99
Day: 4 Left 0.99
Day: 5 Left 0.98
...
Day: 42 Left 0.10
Day: 43 Left 0.05
Day: 44 Left 0.01
