What is the easiest to implement linear regression algorithm? - algorithm

I want to implement single variable regression using ordinary least squares. I have no access to linear algebra or calculus libraries, so any matrix operations or differentiation methods needs to be implemented by me. What is the least complex method?

John D. Cook has an excelent post on the subject with a simple C++ implementation. His implementation uses constant memory and can be parallelized with little effort.
I wrote a simple Python version of it. Use with caution, there may be bugs:
class Regression:
def __init__(self):
self.n = 0.0
self.sXY = 0.0
self.xM1 = 0.0
self.xM2 = 0.0
self.yM1 = 0.0
self.yM2 = 0.0
def add(self, x, y):
self.sXY += (self.xM1 - x) * (self.yM1 - y) * self.n / (self.n + 1.0);
n1 = self.n;
self.n+=1;
xdelta = x - self.xM1;
xdelta_n = xdelta / self.n;
self.xM1 += xdelta_n;
self.xM2 += xdelta * xdelta_n * n1;
ydelta = y - self.yM1;
ydelta_n = ydelta / self.n;
self.yM1 += ydelta_n;
self.yM2 += ydelta * ydelta_n * n1;
def count(self):
return self.n
def slope(self):
return self.sXY / self.xM2
def intercept(self):
return self.yM1 - (self.sXY / self.xM2) * self.xM1
def correlation(self):
return self.sXY / (self.xM2**0.5 * self.yM2**0.5)
def covariance(self):
return self.sXY / self.n
r = Regression()
r.add(1, 2)
r.add(4, 9)
r.add(16, 17)
r.add(17, 13)
r.add(21, 11)
print 'Count:', r.count()
print 'Slope:', r.slope()
print 'Intercept:', r.intercept()
print 'Correlation:', r.correlation()
print 'Covariance:', r.covariance()

Related

SVM to multiclass SVM

I have a code snippet for implementing SVM from scratch (binary classefier)
These code wouldn't work on minist because output predictions 0 or 1
class SVM:
def __init__(self, learning_rate=1e-3, lambda_param=1e-2, n_iters=1000):
self.lr = learning_rate
self.lambda_param = lambda_param
self.n_iters = n_iters
self.w = None
self.b = None
def _init_weights_bias(self, X):
n_features = X.shape[1]
self.w = np.zeros(n_features)
self.b = 0
def _get_cls_map(self, y):
return np.where(y <= 0, -1, 1)
def _satisfy_constraint(self, x, idx):
linear_model = np.dot(x, self.w) + self.b
return self.cls_map[idx] * linear_model >= 1
def _get_gradients(self, constrain, x, idx):
if constrain:
dw = self.lambda_param * self.w
db = 0
return dw, db
dw = self.lambda_param * self.w - np.dot(self.cls_map[idx], x)
db = - self.cls_map[idx]
return dw, db
def _update_weights_bias(self, dw, db):
self.w -= self.lr * dw
self.b -= self.lr * db
def fit(self, X, y):
self._init_weights_bias(X)
self.cls_map = self._get_cls_map(y)
for _ in range(self.n_iters):
for idx, x in enumerate(X):
constrain = self._satisfy_constraint(x, idx)
dw, db = self._get_gradients(constrain, x, idx)
self._update_weights_bias(dw, db)
def predict(self, X):
estimate = np.dot(X, self.w) + self.b
prediction = np.sign(estimate)
return np.where(prediction == -1, 0, 1)
What can I do to classify more than 20 classes
In other words make it multiclass SVM
any hint please ?
SVM does not handle multiclass cases natively. Most implementations just fit as much binary classifiers as there are classes (one vs rest) or as much as there are possible pairs (one vs one).
A more elegant approach would be using Hamming codes (see e.g. https://github.com/christianversloot/machine-learning-articles/blob/main/using-error-correcting-output-codes-for-multiclass-svm-classification.md), though it sounds like quite a challenge to implement from scratch.

Why does the code terminate with a "Solution Not Found" error and "EXIT: Converged to a point of local infeasibility. Problem may be infeasible"?

I cannot seem to figure out why IPOPT cannot find a solution to this. Initially, I thought the problem was totally infeasible but when I reduce the value of col_total to any number below 161000 or comment out the last constraint equation that contains col_total, it solves and EXITs with an Optimal Solution Found and a final objective value function of -161775.256826753. I have solved the same Maximization problem using Artificial Bee Colony and Particle Swamp Optimization techniques, and they solve and return optimal objective value function at least 225000 and 226000 respectively. Could it be that another solver is required? I have also tried APOPT, BPOPT, and IPOPT and have tinkered around with the tolerance values, but no combination none seems to work just yet. The code is posted below. Any guidance will be hugely appreciated.
from gekko import GEKKO
import numpy as np
distances = np.array([[[0, 0],[0,0],[0,0],[0,0]],\
[[155,0],[0,0],[0,0],[0,0]],\
[[310,0],[155,0],[0,0],[0,0]],\
[[465,0],[310,0],[155,0],[0,0]],\
[[620,0],[465,0],[310,0],[155,0]]])
alpha = 0.5 / np.log(30/0.075)
diam = 31
free = 7
rho = 1.2253
area = np.pi * (diam / 2)**2
min_v = 5.5
axi_max = 0.32485226746
col_total = 176542.96546512868
rat = 14
nn = 5
u_hub_lowerbound = 5.777777777777778
c_pow = 0.59230249
p_max = 0.5 * rho * area * c_pow * free**3
# Initialize Model
m = GEKKO(remote=True)
#initialize variables, Set lower and upper bounds
x = [m.Var(value = 0.03902278, lb = 0, ub = axi_max) \
for i in range(nn)]
# i = 0
b = 1
c = 0
v_s = list()
for i in range(nn-1): # Loop runs for nn-1 times
# print(i)
# print(i,b,c)
squared_defs = list()
while i < b:
d = distances[b][c][0]
r = distances[b][c][1]
ss = (2 * (alpha * d) / diam)
tt = r / ((diam/2) + (alpha * d))
squared_defs.append((2 * x[i] / (1 + ss**2)) * np.exp(-(tt**2)) ** 2)
i+=1
c+=1
#Equations
m.Equation((free * (1 - (sum(squared_defs))**0.5)) - rat <= 0)
m.Equation((free * (1 - (sum(squared_defs))**0.5)) - u_hub_lowerbound >= 0)
v_s.append(free * (1 - (sum(squared_defs))**0.5))
squared_defs.clear()
b+=1
c=0
# Inserts free as the first item on the v_s list to
# increase len(v_s) to nn, so that 'v_s' and 'x'
# are of same length
v_s.insert(0, free)
gamma = list()
for i in range(len(x)):
bet = (4*x[i]*((1-x[i])**2) * rho * area) / 2
gam = bet * v_s[i]**3
gamma.append(gam)
#Equations
m.Equation(x[i] - axi_max <= 0)
m.Equation((((4*x[i]*((1-x[i])**2) * rho * area) / 2) \
* v_s[i]**3) - p_max <= 0)
m.Equation((((4*x[i]*((1-x[i])**2) * rho * area) / 2) * \
v_s[i]**3) > 0)
#Equation
m.Equation(col_total - sum(gamma) <= 0)
#Objective
y = sum(gamma)
m.Maximize(y) # Maximize
#Set global options
m.options.IMODE = 3 #steady state optimization
#Solve simulation
m.options.SOLVER = 3
m.solver_options = ['linear_solver ma27','mu_strategy adaptive','max_iter 2500', 'tol 1.0e-5' ]
m.solve()
Built the equations without .value in the expressions. The x[i].value is only needed at the end to view the solution after the solution is complete or to initialize the value of x[i]. The expression m.Maximize(y) is more readable than m.Obj(-y) although they are equivalent.
from gekko import GEKKO
import numpy as np
distances = np.array([[[0, 0],[0,0],[0,0],[0,0]],\
[[155,0],[0,0],[0,0],[0,0]],\
[[310,0],[155,0],[0,0],[0,0]],\
[[465,0],[310,0],[155,0],[0,0]],\
[[620,0],[465,0],[310,0],[155,0]]])
alpha = 0.5 / np.log(30/0.075)
diam = 31
free = 7
rho = 1.2253
area = np.pi * (diam / 2)**2
min_v = 5.5
axi_max = 0.069262150781
col_total = 20000
p_max = 4000
rat = 14
nn = 5
# Initialize Model
m = GEKKO(remote=True)
#initialize variables, Set lower and upper bounds
x = [m.Var(value = 0.03902278, lb = 0, ub = axi_max) \
for i in range(nn)]
i = 0
b = 1
c = 0
v_s = list()
for turbs in range(nn-1): # Loop runs for nn-1 times
squared_defs = list()
while i < b:
d = distances[b][c][0]
r = distances[b][c][1]
ss = (2 * (alpha * d) / diam)
tt = r / ((diam/2) + (alpha * d))
squared_defs.append((2 * x[i] / (1 + ss**2)) \
* m.exp(-(tt**2)) ** 2)
i+=1
c+=1
#Equations
m.Equation((free * (1 - (sum(squared_defs))**0.5)) - rat <= 0)
m.Equation(min_v - (free * (1 - (sum(squared_defs))**0.5)) <= 0 )
v_s.append(free * (1 - (sum(squared_defs))**0.5))
squared_defs.clear()
b+=1
a=0
c=0
# Inserts free as the first item on the v_s list to
# increase len(v_s) to nn, so that 'v_s' and 'x'
# are of same length
v_s.insert(0, free)
beta = list()
gamma = list()
for i in range(len(x)):
bet = (4*x[i]*((1-x[i])**2) * rho * area) / 2
gam = bet * v_s[i]**3
#Equations
m.Equation((((4*x[i]*((1-x[i])**2) * rho * area) / 2) \
* v_s[i]**3) - p_max <= 0)
m.Equation((((4*x[i]*((1-x[i])**2) * rho * area) / 2) \
* v_s[i]**3) > 0)
gamma.append(gam)
#Equation
m.Equation(col_total - sum(gamma) <= 0)
#Objective
y = sum(gamma)
m.Maximize(y) # Maximize
#Set global options
m.options.IMODE = 3 #steady state optimization
#Solve simulation
m.options.SOLVER = 3
m.solve()
This gives a successful solution with maximized objective 20,000:
Number of Iterations....: 12
(scaled) (unscaled)
Objective...............: -4.7394814741924645e+00 -1.9999999999929641e+04
Dual infeasibility......: 4.4698510326511536e-07 1.8862194343304290e-03
Constraint violation....: 3.8275766582203308e-11 1.2941979026166479e-07
Complementarity.........: 2.1543608536533588e-09 9.0911246952931704e-06
Overall NLP error.......: 4.6245685940749926e-10 1.8862194343304290e-03
Number of objective function evaluations = 80
Number of objective gradient evaluations = 13
Number of equality constraint evaluations = 80
Number of inequality constraint evaluations = 0
Number of equality constraint Jacobian evaluations = 13
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations = 12
Total CPU secs in IPOPT (w/o function evaluations) = 0.010
Total CPU secs in NLP function evaluations = 0.011
EXIT: Optimal Solution Found.
The solution was found.
The final value of the objective function is -19999.9999999296
---------------------------------------------------
Solver : IPOPT (v3.12)
Solution time : 3.210000000399305E-002 sec
Objective : -19999.9999999296
Successful solution
---------------------------------------------------

Filling a matrix using parallel processing in Julia

I'm trying to speed up the solution time for a dynamic programming problem in Julia (v. 0.5.0), via parallel processing. The problem involves choosing the optimal values for every element of a 1073 x 19 matrix at every iteration, until successive matrix differences fall within a tolerance. I thought that, within each iteration, filling in the values for each element of the matrix could be parallelized. However, I'm seeing a huge performance degradation using SharedArray, and I'm wondering if there's a better way to approach parallel processing for this problem.
I construct the arguments for the function below:
est_params = [.788,.288,.0034,.1519,.1615,.0041,.0077,.2,0.005,.7196]
r = 0.015
tau = 0.35
rho =est_params[1]
sigma =est_params[2]
delta = 0.15
gamma =est_params[3]
a_capital =est_params[4]
lambda1 =est_params[5]
lambda2 =est_params[6]
s =est_params[7]
theta =est_params[8]
mu =est_params[9]
p_bar_k_ss =est_params[10]
beta = (1+r)^(-1)
sigma_range = 4
gz = 19
gp = 29
gk = 37
lnz=collect(linspace(-sigma_range*sigma,sigma_range*sigma,gz))
z=exp(lnz)
gk_m = fld(gk,2)
# Need to add mu somewhere to k_ss
k_ss = (theta*(1-tau)/(r+delta))^(1/(1-theta))
k=cat(1,map(i->k_ss*((1-delta)^i),collect(1:gk_m)),map(i->k_ss/((1-delta)^i),collect(1:gk_m)))
insert!(k,gk_m+1,k_ss)
sort!(k)
p_bar=p_bar_k_ss*k_ss
p = collect(linspace(-p_bar/2,p_bar,gp))
#Tauchen
N = length(z)
Z = zeros(N,1)
Zprob = zeros(Float32,N,N)
Z[N] = lnz[length(z)]
Z[1] = lnz[1]
zstep = (Z[N] - Z[1]) / (N - 1)
for i=2:(N-1)
Z[i] = Z[1] + zstep * (i - 1)
end
for a = 1 : N
for b = 1 : N
if b == 1
Zprob[a,b] = 0.5*erfc(-((Z[1] - mu - rho * Z[a] + zstep / 2) / sigma)/sqrt(2))
elseif b == N
Zprob[a,b] = 1 - 0.5*erfc(-((Z[N] - mu - rho * Z[a] - zstep / 2) / sigma)/sqrt(2))
else
Zprob[a,b] = 0.5*erfc(-((Z[b] - mu - rho * Z[a] + zstep / 2) / sigma)/sqrt(2)) -
0.5*erfc(-((Z[b] - mu - rho * Z[a] - zstep / 2) / sigma)/sqrt(2))
end
end
end
# Collecting tauchen results in a 2 element array of linspace and array; [2] gets array
# Zprob=collect(tauchen(gz, rho, sigma, mu, sigma_range))[2]
Zcumprob=zeros(Float32,gz,gz)
# 2 in cumsum! denotes the 2nd dimension, i.e. columns
cumsum!(Zcumprob, Zprob,2)
gm = gk * gp
control=zeros(gm,2)
for i=1:gk
control[(1+gp*(i-1)):(gp*i),1]=fill(k[i],(gp,1))
control[(1+gp*(i-1)):(gp*i),2]=p
end
endog=copy(control)
E=Array(Float32,gm,gm,gz)
for h=1:gm
for m=1:gm
for j=1:gz
# set the nonzero net debt indicator
if endog[h,2]<0
p_ind=1
else
p_ind=0
end
# set the investment indicator
if (control[m,1]-(1-delta)*endog[h,1])!=0
i_ind=1
else
i_ind=0
end
E[m,h,j] = (1-tau)*z[j]*(endog[h,1]^theta) + control[m,2]-endog[h,2]*(1+r*(1-tau)) +
delta*endog[h,1]*tau-(control[m,1]-(1-delta)*endog[h,1]) -
(i_ind*gamma*endog[h,1]+endog[h,1]*(a_capital/2)*(((control[m,1]-(1-delta)*endog[h,1])/endog[h,1])^2)) +
s*endog[h,2]*p_ind
elem = E[m,h,j]
if E[m,h,j]<0
E[m,h,j]=elem+lambda1*elem-.5*lambda2*elem^2
else
E[m,h,j]=elem
end
end
end
end
I then constructed the function with serial processing. The two for loops iterate through each element to find the largest value in a 1072-sized (=the gm scalar argument in the function) array:
function dynam_serial(E,gm,gz,beta,Zprob)
v = Array(Float32,gm,gz )
fill!(v,E[cld(gm,2),cld(gm,2),cld(gz,2)])
Tv = Array(Float32,gm,gz)
# Set parameters for the loop
convcrit = 0.0001 # chosen convergence criterion
diff = 1 # arbitrary initial value greater than convcrit
while diff>convcrit
exp_v=v*Zprob'
for h=1:gm
for j=1:gz
Tv[h,j]=findmax(E[:,h,j] + beta*exp_v[:,j])[1]
end
end
diff = maxabs(Tv - v)
v=copy(Tv)
end
end
Timing this, I get:
#time dynam_serial(E,gm,gz,beta,Zprob)
> 106.880008 seconds (91.70 M allocations: 203.233 GB, 15.22% gc time)
Now, I try using Shared Arrays to benefit from parallel processing. Note that I reconfigured the iteration so that I only have one for loop, rather than two. I also use v=deepcopy(Tv); otherwise, v is copied as an Array object, rather than a SharedArray:
function dynam_parallel(E,gm,gz,beta,Zprob)
v = SharedArray(Float32,(gm,gz),init = S -> S[Base.localindexes(S)] = myid() )
fill!(v,E[cld(gm,2),cld(gm,2),cld(gz,2)])
# Set parameters for the loop
convcrit = 0.0001 # chosen convergence criterion
diff = 1 # arbitrary initial value greater than convcrit
while diff>convcrit
exp_v=v*Zprob'
Tv = SharedArray(Float32,gm,gz,init = S -> S[Base.localindexes(S)] = myid() )
#sync #parallel for hj=1:(gm*gz)
j=cld(hj,gm)
h=mod(hj,gm)
if h==0;h=gm;end;
#async Tv[h,j]=findmax(E[:,h,j] + beta*exp_v[:,j])[1]
end
diff = maxabs(Tv - v)
v=deepcopy(Tv)
end
end
Timing the parallel version; and using a 4-core 2.5 GHz I7 processor with 16GB of memory, I get:
addprocs(3)
#time dynam_parallel(E,gm,gz,beta,Zprob)
> 164.237208 seconds (2.64 M allocations: 201.812 MB, 0.04% gc time)
Am I doing something incorrect here? Or is there a better way to approach parallel processing in Julia for this particular problem? I've considered using Distributed Arrays, but it's difficult for me to see how to apply them to the present problem.
UPDATE:
Per #DanGetz and his helpful comments, I turned instead to trying to speed up the serial processing version. I was able to get performance down to 53.469780 seconds (67.36 M allocations: 103.419 GiB, 19.12% gc time) through:
1) Upgrading to 0.6.0 (saved about 25 seconds), which includes the helpful #views macro.
2) Preallocating the main array I'm trying to fill in (Tv), per the section on Preallocating Outputs in the Julia Performance Tips: https://docs.julialang.org/en/latest/manual/performance-tips/. (saved another 25 or so seconds)
The biggest remaining slow-down seems to be coming from the add_vecs function, which sums together subarrays of two larger matrices. I've tried devectorizing and using BLAS functions, but haven't been able to produce better performance.
In any event, the improved code for dynam_serial is below:
function add_vecs(r::Array{Float32},h::Int,j::Int,E::Array{Float32},exp_v::Array{Float32},beta::Float32)
#views r=E[:,h,j] + beta*exp_v[:,j]
return r
end
function dynam_serial(E::Array{Float32},gm::Int,gz::Int,beta::Float32,Zprob::Array{Float32})
v = Array{Float32}(gm,gz)
fill!(v,E[cld(gm,2),cld(gm,2),cld(gz,2)])
Tv = Array{Float32}(gm,gz)
r = Array{Float32}(gm)
# Set parameters for the loop
convcrit = 0.0001 # chosen convergence criterion
diff = 1 # arbitrary initial value greater than convcrit
while diff>convcrit
exp_v=v*Zprob'
for h=1:gm
for j=1:gz
#views Tv[h,j]=findmax(add_vecs(r,h,j,E,exp_v,beta))[1]
end
end
diff = maximum(abs,Tv - v)
v=copy(Tv)
end
return Tv
end
If add_vecs seems to be the critical function, writing an explicit for loop could offer more optimization. How does the following benchmark:
function add_vecs!(r::Array{Float32},h::Int,j::Int,E::Array{Float32},
exp_v::Array{Float32},beta::Float32)
#inbounds for i=1:size(E,1)
r[i]=E[i,h,j] + beta*exp_v[i,j]
end
return r
end
UPDATE
To continue optimizing dynam_serial I have tried to remove more allocations. The result is:
function add_vecs_and_max!(gm::Int,r::Array{Float64},h::Int,j::Int,E::Array{Float64},
exp_v::Array{Float64},beta::Float64)
#inbounds for i=1:gm
r[i] = E[i,h,j]+beta*exp_v[i,j]
end
return findmax(r)[1]
end
function dynam_serial(E::Array{Float64},gm::Int,gz::Int,
beta::Float64,Zprob::Array{Float64})
v = Array{Float64}(gm,gz)
fill!(v,E[cld(gm,2),cld(gm,2),cld(gz,2)])
r = Array{Float64}(gm)
exp_v = Array{Float64}(gm,gz)
# Set parameters for the loop
convcrit = 0.0001 # chosen convergence criterion
diff = 1.0 # arbitrary initial value greater than convcrit
while diff>convcrit
A_mul_Bt!(exp_v,v,Zprob)
diff = -Inf
for h=1:gm
for j=1:gz
oldv = v[h,j]
newv = add_vecs_and_max!(gm,r,h,j,E,exp_v,beta)
v[h,j]= newv
diff = max(diff, oldv-newv, newv-oldv)
end
end
end
return v
end
Switching the functions to use Float64 should increase speed (as CPUs are inherently optimized for 64-bit word lengths). Also, using the mutating A_mul_Bt! directly saves another allocation. Avoiding the copy(...) by switching the arrays v and Tv.
How do these optimizations improve your running time?
2nd UPDATE
Updated the code in the UPDATE section to use findmax. Also, changed dynam_serial to use v without Tv, as there was no need to save the old version except for the diff calculation, which is now done inside the loop.
Here's the code I copied-and-pasted, provided by Dan Getz above. I include the array and scalar definitions exactly as I ran them. Performance was: 39.507005 seconds (11 allocations: 486.891 KiB) when running #time dynam_serial(E,gm,gz,beta,Zprob).
using SpecialFunctions
est_params = [.788,.288,.0034,.1519,.1615,.0041,.0077,.2,0.005,.7196]
r = 0.015
tau = 0.35
rho =est_params[1]
sigma =est_params[2]
delta = 0.15
gamma =est_params[3]
a_capital =est_params[4]
lambda1 =est_params[5]
lambda2 =est_params[6]
s =est_params[7]
theta =est_params[8]
mu =est_params[9]
p_bar_k_ss =est_params[10]
beta = (1+r)^(-1)
sigma_range = 4
gz = 19 #15 #19
gp = 29 #19 #29
gk = 37 #25 #37
lnz=collect(linspace(-sigma_range*sigma,sigma_range*sigma,gz))
z=exp.(lnz)
gk_m = fld(gk,2)
# Need to add mu somewhere to k_ss
k_ss = (theta*(1-tau)/(r+delta))^(1/(1-theta))
k=cat(1,map(i->k_ss*((1-delta)^i),collect(1:gk_m)),map(i->k_ss/((1-delta)^i),collect(1:gk_m)))
insert!(k,gk_m+1,k_ss)
sort!(k)
p_bar=p_bar_k_ss*k_ss
p = collect(linspace(-p_bar/2,p_bar,gp))
#Tauchen
N = length(z)
Z = zeros(N,1)
Zprob = zeros(Float64,N,N)
Z[N] = lnz[length(z)]
Z[1] = lnz[1]
zstep = (Z[N] - Z[1]) / (N - 1)
for i=2:(N-1)
Z[i] = Z[1] + zstep * (i - 1)
end
for a = 1 : N
for b = 1 : N
if b == 1
Zprob[a,b] = 0.5*erfc(-((Z[1] - mu - rho * Z[a] + zstep / 2) / sigma)/sqrt(2))
elseif b == N
Zprob[a,b] = 1 - 0.5*erfc(-((Z[N] - mu - rho * Z[a] - zstep / 2) / sigma)/sqrt(2))
else
Zprob[a,b] = 0.5*erfc(-((Z[b] - mu - rho * Z[a] + zstep / 2) / sigma)/sqrt(2)) -
0.5*erfc(-((Z[b] - mu - rho * Z[a] - zstep / 2) / sigma)/sqrt(2))
end
end
end
# Collecting tauchen results in a 2 element array of linspace and array; [2] gets array
# Zprob=collect(tauchen(gz, rho, sigma, mu, sigma_range))[2]
Zcumprob=zeros(Float64,gz,gz)
# 2 in cumsum! denotes the 2nd dimension, i.e. columns
cumsum!(Zcumprob, Zprob,2)
gm = gk * gp
control=zeros(gm,2)
for i=1:gk
control[(1+gp*(i-1)):(gp*i),1]=fill(k[i],(gp,1))
control[(1+gp*(i-1)):(gp*i),2]=p
end
endog=copy(control)
E=Array(Float64,gm,gm,gz)
for h=1:gm
for m=1:gm
for j=1:gz
# set the nonzero net debt indicator
if endog[h,2]<0
p_ind=1
else
p_ind=0
end
# set the investment indicator
if (control[m,1]-(1-delta)*endog[h,1])!=0
i_ind=1
else
i_ind=0
end
E[m,h,j] = (1-tau)*z[j]*(endog[h,1]^theta) + control[m,2]-endog[h,2]*(1+r*(1-tau)) +
delta*endog[h,1]*tau-(control[m,1]-(1-delta)*endog[h,1]) -
(i_ind*gamma*endog[h,1]+endog[h,1]*(a_capital/2)*(((control[m,1]-(1-delta)*endog[h,1])/endog[h,1])^2)) +
s*endog[h,2]*p_ind
elem = E[m,h,j]
if E[m,h,j]<0
E[m,h,j]=elem+lambda1*elem-.5*lambda2*elem^2
else
E[m,h,j]=elem
end
end
end
end
function add_vecs_and_max!(gm::Int,r::Array{Float64},h::Int,j::Int,E::Array{Float64},
exp_v::Array{Float64},beta::Float64)
maxr = -Inf
#inbounds for i=1:gm r[i] = E[i,h,j]+beta*exp_v[i,j]
maxr = max(r[i],maxr)
end
return maxr
end
function dynam_serial(E::Array{Float64},gm::Int,gz::Int,
beta::Float64,Zprob::Array{Float64})
v = Array{Float64}(gm,gz)
fill!(v,E[cld(gm,2),cld(gm,2),cld(gz,2)])
Tv = Array{Float64}(gm,gz)
r = Array{Float64}(gm)
exp_v = Array{Float64}(gm,gz)
# Set parameters for the loop
convcrit = 0.0001 # chosen convergence criterion
diff = 1.0 # arbitrary initial value greater than convcrit
while diff>convcrit
A_mul_Bt!(exp_v,v,Zprob)
diff = -Inf
for h=1:gm
for j=1:gz
Tv[h,j]=add_vecs_and_max!(gm,r,h,j,E,exp_v,beta)
diff = max(abs(Tv[h,j]-v[h,j]),diff)
end
end
(v,Tv)=(Tv,v)
end
return v
end
Now, here's another version of the algorithm and inputs. The functions are similar to what Dan Getz suggested, except that I use findmax rather than an iterated max function to find the array maximum. In the input construction, I am using both Float32 and mixing different bit-types together. However, I've consistently achieved better performance this way: 24.905569 seconds (1.81 k allocations: 46.829 MiB, 0.01% gc time). But it's not clear at all why.
using SpecialFunctions
est_params = [.788,.288,.0034,.1519,.1615,.0041,.0077,.2,0.005,.7196]
r = 0.015
tau = 0.35
rho =est_params[1]
sigma =est_params[2]
delta = 0.15
gamma =est_params[3]
a_capital =est_params[4]
lambda1 =est_params[5]
lambda2 =est_params[6]
s =est_params[7]
theta =est_params[8]
mu =est_params[9]
p_bar_k_ss =est_params[10]
beta = Float32((1+r)^(-1))
sigma_range = 4
gz = 19
gp = 29
gk = 37
lnz=collect(linspace(-sigma_range*sigma,sigma_range*sigma,gz))
z=exp(lnz)
gk_m = fld(gk,2)
# Need to add mu somewhere to k_ss
k_ss = (theta*(1-tau)/(r+delta))^(1/(1-theta))
k=cat(1,map(i->k_ss*((1-delta)^i),collect(1:gk_m)),map(i->k_ss/((1-delta)^i),collect(1:gk_m)))
insert!(k,gk_m+1,k_ss)
sort!(k)
p_bar=p_bar_k_ss*k_ss
p = collect(linspace(-p_bar/2,p_bar,gp))
#Tauchen
N = length(z)
Z = zeros(N,1)
Zprob = zeros(Float32,N,N)
Z[N] = lnz[length(z)]
Z[1] = lnz[1]
zstep = (Z[N] - Z[1]) / (N - 1)
for i=2:(N-1)
Z[i] = Z[1] + zstep * (i - 1)
end
for a = 1 : N
for b = 1 : N
if b == 1
Zprob[a,b] = 0.5*erfc(-((Z[1] - mu - rho * Z[a] + zstep / 2) / sigma)/sqrt(2))
elseif b == N
Zprob[a,b] = 1 - 0.5*erfc(-((Z[N] - mu - rho * Z[a] - zstep / 2) / sigma)/sqrt(2))
else
Zprob[a,b] = 0.5*erfc(-((Z[b] - mu - rho * Z[a] + zstep / 2) / sigma)/sqrt(2)) -
0.5*erfc(-((Z[b] - mu - rho * Z[a] - zstep / 2) / sigma)/sqrt(2))
end
end
end
# Collecting tauchen results in a 2 element array of linspace and array; [2] gets array
# Zprob=collect(tauchen(gz, rho, sigma, mu, sigma_range))[2]
Zcumprob=zeros(Float32,gz,gz)
# 2 in cumsum! denotes the 2nd dimension, i.e. columns
cumsum!(Zcumprob, Zprob,2)
gm = gk * gp
control=zeros(gm,2)
for i=1:gk
control[(1+gp*(i-1)):(gp*i),1]=fill(k[i],(gp,1))
control[(1+gp*(i-1)):(gp*i),2]=p
end
endog=copy(control)
E=Array(Float32,gm,gm,gz)
for h=1:gm
for m=1:gm
for j=1:gz
# set the nonzero net debt indicator
if endog[h,2]<0
p_ind=1
else
p_ind=0
end
# set the investment indicator
if (control[m,1]-(1-delta)*endog[h,1])!=0
i_ind=1
else
i_ind=0
end
E[m,h,j] = (1-tau)*z[j]*(endog[h,1]^theta) + control[m,2]-endog[h,2]*(1+r*(1-tau)) +
delta*endog[h,1]*tau-(control[m,1]-(1-delta)*endog[h,1]) -
(i_ind*gamma*endog[h,1]+endog[h,1]*(a_capital/2)*(((control[m,1]-(1-delta)*endog[h,1])/endog[h,1])^2)) +
s*endog[h,2]*p_ind
elem = E[m,h,j]
if E[m,h,j]<0
E[m,h,j]=elem+lambda1*elem-.5*lambda2*elem^2
else
E[m,h,j]=elem
end
end
end
end
function add_vecs!(gm::Int,r::Array{Float32},h::Int,j::Int,E::Array{Float32},
exp_v::Array{Float32},beta::Float32)
#inbounds #views for i=1:gm
r[i]=E[i,h,j] + beta*exp_v[i,j]
end
return r
end
function dynam_serial(E::Array{Float32},gm::Int,gz::Int,beta::Float32,Zprob::Array{Float32})
v = Array{Float32}(gm,gz)
fill!(v,E[cld(gm,2),cld(gm,2),cld(gz,2)])
Tv = Array{Float32}(gm,gz)
# Set parameters for the loop
convcrit = 0.0001 # chosen convergence criterion
diff = 1.00000 # arbitrary initial value greater than convcrit
iter=0
exp_v=Array{Float32}(gm,gz)
r=Array{Float32}(gm)
while diff>convcrit
A_mul_Bt!(exp_v,v,Zprob)
for h=1:gm
for j=1:gz
Tv[h,j]=findmax(add_vecs!(gm,r,h,j,E,exp_v,beta))[1]
end
end
diff = maximum(abs,Tv - v)
(v,Tv)=(Tv,v)
end
return v
end

SciPy: way to speed up a complicated integral

I have a very complex integral to calculate:
from __future__ import division
from scipy.integrate import quad, nquad
import numpy as np
alpha = np.array([0.298073, 1.242567, 5.782948, 38.474970])
trial = np.array([0.08704173, 0.52509737, 0.51920929, 0.31233737])
class EigenvalueProblem:
def __init__(self, a, t):
self.alpha = a
self.trial = t
# Hamiltonian, interaction part
def hartree_integrand(self, coeff):
def hartree_potential(rr2):
return np.array([coeff[ii] * coeff[jj] *
np.exp(-(self.alpha[ii] +
self.alpha[jj]) * rr2 ** 2)
for ii in range(0, 4) for jj in range(0, 4)]).sum()
def length(theta, rr1, rr2):
return 1 / np.sqrt(rr1 ** 2 + rr2 ** 2 -
2 * rr1 * rr2 * np.cos(theta))
def tmp(theta, rr1, rr2):
return 8 * np.pi ** 2 * rr1 ** 2 * rr2 ** 2 * \
np.sin(theta) * hartree_potential(rr2) * \
length(theta, rr1, rr2)
def integrand(ii, jj, theta, rr1, rr2):
return np.exp(-(self.alpha[ii] + self.alpha[jj]) * rr1 ** 2) * tmp(theta, rr1, rr2)
return [
nquad(lambda theta, rr1, rr2: integrand(i, j, theta, rr1, rr2),
[[0, np.pi], [0, np.inf], [0, np.inf]]) for i in range(0, 4) for j in range(0, 4)]
hat = EigenvalueProblem(alpha, trial)
print hat.hartree_integrand(trial)
mathematically what I want to calculate is like this (which is the integrand function), with paremeters here. However, it takes more than several hours to compute this integral. I wonder is there any method to speed up it? Thank you very much!
You ought first to extend limits in integration over r1 and r2 to be from -Infinity to +Infinity - extend limits, multiply by 1/2*1/2, etc
Second, switch to use Gauss-Hermite quadrature, which is exactly suited to integrate function with e-x2 kernels.
Appropriate code is in NumPy, see references therein

Why do the weights of my iterative reweighted least square algorithm for logistic regression always end up with NaN?

I'm trying to code a iterative reweighted least square algorithm for logistic regression for face recognition (images represented as a 19x19 greyscale image) but the weights always end up with NaN.
w_new = zeros(361,1);
for i = 1:35 % 100-fold cross-validation of 3480 samples
[ phi, t, ~, ~ ] = removeRows100FoldCV(i, trainx, traint);
t( t == -1 ) = 0;
while(true)
w_old = w_new;
y = computeYs(w_old, phi);
R = generateR(y);
w_new = w_old - inv(phi' * R * phi) * phi' * (y - t);
if onlyMarginalChangesInW(w_new, w_old) == true
break;
end
end
end
The target vector t is originally 1 or -1 depending of the image representing a face or not.
Computation of y:
function [ y ] = computeYs( w, phi )
y = zeros(size(phi,1), 1);
for i = 1:size(phi,1)
a = w' * phi(i,:)';
y(i) = 1/(1+exp(-a));
end
end
Generation of R:
function [ R ] = generateR( y )
R = zeros(size(y));
for i = 1:size(R,1)
R(i,i) = y(i) * (1 - y(i));
end
end
And the break condition trigger:
function [ result ] = onlyMarginalChangesInW( w_new, w_old )
result = true;
for i = 1:size(w_new)
if (w_new(i) / w_old(i) > 0.01)
result = false;
break;
end
end
end
The NaN result occurs in your inv(phi' * R * phi). Did you check your phi? Try cond(phi) to check whether it is very large. That may cause that inverse operation provides a huge elements.
By the way, I am trying to understand why it is not w_new = w_old - inv(phi' * R * phi) * phi' * R * (y - t); in your iterative reweighted least square algorithm implementation?

Resources