Sympy Broadcasting Matrix-Vector Addition for MLP

I'm trying to get a grasp on declarative programming, so I've started learning SymPy, and my "hello world" is to represent a standard MLP by converting a vanilla numpy implementation I found online. I am getting stuck trying to add the bias vector. Is there a differentiable way to do this operation in SymPy?
#! /usr/bin/env python3
import numpy as np
import sympy as sp

i = 3  # number of inputs
o = 1  # number of outputs
x = sp.Symbol('x')
w = sp.Symbol('w')
b = sp.Symbol('b')
y = sp.Symbol('y')
Φ = sp.tanh  # activation function
mlp = Φ(x*w + b)
L = lambda a, e: a - e  # loss function
C = L(mlp, y)
dC = sp.diff(C, w)  # partial derivative of C with respect to each weight
η = 0.01  # learning rate

if __name__ == "__main__":
    np.random.seed(1)  # seed NumPy's RNG; random.seed would not affect np.random.rand below
    train_inputs = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
    train_outputs = np.array([[0, 1, 1, 0]]).T
    W = 2 * np.random.rand(i, o) - 1  # TODO parameterize initialization
    W = sp.Matrix(W)
    B = 2 * np.random.rand(1, o) - 1  # TODO parameterize initialization
    B = sp.Matrix(B)
    for temp, ye in zip(train_inputs, train_outputs):
        X = sp.Matrix(temp)
        ya = mlp.subs({'x': X, 'w': W, 'b': B}).n()
        Δ = dC.subs({'y': ye, 'x': X, 'b': B, 'w': W}).n()
        W -= η * Δ
        B -= η * Δ  # was `b -= ...`, which reassigned the Symbol, not the bias matrix
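For what it's worth, here is a minimal sketch (using per-element symbols, which are not in the post above) of how the affine step and the activation can stay symbolic and differentiable: sp.tanh does not broadcast over a Matrix, but Matrix.applyfunc applies it entry by entry.

import sympy as sp

# Sketch: explicit element symbols keep W*X + B and the elementwise
# activation symbolic and differentiable.
w1, w2, w3, b1 = sp.symbols('w1 w2 w3 b1')
x1, x2, x3 = sp.symbols('x1 x2 x3')
W = sp.Matrix([[w1, w2, w3]])  # 1x3 weight row
X = sp.Matrix([x1, x2, x3])    # 3x1 input column
B = sp.Matrix([b1])            # 1x1 bias

out = (W * X + B).applyfunc(sp.tanh)  # tanh applied entry by entry
dout_dw1 = sp.diff(out[0], w1)        # partial w.r.t. a single weight

From there, out.jacobian(sp.Matrix([w1, w2, w3])) collects all the weight partials at once, playing the role of dC in the training loop.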

Related

Monte Carlo program throws a method error in Julia

I am running this code but it shows a method error. Can someone please help me?
Code:
function lsmc_am_put(S, K, r, σ, t, N, P)
    Δt = t / N
    R = exp(r * Δt)
    T = typeof(S * exp(-σ^2 * Δt / 2 + σ * √Δt * 0.1) / R)
    X = Array{T}(N+1, P)
    for p = 1:P
        X[1, p] = x = S
        for n = 1:N
            x *= R * exp(-σ^2 * Δt / 2 + σ * √Δt * randn())
            X[n+1, p] = x
        end
    end
    V = [max(K - x, 0) / R for x in X[N+1, :]]
    for n = N-1:-1:1
        I = V .!= 0
        A = [x^d for d = 0:3, x in X[n+1, :]]
        β = A[:, I]' \ V[I]
        cV = A' * β
        for p = 1:P
            ev = max(K - X[n+1, p], 0)
            if I[p] && cV[p] < ev
                V[p] = ev / R
            else
                V[p] /= R
            end
        end
    end
    return max(mean(V), K - S)
end
lsmc_am_put(100, 90, 0.05, 0.3, 180/365, 1000, 10000)
error:
MethodError: no method matching (Array{Float64})(::Int64, ::Int64)
Closest candidates are:
  (Array{T})(::LinearAlgebra.UniformScaling, ::Integer, ::Integer) where T at /Volumes/Julia-1.8.3/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/uniformscaling.jl:508
  (Array{T})(::Nothing, ::Any...) where T at baseext.jl:45
  (Array{T})(::UndefInitializer, ::Int64) where T at boot.jl:473
  ...
Stacktrace:
 [1] lsmc_am_put(S::Int64, K::Int64, r::Float64, σ::Float64, t::Float64, N::Int64, P::Int64)
   @ Main ./REPL[39]:5
 [2] top-level scope
   @ REPL[40]:1
I tried this code expecting a numeric answer, but this error came up. I tried to look it up on Google but found nothing that matches my situation.
The error occurs where you wrote X = Array{T}(N+1, P). Since Julia 1.0, the Array constructor must be given either the contents or undef together with the dimensions. If you need a Vector, use one of the following approaches:
julia> Array{Float64, 1}([1, 2, 3])
3-element Vector{Float64}:
 1.0
 2.0
 3.0

julia> Vector{Float64}([1, 2, 3])
3-element Vector{Float64}:
 1.0
 2.0
 3.0
In your case that would be X = Array{T,1}([N+1, P]) or X = Vector{T}([N+1, P]). But since your code contains the expression X[1, p] = x = S, I guess you actually mean to initialize a 2D array and update its elements as the algorithm runs. For that, you can define X like the following:
X = zeros(Float64, N+1, P)
# Or
X = Array{Float64, 2}(undef, N+1, P)
So I tried the following in your code:

# I just changed the definition of `X` like the following
X = Array{T, 2}(undef, N+1, P)

And the result was:
julia> lsmc_am_put(100, 90, 0.05, 0.3, 180/365, 1000, 10000)
3.329213731484463

MVGC F value and P value calculation

I am currently working on finding the Multivariate Granger Causality F value and p value via this code.
import numpy as np

def demean(x, axis=0):
    """Return x minus its mean along the specified axis."""
    x = np.asarray(x)
    if axis == 0 or axis is None or x.ndim <= 1:
        return x - x.mean(axis)
    ind = [slice(None)] * x.ndim
    ind[axis] = np.newaxis
    return x - x.mean(axis)[tuple(ind)]

# ------------------------------
def tsdata_to_autocov(X, q):
    """Estimate the autocovariance sequence of X up to lag q."""
    if len(X.shape) == 2:
        X = np.expand_dims(X, axis=2)
    [n, m, N] = np.shape(X)
    X = demean(X, axis=1)
    G = np.zeros((n, n, q+1))
    for k in range(q+1):
        M = N * (m-k)
        # note the parentheses: normalize by (M - 1), not (... / M) - 1
        G[:, :, k] = np.dot(np.reshape(X[:, k:m, :], (n, M)),
                            np.reshape(X[:, 0:m-k, :], (n, M)).conj().T) / (M - 1)
    return G

# -------------------------
def autocov_to_mvgc(G, x, y):
    """Conditional Granger causality F statistic from y to x given G."""
    from mvgc import autocov_to_var

    n = G.shape[0]
    z = np.arange(n)
    z = np.delete(z, [np.array(np.hstack((x, y)))])
    # indices of other variables (to condition out)
    xz = np.array(np.hstack((x, z)))
    xzy = np.array(np.hstack((xz, y)))
    # full regression
    ixgrid1 = np.ix_(xzy, xzy)
    [AF, SIG] = autocov_to_var(G[ixgrid1])
    # reduced regression
    ixgrid2 = np.ix_(xz, xz)
    [AF, SIGR] = autocov_to_var(G[ixgrid2])
    ixgrid3 = np.ix_(x, x)
    F = np.log(np.linalg.det(SIGR[ixgrid3])) - np.log(np.linalg.det(SIG[ixgrid3]))
    return F
Can anyone show me an example of how to get from here to F and its p value?
It would also help a lot to see what your time-series data looks like.
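One common route to the p value, sketched here under the large-sample chi-squared approximation used by the MVGC toolbox (mvgc_pvalue, n_obs, morder, nx and ny are illustrative names, not from the post):

from scipy.stats import chi2

def mvgc_pvalue(F, n_obs, morder, nx, ny):
    """Sketch of a large-sample chi-squared test for a GC statistic.

    Under the null of no causality, n_obs * F is asymptotically chi^2
    distributed with morder * nx * ny degrees of freedom.

    F      -- GC statistic returned by autocov_to_mvgc
    n_obs  -- effective number of observations used in the regressions
    morder -- VAR model order (roughly q in tsdata_to_autocov)
    nx, ny -- number of target variables x and source variables y
    """
    df = morder * nx * ny
    return chi2.sf(n_obs * F, df)  # survival function = 1 - cdf

# e.g. p = mvgc_pvalue(F, n_obs=(m - q) * N, morder=q, nx=len(x), ny=len(y))

The exact bookkeeping for n_obs (how lagged observations are counted) depends on the toolbox conventions, so treat the call above as a template rather than a drop-in formula.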

Tricks to improve the performance of a custom function in Julia

I am replicating in Julia a sequence of steps originally written in Matlab. In Octave, this procedure takes 1.4582 seconds; in Julia (using Jupyter) it takes approximately 10 seconds. I'll try to keep the scripts brief. My goal is to match or beat Octave's performance. First of all, I will describe my variables and some functions:
zgrid (double 1x7 size)
kgrid (double 500x1 size)
V0 (double 500x7 size)
P (double 7x7 size) a transition matrix
delta and beta are fixed parameters.
F(z,k) and u(c) are particular functions and are specified in the Julia script.
% Octave script
% V0 is given
[K, Z, K2] = meshgrid(kgrid, zgrid, kgrid);
K = permute(K, [2, 1, 3]);
Z = permute(Z, [2, 1, 3]);
K2 = permute(K2, [2, 1, 3]);
C = max(f(Z,K) + (1-delta)*K - K2,0);
U = u(C);
EV = V0*P'; % EV is a 500x7 matrix
EV = permute(repmat(EV, 1, 1, 500), [3, 2, 1]);
H = U + beta*EV;
[TV, index] = max(H, [], 3);
In Julia, I created a function that replicates this procedure. I used loops, but it is about 9 times slower.
# Julia script
# V0 is the input of my T operator function
V0 = repeat(sqrt.(kgrid), outer = [1, 7]);
F = (z, k) -> exp(z) * (k^α);
u = (c) -> (c^(1-μ) - 1)/(1-μ)
# parameters
α = 1/3
β = 0.987
δ = 0.012;
μ = 2
Kss = 48.1905148382166
kgrid = range(0.75*Kss, stop=1.25*Kss, length=500);
zgrid = [-0.06725382459813659, -0.044835883065424395, -0.0224179415327122, 0, 0.022417941532712187, 0.04483588306542438, 0.06725382459813657]
function T(V)
    E = V*P'
    T1 = zeros(Float64, 500, 7)
    aux = zeros(Float64, 500)
    for i = 1:7
        for j = 1:500
            for l = 1:500
                c = maximum((F(zgrid[i], kgrid[j]) + (1-δ)*kgrid[j] - kgrid[l], 0))
                aux[l] = u(c) + β*E[l, i]
            end
            T1[j, i] = maximum(aux)
        end
    end
    return T1
end
I would very much like to improve the performance of my Julia code. I believe there is a way, but I am new to Julia programming.
This code runs for me in 5ms. Note that I have made F and u into proper (not anonymous) functions, F_ and u_, but you could get a similar effect by making the anonymous functions const.
Your main problem is that you have a lot of non-const global variables, and also that your main function is doing unnecessary work multiple times, and creating an unnecessary array, aux.
The performance tips section in the manual is essential reading: https://docs.julialang.org/en/v1/manual/performance-tips/
F_(z, k) = exp(z) * (k^(1/3))  # you can still use α, but it must be const
u_(c) = (c^(1-2) - 1)/(1-2)    # μ = 2 inlined
function T_(V, P, kgrid, zgrid, β, δ)
    E = V * P'
    T1 = similar(V)
    for i in axes(T1, 2)
        for j in axes(T1, 1)
            temp = F_(zgrid[i], kgrid[j]) + (1-δ)*kgrid[j]
            aux = -Inf
            for l in eachindex(kgrid)
                c = max(0.0, temp - kgrid[l])
                aux = max(aux, u_(c) + β * E[l, i])
            end
            T1[j, i] = aux
        end
    end
    return T1
end
Benchmark:
V0 = repeat(sqrt.(kgrid), outer = [1,7]);
zgrid = sort!(rand(1, 7); dims=2)
kgrid = sort!(rand(500, 1); dims=1)
P = rand(length(zgrid), length(zgrid))
@btime T_($V0, $P, $kgrid, $zgrid, $β, $δ);
# output: 5.126 ms (4 allocations: 54.91 KiB)
The following should perform much better. The most noticeable differences are that it calls F 500x fewer times and doesn't rely on global variables.
function T(V, P, kgrid, zgrid, β, δ)  # P passed in so no globals are used
    E = V*P'
    T1 = zeros(Float64, 500, 7)
    for j = 1:500
        for i = 1:7
            x = F(zgrid[i], kgrid[j]) + (1-δ)*kgrid[j]
            T1[j, i] = maximum(u(max(x - kgrid[l], 0)) + β*E[l, i] for l in 1:500)
        end
    end
    return T1
end

Basinhopping algorithm returns 'RuntimeWarning: invalid value encountered in double_scalars'

Hi everyone. So, I'm having a problem with the basinhopping algorithm. It works quite well for all my other datasets, but for one specific dataset it returns the error mentioned in the title. The goal is to solve a non-linear system of three equations in three unknowns (x[0], x[1], x[2]). The code is summarized below:
import numpy as np
from scipy.optimize import basinhopping

xmin = [-10, -1, 0]
xmax = [10, 1, 1]
bnds = np.array([(low, high) for low, high in zip(xmin, xmax)])
x0 = np.array([0, 0, 0])
Solution = np.zeros((len(Vp), 3))
kwargs = dict(bounds=bnds, tol=1.0e-16)

for ii in range(0, len(Vp)):
    def fun(x):
        return ((C[1]*x[1] + C[2]*x[2] + C[3]*x[0]*x[2] + C[4]*x[1]**2 +
                 C[5]*x[1]*x[2] + C[6]**x[2]**2 - Vp[ii])**2 +
                (D[1]*x[1] + D[2]*x[0]*x[1] - Vs[ii])**2 +
                (E[1]*x[1] + E[2]*x[0]**2 +
                 E[3]*x[0]*x[1] - Density[ii])**2)
    sol = basinhopping(fun, x0, minimizer_kwargs=kwargs)
    Solution[ii, :] = np.asarray(sol.x)
    x0 = Solution[ii, :]
The values of C, D, E, Vp, Vs and Density are
C = [ 0. , 0.23602278, -0.03112353, -0.00200731, 0.12631723,
0.01686503, -0.03604464]
D = [ 0. , -0.01411336, -0.00048822]
E = [ 0.00000000e+00, 2.74052684e-02, -2.51367338e-05,
9.57081839e-04]
Vp = [ 0.026575, -0.076332, -0.063064, -0.058712, -0.103915, -0.145887,
-0.080618, -0.079216, -0.171163, -0.115634, -0.076444]
Vs = [ 0.002064, -0.000847, 0.002788, 0.004255, 0.003588, 0.003474,
0.001675, 0.004111, 0.006459, 0.004753, 0.009658]
Density = [ 0.001782, -0.001433, -0.002109, -0.002997, -0.009515, -0.005077,
-0.004877, -0.013798, -0.010524, -0.015607, -0.017562]
I think I may be missing something quite basic here, because this code has been working so far. I realize this error is usually related to dividing by zero, but I do not see how that could be happening in this case... Any tips?
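One way to pin down where the invalid value first appears is to promote NumPy's floating-point warnings to exceptions. This is a debugging sketch, with objective as a hypothetical stand-in for fun:

import numpy as np

# Debugging sketch: np.errstate turns "invalid value" / "divide by zero"
# warnings into FloatingPointError, so the offending input can be caught
# and inspected. `objective` is a stand-in for the `fun` defined above.
def objective(x):
    return np.float64(x[0]) ** 0.5  # goes invalid (NaN) for x[0] < 0

x = np.array([-1.0, 0.0, 0.0])
try:
    with np.errstate(invalid='raise', divide='raise'):
        objective(x)
except FloatingPointError as err:
    print('invalid value first produced at x =', x, '->', err)

Wrapping the real objective this way inside the loop would reveal which term (and which ii) produces the NaN.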

Scipy - a better way to avoid a manual loop when the matrix is sparse

Logistic regression's (L2-regularized) objective function is

    f(w) = (1/2) w^T w + C * sum_i log(1 + exp(-y_i w^T x_i))

and the gradient is

    grad f(w) = w - C * sum_i [y_i / (1 + exp(y_i w^T x_i))] x_i

where w is a scipy csr sparse matrix with dim n-by-1.
My question is: given a scipy csr sparse matrix X_train and a numpy array y_train (each row of X_train is an x_i, and each element of y_train is the corresponding y_i), is there a better way to calculate the gradient than a manual for loop?
For context, I'm implementing large-scale logistic regression, so performance is important.
Thanks.
Update 5/19 (adding my current code)
Thanks to @Jaime's reminder, here is my code. I basically want to know whether there is a better way to implement gradient(X, y, w).
import numpy as np
import scipy as sp
from sklearn import datasets
from numpy.linalg import norm
from scipy import sparse

eta = 0.01
xi = 0.1
C = 1

X_train, y_train = datasets.load_svmlight_file('lr/datasets/a9a')
X_test, y_test = datasets.load_svmlight_file('lr/datasets/a9a.t', n_features=X_train.shape[1])

def gradient(X, y, w):
    # w should be a col vector
    summation = w
    for i in range(X.shape[0]):
        exp_i = np.exp(y[i] * X.getrow(i).dot(w)[0, 0])
        summation = summation - (y[i] / (1 + exp_i)) * X.getrow(i).T
    return summation

def hes_mul(X, D, s):
    # w and s should be col vectors; returns a col vector
    return s + C * X.T.dot(D.dot(X.dot(s)))

def cg(X, y, w):
    # gradF is a col vector, so all of these are col vectors
    gradF = gradient(X, y, w)
    s = sparse.csr_matrix(np.zeros(X_train.shape[1])).T
    r = -1 * gradF
    d = r
    D = []
    for i in range(X.shape[0]):
        exp_i = np.exp((-1) * y[i] * w.T.dot(X.getrow(i).T)[0, 0])
        D.append(exp_i / ((1 + exp_i) ** 2))
    D = sparse.diags(D, 0)
    while True:
        r_norm = np.sqrt((r.data ** 2).sum())
        print(r_norm)
        print(np.sqrt((gradF.data ** 2).sum()))
        if r_norm <= xi * np.sqrt((gradF.data ** 2).sum()):
            return s
        hes_mul_d = hes_mul(X, D, d)
        alpha = (r_norm ** 2) / d.T.dot(hes_mul_d)[0, 0]
        s = s + alpha * d
        r = r - alpha * hes_mul_d
        beta = (r.data ** 2).sum() / (r_norm ** 2)
        d = r + beta * d

w = sparse.csr_matrix(np.zeros(X_train.shape[1])).T
s = cg(X_train, y_train, w)
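For reference, a hedged sketch of how the row-by-row loop in gradient(X, y, w) collapses into sparse matrix-vector products, assuming w is kept as a dense 1-D numpy array rather than a sparse column, and C = 1 as in the code above:

import numpy as np

def gradient_vectorized(X, y, w):
    """Sketch: the same gradient as the loop above, without getrow().

    X -- scipy csr matrix, shape (n_samples, n_features)
    y -- dense 1-D array of labels in {-1, +1}
    w -- dense 1-D weight array (dense by assumption; usually faster here)
    """
    yz = y * X.dot(w)                # y_i * x_i^T w for every row at once
    coef = -y / (1.0 + np.exp(yz))   # per-sample scalar factor
    return w + X.T.dot(coef)         # w - sum_i [y_i / (1 + e^{y_i x_i^T w})] x_i

# The diagonal D built in cg() falls out of the same quantities:
# sig = 1.0 / (1.0 + np.exp(yz)); D = sig * (1.0 - sig)

The two sparse products X.dot(w) and X.T.dot(coef) do all the per-row work inside scipy's compiled code, which is the main win over calling getrow(i) in a Python loop.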
