Error in x, because it is not a numeric matrix - matrix-inverse

Here is the function I wrote:
myOLS <- function(y = dependent_var, x = independent_vars) {
as.matrix(y)
as.vector(x)
b <- inv((t(x) %*% x) %*% (t(x)) %*% y)
y_hat <- x %*% b
res <- y - y_hat
var(res)
print(b)
print(var(res))
}
myOLS()
But this is not working. If I run myOLS(), an error occurs that says this: Error in Inverse(X, tol = sqrt(.Machine$double.eps), ...) : X must be a square numeric matrix
Can someone tell me what I did wrong?

Related

Monte Carlo program throws a method error in Julia

I am running this code but it shows a method error. Can someone please help me?
Code:
function lsmc_am_put(S, K, r, σ, t, N, P)
Δt = t / N
R = exp(r * Δt)
T = typeof(S * exp(-σ^2 * Δt / 2 + σ * √Δt * 0.1) / R)
X = Array{T}(N+1, P)
for p = 1:P
X[1, p] = x = S
for n = 1:N
x *= R * exp(-σ^2 * Δt / 2 + σ * √Δt * randn())
X[n+1, p] = x
end
end
V = [max(K - x, 0) / R for x in X[N+1, :]]
for n = N-1:-1:1
I = V .!= 0
A = [x^d for d = 0:3, x in X[n+1, :]]
β = A[:, I]' \ V[I]
cV = A' * β
for p = 1:P
ev = max(K - X[n+1, p], 0)
if I[p] && cV[p] < ev
V[p] = ev / R
else
V[p] /= R
end
end
end
return max(mean(V), K - S)
end
lsmc_am_put(100, 90, 0.05, 0.3, 180/365, 1000, 10000)
error:
MethodError: no method matching (Array{Float64})(::Int64, ::Int64)
Closest candidates are:
(Array{T})(::LinearAlgebra.UniformScaling, ::Integer, ::Integer) where T at /Volumes/Julia-1.8.3/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/uniformscaling.jl:508
(Array{T})(::Nothing, ::Any...) where T at baseext.jl:45
(Array{T})(::UndefInitializer, ::Int64) where T at boot.jl:473
...
Stacktrace:
[1] lsmc_am_put(S::Int64, K::Int64, r::Float64, σ::Float64, t::Float64, N::Int64, P::Int64)
# Main ./REPL[39]:5
[2] top-level scope
# REPL[40]:1
I tried this code and I was expecting a numeric answer but this error came up. I tried to look it up on google but I found nothing that matches my situation.
The error occurs where you wrote X = Array{T}(N+1, P). Instead, use one of the following approaches if you need a Vector:
julia> Array{Float64, 1}([1,2,3])
3-element Vector{Float64}:
1.0
2.0
3.0
julia> Vector{Float64}([1, 2, 3])
3-element Vector{Float64}:
1.0
2.0
3.0
And in your case, you should write X = Array{T,1}([N+1, P]) or X = Vector{T}([N+1, P]). But since there's such a X[1, p] = x = S expression in your code, I guess you mean to initialize a 2D array and update its elements through the algorithm. For this, you can define X like the following:
X = zeros(Float64, N+1, P)
# Or
X = Array{Float64, 2}(undef, N+1, P)
So, I tried the following in your code:
# I just changed the definition of `X` in your code like the following
X = Array{T, 2}(undef, N+1, P)
#And the result of the code was:
julia> lsmc_am_put(100, 90, 0.05, 0.3, 180/365, 1000, 10000)
3.329213731484463

How can I speedup this Julia code?

The code implements an example of a Pollard rho() function for finding a factor of a positive integer, n. I've examined some of the code in the Julia "Primes" package that runs rapidly in an attempt to speedup the pollard_rho() function, all to no avail. The code should execute n = 1524157897241274137 in approximately 100 mSec to 30 Sec (Erlang, Haskell, Mercury, SWI Prolog) but takes about 3 to 4 minutes on JuliaBox, IJulia, and the Julia REPL. How can I make this go fast?
pollard_rho(1524157897241274137) = 1234567891
__precompile__()
module Pollard
export pollard_rho
function pollard_rho{T<:Integer}(n::T)
f(x::T, r::T, n) = rem(((x ^ T(2)) + r), n)
r::T = 7; x::T = 2; y::T = 11; y1::T = 11; z::T = 1
while z == 1
x = f(x, r, n)
y1 = f(y, r, n)
y = f(y1, r, n)
z = gcd(n, abs(x - y))
end
z >= n ? "error" : z
end
end # module
There are quite a few problems with type instability here.
Don't return either the string "error" or a result; instead explicitly call error().
As Chris mentioned, x and r ought to be annotated to be of type T, else they will be unstable.
There also seems to be a potential problem with overflow. A solution is to widen in the squaring step before truncating back to type T.
function pollard_rho{T<:Integer}(n::T)
f(x::T, r::T, n) = rem(Base.widemul(x, x) + r, n) % T
r::T = 7; x::T = 2; y::T = 11; y1::T = 11; z::T = 1
while z == 1
x = f(x, r, n)
y1 = f(y, r, n)
y = f(y1, r, n)
z = gcd(n, abs(x - y))
end
z >= n ? error() : z
end
After making these changes the function will run as fast as you could expect.
julia> #btime pollard_rho(1524157897241274137)
4.128 ms (0 allocations: 0 bytes)
1234567891
To find these problems with type instability, use the #code_warntype macro.

Haskell performance of memoization implement of dynamic programming is poor [duplicate]

I'm trying to memoize the following function:
gridwalk x y
| x == 0 = 1
| y == 0 = 1
| otherwise = (gridwalk (x - 1) y) + (gridwalk x (y - 1))
Looking at this I came up with the following solution:
gw :: (Int -> Int -> Int) -> Int -> Int -> Int
gw f x y
| x == 0 = 1
| y == 0 = 1
| otherwise = (f (x - 1) y) + (f x (y - 1))
gwlist :: [Int]
gwlist = map (\i -> gw fastgw (i `mod` 20) (i `div` 20)) [0..]
fastgw :: Int -> Int -> Int
fastgw x y = gwlist !! (x + y * 20)
Which I then can call like this:
gw fastgw 20 20
Is there an easier, more concise and general way (notice how I had to hardcode the max grid dimensions in the gwlist function in order to convert from 2D to 1D space so I can access the memoizing list) to memoize functions with multiple parameters in Haskell?
You can use a list of lists to memoize the function result for both parameters:
memo :: (Int -> Int -> a) -> [[a]]
memo f = map (\x -> map (f x) [0..]) [0..]
gw :: Int -> Int -> Int
gw 0 _ = 1
gw _ 0 = 1
gw x y = (fastgw (x - 1) y) + (fastgw x (y - 1))
gwstore :: [[Int]]
gwstore = memo gw
fastgw :: Int -> Int -> Int
fastgw x y = gwstore !! x !! y
Use the data-memocombinators package from hackage. It provides easy to use memorization techniques and provides an easy and breve way to use them:
import Data.MemoCombinators (memo2,integral)
gridwalk = memo2 integral integral gridwalk' where
gridwalk' x y
| x == 0 = 1
| y == 0 = 1
| otherwise = (gridwalk (x - 1) y) + (gridwalk x (y - 1))
Here is a version using Data.MemoTrie from the MemoTrie package to memoize the function:
import Data.MemoTrie(memo2)
gridwalk :: Int -> Int -> Int
gridwalk = memo2 gw
where
gw 0 _ = 1
gw _ 0 = 1
gw x y = gridwalk (x - 1) y + gridwalk x (y - 1)
If you want maximum generality, you can memoize a memoizing function.
memo :: (Num a, Enum a) => (a -> b) -> [b]
memo f = map f (enumFrom 0)
gwvals = fmap memo (memo gw)
fastgw :: Int -> Int -> Int
fastgw x y = gwvals !! x !! y
This technique will work with functions that have any number of arguments.
Edit: thanks to Philip K. for pointing out a bug in the original code. Originally memo had a "Bounded" constraint instead of "Num" and began the enumeration at minBound, which would only be valid for natural numbers.
Lists aren't a good data structure for memoizing, though, because they have linear lookup complexity. You might be better off with a Map or IntMap. Or look on Hackage.
Note that this particular code does rely on laziness, so if you wanted to switch to using a Map you would need to take a bounded amount of elements from the list, as in:
gwByMap :: Int -> Int -> Int -> Int -> Int
gwByMap maxX maxY x y = fromMaybe (gw x y) $ M.lookup (x,y) memomap
where
memomap = M.fromList $ concat [[((x',y'),z) | (y',z) <- zip [0..maxY] ys]
| (x',ys) <- zip [0..maxX] gwvals]
fastgw2 :: Int -> Int -> Int
fastgw2 = gwByMap 20 20
I think ghc may be stupid about sharing in this case, you may need to lift out the x and y parameters, like this:
gwByMap maxX maxY = \x y -> fromMaybe (gw x y) $ M.lookup (x,y) memomap

Monte Carlo pi method

I try to calculate Monte Carlo pi function in R. I have some problems in the code.
For now I write this code:
ploscinaKvadrata <- 0
ploscinaKroga <- 0
n = 1000
for (i in i:n) {
x <- runif(1000, min= -1, max= 1)
y <- runif(1000, min= -1, max= 1)
if ((x^2 + y^2) <= 1) {
ploscinaKroga <- ploscinaKroga + 1
} else {
ploscinaKvadrata <- ploscinaKvadrata + 1
}
izracunPi = 4* ploscinaKroga/ploscinaKvadrata
}
izracunPi
This is not working, but I don't know how to fix it.
I would also like to write a code to plot this (with circle inside square and with dots).
Here is a vectorized version (and there was also something wrong with your math)
N <- 1000000
R <- 1
x <- runif(N, min= -R, max= R)
y <- runif(N, min= -R, max= R)
is.inside <- (x^2 + y^2) <= R^2
pi.estimate <- 4 * sum(is.inside) / N
pi.estimate
# [1] 3.141472
As far as plotting the points, you can do something like this:
plot.new()
plot.window(xlim = 1.1 * R * c(-1, 1), ylim = 1.1 * R * c(-1, 1))
points(x[ is.inside], y[ is.inside], pch = '.', col = "blue")
points(x[!is.inside], y[!is.inside], pch = '.', col = "red")
but I'd recommend you use a smaller N value, maybe 10000.
This is a fun game -- and there are a number of versions of it floating around the web. Here's one I hacked from the named source (tho' his code was somewhat naive).
from http://giventhedata.blogspot.com/2012/09/estimating-pi-with-r-via-mcs-dart-very.html
est.pi <- function(n){
# drawing in [0,1] x [0,1] covers one quarter of square and circle
# draw random numbers for the coordinates of the "dart-hits"
a <- runif(n,0,1)
b <- runif(n,0,1)
# use the pythagorean theorem
c <- sqrt((a^2) + (b^2) )
inside <- sum(c<1)
#outside <- n-inside
pi.est <- inside/n*4
return(pi.est)
}
Typo 'nside' to 'inside'

Python performance: iteration and operations on nested lists

Problem Hey folks. I'm looking for some advice on python performance. Some background on my problem:
Given:
A (x,y) mesh of nodes each with a value (0...255) starting at 0
A list of N input coordinates each at a specified location within the range (0...x, 0...y)
A value Z that defines the "neighborhood" in count of nodes
Increment the value of the node at the input coordinate and the node's neighbors. Neighbors beyond the mesh edge are ignored. (No wrapping)
BASE CASE: A mesh of size 1024x1024 nodes, with 400 input coordinates and a range Z of 75 nodes.
Processing should be O(x*y*Z*N). I expect x, y and Z to remain roughly around the values in the base case, but the number of input coordinates N could increase up to 100,000. My goal is to minimize processing time.
Current results Between my start and the comments below, we've got several implementations.
Running speed on my 2.26 GHz Intel Core 2 Duo with Python 2.6.1:
f1: 2.819s
f2: 1.567s
f3: 1.593s
f: 1.579s
f3b: 1.526s
f4: 0.978s
f1 is the initial naive implementation: three nested for loops.
f2 is replaces the inner for loop with a list comprehension.
f3 is based on Andrei's suggestion in the comments and replaces the outer for with map()
f is Chris's suggestion in the answers below
f3b is kriss's take on f3
f4 is Alex's contribution.
Code is included below for your perusal.
Question How can I further reduce the processing time? I'd prefer sub-1.0s for the test parameters.
Please, keep the recommendations to native Python. I know I can move to a third-party package such as numpy, but I'm trying to avoid any third party packages. Also, I've generated random input coordinates, and simplified the definition of the node value updates to keep our discussion simple. The specifics have to change slightly and are outside the scope of my question.
thanks much!
**`f1` is the initial naive implementation: three nested `for` loops.**
def f1(x,y,n,z):
rows = [[0]*x for i in xrange(y)]
for i in range(n):
inputX, inputY = (int(x*random.random()), int(y*random.random()))
topleft = (inputX - z, inputY - z)
for i in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
for j in xrange(max(0, topleft[1]), min(topleft[1]+(z*2), y)):
if rows[i][j] <= 255: rows[i][j] += 1
f2 is replaces the inner for loop with a list comprehension.
def f2(x,y,n,z):
rows = [[0]*x for i in xrange(y)]
for i in range(n):
inputX, inputY = (int(x*random.random()), int(y*random.random()))
topleft = (inputX - z, inputY - z)
for i in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
l = max(0, topleft[1])
r = min(topleft[1]+(z*2), y)
rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
UPDATE: f3 is based on Andrei's suggestion in the comments and replaces the outer for with map(). My first hack at this requires several out-of-local-scope lookups, specifically recommended against by Guido: local variable lookups are much faster than global or built-in variable lookups I hardcoded all but the reference to the main data structure itself to minimize that overhead.
rows = [[0]*x for i in xrange(y)]
def f3(x,y,n,z):
inputs = [(int(x*random.random()), int(y*random.random())) for i in range(n)]
rows = map(g, inputs)
def g(input):
inputX, inputY = input
topleft = (inputX - 75, inputY - 75)
for i in xrange(max(0, topleft[0]), min(topleft[0]+(75*2), 1024)):
l = max(0, topleft[1])
r = min(topleft[1]+(75*2), 1024)
rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
UPDATE3: ChristopeD also pointed out a couple improvements.
def f(x,y,n,z):
rows = [[0] * y for i in xrange(x)]
rn = random.random
for i in xrange(n):
topleft = (int(x*rn()) - z, int(y*rn()) - z)
l = max(0, topleft[1])
r = min(topleft[1]+(z*2), y)
for u in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
rows[u][l:r] = [j+(j<255) for j in rows[u][l:r]]
UPDATE4: kriss added a few improvements to f3, replacing min/max with the new ternary operator syntax.
def f3b(x,y,n,z):
rn = random.random
rows = [g1(x, y, z) for x, y in [(int(x*rn()), int(y*rn())) for i in xrange(n)]]
def g1(x, y, z):
l = y - z if y - z > 0 else 0
r = y + z if y + z < 1024 else 1024
for i in xrange(x - z if x - z > 0 else 0, x + z if x + z < 1024 else 1024 ):
rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
UPDATE5: Alex weighed in with his substantive revision, adding a separate map() operation to cap the values at 255 and removing all non-local-scope lookups. The perf differences are non-trivial.
def f4(x,y,n,z):
rows = [[0]*y for i in range(x)]
rr = random.randrange
inc = (1).__add__
sat = (0xff).__and__
for i in range(n):
inputX, inputY = rr(x), rr(y)
b = max(0, inputX - z)
t = min(inputX + z, x)
l = max(0, inputY - z)
r = min(inputY + z, y)
for i in range(b, t):
rows[i][l:r] = map(inc, rows[i][l:r])
for i in range(x):
rows[i] = map(sat, rows[i])
Also, since we all seem to be hacking around with variations, here's my test harness to compare speeds: (improved by ChristopheD)
def timing(f,x,y,z,n):
fn = "%s(%d,%d,%d,%d)" % (f.__name__, x, y, z, n)
ctx = "from __main__ import %s" % f.__name__
results = timeit.Timer(fn, ctx).timeit(10)
return "%4.4s: %.3f" % (f.__name__, results / 10.0)
if __name__ == "__main__":
print timing(f, 1024, 1024, 400, 75)
#add more here.
On my (slow-ish;-) first-day Macbook Air, 1.6GHz Core 2 Duo, system Python 2.5 on MacOSX 10.5, after saving your code in op.py I see the following timings:
$ python -mtimeit -s'import op' 'op.f1()'
10 loops, best of 3: 5.58 sec per loop
$ python -mtimeit -s'import op' 'op.f2()'
10 loops, best of 3: 3.15 sec per loop
So, my machine is slower than yours by a factor of a bit more than 1.9.
The fastest code I have for this task is:
def f3(x=x,y=y,n=n,z=z):
rows = [[0]*y for i in range(x)]
rr = random.randrange
inc = (1).__add__
sat = (0xff).__and__
for i in range(n):
inputX, inputY = rr(x), rr(y)
b = max(0, inputX - z)
t = min(inputX + z, x)
l = max(0, inputY - z)
r = min(inputY + z, y)
for i in range(b, t):
rows[i][l:r] = map(inc, rows[i][l:r])
for i in range(x):
rows[i] = map(sat, rows[i])
which times as:
$ python -mtimeit -s'import op' 'op.f3()'
10 loops, best of 3: 3 sec per loop
so, a very modest speedup, projecting to more than 1.5 seconds on your machine - well above the 1.0 you're aiming for:-(.
With a simple C-coded extensions, exte.c...:
#include "Python.h"
static PyObject*
dopoint(PyObject* self, PyObject* args)
{
int x, y, z, px, py;
int b, t, l, r;
int i, j;
PyObject* rows;
if(!PyArg_ParseTuple(args, "iiiiiO",
&x, &y, &z, &px, &py, &rows
))
return 0;
b = px - z;
if (b < 0) b = 0;
t = px + z;
if (t > x) t = x;
l = py - z;
if (l < 0) l = 0;
r = py + z;
if (r > y) r = y;
for(i = b; i < t; ++i) {
PyObject* row = PyList_GetItem(rows, i);
for(j = l; j < r; ++j) {
PyObject* pyitem = PyList_GetItem(row, j);
long item = PyInt_AsLong(pyitem);
if (item < 255) {
PyObject* newitem = PyInt_FromLong(item + 1);
PyList_SetItem(row, j, newitem);
}
}
}
Py_RETURN_NONE;
}
static PyMethodDef exteMethods[] = {
{"dopoint", dopoint, METH_VARARGS, "process a point"},
{0}
};
void
initexte()
{
Py_InitModule("exte", exteMethods);
}
(note: I haven't checked it carefully -- I think it doesn't leak memory due to the correct interplay of reference stealing and borrowing, but it should be code inspected very carefully before being put in production;-), we could do
import exte
def f4(x=x,y=y,n=n,z=z):
rows = [[0]*y for i in range(x)]
rr = random.randrange
for i in range(n):
inputX, inputY = rr(x), rr(y)
exte.dopoint(x, y, z, inputX, inputY, rows)
and the timing
$ python -mtimeit -s'import op' 'op.f4()'
10 loops, best of 3: 345 msec per loop
shows an acceleration of 8-9 times, which should put you in the ballpark you desire. I've seen a comment saying you don't want any third-party extension, but, well, this tiny extension you could make entirely your own;-). ((Not sure what licensing conditions apply to code on Stack Overflow, but I'll be glad to re-release this under the Apache 2 license or the like, if you need that;-)).
1. A (smaller) speedup could definitely be the initialization of your rows...
Replace
rows = []
for i in range(x):
rows.append([0 for i in xrange(y)])
with
rows = [[0] * y for i in xrange(x)]
2. You can also avoid some lookups by moving random.random out of the loops (saves a little).
3. EDIT: after corrections -- you could arrive at something like this:
def f(x,y,n,z):
rows = [[0] * y for i in xrange(x)]
rn = random.random
for i in xrange(n):
topleft = (int(x*rn()) - z, int(y*rn()) - z)
l = max(0, topleft[1])
r = min(topleft[1]+(z*2), y)
for u in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
rows[u][l:r] = [j+(j<255) for j in rows[u][l:r]]
EDIT: some new timings with timeit (10 runs) -- seems this provides only minor speedups:
import timeit
print timeit.Timer("f1(1024,1024,400,75)", "from __main__ import f1").timeit(10)
print timeit.Timer("f2(1024,1024,400,75)", "from __main__ import f2").timeit(10)
print timeit.Timer("f(1024,1024,400,75)", "from __main__ import f3").timeit(10)
f1 21.1669280529
f2 12.9376120567
f 11.1249599457
in your f3 rewrite, g can be simplified. (Can also be applied to f4)
You have the following code inside a for loop.
l = max(0, topleft[1])
r = min(topleft[1]+(75*2), 1024)
However, it appears that those values never change inside the for loop. So calculate them once, outside the loop instead.
Based on your f3 version I played with the code. As l and r are constants you can avoid to compute them in g1 loop. Also using new ternary if instead of min and max seems to be consistently faster. Also simplified expression with topleft. On my system it appears to be about 20% faster using with the code below.
def f3b(x,y,n,z):
rows = [g1(x, y, z) for x, y in [(int(x*random.random()), int(y*random.random())) for i in range(n)]]
def g1(x, y, z):
l = y - z if y - z > 0 else 0
r = y + z if y + z < 1024 else 1024
for i in xrange(x - z if x - z > 0 else 0, x + z if x + z < 1024 else 1024 ):
rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
You can create your own Python module in C, and control the performance as you want:
http://docs.python.org/extending/

Resources