I am currently trying to compute the multivariate Granger causality F value and p value with this code:
import numpy as np

def demean(x, axis=0):
    "Return x minus its mean along the specified axis"
    x = np.asarray(x)
    if axis == 0 or axis is None or x.ndim <= 1:
        return x - x.mean(axis)
    ind = [slice(None)] * x.ndim
    ind[axis] = np.newaxis
    # index with a tuple; a list index here is rejected by recent NumPy
    return x - x.mean(axis)[tuple(ind)]

def tsdata_to_autocov(X, q):
    # X has shape (n variables, m observations, N trials)
    if len(X.shape) == 2:
        X = np.expand_dims(X, axis=2)
    n, m, N = np.shape(X)
    X = demean(X, axis=1)
    G = np.zeros((n, n, (q+1)))
    for k in range(q+1):
        M = N * (m-k)
        # parenthesise the divisor: "/ M-1" divides by M and then subtracts 1
        G[:,:,k] = np.dot(np.reshape(X[:,k:m,:], (n, M)), np.reshape(X[:,0:m-k,:], (n, M)).conj().T) / (M-1)
    return G
def autocov_to_mvgc(G, x, y):
    import numpy as np
    from mvgc import autocov_to_var
    n = G.shape[0]
    z = np.arange(n)
    # indices of other variables (to condition out)
    z = np.delete(z, np.hstack((x, y)))
    xz = np.hstack((x, z))
    xzy = np.hstack((xz, y))
    F = 0
    # full regression
    ixgrid1 = np.ix_(xzy,xzy)
    [AF,SIG] = autocov_to_var(G[ixgrid1])
    # reduced regression
    ixgrid2 = np.ix_(xz,xz)
    [AF,SIGR] = autocov_to_var(G[ixgrid2])
    # after reordering to [x, z, y], the x variables occupy the first len(x) positions
    nx = np.size(x)
    ixgrid3 = np.ix_(np.arange(nx), np.arange(nx))
    F = np.log(np.linalg.det(SIGR[ixgrid3]))-np.log(np.linalg.det(SIG[ixgrid3]))
    return F
Can anyone show me an example for how they got to solving for F and p?
It would also help a lot to see what your timeseries data looks like.
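One detail worth checking in the autocovariance normalisation: in Python, `/` binds more tightly than `-`, so a divisor written without parentheses as `M-1` divides by M and then subtracts 1 from the result. A minimal pure-Python sketch for a single, already demeaned series follows. The helper name `lag_autocov` and the sample values are made up for illustration, and whether the divisor should be M or M - 1 is a separate convention choice.

```python
def lag_autocov(x, k, divisor_offset=1):
    """Lag-k autocovariance of a demeaned 1-D sequence (illustrative helper)."""
    m = len(x)
    M = m - k  # number of overlapping pairs at lag k
    total = sum(x[t + k] * x[t] for t in range(M))
    return total / (M - divisor_offset)

series = [1.0, -2.0, 0.5, 1.5, -1.0]  # made-up values that sum to zero
M = len(series)
raw = sum(v * v for v in series)

unparenthesised = raw / M - 1    # parsed as (raw / M) - 1
parenthesised = raw / (M - 1)    # divides by M - 1

print(unparenthesised, parenthesised, lag_autocov(series, 0))
```

The two expressions differ by far more than rounding, so an unparenthesised divisor would shift every entry of the autocovariance matrix.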
I am trying to implement a Gabor filter to enhance an image. I found a code snippet via Google and have been working with it, but it raises an error I am not familiar with. Please guide me through the code below so that the error can be corrected.
The code that gives me the error is:
def gabor(im, W, angles):
    x, y = im.size
    im_load = im.load()
    freqs = frequency.freq(im, W, angles)
    print "computing local ridge frequency done"
    gauss = utils.gauss_kernel(3)
    utils.apply_kernel(freqs, gauss)
    for i in range(1, x / W - 1):
        for j in range(1, y / W - 1):
            kernel = gabor_kernel(W, angles[i][j], freqs[i][j])
            for k in range(0, W):
                for l in range(0, W):
                    im_load[i * W + k, j * W + l] = utils.apply_kernel_at(lambda x, y: im_load[x, y],kernel,i * W + k,j * W + l)
    return im
The iterations run smoothly until, inside the fourth for loop, an error is raised at
im_load[i * W + k, j * W + l] = utils.apply_kernel_at(lambda x, y: im_load[x, y],kernel,i * W + k,j * W + l)
The error reads:
im_load[i * W + k, j * W + l] = utils.apply_kernel_at(lambda x, y: im_load[x, y],kernel,i * W + k,j * W + l)
SystemError: new style getargs format but argument is not a tuple
After some more googling I found an answer suggesting that the code does not work on Python 2.7.10, which I am using, but does work on 2.7.6.
How can I make it work on Python 2.7.10?
Update after installing Python 2.7.6
I tried the code after installing Python 2.7.6, as suggested in an answer on Stack Overflow, but the error persists.
What should I do now?
Update after long testing of the code
I went through the code in depth to understand how it works and located the failing code, but I am unable to correct it.
The error is within the function apply_kernel_at:
def apply_kernel_at(get_value, kernel, i, j):
    kernel_size = len(kernel)
    result = 0
    for k in range(0, kernel_size):
        for l in range(0, kernel_size):
            pixel = get_value(i + k - kernel_size / 2, j + l - kernel_size / 2)
            result += pixel * kernel[k][l]
            #print pixel
    return result
It returns a single value, result, but the line that calls the function seems to require two values:
im_load[i * W + k, j * W + l]=utils.apply_kernel_at(lambda x, y: im_load[x, y],kernel,i * W + k,j * W + l)
I am not sure whether my findings are correct.
Any help would be appreciated.
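A minimal sketch of apply_kernel_at with explicit floor division and an integer result may help to narrow things down. It rests on an assumption that the post does not confirm: that the pixel-access object wants an int (or, for a multi-band image, a tuple of ints) rather than the float that the weighted sum produces. The `to_pixel` helper and the nested-list image are hypothetical stand-ins so that the example runs without PIL.

```python
def apply_kernel_at(get_value, kernel, i, j):
    """Weighted sum of the pixels around (i, j); returns a single number."""
    kernel_size = len(kernel)
    half = kernel_size // 2  # floor division, also under Python 3
    result = 0
    for k in range(kernel_size):
        for l in range(kernel_size):
            pixel = get_value(i + k - half, j + l - half)
            result += pixel * kernel[k][l]
    return result

def to_pixel(value):
    """Hypothetical helper: round and clamp a number to an 8-bit grey level."""
    return max(0, min(255, int(round(value))))

# Stand-in for a single-band image: image[x][y] is a grey level.
image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
box_blur = [[1.0 / 9] * 3 for _ in range(3)]

raw = apply_kernel_at(lambda x, y: image[x][y], box_blur, 1, 1)
print(raw, to_pixel(raw))
```

If the assumption holds, wrapping the right-hand side of the failing assignment in such a conversion, or making sure the image is single-band before filtering, would be the two things to try first.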
I am simply adding numbers together, but it keeps raising an error. I used type() to check whether vector is a table, and it always said it was, yet the error says it is a number.
Can anyone tell me why this is happening and how to fix it (the variable vector is a Vector3 object)? Any help is greatly appreciated.
function new(x, y, z)
  return setmetatable({x = x, y = y, z = z}, meta) --{} has public variables
end
All of the Vector3 file here: http://pastebin.com/csBmJG36
attempt to index local 'vector' (a number value)
function translate(object, x, y, z)
  for i, v in pairs(object) do
    if (i == "Vertices") then
      for _, q in pairs(v) do
        for l, vector in pairs(q) do
          vector.x = vector.x + x;
          vector.y = vector.y + y;
          vector.z = vector.z + z;
        end
      end
    end
  end
end
Let's refactor your code by removing the loop-switch anti-pattern:
function translate(object, x, y, z)
  for _, q in pairs(object.Vertices) do
    for l, vector in pairs(q) do
      -- Test the type of vector here...
      vector.x = vector.x + x;
      vector.y = vector.y + y;
      vector.z = vector.z + z;
    end
  end
end
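So, the error occurs on an access to object.Vertices[_][l].x.
The nested loops would only be right for a curious vertex list that contains lists of vectors. If object.Vertices is a flat list of vectors, q is already a vector, and the inner loop then iterates over its numeric fields, which would explain why vector is a number.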
Following is a basic implementation of the Xorshift RNG (copied from the Wikipedia):
uint32_t xor128(void) {
    static uint32_t x = 123456789;
    static uint32_t y = 362436069;
    static uint32_t z = 521288629;
    static uint32_t w = 88675123;
    uint32_t t;
    t = x ^ (x << 11);
    x = y; y = z; z = w;
    return w = w ^ (w >> 19) ^ (t ^ (t >> 8));
}
I understand that w is the returned value and x, y and z are the state ("memory") variables. However, I can't understand the purpose of having more than one memory variable. Can anyone explain this point to me?
Also, I tried to copy the above code to Python:
class R2:
    def __init__(self):
        self.x = 123456789
        self.y = 362436069
        self.z = 521288629
        self.w = 88675123
    def __call__(self):
        t = self.x ^ (self.x<<11)
        self.x = self.y
        self.y = self.z
        self.z = self.w
        w = self.w
        self.w = w ^ (w >> 19) ^(t ^ (t >> 8))
        return self.w
Then, I have generated 100 numbers and plotted their log10 values:
r2 = R2()
x2 = [math.log10(r2()) for _ in range(100)]
plot(x2, '.g')
Here is the output of the plot:
And this is what happens when 10000 (and not 100) numbers are generated:
The overall tendency is very clear. And don't forget that the Y axis is log10 of the actual value.
Pretty strange behavior, don't you think?
The problem here is of course that you're using Python to do this.
Python has a notion of big integers, so even though you are copying an implementation that deals with 32-bit numbers, Python just says "I'll just go ahead and keep everything for you".
If you try this instead:
x2 = [r2() for _ in range(100)]
You'll notice that it produces ever-longer numbers, for instance here's the first number:
and here's the last:
Here's code that has been fixed to handle this:
    def __call__(self):
        t = (self.x ^ (self.x<<11)) & 0xffffffff # <-- keep 32 bits
        self.x = self.y
        self.y = self.z
        self.z = self.w
        w = self.w
        self.w = (w ^ (w >> 19) ^(t ^ (t >> 8))) & 0xffffffff # <-- keep 32 bits
        return self.w
And with a generator:
def xor128():
    x = 123456789
    y = 362436069
    z = 521288629
    w = 88675123
    while True:
        t = (x ^ (x<<11)) & 0xffffffff
        (x,y,z) = (y,z,w)
        w = (w ^ (w >> 19) ^ (t ^ (t >> 8))) & 0xffffffff
        yield w
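To see the difference the mask makes, here is a sketch that runs the masked and the unmasked update side by side and compares bit lengths. The `mask` parameter is an addition for illustration only; the seeds are the ones from the question.

```python
from itertools import islice

MASK32 = 0xFFFFFFFF

def xor128(mask=MASK32):
    """Xorshift128 generator; pass mask=None to mimic the unmasked port."""
    x, y, z, w = 123456789, 362436069, 521288629, 88675123
    while True:
        t = x ^ (x << 11)
        if mask is not None:
            t &= mask
        x, y, z = y, z, w
        w = w ^ (w >> 19) ^ (t ^ (t >> 8))
        if mask is not None:
            w &= mask
        yield w

masked = list(islice(xor128(), 100))
unmasked = list(islice(xor128(mask=None), 100))

print(masked[0])                               # 3701687786
print(max(v.bit_length() for v in masked))     # never more than 32
print(max(v.bit_length() for v in unmasked))   # far more than 32
```

Without the mask, every left shift by 11 adds bits that are never discarded, which is why log10 of the output keeps climbing.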
"However, I can't understand the purpose of more than one memory variable" - if you need to 'remember' 128 bits then you need 4 x 32bit integers.
As to the very strange distribution of 100 randoms, no idea! I could understand perhaps if you had generated a few million, and the steps in the graph were artifacts, but not 100.
Problem
Hey folks. I'm looking for some advice on Python performance. Some background on my problem:
A (x,y) mesh of nodes each with a value (0...255) starting at 0
A list of N input coordinates each at a specified location within the range (0...x, 0...y)
A value Z that defines the "neighborhood" in count of nodes
Increment the value of the node at the input coordinate and the node's neighbors. Neighbors beyond the mesh edge are ignored. (No wrapping)
BASE CASE: A mesh of size 1024x1024 nodes, with 400 input coordinates and a range Z of 75 nodes.
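Processing should be roughly O(N*Z^2) for the updates, since each input coordinate touches at most (2Z)^2 nodes, plus O(x*y) to initialise the mesh. I expect x, y and Z to remain roughly around the values in the base case, but the number of input coordinates N could increase up to 100,000. My goal is to minimize processing time.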
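To make the update rule concrete, here is a small pure-Python sketch on a toy mesh. It assumes the neighborhood is the square of nodes within Z steps of the input coordinate in each direction and that values saturate at 255. The function name `bump` and the 4x4 mesh are made up for illustration.

```python
def bump(rows, px, py, z, cap=255):
    """Increment every node within z of (px, py), clipping at the mesh edge."""
    x_size, y_size = len(rows), len(rows[0])
    lo_y, hi_y = max(0, py - z), min(y_size, py + z + 1)
    for i in range(max(0, px - z), min(x_size, px + z + 1)):
        row = rows[i]
        # v < cap is True (1) until the node reaches the cap
        row[lo_y:hi_y] = [v + (v < cap) for v in row[lo_y:hi_y]]
    return rows

mesh = [[0] * 4 for _ in range(4)]
bump(mesh, 0, 0, 1)   # corner: only a 2x2 block lies inside the mesh
bump(mesh, 1, 1, 1)   # interior: the full 3x3 block is touched
for row in mesh:
    print(row)
```

The functions posted in this thread use a slightly different window, from p - Z up to but not including p + Z, so their footprint is 2Z nodes wide rather than 2Z + 1.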
Current results
Between my start and the comments below, we've got several implementations.
Running speed on my 2.26 GHz Intel Core 2 Duo with Python 2.6.1:
f1: 2.819s
f2: 1.567s
f3: 1.593s
f: 1.579s
f3b: 1.526s
f4: 0.978s
f1 is the initial naive implementation: three nested for loops.
f2 replaces the inner for loop with a list comprehension.
f3 is based on Andrei's suggestion in the comments and replaces the outer for with map()
f is Chris's suggestion in the answers below
f3b is kriss's take on f3
f4 is Alex's contribution.
Code is included below for your perusal.
Question
How can I further reduce the processing time? I'd prefer sub-1.0s for the test parameters.
Please, keep the recommendations to native Python. I know I can move to a third-party package such as numpy, but I'm trying to avoid any third party packages. Also, I've generated random input coordinates, and simplified the definition of the node value updates to keep our discussion simple. The specifics have to change slightly and are outside the scope of my question.
thanks much!
**`f1` is the initial naive implementation: three nested `for` loops.**
def f1(x,y,n,z):
    rows = [[0]*x for i in xrange(y)]
    for i in range(n):
        inputX, inputY = (int(x*random.random()), int(y*random.random()))
        topleft = (inputX - z, inputY - z)
        for i in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
            for j in xrange(max(0, topleft[1]), min(topleft[1]+(z*2), y)):
                # "<" rather than "<=" so that a node never exceeds 255
                if rows[i][j] < 255: rows[i][j] += 1
f2 replaces the inner for loop with a list comprehension.
def f2(x,y,n,z):
    rows = [[0]*x for i in xrange(y)]
    for i in range(n):
        inputX, inputY = (int(x*random.random()), int(y*random.random()))
        topleft = (inputX - z, inputY - z)
        for i in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
            l = max(0, topleft[1])
            r = min(topleft[1]+(z*2), y)
            rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
UPDATE: f3 is based on Andrei's suggestion in the comments and replaces the outer for with map(). My first hack at this requires several out-of-local-scope lookups, which Guido specifically recommends against: "local variable lookups are much faster than global or built-in variable lookups". I hardcoded all but the reference to the main data structure itself to minimize that overhead.
rows = [[0]*x for i in xrange(y)]

def f3(x,y,n,z):
    inputs = [(int(x*random.random()), int(y*random.random())) for i in range(n)]
    rows = map(g, inputs)

def g(input):
    inputX, inputY = input
    topleft = (inputX - 75, inputY - 75)
    for i in xrange(max(0, topleft[0]), min(topleft[0]+(75*2), 1024)):
        l = max(0, topleft[1])
        r = min(topleft[1]+(75*2), 1024)
        rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
UPDATE3: ChristopheD also pointed out a couple of improvements.
def f(x,y,n,z):
    rows = [[0] * y for i in xrange(x)]
    rn = random.random
    for i in xrange(n):
        topleft = (int(x*rn()) - z, int(y*rn()) - z)
        l = max(0, topleft[1])
        r = min(topleft[1]+(z*2), y)
        for u in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
            rows[u][l:r] = [j+(j<255) for j in rows[u][l:r]]
UPDATE4: kriss added a few improvements to f3, replacing min/max with the new ternary operator syntax.
def f3b(x,y,n,z):
    rn = random.random
    rows = [g1(x, y, z) for x, y in [(int(x*rn()), int(y*rn())) for i in xrange(n)]]

def g1(x, y, z):
    l = y - z if y - z > 0 else 0
    r = y + z if y + z < 1024 else 1024
    for i in xrange(x - z if x - z > 0 else 0, x + z if x + z < 1024 else 1024 ):
        rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
UPDATE5: Alex weighed in with his substantive revision, adding a separate map() operation to cap the values at 255 and removing all non-local-scope lookups. The perf differences are non-trivial.
def f4(x,y,n,z):
    rows = [[0]*y for i in range(x)]
    rr = random.randrange
    inc = (1).__add__
    sat = (0xff).__and__
    for i in range(n):
        inputX, inputY = rr(x), rr(y)
        b = max(0, inputX - z)
        t = min(inputX + z, x)
        l = max(0, inputY - z)
        r = min(inputY + z, y)
        for i in range(b, t):
            rows[i][l:r] = map(inc, rows[i][l:r])
    for i in range(x):
        rows[i] = map(sat, rows[i])
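One caveat about capping with a bitwise AND: `& 0xff` keeps only the low 8 bits, so it wraps around rather than saturating once a node has been incremented more than 255 times. With the base-case parameters that may never happen, but for large N it could. A small sketch of the difference, using made-up counts:

```python
counts = [0, 200, 255, 256, 300, 511]

wrapped = [c & 0xFF for c in counts]         # what (0xff).__and__ does
saturated = [min(c, 255) for c in counts]    # clamps at 255 instead

print(wrapped)     # [0, 200, 255, 0, 44, 255]
print(saturated)   # [0, 200, 255, 255, 255, 255]
```

If true saturation is required, a final pass with min(v, 255) would be one way to get it, at some cost in speed compared with the bound-method trick.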
Also, since we all seem to be hacking around with variations, here's my test harness to compare speeds: (improved by ChristopheD)
def timing(f,x,y,n,z):
    # parameter names follow the f(x,y,n,z) signature of the functions being timed
    fn = "%s(%d,%d,%d,%d)" % (f.__name__, x, y, n, z)
    ctx = "from __main__ import %s" % f.__name__
    results = timeit.Timer(fn, ctx).timeit(10)
    return "%4.4s: %.3f" % (f.__name__, results / 10.0)

if __name__ == "__main__":
    print timing(f, 1024, 1024, 400, 75)
    #add more here.
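For readers on Python 3, a comparable harness can pass a callable to timeit instead of building source strings. This is a sketch only: `demo` is a made-up stand-in workload, and the `repeats` parameter is an addition for illustration.

```python
import timeit

def timing(f, x, y, n, z, repeats=3):
    """Average wall-clock seconds per call of f(x, y, n, z)."""
    total = timeit.timeit(lambda: f(x, y, n, z), number=repeats)
    return "%4.4s: %.3f" % (f.__name__, total / repeats)

def demo(x, y, n, z):
    """Stand-in workload; replace with f1, f2, ... from the post."""
    return [[0] * y for _ in range(x)]

print(timing(demo, 64, 64, 10, 5))
```

Passing a callable avoids the `from __main__ import ...` setup string, at the price of one extra function call per timed run.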
On my (slow-ish;-) first-day Macbook Air, 1.6GHz Core 2 Duo, system Python 2.5 on MacOSX 10.5, after saving your code in op.py I see the following timings:
$ python -mtimeit -s'import op' 'op.f1()'
10 loops, best of 3: 5.58 sec per loop
$ python -mtimeit -s'import op' 'op.f2()'
10 loops, best of 3: 3.15 sec per loop
So, my machine is slower than yours by a factor of a bit more than 1.9.
The fastest code I have for this task is:
def f3(x=x,y=y,n=n,z=z):
    rows = [[0]*y for i in range(x)]
    rr = random.randrange
    inc = (1).__add__
    sat = (0xff).__and__
    for i in range(n):
        inputX, inputY = rr(x), rr(y)
        b = max(0, inputX - z)
        t = min(inputX + z, x)
        l = max(0, inputY - z)
        r = min(inputY + z, y)
        for i in range(b, t):
            rows[i][l:r] = map(inc, rows[i][l:r])
    for i in range(x):
        rows[i] = map(sat, rows[i])
which times as:
$ python -mtimeit -s'import op' 'op.f3()'
10 loops, best of 3: 3 sec per loop
so, a very modest speedup, projecting to more than 1.5 seconds on your machine - well above the 1.0 you're aiming for:-(.
With a simple C-coded extension, exte.c...:
#include "Python.h"

static PyObject*
dopoint(PyObject* self, PyObject* args)
{
    int x, y, z, px, py;
    int b, t, l, r;
    int i, j;
    PyObject* rows;

    if(!PyArg_ParseTuple(args, "iiiiiO",
                         &x, &y, &z, &px, &py, &rows))
        return 0;

    /* clip the neighborhood to the mesh */
    b = px - z;
    if (b < 0) b = 0;
    t = px + z;
    if (t > x) t = x;
    l = py - z;
    if (l < 0) l = 0;
    r = py + z;
    if (r > y) r = y;

    for(i = b; i < t; ++i) {
        PyObject* row = PyList_GetItem(rows, i);
        for(j = l; j < r; ++j) {
            PyObject* pyitem = PyList_GetItem(row, j);
            long item = PyInt_AsLong(pyitem);
            if (item < 255) {
                PyObject* newitem = PyInt_FromLong(item + 1);
                PyList_SetItem(row, j, newitem);
            }
        }
    }

    Py_RETURN_NONE;
}

static PyMethodDef exteMethods[] = {
    {"dopoint", dopoint, METH_VARARGS, "process a point"},
    {0}
};

PyMODINIT_FUNC
initexte(void)
{
    Py_InitModule("exte", exteMethods);
}
(note: I haven't checked it carefully -- I think it doesn't leak memory due to the correct interplay of reference stealing and borrowing, but it should be code inspected very carefully before being put in production;-), we could do
import exte

def f4(x=x,y=y,n=n,z=z):
    rows = [[0]*y for i in range(x)]
    rr = random.randrange
    for i in range(n):
        inputX, inputY = rr(x), rr(y)
        exte.dopoint(x, y, z, inputX, inputY, rows)
and the timing
$ python -mtimeit -s'import op' 'op.f4()'
10 loops, best of 3: 345 msec per loop
shows an acceleration of 8-9 times, which should put you in the ballpark you desire. I've seen a comment saying you don't want any third-party extension, but, well, this tiny extension you could make entirely your own;-). ((Not sure what licensing conditions apply to code on Stack Overflow, but I'll be glad to re-release this under the Apache 2 license or the like, if you need that;-)).
1. A (smaller) speedup could definitely be the initialization of your rows. Instead of
rows = []
for i in range(x):
    rows.append([0 for i in xrange(y)])
use
rows = [[0] * y for i in xrange(x)]
2. You can also avoid some lookups by moving random.random out of the loops (saves a little).
3. EDIT: after corrections -- you could arrive at something like this:
def f(x,y,n,z):
    rows = [[0] * y for i in xrange(x)]
    rn = random.random
    for i in xrange(n):
        topleft = (int(x*rn()) - z, int(y*rn()) - z)
        l = max(0, topleft[1])
        r = min(topleft[1]+(z*2), y)
        for u in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
            rows[u][l:r] = [j+(j<255) for j in rows[u][l:r]]
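On the row initialization in point 1, it may be tempting to shorten the expression further to `[[0] * y] * x`. A quick sketch of why the list comprehension is the safer form; the small sizes are chosen only for readability.

```python
x, y = 3, 4

independent = [[0] * y for _ in range(x)]   # x distinct row lists
aliased = [[0] * y] * x                     # x references to ONE row list

independent[0][0] = 1
aliased[0][0] = 1

print(independent)  # [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(aliased)      # [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
```

Multiplying the inner list is fine because integers are immutable. Multiplying the outer list repeats the same row object, so every slice assignment would hit all rows at once.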
EDIT: some new timings with timeit (10 runs) -- seems this provides only minor speedups:
import timeit
print timeit.Timer("f1(1024,1024,400,75)", "from __main__ import f1").timeit(10)
print timeit.Timer("f2(1024,1024,400,75)", "from __main__ import f2").timeit(10)
print timeit.Timer("f(1024,1024,400,75)", "from __main__ import f").timeit(10)
f1 21.1669280529
f2 12.9376120567
f 11.1249599457
In your f3 rewrite, g can be simplified. (The same can be applied to f4.)
You have the following code inside a for loop.
l = max(0, topleft[1])
r = min(topleft[1]+(75*2), 1024)
However, it appears that those values never change inside the for loop. So calculate them once, outside the loop instead.
Based on your f3 version I played with the code. As l and r are constant within the loop, you can avoid recomputing them on every iteration of the g1 loop. Using the new ternary if instead of min and max also seems to be consistently faster. I also simplified the expression involving topleft. On my system the code below appears to be about 20% faster.
def f3b(x,y,n,z):
    rows = [g1(x, y, z) for x, y in [(int(x*random.random()), int(y*random.random())) for i in range(n)]]

def g1(x, y, z):
    l = y - z if y - z > 0 else 0
    r = y + z if y + z < 1024 else 1024
    for i in xrange(x - z if x - z > 0 else 0, x + z if x + z < 1024 else 1024 ):
        rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
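As a sanity check on the min/max replacement, the sketch below confirms that the conditional expressions compute the same bounds as min and max. Which form is faster will depend on the interpreter, so this checks equivalence only. The helper names are made up for illustration.

```python
def clamp_minmax(p, z, size):
    """Bounds of the window around p, computed with max/min."""
    return max(0, p - z), min(p + z, size)

def clamp_ternary(p, z, size):
    """Same bounds, computed with conditional expressions."""
    lo = p - z if p - z > 0 else 0
    hi = p + z if p + z < size else size
    return lo, hi

for p in (0, 10, 74, 75, 500, 949, 950, 1023):
    assert clamp_minmax(p, 75, 1024) == clamp_ternary(p, 75, 1024)

print(clamp_ternary(10, 75, 1024))    # (0, 85)
print(clamp_ternary(1000, 75, 1024))  # (925, 1024)
```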
You can create your own Python module in C, and control the performance as you want: