PyMC3 - Index 2-dimensional data while fitting hierarchical autoregressive model - pymc

I (new to PyMC3) want to extend the model proposed in the PyMC3 example A Hierarchical model for Rugby prediction by making the latent variables for attack and defence strength autoregressive. I am unsure about how to use 2-dimensional data and the shape parameter of the GaussianRandomWalk class (explained below).
Edit 1: I did not find explicit documentation of multi-dimensional usage, but I found this comment by fonnesbeck among the PyMC3 github issues:
[...] I think most people would expect a vector of variables, which implies that the first dimension is the number of variable elements and the remaining dimension(s) the size of each variable.
As defined below, I use the time index as my first dimension. I tried switching the axes, which yields the same result. My current model is:
with pm.Model() as model:
    home = pm.Normal('home', 0, .0001)
    intercept = pm.Normal('intercept', 0, .0001)
    tau_att = pm.Exponential('tau_att', 1./.02)
    tau_def = pm.Exponential('tau_def', 1./.02)
    atts = pm.GaussianRandomWalk('atts', tau_att**-2, shape=[T, num_teams])
    defs = pm.GaussianRandomWalk('defs', tau_def**-2, shape=[T, num_teams])
    home_theta = tt.exp(intercept + home + atts[:, home_team] + defs[:, away_team])
    away_theta = tt.exp(intercept + atts[:, away_team] + defs[:, home_team])
    home_points = pm.Poisson('home_points', mu=home_theta, observed=observed_home_goals)
    away_points = pm.Poisson('away_points', mu=away_theta, observed=observed_away_goals)
The input is a two dimensional array, with rows being the time steps and columns containing the home or away goals for all teams that played in that time step. Assuming the following mock data
   home_score  away_score  home_team  away_team  i_home  i_away  t
0           1           0    Arsenal  Liverpool       0       1  1
1           1           1  Liverpool    Burnley       1       2  1
2           2           4    Burnley    Arsenal       2       0  1
3           0           3  Liverpool    Arsenal       1       0  2
4           1           1    Burnley  Liverpool       2       1  2
5           5           0    Arsenal    Burnley       0       2  2
observed_home_goals (similar to observed_away_goals) would look like this:
[[1 1 2]
 [0 1 5]]
and the corresponding team index would look like this:
[[0 1 2]
 [1 2 0]]
meaning that in time step 1, teams [0 1 2] scored [1 1 2] goals respectively.
Fitting the model does not throw errors; sampling, however, yields zero estimates for all parameters. I tried browsing the distributions/timeseries.py source code to find out how the shape parameter is used for multiple dimensions in the GaussianRandomWalk class.
My question is whether the model definition actually works as intended for the 2-dimensional time series data. I am not sure whether I index the atts and defs variables correctly.
Edit 2: I ended up building the time series manually, similar to Javier's solution, which seems to work fine!
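The indexing concern can be checked outside the model. The following is a minimal NumPy sketch, not PyMC3 code, and it assumes that Theano tensors follow NumPy's advanced-indexing rules for this case. It builds a random walk by hand as a cumulative sum of innovations (the step size 0.1 is a placeholder) and compares atts[:, home_team] with an index that pairs each time step with its own row of team indices.

```python
import numpy as np

T, num_teams = 2, 3
rng = np.random.default_rng(0)

# Manual random walk: cumulative sum of per-step innovations along the time axis.
innovations = rng.normal(0.0, 0.1, size=(T, num_teams))  # 0.1 is a placeholder scale
atts = np.cumsum(innovations, axis=0)                     # shape (T, num_teams)

# Team index per time step, as in the mock data (rows are time steps).
home_team = np.array([[0, 1, 2],
                      [1, 2, 0]])

# A full slice on the time axis pairs EVERY time step with EVERY row of the index.
crossed = atts[:, home_team]        # shape (T, T, 3)

# Pairing row t of the index with row t of atts needs an explicit time index.
t_idx = np.arange(T)[:, None]       # shape (T, 1), broadcasts across the games
aligned = atts[t_idx, home_team]    # shape (T, 3)

print(crossed.shape, aligned.shape)
```

If the same rules hold inside the model, atts[:, home_team] would have one axis more than observed_home_goals, so an explicit time index (or a manually built series, as in Edit 2) may be what is needed.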

Related

Algorithm strategy to prevent values from bouncing between 2 values when on edge of two 'buckets'

I'm tracking various colored balls in OpenCV (Python) in real time. The tracking is very stable, i.e. when the balls are stationary, the center of each circle does not change by more than 1 or 2 pixels.
However, I'm running into what must surely be a well-researched issue: I now need to place the positions of the balls into a rougher grid, essentially just dividing (and rounding) the x,y positions.
e.g.
input range is 0 -> 9
target range is 0 -> 1 (two buckets)
so i do: floor(input / 5)
input: [0 1 2 3 4 5 6 7 8 9]
output: [0 0 0 0 0 1 1 1 1 1]
This is fine, but a problem occurs when a small change in the input value makes the output flip rapidly, i.e. at the 'edge' of the divisions, or a 'sensitive' area.
input: [4 5 4 5 4 5 5 4 ...]
output:[0 1 0 1 0 1 1 0 ...]
i.e. values 4 and 5 (which fall within the 1-pixel error/'noisy' margin) cause rapid changes in output.
What are some of the strategies / algorithms that deal with this and could help me further?
I searched, but it seems I do not know how to express the issue correctly for Google (or StackOverflow).
I tried adding 'deadzones', i.e. rather than purely dividing, I leave 'gaps' in my output range, which means a value sometimes has no output (i.e. between 'buckets'). This somewhat works, but it means a lot of the screen (i.e. the range of the fluctuation) is not used...
i.e.
input = [0 1 2 3 4 5 6 7 8 9]
output = [0 0 0 0 x x 1 1 1 1]
Temporal averaging is not ideal (and doesn't work too well either) - and increases the latency.
I just have a 'hunch' there is a whole set of Computer / Signal science about this.
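One commonly used idea for this kind of flicker is hysteresis: keep the previous bucket until the input has moved a margin past the bucket boundary, so the 'deadzone' holds the last output instead of producing none. The sketch below is only an illustration under that assumption; the function name bucket_with_hysteresis and the margin parameter are hypothetical and not part of OpenCV.

```python
def bucket_with_hysteresis(values, bucket_size=5, margin=1):
    """Quantize values into buckets, switching bucket only after the input
    has moved at least `margin` past the edge of the current bucket."""
    out = []
    current = None
    for v in values:
        if current is None:
            # First sample: plain floor division.
            current = int(v // bucket_size)
        else:
            low = current * bucket_size - margin          # leave downwards below this
            high = (current + 1) * bucket_size + margin   # leave upwards at or above this
            if v < low or v >= high:
                current = int(v // bucket_size)
        out.append(current)
    return out


noisy = [4, 5, 4, 5, 4, 5, 5, 4]
print(bucket_with_hysteresis(noisy))               # stays in one bucket
print(bucket_with_hysteresis([4, 5, 6, 5, 4, 3]))  # switches only on a real move
```

The whole input range stays usable, and no averaging is involved, so the only delay is the extra distance the value has to travel before the output switches.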

How to remove connected components from an image while retaining some

Let's say I have 5 connected components (labelled objects) in an image called labelledImage from bwlabel. How can I manipulate labelledImage so that only the objects labelled 1 and 4 are displayed, while the objects labelled 2, 3 and 5 are removed? Then, how can I manipulate the original RGB image so that only the connected components labelled 1 and 4 are displayed?
I know how to retain a single connected component by using this line of code below. However, I don't know how to do this for multiple labelled regions.
Works.
connectedComponent1 = (labelledImage == 1);
imshow(connectedComponent1)
Doesn't work.
connectedComponent1and4 = (labelledImage == [1 4]);
imshow(connectedComponent1and4)
You can't do logical indexing that way. The simplest way is to perhaps use Boolean statements to combine things.
connectedComponent1and4 = labelledImage == 1 | labelledImage == 4;
In general, supposing you had a vector of elements that denote which components you want to keep, you could use bsxfun, permute and any to help you with that. Something like this should work:
components = [1 4];
connected = any(bsxfun(@eq, labelledImage, permute(components, [1 3 2])), 3);
The above code uses matrix broadcasting to create a temporary 3D matrix where each slice i contains the ith value of the vector components which contain the desired labels you want to keep. labelledImage is also replicated in the third dimension so the result using bsxfun creates a 3D matrix where each slice i segments out the ith object you want to keep. We then combine all of the objects together using any and looking in the third dimension.
If you don't like one-liners, you could even use a simple for loop:
components = [1 4];
connected = false(size(labelledImage, 1), size(labelledImage, 2));
for ind = 1 : numel(components)
    connected = connected | labelledImage == components(ind);
end
This creates an output image that is all false, then we loop through each value in the vector of components you want to keep and OR those results into the output. At the end, the mask contains all of the components you want to keep.
Lastly, you could also use ismember to determine which values in your label matrix appear in the components vector and simply create your mask that way:
connected = ismember(labelledImage, components);
Now that you have a mask of objects you want to extract out, to use this on the original image, simply multiply each channel with the mask. Another use of bsxfun can do that for you. Assuming your image in RGB is called img, simply do the following:
outImg = bsxfun(@times, img, cast(connected, class(img)));
To perform element-wise multiplication, you must ensure that both matrices that are being multiplied have the same type. I convert the mask into the same class as whatever the input image is and perform the multiplication.
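For readers doing the same thing in Python, the mask-then-multiply idea carries over to NumPy. This is a rough analogue rather than a translation of the MATLAB code above: np.isin plays the role of ismember, and adding a trailing axis to the mask lets it broadcast over the colour channels. The label matrix and the uniform placeholder image are made up for illustration.

```python
import numpy as np

# Hypothetical label matrix standing in for the output of a labelling step.
labelled = np.array([[1, 1, 0, 2],
                     [0, 3, 3, 2],
                     [4, 4, 0, 5]])
components = [1, 4]  # labels to keep

# Boolean mask, True wherever the label is one of the wanted components.
mask = np.isin(labelled, components)

# Placeholder RGB image with the same height and width as the label matrix.
img = np.full(labelled.shape + (3,), 200, dtype=np.uint8)

# Add a channel axis to the mask and match the image dtype before multiplying.
out_img = img * mask[..., None].astype(img.dtype)

print(mask.astype(int))
print(out_img[..., 0])
```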
Use ismember.
Ex:
A = randi(5,5); % your connected component matrix
B = [1 4] % list of components you want to keep
A =
4 2 1 3 5
2 4 2 5 1
3 4 5 1 4
1 4 1 3 5
4 3 5 1 5
A(~ismember(A,B)) = 0
A =
4 0 1 0 0
0 4 0 0 1
0 4 0 1 4
1 4 1 0 0
4 0 0 1 0

Append matrix to another matrix in Matlab

I have a matrix M=[4 3 2 1;1 2 3 4]. I want to append different size matrices at each iteration:
M=[4 3 2 1;1 2 3 4];
for i=1:t
    newM=createNewMatrix;
    M=[M;newM];
end
newM can be [] or a Nx4 matrix. This is very slow though. What is the fastest way to do this?
Update
Pre-allocating would look like this?
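M=zeros(200000,4);
M(1:2,:)=[4 3 2 1;1 2 3 4];
start=3; % next free row after the two initial rows
for i=1:t
    newM=createNewMatrix;
    size_of_newM=size(newM,1);
    finish=start+size_of_newM-1;
    M(start:finish,:)=newM;
    start=finish+1; % move past the rows just written
end
M=M(1:start-1,:); % trim the unused preallocated rows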
As suggested, preallocation gives the biggest speed-up.
Using cell arrays is another good approach and could be implemented like this:
M = cell(200000, 1);
M{1} = [4 3 2 1; 1 2 3 4];
for t=2:200000
    i = randi(3)-1;
    M{t}=rand(i,4);
end
MC = vertcat(M{:});
In principle you generate a cell array with arbitrarily long arrays in each cell and then concatenate them afterwards.
This worked for me nearly twice as fast as your preallocation update. On the other hand, this still was only around one second for the example with 200k iterations...
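The same collect-then-concatenate pattern exists outside MATLAB. As a rough NumPy analogue (not a benchmark of the MATLAB code above), the sketch below appends blocks of 0 to 2 rows to a Python list and stacks them once at the end; the iteration count of 1000 and the random blocks are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Start with the initial 2x4 block; later blocks have 0, 1 or 2 rows.
chunks = [np.array([[4, 3, 2, 1], [1, 2, 3, 4]], dtype=float)]
for _ in range(1000):
    n_rows = int(rng.integers(0, 3))       # 0, 1 or 2 rows, like randi(3)-1
    chunks.append(rng.random((n_rows, 4)))

# One concatenation at the end instead of growing the matrix every iteration.
M = np.vstack(chunks)
print(M.shape)
```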

fortran library for sparse matrix multiplication

I have a large matrix which I have stored in the following format, given the matrix A;
A =
1 0 3
5 1 -2
0 0 7
3 vectors;
NVPN = [1 3 4 7] - I arbitrarily put a 1 in the first position; from the second position onwards it cumulatively sums the number of non-zero elements per column.
NNVI = [1 2 2 1 2 3] - row index of each non-zero element.
CONT = [1 5 1 3 -2 7] - value of each non-zero element.
I now need to perform matrix*matrix multiplication and matrix*vector multiplication. Does anyone know if there are any FORTRAN libraries, which I can amend to fit my problem, to do the above?
Thanks in advance
The MATMUL function allows you to perform matrix products, which is defined in the section 13.7.70 of the FORTRAN 90 standard. See also: GCC reference.
There is already a topic on sparse matrix libraries here.
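The three vectors in the question match what is usually called a compressed sparse column layout with 1-based indices, whereas MATMUL operates on ordinary dense arrays. As a minimal sketch of the matrix*vector case on that layout (written in Python for illustration only; the helper name csc_matvec and the explicit n_rows argument are assumptions, not part of any library):

```python
import numpy as np

# The three vectors from the question, kept 1-based as in Fortran.
NVPN = np.array([1, 3, 4, 7])                     # start of each column in CONT
NNVI = np.array([1, 2, 2, 1, 2, 3])               # row index of each non-zero
CONT = np.array([1.0, 5.0, 1.0, 3.0, -2.0, 7.0])  # value of each non-zero


def csc_matvec(col_ptr, row_idx, values, x, n_rows):
    """Compute y = A @ x for A stored column by column with 1-based indices."""
    y = np.zeros(n_rows)
    for j in range(len(col_ptr) - 1):
        # Non-zeros of column j sit at 0-based positions col_ptr[j]-1 .. col_ptr[j+1]-2.
        for k in range(col_ptr[j] - 1, col_ptr[j + 1] - 1):
            y[row_idx[k] - 1] += values[k] * x[j]
    return y


x = np.array([1.0, 2.0, 3.0])
y = csc_matvec(NVPN, NNVI, CONT, x, n_rows=3)
print(y)
```

The loop touches only the stored non-zeros, which is the point of keeping the matrix in this format; the two nested loops translate directly into Fortran DO loops.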

effective way of transformation from 2D to 1D vector

I want to create a 1D vector in MATLAB from a given matrix. For this I have implemented the following algorithm, which does it the trivial way:
% create one dimensional vector from 2D matrix
function [x]=one_dimensional(b,m,n)
    k=1;
    for i=1:m
        for t=1:n
            x(k)=b(i,t);
            k=k+1;
        end
    end
    x;
end
When I run it on the following example, it seems to do its task fine:
b=[2 1 3;4 2 3;1 5 4]
b =
2 1 3
4 2 3
1 5 4
>> one_dimensional(b,3,3)
ans =
2 1 3 4 2 3 1 5 4
But I know that, in general, filling an array element by element in a loop like this is not a good approach in MATLAB because of its performance, so what would be an efficient way to transform a matrix into a row/column vector? I only care about performance. Thanks very much.
You can use the (:) operator. It works column by column rather than row by row, so you need to transpose first using the .' operator, for example:
b=b.';
b(:)'
ans=
2 1 3 4 2 3 1 5 4
I transposed again to get a row output (otherwise it would be the same vector, only in column form).
Alternatively, this is an option (probably a slower one):
reshape(b.',1,[])
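The row-versus-column ordering issue is not specific to MATLAB. As a small NumPy illustration (an analogue, not the MATLAB answer itself), ravel flattens row by row by default, and order="F" gives the column-by-column order that MATLAB's (:) produces:

```python
import numpy as np

b = np.array([[2, 1, 3],
              [4, 2, 3],
              [1, 5, 4]])

row_major = b.ravel()           # row by row, like one_dimensional(b,3,3)
col_major = b.ravel(order="F")  # column by column, like b(:) in MATLAB

print(row_major)  # [2 1 3 4 2 3 1 5 4]
print(col_major)  # [2 4 1 1 2 5 3 3 4]
```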
