I have a matrix, for example:
1 2
3 4
4 5
And I also have a rule of grouping the rows, which is defined as a vector of group IDs like this:
1
2
1
This means that the first and the third rows belong to the same group (ID 1) and the second row belongs to another group (ID 2). I would like to compute the mean value for each group. Here is the result for my example:
2.5 3.5
3 4
More formally, there is a matrix A of size (m, n), a number of groups k and a vector v of size (m, 1), values of which are integers in range from 1 to k. The result is a matrix R of size (k, n), where each row with index r corresponds to the mean value of the group r.
Here is my solution (which does what I need) using for-loop in Octave:
R = zeros(k, n);
for r = 1:k
R(r, :) = mean(A((v == r), :), 1);
end
I wonder whether it could be vectorized. So, what I need is to replace the for-loop with a vectorized solution, which is going to be much more efficient than the iterative one.
Here is one of my many attempts (which do not work) to solve the problem in a vectorized way:
R = mean(A((v == 1:k), :);
As long as your data is floating point, you can do it manually: compute the per-group sums yourself with accumdim and then divide by the group sizes. Like so:
octave:1> A = [1 2; 3 4; 4 5];
octave:2> subs = [1; 2; 1];
octave:3> accumdim (subs, A) ./ accumdim (subs, ones (rows (subs), 1))
ans =
2.5000 3.5000
3.0000 4.0000
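As far as I know, accumdim is Octave-specific, so here is a rough sketch of the same sum-then-divide idea using accumarray, which exists in both MATLAB and Octave. It assumes, as in the question, that v holds integer group IDs from 1 to k; the names rowIdx, colIdx, sums and counts are mine, not from the answer above.

```matlab
A = [1 2; 3 4; 4 5];          % data, one observation per row
v = [1; 2; 1];                % group ID of each row (assumed integers 1..k)
k = max(v);
n = size(A, 2);

% Build a (group, column) subscript pair for every element of A.
[rowIdx, colIdx] = ndgrid(v, 1:n);

sums   = accumarray([rowIdx(:), colIdx(:)], A(:), [k, n]);  % per-group column sums
counts = accumarray(v, 1, [k, 1]);                          % number of rows per group

R = bsxfun(@rdivide, sums, counts)   % per-group means: [2.5 3.5; 3 4]
```

A group ID that never occurs would give 0/0, so its row in R would be NaN.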
You can consider it as a matrix multiplication problem. For instance, for your example this corresponds to
A = [1 2; 3 4; 4 5];
B = [0.5,0,0.5;0,1,0];
C = B*A
The main issue is to construct B from your list of indices in an efficient manner. My suggestion is to use the implicit expansion of ==.
A = [1 2; 3 4; 4 5]; % Input data
idx = [1;2;1]; % Input Grouping
k = 2; % number of groups, ( = max(idx) )
m = 3; % Number of "observations"
Btmp = (idx == 1:k)'; % Mark locations
B = Btmp ./sum(Btmp,2); % Normalise
C = B*A
C =
2.5000 3.5000
3.0000 4.0000
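If the number of rows or groups is large, the dense m-by-k comparison can get expensive. One possible variation, sketched here under the same assumptions (idx contains integers 1..k), is to build the indicator matrix as a sparse matrix and divide by the group sizes afterwards. The name S is mine and this is only an illustration of the idea, not a tuned implementation.

```matlab
A   = [1 2; 3 4; 4 5];     % input data
idx = [1; 2; 1];           % group ID of each row (assumed integers 1..k)
k   = max(idx);
m   = numel(idx);

% k-by-m indicator matrix: S(g, i) is 1 when row i belongs to group g.
S = sparse(idx, (1:m).', 1, k, m);

% Sum the rows of each group, then divide by the group sizes.
C = bsxfun(@rdivide, full(S * A), full(sum(S, 2)))   % [2.5 3.5; 3 4]
```

bsxfun is used for the division so that the sketch does not rely on implicit expansion being available.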
For an input matrix
in = [1 1;
1 2;
1 3;
1 4;
2 5;
2 6;
2 7;
3 8;
3 9;
3 10;
3 11];
I want to get the output matrix
out = [1 5 8;
2 6 9;
3 7 10;
4 0 11];
That is, I want to reshape the second input column into an output matrix, where all values corresponding to one value in the first input column are written into one column of the output matrix.
As there can be different numbers of entries for each value in the first input column (here 4 values for "1" and "3", but only 3 for "2"), the normal reshape function is not applicable. I need to pad all columns to the maximum number of rows.
Do you have an idea how to do this matlab-ish?
The second input column can only contain positive numbers, so the padding values can be 0, -x, NaN, ...
The best I could come up with is this (loop-based):
maxNumElem = 0;
for i=in(1,1):in(end,1)
maxNumElem = max(maxNumElem,numel(find(in(:,1)==i)));
end
out = zeros(maxNumElem,in(end,1)-in(1,1)+1);
for i=in(1,1):in(end,1)
tmp = in(in(:,1)==i,2);
out(1:length(tmp),i) = tmp;
end
Either of the following approaches assumes that column 1 of in is sorted, as in the example. If that's not the case, apply this initially to sort in according to that criterion:
in = sortrows(in,1);
Approach 1 (using accumarray)
1. Compute the required number of rows, using mode;
2. Use accumarray to gather the values corresponding to each column, filled with zeros at the end. The result is a cell;
3. Concatenate horizontally the contents of all cells.
Code:
[~, n] = mode(in(:,1)); %//step 1
out = accumarray(in(:,1), in(:,2), [], @(x){[x; zeros(n-numel(x),1)]}); %//step 2
out = [out{:}]; %//step 3
Alternatively, step 1 could be done with histc
n = max(histc(in(:,1), unique(in(:,1)))); %//step 1
or with accumarray:
n = max(accumarray(in(:,1), in(:,2), [], @(x) numel(x))); %//step 1
Approach 2 (using sparse)
Generate a row-index vector using this answer by @Dan, and then build your matrix with sparse:
a = arrayfun(@(x)(1:x), diff(find([1,diff(in(:,1).'),1])), 'uni', 0); %//'
out = full(sparse([a{:}], in(:,1), in(:,2)));
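For what it's worth, the row-index vector can also be built without arrayfun. The following is only a sketch, assuming column 1 of in is sorted and holds positive integer IDs; the names g, counts, starts and rows are mine.

```matlab
in = [1 1; 1 2; 1 3; 1 4; 2 5; 2 6; 2 7; 3 8; 3 9; 3 10; 3 11];

g      = in(:,1);                          % group IDs (assumed sorted positive integers)
counts = accumarray(g, 1);                 % number of entries per ID
starts = cumsum([1; counts(1:end-1)]);     % position in g where each ID begins
rows   = (1:numel(g)).' - starts(g) + 1;   % 1,2,3,... restarting within each ID

out = full(sparse(rows, g, in(:,2)))       % [1 5 8; 2 6 9; 3 7 10; 4 0 11]
```

Here rows comes out as 1 2 3 4 1 2 3 1 2 3 4, which is the same vector that [a{:}] produces above.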
Introduction to proposed solution and Code
Proposed here is a bsxfun based masking approach that uses the binary operators available as builtins for use with bsxfun and as such I would consider this very appropriate for problems like this. Of course, you must also be aware that bsxfun is a memory hungry tool. So, it could pose a threat if you are dealing with maybe billions of elements depending also on the memory available for MATLAB's usage.
Getting into the details of the proposed approach, we get the counts of each ID from column-1 of the input with histc. Then, the magic happens with bsxfun + @le to create a mask of positions in the output array (initialized by zeros) that are to be filled by the column-2 elements from input. That's all you need to tackle the problem with this approach.
Solution Code
counts = histc(in(:,1),1:max(in(:,1)))'; %//' counts of each ID from column1
max_counts = max(counts); %// Maximum count over all IDs
mask = bsxfun(@le,[1:max_counts]',counts); %//' mask of locations where
%// column2 elements are to be placed
out = zeros(max_counts,numel(counts)); %// Initialize the output array
out(mask) = in(:,2); %// place the column2 elements in the output array
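To make the mask easier to picture, here is the same approach applied to the small input from the question. This is just an illustrative run; I compute the counts with accumarray instead of histc, which is my own substitution and assumes the IDs are positive integers.

```matlab
in = [1 1; 1 2; 1 3; 1 4; 2 5; 2 6; 2 7; 3 8; 3 9; 3 10; 3 11];

counts     = accumarray(in(:,1), 1).';          % [4 3 4], entries per ID
max_counts = max(counts);                       % 4, rows needed in the output

% mask(r, c) is true when ID c has at least r entries.
mask = bsxfun(@le, (1:max_counts).', counts);   % [1 1 1; 1 1 1; 1 1 1; 1 0 1]

out = zeros(max_counts, numel(counts));
out(mask) = in(:,2)                             % [1 5 8; 2 6 9; 3 7 10; 4 0 11]
```

The assignment fills the true positions of the mask column by column, which is why column 1 of the input has to be sorted. Replacing zeros with nan would pad with NaN instead.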
Benchmarking (for performance)
The benchmarking presented here compares the proposed solution in this post against the various methods presented in Luis's solution. This skips the original loopy approach presented in the problem as it appeared to be very slow for the input generated in the benchmarking code.
Benchmarking Code
num_ids = 5000;
counts_each_id = randi([10 100],num_ids,1);
num_runs = 20; %// number of iterations each approach is run for
%// Generate random input array
in = [];
for k = 1:num_ids
in = [in ; [repmat(k,counts_each_id(k),1) rand(counts_each_id(k),1)]];
end
%// Warm up tic/toc.
for k = 1:50000
tic(); elapsed = toc();
end
disp('------------- With HISTC + BSXFUN Masking approach')
tic
for iter = 1:num_runs
counts = histc(in(:,1),1:max(in(:,1)))';
max_counts = max(counts);
out = zeros(max_counts,numel(counts));
out(bsxfun(@le,[1:max_counts]',counts)) = in(:,2);
end
toc
clear counts max_counts out
disp('------------- With MODE + ACCUMARRAY approach')
tic
for iter = 1:num_runs
[~, n] = mode(in(:,1)); %//step 1
out = accumarray(in(:,1), in(:,2), [], @(x){[x; zeros(n-numel(x),1)]}); %//step 2
out = [out{:}];
end
toc
clear n out
disp('------------- With HISTC + ACCUMARRAY approach')
tic
for iter = 1:num_runs
n = max(histc(in(:,1), unique(in(:,1))));
out = accumarray(in(:,1), in(:,2), [], @(x){[x; zeros(n-numel(x),1)]}); %//step 2
out = [out{:}];
end
toc
clear n out
disp('------------- With ARRAYFUN + Sparse approach')
tic
for iter = 1:num_runs
a = arrayfun(@(x)(1:x), diff(find([1,diff(in(:,1).'),1])), 'uni', 0); %//'
out = full(sparse([a{:}], in(:,1), in(:,2)));
end
toc
clear a out
Results
------------- With HISTC + BSXFUN Masking approach
Elapsed time is 0.598359 seconds.
------------- With MODE + ACCUMARRAY approach
Elapsed time is 2.452778 seconds.
------------- With HISTC + ACCUMARRAY approach
Elapsed time is 2.579482 seconds.
------------- With ARRAYFUN + Sparse approach
Elapsed time is 1.455362 seconds.
slightly better, but still uses a loop :(
out=zeros(4,3);%set to zero matrix
for i = 1:max(in(:,1)); %find max in column 1, and loop for that number
ind = find(in(:,1)==i); %
out(1: size(in(ind,2),1),i)= in(ind,2);
end
don't know if you can avoid the loop...
As always trying to learn more from you, I was hoping I could receive some help with the following code.
I need to accomplish the following:
1) I have a vector:
x = [1 2 3 4 5 6 7 8 9 10 11 12]
2) and a matrix:
A =[11 14 1
5 8 18
10 8 19
13 20 16]
I need to be able to multiply each value from x with every value of A, this means:
new_matrix = [1* A
2* A
3* A
...
12* A]
This will give me this new_matrix of size (12*m x n) assuming A (mxn). And in this case (12*4x3)
How can I do this using bsxfun from matlab? and, would this method be faster than a for-loop?
Regarding my for-loop, I need some help here as well... I am not able to store each "new_matrix" as the loop runs :(
for i=x
new_matrix = A.*x(i)
end
Thanks in advance!!
EDIT: After the solutions were given
First solution
clear all
clc
x=1:0.1:50;
A = rand(1000,1000);
tic
val = bsxfun(@times,A,permute(x,[3 1 2]));
out = reshape(permute(val,[1 3 2]),size(val,1)*size(val,3),[]);
toc
Output:
Elapsed time is 7.597939 seconds.
Second solution
clear all
clc
x=1:0.1:50;
A = rand(1000,1000);
tic
Ps = kron(x.',A);
toc
Output:
Elapsed time is 48.445417 seconds.
Send x to the third dimension, so that singleton expansion would come into effect when bsxfun is used for multiplication with A, extending the product result to the third dimension. Then, perform the bsxfun multiplication -
val = bsxfun(@times,A,permute(x,[3 1 2]))
Now, val is a 3D matrix and the desired output is expected to be a 2D matrix concatenated along the columns through the third dimension. This is achieved below -
out = reshape(permute(val,[1 3 2]),size(val,1)*size(val,3),[])
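In case the permute/reshape step is hard to follow, here is a small run you can check by hand. The inputs are made up for illustration (they are not the ones from the question), and the comparison against a manually stacked matrix is only a sanity check.

```matlab
x = [1 2 3];
A = [1 2; 3 4];

% Send x to the third dimension: val is 2-by-2-by-3 and val(:,:,p) equals x(p)*A.
val = bsxfun(@times, A, permute(x, [3 1 2]));

% Bring the third dimension next to the rows, then stack the slices vertically.
out = reshape(permute(val, [1 3 2]), [], size(A, 2));

expected = [1*A; 2*A; 3*A];    % [1 2; 3 4; 2 4; 6 8; 3 6; 9 12]
isequal(out, expected)         % true
```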
Hope that made sense! Spread the bsxfun word around! woo!! :)
The kron function does exactly that:
kron(x.',A)
Here is my benchmark of the methods mentioned so far, along with a few additions of my own:
function [t,v] = testMatMult()
% data
%{
x = [1 2 3 4 5 6 7 8 9 10 11 12];
A = [11 14 1; 5 8 18; 10 8 19; 13 20 16];
%}
x = 1:50;
A = randi(100, [1000,1000]);
% functions to test
fcns = {
@() func1_repmat(A,x)
@() func2_bsxfun_3rd_dim(A,x)
@() func2_forloop_3rd_dim(A,x)
@() func3_kron(A,x)
@() func4_forloop_matrix(A,x)
@() func5_forloop_cell(A,x)
@() func6_arrayfun(A,x)
};
% timeit
t = cellfun(@timeit, fcns, 'UniformOutput',true);
% check results
v = cellfun(@feval, fcns, 'UniformOutput',false);
isequal(v{:})
%for i=2:numel(v), assert(norm(v{1}-v{2}) < 1e-9), end
end
% Amro
function B = func1_repmat(A,x)
B = repmat(x, size(A,1), 1);
B = bsxfun(@times, B(:), repmat(A,numel(x),1));
end
% Divakar
function B = func2_bsxfun_3rd_dim(A,x)
B = bsxfun(@times, A, permute(x, [3 1 2]));
B = reshape(permute(B, [1 3 2]), [], size(A,2));
end
% Vissenbot
function B = func2_forloop_3rd_dim(A,x)
B = zeros([size(A) numel(x)], 'like',A);
for i=1:numel(x)
B(:,:,i) = x(i) .* A;
end
B = reshape(permute(B, [1 3 2]), [], size(A,2));
end
% Luis Mendo
function B = func3_kron(A,x)
B = kron(x(:), A);
end
% SergioHaram & TheMinion
function B = func4_forloop_matrix(A,x)
[m,n] = size(A);
p = numel(x);
B = zeros(m*p,n, 'like',A);
for i=1:numel(x)
B((i-1)*m+1:i*m,:) = x(i) .* A;
end
end
% Amro
function B = func5_forloop_cell(A,x)
B = cell(numel(x),1);
for i=1:numel(x)
B{i} = x(i) .* A;
end
B = cell2mat(B);
%B = vertcat(B{:});
end
% Amro
function B = func6_arrayfun(A,x)
B = cell2mat(arrayfun(@(xx) xx.*A, x(:), 'UniformOutput',false));
end
The results on my machine:
>> t
t =
0.1650 %# repmat (Amro)
0.2915 %# bsxfun in the 3rd dimension (Divakar)
0.4200 %# for-loop in the 3rd dim (Vissenbot)
0.1284 %# kron (Luis Mendo)
0.2997 %# for-loop with indexing (SergioHaram & TheMinion)
0.5160 %# for-loop with cell array (Amro)
0.4854 %# arrayfun (Amro)
(Those timings can slightly change between different runs, but this should give us an idea how the methods compare)
Note that some of these methods are going to cause out-of-memory errors for larger inputs (for example my solution based on repmat can easily run out of memory). Others will get significantly slower for larger sizes but won't error due to exhausted memory (the kron solution for instance).
I think that the bsxfun method func2_bsxfun_3rd_dim or the straightforward for-loop func4_forloop_matrix (thanks to MATLAB JIT) are the best solutions in this case.
Of course you can change the above benchmark parameters (size of x and A) and draw your own conclusions :)
Just to add an alternative, you maybe can use cellfun to achieve what you want. Here's an example (slightly modified from yours):
x = randi(2, 5, 3)-1;
a = randi(3,3);
%// bsxfun 3D (As implemented in the accepted solution)
val = bsxfun(@and, a, permute(x', [3 1 2])); %//'
out = reshape(permute(val,[1 3 2]),size(val,1)*size(val,3),[]);
%// cellfun (My solution)
val2 = cellfun(@(z) bsxfun(@and, a, z), num2cell(x, 2), 'UniformOutput', false);
out2 = cell2mat(val2); % or use cat(3, val2{:}) to get a 3D matrix equivalent to val and then permute/reshape like for out
%// compare
disp(nnz(out ~= out2));
Both give the same exact result.
For more information and tricks using cellfun, see: http://matlabgeeks.com/tips-tutorials/computation-using-cellfun/
And also this: https://stackoverflow.com/a/1746422/1121352
If your vector x has length 12 and your matrix is of size 3x4, I don't think that using one method or the other would change much in terms of time. If you are working with larger matrices and vectors, then it might become an issue.
So first of all, we want to multiply a vector with a matrix. With the for-loop method, that would look something like this:
s = size(A);
new_matrix(s(1),s(2),numel(x)) = zeros; % Pre-allocate. With a big vector or matrix, this saves a lot of time.
for i = 1:numel(x)
new_matrix(:,:,i)= A.*x(i)
end
This will give you a 3D matrix, with each slice along the 3rd dimension being one result of your multiplication. If this is not what you are looking for, I'll add another solution which might be more time efficient with bigger matrices and vectors.
I want to store some results in the following way:
Res.0 = magic(4); % or Res.baseCase = magic(4);
Res.2 = magic(5); % I would prefer to use integers on all other
Res.7 = magic(6); % elements than the first.
Res.2000 = 1:3;
I want to use numbers between 0 and 3000, but I will only use approx 100-300 of them. Is it possible to use 0 as an identifier, or will I have to use a minimum value of 1? (The numbers have meaning, so I would prefer if I don't need to change them). Can I use numbers as identifiers in structs?
I know I can do the following:
Res{(last number + 1)} = magic(4);
Res{2} = magic(5);
Res{7} = magic(6);
Res{2000} = 1:3;
And just remember that the last element is really the "number zero" element.
In this case I will create a bunch of empty cell elements [] in the non-populated positions. Does this cause a problem? I assume it will be best to assign the last element first, to avoid creating a growing cell, or does this not have an effect? Is this an efficient way of doing this?
Which will be most efficient, struct's or cell's? (If it's possible to use struct's, that is).
My main concern is computational efficiency.
Thanks!
Let's review your options:
Indexing into cell arrays
MATLAB indices start from 1, not from 0. If you want to store your data in cell arrays, in the worst case, you could always use the subscript k + 1 to index into cell corresponding to the k-th identifier (k ≥ 0). In my opinion, using the last element as the "base case" is more confusing. So what you'll have is:
Res{1} = magic(4); %// Base case
Res{2} = magic(5); %// Corresponds to identifier 1
...
Res{k + 1} = ... %// Corresponds to identifier k
Accessing fields in structures
Field names in structures are not allowed to begin with numbers, but they are allowed to contain them starting from the second character. Hence, you can build your structure like so:
Res.c0 = magic(4); %// Base case
Res.c1 = magic(5); %// Corresponds to identifier 1
Res.c2 = magic(6); %// Corresponds to identifier 2
%// And so on...
You can use dynamic field referencing to access any field, for instance:
k = 3;
kth_field = Res.(sprintf('c%d', k)); %// Access field k = 3 (i.e field 'c3')
I can't say which alternative seems more elegant, but I believe that indexing into a cell should be faster than dynamic field referencing (but you're welcome to check that out and prove me wrong).
As an alternative to EitanT's answer, it sounds like matlab's map containers are exactly what you need. They can deal with any type of key and the value may be a struct or cell.
EDIT:
In your case this will be:
k = {0,2,7,2000};
Res = {magic(4),magic(5),magic(6),1:3};
ResMap = containers.Map(k, Res)
ResMap(0)
ans =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
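A map can also be filled one entry at a time, which may fit better if the results are produced in a loop. This is a minimal sketch of that usage, assuming numeric identifiers as in the question; the variable usedIds is mine.

```matlab
% Empty map with numeric keys; values may be arrays of any size.
Res = containers.Map('KeyType', 'double', 'ValueType', 'any');

Res(0)    = magic(4);   % base case; 0 is a valid key
Res(2)    = magic(5);
Res(7)    = magic(6);
Res(2000) = 1:3;

isKey(Res, 7)                   % true: identifier 7 is stored
isKey(Res, 8)                   % false: nothing is stored under 8
Res.Count                       % 4, only the identifiers actually used
usedIds = cell2mat(keys(Res))   % [0 2 7 2000]
```

Only the identifiers that are actually assigned take up space, so there are no empty placeholder entries as with a cell array indexed up to 3000.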
I agree with the idea in @wakjah's comment. If you are concerned about the efficiency of your program, it is better to change how you interpret the problem. In my opinion there is definitely a way to prioritize your data. The prioritization could follow the time at which the data were acquired, or the inputs from which they were calculated. Once you set some kind of priority among them, you can store them in order in a structure or cell (a structure might be faster).
So
Priority   Your current index (meaning)   Data
1          0                              magic(4)
2          2                              magic(5)
3          7                              magic(6)
4          2000                           1:3
Then:
% Initialize Result structure which is different than your Res.
Result(300).Data = 0; % 300 the maximum number of data
Result(300).idx = 0; % idx or anything that represent the meaning of your current index.
% Assigning
k = 1; % Priority index
Result(k).idx = 0; Result(k).Data = magic(4); k = k + 1;
Result(k).idx = 2; Result(k).Data = magic(5); k = k + 1;
Result(k).idx = 7; Result(k).Data = magic(6); k = k + 1;
...
I'm not a programmer, I just need to solve something numerically in MATLAB.
I need a function to make the following transformation for any square matrix:
from
row 1: 1 2 3
row 2: 4 5 6
row 3: 7 8 9
to
1 4 2 7 5 3 8 6 9
i.e. write the matrix into a vector along its anti-diagonals, each read from bottom left to top right.
Any ideas please?
I really need a little more help though:
Say the matrix that we have transformed into the vector has entries denoted by M(i,j), where i is the row and j the column. Now I need to be able to find, from a position in the vector, the original position in the matrix. For example, for the 3rd entry in the vector, I need a function that would give me i=1, j=2. Any ideas please? I'm really stuck on this :( Thanks
This is quite similar to a previous question on traversing the matrix in a zigzag order. With slight modification we get:
A = rand(3); %# input matrix
ind = reshape(1:numel(A), size(A)); %# indices of elements
ind = spdiags(fliplr(ind)); %# get the anti-diagonals
ind = ind(end:-1:1); %# reverse order
ind = ind(ind~=0); %# keep non-zero indices
B = A(ind); %# get elements in desired order
using the SPDIAGS function. The advantage of this is that it works for any arbitrary matrix size (not just square matrices). Example:
A =
0.75127 0.69908 0.54722 0.25751
0.2551 0.8909 0.13862 0.84072
0.50596 0.95929 0.14929 0.25428
B =
Columns 1 through 6
0.75127 0.2551 0.69908 0.50596 0.8909 0.54722
Columns 7 through 12
0.95929 0.13862 0.25751 0.14929 0.84072 0.25428
Here's one way to do this.
%# n is the number of rows (or cols) of the square array
n = 3;
array = [1 2 3;4 5 6;7 8 9]; %# this is the array we'll reorder
%# create list of indices that allow us
%# to read the array in the proper order
hh = hankel(1:n,n:(2*n-1)); %# creates a matrix with numbered antidiagonals
[dummy,sortIdx] = sort(hh(:)); %# sortIdx contains the new order
%# reorder the array
array(sortIdx)
ans =
1 4 2 7 5 3 8 6 9
You can convert your matrix to a vector using the function HANKEL to generate indices into the matrix. Here's a shortened version of Jonas' answer, using M as your sample matrix given above:
N = size(M,1);
A = hankel(1:N,N:(2*N-1));
[junk,sortIndex] = sort(A(:));
Now, you can use sortIndex to change your matrix M to a vector vec like so:
vec = M(sortIndex);
And if you want to get the row and column indices (rIndex and cIndex) into your original matrix that correspond to the values in vec, you can use the function IND2SUB:
[rIndex,cIndex] = ind2sub([N N],sortIndex);
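Putting the pieces together for the 3-by-3 example from the question, here is a short sketch of how a position in the vector maps back to a row and column. The variable p is mine and is only there to illustrate the lookup.

```matlab
M = [1 2 3; 4 5 6; 7 8 9];
N = size(M, 1);

A = hankel(1:N, N:(2*N-1));       % anti-diagonals numbered 1..2N-1
[~, sortIndex] = sort(A(:));      % linear indices of M in anti-diagonal order
vec = M(sortIndex).';             % 1 4 2 7 5 3 8 6 9

% Row and column in M of every entry of vec.
[rIndex, cIndex] = ind2sub([N N], sortIndex);

p = 3;                            % position in vec
[rIndex(p), cIndex(p)]            % 1 2, i.e. vec(3) is M(1,2)
```

This relies on sort keeping equal values in their original order, which is what puts each anti-diagonal in bottom-left to top-right order.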
A=[1,2,3;4,5,6;7,8,9];
d = size(A,1);
X=[];
for n = 1:2*size(A,1) - 1
j = min(n,d); i = (n+1)-(j);
X = cat(2,X,diag(flipud(A(i:j,i:j)))');
end
X
X =
1 4 2 7 5 3 8 6 9
You can generate the diagonals in this way:
for i = -2:2
diag(flipud(a), i)
end
I don't know whether this is the optimal way to concatenate the diagonals:
d = []
for i = -2:2
d = vertcat(d, diag(flipud(a), i))
end
(I tested it in octave, not in matlab)