I have a large sparse adjacency matrix with around 10M nodes, which I am processing with MATLAB. I want to convert the matrix into adjacency list as efficiently as possible. As an example adjacency matrix to illustrate this:
adj =
1 0 1
0 0 1
0 1 1
And the output is:
ans =
0 0 2
1 2
2 1 2
I want to do it as efficiently as possible, is there any efficient way to do it?
The result needs to be a cell array of vector, because the number of nodes connected to each node varies. Here's a way to do it:
[ii, jj] = find(adj); % row and col indices of connections
y = accumarray(ii, jj-1 , [], #(x){sort(x.')}); % get all nodes connected to each node,
% sorted. Subtract 1 for 0-based indexing
This gives
>> celldisp(y)
y{1} =
0 2
y{2} =
2
y{3} =
1 2
Related
Say we have a matrix of zeros and ones
0 1 1 1 0 0 0
1 1 1 1 0 1 1
0 0 1 0 0 1 0
0 1 1 0 1 1 1
0 0 0 0 0 0 1
0 0 0 0 0 0 1
and we want to find all the submatrices (we just need the row indices and column indices of the corners) with these properties:
contain at least L ones and L zeros
contain max H elements
i.e. take the previous matrix with L=1 and H=5, the submatrix 1 2 1 4 (row indices 1 2 and column indices 1 4)
0 1 1 1
1 1 1 1
satisfies the property 1 but has 8 elements (bigger than 5) so it is not good;
the matrix 4 5 1 2
0 1
0 0
is good because satisfies both the properties.
The objective is then to find all the submatrices with min area 2*L, max area H and containg at least L ones and L zeros.
If we consider a matrix as a rectangle it is easy to find all the possibile subrectangles with max area H and min area 2*L by looking at the divisors of all the numbers from H to 2*L.
For example, with H=5 and L=1 all the possibile subrectangles/submatrices are given by the divisors of
H=5 -> divisors [1 5] -> possibile rectangles of area 5 are 1x5 and 5x1
4 -> divisors [1 2 4] -> possibile rectangles of area 4 are 1x4 4x1 and 2x2
3 -> divisors [1 3] -> possibile rectangles of area 3 are 3x1 and 1x3
2*L=2 -> divisors [1 2] -> possibile rectangles of area 2 are 2x1 and 1x2
I wrote this code, which, for each number finds its divisors and cycles over them to find the submatrices. To find the submatrices it does this: take for example a 1x5 submatrix, what the code does is to fix the first line of the matrix and move step by step (along all the columns of the matrix) the submatrix from the left edge of the matrix to the right edge of the matrix, then the code fixes the second row of the matrix and moves the submatrix along all the columns from left to right, and so on until it arrives at the last row.
It does this for all the 1x5 submatrices, then it considers the 5x1 submatrices, then the 1x4, then the 4x1, then the 2x2, etc.
The code do the job in 2 seconds (it finds all the submatrices) but for big matrices, i.e. 200x200, a lot of minutes are needed to find all the submatrices. So I wonder if there are more efficient ways to do the job, and eventually which is the most efficient.
This is my code:
clc;clear all;close all
%% INPUT
P= [0 1 1 1 0 0 0 ;
1 1 1 1 0 1 1 ;
0 0 1 0 0 1 0 ;
0 1 1 0 1 1 1 ;
0 0 0 0 0 0 1 ;
0 0 0 0 0 0 1];
L=1; % a submatrix has to containg at least L ones and L zeros
H=5; % max area of a submatrix
[R,C]=size(P); % rows and columns of P
sub=zeros(1,6); % initializing the matrix containing the indexes of each submatrix (columns 1-4), their area (5) and the counter (6)
counter=1; % no. of submatrices found
%% FIND ALL RECTANGLES OF AREA >= 2*L & <= H
%
% idea: all rectangles of a certain area can be found using the area's divisors
% e.g. divisors(6)=[1 2 3 6] -> rectangles: 1x6 6x1 2x3 and 3x2
tic
for sH = H:-1:2*L % find rectangles of area H, H-1, ..., 2*L
div_sH=divisors(sH); % find all divisors of sH
disp(['_______AREA ', num2str(sH), '_______'])
for i = 1:round(length(div_sH)/2) % cycle over all couples of divisors
div_small=div_sH(i);
div_big=div_sH(end-i+1);
if div_small <= R && div_big <= C % rectangle with long side <= C and short side <= R
for j = 1:R-div_small+1 % cycle over all possible rows
for k = 1:C-div_big+1 % cycle over all possible columns
no_of_ones=length(find(P(j:j-1+div_small,k:k-1+div_big))); % no. of ones in the current submatrix
if no_of_ones >= L && no_of_ones <= sH-L % if the submatrix contains at least L ones AND L zeros
% row indexes columns indexes area position
sub(counter,:)=[j,j-1+div_small , k,k-1+div_big , div_small*div_big , counter]; % save the submatrix
counter=counter+1;
end
end
end
disp([' [', num2str(div_small), 'x', num2str(div_big), '] submatrices: ', num2str(size(sub,1))])
end
if div_small~=div_big % if the submatrix is a square, skip this part (otherwise there will be duplicates in sub)
if div_small <= C && div_big <= R % rectangle with long side <= R and short side <= C
for j = 1:C-div_small+1 % cycle over all possible columns
for k = 1:R-div_big+1 % cycle over all possible rows
no_of_ones=length(find(P(k:k-1+div_big,j:j-1+div_small)));
if no_of_ones >= L && no_of_ones <= sH-L
sub(counter,:)=[k,k-1+div_big,j,j-1+div_small , div_big*div_small, counter];
counter=counter+1;
end
end
end
disp([' [', num2str(div_big), 'x', num2str(div_small), '] submatrices: ', num2str(size(sub,1))])
end
end
end
end
fprintf('\ntime: %2.2fs\n\n',toc)
Here is a solution centered around 2D matrix convolution. The rough idea is to convolve P for each submatrix shape with a second matrix such that each element of the resulting matrix indicates how many ones are in the submatrix having its top left corner at said element. Like this you get all solutions for a single shape in one go, without having to loop over rows/columns, greatly speeding things up (it takes less than a second for a 200x200 matrix on my 8 years old laptop)
P= [0 1 1 1 0 0 0
1 1 1 1 0 1 1
0 0 1 0 0 1 0
0 1 1 0 1 1 1
0 0 0 0 0 0 1
0 0 0 0 0 0 1];
L=1; % a submatrix has to containg at least L ones and L zeros
H=5; % max area of a submatrix
submats = [];
for sH = H:-1:2*L
div_sH=divisors(sH); % find all divisors of sH
for i = 1:length(div_sH) % cycle over all couples of divisors
%number of rows of the current submatrix
nrows=div_sH(i);
% number of columns of the current submatrix
ncols=div_sH(end-i+1);
% perpare matrix to convolve P with
m = zeros(nrows*2-1,ncols*2-1);
m(1:nrows,1:ncols) = 1;
% get the number of ones in the top left corner each submatrix
submatsums = conv2(P,m,'same');
% set values where the submatrices go outside P invalid
validsums = zeros(size(P))-1;
validsums(1:(end-nrows+1),1:(end-ncols+1)) = submatsums(1:(end-nrows+1),1:(end-ncols+1));
% get the indexes where the number of ones and zeros is >= L
topLeftIdx = find(validsums >= L & validsums<=sH-L);
% save submatrixes in following format: [index, nrows, ncols]
% You can ofc use something different, but it seemed the simplest way to me
submats = [submats ; [topLeftIdx bsxfun(#times,[nrows ncols],ones(length(topLeftIdx),1))]];
end
end
First, I suggest that you combine finding the allowable sub-matrix sizes.
for smaller = 1:sqrt(H)
for larger = 2*L:H/smaller
# add smaller X larger and larger x smaller to your shapes list
Next, start with the smallest rectangles in the shapes. Note that any solution to a small rectangle can be extended in any direction, to the area limit of H, and the added elements will not invalidate the solution you found. This will identify many solutions without bothering to check the populations within.
Keep track of the solutions you've found. As you work your way toward larger rectangles, you can avoid checking anything already in your solutions set. If you keep that in a hash table, checking membership is O(1). All you'll need to check thereafter will be larger blocks of mostly-1 adjacent to mostly-0. This should speed up the processing somewhat.
Is that enough of a nudge to help?
I am trying to write an algorithm that takes an adjacency-matrix A and gives me the number of shortest paths between all pairs of nodes from length 1 to 20. For example, if there are 4 nodes that have a direct neighbor and 2 nodes that are connected by a shortest path of length two (and no paths longer than 2) the algorithm should return a vector [4 2 0 ... ].
My Idea is to use the fact that A^N gives to number of paths of length N between nodes.
Here is how my code looks so far:
function separations=FindShortestPaths(A)
#Save the original A to exponentiate
B = A;
#C(j,k) will be non-zero if there is a shorter path between nodes j and k
C = sparse(zeros(size(A)));
n = size(A,1);
#the vector in which i save the number of paths
separations = zeros(20,1);
for i = 1:20
#D(j,k) shows how many paths of length i there are from j to k
#if there is a shorter path it will be saved in C,
#so every index j,k that is non-zero in C will be zero in D.
D = A;
D(find(C)) = 0;
#Afterwards the diagonal of D is set to zero so that paths from nodes to themselves are not counted.
D(1:n+1:n*n) = 0;
#The number of remaining non-zero elements in D is the number of shortest paths of length i
separations(i) = size(find(D),1);
#C is updated by adding the matrix of length-i paths.
C = C + A;
#A is potentiated and now A(j,k) gives the number of length i+1 paths from j to k
A = A*B;
endfor
endfunction
I tried it with some smaller graphs of length 5 to 8 and it worked (each pair is counted twice though) but when I tried it with a larger graph it gave me odd numbers for some lengths, which can't be right as every pair is counted twice.
Edit:
Here is an example of one of the graphs I tried and with which it worked: Graph
The adjacency Matrix of this graph is
[010000
101010
010101
001000
010000
001000]
And the number of shortest paths of length n from any Node is:
n: 1 2 3 4 5 6
A: 1 2 2 0 0 0
B: 3 2 0 0 0 0
C: 3 2 0 0 0 0
D: 1 2 2 0 0 0
E: 1 2 2 0 0 0
F: 1 2 2 0 0 0
Total: 10 12 8 0 0 0
So the algorithm should (and in this case does) return a vector where the first 3 elements are 10 12 8 and the rest are zeros. However, with this bigger matrix it doesn't work.
An algorithm that need process a matrix n x m that is scalable.
E.g. I have a time series of 3 seconds containing the values: 2,1,4.
I need to decompose it to take a 3 x 4 matrix, where 3 is the number of elements of time series and 4 the maximum value. The resulting matrix that would look like this:
1 1 1
1 0 1
0 0 1
0 0 1
Is this a bad solution or is it only considered a data entry problem?
The question is,
do I need to distribute information from each row of the matrix for various elements without losing the values?
I would like to calculate a couple of texture features (namely: small/ large number emphasis, number non-uniformity, second moment and entropy). Those can be computed from Neighboring gray-level dependence matrix. I'm struggling with understanding/implementation of this. There is very little info on this method (publicly available).
According to this paper:
This matrix takes the form of a two-dimensional array Q, where Q(i,j) can be considered as frequency counts of grayness variation of a processed image. It has a similar meaning as histogram of an image. This array is Ng×Nr where Ng is the number of possible gray levels and Nr is the number of possible neighbours to a pixel in an image.
If the image function f(i,j) is discrete, then it is easy to computer the Q matrix (for positive integer d, a) by counting the number of times the difference between each element in f(i,j) and its neighbours is equal or less than a at a certain distance d.
Here is the example from the same paper (d = 1, a = 0):
Input (image) matrix and output matrix Q:
I've been looking at this example for hours now and still can't figure out how they got that Q matrix. Anyone?
The method was originally created by C. Sun and W. Wee and was described in a paper called: "Neighboring gray level dependence matrix for texture classification" to which I got access, but can't download (after pressing download the page reloads and that's it).
In the example that you have provided, d=1 and a=0. When d=1, we consider pixels in an 8-pixel neighbourhood. When a=0, this means that we look for pixels that have the same value as the centre of the neighbourhood.
The basic algorithm is the following:
Initialize your NGLDM matrix to all zeroes. The total number of rows corresponds to the total number of possible intensities / values in your image. The total number of columns corresponds to how many pixels are in your neighbourhood plus 1. As such for d=1, we have an 8-pixel neighbourhood and so 8 + 1 = 9. Because there are 4 possible intensities (0,1,2,3), we thus have a 4 x 9 matrix. Let's call this matrix M.
For each pixel in your matrix, take note of this pixel. This goes in the Ng row.
Write out how many valid neighbours there are that surround this pixel.
Count how many times you see the neighbouring pixels matching that pixel in Step #1. This is your Nr column.
Once you figure out the numbers in Step #1 and Step #2, increment this location by 1.
Here's a slight gotcha: They ignore the border locations. As such, you don't do this procedure for the first row, last row, first column or last column. My guess is that they want to be sure that you have an 8-pixel neighbourhood all the time. This is also dictated by the distance d=1. You must be able to grab every valid pixel given a centre location at d=1. If d=2, then you would have to make sure that every pixel in the centre of the neighbourhood has a 25 pixel neighbourhood and so on.
Let's start from the second row, second column location of this matrix. Let's go through the steps:
Ng = 1 as the location is 1.
Valid neighbours - Starting from the top left pixel in this neighbourhood, and scanning left to right and omitting the centre, we have: 1, 1, 2, 0, 1, 0, 2, 2.
How many values are equal to 1? Three times. Therefore Nr = 3
M(Ng,Nr) += 1. Access row Ng = 1, and access row Nr = 3, and increment this spot by 1.
Want to know how I figured out they don't count the borders? Let's do the bottom left pixel. That location is 0, so Ng = 0. If you repeat the algorithm that I just said, you would expect Ng = 0, Nr = 1, and so you would expect at least one entry in that location in your matrix... but you don't! If you do similar checks around the border of the image, you'll see that entries that are supposed to be there... aren't. Take a look at the third row, fifth column. You would think that Ng = 1 and Nr = 1, but we don't see that in the matrix.
One more example. Why is M(Ng,Nr) = 4, Ng = 2, Nr = 4? Well, take a look at every pixel that has a 2 in it. The only valid locations where we can capture an 8 pixel neighbourhood successfully are the row=2, col=4, row=3, col=3, row=3, col=4, row=4, col=3, and row=4, col=4. By applying the same algorithm that we have seen, you'll see that for each of those locations, Nr = 4. As such, we see this combination of Ng = 2, Nr = 4 four times, and that's why the location is set to 4. However, in row=3, col=4, this actually is Nr = 5, as there are five 2s in that neighbourhood at that centre. That's why you see Ng = 2, Nr = 5, M(Ng,Nr) = 1.
As an example, let's do one of the locations. Let's do the 2 smack dab in the middle of the matrix (row=3, col=3):
Ng = 2
What are the valid neighbouring pixels? 1, 1, 2, 0, 2, 3, 2, 2 (omit the centre)
Count how many pixels equal to 2. There are four of them, so Nr = 4
M(Ng,Nr) += 1. Take Ng = 2, Nr = 4 and increment this spot by 1.
If you do this with the other valid locations that have 2, you'll see that Nr = 4 each time with the exception of the third row and fourth column, where Nr = 5.
So how would we implement this in MATLAB? What you can do is use im2col to transform each valid neighbourhood into columns. What I'm also going to do is extract the centre of each neighbourhood. This is actually the middle row of the matrix. We will then figure out how many pixels for each neighbourhood equal the centre, sum them up, and this will determine our Nr values. The Ng values will be the middle row values themselves. Once we do this, we can compute a histogram based on these values just like how the algorithm is doing to get our matrix. In other words, try doing this:
% // Your example
A = [1 1 2 3 1; 0 1 1 2 2; 0 0 2 2 1; 3 3 2 2 1; 0 0 2 0 1];
B = im2col(A, [3 3]); %//Convert neighbourhoods to columns - 3 x 3 means d = 1
C = bsxfun(#eq, B, B(5,:)); %//Figure out a logical matrix where each column tells
%//you how many elements equals the one in each centre
D = sum(C, 1) - 1; %// Must subtract by 1 to discount centre pixel
Ng = B(5,:).' + 1; % // We must make this into a column vector, and we also must
% // offset by 1 as MATLAB starts indexing by 1.
%// Column vector is for accumarray input
Nr = D.' + 1; %// Do the same for Nr. We could have simply left out the + 1 here and
%// took out the subtraction of -1 for D, but I want to explicitly show
%// the steps
Q = accumarray([Ng Nr], 1, [4 9]); %// 4 unique intensities, 9 possible locations (0-8)
... and here is our matrix:
Q =
0 0 1 0 0 0 0 0 0
0 0 1 1 0 0 0 0 0
0 0 0 0 4 1 0 0 0
0 1 0 0 0 0 0 0 0
If you check this, you'll see this matches with Q.
Bonus
If you want to be able to accommodate for the algorithm in general, where you specify d and a, we can simply follow the guidelines of your text. For each neighbourhood, you find the difference between the centre pixel and all of the other pixels. You count how many pixels are <= a for any positive integer d. Note that this will create a 2*d + 1 x 2*d + 1 neighbourhood we need to examine. We can also make this into a function. Without further ado:
%// Set A up yourself, then use a and d as inputs
%// Precondition - a and d are both integers. a can be 0 and d is positive!
function [Q] = calculateGrayDepMatrix(A, a, d)
neigh = 2*d + 1; % //Calculate rows/columns of neighbourhood
numTotalNeigh = neigh*neigh; % //Calculate total number of pixels in neighbourhood
middleRow = ceil(numTotalNeigh / 2); %// Figure out which index the middle row is
B = im2col(A, [neigh neigh]); %// Make into columns
Cdiff = abs(bsxfun(#minus, B, B(middleRow,:))); %// For each neighbourhood, subtract with its centre
C = Cdiff <= a; %// For each neighbourhood, figure out which differences are <= a
D = sum(C, 1) - 1; % //For each neighbourhood, add them up
Ng = B(middleRow,:).' + 1; % // Determine Ng and Nr, and find Q
Nr = D.' + 1;
Q = accumarray([Ng Nr], 1, [max(Ng) numTotalNeigh]);
end
We can recreate the scenario we showed above with the example matrix by:
A = [1 1 2 3 1; 0 1 1 2 2; 0 0 2 2 1; 3 3 2 2 1; 0 0 2 0 1];
Q = calculateGrayDepMatrix(A, 0, 1);
Q is thus:
Q =
0 0 1 0 0 0 0 0 0
0 0 1 1 0 0 0 0 0
0 0 0 0 4 1 0 0 0
0 1 0 0 0 0 0 0 0
Hope this helps!
Give an algorithm to find a given element x (give the co-ordinates), in an n by n matrix where the rows and columns are monotonically increasing.
My thoughts:
Reduce problem set size.
In the 1st column, find the largest element <= x. We know x must be in this row or after (lower). In the last column of the matrix, find the smallest element >= x. We know x must be in this row or before. Do the same thing with the first and last rows of the matrix. We have now defined a sub-matrix such that if x is in the matrix at all, it is in this sub-matrix. Now repeat the algo on this sub-matrix... Something along these lines.
[YAAQ: Yet another arrays question.]
I think you cannot hope for more than O(N), which is attainable. (N is the width of the matrix).
Why you cannot hope for more
Imagine a matrix like this:
0 0 0 0 0 0 ... 0 0 x
0 0 0 0 0 0 ... 0 x 2
0 0 0 0 0 0 ... x 2 2
.....................
0 0 0 0 0 x ... 2 2 2
0 0 0 0 x 2 ... 2 2 2
0 0 0 x 2 2 ... 2 2 2
0 0 x 2 2 2 ... 2 2 2
0 x 2 2 2 2 ... 2 2 2
x 2 2 2 2 2 ... 2 2 2
where x is an unknown number (not the same number, ie. it might be a different one in every column). To satisfy the monotonicity of the matrix, you can place any of 0, 1, or 2 in all of the x places. So, to find if there is 1 in the matrix, you have to check all the x places, and there are N of them.
How to make it O(n)
Imagine you have to find first column indicies with number > q (a given number) for all rows. You start in the upper right corner of the matrix; if the number you see is greater, you go left; else go down. End when you are in the last row. The points where you went down are the places you search for. If any of them have the number you search for, you've found it.
This algorithm is O(n), because in each step, you either go left or down. Totally, it cannot go more than N times left and N times down. Therefore it's O(n).
Pick a corner element, one that is greatest in its row and smallest in its column (or the other way). Compare with x. Depending on the result of the comparison, you can exclude the row or the column from further search.
The new matrix has sum of dimensions decreased by 1, compared to the original one. Apply the above iteratively. After 2*n steps you end up with a 1x1 matrix.
If "the rows and columns are monotonically increasing" means that the values in each (row,col) increase such that for any row, (rowM,col1) < (rowM,col2) < ... < (rowM,colN) < (rowM+1,col1) ...
Then you can just treat it as a 1 dimensional array that is sorted from smallest to largest, and do a standard binary search, by sampling the item that is 1/2(rows * cols) fron the start, then sampling the element that is 1/4(rows * cols) behind (if the first element sampled is > x) or ahead (if the first element sampled is < x), and so forth.