In igraph we can use igraph_matrix_size to get the size of a matrix and igraph_matrix_capacity to get its capacity, but can anyone tell me what the difference between them is?
Thanks in advance.
igraph_matrix_capacity returns the number of elements the matrix can potentially hold without a reallocation.
More details in the documentation:
3.11.3. igraph_matrix_size — The number of elements in a matrix.
long int igraph_matrix_size(const igraph_matrix_t *m);
http://igraph.org/c/doc/ch07.html#igraph_matrix_size
Compared to:
3.11.4. igraph_matrix_capacity — Returns the number of elements allocated for a matrix.
long int igraph_matrix_capacity(const igraph_matrix_t *m);
Note that this might be different from the size of the matrix (as queried by igraph_matrix_size()), and specifies how many elements the matrix can hold without reallocation.
http://igraph.org/c/doc/ch07.html#igraph_matrix_capacity
Hope that answers your question.
Related
In the question
Calculating How Many Balls in Bins Over Several Values Using Dynamic Programming
the answer discusses a dynamic programming algorithm for placing balls into bins, and I was attempting to determine the running time, as it is not addressed in the answer.
A quick summary: given M indistinguishable balls and N distinguishable bins, the entry S[i][j] in the dynamic programming table represents the number of unique ways i balls can be placed into j bins:
S[i][j] = sum(x = 0 -> i, S[i-x][j-1])
It is clear that the size of the dynamic programming 2D array is O(MN). However, I am trying to determine the impact the summation has on the running time.
I know a summation of values (1...x) means we must access values from 1 to x. Would this then mean that, since computing each entry accesses at most M other values, the running time is in the realm of O((M^2)N)?
Would appreciate any clarification. Thanks!
You can avoid the excessive summation time if you keep column prefix sums in an additional table.
When you calculate S[i][j], also fill Sums[i][j] = Sums[i-1][j] + S[i][j], and later use this value for the cell to its right: S[i][j+1] = Sums[i][j].
P.S. Note that you really only need to store the previous column of the sum table, not the whole thing.
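The prefix-sum trick above can be sketched in Python (a sketch; the function and variable names are my own):

```python
def count_placements(M, N):
    """Number of ways to place M indistinguishable balls into N
    distinguishable bins, using the recurrence
    S[i][j] = sum(x = 0 -> i, S[i-x][j-1]) sped up with prefix sums."""
    # S[i][j]: ways to put i balls into j bins.
    S = [[0] * (N + 1) for _ in range(M + 1)]
    S[0][0] = 1  # zero balls into zero bins: one (empty) way
    for j in range(1, N + 1):
        running = 0  # prefix sum of column j-1: sum of S[k][j-1] for k <= i
        for i in range(M + 1):
            running += S[i][j - 1]
            S[i][j] = running  # O(1) per cell instead of O(M)
    return S[M][N]
```

With the running column sum, each cell costs O(1), so the whole table is filled in O(MN) instead of O((M^2)N).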
I am preparing for a job interview and have a question.
I have two binary arrays, of sizes n and m. I need an algorithm that merges them together and can later separate them again. The merged array also has to be a binary array. There is no information about the size of the merged array; I assume it could be n+m.
If you know the maximum sizes of A and B, then you can encode the sizes of A and B in binary and create a new binary array by multiplexing:
size of A
A content
size of B
B content
Then demultiplexing (separating A and B) is easy.
It is similar to what is performed in telecommunications.
Edit: I mentioned that the maximum size must be known. This is because, for demultiplexing, we need to know how many bits are used to encode the sizes, so the number of bits for this encoding must be fixed.
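A minimal sketch of that length-prefixed multiplexing in Python (the fixed header width W is an assumption derived from the known maximum size; names are my own):

```python
W = 16  # fixed number of bits per length header; assumes len(A), len(B) < 2**W

def merge(A, B):
    """Concatenate length-prefixed copies of two bit lists into one bit list."""
    def header(n):
        return [int(b) for b in format(n, f'0{W}b')]  # n as exactly W bits
    return header(len(A)) + A + header(len(B)) + B

def separate(merged):
    """Recover A and B by reading the fixed-width length headers."""
    def read(bits, pos):
        n = int(''.join(map(str, bits[pos:pos + W])), 2)  # decode W-bit length
        return bits[pos + W:pos + W + n], pos + W + n
    A, pos = read(merged, 0)
    B, _ = read(merged, pos)
    return A, B
```

Because the header width is fixed, the demultiplexer always knows where each length field ends, which is exactly why the maximum size must be known in advance.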
Suppose I have a VectorXf exampleVector with arbitrary float values and I want to select out some elements according to their values.
I can efficiently get a logical vector of true/false values according to my criterion,
e.g. boolArray = (exampleVector.array() < 1);
But now I want to make a new vector (of a smaller dimension) that contains only those elements that meet my criterion.
How can I do this efficiently in eigen3?
In R I could use reducedVector=exampleVector[boolArray]
Thanks in advance
Since the VectorXf stores its values in a contiguous memory range, you will have to copy out the values that you want. I am sure R does it the same way, so you won't lose efficiency. There is, however, no way that I know of to do it as conveniently as in R, so you will have to loop through and copy out the relevant values.
Given an n-by-m matrix A, with it being guaranteed that n>m=rank(A), and given a n-by-1 column v, what is the fastest way to check if [A v] has rank strictly bigger than A?
For my application, A is sparse, n is about 2^12, and m is anywhere in 1:n-1.
Comparing rank(full([A v])) takes about a second on my machine, and I need to do it tens of thousands of times, so I would be very happy to discover a quicker way.
There is no need to do repeated solves if you can afford to do ONE computation of the null space; a single call to null will suffice. Given a new vector V, if the product of the null-space basis with V is non-zero, then appending V will increase the rank of the matrix. For example, suppose we have the matrix M, which of course has a rank of 2.
M = [1 1;2 2;3 1;4 2];
nullM = null(M')';
Will a new column vector [1;1;1;1] increase the rank if we appended it to M?
nullM*[1;1;1;1]
ans =
-0.0321573705742971
-0.602164651199413
Yes, since it has a non-zero projection on at least one of the basis vectors in nullM.
How about this vector:
nullM*[0;0;1;1]
ans =
1.11022302462516e-16
2.22044604925031e-16
In this case, both numbers are essentially zero, so the vector in question would not have increased the rank of M.
The point is, only a simple matrix-vector multiplication is necessary once the null-space basis has been generated. If your matrix is so large (and so nearly of full rank) that a call to null fails here, then you will need to do more work. However, n = 4096 is not excessively large as long as the matrix does not have too many columns.
One alternative if null is too much is a call to svds, to find those singular vectors that are essentially zero. These will form the nullspace basis that we need.
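The same check can be illustrated in plain Python (a sketch with a hand-rolled RREF-based null space; the function names are my own, and in MATLAB you would simply call null(M') as above):

```python
def null_space(rows, tol=1e-10):
    """Basis for the null space of a matrix (given as a list of rows),
    computed via reduced row echelon form."""
    m = [row[:] for row in rows]
    nrows, ncols = len(m), len(m[0])
    pivots, r = [], 0
    for c in range(ncols):
        piv = max(range(r, nrows), key=lambda k: abs(m[k][c]))  # partial pivot
        if abs(m[piv][c]) < tol:
            continue
        m[r], m[piv] = m[piv], m[r]
        m[r] = [x / m[r][c] for x in m[r]]          # normalize pivot row
        for k in range(nrows):
            if k != r and abs(m[k][c]) > tol:       # eliminate column c
                m[k] = [a - m[k][c] * b for a, b in zip(m[k], m[r])]
        pivots.append(c)
        r += 1
        if r == nrows:
            break
    basis = []
    for f in (c for c in range(ncols) if c not in pivots):  # free columns
        v = [0.0] * ncols
        v[f] = 1.0
        for i, p in enumerate(pivots):
            v[p] = -m[i][f]
        basis.append(v)
    return basis

def increases_rank(A_rows, v, tol=1e-8):
    """True if appending column v to A increases its rank, i.e. v has a
    non-zero projection on the null space of A' (the left null space of A)."""
    At = [list(col) for col in zip(*A_rows)]  # transpose of A
    return any(abs(sum(ni * vi for ni, vi in zip(n, v))) > tol
               for n in null_space(At))
```

For the example M = [1 1; 2 2; 3 1; 4 2], appending [1;1;1;1] increases the rank while [0;0;1;1] does not, matching the MATLAB results above.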
I would use sprank for sparse matrices. Check it out, it might be faster than any other method.
Edit: as @IanHincks correctly pointed out, sprank computes the structural rank, not the numerical rank. I am leaving the answer here just in case someone else needs it in the future.
Maybe you can try to solve the system A*x = v; if it has a solution, that means the rank does not increase.
x = A\v;
norm(A*x - v) % if this is small then the rank does not increase
Given a large sparse matrix (say 10k+ by 1M+) I need to find a subset, not necessarily contiguous, of the rows and columns that form a dense matrix (all non-zero elements). I want this submatrix to be as large as possible (not the largest sum, but the largest number of elements) within some aspect-ratio constraints.
Are there any known exact or approximate solutions to this problem?
A quick scan on Google seems to give a lot of close-but-not-exactly results. What terms should I be looking for?
edit: Just to clarify; the submatrix need not be contiguous. In fact the row and column order is completely arbitrary, so adjacency is completely irrelevant.
A thought based on Chad Okere's idea
1. Order the rows from largest count to smallest count (not necessary, but it might help performance)
2. Select two rows that have a "large" overlap
3. Add all other rows that won't reduce the overlap
4. Record that set
5. Add whatever row reduces the overlap by the least
6. Repeat at #3 until the result gets too small
7. Start over at #2 with a different starting pair
8. Continue until you decide the result is good enough
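Steps 3-6 for one starting pair could be sketched like this in Python (a sketch; rows are represented as sets of their non-zero column indices, and the pair selection and restart loop of steps 7-8 are left out):

```python
def dense_sets_from_pair(row_sets, i, j, min_cols=2):
    """Starting from rows i and j, repeatedly add rows, recording each
    (row set, common column set) candidate along the way."""
    chosen = [i, j]
    overlap = row_sets[i] & row_sets[j]  # columns shared by all chosen rows
    remaining = set(range(len(row_sets))) - set(chosen)
    candidates = []
    while True:
        # step 3: add all rows that won't reduce the overlap
        free = [r for r in remaining if overlap <= row_sets[r]]
        chosen.extend(free)
        remaining -= set(free)
        # step 4: record that set
        candidates.append((sorted(chosen), set(overlap)))
        if not remaining or len(overlap) <= min_cols:
            break  # step 6: stop when the result gets too small
        # step 5: add whatever row reduces the overlap by the least
        best = max(remaining, key=lambda r: len(overlap & row_sets[r]))
        chosen.append(best)
        remaining.discard(best)
        overlap &= row_sets[best]
    return candidates
```

Each recorded candidate is a dense submatrix (its rows times its common columns); the caller would keep the one with the most elements.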
I assume you want something like this. You have a matrix like
1100101
1110101
0100101
You want columns 1,2,5,7 and rows 1 and 2, right? That submatrix would be 2x4 with 8 elements. Or you could go with columns 2,5,7 and rows 1,2,3, which would be a 3x3 matrix.
If you want an 'approximate' method, you could start with a single non-zero element, then go on to find another non-zero element and add it to your list of rows and columns. At some point you'll run into a non-zero element that, if its rows and columns were added to your collection, your collection would no longer be entirely non-zero.
So for the above matrix, if you added 1,1 and 2,2 you would have rows 1,2 and columns 1,2 in your collection. If you tried to add 3,7 it would cause a problem, because element 3,1 is zero. So you couldn't add it. You could add 2,5 and 2,7 though, creating the 2x4 submatrix.
You would basically iterate until you can't find any more new rows and columns to add. That would get you to a local maximum. You could store the result and start again with another starting point (perhaps one that didn't fit into your current solution).
Then just stop when you can't find any more after a while.
That, obviously, would take a long time, but I don't know if you'll be able to do it any more quickly.
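That grow-from-a-seed idea might look roughly like this in Python (a sketch; the greedy order in which rows and columns are tried is a simplification of my own):

```python
def grow_dense_submatrix(matrix, seed_row, seed_col):
    """Greedily grow a set of rows and columns around a non-zero seed so
    that every selected (row, column) entry stays non-zero."""
    nrows, ncols = len(matrix), len(matrix[0])
    rows, cols = {seed_row}, {seed_col}
    changed = True
    while changed:  # keep going until no row or column can be added
        changed = False
        for r in range(nrows):
            # add row r only if it is non-zero in every selected column
            if r not in rows and all(matrix[r][c] for c in cols):
                rows.add(r)
                changed = True
        for c in range(ncols):
            # add column c only if it is non-zero in every selected row
            if c not in cols and all(matrix[r][c] for r in rows):
                cols.add(c)
                changed = True
    return rows, cols
```

Restarting from different seeds and keeping the largest len(rows) * len(cols) result gives the multi-start version described above; each start reaches only a local maximum.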
I know you aren't working on this anymore, but I thought someone might have the same question as me in the future.
So, after realizing this is an NP-hard problem (by reduction from MAX-CLIQUE) I decided to come up with a heuristic that has worked well for me so far:
Given an N x M binary/boolean matrix, find a large dense submatrix:
Part I: Generate reasonable candidate submatrices
1. Consider each of the N rows to be an M-dimensional binary vector, v_i, where i = 1 to N
2. Compute a distance matrix for the N vectors using the Hamming distance
3. Use the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) algorithm to cluster the vectors
Initially, each of the v_i vectors is a singleton cluster. Step 3 above (clustering) gives the order that the vectors should be combined into submatrices. So each internal node in the hierarchical clustering tree is a candidate submatrix.
Part II: Score and rank candidate submatrices
For each candidate submatrix, calculate D, the number of elements remaining in the dense subset after eliminating any column with one or more zeros.
Select the submatrix that maximizes D
I also had some considerations regarding the minimum number of rows that needed to be preserved from the initial full matrix, and I would discard any candidate submatrix that did not meet this criterion before selecting the submatrix with the maximum D value.
Is this a Netflix problem?
MATLAB or some other sparse matrix libraries might have ways to handle it.
Is your intent to write your own?
Maybe the 1D approach for each row would help you. The algorithm might look like this:
Loop over each row
Find the index of the first non-zero element
Find the index of the last non-zero element, and store the span between the two for each row
Sort the rows from largest to smallest span between non-zero columns.
At this point I start getting fuzzy (sorry, not an algorithm designer). I'd try looping over each row, lining up the indexes of the starting points, and looking for the maximum non-zero run of column indexes that I could find.
You don't specify whether or not the dense matrix has to be square. I'll assume not.
I don't know how efficient this is or what its Big-O behavior would be. But it's a brute force method to start with.
EDIT: This is NOT the same as the problem below. My bad...
But based on the last comment below, it might be equivalent to the following:
Find the furthest vertically separated pair of zero points that have no zero point between them.
Find the furthest horizontally separated pair of zero points that have no zeros between them ?
Then the horizontal region you're looking for is the rectangle that fits between these two pairs of points?
This exact problem is discussed in a gem of a book called "Programming Pearls" by Jon Bentley, and, as I recall, although there is a solution in one dimension, there is no easy answer for the 2-d or higher dimensional variants ...
The 1-D problem is, effectively, to find the largest sum of a contiguous subset of a set of numbers:
Iterate through the elements, keeping track of a running total from a specific previous element, and of the maximum subtotal seen so far (along with the start and end elements that generated it). At each element, if the running subtotal is greater than the maximum seen so far, the maximum and its end element are updated. If the running total goes below zero, the start element is reset to the next element and the running total is reset to zero.
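That 1-D scan is usually written like this (Kadane's algorithm; a short sketch assuming a non-empty input list):

```python
def max_subarray(values):
    """Largest sum of a contiguous subset, with its start and end indices."""
    best, best_start, best_end = values[0], 0, 0
    running, start = 0, 0
    for i, x in enumerate(values):
        running += x
        if running > best:  # new maximum subtotal: update the record
            best, best_start, best_end = running, start, i
        if running < 0:
            # a negative prefix can never help; restart after element i
            running, start = 0, i + 1
    return best, best_start, best_end
```

Initializing best to the first element makes the all-negative case return the single largest element, and the whole scan is a single O(n) pass.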
The 2-D problem came from an attempt at a visual image-processing algorithm: within a stream of brightness values representing pixels in a 2-color image, find the "brightest" rectangular area within the image, i.e., find the contained 2-D sub-matrix with the highest sum of brightness values, where "brightness" was measured by the difference between a pixel's brightness value and the overall average brightness of the entire image (so many elements had negative values).
EDIT: To look up the 1-D solution I dredged up my copy of the 2nd edition of this book, and in it, Jon Bentley says "The 2-D version remains unsolved as this edition goes to print..." which was in 1999.