The Puzzle on the graph - algorithm

Given an undirected graph G=(V,E), each node i is associated with a count Ci of objects. At each step, for every node i simultaneously, node i's objects are divided up equally among i's neighbors. After K steps, output the object counts of the five nodes that have the most objects.
Here is one example of what happens in one step:
The objects of A are divided equally between B and C.
The objects of B are divided equally between A and C.
The objects of C are divided equally between A and B.
Some constraints:
|V|<10^5, |E|<2*10^5, K<10^7, Ci<1000
My current idea is to represent the transformation in each step with a matrix.
The problem is then converted to computing the power of a matrix. But this solution is much too slow considering |V| can be 10^5.
Is there any faster way to do it?

The matrix equation for a single step is M x = x', where x is the vector of current node contents and x' is the contents after one step; that is, x' = M x. The contents at the step after that are x'' = M x' = M(M x). An example of M follows, where the graph's adjacency matrix is shown at left. The column headed #nbr is the number of neighbors of nodes a, b ... e. Matrix M is formed from the adjacency matrix by replacing each 1 with the reciprocal of the number of ones in its column.
      a b c d e   #nbr         matrix M
  a   0 0 1 1 0     2      0    0   1/3  1/4   0
  b   0 0 0 1 0     1      0    0    0   1/4   0
  c   1 0 0 1 1     3     1/2   0    0   1/4  1/2
  d   1 1 1 0 1     4     1/2   1   1/3   0   1/2
  e   0 0 1 1 0     2      0    0   1/3  1/4   0
To do K steps starting with initial contents x, just compute (M^K) x. Use an exponentiation method that requires lg K matrix multiplications, lg denoting the logarithm to base 2. As matrix multiplication is typically of O(n^3) complexity, this method is O(lg K * n^3) if straightforwardly implemented, or O(lg K * n^2.376) using the Coppersmith–Winograd algorithm. The lg K multiplier can be dropped, giving O(n^2.376) overall, by diagonalizing M into the form (P^-1)AP (assuming M is diagonalizable), from which M^K = (P^-1)(A^K)P; since A is diagonal, computing A^K is only an O(n lg K) operation. Diagonalization typically costs O(n^3), but is O(n^2.376) using the Coppersmith–Winograd algorithm.
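The exponentiation-by-squaring part can be sketched as follows. This is a minimal pure-Python sketch with exact fractions, not an optimized implementation; the graph is the 5-node example above, and the initial counts are made up:

```python
from fractions import Fraction

def step_matrix(adj):
    """Build M from an adjacency matrix: M[i][j] = 1/deg(j) if j is
    adjacent to i, else 0, so that one redistribution step is x' = M x."""
    n = len(adj)
    deg = [sum(col) for col in zip(*adj)]  # column sums = neighbor counts
    return [[Fraction(adj[i][j], deg[j]) if adj[i][j] else Fraction(0)
             for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(M, K):
    """M^K by repeated squaring: only O(lg K) matrix multiplications."""
    n = len(M)
    R = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    while K:
        if K & 1:
            R = mat_mul(R, M)
        M = mat_mul(M, M)
        K >>= 1
    return R

# The 5-node example graph (a..e) from above; the counts are hypothetical.
adj = [[0, 0, 1, 1, 0],
       [0, 0, 0, 1, 0],
       [1, 0, 0, 1, 1],
       [1, 1, 1, 0, 1],
       [0, 0, 1, 1, 0]]
M = step_matrix(adj)
x = [Fraction(c) for c in [12, 6, 24, 48, 10]]
MK = mat_pow(M, 10)
xK = [sum(MK[i][j] * x[j] for j in range(5)) for i in range(5)]
# sort xK descending and take the first five entries for the final answer
```

A useful sanity check: every column of M sums to 1, so the total number of objects is conserved at every step.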

Median of areas given in a matrix

Given an n x n matrix of 1s and 0s, where 1 represents land and 0 represents water.
How can I find the median of the areas of the islands in the most efficient way?
For Example:
1 1 0 0 0
1 0 0 1 1
1 0 1 0 0
There are three islands, their areas are [1,2,4], and the median is 2
An island consists of contiguous cells containing 1 that are connected non-diagonally:
For example:
1 0 1
0 1 0
this matrix contains three islands of areas [1,1,1]
My solution is to find the areas recursively and then sort them to find the median, which takes O(n^2 log(n^2)). Is there a more efficient way to do that?
First step, run DFS recursively on the grid and discover all the islands & calculate areas in O(n^2) time.
Second step, use a selection algorithm to find the median of the unsorted array of island areas: median-of-medians gives worst-case O(m), and quickselect gives expected O(m), where m is the number of islands.
Overall time complexity O(n^2).
If you need further help, I can provide my implementation.
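A sketch of the two steps above. Note that statistics.median sorts internally, so substitute quickselect or median-of-medians for strict O(m) selection; the flood fill uses an explicit stack rather than recursion to avoid recursion limits on large grids. The grid is the asker's example:

```python
from statistics import median

def island_areas(grid):
    """Flood-fill each island of 1s (4-directional connectivity) and
    return the list of island areas. O(n^2) for an n x n grid."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and not seen[r][c]:
                area, stack = 0, [(r, c)]
                seen[r][c] = True
                while stack:
                    i, j = stack.pop()
                    area += 1
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and grid[ni][nj] == 1 and not seen[ni][nj]):
                            seen[ni][nj] = True
                            stack.append((ni, nj))
                areas.append(area)
    return areas

# The example grid from the question: islands of areas 4, 2 and 1.
grid = [[1, 1, 0, 0, 0],
        [1, 0, 0, 1, 1],
        [1, 0, 1, 0, 0]]
areas = island_areas(grid)
```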
Using a disjoint set (union-find) you can label the islands in O(N α(N)) time, where α is the inverse Ackermann function and N = n^2 is the number of cells. Then nth_element (a.k.a. introselect) finds the median of the m island areas in expected O(m) time:
sets = DisjointSet(matrix)
median = nth_element(areas(sets), m/2)
Since α(N) grows so slowly that it is effectively constant, the total is essentially linear in the number of cells - and no correct algorithm can beat O(N), since every cell must be examined at least once.

Given a directed weighted graph with self loops, find the list of nodes that are at distance exactly k from a given node x?

Each edge in the graph has a weight of 1. The graph may have cycles; if a node has a self loop, it can be at any distance from itself, from 0 to infinity, depending on the number of times we take the self loop.
I have solved the problem using BFS, but the constraint on distance is on the order of 10^9, hence BFS is too slow.
We'll be asked multiple queries on a given graph of the form
(distance, source)
and the output is the list of nodes that are at exactly the given distance starting from the source vertex.
Constraints
1<=Nodes<=500
1<queries<=500
1<=distance<=10^9
I have a feeling there would be many repeated computations, since the number of nodes is small, but I am not able to figure out how to reduce the problem into smaller subproblems.
What is the efficient way to do this?
Edit: I have tried matrix exponentiation, but it is too slow for the given constraints. The problem has a time limit of 1 second.
Let G = (V,E) be your graph, and define the adjacency matrix A as follows:
A[i][j] = 1 if (V[i],V[j]) is in E
A[i][j] = 0 otherwise
In this matrix, for each k:
(A^k)[i][j] > 0 if and only if there is a path from v[i] to v[j] of length exactly k.
This means by creating this matrix and then calculating the exponent, you can easily get your answer.
For fast exponent calculation you can use exponentiation by squaring, which yields O(M(n) * log(k)), where M(n) is the cost of matrix multiplication for an n x n matrix.
This will also save you some calculation when looking for different queries on the same graph.
Appendix - claim proof:
Base: A^1 = A, and indeed in A by definition, A[i][j]=1 if and only if (V[i],V[j]) is in E
Hypothesis: assume the claim is correct for all l<k
A^k = A^(k-1)*A. From induction hypothesis, A^(k-1)[i][j] > 0 iff there is a path of length k-1 from V[i] to V[j].
Let's examine two vertices v1,v2 with indices i and j.
If there is a path of length k between them, let it be v1->...->u->v2. Let the index of u be m.
From i.h. A^(k-1)[i][m] > 0 because there is a path. In addition A[m][j] = 1, because (u,v2) = (V[m],V[j]) is an edge.
A^k[i][j] = A^(k-1)*A[i][j] = A^(k-1)[i][1]A[1][j] + ... + A^(k-1)[i][m]A[m][j] + ... + A^(k-1)[i][n]A[n][j]
And since A[m][j] > 0 and A^(k-1)[i][m] > 0, then A^(k-1)*A[i][j] > 0
If there is no such path, then for each vertex u such that (u,v2) is an edge, there is no path of length k-1 from v1 to u (otherwise v1->..->u->v2 would be a path of length k).
Then, using induction hypothesis we know that if A^(k-1)[i][m] > 0 then A[m][j] = 0, for all m.
If we assign that in the sum defining A^k[i][j], we get that A^k[i][j] = 0
QED
Small note: Technically, A^k[i][j] is the number of paths between i and j of length exactly k. This can be proven similar to above but with a bit more attention to details.
To avoid the numbers growing too fast (which will increase M(n) because you might need big integers to store that value), and since you don't care for the value other than 0/1 - you can treat the matrix as booleans - using only 0/1 values and trimming anything else.
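A sketch of exactly this boolean-matrix idea, packing each matrix row into a Python integer bitmask so that a row of the product is just a bitwise OR of selected rows (the 4-node graph at the end is a made-up example, not from the question):

```python
def bool_mat_pow(A, k):
    """A is an n x n 0/1 adjacency matrix given as n Python ints (row
    bitmasks); returns the rows of A^k over the boolean semiring,
    computed by exponentiation by squaring."""
    n = len(A)

    def mul(X, Y):
        # Boolean product: row i of the result ORs together the rows
        # of Y selected by the set bits of row i of X.
        Z = [0] * n
        for i in range(n):
            row, x = 0, X[i]
            while x:
                j = (x & -x).bit_length() - 1   # index of lowest set bit
                row |= Y[j]
                x &= x - 1                       # clear lowest set bit
            Z[i] = row
        return Z

    R = [1 << i for i in range(n)]               # identity matrix
    while k:
        if k & 1:
            R = mul(R, A)
        A = mul(A, A)
        k >>= 1
    return R

# Hypothetical 4-node example: 0 -> 1 -> 2 -> 3 -> 1 (a cycle of length 3).
edges = [(0, 1), (1, 2), (2, 3), (3, 1)]
A = [0] * 4
for u, v in edges:
    A[u] |= 1 << v
R = bool_mat_pow(A, 1000)
# The nodes at distance exactly 1000 from node 0 are the set bits of R[0].
```

Each boolean multiplication is O(n^2 * n/64) with machine-word bit operations, which for n = 500 and about 30 squarings fits comfortably in the stated limits.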
If there are cycles in your graph, then a node on a cycle is reachable at every distance offset + cycle_length * N, because you can go around the cycle as many times as you wish.
That brings me to the idea: we can use the cycles to our advantage!
Using BFS while detecting cycles, we calculate offset + cycle * N to get as close as possible to our goal K,
and then search for K pretty easily.
e.g.
A -> B -> C -> D -> B
K = 1000;
S = A;
A - 0
B - 1
C - 2
D - 3
B - 1 (+ 3N; the cycle B -> C -> D -> B has length 3)
Here you solve k - (1 + 3N) = 0 => 1000 - 1 - 3N = 0 => 999 = 3N => N = 333, so B is reached at distance exactly 1 + 3*333 = 1000.
A simpler way would be to calculate (k - offset) mod cycle and check whether it is 0.
From there you only have to count a few more steps.
In this example (as a regex): A(BCD){333}B

Complexity of searching sorted matrix

Suppose we have a matrix of size NxN of numbers where all the rows and columns are in increasing order, and we want to find if it contains a value v. One algorithm is to perform a binary search on the middle row, to find the elements closest in value to v: M[row,col] < v < M[row,col+1] (if we find v exactly, the search is complete). Since the matrix is sorted we know that v is larger than all elements in the sub-matrix M[0..row, 0..col] (the top-left quadrant of the matrix), and similarly it's smaller than all elements in the sub-matrix M[row..N-1, col+1..N-1] (the bottom right quadrant). So we can recursively search the top right quadrant M[0..row-1, col+1..N-1] and the bottom left quadrant M[row+1..N-1, 0..col].
The question is what is the complexity of this algorithm ?
Example: Suppose we have the 5x5 matrix shown below and we are searching for the number 25:
0 10 20 30 40
1 11 21 31 41
2 12 22 32 42
3 13 23 33 43
4 14 24 34 44
In the first iteration we perform binary search on the middle row and find the closest element which is smaller than 25 is 22 (at row=2 col=2). So now we know 25 is larger than all items in the top-left 3x3 quadrant:
0 10 20
1 11 21
2 12 22
Similarly we know 25 is smaller than all elements in the bottom right 3x2 quadrant:
32 42
33 43
34 44
So, we recursively search the remaining quadrants - the top right 2x2:
30 40
31 41
and the bottom left 2x3:
3 13 23
4 14 24
And so on. We essentially divided the matrix into 4 quadrants (which might be of different sizes depending on the result of the binary search on the middle row), and then we recursively search two of the quadrants.
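The recursion described above can be sketched as follows; bisect_left plays the role of the binary search on the middle row, and all index bounds are inclusive. A sketch, not a tuned implementation:

```python
from bisect import bisect_left

def quad_search(M, v):
    """Search a matrix with sorted rows and columns for v using the
    quadrant recursion: binary-search the middle row, discard the two
    quadrants that cannot contain v, recurse on the other two."""
    def rec(top, bottom, left, right):
        # Search the sub-matrix M[top..bottom][left..right], inclusive.
        if top > bottom or left > right:
            return False
        row = (top + bottom) // 2
        # Index of the first element >= v in the middle row's slice.
        col = bisect_left(M[row], v, left, right + 1)
        if col <= right and M[row][col] == v:
            return True
        # Everything left of col in rows top..row is < v, and everything
        # from col onward in rows row..bottom is > v; recurse on the
        # top-right and bottom-left quadrants that remain.
        return (rec(top, row - 1, col, right)
                or rec(row + 1, bottom, left, col - 1))
    return rec(0, len(M) - 1, 0, len(M[0]) - 1)

# The 5x5 example matrix from the question.
M = [[0, 10, 20, 30, 40],
     [1, 11, 21, 31, 41],
     [2, 12, 22, 32, 42],
     [3, 13, 23, 33, 43],
     [4, 14, 24, 34, 44]]
```

Note that 25, the value searched for in the worked example, is in fact absent from this matrix, so the search returns false after exhausting the quadrants.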
The worst-case running time is Theta(n). Certainly this is as good as it gets for correct algorithms (consider an anti-diagonal, with elements less than v above and elements greater than v below). As far as upper bounds go, the bound for an n-row, m-column matrix is O(n log(2 + m/n)), as evidenced by the correct recurrence
f(n, m) = log m + max over 0 <= j <= m-1 of [f(n/2, j) + f(n/2, m-1-j)],
where there are two sub-problems, not one. This recurrence is solvable by the substitution method.
Guess: f(n, m) <= c n log(2 + m/n) - log(m) - 2   [hypothesis; c to be chosen later]

f(n, m) = log m + max over 0 <= j <= m-1 of [f(n/2, j) + f(n/2, m-1-j)]
        <= log m + max over 0 <= j <= m-1 of [c (n/2) log(2 + j/(n/2)) - log(j) - 2
                                              + c (n/2) log(2 + (m-1-j)/(n/2)) - log(m-1-j) - 2]
        [fixing j = m/2 by the concavity of log]
        <= log m + c n log(2 + m/n) - 2 log(m/2) - 4
        =  log m + c n log(2 + m/n) - 2 log(m) - 2
        =  c n log(2 + m/n) - log(m) - 2.
Set c large enough that, for all n, m,
c n log(2 + m/n) - log(m) - 2 ≥ log(m),
where log(m) is the cost of the base case n = 1.
If you find your element after n steps, then the searchable range has size N = 4^n. Then, time complexity is O(log base 4 of N) = O(log N / log 4) = O(0.5 * log N) = O(log N).
In other words, your algorithm is two times faster than binary search, which is O(log N).
A consideration on binary search on matrices:
Binary search on a 2D matrix, and on N-D matrices in general, is nothing different from binary search on a sorted 1D vector, provided the flattened matrix is itself sorted (a stronger assumption than the question's sorted rows and columns). In fact C, for instance, stores them in row-major fashion (as a concatenation of rows: [[row0],[row1],...,[rowk]]).
This means one can use the well-known binary search on the matrix as follows (with complexity log(n*m)):
template<typename T>
bool binarySearch_2D(T target, T** matrix, int rows, int cols) {
    int a = 0;
    int b = rows * cols - 1;        // index range over the flattened matrix
    while (a <= b) {
        int half = (a + b) / 2;
        int r = half / cols;        // row in the row-major layout
        int c = half % cols;        // column in the row-major layout
        T v = matrix[r][c];
        if (v == target)
            return true;
        else if (target > v)
            a = half + 1;
        else                        // target < v
            b = half - 1;
    }
    return false;
}
The complexity of this algorithm will be:
O(log2(n*n)) = O(log2(n))
This is because you are eliminating half of the matrix in one iteration.
EDIT:
Recurrence relation, taking n to be the total number of elements in the matrix:
T(n) = T(n/2) + log(sqrt(n))
     = T(n/2) + log(n^(1/2))
     = T(n/2) + (1/2) log(n)
Here, a = 1 and b = 2, therefore c = log_b(a) = log_2(1) = 0, so n^c = n^0.
Also, f(n) = n^0 * (1/2) log(n).
According to case 2 of the Master Theorem,
T(n) = O((log(n))^2)
You can use a recursive function and apply the master theorem to find the complexity.
Assume n is the number of elements in the matrix.
Cost for one step is binary search on sqrt(n) elements and you get two problems, in worst case same size each with n/4 elements: 2*T(n/4). So we have:
T(n)=2*T(n/4)+log(sqrt(n))
equal to
T(n)=2*T(n/4)+log(n)/2
Now apply Master Theorem case 1: a = 2, b = 4, f(n) = log(n)/2, and since n^(log_b(a)) = n^(1/2) and f(n) is in O(n^(1/2 - eps)), case 1 applies.
=> The total running time T(n) is in O(n^(log_b(a))) = O(n^(1/2))
or equal to
O(sqrt(n))
which is equal to height or width of the matrix if both sides are the same.
Let's assume that we have the following matrix:
1 2 3
4 5 6
7 8 9
Let's search for value 7 using binary search as you specified:
Search nearest value to 7 in middle row: 4 5 6, which is 6.
Hmm, we have a problem: 7 is not in the following submatrix:
6
9
So what to do? One solution would be to apply binary search to all rows, which has a complexity of O(n log(n)). So walking the matrix is a better solution.
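For reference, "walking the matrix" usually refers to the staircase search, which starts at the top-right corner and takes O(rows + cols) steps; a sketch:

```python
def staircase_search(M, v):
    """O(rows + cols) search of a row- and column-sorted matrix: start
    at the top-right corner; step left when the current value is too
    big, step down when it is too small."""
    if not M or not M[0]:
        return False
    r, c = 0, len(M[0]) - 1
    while r < len(M) and c >= 0:
        if M[r][c] == v:
            return True
        if M[r][c] > v:
            c -= 1   # everything below in this column is even bigger
        else:
            r += 1   # everything to the left in this row is even smaller
    return False

# The 3x3 example matrix from this answer.
M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
```

Each step eliminates one full row or one full column, which is why the walk terminates in at most rows + cols steps.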
Edit:
Recurrence relation:
T(N*N) = T(N*N/2) + log(N)
If we normalize the function to one variable with M = N^2:
T(M) = T(M/2) + log(sqrt(M))
T(M) = T(M/2) + log(M)/2
According to case 2 of the Master Theorem, the complexity is
(log(M))^2 = (2 log(N))^2 = O((log(N))^2)
Edit 2:
Sorry, I answered your question from my mobile. Now when you think about it, M[0...row-1, col+1...N-1] doesn't make much sense, right? Consider my example: if you search for a value that is smaller than all values in the middle row, you'll always end up with the leftmost number, and similarly, if you search for a value that is greater than all values in the middle row, you'll end up with the rightmost number. So the algorithm can be reworded as follows:
Search the middle row with a custom binary search that returns 1 <= idx <= N if found, and idx = 0 or idx = N+1 if not found. After the binary search, if idx = 0, continue the search in the upper submatrix: M[0...row][0...N].
If the index is N+1, continue the search in the lower submatrix: M[row+1...N][0...N]. Otherwise, we are done.
You suggest that the complexity should be 2T(M/4) + log(M)/2, but at each step we divide the whole matrix in two and only process one half.
Moreover, if you agree that T(N*N) = T(N*N/2) + log(N) is correct, then you can substitute all N*N expressions with M.

Matrix chain multiplication algorithm

I am reading Thomas Cormen's "Introduction to Algorithms" and I have problems understanding the algorithm written below.
Matrix-Chain-Order(p)
 1  n ← length[p] − 1
 2  for i ← 1 to n
 3      do m[i, i] ← 0
 4  for l ← 2 to n                    // l is the chain length
 5      do for i ← 1 to n − l + 1     // what is this?
 6          do j ← i + l − 1          // what is this?
 7             m[i, j] ← ∞
 8             for k ← i to j − 1
 9                 do q ← m[i, k] + m[k + 1, j] + p[i−1] p[k] p[j]
10                    if q < m[i, j]
11                        then m[i, j] ← q
12                             s[i, j] ← k
13  return m and s
Now, I know how the algorithm works. I know how to proceed in constructing the table and all that. In other words I know what happens up to line 4 and I also know what 9 to 13 is about.
I have problems understanding the subtleties of the "for" loops, though. Lines 4 to 8 are difficult to understand. In line 5, why does i go up to n−l+1, and why is j in line 6 set to i+l−1? In line 7, m[i, j] is initialized for the comparison in line 10, but then again line 8 is a mystery.
I was just going through the algorithm's description on Wikipedia and it's pretty comprehensive there. I'll try to explain how I understood the solution.
The crux of the problem is we are basically trying to 'parenthesise' i.e. prioritize how we chain our matrices so that they are multiplied most efficiently and it's reflected in this line of code:
q = m[i,k] + m[k+1,j] + p[i-1]*p[k]*p[j];
To understand the above statement, first let's establish that i and j are fixed here, i.e. we are trying to compute m[i,j], the most efficient way to multiply the chain A[i..j], and k is the variable.
So at a very high level if i=1 and j=3 and the matrices are :
(A*B)*C //We are trying to establish where the outer most parenthesis should be
We don't know where it should be, hence we try all possibilities and pick the combination where m[i,j] is minimized. So we try:
i=1 and j=3
A*(B*C) //k=1
(A*B)*C //k=2
So clearly k should vary from i to j-1 which is reflected in the loop as we try all possible combinations and take the most efficient one. So for any k we'll have two partitions: A[i..k] and A[k+1...j]
So the cost of multiplication of A[i..j] for this partition of k is:
m[i,k] //Minimum cost of multiplication of A[i..k]
m[k+1,j] //Minimum cost of multiplication of A[k+1..j]
p[i-1]*p[k]*p[j]; //Final cost of multiplying the two partitions i.e. A[i..k] and A[k+1..j], where p contains the dimensions of the matrices.
For example, suppose A is a 10 × 30 matrix, B is a 30 × 5 matrix, and C is a 5 × 60 matrix. Then
p[] = [10,30,5,60], i.e. matrix Ai has dimension p[i-1] x p[i] for i = 1..n
This is what dynamic programming is all about. So we try all combinations of k and calculate m[i,j] but for that we also need to calculate m[i,k] and m[k+1,j] i.e. we break our problem down into smaller sub problems where the concept of chain length comes in.
So for all the matrices A[i..n] we calculate the most efficient way of multiplying a smaller chain of matrices of length l.
The smallest value of l is obviously 2 and the largest is n which is what we would get after we solve the smaller sub problems like I explained.
Let's come to the piece of code you are having trouble understanding:
for l ← 2 to n //l is the chain length.
do for i ← 1 to n − l + 1
do j ← i + l − 1
m[i, j] ← ∞
Now let's again consider a smaller example of 4 matrices H, I, J, K, and look first at chains of length 2 while traversing the array of matrices.
A[1..4] = H,I,J,K //where A[1] = H and A[4] = K
For l = 2
Our loop should go from i=1 to i=3, as for every i we are looking at the chain of length 2.
So when i = 1, we would compute
m[1,2] i.e. minimum cost to multiply chain (H,I)
and when i = 3, we would compute
m[3,4] i.e. minimum cost to multiply chain (J,K)
When chain length is 3, we would have:
For i=1, j=3
m[i,j] -> m[1,3] i.e. minimum cost to multiply chain (H,I,J)
For i=2, j=4
m[i,j] -> m[2,4] i.e. minimum cost to multiply chain (I,J,K)
Hence, by letting i go up to n−l+1 and setting j = i+l−1, we cover every chain of length l without exceeding the boundary of the array (whose size is n); j marks the end of the chain that starts at i and has length l.
So the problem comes down to calculating m[i,j] for some i and j which as I explained earlier is solved by taking a partition k and trying out all possible values of k and then re-defining m[i,j] as the minimum value which is why it is initialized as ∞.
I hope my answer wasn't too long and it gives you clarity as to how the algorithm flows and helps you appreciate the sheer vastness of dynamic programming.
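The pseudocode above translates almost line for line into Python. A sketch that keeps the book's 1-based tables (row/column 0 unused) so the indices match; the usage line reuses the A, B, C example from this answer:

```python
from math import inf

def matrix_chain_order(p):
    """CLRS Matrix-Chain-Order: matrix A_i has dimensions p[i-1] x p[i].
    Returns (m, s): m[i][j] is the minimum number of scalar
    multiplications for A_i..A_j, s[i][j] the optimal split point k."""
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for l in range(2, n + 1):             # l is the chain length
        for i in range(1, n - l + 2):     # i = 1 .. n - l + 1
            j = i + l - 1                 # end of the chain starting at i
            m[i][j] = inf
            for k in range(i, j):         # try every split A_i..A_k | A_k+1..A_j
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k
    return m, s

# A (10x30), B (30x5), C (5x60): (AB)C costs 1500 + 3000 = 4500.
m, s = matrix_chain_order([10, 30, 5, 60])
```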

Number of binary n x m matrices, modulo c, with at most k consecutive 1s in each column

I am trying to compute, modulo c, the number of n x m binary matrices with at most k consecutive 1s in each column. After some research, I've figured out that it is enough to count the valid single columns, i.e. vectors with 1 column and n lines: if there are p such vectors, the required number of matrices is p^m, since the columns are independent.
Because n and m are very large (up to 2,000,000) I can't find a suitable solution. I am trying to find a recurrence formula in order to build a matrix that helps me compute the answer. Could you suggest a solution?
There's a (k + 1)-state dynamic program (the state is the length of the current run of trailing 1s, from 0 to k). To make a long story short, you can compute large terms of it quickly by taking the nth power of the (k + 1)-by-(k + 1) integer matrix like (example for k = 4)
1 1 0 0 0
1 0 1 0 0
1 0 0 1 0
1 0 0 0 1
1 0 0 0 0
modulo c and summing the first row.
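A sketch combining this transfer-matrix answer with the p^m observation from the question (assuming the columns are independent, so the final answer is p^m mod c, where p is the number of valid single columns):

```python
def count_matrices(n, m, k, c):
    """Number of n x m binary matrices with at most k consecutive 1s per
    column, modulo c. State = length of the current run of trailing 1s
    (0..k); A[i][0] = 1 (append a 0, run resets) and A[i][i+1] = 1
    (append a 1, if the run stays <= k)."""
    size = k + 1
    A = [[0] * size for _ in range(size)]
    for i in range(size):
        A[i][0] = 1
        if i + 1 < size:
            A[i][i + 1] = 1

    def mul(X, Y):
        return [[sum(X[i][t] * Y[t][j] for t in range(size)) % c
                 for j in range(size)] for i in range(size)]

    # R = A^n mod c by exponentiation by squaring.
    R = [[int(i == j) for j in range(size)] for i in range(size)]
    P, e = A, n
    while e:
        if e & 1:
            R = mul(R, P)
        P = mul(P, P)
        e >>= 1
    p = sum(R[0]) % c      # paths of length n from state 0 = valid columns
    return pow(p, m, c)    # columns are chosen independently
```

For k = 1 this reduces to counting binary strings with no "11", so a column of length 2 has 3 valid choices and a 2 x 3 matrix has 3^3 = 27.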
