Algorithms: matrix traversal variation

There is an N × N square mesh-shaped grid of wires, as shown in the figure below. Nodes of the grid are at points (X, Y), where X and Y are integers from 0 to N−1. An electric current flows through the grid, between the nodes at (0, 0) and (N−1, N−1).
Initially, all the wires conduct the current, but the wires burn out at a rate of one per second. The burnouts are described by three zero-indexed arrays of integers, A, B and C, each of size M. For each moment T (0 ≤ T < M), in the T-th second the wire between nodes (A[T], B[T]) and:
(A[T], B[T] + 1), if C[T] = 0 or
(A[T] + 1, B[T]), if C[T] = 1
burns out. You can assume that the arrays describe existing wires, and that no wire burns out more than once. Your task is to determine when the current stops flowing between the nodes at (0,0) and (N−1,N−1).
Write a function:
int wire_burnouts(int N, int A[], int M, int B[], int M2, int C[], int M3);
that, given integer N and arrays A, B and C, returns the number of seconds after which the current stops flowing between the nodes at (0, 0) and (N−1, N−1). If the current keeps flowing even after all M wires burn out, the function should return −1.
For example, given N = 4, M = 9 and the following arrays:
A[0] = 0    B[0] = 0    C[0] = 0
A[1] = 1    B[1] = 1    C[1] = 1
A[2] = 1    B[2] = 1    C[2] = 0
A[3] = 2    B[3] = 1    C[3] = 0
A[4] = 3    B[4] = 2    C[4] = 0
A[5] = 2    B[5] = 2    C[5] = 1
A[6] = 1    B[6] = 3    C[6] = 1
A[7] = 0    B[7] = 1    C[7] = 0
A[8] = 0    B[8] = 0    C[8] = 1
your function should return 8, because just after the eighth wire burns out, there is no connection between the nodes at (0, 0) and (N−1, N−1). This situation is shown in the following figure:
Given N = 4, M = 1 and the following arrays:
A[0] = 0    B[0] = 0    C[0] = 0
your function should return −1, because burning out a single wire cannot break the connection between the nodes at (0, 0) and (N−1, N−1).
Assume that:
N is an integer within the range [1..400];
M is an integer within the range [0..2*N*(N−1)];
each element of array A is an integer within the range [0..N−1];
each element of array B is an integer within the range [0..N−1];
each element of array C is an integer within the range [0..1].
expected worst-case time complexity is O(N^2*log(N));
expected worst-case space complexity is O(N^2), beyond input storage (not counting the storage required for input arguments).

Construct the complete grid of wires. Then destroy the first M/2 wires and check connectivity with depth-first search. If the corners are still connected, destroy M/4 more wires; if not, restore the M/4 most recently destroyed wires. Continue this binary search until the proper T is found.
The time complexity is the number of depth-first searches, O(log M) = O(log N) (since M ≤ 2N(N−1)), times the cost of each depth-first search, O(N^2).
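A minimal Python sketch of the binary-search approach. The names `connected` and `wire_burnouts_bsearch` are hypothetical, and as a simplification it rebuilds the set of burned wires from scratch for every probe instead of destroying or restoring M/4 wires incrementally; that adds O(M) per probe and does not change the O(N^2 log N) bound. Connectivity is checked with an iterative, stack-based DFS.

```python
def connected(n, burned):
    """Iterative DFS from (0, 0) over the wires that are not in `burned`."""
    seen = [[False] * n for _ in range(n)]
    seen[0][0] = True
    stack = [(0, 0)]
    while stack:
        x, y = stack.pop()
        if (x, y) == (n - 1, n - 1):
            return True
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < n and 0 <= ny < n and not seen[nx][ny]:
                # a wire is stored as (lower endpoint, upper endpoint)
                wire = (min((x, y), (nx, ny)), max((x, y), (nx, ny)))
                if wire not in burned:
                    seen[nx][ny] = True
                    stack.append((nx, ny))
    return False


def wire_burnouts_bsearch(n, a, b, c):
    def burned_after(t):
        # wire i joins (a[i], b[i]) to (a[i], b[i] + 1) if c[i] == 0,
        # or to (a[i] + 1, b[i]) if c[i] == 1
        return {((a[i], b[i]), (a[i] + c[i], b[i] + 1 - c[i])) for i in range(t)}

    m = len(a)
    if connected(n, burned_after(m)):
        return -1
    lo, hi = 1, m  # smallest t whose first t burnouts disconnect the corners
    while lo < hi:
        mid = (lo + hi) // 2
        if connected(n, burned_after(mid)):
            lo = mid + 1
        else:
            hi = mid
    return lo
```

With the arrays from the first example, `wire_burnouts_bsearch(4, A, B, C)` returns 8. The search is valid because burnouts are never undone: once the corners are disconnected, they stay disconnected.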
The previous result may be improved with a disjoint-set data structure.
Construct the complete grid of wires. Then destroy all M wires as directed by arrays A, B, and C. Add the remaining connected components of the grid to a disjoint-set data structure. If (0, 0) and (N−1, N−1) are still in the same set, the answer is −1.
Then restore the wires one by one, starting from the last elements of these arrays and moving towards their first elements, and union the sets joined by each restored wire. Stop when the sets containing nodes (0, 0) and (N−1, N−1) are joined together: if that happens when wire T is restored, the answer is T + 1.
If the disjoint-set data structure uses union by rank and path compression, the time complexity of the whole algorithm is O(N^2 α(N)), where α is the inverse Ackermann function. This is practically as good as O(N^2).
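A Python sketch of the offline reverse approach. The name `wire_burnouts_reverse` is hypothetical. Node (x, y) gets the id x·N + y, and the disjoint-set forest uses union by rank and path compression. Instead of listing connected components explicitly, it unions the endpoints of every surviving wire, which produces the same sets.

```python
def wire_burnouts_reverse(n, a, b, c):
    parent = list(range(n * n))
    rank = [0] * (n * n)

    def find(v):
        root = v
        while parent[root] != root:
            root = parent[root]
        while parent[v] != root:  # path compression
            parent[v], v = root, parent[v]
        return root

    def union(u, v):
        ru, rv = find(u), find(v)
        if ru == rv:
            return
        if rank[ru] < rank[rv]:  # union by rank
            ru, rv = rv, ru
        parent[rv] = ru
        if rank[ru] == rank[rv]:
            rank[ru] += 1

    def ends(x, y, d):
        # ids of the wire's endpoints; node (x, y) has id x * n + y
        return x * n + y, (x + d) * n + (y + 1 - d)

    burned = set(zip(a, b, c))
    # State after all M burnouts: join the endpoints of every surviving wire
    for x in range(n):
        for y in range(n):
            for d in (0, 1):
                if x + d < n and y + 1 - d < n and (x, y, d) not in burned:
                    union(*ends(x, y, d))
    source, sink = 0, n * n - 1
    if find(source) == find(sink):
        return -1
    # Restore wires from the last one back to the first
    for t in range(len(a) - 1, -1, -1):
        union(*ends(a[t], b[t], c[t]))
        if find(source) == find(sink):
            return t + 1  # burning wire t is what cut the corners apart
    raise AssertionError("unreachable: the intact grid is connected")
```

Restoring wire T brings the grid back to the state "only the first T wires burned". So the first restore that reconnects the corners identifies T + 1 as the answer.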
The previous result may be improved if we use a graph dual to the original grid of wires: each node of the dual graph corresponds to a face of the original graph, and each edge of the dual graph crosses the corresponding edge of the original graph. Two additional nodes are needed: node L, connected to every top and left node of the dual graph, and node R, connected to every bottom and right node.
If this dual graph contains a path from L to R, nodes (0, 0) and (N−1, N−1) cannot be connected to each other. If there is no path from L to R, nodes (0, 0) and (N−1, N−1) are connected.
Initially the dual graph is completely disconnected. While removing edges from the original graph, we add the corresponding edges to the dual graph and update a disjoint-set data structure at the same time. Stop as soon as the sets containing nodes L and R are joined together.
This algorithm needs to visit the elements of its input arrays A, B, and C only once, in order, which makes it an online algorithm.
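A Python sketch of the online dual-graph algorithm. The name `wire_burnouts_dual` is hypothetical, and the sketch assumes a particular orientation. Cell (i, j) is the unit square with corners (i, j) and (i+1, j+1). L stands for the outer region along the sides x = 0 and y = N−1, and R for the outer region along y = 0 and x = N−1; these are the two halves of the outer face separated by the corners (0, 0) and (N−1, N−1). Every burned wire merges the faces on its two sides. The disjoint-set forest uses path halving only; union by rank is omitted for brevity.

```python
def wire_burnouts_dual(n, a, b, c):
    k = n - 1                       # the grid has k * k unit cells (faces)
    left, right = k * k, k * k + 1  # the extra nodes L and R
    parent = list(range(k * k + 2))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def face(i, j, outside):
        # cell (i, j) if it exists, otherwise the given half of the outer face
        return i * k + j if 0 <= i < k and 0 <= j < k else outside

    for t in range(len(a)):
        x, y = a[t], b[t]
        if c[t] == 0:   # wire (x, y)-(x, y+1) separates cells (x-1, y) and (x, y)
            f1, f2 = face(x - 1, y, left), face(x, y, right)
        else:           # wire (x, y)-(x+1, y) separates cells (x, y-1) and (x, y)
            f1, f2 = face(x, y - 1, right), face(x, y, left)
        r1, r2 = find(f1), find(f2)
        parent[r1] = r2
        if find(left) == find(right):
            return t + 1  # a dual path L..R now cuts (0, 0) from (N-1, N-1)
    return -1
```

On the first example this returns 8: the eighth wire, (0, 1)-(0, 2), joins L to a chain of cells that already reaches R.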
The most limiting factor for time complexity is now the initialization time for the array of the dual graph's nodes: O(N^2). If there is a way to avoid this initialization, we get an asymptotically more efficient O(M α(M)) algorithm. There are several approaches to the initialization problem:
Use the known trick for initializing an array in O(1) time. This gives an O(M α(M)) worst-case time algorithm. But in practice it is rarely possible to allocate memory without initializing it (for security reasons).
Initialize the array once and then reuse it for many runs of this algorithm. This gives an O(M α(M)) amortized time algorithm.
Use a hash table to store the dual graph's nodes. This gives an O(M α(M)) expected time algorithm. It also improves the space complexity to O(M).
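A sketch of the hash-table variant, using Python dicts as the hash table. The names `LazyDSU` and `wire_burnouts_hashed` are hypothetical. The face naming is an assumed orientation: cell (i, j) has corners (i, j) and (i+1, j+1), "L" is the outer region along x = 0 and y = N−1, and "R" is the outer region along y = 0 and x = N−1. Only faces touched by burned wires are ever stored, so memory is O(M) and no O(N^2) initialization happens.

```python
class LazyDSU:
    """Disjoint-set forest stored in dicts; a node is created the first time it is seen."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, v):
        if v not in self.parent:          # lazy creation instead of up-front initialization
            self.parent[v] = v
            self.size[v] = 1
            return v
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[v] != root:     # path compression
            self.parent[v], v = root, self.parent[v]
        return root

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        if self.size[ru] < self.size[rv]:  # union by size
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]


def wire_burnouts_hashed(n, a, b, c):
    dsu = LazyDSU()
    k = n - 1

    def face(i, j, outside):
        return (i, j) if 0 <= i < k and 0 <= j < k else outside

    for t, (x, y, d) in enumerate(zip(a, b, c)):
        if d == 0:   # wire (x, y)-(x, y+1)
            dsu.union(face(x - 1, y, "L"), face(x, y, "R"))
        else:        # wire (x, y)-(x+1, y)
            dsu.union(face(x, y - 1, "R"), face(x, y, "L"))
        if dsu.find("L") == dsu.find("R"):
            return t + 1
    return -1
```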


Maximum profit earned on a weighted undirected tree

I came across this problem while taking a sample test. We are given an undirected tree, and we can start from any node of our choice. Initially we have power "P", and while going from one node to another we lose some power "X" (consider it the cost of travelling) and earn some profit "Y".
So we need to find the maximum profit that we can earn with the given power.
Example: the first line contains the number of nodes and the initial power.
The next n-1 lines each contain node, node, cost, profit.
5 4
1 2 1 2
1 3 2 3
1 4 2 4
4 5 2 2
Answer => 7. We can start from 4, go to 1 and then to 3.
I have applied DFS to this to get the maximum profit by traversing every single path.
But is there a way to reduce the running time?
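from collections import defaultdict

class tree:
    def __init__(self, nodes):
        self.nodes = nodes
        # adjacency list: node -> list of (neighbour, cost, profit)
        self.graph = defaultdict(list)

    def add(self, a, b, charge, profit):
        # undirected tree: store the edge in both directions
        self.graph[a].append((b, charge, profit))
        self.graph[b].append((a, charge, profit))

    def start(self, power):
        # try every node as the starting point; standing still earns 0
        maxi = 0
        visited = [False for i in range(self.nodes)]
        for i in range(1, self.nodes + 1):
            visited[i - 1] = True
            for j in self.graph[i]:
                temp = self.dfs(j, power, 0, visited)
                if temp > maxi:
                    maxi = temp
            visited[i - 1] = False
        return maxi

    def dfs(self, edge, powers, profit, visited):
        v, p, pro = edge  # target node, cost and profit of the edge
        if powers - p < 0:
            return profit  # cannot afford this edge: keep what we have
        profit += pro
        powers = powers - p
        visited[v - 1] = True
        tempo = profit
        for k in self.graph[v]:
            if not visited[k[0] - 1]:
                temp = self.dfs(k, powers, tempo, visited)
                if temp > profit:
                    profit = temp
        visited[v - 1] = False
        return profit

t = tree(5)
for a, b, cost, profit in [(1, 2, 1, 2), (1, 3, 2, 3), (1, 4, 2, 4), (4, 5, 2, 2)]:
    t.add(a, b, cost, profit)
print(t.start(4))  # 7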
You want to find all paths whose length (total cost) is at most P and take the maximum of their profits. You can achieve this in O(n log^2 n) time using centroid decomposition.
Consider all subtrees that you create by deleting a centroid C from the tree. In this step you only consider paths of length at most P that contain C, and take the maximum of their profits. Using DFS, calculate the distance and profit from C to each other node in the tree and store them as pairs in a multiset ordered by distance.
For each subtree do:
delete from the multiset the pairs of every node from that subtree - O(n log n)
copy all the pairs from the multiset to a list L1 (sorted by increasing distance) - O(n)
create a list L2 of pairs (distance, profit) from the current subtree and sort it by distance in decreasing order - O(n log n)
create variables maxx = 0 and i = 0
for each pair X in L2 with X.distance <= P:
while i < len(L1) and L1[i].distance <= P - X.distance do: maxx = max(maxx, L1[i].profit), ++i
result = max(result, maxx + X.profit)
all of it will take at most O(n)
insert all pairs from L2 back to multiset - O(n log n)
Time complexity: O(n log n)
Now you have calculated the maximum profit of all paths of length at most P that contain C. To get the paths that lie entirely inside the subtrees, run the same algorithm recursively on each subtree. Since centroid decomposition has at most O(log n) layers, the total complexity is O(n log^2 n).
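A Python sketch of the centroid-decomposition idea, assuming non-negative costs and profits, 1-based nodes and edges given as (u, v, cost, profit) tuples (`max_profit` and this input format are hypothetical). One deliberate swap: instead of deleting each subtree's pairs from a multiset and sweeping, it sorts all pairs of the current centroid once and keeps, for every prefix, the best and second-best profit coming from two different subtrees. A binary search (`bisect`) then finds the best partner for each node, which keeps each layer at O(n log n) regardless of how many subtrees the centroid has.

```python
import bisect


def max_profit(n, power, edges):
    adj = [[] for _ in range(n + 1)]
    for u, v, cost, profit in edges:
        adj[u].append((v, cost, profit))
        adj[v].append((u, cost, profit))
    removed = [False] * (n + 1)
    size = [0] * (n + 1)

    def find_centroid(root):
        # iterative DFS over the current component, recording parents
        order, parent, stack = [root], {root: 0}, [root]
        while stack:
            u = stack.pop()
            for v, _, _ in adj[u]:
                if not removed[v] and v != parent[u]:
                    parent[v] = u
                    order.append(v)
                    stack.append(v)
        for u in reversed(order):  # children are finished before their parents
            size[u] = 1 + sum(size[v] for v, _, _ in adj[u]
                              if not removed[v] and v != parent[u])
        total = len(order)
        for u in order:
            heaviest = total - size[u]
            for v, _, _ in adj[u]:
                if not removed[v] and v != parent[u]:
                    heaviest = max(heaviest, size[v])
            if 2 * heaviest <= total:
                return u

    best = 0
    pending = [1]
    while pending:
        c = find_centroid(pending.pop())
        # (distance, profit, subtree tag) for C itself and every node within reach
        pairs = [(0, 0, -1)]
        for v, cost, profit in adj[c]:
            if removed[v]:
                continue
            stack = [(v, c, cost, profit)]
            while stack:
                u, p, d, g = stack.pop()
                if d > power:  # non-negative costs: the rest of this branch is too far too
                    continue
                pairs.append((d, g, v))
                for w, cw, pw in adj[u]:
                    if not removed[w] and w != p:
                        stack.append((w, u, d + cw, g + pw))
        pairs.sort()
        dists = [d for d, _, _ in pairs]
        # top[i]: best and second-best (profit, tag) in pairs[0..i], with different tags
        top, b1, b2 = [], (-1, None), (-1, None)
        for _, g, tag in pairs:
            if tag == b1[1]:
                b1 = max(b1, (g, tag))
            elif g > b1[0]:
                b1, b2 = (g, tag), b1
            elif g > b2[0]:
                b2 = (g, tag)
            top.append((b1, b2))
        for d, g, tag in pairs:
            j = bisect.bisect_right(dists, power - d) - 1
            (p1, t1), (p2, _) = top[j]
            other = p1 if t1 != tag else p2
            if other >= 0:  # -1 means "no endpoint from another subtree"
                best = max(best, g + other)
        removed[c] = True
        pending.extend(v for v, _, _ in adj[c] if not removed[v])
    return best
```

On the example from the question, `max_profit(5, 4, [(1, 2, 1, 2), (1, 3, 2, 3), (1, 4, 2, 4), (4, 5, 2, 2)])` gives 7.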

Why is the time complexity of a range minimum query with a segment tree O(log n)?

I was trying to find, for a given array and two indexes, the minimum value between these two indexes in O(log(n)).
I saw the solution using a segment tree but couldn't understand why its time complexity is O(log n). It doesn't seem like it should be, because if your range is not exactly within a node's range, you need to start splitting the search.
First proof:
The claim is that there are at most 2 nodes which are expanded at each level. We will prove this by contradiction.
Consider the segment tree given below.
Let's say that there are 3 nodes that are expanded at one level of this tree. This means that the range extends from the leftmost colored node to the rightmost colored node. But notice that if the range extends to the rightmost node, then the full range of the middle node is covered. Thus, this node will immediately return its value and won't be expanded. Thus, at each level we expand at most 2 nodes, and since there are log n levels, at most 2 log n = O(log n) nodes are expanded.
Second proof:
There are four cases when querying the interval [x, y]:
FIND(R, x, y)    // R is the node
    % Case 1
    if R.first = x and R.last = y
        return {R}
    % Case 2
    if y <= R.middle
        return FIND(R.leftChild, x, y)
    % Case 3
    if x >= R.middle + 1
        return FIND(R.rightChild, x, y)
    % Case 4
    P = FIND(R.leftChild, x, R.middle)
    Q = FIND(R.rightChild, R.middle + 1, y)
    return P union Q
Intuitively, first three cases reduce the level of tree height by 1, since the tree has height log n, if only first three cases happen, the running time is O(log n).
For the last case, FIND() divides the problem into two subproblems. However, we assert that a real split can happen at most once. After we call FIND(R.leftChild, x, R.middle), we are querying R.leftChild for the interval [x, R.middle], and R.middle is the same as R.leftChild.last. If x = R.leftChild.first, it is Case 1; if x > R.leftChild.middle, it is Case 3, and the query keeps the same shape one level lower; if x <= R.leftChild.middle, we will call
FIND(R.leftChild.leftChild, x, R.leftChild.middle);
FIND(R.leftChild.rightChild, R.leftChild.middle + 1, R.leftChild.last);
However, the second FIND() covers the whole range of R.leftChild.rightChild, so it hits Case 1 and returns that node in constant time, and the problem is not really separated into two subproblems (strictly speaking, the problem is separated, though one subproblem takes O(1) time to solve).
Since the same analysis holds on the rightChild of R, we conclude that after case4 happens the first time, the running time T(h) (h is the remaining level of the tree) would be
T(h) <= T(h-1) + c (c is a constant)
T(1) = c
which yields:
T(h) <= c * h = O(h) = O(log n) (since h is the height of the tree)
Hence we end the proof.
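The FIND pseudocode translates into the following Python sketch for range minimum queries. `MinSegmentTree` and its `visits` counter are hypothetical names; the counter is there only to make the bound observable. The recursion follows the four cases exactly, and the counter lets you check empirically that a query touches O(log n) nodes.

```python
class MinSegmentTree:
    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (4 * self.n)
        self.visits = 0
        self._build(1, 0, self.n - 1, values)

    def _build(self, node, lo, hi, values):
        if lo == hi:
            self.tree[node] = values[lo]
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid, values)
        self._build(2 * node + 1, mid + 1, hi, values)
        self.tree[node] = min(self.tree[2 * node], self.tree[2 * node + 1])

    def query(self, x, y):
        """Minimum of values[x..y] (inclusive); resets and then counts visited nodes."""
        self.visits = 0
        return self._find(1, 0, self.n - 1, x, y)

    def _find(self, node, lo, hi, x, y):
        self.visits += 1
        if lo == x and hi == y:           # Case 1: node range equals the query
            return self.tree[node]
        mid = (lo + hi) // 2
        if y <= mid:                       # Case 2: entirely in the left child
            return self._find(2 * node, lo, mid, x, y)
        if x >= mid + 1:                   # Case 3: entirely in the right child
            return self._find(2 * node + 1, mid + 1, hi, x, y)
        # Case 4: split at mid
        return min(self._find(2 * node, lo, mid, x, mid),
                   self._find(2 * node + 1, mid + 1, hi, mid + 1, y))
```

Looping over every range of a 100-element array, `visits` stays within 4 × ceil(log2 100) = 28: one path down to the split, then at most two nodes per level on each side.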

Fibonacci sums on a tree

Given a tree with n nodes (n can be as large as 2 * 10^5), where each node has a cost associated with it, let us define the following functions:
g(u, v) = the sum of all costs on the simple path from u to v
f(n) = the (n + 1)th Fibonacci number (n + 1 is not a typo)
The problem I'm working on requires me to compute the sum of f(g(u, v)) over all possible pairs of nodes in the tree modulo 10^9 + 7.
As an example, let's take a tree with 3 nodes.
Without loss of generality, let's say node 1 is the root, and its children are 2 and 3:
cost[1] = 2, cost[2] = 1, cost[3] = 1
g(1, 1) = 2; f(2) = 2
g(2, 2) = 1; f(1) = 1
g(3, 3) = 1; f(1) = 1
g(1, 2) = 3; f(3) = 3
g(2, 1) = 3; f(3) = 3
g(1, 3) = 3; f(3) = 3
g(3, 1) = 3; f(3) = 3
g(2, 3) = 4; f(4) = 5
g(3, 2) = 4; f(4) = 5
Summing all of the values, and taking the result modulo 10^9 + 7 gives 26 as the correct answer.
My attempt:
I implemented an algorithm to compute g(u, v) in O(log n) by finding the lowest common ancestor using a sparse table.
For the finding of the appropriate Fibonacci values, I tried two approaches, namely using exponentiation on the matrix form and another by noticing that the sequence modulo 10^9 + 7 is cyclical.
Now comes the extremely tricky part. No matter how I do the above computations, I still end up iterating over O(n^2) pairs when calculating the sum of all possible f(g(u, v)). There's the obvious improvement of only going over n * (n - 1) / 2 pairs, but that's still quadratic.
What am I missing? I've been at it for several hours, but I can't see a way to get that sum without actually producing a quadratic algorithm.
To know how many times the cost of a node X is to be included in the total sum, we divide the other nodes into 3 (or more) groups:
the subtree A connected to the left of X
the subtree B connected to the right of X
(subtrees C, D... if the tree is not binary)
all other nodes Y, connected through X's parent
When two nodes belong to different groups, their simple path goes through X, and so does every path that has X as an endpoint. Counting ordered pairs (u, v), including u = v as the question does, the number of pairs whose path contains X is:
N + #Y × (N - #Y) + #A × (N - #A) + #B × (N - #B) + ...
(the N term counts the pairs that start at X; every other term counts the pairs that start inside that group and end outside it, and such a path must pass through X).
So by counting the total number of nodes N, and the size of the subtrees under X, you can calculate how many times the cost of node X should be included in the total sum. Do this for every node and you have the total cost.
The code for this could be straightforward. I'll assume that the total number of nodes N is known, and that you can add properties to the nodes (both of these assumptions simplify the algorithm, but it can be done without them).
We'll add a child_count to store the number of descendants of the node, and a path_count to store the number of pairs whose path contains the node; both are initialised to zero.
For each node, starting from the root:
If not all children have been visited, go to an unvisited child.
If all children have been visited (or node is leaf):
Increment child_count (it now counts the node itself plus its descendants).
Increase path_count with N + (N - child_count) × child_count.
Add this node's path_count × cost to the total cost.
If the current node is the root, we're done; otherwise:
Increase the parent node's child_count with this node's child_count.
Increase the parent node's path_count with this node's child_count × (N - child_count).
Go to the parent node.
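A Python sketch of the counting idea, assuming nodes numbered 0..n-1 and an edge list (`total_path_cost` is a hypothetical name). Instead of the explicit visit-children/go-to-parent walk, it takes a BFS order and processes it in reverse, which finishes every child before its parent. For each node it uses the count N + Σ s × (N - s) over the groups around the node (ordered pairs, u = v included). Note that this yields Σ g(u, v), the total cost over all pairs; since f is not linear, it does not by itself give Σ f(g(u, v)).

```python
def total_path_cost(n, cost, edges, root=0):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # BFS order: every node appears after its parent
    parent = [-1] * n
    seen = [False] * n
    seen[root] = True
    order = [root]
    i = 0
    while i < len(order):
        u = order[i]
        i += 1
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                order.append(v)
    size = [1] * n
    path_count = [n] * n  # the N pairs (X, v) that start at X
    for u in reversed(order):  # children are finished before their parents
        p = parent[u]
        if p >= 0:
            size[p] += size[u]
            path_count[p] += size[u] * (n - size[u])  # pairs leaving this child's group
    total = 0
    for u in range(n):
        outside = n - size[u]  # the group Y reached through the parent
        path_count[u] += outside * (n - outside)
        total += path_count[u] * cost[u]
    return total
```

For the three-node example from the question (0-based: cost [2, 1, 1], edges (0, 1) and (0, 2)) this returns 24 = 2 + 1 + 1 + 3 + 3 + 3 + 3 + 4 + 4, the sum of the g values listed there.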
The following algorithm runs in O(n^3) time.
A tree is a connected graph without cycles, so there is exactly one simple path between any two nodes, and computing g for all pairs is an all-pairs shortest path problem. Thus, we can use Dijkstra's idea and a dynamic programming approach for this problem (I took it from Weiss's book). Then we apply the Fibonacci function to each cost, assuming that we already have a table to look it up in.
Dijkstra's idea: We start from the root and search all simple paths from the root to all other nodes, and then do the same for the other vertices of the graph.
Dynamic programming approach: We use a 2D matrix D[][] to store the lowest path cost between node i and node j. Initially, D[i][i] = cost[i]. If node i and node j are parent and child, D[i][j] = g(i, j), which is the sum of their costs. If node k lies on a cheaper path between node i and node j, we update D[i][j], i.e., D[i][j] = min(D[i][j], D[i][k] + D[k][j] - cost[k]) (cost[k] is subtracted because it appears in both halves).
When done, we check D[][] matrix and apply Fibonacci function to each cell and add them up, and also apply modulo operation.
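A small Python sketch of this O(n^3) approach, assuming 0-based nodes, non-negative integer costs and an edge list (`fib_path_sum_floyd` is a hypothetical name). The update rule is the Floyd-Warshall recurrence; because the costs sit on nodes, joining D[i][k] and D[k][j] subtracts cost[k] once. Fibonacci values come from a table with f(0) = f(1) = 1, matching f(n) = the (n + 1)th Fibonacci number. It is only practical for small trees.

```python
MOD = 10**9 + 7


def fib_path_sum_floyd(cost, edges):
    n = len(cost)
    inf = float("inf")
    d = [[inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = cost[i]                  # g(i, i) is the node's own cost
    for u, v in edges:
        d[u][v] = d[v][u] = cost[u] + cost[v]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # joining i..k and k..j counts cost[k] twice, so subtract it once
                via = d[i][k] + d[k][j] - cost[k]
                if via < d[i][j]:
                    d[i][j] = via
    largest = max(max(row) for row in d)   # finite, because a tree is connected
    f = [1, 1]                             # f(m) = f(m - 1) + f(m - 2)
    while len(f) <= largest:
        f.append((f[-1] + f[-2]) % MOD)
    return sum(f[d[i][j]] for i in range(n) for j in range(n)) % MOD
```

On the three-node example (0-based: cost [2, 1, 1], edges (0, 1) and (0, 2)) it returns 26.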

Given a directed weighted graph with self loops, find the list of nodes that are exactly at distance k from a given node x?

Each edge in the graph has a weight of 1. The graph may have cycles; if a node has a self loop, it can be at any distance from itself from 0 to infinity, depending on the number of times we take the self loop.
I have solved the problem using BFS, but the distance can be of the order of 10^9, hence BFS is slow.
We'll be asked multiple queries on a given graph of the form
(distance, source)
and the output is the list of nodes that are at exactly the given distance from the source vertex.
I have a feeling there would be many repeated computations as the number of nodes is small, but I am not able to figure out how to reduce the problem to smaller problems.
What is the efficient way to do this?
Edit: I have tried using matrix exponentiation, but it's too slow for the given constraints. The problem has a time limit of 1 sec.
Let G = (V,E) be your graph, and define the adjacency matrix A as follows:
A[i][j] = 1 if (V[i], V[j]) is in E
A[i][j] = 0 otherwise
In this matrix, for each k:
(A^k)[i][j] > 0 if and only if there is a path from v[i] to v[j] of length exactly k.
This means by creating this matrix and then calculating the exponent, you can easily get your answer.
For fast exponent calculation you can use exponentiation by squaring, which yields O(M(n) · log(k)), where M(n) is the cost of multiplying two n×n matrices.
This will also save you some calculation when looking for different queries on the same graph.
Appendix - claim proof:
Base: A^1 = A, and indeed in A by definition, A[i][j]=1 if and only if (V[i],V[j]) is in E
Hypothesis: assume the claim is correct for all l<k
A^k = A^(k-1)*A. From induction hypothesis, A^(k-1)[i][j] > 0 iff there is a path of length k-1 from V[i] to V[j].
Let's examine two vertices v1,v2 with indices i and j.
If there is a path of length k between them, let it be v1->...->u->v2. Let the index of u be m.
From i.h. A^(k-1)[i][m] > 0 because there is a path. In addition A[m][j] = 1, because (u,v2) = (V[m],V[j]) is an edge.
A^k[i][j] = (A^(k-1) * A)[i][j] = A^(k-1)[i][1]·A[1][j] + ... + A^(k-1)[i][m]·A[m][j] + ... + A^(k-1)[i][n]·A[n][j]
And since A[m][j] > 0 and A^(k-1)[i][m] > 0, then (A^(k-1) * A)[i][j] > 0.
If there is no such path, then for each vertex u such that (u, v2) is an edge, there is no path of length k-1 from v1 to u (otherwise v1->...->u->v2 would be a path of length k).
Then, using induction hypothesis we know that if A^(k-1)[i][m] > 0 then A[m][j] = 0, for all m.
If we assign that in the sum defining A^k[i][j], we get that A^k[i][j] = 0
Small note: Technically, A^k[i][j] is the number of paths between i and j of length exactly k. This can be proven similar to above but with a bit more attention to details.
To avoid the numbers growing too fast (which will increase M(n) because you might need big integers to store that value), and since you don't care for the value other than 0/1 - you can treat the matrix as booleans - using only 0/1 values and trimming anything else.
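A Python sketch of the boolean variant, assuming nodes 0..n-1 and each adjacency row stored as an int bitmask (bit j of rows[i] is set iff (V[i], V[j]) is in E); all function names are hypothetical. Since every query raises the same matrix to some power, the sketch precomputes A^(2^j) once per graph by repeated squaring. It then answers each (distance, source) query by multiplying only a one-row vector by the needed powers, which is much cheaper than a full matrix product per query. With levels=31 it supports k < 2^31.

```python
def bool_matmul(a, b):
    """Boolean product of two matrices whose rows are int bitmasks."""
    result = []
    for row in a:
        acc = 0
        while row:
            low = row & -row                 # lowest set bit = column m
            acc |= b[low.bit_length() - 1]   # OR in row m of b
            row ^= low
        result.append(acc)
    return result


def precompute_powers(rows, levels=31):
    """powers[j] is the boolean matrix A^(2^j)."""
    powers = [rows]
    for _ in range(1, levels):
        powers.append(bool_matmul(powers[-1], powers[-1]))
    return powers


def nodes_at_distance(powers, source, k):
    """Nodes reachable from `source` by a walk of exactly k edges."""
    reach = 1 << source                      # walks of length 0 end at the source
    j = 0
    while k:
        if k & 1:
            # one-row vector times A^(2^j); powers of A commute, so order is irrelevant
            reach = bool_matmul([reach], powers[j])[0]
        k >>= 1
        j += 1
    return [v for v in range(reach.bit_length()) if reach >> v & 1]
```

For the graph A -> B -> C -> D -> B used in another answer (nodes 0 to 3, rows [1 << 1, 1 << 2, 1 << 3, 1 << 1]), `nodes_at_distance(powers, 0, 1000)` returns [1], i.e. only B.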
If there are cycles in your graph, then for two nodes on a cycle of length c, if one is reachable from the other in d steps, it is also reachable in d + c·N steps for every N ≥ 0, because you can go around the cycle as many times as you wish.
That brings me to the idea: we can use the cycles to our advantage!
Using BFS while detecting a cycle, we calculate offset + cycle*N, get as close to our goal (K) as possible,
and then search for K pretty easily.
A -> B -> C -> D -> B
K = 1000;
S = A;
A - 0
B - 1
C - 2
D - 3
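B - 1 (+ 3N)
Here B is first reached after offset = 1 step, and the cycle B -> C -> D -> B has length 3, so B is reached after 1 + 3N steps for every N ≥ 0. Solving k - (1 + 3N) = 0 gives 1000 - 1 = 3N => N = 333, so B is reached after exactly 1 + 3·333 = 1000 steps.
A simpler way is to compute (k - offset) mod cycle: here (1000 - 1) mod 3 = 0, so k lands exactly on B.
From here you only need to count a few more steps when the remainder is not 0.
In this example (as a regex): A(BCD){333}B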
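The offset/cycle idea is easiest to make precise when every node has exactly one outgoing edge, as in the A -> B -> C -> D -> B example. The sketch below assumes that: `successor` maps each node to its single successor, and `node_after_k_steps` is a hypothetical name. It walks from the source until a node repeats, which yields the offset and the cycle length, and then jumps straight to step k. General graphs with branching need more than this.

```python
def node_after_k_steps(successor, source, k):
    """Node reached after exactly k steps, assuming one outgoing edge per node."""
    first_seen = {}   # node -> step at which it was first reached
    path = []         # path[s] is the node reached after s steps
    node, step = source, 0
    while node not in first_seen:
        if step == k:
            return node
        first_seen[node] = step
        path.append(node)
        node = successor[node]
        step += 1
    offset = first_seen[node]      # the cycle starts here
    cycle = step - offset          # and has this length
    return path[offset + (k - offset) % cycle]
```

For example, `node_after_k_steps({"A": "B", "B": "C", "C": "D", "D": "B"}, "A", 1000)` returns "B".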

How to noisily select k smallest elements of an array?

So I wrote a function to find the k nodes of a graph that have the smallest degree. It looks like this:
def smallestKNodes(G, k):
    # leastK is kept sorted by degree and holds at most k nodes
    leastK = []
    for i in range(G.GetMxNId()):
        # Produces an iterator to the node
        node = G.GetNI(i)
        for j in range(k):
            if j >= len(leastK):
                leastK.append(node)
                break
            elif node.GetDeg() < leastK[j].GetDeg():
                leastK.insert(j, node)
                leastK = leastK[0:k]
                break
    return leastK[0:k]
My problem is when all the nodes have the same degree, it selects the same nodes every time. How can I make it so it takes all the nodes with zero degree or whatever and then selects k nodes randomly?
(1) Suppose k = 7, then if there are 3 nodes with degree 0 and 10 nodes with degree 1, I would like to choose all the nodes with degree 0, but randomly choose 4 of the nodes with degree 1.
(2) If possible I don't want to visit any node twice because there might be too many nodes to fit into memory. There might also be a very large number of nodes with minimum degree. In some cases there might also be a very small number of nodes.
Store all the nodes which satisfy your condition and randomly pick k nodes from it. You can do the random pick by shuffling the array (e.g. Fisher-Yates, std::shuffle, randperm, etc.) and picking the first k nodes (for example).
You might want to do two passes, the first pass to discover the relevant degree you have to randomize, how many nodes of that degree to choose, and the total number of nodes with that degree. Then, do a second pass on your nodes, choosing only those with the desired degree at random.
To choose k nodes of n total so each node has a fair probability (k/n), loop over relevant nodes, and choose each one with probability 1, 1, ..., 1, k/(k+1), k/(k+2), ..., k/n. When choosing a node, if k nodes are already chosen, throw one of them away at random.
import random

def randomNodesWithSpecificDegree(G, d, k, n):
    result = []
    examined = 0
    for i in range(G.GetMxNId()):
        # Produces an iterator to the node
        node = G.GetNI(i)
        if node.GetDeg() == d:
            examined = examined + 1
            if len(result) < k:
                result.append(node)
            elif random.random() < k / examined:
                # replace a uniformly chosen earlier pick
                index = random.randrange(k)
                result[index] = node
    assert examined == n
    return result
This approach works well when k is small and n is big (which seems to be your case).
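A self-contained sketch of the two-pass idea combined with the reservoir rule, assuming the degrees are available as a plain sequence indexed by node id rather than through the SNAP graph object (`random_k_smallest` and this input format are hypothetical). The first pass builds a degree histogram to find the cut-off degree and how many nodes of that degree are needed. The second pass keeps every node below the cut-off and reservoir-samples the rest, so memory is O(k + number of distinct degrees).

```python
import random
from collections import Counter


def random_k_smallest(degrees, k, rng=random):
    """k node ids of smallest degree, ties at the cut-off degree broken uniformly."""
    if k <= 0:
        return []
    # Pass 1: degree histogram -> cut-off degree and how many nodes of it we need
    counts = Counter(degrees)
    taken = 0
    for deg in sorted(counts):
        if taken + counts[deg] >= k:
            cutoff, need = deg, k - taken
            break
        taken += counts[deg]
    else:
        return list(range(len(degrees)))  # fewer than k nodes in total
    # Pass 2: keep everything below the cut-off, reservoir-sample at the cut-off
    chosen, reservoir, seen = [], [], 0
    for node, deg in enumerate(degrees):
        if deg < cutoff:
            chosen.append(node)
        elif deg == cutoff:
            seen += 1
            if len(reservoir) < need:
                reservoir.append(node)
            elif rng.random() < need / seen:
                reservoir[rng.randrange(need)] = node
    return chosen + reservoir
```

With 3 nodes of degree 0, 10 of degree 1 and k = 7, it returns the three degree-0 nodes plus 4 degree-1 nodes chosen uniformly at random, which is requirement (1) from the question.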
