Given a tree with n nodes (n can be as large as 2 * 10^5), where each node has a cost associated with it, let us define the following functions:
g(u, v) = the sum of all costs on the simple path from u to v
f(n) = the (n + 1)th Fibonacci number (n + 1 is not a typo)
The problem I'm working on requires me to compute the sum of f(g(u, v)) over all possible pairs of nodes in the tree modulo 10^9 + 7.
As an example, let's take a tree with 3 nodes.
Without loss of generality, let's say node 1 is the root, and its children are 2 and 3:
cost[1] = 2, cost[2] = 1, cost[3] = 1
g(1, 1) = 2; f(2) = 2
g(2, 2) = 1; f(1) = 1
g(3, 3) = 1; f(1) = 1
g(1, 2) = 3; f(3) = 3
g(2, 1) = 3; f(3) = 3
g(1, 3) = 3; f(3) = 3
g(3, 1) = 3; f(3) = 3
g(2, 3) = 4; f(4) = 5
g(3, 2) = 4; f(4) = 5
Summing all of the values, and taking the result modulo 10^9 + 7 gives 26 as the correct answer.
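For reference, here is a small brute-force check of this example (my own sketch, not the intended solution): it finds g(u, v) for every ordered pair with a DFS along the unique path and sums the Fibonacci values.

def fib(n):
    # f(n) = the (n + 1)th Fibonacci number, per the definition above
    a, b = 0, 1
    for _ in range(n + 1):
        a, b = b, a + b
    return a

cost = {1: 2, 2: 1, 3: 1}
adj = {1: [2, 3], 2: [1], 3: [1]}

def path_sum(u, v, prev=None):
    # g(u, v): walk the unique simple path from u to v
    if u == v:
        return cost[u]
    for w in adj[u]:
        if w != prev:
            s = path_sum(w, v, u)
            if s is not None:
                return s + cost[u]
    return None  # v is not reachable through this branch

print(sum(fib(path_sum(u, v)) for u in adj for v in adj) % (10**9 + 7))  # 26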
My attempt:
I implemented an algorithm to compute g(u, v) in O(log n) by finding the lowest common ancestor using a sparse table.
To find the appropriate Fibonacci values, I tried two approaches: exponentiation of the matrix form, and exploiting the fact that the sequence modulo 10^9 + 7 is periodic (the Pisano period).
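For the record, here is one way the Fibonacci part can be done in O(log n), a fast-doubling sketch of my own (the matrix form and the Pisano period work just as well):

MOD = 10**9 + 7

def fib_pair(n):
    # returns (F(n), F(n+1)) mod MOD, with F(1) = F(2) = 1
    if n == 0:
        return (0, 1)
    a, b = fib_pair(n // 2)
    c = a * (2 * b - a) % MOD          # F(2k)   = F(k) * (2F(k+1) - F(k))
    d = (a * a + b * b) % MOD          # F(2k+1) = F(k)^2 + F(k+1)^2
    if n % 2 == 0:
        return (c, d)
    return (d, (c + d) % MOD)

def f(n):
    # the problem's f(n) is the (n + 1)th Fibonacci number
    return fib_pair(n + 1)[0]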
Now comes the extremely tricky part. No matter how I do the above computations, I still end up examining up to O(n^2) pairs when calculating the sum of all possible f(g(u, v)). There's the obvious improvement of only going up to n * (n - 1) / 2 pairs, but that's still quadratic.
What am I missing? I've been at it for several hours, but I can't see a way to get that sum without actually producing a quadratic algorithm.
To know how many times the cost of a node X is to be included in the total sum, we divide the other nodes into 3 (or more) groups:
the subtree A connected to the left of X
the subtree B connected to the right of X
(subtrees C, D... if the tree is not binary)
all other nodes Y, connected through X's parent
When two nodes belong to different groups, their simple path goes through X. So the number of simple paths that go through X is:
#Y + #A × (N - #A) + #B × (N - #B)
So by counting the total number of nodes N, and the size of the subtrees under X, you can calculate how many times the cost of node X should be included in the total sum. Do this for every node and you have the total cost.
The code for this could be straightforward. I'll assume that the total number of nodes N is known, and that you can add properties to the nodes (both of these assumptions simplify the algorithm, but it can be done without them).
We'll add a child_count to store the number of descendants of the node, and a path_count to store the number of simple paths that the node is part of; both are initialised to zero.
For each node, starting from the root:
If not all children have been visited, go to an unvisited child.
If all children have been visited (or node is leaf):
Increment child_count.
Increase path_count by N - child_count.
Add this node's path_count × cost to the total cost.
If the current node is the root, we're done; otherwise:
Increase the parent node's child_count by this node's child_count.
Increase the parent node's path_count by this node's child_count × (N - child_count).
Go to the parent node.
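A minimal sketch of that traversal, assuming 1-indexed nodes, an adjacency list adj, and a cost mapping (the names are mine). Note that this computes the weighted sum of the raw costs g(u, v) as described; the Fibonacci step still has to be layered on top:

def total_path_cost(adj, cost, root, N):
    # discovery order with parents, so reversing it visits children first
    parent = {root: None}
    order = [root]
    for u in order:
        for v in adj[u]:
            if v != parent[u]:
                parent[v] = u
                order.append(v)
    child_count = {u: 0 for u in order}  # subtree size, counting the node itself
    path_count = {u: 0 for u in order}   # simple paths this node is part of
    total = 0
    for u in reversed(order):
        child_count[u] += 1
        path_count[u] += N - child_count[u]
        total += path_count[u] * cost[u]
        p = parent[u]
        if p is not None:
            child_count[p] += child_count[u]
            path_count[p] += child_count[u] * (N - child_count[u])
    return total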
The following algorithm's running time is O(n^3).
A tree is a connected graph without cycles, so the simple path between any two nodes is unique and is also the shortest path between them. So when we want to get the costs of all possible pairs, we can find the shortest paths for all pairs. Thus, we can use Dijkstra's idea and a dynamic programming approach for this problem (I took it from Weiss's book). Then we apply the Fibonacci function to each cost, assuming that we already have a lookup table.
Dijkstra's idea: We start from the root and search all simple paths from the root to all other nodes, and then do the same for the other vertices of the graph.
Dynamic programming approach: We use a 2D matrix D[][] to represent the lowest cost (equivalently, the path cost) between node i and node j. Initially, D[i][i] is already set. If nodes i and j are parent and child, D[i][j] = g(i, j), the cost between them. If going through node k yields a lower cost for the pair (i, j), we update D[i][j]: D[i][j] = D[i][k] + D[k][j] if D[i][k] + D[k][j] < D[i][j], else D[i][j] stays as it is.
When done, we check the D[][] matrix, apply the Fibonacci function to each cell, add them up, and take the result modulo 10^9 + 7.
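A sketch of that DP (my own transcription, not a tested solution). One detail the prose glosses over: since costs sit on the nodes, cost[k] appears in both halves of a concatenated path and has to be subtracted once.

def all_pairs_fib_sum(adj, cost, n, fib):
    # adj: 0-indexed adjacency lists; fib: a fast Fibonacci lookup, e.g. a table
    INF = float('inf')
    D = [[INF] * n for _ in range(n)]
    for i in range(n):
        D[i][i] = cost[i]
        for j in adj[i]:
            D[i][j] = cost[i] + cost[j]
    for k in range(n):                 # Floyd-Warshall-style relaxation
        for i in range(n):
            for j in range(n):
                via = D[i][k] + D[k][j] - cost[k]   # cost[k] was counted twice
                if via < D[i][j]:
                    D[i][j] = via
    return sum(fib(D[i][j]) for i in range(n) for j in range(n)) % (10**9 + 7)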
Related
Given a tree with n vertices, each vertex has a special value C_v. A straight path of length k >= 1 is defined as a sequence of vertices v_1, v_2, ..., v_k such that every two consecutive elements of the sequence are connected by an edge and all vertices v_i are distinct. A straight path need not contain any edges; in other words, for k = 1, a sequence containing a single vertex is also a straight path. A function S is defined: for a given straight path v_1, v_2, ..., v_k we get S(v_1, v_2, ..., v_k) = Cv_1 - Cv_2 + Cv_3 - Cv_4 + ...
Calculate the sum of the values of the function S for all straight paths in the tree. Since the result may be very large, give its remainder when divided by 10^9 + 7.
Paths are treated as directed. For example: paths 1 -> 2 -> 4 and 4 -> 2 -> 1 are treated as two different paths and for each one separately the value of the function S should be taken into account in the result.
My implementation is as follows:
from collections import deque

def S(path):
    # alternating sum Cv_1 - Cv_2 + Cv_3 - ...
    total, sign = 0, 1
    for node in path:
        total += values[node - 1] * sign
        sign *= -1
    return total

def search(graph):
    global total
    for node in range(1, n + 1):
        # BFS from `node`; in a tree each vertex is reached by a unique path
        queue = deque([(node, [node])])
        visited = set()
        while queue:
            current_node, path = queue.popleft()
            if current_node in visited:
                continue
            visited.add(current_node)
            total += S(path)
            for neighbor in graph[current_node]:
                queue.append((neighbor, [*path, neighbor]))

n = int(input())
values = list(map(int, input().split()))
graph = {i: [] for i in range(1, n + 1)}
total = 0
for i in range(n - 1):
    a, b = map(int, input().split())
    graph[a].append(b)
    graph[b].append(a)
search(graph)
print(total % 1000000007)
The execution of the code takes too long for bigger graphs. Can you suggest ways to speed up the code?
It is a tree. Therefore all straight paths start somewhere, travel up some distance (maybe 0), hit a peak, then turn around and go down some distance (maybe 0).
First calculate, for each node, the sum and count of all odd-length paths rising to it from its children, and the sum and count of all even-length paths. This can be done through dynamic programming.
Armed with that, we can calculate for each node the sum of all paths with that node as a peak. (Calculate the sum of all paths that rise to the peak and then fall. For each child, subtract out the sum of all paths that rose into that child and then fell back within that same child.)
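Here is one way that outline can be turned into code (a sketch of my own, so treat it as illustrative; names like straight_path_sum are made up). Instead of anchoring the alternating sign at the start of the path, it anchors it at the peak: for a path whose rising segment has r vertices, S = (-1)^(r+1) * (U_rise + U_fall - C_peak), where U is the alternating sum evaluated downward from the peak. That turns the problem into a per-node DP over counts and sums of downward paths, split by length parity, with the same-subtree pairs subtracted per child as described above.

MOD = 10**9 + 7

def straight_path_sum(n, values, adj):
    # nodes are 1..n; values[v] = C_v; adj = adjacency lists
    parent = {1: None}
    order = [1]                       # BFS order; reversed = children first
    for u in order:
        for w in adj[u]:
            if w != parent[u]:
                parent[w] = u
                order.append(w)
    assert len(order) == n            # connected tree
    # cnt[v][q], usum[v][q]: count / peak-anchored alternating sum of all
    # downward paths starting at v whose vertex count is congruent to q mod 2
    cnt = {v: [0, 0] for v in order}
    usum = {v: [0, 0] for v in order}

    def pairs(c1, s1, c2, s2, cv):
        # sum of S over ordered (rising, falling) segment pairs at this peak
        t = 0
        for q1 in (0, 1):
            sign = 1 if q1 == 1 else -1       # (-1)^(r+1) by parity of r
            for q2 in (0, 1):
                t += sign * (s1[q1] * c2[q2] + c1[q1] * s2[q2]
                             - c1[q1] * c2[q2] * cv)
        return t

    total = 0
    for v in reversed(order):
        cv = values[v]
        cnt[v][1], usum[v][1] = 1, cv % MOD   # the single-vertex path (v)
        per_child = []
        for w in adj[v]:
            if w == parent[v]:
                continue
            bc, bs = [0, 0], [0, 0]
            for q in (0, 1):                  # prepend v: parity and sign flip
                bc[q ^ 1] = cnt[w][q]
                bs[q ^ 1] = (cnt[w][q] * cv - usum[w][q]) % MOD
            per_child.append((bc, bs))
            for q in (0, 1):
                cnt[v][q] += bc[q]
                usum[v][q] = (usum[v][q] + bs[q]) % MOD
        peak = pairs(cnt[v], usum[v], cnt[v], usum[v], cv)
        for bc, bs in per_child:      # rise and fall in the same subtree: invalid
            peak -= pairs(bc, bs, bc, bs, cv)
        total = (total + peak) % MOD
    return total % MOD

Every directed path is counted exactly once here, at its highest node (the LCA of its endpoints), which plays the role of the peak.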
I came across this problem while taking a sample test. We are given an undirected tree. We can start from any node of our choice. Initially we have power P, and while going from one node to another we lose some power X (consider it the cost of travelling) and earn some profit Y.
So the question is: what is the maximum profit we can earn with a given power?
Example: the first line contains the number of nodes and the initial power.
The next n-1 lines each contain node, node, cost, profit:
5 4
1 2 1 2
1 3 2 3
1 4 2 4
4 5 2 2
Answer => 7. We can start from 4, go to 1, and then to 3.
I have applied DFS to compute the profit earned by traversing every single path.
But is there a way to decrease the time?
from collections import defaultdict

class tree:
    def __init__(self, nodes):
        self.nodes = nodes
        self.graph = defaultdict(list)

    def add(self, a, b, charge, profit):
        self.graph[a].append([b, charge, profit])
        self.graph[b].append([a, charge, profit])

    def start(self, power):
        # try every node as the starting point
        maxi = -1
        visited = [False for i in range(self.nodes)]
        for i in range(1, self.nodes + 1):
            powers = power
            visited[i - 1] = True
            for j in self.graph[i]:
                temp = self.dfs(j, powers, 0, visited)
                if temp > maxi:
                    maxi = temp
            visited[i - 1] = False
        return maxi

    def dfs(self, node, powers, profit, visited):
        v, p, pro = node[0], node[1], node[2]
        if powers - p < 0:      # cannot afford this edge
            return 0
        if powers - p == 0:     # power exhausted exactly on this edge
            return profit + pro
        profit += pro
        powers = powers - p
        visited[v - 1] = True
        tempo = profit
        for k in self.graph[v]:
            if visited[k[0] - 1] == False:
                temp = self.dfs(k, powers, tempo, visited)
                if temp > profit:
                    profit = temp
        visited[v - 1] = False
        return profit

t = tree(5)
t.add(1, 2, 1, 2)
t.add(1, 3, 2, 3)
t.add(1, 4, 2, 4)
t.add(4, 5, 2, 2)
print(t.start(4))
You want to find all paths that have length up to P and take maximum of their profits. You can achieve it in O(n log^2 n) time using centroid decomposition.
Consider all the subtrees that you create by deleting a centroid C from the tree. Let's say you have already found all paths of length less than or equal to P that avoid C and taken the maximum of them (now you'll only consider paths that contain C). Using DFS, calculate the distance and profit from C to every other node in the tree and store them as (distance, profit) pairs in a multiset.
For each subtree do:
delete from the multiset the pair of every node in that subtree - O(n log n)
copy all the pairs from the multiset to a list L1 - O(n)
create a list L2 of (distance, profit) pairs from the current subtree and sort it by distance in decreasing order - O(n log n)
create variable maxx = 0 and i = 0
for each pair X in L2:
while i < len(L1) and L1[i].distance <= P - X.distance do: maxx = max(maxx, L1[i].profit), ++i (L1 comes from the multiset, so it is sorted by increasing distance and i only ever moves forward)
result = max(result, maxx + X.profit)
all of it will take at most O(n)
insert all pairs from L2 back to multiset - O(n log n)
Time complexity for one layer: O(n log n)
Now you have calculated maximum profit of all paths of length equal or less P in the tree. To get all the values in subtrees run the same algorithm recursively. Since there are at most O(log n) layers using centroid decomposition total complexity is O(n log^2 n).
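For reference, the centroid-finding step alone might look like this (a sketch of mine, assuming an adjacency list and a "removed" set marking already-deleted centroids; the names are my own):

def find_centroid(adj, root, removed):
    # subtree sizes via an iterative DFS that ignores removed nodes
    parent = {root: None}
    order = [root]
    for u in order:
        for v in adj[u]:
            if v != parent[u] and v not in removed:
                parent[v] = u
                order.append(v)
    size = {u: 1 for u in order}
    for u in reversed(order[1:]):
        size[parent[u]] += size[u]
    total = size[root]
    # walk towards the heavy child until no component exceeds total / 2
    u = root
    while True:
        heavy = None
        for v in adj[u]:
            if v != parent[u] and v not in removed and 2 * size[v] > total:
                heavy = v
                break
        if heavy is None:
            return u
        u = heavy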
Each edge in the graph has weight 1. The graph may have cycles; if a node has a self-loop, it can be at any distance from itself, from 0 to infinity, depending on the number of times we take the self-loop.
I have solved the problem using BFS, but the constraint on distance is on the order of 10^9, so BFS is too slow.
We'll be asked multiple queries on a given graph, of the form
(distance, source)
and the output is the list of nodes that are exactly at the given distance from the source vertex.
Constraints
1 <= Nodes <= 500
1 < queries <= 500
1 <= distance <= 10^9
I have a feeling there are many repeated computations, since the number of nodes is small, but I am not able to figure out how to reduce the problem into smaller subproblems.
What is an efficient way to do this?
Edit: I have tried using matrix exponentiation, but it is too slow for the given constraints. The problem has a time limit of 1 second.
Let G = (V,E) be your graph, and define the adjacency matrix A as follows:
A[i][j] = 1 if (V[i], V[j]) is in E, and 0 otherwise
In this matrix, for each k:
(A^k)[i][j] > 0 if and only if there is a path from v[i] to v[j] of length exactly k.
This means that by creating this matrix and then calculating its exponent, you can easily get your answer.
For fast exponent calculation you can use exponentiation by squaring, which yields O(M(n) * log(k)), where M(n) is the cost of matrix multiplication for an n x n matrix.
This will also save you some calculation when looking for different queries on the same graph.
Appendix - claim proof:
Base: A^1 = A, and indeed, by definition, A[i][j] = 1 if and only if (V[i], V[j]) is in E.
Hypothesis: assume the claim is correct for all l<k
A^k = A^(k-1)*A. From induction hypothesis, A^(k-1)[i][j] > 0 iff there is a path of length k-1 from V[i] to V[j].
Let's examine two vertices v1,v2 with indices i and j.
If there is a path of length k between them, let it be v1->...->u->v2. Let the index of u be m.
From i.h. A^(k-1)[i][m] > 0 because there is a path. In addition A[m][j] = 1, because (u,v2) = (V[m],V[j]) is an edge.
A^k[i][j] = A^(k-1)*A[i][j] = A^(k-1)[i][1]A[1][j] + ... + A^(k-1)[i][m]A[m][j] + ... + A^(k-1)[i][n]A[n][j]
And since A[m][j] > 0 and A^(k-1)[i][m] > 0, then A^(k-1)*A[i][j] > 0
If there is no such path, then for each vertex u such that (u, v2) is an edge, there is no path of length k-1 from v1 to u (otherwise v1 -> ... -> u -> v2 would be a path of length k).
Then, using induction hypothesis we know that if A^(k-1)[i][m] > 0 then A[m][j] = 0, for all m.
If we assign that in the sum defining A^k[i][j], we get that A^k[i][j] = 0
QED
Small note: Technically, A^k[i][j] is the number of paths between i and j of length exactly k. This can be proven similar to above but with a bit more attention to details.
To avoid the numbers growing too fast (which will increase M(n) because you might need big integers to store that value), and since you don't care for the value other than 0/1 - you can treat the matrix as booleans - using only 0/1 values and trimming anything else.
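Putting it together, a sketch in Python with boolean entries (my own; for n = 500 you would want bitsets or numpy rather than pure Python loops):

def bool_mat_mult(A, B):
    n = len(A)
    return [[int(any(A[i][k] and B[k][j] for k in range(n)))
             for j in range(n)] for i in range(n)]

def bool_mat_pow(A, k):
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    while k:                      # exponentiation by squaring
        if k & 1:
            R = bool_mat_mult(R, A)
        A = bool_mat_mult(A, A)
        k >>= 1
    return R

def nodes_at_distance(A, source, k):
    P = bool_mat_pow(A, k)
    return [j for j in range(len(A)) if P[source][j]]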
If there are cycles in your graph, then a node that lies on a cycle of length c and is first reached at distance d is also reachable at distance d + c*N for any N >= 0, because you can go around the cycle as many times as you wish.
That brings me to the idea: we can use the cycles to our advantage!
Using BFS while detecting a cycle, we calculate offset + cycle_length * N to get as close to our goal (K) as possible,
and then search for the K pretty easily.
e.g.
A -> B -> C -> D -> B
K = 1000;
S = A;
A - 0
B - 1 (+ 3N)
C - 2 (+ 3N)
D - 3 (+ 3N)
Here you check whether K - offset is divisible by the cycle length: 1000 - 1 = 999 = 3N gives N = 333, so B is at distance exactly 1000.
A simpler way would be to calculate (K - offset) mod cycle_length; if it is not 0, you land as close to K as possible and count only the few remaining steps from there.
In this example (as a regex): A(BCD){333}B
I am wondering about two questions that came up when studying about binary search trees. They are the following:
What is the maximum number of nodes in the bottom level of a balanced binary search tree with n nodes?
What is the minimum number of nodes in the bottom level of a balanced binary search tree with n nodes?
I cannot find any formulas regarding this in my textbook. Is there any way to answer these questions? Please let me know.
Using notation:
H = Balanced binary tree height
L = Total number of leaves in a full binary tree of height H
N = Total number of nodes in a full binary tree of height H
The relation is L = (N + 1) / 2, as demonstrated below. That is the maximum number of leaf nodes for a given tree height H. The minimum number of nodes at the bottom level is 1 (it cannot be zero, because then the tree height would be reduced by one).
Drawing trees with increasing heights, one can observe that:
H = 1, L = 1, N = 1
H = 2, L = 2, N = 3
H = 3, L = 4, N = 7
H = 4, L = 8, N = 15
...
The relation between tree height (H) and the total number of leaves (L)
and the total number of nodes (N) becomes apparent:
L = 2^(H-1)
N = (2^H) - 1
The correctness is easily proven using mathematical induction.
Examples above show that it is true for small H.
Simply put in the value of H (e.g. H=1) and compute L and N.
Assuming the formulas are true for some H, one can show they are also true for HH=H+1:
For L, the assumption is that L=2^(H-1) is true.
As each node has two children, increasing the height by one
is going to replace each leaf node with two new leaves, effectively
doubling the total number of leaves. Therefore, in case of HH=H+1,
the total number of leaves (LL) is going to be doubled:
LL = L * 2
= 2^(H-1) * 2
= 2^(H)
= 2^(HH-1)
For N, the assumption is that N=(2^H)-1 is true.
Increasing the height by one (HH=H+1) increases the total number
of nodes by the total number of added leaf nodes. Therefore,
NN = N + LL
= (2^H) - 1 + 2^(HH-1)
= 2^(HH-1) - 1 + 2^(HH-1)
= 2 * 2^(HH-1) - 1
= (2^HH) - 1
Applying the mathematical induction, the correctness is proven.
H can be expressed in terms of N:
N = (2^H) - 1 // +1 to both sides
N + 1 = 2^H // apply log2 monotone function to both sides
log2(N+1) = log2(2^H)
= H * log2(2)
= H
The direct relation between L and N (which is the answer to the question asked) is:
L = 2^(H - 1) // replace H = log2(N + 1)
= 2^(log2(N + 1) - 1)
= 2^(log2(N + 1) - log2(2))
= 2^(log2( (N + 1) / 2 ))
= (N + 1) / 2
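A quick sanity check of these formulas (a throwaway snippet of mine):

for H in range(1, 6):
    L = 2 ** (H - 1)
    N = 2 ** H - 1
    assert L == (N + 1) // 2   # matches the table above
    print(H, L, N)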
For Big O analysis, the constants are discarded, so the Binary Search Tree lookup time complexity (i.e. H with respect to the input size N) is O(log2(N)). Also, keeping in mind the formula for changing the logarithm base:
log2(N) = log10(N) / log10(2)
and discarding the constant factor 1/log10(2), where instead of 10 one can have an arbitrary logarithm base, the time complexity is simply O(log(N)) regardless of the chosen logarithm base constant.
Assuming that it's a full binary tree, the number of leaf nodes will always be equal to floor(n/2) + 1.
For the minimum number of nodes at the bottom level, it could be just 1 (still satisfying the condition that the tree is balanced).
I got the answers from my professor.
1) Maximum number of nodes at the last level: ⌈n/2⌉
If there is a balanced binary search tree with 7 nodes, then the answer would be ⌈7/2⌉ = 4 and for a tree with 15 nodes, the answer would be ⌈15/2⌉ = 8.
But what is troubling is the fact that this formula gives the right answer only when the last level of a balanced tree is completely filled from left to right.
For example, for a balanced binary search tree with 5 nodes, the above formula gives an answer of 3, which is not true, because a tree with 5 nodes can contain a maximum of 4 nodes at the last level. So I am guessing he meant a full balanced binary search tree.
2) Minimum number of nodes at the last level: 1
The maximum number of nodes at level L in a binary tree is 2^L (if you take the root to be level 0). This is easy to see, because at each level you spawn 2 children from each previous leaf. The fact that it is a balanced/search tree is irrelevant. So you have to find the biggest L such that 2^L - 1 < n; the full levels above the bottom hold 2^L - 1 nodes, so the bottom level holds the remaining n - (2^L - 1) nodes.
The minimum number of nodes depends on the way you balance your tree. There are height-balanced trees, weight-balanced trees, and, I assume, other balanced trees. Even with height-balanced trees you have to define what you mean by a balanced tree, because technically a tree of 2^N nodes that has a height of N + 2 is still a balanced tree.
So I wrote a function to find the k nodes of a graph that have the smallest degree. It looks like this:
def smallestKNodes(G, k):
    leastK = []   # kept sorted by degree, smallest first
    for i in range(G.GetMxNId()):
        # Produces an iterator to the node
        node = G.GetNI(i)
        for j in range(k):
            if j >= len(leastK):
                leastK.append(node)
                break
            elif node.GetDeg() < leastK[j].GetDeg():
                leastK.insert(j, node)
                leastK = leastK[0:k]
                break
    return leastK[0:k]
My problem is that when all the nodes have the same degree, it selects the same nodes every time. How can I make it so that it takes all the nodes with zero degree (or whatever the minimum is) and then selects the remaining nodes randomly?
Stipulations:
(1) Suppose k = 7. Then, if there are 3 nodes with degree 0 and 10 nodes with degree 1, I would like to choose all the nodes with degree 0, but randomly choose 4 of the nodes with degree 1.
(2) If possible I don't want to visit any node twice because there might be too many nodes to fit into memory. There might also be a very large number of nodes with minimum degree. In some cases there might also be a very small number of nodes.
Store all the nodes which satisfy your condition and randomly pick k nodes from it. You can do the random pick by shuffling the array (e.g. Fisher-Yates, std::shuffle, randperm, etc.) and picking the first k nodes (for example).
You might want to do two passes, the first pass to discover the relevant degree you have to randomize, how many nodes of that degree to choose, and the total number of nodes with that degree. Then, do a second pass on your nodes, choosing only those with the desired degree at random.
To choose k nodes of n total so each node has a fair probability (k/n), loop over relevant nodes, and choose each one with probability 1, 1, ..., 1, k/(k+1), k/(k+2), ..., k/n. When choosing a node, if k nodes are already chosen, throw one of them away at random.
import random

def randomNodesWithSpecificDegree(G, d, k, n):
    # reservoir-sample k of the n nodes that have degree d
    result = []
    examined = 0
    for i in range(G.GetMxNId()):
        # Produces an iterator to the node
        node = G.GetNI(i)
        if node.GetDeg() == d:
            examined = examined + 1
            if len(result) < k:
                result.append(node)
            elif random.random() < k / examined:
                # replace a random previously chosen node
                result[random.randrange(k)] = node
    assert examined == n
    return result
This code is good when k is small and n is big (which seems to be your case).