Center node in a tree (in a minimum sum of distances sense) - algorithm

My problem is the following:
Given a tree (V, E), find the center node v such that sum{w in V}[dist(v,w)] is minimum, where dist(v,w) is the number of edges in shortest path from v to w. The algorithm should run in O(n) time (n being the number of nodes in a tree).
The questions here and here also ask for the center node but define it differently.
I haven't rigorously gone through the steps but I actually think that the solution to my problem should be similar to the solution of this problem.
However, I decided that I should share my problem with the community as it took me a while to navigate to the link, which however does not answer the question directly.

I came up with this solution:
1) Choose an arbitrary node as a root r, form a tree. For each subtree in this tree, calculate number of nodes in a subtree (the leaves are single-node-trees).
As an example for this tree
1
/ | \
2 3 4
/ \ \
5 6 7
/ \
8 9
the result would be
9
/ | \
5 1 2
/ \ \
1 3 1
/ \
1 1
2) Calculate the sum of distances for this chosen root. For the example, if you choose vertex 1 as a root, the sum of distances is
0 + 1 + 1 + 1 + 2 + 2 + 2 + 3 + 3 = 15
3) Traverse the tree in a depth-first-search manner. For example, starting from vertex 1, we traverse to vertex 4. We observe that for 7 nodes (1,2,3,5,6,8,9), we are getting further by 1 (add 7=9-2 to the score), for other 2 (4,7), we are getting closer by 1 (subtract 2). This gives the sum-of-distances equal to 15+(9-2)-2 = 20.
Suppose we traverse from 4 to 7 next. Now we get the sum of distances equal to 20+(9-1)-1 = 27 (getting further from 8 vertices, and getting closer to 1 vertex).
As another example if we traverse from 1 to 2, we get a sum of distances equal to 15+(9-5)-5 = 14. Vertex 2 is actually the solution for this example.
This would be my algorithm.

Each edge e={a,b} has the following properties:
a_count = number of nodes to a side (including a)
b_count = number of nodes to b side (including b)
a_sum = sum of distances from a to its subtree nodes
b_sum = sum of distances from b to its subtree nodes
a_count for node e={a,b} can be evaluated as following:
* get all edges of a, not including e, sum their a_count
* add 1 to the sum
a_sum for node e={a,b} can be evaluated as following:
* get all edges of a, not including e, sum their a_sum
* add a_count (it includes +1 for each enumerated edge and +1 for a)
You can freely do calculation in recursive function accepting node and direction parameters, saving obtained results in global array.
If you run this function on every edge of tree in both directions, you get full calculation for edges. Total time for all calculations is O(n), since once you get to some subtree, recursive nature will close the whole subtree in this direction and next calls will obtain result from global array, and you only do 2*n calls for your function.
For a node A final measure is sum of all B_count+B_sum of all edges connected to node. Do one run of this evaluation on nodes and select node with minimal value.

Related

Minimum cost to make connect all nodes in one partition of a weighted undirected bipartite graph

Given a weighted undirected bipartite graph between partition A and B, I'm trying to find the minimum cost to connect all nodes in partition A.
It seems like I need to first find a Minimum Spanning Tree (MST) of the whole graph and then exclude edges that are unnecessary (since I don't need to connect nodes in partition B). What I thought was to remove nodes in partition B with degree of 1 (only connected to 1 node in partition A) from the MST, but this method gave sub-optimal in some cases.
Hope somebody can help me on this.
Input format:
1st line is N and M - size of partition A and B (1 < N, M < 1000)
Next is the cost matrix of size N x M. The value at row i and column j would be the cost to connect A[i] to B[j]
Output format:
The minimal cost to connect all nodes in A
Sample input:
3 4
1 2 3 7
1 3 1 2
3 4 1 7
Sample output:
4
Explanation:
Connect A[1] to B[1], A[2] to B[1], A[2] to B[3], and A[3] to B[3].This will connect all three nodes in A with the minimal cost of 4 in total.
Partitions: A and B
Nodes: m and n (m,n < 1000)
We are trying to connect all nodes in A
Step 1: Create an exhaustive list of pairs of A-nodes, with min cost required to connect them. This will be (max) mC2 = m * (m-1) / 2 elements long list. For the example in the post, this will look like:
[(A1,A2,2), (A1,A3,4), (A2,A3,2)]
This is essentially a new graph with nodes only from A and respective edge weights. The easiest way to accomplish creating this is O(m^2 * n)
Step 2: This becomes your weighted graph for which you can apply MST logic to get to your final answer which will be O(m^2) solution (since number of edges ~m^2 >> number of nodes =m)

Calculate the number of nodes on either side of an edge in a tree

A tree here means an acyclic undirected graph with n nodes and n-1 edges. For each edge in the tree, calculate the number of nodes on either side of it. If on removing the edge, you get two trees having a and b number of nodes, then I want to find those values a and b for all edges in the tree (ideally in O(n) time).
Intuitively I feel a multisource BFS starting from all the "leaf" nodes would yield an answer, but I'm not able to translate it into code.
For extra credit, provide an algorithm that works in any general graph.
Run a depth-first search (or a breadth-first search if you like it more) from any node.
That node will be called the root node, and all edges will be traversed only in the direction from the root node.
For each node, we calculate the number of nodes in its rooted subtree.
When a node is visited for the first time, we set this number to 1.
When the subtree of a child is fully visited, we add the size of its subtree to the parent.
After this, we know the number of nodes on one side of each edge.
The number on the other side is just the total minus the number we found.
(The extra credit version of your question involves finding bridges in the graph on top of this as a non-trivial part, and thus deserves to be asked as a separate question if you are really interested.)
Consider the following tree:
1
/ \
2 3
/ \ | \
5 6 7 8
If we cut the edge between node 1 and 2, The tree will surely split into two tree because there is only one unique edge between two nodes according to tree property:
1
\
3
| \
7 8
and
2
/ \
5 6
So, now a is the number of nodes rooted at 1 and b is number of nodes rooted at 2.
> Run one DFS considering any node as root.
> During DFS, for each node x, calculate nodes[x] and parent[x] where
nodes [x] = k means number of nodes of sub-tree rooted at x is k
parent[x] = y means y is parent of x.
> For any edge between node x and y where parent[x] = y:
a := nodes[root] - nodes[x]
b := nodes[x]
Time and space complexity both O(n).
Note that n=b-a+1. Due to this, you don't need to count both sides of the edge. This greatly simplifies things. A normal recursion over the nodes starting from the root is enough. Since your tree is undirected you don't really have a "root", just pick one of the leaves.
What you want to do is to "go down" the tree until you reach the bottom. Then you count backwards from there. The leaf returns 1, and each recursive step sums the return values for each edge and then increment by 1.
Here is the Java code. Function countEdges() takes in the adjacency list of the tree as an argument also current node and the parent node of the current node(here parent node means that current node was introduced by parent node in this DFS).
Here edge[][] stores the number of nodes on one side of the edge[i][j], obviously the number of nodes on the other side will be equal to (total nodes - edge[i][j]).
int edge[][];
int countEdges(ArrayList<Integer> adj[], int cur, int par) {
// If current nodes is leaf node and is not the node provided by the calling function then return 1
if(adj[cur].size() == 1 && par != 0) return 1;
int count = 1;
// count the number of nodes recursively for each neighbor of current node.
for(int neighbor: adj[cur]) {
if(neighbor == par) continue;
count += countEdges(adj, neighbor, cur);
}
// while returning from recursion assign the result obtained in the edge[][] matrix.
return edge[par][cur] = count;
}
Since we are visiting each node only once in the DFS time complexity should be O(V).

Find max subset of tree with max distance not greater than K

I run into a dynamic programming problem on interviewstreet named "Far Vertices".
The problem is like:
You are given a tree that has N vertices and N-1 edges. Your task is
to mark as small number of verices as possible so that the maximum
distance between two unmarked vertices be less than or equal to K. You
should write this value to the output. Distance between two vertices i
and j is defined as the minimum number of edges you have to pass in
order to reach vertex i from vertex j.
I was trying to do dfs from every node of the tree, in order to find the max connected subset of the nodes, so that every pair of subset did not have distance more than K.
But I could not define the state, and transitions between states.
Is there anybody that could help me?
Thanks.
The problem consists essentially of finding the largest subtree of diameter <= k, and subtracting its size from n. You can solve it using DP.
Some useful observations:
The diameter of a tree rooted at node v (T(v)) is:
1 if n has no children,
max(diameter T(c), height T(c) + 1) if there is one child c,
max(max(diameter T(c)) for all children c of v, max(height T(c1) + height T(c2) + 2) for all children c1, c2 of v, c1 != c2)
Since we care about maximizing tree size and bounding tree diameter, we can flip the above around to suggest limits on each subtree:
For any tree rooted at v, the subtree of interest is at most k deep.
If n is a node in T(v) and has no children <= k away from v, its maximum size is 1.
If n has one child c, the maximum size of T(n) of diameter <= k is max size T(c) + 1.
Now for the tricky bit. If n has more than one child, we have to find all the possible tree sizes resulting from allocating the available depth to each child. So say we are at depth 3, k = 7, we have 4 depth left to play with. If we have three children, we could allocate all 4 to child 1, 3 to child 1 and 1 to child 2, 2 to child 1 and 1 to children 2 and 3, etc. We have to do this carefully, making sure we don't exceed diameter k. You can do this with a local DP.
What we want for each node is to calculate maxSize(d), which gives the max size of the tree rooted at that node that is up to d deep that has diameter <= k. Nodes with 0 and 1 children are easy to figure this for, as above (for example, for one child, v.maxSize(i) = c.maxSize(i - 1) + 1, v.maxSize(0) = 1). Nodes with 2 or more children, you compute dp[i][j], which gives the max size of a k-diameter-bound tree using up to the ith child taking up to j depth. The recursion is dp[i][j] = max(child(i).maxSize(m - 1) + dp[i - 1][min(j, k - m)] for m from 1 to j. d[i][0] = 1. This says, try giving the ith child 1 to j depth, and give the rest of the available depth to the previous nodes. The "rest of the available depth" is the minimum of j, the depth we are working with, or k - m, because depth given to child i + depth given to the rest cannot exceed k. Transfer the values of the last row of dp to the maxSize table for this node. If you run the above using a depth-limited DFS, it will compute all the necessary maxSize entries in the correct order, and the answer for node v is v.maxSize(k). Then you do this once for every node in the tree, and the answer is the maximum value found.
Sorry for the muddled nature of the explanation. It was hard for me to think through, and difficult to describe. Working through a few simple examples should make it clearer. I haven't calculated the complexity, but n is small, and it went through all the test cases in .5 to 1s in Scala.
A few basic things I can notice (maybe very obvious to others):
1. There is only one route possible between two given vertices.
2. The farthest vertices would be the one with only one outgoing edge.
Now to solve the issue.
I would start with the set of Vertices that have only one edge and call them EDGE[] calculate the distances between the vertices in EDGE[]. This will give you (EDGE[i],EDGE[j],distance ) value pairs
For all the vertices pairs in EDGE that have a distance of > K, DO EDGE[i].occur++,EDGE[i].distance = MAX(EDGE[i].distance, distance)
EDGE[j].occur++,EDGE[j].distance = MAX(EDGE[j].distance, distance)
Find the CANDIDATES in EDGE[] that have max(distance) from those Mark the with with max (occur)
Repeat till all edge vertices pair have distance less then or equal to K

What is the minimum sized AVL tree where a deletion causes 2 rotations?

It is well known that deletion from an AVL tree may cause several nodes to eventually be unbalanced. My question is, what is the minimum sized AVL tree such that 2 rotations are required (I'm assuming a left-right or right-left rotation is 1 rotation)? I currently have an AVL tree with 12 nodes where deletion would cause 2 rotations. My AVL tree is inserting in this order:
8, 5, 9, 3, 6, 11, 2, 4, 7, 10, 12, 1.
If you delete the 10, 9 becomes unbalanced and a rotation occurs. In doing so, 8 becomes unbalanced and another rotation occurs. Is there a smaller tree where 2 rotations are necessary after a deletion?
After reading jpalecek's comment, my real question is: Given some constant k, what is the minimum sized AVL tree that has k rotations after 1 deletion?
A tree of four nodes requires a single rotation in the worst case. The worst case number of deletions increases with each term in the list: 4, 12, 33, 88, 232, 609, 1596, 4180, 10945, 28656, ...
This is Sloane's A027941 and is a Fibonacci-type sequence that can be generated with N(i)=1+N(i-1)+N(i-2) for i>=2, N(1)=2, N(0)=1.
To see why this is so, first note that rotating an imbalanced AVL tree reduces its height by one because its shorter leg is lengthened at the expense of its longer leg.
When a node is removed from an AVL tree, the AVL algorithm checks all of the removed node's ancestors for potential rebalancing. Therefore, to answer your question we need to identify trees with the minimum number of nodes for a given height.
In such a tree every node is either a leaf or has a balance factor of +1 or -1: if a node had a balance factor of zero this would mean that a node could be removed without triggering a rebalancing. And we know rebalancing makes a tree shorter.
Below, I show a set of worst-case trees. You can see that following the first two trees in the sequence, each tree is constructed by joining the previous two trees. You can also see that every node in each tree is either a leaf or has a non-zero balance factor. Therefore, each tree has the maximum height for its number of nodes.
For each tree, a removal in the left subtree will, in the worst case, cause rotations which ultimately reduce the height of that subtree by one. This balances the tree as a whole. On the other hand, removing a node from the right subtree may ultimately imbalance the tree resulting in a rotation of the root. Therefore, the right subtrees are of prime interest.
You can verify that Tree (c) and Tree (d) have one rotation upon removal, in the worst case.
Tree (c) appears as a right subtree in Tree (e) and Tree (d) as a right subtree in Tree (f). When a rotation is triggered in Tree (c) or (d) this shortens the trees resulting in a root rotation in Trees (d) and (f). Clearly, the sequence continues.
If you count the number of nodes in the trees this matches my original statement and completes the proof.
(In the trees below removing the highlighted node will result in a new maximum number of rotations.)
I am not good at proofs, and I'm sure the below is full of holes, but maybe it will spark something positive.
To effect k rotations on a minimized AVL tree following the deletion of a node, the following conditions must be met:
The target node must exist in a 4-node sub-tree.
The target node must either be on the short branch, or must be the root of the sub-tree and be replaced by the leaf of the short branch.
Each node in the ancestry of the root of the target sub-tree must be slightly out of balance (balance factor of +/-1). That is - when a balance factor of 0 is encountered, the rotation chain will cease.
The height and number of nodes of the minimized tree is calculated with the following equations.
Let H(k) = the minimum height of the tree affected by k rotations.
H(k) = 2k + 1, k > 0
Let N(h) = the number of nodes in a (min-node) AVL tree of height h.
N(0) = 0
N(1) = 1
N(h) = N(h-1) + N(h-2) + 1, h > 1
Let F(k) = the minimum number of nodes in the tree affected by k rotations.
F(k) = N(H(k))
(e.g:)
k = 1, H(k) = 4, N(4) = 7
k = 2, H(k) = 6, N(6) = 20
Proof (such as it is)
Minimum Height
A deletion can only cause a rotation for trees with 4 or more nodes.
A tree of 1 node must have a balance factor of 0.
A tree of 2 nodes must have a balance factor of +/-1, and deletion leads to a balanced tree of 1 node.
A tree of 3 nodes must have a balance factor of 0. Removal of a node results in a balance factor of +/-1 and no rotation occurs.
Therefore, deletion from a tree with fewer than 4 nodes can not result in a rotation.
The smallest sub-tree for which 1 rotation occurs on delete is 4 nodes, which has height of 3. Removal of the node in the short side will result in rotation. Likewise, removal of the root node, using the node on the short side as replacement will cause a rotation. It doesn't matter how the tree is configured:
B B Removal of A or replacement of B with A
/ \ / \ results in rotation. No rotation occurs
A C A D on removal of C or D, or on replacement
\ / of B with C.
D C
C C Removal of D or replacement of C with D
/ \ / \ results in rotation. No rotation occurs
B D A D on removal of A or B, or on replacement
/ \ of C with B.
A B
Deletion from a 4 node tree results in a balanced tree of height 2.
.
/ \
. .
To effect a second rotation, the target tree must have a sibling of height 4, so that the balance factor of the root is +/-1 (and therefore has a height of 5). It doesn't matter if the affected tree is on the right or left of the parent, nor is the layout of the sibling tree important (that is, the H3 child of H4 can be on the left or right, and can be any of the 4 orientations above while the H2 child can be either of the 2 possible orientations - this needs proving).
_._ _._
/ \ / \
(H4) . . (H4)
/ \ / \
. . . .
\ \
. .
It is clear that the third rotation requires that the grandparent of the affected tree be likewise imbalanced by +/-1, and the fourth requires the great-grandparent be imbalanced by +/-1, and so on.
By definition, the height of a sub-tree is the maximum height of each branch plus one for the root. One sibling must be 1 taller than the other to achieve the +/-1 imbalance in the root.
H(1) = 3 (as observed above)
H(k) = 1 + max(H(k - 1), H(k - 1) + 1)) = 1 + H(k - 1) + 1 = H(k - 1) + 2
... Inductive proof leading to H(k) = 2k + 1 eludes me.
Minimum Nodes
By definition, the number of nodes in a sub-tree is the number of nodes in the left branch plus the number of nodes in the right branch plus 1 for the root.
Also be definition, a tree of height 0 must have 0 nodes, and a tree of height 1 must have no branches and thus 1 node.
It was shown above that the one branch must be one shorter than the other.
Let N(h) = minimum number of nodes required to create a tree of height h:
N(0) = 0
N(1) = 1
// the number of nodes in the two subtrees plus the root
N(h) = N(h-1) + N(h-2) + 1
Corollary
The minimum number of nodes is not necessarily the maximum in large trees. To wit:
Delete A from the following tree and observe that the height doesn't change following rotation. Therefore, the balance factor in the parent would not change and no additional rotation would occur.
B B D
/ \ \ / \
A D => D => B E
/ \ / \ \
C E C E C
However, in the k = 2 case, it does not matter if H(4) is minimized here - the second rotation will still occur.
_._ _._
/ \ / \
(H4) . . (H4)
/ \ / \
. . . .
\ \
. .
Questions
What is the position of the target sub-tree? Clearly for k = 1, it is the root, and for k = 2, it is the left if the root's balance factor is -1 otherwise the right. Is there a formula for determining position for k >= 3?
What is the maximum nodes a tree can contain to effect k rotations? Is it possible to have an intermediate node in the ancestry that is not rotated, though its parent is?

Counting Treaps

Consider the problem of counting the number of structurally distinct binary search trees:
Given N, find the number of structurally distinct binary search trees containing the values 1 .. N
It's pretty easy to give an algorithm that solves this: fix every possible number in the root, then recursively solve the problem for the left and right subtrees:
countBST(numKeys)
if numKeys <= 1
return 1
else
result = 0
for i = 1 .. numKeys
leftBST = countBST(i - 1)
rightBST = countBST(numKeys - i)
result += leftBST * rightBST
return result
I've recently been familiarizing myself with treaps, and I posed the following problem to myself:
Given N, find the number of distinct treaps containing the values 1 .. N with priorities 1 .. N. Two treaps are distinct if they are structurally different relative to EITHER the key OR the priority (read on for clarification).
I've been trying to figure out a formula or an algorithm that can solve this for a while now, but I haven't been successful. This is what I noticed though:
The answers for n = 2 and n = 3 seem to be 2 and 6, based on me drawing trees on paper.
If we ignore the part that says treaps can also be different relative to the priority of the nodes, the problem seems to be identical to counting just binary search trees, since we'll be able to assign priorities to each BST such that it also respects the heap invariant. I haven't proven this though.
I think the hard part is accounting for the possibility to permute the priorities without changing the structure. For example, consider this treap, where the nodes are represented as (key, priority) pairs:
(3, 5)
/ \
(2, 3) (4, 4)
/ \
(1, 1) (5, 2)
We can permute the priorities of both the second and third levels while still maintaining the heap invariant, so we get more solutions even though no keys switch place. This probably gets even uglier for bigger trees. For example, this is a different treap from the one above:
(3, 5)
/ \
(2, 4) (4, 3) // swapped priorities
/ \
(1, 1) (5, 2)
I'd appreciate if anyone can share any ideas on how to approach this. It seemed like an interesting counting problem when I thought about it. Maybe someone else thought about it too and even solved it!
Interesting question! I believe the answer is N factorial!
Given a tree structure, there is exactly one way to fill in the binary search tree key values.
Thus all we need to do is count the different number of heaps.
Given a heap, consider an in-order traversal of the tree.
This corresponds to a permutation of the numbers 1 to N.
Now given any permutation of {1,2...,N}, you can construct a heap as follows:
Find the position of the largest element. The elements to its left form the left subtree and the elements to its right form the right subtree. These subtrees are formed recursively by finding the largest element and splitting there.
This gives rise to a heap, as we always choose the max element and the in-order traversal of that heap is the permutation we started with. Thus we have a way of going from a heap to a permutaion and back uniquely.
Thus the required number is N!.
As an example:
5
/ \
3 4 In-order traversal -> 35142
/ \
1 2
Now start with 35142. Largest is 5, so 3 is left subtree and 142 is right.
5
/ \
3 {142}
In 142, 4 is largest and 1 is left and 2 is right, so we get
5
/ \
3 4
/ \
1 2
The only way to fill in binary search keys for this is:
(2,5)
/ \
(1,3) (4,4)
/ \
(3,1) (5,2)
For a more formal proof:
If HN is the number of heaps on 1...N, then we have that
HN = Sum_{L=0 to N-1} HL * HN-1-L * (N-1 choose L)
(basically we pick the max and assign to root. Choose the size of left subtree, and choose that many elements and recurse on left and right).
Now,
H0 = 1
H1 = 1
H2 = 2
H3 = 6
If Hn = n! for 0 ≤ n ≤ k
Then HK+1 = Sum_{L=0 to K} L! * (K-L)! * (K!/L!*(K-L)!) = (K+1)!
def countBST(numKeys:Long):Long = numKeys match {
case 0L => 1L
case 1L => 1L
case _ => (1L to numKeys).map{i=>countBST(i-1) * countBST(numKeys-i)}.sum
}
You didn't actually define structural similarity for treaps -- you just gave examples. I'm going to assume the following definition: two trees are structurally different if and only if they have a different shape, or there exist nodes a (from tree A) and b (from tree B) such that a and b are in the same position, and the priorities of the children of a are in the opposite order of the priorities of the children of b. (It's obvious that if two treaps on the same values have the same shape, then the values in corresponding nodes are the same.)
In other words, if we visualize two trees by just giving the priorities on the nodes, the following two trees are structurally similar:
7 7
6 5 6 5
4 3 2 1 2 1 4 3 <--- does not change the relative order
of the children of any node
6's left child is still greater than 6's right child
5's left child is still greater than 5's right child
but the following two trees are structurally different:
7 7
5 6 6 5 <--- changes the relative order of the children
4 3 2 1 4 3 2 1 of node 7
Thus for the treap problem, each internal node has 2 orderings, and these two orderings do not otherwise affect the shape of the tree. So...
def countTreap(numKeys:Long):Long = numKeys match {
case 0L => 1L
case 1L => 1L
case _ => 2 * countBST(numKeys-1) + //2 situations when the tree has only 1 child
2 * (2L to (numKeys-1)).map{i=>countBST(i-1) * countBST(numKeys-i)}.sum
// and for each situation where the tree has 2 children, this node
// contributes 2 orderings the priorities of its children
// (which is independent of the shape of the tree below this level)
}

Resources