Select the subtrees containing exactly K leaves

I'm given a tree T which has n nodes and l leaves.
I have to select some subtrees which together contain exactly k (<= l) leaves. If I select the subtree of one of node t's ancestors, I cannot also select t's subtree.
For example:
This is the tree T which has 13 nodes (7 leaves).
If I want to select k = 4 leaves, I can select nodes 4 and 6 (or nodes 2 and 5). This is the minimum number of selections. (We could also select nodes 6, 7, 8 and 9, but that is not the minimum.)
If I want to select k = 5 leaves, I can select node 3, and this is the minimum number of selections.
I want to select the minimum number of subtrees. I can only find O(nk^2) and O(nk) algorithms, which use BFS and dynamic programming. Is there a better solution for this selection?
Thanks :)

Actually, to know the number of leaves in each subtree you only need to go through each node once. The cost is O(nm), where m is the mean number of children per node; since a tree with n nodes has only n - 1 child links in total, this is O(n). To do this, you should:
Find which nodes of your tree are leaves
Go up the tree, saving for each node the number of leaves in its subtree
You can do this by starting with the leaves and putting their parents in a queue. When you pop a node n_i from the queue, sum the number of leaves contained in the subtrees rooted at each of n_i's children (this assumes every child of n_i has already been counted). Once you're done, mark n_i as visited so you don't process it multiple times, since it can be added once per child.
This gives something like this:
^
|            f (3)              This node last
|           /     \
|          /       \
|         /         \
|        /           \
|     d (2)         e (1)       These nodes second
|     /   \          /
|    /     \        /
| a (1)   b (1)  c (1)          These nodes first
The steps would be:
Find leaves `a`, `b` and `c`.
For each leaf, add its parent to the queue # queue q = (d, d, e)
Pop d # queue q = (d, e)
Count leaves in subtree: d.leaves = a.leaves + b.leaves
Mark d as visited
Add parent to queue # queue q = (d, e, f)
Pop d # queue q = (e, f)
d is visited, do nothing
Pop e # queue q = (f)
Count leaves in subtree: e.leaves = c.leaves
Mark e as visited
Add parent to queue # queue q = (f, f)
Pop f # queue q = (f)
Count leaves in subtree: f.leaves = d.leaves + e.leaves
Mark f as visited
Add parent to queue (none)
Pop f # queue q = ()
f is visited, do nothing
You can also use a smarter data structure that ignores nodes added twice. Note that you can't use an ordered set, because it is very important that you explore "lower" nodes before "higher" nodes.
In your case, you can drop a node from the queue as soon as it has more than k leaves (its ancestors will have at least as many), and return each node you find that has exactly k leaves, which gives an even faster algorithm.
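The bottom-up counting described above could be sketched in Python as follows. This is only a sketch under assumptions: the tree is given as a hypothetical `children` dictionary, and instead of a "visited" mark it keeps a counter of not-yet-counted children per node, so a parent is only enqueued once all of its children are done (a small variation that guarantees lower nodes are handled before higher ones).

```python
from collections import deque

def leaf_counts(children):
    """Return {node: number of leaves in its subtree}.

    children: dict mapping a node to the list of its children
    (hypothetical input format; leaves may be missing or map to []).
    """
    parent = {}
    nodes = set(children)
    for p, kids in children.items():
        for c in kids:
            parent[c] = p
            nodes.add(c)

    # pending[v] = number of children of v that have not been counted yet
    pending = {v: len(children.get(v, ())) for v in nodes}
    leaves = {v: 0 for v in nodes}

    # Start from the leaves: each one is a subtree with exactly one leaf.
    queue = deque(v for v in nodes if pending[v] == 0)
    for v in queue:
        leaves[v] = 1

    while queue:
        v = queue.popleft()
        p = parent.get(v)
        if p is None:          # v is the root
            continue
        leaves[p] += leaves[v]
        pending[p] -= 1
        if pending[p] == 0:    # all children counted, p is complete
            queue.append(p)
    return leaves

def nodes_with_k_leaves(children, k):
    """Nodes whose subtree contains exactly k leaves, sorted for stable output."""
    return sorted(v for v, c in leaf_counts(children).items() if c == k)

# The small example tree from the answer: f -> d, e; d -> a, b; e -> c
example = {"f": ["d", "e"], "d": ["a", "b"], "e": ["c"]}
```

With the example tree, `leaf_counts(example)` gives 3 for `f`, 2 for `d` and 1 for `e`, matching the diagram. Each node is enqueued once and each child link is looked at once, so the sketch runs in O(n).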

Algorithm for "balanced" breadth-first search

I'm looking for references for an algorithm that conducts a breadth-first tree search in a balanced manner that is resilient in a situation where
we expect most nodes to have few children, but
a few nodes may have many (possibly infinitely many) children.
Consider this simple tree (modified from this post):
      A
     / \
    B   C
   /   / \
  D   E   F
  |  /|\   \
  G  HIJ... K
Depth-first visits nodes in this order:
A B D G C E H I J ... F K
Breadth-first visits nodes in this order:
A B C D E F G H I J ... K
The balanced breadth-first algorithm that I have in mind would visit nodes in this order:
A B C D E G F H K I J ...
Note how
we visit G before F, and
we visit K after H but before I.
G is deeper than F, but it is an only child of B whereas F is a second child of C and must share its search priority with E. Similarly between K and the many children H, I, J, ... of E.
I call this "balanced" because a node with lots of children cannot choke the algorithm. Concretely, if E has 𝜔 (infinitely) many nodes then a pure breadth-first strategy would never reach K, whereas the "balanced" algorithm would still reach K after H but before the other children of E.
(The reader who does not like 𝜔 can attain a similar effect with a large but still finite number such as "the greatest number of steps any practical search algorithm will ever make, plus 1".)
I can only imagine that this style of search or something like it must have been the subject of much research and practical application. I would be grateful to be pointed in the right direction. Thank you.

Transform your tree to a different kind of representation. In this new representation, each node has at most two links: one to its leftmost child, and one to its right sibling.
      A
     / \
    B   C
   /   / \
  D   E   F
  |  /|\   \
  G  HIJ... K

      ⇓

        A
       /
      B --> C
     /     /
    D     E -> F
    |    /      \
    G   /        K
       /
      H -> I -> J -> ...
Then treat this representation as a normal binary tree, and traverse it breadth-first. It may have an infinite height, but like with any binary tree, the width at any particular level is finite.
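A minimal Python sketch of this idea, assuming the tree is exposed through a hypothetical `children(node)` callback that may return an infinite iterator. The left-child/right-sibling tree is never built explicitly: each queue entry carries the iterator over a node's remaining siblings, which plays the role of the "right sibling" link.

```python
from collections import deque
from itertools import chain, count, islice

_MISSING = object()  # sentinel: "no child" / "no further sibling"

def balanced_bfs(root, children):
    """Breadth-first walk over the left-child/right-sibling view of a tree.

    children(node) must return an iterable of child nodes; it may be infinite.
    """
    queue = deque([(root, iter(()))])  # the root has no siblings
    while queue:
        node, siblings = queue.popleft()
        yield node
        # "Left" link: the first child, together with the rest of its siblings.
        kids = iter(children(node))
        first = next(kids, _MISSING)
        if first is not _MISSING:
            queue.append((first, kids))
        # "Right" link: the next sibling of the current node.
        sibling = next(siblings, _MISSING)
        if sibling is not _MISSING:
            queue.append((sibling, siblings))

def example_children(node):
    """The tree from the question; E has infinitely many children H, I, J, E4, E5, ..."""
    if node == "E":
        return chain("HIJ", (f"E{i}" for i in count(4)))
    return {"A": "BC", "B": "D", "C": "EF", "D": "G", "F": "K"}.get(node, "")

first_ten = list(islice(balanced_bfs("A", example_children), 10))
# first_ten == ['A', 'B', 'D', 'C', 'G', 'E', 'H', 'F', 'I', 'K']
```

The visiting order is not identical to the one proposed in the question, but it has the same property: K is reached after finitely many steps even though E has infinitely many children.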

Depth-first search, breadth-first search, A* search (and others) differ only in how you handle the list of "nodes still to visit".
(I assume you always process the node at the front of the list next.)
depth-first search: insert new nodes at the front of the list
breadth-first search: add new nodes to the end of the list
A* search: insert new nodes into the list ordered by cost + heuristic
So you need to formalize how to insert new nodes to the list to fulfill your requirements.
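To illustrate the point about the list of pending nodes, here is a small sketch (function and variable names are made up for the example) in which the only difference between depth-first and breadth-first order is where new nodes are inserted. It uses a finite version of the question's tree and assumes the input really is a tree, so no "visited" set is kept.

```python
from collections import deque

def traverse(root, children, depth_first):
    """Visit a tree; `children` maps a node to the list of its children."""
    frontier = deque([root])
    order = []
    while frontier:
        node = frontier.popleft()          # always process the front of the list
        order.append(node)
        kids = children.get(node, [])
        if depth_first:
            # New nodes go to the front, keeping their left-to-right order.
            frontier.extendleft(reversed(kids))
        else:
            # New nodes go to the end.
            frontier.extend(kids)
    return order

TREE = {"A": ["B", "C"], "B": ["D"], "C": ["E", "F"],
        "D": ["G"], "E": ["H", "I", "J"], "F": ["K"]}

dfs_order = traverse("A", TREE, depth_first=True)   # A B D G C E H I J F K
bfs_order = traverse("A", TREE, depth_first=False)  # A B C D E F G H I J K
```

A "balanced" strategy would then be a third insertion rule for the same loop.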

Amortized Time Calculation in AVL tree

My professor showed the following problem in class and mentioned that the answer is O(1), while mine was quite different. I hope to get some help understanding what mistakes I made.
Question:
Calculate the amortized time complexity of the method F in an AVL tree, when we start from the minimal node and each time call F on the last node found.
Description of F: when we are at a specific node, F continues just like an in-order traversal starting from the current node until it reaches the next node in in-order, which is then the starting point for the next call.
What I did:
First I took an arbitrary series of m calls to F.
I said that for the first one we need O(log n) to find the minimal node, then for the next node we need to do the in-order traversal again but continue one more step, so O(log n) + 1, and so on until I have scanned m elements.
Which gets me to:
To calculate Amortized Time we do T(m)/m then I get:
Which isn't O(1) for sure.

The algorithm doesn't start by searching for any node, but instead is already passed a node and will start from that node. E.g. pseudocode for F would look like this:
F(n):
    if n has right child
        n = right child of n
        while n has left child
            n = left child of n
        return n
    else
        prev = n
        cur = parent of n
        while prev is right child of cur and cur is not root
            prev = cur
            cur = parent of prev
        if cur is root and prev is right child of cur
            error "Reached end of traversal"
        else
            return cur
The above code basically does an in-order traversal of a tree starting from a node until the next node is reached.
Amortized runtime:
Pick an arbitrary tree and m. Let r_0 be the lowest common ancestor of all nodes visited by F. Now define r_(n + 1) as the lowest common ancestor of all nodes in the right subtree of r_n that will be returned by F. This recursion bottoms out for r_u, which will be the m-th node in in-order traversal. Any r_n will be returned by F in some iteration, so all nodes in the left subtree of r_n will be returned by F as well.
All nodes that will be visited by F are either also returned by F or are nodes on the path from r_0 to r_u. Since r_0 is an ancestor of r_1 and r_1 is an ancestor of r_2, etc., the path from r_0 to r_u can be at most as long as the right subtree is high. The height of the tree is limited by log_phi(m + 2), so in total at most
m + log_phi(m + 2)
nodes will be visited during m iterations of F. All nodes visited by F form a subtree, so there are at most 2 * (m + log_phi(m + 2)) edges that will be traversed by the algorithm, leading to an amortized runtime-complexity of
2 * (m + log_phi(m + 2)) / m = 2 + 2 * log_phi(m + 2) / m = O(1)
(In reality the bounds are considerably tighter, but the ones above are sufficient for the calculation presented here.)
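The amortized bound can also be checked empirically. The following is only a sketch: `Node`, `build_balanced` and the move counter are assumptions made for the experiment, a perfectly balanced BST stands in for an AVL tree, and `successor` returns `None` at the end of the traversal instead of raising an error.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def build_balanced(keys):
    """Build a height-balanced BST from sorted keys (stand-in for an AVL tree)."""
    if not keys:
        return None
    mid = len(keys) // 2
    root = Node(keys[mid])
    root.left = build_balanced(keys[:mid])
    root.right = build_balanced(keys[mid + 1:])
    for child in (root.left, root.right):
        if child is not None:
            child.parent = root
    return root

def successor(node):
    """Return (next node in in-order or None, number of pointer moves made)."""
    moves = 0
    if node.right is not None:
        node = node.right
        moves += 1
        while node.left is not None:
            node = node.left
            moves += 1
        return node, moves
    # Climb while we are a right child; the parent reached after that is next.
    while node.parent is not None and node is node.parent.right:
        node = node.parent
        moves += 1
    if node.parent is not None:
        moves += 1
    return node.parent, moves

def traverse_from_min(root):
    """Call successor repeatedly from the minimum; return (keys, total moves)."""
    node = root
    while node.left is not None:
        node = node.left
    keys, total = [node.key], 0
    while True:
        nxt, moves = successor(node)
        total += moves
        if nxt is None:
            return keys, total
        node = nxt
        keys.append(node.key)

keys, total_moves = traverse_from_min(build_balanced(list(range(1000))))
```

Every edge is walked at most once downwards and once upwards, so for a full traversal of n nodes `total_moves` stays below 2 * (n - 1), which is a constant number of moves per call on average.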

How to count all reachable nodes in a directed graph?

There is a directed graph (which might contain cycles), and each node has a value on it, how could we get the sum of reachable value for each node. For example, in the following graph:
the reachable sum for node 1 is: 2 + 3 + 4 + 5 + 6 + 7 = 27
the reachable sum for node 2 is: 4 + 5 + 6 + 7 = 22
.....
My solution: To get the sums for all nodes, I think the time complexity is O(n + m), where n is the number of nodes and m is the number of edges. DFS should be used: for each node we recursively visit the nodes it points to, and save the sum for a node once it has been calculated, so that we don't need to calculate it again later. A set has to be created for each node to avoid endless recursion caused by cycles.
Does it work? I don't think it is elegant enough, especially since many sets have to be created. Is there a better solution? Thanks.

This can be done by first finding Strongly Connected Components (SCC), which can be done in O(|V|+|E|). Then, build a new graph, G', for the SCCs (each SCC is a node in the graph), where each node has value which is the sum of the nodes in that SCC.
Formally,
G' = (V',E')
Where V' = {U1, U2, ..., Uk | U_i is a SCC of the graph G}
E' = {(U_i,U_j) | there is node u_i in U_i and u_j in U_j such that (u_i,u_j) is in E }
Then, this graph (G') is a DAG, and the question becomes simpler, and seems to be a variant of question linked in comments.
EDIT: the previous answer (struck out) is a mistake from this point on; editing with a new answer. Sorry about that.
Now, a DFS can be used from each node of G' to find the sum of values (reset the visited flags before each starting node):
DFS(v):
    if v.visited:
        return 0
    v.visited = true
    return v.value + sum([DFS(u) for u in v.children])
This is O(V^2 + VE) in the worst case, but since G' never has more nodes or edges than G, V and E may now be significantly lower.
Some local optimizations can be made, for example, if a node has a single child, you can reuse the pre-calculated value and not apply DFS on the child again, since there is no fear of counting twice in this case.
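A DP solution for this problem (DAG) can be:
D[i] = value(i) + sum {D[j] | (i,j) is an edge in G' }
This can be calculated in linear time (after a topological sort of the DAG). Note, however, that this recurrence counts a node once per path that leads to it: with edges a->b, a->c, b->d and c->d, the value of d is added twice into D[a]. It is therefore only exact when no node of G' is reachable from i along two different paths.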
Pseudo code:
Find SCCs
Build G'
Topological sort G'
Find D[i] for each node in G'
apply value for all node u_i in U_i, for each U_i.
Total time is O(|V|+|E|).
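A sketch of the SCC-based approach in Python, under a few assumptions: the graph is a hypothetical dictionary of successor lists, Kosaraju's algorithm is used to find the SCCs (any SCC algorithm would do), and the sums are computed with one traversal per component of G' rather than with the recurrence D[i], so that nodes reachable along several paths are counted once. Here a node's own value (and the rest of its SCC) is included in its sum.

```python
from collections import defaultdict

def strongly_connected_components(graph):
    """Kosaraju's algorithm (iterative). Returns {node: component id}."""
    nodes = set(graph)
    for succs in graph.values():
        nodes.update(succs)

    # Pass 1: record nodes in order of DFS completion.
    order, seen = [], set()
    for s in nodes:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(graph.get(s, ())))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(graph.get(w, ()))))
                    break
            else:               # all successors of v handled
                order.append(v)
                stack.pop()

    # Pass 2: DFS on the reversed graph in decreasing completion order.
    reverse = defaultdict(list)
    for v, succs in graph.items():
        for w in succs:
            reverse[w].append(v)
    comp = {}
    for s in reversed(order):
        if s in comp:
            continue
        comp[s] = s
        stack = [s]
        while stack:
            v = stack.pop()
            for w in reverse[v]:
                if w not in comp:
                    comp[w] = s
                    stack.append(w)
    return comp

def reachable_sums(graph, values):
    """{node: sum of values of all nodes reachable from it, itself included}."""
    comp = strongly_connected_components(graph)
    comp_value = defaultdict(int)
    for v, c in comp.items():
        comp_value[c] += values[v]
    dag = defaultdict(set)      # the condensed graph G'
    for v, succs in graph.items():
        for w in succs:
            if comp[v] != comp[w]:
                dag[comp[v]].add(comp[w])

    comp_total = {}
    for c in list(comp_value):
        seen, stack, total = {c}, [c], 0
        while stack:            # plain traversal of G' with a visited set
            x = stack.pop()
            total += comp_value[x]
            for y in dag.get(x, ()):
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        comp_total[c] = total
    return {v: comp_total[c] for v, c in comp.items()}

# Hypothetical example: a diamond a->b->d, a->c->d plus a cycle d<->e.
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": ["d"]}
VALUES = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
sums = reachable_sums(GRAPH, VALUES)
```

In the example, `d` and `e` form one SCC with value 9, and `a` gets 1 + 2 + 3 + 9 = 15 even though the SCC is reachable through both `b` and `c`. The traversal per component keeps the worst case at O(V'(V' + E')) on the condensed graph.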

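You can use DFS or BFS to solve your problem.
Both have complexity O(V + E) per starting node.
You don't have to compute the values for all nodes in advance, and you don't need recursion.
Just do something like this.
A typical traversal looks like this (taking vertices from the front of the list and adding to the end, as written, gives BFS; taking them from the end instead gives DFS).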
unmark all vertices
choose some starting vertex x
mark x
list L = x
while L nonempty
    choose some vertex v from front of list
    visit v
    for each unmarked neighbor w
        mark w
        add it to end of list
In your case you have to add some lines
unmark all vertices
choose some starting vertex x
mark x
list L = x
float sum = 0
while L nonempty
    choose some vertex v from front of list
    visit v
    sum += v->value
    for each unmarked neighbor w
        mark w
        add it to end of list
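The pseudocode above translates almost line by line into Python. This is a sketch with a hypothetical input format (a dictionary of successor lists and a dictionary of values). Like the pseudocode, it includes the value of the starting vertex; subtract `values[start]` if, as in the question's example, the start should not be counted.

```python
from collections import deque

def reachable_sum(graph, values, start):
    """Sum of values of all vertices reachable from start (start included)."""
    marked = {start}
    queue = deque([start])
    total = 0
    while queue:
        v = queue.popleft()            # take a vertex from the front
        total += values[v]             # "visit" it
        for w in graph.get(v, ()):
            if w not in marked:        # marking prevents loops on cycles
                marked.add(w)
                queue.append(w)        # add it to the end
    return total

# Hypothetical graph with a cycle 2 -> 4 -> 2; each node's value is its label.
GRAPH = {1: [2, 3], 2: [4], 3: [4], 4: [2, 5]}
VALUES = {v: v for v in (1, 2, 3, 4, 5)}
```

Running it once per vertex gives all sums in O(V(V + E)) total.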

Calculate the number of nodes on either side of an edge in a tree

A tree here means an acyclic undirected graph with n nodes and n-1 edges. For each edge in the tree, calculate the number of nodes on either side of it. If on removing the edge, you get two trees having a and b number of nodes, then I want to find those values a and b for all edges in the tree (ideally in O(n) time).
Intuitively I feel a multisource BFS starting from all the "leaf" nodes would yield an answer, but I'm not able to translate it into code.
For extra credit, provide an algorithm that works in any general graph.

Run a depth-first search (or a breadth-first search if you like it more) from any node.
That node will be called the root node, and all edges will be traversed only in the direction from the root node.
For each node, we calculate the number of nodes in its rooted subtree.
When a node is visited for the first time, we set this number to 1.
When the subtree of a child is fully visited, we add the size of its subtree to the parent.
After this, we know the number of nodes on one side of each edge.
The number on the other side is just the total minus the number we found.
(The extra credit version of your question involves finding bridges in the graph on top of this as a non-trivial part, and thus deserves to be asked as a separate question if you are really interested.)

Consider the following tree:
      1
     / \
    2   3
   / \  | \
  5   6 7  8
If we cut the edge between nodes 1 and 2, the tree splits into exactly two trees, because in a tree there is exactly one path between any two nodes:
  1
   \
    3
    | \
    7  8
and
    2
   / \
  5   6
So now a is the number of nodes in the tree rooted at 1 and b is the number of nodes in the tree rooted at 2.
> Run one DFS considering any node as root.
> During DFS, for each node x, calculate nodes[x] and parent[x] where
nodes [x] = k means number of nodes of sub-tree rooted at x is k
parent[x] = y means y is parent of x.
> For any edge between node x and y where parent[x] = y:
a := nodes[root] - nodes[x]
b := nodes[x]
Time and space complexity both O(n).
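A short Python sketch of these steps, assuming the tree is given as a list of undirected edges (the function name and input format are made up for the example). It uses an explicit stack instead of recursion so that deep trees do not hit the recursion limit.

```python
from collections import defaultdict

def edge_split_sizes(edges, root=None):
    """For each tree edge (parent, child) return (a, b):
    b = nodes on the child's side, a = nodes on the parent's side."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    if root is None:
        root = edges[0][0]             # any node works as the root

    # DFS from the root, recording parent[x] and the visiting order.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        x = stack.pop()
        order.append(x)
        for y in adj[x]:
            if y != parent[x]:
                parent[y] = x
                stack.append(y)

    # nodes[x] = size of the subtree rooted at x; children come after their
    # parent in `order`, so the reversed order processes them first.
    nodes = {x: 1 for x in order}
    for x in reversed(order):
        if parent[x] is not None:
            nodes[parent[x]] += nodes[x]

    total = nodes[root]
    return {(parent[x], x): (total - nodes[x], nodes[x])
            for x in order if parent[x] is not None}

# The example tree from above, rooted at 1.
sizes = edge_split_sizes([(1, 2), (1, 3), (2, 5), (2, 6), (3, 7), (3, 8)], root=1)
```

For the edge between 1 and 2 this gives a = 4 and b = 3, as in the pictures above.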

Note that n = a + b. Because of this, you only need to count one side of each edge; the other side is n minus that count. This greatly simplifies things. A normal recursion over the nodes starting from the root is enough. Since your tree is undirected you don't really have a "root", so just pick one of the leaves (or any other node).
What you want to do is to "go down" the tree until you reach the bottom. Then you count backwards from there. A leaf returns 1, and each recursive step sums the return values for each edge and then increments the result by 1.

Here is the Java code. The function countEdges() takes the adjacency list of the tree, the current node, and the parent of the current node (here "parent" means the node from which the current node was reached in this DFS).
Here edge[][] stores the number of nodes on one side of the edge: edge[i][j] is the size of the subtree below j when j was reached from i. The number of nodes on the other side is (total nodes - edge[i][j]).
import java.util.ArrayList;

// Assumption: nodes are numbered from 1 and edge is allocated as
// new int[n + 1][n + 1]; the initial call passes par = 0 as a dummy parent.
int[][] edge;

int countEdges(ArrayList<Integer>[] adj, int cur, int par) {
    int count = 1; // count the current node itself
    // Count the nodes recursively for each neighbor except the parent.
    for (int neighbor : adj[cur]) {
        if (neighbor == par) continue;
        count += countEdges(adj, neighbor, cur);
    }
    // While returning from the recursion, record the result for the edge
    // par-cur. Leaves are recorded too: the loop above simply finds no children.
    edge[par][cur] = count;
    return count;
}
Since we are visiting each node only once in the DFS time complexity should be O(V).

How to represent a dependency graph with alternative paths

I'm having some trouble trying to represent and manipulate dependency graphs in this scenario:
a node has some dependencies that have to be resolved
no path may contain a dependency loop (as in a DAG)
every dependency could be satisfied by more than one other node
I start from the target node and recursively look for its dependencies, but I have to maintain the above properties, in particular the third one.
Just a little example here:
I would like to have a graph like the following one
            (A)
           /   \
          /     \
         /       \
[(B),(C),(D)]    (E)
  /\       \
 /  \      (H)
(F) (G)
which means:
F,G,C,H,E have no dependencies
D depends on H
B depends on F and G
A depends on E and
B or
C or
D
So, if I write down all the possible topological-sorted paths to A I should have:
E -> F -> G -> B -> A
E -> C -> A
E -> H -> D -> A
How can I model a graph with these properties? Which kind of data structure is most suitable for that?

You should use a normal adjacency list with one additional property: each node knows the other nodes that would also satisfy the same dependency. This means that B, C and D should all know that they belong to the same equivalence class. You can achieve this by inserting them all into a set.
Node:
List<Node> adjacencyList
Set<Node> equivalentDependencies
To use this data-structure in a topo-sort, whenever you remove a source, and remove all its outgoing edges, also remove the nodes in its equivalency class, their outgoing edges, and recursively remove the nodes that point to them.
From wikipedia:
L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edges
while S is non-empty do
    remove a node n from S
    add n to tail of L
    for each node j in the equivalency class of n <=== removing equivalent dependencies
        remove j from S
        for each node k with an edge e from j to k do
            remove edge e from the graph
            if k has no other incoming edges then
                insert k into S
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into S
if graph has edges then
    return error (graph has at least one cycle)
else
    return L (a topologically sorted order)
This algorithm will give you one of the modified topologically sorted paths.
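If all alternative paths are wanted rather than just one, a different (hypothetical) representation may be easier to work with: for each node, a list of dependency groups, where every group must be satisfied and any one member of a group satisfies it. This is not the adjacency-list-plus-set structure described above, only a sketch of a related model, and it assumes the graph has no dependency loops.

```python
from itertools import product

def resolution_orders(target, deps):
    """All ways to resolve `target`, each as a list ending in `target`.

    deps: dict node -> list of groups; every group must be satisfied,
    and any single node listed in a group satisfies it.
    """
    groups = deps.get(target, [])
    orders = []
    # Pick one alternative from every group ...
    for choice in product(*groups):
        # ... and combine every way of resolving each picked node.
        sub_orders = [resolution_orders(node, deps) for node in choice]
        for combo in product(*sub_orders):
            order = []
            for part in combo:
                for node in part:
                    if node not in order:   # shared dependencies appear once
                        order.append(node)
            order.append(target)
            orders.append(order)
    return orders

# The example from the question: A needs E and one of B, C, D.
DEPS = {
    "A": [["E"], ["B", "C", "D"]],
    "B": [["F"], ["G"]],
    "D": [["H"]],
}
paths = resolution_orders("A", DEPS)
```

For the example this yields the three orders listed in the question: E, F, G, B, A; E, C, A; and E, H, D, A. The number of results can grow exponentially with the number of groups, so this is only practical for small graphs.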
