Developing an Algorithm for Tree Mutations - algorithm

I have to come up with an efficient algorithm that takes a tree in this format:
      ?
    /   \
   ?     ?
  / \   / \
 G   A A   A
and fills in the question mark nodes with the values that give the fewest mutations. The values can only be {A, C, T, G}. The tree will always have this same shape and number of nodes. Also, the leaf nodes will always be filled in, and the remaining nodes will be question marks that need to be filled.
For instance, the tree on the right is correct and has fewer mutations than the one on the left.
      A                   A
    /   \               /   \
   G     G             A     A
  / \   / \           / \   / \
 G   A A   A         G   A A   A
A mutation occurs when a parent node differs from a child node. So, the above left tree contains five mutations and the above right has one.
Can someone help me out by providing pseudocode? Thanks.

This looks like dynamic programming from the bottom of the tree up. For each node you want to work out the least cost solution that leaves that node marked A, C, T, or G, for each of these possibilities. You work this out by using previously calculated costs for each possibility for the nodes immediately below that node. The code just to work out the cost might be a bit like this.
LeastCost(node, colourHere)
{
    if node is a leaf
        return (node.colour == colourHere) ? 0 : infinity
    foreach colour
        leastLeft[colour] = LeastCost(leftChild, colour)
        leastRight[colour] = LeastCost(rightChild, colour)
    best = infinity
    foreach combination
        cost = leastLeft[combination.leftColour] +
               leastRight[combination.rightColour]
        if (combination.leftColour != colourHere)
            cost++;
        if (combination.rightColour != colourHere)
            cost++;
        if (cost < best)
            best = cost;
    return best
}
To return the best answer as well as the best cost, you also need to keep track of the combination that produced the best cost. Come to think of it, you can save time by working out the answers for all four colours at each node at the same time.
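A minimal Python sketch of that last idea: each call returns the best cost and the filled-in subtree for all four values at once. The tuple representation of the tree and the names `least_cost` and `fill_tree` are assumptions made for illustration, not part of the question. Because the two children contribute to the cost independently, the sketch picks the cheapest value for each child separately instead of looping over all sixteen combinations; the result is the same.

```python
BASES = "ACGT"
INF = float("inf")

def least_cost(tree):
    """Return {base: (cost, labelled_subtree)} for every base this node could take.

    A leaf is a one-letter string; an internal node is a (left, right) tuple.
    A labelled internal node is returned as (base, left, right).
    """
    if isinstance(tree, str):
        # A leaf is fixed: zero cost for its own base, impossible otherwise.
        return {b: (0 if b == tree else INF, tree) for b in BASES}
    left, right = (least_cost(child) for child in tree)
    table = {}
    for here in BASES:
        # Cheapest choice for each child, paying 1 when it differs from `here`.
        lcost, lbase = min((left[b][0] + (b != here), b) for b in BASES)
        rcost, rbase = min((right[b][0] + (b != here), b) for b in BASES)
        table[here] = (lcost + rcost, (here, left[lbase][1], right[rbase][1]))
    return table

def fill_tree(tree):
    """Return (minimum number of mutations, labelled tree)."""
    return min(least_cost(tree).values(), key=lambda entry: entry[0])

cost, labelled = fill_tree((("G", "A"), ("A", "A")))
print(cost, labelled)
```

For the tree in the question this finds a labelling with a single mutation, matching the right-hand tree above.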

Related

Generate all the leaf to leaf paths in an N-ary tree

Given an N-ary tree, I have to generate all the leaf to leaf paths in it. Each path should also denote the direction. As an example:
Tree:
      1
     / \
    2   6
   / \
  3   4
 /
5
Paths:
5 UP 3 UP 2 DOWN 4
4 UP 2 UP 1 DOWN 6
5 UP 3 UP 2 UP 1 DOWN 6
These paths can be in any order, but all paths need to be generated.
I kind of see the pattern:
looks like I have to do in order traversal and
need to save what I have seen so far.
However, can't really come up with an actual working algorithm.
Can anyone nudge me to the correct algorithm?
I am not looking for the actual implementation, just the pseudo code and the conceptual idea would be much appreciated.
The first thing I would do is perform an in-order traversal. As a result, we accumulate all the leaves in order from the leftmost to the rightmost (in your case this would be [5,4,6]).
Along the way, I would also record the mapping between each node and its parent so that we can perform a DFS later. We can keep this mapping in a HashMap (or its analogue). Apart from this, we need a mapping between each node and its priority, which we can compute from the result of the in-order traversal. In your example the in-order would be [5,3,2,4,1,6] and the list of priorities would be [0,1,2,3,4,5] respectively.
Here I assume that our node looks like(we may not have the mapping node -> parent a priori):
class TreeNode {
    int val;
    TreeNode[] nodes;
    TreeNode(int x) {
        val = x;
    }
}
If we have n leaves, then we need to find n * (n - 1) / 2 paths. Obviously, if we have managed to find a path from leaf A to leaf B, then we can easily calculate the path from B to A. (by transforming UP -> DOWN and vice versa)
Then we start traversing over the array of leaves we computed earlier. For each leaf in the array we should be looking for paths to leaves which are situated to the right of the current one. (since we have already found the paths from the leftmost nodes to the current leaf)
To perform the dfs search, we should be going upwards and for each encountered node check whether we can go to its children. We should NOT go to a child whose priority is less than the priority of the current leaf. (doing so will lead us to the paths we already have) In addition to this, we should not visit nodes we have already visited along the way.
While performing the DFS from some node, we can maintain a structure (for instance, a StringBuilder if you program in Java) that holds the nodes we have come across so far. In our case, once we have reached leaf 4 from leaf 5, we have accumulated the path = 5 UP 3 UP 2 DOWN 4. Since we have reached a leaf, we can discard the last visited node and proceed with the DFS with the path = 5 UP 3 UP 2.
There might be a more advanced technique for solving this problem, but I think it is a good starting point. I hope this approach will help you out.
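For comparison, here is a short Python sketch that produces the same UP/DOWN output with a simpler (not the approach described above) technique: collect every root-to-leaf path, then for each pair of leaves cut the two paths at their lowest common ancestor. The `Node` class and the function names are hypothetical, and the sketch stores all root-to-leaf paths, so it trades memory for simplicity.

```python
from itertools import combinations

class Node:
    # Hypothetical N-ary node: a value plus any number of children.
    def __init__(self, val, *children):
        self.val = val
        self.children = children

def root_to_leaf_paths(node, prefix=()):
    """Yield the tuple of nodes from the root to every leaf, left to right."""
    prefix = prefix + (node,)
    if not node.children:
        yield prefix
    for child in node.children:
        yield from root_to_leaf_paths(child, prefix)

def leaf_to_leaf_paths(root):
    paths = []
    for a, b in combinations(list(root_to_leaf_paths(root)), 2):
        # k = length of the shared prefix; a[k - 1] is the lowest common ancestor.
        # Two distinct leaves always diverge before either path ends.
        k = 0
        while a[k] is b[k]:
            k += 1
        up = [str(n.val) for n in reversed(a[k - 1:])]   # leaf a up to the LCA
        down = [str(n.val) for n in b[k:]]               # below the LCA down to leaf b
        paths.append(" UP ".join(up) + "".join(" DOWN " + v for v in down))
    return paths

tree = Node(1, Node(2, Node(3, Node(5)), Node(4)), Node(6))
paths = leaf_to_leaf_paths(tree)
for p in paths:
    print(p)
```

Each unordered pair of leaves is reported once, from the left leaf to the right one; the reverse path follows by swapping UP and DOWN.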
I didn't manage to work out a solution without programming it in Python. ASSUMING that I didn't overlook a corner case, my attempt goes like this:
In a depth-first search, every node receives the down-paths and either emits them (plus itself) if it is a leaf or passes them on to its children. The only other thing to consider is that a leaf node is the starting point of an up-path, so up-paths are fed from the left children into the children to their right as down-paths, as well as returned to the parent node.
def print_leaf2leaf(root, path_down):
    # Extend every path that is currently heading down with this node.
    for st in path_down:
        st.append(root)
    if all(x is None for x in root.children):
        # Leaf: every down-path ends here, so emit it.
        for st in path_down:
            print(*(n.d for n in st))
        path_up = [[root]]  # a leaf also starts a new up-path
    else:
        path_up = []
        for child in root.children:
            if child is not None:
                # Up-paths from earlier children continue down into later ones.
                path_up += [st + [root] for st in print_leaf2leaf(child, path_down + path_up)]
    # Restore the caller's paths before returning.
    for st in path_down:
        st.pop()
    return path_up

class node:
    def __init__(self, d, *children):
        self.d = d
        self.children = children

##          1
##        /   \
##       2     6
##      / \   /
##     3   4 7
##    /     / | \
##   5     8  9  10
five = node(5)
three = node(3, five)
four = node(4)
two = node(2, three, four)
eight = node(8)
nine = node(9)
ten = node(10)
seven = node(7, eight, nine, ten)
six = node(6, None, seven)
one = node(1, two, six)
print_leaf2leaf(one, [])

How do you find the number of leaves at the lowest level of a complete binary tree?

I'm trying to define an algorithm that returns the number of leaves at the lowest level of a complete binary tree. By a complete binary tree, I mean a binary tree whose every level, except possibly the last, is filled, and all nodes in the last level are as far left as possible.
For example, if I had the following complete binary tree,
        7
      /   \
     4     9
    / \   / \
   2   6 8   10
  / \ /
 1  3 5
the algorithm would return '3' since there are three leaves at the lowest level of the tree.
I've been able to find numerous solutions for finding the count of all the leaves in regular or balanced binary trees, but so far I haven't had any luck with the particular case of finding the count of the leaves at the lowest level of a complete binary tree. Any help would be appreciated.
Do a breadth-first search, so that you can also find the number of nodes on each level.
Some pseudocode:
q <- new queue of (node, level) data
add (root, 0) to q
nodesPerLevel <- new empty vector of integers
while q is not empty:
    (currentNode, currentLevel) <- take from front of q
    if currentLevel == size of nodesPerLevel: append 0 to nodesPerLevel
    nodesPerLevel[currentLevel] += 1
    for each child in currentNode's children:
        add (child, currentLevel + 1) to q
return last value of nodesPerLevel
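The pseudocode above could be written in Python roughly as follows. The `Node` class and the function name `leaves_on_lowest_level` are assumptions for illustration; the question does not say how the tree is stored.

```python
from collections import deque

class Node:
    # Hypothetical binary tree node.
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def leaves_on_lowest_level(root):
    """Breadth-first search that counts the nodes on every level."""
    if root is None:
        return 0
    nodes_per_level = []
    queue = deque([(root, 0)])
    while queue:
        node, level = queue.popleft()
        if level == len(nodes_per_level):
            nodes_per_level.append(0)      # first node seen on a new level
        nodes_per_level[level] += 1
        for child in (node.left, node.right):
            if child is not None:
                queue.append((child, level + 1))
    # Every node on the deepest level is a leaf.
    return nodes_per_level[-1]

# The tree from the question.
tree = Node(7,
            Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5))),
            Node(9, Node(8), Node(10)))
print(leaves_on_lowest_level(tree))
```

This visits every node once. It does not exploit the completeness of the tree, so it also works for arbitrary binary trees.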

View of a binary tree from 45 degree

I am not asking about the diagonal view of a tree, which fortunately I already know. I am asking which nodes are visible if I view the tree from a 45-degree angle, where only a few nodes should be visible. So there is a plane at an angle of 45 degrees from the x-axis, and we need to print all the nodes that are visible from that plane.
For example:
      1
    /   \
   2     3
  / \   / \
 4   5 6   7
So if I look from that plane, I will only see nodes [4, 6, 7], as 5 and 6 overlap each other. If I add another node at 6, it will now hide 7. How can I do that? I searched the internet but couldn't find the answer.
Thanks!
I am giving you an abstract answer as the question is not language specific.
The problem with logging trees like this is the use of recursion.
By that I mean the traversal is going down nodes and up nodes.
What if you wrote a height helper that returns the depth of the current node?
For each depth level, you place the value in an array.
Then, write out the values of the arrays.
Then you could take the length of the last array and determine how many spaces each node needs.
Allow the arrays to hold empty values, or else you will have to keep track of which nodes don't have children.
int total_depth = tree.getTotalHeight();
int arr[total_depth] = {};
for(int i = total_depth; i--;){
    // there is a formula for the max number of nodes at a given depth of a binary tree
    arr[i] = int arr[maximum_nodes_at_depth]
}
tree.inorderTraverse(function(node){
    int depth = node.getHeightHelper();
    // check if item is null
    if( node!=nullptr && node.Item != NULL)
    {
        arr[depth].push(node.Item)
    }
    else
    {
        arr[depth].push(NULL)
    }
})
So now you would have to calculate the size of your tree and then dynamically calculate how many spaces should prefix each node. The lower the depth the more prefixed spaces to center it.
I apologize but the pseudocode is a mix of javascript and c++ syntax.... which should never happen lol
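As a rough Python sketch of the per-depth arrays described in this answer: it walks the tree level by level (rather than with an in-order traversal and a height helper) and keeps a `None` placeholder for every missing node, so each array has the full width of its depth. The `Node` class and the name `values_by_depth` are assumptions for illustration, and the sketch only builds the arrays; it does not decide which nodes are visible.

```python
class Node:
    # Hypothetical binary tree node.
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def values_by_depth(root):
    """Return one list per depth, with None standing in for missing nodes."""
    levels = []
    current = [root]
    while any(n is not None for n in current):
        levels.append([None if n is None else n.val for n in current])
        nxt = []
        for n in current:
            # A missing node still reserves two slots on the next level.
            nxt += [None, None] if n is None else [n.left, n.right]
        current = nxt
    return levels

tree = Node(1, Node(2, Node(4), Node(5)), Node(3, Node(6), Node(7)))
print(values_by_depth(tree))
```

Because the placeholders keep every level at width 2^depth, the position of a value inside its array tells you its horizontal slot, which is what the spacing calculation needs.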

minimum weight vertex cover of a tree

There's an existing question dealing with trees where the weight of a vertex is its degree, but I'm interested in the case where the vertices can have arbitrary weights.
This isn't homework but it is one of the questions in the algorithm design manual, which I'm currently reading; an answer set gives the solution as
1. Perform a DFS; at each step update Score[v][include], where v is a vertex and include is either true or false.
2. If v is a leaf, set Score[v][false] = 0 and Score[v][true] = wv, where wv is the weight of vertex v.
3. During the DFS, when moving up from the last child of the node v, update Score[v][include]:
   Score[v][false] = Sum for c in children(v) of Score[c][true]
   Score[v][true] = wv + Sum for c in children(v) of min(Score[c][true]; Score[c][false])
4. Extract the actual cover by backtracking Score.
However, I can't actually translate that into something that works. (In response to the comment: what I've tried so far is drawing some smallish graphs with weights and running through the algorithm on paper, up until step four, where the "extract actual cover" part is not transparent.)
In response to Ali's answer: So suppose I have this graph, with the vertices given by A etc. and the weights in parens after:
A(9)---B(3)---C(2)
 \       \
  E(1)    D(4)
The right answer is clearly {B,E}.
Going through this algorithm, we'd set values like so:
score[D][false] = 0; score[D][true] = 4
score[C][false] = 0; score[C][true] = 2
score[B][false] = 6; score[B][true] = 3
score[E][false] = 0; score[E][true] = 1
score[A][false] = 4; score[A][true] = 12
Ok, so, my question is basically, now what? Doing the simple thing and iterating through the score vector and deciding what's cheapest locally doesn't work; you only end up including B. Deciding based on the parent and alternating also doesn't work: consider the case where the weight of E is 1000; now the correct answer is {A,B}, and they're adjacent. Perhaps it is not supposed to be confusing, but frankly, I'm confused.
There's no actual backtracking done (or needed). The solution uses dynamic programming to avoid backtracking, since that'd take exponential time. My guess is "backtracking Score" means the Score contains the partial results you would get by doing backtracking.
A vertex cover of a tree may include alternating vertices as well as adjacent ones. What it may not do is exclude two adjacent vertices, because every edge must have at least one endpoint in the cover.
The answer is given in the way the Score is recursively calculated. The cost of not including a vertex, is the cost of including its children. However, the cost of including a vertex is whatever is less costly, the cost of including its children or not including them, because both things are allowed.
As your solution suggests, the scores can be computed with a DFS in post-order. The cover is then read off in a second, top-down pass: a vertex whose parent is excluded must be included (otherwise we'd be excluding two adjacent vertices and leaving their edge uncovered), while a vertex whose parent is included takes whichever of its two scores is smaller. Deciding locally during the post-order pass can include more vertices than necessary.
Here's some pseudocode:
compute_scores(v)
    for c in children(v)
        compute_scores(c)
    Score[v][false] = Sum for c in children(v) of Score[c][true]
    Score[v][true] = v weight + Sum for c in children(v) of min(Score[c][true]; Score[c][false])
extract_cover(v, parent_included)
    if not parent_included or Score[v][true] < Score[v][false] then
        add v to the vertex cover
        for c in children(v)
            extract_cover(c, true)
    else
        for c in children(v)
            extract_cover(c, false)
compute_scores(root)
extract_cover(root, true)
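A minimal Python sketch of the Score recurrence from the question, followed by a top-down pass that extracts the cover. The dictionary-based tree representation and the names `min_weight_vertex_cover`, `fill`, and `extract` are assumptions for illustration. It is recursive, so very deep trees would need an explicit stack instead.

```python
def min_weight_vertex_cover(children, weight, root):
    """Return (total weight, set of vertices) of a minimum weight vertex cover.

    children maps a vertex to the list of its children in the rooted tree;
    weight maps every vertex to its weight.
    """
    score = {}

    def fill(v):
        # Post-order: children are scored before their parent.
        excluded, included = 0, weight[v]
        for c in children.get(v, ()):
            fill(c)
            excluded += score[c][True]                       # every child must be in
            included += min(score[c][True], score[c][False])
        score[v] = {False: excluded, True: included}

    cover = set()

    def extract(v, must_include):
        # A vertex is forced in when its parent was left out.
        take = must_include or score[v][True] < score[v][False]
        if take:
            cover.add(v)
        for c in children.get(v, ()):
            extract(c, must_include=not take)

    fill(root)
    extract(root, must_include=False)   # the root has no parent edge to cover
    return min(score[root].values()), cover

children = {"A": ["B", "E"], "B": ["C", "D"]}
weight = {"A": 9, "B": 3, "C": 2, "D": 4, "E": 1}
print(min_weight_vertex_cover(children, weight, "A"))
```

On the graph from the question this selects {B, E} with weight 4, and with the weight of E raised to 1000 it selects {A, B} with weight 12.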
There is nothing confusing hidden here; it is just dynamic programming, and you seem to understand almost all of the algorithm already. To make it clearer:
First, perform a DFS on your graph and find the leaves.
For every leaf, assign values as the algorithm says.
Now, starting from the leaves, assign values to each leaf's parent using that formula.
Keep assigning values to the parents of nodes that already have values until you reach the root of your graph.
That is all there is to it. "Backtracking" in your algorithm means that you assign a value to each node whose children already have values. As I said above, this way of solving a problem is called dynamic programming.
Edit, to address the changes in your question: you have the following graph, and the answer is clearly {B, E}, but you thought this algorithm gives you just B. That is incorrect; the algorithm gives you B and E.
A(9)---B(3)---C(2)
 \       \
  E(1)    D(4)
score[D][false] = 0; score[D][true] = 4
score[C][false] = 0; score[C][true] = 2
score[B][false] = 6 (this means we use C and D); score[B][true] = 3 (this means we use B)
score[E][false] = 0; score[E][true] = 1
score[A][false] = 4 (this means we use B and E); score[A][true] = 12 (this means we use B and A)
You select 4, so you must use B and E. If it were just B, your answer would be 3, but as you correctly found, your answer is 4 = 3 + 1 = B + E.
Also, when E = 1000:
A(9)---B(3)---C(2)
 \       \
  E(1000) D(4)
It is 100% correct that the answer is A and B, because it would be wrong to use E just to avoid selecting adjacent nodes. With this algorithm you will find that the answer is A and B, and you can confirm it by checking directly. Consider these covers:
C D A = 15
B E = 1003
A B = 12
Although the first two covers have no adjacent nodes, they cost more than the last one, which does. So it is best to use A and B for the cover.

Is this algorithm for finding a maximum independent set in a graph correct?

We have the following input for the algorithm:
A graph G with no cycles (aka a spanning-tree) where each node has an associated weight.
I want to find an independent set S such that:
No two elements in S form an edge in G
There is no other possible subset which satisfies the above condition, for which there is a greater weight than S[0] + S[1] + ... + S[n-1] (where len(S)==n).
This is the high-level pseudocode I have so far:
MaxWeightNodes(SpanningTree S):
    output = {0}
    While(length(S)):
        o = max(node in S)
        output = output (union) o
        S = S \ (o + adjacentNodes(o))
    End While
    Return output
Can someone tell me off the bat whether I've made any errors, or if this algorithm will give me the result I want?
The algorithm is not valid, since you'll soon face a case when excluding the adjacent nodes of an initial maximum may be the best local solution, but not the best global decision.
For example, output = []:
       10
     /    \
   100     20
   / \    /  \
  80  90 10   30
output = [100]:
        x
     /    \
    x      20
   / \    /  \
  x   x  10   30
output = [100, 30]:
        x
     /    \
    x      x
   / \    /  \
  x   x  10   x
output = [100, 30, 10]:
        x
     /    \
    x      x
   / \    /  \
  x   x  x    x
While we know there are better solutions.
In other words, this is a greedy algorithm, and here the locally best choice does not lead to a globally optimal solution.
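For a tree, one way to get the optimum instead is a bottom-up dynamic program that keeps two values per vertex: the best weight with the vertex included and with it excluded. The Python sketch below is an illustration under assumptions, not part of the answer above: the tree is given as a dictionary of children, the vertex names `r`, `a` ... `f` are made up because two vertices in the example share the weight 10, and the function name is hypothetical.

```python
def max_weight_independent_set(children, weight, root):
    """Return (total weight, set of vertices) of a maximum weight independent set."""
    best = {}  # vertex -> (best weight without it, best weight with it)

    def fill(v):
        excluded, included = 0, weight[v]
        for c in children.get(v, ()):
            fill(c)
            excluded += max(best[c])   # child is free to be in or out
            included += best[c][0]     # child must stay out
        best[v] = (excluded, included)

    chosen = set()

    def extract(v, parent_taken):
        take = not parent_taken and best[v][1] > best[v][0]
        if take:
            chosen.add(v)
        for c in children.get(v, ()):
            extract(c, take)

    fill(root)
    extract(root, False)
    return max(best[root]), chosen

# The example tree above: r=10 at the root, a=100 and b=20 below it.
children = {"r": ["a", "b"], "a": ["c", "d"], "b": ["e", "f"]}
weight = {"r": 10, "a": 100, "b": 20, "c": 80, "d": 90, "e": 10, "f": 30}
print(max_weight_independent_set(children, weight, "r"))
```

On this tree the sketch returns a total of 220 (the root plus all four leaves), compared with 140 for the greedy selection [100, 30, 10].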
I think the weights of the vertices make greedy solutions difficult. If all weights are equal, you can try choosing a set of levels of the tree (which obviously is easiest with a full k-ary tree, but spanning trees generally don't fall into that class). Maybe it'll be useful for greedy approximation to think about the levels as having a combined weight, since you can always choose all vertices of the same level of the tree (independent of which vertex you root it at) to go into the same independent set; there can't be an edge between two vertices of the same level. I'm not offering a solution because this seems like a difficult problem to me. The main problems seem to be the weights and the fact that you can't assume that you're dealing with full trees.
EDIT: actually, always choosing all vertices of one level seems to be a bad idea as well, as Rubens's example helps visualize; imagine the vertex on the second level on the right of his tree had a weight of 200.

Resources