Build tree from edges - algorithm

I have the edges and I want to build a tree from them.
The problem is that I can construct my tree structure only when the edges are in a specific order.
Example of orders, written as (vertex, parent_vertex):
good:          bad:
(0, ) <-top    (3, 2)
(1, 0)         (1, 0)
(2, 1)         (3, 2)
(3, 2)         (0, ) <-top
I iterate through the edges and, for the current vertex, try to find its parent in the tree built so far; then I construct the node and insert it.
Result tree:
0 - 1 - 2 - 3
So a parent must always already exist in the tree for each newly added vertex.
The question is how to sort the input edges. I have been told about topological sort, but that is for vertices. Is it possible to sort the edges the right way?

@mirt thanks for pointing out the optimizations to my approach; do you have anything better?
I will put the algorithm below for reference.
Initially, construct a hash map H to store the elements that are already in the tree, and add the root (null in your case, or anything that represents the root).
Take each pair as (_child, _parent) and loop through the whole list (each pair is one element of the list).
For each pair, check whether _child and _parent are in the hash map H. If either is missing, create the tree node for it and add it to H; then link the two nodes with the parent-child relationship.
You will be left with the tree at the end of the iteration.
The complexity is O(n), assuming constant-time hash map lookups.
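The steps above can be sketched in Python as follows. This is only an illustration under a few assumptions: the names TreeNode and build_tree are made up for the example, and the root is assumed to be the pair whose parent is None. Because a node is created the first time its key is seen, either as a child or as a parent, the edges do not need to be sorted.

```python
class TreeNode:
    """Minimal tree node (hypothetical name, not from the original post)."""

    def __init__(self, key):
        self.key = key
        self.parent = None
        self.children = []


def build_tree(edges):
    """Build a tree from (child, parent) pairs given in any order.

    Assumption: the root is the pair whose parent is None.
    """
    nodes = {}  # the hash map H: key -> TreeNode

    def get_node(key):
        # Create the node on first sight, whether it shows up
        # as a child or as a parent.
        if key not in nodes:
            nodes[key] = TreeNode(key)
        return nodes[key]

    root = None
    for child_key, parent_key in edges:
        child = get_node(child_key)
        if parent_key is None:
            root = child
            continue
        parent = get_node(parent_key)
        child.parent = parent
        parent.children.append(child)
    return root


# Edges in an order where children appear before their parents.
edges = [(3, 2), (1, 0), (2, 1), (0, None)]
root = build_tree(edges)

# Walk down the chain to show the result.
chain = []
node = root
while node is not None:
    chain.append(node.key)
    node = node.children[0] if node.children else None
print(" - ".join(str(key) for key in chain))  # 0 - 1 - 2 - 3
```

Each edge triggers at most two dictionary lookups and one list append, which is where the O(n) bound comes from.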

Related

Synchronize graph with XML-tree and apply axes to them

I transformed a graph with cycles and multiple parents to XML such that I can use XQuery on it.
The graph is on the left and the XML-tree is on the right.
I transformed the graph by writing down all child nodes from the first node (node 1) and repeat that on the returned nodes until no more children exist or a node has already been visited (like node 2).
Further more, I added the constraint, that all nodes with the same number have to be selected, if one of them is selected. (For example, if node 2 (child of 1) is selected, then we also have to select node 2 (child of 6) in the XML-tree.)
The operations I can use on the graph are: getParents, getChildren, readValue(node).
In the graph, all information is stored in the node, and in the XML-tree all information of a node is stored as attributes.
My Question: I want to synchronize both structures, such that I can apply an axis like ancestor (or descendant) on the graph and on the XML-tree and get the same result. (I can parse the graph with Python and the XML-tree with XQuery.)
My Problem: If I select node 8 on the graph and apply the ancestor function, it'll return: 4, 5, 2, 1, 6, 3 (6 and 3 because of the cycle).
The ancestor axis on the XML-tree would return (we have to select both 8s): 4, 5, 2, 1 (the second 2, (child of 6) would also be selected due to the constraint, but not node 6 and 3).
My Solution: Changing the ancestor axis such that it returns all parents of the selected nodes, then applies the constraint and then selects again all parents and so on. But this solution seems to be very complicated and inefficient. Is there any better way?
Thanks for your help
I think this is not easy to solve for that particular format with XSLT/XQuery/XPath: the document order imposed by most step expressions and by except or intersect, and the arbitrary order that XQuery grouping gives, make it hard to establish the nodes you want in the order they are traversed. The easiest I could come up with is
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method 'text';
declare option output:item-separator ', ';
declare variable $main-root := /;
declare function local:eliminate-duplicates($nodes as node()*) as node()*
{
  for $node at $p in $nodes
  group by $id := generate-id($node)
  order by head($p)
  return head($node)
};
declare function local:get-parents($nodes as element(node)*, $collected as element(node)*) as element(node)*
{
  let $new-parents :=
    for $p in local:eliminate-duplicates($nodes ! ..)
    return $main-root//node[@value = $p/@value][not(. intersect $collected)]
  return
    if ($new-parents)
    then local:get-parents($new-parents, ($collected, $new-parents))
    else $collected
};
local:get-parents(//node[@value = 8], ()) ! @value ! string()
https://xqueryfiddle.liberty-development.net/gWmuPs8 gives 4, 5, 2, 2, 1, 6, 3.
How efficiently that works will partly depend on any index used for the node[@value = $p/@value] comparison. In XSLT you could ensure that with a key (https://xsltfiddle.liberty-development.net/aiyneS); in database-oriented XQuery processors, probably with an attribute-based index.
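On the Python side of the question, the "collect parents until nothing new appears" idea can be sketched as a breadth-first walk over parent links. This is a rough illustration only: the parents dictionary is a hypothetical stand-in for the graph's getParents operation (the figure from the question is not available here), chosen so that node 2 sits on a cycle through 3 and 6.

```python
from collections import deque


def ancestors(start, get_parents):
    """Return all nodes reachable from start by following parent links.

    Nodes are listed in the order they are first discovered. The visited
    set guarantees termination on graphs with cycles.
    """
    seen = set()
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in get_parents(node):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


# Hypothetical graph: maps each node to the list of its parents.
parents = {
    1: [],
    2: [1, 6],
    3: [2],
    4: [2],
    5: [2],
    6: [3],
    8: [4, 5],
}

print(ancestors(8, lambda node: parents[node]))  # [4, 5, 2, 1, 6, 3]
```

Because each node enters the queue at most once, the walk touches every parent link at most once, so it is linear in the size of the part of the graph above the start node.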

Given two binary trees, calculate their diff

A friend of mine was asked this question in an interview.
Given two binary trees, explain how you would create a diff such that, if you have that diff and either of the trees, you are able to generate the other binary tree. Implement a function createDiff(Node tree1, Node tree2) that returns that diff.
Tree 1
       4
     /   \
    3     2
   / \   / \
  5   8 10  22
Tree 2
  1
   \
    4
   / \
  11  12
If you are given Tree 2 and the diff you should be able to generate Tree 1.
My solution:
Convert both binary trees into arrays where the left child of index n is at 2n+1 and the right child is at 2n+2, and represent an empty node by -1. Then just do element-wise subtraction of the arrays to create the diff. This solution fails if a tree has -1 as a node value, and I think there has to be a better and neater solution, but I'm not able to figure it out.
Think of them as directory trees and print a sorted list of the path to every leaf item
Tree 1 becomes:
4/2/10
4/2/22
4/3/5
4/3/8
These list formats can be diff'ed and the tree recreated from such a list.
There are many ways to do this.
I would suggest that you turn the tree into a sorted array of triples of (parent, child, direction). So start with tree1:
       4
     /   \
    3     2
   / \   / \
  5   8 10  22
This quickly becomes:
(None, 4, None) # top
(4, 3, L)
(3, 5, L)
(3, 8, R)
(4, 2, R)
(2, 10, L)
(2, 22, R)
Which you sort to get
(None, 4, None) # top
(2, 10, L)
(2, 22, R)
(3, 5, L)
(3, 8, R)
(4, 2, R)
(4, 3, L)
Do the same with the other, and then diff them.
Given a tree and the diff, you can first turn the tree into this form, look at the diff, realize which direction it is and get the desired representation with patch. You can then reconstruct the other tree recursively.
The reason why I would do it with this representation is that if the two trees share any subtrees in common - even if they are placed differently in the main tree - those will show up in common. And therefore you are likely to get relatively small diffs if the trees do, in fact, match in some interesting way.
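A small Python sketch of the triple representation follows. It makes two assumptions that are not in the answer: trees are written as nested (value, left, right) tuples, and the diff is taken to be the symmetric difference of the two triple sets, which is one simple way to get a diff that can be applied to either tree. As the answer assumes, values must not repeat within a tree.

```python
def leaf(value):
    """Shorthand for a node without children."""
    return (value, None, None)


def to_triples(tree, parent=None, direction=None):
    """Flatten a (value, left, right) tree into (parent, child, direction) triples."""
    if tree is None:
        return set()
    value, left, right = tree
    return ({(parent, value, direction)}
            | to_triples(left, value, "L")
            | to_triples(right, value, "R"))


def from_triples(triples):
    """Rebuild the nested-tuple tree; assumes values are unique in the tree."""
    children = {}
    root = None
    for parent, value, direction in triples:
        if parent is None:
            root = value
        else:
            children[(parent, direction)] = value

    def build(value):
        if value is None:
            return None
        return (value,
                build(children.get((value, "L"))),
                build(children.get((value, "R"))))

    return build(root)


tree1 = (4, (3, leaf(5), leaf(8)), (2, leaf(10), leaf(22)))
tree2 = (1, None, (4, leaf(11), leaf(12)))

# Symmetric difference: triples present in exactly one of the trees.
diff = to_triples(tree1) ^ to_triples(tree2)

# Applying the same diff to either tree yields the other one.
print(from_triples(to_triples(tree2) ^ diff) == tree1)  # True
print(from_triples(to_triples(tree1) ^ diff) == tree2)  # True
```

A subtree that appears in both trees under the same parent value contributes no triples to the diff, which is the property the answer is after.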
Edit
Per point from @ruakh, this does assume that values do not repeat in a tree. If they do, then you could do a representation like this:
       4
     /   \
    3     2
   / \   / \
  5   8 10  22
becomes
(, 4)
(0, 3)
(00, 5)
(01, 8)
(1, 2)
(10, 10)
(11, 22)
And now if you move subtrees, they will show up as large diffs. But if you just change one node, it will still be a small diff.
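The path-keyed variant can be sketched in Python as well. The details here are assumptions made for the example: trees are nested (value, left, right) tuples, a path is a string of "0" (left) and "1" (right), and the diff maps each differing path to the pair of values found there, with None standing for "no node". Node values themselves must therefore not be None.

```python
def leaf(value):
    """Shorthand for a node without children."""
    return (value, None, None)


def to_paths(tree, path=""):
    """Map each node's path ('0' = left, '1' = right) to its value."""
    if tree is None:
        return {}
    value, left, right = tree
    result = {path: value}
    result.update(to_paths(left, path + "0"))
    result.update(to_paths(right, path + "1"))
    return result


def create_diff(paths_a, paths_b):
    """Record (value_in_a, value_in_b) for every path where the trees differ."""
    diff = {}
    for path in paths_a.keys() | paths_b.keys():
        a, b = paths_a.get(path), paths_b.get(path)
        if a != b:
            diff[path] = (a, b)
    return diff


def apply_diff(paths, diff):
    """Turn either tree's path map into the other one's."""
    result = dict(paths)
    for path, (a, b) in diff.items():
        # Whichever side we currently hold, switch to the other side.
        other = b if paths.get(path) == a else a
        if other is None:
            result.pop(path, None)
        else:
            result[path] = other
    return result


tree1 = (4, (3, leaf(5), leaf(8)), (2, leaf(10), leaf(22)))
tree2 = (1, None, (4, leaf(11), leaf(12)))
p1, p2 = to_paths(tree1), to_paths(tree2)
diff = create_diff(p1, p2)
print(apply_diff(p1, diff) == p2, apply_diff(p2, diff) == p1)  # True True

# Changing a single node gives a single-entry diff.
tree3 = (4, (3, leaf(5), leaf(9)), (2, leaf(10), leaf(22)))
print(create_diff(p1, to_paths(tree3)))  # {'01': (8, 9)}
```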
(The example from the question (or interview) is not very helpful, as it does not show any shared sub-structure of non-trivial size. Or the interview question is outstanding for initiating a dialogue between customer and developer.)
Re-use of subtrees needs a representation allowing to identify such. It seems useful to be able to reconstruct the smaller tree without walking most of the difference. Denoting "definition" of identifiable sub-trees with capital letters and re-use by a tacked-on ':
d e d--------e
c b "-" c b => C B' C' b
b a a b a a B a a
a a a
(The problem statement does not say diff is linear.)
Things to note:
there's a sub-tree B occurring in two places of T1
in T2, there's another b with one leaf-child a that is not another occurrence of B
no attempt to share leaves
What if now I imagine (or the interviewer suggests) two huge trees, identical but for one node somewhere in the middle which has a different value?
Well, at least its sub-trees will be shared, and "the other sub-trees" all the way up to the root. Too bad if the trees are degenerated and almost all nodes are part of that path.
Huge trees with children of the root exchanged?
(Detecting trees occurring more than once has a chance to shine here.)
The bigger problem would seem to be the whole trees represented in "the diff", while the requirement may be
Given one tree, the diff shall support reconstruction of the other using little space and processing.
(It might include setting up the diff shall be cheap, too - which I'd immediately challenge: small diff looks related to editing distance.)
A way to identify "crucial nodes" in each tree is needed - btilly's suggestion of "left-right-string" is good as gold.
Then, one would need a way to keep differences in children & value.
That's the far end I'd expect an exchange in an interview to reach.
To detect re-used trees, I'd add the height to each internal node. For a proof of principle, I'd probably use an existing implementation of find repeated strings on a suitable serialisation.
There are many ways to think of a workable diff-structure.
Naive solution
One naive way is to store the two trees in a tuple. Then, when you need to regenerate a tree, given the other and the diff, you just look for a node that is different when comparing the given tree with the tree in the first tuple entry of the diff. If found you return that tree from the first tuple entry. If not found, you return the second one from the diff tuple.
Small diffs for small differences
An interviewer would probably ask for a less memory consuming alternative. One could try to think of a structure that will be small in size when there are only a few values or nodes different. In the extreme case where both trees are equal, such diff would be (near-)empty as well.
Definitions
I define these terms before defining the diff's structure:
Imagine the trees get extra NIL leaf nodes, i.e. an empty tree would consist of 1 NIL node. A tree with only a root node, would have two NIL nodes as its direct children, ...etc.
A node is common to both trees when it can be reached via the same path from the root (e.g. left-left-right), irrespective of whether they contain the same value or have the same children. A node can even be common when it is a NIL node in one or both of the trees (as defined above).
Common nodes (including NIL nodes when they are common) get a preorder sequence number (0, 1, 2, ...). Nodes that are not common are discarded during this numbering.
Diff structure
The difference could be a list of tuples, where each tuple has this information:
The above mentioned preorder sequence number, identifying a common node
A value: when neither node is a NIL node, this is the diff of the values (e.g. XOR). When one of the nodes is a NIL node, the value is the other node object (so effectively including the whole subtree below it). In typeless languages, either kind of information can fit in the same tuple position. In strongly typed languages, you would use an extra entry in the tuple (e.g. atomicValue, subtree), where only one of the two would have a significant value.
A tuple is only added for a common node, and only when the two nodes differ (in value, or because exactly one of them is a NIL node), so at least one of them is a non-NIL node.
Algorithm
The diff can be created via a preorder walk through the common nodes of the trees.
Here is an implementation in JavaScript:
class Node {
  constructor(value, left, right) {
    this.value = value;
    if (left) this.left = left;
    if (right) this.right = right;
  }
  clone() {
    return new Node(this.value, this.left ? this.left.clone() : undefined,
                    this.right ? this.right.clone() : undefined);
  }
}
// Main functions:
function createDiff(tree1, tree2) {
  let i = -1; // preorder sequence number
  function recur(node1, node2) {
    i++;
    if (!node1 !== !node2) return [[i, (node1 || node2).clone()]];
    if (!node1) return [];
    const result = [];
    if (node1.value !== node2.value) result.push([i, node1.value ^ node2.value]);
    return result.concat(recur(node1.left, node2.left), recur(node1.right, node2.right));
  }
  return recur(tree1, tree2);
}
function applyDiff(tree, diff) {
  let i = -1; // preorder sequence number
  let j = 0; // index in diff array
  function recur(node) {
    i++;
    let diffData = j >= diff.length || diff[j][0] !== i ? 0 : diff[j++][1];
    if (diffData instanceof Node) return node ? undefined : diffData.clone();
    return node && new Node(node.value ^ diffData, recur(node.left), recur(node.right));
  }
  return recur(tree);
}
// Create sample data:
let tree1 =
  new Node(4,
    new Node(3,
      new Node(5), new Node(8)
    ),
    new Node(2,
      new Node(10), new Node(22)
    )
  );
let tree2 =
  new Node(2,
    undefined,
    new Node(4,
      new Node(11), new Node(12)
    )
  );
// Demo:
let diff = createDiff(tree1, tree2);
console.log("Diff:");
console.log(diff);
const restoreTree2 = applyDiff(tree1, diff);
console.log("Is restored second tree equal to original?");
console.log(JSON.stringify(tree2)===JSON.stringify(restoreTree2));
const restoreTree1 = applyDiff(tree2, diff);
console.log("Is restored first tree equal to original?");
console.log(JSON.stringify(tree1)===JSON.stringify(restoreTree1));
const noDiff = createDiff(tree1, tree1);
console.log("Diff for two equal trees:");
console.log(noDiff);

recursively split line into smaller segments

I have a line. It starts with two indexes, call them 0 and 1, at the outermost points. At any point I can create a new point which bisects two other ones (there must not already be a point between them). However, when this happens the indexes need to increment. For example, here's a potential series of steps to achieve N=5, since there are five indexes in the result.
(graph)                               (split between)  (iteration #)
< ============================ >
0                              1      0,1              0
0               1              2      1,2              1
0               1       2      3      0,1              2
0       1       2       3      4
I have two questions:
What pseudocode could be used to find the "split between" values given the iteration number?
How could I prevent the shape from being unbalanced? Are there certain restrictions I should place on the value of N? I don't particularly care what order the splits happen in, but I do want to make sure the result is balanced.
This is an issue I've encountered when developing a video game.
I'm not sure if this is the kind of answer you are looking for, but I see this as a binary tree structure. Every tree node contains its own label and its left and right labels. The root of the tree (level 0) would be (2, 0, 1) (split 2 with 0 on the left and 1 on the right). Every node would be split into two children. The algorithm would go something like this:
At step N (N >= 3), pick the leftmost node without two children in level floor(log2(N - 1)) - 1, so that the new node lands in level floor(log2(N - 1)).
Take the node label T and the left and right labels L and R from that node.
If the node does not have a left child, add a left child node (N, L, T).
If the node already has a left child, add a right child node (N, T, R).
N <- N + 1
For example, at iteration 5 you would have something like this:
Level 0:              (2, 0, 1)
                       /     \
                      /       \
                     /         \
Level 1:     (3, 0, 2)       (4, 2, 1)
                /
               /
Level 2: (5, 0, 3)
Now, to reconstruct the current split, you would do the following:
Initialize a list S <- [0].
For every node (T, L, R) in the tree traversed in postorder:
If the node does not have a left child, append T to S.
If the node does not have a right child, append R to S.
For the previous case, you would have:
S = [0]
(5, 0, 3) -> S = [0, 5, 3]
(3, 0, 2) -> S = [0, 5, 3, 2]
(4, 2, 1) -> S = [0, 5, 3, 2, 4, 1]
(2, 0, 1) -> S = [0, 5, 3, 2, 4, 1]
So the complete split would be [0, 5, 3, 2, 4, 1]. The split would be perfectly balanced only when N = 2^k for some positive integer k. Of course, you can annotate the tree nodes with additional "distance" information if you need to keep track of something like that.
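Both procedures can be sketched in Python. One shortcut here is an assumption rather than part of the answer: because levels are filled left to right, the parent of label n is (n + 1) // 2 and odd labels are left children, so the tree can be kept in a plain dictionary instead of linked nodes. The function names are made up for the example, and the last label is assumed to be at least 2.

```python
def build_nodes(last_label):
    """Map each label 2..last_label to its (left, right) neighbour labels.

    Assumed shortcut: with levels filled left to right, the parent of
    label n is (n + 1) // 2, and odd labels are left children.
    """
    nodes = {2: (0, 1)}  # root (2, 0, 1)
    for label in range(3, last_label + 1):
        parent = (label + 1) // 2
        left, right = nodes[parent]
        # Left child is (N, L, T); right child is (N, T, R).
        nodes[label] = (left, parent) if label % 2 == 1 else (parent, right)
    return nodes


def current_split(last_label):
    """List the labels from left to right along the line."""
    nodes = build_nodes(last_label)
    order = [0]

    def visit(label):
        # Postorder: children first, then the node itself.
        if label not in nodes:
            return
        left_child, right_child = 2 * label - 1, 2 * label
        visit(left_child)
        visit(right_child)
        if left_child not in nodes:
            order.append(label)
        if right_child not in nodes:
            order.append(nodes[label][1])

    visit(2)
    return order


print(current_split(5))  # [0, 5, 3, 2, 4, 1]
print(current_split(4))  # [0, 3, 2, 4, 1]
```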
I agree with jdehesa that what you are doing has similarities with a binary tree. I would recommend looking into using that data structure if you can, since it is highly structured and well-defined, and many great algorithms exist for working with it.
Additionally, as mentioned in the comment section above, a linked list would also be a nice option, since you are adding in a lot of elements. A normal array (which is contiguous in memory) will require you to move many elements over and over again as you insert additional elements, which is slow. A linked list would allow you to add your element anywhere in memory, and then just update a few pointers in the linked list on both sides of where you want to insert it, and be done. No moving things around.
However, if you really just want to put together a working solution using an array and aren't concerned with using other data structures, here is the math for the indexing you requested:
Each pair can be listed as (a, b), and we can quickly see b = a + 1. Thus, if you find a, you know b. To get these, you'll need two loops:
iteration := 0
i := 0
while iteration < desired_iterations
    for j = (2 ^ i) - 1; j >= 0 && iteration < desired_iterations; j--
        print j, j + 1
        iteration++
    i++
Where ^ is the exponentiation operator. What we do is find the second-to-last element in the list, (2^i)-1, and count backwards, listing off the indices. We then increment "i" to signify that we've now doubled the number of segments, and then repeat. If at any point we reach our desired number of iterations, we break out of both loops because we're finished.
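As a quick check, the two loops translate to Python as shown below. The simulate helper is an addition for illustration only: it applies each split to positions on the interval [0, 1], inserting into a plain list, so the resulting layout can be inspected.

```python
def split_pairs(desired_iterations):
    """Return the (a, b) index pairs to split, one per iteration."""
    pairs = []
    i = 0
    while len(pairs) < desired_iterations:
        # Count backwards from the second-to-last index of this round.
        for j in range(2 ** i - 1, -1, -1):
            if len(pairs) == desired_iterations:
                break
            pairs.append((j, j + 1))
        i += 1
    return pairs


def simulate(desired_iterations):
    """Apply the splits to positions on [0, 1] (illustration only)."""
    points = [0.0, 1.0]
    for a, b in split_pairs(desired_iterations):
        # The new point takes index b; later points shift up by one.
        points.insert(b, (points[a] + points[b]) / 2)
    return points


print(split_pairs(3))  # [(0, 1), (1, 2), (0, 1)]
print(simulate(3))     # [0.0, 0.25, 0.5, 0.75, 1.0]
```

The points come out evenly spaced whenever a round has just been completed, that is, after 1, 3, 7, 15, ... iterations.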

Print all paths in a tree (Not just root to nodes)

So how would you print all paths in a tree. Here the condition is that we don't only want paths starting from the root or paths in the sub-tree.
For example:
      2
     / \
    8   10
   / \  /
  5   6 11
So the program should return:
2-8
2-10
2-8-5
2-8-6
8-5
8-6
2-10-11
10-11
5-8-2-10-11
5-8-2-10
and so on...
One approach is to find the LCA between every distinct pair of nodes and then print the path from the LCA to both nodes (reverse in the left subtree and in order in the right subtree). But the complexity here would be O(n^3). Is there a more efficient solution ?
If you are only interested in the result, not in the algorithm, create the nodes and relations in neo4j with
merge (n2:node{n:2})-[:down]->(n8:node{n:8})-[:down]->(:node{n:5})
merge (n2)-[:down]->(:node{n:10})-[:down]->(:node{n:11})
merge (n8)-[:down]->(:node{n:6})
then query
match p=(a)-[r:down *]-(b) return nodes(p)
Assuming your tree has distinct nodes, you can:
Create a map with an int as key and a vector as value. The key stands for each node you encounter, and the vector stores all the nodes that you traverse below that node.
Pass this map by value to each node. You can have a function like:
void printAllPaths(node *proot, map<int, vector<int> > m)
Whenever you encounter a new node n, do the following
a) For each k from set of keys
b) Add n to the value vector of k.
c) Print all keys followed by their value vectors.
d) Also insert new key as n into the map with empty vector as value.
Note: If your tree has duplicate nodes, a multimap will help you keep track. The C++ STL will serve you well in this case.
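For comparison, here is a Python sketch of a different technique from the map-based one above: treat the tree as an undirected graph and run a depth-first walk from every node. It assumes distinct, comparable node labels and an edge list as input (both are assumptions made for the example), and it reports each path once, starting from the endpoint with the smaller label.

```python
def all_paths(edges):
    """Return every simple path between two distinct nodes of a tree.

    edges is a list of (parent, child) pairs. Each path is listed once,
    starting from its endpoint with the smaller label.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    paths = []

    def extend(path, previous):
        node = path[-1]
        for neighbour in adjacency[node]:
            # A tree has no cycles, so it is enough not to step back.
            if neighbour == previous:
                continue
            new_path = path + [neighbour]
            if path[0] < neighbour:
                paths.append(new_path)
            extend(new_path, node)

    for start in adjacency:
        extend([start], None)
    return paths


edges = [(2, 8), (2, 10), (8, 5), (8, 6), (10, 11)]
paths = all_paths(edges)
print(len(paths))  # 15
for path in paths:
    print("-".join(str(label) for label in path))
```

No LCA computation is needed. The work is proportional to the total length of the paths generated, which cannot be avoided if every path has to be written out.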

How to find the set of trees every one of which spans over another given tree?

Imagine we are given a set of trees ST in which every vertex of every tree is labeled. Another tree T is also given (also with labeled vertices). The question is: how can I find which trees of ST can span over the tree T, starting from the root of T, in such a way that the labels of the vertices of the spanning tree T' coincide with the labels of the corresponding vertices of T? Note that the children of every vertex of T should be either completely covered or not covered at all; partial covering of children is not allowed. Stated in other words: take a tree and the following procedure: pick a vertex and remove all vertices and edges below this vertex (but not the vertex itself). Find those trees of ST that can be generated by applying a series of such procedures to T.
For example given the tree T
the trees
cover T and the tree
does not, because this tree has children 3, 5, unlike T, which has 2, 3 as children. The best I was able to think of was either to brute force it, or to first find the set of trees that have the same root label as T and then search for the answer among those trees, but I guess neither of those two approaches is optimal. I was thinking of somehow hashing the trees, but nothing came of it. Any thoughts?
Notes:
The trees are not necessarily binary
A tree T can cover another tree T' if they share a root
The tree is ordered meaning that you cannot swap the position of any two children.
TL;DR: Find an efficient algorithm which, when queried with a given tree T, finds all trees from a given (fixed/static) set ST that are able to cover T.
I'll sketch an answer and then provide some working source code.
First off, you need an algorithm to hash a tree. We can assume, without loss of generality, that the children of each of your tree's nodes are ordered from least to greatest (or vice versa).
Run this algorithm on every member of ST and save the hashes.
Now, take your test tree T and generate all of its subtrees TP that retain the original root. You can do this (perhaps inefficiently) by:
Making a set S of its nodes
Generating the power set P of S
Generating the subtrees by removing the nodes present in each member of P from copies of T
Adding those subtrees which retain the original root to TP.
Now generate a set of all of the hashes of TP.
Now check each of your ST hashes for membership in the set of TP hashes.
ST hash storage requires O(n) space in the size of ST, and possibly the space to hold the trees.
You can optimize the membership code so that it requires no storage space (I have not done this in my test code). The code will require approximately 2**N checks, where N is the number of nodes in T.
So the algorithm runs in O(H 2**N), where H is the size of ST and N is the number of nodes in T. The best way of speeding this up is to find an improved algorithm for generating the subtrees of T.
The following Python code accomplishes this:
#!/usr/bin/python
import itertools
import treelib
import Crypto.Hash.SHA
import copy
#Generate a hash of a tree by recursively hashing children
def HashTree(tree):
  digester=Crypto.Hash.SHA.new()
  digester.update(str(tree.get_node(tree.root).tag))
  children=tree.get_node(tree.root).fpointer
  children.sort(key=lambda x: tree.get_node(x).tag, cmp=lambda x,y:x-y)
  hash=False
  if children:
    for child in children:
      digester.update(HashTree(tree.subtree(child)))
    hash = "1"+digester.hexdigest()
  else:
    hash = "0"+digester.hexdigest()
  return hash
#Generate a power set of a set
def powerset(iterable):
  "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
  s = list(iterable)
  return itertools.chain.from_iterable(itertools.combinations(s, r) for r in range(len(s)+1))
#Generate all the subsets of a tree which still share the original root
#by using a power set of all the tree's nodes to remove nodes from the tree
def TreePowerSet(tree):
  nodes=[x.identifier for x in tree.nodes.values()]
  ret=[]
  for s in powerset(nodes):
    culled_tree=copy.deepcopy(tree)
    for n in s:
      try:
        culled_tree.remove_node(n)
      except:
        pass
    if len([x.identifier for x in culled_tree.nodes.values()])>0:
      ret.append(culled_tree)
  return ret
def main():
  ST=[]
  #Generate a member of ST
  treeA = treelib.Tree()
  treeA.create_node(1,1)
  treeA.create_node(2,2,parent=1)
  treeA.create_node(3,3,parent=1)
  ST.append(treeA)
  #Generate a member of ST
  treeB = treelib.Tree()
  treeB.create_node(1,1)
  treeB.create_node(2,2,parent=1)
  treeB.create_node(3,3,parent=1)
  treeB.create_node(4,4,parent=2)
  treeB.create_node(5,5,parent=2)
  ST.append(treeB)
  #Generate hashes for members of ST
  hashes=[(HashTree(tree), tree) for tree in ST]
  print hashes
  #Generate a test tree
  T=treelib.Tree()
  T.create_node(1,1)
  T.create_node(2,2,parent=1)
  T.create_node(3,3,parent=1)
  T.create_node(4,4,parent=2)
  T.create_node(5,5,parent=2)
  T.create_node(6,6,parent=3)
  T.create_node(7,7,parent=3)
  #Generate all the subtrees of this tree which still retain the original root
  Tsets=TreePowerSet(T)
  #Hash all of the subtrees
  Thashes=set([HashTree(x) for x in Tsets])
  #For each member of ST, check to see if that member is present in the test
  #tree
  for hash in hashes:
    if hash[0] in Thashes:
      print [x for x in hash[1].expand_tree()]
main()
To verify that one tree covers another, one must look at all vertices of the first tree at least once. It is trivial to verify that a tree covers another by looking at all vertices of the first tree exactly once. Thus the simplest possible algorithm is already optimal, if it's only needed to check one tree.
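A direct check of that kind might look like the following Python sketch. The representation is an assumption made for the example: a tree is a (label, children) pair with children as an ordered list, and a candidate vertex without children is read as "everything below it was removed".

```python
def covers(candidate, tree):
    """Return True if candidate can be obtained from tree by repeatedly
    picking a vertex and removing everything below it.

    Both arguments are (label, [children]) pairs with ordered children.
    """
    c_label, c_children = candidate
    t_label, t_children = tree
    if c_label != t_label:
        return False
    if not c_children:
        return True  # everything below this vertex was removed
    if len(c_children) != len(t_children):
        return False  # partial covering of children is not allowed
    return all(covers(c, t) for c, t in zip(c_children, t_children))


T = (1, [(2, [(4, []), (5, [])]), (3, [(6, []), (7, [])])])
tree_a = (1, [(2, []), (3, [])])
tree_b = (1, [(2, [(4, []), (5, [])]), (3, [])])
wrong_children = (1, [(3, []), (5, [])])
partial = (1, [(2, [])])

print(covers(tree_a, T), covers(tree_b, T))           # True True
print(covers(wrong_children, T), covers(partial, T))  # False False
```

Each vertex of the candidate is visited at most once, so checking every member of ST against T costs time proportional to the total size of ST.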
Everything below are untested fruits of my sick imagination.
If there are many possible T that must be checked against the same ST, then it's possible to store trees of ST as sets of facts like these
root = 1
children of node 1 = (2, 3)
children of node 2 = ()
children of node 3 = ()
These facts can be stored in a standard relational DB in two tables, "roots" (fields "tree" and "rootnode") and "branches" (fields "tree", "node" and "children"). Then an SQL query or a series of queries can be built to find matching trees quickly. My SQL-fu is rudimentary, so I could not manage it in a single query, but I believe it should be possible.
