Recommmend a proper data structure - data-structures

I have an algorithm problem that needs binary tree structure similar to a binary tree. but the difference is that it may have nodes apart from the original tree independently.
And each node has three types. The first type is to point out starting node and only one exists. The second type is to point out connecting node and of course, and the last type is to point out a leaf node. Each edge has a cost to traverse to its bottom node.
Which data structure is good for me to cost to reach each node?
UPDATE
OK, I questioned this with data-structure tag so that I want to avoid to explain what the problem is. But inevitably, I explain about the problem because of lack of my explaination and my poor English.
I have nodes lists and edges with costs. There is a starting node(root node), nodes where will be located in the middle of a tree and leaf nodes are the destination for my program to traverse starting from a root node. But some of the leaf nodes may be ignored depending on the value in it. It is not important anyway. I have to calculate all leaf nodes' cost to reach its node from the root node and get the maximum value for them. Now, The problem is to adjust the cost value in edges for all other leaf nodes to have the same total cost with the maximum cost. But the sum of the adjust values has to be the minumum.

Related

How to check if two binary trees share a node

Given an array of binary trees find whether any two trees share a node, not value wise, but "pointer" wise. At the bottom I provided an example.
My approach was to iterate through all the trees and store all the leaves (pointers) from each tree into a list, then check if list has any duplicates, but that's a rather slow approach. Is there perhaps a quicker way to solve this?
In the worst case you will have to traverse all nodes (all pointers) to find a shared node (pointer), as it might happen to be the last one visited. So the best time complexity we can expect to have is O(𝑚+𝑛) where 𝑚 and 𝑛 represent the number of nodes in either tree.
We can achieve this time complexity if we store the pointers from the first tree in a hash set and then traverse the pointers of the second tree to see if any of those is in the set. Assuming that get/set operations on a hash set have an amortized constant time complexity, the overal time complexity will be O(𝑚+𝑛).
If the same program is responsible for constructing the trees, then a reuse of the same node can be detected upon insertion. For instance, reuse of the same node in multiple trees can be completely avoided by having the insert method of your tree only take a value as argument, never a node instance. The method will then encapsulate the actual creation of the node, guaranteeing its uniqueness.
An idea for O(#nodes) time and O(1) space. It does more traversal work than simple traversals using a hash table, but it doesn't have the cost of using a hash table. I don't know what's better. Might depend on the language.
For two trees
Create one extra node. Do a Morris traversal of the first tree. It only modifies right child pointers, so we can use left child pointers for marking nodes as seen. For every tree node without left child, set our extra node as left child. Whenever checking a left child pointer, treat our extra node like a null pointer, i.e., don't visit it. After the traversal, the tree structure is restored, and all originally left-child-less tree nodes now point to our extra node as left child. That includes all leaf nodes.
Do a Morris traversal of the second tree. Again treat pointers to our extra node like null pointers. If we ever do encounter our extra node, we know the trees share a node. If not, then we know the trees don't share a node, since if they did share any, they'd also share a leaf node (just go down from any shared node to a leaf node, that's also shared), and all leafs nodes of the first tree are marked. After the traversal, the second tree is restored.
Do a Morris traversal of the first tree again, this time removing our extra node, restoring the original null pointers.
For an array of more than two trees
Mark the first tree as above. Check the second tree as above. Mark the second tree. Check the third. Mark the third. Check the fourth. Mark the fourth. Etc. When you found a shared node or there are no more trees, unmark the marked trees.
Every shared node must have two parents, or an ancestor with two parents.
LOOP over nodes
IF node has two parents
MARK node as shared
Mark all descendants as shared.

What will be the most optimized position of a node in a binary tree with given specifications?

Suppose I have a binary tree in which a node can have either 0,1 or 2 children. A cost value is associated with each node, and it can be {5,10,20,40}. The most optimal placement of a new node is under a node with same or lower cost value. For example- a new node with cost value 20 is best placed under a node with cost value 20, but can also be placed under nodes with cost values 5 and 10.
Primary requirement of this algorithm is to complete the left and right child of a node if it is required, i.e. if a node with cost value 10 has a left child with cost value 10, then a new node having cost value 10 will be made the right child of the above node . The secondary requirement is to maximize the overall depth of the tree.
The tree cannot be rearranged at any point of time. If an incoming node is of lesser value, then there is no penalty involved.
Given the above requirements, how can we decide the best position of an incoming new node in the tree ? Can we write a general algorithm for it ?
Initially, I thought to complete each level of the tree first, but I don't think it would be optimal.
The secondary requirement is to maximize the overall depth of the tree.
That's a bit unusual.
The quickest way:
sort your input values
fill all the minimal value nodes (5's) in respect with the first requirement (still unclear if both left-right nodes must be filled in before going down a level. If it must then the max depth will be log2(N5) If "going deep on left" is allowed without filling in the right, then the max depth tree will degenerate in list with all right nodes to null).Call this the master tree
make a tree from the next values (say 10-value nodes) and attach this tree to the deepest branch of the master tree
repeat step 3 as necessary
Note: this is the simplest concept, the implementation may take advantage from the fact the master tree is sorted at all time and get over with the initial sort.

Find a loop in a binary tree

How to find a loop in a binary tree? I am looking for a solution other than marking the visited nodes as visited or doing a address hashing. Any ideas?
Suppose you have a binary tree but you don't trust it and you think it might be a graph, the general case will dictate to remember the visited nodes. It is, somewhat, the same algorithm to construct a minimum spanning tree from a graph and this means the space and time complexity will be an issue.
Another approach would be to consider the data you save in the tree. Consider you have numbers of hashes so you can compare.
A pseudocode would test for this conditions:
Every node would have to have a maximum of 2 children and 1 parent (max 3 connections). More then 3 connections => not a binary tree.
The parent must not be a child.
If a node has two children, then the left child has a smaller value than the parent and the right child has a bigger value. So considering this, if a leaf, or inner node has as a child some node on a higher level (like parent's parent) you can determine a loop based on the values. If a child is a right node then it's value must be bigger then it's parent but if that child forms a loop, it means he is from the left part or the right part of the parent.
3.a. So if it is from the left part then it's value is smaller than it's sibling. So => not a binary tree. The idea is somewhat the same for the other part.
Testing aside, in what form is the tree that you want to test? Remeber that every node has a pointer to it's parent. An this pointer points to a single parent. So depending of the format you tree is in, you can take advantage from this.
As mentioned already: A tree does not (by definition) contain cycles (loops).
To test if your directed graph contains cycles (references to nodes already added to the tree) you can iterate trough the tree and add each node to a visited-list (or the hash of it if you rather prefer) and check each new node if it is in the list.
Plenty of algorithms for cycle-detection in graphs are just a google-search away.

Binary tree visit: get from one leaf to another leaf

Problem: I have a binary tree, all leaves are numbered (from left to right, starting from 0) and no connection exists between them.
I want an algorithm that, given two indices (of 2 distinct leaves), visits the tree starting from the greater leaf (the one with the higher index) and gets to the lower one.
The internal nodes of the tree do not contain any useful information.
I should chose the path based only on the leaves indices. The path start from a leaf and terminates on a leaf, and of course I can access a leaf if I know its index (through an array of pointers)
The tree is static, no insertion or deletion of nodes is allowed.
I have developed an algorithm to do it but it really sucks... any ideas?
One option would be to find the least common ancestor of the two nodes, along with the sequence of nodes you should take from each node to get to that ancestor. Here's a sketch of the algorithm:
Starting from each node, walk back up to that node's parent until you reach the root. Count the number of nodes on the path from each node to the root. Let the height of the first node be h1 and the height of the second node be h2.
Let h = min(h1, h2). This is the height of the higher of the two nodes.
Starting from each node, keep following the node's parent pointer until both nodes are at height h. Record the nodes you followed during this step. At this point, both nodes are at the same height.
Until you find a common node, keep marching upwards from each node to its parent. Eventually you will hit their common ancestor. At this point, follow the path from the first node up to this ancestor, then down the path from the ancestor down to the second node.
In the worst case, this takes O(h) time and O(h) space, where h is the height of the tree. For a balanced binary tree is this O(lg n) time and space, which is quite good.
If you're interested in a Much More Hardcore version of this algorithm, consider looking into Tarjan's Least Common Ancestors algorithm, which with linear preprocessing time, can be used to find the least common ancestor much more rapidly than this.
Hope this helps!
Distance between any two nodes can be calculated with the help of lowest common ancestor:
Dist(n1, n2) = Dist(root, n1) + Dist(root, n2) - 2*Dist(root, lca)
where lca is lowest common ancestor.
see this for more help about this algorithm and see this video for learning how to calculate lca.

Most efficient way to visit nodes of a DAG in order

I have a large (100,000+ nodes) Directed Acyclic Graph (DAG) and would like to run a "visitor" type function on each node in order, where order is defined by the arrows in the graph. i.e. all parents of a node are guaranteed to be visited before the node itself.
If two nodes do not refer to each other directly or indirectly, then I don't care which order they are visited in.
What's the most efficient algorithm to do this?
You would have to perform a topological sort on the nodes, and visit the nodes in the resulting order.
The complexity of such algorithm is O(|V|+|E|) which is quite good. You want to traverse all nodes, so if you would want a faster algorithm than that, you would have to solve it without even looking at all edges, which would be dangerous, because one single edge could havoc the order completely.
There are some answers here:
Good graph traversal algorithm
and here:
http://en.wikipedia.org/wiki/Topological_sorting
In general, after visiting a node, you should visit its related nodes, but only the nodes that are not already visited. In order to keep track of the visited nodes, you need to keep the IDs of the nodes in a set (or map), or you can mark the node as visited (somehow).
If you care about the topological order, you must first get hold of a collection of all the un-traversed links ("remaining links") to a node, sorted by the id of the referenced node (typically: map(node-ID -> link-count)). If you haven't got that, you might need to build it using an approach similar to the one above. Then, start by visiting a node whose remaining incoming link count is zero. For each link from that node, reduce the remaining link count for each related node, adding the related node to the set of nodes-to-visit (or just visiting the node) if the count reaches zero.
As mentioned in the other answers, this problem can be solved by Topological Sorting.
A very simple algorithm for that (not the most efficient):
Keep an array (or map) indegree[] where indegree[node]=number of incoming edges of node
while there is at least one node n with indegree[n]=0:
for each node n in nodes where indegree[n]>0:
visit(n)
indegree[n]=-1 # mark n as visited
for each node x adjacent to n:
indegree[x]=indegree[x]-1 # its parent has been visited, so one less edge coming into it
You can traverse a DAG in O(N) (without any topsort) by just running your dfs from every node with zero indegree, because those will be the valid "starting point". This will work because graph has no cycles, those zero indegree nodes must exist, and must traverse the whole graph.

Resources