Find n largest nodes in an arbitrary tree - algorithm

Given an arbitrary tree (not a binary tree) in which each node is labeled with an integer, how can I find the n largest nodes in the tree?
For example, if a tree contains {43, 253, 48, 62, 91, 641} and I ask for the 3 largest nodes, the algorithm should return <641, 253, 91>.
All C++ (or any language) standard library functions/data structures are allowed.
It is also allowed to add fields to the nodes, as long as it is constant space usage. For example, I can add a field to each node that points to its largest child, but I cannot have an ArrayList that stores all of its children in sorted order.
As a new programmer, I have spent days on this question. A simple graph search algorithm (BFS, DFS) would work and be easy to implement, but these are not fast enough because they all do an exhaustive search of the entire tree.
Can you please help me find a correct and fast(er) solution to this problem?

Since your tree is an arbitrary tree with no ordering property, examining a node does not yield any additional information about its child nodes. Therefore, it is not possible to implement an algorithm that produces the K highest values without an exhaustive search of the entire tree. In other words, you don't get better performance than what you'd get with an unordered array of arbitrary values.
To get K values in O(N * log K) time, maintain a priority queue of K elements as you traverse your arbitrary tree.
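A minimal sketch of the priority-queue idea, in Python. The `Node` class and the `k_largest` name are hypothetical (the question does not give a node type); the sketch assumes each node stores an integer `value` and a list of `children`.

```python
import heapq


class Node:
    """Hypothetical n-ary tree node: an integer value plus a list of children."""

    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []


def k_largest(root, k):
    """Return the k largest values in the tree, largest first."""
    if root is None or k <= 0:
        return []
    heap = []       # min-heap holding the k largest values seen so far
    stack = [root]  # explicit stack for an iterative depth-first traversal
    while stack:
        node = stack.pop()
        if len(heap) < k:
            heapq.heappush(heap, node.value)
        elif node.value > heap[0]:
            # The new value beats the smallest of the current top k.
            heapq.heapreplace(heap, node.value)
        stack.extend(node.children)
    return sorted(heap, reverse=True)


# The values from the question, arranged in an arbitrary shape.
tree = Node(43, [Node(253, [Node(48), Node(62)]), Node(91, [Node(641)])])
print(k_largest(tree, 3))
```

Every node is still visited once; the heap only keeps the extra work per node at O(log K).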

The given tree is arbitrary, with no special property, so to find even the single highest-valued node you need to search the entire tree. The complexity of that is O(N).
For the top K highest-valued nodes:
There is an O(N * log K) time solution based on a priority queue, as mentioned in dasblinkenlight's answer.
There is also an O(N) time solution using median-of-medians selection.

Related

range search complexity of R tree and R* tree

What is the range search complexity for the R-tree and R*-tree? I understand the process of range search: similar to a DFS, it visits each node, and if a node's bounding box intersects the target range, it includes the node in the result set. More precisely, we also need to consider the branch-and-bound strategy it uses: if a parent node doesn't intersect the target, then we don't visit its child nodes. So the complexity should be smaller than O(n), where n is the number of nodes. I really don't know how to calculate the number of nodes given the number of leaves (or data points).
Could anybody give me an explanation here? Thank you.
Obviously, the worst case must be at least O(n) if your range is [-∞;∞] in every dimension. It may be as bad as O(n log n) then because of the tree.
Assuming the answer is a single entry, the average case probably is O(log n) - only few paths through the tree need to be followed (if you have little enough overlap).
It is log to the base of your page size. So it will usually not exceed 5, because you never want trees with more than say 1000^5=10^15 objects.
For all practical purposes, assume the runtime complexity is simply the answer set size, O(s). Selecting 2% of your data takes twice as long as selecting 1%.

what difference between binary search and depth first search

Running binary search usually causes memory leak problems, although it is faster than linear search.
Of these two search methods, depth-first search and binary search, which is better suited to searching random numbers?
Depth-first search is the answer here. Because of the nature of binary search, it cannot search random numbers (in trees or elsewhere), only sorted numbers. In a typical binary search the middle value (or the root of the tree) is examined. If the target value is higher, the second half of the search domain is chosen; if it is lower, the first half. The search is then performed recursively on whichever half is chosen. For this reason, binary search will not work at all on a randomly ordered list of values. I will not go into the specifics of DFS since this question is answered. I'm sure there is a good wiki on it.
Binary search would not be the best option for searching random numbers.
Binary search is a search algorithm that finds an element by comparing the target value with the middle element (or root node) of the data structure. It must begin with a sorted data structure. If the target is lower than the midpoint, the lower half becomes the new search range and its midpoint is examined next. If the target is higher than the midpoint, the same process is performed on the upper half. This process is repeated until the value is found or the bounds cross.
Depth-first search is a search and traversal algorithm that visits nodes in a tree or graph data structure down a path as far as it can go before backtracking. It uses a stack to keep track of the adjacent nodes of the node being visited and continues to visit its neighbors until those have also been marked visited. After all the nodes on the path have been visited, the algorithm backtracks using the stack.
DFS would be the best option for searching random numbers because binary search requires the data to be sorted initially. If the numbers are not sorted, it defeats the purpose of the algorithm. DFS will be able to find a value in a structure of random numbers in O(V + E) time, where V is the number of vertices and E the number of edges.
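The contrast can be sketched in Python as follows. The function names and the dictionary-based graph representation are assumptions made for illustration, not something given in the question.

```python
def binary_search(sorted_values, target):
    """Return the index of target in sorted_values, or -1 if absent.

    Only correct when sorted_values is in ascending order.
    """
    lo, hi = 0, len(sorted_values) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_values[mid] == target:
            return mid
        if sorted_values[mid] < target:
            lo = mid + 1  # target can only be in the upper half
        else:
            hi = mid - 1  # target can only be in the lower half
    return -1             # the bounds crossed: not found


def dfs_contains(graph, start, target):
    """Depth-first search over graph (dict: node -> list of neighbours).

    Needs no ordering of the values; visits each vertex at most once.
    """
    stack, visited = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in visited:
            continue
        visited.add(node)
        stack.extend(graph.get(node, []))
    return False


unsorted_graph = {5: [9, 1], 9: [7], 1: [3]}
print(binary_search([1, 3, 5, 7, 9], 7))
print(dfs_contains(unsorted_graph, 5, 7))
```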

Find k-th smallest element data structure

I have a problem that requires designing a data structure that takes O(lg n) worst-case time for each of the following three operations:
a) Insertion: insert the key into the data structure only if it is not already there.
b) Deletion: delete the key if it is there.
c) Find k-th smallest: find the k-th smallest key in the data structure.
I am wondering if I should use a heap, but I still don't have a clear idea about it.
I can easily get the first two parts in O(lg n), or even faster, but I am not sure how to deal with part c).
If anyone has any ideas, please share.
Two solutions come to mind:
Use a balanced binary search tree (red-black, AVL, and so on; a splay tree would give only amortized bounds). You're already familiar with operations (a) and (b). For operation (c), just store an extra value at each node: the total number of nodes in that subtree. You can easily use this value to find the k-th smallest element in O(log(n)).
For example, say the subtree of root A has 10 nodes, the subtree of its left child B has 3 nodes, and the subtree of its right child C has 6 nodes (3 + 6 + 1 = 10). To find the 8th smallest element, you know you should go to the right side, because the left subtree and the root account for only the 4 smallest keys; there you look for the 4th smallest element (8 - 3 - 1 = 4).
Use a skip list. It also supports all of operations (a), (b), and (c) in O(log n) on average, but may take a bit longer to implement.
Well, if your data structure keeps the elements sorted, then it's easy to find the kth lowest element.
The worst-case cost of a Binary Search Tree for search and insertion is O(N) while the average-case cost is O(lgN).
Thus, I would recommend using a Red-Black Binary Search Tree which guarantees a worst-case complexity of O(lgN) for both search and insertion.
You can read more about red-black trees here and see an implementation of a Red-Black BST in Java here.
So in terms of finding the k-th smallest element using the above Red-Black BST implementation, you just need to call the select method, passing in the value of k. The select method also guarantees worst-case O(lgN).
One solution could be to use the partitioning strategy of quicksort.
Step 1: Pick the first element as the pivot and move it to its correct place (at most n comparisons).
Once the pivot is at its correct location, check the following:
Step 2.1: if location > k, your element resides in the first sublist, so you are not interested in the second sublist.
Step 2.2: if location < k, your element resides in the second sublist, so you are not interested in the first sublist.
Step 2.3: if location == k, you have found the element; break out of the loop/recursion.
Step 3: Repeat steps 1 to 2.3 on the appropriate sublist.
The average complexity of this solution is O(n) per query; with the first element as pivot, the worst case is O(n^2).
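The steps above can be sketched in Python as follows. This is an illustrative version only: the `quickselect` name and the choice of a Lomuto-style partition are assumptions, and the function works on a copy of the input rather than in place.

```python
def quickselect(values, k):
    """Return the k-th smallest element (1-based) of values."""
    if not 1 <= k <= len(values):
        raise ValueError("k out of range")
    items = list(values)            # work on a copy; the input is untouched
    lo, hi = 0, len(items) - 1
    while True:
        # Step 1: take the first element of the sublist as the pivot and
        # move it to its correct place.
        pivot = items[lo]
        boundary = lo               # last index holding a value < pivot
        for i in range(lo + 1, hi + 1):
            if items[i] < pivot:
                boundary += 1
                items[boundary], items[i] = items[i], items[boundary]
        items[lo], items[boundary] = items[boundary], items[lo]
        location = boundary + 1     # 1-based position of the pivot
        if location == k:           # Step 2.3: found it
            return items[boundary]
        if location > k:            # Step 2.1: continue in the first sublist
            hi = boundary - 1
        else:                       # Step 2.2: continue in the second sublist
            lo = boundary + 1


print(quickselect([43, 253, 48, 62, 91, 641], 3))
```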
Heap is not the right structure for finding the Kth smallest element of an array, simply because you would have to remove K-1 elements from the heap in order to get to the Kth element.
There is a much better approach to finding Kth smallest element, which relies on median-of-medians algorithm. Basically any partition algorithm would be good enough on average, but median-of-medians comes with the proof of worst-case O(N) time for finding the median. In general, this algorithm can be used to find any specific element, not only the median.
Here is the analysis and implementation of this algorithm in C#: Finding Kth Smallest Element in an Unsorted Array
P.S. On a related note, there are many things that you can do in place with arrays. The array is a wonderful data structure: if you know how to organize its elements for a particular situation, you can get results extremely fast and without additional memory use.
The heap structure is a very good example, and so is the QuickSort algorithm. And here is one really fun example of using arrays efficiently (this problem comes from programming olympiads): Finding a Majority Element in an Array

Check if 2 tree nodes are related (ancestor/descendant) in O(1) with pre-processing

Check if 2 tree nodes are related (i.e. ancestor-descendant)
solve it in O(1) time, with O(N) space (N = # of nodes)
pre-processing is allowed
That's it. My approach follows below. Please stop here if you want to think about it yourself first.
For pre-processing I decided to do a pre-order traversal (recursively visit the root first, then the children) and give a label to each node.
Let me explain the labels in detail. Each label consists of comma-separated natural numbers like "1,2,1,4,5"; the length of this sequence equals (the depth of the node + 1). E.g. the label of the root is "1", and the root's children have labels "1,1", "1,2", "1,3", etc. Next-level nodes have labels like "1,1,1", "1,1,2", ..., "1,2,1", "1,2,2", ...
Assume that "the order number" of a node is the 1-based index of this node in the children list of its parent.
Common rule: a node's label consists of its parent's label followed by a comma and "the order number" of the node.
Thus, to answer whether two nodes are related (i.e. ancestor-descendant) in O(1), I check whether the label of one of them is a prefix of the other's label. However, I'm not sure whether such labels can be considered to occupy O(N) space.
Any criticism with fixes, or an alternative approach, is welcome.
You can do it in O(n) preprocessing time, and O(n) space, with O(1) query time, if you store the preorder number and postorder number for each vertex and use this fact:
For two given nodes x and y of a tree T, x is an ancestor of y if and
only if x occurs before y in the preorder traversal of T and after y
in the post-order traversal.
(From this page: http://www.cs.arizona.edu/xiss/numbering.htm)
What you did in the worst case is Theta(d) where d is the depth of the higher node, and so is not O(1). Space is also not O(n).
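The preorder/postorder test from this answer can be sketched in Python as below. The dictionary representation of the tree and the names `number_tree`, `is_ancestor`, and `related` are assumptions made for illustration.

```python
_DONE = object()  # sentinel marking an exhausted child iterator


def number_tree(children, root):
    """Assign preorder and postorder numbers in O(n) time and space.

    children: dict mapping a node to the list of its child nodes
    (a hypothetical representation; leaves may be missing from the dict).
    """
    pre, post = {}, {}
    pre_counter = post_counter = 0
    pre[root] = pre_counter
    pre_counter += 1
    stack = [(root, iter(children.get(root, [])))]
    while stack:
        node, remaining = stack[-1]
        child = next(remaining, _DONE)
        if child is _DONE:
            # All children handled: the node is finished in postorder.
            post[node] = post_counter
            post_counter += 1
            stack.pop()
        else:
            pre[child] = pre_counter
            pre_counter += 1
            stack.append((child, iter(children.get(child, []))))
    return pre, post


def is_ancestor(pre, post, x, y):
    """True if x is a proper ancestor of y; two comparisons, so O(1)."""
    return pre[x] < pre[y] and post[x] > post[y]


def related(pre, post, x, y):
    return is_ancestor(pre, post, x, y) or is_ancestor(pre, post, y, x)


tree = {"a": ["b", "c"], "b": ["d", "e"], "c": ["f"]}
pre, post = number_tree(tree, "a")
print(related(pre, post, "b", "e"), related(pre, post, "b", "f"))
```

Each node stores just two integers, which keeps the space at O(n) regardless of the depth or branching factor.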
If you consider a tree where one node has n/2 children (say), the running time of setting the labels will be as high as O(n*n). So this labeling scheme won't work.
There are linear-time lowest common ancestor (LCA) algorithms (at least offline ones). For instance have a look here. You can also have a look at Tarjan's offline LCA algorithm. Please note that these articles require that you know in advance the pairs for which you will be computing the LCA. I think there are also online algorithms with linear precomputation time, but they are very complex. For instance, there is an algorithm with linear precomputation time for the range minimum query problem. As far as I remember, this solution passed through the LCA problem twice. The problem with the algorithm is that it has such a large constant that it requires enormous input to actually be faster than the O(n*log(n)) algorithm.
There is a much simpler approach that requires O(n*log(n)) additional memory and again answers in constant time.
Hope this helps.

Median of BST in O(logn) time complexity

I came across the solution given at http://discuss.joelonsoftware.com/default.asp?interview.11.780597.8, which uses Morris in-order traversal to find the median in O(n) time.
But is it possible to achieve the same in O(logn) time? The same has been asked here - http://www.careercup.com/question?id=192816
If you also maintain the count of the number of left and right descendants of a node, you can do it in O(logN) time, by doing a search for the median position. In fact, you can find the kth largest element in O(logn) time.
Of course, this assumes that the tree is balanced. Maintaining the count does not change the insert/delete complexity.
If the tree is not balanced, then you have Omega(n) worst case complexity.
See: Order Statistic Tree.
btw, BigO and Smallo are very different (your title says Smallo).
Unless you guarantee some sort of balanced tree, it's not possible.
Consider a tree that's completely degenerate -- e.g., every left pointer is NULL (nil, whatever), so each node only has a right child (i.e., for all practical purposes the "tree" is really a singly linked list).
In this case, just accessing the median node (at all) takes linear time -- even if you started out knowing that node N was the median, it would still take N steps to get to that node.
We can find the median by using a rabbit pointer and a turtle pointer. The rabbit moves twice as fast as the turtle in the in-order traversal of the BST. This way, when the rabbit reaches the end of the traversal, the turtle is at the median of the BST.
Please see the full explanation.
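A rough sketch of the rabbit-and-turtle idea in Python, using two independent in-order generators. The `Node` class and function names are hypothetical, and the sketch returns the lower median when the number of keys is even. Note that this runs in O(n) time, not O(log n); it only avoids counting the nodes first.

```python
_END = object()  # sentinel returned when a traversal is exhausted


class Node:
    """Plain BST node (hypothetical; no subtree counts are stored)."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def inorder(node):
    """Yield the keys of the BST in ascending order, iteratively."""
    stack = []
    while stack or node:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node.key
        node = node.right


def median(root):
    """Return the (lower) median key, or None for an empty tree."""
    turtle = inorder(root)
    rabbit = inorder(root)
    result = None
    while True:
        # The rabbit takes two steps for every step of the turtle.
        if next(rabbit, _END) is _END:
            return result
        result = next(turtle)
        if next(rabbit, _END) is _END:
            return result


bst = Node(50, Node(30, Node(20), Node(40)), Node(70))
print(median(bst))
```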

Resources