Algorithm to find the n biggest nodes in a tree

Let's assume we have a tree whose nodes hold numbers. I need to find the n biggest numbers in this tree.
I have two algorithms in mind:
1. Using BFS or DFS, iterate over the tree, put its nodes in an array, sort the array (using quicksort, for example) and return the first n elements. The time complexity of this method is O(|V| + |E| + |V| log |V|) and the space complexity is O(|V|).
2. Iterate over the tree n times, each time finding and marking the maximum unmarked element. The time complexity is O(n * (|V| + |E|)) and the space complexity is O(|V|) too.
Which solution is better? Or am I on the wrong track and there is a much better solution?

And a standard heap selection algorithm won't work?
The basic algorithm is (assuming that k is the number of items you want to select):
create an empty min-heap
for each node (depth-first search)
    if heap.count < k
        heap.Add(node)
    else if node.Value > heap.Peek().Value
        heap.RemoveSmallest()
        heap.Add(node)
When the for loop is done, your heap contains the k largest values. You can obtain them in ascending order with:
while heap.count > 0
output (heap.RemoveSmallest().Value)
If you want them in descending order, remove them from the heap as above into an array, and then reverse the array.
This algorithm is O(n log k), where n is the number of nodes in the tree, and k is the number of items you want.
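The pseudocode above can be sketched in Python with the standard heapq module (a min-heap). The Node class and the example tree are illustrative assumptions, not part of the original question:

```python
import heapq

class Node:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

def k_largest(root, k):
    """Return the k largest values in the tree, in descending order."""
    heap = []             # min-heap holding the k largest values seen so far
    stack = [root]        # iterative depth-first traversal
    while stack:
        node = stack.pop()
        if len(heap) < k:
            heapq.heappush(heap, node.value)
        elif node.value > heap[0]:   # bigger than the smallest kept value
            heapq.heapreplace(heap, node.value)
        stack.extend(node.children)
    return sorted(heap, reverse=True)

tree = Node(5, [Node(9, [Node(1), Node(12)]), Node(3, [Node(7)])])
print(k_largest(tree, 3))  # [12, 9, 7]
```

Note that heapreplace pops the smallest element and pushes the new one in a single O(log k) operation, which is exactly the RemoveSmallest/Add pair from the pseudocode.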


How to find the kth minimum element from a doubly linked list?

I need to implement a function that finds the kth minimum element of a doubly linked list.
I searched the internet and came across quickselect and the k-th order statistic algorithm. These would be effective for an array or a vector, but here I am using a linked list, where I do not know the size up front, so it is hard to divide the elements into groups of 5.
My test case looks like this:
for (int i = 0; i < 1000; ++i)
{
    // create linked list with 1000 elements
    int kthMinimum = findKthMin(LinkedList, i);
    // validate kthMinimum answer
}
Here the linked list can be in any order; we have to assume it is randomized.
Any idea or suggestion to find kth minimum from doubly linked list in efficient time?
Thanks
Algorithm
You can maintain a heap of size k by doing the following:
Fill an array with the first k elements of the list.
Heapify the array (using a max-heap).
Process the remaining elements of the list:
If the top of the heap (the max) is greater than the current element e of the list, replace it with e (and restore the heap invariant).
If the current element is greater, just ignore it and carry on.
At the end of the algorithm, the k-th smallest element will be at the top of the heap.
Complexity
Accumulating the first k elements and heapifying the array: O(k).
Processing the remaining part of the list: O((n - k) log k).
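A sketch of this heap-of-size-k approach in Python. heapq only provides a min-heap, so values are negated to simulate the max-heap; a deque stands in for the doubly linked list, since the algorithm only needs a single forward traversal:

```python
import heapq
from collections import deque

def kth_smallest(iterable, k):
    """Find the k-th smallest element of any iterable (e.g. a linked list,
    traversed front to back) in O(n log k) time and O(k) extra space."""
    it = iter(iterable)
    # heapq is a min-heap, so store negated values to simulate a max-heap
    heap = [-v for _, v in zip(range(k), it)]
    heapq.heapify(heap)                 # O(k)
    for v in it:                        # O((n - k) log k)
        if v < -heap[0]:                # smaller than the current k-th smallest
            heapq.heapreplace(heap, -v)
    return -heap[0]                     # max of the k smallest elements

linked_list = deque([7, 2, 9, 4, 1, 8])  # stand-in for the doubly linked list
print(kth_smallest(linked_list, 3))      # 4
```

Because only sequential access is used, this works on a singly or doubly linked list equally well, and never needs the list's size in advance.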
If the list is doubly linked, you can run the QuickSort algorithm on it. In my experience QuickSort is the fastest sorting algorithm (measured by generating random lists and pitting it against HeapSort and MergeSort). After that, simply walk the list k positions to get your k-th smallest element.
QuickSort's average time is O(n log n); walking the list is O(k), which in the worst case is O(n). So the total time is O(n log n).

Finding the kth smallest element in a sequence where duplicates are compressed?

I've been asked to write a program to find the kth order statistic of a data set consisting of characters and their occurrences. For example, I have a data set consisting of
B, A, C, A, B, C, A, D
Here I have A with 3 occurrences, B with 2 occurrences, C with 2 occurrences and D with one occurrence. They can be grouped into pairs (character, number of occurrences), so, for example, we could represent the above sequence as
(A,3), (B,2), (C,2) and (D,1).
Assuming that k is the number of these pairs, I am asked to find the kth order statistic of the data set in O(n), where n is the number of pairs.
I thought I could sort the elements based on their number of occurrences and find the kth smallest element, but that won't work within the time bounds. Can I please have some help with an algorithm for this problem?
Assuming that you have access to a linear-time selection algorithm, here's a simple divide-and-conquer algorithm for solving the problem. I'm going to let k denote the total number of pairs and m be the index you're looking for.
If there's just one pair, return the key in that pair.
Otherwise:
Using a linear-time selection algorithm, find the median element. Let medFreq be its frequency.
Sum up the frequencies of the elements less than the median. Call this less. Note that the number of elements less than or equal to the median is less + medFreq.
If less < m ≤ less + medFreq, return the key in the median element.
Otherwise, if m ≤ less, recursively search for the mth element in the first half of the array.
Otherwise (m > less + medFreq), recursively search for the (m - less - medFreq)th element in the second half of the array.
The key insight here is that each iteration of this algorithm tosses out half of the pairs, so each recursive call is on an array half as large as the original array. This gives us the following recurrence relation:
T(k) = T(k / 2) + O(k)
Using the Master Theorem, this solves to O(k).
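A sketch of this divide-and-conquer in Python. For brevity it uses a random pivot (expected O(k)) in place of a true linear-time selection such as median-of-medians, which would give the worst-case O(k) bound described above; the function name and 1-indexed m are my own conventions:

```python
import random

def kth_by_weight(pairs, m):
    """Key of the m-th smallest element (1-indexed) of the expanded
    sequence, given (key, count) pairs. Expected O(k) time with a random
    pivot; a median-of-medians pivot would make it worst-case O(k)."""
    if len(pairs) == 1:
        return pairs[0][0]
    pivot_key, _ = random.choice(pairs)
    smaller = [p for p in pairs if p[0] < pivot_key]
    equal   = [p for p in pairs if p[0] == pivot_key]
    larger  = [p for p in pairs if p[0] > pivot_key]
    less = sum(c for _, c in smaller)      # occurrences below the pivot key
    med_freq = sum(c for _, c in equal)    # occurrences of the pivot key
    if m <= less:
        return kth_by_weight(smaller, m)
    if m <= less + med_freq:
        return pivot_key
    return kth_by_weight(larger, m - less - med_freq)

pairs = [('A', 3), ('B', 2), ('C', 2), ('D', 1)]
# expanded sorted sequence: A A A B B C C D
print(kth_by_weight(pairs, 5))  # 'B'
```

The key point carries over unchanged: each recursive call discards the pairs on one side of the pivot, while the counts (not the pairs themselves) decide which side to recurse into.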

Binary Tree array list representation

I have been doing some research on binary trees and their array-list representation. I am struggling to understand why the worst-case space complexity is O(2^n). Specifically, the book states that the space usage is O(N) (N = array size), which is O(2^n) in the worst case. I would have thought it would be 2n in the worst case, since each node has two children (indexes), not O(2^n) (where n = number of elements).
As an example, if I had a binary tree with 7 nodes, then the space would be 2n = 14, not 2^7 = 128.
This is the heap implementation on an array, where
A[1..n]
left_child(i) = A[2*i]
right_child(i) = A[2*i+1]
parent(i) = A[floor(i/2)]
Now, coming to space, think intuitively:
when you insert first element n=1, location=A[1], similarly,
n=2 #A[2] left_child(1)
n=3 #A[3] right_child(1)
n=4 #A[4] left_child(2)
n=5 #A[5] right_child(2)
You see, the nth element will go into A[n]. So the space complexity is O(n).
When you code it, you just plug in the element to be inserted at the end, say at A[n+1], and say that it's a child of floor((n+1)/2).
Refer: http://en.wikipedia.org/wiki/Binary_heap#Heap_implementation
A heap is a nearly complete tree, so the total number of elements in the tree satisfies 2^h - 1 < n <= 2^(h+1) - 1, where h is the height, and this is the length of array you will need.
The worst-case space complexity of a binary tree is O(n) (not O(2^n) as in your question), but using an array to represent a binary tree can save the space of pointers if it is a nearly complete binary tree.
See http://en.wikipedia.org/wiki/Binary_tree#Arrays
I think this refers to storing arbitrary binary trees in an array representation, which is normally used for complete or nearly complete binary trees, notably in the implementation of heaps.
In this representation, the root is stored at index 0 in the array, and for any node with index n, its left and right children are stored at indices 2n+1 and 2n+2, respectively.
If you have a degenerate tree where no nodes have any right children (the tree is effectively a linked list), then the first items will be stored at indices 0, 1, 3, 7, 15, 31, .... In general, the nth item of this list (counting from 0) will be stored at index 2^n - 1, so in this case the array representation requires Θ(2^n) space.
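A tiny Python check of that index pattern for a degenerate, left-only chain (the function name is my own):

```python
def degenerate_indices(n):
    """Array indices occupied by a left-only chain of n nodes when
    children sit at 2*i+1 and 2*i+2: the depth-d node lands at 2**d - 1,
    so the array length grows exponentially in the chain length."""
    idx, out = 0, []
    for _ in range(n):
        out.append(idx)
        idx = 2 * idx + 1   # follow the left-child link
    return out

print(degenerate_indices(6))  # [0, 1, 3, 7, 15, 31]
```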

Find algorithm in 2-3 BST tree

Is there an algorithm that, given a 2-3 tree T and a pointer to some node v in said tree, can change the key of node v so that T remains a legal 2-3 tree, in O(log n / log log n) amortized time?
No.
Assume it were possible with some algorithm f; we will show that we could then sort an array with O(n log n / log log n) time complexity.
To sort an array A of length n:
(1) Create a 2-3 tree of size n, with no importance to the keys. Let it be T.
(2) Store pointers to all nodes of T in a second array B.
(3) For each i from 0 to n-1:
(3.1) f(B[i], A[i]) // modify the tree: pointer B[i], new value A[i]
(4) Extract the elements from T back into A in order.
correctness:
After each invocation of f the tree is legal. After invoking f on all elements of T with all elements of A, the tree is legal and contains all the elements. Thus, extracting the elements from T in order, we get back the sorted array.
complexity:
(1) Creating a tree is O(n); it doesn't matter which keys we put in, so we can set them all to 0.
(2) Iterating over T and creating B is O(n).
(3) Each invocation of f is O(log n / log log n), thus invoking it n times is O(n log n / log log n).
(4) Extracting the elements is just a traversal: O(n).
Thus the total complexity is O(n log n / log log n).
But sorting is an Ω(n log n) problem for comparison-based algorithms, and n log n / log log n = o(n log n). Contradiction.
Conclusion: the desired f doesn't exist.

How to find k nearest neighbors to the median of n distinct numbers in O(n) time?

I can use the median of medians selection algorithm to find the median in O(n). Also, I know that after the algorithm is done, all the elements to the left of the median are less that the median and all the elements to the right are greater than the median. But how do I find the k nearest neighbors to the median in O(n) time?
If the median is m, the numbers to the left are less than m and the numbers to the right are greater than m.
However, the array is not sorted in the left or the right sides. The numbers are any set of distinct numbers given by the user.
The problem is from Introduction to Algorithms by Cormen et al. (problem 9.3-7).
No one seems to quite have this. Here's how to do it. First, find the median as described above; this is O(n). Now park the median at the end of the array, and replace every other element by its absolute difference from the median. Now find element k of the array (not including the last element), using the quickselect algorithm again. This not only finds element k (in order), it also leaves the array so that the k lowest differences are at the beginning of the array. Those positions hold the k elements closest to the median, once you map them back to their original values.
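A Python sketch of this two-pass quickselect idea. The partition scheme and helper names are illustrative; and rather than modifying the array in place, absolute differences are paired with the original values so no "adding the median back" step is needed:

```python
import random

def quickselect(a, k):
    """Rearrange list a in place so that a[:k] holds its k smallest
    elements (unordered); expected O(n) time with a random pivot."""
    lo, hi = 0, len(a) - 1
    while True:
        pivot = a[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:                    # Hoare-style partition
            while a[i] < pivot: i += 1
            while a[j] > pivot: j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1; j -= 1
        if k - 1 <= j:   hi = j           # k-th smallest is in the left part
        elif k - 1 >= i: lo = i           # ... in the right part
        else:            return a[k - 1]  # a[k-1] is in its final position

def k_nearest_to_median(a, k):
    """Values of a closest in value to the median, in expected O(n)."""
    b = list(a)
    median = quickselect(b, (len(b) + 1) // 2)   # lower median for even n
    diffs = [(abs(x - median), x) for x in a]    # pair distance with value
    quickselect(diffs, k + 1)   # the k+1 smallest include the median itself
    return [x for _, x in diffs[:k + 1] if x != median]

print(sorted(k_nearest_to_median([1, 2, 3, 4, 5, 10, 20, 30, 40], 2)))  # [3, 4]
```

Selecting k+1 differences and dropping the median (its difference is 0, hence always selected) leaves exactly the k nearest neighbours, in no particular order.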
The median-of-medians step probably doesn't help much in finding the nearest neighbours, at least for large n. True, you have each column of 5 partitioned around its median, but this isn't enough ordering information to solve the problem.
I'd just treat the median as an intermediate result, and treat the nearest neighbours as a priority queue problem...
Once you have the median from the median-of-medians, keep a note of its value.
Run the heapify algorithm on all your data - see Wikipedia - Binary Heap. In comparisons, base the result on the difference relative to that saved median value: the highest-priority items are those with the lowest abs(value - median). This takes O(n).
The first item in the array is now the median (or a duplicate of it), and the array has heap structure. Use the heap extract algorithm to pull out as many nearest-neighbours as you need. This is O(k log n) for k nearest neighbours.
As long as k is a constant, you get O(n) for the median of medians, O(n) for heapify and O(k log n) = O(log n) for extracting, giving O(n) overall.
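A Python sketch of this heap-based variant. statistics.median_low is a stand-in for a true O(n) selection step (it sorts internally, so it is O(n log n)); the function name is my own:

```python
import heapq
import statistics

def nearest_to_median_heap(values, k):
    """Heapify keyed on distance to the median (O(n)), then pop the k
    nearest neighbours (O(k log n))."""
    median = statistics.median_low(values)  # stand-in for an O(n) selection
    heap = [(abs(v - median), v) for v in values]
    heapq.heapify(heap)      # O(n); ordered by distance to the median
    heapq.heappop(heap)      # discard the median itself (distance 0)
    return [heapq.heappop(heap)[1] for _ in range(k)]

print(nearest_to_median_heap([1, 2, 3, 4, 5, 10, 20, 30, 40], 2))  # [4, 3]
```

Unlike the quickselect approach this returns the neighbours in order of increasing distance, at the cost of the extra log n factor per extraction.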
med = Select(A, 1, n, n/2)    // find the median
for i = 1 to n
    B[i] = abs(A[i] - med)    // distance of each element from the median
q = Select(B, 1, n, k+1)      // the (k+1)-th smallest difference; the median itself has difference 0
j = 1
for i = 1 to n
    if B[i] <= q and A[i] != med
        C[j] = A[i]           // assign the real value A[i], not the difference B[i]
        j++
return C
You can solve your problem like this:
Find the median in O(n), e.g. using the O(n) nth_element algorithm.
Loop through all elements, substituting each with a pair: (the absolute difference to the median, the element's value).
Once more apply nth_element with n = k. After applying this algorithm you are guaranteed to have the k smallest elements by absolute difference first in the new array. Take their values (or indices) and you are DONE!
Four Steps:
Use Median of medians to locate the median of the array - O(n)
Determine the absolute difference between the median and each element in the array and store them in a new array - O(n)
Use Quickselect or Introselect to pick the k smallest elements out of the new array - O(n)
Retrieve the k nearest neighbours by indexing the original array - O(k)
The overall time complexity is thus O(n).
1. Find the median in O(n).
2. Create a new array; each element is the absolute difference between the original value and the median.
3. Find the kth smallest number of the new array in O(n).
4. The desired values are the elements whose absolute difference from the median is less than or equal to that kth smallest number.
You could use a non-comparison sort, such as a radix sort, on the list of numbers L, then find the k closest neighbors by considering windows of k elements and examining the window endpoints. Another way of stating "find the window" is find i that minimizes abs(L[(n-k)/2+i] - L[n/2]) + abs(L[(n+k)/2+i] - L[n/2]) (if k is odd) or abs(L[(n-k)/2+i] - L[n/2]) + abs(L[(n+k)/2+i+1] - L[n/2]) (if k is even). Combining the cases, abs(L[(n-k)/2+i] - L[n/2]) + abs(L[(n+k)/2+i+!(k&1)] - L[n/2]). A simple, O(k) way of finding the minimum is to start with i=0, then slide to the left or right, but you should be able to find the minimum in O(log(k)).
The expression you minimize comes from transforming L into another list, M, by taking the difference of each element from the median.
m=L[n/2]
M=abs(L-m)
i minimizes M[n/2-k/2+i] + M[n/2+k/2+i].
You already know how to find the median in O(n).
If the order does not matter, selecting the k smallest can be done in O(n).
Apply the "k smallest" selection to the right-hand side of the median and the "k largest" selection to the left-hand side of the median.
From Wikipedia:
function findFirstK(list, left, right, k)
    if right > left
        select pivotIndex between left and right
        pivotNewIndex := partition(list, left, right, pivotIndex)
        if pivotNewIndex > k // new condition
            findFirstK(list, left, pivotNewIndex-1, k)
        if pivotNewIndex < k
            findFirstK(list, pivotNewIndex+1, right, k)
Don't forget the special case where k == n: return the original list.
Actually, the answer is pretty simple. All we need to do is select the k elements with the smallest absolute differences from the median, moving from index m-1 down to 0 and from m+1 up to n-1, where the median is at index m. We select the elements using the same idea we use when merging two sorted arrays.
If you know the index of the median, which should just be ceil(array.length/2), then it should just be a matter of listing out n[x-k], n[x-k+1], ..., n[x], n[x+1], n[x+2], ..., n[x+k], where n is the array, x is the index of the median, and k is the number of neighbours you need on each side (use k/2 per side if you want k in total).
First select the median in O(n) time, using a standard algorithm of that complexity.
Then run through the list again, selecting the elements that are nearest to the median (by storing the best known candidates and comparing new values against these candidates, just like one would search for a maximum element).
In each step of this additional run through the list O(k) steps are needed, and since k is constant this is O(1). So the total for time needed for the additional run is O(n), as is the total runtime of the full algorithm.
Since all the elements are distinct, there can be at most 2 elements with the same difference from the median. I think it is easier for me to have 2 arrays A[k] and B[k], with the index representing the absolute difference from the median. Now the task is just to fill up the arrays and choose k elements by reading the first k non-empty values of the arrays, reading A[i] and B[i] before A[i+1] and B[i+1]. This can be done in O(n) time.
All the answers suggesting taking differences from the median would produce incorrect results if you want neighbours by position. That method finds the elements closest in value, not closest in position.
For example, if the array is 1, 2, 3, 4, 5, 10, 20, 30, 40, then for k = 2 the values returned would be (3, 4), which is incorrect if what is wanted is (4, 10), the positional neighbours of the median.
The correct way to find that result would be to use the selection algorithm to find the upper and lower bound elements, and then find the remaining elements from the list by direct comparison.
