Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have a problem: I need to add many different values and, at the end, retrieve only the k-th largest of them. How can I implement that efficiently, and which algorithm should I use?
Algorithm:
Create a binary min-heap and insert the first K values into it; its root is the smallest of the K values kept so far.
For each of the remaining N−K values: if it is larger than the root of the heap, replace the root with it and sift it down to restore the heap property.
Extract all K values from the heap into a list; the root you extract first is the K-th largest.
Complexity:
Step 1: O(K)
Step 2: O((N−K)×log(K))
Step 3: O(K×log(K))
If N−K ≥ K, the overall complexity is O((N−K)×log(K)); if N−K < K, it is O(K×log(K)).
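The steps above can be sketched with Python's heapq module (the function name is illustrative):

```python
import heapq

def kth_largest_stream(values, k):
    """Keep a min-heap of the k largest values seen so far.
    The heap root is always the current k-th largest."""
    heap = list(values[:k])
    heapq.heapify(heap)           # step 1: O(k)
    for v in values[k:]:          # step 2: O((n-k) * log k)
        if v > heap[0]:
            heapq.heapreplace(heap, v)   # pop root, push v, sift down
    return heap[0]                # root = k-th largest
```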
(Based on comments that you do not want to store all the numbers you have seen...)
Keep a running sorted list of the k largest values seen so far. As each new number arrives, check whether it is larger than the smallest element of the list; if it is, remove that smallest element and insert the new number into its sorted position. Initialize the list (before any numbers have been seen) with k entries of negative infinity.
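A minimal sketch of this sorted-list approach in Python, using bisect for the sorted insertion (the function name is illustrative; note that removing the smallest element of a k-element list costs O(k)):

```python
import bisect
import math

def top_k_sorted_list(stream, k):
    # Ascending list of the k largest seen so far,
    # seeded with k copies of negative infinity.
    top = [-math.inf] * k
    for x in stream:
        if x > top[0]:            # larger than the current smallest
            top.pop(0)            # drop the smallest: O(k)
            bisect.insort(top, x) # sorted insert: O(k)
    return top                    # ascending; top[0] is the k-th largest
```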
Alternatively: first build a max-heap from all n elements, which takes O(n) time, then extract the maximum k−1 times in O(k·log n) time; the root of the heap is then the k-th largest element.
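This build-then-extract variant might look like the following in Python (heapq is a min-heap, so the sketch negates values to simulate a max-heap; the function name is illustrative):

```python
import heapq

def kth_largest_heapify_all(values, k):
    # Build a max-heap over all n elements by storing negated
    # values in heapq's min-heap -- O(n) via heapify.
    heap = [-v for v in values]
    heapq.heapify(heap)
    for _ in range(k - 1):        # pop the k-1 largest: O(k log n)
        heapq.heappop(heap)
    return -heap[0]               # root is now the k-th largest
```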
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I'm looking for an efficient algorithm or data structure to find the element with the largest second parameter among the first N elements of a multiset, in which I'll make many insertions and deletions, so I can't use a static segment tree. Any ideas?
Note: I have a multiset of pairs.
You can use any balanced binary search tree implementation you are familiar with; arguably the best known are the AVL tree and the red-black tree.
A binary search tree stores a key/value pair in each node, with keys ordered from left to right. Insert, delete and find run in O(log n) time because the tree is kept balanced, usually by means of tree rotations.
To be able to find the maximum value over a range of elements, you have to store and maintain additional information in each tree node: maxValue over the node's subtree, and the size of that subtree. Then define a recursive function that finds the maximum value among the first N nodes of a node's subtree. If N equals size, the answer is already available in the node's maxValue. Otherwise, recurse into the left and/or right child, depending on which subtrees contain some of the first N elements.
F(node, N) =
    if N == size[node] : maxValue[node]
    else if N <= size[leftChild[node]] :
        F(leftChild[node], N)
    else if N == size[leftChild[node]] + 1 :
        MAX(maxValue[leftChild[node]], value[node])
    else :
        MAX(maxValue[leftChild[node]],
            value[node],
            F(rightChild[node], N - size[leftChild[node]] - 1))
If you are familiar with segment trees, you will not encounter any problems with this implementation.
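A minimal Python sketch of this augmented-node query (the Node class and field names are illustrative; a real implementation would also rebalance on insert/delete and keep size and max_value up to date during rotations):

```python
def _size(node):
    return node.size if node else 0

class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
        # Augmented fields, maintained on every structural change.
        self.size = 1 + _size(left) + _size(right)
        children = [c.max_value for c in (left, right) if c]
        self.max_value = max([value] + children)

def max_of_first_n(node, n):
    """Maximum value among the first n in-order nodes of node's subtree.
    Assumes 1 <= n <= node.size."""
    if n == node.size:
        return node.max_value
    left_size = _size(node.left)
    if n <= left_size:
        return max_of_first_n(node.left, n)
    best = node.value
    if node.left:
        best = max(best, node.left.max_value)
    if n > left_size + 1:
        best = max(best, max_of_first_n(node.right, n - left_size - 1))
    return best
```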
I may suggest you use a treap. This is a randomised binary search tree: thanks to the randomisation, the tree stays balanced (in expectation), providing O(log(n)) time complexity for the basic operations. A treap has two primitive operations, split and merge; all other operations are implemented in terms of them. An advantage of the treap is that you don't have to deal with rotations.
EDIT: there is no way to maintain maxKey/minKey in each node explicitly in O(log(n)).
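For illustration, a bare-bones treap built from split and merge might look like this in Python (no subtree augmentation shown; names are illustrative):

```python
import random

class TNode:
    def __init__(self, key):
        self.key = key
        self.prio = random.random()   # random priority keeps it balanced
        self.left = self.right = None

def split(node, key):
    """Split into (keys < key, keys >= key)."""
    if node is None:
        return None, None
    if node.key < key:
        node.right, right = split(node.right, key)
        return node, right
    left, node.left = split(node.left, key)
    return left, node

def merge(a, b):
    """Merge two treaps; all keys in a precede all keys in b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.prio > b.prio:               # higher priority becomes the root
        a.right = merge(a.right, b)
        return a
    b.left = merge(a, b.left)
    return b

def insert(root, key):
    left, right = split(root, key)    # duplicates allowed (multiset)
    return merge(merge(left, TNode(key)), right)
```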
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
Given a list with n elements, and given k and the value of the k-th largest element of the list: build a heap from the list in O(n); support insertion into the heap of further elements (not from the list) in O(log n); support deletion in O(n).
How can the k largest elements be found in O(k), no matter how the heap has changed?
For example: given the list (2, 4, 6, 10, 9, 12) with k = 3 and value = 9, the k largest are 10, 9, 12. If 10 and 12 are then deleted, the k largest become 4, 6, 9.
Maintain:
A 'big' max-heap of size n−k.
A 'small' min-heap of size k.
At any point in time, the k largest numbers are exactly the contents of the small heap, and the root of the big heap is the candidate for promotion into the small heap.
When a number is inserted:
If it is bigger than the root of the small heap, it belongs among the largest k: pop the root of the small heap, move that value into the big heap, and insert the new number into the small heap.
If it is smaller than (or equal to) the root of the small heap, insert it into the big heap.
When a number is deleted:
If it is smaller than the root of the small heap, it exists only in the big heap: find it there and delete it.
If it is bigger than or equal to the root of the small heap, it is among the largest k: find it in the small heap, delete it, then pop the root of the big heap and insert that value into the small heap.
This algorithm uses the complexity bounds you mentioned.
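A rough Python sketch of the two-heap scheme, assuming the structure is seeded with at least k initial values and that delete is only called with values actually present (deletion by linear search matches the O(n) deletion bound in the question; the class name is illustrative):

```python
import heapq

class TopK:
    """'small' is a min-heap of the current k largest values;
    'big' holds everything else as a max-heap (negated values)."""
    def __init__(self, values, k):
        self.k = k
        ordered = sorted(values)
        self.small = ordered[-k:]                 # the k largest
        heapq.heapify(self.small)
        self.big = [-v for v in ordered[:-k]]     # the rest, negated
        heapq.heapify(self.big)

    def insert(self, x):                          # O(log n)
        if x > self.small[0]:
            # x joins the top k; demote the old k-th largest.
            demoted = heapq.heapreplace(self.small, x)
            heapq.heappush(self.big, -demoted)
        else:
            heapq.heappush(self.big, -x)

    def delete(self, x):                          # O(n)
        if x >= self.small[0]:
            # x is among the top k: remove it, promote big's root.
            self.small.remove(x)
            if self.big:
                self.small.append(-heapq.heappop(self.big))
            heapq.heapify(self.small)
        else:
            self.big.remove(-x)
            heapq.heapify(self.big)

    def top_k(self):                              # O(k log k) here
        return sorted(self.small)
```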
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
What is considered an optimal data structure for inserting elements in sorted order? I am looking for an idea, or a custom data structure, that achieves O(1) insertion of each element while keeping the elements sorted. I do not want to use binary search, a tree, or a linked list.
Values range up to 50,000 and can arrive in any random order. After each insert my test case checks whether the data structure is sorted, so it must be in sorted order after every insert.
Please share your suggestions and views: how can I achieve sorted insertion in O(1)?
Thanks
If you could do insertion into a sorted structure in O(1) time, then you could sort a list of n elements in O(n) time. But comparison-based sorting has a proven Ω(n·log n) lower bound, so the original assumption, that such an insertion can be done in O(1), must be wrong.
If you are dealing with integers, the closest you can get to your requirements is by using a Van Emde Boas tree.
You can't get pure O(1). Either you have to do a binary search, or move elements around, or find the right place in a tree.
Hash tables will not keep your elements sorted in any way; at least with VEB trees you have the FindNext operation.
The only "sorting" you can do in O(1) is to use your sort keys as direct indexes into an array, which becomes impractical or plain impossible as soon as your keys can vary in too broad a range.
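As an illustration of direct indexing for a bounded key range (here 1..50,000, matching the question; function names are illustrative):

```python
# Direct-index "sorting": O(1) per insert, feasible only because
# the keys are bounded.
MAX_KEY = 50_000
counts = [0] * (MAX_KEY + 1)      # counts[k] = multiplicity of key k

def insert_key(key):
    counts[key] += 1              # O(1) insertion

def sorted_values():
    # Reading the values back out in order costs O(MAX_KEY + n).
    return [k for k in range(1, MAX_KEY + 1) for _ in range(counts[k])]
```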
Maybe "bucket sort" will fulfil your requirements: near-O(1) insertion into a sorted structure, a limited value range, and inserts arriving in random order.
For example, split the range 1–50,000 into 10,000 buckets of five values each. When you get a number n, push it into bucket n/5, then re-sort only that bucket, which holds at most five elements.
This is "nearly" O(1).
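A sketch of this bucketing scheme in Python (bucket width and count follow the numbers in the answer; the class name is illustrative):

```python
BUCKET_WIDTH = 5
NUM_BUCKETS = 50_000 // BUCKET_WIDTH

class BucketStore:
    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS + 1)]

    def insert(self, n):
        # Each bucket holds at most BUCKET_WIDTH distinct values,
        # so re-sorting it is bounded-cost ("nearly" O(1)).
        b = self.buckets[(n - 1) // BUCKET_WIDTH]
        b.append(n)
        b.sort()

    def as_sorted_list(self):
        # Concatenating the buckets in order yields the sorted data.
        return [x for b in self.buckets for x in b]
```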
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I am a fresher preparing for interviews. In a recent interview I was asked a question for which I couldn't find a suitable answer.
I was given some 100 files, each containing a large number of comma-separated integers, and I had to find the top 10 integers across all the files. I tried to solve it using a heap, but I got confused about the time complexity of the process. Any help will be appreciated, thanks.
I think you are on the right track with using a heap data structure.
You could process the files in parallel and for each file you could maintain a min-heap of size 10.
As you iterate through a file, insert values into the min-heap until it is full (size 10); then, for each value in positions 11 through n:

if current_value > min_heap.min() :
    min_heap.extract_min()
    min_heap.insert(current_value)
You have to iterate through all n values, and the worst case is a file sorted in ascending order: every value in positions 11 through n then forces an extract-min and an insert. Each operation on a heap of size k costs O(log k); with k = 10 the heap size is constant, so processing a file of n values takes O(n·log k), effectively linear in n.
At this point you have m (the number of files) min-heaps, each of size 10. You can feed their contents through one final min-heap of size 10 to obtain the ten largest numbers overall. This step is O(m), because every heap involved has constant size.
Overall the running time is O(n·log k + m), where n is the total number of values. m is typically much smaller than n, so amongst friends we could say O(n). The bound is the same whether or not the files are processed in parallel.
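Putting the per-file pass together in Python (function name illustrative; this assumes well-formed comma-separated integers and simply streams every file through a single min-heap of size 10):

```python
import heapq

def top_10_across_files(paths):
    """Stream comma-separated integers from each file, keeping a
    min-heap of the 10 largest values seen so far."""
    heap = []                          # min-heap, never larger than 10
    for path in paths:
        with open(path) as f:
            for token in f.read().split(","):
                value = int(token)
                if len(heap) < 10:
                    heapq.heappush(heap, value)
                elif value > heap[0]:  # beats the current 10th largest
                    heapq.heapreplace(heap, value)
    return sorted(heap, reverse=True)
```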
Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I am trying to understand the runtime of printing the BST keys that lie in a given range.
I tried to understand it from this example, but I could not.
I think I understand where the O(log n) comes from: that is the cost of recursively descending the BST to the boundary of the range on each side. But I am not sure about:
Where the k comes from. Is it just the constant time it takes to print? If so, why isn't the runtime written as O(log n) + O(k), with the k term then dropped?
Where the O(n) of an in-order traversal went, since it does not appear in this runtime.
How the runtime changes if there are several values in the range on each side of the tree; for example, what if the range started from 4?
An easier way to understand the solution is to consider the following algorithm:
Search for the minimum key greater than or equal to k1 in the BST — O(lg n).
Perform an in-order traversal of the BST nodes starting from that key, printing keys until reaching a node whose key is greater than k2. Because an in-order traversal of the complete BST takes O(n) time, if there are k keys between k1 and k2, this partial traversal takes O(k) time.
The given algorithm is doing the same thing: locating the first key between k1 and k2 takes O(lg n) time, and printing is done only for the k keys within the range, which is O(k), for O(lg n + k) in total. If all BST keys lie within k1 and k2, the runtime is O(lg n) + O(n) = O(n), because all n keys must be printed.
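A sketch of the pruned range traversal in Python (illustrative Node class; for a balanced tree the cost is O(lg n + k), since subtrees entirely outside the range are never visited):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def keys_in_range(node, k1, k2, out):
    """Collect keys in [k1, k2] in sorted (in-order) order."""
    if node is None:
        return
    if node.key > k1:            # left subtree may hold keys >= k1
        keys_in_range(node.left, k1, k2, out)
    if k1 <= node.key <= k2:
        out.append(node.key)
    if node.key < k2:            # right subtree may hold keys <= k2
        keys_in_range(node.right, k1, k2, out)
```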