Maximizing minimum on an array - algorithm

There is probably an efficient solution for this, but I'm not seeing it.
I'm not sure how to explain my problem but here goes...
Let's say we have one array with n integers, for example {3,2,0,5,0,4,1,9,7,3}.
What we want to do is find the range of 5 consecutive elements with the "maximal minimum", that is, the window whose smallest element is as large as possible.
The solution in this example would be the last five elements, {4,1,9,7,3}, with 1 as the maximal minimum.
It's easy to do with O(n^2), but there must be a better way of doing this. What is it?

If you mean literally five consecutive elements, then you just need to keep a sorted window of the source array.
Say you have:
{3,2,0,5,0,1,0,4,1,9,7,3}
First, you get five elements and sort'em:
{3,2,0,5,0, 1,0,4,1,9,7,3}
{0,0,2,3,5} - sorted.
Here the minimum is the first element of the sorted sequence.
Then you need to advance it one step to the right. You see the new element 1 and the old one 3; you need to find 3, replace it with 1, and then return the array to the sorted state. You don't actually need to run a full sorting algorithm, since only one element (1 in this example) is in the wrong place, but you can: even bubble sort will fix it in time linear in the window size.
{3,2,0,5,0,1, 0,4,1,9,7,3}
{0,0,1,2,5}
Then the new minimum is again the first element.
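You keep advancing, each time comparing the first element of the sorted window with the best minimum found so far, and remembering the larger value together with its subsequence.
Time complexity is O(n) for a fixed window size (O(n*k) for a window of k elements, since each step costs O(k)).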
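A minimal sketch of the sorted-window idea in Python, assuming the window size is passed as a parameter k. The function name `max_min_sorted_window` is made up for this illustration. It uses the standard `bisect` module to remove the outgoing element and insert the incoming one, which costs O(k) per step because of list shifting.

```python
from bisect import bisect_left, insort

def max_min_sorted_window(a, k):
    """Return (best_min, start) for the k-element window with the largest minimum."""
    if k <= 0 or k > len(a):
        raise ValueError("k must be between 1 and len(a)")
    window = sorted(a[:k])              # sorted copy of the first window
    best_min, best_start = window[0], 0
    for i in range(k, len(a)):
        # Remove the element that leaves the window (a[i - k]) ...
        del window[bisect_left(window, a[i - k])]
        # ... and insert the one that enters it, keeping the list sorted.
        insort(window, a[i])
        if window[0] > best_min:        # window[0] is the current minimum
            best_min, best_start = window[0], i - k + 1
    return best_min, best_start
```

For the array in the question, `max_min_sorted_window([3,2,0,5,0,4,1,9,7,3], 5)` reports the window starting at index 5.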

Can't you use some circular buffer of 5 elements, run over the array and add the newest element to the buffer (thereby replacing the oldest element) and searching for the lowest number in the buffer? Keep a variable with the offset into the array that gave the highest minimum.
That would seem to be O(n * 5*log(5)) = O(n), I believe.
Edit: I see unkulunkulu proposed exactly the same as me in more detail :).

Using a balanced binary search tree instead of a linear buffer, it is trivial to get complexity O(n log m), where m is the window size.

You can do it in O(n) for arbitrary k-consecutive elements as well. Use a deque.
For each element x:
pop elements from the back of the deque that are larger than x
if the front of the deque is more than k positions old, discard it
push x at the end of the deque
At each step, the front of the deque will give you the minimum of your current k-element window. Compare it with your global maximum and update if needed.
Since each element gets pushed and popped from the deque at most once, this is O(n).
The deque data structure can either be implemented with an array the size of your initial sequence, obtaining O(n) memory usage, or with a linked list that actually deletes the needed elements from memory, obtaining O(k) memory usage.
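The steps above can be sketched in Python roughly as follows. This is an illustration under a few assumptions: the deque stores indices rather than values (so that "k positions old" is easy to test), and the function name `max_min_window` is hypothetical.

```python
from collections import deque

def max_min_window(a, k):
    """Return (best_min, start) for the k-element window with the largest minimum."""
    if k <= 0 or k > len(a):
        raise ValueError("k must be between 1 and len(a)")
    dq = deque()                        # indices; their values never decrease front to back
    best_min, best_start = None, -1
    for i, x in enumerate(a):
        # Pop elements from the back that are larger than x.
        while dq and a[dq[-1]] > x:
            dq.pop()
        # Discard the front if it has fallen out of the window.
        if dq and dq[0] <= i - k:
            dq.popleft()
        dq.append(i)
        if i >= k - 1:                  # a full window ends at index i
            current_min = a[dq[0]]
            if best_min is None or current_min > best_min:
                best_min, best_start = current_min, i - k + 1
    return best_min, best_start
```

Each index is appended once and removed at most once, which is where the O(n) bound comes from. The deque never holds more than k indices.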

Related

picking the 10 largest values in array

I want to pick the 10 largest values in an array (size ~1e9 elements) in Fortran 90. What is the most time-efficient way to do this? I was looking into efficient sorting algorithms; is that the way to go? Do I need to sort the entire array?
Sorting 10^9 elements to pick 10^1 from the top sounds like overkill: the log2(N) factor will be about 30, and the process of sorting will move a lot of data.
Make an array of ten items for the result, fill it with the first ten elements from the big array, and sort your 10-element array. Now walk the big array starting at element 11. If the current element is greater than the smallest item in the 10-element array, find the insertion point, shift ten-element array to make space for the new element, and place it in the array. Once you are done with the big array, the small array contains ten largest values.
For "larger values of ten" you can get a significant performance improvement by switching to a max-heap data structure. Construct a heap from the first ten items of the big array; store the smallest number for future reference. Then for each number in the big array above the smallest number in the heap so far do the following:
Replace the smallest number with the new number,
Follow the heap structure up to the root to place the number in the correct spot,
Store the location of the new smallest number in the heap.
Once you are done, the heap will contain ten largest items from the big array.
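A hedged sketch of the bounded-heap idea in Python (the question asks about Fortran; Python is used here only for illustration). Note the swapped-in variant: this version uses a min-heap via the standard `heapq` module, not the max-heap described above, because the root of a min-heap is already the smallest of the kept items. The function name `k_largest` is made up.

```python
import heapq

def k_largest(values, k=10):
    """Return the k largest items of an iterable, largest first."""
    if k <= 0:
        return []
    heap = []                           # min-heap holding the best k items seen so far
    for x in values:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:               # heap[0] is the smallest kept item
            heapq.heapreplace(heap, x)  # pop the smallest, push x, O(log k)
    return sorted(heap, reverse=True)
```

The standard library's `heapq.nlargest(k, values)` offers the same result without hand-written code.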
Sorting is not needed. You just need a priority queue of size 10, cost O(n) while the best sort is O(nlogn).
No, you don't need to perform a full sorting. You can drop parts of an input array as soon as you know they contain only items from those largest 10, or none of them.
You could for example adapt a quicksort algorithm in such a way that you recursively process only partitions covering the border between the 10-th and the 11-th highest items. Eventually you'll get 10 largest items at 10 last positions (not necessarily ordered by value, though) and all other items below (not in order, either).
Anyway in pessimistic case (wrong pivot selection or too many equal items) it may take too long.
The best solution is passing the big array through a 10-item priority queue, as #J63 mentions in the answer.
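For comparison, the quicksort adaptation mentioned above might look like the following sketch. It assumes a random pivot and a Hoare-style partition, and only keeps working on the side that contains the border between the k largest items and the rest. The name `partition_top_k` is hypothetical, and the worst case is still quadratic, as noted.

```python
import random

def partition_top_k(a, k):
    """Rearrange list a in place so that its k largest items end up in a[-k:]."""
    n = len(a)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(a)")
    target = n - k                      # index where the k largest items begin
    lo, hi = 0, n - 1
    while lo < hi:
        pivot = a[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:                   # Hoare-style partition around pivot
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Now a[lo..j] <= pivot <= a[i..hi]; continue only on the side holding target.
        if target <= j:
            hi = j
        elif target >= i:
            lo = i
        else:
            break                       # a[target] equals the pivot and is in place
    return a[target:]                   # the k largest, in no particular order
```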

Find element of an array that appears only once in O(logn) time

Given an array A with all elements appearing twice except one element which appears only once. How do we find the element which appears only once in O(logn) time? Let's discuss two cases.
Array is always sorted and elements are in sequential order. Let's assume A = [1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 6], we want to find 3 in log n time because it appears only once.
When the array is not sorted and the elements are not in sequential order.
I can only come up with a solution of using the XOR operator on the binary representation of the integers as explained Here, and at the end, the binary string will represent the element which appears only once because duplicates will cancel out. But it takes O(n) time. How can we do better than that?
Using Haroon S' comment, this is the solution which I think is correct, given the time constraint.
from typing import List

class Solution:
    def singleNonDuplicate(self, nums: List[int]) -> int:
        low = 0
        high = len(nums) - 1
        while low < high:
            mid = (low + high) // 2
            # Force mid to be odd, so that an intact pair occupies (mid-1, mid).
            if mid % 2 == 0:
                mid += 1
            if nums[mid] == nums[mid + 1]:
                # pairs are shifted here, so the answer is in the first half
                high = mid - 1
            elif nums[mid] == nums[mid - 1]:
                # pairs are intact up to mid, so the answer is in the second half
                low = mid + 1
        return nums[low]
If the elements are sorted (i.e., the first case you mentioned) then I believe a strategy not unlike binary search could work in O(logN) time.
Starting from the left endpoint in a sorted array, until we encounter the unique element, all the index pairs (2i, 2i + 1) we encounter along the way will have the same value. (i.e., due to the array being sorted) However, as we go towards the right endpoint of the array, as soon as we consider an array that includes the unique element, that structure of "same values within (2i, 2i+1) index pairs" will be invalid.
Using that information, a search algorithm similar to binary search can find out in which half of the array the unique element is. Basically, you can deduce that "if the values in the rightmost index pair (2i, 2i+1) of the left half are the same, then the unique value is in the right half". (The exception is when the last index of the left half is even, but you can handle that case with O(1) extra work.)
The overall complexity then becomes O(logN), due to the halving of the array size at each step.
For a demonstration of the index notion mentioned above, see your own example. To the left of the unique element (i.e., 3), all index pairs (2i, 2i+1) hold equal values. From the unique element onwards, every index pair (2i, 2i+1) holds two different values.
Unless the array is sorted, though, since you'd have to investigate each and every element, I believe any algorithm you may come up with would take at least O(n) time. This is what I think will happen in the second case you mention in your question.
In the general case this is impossible, as to make sure an element doesn't repeat you need to check every other element.
From your example, it seems the array might be a sorted sequence of integers with no "gaps" (or some other clearly defined sequence, like all even numbers, etc). In this case it is possible with a modified binary search.
You have the array [1,1,2,2,3,4,4,5,5,6,6].
You check the middle element and the element following it and see 3 and 4. Now you know there are only 5 elements from the set {1, 2, 3}, while there are 6 elements from the set {4, 5, 6}. Which means the unpaired element is in {1, 2, 3}.
Then you recurse on [1,1,2,2,3]. You see 2,2. Now you know there are 2 "1" elements and 1 "3" element, so 3 is the answer.
The reason you check 2 elements in each step is that if you see just "3", you don't know whether you hit the first 3 in "3,3" or the second one. But if you read 2 elements you always find a "boundary" between 2 different elements.
The condition for this to be viable is that, given the value of an element, you need to be able to calculate in O(1) how many different elements come before this element. In your case this is trivial, but it is also possible for any arithmetic series, geometric series (with fixed size numbers)...
This is not a O(log n) solution. I have no idea how to solve it in logarithmic time without the constraints that the array is sorted and we have a known difference between consecutive numbers so we can recognise when we are to the left or right of the singleton. The other solutions already deal with that special case and I couldn’t do better there either.
I have a suggestion that might solve the general case in O(n), rather than O(n log n) when you first sort the array. It’s not as fast as the xor solution, but it will also work for non-integers. The elements must have an order, so it is not completely general, but it will work anywhere you can sort the elements.
The idea is the same as the k’th order element algorithm based on Quicksort. You partition and recurse on one half of the array. The time recurrence is T(n) = T(n/2) + O(n) = O(n).
Given array x and indices i,j, representing sub-array x[i:j], partition with quicksort’s partitioning method. You want a variant that partitions x[i:j] into three segments, x[i:k] x[k:l], x[l:j] where all elements in the first part are smaller than the pivot (whatever it is) all elements in x[k:l] are equal to the pivot, and all elements in the last segment are greater than the pivot.
(You might be able to use a version that only partitions in two, or explicitly count the number of pivots, but this version is easier to work with here.)
Now, if the middle segment has length one, you have your singleton. It is the pivot.
If not, the length of the segment that has the singleton is odd while the other is even. So recurse on the segment with the odd length.
It doesn’t give you worst case linear time, for the same reason that Quicksort isn’t worst case log-linear, but you get an expected linear time algorithm and likely a fast one at that.
Not, of course, as fast as those solutions based on binary search, but here the elements do not need to be sorted and we can handle elements with arbitrary gaps between them. We are also not restricted to data where we can easily manipulate their bit-patterns. So it is more general. If you can compare the elements, this approach will find the singleton in O(n).
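A rough sketch of this partition-and-recurse idea in Python. To keep it short it builds new lists instead of partitioning in place, which is a simplification and not the in-place variant described above. The function name `find_singleton` is hypothetical, and the input is assumed to contain exactly one unpaired element.

```python
import random

def find_singleton(xs):
    """Return the element that occurs once; every other element occurs exactly twice."""
    items = list(xs)
    while items:
        pivot = random.choice(items)
        smaller = [x for x in items if x < pivot]
        larger = [x for x in items if x > pivot]
        equal_count = len(items) - len(smaller) - len(larger)
        if equal_count == 1:
            return pivot                # the pivot has no partner
        # The pivot is paired, so exactly one side has odd length: the singleton's side.
        items = smaller if len(smaller) % 2 == 1 else larger
    raise ValueError("input has no unpaired element")
```

Only comparisons are used, so this works for strings or other ordered types as well as integers.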
This solution will find the element in the array that appeared only once but there should not be more than one element of that type and the array should be sorted. This is Binary Search and will return the element in O(log n) time.
var singleNonDuplicate = function(nums) {
    let s = 0, e = nums.length - 1;
    while (s < e) {
        let mid = Math.trunc(s + (e - s) / 2);
        // If the pair containing mid is intact, the single element lies to the right.
        if ((mid % 2 == 0 && nums[mid] == nums[mid + 1]) || (mid % 2 == 1 && nums[mid] == nums[mid - 1])) {
            s = mid + 1;
        } else {
            e = mid;
        }
    }
    return nums[s]; // can return nums[e] also
};
I don't believe there is a O(log n) solution for that. The reason is that in order to find which element is appearing only once, you at least need to iterate over the elements of that array once.

Maintaining sort while changing random elements

I have come across this problem where I need to efficiently remove the smallest element in a list/array. That would be fairly trivial to solve - a heap would be sufficient.
However, the issue now is that when I remove the smallest element, it would cause changes in other elements in the data structure, which may result in the ordering being changed. An example is this:
I have an array of elements:
[1,3,5,7,9,11,12,15,20,33]
When I remove "1" from the array "5" and "12" get changed to "4" and "17" respectively.
[3,4,7,9,11,17,15,20,33]
And hence the ordering is not maintained.
However, the element that is removed will have pointers to all elements that will be changed, but there is no knowing how many elements will be changed or by how much.
So my question is:
What is the best way to store these elements to maximize performance when removing the smallest element from the data structure while maintaining sort? Or should I just leave it unsorted?
My current implementation just stores them unsorted in a vector, so the total time complexity is O(N^2): O(N) to find the smallest element, times N removals.
A.
If you have the list M of all changed elements of the ordered list L,
go through M, and for every element:
If it is still in order with its neighbours in M, leave it be.
If it is not in order with its neighbours, exclude it from M.
The excluded elements form a list N.
Sort N.
Use an algorithm for merging ordered lists: http://en.wikipedia.org/wiki/Merge_algorithm
B.
If you are sure that new elements are few and not strongly changed, simply use the bubble sort.
I would still go with a heap, backed by an array.
If only a few elements change after each pop, then after you perform the pop operation, perform a heapify up/down for every item whose value changed. That costs on the order of O(n log k), where k is the size of your array and n is the number of elements whose value changed.
If a lot of items change, you can treat this as an unsorted array and simply rebuild the heap from it, which takes O(k).
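One way to sketch the array-backed heap with per-item fix-ups in Python is an indexed heap that remembers where each item sits. This is an illustration under assumptions, not a definitive design: the class name `IndexedMinHeap`, its methods `update` and `pop_min`, and the use of dictionary keys to identify items are all made up here (the question uses pointers to the affected elements instead).

```python
class IndexedMinHeap:
    """Binary min-heap over key -> value that supports changing a value in place."""

    def __init__(self, items):
        # items: dict mapping a hashable key to its comparable value
        self.heap = list(items)         # keys, arranged as a binary heap
        self.val = dict(items)
        self.pos = {key: i for i, key in enumerate(self.heap)}
        for i in reversed(range(len(self.heap) // 2)):
            self._sift_down(i)          # bottom-up construction, O(n)

    def _swap(self, i, j):
        h = self.heap
        h[i], h[j] = h[j], h[i]
        self.pos[h[i]], self.pos[h[j]] = i, j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self.val[self.heap[i]] >= self.val[self.heap[parent]]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.val[self.heap[child]] < self.val[self.heap[smallest]]:
                    smallest = child
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def update(self, key, new_value):
        """Change the value stored for key and restore heap order, O(log n)."""
        self.val[key] = new_value
        self._sift_up(self.pos[key])
        self._sift_down(self.pos[key])

    def pop_min(self):
        """Remove and return (key, value) of the smallest item."""
        top = self.heap[0]
        last = self.heap.pop()
        del self.pos[top]
        if self.heap:
            self.heap[0] = last
            self.pos[last] = 0
            self._sift_down(0)
        return top, self.val.pop(top)
```

With the array from the question keyed by original index, popping the 1 and then calling `update(2, 4)` and `update(6, 17)` keeps the heap valid, so later calls to `pop_min` return the remaining values in increasing order.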

Data design Issue to find insertion deletion and getMin in O(1)

As said in the title, I need to define a data structure that takes only O(1) time for insertion, deletion and getMin. There are NO SPACE CONSTRAINTS.
I have searched SO for the same, and all I have found covers insertion and deletion in O(1) time, which even a stack does. The previous posts on Stack Overflow all just say hashing.
By my analysis, for getMin in O(1) time we can use a heap data structure;
for insertion and deletion in O(1) time we have a stack.
So in order to achieve my goal, I think I need to tweak a heap data structure and a stack.
How would I add a hashing technique to this situation?
If I use a hashtable, what should my hash function look like, and how do I analyze the situation in terms of hashing? Any good references will be appreciated.
If you go with your initial assumption that insertion and deletion are O(1) complexity (if you only want to insert into the top and delete/pop from the top then a stack works fine) then in order to have getMin return the minimum value in constant time you would need to store the min somehow. If you just had a member variable keep track of the min then what would happen if it was deleted off the stack? You would need the next minimum, or the minimum relative to what's left in the stack. To do this you could have your elements in a stack contain what it believes to be the minimum. The stack is represented in code by a linked list, so the struct of a node in the linked list would look something like this:
struct Node
{
    int value;
    int min;
    Node *next;
};
If you look at an example list: 7->3->1->5->2. Let's look at how this would be built. First you push in the value 2 (to an empty stack), this is the min because it's the first number, keep track of it and add it to the node when you construct it: {2, 2}. Then you push the 5 onto the stack, 5>2 so the min is the same push {5,2}, now you have {5,2}->{2,2}. Then you push 1 in, 1<2 so the new min is 1, push {1, 1}, now it's {1,1}->{5,2}->{2,2} etc. By the end you have:
{7,1}->{3,1}->{1,1}->{5,2}->{2,2}
In this implementation, if you popped off 7, 3, and 1, your new min would be 2, as it should be. And all of your operations are still constant time, because you just added a comparison and another value to the node. (You could use something like C++'s top() or just use a pointer to the head of the list to take a look at the top of the stack and grab the min there; that gives you the min of the stack in constant time.)
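A small Python sketch of the same idea, for illustration only. It stores (value, min) pairs in a Python list instead of linked-list nodes, which is an assumption made to keep the example short. The class name `MinStack` and its method names are hypothetical.

```python
class MinStack:
    """Stack whose entries also record the minimum of everything at or below them."""

    def __init__(self):
        self._items = []                # list of (value, min_so_far) pairs

    def push(self, value):
        if self._items:
            current_min = min(value, self._items[-1][1])
        else:
            current_min = value         # first element is its own minimum
        self._items.append((value, current_min))

    def pop(self):
        return self._items.pop()[0]

    def get_min(self):
        return self._items[-1][1]       # O(1): read the min stored at the top
```

Pushing 2, 5, 1, 3, 7 in that order reproduces the example list: `get_min()` gives 1, and after popping 7, 3 and 1 it gives 2.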
A tradeoff in this implementation is that you'd have an extra integer in your nodes, and if you only have one or two mins in a very large list it is a waste of memory. If this is the case then you could keep track of the mins in a separate stack and just compare the value of the node that you're deleting to the top of this list and remove it from both lists if it matches. It's more things to keep track of so it really depends on the situation.
DISCLAIMER: This is my first post in this forum so I'm sorry if it's a bit convoluted or wordy. I'm also not saying that this is "one true answer" but it is the one that I think is the simplest and conforms to the requirements of the question. There are always tradeoffs and depending on the situation different approaches are required.
This is a design problem, which means they want to see how quickly you can augment existing data-structures.
start with what you know:
O(1) update, i.e. insertion/deletion, is screaming hashtable
O(1) getMin is screaming hashtable too, but this time ordered.
Here, I am presenting one way of doing it. You may find something else that you prefer.
Create a HashMap, call it main, in which to store all the elements.
Create a LinkedHashMap (Java has one), call it mins, in which to track the minimum values.
The first time you insert an element into main, add it to mins as well.
For every subsequent insert, if the new value is less than what's at the head of your mins map, add it to the map with something equivalent to addToHead.
When you remove an element from main, also remove it from mins. 2*O(1) = O(1)
Notice that getMin is simply peeking without deleting. So just peek at the head of mins.
EDIT:
Amortized algorithm:
(thanks to #Andrew Tomazos - Fathomling, let's have some more fun!)
We all know that the cost of insertion into a hashtable is O(1). But in fact, if you have ever built a hash table you know that you must keep doubling the size of the table to avoid overflow. Each time you double the size of a table with n elements, you must re-insert the elements and then add the new element. By this analysis it would seem that the worst-case cost of adding an element to a hashtable is O(n). So why do we say it's O(1)? Because not all insertions hit the worst case! Indeed, only the insertions where doubling occurs do. Therefore, inserting n elements takes n + summation(2^i for i = 0 to lg(n-1)), which is less than n + 2n = O(n), so the cost per insertion is O(n)/n = O(1), amortized.
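The doubling argument can be checked with a small simulation. This sketch only counts element writes for a table that doubles when full. It does not model hashing itself, and the function name `total_copy_cost` and the starting capacity of 1 are assumptions made for the illustration.

```python
def total_copy_cost(n, initial_capacity=1):
    """Count element writes when adding n items to a table that doubles when full."""
    capacity, size, cost = initial_capacity, 0, 0
    for _ in range(n):
        if size == capacity:
            cost += size                # re-insert every existing element
            capacity *= 2
        cost += 1                       # write the new element
        size += 1
    return cost
```

The total stays below 3n for every n, which matches the n + 2n bound and gives O(1) amortized cost per insertion.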
Why not apply the same principle to the linkedHashMap? You have to reload all the elements anyway! So, each time you are doubling main, put all the elements in main in mins as well, and sort them in mins. Then for all other cases proceed as above (bullets steps).
A hashtable gives you insertion and deletion in O(1) (a stack does not because you can't have holes in a stack). But you can't have getMin in O(1) too, because ordering your elements can't be faster than O(n*Log(n)) (it is a theorem) which means O(Log(n)) for each element.
You can keep a pointer to the min to have getMin in O(1). This pointer can be updated easily for an insertion but not for the deletion of the min. But depending on how often you use deletion it can be a good idea.
You can use a trie. A trie has O(L) complexity for insertion, deletion, and getMin, where L is the length of the string (or whatever) you're looking for. It is of constant complexity with respect to n (number of elements).
It requires a huge amount of memory, though. As they emphasized "no space constraints", they were probably thinking of a trie. :D
Strictly speaking your problem as stated is provably impossible, however consider the following:
Given a type T place an enumeration on all possible elements of that type such that value i is less than value j iff T(i) < T(j). (ie number all possible values of type T in order)
Create an array of that size.
Make the elements of the array:
struct PT
{
    T t;
    PT* next_higher;
    PT* prev_lower;
};
Insert and delete elements into the array maintaining double linked list (in order of index, and hence sorted order) storage
This will give you constant getMin and delete.
For insertion you need to find the next element in the array in constant time, so I would use a type of radix search.
If the size of the array is 2^x then maintain x "skip" arrays where element j of array i points to the nearest element of the main array to index (j << i).
This will then always require a fixed x number of lookups to update and search so this will give constant time insertion.
This uses exponential space, but this is allowed by the requirements of the question.
in your problem statement " insertion and deletion in O(1) time we have stack..."
so I am assuming deletion = pop()
in that case, use another stack to track min
algo:
Stack 1 -- normal stack; Stack 2 -- min stack
Insertion
push to stack 1.
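if stack 2 is empty or new item <= stack2.peek(), push to stack 2 as well (use <=, not <, so that a duplicate of the minimum gets its own entry and popping one copy does not discard the minimum)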
objective: at any point of time stack2.peek() should give you min O(1)
Deletion
pop() from stack 1.
if popped element equals stack2.peek(), pop() from stack 2 as well
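The two-stack algorithm above, sketched in Python with lists as stacks. The class name `TwoStackMin` and its method names are made up for this example. The push uses `<=` so that a repeated minimum is recorded once per copy; with a strict `<`, popping one copy would wrongly drop the minimum while another copy is still on the main stack.

```python
class TwoStackMin:
    """Main stack plus a second stack that tracks the running minimum."""

    def __init__(self):
        self._data = []                 # stack 1: all values
        self._mins = []                 # stack 2: current minimum on top

    def push(self, value):
        self._data.append(value)
        if not self._mins or value <= self._mins[-1]:
            self._mins.append(value)

    def pop(self):
        value = self._data.pop()
        if value == self._mins[-1]:     # the minimum is leaving the main stack
            self._mins.pop()
        return value

    def get_min(self):
        return self._mins[-1]
```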

Finding the repeated element

In an array of integers between 1 and 1,000,000 (or some much larger value), a single value occurs twice. How do you determine which one?
I think we can use a bitmap to mark the elements, and then traverse it all over again to find the repeated element. But I think that process has high complexity. Is there any better way?
This sounds like homework or an interview question ... so rather than giving away the answer, here's a hint.
What calculations can you do on a range of integers whose answer you can determine ahead of time?
Once you realize the answer to this, you should be able to figure it out .... if you still can't figure it out ... (and it's not homework) I'll post the solution :)
EDIT: Ok. So here's the elegant solution ... if the list contains ALL of the integers within the range.
We know that all of the values between 1 and N must exist in the list. Using Gauss's formula we can quickly compute the expected sum of a range of integers:
Sum(1..N) = 1/2 * (1 + N) * Count(1..N).
Since we know the expected sum, all we have to do is loop through all the values and add them up. The difference between this sum and the expected sum is the duplicate value.
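As a sketch of the summation trick in Python, assuming the list has length N + 1 and holds every integer from 1 to N once plus exactly one repeated value. The function name `find_duplicate_by_sum` is hypothetical.

```python
def find_duplicate_by_sum(values):
    """values: every integer 1..N once, plus one repeated value (length N + 1)."""
    n = len(values) - 1                 # N, since one value appears twice
    expected = n * (n + 1) // 2         # Gauss: 1 + 2 + ... + N
    return sum(values) - expected       # the surplus is the duplicate
```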
EDIT: As others have commented, the question doesn't state that the range contains all of the integers ... in this case, you have to decide whether you want to optimize for memory or time.
If you want to perform the operation using O(1) storage, you can perform an in-place sort of the list. As you're sorting you have to check adjacent elements. Once you see a duplicate, you know you can stop. Optimal sorting is an O(n log n) operation on average, which establishes an upper bound for finding the duplicate in this manner.
If you want to optimize for speed, you can use an additional O(n) storage. Using a HashSet (or similar structure), insert values from your list until you determine you are inserting a duplicate into the HashSet. Inserting n items into a HashSet is an O(n) operation on average, which establishes that as an upper bound for this method.
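The hash-set approach might look like this in Python, using the built-in `set` in place of a HashSet. The function name `find_duplicate_with_set` is made up for the example.

```python
def find_duplicate_with_set(values):
    """Return the first value seen twice, or None if all values are distinct."""
    seen = set()
    for v in values:
        if v in seen:                   # average O(1) membership test
            return v
        seen.add(v)
    return None
```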
You may try to use bits as a hashmap:
1 at position k means that number k occurred before
0 at position k means that number k did not occur before
Pseudocode:
0. assume that your array is A
1. initialize a bit array (there is a nice class in C# for this) of length 1000001, filled with zeros, so that index 1000000 is valid
2. for each num in A:
       if bitarray[num]
           return num
       else
           bitarray[num] = 1
       end
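A possible Python rendering of the pseudocode, assuming all values are integers in the range 0..max_value. It packs one bit per value into a `bytearray`, so about 125 KB covers values up to 1,000,000. The function name `find_duplicate_with_bitmap` and the `max_value` parameter are hypothetical.

```python
def find_duplicate_with_bitmap(values, max_value=1_000_000):
    """Return the first repeated value, or None; values must lie in 0..max_value."""
    bits = bytearray(max_value // 8 + 1)    # one bit per possible value
    for v in values:
        byte, mask = v >> 3, 1 << (v & 7)   # which byte, and which bit inside it
        if bits[byte] & mask:               # bit already set: v was seen before
            return v
        bits[byte] |= mask
    return None
```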
The time complexity of the bitmap solution is O(n) and it doesn't seem like you could do better than that. However it will take up a lot of memory for a generic list of numbers. Sorting the numbers is an obvious way to detect duplicates and doesn't require extra space if you don't mind the current order changing.
Assuming the array is of length n < N (i.e. not ALL integers are present -- in this case LBushkin's trick is the answer to this homework problem), there is no way to solve this problem using less than O(n) memory using an algorithm that just takes a single pass through the array. This is by reduction to the set disjointness problem.
Suppose I made the problem easier, and I promised you that the duplicate elements were in the array such that the first one was in the first n/2 elements, and the second one was in the last n/2 elements. Now we can think of playing a game in which two people each hold a string of n/2 elements, and want to know how many messages they have to send to be sure that none of their elements are the same. Since the first player could simulate the run of any algorithm that takes a pass through the array, and send the contents of its memory to the second player, a lower bound on the number of messages they need to send implies a lower bound on the memory requirements of any algorithm.
But it's easy to see in this simple game that they need to send n/2 messages to be sure that they don't hold any of the same elements, which yields the lower bound.
Edit: This generalizes to show that for algorithms that make k passes through the array and use memory m, that m*k = Omega(n). And it is easy to see that you can in fact trade off memory for time in this way.
Of course, if you are willing to use algorithms that don't simply take passes through the array, you can do better as suggested already: sort the array, then take 1 pass through. This takes time O(nlogn) and space O(1). But note curiously that this proves that any sorting algorithm that just makes passes through the array must take time Omega(n^2)! Sorting algorithms that break the n^2 bound must make random accesses.

Resources