Maintaining sort while changing random elements - performance

I have come across this problem where I need to efficiently remove the smallest element in a list/array. That would be fairly trivial to solve - a heap would be sufficient.
However, the issue now is that when I remove the smallest element, it would cause changes in other elements in the data structure, which may result in the ordering being changed. An example is this:
I have an array of elements:
[1,3,5,7,9,11,12,15,20,33]
When I remove "1" from the array, "5" and "12" get changed to "4" and "17" respectively:
[3,4,7,9,11,17,15,20,33]
And hence the ordering is not maintained.
However, the element that is removed will have pointers to all the elements that will be changed, but there is no way of knowing in advance how many elements will change or by how much.
So my question is:
What is the best way to store these elements to maximize performance when removing the smallest element from the data structure while maintaining sort? Or should I just leave it unsorted?
My current implementation just stores them unsorted in a vector, so the time complexity is O(N^2): O(N) to find the smallest element, repeated for N removals.

A.
If you have the list M of all changed elements of the ordered list L,
go through M, and for every element:
If it is still ordered with respect to its neighbours in L, leave it be.
If it is not in order with its neighbours, exclude it from L.
The excluded elements form a list N.
Sort N.
Use some algorithm for merging ordered lists to merge N back into L: http://en.wikipedia.org/wiki/Merge_algorithm
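A minimal sketch of approach A in Python, assuming the changed elements are identified by their indices (the question's pointer structure would supply these); heapq.merge does the final merge of the two ordered lists:

import heapq

def restore_order(L, changed_idx):
    # Pull out changed elements that are no longer ordered with their neighbours.
    out = set()
    for i in changed_idx:
        left_ok = i == 0 or L[i - 1] <= L[i]
        right_ok = i == len(L) - 1 or L[i] <= L[i + 1]
        if not (left_ok and right_ok):
            out.add(i)
    N = sorted(L[i] for i in out)                     # the displaced values, ordered
    rest = [x for i, x in enumerate(L) if i not in out]
    return list(heapq.merge(rest, N))                 # merge the two ordered lists

For the example above, restore_order([3,4,7,9,11,17,15,20,33], [1, 5]) pulls out the displaced 17 and merges it back, giving [3,4,7,9,11,15,17,20,33].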
B.
If you are sure that the changed elements are few and not changed by much, simply use bubble sort.

I would still go with a heap, backed by an array.
In case only a few elements change after each pop: after you perform the pop operation, perform a heapify up/down for any item whose value decreased. This is still on the order of O(n log k), where k is the size of your array and n is the number of elements that decreased in value.
If a lot of items change after each pop, then you can treat this as the case where you have an unsorted array and simply build a heap from the array.
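A sketch of that idea in Python, assuming a min-heap stored as a list and a hypothetical apply_changes callback that decreases some entries and reports their indices (the question's pointer structure would play this role):

import heapq

def _sift_up(heap, i):
    # Restore the min-heap property for a single entry whose value decreased.
    while i > 0:
        parent = (i - 1) // 2
        if heap[i] < heap[parent]:
            heap[i], heap[parent] = heap[parent], heap[i]
            i = parent
        else:
            break

def pop_min_with_updates(heap, apply_changes):
    smallest = heapq.heappop(heap)
    # apply_changes is hypothetical: it mutates some heap slots downward
    # and returns the indices it touched.
    for i in apply_changes(heap, smallest):
        _sift_up(heap, i)      # values only decreased, so sift toward the root
    return smallest

If many entries change at once, heapq.heapify(heap) rebuilds the whole heap in O(k), matching the second case above.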

Related

picking the 10 largest values in array

I want to pick the 10 largest values in an array (size ~1e9 elements) in Fortran 90. What is the most time-efficient way to do this? I was looking into efficient sorting algorithms; is that the way to go? Do I need to sort the entire array?
Sorting 10^9 elements to pick 10^1 from the top sounds like an overkill: the log2 N factor will be about 30, and the process of sorting will move a lot of data.
Make an array of ten items for the result, fill it with the first ten elements from the big array, and sort your 10-element array. Now walk the big array starting at element 11. If the current element is greater than the smallest item in the 10-element array, find the insertion point, shift the ten-element array to make space for the new element, and place it in the array. Once you are done with the big array, the small array contains the ten largest values.
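A sketch of this scan in Python (bisect stands in for the hand-written insertion-point search; the function name is illustrative):

import bisect

def ten_largest_scan(big):
    top = sorted(big[:10])          # the 10-slot result array, kept sorted ascending
    for x in big[10:]:
        if x > top[0]:              # beats the smallest of the current ten
            bisect.insort(top, x)   # find the insertion point and shift
            top.pop(0)              # drop the displaced smallest value
    return top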
For "larger values of ten" you can get a significant performance improvement by switching to a max-heap data structure. Construct a heap from the first ten items of the big array; store the smallest number for future reference. Then for each number in the big array above the smallest number in the heap so far do the following:
Replace the smallest number with the new number,
Follow the heap structure up to the root to place the number in the correct spot,
Store the location of the new smallest number in the heap.
Once you are done, the heap will contain the ten largest items from the big array.
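The answer above describes a max-heap variant with a tracked minimum; the same idea is more commonly coded with a min-heap of size 10 whose root is the current smallest. A sketch of that min-heap variant in Python:

import heapq

def ten_largest_heap(big):
    heap = list(big[:10])
    heapq.heapify(heap)                 # min-heap: heap[0] is the smallest of the ten
    for x in big[10:]:
        if x > heap[0]:                 # beats the smallest of the current top ten
            heapq.heapreplace(heap, x)  # pop the root, push x: O(log 10)
    return sorted(heap, reverse=True)

For reference, heapq.nlargest(10, big) packages the same technique in one call.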
Sorting is not needed. You just need a priority queue of size 10: cost O(n), while the best sort is O(n log n).
No, you don't need to perform a full sorting. You can drop parts of an input array as soon as you know they contain only items from those largest 10, or none of them.
You could for example adapt a quicksort algorithm in such a way that you recursively process only the partitions covering the border between the 10th and the 11th highest items. Eventually you'll get the 10 largest items in the last 10 positions (not necessarily ordered by value, though) and all other items below them (not in order, either).
Anyway, in the pessimistic case (wrong pivot selection or too many equal items) it may take too long.
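A sketch of this adapted partitioning (in effect a quickselect) in Python; the name partition_top_k is illustrative:

import random

def partition_top_k(a, k):
    # Rearrange a in place so that its k largest values occupy a[-k:], unsorted.
    lo, hi = 0, len(a) - 1
    border = len(a) - k                # first index of the top-k block
    while lo < hi:
        p = a[random.randint(lo, hi)]  # pivot value
        i, j = lo, hi
        while i <= j:                  # Hoare-style partition around p
            while a[i] < p: i += 1
            while a[j] > p: j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1; j -= 1
        # recurse only into the side that contains the border index
        if border <= j:
            hi = j
        elif border >= i:
            lo = i
        else:
            break                      # border falls among elements equal to p
    return a[-k:]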
The best solution is passing the big array through a 10-item priority queue, as @J63 mentions in the answer.

Best sorting algorithm - Partially sorted linked list

Problem - Given a sorted doubly linked list and two numbers C and K. You need to decrease the value of the node with data K by C and insert the resulting node at its correct position such that the list remains sorted.
I would think of insertion sort for such a problem because, at any instant, insertion sort looks like a partially sorted hand of cards. For insertion sort, the number of swaps equals the number of inversions, and the number of compares equals the number of exchanges plus (N-1).
So, in the given problem (above), if the node with data K is decreased by C, the sorted linked list becomes partially sorted. Insertion sort is the best fit.
Another point: when selecting a sorting algorithm, if a sorting approach is the best fit for the array representation of the data, then the same approach should also be the best fit for the linked-list representation of the same data.
For this problem, Is my thought process correct in choosing insertion sort?
Maybe you mean something else, but insertion sort is not the best algorithm, because you actually don't need to sort anything. If there is only one element with value K then it doesn't make a big difference, but otherwise it does.
So I would suggest the following O(n) algorithm, ignoring edge cases for simplicity:
Go forward in the list until the value of the current node is > K - C.
Save this node, all the reduced nodes will be inserted before this one.
Continue to go forward while the value of the current node is < K.
While the value of the current node is K, remove the node, set its value to K - C and insert it before the saved node. This could be optimized further, so that you do only one remove and one insert operation for the whole sublist of nodes that had value K - see the sketch below.
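A Python sketch of the above, with the batched optimization from the last step (the whole run of K-nodes is unlinked and spliced back in one operation); like the answer, it ignores edge cases and assumes C > 0:

class Node:
    def __init__(self, val):
        self.val, self.prev, self.next = val, None, None

def decrease_and_reinsert(head, K, C):
    # 1. last node that stays in front of the moved nodes (value <= K - C)
    anchor, cur = None, head
    while cur and cur.val <= K - C:
        anchor, cur = cur, cur.next
    # 2. locate the run of nodes with value K
    while cur and cur.val < K:
        cur = cur.next
    if not cur or cur.val != K:
        return head                       # no node with value K
    first = last = cur
    while last.next and last.next.val == K:
        last = last.next
    # 3. unlink the whole run in one operation
    if first.prev: first.prev.next = last.next
    else: head = last.next
    if last.next: last.next.prev = first.prev
    # 4. decrease the values and splice the run back in after `anchor`
    node = first
    while True:
        node.val = K - C
        if node is last: break
        node = node.next
    after = anchor.next if anchor else head
    if anchor: anchor.next = first
    else: head = first
    first.prev = anchor
    last.next = after
    if after: after.prev = last
    return head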
If these decrease operations can be batched up before the sorted list must be available, then you can simply remove all the decremented nodes from the list. Then, sort them, and perform a two-way merge into the list.
If the list must be maintained in order after each node decrement, then there is little choice but to remove the decremented node and re-insert in order.
Doing this with a linear search is probably acceptable for a deck of cards, unless you're running some monstrous Monte Carlo simulation involving cards that runs for hours or days, so that the optimization counts.
Otherwise the way we would deal with the need to maintain order would be to use an ordered sequence data structure: balanced binary tree (red-black, splay) or a skip list. Take the node out of the structure, adjust value, re-insert: O(log N).

Find element of an array that appears only once in O(logn) time

Given an array A with all elements appearing twice except one element which appears only once. How do we find the element which appears only once in O(logn) time? Let's discuss two cases.
Array is always sorted and elements are in sequential order. Let's assume A = [1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 6], we want to find 3 in log n time because it appears only once.
When the array is not sorted and the elements are not in sequential order.
I can only come up with the solution of using the XOR operator on the binary representations of the integers, as explained here: at the end, the accumulated bits represent the element which appears only once, because duplicates cancel out. But it takes O(n) time. How can we do better than that?
Using Haroon S's comment, this is the solution which I think is correct, given the time constraints.
from typing import List

class Solution:
    def singleNonDuplicate(self, nums: List[int]) -> int:
        low = 0
        high = len(nums) - 1
        while low < high:
            mid = (low + high) // 2
            if mid % 2 == 0:
                mid += 1          # keep mid on an odd index
            if nums[mid] == nums[mid + 1]:
                # pairing is already broken at mid: answer is in the first half
                high = mid - 1
            elif nums[mid] == nums[mid - 1]:
                # pairs are intact up to mid: answer is in the second half
                low = mid + 1
        return nums[low]
If the elements are sorted (i.e., the first case you mentioned) then I believe a strategy not unlike binary search could work in O(logN) time.
Starting from the left endpoint in a sorted array, until we encounter the unique element, all the index pairs (2i, 2i + 1) we encounter along the way will have the same value. (i.e., due to the array being sorted) However, as we go towards the right endpoint of the array, as soon as we consider an array that includes the unique element, that structure of "same values within (2i, 2i+1) index pairs" will be invalid.
Using that information, a search algorithm similar to binary search can find out in which half of the array the unique element is. Basically, you can deduce that, "in the left half of the array, if the values in the rightmost index pair (2i, 2i+1) are the same, then the unique value is in the right half". (i.e., with the exception of the last index on the left half-array being even; but you can overcome that case with various O(1) time operations)
The overall complexity then becomes O(logN), due to the halving of the array size at each step.
For a demonstration of the index notion I mentioned above, see your own example. To the left of the unique element (i.e., 3), all index pairs (2i, 2i+1) hold the same values. In any subarray starting at index 0 and ending to the right of the unique element, the index pairs (2i, 2i+1) from the unique element onward correspond to cells that contain different values.
Unless the array is sorted, though, since you'd have to investigate each and every element, I believe any algorithm you may come up with would take at least O(n) time. This is what I think will happen in the second case you mention in your question.
In the general case this is impossible, as to make sure an element doesn't repeat you need to check every other element.
From your example, it seems the array might be a sorted sequence of integers with no "gaps" (or some other clearly defined sequence, like all even numbers, etc). In this case it is possible with a modified binary search.
You have the array [1,1,2,2,3,4,4,5,5,6,6].
You check the middle element and the element following it and see 3 and 4. Now you know there are only 5 elements from the set {1, 2, 3}, while there are 6 elements from the set {4, 5, 6}. Which means, the missing elements is in {1, 2, 3}.
Then you recurse on [1,1,2,2,3]. You see 2,2. Now you know there are 2 "1" elements and 1 "3" element, so 3 is the answer.
The reason you check 2 elements in each step is that if you see just "3", you don't know whether you hit the first 3 in "3,3" or the second one. But if you read 2 elements you always find a "boundary" between 2 different elements.
The condition for this to be viable is that, given the value of an element, you need to be able to calculate in O(1) how many different elements come before this element. In your case this is trivial, but it is also possible for any arithmetic series, geometric series (with fixed size numbers)...
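A minimal sketch of that counting idea in Python, assuming the values are consecutive integers with each value appearing twice except the unique one:

def find_unique(a):
    # a: sorted consecutive integers, each appearing twice except one
    lo, hi = 0, len(a) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        bumped = a[mid] == a[mid + 1]
        if bumped:
            mid += 1              # step onto the boundary after the pair
        # cells that values a[0]..a[mid] would fill if every value were paired:
        expected = 2 * (a[mid] - a[0] + 1)
        if mid + 1 < expected:    # a cell is missing: unique value lies to the left
            hi = mid - 2 if bumped else mid
        else:                     # no cell missing yet: unique value lies to the right
            lo = mid + 1
    return a[lo]

print(find_unique([1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 6]))   # 3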
This is not a O(log n) solution. I have no idea how to solve it in logarithmic time without the constraints that the array is sorted and we have a known difference between consecutive numbers so we can recognise when we are to the left or right of the singleton. The other solutions already deal with that special case and I couldn’t do better there either.
I have a suggestion that might solve the general case in O(n), rather than the O(n log n) you get if you first sort the array. It's not as fast as the XOR solution, but it will also work for non-integers. The elements must have an order, so it is not completely general, but it will work anywhere you can sort the elements.
The idea is the same as the k’th order element algorithm based on Quicksort. You partition and recurse on one half of the array. The time recurrence is T(n) = T(n/2) + O(n) = O(n).
Given array x and indices i,j representing the sub-array x[i:j], partition with quicksort's partitioning method. You want a variant that partitions x[i:j] into three segments, x[i:k], x[k:l], x[l:j], where all elements in the first part are smaller than the pivot (whatever it is), all elements in x[k:l] are equal to the pivot, and all elements in the last segment are greater than the pivot.
(You might be able to use a version that only partitions in two, or explicitly count the number of pivots, but this three-way version is easier to work with here.)
Now, if the middle segment has length one, you have your singleton. It is the pivot.
If not, the length of the segment that has the singleton is odd while the other is even. So recurse on the segment with the odd length.
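A sketch of this in Python, using a Dutch-national-flag pass for the three-way partition (the function name is illustrative):

import random

def find_singleton(xs):
    # every value appears twice except one, which appears once
    lo, hi = 0, len(xs)              # current segment xs[lo:hi]
    while True:
        pivot = xs[random.randrange(lo, hi)]
        # three-way partition of xs[lo:hi]: < pivot, == pivot, > pivot
        lt, i, gt = lo, lo, hi
        while i < gt:
            if xs[i] < pivot:
                xs[lt], xs[i] = xs[i], xs[lt]
                lt += 1; i += 1
            elif xs[i] > pivot:
                gt -= 1
                xs[gt], xs[i] = xs[i], xs[gt]
            else:
                i += 1
        if gt - lt == 1:             # middle segment has length one: the singleton
            return pivot
        # exactly one outer segment has odd length; recurse into it
        if (lt - lo) % 2 == 1:
            hi = lt
        else:
            lo = gt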
It doesn’t give you worst case linear time, for the same reason that Quicksort isn’t worst case log-linear, but you get an expected linear time algorithm and likely a fast one at that.
Not, of course, as fast as those solutions based on binary search, but here the elements do not need to be sorted and we can handle elements with arbitrary gaps between them. We are also not restricted to data where we can easily manipulate their bit-patterns. So it is more general. If you can compare the elements, this approach will find the singleton in O(n).
This solution finds the element in the array that appears only once, provided there is exactly one such element and the array is sorted. It is binary search and returns the element in O(log n) time.
var singleNonDuplicate = function(nums) {
    let s = 0, e = nums.length - 1;
    while (s < e) {
        let mid = Math.trunc(s + (e - s) / 2);
        if ((mid % 2 == 0 && nums[mid] == nums[mid + 1]) ||
            (mid % 2 == 1 && nums[mid] == nums[mid - 1])) {
            s = mid + 1;
        } else {
            e = mid;
        }
    }
    return nums[s]; // can return nums[e] also
};
I don't believe there is an O(log n) solution for that. The reason is that, in order to find which element appears only once, you need to iterate over the elements of the array at least once.

Find the maximum element which is common in n different arrays?

Earlier today I asked a similar problem: find the maximum element which is common to two arrays. I got a couple of good solutions here (Find the maximum element which is common in two arrays?).
Now it occurred to me, what if instead of two arrays we have to find the maximum element which is common in n different arrays?
Example:
array1 = [1,5,2,4,6,88,34]
array2 = [1,5,6,2,34]
array3 = [1,34]
array4 = [7,99,34]
Here the maximum element which is common in all the arrays is 34.
Is it a good idea to create a hashmap for each of array1, array2, ..., array(N-1) separately and then check every element of arrayN against each of these hashmaps, keeping track of the maximum element that is present in all of them?
Can we have better solutions than this?
For each array A_n:
    add all elements in A_n to a hashset, H_n.
Create a hashmap, M, which maps values to counts.
For each hashset H_n:
    for each value, v, in H_n:
        M[v]++
Go through M for the highest value with count == N.
This will run in O(n) time and space, where n is the total number of elements in all arrays. It also properly deals with elements being duplicated in a single array, which you didn't mention but which might cause problems for some algorithms. If you know that you won't have duplicate elements in a single array you can skip the first step and add values to M directly from the arrays.
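A direct Python translation of those steps, using the arrays from the question:

def max_common(arrays):
    counts = {}                      # the hashmap M: value -> number of arrays
    for arr in arrays:
        for v in set(arr):           # the per-array hashset: duplicates count once
            counts[v] = counts.get(v, 0) + 1
    common = [v for v, c in counts.items() if c == len(arrays)]
    return max(common) if common else None

arrays = [[1,5,2,4,6,88,34], [1,5,6,2,34], [1,34], [7,99,34]]
print(max_common(arrays))            # 34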
A hashmap alone is not good for keeping track of maximum elements; what you want is a max-heap or a hashset.
You could create n max-heaps and traverse all of them, taking the common maximum element, but this might be hard to implement.
Another approach is to find the intersection of all the sets and take the maximum of the intersection; in this case you can use hashsets.

Maximizing minimum on an array

There is probably an efficient solution for this, but I'm not seeing it.
I'm not sure how to explain my problem but here goes...
Let's say we have one array with n integers, for example {3,2,0,5,0,4,1,9,7,3}.
What we want to do is to find the range of 5 consecutive elements with the "maximal minimum"...
The solution in this example would be the part {4,1,9,7,3}, with 1 as the maximal minimum.
It's easy to do in O(n^2), but there must be a better way of doing this. What is it?
If you mean literally five consecutive elements, then you just need to keep a sorted window of the source array.
Say you have:
{3,2,0,5,0,1,0,4,1,9,7,3}
First, you get five elements and sort 'em - the window is the first five, set off by the extra space:
{3,2,0,5,0, 1,0,4,1,9,7,3}
{0,0,2,3,5} - the window, sorted.
Here the minimum is the first element of the sorted sequence.
Then you advance the window one step to the right: the new element is 1 and the old one is 3, so you find 3 in the sorted window and replace it with 1, then return the window to its sorted state. You don't actually need to run a sorting algorithm on it, since only one element (the 1 in this example) is out of place; even bubble sort would restore order in linear time.
{3, 2,0,5,0,1, 0,4,1,9,7,3}
{0,0,1,2,5}
Then the new minimum is again the first element.
Then again and again you advance the window, compare the first element of the sorted window with the best minimum so far, and remember it together with the subsequence.
Time complexity is O(n) for a fixed window size.
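A sketch of this sorted-window scan in Python, using bisect for the find-and-replace step:

import bisect

def max_min_window(a, k=5):
    window = sorted(a[:k])                   # sorted copy of the current window
    best, best_start = window[0], 0
    for i in range(k, len(a)):
        window.pop(bisect.bisect_left(window, a[i - k]))  # drop the outgoing element
        bisect.insort(window, a[i])                        # insert the incoming one
        if window[0] > best:                               # new maximal minimum
            best, best_start = window[0], i - k + 1
    return best, best_start

print(max_min_window([3, 2, 0, 5, 0, 4, 1, 9, 7, 3]))      # (1, 5): window {4,1,9,7,3}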
Can't you use some circular buffer of 5 elements, run over the array and add the newest element to the buffer (thereby replacing the oldest element) and searching for the lowest number in the buffer? Keep a variable with the offset into the array that gave the highest minimum.
That would seem to be O(n * 5*log(5)) = O(n), I believe.
Edit: I see unkulunkulu proposed exactly the same as me in more detail :).
Using a balanced binary search tree instead of a linear buffer, it is trivial to get complexity O(n log m) for a window of size m.
You can do it in O(n) for arbitrary k-consecutive elements as well. Use a deque.
For each element x:
pop elements from the back of the deque that are larger than x
if the front of the deque is more than k positions old, discard it
push x at the end of the deque
At each step, the front of the deque will give you the minimum of your current k-element window. Compare it with your global maximum and update if needed.
Since each element gets pushed and popped from the deque at most once, this is O(n).
The deque data structure can either be implemented with an array the size of your initial sequence, obtaining O(n) memory usage, or with a linked list that actually deletes the needed elements from memory, obtaining O(k) memory usage.
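A Python sketch of the deque scan, storing indices so that the "more than k positions old" check is easy:

from collections import deque

def max_of_window_minimums(a, k):
    dq = deque()                         # indices; their values increase front to back
    best, best_start = None, None
    for i, x in enumerate(a):
        while dq and a[dq[-1]] >= x:     # pop back entries that are larger than x
            dq.pop()
        dq.append(i)
        if dq[0] <= i - k:               # front is more than k positions old
            dq.popleft()
        if i >= k - 1:                   # window [i-k+1, i] is complete
            if best is None or a[dq[0]] > best:
                best, best_start = a[dq[0]], i - k + 1
    return best, best_start

print(max_of_window_minimums([3, 2, 0, 5, 0, 4, 1, 9, 7, 3], 5))   # (1, 5)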
