I have come across the priority queue data structure; it is usually implemented with a heap or a binary tree.
Why can't a priority queue just use a simple sorted array and find the insertion point with binary search? Insertion would still be O(log n), with a retrieval time of O(1).
while true {
    // Eventually the search narrows to two or fewer elements.
    // The insert position can be in front of them (first element index), between them
    // (second element index), or behind both (second element index + 1).
    //
    // lowerBound is the first element index if the first element (aka middle) has a
    // higher priority than the insert priority (the last step was upperBound = middle - 1,
    // so lowerBound is unchanged).
    //
    // lowerBound is the second element index if the first element (aka middle) has a
    // lower priority than the insert priority (the last step was lowerBound = middle + 1).
    //
    if lowerBound >= upperBound {
        // lowerBound may now point at the second element, whose priority the insert
        // priority can still match or exceed. If so, add 1 to the index (lowerBound).
        return array[lowerBound].priority > insert.priority ? lowerBound : lowerBound + 1
    }
    let middle = (lowerBound + upperBound) / 2
    let middleValue = array[middle]
    if middleValue.priority > insert.priority {
        upperBound = middle - 1
    } else { // treat equal as less than, so equal priorities stay in insertion order
        lowerBound = middle + 1
    }
}
Your loop, the binary search, only finds the index at which the new item should be inserted to maintain sorted order. Actually inserting it there is the hard part. Simply put, this takes linear time and there's nothing we can do about that. A sorted array is very fast for retrieval, but pretty slow for insertion. Binary heaps are fast (logarithmic) for both insertion and retrieval.
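To make the cost difference concrete, here is a minimal Python sketch (not the asker's Swift code; the helper names `sorted_insert` and `heap_insert` are made up for illustration). It uses the standard `bisect` and `heapq` modules. Note that `heapq` is a min-heap, so "highest priority" here means the smallest number.

```python
import bisect
import heapq

def sorted_insert(items, priority):
    # bisect finds the slot in O(log n) comparisons, but list.insert
    # still has to shift every later element, so the insert is O(n).
    pos = bisect.bisect_right(items, priority)
    items.insert(pos, priority)

def heap_insert(heap, priority):
    # heappush sifts the new element up at most about log2(n) levels.
    heapq.heappush(heap, priority)

sorted_items, heap = [], []
for p in [5, 1, 4, 2, 3]:
    sorted_insert(sorted_items, p)
    heap_insert(heap, p)

print(sorted_items)  # [1, 2, 3, 4, 5]
print(heap[0])       # 1, the smallest priority sits at the heap root
```

The binary search is the cheap part in both the Swift loop and `bisect`; the shifting hidden inside `list.insert` is what makes the sorted array linear.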
If I have a list of integers, in an array, how do I find the length of the longest sub array, such that the difference between the minimum and maximum element of that array is less than a given integer, say M.
So if we had an array with 3 elements,
[1, 2, 4]
And if M were equal to 2
Then the longest subarray would be [1, 2]
Because if we included 4 and started from the beginning, the difference would be 3, which is greater than M (= 2); and if we started from 2, the difference between the largest (4) and smallest (2) elements would be 2, which is not less than M (= 2).
The best I can think of is to start from the left, then go as far right as possible without the subarray's range getting too large. Of course, at each step we have to keep track of the minimum and maximum elements so far. This has O(n^2) time complexity though; can't we get it faster?
I have an improvement to David Winder's algorithm. The idea is that instead of using two heaps to find the minimum and maximum elements, we can use what I call the deque DP optimization trick (there's probably a proper name for this somewhere).
To understand this, we can look at a simpler problem: finding the minimum element in all subarrays of some size k in an array. The idea is that we keep a double-ended queue containing potential candidates for the minimum element. When we encounter a new element, we pop off all the elements at the back end of the queue that are greater than or equal to the current element before pushing the current element onto the back.
We can do this because any subarray we encounter in the future that includes an element we popped off will also include the current element, and since the current element is less than or equal to the popped elements, they will never be needed as the minimum.
After pushing the current element, we pop off the front element in the queue if it is more than k elements away. The minimum element in the current subarray is simply the first element in the queue because the way we popped off the elements from the back of the queue kept it increasing.
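The simpler problem described above can be sketched in Python as follows. This is only an illustration, assuming k >= 1; the function name `sliding_window_min` is my own.

```python
from collections import deque

def sliding_window_min(a, k):
    """Return the minimum of every length-k window of a, left to right."""
    candidates = deque()  # indices into a; their values are strictly increasing
    result = []
    for i, x in enumerate(a):
        # Anything at the back that is >= x can never be a minimum again.
        while candidates and a[candidates[-1]] >= x:
            candidates.pop()
        candidates.append(i)
        # Drop the front once it has slid out of the current window.
        if candidates[0] <= i - k:
            candidates.popleft()
        if i >= k - 1:
            result.append(a[candidates[0]])
    return result

print(sliding_window_min([4, 2, 12, 3, 1, 5], 3))  # [2, 2, 1, 1]
```

Each index is pushed once and popped at most once, which is where the O(n) total comes from.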
To use this algorithm in your problem, we keep two deques, one for minimum candidates and one for maximum candidates. When we encounter a new element that is too much larger than the minimum element, we pop off the front of the minimum deque until the element is no longer too large (and symmetrically for the maximum deque when the new element is too small). The beginning of the longest subarray ending at that position is then the index of the last element we popped off plus 1.
This makes the solution O(n).
C++ implementation:
// needs <algorithm>, <deque> and <limits>; assumes a (the array), n (its size) and m are in scope
int best = std::numeric_limits<int>::lowest(), beg = 0;
// best = length of the longest subarray found so far that meets the requirement
// beg = the beginning of the longest valid subarray ending at the current index
std::deque<int> least, greatest;
// these two deques store the indices of the elements which could cause trouble
for (int i = 0; i < n; i++)
{
    while (!least.empty() && a[least.back()] >= a[i])
    {
        // we can pop this off since any subarray we encounter in the future
        // which includes it will also include the current element
        least.pop_back();
    }
    least.push_back(i);
    while (!greatest.empty() && a[greatest.back()] <= a[i])
    {
        // we can pop this off since any subarray we encounter in the future
        // which includes it will also include the current element
        greatest.pop_back();
    }
    greatest.push_back(i);
    // As written, a difference exactly equal to m is accepted. If the difference
    // must be strictly less than m (and m > 0), compare with <= and >= below.
    while (a[least.front()] < a[i] - m)
    {
        // remove elements from the beginning if they are too small
        beg = least.front() + 1;
        least.pop_front();
    }
    while (a[greatest.front()] > a[i] + m)
    {
        // remove elements from the beginning if they are too large
        beg = greatest.front() + 1;
        greatest.pop_front();
    }
    best = std::max(best, i - beg + 1);
}
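For comparison, here is a Python sketch of the same two-deque idea that enforces the question's strict condition (max - min < M). It is a variant, not a line-by-line port: it shrinks the window by comparing the two deque fronts directly instead of comparing each front with the new element. The function name `longest_subarray` is an assumption of mine.

```python
from collections import deque

def longest_subarray(a, m):
    """Length of the longest contiguous run of a with max - min < m."""
    least, greatest = deque(), deque()  # indices of min / max candidates
    best = beg = 0
    for i, x in enumerate(a):
        while least and a[least[-1]] >= x:
            least.pop()
        least.append(i)
        while greatest and a[greatest[-1]] <= x:
            greatest.pop()
        greatest.append(i)
        # Shrink from the left until the window [beg, i] satisfies max - min < m.
        while least and greatest and a[greatest[0]] - a[least[0]] >= m:
            # Drop whichever extreme occurs first; the window must start after it.
            if least[0] < greatest[0]:
                beg = least.popleft() + 1
            else:
                beg = greatest.popleft() + 1
        best = max(best, i - beg + 1)
    return best

print(longest_subarray([1, 2, 4], 2))  # 2, the subarray [1, 2] from the question
```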
Consider the following idea:
Let's create an array MaxLen (of size n), defined as: MaxLen[i] = length of the longest valid sub-array ending at index i.
Once this array is filled, it is easy (O(n)) to find your longest sub-array.
How do we fill the MaxLen array? Assume you know MaxLen[i]. What will MaxLen[i+1] be?
We have 2 options. If originalArr[i+1] does not break the constraint (a difference of m or more) within the longest sub-array ending at index i, then MaxLen[i+1] = MaxLen[i] + 1, because we can simply make the previous sub-array a little longer. On the other hand, if originalArr[i+1] differs by m or more from some element of that sub-array, we need to find the last such element (call its index k) and set MaxLen[i+1] = i - k + 1, because the new sub-array has to exclude originalArr[k].
How do we find this "bad" element? We use heaps. After processing each element, we insert its value and index into both a min-heap and a max-heap (O(log n) each). When you reach the i-th element and want to check whether some earlier element breaks the sequence, extract elements from the heaps until none is too small or too large relative to originalArr[i]. The maximum index among the extracted elements is your k, the index of the element that broke the sequence.
I will try to simplify with pseudocode (shown only for the min-heap; the max-heap case is symmetric):
Array is the input array of size n
min-heap = new heap()
maxLen = array(n) // of size n
maxLen[0] = 1 // the longest sub-array ending at index 0 has length 1
min-heap.push(Array[0], 0)
for (i in (1, n)) {
    if (Array[i] - min-heap.top.value < m) // then all good
        maxLen[i] = maxLen[i-1] + 1
    else {
        k = min-heap.top.index
        while (not empty(min-heap) and Array[i] - min-heap.top.value >= m)
            k = max(k, min-heap.pop.index)
        // k is the index of the last "bad" element, so the new sub-array starts at k + 1
        maxLen[i] = i - k
    }
    min-heap.push(Array[i], i)
}
When you are done, scan the maxLen array and pick the largest value (its index is the end of the sub-array, and together with the length it gives you the beginning).
So we loop over the array (n iterations) and on each iteration insert into 2 heaps (log n each).
You will probably say: hey, but there is also an unknown number of heap extractions, each of which forces a heapify (log n)! Notice, however, that each heap holds at most n elements over the whole run and each element can be extracted at most once per heap, so the extractions cost O(n log n) in total, i.e. an amortized O(1) extractions per element.
So, bottom line: O(n*logn).
Edited:
This solution can be simplified by using an AVL tree instead of 2 heaps: finding the min and the max are both O(logn) in an AVL tree, and the same goes for insert, find and delete. So just use a tree whose elements hold the value and its index in the original array.
Edited 2:
@Fei Xiang even came up with a better solution, O(n), using deques.
I am studying quickselect for a midterm in my algorithms analysis course and the algorithm I have been working with is the following:
Quickselect(A[L...R],k)
// Input: Array indexed from 0 to n-1 and an index of the kth smallest element
// Output: Value of the kth position
s = LomutoPartition(A[L...R]) // works by taking the first index and value as the
// pivot and returns it's index in the sorted position
if(s == k-1) // we have our k-th element, it's k-1 because arrays are 0-indexed
return A[s]
else if(s> L+k-1) // this is my question below
Quickselect(L...s-1,k) // basically the element we want is somewhere to the left
// of our pivot so we search that side
else
Quickselect(s+1...R, k-1-s)
/* the element we want is greater than our pivot so we search the right-side
* however if we do we must scale the k-th position accordingly by removing
* 1 and s so that the new value will not push the sub array out of bounds
*/
My question is why in the first if do we need L + k - 1? Doing a few examples on paper I have come to the conclusion that no matter the context L is always an index and that index is always 0. Which does nothing for the algorithm right?
There seems to be a discrepancy between the line
if(s == k-1)
and the line
else if(s> L+k-1)
The interpretations are incompatible.
As Trincot correctly notes, from the second recursive call on, it's possible that L is not 0. Your Lomuto subroutine doesn't take an array, a low index, and a high index (as the one in Wikipedia does, for example). Instead it just takes an array (which happens to be a subarray between low and high of some other array). The index s it returns is thus relative to the subarray, and to translate it to the position within the original array, you need to add L. This is consistent with your first line, except that the line following it should read
return A[L + s]
Your second line should therefore also compare to k - 1, not L + k - 1.
Edit
Following the comment, here is the pseudo-code from Wikipedia:
// Returns the n-th smallest element of list within left..right inclusive
// (i.e. left <= n <= right).
// The search space within the array is changing for each round - but the list
// is still the same size. Thus, n does not need to be updated with each round.
function select(list, left, right, n)
    if left = right        // If the list contains only one element,
        return list[left]  // return that element
    pivotIndex := ...      // select a pivotIndex between left and right,
                           // e.g., left + floor(rand() % (right - left + 1))
    pivotIndex := partition(list, left, right, pivotIndex)
    // The pivot is in its final sorted position
    if n = pivotIndex
        return list[n]
    else if n < pivotIndex
        return select(list, left, pivotIndex - 1, n)
    else
        return select(list, pivotIndex + 1, right, n)
Note the conditions
if n = pivotIndex
and
else if n < pivotIndex
which are consistent in their interpretation of the indexing returned in partitioning.
Once again, it's possible to define the partitioning sub-routine either as returning the index relative to the start of the sub-array, or as returning the index relative to the original array, but there must be consistency in this.
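As an illustration of the "absolute index" convention, here is a small Python sketch. It assumes a first-element-pivot Lomuto partition like the one in the question and a 1-based k with 1 <= k <= len(a); the names `lomuto_partition` and `quickselect` are mine. Because the partition returns an index into the whole array, k never has to be rescaled.

```python
def lomuto_partition(a, lo, hi):
    """Partition a[lo..hi] around a[lo]; return the pivot's final index in a."""
    pivot = a[lo]
    s = lo
    for i in range(lo + 1, hi + 1):
        if a[i] < pivot:
            s += 1
            a[s], a[i] = a[i], a[s]
    a[lo], a[s] = a[s], a[lo]
    return s

def quickselect(a, k):
    """Return the k-th smallest element of a (k is 1-based); reorders a in place."""
    lo, hi = 0, len(a) - 1
    target = k - 1  # absolute index of the answer in sorted order
    while True:
        s = lomuto_partition(a, lo, hi)  # s is an index into a, not into a[lo..hi]
        if s == target:
            return a[s]
        if s > target:
            hi = s - 1  # the answer lies left of the pivot
        else:
            lo = s + 1  # the answer lies right of the pivot; target is unchanged

print(quickselect([7, 2, 9, 4, 1, 8], 3))  # 4
```

With the other convention (an index relative to the subarray), every comparison and the returned element would need the `L +` offset instead.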
Finding the minimum from last K elements , after each insertion , where K is not fixed:
For example Given Array
10 2 4 1 3
Query K = 3
ans : 1 (minimum of 4 1 3)
Insertion : 5
10 2 4 1 3 5
Query K =2
ans = 3
Insertion 2
10 2 4 1 3 5 2
Query K =4
ans 1
Is there an efficient way to process such queries in less than O(n) time per query?
I am assuming that you know in the beginning itself the maximum number of elements that shall be inserted so that you can allocate space for them accordingly.
Then a min-segment tree shall work. Initially all elements in the segment tree contain "INT_MAX" value.
As new elements arrive, the corresponding leaves (and their ancestors) get updated. The leaf to update is determined by the element's position in the stream.
The interval queries can then be easily performed.
Both insertion and query operations shall take O(log n) time.
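A minimal sketch of such a min-segment tree in Python, assuming the maximum number of elements (`capacity`) is known up front as the answer requires. The class and method names are hypothetical, and `math.inf` plays the role of INT_MAX.

```python
import math

class MinSegmentTree:
    def __init__(self, capacity):
        self.size = 1
        while self.size < capacity:
            self.size *= 2
        self.tree = [math.inf] * (2 * self.size)  # leaves live at size..2*size-1
        self.count = 0                            # elements inserted so far

    def append(self, value):
        if self.count >= self.size:
            raise IndexError("capacity exceeded")
        pos = self.size + self.count
        self.count += 1
        self.tree[pos] = value
        pos //= 2
        while pos >= 1:  # refresh every ancestor of the updated leaf
            self.tree[pos] = min(self.tree[2 * pos], self.tree[2 * pos + 1])
            pos //= 2

    def min_last(self, k):
        """Minimum of the last k inserted values (assumes 1 <= k <= count)."""
        lo = self.size + self.count - k  # inclusive
        hi = self.size + self.count      # exclusive
        best = math.inf
        while lo < hi:
            if lo & 1:
                best = min(best, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                best = min(best, self.tree[hi])
            lo //= 2
            hi //= 2
        return best

tree = MinSegmentTree(8)
for v in [10, 2, 4, 1, 3]:
    tree.append(v)
print(tree.min_last(3))  # 1
tree.append(5)
print(tree.min_last(2))  # 3
tree.append(2)
print(tree.min_last(4))  # 1
```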
There is a way to solve this problem using an array and binary search.
First, we process the array while maintaining a queue of indices whose values are strictly increasing.
So, for the array 10 2 4 1 3, the queue holds the indices of the values 1 3.
Every time we insert an element, we first remove all elements at the end of the queue that are greater than or equal to the new element. So when we insert 5, the queue becomes 1 3 5; then inserting 2 makes it 1 2.
To query the minimum of the last K elements, we need the element nearest the front of the queue whose index falls within the last K positions. Because the indices in the queue are increasing, binary search finds it.
So the total time for all inserts is O(n), i.e. amortized O(1) per insert, and each query takes O(log n).
Pseudocode
int[] q; // indices into data; the values data[q[0]], data[q[1]], ... are strictly increasing
int numberOfElement = 0;
// Build q from the initial array
for(int i = 0; i < n; i++){
    while(numberOfElement > 0 && data[q[numberOfElement - 1]] >= data[i]){
        numberOfElement--;
    }
    q[numberOfElement++] = i;
}
// Insert at index i (after storing the new value in data[i]):
while(numberOfElement > 0 && data[q[numberOfElement - 1]] >= data[i]){
    numberOfElement--;
}
q[numberOfElement++] = i;
// Query for range K (totalElement = number of elements inserted so far)
int start = 0;
int end = numberOfElement - 1;
int result = 0;
while(start <= end){
    int mid = (start + end) >> 1;
    if(q[mid] >= totalElement - K){
        result = mid;
        end = mid - 1;
    } else{
        start = mid + 1;
    }
}
// The answer is data[q[result]]
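The same idea as a runnable Python sketch, using the standard `bisect` module for the binary search. The class name `LastKMin` and its methods are made up for illustration, and `query` assumes 1 <= k <= number of inserted elements.

```python
import bisect

class LastKMin:
    def __init__(self):
        self.data = []  # every inserted value, in arrival order
        self.q = []     # increasing indices into data whose values strictly increase

    def insert(self, value):
        # Older entries that are >= value can never be the answer again.
        while self.q and self.data[self.q[-1]] >= value:
            self.q.pop()
        self.q.append(len(self.data))
        self.data.append(value)

    def query(self, k):
        """Minimum of the last k inserted values."""
        first = len(self.data) - k               # first index inside the window
        pos = bisect.bisect_left(self.q, first)  # earliest queue entry in the window
        return self.data[self.q[pos]]

s = LastKMin()
for v in [10, 2, 4, 1, 3]:
    s.insert(v)
print(s.query(3))  # 1
s.insert(5)
print(s.query(2))  # 3
s.insert(2)
print(s.query(4))  # 1
```

Unlike the segment tree, this needs no capacity known in advance, since `q` never holds more entries than there are inserted elements.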
If you know the number of elements to be inserted beforehand, you can go with @Abhishek Bansal's answer (segment tree). Otherwise you can use a balanced BST (binary search tree), for example a treap. Use the element's index in the array as the key and its array value as the node's value. Insertion then takes O(log(n)), and so does a query (a range-minimum query over the keys last_index-k+1 to last_index).
Here is the code for treap (it is for maxrange query) but minrange is the same idea: https://ideone.com/M9rnbg
Create and maintain a min-heap of maximum size k. Before every insertion, delete the value that was inserted k insertions ago from the heap, and after the insertion restore the heap property (min-heapify).
After the insertion, the root node always holds the minimum of the last k inserted elements.
For each query:
Deletion costs: O(log k)
Insertion costs : O(log k)
Finding minimum costs : O(1)
So the overall time complexity is O(log k) (which is less than O(n)).
For a binary search of a sorted array of 2^n-1 elements in which the element we are looking for appears, what is the amortized worst-case time complexity?
Found this on my review sheet for my final exam. I can't even figure out why we would want amortized time complexity for binary search because its worst case is O(log n). According to my notes, the amortized cost calculates the upper-bound of an algorithm and then divides it by the number of items, so wouldn't that be as simple as the worst-case time complexity divided by n, meaning O(log n)/2^n-1?
For reference, here is the binary search I've been using:
public static boolean binarySearch(int x, int[] sorted) {
    int s = 0; // start
    int e = sorted.length - 1; // end
    while (s <= e) {
        int mid = s + (e - s) / 2;
        if (sorted[mid] == x)
            return true;
        else if (sorted[mid] < x)
            s = mid + 1;
        else
            e = mid - 1;
    }
    return false;
}
I'm honestly not sure what this means - I don't see how amortization interacts with binary search.
Perhaps the question is asking what the average cost of a successful binary search would be. You could imagine binary searching for all n elements of the array and looking at the average cost of such an operation. In that case, there's one element for which the search makes one probe, two for which the search makes two probes, four for which it makes three probes, etc. This averages out to O(log n).
Hope this helps!
Amortized cost is the total cost over all possible queries divided by the number of possible queries. You will get slightly different results depending on how you count queries that fail to find the item. (Either don't count them at all, or count one for each gap where a missing item could be.)
So for a search of 2^n - 1 items (just as an example to keep the math simple), there is one item you would find on your first probe, 2 items would be found on the second probe, 4 on the third probe, ... 2^(n-1) on the nth probe. There are 2^n "gaps" for missing items (remembering to count both ends as gaps).
With your algorithm, finding an item on probe k costs 2k-1 comparisons. (That's 2 compares for each of the k-1 probes before the kth, plus one where the test for == returns true.) Searching for an item not in the table costs 2n comparisons.
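If you would rather check the arithmetic by machine, here is a small Python sketch that mirrors the early-exit search and tallies comparisons against sorted[mid]. The function names are my own. Summing 2^(k-1) items at 2k-1 comparisons each gives a total of (2n-3)*2^n + 3 for successful searches, so the average is a little under 2n-3.

```python
def comparisons_to_find(sorted_items, x):
    """Count comparisons against sorted_items[mid] made by the early-exit search."""
    s, e = 0, len(sorted_items) - 1
    comparisons = 0
    while s <= e:
        mid = s + (e - s) // 2
        comparisons += 1  # the == test
        if sorted_items[mid] == x:
            return comparisons
        comparisons += 1  # the < test
        if sorted_items[mid] < x:
            s = mid + 1
        else:
            e = mid - 1
    return comparisons

def average_successful_cost(n):
    """Average comparisons over all successful searches in 2**n - 1 items."""
    items = list(range(2 ** n - 1))
    total = sum(comparisons_to_find(items, x) for x in items)
    return total / len(items)

print(average_successful_cost(3))                 # 27/7: 1*1 + 2*3 + 4*5 = 27 over 7 items
print(comparisons_to_find(list(range(7)), -1))    # 6, i.e. 2n for a missing item
```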
I'll leave it to you to do the math, but I can't leave the topic without expressing how irked I am when I see binary search coded this way. Consider:
public static boolean binarySearch(int x, int[] sorted) {
    int s = 0; // start
    int e = sorted.length; // end
    // Loop invariant: if x is at sorted[k] then s <= k < e
    int mid = (s + e) / 2;
    while (mid != s) {
        if (sorted[mid] > x) e = mid; else s = mid;
        mid = (s + e) / 2;
    }
    return (mid < e) && (sorted[mid] == x); // mid == e means the array was empty
}
You don't short-circuit the loop when you hit the item you're looking for, which seems like a defect, but on the other hand you do only one comparison on every item you look at, instead of two comparisons on each item that doesn't match. Since half of all items are found at leaves of the search tree, what seems like a defect turns out to be a major gain. Indeed, the number of elements where short-circuiting the loop is beneficial is only about the square root of the number of elements in the array.
Grind through the arithmetic, computing amortized search cost (counting "cost" as the number of comparisons to sorted[mid]), and you'll see that this version is approximately twice as fast. It also has constant cost (within ±1 comparison), depending only on the number of items in the array and not on where or even if the item is found. Not that that's important.
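Here is a Python transcription of that deferred-equality search with a comparison counter added, as a rough way to check the "constant cost" claim. The function name `deferred_search` and the tuple return value are my additions.

```python
def deferred_search(sorted_items, x):
    """Return (found, comparisons) for the one-comparison-per-probe search."""
    s, e = 0, len(sorted_items)
    comparisons = 0
    mid = (s + e) // 2
    while mid != s:
        comparisons += 1  # the only comparison made inside the loop
        if sorted_items[mid] > x:
            e = mid
        else:
            s = mid
        mid = (s + e) // 2
    if mid < e:           # mid == e only happens for an empty array
        comparisons += 1  # the single equality test at the end
        return sorted_items[mid] == x, comparisons
    return False, comparisons

items = list(range(7))
costs = sorted({deferred_search(items, x)[1] for x in items})
print(costs)  # [3, 4]: every successful search costs 3 or 4 comparisons
```

For these 7 items the early-exit version needs 5 comparisons for each of the four leaf elements, while this one never needs more than 4.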
Given a 2d array (a matrix) with n rows and k columns, with its rows sorted and its columns in no particular order, what would be the most efficient algorithm to sort it?
For example:
Input (n = 3 ; k = 4):
1 5 8 10
3 4 5 6
2 3 3 9
Output:
1 2 3 3 3 4 5 5 6 8 9 10
This is a purely algorithmic question, so the .sort() methods of specific languages don't help me; I'm actually interested in the runtime complexity.
What I had in mind would be an algorithm as follows:
- Build a Binary Search tree with n nodes. Each node contains:
    ID - of the row it refers to;
    Value - The number at the "head" of the row.
- Sort the tree, using the values as the sorting keys.
- Find the lowest valued node.
- Do while the tree has a root:
    - Take the lowest value and append it to the new array.
    - Update the node's value to the next one in the same row.
    - If the updated value is null, delete that node.
        - else re-sort the tree.
    - If the node moved, the first node it swapped with will be the next lowest
        - else, if it didn't move, it's the new lowest.
If I'm not mistaken, the runtime complexity is O(n*k * log n), since I'm re-sorting the tree n*k times, each of which takes O(log n) time, and the process of finding the next lowest is O(1).
If my complexity calculation is wrong, please let me know.
Is there any way more efficient than this?
You basically have n sorted lists, each of size k. You need a generalization of merge-sort, which is the k-way merge.
The idea is to keep a min-heap, that contains the smallest element from each list.
Now, iteratively pop the min of the heap. Let this number be x, and say it was taken from row i. Now, append x to the result list, and add to the min heap the next element in row i (if such element exists)
Repeat until all elements are exhausted.
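The steps above can be sketched in Python with the standard `heapq` module. This is a minimal illustration; the function name `merge_sorted_rows` is my own. Each heap entry carries its row and column so the next element of the same row can be pushed.

```python
import heapq

def merge_sorted_rows(matrix):
    """Merge the already-sorted rows of matrix into one sorted list."""
    # Seed the heap with the head of every non-empty row: (value, row, column).
    heap = [(row[0], r, 0) for r, row in enumerate(matrix) if row]
    heapq.heapify(heap)
    result = []
    while heap:
        value, r, c = heapq.heappop(heap)
        result.append(value)
        if c + 1 < len(matrix[r]):  # push the next element from the same row
            heapq.heappush(heap, (matrix[r][c + 1], r, c + 1))
    return result

matrix = [
    [1, 5, 8, 10],
    [3, 4, 5, 6],
    [2, 3, 3, 9],
]
print(merge_sorted_rows(matrix))  # [1, 2, 3, 3, 3, 4, 5, 5, 6, 8, 9, 10]
```

The heap never holds more than n entries, one per row, which is why each pop and push costs O(log n).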
Complexity is O(n*k*logn), which is pretty efficient considering you are sorting n*k elements and need to traverse all of them. The constants for using a binary heap are pretty good.
Note that this is often referred to as external sort (or, to be exact, a close variant of external sort's second part).
This is very similar to the algorithm you are suggesting, but will probably run much faster (with better constants) because it uses a heap rather than the less efficient tree.
Also note that if you use a 'regular' binary tree, you get a complexity of O(n^2k), since there is no guarantee on the height of the tree. You need a self-balancing binary search tree to get O(nklogn) run time.
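This can be done with a sorted merge that uses O(rows) extra space. With a fixed, small number of rows (as in the code below) it takes O(rows*cols) time, i.e. linear in the total number of elements; in general, scanning every row head for each output element costs O(rows^2*cols).
The Java code for this problem is below (with rows = 3 and cols = 4):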
// index[r] = next unread column in row r (declaration added; Java zero-initializes int arrays)
int[] index = new int[ROWS];
int count=0;
int a;
int b;
int c;
while(count<(ROWS*COLS))
{
int smallest;
if(index[0]>=COLS)
a= Integer.MAX_VALUE;
else
a= matrix[0][index[0]];
if(index[1]>=COLS)
b = Integer.MAX_VALUE;
else
b = matrix[1][index[1]];
if(index[2]>=COLS)
c = Integer.MAX_VALUE;
else
c = matrix[2][index[2]];
if(a<=b && a<=c){
// a is smallest
smallest = a;
index[0] = index[0] +1;
}else if(b<=c && b<=a){
//b is smallest
smallest = b;
index[1] = index[1] + 1;
}else{
//c is smallest
smallest = c;
index[2] = index[2] + 1;
}
System.out.print(smallest + ", ");
count++;
}