How is removeMin on a sorted-array implementation of a priority queue constant time?

As per UC Berkeley's CS 61B, lecture 16, if a priority queue is implemented as a sorted array, removeMin is constant time.
If you just remove the element at the zeroth index, wouldn't you have to shift all the remaining items over to the left? In that case, wouldn't it be Theta(n)? Otherwise, findMin will not be constant time.

If you implement a priority queue as a sorted array, there are two different ways to ensure that removeMin is an O(1) operation.
If the array is sorted in ascending order, then the smallest element is at the front of the array. In this case you maintain an index that tells you where the beginning of the queue is. When you first build the queue, the index is of course at the front of the array (a[0] or a[1], depending on your choice of language). When you want to remove the smallest element, you return the item at a[idx], and then increment idx.
Any time you insert an item, you move everything back up to the front of the array and reset idx to 0 (or 1, as appropriate).
The other way is to maintain the array in descending order. The smallest element is the last element of the array. You already have to keep track of how many elements are in the array. So you have an index, call it ixEnd. When you want to remove the smallest element, you return a[ixEnd], and subtract 1 from ixEnd.
Either way, insertion is O(n) and removeMin is O(1).
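For concreteness, here is a minimal Python sketch of the descending-order variant (the class and method names are illustrative, not from the lecture):

class SortedArrayPQ:
    def __init__(self):
        self.items = []  # kept sorted in descending order, so the min is last

    def insert(self, x):
        # O(n): find the insertion point and shift the tail over by one
        i = 0
        while i < len(self.items) and self.items[i] > x:
            i += 1
        self.items.insert(i, x)

    def remove_min(self):
        # O(1): the smallest element is at the end, so no shifting is needed
        return self.items.pop()

Because the minimum sits at the end of the array, remove_min is just a pop with no shifting; all the O(n) work happens at insertion time.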

Related

Algorithm to find closest tuples for each tuple in a list

Consider an array of elements, where each element is a pair like (a,b)
The first element a is a Date and b is some positive integer.
The given array is sorted based on the Date.
We have to write a function that returns an array of integers.
The element at the ith location of the returned array is derived from the corresponding tuple in the original array as follows:
Take the ith tuple, say (a, b). Now look at all the tuples that occur after it, and find the one (c, d) such that d is less than b and is maximum.
The ith element of the returned array will be (c - a).
My thoughts -
We scan the given array of tuples from the right side, and each time we encounter a tuple, we add it to an AVL tree. Searching then takes time equal to the height of the tree.
So if the elements are distinct, this will work in O(n log n) time.
But if the second element of a tuple occurs multiple times, we may end up traversing the whole tree.
Not sure how to address that.
We could probably store the min and max nodes of each subtree in its root.
// input list of pairs is of the form (date, value)
// DS is a data structure that supports lower-bound search and insert in O(log n)
index = size of list of pairs - 1
for (pair p in input list of pairs, scanning from right to left):
    // search should return a sentinel if DS is empty
    resultant_array[index--] = pair<p.date, search(previous of lower bound of p.value in DS)?.date>
    if (DS doesn't contain a pair with key p.value)
        insert in DS the pair <p.value, p.date>
The above algorithm keeps the highest date when the value b in (a, b) is duplicated. To take the lowest date instead, update the stored pair when p.value already exists in DS, rather than performing the conditional insert on the last line of the algorithm.
DS could be an ordered map, an AVL tree, a red-black tree, etc. The whole DS never needs to be traversed, even when duplicate values map to different dates, so each search is just O(log n).
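Here is a minimal Python sketch of this idea. It uses bisect over a plain sorted list, so inserts are O(n) rather than the O(log n) a balanced tree would give, but the search logic is identical; the function name is illustrative, and it returns c - a per the question (the pseudocode above stores a pair instead):

import bisect

def closest_earlier_dates(pairs):
    # pairs: list of (date, value) tuples, sorted by date
    result = [None] * len(pairs)
    values = []  # sorted, distinct values seen so far (i.e., to the right)
    dates = []   # dates[i] is the date stored for values[i]
    for i in range(len(pairs) - 1, -1, -1):
        date, value = pairs[i]
        # predecessor of the lower bound = largest stored value d < value
        k = bisect.bisect_left(values, value)
        result[i] = (dates[k - 1] - date) if k > 0 else None  # None = sentinel
        if k == len(values) or values[k] != value:
            values.insert(k, value)  # O(n) here; O(log n) in a balanced tree
            dates.insert(k, date)
    return result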

Max absolute difference between elements of two subarrays

Given an array A of n elements, an index K divides the array into two subarrays. diffK is defined as max({A[0], A[1], ..., A[K]}) - max({A[K+1], A[K+2], ..., A[n-1]}). Return the maximum absolute value of diffK. The time complexity has to be O(n) and the maximum space complexity O(n).
It's straightforward to build, in a single pass forward through the array, a "helper" array that tracks the maximum value seen up through a given index. (So, for any given K, helper[K] = max({A[0], A[1],....A[K]}).)
Then, in a single pass backward through the array, you can track the maximum value seen from a given index onward (max({A[K+1],A[K+2],...A[n-1]}), where K is the index), and compare it to the value of the above "helper" array at that index. Keep track of the largest difference you ever see between the two values at the same index, and return the result.
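A short Python sketch of this two-pass idea (the function name is illustrative; it assumes n >= 2 so that at least one split exists):

def max_abs_diffK(A):
    n = len(A)
    # forward pass: helper[K] = max(A[0..K])
    helper = [0] * n
    helper[0] = A[0]
    for i in range(1, n):
        helper[i] = max(helper[i - 1], A[i])
    # backward pass: track max(A[K+1..n-1]) and the best |diffK| seen
    best = 0
    suffix_max = A[n - 1]
    for K in range(n - 2, -1, -1):
        best = max(best, abs(helper[K] - suffix_max))
        suffix_max = max(suffix_max, A[K])
    return best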

(Doubly) linked list get() complexity

Why is the complexity of get(index) in a doubly linked list O(n) and not O(1)? Why isn't it O(1) like in an array? Is it because we have to traverse through the previous nodes to get to one?
This is by definition. As you suspected, to get to the i-th element in the list, all previous items must be traversed.
As an exercise, implement a linked list for yourself.
Yes, having to "traverse through the previous nodes to get to one" is exactly it.
In a linked list, to find element # n, you would use something like:
def getNodeNum(head, n):
    # walk n links from the head; each hop is O(1), so get(n) is O(n) overall
    node = head
    while n > 0 and node is not None:
        n = n - 1
        node = node.next
    return node
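For completeness, a minimal (hypothetical) Node class that makes the snippet above runnable:

class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

# build the list 10 -> 20 -> 30, then fetch element #2
head = Node(10, Node(20, Node(30)))
print(getNodeNum(head, 2).value)  # prints 30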
The reason an array is O(1) is because all the elements are laid out in contiguous memory. To get the address of element 42, you simply multiply 42 by the element size and then add the array base. This has the same cost for element number 3 as it does for element number 999.
You can't do that with a list because the elements are not necessarily contiguous in memory, hence you have to walk the list to find the one you desire. Thus the cost for finding element number 3 is actually far less than the cost for finding element number 999.

Check if there exists a[i] = 2*a[j] in an unsorted array a?

Given an unsorted sequence a[1,...,n] of integers, give an O(n log n) algorithm to check whether there are two indices i and j such that a[i] = 2*a[j]. The algorithm should return i=0 and j=2 on the input 4, 12, 8, 10, and false on the input 4, 3, 1, 11.
I think we have to sort the array anyway, which is O(n log n). I'm not sure what to do after that.
Note: this can be done in O(n)(1) on average, using a hash table.
set <- new hash set
for each x in array:
    set.add(2*x)
for each x in array:
    if set.contains(x):
        return true
return false
Proof:
=>
If there are two elements a[i] and a[j] such that a[i] = 2*a[j], then during the first pass we inserted 2*a[j] into the set when we read a[j]. During the second pass, we find that a[i] == 2*a[j] is in the set, and return true.
<=
If the algorithm returned true, then during the second pass it found some a[i] that was already in the set. The value a[i] can only have been inserted during the first pass if there is another element a[j] such that a[i] == 2*a[j], and we inserted it when reading a[j].
Note:
In order to return the indices of the elements, one can simply use a hash map instead of a set: for each i, store 2*a[i] as the key and i as the value.
Example:
Input = [4,12,8,10]
First, for each x, insert 2x into the hash table along with its index. You will get:
hashTable = {(8,0),(24,1),(16,2),(20,3)}
Now, on the second iteration, you check for each element whether it is in the table:
arr[0]: 4 is not in the table
arr[1]: 12 is not in the table
arr[2]: 8 is in the table - return the current index [2] and the value stored for 8 in the map, which is 0.
So the final output is 2,0 - as expected.
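A compact Python sketch of this hash-map variant (the function name is illustrative):

def find_double_indices(arr):
    doubles = {}  # maps 2*a[j] -> j
    for j, x in enumerate(arr):
        doubles[2 * x] = j
    for i, x in enumerate(arr):
        j = doubles.get(x)
        if j is not None and j != i:  # j != i guards the x == 0 case
            return (i, j)             # here a[i] == 2*a[j]
    return False

On [4, 12, 8, 10] this returns (2, 0), matching the worked example above.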
(1) Complexity notice:
Here, O(n) assumes an O(1) hash function, which is not always true. If we do assume an O(1) hash function, we can also assume that sorting with radix sort is O(n); with an O(n) post-processing pass [similar to the one suggested by #SteveJessop in his answer], we can then achieve O(n) with a sorting-based algorithm as well.
Sort the array (O(n log n), or O(n) if you're willing to stretch a point about arrays of fixed-size integers)
Initialise two pointers ("fast" and "slow") at the start of the array (O(1))
Repeatedly:
    increment "fast" until you find an even value >= twice the value at "slow"
    if the value at "fast" is exactly twice the value at "slow", return true
    increment "slow" until you find a value >= half the value at "fast"
    if the value at "slow" is exactly half the value at "fast", return true
    if one of the attempts to increment goes past the end, return false
Since each of fast and slow can be incremented at most n times total before reaching the end of the array, the "repeatedly" part is O(n).
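A Python sketch of the post-sort scan, restructured slightly to use one trailing pointer (it assumes integer inputs; the slow != fast guard handles the zero case):

def has_double(arr):
    arr = sorted(arr)  # O(n log n)
    slow = 0
    for fast in range(len(arr)):
        # advance "slow" until its double catches up with arr[fast]
        while slow < len(arr) and 2 * arr[slow] < arr[fast]:
            slow += 1
        if slow < len(arr) and slow != fast and 2 * arr[slow] == arr[fast]:
            return True
    return False

Since "slow" only ever moves forward, the scan after sorting is O(n).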
You're right that the first step is sorting the array.
Once the array is sorted, you can find out whether a given element is inside the array in O(log n) time. So if, for each of the n elements, you check for the inclusion of another element in O(log n) time, you end up with a runtime of O(n log n).
Does that help you?
Create an array of pairs A={(a[0], 0), (a[1], 1), ..., (a[n-1], n-1)}
Sort A,
For every (a[i], i) in A, do a binary search to see if there's a (a[i] * 2, j) pair or not. We can do this, because A is sorted.
Step 1 is O(n), and steps 2 and 3 are O(n log n).
Also, you can do step 3 in O(n) (there's no need for binary search), because if the corresponding element for A[i] is at A[j], then the corresponding element for A[i+1] cannot be in A[0..j-1]. So we can keep two pointers and find the answer in O(n). But either way, the whole algorithm will be O(n log n) because we still do the sorting.
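A Python sketch of the sort-plus-binary-search version over (value, index) pairs (names are illustrative):

import bisect

def find_double_pair(a):
    # pair each value with its original index, then sort by value
    pairs = sorted((v, i) for i, v in enumerate(a))
    values = [v for v, _ in pairs]
    for v, i in pairs:
        # binary search for 2*v among the sorted values
        k = bisect.bisect_left(values, 2 * v)
        while k < len(values) and values[k] == 2 * v:
            if pairs[k][1] != i:  # skip matching v == 0 against itself
                return (pairs[k][1], i)  # a[first] == 2*a[second]
            k += 1
    return None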
Sorting the array is a good option - O(nlogn), assuming you don't have some fancy bucket sort option.
Once it's sorted, you need only pass through the array twice - I believe this is O(n)
Create a 'doubles' list which starts empty.
Then, for each element of the array:
    check the element against the first element of the 'doubles' list
    if it is the same, you win
    if the element is higher, ditch the first element of the 'doubles' list and check again
    add its double to the end of the 'doubles' list
Keep going until you find a double, or get to the end of your first list.
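A Python sketch of the 'doubles' list using a deque. It assumes non-negative values, as the discard step above implies (with negative numbers, an element's double comes before it in sorted order):

from collections import deque

def has_double_via_queue(arr):
    arr = sorted(arr)  # assumes non-negative values
    doubles = deque()  # doubles of earlier elements, kept in ascending order
    for x in arr:
        # ditch doubles that are smaller than x; they can never match now
        while doubles and doubles[0] < x:
            doubles.popleft()
        if doubles and doubles[0] == x:
            return True
        doubles.append(2 * x)
    return False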
You can also use a balanced tree; it uses extra space, but it does not modify the array.
Starting at i=0, and incrementing i, insert elements, checking if twice or half the current element is already there in the tree.
One advantage is that it will work in O(M log M) time, where M = min[max{i,j}] over the valid pairs. You could potentially change your sorting-based algorithm to try and get O(M log M), but it could get complicated.
Btw, if you are using comparisons only, there is an Omega(n log n) lower bound, by reducing the element distinctness problem to this:
Duplicate the input array. Use the algorithm for this problem twice. So unless you bring hashing type stuff into the picture, you cannot get a better than Theta(n log n) algorithm!

Algorithm for finding 2 items with given difference in an array

I am given an array of real numbers, A. It has n+1 elements.
It is known that there are at least 2 elements of the array, x and y, such that:
abs(x-y) <= (max(A)-min(A))/n
I need to create an algorithm for finding the 2 items (if there are more, any couple is good) in O(n) time.
I've been trying for a few hours and I'm stuck, any clues/hints?
Woo, I got it! The trick is the Pigeonhole Principle.
Okay... think of the numbers as points on a line. Then min(A) and max(A) define the start and end points of the line, respectively. Now divide that line into n equal intervals of length (max(A)-min(A))/n. Since there are n+1 points but only n intervals, two of them must fall into the same interval.
Note that we don't need to rely on the question telling us that there are two points that satisfy the criterion. There are always two points that satisfy it.
The algorithm itself: You can use a simplified form of bucket sort here, since you only need one item per bucket (hit two and you're done). First loop once through the array to get min(A) and max(A) and create an integer array buckets[n] initialized to some default value, say -1. Then go for a second pass:
for (int i = 0; i < len; i++) {
    int bucket_num = find_bucket(array[i]);
    if (buckets[bucket_num] == -1)
        buckets[bucket_num] = i;
    else
        break;  // found pair at (i, buckets[bucket_num])
}
Where find_bucket(x) returns the rounded-down integer result of (x - min(A)) / ((max(A)-min(A))/n), clamping x = max(A) into the last bucket (index n-1) so the buckets array is not overrun.
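A runnable Python version of the bucket pass (a sketch; the function name is illustrative, and the degenerate all-equal case is handled separately):

def close_pair(a):
    # a has n+1 elements; find i, j with abs(a[i]-a[j]) <= (max(A)-min(A))/n
    n = len(a) - 1
    lo, hi = min(a), max(a)
    if lo == hi:
        return (0, 1)  # all elements equal, so any pair works
    width = (hi - lo) / n
    buckets = [-1] * n  # one slot per interval, initialized to -1
    for i, x in enumerate(a):
        b = min(int((x - lo) / width), n - 1)  # clamp max(A) into last bucket
        if buckets[b] == -1:
            buckets[b] = i
        else:
            return (buckets[b], i)  # the pigeonhole principle guarantees this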
Let's re-word the problem: we need to find two elements such that abs(x-y) <= c, where c is a constant we can compute in O(n) time. (Indeed, we can compute both max(A) and min(A) in linear time and just set c = (max-min)/n.)
Let's imagine we have a set of buckets, such that elements 0 <= x < c are placed in the first bucket, elements c <= x < 2c in the second, and so on. For each element, we can determine its bucket in O(1) time. Note that the number of occupied buckets will be no more than the number of elements in the array.
Let's iterate over the array and place each element into its bucket. If the bucket we're about to place it in already holds another element, we've just found the desired pair x and y!
If we've iterated over the whole array and every element has fallen into its own bucket, no worries! Iterate over the buckets now (there are no more than n+1 occupied buckets, as we said above); if an element x in one bucket and an element y in the next bucket satisfy abs(x-y) <= c, we've found the solution.
If we've iterated over all the buckets and found no such elements, then there is no solution. OMG, I really missed that pigeonhole stuff (see the other answer).
Buckets may be implemented as a hash map, where each bucket holds one array index (placing an element in a bucket looks like buckets[a[i] / c] = i). We compute c in O(n) time, assign items to buckets in O(n)*O(1) time (O(1) being hash-map access), and traverse the buckets in O(n) time. Therefore, the whole algorithm is linear.
