Meaning of L+k-1 index in Quickselect algorithm - algorithm

I am studying quickselect for a midterm in my algorithms analysis course and the algorithm I have been working with is the following:
Quickselect(A[L...R],k)
// Input: Array indexed from 0 to n-1 and an index of the kth smallest element
// Output: Value of the kth position
s = LomutoPartition(A[L...R]) // works by taking the first index and value as the
// pivot and returns it's index in the sorted position
if(s == k-1) // we have our k-th element, it's k-1 because arrays are 0-indexed
return A[s]
else if(s> L+k-1) // this is my question below
Quickselect(L...s-1,k) // basically the element we want is somewhere to the left
// of our pivot so we search that side
else
Quickselect(s+1...R, k-1-s)
/* the element we want is greater than our pivot so we search the right-side
* however if we do we must scale the k-th position accordingly by removing
* 1 and s so that the new value will not push the sub array out of bounds
*/
My question is why in the first if do we need L + k - 1? Doing a few examples on paper I have come to the conclusion that no matter the context L is always an index and that index is always 0. Which does nothing for the algorithm right?

There seems to be a discrepancy between the line
if(s == k-1)
and the line
else if(s> L+k-1)
The interpretations are incompatible.
As Trincot correctly notes, from the second recursive call on, it's possible that L is not 0. Your Lomuto subroutine doesn't take an array, a low index, and a high index (as the one in Wikipedia does, for example). Instead it just takes an array (which happens to be a subarray between low and hight of some other array). The index s it returns is thus relative to the subarray, and to translate it to the position within the original array, you need to add L. This is consistent with your first line, except that the line following it should read
return A[L + s]
Your second line should therefore also compare to k - 1, not L + k - 1.
Edit
Following the comment, here is the pseudo-code from Wikipedia:
// Returns the n-th smallest element of list within left..right inclusive
// (i.e. left <= n <= right).
// The search space within the array is changing for each round - but the list
// is still the same size. Thus, n does not need to be updated with each round.
function select(list, left, right, n)
if left = right // If the list contains only one element,
return list[left] // return that element
pivotIndex := ... // select a pivotIndex between left and right,
// e.g., left + floor(rand() % (right - left + 1))
pivotIndex := partition(list, left, right, pivotIndex)
// The pivot is in its final sorted position
if n = pivotIndex
return list[n]
else if n < pivotIndex
return select(list, left, pivotIndex - 1, n)
else
return select(list, pivotIndex + 1, right, n)
Note the conditions
if n = pivotIndex
and
else if n < pivotIndex
which are consistent in their interpretation of the indexing returned in partitioning.
Once again, it's possible to define the partitioning sub-routine either as returning the index relative to the start of the sub-array, or as returning the index relative to the original array, but there must be consistency in this.

Related

Is this wikipedia code for quickselect incorrect?

I was reading the quickselect algorithm on wikipedia: https://en.wikipedia.org/wiki/Quickselect
function select(list, left, right, k)
if left = right // If the list contains only one element,
return list[left] // return that element
pivotIndex := ... // select a pivotIndex between left and right,
// e.g., left + floor(rand() % (right - left + 1))
pivotIndex := partition(list, left, right, pivotIndex)
// The pivot is in its final sorted position
if k = pivotIndex
return list[k]
else if k < pivotIndex
return select(list, left, pivotIndex - 1, k)
else
return select(list, pivotIndex + 1, right, k - pivotIndex)
Isn't the last recursive call incorrect? I believe it the last argument should just be k rather than k - pivotIndex. Am I missing something here?
You are right - the last correction from September 20 introduced this error.
Top comment says that
// Returns the k-th smallest element of list within left..right inclusive
// (i.e. left <= k <= right).
and k is defined over all index range, it is absolute, not relative to local low border, as you noticed in comment.
Aslo checked my implementation of kselect, it uses k in the second call.

Longest Length sub array with elements in a given range

If I have a list of integers, in an array, how do I find the length of the longest sub array, such that the difference between the minimum and maximum element of that array is less than a given integer, say M.
So if we had an array with 3 elements,
[1, 2, 4]
And if M were equal to 2
Then the longest subarry would be [1, 2]
Because if we included 4, and we started from the beginning, the difference would be 3, which is greater than M ( = 2), and if we started from 2, the difference between the largest (4) and smallest element (2) would be 2 and that is not less than 2 (M)
The best I can think of is to start from the left, then go as far right as possible without the sub array range getting too high. Of course at each step we have to keep track of the minimum and maximum element so far. This has an n squared time complexity though, can't we get it faster?
I have an improvement to David Winder's algorithm. The idea is that instead of using two heaps to find the minimum and maximum elements, we can use what I call the deque DP optimization trick (there's probably a proper name for this somewhere).
To understand this, we can look at a simpler problem: finding the minimum element in all subarrays of some size k in an array. The idea is that we keep a double-ended queue containing potential candidates for the minimum element. When we encounter a new element, we pop off all the elements at the back end of the queue more than or equal to the current element before pushing the current element into the back.
We can do this because we know that any subarray we encounter in the future which includes an element that we pop off will also include the current element, and since the current element is less than those elements that gets popped off, those elements will never be the minimum.
After pushing the current element, we pop off the front element in the queue if it is more than k elements away. The minimum element in the current subarray is simply the first element in the queue because the way we popped off the elements from the back of the queue kept it increasing.
To use this algorithm in your problem, we would have two deques to store the minimum and maximum elements. When we encounter a new element which is too much larger than the minimum element, we pop off the front of the deque until the element is no longer too large. The beginning of the longest array ending at that position is then the index of the last element we popped off plus 1.
This makes the solution O(n).
C++ implementation:
int best = std::numeric_limits<int>::lowest(), beg = 0;
//best = length of the longest subarray that meets the requirements so far
//beg = the beginning of the longest subarray ending at the current index
std::deque<int> least, greatest;
//these two deques store the indices of the elements which could cause trouble
for (int i = 0; i < n; i++)
{
while (!least.empty() && a[least.back()] >= a[i])
{
least.pop_back();
//we can pop this off since any we encounter subarray which includes this
//in the future will also include the current element
}
least.push_back(i);
while (!greatest.empty() && a[greatest.back()] <= a[i])
{
greatest.pop_back();
//we can pop this off since any we encounter subarray which includes this
//in the future will also include the current element
}
greatest.push_back(i);
while (a[least.front()] < a[i] - m)
{
beg = least.front() + 1;
least.pop_front();
//remove elements from the beginning if they are too small
}
while (a[greatest.front()] > a[i] + m)
{
beg = greatest.front() + 1;
greatest.pop_front();
//remove elements from the beginning if they are too large
}
best = std::max(best, i - beg + 1);
}
Consider the following idea:
Let create MaxLen array (size of n) which define as: MaxLen[i] = length of the max sub-array till the i-th place.
After we will fill this array it will be easy (O(n)) to find your max sub-array.
How do we fill the MaxLen array? Assume you know MaxLen[i], What will be in MaxLen[i+1]?
We have 2 option - if the number in originalArr[i+1] do not break your constrains of exceed diff of m in the longest sub-array ending at index i then MaxLen[i+1] = MaxLen[i] + 1 (because we just able to make our previous sub array little bit longer. In the other hand, if originalArr[i+1] bigger or smaller with diff m with one of the last sub array we need to find the element that has diff of m and (let call its index is k) and insert into MaxLen[i+1] = i - k + 1 because our new max sub array will have to exclude the originalArr[k] element.
How do we find this "bad" element? we will use Heap. After every element we pass we insert it value and index to both min and max heap (done in log(n)). When you have the i-th element and you want to check if there is someone in the previous last array who break your sequence you can start extract element from the heap until no element is bigger or smaller the originalArr[i] -> take the max index of the extract element and that your k - the index of the element who broke your sequence.
I will try to simplify with pseudo code (I only demonstrate for min-heap but it the same as the max heap)
Array is input array of size n
min-heap = new heap()
maxLen = array(n) // of size n
maxLen[0] = 1; //max subArray for original Array with size 1
min-heap.push(Array[0], 0)
for (i in (1,n)) {
if (Array[i] - min-heap.top < m) // then all good
maxLen[i] = maxLen[i-1] + 1
else {
maxIndex = min-heap.top.index;
while (Array[i] - min-heap.top.value > m)
maxIndex = max (maxIndex , min-heap.pop.index)
if (empty(min-heap))
maxIndex = i // all element are "bad" so need to start new sub-array
break
//max index is our k ->
maxLen[i] = i - k + 1
}
min-heap.push(Array[i], i)
When you done, run on your max length array and choose the max value (from his index you can extract the begin an end indexes of the original array).
So we had loop over the array (n) and in each insert to 2 heaps (log n).
You would probably saying: Hi! But you also had un-know times of heap extract which force heapify (log n)! But notice that this heap can have max of n element and element can be extract twice so calculate accumolate complecsity and you will see its still o(1).
So bottom line: O(n*logn).
Edited:
This solution can be simplify by using AVL tree instead of 2 heaps - finding min and max are both O(logn) in AVL tree - same goes for insert, find and delete - so just use tree with element of the value and there index in the original array.
Edited 2:
#Fei Xiang even came up with better solution of O(n) using deques.

Quicksort efficiency: does direction of scan matter?

Here is my implementation of an in-place quicksort algorithm, an adaptation from this video:
def partition(arr, start, size):
if (size < 2):
return
index = int(math.floor(random.random()*size))
L = start
U = start+size-1
pivot = arr[start+index]
while (L < U):
while arr[L] < pivot:
L = L + 1
while arr[U] > pivot:
U = U - 1
temp = arr[L]
arr[L] = arr[U]
arr[U] = temp
partition(arr, start, L-start)
partition(arr, L+1, size-(L-start)-1)
There seems to be a few implementations of the scanning step where the array (or current portion of the array) is divided into 3 segments: elements lower than the pivot, the pivot, and elements greater than the pivot. I am scanning from the left for elements greater than or equal to the pivot, and from the right for elements less than or equal to the pivot. Once one of each is found, the swap is made, and the loop continues until the left marker is equal to or greater than the right marker. However, there is another method following this diagram that results in less partition steps in many cases. Can someone verify which method is actually more efficient for the quicksort algorithm?
Both the methods you used are basically the same . In the above code
index = int(math.floor(random.random()*size))
Index is chosen randomly, so it can be first element or the last element. In the link https://s3.amazonaws.com/hr-challenge-images/quick-sort/QuickSortInPlace.png they Initailly take the last element as pivot and Move in same way as you do in the code.
So both methods are same. In your code you randomly select the pivot, In the Image - you state the Pivot.

Selection algorithm to find the median, elements to the left and to the right

Suppose I had an unsorted array A of size n.
How to find the n/2, n/2−1, n/2+1th smallest element from the original unsorted list in linear time?
I tried to use the selection algorithm in wikipedia (Partition-based general selection algorithm is what I am implementing).
function partition(list, left, right, pivotIndex)
pivotValue := list[pivotIndex]
swap list[pivotIndex] and list[right] // Move pivot to end
storeIndex := left
for i from left to right-1
if list[i] < pivotValue
swap list[storeIndex] and list[i]
increment storeIndex
swap list[right] and list[storeIndex] // Move pivot to its final place
return storeIndex
function select(list, left, right, k)
if left = right // If the list contains only one element
return list[left] // Return that element
select pivotIndex between left and right //What value of pivotIndex shud i select??????????
pivotNewIndex := partition(list, left, right, pivotIndex)
pivotDist := pivotNewIndex - left + 1
// The pivot is in its final sorted position,
// so pivotDist reflects its 1-based position if list were sorted
if pivotDist = k
return list[pivotNewIndex]
else if k < pivotDist
return select(list, left, pivotNewIndex - 1, k)
else
return select(list, pivotNewIndex + 1, right, k - pivotDist)
But I have not understood 3 or 4 steps. I have following doubts:
Did I pick the correct algorithm and will it really work in linear time for my program. I am bit confused as it resembles like quick sort.
While Calling the Function Select from the main function, what will be the value of left, right and k. Consider my array is list [1...N].
Do I have to call select function three times, one time for finding n/2th smallest, another time for finding n/2+1 th smallest and one more time for n/2-1th smallest, or can it be done on a single call, if yes, how?
Also in function select (third step) "select pivotIndex between left and right", what value of pivotIndex should I select for my program/purpose.
Thanks!
It is like quicksort, but it's linear because in quicksort, you then need to handle both the left and right side of the pivot, while in quickselect, you only handle one side.
The initial call should be Select(A, 0, N, (N-1)/2) if N is odd; you'll need to decide exactly what you want to do if N is even.
To find the median and left/right, you probably want to call it to find the median, and then just do the max of the components of the array to the left and the min of the components to the right, because you know once the median selection phase is done that all elements to the left of the median will be less than it and to the right will be greater (or equal). This is O(n) + n/2 + n/2 = O(n) total time.
There are lots of ways to choose pivot indices. For casual purposes, either the middle element or a random index will probably suffice.

Get a random element in single direction linked list by one time traverse

I have a single direction linked list without knowing its size.
I want to get a random element in this list, and I just have one time chance to traverse the list. (I am not allowed to traverse twice or more)
What’s the algorithm for this problem? Thanks!
This is just reservoir sampling with a reservoir of size 1.
Essentially it is really simple
Pick the first element regardless (for a list of length 1, the first element is always the sample).
For every other element with probability 1/n where n is the number of elements observed so far, you replace the already picked element with the current element you are on.
This is uniformly sampled, since the probability of picking any element at the end of the day is 1/n (exercise to the reader).
This is probably an interview question.Reservoir sampling is used by data scientist to store relevant data in limited storage from large stream of data.
If you have to collect k elements from any array with elements n, such that you probability of each element collected should be same (k/n), you follow two steps,
1) Store first k elements in the storage.
2) When the next element(k+1) comes from the stream obviously you have no space in your collection anymore.Generate a random number from o to n, if the generated random number is less than k suppose l, replace storage[l] with the (k+1) element from stream.
Now, coming back to your question, here storage size is 1.So you will pick the first node,iterate over the list for second element.Now generate the random number ,if its 1, leave the sample alone otherwise switch the storage element from list
This question can be done using reservoir sampling. It is based on choosing k random items out of n items, but here n can be very large(which doesn't has to fit in memory!) and (as in your case) unknown initially.
The wikipedia has an understandable algorithm which i quote below:
array R[k]; // result
integer i, j;
// fill the reservoir array
for each i in 1 to k do
R[i] := S[i]
done;
// replace elements with gradually decreasing probability
for each i in k+1 to length(S) do
j := random(1, i); // important: inclusive range
if j <= k then
R[j] := S[i]
fi
done
The question requires only 1 value so we take k=1.
C implementation :
https://ideone.com/txnsas
This is the easiest way that I have found, it works fine and is understandable:
public int findrandom(Node start) {
Node curr = start;
int count = 1, result = 0, probability;
Random rand = new Random();
while (curr != null) {
probability = rand.nextInt(count) + 1;
if (count == probability)
result = curr.data;
count++;
curr = curr.next;
}
return result;
}

Resources