Quicksort efficiency: does direction of scan matter? - algorithm

Here is my implementation of an in-place quicksort algorithm, an adaptation from this video:
def partition(arr, start, size):
if (size < 2):
return
index = int(math.floor(random.random()*size))
L = start
U = start+size-1
pivot = arr[start+index]
while (L < U):
while arr[L] < pivot:
L = L + 1
while arr[U] > pivot:
U = U - 1
temp = arr[L]
arr[L] = arr[U]
arr[U] = temp
partition(arr, start, L-start)
partition(arr, L+1, size-(L-start)-1)
There seems to be a few implementations of the scanning step where the array (or current portion of the array) is divided into 3 segments: elements lower than the pivot, the pivot, and elements greater than the pivot. I am scanning from the left for elements greater than or equal to the pivot, and from the right for elements less than or equal to the pivot. Once one of each is found, the swap is made, and the loop continues until the left marker is equal to or greater than the right marker. However, there is another method following this diagram that results in less partition steps in many cases. Can someone verify which method is actually more efficient for the quicksort algorithm?

Both the methods you used are basically the same . In the above code
index = int(math.floor(random.random()*size))
Index is chosen randomly, so it can be first element or the last element. In the link https://s3.amazonaws.com/hr-challenge-images/quick-sort/QuickSortInPlace.png they Initailly take the last element as pivot and Move in same way as you do in the code.
So both methods are same. In your code you randomly select the pivot, In the Image - you state the Pivot.

Related

Find minimum number of steps to collect target number of coins

Given a list of n houses, each house has a certain number of coins in it. And a target value t. We have to find the minimum number of steps required to reach the target.
The person can choose to start at any house and then go right or left and collect coins in that direction until it reaches the target value. But the person cannot
change the direction.
Example: 5 1 2 3 4 These are supposed the coin values in 5 houses and the target is 13 then the minimum number of steps required is 5 because we have to select all the coins.
My Thoughts:
One way will be for each index i calculate the steps required in left or right direction to reach the target and then take the minimum of all these 2*n values.
Could there be a better way ?
First, let's simplify and canonize the problem.
Observation 1: The "choose direction" capability is redundant, if you choose to go from house j to house i, you can also go from i to j to have the same value, so it is sufficient to look at one direction only.
Observation 2: Now that we can look at the problem as going from left to right (observation 1), it is clear that we are looking for a subarray whose value exceeds k.
This means that we can canonize the problem:
Given an array with non negative values a, find minimal subarray
with values summing k or more.
There are various ways to solve this, one simple solution using a sorted map (balanced tree for example) is to go from left to right, summing values, and looking for the last element seen whose value was sum - k.
Pseudo code:
solve(array, k):
min_houses = inf
sum = 0
map = new TreeMap()
map.insert(0, -1) // this solves issue where first element is sufficient on its own.
for i from 0 to array.len():
sum = sum + array[i]
candidate = map.FindClosestLowerOrEqual(sum - k)
if candidate == null: // no matching sum, yet
continue
min_houses = min(min_houses, i - candidate)
map.insert(sum, i)
return min_houses
This solution runs in O(nlogn), as each map insertion takes O(logn), and there are n+1 of those.
An optimization, running in O(n), can be done if we take advantage of "non negative" trait of the array. This means, as we go on in the array - the candidate chosen (in the map seek) is always increasing.
We can utilize it to have two pointers running concurrently, and finding best matches, instead of searching from scratch in the index as we did before.
solve(array, k):
left = 0
sum = 0
min_houses = infinity
for right from 0 to len(array):
sum = sum + array[right]
while (left < right && sum >= k):
min_houses = min(min_houses, right - left)
sum = sum - array[left]
left = left + 1
return min_houses
This runs in O(n), as each index is increased at most n times, and every operation is O(1).

Maximum sum in array with special conditions

Assume we have an array with n Elements ( n%3 = 0).In each step, a number is taken from the array. Either you take the leftmost or the rightmost one. If you choose the left one, this element is added to the sum and the two right numbers are removed and vice versa.
Example: A = [100,4,2,150,1,1], sum = 0.
take the leftmost element. A = [4,2,150] sum = 0+100 =100
2.take the rightmost element. A = [] sum = 100+150 = 250
So the result for A should be 250 and the sequence would be Left, Right.
How can I calculate the maximum sum I can get in an array? And how can I determine the sequence in which I have to extract the elements?
I guess this problem can best be solved with dynamic programming and the concrete sequence can then be determined by backtracking.
The underlying problem can be solved via dynamic programming as follows. The state space can be defined by letting
M(i,j) := maximum value attainable by chosing from the subarray of
A starting at index i and ending at index j
for any i, j in {1, N} where `N` is the number of elements
in the input.
where the recurrence relation is as follows.
M(i,j) = max { M(i+1, j-2) + A[i], M(i+2, j-1) + A[j] }
Here, the first value corresponds to the choice of adding the beginning of the array while the second value connesponds to the choice of subtracting the end of the array. The base cases are the states of value 0 where i=j.

Longest Length sub array with elements in a given range

If I have a list of integers, in an array, how do I find the length of the longest sub array, such that the difference between the minimum and maximum element of that array is less than a given integer, say M.
So if we had an array with 3 elements,
[1, 2, 4]
And if M were equal to 2
Then the longest subarry would be [1, 2]
Because if we included 4, and we started from the beginning, the difference would be 3, which is greater than M ( = 2), and if we started from 2, the difference between the largest (4) and smallest element (2) would be 2 and that is not less than 2 (M)
The best I can think of is to start from the left, then go as far right as possible without the sub array range getting too high. Of course at each step we have to keep track of the minimum and maximum element so far. This has an n squared time complexity though, can't we get it faster?
I have an improvement to David Winder's algorithm. The idea is that instead of using two heaps to find the minimum and maximum elements, we can use what I call the deque DP optimization trick (there's probably a proper name for this somewhere).
To understand this, we can look at a simpler problem: finding the minimum element in all subarrays of some size k in an array. The idea is that we keep a double-ended queue containing potential candidates for the minimum element. When we encounter a new element, we pop off all the elements at the back end of the queue more than or equal to the current element before pushing the current element into the back.
We can do this because we know that any subarray we encounter in the future which includes an element that we pop off will also include the current element, and since the current element is less than those elements that gets popped off, those elements will never be the minimum.
After pushing the current element, we pop off the front element in the queue if it is more than k elements away. The minimum element in the current subarray is simply the first element in the queue because the way we popped off the elements from the back of the queue kept it increasing.
To use this algorithm in your problem, we would have two deques to store the minimum and maximum elements. When we encounter a new element which is too much larger than the minimum element, we pop off the front of the deque until the element is no longer too large. The beginning of the longest array ending at that position is then the index of the last element we popped off plus 1.
This makes the solution O(n).
C++ implementation:
int best = std::numeric_limits<int>::lowest(), beg = 0;
//best = length of the longest subarray that meets the requirements so far
//beg = the beginning of the longest subarray ending at the current index
std::deque<int> least, greatest;
//these two deques store the indices of the elements which could cause trouble
for (int i = 0; i < n; i++)
{
while (!least.empty() && a[least.back()] >= a[i])
{
least.pop_back();
//we can pop this off since any we encounter subarray which includes this
//in the future will also include the current element
}
least.push_back(i);
while (!greatest.empty() && a[greatest.back()] <= a[i])
{
greatest.pop_back();
//we can pop this off since any we encounter subarray which includes this
//in the future will also include the current element
}
greatest.push_back(i);
while (a[least.front()] < a[i] - m)
{
beg = least.front() + 1;
least.pop_front();
//remove elements from the beginning if they are too small
}
while (a[greatest.front()] > a[i] + m)
{
beg = greatest.front() + 1;
greatest.pop_front();
//remove elements from the beginning if they are too large
}
best = std::max(best, i - beg + 1);
}
Consider the following idea:
Let create MaxLen array (size of n) which define as: MaxLen[i] = length of the max sub-array till the i-th place.
After we will fill this array it will be easy (O(n)) to find your max sub-array.
How do we fill the MaxLen array? Assume you know MaxLen[i], What will be in MaxLen[i+1]?
We have 2 option - if the number in originalArr[i+1] do not break your constrains of exceed diff of m in the longest sub-array ending at index i then MaxLen[i+1] = MaxLen[i] + 1 (because we just able to make our previous sub array little bit longer. In the other hand, if originalArr[i+1] bigger or smaller with diff m with one of the last sub array we need to find the element that has diff of m and (let call its index is k) and insert into MaxLen[i+1] = i - k + 1 because our new max sub array will have to exclude the originalArr[k] element.
How do we find this "bad" element? we will use Heap. After every element we pass we insert it value and index to both min and max heap (done in log(n)). When you have the i-th element and you want to check if there is someone in the previous last array who break your sequence you can start extract element from the heap until no element is bigger or smaller the originalArr[i] -> take the max index of the extract element and that your k - the index of the element who broke your sequence.
I will try to simplify with pseudo code (I only demonstrate for min-heap but it the same as the max heap)
Array is input array of size n
min-heap = new heap()
maxLen = array(n) // of size n
maxLen[0] = 1; //max subArray for original Array with size 1
min-heap.push(Array[0], 0)
for (i in (1,n)) {
if (Array[i] - min-heap.top < m) // then all good
maxLen[i] = maxLen[i-1] + 1
else {
maxIndex = min-heap.top.index;
while (Array[i] - min-heap.top.value > m)
maxIndex = max (maxIndex , min-heap.pop.index)
if (empty(min-heap))
maxIndex = i // all element are "bad" so need to start new sub-array
break
//max index is our k ->
maxLen[i] = i - k + 1
}
min-heap.push(Array[i], i)
When you done, run on your max length array and choose the max value (from his index you can extract the begin an end indexes of the original array).
So we had loop over the array (n) and in each insert to 2 heaps (log n).
You would probably saying: Hi! But you also had un-know times of heap extract which force heapify (log n)! But notice that this heap can have max of n element and element can be extract twice so calculate accumolate complecsity and you will see its still o(1).
So bottom line: O(n*logn).
Edited:
This solution can be simplify by using AVL tree instead of 2 heaps - finding min and max are both O(logn) in AVL tree - same goes for insert, find and delete - so just use tree with element of the value and there index in the original array.
Edited 2:
#Fei Xiang even came up with better solution of O(n) using deques.

Meaning of L+k-1 index in Quickselect algorithm

I am studying quickselect for a midterm in my algorithms analysis course and the algorithm I have been working with is the following:
Quickselect(A[L...R],k)
// Input: Array indexed from 0 to n-1 and an index of the kth smallest element
// Output: Value of the kth position
s = LomutoPartition(A[L...R]) // works by taking the first index and value as the
// pivot and returns it's index in the sorted position
if(s == k-1) // we have our k-th element, it's k-1 because arrays are 0-indexed
return A[s]
else if(s> L+k-1) // this is my question below
Quickselect(L...s-1,k) // basically the element we want is somewhere to the left
// of our pivot so we search that side
else
Quickselect(s+1...R, k-1-s)
/* the element we want is greater than our pivot so we search the right-side
* however if we do we must scale the k-th position accordingly by removing
* 1 and s so that the new value will not push the sub array out of bounds
*/
My question is why in the first if do we need L + k - 1? Doing a few examples on paper I have come to the conclusion that no matter the context L is always an index and that index is always 0. Which does nothing for the algorithm right?
There seems to be a discrepancy between the line
if(s == k-1)
and the line
else if(s> L+k-1)
The interpretations are incompatible.
As Trincot correctly notes, from the second recursive call on, it's possible that L is not 0. Your Lomuto subroutine doesn't take an array, a low index, and a high index (as the one in Wikipedia does, for example). Instead it just takes an array (which happens to be a subarray between low and hight of some other array). The index s it returns is thus relative to the subarray, and to translate it to the position within the original array, you need to add L. This is consistent with your first line, except that the line following it should read
return A[L + s]
Your second line should therefore also compare to k - 1, not L + k - 1.
Edit
Following the comment, here is the pseudo-code from Wikipedia:
// Returns the n-th smallest element of list within left..right inclusive
// (i.e. left <= n <= right).
// The search space within the array is changing for each round - but the list
// is still the same size. Thus, n does not need to be updated with each round.
function select(list, left, right, n)
if left = right // If the list contains only one element,
return list[left] // return that element
pivotIndex := ... // select a pivotIndex between left and right,
// e.g., left + floor(rand() % (right - left + 1))
pivotIndex := partition(list, left, right, pivotIndex)
// The pivot is in its final sorted position
if n = pivotIndex
return list[n]
else if n < pivotIndex
return select(list, left, pivotIndex - 1, n)
else
return select(list, pivotIndex + 1, right, n)
Note the conditions
if n = pivotIndex
and
else if n < pivotIndex
which are consistent in their interpretation of the indexing returned in partitioning.
Once again, it's possible to define the partitioning sub-routine either as returning the index relative to the start of the sub-array, or as returning the index relative to the original array, but there must be consistency in this.

Maximum of all possible subarrays of an array

How do I find/store maximum/minimum of all possible non-empty sub-arrays of an array of length n?
I generated the segment tree of the array and the for each possible sub array if did query into segment tree but that's not efficient. How do I do it in O(n)?
P.S n <= 10 ^7
For eg. arr[]= { 1, 2, 3 }; // the array need not to be sorted
sub-array min max
{1} 1 1
{2} 2 2
{3} 3 3
{1,2} 1 2
{2,3} 2 3
{1,2,3} 1 3
I don't think it is possible to store all those values in O(n). But it is pretty easy to create, in O(n), a structure that makes possible to answer, in O(1) the query "how many subsets are there where A[i] is the maximum element".
Naïve version:
Think about the naïve strategy: to know how many such subsets are there for some A[i], you could employ a simple O(n) algorithm that counts how many elements to the left and to the right of the array that are less than A[i]. Let's say:
A = [... 10 1 1 1 5 1 1 10 ...]
This 5 up has 3 elements to the left and 2 to the right lesser than it. From this we know there are 4*3=12 subarrays for which that very 5 is the maximum. 4*3 because there are 0..3 subarrays to the left and 0..2 to the right.
Optimized version:
This naïve version of the check would take O(n) operations for each element, so O(n^2) after all. Wouldn't it be nice if we could compute all these lengths in O(n) in a single pass?
Luckily there is a simple algorithm for that. Just use a stack. Traverse the array normally (from left to right). Put every element index in the stack. But before putting it, remove all the indexes whose value are lesser than the current value. The remaining index before the current one is the nearest larger element.
To find the same values at the right, just traverse the array backwards.
Here's a sample Python proof-of-concept that shows this algorithm in action. I implemented also the naïve version so we can cross-check the result from the optimized version:
from random import choice
from collections import defaultdict, deque
def make_bounds(A, fallback, arange, op):
stack = deque()
bound = [fallback] * len(A)
for i in arange:
while stack and op(A[stack[-1]], A[i]):
stack.pop()
if stack:
bound[i] = stack[-1]
stack.append(i)
return bound
def optimized_version(A):
T = zip(make_bounds(A, -1, xrange(len(A)), lambda x, y: x<=y),
make_bounds(A, len(A), reversed(xrange(len(A))), lambda x, y: x<y))
answer = defaultdict(lambda: 0)
for i, x in enumerate(A):
left, right = T[i]
answer[x] += (i-left) * (right-i)
return dict(answer)
def naive_version(A):
answer = defaultdict(lambda: 0)
for i, x in enumerate(A):
left = next((j for j in range(i-1, -1, -1) if A[j]>A[i]), -1)
right = next((j for j in range(i+1, len(A)) if A[j]>=A[i]), len(A))
answer[x] += (i-left) * (right-i)
return dict(answer)
A = [choice(xrange(32)) for i in xrange(8)]
MA1 = naive_version(A)
MA2 = optimized_version(A)
print 'Array: ', A
print 'Naive: ', MA1
print 'Optimized:', MA2
print 'OK: ', MA1 == MA2
I don't think it is possible to it directly in O(n) time: you need to iterate over all the elements of the subarrays, and you have n of them. Unless the subarrays are sorted.
You could, on the other hand, when initialising the subarrays, instead of making them normal arrays, you could build heaps, specifically min heaps when you want to find the minimum and max heaps when you want to find the maximum.
Building a heap is a linear time operation, and retrieving the maximum and minimum respectively for a max heap and min heap is a constant time operation, since those elements are found at the first place of the heap.
Heaps can be easily implemented just using a normal array.
Check this article on Wikipedia about binary heaps: https://en.wikipedia.org/wiki/Binary_heap.
I do not understand what exactly you mean by maximum of sub-arrays, so I will assume you are asking for one of the following
The subarray of maximum/minimum length or some other criteria (in which case the problem will reduce to finding max element in a 1 dimensional array)
The maximum elements of all your sub-arrays either in the context of one sub-array or in the context of the entire super-array
Problem 1 can be solved by simply iterating your super-array and storing a reference to the largest element. Or building a heap as nbro had said. Problem 2 also has a similar solution. However a linear scan is through n arrays of length m is not going to be linear. So you will have to keep your class invariants such that the maximum/minimum is known after every operation. Maybe with the help of some data structure like a heap.
Assuming you mean contiguous sub-arrays, create the array of partial sums where Yi = SUM(i=0..i)Xi, so from 1,4,2,3 create 0,1,1+4=5,1+4+2=7,1+4+2+3=10. You can create this from left to right in linear time, and the value of any contiguous subarray is one partial sum subtracted from another, so 4+2+3 = 1+4+2+3 - 1= 9.
Then scan through the partial sums from left to right, keeping track of the smallest value seen so far (including the initial zero). At each point subtract this from the current value and keep track of the highest value produced in this way. This should give you the value of the contiguous sub-array with largest sum, and you can keep index information, too, to find where this sub-array starts and ends.
To find the minimum, either change the above slightly or just reverse the sign of all the numbers and do exactly the same thing again: min(a, b) = -max(-a, -b)
I think the question you are asking is to find the Maximum of a subarry.
bleow is the code that cand do that in O(n) time.
int maxSumSubArr(vector<int> a)
{
int maxsum = *max_element(a.begin(), a.end());
if(maxsum < 0) return maxsum;
int sum = 0;
for(int i = 0; i< a.size; i++)
{
sum += a[i];
if(sum > maxsum)maxsum = sum;
if(sum < 0) sum = 0;
}
return maxsum;
}
Note: This code is not tested please add comments if found some issues.

Resources