Quicksort with 3-way partition - algorithm

What is QuickSort with a 3-way partition?

Picture an array:
3, 5, 2, 7, 6, 4, 2, 8, 8, 9, 0
A two partition Quick Sort would pick a value, say 4, and put every element greater than 4 on one side of the array and every element less than 4 on the other side. Like so:
3, 2, 0, 2, 4, | 8, 7, 8, 9, 6, 5
A three partition Quick Sort would pick two values to partition on and split the array up that way. Let's choose 4 and 7:
3, 2, 0, 2, | 4, 6, 5, 7, | 8, 8, 9
It is just a slight variation on the regular quick sort.
You continue partitioning each partition until the array is sorted.
The runtime is technically n log3(n), which differs from regular quicksort's n log2(n) only by a constant factor; both are O(n log n).
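The two-pivot idea described above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class and method names are made up, and the pivots are simply the first and last elements of each range rather than hand-picked values such as 4 and 7.

```java
import java.util.Arrays;

class DualPivotSketch {
    // Sorts a[lo..hi] by splitting it into three parts around two pivots.
    static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        if (a[lo] > a[hi]) swap(a, lo, hi);   // ensure left pivot <= right pivot
        int p = a[lo], q = a[hi];
        int lt = lo + 1;   // a[lo+1 .. lt-1] holds values < p
        int gt = hi - 1;   // a[gt+1 .. hi-1] holds values > q
        int i = lo + 1;    // a[lt .. i-1] holds values between p and q
        while (i <= gt) {
            if (a[i] < p) swap(a, i++, lt++);
            else if (a[i] > q) swap(a, i, gt--);   // do not advance i: the swapped-in value is unchecked
            else i++;
        }
        swap(a, lo, --lt);   // move both pivots to their final positions
        swap(a, hi, ++gt);
        sort(a, lo, lt - 1);
        sort(a, lt + 1, gt - 1);
        sort(a, gt + 1, hi);
    }

    static void swap(int[] a, int x, int y) {
        int t = a[x];
        a[x] = a[y];
        a[y] = t;
    }

    public static void main(String[] args) {
        int[] a = {3, 5, 2, 7, 6, 4, 2, 8, 8, 9, 0};
        sort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a));
    }
}
```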

http://www.sorting-algorithms.com/static/QuicksortIsOptimal.pdf
See also:
http://www.sorting-algorithms.com/quick-sort-3-way
I thought the interview question version was also interesting. It asks, are there four partition versions of quicksort...

If you really grind out the math using the Akra-Bazzi formula, leaving the number of partitions as a parameter, and then optimize over that parameter, you'll find that e (= 2.718...) partitions gives the fastest performance. In practice, however, our language constructs, CPUs, etc. are all optimized for binary operations, so the standard partitioning into two sets will be fastest.
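The answer does not spell out its cost model. One model that leads to e, assumed here purely for illustration, charges work proportional to k for each element at each level of a k-way partition (c is an arbitrary constant):

```latex
T(n) = k\,T(n/k) + c\,k\,n
\;\Rightarrow\;
T(n) = c\,k\,n\,\log_k n = c\,n\ln n\cdot\frac{k}{\ln k},
\qquad
\frac{d}{dk}\,\frac{k}{\ln k} = \frac{\ln k - 1}{(\ln k)^2} = 0
\;\Rightarrow\; k = e .
```

Under a different per-level cost the optimum moves, so this should be read as a sketch of the argument rather than a proof.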

I think the 3-way partition is by Dijkstra.
Think about an array with elements { 3, 9, 4, 1, 2, 3, 15, 17, 25, 17 }.
Basically you set up 3 partitions: less than, equals to, and greater than a certain pivot. The equal-to partition doesn't need further sorting because all its elements are already equal.
For example, if we pick the first 3 as the pivot, then a 3-way partition using Dijkstra would arrange the original array and return two indices m1 and m2 such that all elements whose index is less than m1 will be lower than 3, all elements whose index is greater than or equal to m1 and less than or equal to m2 will be equal to 3, and all elements whose index is greater than m2 will be bigger than 3.
In this particular case, the resulting array could be { 1, 2, 3, 3, 9, 4, 15, 17, 25, 17 }, and the values m1 and m2 would be m1 = 2 and m2 = 3.
Notice that the resulting array could change depending on the strategy used to partition, but the numbers m1 and m2 would be the same.
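As a small check of the m1 and m2 values above, here is a sketch of a 3-way partition that returns both indices. The class name and the choice to pass the pivot value directly are assumptions made for this example.

```java
class ThreeWayPartitionSketch {
    // Rearranges a so that values < pivot come first, then values == pivot,
    // then values > pivot. Returns {m1, m2}: the first and last index of the
    // "equal to pivot" block.
    static int[] partition(int[] a, int pivot) {
        int lt = 0;             // a[0 .. lt-1] < pivot
        int gt = a.length - 1;  // a[gt+1 .. end] > pivot
        int i = 0;              // a[lt .. i-1] == pivot
        while (i <= gt) {
            if (a[i] < pivot) swap(a, lt++, i++);
            else if (a[i] > pivot) swap(a, i, gt--);  // re-check the swapped-in value next pass
            else i++;
        }
        return new int[] {lt, gt};
    }

    static void swap(int[] a, int x, int y) {
        int t = a[x];
        a[x] = a[y];
        a[y] = t;
    }

    public static void main(String[] args) {
        int[] a = {3, 9, 4, 1, 2, 3, 15, 17, 25, 17};
        int[] m = partition(a, 3);
        System.out.println("m1 = " + m[0] + ", m2 = " + m[1]);
    }
}
```

The order of elements inside the "less" and "greater" blocks depends on the swap strategy, but the two indices do not.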

I think it is related to the Dijkstra way of partitioning, where the partitions hold the elements smaller than, equal to, and larger than the pivot. Only the smaller and larger partitions have to be sorted recursively. You can see an interactive visualization and play with it at the walnut. The colors I used there are red/white/blue because the method of partitioning is usually called "the Dutch flag problem".

3-way quicksort basically partitions the array into 3 parts. The first part is less than the pivot, the second part is equal to the pivot, and the third part is greater than the pivot. It is a linear-time partition algorithm.
This partition is similar to Dutch National Flag problem.

//code to implement Dijkstra 3-way partitioning
package Sorting;
public class QuickSortUsing3WayPartitioning {
private int[]original;
private int length;
private int lt;
private int gt;
public QuickSortUsing3WayPartitioning(int len){
length = len;
//original = new int[length];
original = new int[]{0,7,8,1,8,9,3,8,8,8,0,7,8,1,8,9,3,8,8,8}; // an array initializer needs "new int[]" in an assignment
}
public void swap(int a, int b){ //here indexes are passed
int temp = original[a];
original[a] = original[b];
original[b] = temp;
}
public int random(int start,int end){
return (start + (int)(Math.random()*(end-start+1)));
}
public void partition(int pivot, int start, int end){
swap(pivot,start); // swapping pivot and starting element in that subarray
int pivot_value = original[start];
lt = start;
gt = end;
int i = start;
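while(i <= gt) {
if(original[i] < pivot_value) {
swap(lt, i);
lt++;
i++;
}
else if(original[i] > pivot_value) { // else-if: i may have moved past gt or the array end above
swap(gt, i);
gt--; // i stays put so the swapped-in element is examined next
}
else {
i++;
}
}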
}
public void Sort(int start, int end){
if(start < end) {
int pivot = random(start,end); // choose the index for pivot randomly
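partition(pivot, start, end); // about index the array is partitioned
int left = lt, right = gt; // copy the bounds: the recursive calls overwrite the lt and gt fields
Sort(start, left-1);
Sort(right+1, end);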
}
}
public void Sort(){
Sort(0,length-1);
}
public void disp(){
for(int i=0; i<length;++i){
System.out.print(original[i]+" ");
}
System.out.println();
}
public static void main(String[] args) {
QuickSortUsing3WayPartitioning qs = new QuickSortUsing3WayPartitioning(20);
qs.disp();
qs.Sort();
qs.disp();
}
}

Related

Interview Question - Which number shows up the most times in a list of intervals

I only heard of this question, so I don't know the exact limits. You are given a list of positive integers. Each two consecutive values form a closed interval. Find the number that appears in most intervals. If two values appear the same amount of times, select the smallest one.
Example: [4, 1, 6, 5] results in [1, 4], [1, 6], [5, 6] with 1, 2, 3, 4, 5, 6 each showing up twice. The correct answer would be 1 since it's the smallest.
I unfortunately have no idea how this can be done without going for an O(n^2) approach. The only optimisation I could think of was merging consecutive descending or ascending intervals, but this doesn't really work since [4, 3, 2] would count 3 twice.
Edit: Someone commented (but then deleted) a solution with this link http://www.zrzahid.com/maximum-number-of-overlapping-intervals/. I find this one the most elegant, even though it doesn't take into account the fact that some elements in my input would be both the beginning and end of some intervals.
Sort the intervals by their starting value. Then run a sweep line from the left (the global smallest value) to the right (the global maximum value). At each meeting point (start or end of an interval) count the number of intersections with the sweep line (in O(log(n))). The time complexity of this algorithm is O(n log(n)), where n is the number of intervals.
The major observation is that the result will be one of the numbers in the input (proof left to the reader as simple exercise, yada yada).
My solution will be inspired by #Prune's solution. The important step is mapping the input numbers to their order within all different numbers in the input.
I will work with C++ std. We can first load all the numbers into a set. We can then create map from that, which maps a number to its order within all numbers.
int solve(const vector<int>& input) {
set<int> vals;
for (int n : input) {
vals.insert(n);
}
map<int, int> numberOrder;
int order = 0;
for (int n : vals) { // values in a set are ordered
numberOrder[n] = order++;
}
We then create process array (similar to #Prune's solution).
vector<int> process(numberOrder.size() + 1, 0); // zero-filled, with a past-the-end element
for (size_t i = 1; i < input.size(); ++i) { // each consecutive pair forms one interval
int last = input[i - 1];
int curr = input[i];
process[numberOrder[min(last, curr)]]++;
process[numberOrder[max(last, curr)] + 1]--;
}
int appear = 0;
int maxAppear = 0;
int maxOrder = 0;
for (int i = 0; i < (int)process.size(); ++i) {
appear += process[i];
if (appear > maxAppear) {
maxAppear = appear;
maxOrder = i;
}
}
Last, we need to find our found value in the map.
for (pair<int, int> a : numberOrder) {
if (a.second == maxOrder) {
return a.first;
}
}
}
This solution has O(n * log(n)) time complexity and O(n) space complexity, which is independent of the maximum input value (unlike other solutions).
If the maximum number in the range array is less than the maximum size limit of an array, my solution will work with complexity O(n).
1- I created a new array to process the ranges and use it to find the number that appears in the most intervals. For simplicity let's use your example, where the input is [1, 4], [1, 6], [5, 6]. Let's call the new array process, give it length 6 and initialize it with 0s: process = [0,0,0,0,0,0].
2- Then loop through all the intervals and mark the start with (+1) and the cell immediately after the range end with (-1):
for range [1,4] process = [1,0,0,0,-1,0]
for range [1,6] process = [2,0,0,0,-1,0]
for range [5,6] process = [2,0,0,0,0,0]
3- The process array works as a cumulative array. Initialize a variable, let's call it appear, with process[0], which is equal to 2 in our case. Go through process and keep accumulating. What do you notice? Elements 1,2,3,4,5,6 all have appear = 2, because each of them appears twice in the given ranges.
4- Track the maximum while you loop through the process array and you will find the solution.
public class Test {
public static void main(String[] args) {
int[] arr = new int[] { 4, 1, 6, 5 };
System.out.println(solve(arr));
}
public static int solve(int[] range) {
// Assumes every input value is below 1e8 (the table below takes about 400 MB at this size)
int size = (int) 1e8;
int[] process = new int[size];
// fill process array
for (int i = 0; i < range.length - 1; ++i) {
int start = Math.min(range[i], range[i + 1]);
int end = Math.max(range[i], range[i + 1]);
process[start]++;
if (end + 1 < size)
process[end + 1]--;
}
// Find the number that appears in most intervals (smallest one)
int appear = process[0];
int max = appear;
int solu = 0;
for (int i = 1; i < size; ++i) {
appear += process[i];
if (appear > max){
solu = i;
max = appear;
}
}
return solu;
}
}
Think of these as parentheses: ( to start an interval, ) to end one. Now check the bounds for each pair [a, b], and tally interval start/end markers for each position: the lower number gets an interval start to the left; the larger number gets a close interval to the right. For the given input:
Process [4, 1]
result: [0, 1, 0, 0, 0, -1]
Process [1, 6]
result: [0, 2, 0, 0, 0, -1, 0, -1]
Process [6, 5]
result: [0, 2, 0, 0, 0, 0, 0, -2]
Now, merely make a cumulative sum of this list; the position of the largest value is your desired answer.
result: [0, 2, 0, 0, 0, 0, 0, -2]
cumsum: [0, 2, 2, 2, 2, 2, 2, 0]
Note that the final sum must be 0, and can never be negative. The largest value is 2, which appears first at position 1. Thus, 1 is the lowest integer that appears the maximum (2) quantity.
Now that's one pass on the input, and one pass on the range of numbers. Note that with a simple table of values, you can save storage. The processing table would look something like:
[(1, 2)
(4, -1)
(5, 1)
(6, -2)]
If you have input with intervals both starting and stopping at a number, then you need to handle the starts first. For instance, [4, 3, 2] would look like
[(2, 1)
(3, 1)
(3, -1)
(4, -1)]
NOTE: maintaining a sorted insert list is O(n^2) time on the size of the input; sorting the list afterward is O(n log n). Either is O(n) space.
My first suggestion, indexing on the number itself, is O(n) time, but O(r) space on the range of input values.
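The table-of-values variant, with starts handled before ends at the same number, could be sketched in Java like this. The names IntervalSweep and mostCovered are hypothetical, and a sorted map stands in for the sorted table; returning -1 for fewer than two input values is also an assumption.

```java
import java.util.Map;
import java.util.TreeMap;

class IntervalSweep {
    // Returns the smallest number covered by the most intervals, where each
    // pair of consecutive input values forms one closed interval.
    static int mostCovered(int[] values) {
        // value -> {intervals starting at this value, intervals ending at this value}
        TreeMap<Integer, int[]> events = new TreeMap<>();
        for (int i = 0; i + 1 < values.length; i++) {
            int lo = Math.min(values[i], values[i + 1]);
            int hi = Math.max(values[i], values[i + 1]);
            events.computeIfAbsent(lo, k -> new int[2])[0]++;
            events.computeIfAbsent(hi, k -> new int[2])[1]++;
        }
        int open = 0;
        int best = 0;
        int answer = -1;
        for (Map.Entry<Integer, int[]> e : events.entrySet()) {
            open += e.getValue()[0];   // starts first: a closed interval covers its own endpoints
            if (open > best) {         // strict ">" keeps the smallest value on ties
                best = open;
                answer = e.getKey();
            }
            open -= e.getValue()[1];   // ends only take effect after this value
        }
        return answer;
    }

    public static void main(String[] args) {
        System.out.println(mostCovered(new int[] {4, 1, 6, 5}));
        System.out.println(mostCovered(new int[] {4, 3, 2}));
    }
}
```

Building the sorted map costs O(n log n) time and O(n) space, independent of how large the input values are.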

Maximum sum from a 2D array-DP

Given a 2D array with weights, find the maximum sum of the 2D array with the condition that we can select only one element from each row and the element directly under a selected element cannot be selected (this condition must hold for all selected elements). Also, the sum will contain as many elements as there are rows.
If arr[i][j] is any selected element then I cannot select arr[i+1][j]. Also from each row only one element can be selected. Example if arr[i][1] is selected arr[i] [2 or 3 or..] cannot be selected
Edit- I tried solving it using DP.
Took a 2D array DP where
DP[i][j]= max(arr[i+1][k] for k=1 to n and k!=j)+ arr[i][j]
Then did this to build the DP matrix and finally looped to calculate the maximum.
But I think complexity is very high when I approach like this. Please help!
Input Matrix-
1 2 3 4
5 6 7 8
9 1 4 2
6 3 5 7
Output-
27
class Solution {
private static int maximumSum(int[][] mat){
int rows = mat.length;
int cols = mat[0].length;
int[] ans = new int[cols];
int[] index = new int[cols];
int max_val = 0;
for(int i=0;i<cols;++i){
ans[i] = mat[0][i];
index[i] = i;
max_val = Math.max(max_val,ans[i]); // needed for 1 row input
}
for(int i=1;i<rows;++i){
int[] temp = new int[cols];
for(int j=0;j<cols;++j){
temp[j] = ans[j];
int max_row_index = -1;
for(int k=0;k<cols;++k){
if(k == index[j]) continue;
if(max_row_index == -1 || mat[i][k] > mat[i][max_row_index]){
max_row_index = k;
}
}
temp[j] += mat[i][max_row_index];
index[j] = max_row_index;
max_val = Math.max(max_val,temp[j]);
}
ans = temp;
}
return max_val;
}
public static void main(String[] args) {
int[][] arr = {
{1,2,3,4},
{5,6,7,8},
{9,1,4,2},
{6,3,5,7}
};
System.out.println(maximumSum(arr));
}
}
Output:
27
Algorithm:
Let's adopt a top-down approach here. We go from the first row to the last, maintaining the answers in our ans array.
Let's work through your example.
Case:
{1,2,3,4},
{5,6,7,8},
{9,1,4,2},
{6,3,5,7}
For first row, ans is as is [1,2,3,4].
For second row, we loop through [5,6,7,8] for each 1,2,3,4 skipping underneath columns for each index. For example, for 1, we skip 5 underneath and take max among all columns and add it to 1. Same goes for other elements.
So, now ans array looks like [9, 10, 11, 11].
Now, we work out [9, 10, 11, 11] with the next row [9,1,4,2] and so on. For this, we get [18, 19, 20, 20], and for this with the last row [6,3,5,7], we get [25, 26, 27, 27], where 27 is the highest value and the final answer.
Time Complexity: O(n^3), Space complexity: O(m) where m is the number of columns.
Update #1:
You can reduce the complexity from O(n^3) to O(n^2) by maintaining 2 max indexes for each row. This always works since, even if the index of the first max is the same as the current index j of temp[j], the other max index provides the maximum value. Thanks to #MBo for this suggestion. This I leave as an exercise to the reader.
Update #2:
We also need to maintain the indexes of which element was picked in the last row.
This is necessary to remember since we can judge the path accurately for the current row.
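A minimal sketch of the two-maxima idea from Update #1, written as a DP indexed by the ending column rather than as the per-starting-column paths tracked in the code above. Here dp[j] is the best sum of a valid selection that ends in column j of the current row. The class name is made up, and the sketch assumes a non-empty matrix with at least two columns whenever there is more than one row.

```java
class MaxSumNoVertical {
    static int maximumSum(int[][] mat) {
        int cols = mat[0].length;
        int[] dp = mat[0].clone();
        for (int i = 1; i < mat.length; i++) {
            // Find the columns holding the best and second-best sums so far.
            int best = -1;
            int second = -1;
            for (int j = 0; j < cols; j++) {
                if (best == -1 || dp[j] > dp[best]) {
                    second = best;
                    best = j;
                } else if (second == -1 || dp[j] > dp[second]) {
                    second = j;
                }
            }
            int[] next = new int[cols];
            for (int j = 0; j < cols; j++) {
                // The previous pick may sit in any column except j.
                int prev = (j != best) ? dp[best] : dp[second];
                next[j] = mat[i][j] + prev;
            }
            dp = next;
        }
        int answer = dp[0];
        for (int v : dp) answer = Math.max(answer, v);
        return answer;
    }

    public static void main(String[] args) {
        int[][] arr = {
            {1, 2, 3, 4},
            {5, 6, 7, 8},
            {9, 1, 4, 2},
            {6, 3, 5, 7}
        };
        System.out.println(maximumSum(arr)); // prints 27
    }
}
```

Keeping the best sum per ending column also handles inputs where taking the largest value in one row blocks a much larger value in the next, for example {{0,5,5},{10,9,0},{100,0,0}}, where the best sum is 5 + 9 + 100 = 114.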

Maximum subsets of intervals that does not exceed coverage limit?

Here's one coding question I'm confused about.
Given a 2-D array [[1, 9], [2, 8], [2, 5], [3, 4], [6, 7], [6, 8]], each inner array represents an interval; and if we pile up these intervals, we'll see:
1 2 3 4 5 6 7 8 9
  2 3 4 5 6 7 8
  2 3 4 5
    3 4
          6 7
          6 7 8
Now there's a limit that the coverage should be <= 3 for each position; and obviously we could see for position 3, 4, 6, 7, the coverage is 4.
Then the question is: what is the largest subset of intervals that can be chosen so that every position fits the <=3 limit? It's quite clear that for this case we simply remove the longest interval [1, 9], so the maximal subset size is 6 - 1 = 5.
What algorithm should I apply to such question? I guess it's variant question to interval scheduling?
Thanks
I hope I have understood the question right. This is the solution I was able to get with C#:
//test
int[][] grid = { new int[]{ 1, 9 }, new int[] { 2, 8 }, new int[] { 2, 5 }, new int[] { 3, 4 }, new int[] { 6, 7 }, new int[] { 6, 8 } };
SubsetFinder sf = new SubsetFinder(grid);
int t1 = sf.GetNumberOfIntervals(1);//6
int t2 = sf.GetNumberOfIntervals(2);//5
int t3 = sf.GetNumberOfIntervals(3);//5
int t4 = sf.GetNumberOfIntervals(4);//2
int t5 = sf.GetNumberOfIntervals(5);//0
class SubsetFinder
{
Dictionary<int, List<int>> dic;
int intervalCount;
public SubsetFinder(int[][] grid)
{
init(grid);
}
private void init(int[][] grid)
{
this.dic = new Dictionary<int, List<int>>();
this.intervalCount = grid.Length;
for (int r = 0; r < grid.Length; r++)
{
int[] row = grid[r];
if (row.Length != 2) throw new Exception("not grid");
int start = row[0];
int end = row[1];
if (end < start) throw new Exception("bad interval");
for (int i = start; i <= end; i++)
if (!dic.ContainsKey(i))
dic.Add(i, new List<int>(new int[] { r }));
else
dic[i].Add(r);
}
}
public int GetNumberOfIntervals(int coverageLimit)
{
HashSet<int> hsExclude = new HashSet<int>();
foreach (int key in dic.Keys)
{
List<int> lst = dic[key];
if (lst.Count < coverageLimit)
foreach (int i in lst)
hsExclude.Add(i);
}
return intervalCount - hsExclude.Count;
}
}
I think you can solve this problem using a sweep algorithm. Here's my approach:
The general idea is that instead of finding out the maximum number of intervals you can choose and still fit the limit, we will find the minimum number of intervals that must be deleted in order to make all the numbers fit the limit. Here's how we can do that:
First create a vector of triples, the first part is an integer, the second is a boolean, while the third part is an integer. The first part represents all the numbers from the input (both the start and end of intervals), the second part tells us whether the first part is the start or the end of an interval, while the third part represents the id of the interval.
Sort the created vector based on the first part, in case of a tie, the start should come before the end of some intervals.
In the example you provided the vector will be (showing only the first two parts, with 0 for a start and 1 for an end):
1,0 , 2,0 , 2,0 , 3,0 , 4,1 , 5,1 , 6,0 , 6,0 , 7,1 , 8,1 , 8,1 , 9,1
Now, iterate over the vector, while keeping a set of integers, which represents the intervals that are currently taken. The numbers inside the set represent the ends of the currently taken intervals. This set should be kept sorted in the increasing order.
While iterating over the vector, we might encounter one of the following 2 possibilities:
We are currently handling the start of an interval. In this case we simply add the end of this interval (which is identified by the third part id) to the set. If the size of the set is more than the limit, we must surely delete exactly one interval, but which interval is the best for deleting? Of course it's the interval with the biggest end because deleting this interval will not only help you reduce the number of taken intervals to fit the limit, but it will also be most helpful in the future since it lasts the most. Simply delete this interval from the set (the corresponding end will be last in the set, since the set is sorted in increasing order of the end)
We are currently handling the end of an interval, in this case check out the set. If it contains the specified end, just delete it, because the corresponding interval has come to its end. If the set doesn't contain an end that matches the one we are handling, simply just continue iterating to the next element, because this means we have already decided not to take the corresponding interval.
If you need to count the number of taken intervals, or even print them, it can be done easily. Whenever you handle the end of an interval, and you actually find this end at the set, this means that the corresponding interval is a taken one, and you may increment your answer by one, print it or keep it in some vector representing your answer.
The total complexity of my approach is O(N log N), where N is the number of intervals given in the input.
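The sweep described above might look like this in Java. It is a sketch under a few assumptions: the names CoverageSweep and maxKept are invented, and the active set stores (end, id) pairs rather than bare ends so that two intervals with the same end (such as [2, 8] and [6, 8]) can both be present.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class CoverageSweep {
    // Returns how many intervals can be kept so that no point is covered
    // by more than `limit` kept intervals.
    static int maxKept(int[][] intervals, int limit) {
        // Order pairs by first component, then by second.
        Comparator<int[]> byFirstThenSecond = (a, b) ->
            a[0] != b[0] ? Integer.compare(a[0], b[0]) : Integer.compare(a[1], b[1]);

        // Events are {value, type, id} with type 0 = start and 1 = end,
        // so starts sort before ends at the same value.
        List<int[]> events = new ArrayList<>();
        for (int id = 0; id < intervals.length; id++) {
            events.add(new int[] {intervals[id][0], 0, id});
            events.add(new int[] {intervals[id][1], 1, id});
        }
        events.sort(byFirstThenSecond);

        // Currently taken intervals as {end, id}, smallest end first.
        TreeSet<int[]> active = new TreeSet<>(byFirstThenSecond);
        int kept = 0;
        for (int[] e : events) {
            int id = e[2];
            int[] key = {intervals[id][1], id};
            if (e[1] == 0) {
                active.add(key);
                if (active.size() > limit) {
                    active.pollLast();   // drop the interval that lasts longest
                }
            } else if (active.remove(key)) {
                kept++;                  // this interval survived until its end
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int[][] grid = {{1, 9}, {2, 8}, {2, 5}, {3, 4}, {6, 7}, {6, 8}};
        System.out.println(maxKept(grid, 3)); // prints 5
    }
}
```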

Is it possible to find the largest drop between two numbers in an array in less than O(n²) complexity?

I have an array full of numbers.
I need to find the maximum difference between 2 numbers, where the bigger number comes before the smaller number in the array.
public static int maximalDrop (int [] a)
For example:
for the array 5, 21, 3, 27, 12, 24, 7, 6, 4 the result will be 23 (27 - 4)
for the array 5, 21, 3, 22, 12, 7, 26, 14 the result will be 18 (21 - 3)
My solution is to take the first element in the array (this number will be the big one), check the difference between this number and all the other numbers in the array, then do the same thing with the next number in the array, and of course compare the differences and return the biggest one.
I think that my solution is O(n²), so can I do it in less?
Unless I misunderstand the question I believe you can do this in one pass of the array. You just need to keep track of the maximum value and maximum difference you have seen so far. As you go through the array calc the difference between the current number and the maximum seen so far.
So for your second example 5, 21, 3, 22, 12, 7, 26, 14
1: 5 is first value so set maximum to 5
2: 21 > 5 so reset maximum
3: 21 - 3 = 18
4: 22 > 21 so reset maximum
5: 22 - 12 = 10
6: 22 - 7 = 15
7: 26 > 22 so reset maximum
8: 26 - 14 = 12
As the smaller number comes after the larger when you find a new maximum any smaller number beyond it in the array needs to be subtracted from this new maximum.
The answer required is the maximum value seen during this process - in this case the 18 that is calculated in step 3.
Try this:
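public static int maximalDrop (int[]a)
{
int max= a[0]; // largest value seen so far
int dif= 0; // largest drop seen so far
for (int i=1; i<a.length; i++)
{
if(a[i]>max){
max=a[i];
}
else if (dif<max-a[i]) // compare every later element against the running maximum
{
dif=max-a[i];
}
}
return dif;
}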
Well, I'm not sure whether my understanding of this question is correct. However, I think you only need to keep track of the largest value you have visited so far and the drop value.
Consider this: if the largest drop is made by a-b, and there is another value c before b which is larger than a, then c-b is definitely larger than a-b, so the largest drop should be c-b.
Even though a larger number may replace the max value later on, it won't change the drop value unless it can make a larger drop.
This code may work; it's in Java.
The time cost is O(n).
If I misunderstood some concepts, please let me know.
public int findDrop(int[] ar){
int max = ar[0];
int drop = 0;
for(int i=1;i<ar.length;i++){
if(ar[i] > max){
max = ar[i];
}
else
{
if(max - ar[i] > drop){
drop = max - ar[i];
}
}
}
return drop;
}
O(N) solution
public static int findMaxDrop(int[] arr){
int maxSoFar=Integer.MIN_VALUE; // starting at 0 would give wrong drops for arrays of negative numbers
int currDrop=0;
int maxDrop=0;
for(int i=0;i<=arr.length-1;i++){
if(arr[i] > maxSoFar){
maxSoFar=arr[i];
}else{
currDrop = maxSoFar-arr[i];
maxDrop=Math.max(currDrop, maxDrop);
}
}
return maxDrop;
}
You should only need a minor tweak to merge sort to do this in O(n log n)!
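Sorting the list and taking the difference between the minimum and the maximum items does not solve this: merge sort costs O(n log n), not O(n), and sorting throws away the requirement that the larger number come before the smaller one.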

Find duplicates in array when there is more than one duplicated element

How can I find duplicates in array when there is more than one duplicated element?
When the array has only one duplicated element (for example: 1, 2, 3, 4, 4, 4, 5, 6, 7) then it is very easy:
int duplicate(int* a, int s)
{
int x = a[0];
for(int i = 1; i < s; ++i)
{
x = x ^ a[i];
}
for(int i = 1; i < s; ++i) // a[s] is out of bounds; assumes the values are 1..s-1 with one of them stored twice
{
x = x ^ i;
}
return x;
}
But if the input array contains more than one duplicated element (for example: 1, 2, 2, 2, 3, 4, 4, 4, 5, 6, 7), the above won't work. How can we solve this problem in O(n) time?
If space is no concern or the maximal number is quite low, you can simply use a kind of bit-array and mark all numbers that have already occurred by setting the bit at the position of the number.
It's a kind of HashSet with a trivial (identity) hash function.
Test and set each cost O(1) time.
Using a set is one of the possible generic solutions. Example in C++:
template <typename T>
void filter_duplicates(T* arr, int length) {
std::unordered_set<T> set;
for (int i = 0; i < length; ++i) {
if (set.count(arr[i]) > 0) {
// then it's a duplicate
}
set.insert(arr[i]);
}
// the set contains all the items, unduplicated
}
As unordered_set is implemented as a hash table, insertion and lookup are of amortized constant complexity. As a set can only contain unique keys, this effectively de-duplicates the items. We could finally convert back the set to an array. We could also use a map to count the occurrences.
If the array elements are integers and the maximum possible value is known and fairly low, then the set can be replaced by a simple array, either of booleans or, if we want to count the number of occurrences, of integers.
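For comparison, the same set-based idea in Java, collecting every value that occurs more than once. The class and method names are made up for this sketch; it relies on HashSet.add returning false when the value is already present.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class DuplicateFinder {
    // Returns each value that appears more than once, in the order in which
    // its first repeat is encountered. Expected O(n) time, O(n) extra space.
    static Set<Integer> findDuplicates(int[] a) {
        Set<Integer> seen = new HashSet<>();
        Set<Integer> duplicates = new LinkedHashSet<>();
        for (int v : a) {
            if (!seen.add(v)) {       // add() returns false if v was already seen
                duplicates.add(v);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 2, 2, 3, 4, 4, 4, 5, 6, 7};
        System.out.println(findDuplicates(a)); // prints [2, 4]
    }
}
```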

Resources