Lexicographically smallest path in an N*M grid - algorithm

I came across this in a recent interview.
We are given an N*M grid of numbers, and a path in the grid is the sequence of nodes you traverse. We may only move right or down in the grid. Given this grid, we need to find the lexicographically smallest path, after sorting its values, from the top-left to the bottom-right cell of the grid.
Eg. if grid is 2*2
4 3
5 1
then the lexicographically smallest path as per the question is "1 3 4".
How can I solve such a problem? Code is appreciated. Thanks in advance.

You can use dynamic programming to solve this problem. Let f(i, j) be the lexicographically smallest path (after sorting the path) from (i, j) to (N, M), moving only right and down. Consider the following recurrence:
f(i, j) = sort( a(i, j) + smallest(f(i + 1, j), f(i, j + 1)))
where a(i, j) is the value in the grid at (i, j), smallest(x, y) returns the lexicographically smaller of the strings x and y, + concatenates two strings, and sort(str) sorts the string str in lexical order.
The base case of the recurrence is:
f(N, M) = a(N, M)
The recurrence also changes when i = N or j = M: only one move is available there, so there is nothing to compare (in the last row only f(i, j + 1) exists, and in the last column only f(i + 1, j)).
Consider the following code written in C++:
#include <algorithm>
#include <string>
using namespace std;
//-- the 200 is just the array size. It can be modified
string a[200][200]; //-- the input grid (1-indexed); each cell is a one-character string
string f[200][200]; //-- the array used for memoization
bool calculated[200][200]; //-- true once f[i][j] has been computed
int N = 199, M = 199; //-- number of rows, number of columns
//-- sort the string str and return it (taken by value so temporaries can be passed)
string srt(string str){
    sort(str.begin(), str.end());
    return str;
}
//-- return the lexicographically smaller of x and y (x on a tie)
string smallest(const string &x, const string &y){
    return (y < x) ? y : x;
}
string solve(int i, int j){
    if (i == N && j == M) return a[i][j]; //-- we have reached the bottom-right cell
    if (calculated[i][j]) return f[i][j]; //-- we have calculated this before
    string ans;
    if (i == N) ans = srt(a[i][j] + solve(i, j + 1)); //-- we are at the bottom boundary
    else if (j == M) ans = srt(a[i][j] + solve(i + 1, j)); //-- we are at the right boundary
    else ans = srt(a[i][j] + smallest(solve(i, j + 1), solve(i + 1, j)));
    calculated[i][j] = true; //-- so future calls fetch the stored result
    f[i][j] = ans;
    return ans;
}
string calculateSmallestPath(){
    return solve(1, 1);
}

You can apply a dynamic programming approach to solve this problem in O(N * M * (N + M)) time and space.
Below, N is the number of rows, M is the number of columns, and the top-left cell has coordinates (0, 0), row first and column second.
For each cell, store the lexicographically smallest path ending at this cell, in sorted order. The answer for the row and the column with index 0 is trivial, because there is only one way to reach each of those cells. For the remaining cells, choose the smaller of the paths of the top and left cells and insert the value of the current cell.
The algorithm is:
path[0][0] <- a[0][0]
path[i][0] <- insert(a[i][0], path[i - 1][0])
path[0][j] <- insert(a[0][j], path[0][j - 1])
path[i][j] <- insert(a[i][j], min(path[i - 1][j], path[i][j - 1]))
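The recurrence above can be sketched in Python as follows. This is a minimal illustration, not the answerer's own code: the function name smallest_sorted_path is an assumption, the grid is assumed to be a non-empty list of equal-length rows, and sorted() is used in place of a dedicated insert step.

```python
def smallest_sorted_path(grid):
    """Return the lexicographically smallest sorted path as a list of values."""
    rows, cols = len(grid), len(grid[0])
    # path[i][j] holds the sorted values of the best path ending at (i, j).
    path = [[None] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            candidates = []
            if i > 0:
                candidates.append(path[i - 1][j])  # arriving from the top
            if j > 0:
                candidates.append(path[i][j - 1])  # arriving from the left
            # Python compares lists lexicographically, which is what min() needs.
            best = min(candidates) if candidates else []
            path[i][j] = sorted(best + [grid[i][j]])
    return path[-1][-1]


print(smallest_sorted_path([[4, 3], [5, 1]]))  # prints [1, 3, 4]
```

Every stored path has at most N + M - 1 values, which is where the (N + M) factor in the time and space bound comes from.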

If no number is repeated, this can be achieved in O (NM log (NM)) as well.
Intuition:
Suppose I label a grid with upper-left corner (a,b) and bottom-right corner (c,d) as G(a,b,c,d). Since we have to attain the lexicographically smallest string AFTER sorting the path, the aim should be to find the minimum value in G every time. If this minimum value is attained at, say, (i,j), then G(i,b,c,j) and G(a,j,i,d) are rendered useless in the search for our next min (for the path). That is to say, the values for the path we desire can never be in these two grids. Proof? Any location within these grids, if traversed, will not let us reach the minimum value in G(a,b,c,d) (the one at (i,j)). And if we avoid (i,j), the path we build cannot be lexicographically smallest.
So, first we find the min for G(1,1,m,n). Suppose it's at (i,j). Mark the min. We then find the min in G(1,1,i,j) and in G(i,j,m,n) and do the same for them. We continue this way until we have m+n-1 marked entries, which constitute our path. Finally, traverse the original grid G(1,1,m,n) linearly and report each value that is marked.
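A brute-force sketch of this intuition in Python, assuming all grid values are distinct: values are visited in increasing order, and a cell is kept only if it can lie on one right/down path together with every cell kept so far. The function name greedy_sorted_path and the pairwise compatibility test are assumptions made for illustration; the test replaces the tree of sub-grids, so this version is simpler but slower.

```python
def greedy_sorted_path(grid):
    """Greedy selection of path values, assuming distinct grid values."""
    rows, cols = len(grid), len(grid[0])
    cells = sorted((grid[i][j], i, j) for i in range(rows) for j in range(cols))
    chosen = []  # (value, row, col) of the cells kept so far
    for value, i, j in cells:
        # Two cells fit on one right/down path only if one of them is
        # weakly above and to the left of the other.
        compatible = all(
            (r <= i and c <= j) or (r >= i and c >= j) for _, r, c in chosen
        )
        if compatible:
            chosen.append((value, i, j))
            if len(chosen) == rows + cols - 1:
                break  # a complete path has exactly rows + cols - 1 cells
    return [value for value, _, _ in chosen]


print(greedy_sorted_path([[4, 3], [5, 1]]))  # prints [1, 3, 4]
```

Because cells are kept in increasing order of value, the returned list is already sorted.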
Approach:
To find the min every time in G is costly. What if we map each value in the grid to its location? Traverse the grid and maintain a dictionary Dict with the key being the value at (i,j) and the value being the tuple (i,j). At the end, you'll have a list of key-value pairs covering all the values in the grid.
Now, we'll be maintaining a list of valid grids in which we will find candidates for our path. The first valid grid will be G(1,1,m,n).
Sort the keys and start iterating from the first value in the sorted key set S.
Maintain a tree of valid grids, T(G), such that for each G(a,b,c,d) in T, G.left = G(a,b,i,j) and G.right = G(i,j,c,d) where (i,j) = location of min val in G(a,b,c,d)
The algorithm now:
for each val in sorted key set S do
    (i,j) <- Dict(val)
    Grid G <- Root(T)
    do while (i,j) in G
        if G has no child do
            G.left <- G(a,b,i,j)
            G.right <- G(i,j,c,d)
            break
        else if (i,j) in G.left
            G <- G.left
        else if (i,j) in G.right
            G <- G.right
        else
            dict(val) <- null
            break
        end if-else
    end do
end for
for each val in G(1,1,m,n)
    if dict(val) not null
        solution.append(val)
    end if
end for
return solution
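The Java code:
import java.awt.Point;
import java.util.HashMap;
import java.util.SortedSet;
import java.util.TreeSet;
class Grid{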
int a, b, c, d;
Grid left, right;
Grid(int a, int b, int c, int d){
this.a = a;
this.b = b;
this.c = c;
this.d = d;
left = right = null;
}
public boolean isInGrid(int e, int f){
return (e >= a && e <= c && f >= b && f <= d);
}
public boolean hasNoChild(){
return (left == null && right == null);
}
}
public static int[] findPath(int[][] arr){
int row = arr.length;
int col = arr[0].length;
HashMap<Integer,Point> map = new HashMap<Integer,Point>();
for(int i = 0; i < row; i++){
for(int j = 0; j < col; j++){
map.put(arr[i][j], new Point(i,j));
}
}
Grid root = new Grid(0,0,row-1,col-1);
SortedSet<Integer> keys = new TreeSet<Integer>(map.keySet());
for(Integer entry : keys){
Grid temp = root;
int x = map.get(entry).x, y = map.get(entry).y;
while(temp.isInGrid(x, y)){
if(temp.hasNoChild()){
temp.left = new Grid(temp.a,temp.b,x, y);
temp.right = new Grid(x, y,temp.c,temp.d);
break;
}
if(temp.left.isInGrid(x, y)){
temp = temp.left;
}
else if(temp.right.isInGrid(x, y)){
temp = temp.right;
}
else{
map.get(entry).x = -1;
break;
}
}
}
int[] solution = new int[row+col-1];
int count = 0;
for(int i = 0 ; i < row; i++){
for(int j = 0; j < col; j++){
if(map.get(arr[i][j]).x >= 0){
solution[count++] = arr[i][j];
}
}
}
return solution;
}
The space complexity consists of the dictionary, O(NM), and the tree, O(N+M). Overall: O(NM).
The time complexity for filling and then sorting the dictionary is O(NM log(NM)); for checking the tree for each of the NM values it is O(NM log(N+M)). Overall: O(NM log(NM)).
Of course, this won't work if values are repeated, since then we'd have more than one (i,j) for a single value in the grid, and the decision of which one to choose can no longer be made by a greedy approach.
Additional FYI: A similar problem I heard about earlier had an additional grid property: no values repeat and the numbers are from 1 to NM. In such a case, the complexity could further reduce to O(NM log(N+M)) since, instead of a dictionary, you can simply use the values in the grid as indices of an array (which won't require sorting).

Related

Manhattan tourist

In my algorithms and datastructures class I have been asked to implement the Manhattan tourist problem using dynamic programming.
I have come to a solution using a combination of dynamic programming and recursive calls, but I seem to get "Time limit exceeded" when putting it to the test on CodeJudge. I haven't been able to figure out why my code isn't fast enough. Any takers?
Best regards.
Description of the problem:
You are helping the tourist guide company "Manhattan Tourists", which is arranging guided tours of the city. They want to find a walk between two points on the map that is both interesting and short. The map is a square grid graph. The square grid graph has n rows with n nodes in each row. Let node v_{i,j} denote the jth node on row i. For 1 ≤ i < n and 1 ≤ j ≤ n, node v_{i,j} is connected to v_{i+1,j}. For 1 ≤ i ≤ n and 1 ≤ j < n, node v_{i,j} is connected to v_{i,j+1}. The edges have non-negative edge weights that indicate how interesting that street is. See the graph below for an example of a 5 × 5 grid graph.
They want to find a short interesting walk from the upper left corner (s = v_{1,1}) to the lower right corner (t = v_{n,n}). More precisely, they want to find a path with the smallest possible number of edges, and among all paths with this number of edges they want the path with the maximum weight (the weight of a path is the sum of the weights on the path).
All shortest paths have 2n − 2 edges and go from s to t by walking either down or right in each step. In the example below two possible shortest paths (of length 8) are indicated. The dashed path has weight 38 and the dotted path has weight 30.
Let W[i, j] be the maximal weight you can get when walking from s to v_{i,j}, walking either down or right in each step. Let D[i, j] be the weight of the edge going down from v_{i,j} and let R[i, j] be the weight of the edge going right from v_{i,j}.
Description on CodeJudge:
Exercise
Before you can solve this exercise, you must first read, understand and (partly) solve the problem Manhattan Tourists described on the weekplan.
Your task here is to implement your solution. Read the input/output specification below and look at the sample test data in order to learn how to read the input and write the output.
Input format
Line 1: The integer n (1<= n <= 1000).
Line 2..n+1: the n rows of R, each consisting of n-1 integers separated by space.
Line n+2..2n: the n-1 rows of D, each consisting of n integers separated by space.
Output format:
Line 1: The maximum interest score of a shortest walk.
Here's my code so far:
public static void main(String[] args) {
Scanner console = new Scanner(System.in);
int n = console.nextInt();
int[][] R = new int[n][n-1];
int[][] D = new int[n-1][n];
for(int i = 0; i < n; i++) {
for(int j = 0; j < n-1; j++) {
R[i][j] = console.nextInt();
}
}
for(int i = 0; i < n-1; i++) {
for(int j = 0; j < n; j++) {
D[i][j] = console.nextInt();
}
}
System.out.println(opt(R, D, n, n-1, n-1));
}
public static int opt(int[][]R, int[][]D, int n, int i, int j) {
int[][] result = new int[n][n];
if(i==0 && j==0) {
if(result[i][j] == 0) {
result[i][j] = 0;
}
return result[i][j];
} else if(i == 0) {
if(result[i][j] == 0) {
result[i][j] = opt(R,D,n,i,j-1) + R[i][j-1];
}
return result[i][j];
}else if(j == 0) {
if(result[i][j] == 0) {
result[i][j] = opt(R,D,n,i-1,j) + D[i-1][j];
}
return result[i][j];
}else if(result[i][j] == 0) {
result[i][j] = max(opt(R, D, n, i, j-1) + R[i][j-1],opt(R, D, n, i-1, j) + D[i-1][j]);
}
return result[i][j];
}
public static int max(int i, int j) {
if(i > j) {
return i;
}
return j;
}
}
Why use recursion? Note that opt allocates a fresh result array on every call, so nothing is ever memoized and the recursion takes exponential time.
The topmost row can only be traversed horizontally. So, for each vertex in the first row, the total weight is the sum of the weights of the edges to its left. You can compute all of them in a single loop as a running total across the row.
For each following row, the total weight of the first vertex is the total weight of the vertex above it plus the weight of the edge between them. The total weight of each subsequent vertex in the row is the larger of the two totals obtained by coming from above or from the left.
All of that can be computed iteratively with two nested loops.
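The iterative scheme described above could look like this in Python. It is a sketch under the question's input layout (R has n rows of n-1 weights, D has n-1 rows of n weights, all 0-indexed); the function name max_interest is an assumption, and input parsing is left out.

```python
def max_interest(n, R, D):
    """Maximum total weight of a right/down walk from (0, 0) to (n-1, n-1)."""
    # W[i][j] is the best total weight of any walk that ends at node (i, j).
    W = [[0] * n for _ in range(n)]
    # First row: only reachable from the left, so keep a running total.
    for j in range(1, n):
        W[0][j] = W[0][j - 1] + R[0][j - 1]
    for i in range(1, n):
        # First column: only reachable from above.
        W[i][0] = W[i - 1][0] + D[i - 1][0]
        for j in range(1, n):
            from_left = W[i][j - 1] + R[i][j - 1]
            from_above = W[i - 1][j] + D[i - 1][j]
            W[i][j] = max(from_left, from_above)
    return W[n - 1][n - 1]


print(max_interest(2, [[1], [2]], [[3, 10]]))  # prints 11
```

Each cell is computed once, so the running time is O(n^2), which is about a million cell updates for n = 1000.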

SUM exactly using K elements solution

Problem: Given an array of N numbers, find a subset of size M (exactly M elements) whose sum equals SUM.
I am looking for a dynamic programming (DP) solution to this problem. Basically, I want to understand the table-filling approach. I wrote the program below but didn't add memoization, as I am still wondering how to do that.
#include <stdio.h>
#define SIZE(a) sizeof(a)/sizeof(a[0])
int binary[100];
int a[] = {1, 2, 5, 5, 100};
void show(int* p, int size) {
int j;
for (j = 0; j < size; j++)
if (p[j])
printf("%d\n", a[j]);
}
void subset_sum(int target, int i, int sum, int *a, int size, int K) {
    if (sum == target && !K) {
        show(binary, size);
    } else if (sum < target && i < size) {
        binary[i] = 1; /* take a[i] */
        subset_sum(target, i + 1, sum + a[i], a, size, K-1);
        binary[i] = 0; /* skip a[i] */
        subset_sum(target, i + 1, sum, a, size, K);
    }
}
int main() {
int target = 10;
int K = 2;
subset_sum(target, 0, 0, a, SIZE(a), K);
}
Does the recurrence below make sense?
Let DP[i][j][k] be true if sum i can be formed with exactly k elements picked from elements 0 to j.
DP[i][j][k] = DP[i][j-1][k] || DP[i-a[j]][j-1][k-1] { input array a[0....j] }
Base cases are:
DP[0][0][0] = DP[0][j][0] = DP[0][0][k] = 1
DP[i][0][0] = DP[i][j][0] = 0
It means we either take the current element (DP[i-a[j]][j-1][k-1]) or skip it (DP[i][j-1][k]). If we take the current element, k is reduced by 1, which reduces the number of elements still to be picked; if we skip it, k stays the same.
Your solution looks right to me.
Right now, you're basically backtracking over all possibilities and printing each solution. If you only want one solution, you could add a flag that you set when one solution was found and check before continuing with recursive calls.
For memoization, you should first get rid of the binary array, after which you can do something like this:
int memo[NUM_ELEMENTS][MAX_SUM][MAX_K]; //-- fill with -1 (e.g. with memset) before the first call; NUM_ELEMENTS must be at least size + 1
bool subset_sum(int target, int i, int sum, int *a, int size, int K) {
    if (sum == target && !K) {
        memo[i][sum][K] = true;
        return memo[i][sum][K];
    } else if (sum < target && i < size && K > 0) { //-- K > 0 keeps K - 1 a valid index
        if (memo[i][sum][K] != -1)
            return memo[i][sum][K];
        memo[i][sum][K] = subset_sum(target, i + 1, sum + a[i], a, size, K-1) ||
                          subset_sum(target, i + 1, sum, a, size, K);
        return memo[i][sum][K];
    }
    return false;
}
Then, look at memo[_all indexes_][target][K]. If this is true, there exists at least one solution. You can store additional information to get you that next solution, or you can iterate with an i from found_index - 1 to 0 and check for which i you have memo[i][sum - a[i]][K - 1] == true. Then recurse on that, and so on. This will allow you to reconstruct the solution using just the memo array.
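For comparison, here is a bottom-up sketch in Python of the same idea with the element index folded away, so the table is indexed only by the number of picked elements and the sum. It assumes non-negative integers; the function name subset_sum_exact and the table name reach are assumptions made for illustration, and it only answers whether a solution exists.

```python
def subset_sum_exact(a, target, m):
    """Return True if exactly m elements of a sum to target (non-negative ints)."""
    # reach[k][s] is True when some k of the elements seen so far sum to s.
    reach = [[False] * (target + 1) for _ in range(m + 1)]
    reach[0][0] = True
    for x in a:
        # Go downwards in k and s so that each element is used at most once.
        for k in range(m, 0, -1):
            for s in range(target, x - 1, -1):
                if reach[k - 1][s - x]:
                    reach[k][s] = True
    return reach[m][target]


print(subset_sum_exact([1, 2, 5, 5, 100], 10, 2))  # prints True
```

The table has (m + 1) * (target + 1) entries and every element touches each entry at most once, so the running time is O(N * M * SUM).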
To my understanding, if only the feasibility of the input has to be checked, the problem can be solved with a two-dimensional state space
bool[][] IsFeasible = new bool[n][k]
where IsFeasible[i][j] is true if and only if there is a subset of the elements 1 to i which sums to exactly j, for every
1 <= i <= n
1 <= j <= k
and for this state space, the recurrence relation
IsFeasible[i][j] = IsFeasible[i-1][j-a[i]] || IsFeasible[i-1][j]
can be used, where the left-hand side of the or-operator || corresponds to selecting the i-th item and the right-hand side corresponds to not selecting the i-th item. The actual choice of items could be obtained by backtracking or from auxiliary information saved during evaluation.

How to find minimum positive contiguous sub sequence in O(n) time?

We have this algorithm for finding the maximum positive subsequence of a given sequence in O(n) time. Can anybody suggest a similar algorithm for finding the minimum positive contiguous subsequence?
For example:
If the given sequence is 1,2,3,4,5, the answer should be 1.
[5,-4,3,5,4] -> 1 is the minimum positive sum, from the elements [5,-4].
There cannot be such an algorithm. The lower bound for this problem is Ω(n log n). I'll prove it by reducing the element distinctness problem to it (actually to the non-negative variant of it).
Let's suppose we have an O(n) algorithm for this problem (the minimum non-negative subarray).
We want to find out whether an array (e.g. A=[1, 2, -3, 4, 2]) has only distinct elements. To solve this problem, I could construct an array of the differences between consecutive elements (e.g. A'=[1, -5, 7, -2]) and run the O(n) algorithm we have. The sum of a run of consecutive differences equals the difference between two elements of A, so the original array has only distinct elements if and only if the minimum non-negative subarray sum is greater than 0.
If we had an O(n) algorithm for your problem, we would have an O(n) algorithm for the element distinctness problem, which is known to be impossible in the algebraic decision tree model of computation.
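The construction used in the reduction can be checked on small inputs with the Python sketch below. It only illustrates the correspondence: the inner routine is a quadratic brute force standing in for the hypothetical O(n) algorithm, and both function names are assumptions.

```python
def min_nonnegative_subarray_sum(values):
    """Brute-force stand-in: smallest subarray sum that is >= 0, or None."""
    best = None
    for i in range(len(values)):
        total = 0
        for j in range(i, len(values)):
            total += values[j]
            if total >= 0 and (best is None or total < best):
                best = total
    return best


def all_distinct(a):
    """Decide element distinctness through the difference array."""
    # sum(diffs[i:j]) equals a[j] - a[i], so a zero sum means a repeated value.
    diffs = [a[i + 1] - a[i] for i in range(len(a) - 1)]
    best = min_nonnegative_subarray_sum(diffs)
    return best is None or best > 0


print(all_distinct([1, 2, -3, 4, 2]))  # prints False
```

For A = [1, 2, -3, 4, 2] the differences are [1, -5, 7, -2], and the run [-5, 7, -2] sums to 0 because A[1] and A[4] are both 2.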
We can have an O(n log n) algorithm as follows:
Assume we have an array prefix, where index i stores the sum of array A from 0 to i, so the sum of sub-array (i, j) is prefix[j] - prefix[i - 1].
Thus, to find the minimum positive sub-array ending at index j, we need to find the maximum element prefix[x] that is less than prefix[j], with x < j. We can find that element in O(log n) time if we use a binary search tree.
Pseudo code:
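int[] prefix = new int[A.length];
prefix[0] = A[0];
for(int i = 1; i < A.length; i++)
    prefix[i] = A[i] + prefix[i - 1];
int result = MAX_VALUE;
BinarySearchTree tree;
tree.add(0); // the empty prefix, so sub-arrays starting at index 0 are considered
for(int i = 0; i < A.length; i++){
    if(A[i] > 0)
        result = min(result, A[i]);
    // hasElementLessThan is a hypothetical helper: skip when no smaller prefix exists
    if(tree.hasElementLessThan(prefix[i])){
        int v = tree.getMaximumElementLessThan(prefix[i]);
        result = min(result, prefix[i] - v);
    }
    tree.add(prefix[i]);
}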
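The prefix-sum idea above can be tried out with the Python sketch below. Note the substitution: Python's standard library has no balanced search tree, so a sorted list with the bisect module is used instead. The lookup is O(log n), but bisect.insort shifts list elements, so this sketch is not O(n log n) in the worst case. The function name is an assumption.

```python
import bisect


def min_positive_subarray_sum(a):
    """Smallest strictly positive sum of a contiguous sub-array, or None."""
    seen = [0]   # sorted prefix sums seen so far; 0 is the empty prefix
    prefix = 0
    best = None
    for x in a:
        prefix += x
        # Position of the first stored prefix that is >= the current one;
        # the entry just before it is the largest strictly smaller prefix.
        pos = bisect.bisect_left(seen, prefix)
        if pos > 0:
            candidate = prefix - seen[pos - 1]
            if best is None or candidate < best:
                best = candidate
        bisect.insort(seen, prefix)
    return best


print(min_positive_subarray_sum([5, -4, 3, 5, 4]))  # prints 1
```

Replacing the list with a balanced tree or a sorted container that supports O(log n) insertion would give the O(n log n) bound stated above.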
I believe there's a O(n) algorithm, see below.
Note: it has a scale factor that might make it less attractive in practical applications: it depends on the (input) values to be processed, see remarks in the code.
private int GetMinimumPositiveContiguousSubsequence(List<Int32> values)
{
// Note: this method has no precautions against integer over/underflow, which may occur
// if large (abs) values are present in the input-list.
// There must be at least 1 item.
if (values == null || values.Count == 0)
throw new ArgumentException("There must be at least one item provided to this method.");
// 1. Scan once to:
// a) Get the minimum positive element;
// b) Get the value of the MAX contiguous sequence
// c) Get the value of the MIN contiguous sequence - allowing negative values: the mirror of the MAX contiguous sequence.
// d) Pinpoint the (index of the) first negative value.
int minPositive = 0;
int maxSequence = 0;
int currentMaxSequence = 0;
int minSequence = 0;
int currentMinSequence = 0;
int indxFirstNegative = -1;
for (int k = 0; k < values.Count; k++)
{
int value = values[k];
if (value > 0)
{
    if (minPositive == 0 || value < minPositive)
        minPositive = value;
}
else if (indxFirstNegative == -1 && value < 0)
    indxFirstNegative = k;
currentMaxSequence += value;
if (currentMaxSequence <= 0)
currentMaxSequence = 0;
else if (currentMaxSequence > maxSequence)
maxSequence = currentMaxSequence;
currentMinSequence += value;
if (currentMinSequence >= 0)
currentMinSequence = 0;
else if (currentMinSequence < minSequence)
minSequence = currentMinSequence;
}
// 2. We're done if (a) there are no negatives, or (b) the minPositive (single) value is 1 (or 0...).
if (minSequence == 0 || minPositive <= 1)
return minPositive;
// 3. Real work to do.
// The strategy is as follows, iterating over the input values:
// a) Keep track of the cumulative value of ALL items - the sequence that starts with the very first item.
// b) Register each such cumulative value as "existing" in a bool array 'initialSequence' as we go along.
// We know already the max/min contiguous sequence values, so we can properly size that array in advance.
// Since negative sequence values occur we'll have an offset to match the index in that bool array
// with the corresponding value of the initial sequence.
// c) For each next input value to process scan the "initialSequence" bool array to see whether relevant entries are TRUE.
// We don't need to go over the complete array, as we're only interested in entries that would produce a subsequence with
// a value that is positive and also smaller than best-so-far.
// (As we go along, the range to check will normally shrink as we get better and better results.
// Also: initially the range is already limited by the single-minimum-positive value that we have found.)
// Performance-wise this approach (which is O(n)) is suitable IFF the number of input values is large (or at least: not small) relative to
// the spread between maxSequence and minSeqence: the latter two define the size of the array in which we will do (partial) linear traversals.
// If this condition is not met it may be more efficient to replace the bool array by a (binary) search tree.
// (which will result in O(n logn) performance).
// Since we know the relevant parameters at this point, we may below have the two strategies both implemented and decide run-time
// which to choose.
// The current implementation has only the fixed bool array approach.
// Initialize a variable to keep track of the best result 'so far'; it will also be the return value.
int minPositiveSequence = minPositive;
// The bool array to keep track of which (total) cumulative values (always with the sequence starting at element #0) have occurred so far,
// and the 'offset' - see remark 3b above.
int offset = -minSequence;
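bool[] initialSequence = new bool[maxSequence + offset + 1];
initialSequence[offset] = true; // the empty sequence (cumulative value 0), so results that start at element #0 are found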
int valueCumulative = 0;
for (int k = 0; k < indxFirstNegative; k++)
{
int value = values[k];
valueCumulative += value;
initialSequence[offset + valueCumulative] = true;
}
for (int k = indxFirstNegative; k < values.Count; k++)
{
int value = values[k];
valueCumulative += value;
initialSequence[offset + valueCumulative] = true;
// Check whether the difference with any previous "cumulative" may improve the optimum-so-far.
// the index that, if the entry is TRUE, would yield the best possible result.
int indexHigh = valueCumulative + offset - 1;
// the last (lowest) index that, if the entry is TRUE, would still yield an improvement over what we have so far.
int indexLow = Math.Max(0, valueCumulative + offset - minPositiveSequence + 1);
for (int indx = indexHigh; indx >= indexLow; indx--)
{
if (initialSequence[indx])
{
minPositiveSequence = valueCumulative - indx + offset;
if (minPositiveSequence == 1)
return minPositiveSequence;
break;
}
}
}
return minPositiveSequence;
}
}

Implementing quickselect

I'm trying to implement the quickselect algorithm. Though I have understood the theory behind it very well, I'm finding it difficult to convert it into a well-functioning program.
Here is how I am implementing it step by step, and where I am facing a problem:
Problem: Find the 4th smallest element in A[] = {2,1,3,7,5,4,6}
k = 4.
index:0|1|2|3|4|5|6
Corresponding values: 2|1|3|7|5|4|6
initially, l = 0 and r = 6
Step 1) Taking pivot as the leftmost element (pivot will always be the leftmost in this problem)-
pivot_index = 0
pivot_value = 2
Step 2) Applying the partition algo; putting the pivot at the right place ([<p][p][>p])-
We get the following array: 1|2|3|7|5|4|6
where, pivot_index = i-1 = 1
and therefore, pivot_value = 2
Step 3) Compare pivot_index with k-
k=4, pivot_index = 1; k>pivot_index
Hence, Our k-th smallest number lies in the right part of the array.
Right array = i to r and we do not bother with the left part (l to i-1) anymore.
Step 4) We modify the value of k as k - (pivot_index) => 4-1 = 3; k = 3.
Here is the problem: Should not the value of k be 2? Because we have two values on the left part of the array: 1|2? Should we calculate k as k - (pivot_index+1)?
Let's assume k = 3 is correct.
Step 5) "New" array to work on: 3|7|5|4|6 with corresponding indexes: 2|3|4|5|6
Now, pivot_index = 2 and pivot_value = 3
Step 6) Applying partition algo on the above array-
3|7|5|4|6 (array remains unchanged as pivot itself is the lowest value).
i = 3
pivot_index = i-1 = 2
pivot_value = 3
Step 7) Compare pivot_index with k
k=3 and pivot_index=2
k > pivot_index
and so on....
Is this approach correct?
Here is my code which is not working. I have used a random number generator to select a random pivot, the pivot is then swapped with the first element in the array.
#include<stdio.h>
#include<stdlib.h>
void print_array(int arr[], int array_length){
int i;
for(i=0; i<array_length; ++i) {
printf("%d ", arr[i]);
}
}
int random_no(min, max){
int diff = max-min;
return (int) (((double)(diff+1)/RAND_MAX) * rand() + min);
}
void swap(int *a, int *b){
int temp;
temp = *a;
*a = *b;
*b = temp;
}
int get_kth_small(int arr[], int k, int l, int r){
if((r-l) >= 1){
k = k + (l-1);
int pivot_index = random_no(l, r);
int i, j;
swap(&arr[pivot_index], &arr[l]); //Switch the pivot with the first element in the array. Now, the pivit is in arr[l]
i=l+1;
for(j=l+1; j<=r; ++j){
if(arr[j]<arr[l]){
swap(&arr[j], &arr[i]);
++i;
}
}
swap(&arr[l], &arr[i-1]); //Switch the pivot to the correct place; <p, p, >p
printf("value of i-1: %d\n", i-1);
printf("Value of k: %d\n", k);
if(k == (i-1)){
printf("Found: %d\n", arr[i]);
return 0;
}
if(k>(i-1)){
k=k-(i-1);
get_kth_small(arr, k, i, r);
} else {
get_kth_small(arr, k, l, r-1);
}
//get_kth_small(arr, k, i, r);
//get_kth_small(arr, k, l, i-1);
}
}
void main(){
srand(time(NULL));
int arr[] = {2,1,3,7,5,4,6};
int arr_size = sizeof(arr)/sizeof(arr[0]);
int k = 3, l = 0;
int r = arr_size - 1;
//printf("Enter the value of k: ");
//scanf("%d", &k);
get_kth_small(arr, k, l, r);
print_array(arr, arr_size);
printf("\n");
}
What you describe is a valid way to implement quickselect. There are numerous other approaches to selecting the pivot, and most of them give a better expected complexity, but in essence the algorithm is the same.
"Step 2: putting the pivot at the right place": don't do that. In fact, you can't put the pivot at the right place, as you don't know what that place is. The partitioning rule is to put all elements smaller than or equal to the pivot before those larger. Just leave the pivot where it is!
Quickselect goes as follows: to find the Kth among N elements, 1) choose a pivot value, 2) move all elements smaller than or equal to the pivot before the others, forming two zones of length Nle and Ngt, 3) recurse on the relevant zone with (K, Nle) or (K-Nle, Ngt), until N=1.
Actually, any value can be taken for the pivot, even one not present in the array; but the partition must be such that Nle and Ngt are nonzero.
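As a point of comparison, here is a short quickselect sketch in Python. It is a variation on the scheme above rather than a literal transcription: it uses a three-way split (less, equal, greater) so that no zone handling breaks when the pivot is the largest value or when values repeat, and it builds new lists instead of partitioning in place. The function name and the 1-based k are assumptions.

```python
import random


def quickselect(values, k):
    """Return the k-th smallest element of values (k is 1-based)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    items = list(values)
    while True:
        pivot = random.choice(items)
        less = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if k <= len(less):
            # The answer is among the elements smaller than the pivot.
            items = less
        elif k <= len(less) + equal_count:
            # The k-th smallest is the pivot value itself.
            return pivot
        else:
            # Skip everything up to and including the pivot values.
            k -= len(less) + equal_count
            items = [x for x in items if x > pivot]


print(quickselect([2, 1, 3, 7, 5, 4, 6], 4))  # prints 4
```

Note how k is reduced by the number of elements discarded on the left, which is the count of elements and not the pivot's index.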

3-PARTITION problem

Here is another dynamic programming question (Vazirani ch6):
Consider the following 3-PARTITION problem. Given integers a1...an, we want to determine whether it is possible to partition {1...n} into three disjoint subsets I, J, K such that
sum(I) = sum(J) = sum(K) = 1/3*sum(ALL)
For example, for input (1; 2; 3; 4; 4; 5; 8) the answer is yes, because there is the partition (1; 8), (4; 5), (2; 3; 4). On the other hand, for input (2; 2; 3; 5) the answer is no. Devise and analyze a dynamic programming algorithm for 3-PARTITION that runs in time polynomial in n and (Sum a_i).
How can I solve this problem? I know 2-partition but still can't solve it.
It's easy to generalize the 2-set solution to the 3-set case.
In the original version, you create an array of booleans sums, where sums[i] tells whether sum i can be reached with numbers from the set or not. Then, once the array is filled, you just check whether sums[TOTAL/2] is true.
Since you said you know the old version already, I'll describe only the difference between them.
In the 3-partition case, you keep a two-dimensional array of booleans sums, where sums[i][j] tells whether the first set can have sum i and the second set sum j. Then, once the array is filled, you just check whether sums[TOTAL/3][TOTAL/3] is true.
If the original complexity is O(TOTAL*n), here it's O(TOTAL^2*n).
It may not be polynomial in the strictest sense of the word, but then the original version isn't strictly polynomial either :)
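A compact Python sketch of the two-dimensional idea, assuming non-negative integers. A set of reachable (first sum, second sum) pairs plays the role of the boolean table, and sums above TOTAL/3 are pruned because they can never lead to an answer. The function name can_three_partition is an assumption.

```python
def can_three_partition(a):
    """Return True if a can be split into three groups with equal sums."""
    total = sum(a)
    if total % 3 != 0:
        return False
    third = total // 3
    # (i, j) is in reachable when some disjoint choice of the numbers seen so
    # far gives the first set sum i and the second set sum j.
    reachable = {(0, 0)}
    for x in a:
        new_pairs = set()
        for i, j in reachable:
            if i + x <= third:
                new_pairs.add((i + x, j))  # put x into the first set
            if j + x <= third:
                new_pairs.add((i, j + x))  # put x into the second set
        # Keeping the old pairs corresponds to putting x into the third set.
        reachable |= new_pairs
    return (third, third) in reachable


print(can_three_partition([1, 2, 3, 4, 4, 5, 8]))  # prints True
print(can_three_partition([2, 2, 3, 5]))           # prints False
```

If the first two sets each reach TOTAL/3, the remaining numbers automatically sum to TOTAL/3 as well, which is why the third sum is not tracked.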
I think by reduction it goes like this:
Reducing 2-partition to 3-partition:
Let S be the original set, and A be its total sum, then let S'=union({A/2},S).
Hence, performing a 3-partition on the set S' yields three sets X, Y, Z.
Among X, Y, Z, one of them must be {A/2}, say it's set Z; then X and Y form a 2-partition.
The witnesses of 3-partition on S' are the witnesses of 2-partition on S, thus 2-partition reduces to 3-partition.
If this problem is to be solvable; then sum(ALL)/3 must be an integer. Any solution must have SUM(J) + SUM(K) = SUM(I) + sum(ALL)/3. This represents a solution to the 2-partition problem over concat(ALL, {sum(ALL)/3}).
You say you have a 2-partition implementation: use it to solve that problem. Then (at least) one of the two partitions will contain the number sum(ALL)/3. Remove that number from its partition, and you've found I. For the other partition, run 2-partition again to split J from K; after all, J and K must be equal in sum themselves.
Edit: This solution is probably incorrect - the 2-partition of the concatenated set will have several solutions (at least one for each of I, J, K) - however, if there are other solutions, then the "other side" may not consist of the union of two of I, J, K, and may not be splittable at all. You'll need to actually think, I fear :-).
Try 2: Iterate over the multiset, maintaining the following map: R(i,j,k) :: Boolean, which represents whether the numbers seen up to the current iteration can be divided into three multisets that have sums i, j, k. I.e., for any true R(i,j,k) and next number n, in the next state R' it holds that R'(i+n,j,k) and R'(i,j+n,k) and R'(i,j,k+n). Note that the complexity (as per the exercise) depends on the magnitude of the input numbers; this is a pseudo-polynomial-time algorithm. Nikita's solution is conceptually similar but more efficient than this solution since it doesn't track the third set's sum: that's unnecessary since you can trivially compute it.
As I answered in another question like this one, the C++ implementation would look something like this:
#include <numeric>
#include <vector>
using namespace std;
int partition3(vector<int> &A)
{
    int sum = accumulate(A.begin(), A.end(), 0);
    if (sum % 3 != 0)
    {
        return false;
    }
    int size = A.size();
    // dp[j][k] is true when the first set can have sum j and the second sum k
    vector<vector<int>> dp(sum + 1, vector<int>(sum + 1, 0));
    dp[0][0] = true;
    // process the numbers one by one
    for (int i = 0; i < size; i++)
    {
        for (int j = sum; j >= 0; --j)
        {
            for (int k = sum; k >= 0; --k)
            {
                if (dp[j][k])
                {
                    // guard the writes so they stay inside the table
                    if (j + A[i] <= sum) dp[j + A[i]][k] = true;
                    if (k + A[i] <= sum) dp[j][k + A[i]] = true;
                }
            }
        }
    }
    return dp[sum / 3][sum / 3];
}
Let's say you want to partition the set $X = \{x_1, ..., x_n\}$ into $k$ partitions.
Create an $n \times k$ table. Let the cost $M[i,j]$ be the smallest achievable maximum part sum when the first $i$ elements are split into $j$ partitions. Recursively use the following optimality criterion to fill it:
$M[n,k] = \min_{i \leq n} \max ( M[i, k-1], \sum_{j=i+1}^{n} x_j )$
Using these initial values for the table:
$M[i,1] = \sum_{j=1}^{i} x_j$ and $M[1,j] = x_1$
The running time is $O(kn^2)$ (polynomial).
Create a three-dimensional array whose first dimension (size) is the count of elements and whose other two dimensions (part) equal the sum of all elements divided by 3. Each cell array[seq][sum1][sum2] tells whether you can create sum1 and sum2 using at most the first seq elements of the given array A[]. After computing all values of the array, the result is in cell array[using all elements][sum of all elements / 3][sum of all elements / 3]: if you can create two disjoint sets that each sum to sum/3, the remaining elements form the third set.
Logic of checking: exclude the A[seq] element (it goes to the third sum, which is not stored) and check whether the cell without this element has the same two sums; OR include it in sum1, which is possible if the two sets can be formed without element seq, where sum1 is smaller by A[seq] and sum2 is unchanged; OR include it in sum2, checked as in the previous case.
int partition3(vector<int> &A)
{
int part=0;
for (int a : A)
part += a;
if (part%3)
return 0;
int size = A.size()+1;
part = part/3+1;
bool array[size][part][part];
//sequence from 0 integers inside to all inside
for(int seq=0; seq<size; seq++)
for(int sum1=0; sum1<part; sum1++)
for(int sum2=0;sum2<part; sum2++) {
bool curRes;
if (seq==0)
if (sum1 == 0 && sum2 == 0)
curRes = true;
else
curRes= false;
else {
int curInSeq = seq-1;
bool excludeFrom = array[seq-1][sum1][sum2];
bool includeToSum1 = (sum1>=A[curInSeq]
&& array[seq-1][sum1-A[curInSeq]][sum2]);
bool includeToSum2 = (sum2>=A[curInSeq]
&& array[seq-1][sum1][sum2-A[curInSeq]]);
curRes = excludeFrom || includeToSum1 || includeToSum2;
}
array[seq][sum1][sum2] = curRes;
}
int result = array[size-1][part-1][part-1];
return result;
}
Another example in C++ (based on the previous answers):
bool partition3(vector<int> const &A) {
int sum = 0;
for (int i = 0; i < A.size(); i++) {
sum += A[i];
}
if (sum % 3 != 0) {
return false;
}
vector<vector<vector<int>>> E(A.size() + 1, vector<vector<int>>(sum / 3 + 1, vector<int>(sum / 3 + 1, 0)));
for (int i = 1; i <= A.size(); i++) {
for (int j = 0; j <= sum / 3; j++) {
for (int k = 0; k <= sum / 3; k++) {
E[i][j][k] = E[i - 1][j][k];
if (A[i - 1] <= k) {
E[i][j][k] = max(E[i][j][k], E[i - 1][j][k - A[i - 1]] + A[i - 1]);
}
if (A[i - 1] <= j) {
E[i][j][k] = max(E[i][j][k], E[i - 1][j - A[i - 1]][k] + A[i - 1]);
}
}
}
}
return (E.back().back().back() / 2 == sum / 3);
}
You really want Korf's Complete Karmarkar-Karp algorithm (http://ac.els-cdn.com/S0004370298000861/1-s2.0-S0004370298000861-main.pdf, http://ijcai.org/papers09/Papers/IJCAI09-096.pdf). A generalization to three-partitioning is given. The algorithm is surprisingly fast given the complexity of the problem, but requires some implementation.
The essential idea of KK is to ensure that large blocks of similar size appear in different partitions. One groups pairs of blocks, which can then be treated as a smaller block of size equal to the difference in sizes that can be placed as normal: by doing this recursively, one ends up with small blocks that are easy to place. One then does a two-coloring of the block groups to ensure that the opposite placements are handled. The extension to 3-partition is a bit complicated. The Korf extension is to use depth-first search in KK order to find all possible solutions or to find a solution quickly.

Resources