Sorting by similarity - algorithm

I've got a collection of orders.
[a, b]
[a, b, c]
[a, b, c, d]
[a, b, c, d]
[b, c]
[c, d]
Where a, b, c and d are SKUs, and there are big boxes full of them. And there are thousands of orders and hundreds of possible SKUs.
Now imagine that when packing these orders, if an order lacks items from the previous order, you must put the box for that SKU away (and similarly take one out that you don't have).
How do you sort this so there are a minimum number of box changes? Or, in more programmy terms: how do you minimize the cumulative Hamming distance / maximize the intersection between adjacent items in a collection?
I really have no clue where to start. Is there already some algorithm for this? Is there a decent approximation?

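Indeed #irrelephant is correct. This is an undirected Hamiltonian path problem. Model it as a complete undirected graph where the nodes are SKU sets and the weight of each edge is the Hamming distance between the respective sets. Then finding a packing order is equivalent to finding a path that touches each node exactly once. This is a Hamiltonian path (HP). You want the minimum-weight HP.
The bad news is that finding a min-weight HP is NP-hard (its decision version is NP-complete), so no polynomial-time algorithm is known and exact solutions take exponential time in the worst case.
The good news is that there are reasonable approximation algorithms. The obvious greedy algorithm is simple and usually gives good plans, though for metric distances such as Hamming distance its worst-case ratio to the optimal HP is not a constant (the known bound grows logarithmically with n); a guaranteed factor of 2 comes instead from a depth-first walk of a minimum spanning tree that skips already-visited sets. The greedy algorithm is: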
create the graph of Hamming distances
sort the edges by weight in increasing order: e0, e1, ...
set C = emptyset
for e in sequence e0, e1, ...
    if C union {e} does not cause a cycle nor a vertex with degree more than 2 in C
        set C = C union {e}
return C
Note that the if test can be implemented in nearly constant time with the classical disjoint-set union-find algorithm plus an incident-edge counter at each vertex.
So the run time here can be O(n^2 log n) for n SKU sets, dominated by sorting the n(n-1)/2 edges, assuming that computing a Hamming distance takes constant time.
If graphs are not in your vocabulary, think of a triangular table with one entry for each pair of SKU sets. The entries in the table are Hamming distances. You want to sort the table entries and then add SKU set pairs in sorted order one by one to your plan, skipping pairs that would cause a "fork" or a "loop." A fork would be a set of pairs like (a,b), (b,c), (b,d). A loop would be (a,b), (b,c), (c,a).
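If each order is stored as a bitset over SKU numbers, a Hamming distance is one XOR plus a popcount, which keeps the constant-time assumption above reasonable for a few hundred SKUs. A minimal C++ sketch of the triangular table, assuming SKUs are numbered 0..511 (kMaxSkus, Order, distanceTable and sequenceCost are illustrative names, not from the answer):

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption for illustration: SKUs are numbered 0..kMaxSkus-1, so one
// order fits in a fixed-size bitset.
constexpr std::size_t kMaxSkus = 512;
using Order = std::bitset<kMaxSkus>;

// Hamming distance = number of SKUs in exactly one of the two orders,
// i.e. boxes to put away plus boxes to take out between them.
std::size_t hamming(const Order& x, const Order& y) {
    return (x ^ y).count();
}

// Triangular table: dist[j][i] holds hamming(orders[i], orders[j]) for i < j.
std::vector<std::vector<std::size_t>> distanceTable(const std::vector<Order>& orders) {
    std::vector<std::vector<std::size_t>> dist(orders.size());
    for (std::size_t j = 0; j < orders.size(); ++j)
        for (std::size_t i = 0; i < j; ++i)
            dist[j].push_back(hamming(orders[i], orders[j]));
    return dist;
}

// Total box changes for a packing sequence (indices into orders),
// ignoring the initial setup and the final teardown.
std::size_t sequenceCost(const std::vector<Order>& orders,
                         const std::vector<std::size_t>& seq) {
    std::size_t total = 0;
    for (std::size_t k = 1; k < seq.size(); ++k)
        total += hamming(orders[seq[k - 1]], orders[seq[k]]);
    return total;
}
```

With a = 0, b = 1, c = 2, d = 3, the six orders from the question in their original sequence cost 1 + 1 + 0 + 2 + 2 = 6 box changes.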
There are more complex polynomial time algorithms that get to a 3/2 approximation.

I like this problem so much I couldn't resist coding up the algorithm suggested above. The code is a little long, so I'm putting it in a separate response.
It comes up with this sequence on the example.
Step 1: c d
Step 2: b c
Step 3: a b c
Step 4: a b c d
Step 5: a b c d
Step 6: a b
Note this algorithm ignores initial setup and final teardown costs. It only considers inter-setup distances. Here the Hamming distances are 2 + 1 + 1 + 0 + 2 = 6. This is the same total distance as the order given in the question.
#include <stdio.h>
#include <stdlib.h>

// With these data types we can have up to 64k items and 64k sets of items,
// but then the table of pairs is about 20 GB!
typedef unsigned short ITEM, INDEX;

// A SKU set in the problem.
struct set {
    INDEX n_elts;
    ITEM *elts;
};

// A pair of SKU sets and associated info.
struct pair {
    INDEX i, j;          // Indices of sets.
    ITEM dist;           // Hamming distance between sets.
    INDEX rank, parent;  // Disjoint set union/find fields.
};

// For a given set, the adjacent ones along the path under construction.
struct adjacent {
    unsigned char n;  // 0, 1, or 2.
    INDEX elts[2];    // Indices of n adjacent sets.
};
// Some tracing functions for fun.
void print_pair(struct pair *pairs, int i)
{
    struct pair *p = pairs + i;
    printf("%d:(%d,%d#%d)[%d->%d]\n", i, p->i, p->j, p->dist, p->rank, p->parent);
}

void print_adjacent(struct adjacent *adjs, int i)
{
    struct adjacent *a = adjs + i;
    switch (a->n) {
    case 0: printf("%d:o\n", i); break;
    case 1: printf("%d:o->%d\n", i, a->elts[0]); break;
    default: printf("%d:%d<-o->%d\n", i, a->elts[0], a->elts[1]); break;
    }
}
// Compute the Hamming distance between two sets. Assumes elements are sorted.
// Works a bit like merging.
ITEM hamming_distance(struct set *a, struct set *b)
{
    int ia = 0, ib = 0;
    ITEM d = 0;
    while (ia < a->n_elts && ib < b->n_elts) {
        if (a->elts[ia] < b->elts[ib]) {
            ++d;
            ++ia;
        }
        else if (a->elts[ia] > b->elts[ib]) {
            ++d;
            ++ib;
        }
        else {
            ++ia;
            ++ib;
        }
    }
    return d + (a->n_elts - ia) + (b->n_elts - ib);
}
// Classic disjoint set find operation.
INDEX find(struct pair *pairs, INDEX x)
{
    if (pairs[x].parent != x)
        pairs[x].parent = find(pairs, pairs[x].parent);
    return pairs[x].parent;
}

// Classic disjoint set union. Assumes x and y are canonical.
void do_union(struct pair *pairs, INDEX x, INDEX y)
{
    if (x == y) return;
    if (pairs[x].rank < pairs[y].rank)
        pairs[x].parent = y;
    else if (pairs[x].rank > pairs[y].rank)
        pairs[y].parent = x;
    else {
        pairs[y].parent = x;
        pairs[x].rank++;
    }
}

// Comparison function for qsort: sort pairs by Hamming distance.
int by_dist(const void *va, const void *vb)
{
    const struct pair *a = va, *b = vb;
    return a->dist < b->dist ? -1 : a->dist > b->dist ? +1 : 0;
}
// Return a plan with greedily found least Hamming distance sum.
// Just an array of indices into the given table of sets.
// TODO: Deal with calloc/malloc failure!
INDEX *make_plan(struct set *sets, INDEX n_sets)
{
    // Allocate enough space for all the pairs, taking care to avoid overflow.
    // This grows as the square of n_sets!
    size_t n_pairs = (n_sets & 1) ? (size_t)(n_sets / 2) * n_sets
                                  : (size_t)(n_sets / 2) * (n_sets - 1);
    // The union/find fields of the first n_sets entries double as per-set storage,
    // so allocate at least n_sets entries (this matters when n_sets < 3).
    struct pair *pairs = calloc(n_pairs > n_sets ? n_pairs : n_sets, sizeof(struct pair));
    // Initialize the pairs.
    size_t ip = 0;
    for (int j = 1; j < n_sets; j++) {
        for (int i = 0; i < j; i++) {
            struct pair *p = pairs + ip++;
            p->i = i;
            p->j = j;
            p->dist = hamming_distance(sets + i, sets + j);
        }
    }
    // Sort by Hamming distance.
    qsort(pairs, n_pairs, sizeof pairs[0], by_dist);
    // Initialize the disjoint sets: one singleton per SKU set.
    for (int i = 0; i < n_sets; i++) {
        pairs[i].rank = 0;
        pairs[i].parent = i;
    }
    // Greedily add pairs to the Hamiltonian path so long as they don't cause a non-path!
    int n_edges = 0;
    struct adjacent *adjs = calloc(n_sets, sizeof(struct adjacent));
    for (size_t k = 0; k < n_pairs; k++) {
        struct pair *p = pairs + k;
        struct adjacent *ai = adjs + p->i, *aj = adjs + p->j;
        // Skip this edge if adding it would give a vertex of degree 3.
        if (ai->n == 2 || aj->n == 2) continue;
        // Find the (possibly) disjoint sets of the pair's elements.
        INDEX i_set = find(pairs, p->i);
        INDEX j_set = find(pairs, p->j);
        // Skip this edge if adding it would form a cycle.
        if (i_set == j_set) continue;
        // Otherwise add this edge.
        do_union(pairs, i_set, j_set);
        ai->elts[ai->n++] = p->j;
        aj->elts[aj->n++] = p->i;
        // Done after we've added enough pairs to touch all sets in a path.
        if (++n_edges == n_sets - 1) break;
    }
    // Find a set with only one adjacency, the path start
    // (a single set has none, so default to set 0).
    int p = 0;
    for (int i = 0; i < n_sets; ++i)
        if (adjs[i].n == 1) {
            p = i;
            break;
        }
    // A plan will be an ordering of sets.
    INDEX *plan = malloc(n_sets * sizeof(INDEX));
    // Walk along the path to get the ordering.
    for (int i = 0; i < n_sets; i++) {
        plan[i] = p;
        struct adjacent *a = adjs + p;
        // Take elts[1] only if it exists and isn't the set we came from. The start
        // has n == 1, so plan[i-1] is never read when i == 0.
        p = a->elts[a->n > 1 && a->elts[1] != plan[i - 1]];
    }
    // Done with intermediate data structures.
    free(pairs);
    free(adjs);
    return plan;
}
// A tiny test case. Much more testing needed!
#define ARRAY_SIZE(A) (sizeof A / sizeof A[0])
#define SET(Elts) { ARRAY_SIZE(Elts), Elts }
// Items must be in ascending order for the Hamming distance calculation.
ITEM a1[] = { 'a', 'b' };
ITEM a2[] = { 'a', 'b', 'c' };
ITEM a3[] = { 'a', 'b', 'c', 'd' };
ITEM a4[] = { 'a', 'b', 'c', 'd' };
ITEM a5[] = { 'b', 'c' };
ITEM a6[] = { 'c', 'd' };
// Out of order to see how we do.
struct set sets[] = { SET(a3), SET(a6), SET(a1), SET(a4), SET(a5), SET(a2) };

int main(void)
{
    int n_sets = ARRAY_SIZE(sets);
    INDEX *plan = make_plan(sets, n_sets);
    for (int i = 0; i < n_sets; i++) {
        struct set *s = sets + plan[i];
        printf("Step %d: ", i + 1);
        for (int j = 0; j < s->n_elts; j++) printf("%c ", (char)s->elts[j]);
        printf("\n");
    }
    free(plan);
    return 0;
}
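For tiny inputs, the greedy result can be checked against an exhaustive search over all orderings. A minimal C++ sketch (bruteForceMinChanges and hammingSorted are illustrative names; orders are written as sorted strings of SKU letters, like the C test data):

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <iterator>
#include <numeric>
#include <string>
#include <vector>

// Hamming distance between two orders written as sorted SKU strings,
// e.g. "abc" vs "bcd" -> 2 (the SKUs that appear in exactly one of them).
int hammingSorted(const std::string& x, const std::string& y) {
    std::string diff;
    std::set_symmetric_difference(x.begin(), x.end(), y.begin(), y.end(),
                                  std::back_inserter(diff));
    return static_cast<int>(diff.size());
}

// Exhaustive minimum of the summed adjacent distances over all n!
// orderings. Only usable for very small n; meant for checking heuristics.
int bruteForceMinChanges(const std::vector<std::string>& orders) {
    std::vector<int> perm(orders.size());
    std::iota(perm.begin(), perm.end(), 0);  // start from the sorted permutation
    int best = INT_MAX;
    do {
        int cost = 0;
        for (std::size_t k = 1; k < perm.size(); ++k)
            cost += hammingSorted(orders[perm[k - 1]], orders[perm[k]]);
        best = std::min(best, cost);
    } while (std::next_permutation(perm.begin(), perm.end()));
    return best;
}
```

On the six orders from the question this should report 6, so the greedy plan happens to be optimal on this example. With n! orderings it is only practical for roughly ten orders or fewer.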

Related

How will I solve this using DP?

Question link: http://codeforces.com/contest/2/problem/B
There is a square matrix n × n, consisting of non-negative integer numbers. You should find such a way on it that
starts in the upper left cell of the matrix;
each following cell is to the right or down from the current cell;
the way ends in the bottom right cell.
Moreover, if we multiply together all the numbers along the way, the result should be the least "round". In other words, it should end in the least possible number of zeros.
Input
The first line contains an integer number n (2 ≤ n ≤ 1000), n is the size of the matrix. Then follow n lines containing the matrix elements (non-negative integer numbers not exceeding 10^9).
Output
In the first line print the least number of trailing zeros. In the second line print the correspondent way itself.
I thought of the following: In the end, whatever the answer will be, it should contain minimum powers of 2's and 5's. Therefore, what I did was, for each entry in the input matrix, I calculated the powers of 2's and 5's and stored them in separate matrices.
for (i = 0; i < n; i++)
{
    for (j = 0; j < n; j++)
    {
        cin >> foo;
        matrix[i][j] = foo;
        int n1 = calctwo(foo); // calculates the number of 2's in the factorisation of that number
        int n2 = calcfive(foo); // calculates the number of 5's
        two[i][j] = n1;
        five[i][j] = n2;
    }
}
After that, I did this:
for (i = 0; i < n; i++)
{
    for (j = 0; j < n; j++)
    {
        dp[i][j] = min(two[i][j], five[i][j]); // Here, dp[i][j] will store the minimum of the number of 2's and 5's.
    }
}
But the above doesn't give a valid answer, and I don't know why. Have I implemented the correct approach, or is there a better way to solve this question?
Edit: Here are my functions of calculating the number of two's and number of five's in a number.
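int calctwo (int foo)
{
    // Counts the factors of 2 in foo. The foo != 0 guard avoids an infinite
    // loop (0 % 2 == 0 forever); zero cells need separate handling anyway.
    int counter = 0;
    while (foo != 0 && foo % 2 == 0)
    {
        counter++;
        foo = foo / 2;
    }
    return counter;
}
int calcfive (int foo)
{
    // Counts the factors of 5 in foo, with the same guard for zero.
    int counter = 0;
    while (foo != 0 && foo % 5 == 0)
    {
        counter++;
        foo = foo / 5;
    }
    return counter;
}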
Edit2: I/O Example as given in the link:
Input:
3
1 2 3
4 5 6
7 8 9
Output:
0
DDRR
Since you are interested only in the number of trailing zeroes, you only need to consider the powers of 2 and 5, which you can keep in two separate n×n arrays. So for the array
1 2 3
4 5 6
7 8 9
you just keep the arrays
the powers of 2    the powers of 5
0 1 0              0 0 0
2 0 1              0 1 0
0 3 0              0 0 0
The insight for the problem is the following. If you find a path that minimizes the sum of the powers of 2 and a path that minimizes the sum of the powers of 5, then the answer is whichever of those two paths has the lower value. So you reduce your problem to applying the following classical DP problem twice: find a path, starting from the top-left corner and ending at the bottom-right, such that the sum of its elements is minimal. Again, following the example, we have:
minimal path for the powers of 2 (value 2)
* * -
- * *
- - *
minimal path for the powers of 5 (value 0)
* - -
* - -
* * *
so your answer is
* - -
* - -
* * *
with value 0
Note 1
It might seem that taking the minimum of the two optimal values gives only a lower bound (every path has at least a 2's and at least b 5's, with a and b defined below, so it has at least min(a, b) trailing zeros), so a natural question is: is this bound actually achieved? The answer is yes. Let a be the number of 2's along the 2's optimal path and b the number of 5's along the 5's optimal path. Without loss of generality assume that the minimum of the two is the one for the powers of 2 (that is, a <= b). Let c be the number of 5's along that same 2's optimal path. Now the question is: are there at least as many 5's as 2's along this path (i.e. is c >= a)? Assume that the answer is no, so there are fewer 5's than 2's along this path (c < a). Since the optimal value for the 5's is b, every path has at least b 5's in it, and this path is no exception, so c >= b. Together with c < a this gives b < a, but the initial assumption was a <= b. Contradiction.
Note 2
You might also want to consider the case in which the matrix contains an element 0. A product of 0 is written as "0", so I'd count it as having 1 trailing zero. In this case, if the algorithm has produced a result with a value greater than 1, you should output 1 and print a path that goes through the element 0.
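A minimal C++ sketch of the two-DP approach, including the zero handling from Note 2. These are my own assumptions, not the answerer's code: the grid is square and already read into a std::vector, minSumPath and leastRoundWay are illustrative names, and a zero cell is counted as if it were 10 (one 2 and one 5) during the DP, with a fallback to a path through the zero when the best value is still above 1.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<long long>>;
using Counts = std::vector<std::vector<int>>;

// Exponent of prime p in x; assumes x > 0.
int exponentOf(long long x, int p) {
    int e = 0;
    while (x % p == 0) { x /= p; ++e; }
    return e;
}

// Classic DP: minimum sum over right/down paths from the top-left to the
// bottom-right cell. Returns {minimum sum, path as a string of 'R'/'D'}.
std::pair<int, std::string> minSumPath(const Counts& c) {
    int n = static_cast<int>(c.size());
    Counts best(n, std::vector<int>(n));
    std::vector<std::string> from(n, std::string(n, ' '));  // move that entered each cell
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            if (i == 0 && j == 0) { best[0][0] = c[0][0]; continue; }
            // Come from above on ties (and always in column 0).
            bool up = j == 0 || (i > 0 && best[i - 1][j] <= best[i][j - 1]);
            best[i][j] = c[i][j] + (up ? best[i - 1][j] : best[i][j - 1]);
            from[i][j] = up ? 'D' : 'R';
        }
    std::string path;
    for (int i = n - 1, j = n - 1; i > 0 || j > 0;) {  // walk back to the start
        path.insert(path.begin(), from[i][j]);
        if (from[i][j] == 'D') --i; else --j;
    }
    return {best[n - 1][n - 1], path};
}

// Least number of trailing zeros of the product along a right/down path.
std::pair<int, std::string> leastRoundWay(const Grid& g) {
    int n = static_cast<int>(g.size());
    Counts twos(n, std::vector<int>(n)), fives(n, std::vector<int>(n));
    int zeroRow = -1;  // row of some zero cell, if any
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            if (g[i][j] == 0) { twos[i][j] = fives[i][j] = 1; zeroRow = i; continue; }
            twos[i][j] = exponentOf(g[i][j], 2);
            fives[i][j] = exponentOf(g[i][j], 5);
        }
    std::pair<int, std::string> p2 = minSumPath(twos), p5 = minSumPath(fives);
    std::pair<int, std::string> ans = p2.first <= p5.first ? p2 : p5;
    if (zeroRow >= 0 && ans.first > 1)  // down to the zero's row, across it, then down
        ans = {1, std::string(zeroRow, 'D') + std::string(n - 1, 'R') +
                  std::string(n - 1 - zeroRow, 'D')};
    return ans;
}
```

On the sample grid it should report 0 trailing zeros with the path RRDD rather than the sample's DDRR, because ties are broken toward the cell above; both paths avoid the 5.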
Here is the code. I've used pair<int,int> to store the exponents of 2 and 5 for each cell of the matrix.
#include<vector>
#include<iostream>
#include<utility>
using namespace std;
#define pii pair<int,int>
#define F first
#define S second
#define MP make_pair

int calc2(int a){
    int c = 0;
    while (a != 0 && a % 2 == 0){  // a != 0 avoids an infinite loop on zero cells
        c++;
        a /= 2;
    }
    return c;
}

int calc5(int a){
    int c = 0;
    while (a != 0 && a % 5 == 0){  // same guard for zero cells
        c++;
        a /= 5;
    }
    return c;
}

int mini(int a, int b){
    return a < b ? a : b;
}

pii min(pii a, pii b){
    if (mini(a.F, a.S) < mini(b.F, b.S))
        return a;
    return b;
}
int main(){
    int n;
    cin >> n;
    vector<vector<pii > > v;
    vector<vector<int> > path;
    int i, j;
    for (i = 0; i < n; i++){
        vector<pii > x;
        vector<int> q(n, 0);
        for (j = 0; j < n; j++){
            int y; cin >> y;
            x.push_back(MP(calc2(y), calc5(y))); // I store the factors of 2 and 5 in the vector to calculate
        }
        x.push_back(MP(100000, 100000)); // pad each row to n+1 elements (to handle index overflow in the code)
        v.push_back(x);
        path.push_back(q); // initialize the path matrix to 0
    }
    vector<pii > x(n + 1, MP(100000, 100000));
    v.push_back(x); // pad 1 more row to handle index overflow
    for (i = n - 1; i >= 0; i--){
        for (j = n - 1; j >= 0; j--){ // move from destination to source grid
            if (i == n - 1 && j == n - 1)
                continue;
            // the comparison in the if block determines the minimum number of trailing 0's;
            // the same comparison is used to update "v" below.
            pii down = MP(v[i][j].F + v[i+1][j].F, v[i][j].S + v[i+1][j].S);
            pii right = MP(v[i][j].F + v[i][j+1].F, v[i][j].S + v[i][j+1].S);
            if (min(down, right) == down)
                path[i][j] = 1; // go down
            else
                path[i][j] = 2; // go right
            v[i][j] = min(down, right);
        }
    }
    cout << mini(v[0][0].F, v[0][0].S) << endl; // print result
    // print the path in the problem's format: D = down, R = right
    for (i = 0, j = 0; !(i == n - 1 && j == n - 1); ){
        if (path[i][j] == 1){
            cout << 'D';
            i++;
        }
        else{
            cout << 'R';
            j++;
        }
    }
    cout << endl;
    return 0;
}
This code gives correct results on the test cases I checked. If you have any doubts regarding this code, ask in the comments.
EDIT:
The basic thought process.
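To reach the destination, there are only 2 options. I started from the destination to avoid having to calculate the path ahead: if 2 options have the same minimum value, we can choose either one, because once the path to the destination is already calculated it does not matter which one we take.
And min is used to check which pair is more suitable: if a pair has fewer 2's or 5's than the other, it will produce fewer 0's. Note that this only compares the suffixes; once the 2's and 5's of the cells before (i, j) are added, the other suffix can turn out better, so the comparison is a heuristic rather than a guarantee.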
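A brute-force checker over all right/down paths is a cheap way to test a DP like this on small grids. The C++ sketch below is my own (bruteForceLeastZeros and bestFrom are illustrative names, and it assumes no zero cells). The second test grid was constructed by hand: by my trace, the backward DP above keeps the suffix at (0, 1) with ten 5's and no 2's and ends up printing 10, while the path RDRD collects fifteen 2's and five 5's, so the true answer is 5.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

using Grid = std::vector<std::vector<long long>>;

// Exponent of prime p in x; assumes x > 0 (this checker does not handle zeros).
int exponentOf(long long x, int p) {
    int e = 0;
    while (x % p == 0) { x /= p; ++e; }
    return e;
}

// Tries every right/down path from (i, j) to the bottom-right cell, carrying the
// running exponents of 2 and 5 (never the product itself, which would overflow).
int bestFrom(const Grid& g, int i, int j, int twos, int fives) {
    int n = static_cast<int>(g.size());
    twos += exponentOf(g[i][j], 2);
    fives += exponentOf(g[i][j], 5);
    if (i == n - 1 && j == n - 1) return std::min(twos, fives);  // zeros of this path
    int best = INT_MAX;
    if (i + 1 < n) best = std::min(best, bestFrom(g, i + 1, j, twos, fives));
    if (j + 1 < n) best = std::min(best, bestFrom(g, i, j + 1, twos, fives));
    return best;
}

// Exponential in n: only for checking a DP on small grids.
int bruteForceLeastZeros(const Grid& g) { return bestFrom(g, 0, 0, 0, 0); }
```

The hand-made grid is {{1024, 1, 9765625}, {244140625, 100000, 1}, {1, 1, 1}}, i.e. 2^10, 5^10, 5^12 and 10^5 placed so that the suffix that looks best at (0, 1) loses once the 2's of the first cell are added.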
Here is a solution proposal using Javascript and functional programming.
It relies on several functions:
the core function is smallest_trailer, which recursively goes through the grid. I have chosen to go in 4 possible directions: left "L", right "R", down "D" and up "U". It is not possible to pass through the same cell twice. The direction chosen is the one with the smallest number of trailing zeros. The counting of trailing zeros is delegated to another function.
the function zero_trailer(p,n,nbz) assumes that you arrive on a cell with a value p while you already have an accumulator n and met nbz zeros on your way. The function returns an array with two elements, the new number of zeros and the new accumulator. The accumulator will be a power of 2 or 5. The function uses the auxiliary function pow_2_5(n) that returns the powers of 2 and 5 inside n.
Other functions are more incidental: deepCopy(arr) makes a standard deep copy of the array arr, out_bound(i,j,n) returns true if the cell (i,j) is outside the grid of size n, and myMinIndex(arr) returns the index of the minimum in an array of two-element arrays (each subarray contains the number of trailing zeros and the path as a string). The minimum is taken only over the first element of the subarrays.
MAX_SAFE_INTEGER is a (large) constant for the maximal number of trailing zeros when the path is wrong (goes out of bound for example).
Here is the code, which works on the example given in the comments above and in the original link.
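var MAX_SAFE_INTEGER = 9007199254740991;
function pow_2_5(n) {
    // returns the part of n made up of the factors 2 and 5
    if (n === 0) {
        // 0 would recurse forever below; zero_trailer maps a 0 product to 1 trailing zero
        return 0;
    }
    function pow_not_2_5(k) {
        if (k % 2 === 0) {
            return pow_not_2_5(k / 2);
        }
        else if (k % 5 === 0) {
            return pow_not_2_5(k / 5);
        }
        else {
            return k;
        }
    }
    return n / pow_not_2_5(n);
}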
function zero_trailer(p, n, nbz) {
    // takes as input two numbers p and n that should be multiplied and a given initial number of zeros (nbz = nb of zeros)
    // n is the accumulator of previous multiplications (a power of 5 or 2)
    // returns an array [kbz, k] where kbz is the total new number of zeros (nbz + the trailing zeros from the multiplication of p and n)
    // and k is the new accumulator (typically a power of 5 or 2)
    function zero_aux(k, kbz) {
        if (k === 0) {
            return [1, 0];
        }
        else if (k % 10 === 0) {
            return zero_aux(k / 10, kbz + 1);
        }
        else {
            return [kbz, k];
        }
    }
    return zero_aux(pow_2_5(p) * n, nbz);
}
function out_bound(i, j, n) {
    return !((i >= 0) && (i < n) && (j >= 0) && (j < n));
}
function deepCopy(arr) {
    var toR = new Array(arr.length);
    for (var i = 0; i < arr.length; i++) {
        var toRi = new Array(arr[i].length);
        for (var j = 0; j < arr[i].length; j++) {
            toRi[j] = arr[i][j];
        }
        toR[i] = toRi;
    }
    return toR;
}
function myMinIndex(arr) {
    var min = arr[0][0];
    var minIndex = 0;
    for (var i = 1; i < arr.length; i++) {
        if (arr[i][0] < min) {
            minIndex = i;
            min = arr[i][0];
        }
    }
    return minIndex;
}
function smallest_trailer(grid) {
    var n = grid.length;
    function st_aux(i, j, grid_aux, acc_mult, nb_z, path) {
        if ((i === n - 1) && (j === n - 1)) {
            var tmp_acc_nbz_f = zero_trailer(grid_aux[i][j], acc_mult, nb_z);
            return [tmp_acc_nbz_f[0], path];
        }
        else if (out_bound(i, j, n)) {
            return [MAX_SAFE_INTEGER, []];
        }
        else if (grid_aux[i][j] < 0) {
            return [MAX_SAFE_INTEGER, []];
        }
        else {
            var tmp_acc_nbz = zero_trailer(grid_aux[i][j], acc_mult, nb_z);
            grid_aux[i][j] = -1;
            var res = [st_aux(i + 1, j, deepCopy(grid_aux), tmp_acc_nbz[1], tmp_acc_nbz[0], path + "D"),
                       st_aux(i - 1, j, deepCopy(grid_aux), tmp_acc_nbz[1], tmp_acc_nbz[0], path + "U"),
                       st_aux(i, j + 1, deepCopy(grid_aux), tmp_acc_nbz[1], tmp_acc_nbz[0], path + "R"),
                       st_aux(i, j - 1, deepCopy(grid_aux), tmp_acc_nbz[1], tmp_acc_nbz[0], path + "L")];
            return res[myMinIndex(res)];
        }
    }
    return st_aux(0, 0, grid, 1, 0, "");
}
var myGrid = [[1, 25, 100], [2, 1, 25], [100, 5, 1]];
console.log(smallest_trailer(myGrid)); // [0, "RDDR"]
myGrid = [[1, 2, 100], [25, 1, 5], [100, 25, 1]];
console.log(smallest_trailer(myGrid)); // [0, "DRDR"]
myGrid = [[1, 10, 1, 1, 1], [1, 1, 1, 10, 1], [10, 10, 10, 10, 1], [10, 10, 10, 10, 1], [10, 10, 10, 10, 1]];
console.log(smallest_trailer(myGrid)); // [0, "DRRURRDDDD"]
This is my Dynamic Programming solution.
https://app.codility.com/demo/results/trainingAXFQ5B-SZQ/
For better understanding we can simplify the task and assume that there are no zeros in the matrix (i.e. matrix contains only positive integers), then the Java solution will be the following:
class Solution {
    public int solution(int[][] a) {
        int minPws[][] = new int[a.length][a[0].length];
        int minPws2 = getMinPws(a, minPws, 2);
        int minPws5 = getMinPws(a, minPws, 5);
        return min(minPws2, minPws5);
    }

    private int getMinPws(int[][] a, int[][] minPws, int p) {
        minPws[0][0] = pws(a[0][0], p);
        // Fill in the first row
        for (int j = 1; j < a[0].length; j++) {
            minPws[0][j] = minPws[0][j-1] + pws(a[0][j], p);
        }
        // Fill in the first column
        for (int i = 1; i < a.length; i++) {
            minPws[i][0] = minPws[i-1][0] + pws(a[i][0], p);
        }
        // Fill in the rest of the matrix
        for (int i = 1; i < a.length; i++) {
            for (int j = 1; j < a[0].length; j++) {
                minPws[i][j] = min(minPws[i-1][j], minPws[i][j-1]) + pws(a[i][j], p);
            }
        }
        return minPws[a.length-1][a[0].length-1];
    }

    private int pws(int n, int p) {
        // Only when n > 0 (n == 0 would loop forever)
        int pws = 0;
        while (n % p == 0) {
            pws++;
            n /= p;
        }
        return pws;
    }

    private int min(int a, int b) {
        return (a < b) ? a : b;
    }
}

SUM exactly using K elements solution

Problem: Given an array of N numbers, find a subset of size M (exactly M elements) whose sum equals SUM.
I am looking for a Dynamic Programming (DP) solution for this problem; basically, I want to understand the table-filling approach. I wrote the program below but didn't add memoization, as I am still wondering how to do that.
#include <stdio.h>
#define SIZE(a) (sizeof(a) / sizeof((a)[0]))
int binary[100];
int a[] = {1, 2, 5, 5, 100};
void show(int* p, int size) {
    int j;
    for (j = 0; j < size; j++)
        if (p[j])
            printf("%d\n", a[j]);
}
void subset_sum(int target, int i, int sum, int *a, int size, int K) {
    if (sum == target && !K) {
        show(binary, size);
    } else if (sum < target && i < size) {
        binary[i] = 1;
        subset_sum(target, i + 1, sum + a[i], a, size, K - 1);  // take a[i]
        binary[i] = 0;
        subset_sum(target, i + 1, sum, a, size, K);             // skip a[i]
    }
}
int main() {
    int target = 10;
    int K = 2;
    subset_sum(target, 0, 0, a, SIZE(a), K);
    return 0;
}
Does the recurrence below make sense?
Let DP[i][j][k] be true if a sum of exactly i can be formed by picking exactly k elements from a[0..j].
DP[i][j][k] = DP[i][j-1][k] || DP[i-a[j]][j-1][k-1] { input array a[0....j] }
Base cases are:
DP[0][0][0] = DP[0][j][0] = DP[0][0][k] = 1
DP[i][0][0] = DP[i][j][0] = 0
It means we can either include the current element (DP[i-a[j]][j-1][k-1]) or not include it (DP[i][j-1][k]). If we include the current element, k is reduced by 1, which reduces the number of elements that still need to be picked; if the current element is not included, k is not reduced.
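The recurrence can be filled bottom-up as a table. Below is a minimal C++ sketch under my own assumptions: non-negative elements, the hypothetical name subsetOfSizeK, and indexing by "the first j elements" (j = 0..n) rather than a[0..j], so the only true base case is can[0][0][0] (no elements, sum 0). It also walks the finished table backwards to recover one subset.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// can[j][k][s]: some subset of exactly k of the first j elements sums to s.
// Returns {found, one such subset}. Assumes a[i] >= 0 and target >= 0.
std::pair<bool, std::vector<int>> subsetOfSizeK(const std::vector<int>& a, int K, int target) {
    int n = static_cast<int>(a.size());
    std::vector<std::vector<std::vector<char>>> can(
        n + 1, std::vector<std::vector<char>>(K + 1, std::vector<char>(target + 1, 0)));
    can[0][0][0] = 1;  // the empty subset: zero elements, sum zero
    for (int j = 1; j <= n; ++j)
        for (int k = 0; k <= K; ++k)
            for (int s = 0; s <= target; ++s) {
                int v = a[j - 1];
                bool skip = can[j - 1][k][s];                             // DP[s][j-1][k]
                bool take = k > 0 && s >= v && can[j - 1][k - 1][s - v];  // DP[s-a[j]][j-1][k-1]
                can[j][k][s] = skip || take;
            }
    std::vector<int> picked;
    if (!can[n][K][target]) return {false, picked};
    // Walk back: if the first j-1 elements already suffice, skip element j-1;
    // otherwise it must have been taken.
    for (int j = n, k = K, s = target; j > 0; --j) {
        if (can[j - 1][k][s]) continue;
        picked.push_back(a[j - 1]);
        --k;
        s -= a[j - 1];
    }
    return {true, picked};
}
```

For the array {1, 2, 5, 5, 100} with K = 2 and target 10 it should find {5, 5}; with K = 3 there is no solution (the closest triples sum to 8, 11 and 12). The table has (n + 1)(K + 1)(target + 1) entries, so it only pays off when the target sum is moderate.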
Your solution looks right to me.
Right now, you're basically backtracking over all possibilities and printing each solution. If you only want one solution, you could add a flag that you set when one solution was found and check before continuing with recursive calls.
For memoization, you should first get rid of the binary array, after which you can do something like this:
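#include <stdbool.h>
#include <string.h>
// NUM_ELEMENTS, MAX_SUM and MAX_K are bounds you choose at compile time.
// -1 means "not computed yet": call memset(memo, -1, sizeof memo) before the first call.
// The memo is only valid for one target value; assumes non-negative elements.
int memo[NUM_ELEMENTS][MAX_SUM + 1][MAX_K + 1];
bool subset_sum(int target, int i, int sum, int *a, int size, int K) {
    if (sum == target && K == 0)
        return true;
    if (sum > target || i >= size || K <= 0)
        return false;  // overshot the target, ran out of elements, or no picks left
    if (memo[i][sum][K] != -1)
        return memo[i][sum][K];
    memo[i][sum][K] = subset_sum(target, i + 1, sum + a[i], a, size, K - 1) ||
                      subset_sum(target, i + 1, sum, a, size, K);
    return memo[i][sum][K];
}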
Then subset_sum(target, 0, 0, a, size, K) tells you whether at least one solution exists. To recover one using just the memo array, walk forward from the state (i, sum, K) = (0, 0, K): if taking a[i] still leads to a solution (the call for (i + 1, sum + a[i], K - 1) returns true, answered from the memo or the base case), record a[i] and move to that state; otherwise skip a[i] and move to (i + 1, sum, K). Stop when sum == target and K == 0.
To my understanding, if only the feasibility of the input has to be checked, the problem can be solved with a two-dimensional state space
bool[][] IsFeasible = new bool[n+1][k+1]
where IsFeasible[i][j] is true if and only if there is a subset of the elements 1 to i which sums to exactly j (here k is the target sum), for every
1 <= i <= n
1 <= j <= k
and for this state space, the recurrence relation
IsFeasible[i][j] = IsFeasible[i-1][j-a[i]] || IsFeasible[i-1][j]
can be used, where the left-hand side of the or-operator || corresponds to selecting the i-th item and the right-hand side corresponds to not selecting the i-th item. The actual choice of items could be obtained by backtracking or from auxiliary information saved during evaluation.
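A minimal C++ sketch of this two-dimensional table (subsetSumFeasible is an illustrative name; elements are assumed non-negative and stored 0-based, so item i is a[i - 1]). Note that without an element-count index it answers "does any subset sum to the target?", not "do exactly K elements sum to the target?".

```cpp
#include <cassert>
#include <vector>

// feasible[i][j]: some subset of the first i elements sums to exactly j.
// No element-count index, so the number of chosen items is unconstrained.
bool subsetSumFeasible(const std::vector<int>& a, int target) {
    int n = static_cast<int>(a.size());
    std::vector<std::vector<char>> feasible(n + 1, std::vector<char>(target + 1, 0));
    feasible[0][0] = 1;  // the empty subset sums to 0
    for (int i = 1; i <= n; ++i)
        for (int j = 0; j <= target; ++j)
            feasible[i][j] = feasible[i - 1][j] ||                              // skip item i
                             (j >= a[i - 1] && feasible[i - 1][j - a[i - 1]]);  // take item i
    return feasible[n][target];
}
```

For {1, 2, 5, 5, 100}, a target of 8 is feasible here only via three elements (1 + 2 + 5), which is exactly the case a fixed-size constraint would reject.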

Lexicographically smallest path in an N*M grid

I came across this in a recent interview.
We are given an N*M grid of numbers, and a path in the grid is the sequence of cells you traverse. The constraint is that we can only move right or down. Given this grid, we need to find the lexicographically smallest path, after sorting its values, from the top-left to the bottom-right cell of the grid.
E.g. if the grid is 2*2
4 3
5 1
then the lexicographically smallest path as per the question is "1 3 4".
How to do such problem? Code is appreciated. Thanks in advance.
You can use Dynamic programming to solve this problem. Let f(i, j) be the smallest lexicographical path (after sorting the path) from (i, j) to (N, M) moving only right and down. Consider the following recurrence:
f(i, j) = sort( a(i, j) + smallest(f(i + 1, j), f(i, j + 1)))
where a(i, j) is the value in the grid at (i, j), smallest(x, y) returns the lexicographically smaller of the strings x and y, + concatenates two strings, and sort(str) sorts the string str in lexical order.
The base case of the recurrence is:
f(N, M) = a(N, M)
Also, the recurrence changes when i = N or j = M (make sure that you see why).
Consider the following code written in C++:
#include <algorithm>
#include <string>
using namespace std;
//-- the 200 is just the array size. It can be modified
string a[200][200]; //-- represents the input grid (assumes one character per cell, e.g. single digits)
string f[200][200]; //-- represents the array used for memoization
bool calculated[200][200]; //-- false if we have not calculated the value before, and true if we have
int N = 199, M = 199; //-- number of rows, number of columns
//-- sort the string str and return it (takes a copy, so temporaries can be passed)
string srt(string str){
    sort(str.begin(), str.end());
    return str;
}
//-- return the smaller of x and y (both have the same length here)
string smallest(const string &x, const string &y){
    for (size_t i = 0; i < x.size(); i++){
        if (x[i] < y[i]) return x;
        if (x[i] > y[i]) return y;
    }
    return x;
}
string solve(int i, int j){
    if (i == N && j == M) return a[i][j]; //-- we have reached the bottom-right cell (I assumed the array is 1-indexed)
    if (calculated[i][j]) return f[i][j]; //-- we have calculated this before
    string ans;
    if (i == N) ans = srt(a[i][j] + solve(i, j + 1)); //-- we are on the bottom boundary
    else if (j == M) ans = srt(a[i][j] + solve(i + 1, j)); //-- we are on the right boundary
    else ans = srt(a[i][j] + smallest(solve(i, j + 1), solve(i + 1, j)));
    calculated[i][j] = true; //-- to fetch the calculated result in future calls
    f[i][j] = ans;
    return ans;
}
string calculateSmallestPath(){
    return solve(1, 1);
}
You can apply a dynamic programming approach to solve this problem in O(N * M * (N + M)) time and space complexity.
Below, N is the number of rows, M is the number of columns, and the top-left cell has coordinates (0, 0), row first and column second.
For each cell, store the lexicographically smallest path ending at this cell, in sorted order. The answer for row 0 and column 0 is trivial, because there is only one way to reach each of those cells. For the rest of the cells, choose the smaller of the paths stored for the top and left cells and insert the value of the current cell.
The algorithm is:
path[0][0] <- a[0][0]
path[i][0] <- insert(a[i][0], path[i - 1][0])
path[0][j] <- insert(a[0][j], path[0][j - 1])
path[i][j] <- insert(a[i][j], min(path[i - 1][j], path[i][j - 1]))
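A minimal C++ sketch of this forward DP, under my own assumptions: each cell's best path is kept as a sorted std::vector<int> (the Path alias and smallestSortedPath are illustrative names), so multi-digit values compare numerically, which the one-character-per-cell string version in the first answer does not do. std::min on two vectors uses their lexicographic operator<.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Path = std::vector<int>;  // values on a path, kept in ascending order

// Inserts v into the sorted vector p (taken by value) and returns the result.
Path insertSorted(Path p, int v) {
    p.insert(std::upper_bound(p.begin(), p.end(), v), v);
    return p;
}

// path[i][j] = insert(a[i][j], min(path[i-1][j], path[i][j-1])). Both
// candidates have length i + j, so min compares equally long sorted sequences.
Path smallestSortedPath(const std::vector<std::vector<int>>& a) {
    std::size_t n = a.size(), m = a[0].size();
    std::vector<std::vector<Path>> path(n, std::vector<Path>(m));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < m; ++j) {
            if (i == 0 && j == 0) path[i][j] = Path{a[0][0]};
            else if (i == 0) path[i][j] = insertSorted(path[i][j - 1], a[i][j]);
            else if (j == 0) path[i][j] = insertSorted(path[i - 1][j], a[i][j]);
            else path[i][j] = insertSorted(std::min(path[i - 1][j], path[i][j - 1]), a[i][j]);
        }
    return path[n - 1][m - 1];
}
```

For the 2*2 example it should return {1, 3, 4}. Copying a vector of length i + j per cell is what gives the O(N * M * (N + M)) bound.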
If no number is repeated, this can be achieved in O (NM log (NM)) as well.
Intuition:
Suppose I label the subgrid with upper-left corner (a,b) and bottom-right corner (c,d) as G(a,b,c,d). Since you have to obtain the lexicographically smallest sequence AFTER sorting the path, the aim should be to include the minimum value of G every time. If this minimum is attained at, let's say, (i,j), then the bottom-left block G(i+1,b,c,j-1) and the top-right block G(a,j+1,i-1,d) are rendered useless for the search for our next min (for the path). That is to say, the values of the path we want can never be in these two blocks. Proof? Since we can only move right or down, any cell within these blocks, if traversed, will not let us reach the minimum value of G(a,b,c,d) (the one at (i,j)). And if we avoid (i,j), the path we build cannot be lexicographically smallest.
So, first we find the min of G(1,1,m,n). Suppose it's at (i,j). Mark it. We then find the min in G(1,1,i,j) and in G(i,j,m,n) and do the same for them. Keep going this way until we have m+n-1 marked entries, which constitute our path. Then traverse the original grid G(1,1,m,n) linearly and report each value that is marked.
Approach:
Finding the min of G every time is costly. What if we map each value in the grid to its location? Traverse the grid and maintain a dictionary Dict with the key being the value at (i,j) and the value being the tuple (i,j). At the end, you'll have a list of key-value pairs covering all the values in the grid.
Now, we'll be maintaining a list of valid grids in which we will find candidates for our path. The first valid grid will be G(1,1,m,n).
Sort the keys and start iterating from the first value in the sorted key set S.
Maintain a tree of valid grids, T(G), such that for each G(a,b,c,d) in T, G.left = G(a,b,i,j) and G.right = G(i,j,c,d) where (i,j) = location of min val in G(a,b,c,d)
The algorithm now:
for each val in sorted key set S do
    (i,j) <- Dict(val)
    Grid G <- Root(T)
    while (i,j) in G do
        if G has no child then
            G.left <- G(a,b,i,j)
            G.right <- G(i,j,c,d)
            break
        else if (i,j) in G.left then
            G <- G.left
        else if (i,j) in G.right then
            G <- G.right
        else
            Dict(val) <- null
            break
        end if
    end while
end for
for each val in G(1,1,m,n)
    if Dict(val) not null
        solution.append(val)
    end if
end for
return solution
The Java code:
import java.awt.Point;
import java.util.HashMap;
import java.util.SortedSet;
import java.util.TreeSet;

class Grid{
    int a, b, c, d;
    Grid left, right;
    Grid(int a, int b, int c, int d){
        this.a = a;
        this.b = b;
        this.c = c;
        this.d = d;
        left = right = null;
    }
    public boolean isInGrid(int e, int f){
        return (e >= a && e <= c && f >= b && f <= d);
    }
    public boolean hasNoChild(){
        return (left == null && right == null);
    }
}

// Enclosing class (the name is hypothetical) so that the static method compiles.
class SmallestPath{
    public static int[] findPath(int[][] arr){
        int row = arr.length;
        int col = arr[0].length;
        // Map each value to its location; assumes all values are distinct.
        HashMap<Integer,Point> map = new HashMap<Integer,Point>();
        for(int i = 0; i < row; i++){
            for(int j = 0; j < col; j++){
                map.put(arr[i][j], new Point(i,j));
            }
        }
        Grid root = new Grid(0,0,row-1,col-1);
        SortedSet<Integer> keys = new TreeSet<Integer>(map.keySet());
        for(Integer entry : keys){
            Grid temp = root;
            int x = map.get(entry).x, y = map.get(entry).y;
            while(temp.isInGrid(x, y)){
                if(temp.hasNoChild()){
                    // (x, y) is the minimum of temp: split it into the two reachable subgrids.
                    temp.left = new Grid(temp.a,temp.b,x, y);
                    temp.right = new Grid(x, y,temp.c,temp.d);
                    break;
                }
                if(temp.left.isInGrid(x, y)){
                    temp = temp.left;
                }
                else if(temp.right.isInGrid(x, y)){
                    temp = temp.right;
                }
                else{
                    // Unreachable given the cells already chosen: mark as excluded.
                    map.get(entry).x = -1;
                    break;
                }
            }
        }
        int[] solution = new int[row+col-1];
        int count = 0;
        for(int i = 0 ; i < row; i++){
            for(int j = 0; j < col; j++){
                if(map.get(arr[i][j]).x >= 0){
                    solution[count++] = arr[i][j];
                }
            }
        }
        return solution;
    }
}
The space complexity is dominated by the dictionary, O(NM), and the tree, O(N+M). Overall: O(NM).
The time complexity for filling up and then sorting the dictionary is O(NM log(NM)). Checking the tree for each of the NM values costs time proportional to the tree depth, which is O(log(N+M)) when the tree stays balanced but can reach O(N+M) in the worst case. Overall: O(NM log(NM)) in the balanced case.
Of course, this won't work if values are repeated, since then we'd have more than one (i,j) for a single value in the grid, and the greedy approach can no longer decide which one to choose.
Additional FYI: A similar problem I heard about earlier had an additional grid property: no values repeat and the numbers are from 1 to NM. In that case, the complexity could be reduced further to O(NM log(N+M)), since instead of a dictionary you can simply use the values in the grid as indices into an array (which doesn't require sorting).

Adding sum of frequencies while solving Optimal Binary Search Tree

I am referring to THIS problem and solution.
Firstly, I did not get why the sum of frequencies is added in the recursive equation.
Can someone please help me understand that, maybe with an example?
In the author's words:
We add sum of frequencies from i to j (see first term in the above
formula), this is added because every search will go through root and
one comparison will be done for every search.
In the code, the sum of frequencies (the purpose of which I do not understand) corresponds to fsum.
#include <limits.h>

// Sum of freq[i..j]. The referenced solution relies on a helper like this;
// this is a simple version so the snippet compiles on its own.
int sum(int freq[], int i, int j)
{
    int s = 0;
    for (int k = i; k <= j; k++)
        s += freq[k];
    return s;
}

int optCost(int freq[], int i, int j)
{
    // Base cases
    if (j < i)      // If there are no elements in this subarray
        return 0;
    if (j == i)     // If there is one element in this subarray
        return freq[i];
    // Get sum of freq[i], freq[i+1], ... freq[j]
    int fsum = sum(freq, i, j);
    // Initialize minimum value
    int min = INT_MAX;
    // One by one consider all elements as root and recursively find cost
    // of the BST, compare the cost with min and update min if needed
    for (int r = i; r <= j; ++r)
    {
        int cost = optCost(freq, i, r-1) + optCost(freq, r+1, j);
        if (cost < min)
            min = cost;
    }
    // Return minimum value
    return min + fsum;
}
Second, this solution only returns the optimal cost. Any suggestions on how to get the actual BST?
Why we need sum of frequencies
The sum of frequencies is there to calculate the cost of a particular tree correctly. It acts as an accumulator that stores the tree's weight.
Imagine that at the first level of recursion we start with all keys located on the first level of the tree (we haven't picked any root element yet). Remember the weight function: it sums all node weights multiplied by the node's level. At this point the weight of our tree equals the sum of the weights of all keys, because whichever level a key ends up on (the first or deeper), each key contributes its weight at least once to the result.
1) Suppose we found the optimal root key, say key r. Next we move all keys except r one level down, because each remaining key can at best be located on the second level (the first level is already occupied). Because of that we add the weight of each remaining key to our sum once more, since each of them will contribute at least double its weight. We split the remaining keys into two subarrays around the element r we selected (the keys to the left of r and the keys to the right).
2) The next step is to select the optimal keys for the second level, one from each of the two subarrays left from the first step. After doing that we again move all remaining keys one level down and add their weights to the sum, because they will be located at least on the third level, so each of them contributes at least triple its weight.
3) And so on.
I hope this explanation gives you some understanding of why we need the sum of frequencies.
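To make the accumulator argument concrete, here is a small C++ check. It is a sketch: the `ShapeNode` type and the function names are made up for illustration. It computes the cost of one fixed tree shape twice: once by the definition (sum of freq * level) and once the way optCost accumulates it (each subtree adds its whole frequency sum once, then recurses into its children).

```cpp
#include <cassert>
#include <vector>

// Hypothetical node type for this illustration: `key` indexes freq[], and
// left/right are positions of the children in the same vector (-1 = none).
struct ShapeNode {
    int key;
    int left;
    int right;
};

// Cost by definition: sum over all nodes of freq * level, root at level 1.
long long levelWeightedCost(const std::vector<ShapeNode>& t, const std::vector<int>& freq,
                            int node, int level) {
    assert(level >= 1);
    if (node < 0) return 0;
    return 1LL * freq[t[node].key] * level
         + levelWeightedCost(t, freq, t[node].left, level + 1)
         + levelWeightedCost(t, freq, t[node].right, level + 1);
}

// Sum of freq over all keys in the subtree rooted at `node` (the fsum term).
long long subtreeFreq(const std::vector<ShapeNode>& t, const std::vector<int>& freq, int node) {
    if (node < 0) return 0;
    return freq[t[node].key] + subtreeFreq(t, freq, t[node].left)
         + subtreeFreq(t, freq, t[node].right);
}

// Cost as optCost accumulates it: every subtree adds its whole frequency sum
// once (each search passing through its root pays one comparison there),
// then the same is done recursively for both children.
long long accumulatedCost(const std::vector<ShapeNode>& t, const std::vector<int>& freq, int node) {
    if (node < 0) return 0;
    return subtreeFreq(t, freq, node)
         + accumulatedCost(t, freq, t[node].left)
         + accumulatedCost(t, freq, t[node].right);
}
```

With the frequencies {34, 8, 50} for keys {10, 12, 20} and the shape t = {{2, 1, -1}, {0, -1, 2}, {1, -1, -1}} (20 at the root, 10 as its left child, 12 as 10's right child), the definition gives 50*1 + 34*2 + 8*3 = 142, and the accumulated form gives 92 + 42 + 8 = 142. For these frequencies this shape is also the cheapest of the five possible BSTs.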
Finding the optimal BST
As the author mentioned at the end of the article:
2) In the above solutions, we have computed optimal cost only. The
solutions can be easily modified to store the structure of BSTs also.
We can create another auxiliary array of size n to store the structure
of tree. All we need to do is, store the chosen ‘r’ in the innermost
loop.
We can do just that. Below you will find my implementation.
Some notes about it:
1) I replaced int[n][n] with a utility class Matrix because I used Visual C++, which does not support array sizes that are not compile-time constants.
2) I used the second implementation of the algorithm from the article you linked (the bottom-up dynamic programming version that stores subproblem results in a table), because it is much easier to extend it to store the optimal BST.
3) The author has a mistake in his code: the second loop, for (int i=0; i<=n-L+1; i++), should have n-L as its upper bound, not n-L+1.
4) The optimal BST is stored as follows:
For each pair i, j we store the optimal key index. This works the same way as for the optimal cost, but instead of the optimal cost we store the index of the optimal root key. For example, for 0, n-1 we store the index r of the root key of the result tree. Next we split the array in two around the root index r and get the optimal key indexes of the two halves by accessing matrix cells (0, r-1) and (r+1, n-1). And so forth. The utility function 'PrintResultTree' uses this approach and prints the result tree in order (left subtree, node, right subtree), so you get a sorted list, because it is a binary search tree.
5) Please don't flame me for my code - I'm not really a C++ programmer. :)
int optimalSearchTree(int keys[], int freq[], int n, Matrix& optimalKeyIndexes)
{
/* Create an auxiliary 2D matrix to store results of subproblems */
Matrix cost(n,n);
optimalKeyIndexes = Matrix(n, n);
/* cost[i][j] = Optimal cost of binary search tree that can be
formed from keys[i] to keys[j].
cost[0][n-1] will store the resultant cost */
// For a single key, cost is equal to frequency of the key
for (int i = 0; i < n; i++)
cost.SetCell(i, i, freq[i]);
// Now we need to consider chains of length 2, 3, ... .
// L is chain length.
for (int L = 2; L <= n; L++)
{
// i is row number in cost[][]
for (int i = 0; i <= n - L; i++)
{
// Get column number j from row number i and chain length L
int j = i + L - 1;
cost.SetCell(i, j, INT_MAX);
// Try making all keys in interval keys[i..j] as root
for (int r = i; r <= j; r++)
{
// c = cost when keys[r] becomes root of this subtree
int c = ((r > i) ? cost.GetCell(i, r - 1) : 0) +
((r < j) ? cost.GetCell(r + 1, j) : 0) +
sum(freq, i, j);
if (c < cost.GetCell(i, j))
{
cost.SetCell(i, j, c);
optimalKeyIndexes.SetCell(i, j, r);
}
}
}
}
return cost.GetCell(0, n - 1);
}
Below is utility class Matrix:
#include <vector>
class Matrix
{
private:
int rowCount;
int columnCount;
std::vector<int> cells;
public:
// Empty matrix; assign a sized Matrix before use
Matrix()
: rowCount(0), columnCount(0)
{
}
Matrix(int rows, int columns)
{
rowCount = rows;
columnCount = columns;
cells = std::vector<int>(rows * columns);
}
int GetCell(int rowNum, int columnNum)
{
return cells[columnNum + rowNum * columnCount];
}
void SetCell(int rowNum, int columnNum, int value)
{
cells[columnNum + rowNum * columnCount] = value;
}
};
And here is the main function, with a utility function that prints the result tree in order:
//Print result tree in in-order
void PrintResultTree(
Matrix& optimalKeyIndexes,
int startIndex,
int endIndex,
int* keys)
{
if (startIndex == endIndex)
{
printf("%d\n", keys[startIndex]);
return;
}
else if (startIndex > endIndex)
{
return;
}
int currentOptimalKeyIndex = optimalKeyIndexes.GetCell(startIndex, endIndex);
PrintResultTree(optimalKeyIndexes, startIndex, currentOptimalKeyIndex - 1, keys);
printf("%d\n", keys[currentOptimalKeyIndex]);
PrintResultTree(optimalKeyIndexes, currentOptimalKeyIndex + 1, endIndex, keys);
}
int main(int argc, char* argv[])
{
int keys[] = { 10, 12, 20 };
int freq[] = { 34, 8, 50 };
int n = sizeof(keys) / sizeof(keys[0]);
Matrix optimalKeyIndexes;
printf("Cost of Optimal BST is %d \n", optimalSearchTree(keys, freq, n, optimalKeyIndexes));
PrintResultTree(optimalKeyIndexes, 0, n - 1, keys);
return 0;
}
EDIT:
Below you can find code that builds a simple tree-like structure.
Here is the utility TreeNode class:
struct TreeNode
{
public:
int Key;
TreeNode* Left;
TreeNode* Right;
};
Here is the BuildResultTree function, followed by the updated main function:
void BuildResultTree(Matrix& optimalKeyIndexes,
int startIndex,
int endIndex,
int* keys,
TreeNode*& tree)
{
if (startIndex > endIndex)
{
return;
}
tree = new TreeNode();
tree->Left = NULL;
tree->Right = NULL;
if (startIndex == endIndex)
{
tree->Key = keys[startIndex];
return;
}
int currentOptimalKeyIndex = optimalKeyIndexes.GetCell(startIndex, endIndex);
tree->Key = keys[currentOptimalKeyIndex];
BuildResultTree(optimalKeyIndexes, startIndex, currentOptimalKeyIndex - 1, keys, tree->Left);
BuildResultTree(optimalKeyIndexes, currentOptimalKeyIndex + 1, endIndex, keys, tree->Right);
}
int main(int argc, char* argv[])
{
int keys[] = { 10, 12, 20 };
int freq[] = { 34, 8, 50 };
int n = sizeof(keys) / sizeof(keys[0]);
Matrix optimalKeyIndexes;
printf("Cost of Optimal BST is %d \n", optimalSearchTree(keys, freq, n, optimalKeyIndexes));
PrintResultTree(optimalKeyIndexes, 0, n - 1, keys);
TreeNode* tree = NULL; // BuildResultTree allocates the nodes itself
BuildResultTree(optimalKeyIndexes, 0, n - 1, keys, tree);
return 0;
}
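For comparison, here is a compact variant of the same bottom-up DP as a C++ sketch, using `std::vector` tables instead of the Matrix class (the names `OptimalBst`, `buildOptimalBst` and `preorder` are my own). It also uses prefix sums, so the frequency sum for keys[i..j] costs O(1) instead of being recomputed for every candidate root. That brings the loops down to O(n^3) time, whereas calling sum(freq, i, j) inside the innermost loop makes them O(n^4).

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Sketch: same recurrence as optimalSearchTree, with prefix sums for fsum.
// root[i][j] holds the index r chosen for keys[i..j] (like optimalKeyIndexes).
struct OptimalBst {
    long long cost;                      // optimal cost for keys[0..n-1]
    std::vector<std::vector<int>> root;  // root[i][j] for 0 <= i <= j < n
};

OptimalBst buildOptimalBst(const std::vector<int>& freq) {
    const int n = static_cast<int>(freq.size());
    assert(n > 0);
    std::vector<long long> prefix(n + 1, 0);  // prefix[k] = freq[0] + ... + freq[k-1]
    for (int k = 0; k < n; ++k) prefix[k + 1] = prefix[k] + freq[k];

    std::vector<std::vector<long long>> cost(n, std::vector<long long>(n, 0));
    std::vector<std::vector<int>> root(n, std::vector<int>(n, -1));
    for (int i = 0; i < n; ++i) { cost[i][i] = freq[i]; root[i][i] = i; }

    for (int len = 2; len <= n; ++len) {
        for (int i = 0; i <= n - len; ++i) {  // upper bound n - len, as noted above
            const int j = i + len - 1;
            const long long fsum = prefix[j + 1] - prefix[i];
            cost[i][j] = LLONG_MAX;
            for (int r = i; r <= j; ++r) {
                const long long c = (r > i ? cost[i][r - 1] : 0)
                                  + (r < j ? cost[r + 1][j] : 0) + fsum;
                if (c < cost[i][j]) { cost[i][j] = c; root[i][j] = r; }
            }
        }
    }
    return OptimalBst{cost[0][n - 1], root};
}

// Collects keys in preorder (root, left, right) by following root[][],
// the same walk BuildResultTree performs.
void preorder(const OptimalBst& bst, const std::vector<int>& keys, int i, int j,
              std::vector<int>& out) {
    if (i > j) return;
    const int r = bst.root[i][j];
    out.push_back(keys[r]);
    preorder(bst, keys, i, r - 1, out);
    preorder(bst, keys, r + 1, j, out);
}
```

For keys {10, 12, 20} with frequencies {34, 8, 50}, this gives cost 142, root index 2 (key 20), and the preorder 20, 10, 12.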

Convert array to a sorted one using only two operations

I found this question on an online forum and am really interested in how it can be solved:
Given an array A of positive integers, convert it to a sorted array with minimum cost. The only valid operations are:
1) Decrement (by 1), with cost = 1
2) Delete an element completely from the array, with cost = value of the element
This is an interview question asked at a tech company.
NOTE : The original answer has been replaced with one in which I have a lot more confidence (and I can explain it, too). Both answers produced the same results on my set of test cases.
You can solve this problem using a dynamic programming approach. The key observation is that it never makes sense to decrement a number to a value not found in the original array. (Informal proof: suppose that you decremented a number O1 to a value X that is not in the original sequence in order to avoid removing a number O2 > X from the result sequence. Then you can decrement O1 to O2 instead, and reduce the cost by O2-X).
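A brute-force reference is handy for checking the DP on small inputs, and it also illustrates the observation. The C++ sketch below (`bruteForceCost` is a hypothetical name; it is exponential, so only for tiny arrays) tries every subset of elements to keep. For a fixed kept subset it scans right to left with a ceiling: an element above the ceiling is decremented down to it, otherwise it becomes the new ceiling. Every decremented element therefore ends up equal to some original value.

```cpp
#include <cassert>
#include <vector>

// Brute force over all 2^n choices of which elements to keep.
long long bruteForceCost(const std::vector<int>& a) {
    const int n = static_cast<int>(a.size());
    assert(n < 25);  // keep 2^n manageable
    long long best = -1;
    for (unsigned long mask = 0; mask < (1UL << n); ++mask) {
        long long cost = 0;
        long long ceiling = -1;  // -1 means "no kept element to the right yet"
        for (int i = n - 1; i >= 0; --i) {
            if (!(mask & (1UL << i))) {  // a[i] is deleted
                cost += a[i];
                continue;
            }
            if (ceiling >= 0 && a[i] > ceiling)
                cost += a[i] - ceiling;  // decrement a[i] down to the ceiling
            else
                ceiling = a[i];  // a[i] stays as is and becomes the new ceiling
        }
        if (best < 0 || cost < best) best = cost;
    }
    return best;
}
```

On 10 1 2 3 4 5 this returns 9 (decrement 10 to 1), and on 9 8 7 6 5 4 3 2 1 it returns 20 (keep 9..5 trimmed to 5, delete 4..1).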
Now the solution becomes easy to understand: it is a DP in two dimensions. If we sort the distinct elements of the original sequence d into an array s, the length of d becomes the first dimension of the DP and the length of s becomes the second.
We declare dp[d.Length,s.Length]. The value of dp[i,j] is the minimum cost of solving the subproblem d[0 to i] while keeping the last element of the solution at most s[j]. Note: this cost includes the cost of eliminating d[i] if it is less than s[j].
The first row dp[0,j] is computed as the cost of trimming d[0] to s[j], or zero if d[0] < s[j]. Each value dp[i,j] in the following rows is the minimum of dp[i-1, 0 to j] + trim, where trim is the cost of trimming d[i] to s[j], or d[i] if it has to be eliminated because s[j] is bigger than d[i].
The answer is the minimum of the last row, dp[d.Length-1, 0 to s.Length-1].
Here is an implementation in C#:
using System;
using System.Linq;
static class SortCost {
static int Cost(int[] d) {
// s = distinct values of d in ascending order (the candidate trim levels)
var s = d.Distinct().OrderBy(v => v).ToArray();
var dp = new int[d.Length,s.Length];
for (var j = 0 ; j != s.Length ; j++) {
dp[0, j] = Math.Max(d[0] - s[j], 0);
}
for (var i = 1; i != d.Length; i++) {
for (var j = 0 ; j != s.Length ; j++) {
// Cost of trimming d[i] to s[j], or of deleting it if it is smaller than s[j]
var trim = d[i] - s[j];
if (trim < 0) {
trim = d[i];
}
// dp[i, j] = min over k <= j of dp[i - 1, k], plus trim
dp[i, j] = int.MaxValue;
for (var k = j ; k >= 0 ; k--) {
dp[i, j] = Math.Min(dp[i, j], dp[i - 1, k] + trim);
}
}
}
var best = int.MaxValue;
for (var j = 0 ; j != s.Length ; j++) {
best = Math.Min(best, dp[d.Length - 1, j]);
}
return best;
}
}
This direct implementation has O(N^2) space complexity and, because of the inner loop over k, O(N^3) time complexity. You can reduce the space to O(N) by observing that only the last two rows are used at any time.
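Both reductions fit in a short C++ port of the C# DP (a sketch under my own naming, not the answer's exact code). It keeps only the previous row, and since dp[i, j] only needs min(dp[i-1, 0..j]), it replaces the inner loop over k with a running prefix minimum. That gives O(N) space and O(N^2) time.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Same recurrence as the C# Cost method, with a rolling row and a prefix minimum.
long long sortCost(const std::vector<int>& d) {
    if (d.empty()) return 0;
    for (int v : d) assert(v > 0);  // positive integers, as in the problem statement
    std::vector<int> s(d.begin(), d.end());
    std::sort(s.begin(), s.end());
    s.erase(std::unique(s.begin(), s.end()), s.end());  // distinct values, ascending
    const size_t m = s.size();

    // Cost for d[i] at level s[j]: trim it down, or delete it if it is smaller.
    auto trim = [](int value, int level) -> long long {
        return value >= level ? value - level : value;
    };

    std::vector<long long> prev(m), cur(m);
    for (size_t j = 0; j < m; ++j)
        prev[j] = std::max(d[0] - s[j], 0);  // first row, as in the C# code
    for (size_t i = 1; i < d.size(); ++i) {
        long long best = LLONG_MAX;  // running min of prev[0..j]
        for (size_t j = 0; j < m; ++j) {
            best = std::min(best, prev[j]);
            cur[j] = best + trim(d[i], s[j]);
        }
        std::swap(prev, cur);
    }
    return *std::min_element(prev.begin(), prev.end());
}
```

On the example arrays used in the other answer below, this gives 9 for 10 1 2 3 4 5, 5 for 1 2 3 4 10 5, 1 for 5 6 7 8 1 10 and 5 for 2 1 1 4 2 4 4 3.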
I'm assuming that "sorted" means smallest values at the start of the array, given the nature of the allowed operations.
The performance boundary between the two operations occurs when the cost of removing an out of sequence element is equal to the cost of either decrementing all greater-valued elements up to and including the offender, or removing all lesser-valued elements after the offender. You choose between decrementing preceding elements or removing later elements based on why the offending element is out of sequence. If it's less than the previous element, consider decrementing the earlier elements; if it's greater than the next element, consider removing later elements.
Some examples:
10 1 2 3 4 5
Decrement 10 to 1, cost 9.
1 2 3 4 10 4
Remove 4, cost 4.
1 2 3 4 10 5
Remove 5 or decrement 10 to 5, cost 5.
5 6 7 8 1 10
Remove 1, cost 1.
5 6 7 8 6 10
Decrement 7 and 8 to 6, cost 3.
2 1 1 4 2 4 4 3
Decrement the first 2 by one, the first 4 by two, and the other two fours by one each, cost 5.
The simplest implementation to find the solutions relies on having set knowledge, so it's very inefficient. Thankfully, the question doesn't care about that. The idea is to walk the array, and make the decision whether to remove or decrement to fix the set when an out of sequence element is encountered. A much more efficient implementation of this would be to use running totals (as opposed to calculate methods) and walk the array twice, forwards and backwards. I've written a mock up of the simpler version, as I think it's easier to read.
Pseudocode, returns total cost:
if array.Length < 2 : return 0; // no sorting necessary
resultArray = array.Copy();
int cost = 0;
for i = 0 to array.Length - 1 :
if i > 0 and array[i-1] > array[i] :
if CostToDecrementPreviousItems(i, array[i]) > array[i] :
resultArray[i] = -1;
cost += array[i];
else :
cost += DecrementItemsThroughIndexGreaterThanValue(resultArray, i, array[i]);
end if
else if i < array.Length - 1 and array[i+1] < array[i] :
if CostToRemoveLaterItems(i, array[i]) > array[i] :
resultArray[i] = -1;
cost += array[i];
else :
cost += RemoveItemsAfterIndexGreaterThanValue(resultArray, i, array[i]);
end if
end if
end for
RemoveNegativeElements(resultArray);
array = resultArray;
return cost;
Hopefully the undefined method calls are self-explanatory.
1) Construct the decision graph and add a start vertex to it. Each vertex holds a "trim level", i.e. the value to which all array values to the left of the current node should be decremented. The start vertex's "trim level" is infinity. Each edge of the graph has a value corresponding to the cost of the decision.
2) For each array element, starting from the rightmost, do steps 3 .. 5.
3) For each leaf vertex, do steps 4 .. 5.
4) Create up to 2 outgoing edges: (1) with the cost of deleting the array element and (2) with the cost of trimming all elements to the left (exactly, the cost of decreasing the "trim level").
5) Connect these edges to newly created vertices, one vertex for each array element and each "trim level".
6) Find the shortest path from the start vertex to one of the vertices corresponding to the leftmost array element. The length of this path equals the cost of the solution.
7) Decrement and delete array elements according to the decision graph.
This algorithm may be treated as an optimization of the brute-force approach. For a brute-force search, starting from the rightmost array element, construct a binary decision tree. Each vertex has 2 outgoing edges, one for the "delete" decision and the other for the "trim" decision. A decision cost is associated with each edge, and a "trim level" with each vertex. The optimal solution is determined by the shortest path in this tree.
Remove every path that is obviously non-optimal. For example, if the largest element is the last in the array, the "trim" decision has cost zero and the "delete" decision is non-optimal, so delete the path starting from this "delete" decision. After this optimization the decision tree is sparser: some vertices have 2 outgoing edges, some only one.
On each depth level, the decision tree may have several vertices with the same "trim level". The subtrees starting from these vertices are identical to each other, which is a good reason to join all of them into one vertex. This transforms the tree into a graph with at most n^2/2 vertices.
Complexity
The simplest implementation of this algorithm is O(n^3), because for each of the O(n^2) vertices it computes the trimming cost iteratively, in O(n) time.
Repeated trimming cost calculations are not necessary if there is enough memory to store all partial trimming cost results. This may require O(n^2) or even O(n) space.
With this optimization, the algorithm is O(n^2). Due to the simple structure of the graph, the shortest path search has O(n^2) complexity, not O(n^2 log n).
C++11 implementation (both space and time complexity are O(n^2)):
//g++ -std=c++0x
#include <iostream>
#include <vector>
#include <algorithm>
typedef unsigned val_t;
typedef unsigned long long acc_t; // to avoid overflows
typedef unsigned ind_t;
typedef std::vector<val_t> arr_t;
struct Node
{
acc_t trimCost;
acc_t cost;
ind_t link;
bool used;
Node()
: trimCost(0)
, used(false)
{}
};
class Matrix
{
std::vector<Node> m;
ind_t columns;
public:
Matrix(ind_t rows, ind_t cols)
: m(rows * cols)
, columns(cols)
{}
Node& operator () (ind_t row, ind_t column)
{
return m[columns * row + column];
}
};
void fillTrimCosts(const arr_t& array, const arr_t& levels, Matrix& matrix)
{
for (ind_t row = 0; row != array.size(); ++row)
{
for (ind_t column = 0; column != levels.size(); ++column)
{
Node& node = matrix(row + 1, column);
node.trimCost = matrix(row, column).trimCost;
if (array[row] > levels[column])
{
node.trimCost += array[row] - levels[column];
}
}
}
}
void updateNode(Node& node, acc_t cost, ind_t column)
{
if (!node.used || node.cost > cost)
{
node.cost = cost;
node.link = column;
}
}
acc_t transform(arr_t& array)
{
const ind_t size = array.size();
// Sorted array of trim levels
arr_t levels = array;
std::sort(levels.begin(), levels.end());
levels.erase(
std::unique(levels.begin(), levels.end()),
levels.end());
// Initialize matrix
Matrix matrix(size + 1, levels.size());
fillTrimCosts(array, levels, matrix);
Node& startNode = matrix(size, levels.size() - 1);
startNode.used = true;
startNode.cost = 0;
// For each array element, starting from the last one
for (ind_t row = size; row != 0; --row)
{
// Determine trim level for this array element
auto iter = std::lower_bound(levels.begin(), levels.end(), array[row - 1]);
const ind_t newLevel = iter - levels.begin();
// For each trim level
for (ind_t column = 0; column != levels.size(); ++column)
{
const Node& node = matrix(row, column);
if (!node.used)
continue;
// Determine cost of trimming to current array element's level
const acc_t oldCost = node.trimCost;
const acc_t newCost = matrix(row, newLevel).trimCost;
const acc_t trimCost = (newCost > oldCost)? newCost - oldCost: 0;
// Nodes for "trim" and "delete" decisions
Node& trimNode = matrix(row - 1, newLevel);
Node& nextNode = matrix(row - 1, column);
if (trimCost)
{
// Decision needed, update both nodes
updateNode(trimNode, trimCost + node.cost, column);
updateNode(nextNode, array[row - 1] + node.cost, column);
trimNode.used = true;
}
else
{
// No decision needed, pass current state to the next row's node
updateNode(nextNode, node.cost, column);
}
nextNode.used = true;
}
}
// Find optimal cost and starting trim level for it
acc_t bestCost = static_cast<acc_t>(-1); // largest acc_t value, so any reachable cost is smaller
ind_t bestLevel = levels.size();
for (ind_t column = 0; column != levels.size(); ++column)
{
const Node& node = matrix(0, column);
if (node.used && node.cost < bestCost)
{
bestCost = node.cost;
bestLevel = column;
}
}
// Trace the path of minimum cost
for (ind_t row = 0; row != size; ++row)
{
const Node& node = matrix(row, bestLevel);
const ind_t next = node.link;
if (next == bestLevel && node.cost != matrix(row + 1, next).cost)
{
array[row] = 0;
}
else if (array[row] > levels[bestLevel])
{
array[row] = levels[bestLevel];
}
bestLevel = next;
}
return bestCost;
}
void printArray(const arr_t& array)
{
for (val_t val: array)
if (val)
std::cout << val << ' ';
else
std::cout << "* ";
std::cout << std::endl;
}
int main()
{
arr_t array({9,8,7,6,5,4,3,2,1});
printArray(array);
acc_t cost = transform(array);
printArray(array);
std::cout << "Cost=" << cost << std::endl;
return 0;
}

Resources