I came across this in a recent interview.
We are given an N*M grid of numbers; a path in the grid is the sequence of cells you traverse. We can only move right or down in the grid. Given this grid, we need to find the lexicographically smallest path, after sorting its values, from the top-left to the bottom-right cell of the grid.
Eg. if grid is 2*2
4 3
5 1
then the lexicographically smallest path as per the question is "1 3 4".
How can I solve such a problem? Code is appreciated. Thanks in advance.
You can use Dynamic programming to solve this problem. Let f(i, j) be the smallest lexicographical path (after sorting the path) from (i, j) to (N, M) moving only right and down. Consider the following recurrence:
f(i, j) = sort( a(i, j) + smallest(f(i + 1, j), f(i, j + 1)))
where a(i, j) is the value in the grid at (i, j), smallest(x, y) returns the lexicographically smaller of the strings x and y, + concatenates two strings, and sort(str) sorts the string str in lexical order.
The base case of the recurrence is:
f(N, M) = a(N, M)
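The recurrence also changes when i = N or j = M: on the bottom row only the move to the right is available, and in the rightmost column only the move down (make sure that you see that).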
Consider the following code written in C++:
#include <algorithm>
#include <string>
using namespace std;

//-- the 200 is just the array size. It can be modified
string a[200][200]; //-- the input grid (1-indexed)
string f[200][200]; //-- the array used for memoization
bool calculated[200][200]; //-- true once f[i][j] has been calculated
int N = 199, M = 199; //-- number of rows, number of columns

//-- sort the string str and return it (taken by value so temporaries can be passed)
string srt(string str){
    sort(str.begin(), str.end());
    return str;
}

//-- return the lexicographically smaller of x and y (x on ties)
string smallest(const string &x, const string &y){
    return (y < x) ? y : x;
}

string solve(int i, int j){
    if (i == N && j == M) return a[i][j]; //-- we have reached the bottom-right cell (the array is assumed to be 1-indexed)
    if (calculated[i][j]) return f[i][j]; //-- we have calculated this before
    string ans;
    if (i == N) ans = srt(a[i][j] + solve(i, j + 1)); //-- we are on the bottom boundary
    else if (j == M) ans = srt(a[i][j] + solve(i + 1, j)); //-- we are on the right boundary
    else ans = srt(a[i][j] + smallest(solve(i, j + 1), solve(i + 1, j)));
    calculated[i][j] = true; //-- so that future calls fetch the stored result
    f[i][j] = ans;
    return ans;
}

string calculateSmallestPath(){
    return solve(1, 1);
}
You can apply a dynamic programming approach to solve this problem in O(N * M * (N + M)) time and space complexity.
Below I'll assume that N is the number of rows, M is the number of columns, and the top-left cell has coordinates (0, 0), row first and column second.
For each cell, store the lexicographically smallest path ending at this cell, in sorted order. The answer for row 0 and column 0 is trivial, because there is only one way to reach each of these cells. For every other cell, choose the smaller of the paths stored for the top and left neighbours and insert the value of the current cell.
The algorithm is:
path[0][0] <- a[0][0]
path[i][0] <- insert(a[i][0], path[i - 1][0])
path[0][j] <- insert(a[0][j], path[0][j - 1])
path[i][j] <- insert(a[i][j], min(path[i - 1][j], path[i][j - 1]))
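The four rules above can be sketched in C++ as follows. This is only a minimal illustration: the function name smallestSortedPath and the choice of std::vector<int> for a stored path are my own assumptions, and the grid is assumed to be non-empty and rectangular.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns the sorted values of the lexicographically
// smallest (after sorting) right/down path from the top-left to the
// bottom-right cell. Assumes a non-empty rectangular grid.
std::vector<int> smallestSortedPath(const std::vector<std::vector<int>>& a) {
    const std::size_t n = a.size(), m = a[0].size();
    // path[i][j] holds the best sorted path ending at cell (i, j).
    std::vector<std::vector<std::vector<int>>> path(
        n, std::vector<std::vector<int>>(m));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < m; ++j) {
            std::vector<int> best;
            if (i > 0 && j > 0) {
                // std::min compares vectors lexicographically.
                best = std::min(path[i - 1][j], path[i][j - 1]);
            } else if (i > 0) {
                best = path[i - 1][j];
            } else if (j > 0) {
                best = path[i][j - 1];
            }
            // insert(a[i][j], best): keep the stored path sorted.
            best.insert(std::upper_bound(best.begin(), best.end(), a[i][j]),
                        a[i][j]);
            path[i][j] = std::move(best);
        }
    }
    return path[n - 1][m - 1];
}
```

For the 2*2 grid from the question (4 3 / 5 1) this returns 1 3 4. Each cell stores up to N + M - 1 values, which is where the extra (N + M) factor in the complexity comes from.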
If no number is repeated, this can be achieved in O (NM log (NM)) as well.
Intuition:
Suppose I label a grid with upper-left corner (a,b) and bottom-right corner (c,d) as G(a,b,c,d). Since you have to attain the lexicographically smallest string AFTER sorting the path, the aim should be to find the minimum value in G every time. If this minimum value is attained at, let's say, (i,j), then G(i,b,c,j) and G(a,j,i,d) are rendered useless for the search for our next min (for the path). That is to say, the values for the path we desire would never be in these two grids. Proof? Any location within these grids, if traversed, will not let us reach the minimum value in G(a,b,c,d) (the one at (i,j)). And if we avoid (i,j), the path we build cannot be lexicographically smallest.
So, first we find the min for G(1,1,m,n). Suppose it's at (i,j). Mark the min. We then find the min in G(1,1,i,j) and in G(i,j,m,n) and do the same for them. Continue this way until, at the end, we have m+n-1 marked entries, which constitute our path. Traverse the original grid G(1,1,m,n) linearly and report each value that is marked.
Approach:
Finding the min in G every time is costly. What if we map each value in the grid to its location? Traverse the grid and maintain a dictionary Dict with the key being the value at (i,j) and the value being the tuple (i,j). At the end, you'll have a list of key-value pairs covering all the values in the grid.
Now, we'll be maintaining a list of valid grids in which we will find candidates for our path. The first valid grid will be G(1,1,m,n).
Sort the keys and start iterating from the first value in the sorted key set S.
Maintain a tree of valid grids, T(G), such that for each G(a,b,c,d) in T, G.left = G(a,b,i,j) and G.right = G(i,j,c,d) where (i,j) = location of min val in G(a,b,c,d)
The algorithm now:
for each val in sorted key set S do
    (i,j) <- Dict(val)
    Grid G <- Root(T)
    while (i,j) in G do
        if G has no child then
            G.left <- G(a,b,i,j)
            G.right <- G(i,j,c,d)
            break
        else if (i,j) in G.left then
            G <- G.left
        else if (i,j) in G.right then
            G <- G.right
        else
            Dict(val) <- null
            break
        end if
    end while
end for
for each val in G(1,1,m,n) do
    if Dict(val) not null then
        solution.append(val)
    end if
end for
return solution
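The Java code:
import java.awt.Point;
import java.util.HashMap;
import java.util.SortedSet;
import java.util.TreeSet;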
class Grid{
int a, b, c, d;
Grid left, right;
Grid(int a, int b, int c, int d){
this.a = a;
this.b = b;
this.c = c;
this.d = d;
left = right = null;
}
public boolean isInGrid(int e, int f){
return (e >= a && e <= c && f >= b && f <= d);
}
public boolean hasNoChild(){
return (left == null && right == null);
}
}
public static int[] findPath(int[][] arr){
int row = arr.length;
int col = arr[0].length;
int[][] index = new int[row*col+1][2];
HashMap<Integer,Point> map = new HashMap<Integer,Point>();
for(int i = 0; i < row; i++){
for(int j = 0; j < col; j++){
map.put(arr[i][j], new Point(i,j));
}
}
Grid root = new Grid(0,0,row-1,col-1);
SortedSet<Integer> keys = new TreeSet<Integer>(map.keySet());
for(Integer entry : keys){
Grid temp = root;
int x = map.get(entry).x, y = map.get(entry).y;
while(temp.isInGrid(x, y)){
if(temp.hasNoChild()){
temp.left = new Grid(temp.a,temp.b,x, y);
temp.right = new Grid(x, y,temp.c,temp.d);
break;
}
if(temp.left.isInGrid(x, y)){
temp = temp.left;
}
else if(temp.right.isInGrid(x, y)){
temp = temp.right;
}
else{
map.get(entry).x = -1;
break;
}
}
}
int[] solution = new int[row+col-1];
int count = 0;
for(int i = 0 ; i < row; i++){
for(int j = 0; j < col; j++){
if(map.get(arr[i][j]).x >= 0){
solution[count++] = arr[i][j];
}
}
}
return solution;
}
The space complexity is constituted by maintenance of dictionary - O(NM) and of the tree - O(N+M). Overall: O(NM)
The time complexity for filling up and then sorting the dictionary - O(NM log(NM)); for checking the tree for each of the NM values - O(NM log(N+M)). Overall - O(NM log(NM)).
Of course, this won't work if values are repeated, since then we'd have more than one (i,j) for a single value in the grid, and the decision of which one to choose can no longer be made greedily.
Additional FYI: a similar problem I heard about earlier had an additional grid property: no values repeat and the numbers are from 1 to NM. In such a case, the complexity could be further reduced to O(NM log(N+M)), since instead of a dictionary you can simply use the values in the grid as indices of an array (which won't require sorting).
Suppose I needed to solve the following equation,
ax + by = c
Where a, b, and c are known values and x, y are natural numbers between 0 and 10 (inclusively).
Other than the trivial solution of,
for (x = 0; x <= 10; x++)
for (y = 0; y <= 10; y++)
if (a * x + b * y == c)
printf("%d %d", x, y);
... is there any way to find all solutions for this independent system efficiently?
In your case, since x and y only take values between 0 and 10, the brute-force algorithm may be the best option, as it takes less time to implement.
However, if you have to find all pairs of integral solution (x, y) in a larger range, you really should apply the right mathematical tool for tackling this problem.
You are trying to solve a linear Diophantine equation, and it is well known that an integral solution exists if and only if the greatest common divisor d of a and b divides c.
If no solution exists, then you are done. Otherwise, you should first apply the Extended Euclidean Algorithm to find a particular solution (x0, y0) of the equation ax + by = d.
By Bézout's identity such a pair exists, and all other integral solutions of ax + by = d are of the form:
x = x0 + k * (b / d), y = y0 - k * (a / d)
where k is an arbitrary integer.
But note that we are interested in the solutions of ax + by = c, so we have to scale the particular solution (x0, y0) by a factor of c / d; the k * (b / d) and k * (a / d) terms stay as they are.
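A minimal C++ sketch of that procedure is given here. The names extGcd, floorDiv and solveInRange are illustrative rather than taken from any library, and the sketch assumes a > 0 and b > 0 so that the step sizes are positive.

```cpp
#include <utility>
#include <vector>

// Extended Euclid: returns g = gcd(a, b) and sets x, y so that a*x + b*y == g.
long long extGcd(long long a, long long b, long long& x, long long& y) {
    if (b == 0) { x = 1; y = 0; return a; }
    long long x1 = 0, y1 = 0;
    long long g = extGcd(b, a % b, x1, y1);
    x = y1;
    y = x1 - (a / b) * y1;
    return g;
}

// Floor division for a positive divisor q (C++ '/' truncates toward zero).
long long floorDiv(long long p, long long q) {
    return p / q - ((p % q != 0 && p < 0) ? 1 : 0);
}

// All (x, y) with a*x + b*y == c and lo <= x, y <= hi.
// Assumption of this sketch: a > 0 and b > 0.
std::vector<std::pair<long long, long long>> solveInRange(
        long long a, long long b, long long c, long long lo, long long hi) {
    std::vector<std::pair<long long, long long>> result;
    long long x0 = 0, y0 = 0;
    const long long d = extGcd(a, b, x0, y0);
    if (c % d != 0) return result;           // no integral solution at all
    x0 *= c / d;                              // scale the particular solution
    y0 *= c / d;
    const long long stepX = b / d, stepY = a / d;
    // Smallest k with x0 + k*stepX >= lo, i.e. k = ceil((lo - x0) / stepX).
    const long long k = floorDiv(lo - x0 + stepX - 1, stepX);
    long long x = x0 + k * stepX, y = y0 - k * stepY;
    for (; x <= hi; x += stepX, y -= stepY) {
        if (lo <= y && y <= hi) result.push_back({x, y});
    }
    return result;
}
```

For example, solveInRange(2, 3, 12, 0, 10) yields (0, 4), (3, 2) and (6, 0). The loop visits only the x values that can belong to a solution, so it does about (hi - lo) / stepX iterations instead of scanning the whole square.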
You only need to loop through x, then calculate y. (x, y) is a solution if y is an integer between 0 and 10.
In C:
/* needs <math.h> and <stdio.h>; assumes b != 0, since we divide by b */
for (int x = 0; x <= 10; ++x) {
    double y = (double)(c - a * x) / b;
    // If y is an integer, and it's between 0 and 10, then (x, y) is a solution
    int isInteger = fabs(floor(y) - y) < 0.001;
    if (isInteger && 0 <= y && y <= 10) {
        printf("%d %d\n", x, (int)y);
    }
}
You could avoid the second for loop by checking directly if (c-a*x)/b is an integer.
EDIT: My code is less clean than I had hoped, due to some careless oversights on my part pointed out in the comments, but it is still faster than nested for loops.
int x, y, by;
for (x = 0; x <= 10; x++) {
    by = c - a*x; // this is b*y
    if (b == 0) { // check for special case of b==0
        if (by == 0) {
            printf("%d and any value for y\n", x);
        }
    } else { // b!=0 case
        y = by / b;
        if (by % b == 0 && 0 <= y && y <= 10) { // is y an integer between 0 and 10?
            printf("%d %d\n", x, y);
        }
    }
}
There is a grid of n x 1. You have to colour it with at least r red cells, at least g green cells, and at least b blue cells (r + g + b <= n). Two patterns are said to be different if they differ in at least one position. In how many ways can you colour it? (The solution can be either algorithmic or mathematical.)
My attempt:
int func(int id, int r, int g, int b)
{
int ma = 0;
if (id == n) {
if (r > 0)
ma++;
if (g > 0)
ma++;
if (b > 0)
ma++;
return ma;
}
if (r > 0)
ma += func(r-1, g, b, id + 1);
if (g > 0)
ma += func(r, g-1, b, id + 1);
if (b > 0)
ma += func(r, g, b-1, id + 1);
if (r + g + b < n - id) {
ma += func(r, g, b, id + 1);
}
return ma;
}
Suppose the number of them is f(n,r,g,b), then we have the following recursion:
f(n,r,g,b) = f(n-1,r,g,b)*3 + f(n-1,r-1,g,b)+f(n-1,r,g-1,b)+f(n-1,r,g,b-1).
Also we know the base cases: f(1,1,0,0)=f(1,0,1,0)=f(1,0,0,1)=1. Start from the bottom and build up f(n,r,g,b) with the recursion above. (It is simpler if you use memoization instead of for loops.) The running time is O(n*r*g*b).
Update: Your code is close to my answer, but first I should say that it's wrong; second, you used naive recursion, which leads to exponential running time. Allocate an array of size n*r*g*b to avoid recomputing already computed answers. See this for an instance of memoization.
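As an illustration of the memoization idea, here is a hedged C++ sketch. It rests on assumptions that are mine and not stated in the thread: every cell receives exactly one of the three colours, and r, g, b are the numbers of cells of each colour still required. Under those assumptions it uses a clamped variant of the recurrence (colouring a cell lowers that colour's remaining requirement, never below zero), which is not the same formula as the one quoted above. The name countColourings is hypothetical.

```cpp
#include <algorithm>
#include <map>
#include <tuple>

// Number of ways to colour n cells red/green/blue with at least r red,
// at least g green and at least b blue cells (every cell gets one colour).
long long countColourings(int n, int r, int g, int b) {
    static std::map<std::tuple<int, int, int, int>, long long> memo;
    if (r + g + b > n) return 0;   // not enough cells left for the requirements
    if (n == 0) return 1;          // here r == g == b == 0: the empty colouring
    const auto key = std::make_tuple(n, r, g, b);
    const auto it = memo.find(key);
    if (it != memo.end()) return it->second;   // already computed
    // Choose the colour of one cell; that colour's requirement drops by one.
    const long long total =
        countColourings(n - 1, std::max(r - 1, 0), g, b) +
        countColourings(n - 1, r, std::max(g - 1, 0), b) +
        countColourings(n - 1, r, g, std::max(b - 1, 0));
    memo[key] = total;
    return total;
}
```

With no requirements the count is 3^n (for instance 9 for n = 2), and for n = 3 with r = g = b = 1 it is 3! = 6. The memo table has at most n*r*g*b entries, matching the stated running time up to the logarithmic cost of std::map.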
I have sequences built from 0's and 1's. I want to somehow measure their distance from a target string, but the target string is incomplete.
Example of the data I have, where x is the target string and [0] means the occurrence of at least one '0':
x = 11[0]1111[0]1111111[0]1[0]; the length of x is fixed and equal to the length of y.
y1=11110111111000000101010110101010111
y2=01101000011100001101010101101010010
all y's have the same length
It's easy to see that x can be interpreted as a set of strings, but this set could be very large. Maybe I simply need to sample from that set and take the average of the minimum edit distances, but again that is too big a computational problem.
I've tried to figure out an algorithm, but I'm stuck. Its steps look like this:
x - target string - fuzzy one,
y - second string - fixed
Cx1, Cy1 - numbers of ones in x and y
Gx1, Gy1 - lists of vectors, length of each list is equal to number of groups of ones in given sequence,
Gx1[i] i-th vector,
Gx1[i]=(first one in i-th group of ones, length of i-th group of ones)
If the lengths of Gx1 and Gy1 are the same, then we know how many ones to add to or remove from each group, but there's a problem: I don't know whether simply adding and removing gives the minimum distance.
Let (Q, Σ, δ, q0, F) be the target automaton, which accepts a regular language L ⊆ Σ*, and let w ∈ Σ* be the source string. You want to compute min_{x ∈ L} d(x, w), where d denotes Levenshtein distance.
My approach is to generalize the usual dynamic program. Let D be a table indexed by Q × {0, …, |w|}. At the end of the computation, D(q, i) will be
min_{x : δ(q0, x) = q} d(x, w[0…i]),
where w[0…i] denotes the length-i prefix of w (so w[0…0] is the empty string ε). In other words, D(q, i) is the distance between w[0…i] and the set of strings that leave the automaton in state q. The overall answer is
min_{q ∈ F} D(q, |w|),
or the distance between w and the set of strings that leave the automaton in one of the final states, i.e., the language L.
The first column of D consists of the entries D(q, 0) for every state q ∈ Q. Since for every string x ∈ Σ* it holds that d(x, ε) = |x|, the entry D(q, 0) is the length of the shortest path from q0 to q in the graph defined by the transition function δ. Compute these entries by running "Dijkstra's algorithm" from q0 (actually just breadth-first search because the edge-lengths are all 1).
Subsequent columns of D are computed from the preceding column. First compute an auxiliary quantity D'(q, i) by minimizing over several possibilities.
Exact match: For every state r ∈ Q such that δ(r, w[i]) = q, include D(r, i - 1).
Deletion: Include D(q, i - 1) + 1.
Substitution: For every state r ∈ Q and every letter a ∈ Σ ∖ {w[i]} such that δ(r, a) = q, include D(r, i - 1) + 1.
Note that I have left out Insertion. As with the first column, this is because it may be necessary to insert many letters here. To compute the D(q, i)s from the D'(q, i)s, run Dijkstra on an implicit graph with vertices Q ∪ {s} and, for every q ∈ Q, edges of length D'(q, i) from the super-source s to q and, for every q ∈ Q and a ∈ Σ, edges of length 1 from q to δ(q, a). Let D(q, i) be the final distances.
I believe that this algorithm, if implemented well (with a heap specialized to support Dijkstra with unit lengths), has running time O(|Q| |w| |Σ|), which, for small alphabets Σ, is comparable to the usual Levenshtein DP.
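A compact C++ sketch of this column-by-column computation is shown here. The representation is my own assumption: states are 0..|Q|-1, letters are 0..|Σ|-1, delta[q][a] is a total transition function, and the name distanceToLanguage is hypothetical. For brevity it uses an ordinary binary heap for the insertion step rather than the specialized unit-length heap mentioned above, so it carries an extra logarithmic factor.

```cpp
#include <algorithm>
#include <climits>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Minimum Levenshtein distance between w and any string accepted by the DFA.
int distanceToLanguage(const std::vector<std::vector<int>>& delta, int q0,
                       const std::vector<bool>& accepting,
                       const std::vector<int>& w) {
    const int INF = INT_MAX / 2;
    const int nq = static_cast<int>(delta.size());
    // Insertion step: Dijkstra seeded with dist, unit edges q -> delta[q][a].
    auto relaxInsertions = [&](std::vector<int>& dist) {
        typedef std::pair<int, int> Item;  // (distance, state)
        std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
        for (int q = 0; q < nq; ++q)
            if (dist[q] < INF) pq.push(Item(dist[q], q));
        while (!pq.empty()) {
            const Item top = pq.top();
            pq.pop();
            if (top.first > dist[top.second]) continue;  // stale entry
            for (int next : delta[top.second]) {
                if (top.first + 1 < dist[next]) {
                    dist[next] = top.first + 1;
                    pq.push(Item(dist[next], next));
                }
            }
        }
    };
    std::vector<int> D(nq, INF);
    D[q0] = 0;
    relaxInsertions(D);                      // first column: distances to ε
    for (int c : w) {
        std::vector<int> next(nq, INF);      // this is D' for the new column
        for (int r = 0; r < nq; ++r) {
            if (D[r] >= INF) continue;
            next[r] = std::min(next[r], D[r] + 1);           // deletion of c
            for (int a = 0; a < static_cast<int>(delta[r].size()); ++a) {
                const int q = delta[r][a];                    // match or substitution
                next[q] = std::min(next[q], D[r] + (a == c ? 0 : 1));
            }
        }
        relaxInsertions(next);               // turn D' into D
        D = std::move(next);
    }
    int best = INF;
    for (int q = 0; q < nq; ++q)
        if (accepting[q]) best = std::min(best, D[q]);
    return best;
}
```

As a usage example, a five-state DFA for the pattern 1[0]1 (a one, at least one zero, a one) can be written as delta = {{4,1},{2,4},{2,3},{4,4},{4,4}} with state 3 accepting and state 4 a dead state; the string 11 is then at distance 1 from the language, because a single inserted 0 fixes it.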
I would propose that you use dynamic programming for this one. The dp is two-dimensional: xi is the index in the pattern string x, yi is the index in the string y, and the value of each subproblem is the minimum edit distance between the substrings x[xi..x.size-1] and y[yi..y.size-1].
Here is how you can find the minimum edit distance between an x pattern given as you explain and a fixed y string. I will assume that the symbol # in the x pattern means any positive number of zeros. Also, I will use some global variables to make the code easier to read.
#include <cstring>
#include <iostream>
#include <string>
using namespace std;
const int max_len = 1000;
const int NO_SOLUTION = -2;
int dp[max_len][max_len];
string x; // pattern;
string y; // to compute minimum edit distance to
int solve(int xi /* index in x */, int yi /* index in y */) {
if (yi + 1 == y.size()) {
if (xi + 1 != x.size()) {
return dp[xi][yi] = NO_SOLUTION;
} else {
if (x[xi] == y[yi] || (y[yi] == '0' && x[xi] == '#')) {
return dp[xi][yi] = 0;
} else {
return dp[xi][yi] = 1; // need to change the character
}
}
}
if (xi + 1 == x.size()) {
if (x[xi] != '#') {
return dp[xi][yi] = NO_SOLUTION;
}
int number_of_ones = 0;
for (int j = yi; j < y.size(); ++j) {
if (y[j] == '1') {
number_of_ones++;
}
}
return dp[xi][yi] = number_of_ones;
}
int best = NO_SOLUTION;
if (x[xi] != '#') {
int temp = ((dp[xi + 1][yi + 1] == -1)?solve(xi + 1, yi +1):dp[xi + 1][yi +1]);
if (temp != NO_SOLUTION && x[xi] != y[yi]) {
temp++;
}
best = temp;
} else {
int temp = ((dp[xi + 1][yi + 1] == -1)?solve(xi + 1, yi +1):dp[xi + 1][yi +1]);
if (temp != NO_SOLUTION) {
if (y[yi] != '0') {
temp++;
}
best = temp;
}
int edit_distance = 0; // number of '1' covered by the '#'
// Here i represents the number of chars covered by the '#'
for (int i = 1; i < y.size(); ++i) {
if (yi + i >= y.size()) {
break;
}
int temp = ((dp[xi][yi + i] == -1)?solve(xi, yi + i):dp[xi][yi + i]);
// y[yi + i - 1] is the newest char covered by the '#'; a '1' there costs one edit
if (y[yi + i - 1] != '0') {
edit_distance++;
}
if (temp == NO_SOLUTION) {
continue;
}
temp += edit_distance;
if (best == NO_SOLUTION || temp < best) {
best = temp;
}
}
}
return dp[xi][yi] = best; // store the result so the memoization actually takes effect
}
int main() {
memset(dp, -1, sizeof(dp));
cin >> x >> y;
cout << "Minimum possible edit distance is: " << solve(0,0) << endl;
return 0;
}
Hope this helps.
Here is another dynamic programming question (Vazirani ch6)
Consider the following 3-PARTITION problem. Given integers a1...an, we want to determine whether it is possible to partition {1...n} into three disjoint subsets I, J, K such that
sum(I) = sum(J) = sum(K) = 1/3*sum(ALL)
For example, for input (1; 2; 3; 4; 4; 5; 8) the answer is yes, because there is the partition (1; 8), (4; 5), (2; 3; 4). On the other hand, for input (2; 2; 3; 5) the answer is no. Devise and analyze a dynamic programming algorithm for 3-PARTITION that runs in time polynomial in n and (Sum a_i)
How can I solve this problem? I know 2-partition but still can't solve it.
It's easy to generalize the 2-set solution to the 3-set case.
In the original version, you create an array of booleans, sums, where sums[i] tells whether sum i can be reached with numbers from the set. Then, once the array is filled, you just check whether sums[TOTAL/2] is true.
Since you said you already know the old version, I'll describe only the difference between them.
In the 3-partition case, you keep a two-dimensional array of booleans, sums, where sums[i][j] tells whether the first set can have sum i while the second has sum j. Then, once the array is filled, you just check whether sums[TOTAL/3][TOTAL/3] is true.
If the original complexity is O(TOTAL*n), here it's O(TOTAL^2*n).
It may not be polynomial in the strictest sense of the word, but then the original version isn't strictly polynomial either :)
I think by reduction it goes like this:
Reducing 2-partition to 3-partition:
Let S be the original set and A be its total sum; then let S'=union({A/2},S).
Performing a 3-partition on the set S' yields three sets X, Y, Z.
Among X, Y, Z, one of them must be {A/2}; say it's set Z. Then X and Y form a 2-partition of S.
The witnesses of a 3-partition of S' are witnesses of a 2-partition of S, thus 2-partition reduces to 3-partition.
If this problem is to be solvable; then sum(ALL)/3 must be an integer. Any solution must have SUM(J) + SUM(K) = SUM(I) + sum(ALL)/3. This represents a solution to the 2-partition problem over concat(ALL, {sum(ALL)/3}).
You say you have a 2-partition implementation: use it to solve that problem. Then (at least) one of the two partitions will contain the number sum(ALL)/3; remove that number from its partition, and you've found I. For the other partition, run 2-partition again to split J from K; after all, J and K must be equal in sum themselves.
Edit: This solution is probably incorrect - the 2-partition of the concatenated set will have several solutions (at least one for each of I, J, K) - however, if there are other solutions, then the "other side" may not consist of the union of two of I, J, K, and may not be splittable at all. You'll need to actually think, I fear :-).
Try 2: Iterate over the multiset, maintaining the following map: R(i,j,k) :: Boolean, which represents whether the numbers seen up to the current iteration permit division into three multisets that have sums i, j, k. I.e., for any R(i,j,k) that holds and next number n, in the next state R' it holds that R'(i+n,j,k) and R'(i,j+n,k) and R'(i,j,k+n). Note that the complexity (as per the exercise) depends on the magnitude of the input numbers; this is a pseudo-polynomial-time algorithm. Nikita's solution is conceptually similar but more efficient than this solution, since it doesn't track the third set's sum: that's unnecessary, since you can trivially compute it.
As I answered in another, similar question, the C++ implementation would look something like this:
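#include <numeric>
#include <vector>
using namespace std;

bool partition3(const vector<int> &A)
{
    int sum = accumulate(A.begin(), A.end(), 0);
    if (sum % 3 != 0)
    {
        return false;
    }
    int target = sum / 3;
    // dp[j][k] is true if two disjoint subsets with sums j and k can be formed;
    // sums above target are never needed, so the table stops there
    vector<vector<char>> dp(target + 1, vector<char>(target + 1, 0));
    dp[0][0] = true;
    // process the numbers one by one
    for (size_t i = 0; i < A.size(); i++)
    {
        // iterate downwards so that each number is used at most once
        for (int j = target; j >= 0; --j)
        {
            for (int k = target; k >= 0; --k)
            {
                if (dp[j][k])
                {
                    if (j + A[i] <= target) dp[j + A[i]][k] = true;
                    if (k + A[i] <= target) dp[j][k + A[i]] = true;
                }
            }
        }
    }
    return dp[target][target];
}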
Let's say you want to partition the set $X = \{x_1, ..., x_n\}$ into $k$ partitions.
Create an $n \times k$ table. Let the cost $M[i,j]$ be the smallest possible value of the largest partition sum when the first $i$ elements are split into $j$ partitions (the recurrence below cuts the sequence into consecutive ranges). Recursively use the following optimality criterion to fill it:
$M[n,k] = \min_{i \leq n} \max ( M[i, k-1], \sum_{j=i+1}^{n} x_j )$
Using these initial values for the table:
$M[i,1] = \sum_{j=1}^{i} x_j$ and $M[1,j] = x_1$
The running time is $O(kn^2)$ (polynomial).
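The table-filling can be sketched in C++ as follows. This is only an illustration under my reading of the recurrence: the parts are consecutive, non-empty ranges of the sequence, 1 <= k <= n, and prefix sums replace the inner summation. The name minLargestRangeSum is hypothetical.

```cpp
#include <algorithm>
#include <climits>
#include <vector>

// Smallest possible value of the largest range sum when x is cut into
// k consecutive, non-empty ranges. Assumes 1 <= k <= x.size().
long long minLargestRangeSum(const std::vector<int>& x, int k) {
    const int n = static_cast<int>(x.size());
    std::vector<long long> prefix(n + 1, 0);   // prefix[i] = x_1 + ... + x_i
    for (int i = 0; i < n; ++i) prefix[i + 1] = prefix[i] + x[i];
    // M[i][j]: best value for the first i elements split into j ranges.
    std::vector<std::vector<long long>> M(
        n + 1, std::vector<long long>(k + 1, LLONG_MAX));
    for (int i = 1; i <= n; ++i) M[i][1] = prefix[i];
    for (int j = 2; j <= k; ++j) {
        for (int i = j; i <= n; ++i) {
            // The last range is x_{p+1..i}; the first p elements use j-1 ranges.
            for (int p = j - 1; p < i; ++p) {
                const long long cand =
                    std::max(M[p][j - 1], prefix[i] - prefix[p]);
                M[i][j] = std::min(M[i][j], cand);
            }
        }
    }
    return M[n][k];
}
```

For the sequence 1..9 cut into 3 ranges the result is 17 (ranges 1..5, 6..7, 8..9). Note that this solves the ordered variant only; it does not by itself decide the 3-PARTITION question, where elements may be grouped in any order.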
Create a three-dimensional array whose first dimension (size) is the number of elements and whose other two dimensions (part) are the sum of all elements divided by 3. Each cell array[seq][sum1][sum2] tells whether you can create sum1 and sum2 using at most the first seq elements of the given array A[]. Compute all values of the array; the result will be in cell array[all elements][sum of all elements / 3][sum of all elements / 3]. If you can create two non-overlapping sets that each sum to sum/3, the remaining elements form the third set.
Logic of checking: either exclude the current element (it goes to the third sum, which is not stored) and check whether the cell without that element already has the same two sums; OR include it in sum1, which is possible if the two sets can be formed without it with sum1 smaller by the value of the element and sum2 unchanged; OR include it in sum2, checked in the same way.
int partition3(vector<int> &A)
{
int part=0;
for (int a : A)
part += a;
if (part%3)
return 0;
int size = A.size()+1;
part = part/3+1;
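// a vector instead of a variable-length array, which is not standard C++ and can overflow the stack
vector<vector<vector<bool>>> array(size, vector<vector<bool>>(part, vector<bool>(part)));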
//sequence from 0 integers inside to all inside
for(int seq=0; seq<size; seq++)
for(int sum1=0; sum1<part; sum1++)
for(int sum2=0;sum2<part; sum2++) {
bool curRes;
if (seq==0)
if (sum1 == 0 && sum2 == 0)
curRes = true;
else
curRes= false;
else {
int curInSeq = seq-1;
bool excludeFrom = array[seq-1][sum1][sum2];
bool includeToSum1 = (sum1>=A[curInSeq]
&& array[seq-1][sum1-A[curInSeq]][sum2]);
bool includeToSum2 = (sum2>=A[curInSeq]
&& array[seq-1][sum1][sum2-A[curInSeq]]);
curRes = excludeFrom || includeToSum1 || includeToSum2;
}
array[seq][sum1][sum2] = curRes;
}
int result = array[size-1][part-1][part-1];
return result;
}
Another example in C++ (based on the previous answers):
bool partition3(vector<int> const &A) {
int sum = 0;
for (int i = 0; i < A.size(); i++) {
sum += A[i];
}
if (sum % 3 != 0) {
return false;
}
vector<vector<vector<int>>> E(A.size() + 1, vector<vector<int>>(sum / 3 + 1, vector<int>(sum / 3 + 1, 0)));
for (int i = 1; i <= A.size(); i++) {
for (int j = 0; j <= sum / 3; j++) {
for (int k = 0; k <= sum / 3; k++) {
E[i][j][k] = E[i - 1][j][k];
if (A[i - 1] <= k) {
E[i][j][k] = max(E[i][j][k], E[i - 1][j][k - A[i - 1]] + A[i - 1]);
}
if (A[i - 1] <= j) {
E[i][j][k] = max(E[i][j][k], E[i - 1][j - A[i - 1]][k] + A[i - 1]);
}
}
}
}
return (E.back().back().back() / 2 == sum / 3);
}
You really want Korf's Complete Karmarkar-Karp algorithm (http://ac.els-cdn.com/S0004370298000861/1-s2.0-S0004370298000861-main.pdf, http://ijcai.org/papers09/Papers/IJCAI09-096.pdf). A generalization to three-partitioning is given. The algorithm is surprisingly fast given the complexity of the problem, but requires some implementation.
The essential idea of KK is to ensure that large blocks of similar size appear in different partitions. One groups pairs of blocks, which can then be treated as a smaller block of size equal to the difference in sizes that can be placed as normal: by doing this recursively, one ends up with small blocks that are easy to place. One then does a two-coloring of the block groups to ensure that the opposite placements are handled. The extension to 3-partition is a bit complicated. The Korf extension is to use depth-first search in KK order to find all possible solutions or to find a solution quickly.
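To make the differencing step concrete, here is a small C++ sketch of the plain Karmarkar-Karp heuristic for two-way partitioning only. It is not Korf's complete algorithm and not the three-way extension; it returns just the final difference between the two subset sums, without reconstructing the subsets, and the name kkDifference is hypothetical.

```cpp
#include <queue>
#include <vector>

// Karmarkar-Karp differencing heuristic (two-way): repeatedly replace the two
// largest numbers by their difference, which commits them to opposite subsets.
// Returns the resulting difference between the two subset sums.
long long kkDifference(const std::vector<long long>& values) {
    std::priority_queue<long long> heap(values.begin(), values.end());
    while (heap.size() > 1) {
        const long long a = heap.top();
        heap.pop();
        const long long b = heap.top();
        heap.pop();
        heap.push(a - b);   // a >= b, so the difference is non-negative
    }
    return heap.empty() ? 0 : heap.top();
}
```

On the input 8, 7, 6, 5, 4 the heuristic ends with a difference of 2, although a perfect split (8 + 7 versus 6 + 5 + 4) exists; closing that gap is what the depth-first search of the complete algorithm is for.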