How to print values in memoization method-Dynamic pragraming - algorithm

I know for a problem that can be solved using DP, can be solved by either tabulation(bottom-up) approach or memoization(top-down) approach. personally i find memoization is easy and even efficient approach(analysis required just to get recursive formula,once recursive formula is obtained, a brute-force recursive method can easily be converted to store sub-problem's result and reuse it.) The only problem that i am facing in this approach is, i am not able to construct actual result from the table which i filled on demand.
For example, in Matrix Product Parenthesization problem ( to decide in which order to perform the multiplications on Matrices so that cost of multiplication is minimum) i am able to calculate minimum cost not not able to generate order in algo.
For example, suppose A is a 10 × 30 matrix, B is a 30 × 5 matrix, and C is a 5 × 60 matrix. Then,
(AB)C = (10×30×5) + (10×5×60) = 1500 + 3000 = 4500 operations
A(BC) = (30×5×60) + (10×30×60) = 9000 + 18000 = 27000 operations.
here i am able to get min-cost as 27000 but unable to get order which is A(BC).
I used this. Suppose F[i, j] represents least number of multiplication needed to multiply Ai.....Aj and an array p[] is given which represents the chain of matrices such that the ith matrix Ai is of dimension p[i-1] x p[i]. So
0 if i=j
F[i,j]=
min(F[i,k] + F[k+1,j] +P_i-1 * P_k * P_j where k∈[i,j)
Below is the implementation that i have created.
#include<stdio.h>
#include<limits.h>
#include<string.h>
#define MAX 4
int lookup[MAX][MAX];
int MatrixChainOrder(int p[], int i, int j)
{
if(i==j) return 0;
int min = INT_MAX;
int k, count;
if(lookup[i][j]==0){
// recursively calculate count of multiplcations and return the minimum count
for (k = i; k<j; k++) {
int gmin=0;
if(lookup[i][k]==0)
lookup[i][k]=MatrixChainOrder(p, i, k);
if(lookup[k+1][j]==0)
lookup[k+1][j]=MatrixChainOrder(p, k+1, j);
count = lookup[i][k] + lookup[k+1][j] + p[i-1]*p[k]*p[j];
if (count < min){
min = count;
printf("\n****%d ",k); // i think something has be done here to represent the correct answer ((AB)C)D where first mat is represented by A second by B and so on.
}
}
lookup[i][j] = min;
}
return lookup[i][j];
}
// Driver program to test above function
int main()
{
int arr[] = {2,3,6,4,5};
int n = sizeof(arr)/sizeof(arr[0]);
memset(lookup, 0, sizeof(lookup));
int width =10;
printf("Minimum number of multiplications is %d ", MatrixChainOrder(arr, 1, n-1));
printf("\n ---->");
for(int l=0;l<MAX;++l)
printf(" %*d ",width,l);
printf("\n");
for(int z=0;z<MAX;z++){
printf("\n %d--->",z);
for(int x=0;x<MAX;x++)
printf(" %*d ",width,lookup[z][x]);
}
return 0;
}
I know using tabulation approach printing the solution is much easy but i want to do it in memoization technique.
Thanks.

Your code correctly computes the minimum number of multiplications, but you're struggling to display the optimal chain of matrix multiplications.
There's two possibilities:
When you compute the table, you can store the best index found in another memoization array.
You can recompute the optimal splitting points from the results in the memoization array.
The first would involve creating the split points in a separate array:
int lookup_splits[MAX][MAX];
And then updating it inside your MatrixChainOrder function:
...
if (count < min) {
min = count;
lookup_splits[i][j] = k;
}
You can then generate the multiplication chain recursively like this:
void print_mult_chain(int i, int j) {
if (i == j) {
putchar('A' + i - 1);
return;
}
putchar('(');
print_mult_chain(i, lookup_splits[i][j]);
print_mult_chain(lookup_splits[i][j] + 1, j);
putchar(')');
}
You can call the function with print_mult_chain(1, n - 1) from main.
The second possibility is that you don't cache lookup_splits and recompute it as necessary.
int get_lookup_splits(int p[], int i, int j) {
int best = INT_MAX;
int k_best;
for (int k = i; k < j; k++) {
int count = lookup[i][k] + lookup[k+1][j] + p[i-1]*p[k]*p[j];
if (count < best) {
best = count;
k_best = k;
}
}
return k;
}
This is essentially the same computation you did inside MatrixChainOrder, so if you go with this solution you should factor the code appropriately to avoid having two copies.
With this function, you can adapt print_mult_chain above to use it rather than the lookup_splits array. (You'll need to pass the p array in).
[None of this code is tested, so you may need to edit the answer to fix bugs].

Related

Dynamic programming based zigzag puzzle

I found this interesting dynamic programming problem where it's required to re-order a sequence of integers in order to maximize the output.
Steve has got N liquor bottles. Alcohol quantity of ith bottle is given by A[i]. Now he wants to have one drink from each of the bottles, in such a way that the total hangover is maximised.
Total hangover is calculated as follow (Assume the 'alcohol quantity' array uses 1-based indexing) :
int hangover=0 ;
for( int i=2 ; i<=N ; i++ ){
hangover += i * abs(A[i] - A[i-1]) ;
}
So, obviously the order in which he drinks from each bottle changes the Total hangover. He can drink the liquors in any order but not more than one drink from each bottle. Also once he starts drinking a liquor he will finish that drink before moving to some other liquor.
Steve is confused about the order in which he should drink so that the hangover is maximized. Help him find the maximum hangover he can have, if he can drink the liquors in any order.
Input Format :
First line contain number of test cases T. First line of each test case contains N, denoting the number of fruits. Next line contain N space separated integers denoting the sweetness of each fruit.
2
7
83 133 410 637 665 744 986
4
1 5 9 11
I tried everything that I could but I wasn't able to achieve a O(n^2) solution. By simply calculating the total hangover over all the permutations has a O(n!) time complexity. Can this problem be solved more efficiently?
Thanks!
My hunch: use a sort of "greedy chaining algorithm" instead of DP.
1) find the pair with the greatest difference (O(n^2))
2) starting from either, find successively the next element with the greatest difference, forming a sort of "chain" (2 x O(n^2))
3) once you've done it for both you'll have two "sums". Return the largest one as your optimal answer.
This greedy strategy should work because the nature of the problem itself is greedy: choose the largest difference for the last bottle, because this has the largest index, so the result will always be larger than some "compromising" alternative (one that distributes smaller but roughly uniform differences to the indices).
Complexity: O(3n^2). Can prob. reduce it to O(3/2 n^2) if you use linked lists instead of a static array + boolean flag array.
Pseudo-ish code:
int hang_recurse(int* A, int N, int I, int K, bool* F)
{
int sum = 0;
for (int j = 2; j <= N; j++, I--)
{
int maxdiff = 0, maxidx;
for (int i = 1; i <= N; i++)
{
if (F[i] == false)
{
int diff = abs(F[K] - F[i]);
if (diff > maxdiff)
{
maxdiff = diff;
maxidx = i;
}
}
}
K = maxidx;
F[K] = true;
sum += maxdiff * I;
}
return sum;
}
int hangover(int* A, int N)
{
bool* F = new bool[N];
int maxdiff = 0;
int maxidx_i, maxidx_j;
for (int j = 2; j <= N; j++, I--)
{
for (int i = 1; i <= N; i++)
{
int diff = abs(F[j] - F[i]);
if (diff > maxdiff)
{
maxdiff = diff;
maxidx_i = i;
maxidx_j = j;
}
}
}
F[maxidx_i] = F[maxidx_j] = true;
int maxsum = max(hang_recurse(A, N, N - 1, maxidx_i, F),
hang_recurse(A, N, N - 1, maxidx_j, F));
delete [] F;
return maxdiff * N + maxsum;
}

How to find all taxicab numbers less than N?

A taxicab number is an integer that can be expressed as the sum of two cubes of integers in two different ways: a^3+b^3 = c^3+d^3. Design an algorithm to find all taxicab numbers with a, b, c, and d less than N.
Please give both the space and time complexity in terms of N.
I could do it in o(N^2.logN) time with O(N^2) space.
Best algorithm I've found so far:
Form all pairs: N^2
Sort the sum: N^2 logN
Find duplicates less than N
But this takes N^2 space. Can we do better?
But this takes N^2 space. Can we do better?
There exists an O(N) space solution based on a priority queue. Time complexity is O(N^2 logN). To sketch out the idea of the algorithm, here is the matrix M such that M[i][j] = i^3 + j^3 (of course, the matrix is never created in memory):
0 1 8 27 64 125
1 2 9 28 65 126
8 9 16 35 72 133
27 28 35 54 91 152
64 65 72 91 128 189
125 126 133 152 189 250
Observe that every line and every row is sorted in ascending order. Let PQ be the priority queue. First we put the biggest element in the priority queue. Then perform the following, as long as the PQ is not empty:
Pop the biggest element from PQ
add adjacent element above if the PQ doesn't have any element from that row
add adjacent element on the left if the PQ doesn't have any element from that column, and if it is not under the diagonal of the matrix (to avoid redundant elements)
Note that
You don't need to create the matrix in memory to implement the algorithm
The elements will be popped from the PQ in descending order, from the biggest element of the matrix to its smallest one (avoiding elements from the redundant half part of the matrix).
Everytime the PQ issues the same value twice then we have found a taxicab number.
As an illustration, here is an implementation in C++. The time complexity is O(N^2 logN) and space complexity O(N).
#include <iostream>
#include <cassert>
#include <queue>
using namespace std;
typedef unsigned int value_type;
struct Square
{
value_type i;
value_type j;
value_type sum_of_cubes;
Square(value_type i, value_type j) : i(i), j(j), sum_of_cubes(i*i*i+j*j*j) {}
friend class SquareCompare;
bool taxicab(const Square& sq) const
{
return sum_of_cubes == sq.sum_of_cubes && i != sq.i && i != sq.j;
}
friend ostream& operator<<(ostream& os, const Square& sq);
};
class SquareCompare
{
public:
bool operator()(const Square& a, const Square& b)
{
return a.sum_of_cubes < b.sum_of_cubes;
}
};
ostream& operator<<(ostream& os, const Square& sq)
{
return os << sq.i << "^3 + " << sq.j << "^3 = " << sq.sum_of_cubes;
}
int main()
{
const value_type N=2001;
value_type count = 0;
bool in_i [N];
bool in_j [N];
for (value_type i=0; i<N; i++) {
in_i[i] = false;
in_j[i] = false;
}
priority_queue<Square, vector<Square>, SquareCompare> p_queue;
p_queue.push(Square(N-1, N-1));
in_i[N-1] = true;
in_j[N-1] = true;
while(!p_queue.empty()) {
Square sq = p_queue.top();
p_queue.pop();
in_i[sq.i] = false;
in_j[sq.j] = false;
// cout << "pop " << sq.i << " " << sq.j << endl;
if (sq.i > 0 && !in_i[sq.i - 1] && sq.i-1 >= sq.j) {
p_queue.push(Square(sq.i-1, sq.j));
in_i[sq.i-1] = true;
in_j[sq.j] = true;
// cout << "push " << sq.i-1 << " " << sq.j << endl;
}
if (sq.j > 0 && !in_j[sq.j-1] && sq.i >= sq.j - 1) {
p_queue.push(Square(sq.i, sq.j-1));
in_i[sq.i] = true;
in_j[sq.j - 1] = true;
// cout << "push " << sq.i << " " << sq.j-1 << endl;
}
if (sq.taxicab(p_queue.top())) {
/* taxicab number */
cout << sq << " " << p_queue.top() << endl;
count++;
}
}
cout << endl;
cout << "there are " << count << " taxicab numbers with a, b, c, d < " << N << endl;
return 0;
}
The answers given by Novneet Nov and user3017842 are both correct ideas for finding the taxicab numbers with storage O(N) using minHeap.
Just a little bit more explanation why the minHeap of size N works.
First, if you had all the sums (O(N^2)) and could sort them (O(N^2lgN)) you would just pick the duplicates as you traverse the sorted array. Well, in our case using a minHeap we can traverse in-order all the sums: we just need to ensure that the minHeap always contains the minimum unprocessed sum.
Now, we have a huge number of sums (O(N^2)). But, notice that this number can be split into N groups each of which has an easily defined minimum!
(fix a, change b from 0 to N-1 => here are your N groups. The sum in one group with a smaller b is smaller than one with a bigger b in the same group - because a is the same).
The minimum of union of these groups is in the union of mins of these
groups. Therefore, if you keep all minimums of these groups in the
minHeap you are guaranteed to have the total minimum in the minHeap.
Now, when you extract Min from the heap, you just add next smallest element from the group of this extracted min (so if you extracted (a, b) you add (a, b+1)) and you are guaranteed that your minHeap still contains the next unprocessed min of all the sums.
I found the solution/code here : Time complexity O(N^2 logN), space complexity O(N)
The solution is implemented by help of priority queues.
Reverse thinking can be easily done by looking at the code. It can be done in an array of size N because the min sums are deleted from the array after comparing to the next minimum and then the array is made to size N by adding a new sum - (i^3 + (j+1)^3).
A intuitive proof is here :
Initially, we have added (1,1),(2,2),(3,3),...,(N,N) in the min-priority queue.
Suppose a^+b^3=c^3+d^3, and (a,b) is the minimum that will be taken out of the priority queue next. To be able to detect this taxicab number, (c,d) must also be in the priority queue which would be taken out after (a,b).
Note: We would be adding (a,b+1) after extracting (a,b) so there is no way that extraction of (a,b) would result in addition of (c,d) to the priority queue, so it must already exist in the priority queue.
Now lets assume that (c,d) is not in the priority queue, because we haven't gotten to it yet. Instead, there is some (c,d−k) in the priority queue where k>0.
Since (a,b) is being taken out,
a^3+b^3≤c^3+(d−k)^3
However, a^3+b^3=c^3+d^3
Therefore,
c^3+d^3≤c^3+(d−k)^3
d≤d−k
k≤0
Since k>0, this is impossible. Thus our assumption can never come to pass.
Thus for every (a,b) which is being removed from the min-PQ, (c,d) is already in the min-PQ (or was just removed) if a^3+b^3=c^3+d^3
The time complexity of the algorithm can't be less than O(N2) in any case, since you might print up to O(N2) taxicab numbers.
To reduce space usage you could, in theory, use the suggestion mentioned here: little link. Basically, the idea is that first you try all possible pairs a, b and find the solution to this:
a = 1 − (p − 3 * q)(p2 + 3 * q2)
b = −1 + (p + 3 * q)(p2 + 3q2)
Then you can find the appropriate c, d pair using:
c = (p + 3 * q) - (p2 + 3 * q2)
d = -(p - 3 * q) + (p2 + 3 * q2)
and check whether they are both less than N. The issue here is that solving that system of equations might get a bit messy (by 'a bit' I mean very tedious).
The O(N2) space solution is much simpler, and it'd probably be efficient enough since anything of quadratic time complexity that can run in reasonable time limits will probably be fine with quadratic space usage.
I hope that helped!
version1 uses List and sorting
O(n^2*logn) time and O(n^2) space
public static void Taxicab1(int n)
{
// O(n^2) time and O(n^2) space
var list = new List<int>();
for (int i = 1; i <= n; i++)
{
for (int j = i; j <= n; j++)
{
list.Add(i * i * i + j * j * j);
}
}
// O(n^2*log(n^2)) time
list.Sort();
// O(n^2) time
int prev = -1;
foreach (var next in list)
{
if (prev == next)
{
Console.WriteLine(prev);
}
prev = next;
}
}
version2 uses HashSet
O(n^2) time and O(n^2) space
public static void Taxicab2(int n)
{
// O(n^2) time and O(n^2) space
var set = new HashSet<int>();
for (int i = 1; i <= n; i++)
{
for (int j = i; j <= n; j++)
{
int x = i * i * i + j * j * j;
if (!set.Add(x))
{
Console.WriteLine(x);
}
}
}
}
version3 uses min oriented Priority Queue
O(n^2*logn) time and O(n) space
public static void Taxicab3(int n)
{
// O(n) time and O(n) space
var pq = new MinPQ<SumOfCubes>();
for (int i = 1; i <= n; i++)
{
pq.Push(new SumOfCubes(i, i));
}
// O(n^2*logn) time
var sentinel = new SumOfCubes(0, 0);
while (pq.Count > 0)
{
var current = pq.Pop();
if (current.Result == sentinel.Result)
Console.WriteLine($"{sentinel.A}^3+{sentinel.B}^3 = {current.A}^3+{current.B}^3 = {current.Result}");
if (current.B <= n)
pq.Push(new SumOfCubes(current.A, current.B + 1));
sentinel = current;
}
}
where SummOfCubes
public class SumOfCubes : IComparable<SumOfCubes>
{
public int A { get; private set; }
public int B { get; private set; }
public int Result { get; private set; }
public SumOfCubes(int a, int b)
{
A = a;
B = b;
Result = a * a * a + b * b * b;
}
public int CompareTo(SumOfCubes other)
{
return Result.CompareTo(other.Result);
}
}
github
create an array: 1^3, 2^3, 3^3, 4^3, ....... k^3. such that k^3 < N and (k+1)^3 > N. the array size would be ~ (N)^(1/3). the array is sorted order.
use 2sum technique (link) in lineal time proportional to the array size. if we find 2 pairs of numbers, that is a hit.
looping through step 2 by decreasing N by 1 each time.
This will use O(N^(1/3)) extra space and ~ O(N^(4/3)) time.
A easy way of understanding Time complexity O(N^2 logN), space complexity O(N) is to think it as a merge of N sorted arrays plus a bookkeeping of the previously merged element.
It seems like a simple brute-force algorithm with proper bounds solves it in time proportional to n^1.33 and space proportional to n. Or could anyone point me to the place where I'm mistaken?
Consider 4 nested loops, each running from 1 to cubic root of n. Using these loops we can go over all possible combinations of 4 values and find the pairs forming taxicab numbers. It means each loop takes time proportional to cubic root of n, or n^(1/3). Multiply this value 4 times and get:
(n^(1/3)^4 = n^(4/3) = n^1.33
I wrote a solution in JavaScript and benchmarked it, and it seems to be working. One caveat is that the result is only partially sorted.
Here is my JavaScript code (it's not optimal yet, could be optimized even more):
function taxicab(n) {
let a = 1, b = 1, c = 1, d = 1,
cubeA = a**3 + b**3,
cubeB = c**3 + d**3,
results = [];
while (cubeA < n) { // loop over a
while (cubeA < n) { // loop over b
// avoid running nested loops if this number is already in results
if (results.indexOf(cubeA) === -1) {
while (cubeB <= cubeA) { // loop over c
while (cubeB <= cubeA) { // loop over d
if (cubeB === cubeA && a!=c && a!=d) { // found a taxicab number!
results.push(cubeA);
}
d++;
cubeB = c**3 + d**3;
} // end loop over d
c++;
d = c;
cubeB = c**3 + d**3;
} // end loop over c
}
b++;
cubeA = a**3 + b**3;
c = d = 1;
cubeB = c**3 + d**3;
} // end loop over d
a++;
b = a;
cubeA = a**3 + b**3;
} // end loop over a
return results;
}
Running taxicab(1E8) takes around 30 seconds in a browser console and yields 485 numbers as a result. Ten times smaller value taxicab(1E7) (10 millions) takes almost 1.4 seconds and yields 150 numbers. 10^1.33 * 1.4 = 29.9, i.e. multiplying n by 10 leads to the running time increased by 10^1.33 times. The result array is unsorted, but after quickly sorting it we get correct result, as it seems:
[1729, 4104, 13832, 20683, 32832, 39312, 40033, 46683, 64232, 65728,
110656, 110808, 134379, 149389, 165464, 171288, 195841, 216027, 216125,
262656, 314496, 320264, 327763, 373464, 402597, 439101, 443889, 513000,
513856, 515375, 525824, 558441, 593047, 684019, 704977, 805688, 842751,
885248, 886464, 920673, 955016, 984067, 994688, 1009736, 1016496, 1061424,
1073375, 1075032, 1080891, 1092728, 1195112, 1260441, 1323712, 1331064,
1370304, 1407672, 1533357, 1566728, 1609272, 1728216, 1729000, 1734264,
1774656, 1845649, 2048391, 2101248, 2301299, 2418271, 2515968, 2562112,
2585375, 2622104, 2691451, 2864288, 2987712, 2991816, 3220776, 3242197,
3375001, 3375008, 3511872, 3512808, 3551112, 3587409, 3628233, 3798613,
3813992, 4033503, 4104000, 4110848, 4123000, 4174281, 4206592, 4342914,
4467528, 4505949, 4511808, 4607064, 4624776, 4673088, …]
Here is a code for benchmarking:
// run taxicab(n) for k trials and return the average running time
function benchmark(n, k) {
let t = 0;
k = k || 1; // how many times to repeat the trial to get an averaged result
for(let i = 0; i < k; i++) {
let t1 = new Date();
taxicab(n);
let t2 = new Date();
t += t2 - t1;
}
return Math.round(t/k);
}
Finally, I tested it:
let T = benchmark(1E7, 3); // 1376 - running time for n = 10 million
let T2 = benchmark(2E7, 3);// 4821 - running time for n = 20 million
let powerLaw = Math.log2(T2/T); // 1.3206693816701993
So it means time is proportional to n^1.32 in this test. Repeating this many times with different values always yields around the same result: from 1.3 to 1.4.
First of all, we will construct the taxicab numbers instead of searching for them. The range we will use to construct a taxicab number i.e Ta(2) will go up to n^1/3 not n. Because if you cube a number bigger than n^1/3 it will be bigger than n and also we can't cube negative numbers to prevent that case by definition. We will use a HashSet to remember the sums of two cubed numbers in the algorithm. This will help us to lookup previous cubed sums in O(1) time while we are iterating over every possible pair of numbers in the range I mentioned earlier.
Time complexity: O(n^2/3)
Space complexity: O(n^1/3)
def taxicab_numbers(n: int) -> list[int]:
taxicab_numbers = []
max_num = math.floor(n ** (1. / 3.))
seen_sums = set()
for i in range(1, max_num + 1):
for j in range(i, max_num + 1):
cube_sum = i ** 3 + j ** 3
if cube_sum in seen_sums:
taxicab_numbers.append(cube_sum)
else:
seen_sums.add(cube_sum)
return taxicab_numbers
import java.util.*;
public class A5Q24 {
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
System.out.println("Enter number:");
int n = sc.nextInt();
// start checking every int less than the input
for (int a = 2;a <= n;a++) {
int count = 0;
// number of ways that number be expressed in sum of two number cubes
for (int i = 1; Math.pow(i, 3) < a; i++) {
// if the cube of number smaller is greater than the number than it goes out
for (int j = 1; j <= i; j++) {
if (Math.pow(i, 3) + Math.pow(j, 3) == a)
count++;
}
}
if (count == 2)
System.out.println(a);
}
sc.close();
}
}
I think we can also do better on time (O (N ^ 2)) with O(N ^ 2) memory, using a hashmap to check if a pair of cubes has already be seen. In Python:
def find_taxicab_numbers(n: int) -> List[Tuple[int, int, int, int, int]]:
"""
find all taxicab numbers smaller than n, i.e. integers that can be expressed as the sum of two cubes of positive
integers in two different ways so that a^3 + b^3 = c^3 + d^3.
Time: O(n ^ 2) (two loops, one dict lookup). Space: O(n ^ 2)) (all possible cubes)
:param n: upper bound for a, b, c, d
:return: list of tuples of int: a, b, c, d, and taxicab numbers
"""
cubes = [i ** 3 for i in range(n)]
seen_sum_cubes = dict() # mapping sum cubes -> a, b
taxicabs = list() # list of a, b, c, d, taxicab
# check all possible sums of cubes
for i in range(n):
for j in range(i):
sum_cubes = cubes[i] + cubes[j]
if sum_cubes in seen_sum_cubes:
prev_i, prev_j = seen_sum_cubes[sum_cubes]
taxicabs.append((i, j, prev_i, prev_j, sum_cubes))
else:
seen_sum_cubes[sum_cubes] = (i, j)
return taxicabs

How many moves to reach a destination? Efficient flood filling

I want to compute the distance of cells from a destination cell, using number of four-way movements to reach something. So the the four cells immediately adjacent to the destination have a distance of 1, and those on the four cardinal directions of each of them have a distance of 2 and so on. There is a maximum distance that might be around 16 or 20, and there are cells that are occupied by barriers; the distance can flow around them but not through them.
I want to store the output into a 2D array, and I want to be able to compute this 'distance map' for any destination on a bigger maze map very quickly.
I am successfully doing it with a variation on a flood fill where the I place incremental distance of the adjacent unfilled cells in a priority queue (using C++ STL).
I am happy with the functionality and now want to focus on optimizing the code, as it is very performance sensitive.
What cunning and fast approaches might there be?
I think you have done everything right. If you coded it correct it takes O(n) time and O(n) memory to compute flood fill, where n is the number of cells, and it can be proven that it's impossible to do better (in general case). And after fill is complete you just return distance for any destination with O(1), it easy to see that it also can be done better.
So if you want to optimize performance, you can only focused on CODE LOCAL OPTIMIZATION. Which will not affect asymptotic but can significantly improve your real execution time. But it's hard to give you any advice for code optimization without actually seeing source.
So if you really want to see optimized code see the following (Pure C):
include
int* BFS()
{
int N, M; // Assume we have NxM grid.
int X, Y; // Start position. X, Y are unit based.
int i, j;
int movex[4] = {0, 0, 1, -1}; // Move on x dimension.
int movey[4] = {1, -1, 0, 0}; // Move on y dimension.
// TO DO: Read N, M, X, Y
// To reduce redundant functions calls and memory reallocation
// allocate all needed memory once and use a simple arrays.
int* map = (int*)malloc((N + 2) * (M + 2));
int leadDim = M + 2;
// Our map. We use one dimension array. map[x][y] = map[leadDim * x + y];
// If (x,y) is occupied then map[leadDim*x + y] = -1;
// If (x,y) is not visited map[leadDim*x + y] = -2;
int* queue = (int*)malloc(N*M);
int first = 0, last =1;
// Fill the boarders to simplify the code and reduce conditions
for (i = 0; i < N+2; ++i)
{
map[i * leadDim + 0] = -1;
map[i * leadDim + M + 1] = -1;
}
for (j = 0; j < M+2; ++j)
{
map[j] = -1;
map[(N + 1) * leadDim + j] = -1;
}
// TO DO: Read the map.
queue[first] = X * leadDim + Y;
map[X * leadDim + Y] = 0;
// Very simple optimized process loop.
while (first < last)
{
int current = queue[first];
int step = map[current];
for (i = 0; i < 4; ++i)
{
int temp = current + movex[i] * leadDim + movey[i];
if (map[temp] == -2) // only one condition in internal loop.
{
map[temp] = step + 1;
queue[last++] = temp;
}
}
++first;
}
free(queue);
return map;
}
Code may seems tricky. And of course, it doesn't look like OOP (I actually think that OOP fans will hate it) but if you want something really fast that's what you need.
It's common task for BFS. Complexity is O(cellsCount)
My c++ implementation:
vector<vector<int> > GetDistance(int x, int y, vector<vector<int> > cells)
{
const int INF = 0x7FFFFF;
vector<vector<int> > distance(cells.size());
for(int i = 0; i < distance.size(); i++)
distance[i].assign(cells[i].size(), INF);
queue<pair<int, int> > q;
q.push(make_pair(x, y));
distance[x][y] = 0;
while(!q.empty())
{
pair<int, int> curPoint = q.front();
q.pop();
int curDistance = distance[curPoint.first][curPoint.second];
for(int i = -1; i <= 1; i++)
for(int j = -1; j <= 1; j++)
{
if( (i + j) % 2 == 0 ) continue;
pair<int, int> nextPoint(curPoint.first + i, curPoint.second + j);
if(nextPoint.first >= 0 && nextPoint.first < cells.size()
&& nextPoint.second >= 0 && nextPoint.second < cells[nextPoint.first].size()
&& cells[nextPoint.first][nextPoint.second] != BARRIER
&& distance[nextPoint.first][nextPoint.second] > curDistance + 1)
{
distance[nextPoint.first][nextPoint.second] = curDistance + 1;
q.push(nextPoint);
}
}
}
return distance;
}
Start with a recursive implementation: (untested code)
int visit( int xy, int dist) {
int ret =1;
if (array[xy] <= dist) return 0;
array[xy] = dist;
if (dist == maxdist) return ret;
ret += visit ( RIGHT(xy) , dist+1);
...
same for left, up, down
...
return ret;
}
You'l need to handle the initalisation and the edge-cases. And you have to decide if you want a two dimentional array or a one dimensonal array.
A next step could be to use a todo list and remove the recursion, and a third step could be to add some bitmasking.
8-bit computers in the 1970s did this with an optimization that has the same algorithmic complexity, but in the typical case is much faster on actual hardware.
Starting from the initial square, scan to the left and right until "walls" are found. Now you have a "span" that is one square tall and N squares wide. Mark the span as "filled," in this case each square with the distance to the initial square.
For each square above and below the current span, if it's not a "wall" or already filled, pick it as the new origin of a span.
Repeat until no new spans are found.
Since horizontal rows tend to be stored contiguously in memory, this algorithm tends to thrash the cache far less than one that has no bias for horizontal searches.
Also, since in the most common cases far fewer items are pushed and popped from a stack (spans instead of individual blocks) there is less time spent maintaining the stack.

3-PARTITION problem

here is another dynamic programming question (Vazirani ch6)
Consider the following 3-PARTITION
problem. Given integers a1...an, we
want to determine whether it is
possible to partition of {1...n} into
three disjoint subsets I, J, K such
that
sum(I) = sum(J) = sum(K) = 1/3*sum(ALL)
For example, for input (1; 2; 3; 4; 4;
5; 8) the answer is yes, because there
is the partition (1; 8), (4; 5), (2;
3; 4). On the other hand, for input
(2; 2; 3; 5) the answer is no. Devise
and analyze a dynamic programming
algorithm for 3-PARTITION that runs in
time poly- nomial in n and (Sum a_i)
How can I solve this problem? I know 2-partition but still can't solve it
It's easy to generalize 2-sets solution for 3-sets case.
In original version, you create array of boolean sums where sums[i] tells whether sum i can be reached with numbers from the set, or not. Then, once array is created, you just see if sums[TOTAL/2] is true or not.
Since you said you know old version already, I'll describe only difference between them.
In 3-partition case, you keep array of boolean sums, where sums[i][j] tells whether first set can have sum i and second - sum j. Then, once array is created, you just see if sums[TOTAL/3][TOTAL/3] is true or not.
If original complexity is O(TOTAL*n), here it's O(TOTAL^2*n).
It may not be polynomial in the strictest sense of the word, but then original version isn't strictly polynomial too :)
I think by reduction it goes like this:
Reducing 2-partition to 3-partition:
Let S be the original set, and A be its total sum, then let S'=union({A/2},S).
Hence, perform a 3-partition on the set S' yields three sets X, Y, Z.
Among X, Y, Z, one of them must be {A/2}, say it's set Z, then X and Y is a 2-partition.
The witnesses of 3-partition on S' is the witnesses of 2-partition on S, thus 2-partition reduces to 3-partition.
If this problem is to be solvable; then sum(ALL)/3 must be an integer. Any solution must have SUM(J) + SUM(K) = SUM(I) + sum(ALL)/3. This represents a solution to the 2-partition problem over concat(ALL, {sum(ALL)/3}).
You say you have a 2-partition implementation: use it to solve that problem. Then (at least) one of the two partitions will contain the number sum(ALL)/3 - remove the number from that partion, and you've found I. For the other partition, run 2-partition again, to split J from K; after all, J and K must be equal in sum themselves.
Edit: This solution is probably incorrect - the 2-partition of the concatenated set will have several solutions (at least one for each of I, J, K) - however, if there are other solutions, then the "other side" may not consist of the union of two of I, J, K, and may not be splittable at all. You'll need to actually think, I fear :-).
Try 2: Iterate over the multiset, maintaining the following map: R(i,j,k) :: Boolean which represents the fact whether up to the current iteration the numbers permit division into three multisets that have sums i, j, k. I.e., for any R(i,j,k) and next number n in the next state R' it holds that R'(i+n,j,k) and R'(i,j+n,k) and R'(i,j,k+n). Note that the complexity (as per the excersize) depends on the magnitude of the input numbers; this is a pseudo-polynomialtime algorithm. Nikita's solution is conceptually similar but more efficient than this solution since it doesn't track the third set's sum: that's unnecessary since you can trivially compute it.
As I have answered in same another question like this, the C++ implementation would look something like this:
int partition3(vector<int> &A)
{
int sum = accumulate(A.begin(), A.end(), 0);
if (sum % 3 != 0)
{
return false;
}
int size = A.size();
vector<vector<int>> dp(sum + 1, vector<int>(sum + 1, 0));
dp[0][0] = true;
// process the numbers one by one
for (int i = 0; i < size; i++)
{
for (int j = sum; j >= 0; --j)
{
for (int k = sum; k >= 0; --k)
{
if (dp[j][k])
{
dp[j + A[i]][k] = true;
dp[j][k + A[i]] = true;
}
}
}
}
return dp[sum / 3][sum / 3];
}
Let's say you want to partition the set $X = {x_1, ..., x_n}$ in $k$ partitions.
Create a $ n \times k $ table. Assume the cost $M[i,j]$ be the maximum sum of $i$ elements in $j$ partitions. Just recursively use the following optimality criterion to fill it:
M[n,k] = min_{i\leq n} max ( M[i, k-1], \sum_{j=i+1}^{n} x_i )
Using these initial values for the table:
M[i,1] = \sum_{j=1}^{i} x_i and M[1,j] = x_j
The running time is $O(kn^2)$ (polynomial )
Create a three dimensional array, where size is count of elements, and part is equal to to sum of all elements divided by 3. So each cell of array[seq][sum1][sum2] tells can you create sum1 and sum2 using max seq elements from given array A[] or not. So compute all values of array, result will be in cell array[using all elements][sum of all element / 3][sum of all elements / 3], if you can create two sets without crossing equal to sum/3, there will be third set.
Logic of checking: exlude A[seq] element to third sum(not stored), check cell without element if it has same two sums; OR include to sum1 - if it is possible to get two sets without seq element, where sum1 is smaller by value of element seq A[seq], and sum2 isn't changed; OR include to sum2 check like previous.
int partition3(vector<int> &A)
{
int part=0;
for (int a : A)
part += a;
if (part%3)
return 0;
int size = A.size()+1;
part = part/3+1;
bool array[size][part][part];
//sequence from 0 integers inside to all inside
for(int seq=0; seq<size; seq++)
for(int sum1=0; sum1<part; sum1++)
for(int sum2=0;sum2<part; sum2++) {
bool curRes;
if (seq==0)
if (sum1 == 0 && sum2 == 0)
curRes = true;
else
curRes= false;
else {
int curInSeq = seq-1;
bool excludeFrom = array[seq-1][sum1][sum2];
bool includeToSum1 = (sum1>=A[curInSeq]
&& array[seq-1][sum1-A[curInSeq]][sum2]);
bool includeToSum2 = (sum2>=A[curInSeq]
&& array[seq-1][sum1][sum2-A[curInSeq]]);
curRes = excludeFrom || includeToSum1 || includeToSum2;
}
array[seq][sum1][sum2] = curRes;
}
int result = array[size-1][part-1][part-1];
return result;
}
Another example in C++ (based on the previous answers):
bool partition3(vector<int> const &A) {
int sum = 0;
for (int i = 0; i < A.size(); i++) {
sum += A[i];
}
if (sum % 3 != 0) {
return false;
}
vector<vector<vector<int>>> E(A.size() + 1, vector<vector<int>>(sum / 3 + 1, vector<int>(sum / 3 + 1, 0)));
for (int i = 1; i <= A.size(); i++) {
for (int j = 0; j <= sum / 3; j++) {
for (int k = 0; k <= sum / 3; k++) {
E[i][j][k] = E[i - 1][j][k];
if (A[i - 1] <= k) {
E[i][j][k] = max(E[i][j][k], E[i - 1][j][k - A[i - 1]] + A[i - 1]);
}
if (A[i - 1] <= j) {
E[i][j][k] = max(E[i][j][k], E[i - 1][j - A[i - 1]][k] + A[i - 1]);
}
}
}
}
return (E.back().back().back() / 2 == sum / 3);
}
You really want Korf's Complete Karmarkar-Karp algorithm (http://ac.els-cdn.com/S0004370298000861/1-s2.0-S0004370298000861-main.pdf, http://ijcai.org/papers09/Papers/IJCAI09-096.pdf). A generalization to three-partitioning is given. The algorithm is surprisingly fast given the complexity of the problem, but requires some implementation.
The essential idea of KK is to ensure that large blocks of similar size appear in different partitions. One groups pairs of blocks, which can then be treated as a smaller block of size equal to the difference in sizes that can be placed as normal: by doing this recursively, one ends up with small blocks that are easy to place. One then does a two-coloring of the block groups to ensure that the opposite placements are handled. The extension to 3-partition is a bit complicated. The Korf extension is to use depth-first search in KK order to find all possible solutions or to find a solution quickly.

Iterating shuffled [0..n) without arrays

I know of a couple of routines that work as follows:
Xn+1 = Routine(Xn, max)
For example, something like a LCG generator:
Xn+1 = (a*Xn + c) mod m
There isn't enough parameterization in this generator to generate every sequence.
Dream Function:
Xn+1 = Routine(Xn, max, permutation number)
This routine, parameterized by an index into the set of all permutations, would return the next number in the sequence. The sequence may be arbitrarily large (so storing the array and using factoradic numbers is impractical.
Failing that, does anyone have pointers to similar functions that are either stateless or have a constant amount of state for arbitrary 'max', such that they will iterate a shuffled list.
There are n! permutations of n elements. Storing which one you're using requires at least log(n!) / log(2) bits. By Stirling's approximation, this takes roughly n log(n) / log (2) bits.
Explicitly storing one index takes log(n) / log(2) bits. Storing all n, as in an array of indices takes n times as many, or again n log(n) / log(2). Information-theoretically, there is no better way than explicitly storing the permutation.
In other words, the index you pass in of what permutation in the set you want takes the same asymptotic storage space as just writing out the permutation. If, for, example, you limit the index of the permutation to 32 bit values, you can only handle permutations of up to 12 elements. 64 bit indices only get you up to 20 elements.
As the index takes the same space as the permutation would, either change your representation to just use the permutation directly, or accept unpacking into an array of size N.
From my response to another question:
It is actually possible to do this in
space proportional to the number of
elements selected, rather than the
size of the set you're selecting from,
regardless of what proportion of the
total set you're selecting. You do
this by generating a random
permutation, then selecting from it
like this:
Pick a block cipher, such as TEA
or XTEA. Use XOR folding to
reduce the block size to the smallest
power of two larger than the set
you're selecting from. Use the random
seed as the key to the cipher. To
generate an element n in the
permutation, encrypt n with the
cipher. If the output number is not in
your set, encrypt that. Repeat until
the number is inside the set. On
average you will have to do less than
two encryptions per generated number.
This has the added benefit that if
your seed is cryptographically secure,
so is your entire permutation.
I wrote about this in much more detail
here.
Of course, there's no guarantee that every permutation can be generated (and depending on your block size and key size, that may not even be possible), but the permutations you can get are highly random (if they weren't, it wouldn't be a good cipher), and you can have as many of them as you want.
If you are wanting a function that takes up less stack space, then you should look into using an iterated version, rather than a function. You can also use a datastructure like a TreeMap, and have it stored on disk, and read on an as needed basis.
X(n+1) = Routine(Xn, max, permutation number)
for(i = n; i > 0; i--)
{
int temp = Map.lookup(i)
otherfun(temp,max,perm)
}
Is it possible to index a set of permutations without previously computing and storing the whole thing in memory? I tried something like this before and didn't find a solution - I think it is impossible (in the mathematical sense).
Disclaimer: I may have misunderstood your question...
Code that uses an iterate interface. Time complexity is O(n^2), Space complexity has an overhead of: copy of n (log n bits), an iteration variable (log n bits), keeping track of n-i (log n bits), , copy of current value (log n bits), copy of p (n log n bits), creation of next value (log n bits), and a bit set of used values (n bits). You can't avoid an overhead of n log n bits. Timewise, this is also O(n^2), for setting the bits. This can be reduced a bit, but at the cost of using a decorated tree to store the used values.
This can be altered to use arbitrary precision integers and bit sets by using calls to the appropriate libraries instead, and the above bounds will actually start to kick in, rather than being capped at N=8, portably (an int can be the same as a short, and as small as 16 bits). 9! = 362880 > 65536 = 2^16
#include <math.h>
#include <stdio.h>
typedef signed char index_t;
typedef unsigned int permutation;
static index_t permutation_next(index_t n, permutation p, index_t value)
{
permutation used = 0;
for (index_t i = 0; i < n; ++i) {
index_t left = n - i;
index_t digit = p % left;
p /= left;
for (index_t j = 0; j <= digit; ++j) {
if (used & (1 << j)) {
digit++;
}
}
used |= (1 << digit);
if (value == -1) {
return digit;
}
if (value == digit) {
value = -1;
}
}
/* value not found */
return -1;
}
static void dump_permutation(index_t n, permutation p)
{
index_t value = -1;
fputs("[", stdout);
value = permutation_next(n, p, value);
while (value != -1) {
printf("%d", value);
value = permutation_next(n, p, value);
if (value != -1) {
fputs(", ", stdout);
}
}
puts("]");
}
static int factorial(int n)
{
int prod = 1;
for (int i = 1; i <= n; ++i) {
prod *= i;
}
return prod;
}
int main(int argc, char **argv)
{
const index_t n = 4;
const permutation max = factorial(n);
for (permutation p = 0; p < max; ++p) {
dump_permutation(n, p);
}
}
Code that unpacks a permutation index into an array, with a certain mapping from index to permutation. There are loads of others, but this one is convenient.
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
typedef unsigned char index_t;
typedef unsigned int permutation;
static void permutation_to_array(index_t *indices, index_t n, permutation p)
{
index_t used = 0;
for (index_t i = 0; i < n; ++i) {
index_t left = n - i;
index_t digit = p % left;
for (index_t j = 0; j <= digit; ++j) {
if (used & (1 << j)) {
digit++;
}
}
used |= (1 << digit);
indices[i] = digit;
p /= left;
}
}
static void dump_array(index_t *indices, index_t n)
{
fputs("[", stdout);
for (index_t i = 0; i < n; ++i) {
printf("%d", indices[i]);
if (i != n - 1) {
fputs(", ", stdout);
}
}
puts("]");
}
static int factorial(int n)
{
int prod = 1;
for (int i = 1; i <= n; ++i) {
prod *= i;
}
return prod;
}
int main(int argc, char **argv)
{
const index_t n = 4;
const permutation max = factorial(n);
index_t *indices = malloc(n * sizeof (*indices));
for (permutation p = 0; p < max; ++p) {
permutation_to_array(indices, n, p);
dump_array(indices, n);
}
free(indices);
}

Resources