Deriving a mathematical formulation for my recursive solution? - algorithm

In the question we have items with different values, but the items' weights don't matter. We have a profit goal that we want to reach by picking those items, we want to use the least number of items, and each item is available in unlimited quantity.
So let's say our goal is 10 and we have items with values of 1, 2, 3, 4. We would rather pick 4, 4, 2 than 3, 3, 3, 1: they have the same total value, but what we want is the least number of items for the same profit.
I have already derived both dynamic and recursive methods to solve it, but the trouble for me is that I just cannot derive a mathematical formula for my recursive method.
Here is my recursive method:
static int recursiveSolution(int[] values, int goal, int totalAmount) {
    if (goal == 0)
        return totalAmount;
    if (goal < 0)
        return Integer.MAX_VALUE;
    totalAmount++;
    int minAmount = Integer.MAX_VALUE;
    for (int i = 0; i < values.length; i++) {
        minAmount = Math.min(minAmount, recursiveSolution(values, goal - values[i], totalAmount));
    }
    return minAmount;
}

Assuming that what you are asking for is the Big-O of your algorithm...
This looks like a simple brute-force summation search. Unfortunately, the run-time of such searches is not easy to estimate and depends on the input values as well as their count (N). The Big-O estimates that I have seen for this type of solution are something like this:
Let
N = the number of values in the `values[]` array,
Gv = PRODUCT(values[])^(1/N) (the geometric mean of the `values[]` array),
M = the target summation (`goal`).
Then the complexity of the algorithm is (very approximately)
O(N^(M/Gv))
if I recall this correctly.
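If you want to sanity-check an estimate like that empirically, one option is to instrument the recursion and count calls for increasing goals. A rough sketch (the call counter and method name below are my own additions, not part of the question's code):
static long calls = 0;

static int countedSolution(int[] values, int goal, int totalAmount) {
    calls++;                                   // count every invocation
    if (goal == 0)
        return totalAmount;
    if (goal < 0)
        return Integer.MAX_VALUE;
    int minAmount = Integer.MAX_VALUE;
    for (int value : values) {
        minAmount = Math.min(minAmount, countedSolution(values, goal - value, totalAmount + 1));
    }
    return minAmount;
}
Plotting `calls` against the goal for a fixed `values[]` should show roughly the N^(M/Gv) growth described above.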

Related

Big-O & Running Time of this algorithm, and how can I convert it to an iterative algorithm

What is the running time of this algorithm in Big-O, and how do I convert it to an iterative algorithm?
public static int RecursiveMaxOfArray(int[] array) {
    int array1[] = new int[array.length / 2];
    int array2[] = new int[array.length - (array.length / 2)];
    for (int index = 0; index < array.length / 2; index++) {
        array1[index] = array[index];
    }
    for (int index = array.length / 2; index < array.length; index++) {
        array2[index - array.length / 2] = array[index];
    }
    if (array.length > 1) {
        if (RecursiveMaxOfArray(array1) > RecursiveMaxOfArray(array2)) {
            return RecursiveMaxOfArray(array1);
        }
        else {
            return RecursiveMaxOfArray(array2);
        }
    }
    return array[0];
}
At each stage, an array of size N is divided into equal halves. The function is then recursively called three times on an array of size N/2. Why three instead of the four that are written? Because the if statement only enters one of its clauses. Therefore the recurrence relation is T(N) = 3T(N/2) + O(N), which (using the Master theorem) gives O(N^(log2 3)) ≈ O(N^1.58).
However, you don't need to call it a third time; just cache the return result of each recursive call in a local variable. The coefficient 3 in the recurrence relation then becomes 2; I'll leave it to you to apply the Master theorem to the new recurrence.
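A sketch of the cached version (my rewrite, not the asker's code), with the base case handled up front:
public static int recursiveMaxOfArray(int[] array) {
    if (array.length == 1)
        return array[0];
    int[] left = java.util.Arrays.copyOfRange(array, 0, array.length / 2);
    int[] right = java.util.Arrays.copyOfRange(array, array.length / 2, array.length);
    // cache each recursive result so every half is evaluated exactly once
    int leftMax = recursiveMaxOfArray(left);
    int rightMax = recursiveMaxOfArray(right);
    return Math.max(leftMax, rightMax);
}
With two recursive calls per level the recurrence becomes T(N) = 2T(N/2) + O(N), which the Master theorem resolves to O(N log N).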
There's another answer that accurately describes your algorithm's runtime complexity, how to determine it, and how to improve it, so I won't focus on that. Instead, let's look at the other part of your question:
how [do] I convert this to [an] iterative algorithm?
Well, there's a straightforward solution to that which you hopefully could have worked out yourself: loop over the array and track the largest value you've seen so far.
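For this particular function, that might look like the following (a minimal sketch):
public static int iterativeMaxOfArray(int[] array) {
    int max = array[0];
    for (int i = 1; i < array.length; i++) {
        if (array[i] > max) {
            max = array[i];   // track the largest value seen so far
        }
    }
    return max;
}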
However, I'm guessing your question is better phrased as this:
How do I convert a recursive algorithm into an iterative algorithm?
There are plenty of questions and answers on this, not just here on StackOverflow, so I suggest you do some more research on the subject. These blog posts on converting recursion to iteration may be an excellent place to start if this is the approach you take, though I can't vouch for them because I haven't read them. I just googled "convert recursion to iteration," picked the first result, and then found this page, which links to all four of the blog posts.

Modification to subset sum algorithm by Pisinger

I was looking at the algorithm by Pisinger as detailed here:
Fast solution to Subset sum algorithm by Pisinger
and on Wikipedia: http://en.wikipedia.org/wiki/Subset_sum_problem
For the case that each xi is positive and bounded by a fixed constant C, Pisinger found a linear time algorithm having time complexity O(NC).[3] (Note that this is for the version of the problem where the target sum is not necessarily zero, otherwise the problem would be trivial.)
It seems that, with his approach, there are two constraints. The first one in particular says that all values in the input must be <= C.
It sounds to me that, with just that constraint alone, this is not an algorithm that can solve the original subset sum problem (with no restrictions).
But suppose C = 1000000, for example. Before we even run the algorithm on the input list, can't we just divide every element by some number d (and also divide the target sum by d) such that every number in the input list becomes <= C?
When the algorithm returns some subset s, we just multiply every element in s by d, and we have our true subset.
With that observation, it feels like it is not worth mentioning that we need the C constraint, or it should at least be said that the input can be changed W.L.O.G. (without loss of generality). So we can still solve the same problem no matter the input; that constraint does not make the problem any harder, in other words. So really his algorithm's only constraint is that it can only handle numbers >= 0.
Am I right?
Thanks
What adrian.budau said is correct.
In the polynomial (DP) solution, every value in the array must remain an integer after the division you describe;
otherwise the DP solution no longer works, because the item values are used as indices into the DP array.
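To make that concrete: you can only shrink the instance by a d that divides every element and the target sum, i.e. a divisor of their greatest common divisor. A minimal sketch of that check (my own illustration in Java; it is not part of Pisinger's algorithm):
// Largest d by which the instance can be divided while keeping every
// value (and the target) an integer: the gcd of all of them.
static int largestValidDivisor(int[] values, int target) {
    int g = target;
    for (int v : values) {
        g = gcd(g, v);
    }
    return g;   // if g == 1, the instance cannot be shrunk at all
}

static int gcd(int a, int b) {
    return b == 0 ? a : gcd(b, a % b);
}
In particular, if the values share no common factor, no choice of d helps, which is why the bound C is a real restriction and not just a normalisation.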
Following is my polynomial (DP) solution for the subset sum problem (with C = MAX_ELEMENT_VALUE):
#include <cstring>
#include <vector>

bool subsetsum_dp(std::vector<int>& v, int sum)
{
    const int MAX_ELEMENT = 100;
    const int MAX_ELEMENT_VALUE = 1000;
    const int MAX_SUM = MAX_ELEMENT * MAX_ELEMENT_VALUE;

    static int dp[MAX_ELEMENT + 1][MAX_SUM + 1];
    memset(dp, -1, sizeof(dp));

    int n = (int)v.size();
    dp[0][0] = 1, dp[0][v[0]] = 1;              // exclude / include the first value
    for (int i = 1; i < n; i++)
    {
        for (int j = 0; j <= MAX_SUM; j++)
        {
            if (dp[i - 1][j] != -1)             // sum j reachable with the first i items
            {
                dp[i][j] = 1;                   // exclude v[i]
                if (j + v[i] <= MAX_SUM)
                    dp[i][j + v[i]] = 1;        // include v[i]
            }
        }
    }
    return dp[n - 1][sum] == 1;
}

Coin Change analysis

{1,3,5} denomination coins; Sum = 11.
Find the minimum number of coins that can be used to make the sum.
(We can use any number of coins of each denomination.)
I searched for the run-time complexity of this coin change problem, particularly for the dynamic programming method, but was not able to find an explanation anywhere.
How do I calculate the complexity of the non-dynamic solution, and how does it change for the dynamic one? (Not the greedy one.)
Edit:
Here is the implementation for which the analysis was asked:
public int findCoinChange(int[] coins, int sum, int count) {
    int ret = 0, maxRet = -1;
    if (sum == 0) maxRet = count;
    else if (sum < 0) maxRet = -1;
    else {
        for (int i : coins) {
            ret = findCoinChange(coins, sum - i, count + 1);
            if (maxRet < 0) maxRet = ret;
            else if (ret >= 0 && ret < maxRet) {
                maxRet = ret;
            }
        }
    }
    if (maxRet < 0) return -1;
    else return maxRet;
}
Looks like a combinatorial explosion to me; however, I am not sure how to deduce a run-time complexity for it.
The dynamic programming solution to this problem is clearly O(k * n) (nested loops, blah blah blah) where k is the number of coins and n is the amount of money that change is being made for.
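For reference, a bottom-up sketch of that table (my own illustration, not the asker's code), where dp[x] is the minimum number of coins needed to make amount x:
static int minCoins(int[] coins, int sum) {
    int[] dp = new int[sum + 1];
    java.util.Arrays.fill(dp, Integer.MAX_VALUE);
    dp[0] = 0;                                   // zero coins make amount 0
    for (int x = 1; x <= sum; x++) {             // n amounts ...
        for (int c : coins) {                    // ... times k coins
            if (c <= x && dp[x - c] != Integer.MAX_VALUE) {
                dp[x] = Math.min(dp[x], dp[x - c] + 1);
            }
        }
    }
    return dp[sum] == Integer.MAX_VALUE ? -1 : dp[sum];
}
For coins {1, 3, 5} and sum 11 this returns 3 (5 + 5 + 1), and the two nested loops are where the O(k * n) bound comes from.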
I don't know what you mean by a non-dynamic-programming solution. Sorry, you're going to have to specify what algorithm you mean. The greedy algorithm fails in some cases, so you shouldn't be referring to that. Do you mean the linear programming solution? That's a terrible approach to this problem because we don't know what its complexity is, and it's possible to make it run arbitrarily slowly.
I also don't know what you mean by "change it for the dynamic one."

Algorithm on interview

Recently I was asked the following interview question:
You have two sets of numbers of the same length N, for example A = [3, 5, 9] and B = [7, 5, 1]. Next, for each position i in the range 0..N-1, you can pick either number A[i] or B[i], so at the end you will have another array C of length N which consists of elements from A and B. If the sum of all elements in C is less than or equal to K, then such an array is good. Please write an algorithm to figure out the total number of good arrays by given arrays A, B and number K.
The only solution I've come up with is a dynamic programming approach, where we have a matrix of size N×K and M[i][j] represents how many combinations we can have at position i if the current sum is equal to j. But it looks like they expected me to come up with a formula. Could you please help me with that? At least, what direction should I look in? I will appreciate any help. Thanks.
After some consideration, I believe this is an NP-complete problem. Consider:
A = [0, 0, 0, ..., 0]
B = [b1, b2, b3, ..., bn]
Note that every construction of the third set C = ( A[i] or B[i] for i = 0..n ) is just the union of some subset of A and some subset of B. In this case, since every subset of A sums to 0, the sum of C is the same as the sum of some subset of B.
Now your question "How many ways can we construct C with a sum less than K?" can be restated as "How many subsets of B sum to less than K?". Solving this problem for K = 1 and K = 0 yields the solution to the subset sum problem for B (the difference between the two solutions is the number of subsets that sum to 0).
By similar argument, even in the general case where A contains nonzero elements, we can construct an array S = [b1-a1, b2-a2, b3-a3, ..., bn-an], and the question becomes "How many subsets of S sum to less than K - sum(A)?"
Since the subset sum problem is NP-complete, this problem must be also. So with that in mind, I would venture that the dynamic programming solution you proposed is the best you can do, and certainly no magic formula exists.
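For completeness, here is a sketch of that counting DP (my own illustration; it counts choices of A[i] versus B[i], like the brute-force answer below, and assumes nonnegative elements and K >= 0):
static long countGoodArrays(int[] a, int[] b, int k) {
    // count[j] = number of ways to pick from the first i positions with sum exactly j
    long[] count = new long[k + 1];
    count[0] = 1;
    for (int i = 0; i < a.length; i++) {
        long[] next = new long[k + 1];
        for (int j = 0; j <= k; j++) {
            if (count[j] == 0) continue;
            if (j + a[i] <= k) next[j + a[i]] += count[j];   // pick A[i]
            if (j + b[i] <= k) next[j + b[i]] += count[j];   // pick B[i]
        }
        count = next;
    }
    long total = 0;
    for (long ways : count) total += ways;                   // every final sum <= K is good
    return total;
}
This is O(N*K) time, which matches the matrix the asker describes; given the NP-completeness argument above, a closed-form formula should not be expected.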
" Please write an algorithm to figure out the total number of good
arrays by given arrays A, B and number K."
Is it not the goal?
int A[];
int B[];
int N;
int K;
int Solutions = 0;

void FindSolutions(int Depth, int theSumSoFar) {
    if (theSumSoFar > K) return;
    if (Depth >= N) {
        Solutions++;
        return;
    }
    FindSolutions(Depth + 1, theSumSoFar + A[Depth]);
    FindSolutions(Depth + 1, theSumSoFar + B[Depth]);
}
Invoke FindSolutions with both arguments set to zero. On return, Solutions will be equal to the number of good arrays.
This is how I would try to solve the problem (sorry if it's naive).
Think of the arrays
A = [3, 5, 9, 8, 2]
B = [7, 5, 1, 8, 2]
With elements indexed 0..N-1, the number of choices is 2^N.
First count the positions where the choice is forced:
C1 = 0, C2 = 0
for all i with A[i] == B[i]
{
    C1++
    C2 += A[i]   // same value either way, so add it once
}
Then create two new arrays with those positions removed, like
A1 = [3, 5, 9]
B1 = [7, 5, 1]
Here C2 is 10 (8 + 2).
The number of choices is now reduced to 2^(N-C1), and you count all the good arrays over A1 and B1 using 'K' as K = K - C2.
Unfortunately, no matter what method you use, you still have to calculate a sum 2^(N-C1) times.
So there are 2^N choices, since at each position you either pick from A or from B. In the specific example you give, where N happens to be 3, there are 8. For discussion you can characterise each set of decisions as a bit pattern.
A brute-force approach would try every single bit pattern.
But what should be obvious is that if the first few bits already produce a total that is too large, then every possible group of tail bits will also produce a total that is too large. So a better way to model it is probably a tree where you don't bother walking down the limbs that have already grown beyond your limit.
You can also compute the maximum total that can be reached from each bit to the end of the table. If at any point your running total plus the maximum that you can obtain from here on down is less than or equal to K, then every subtree from where you are is acceptable without any need for traversal. The case, discussed in the comments, where every single combination is acceptable is a special case of this observation.
As pointed out by Serge below, a related observation is to use minimums and the converse logic to cancel whole subtrees without traversal.
A potential further optimisation rests on the observation that, as long as we shuffle both in the same way, changing the order of A and B has no effect because addition is commutative. You can therefore make an effort to ensure either that the maximums grow as quickly as possible or that the minimums grow as slowly as possible, to try to get the earliest possible exit from traversal. In practice you'd probably want to apply a heuristic comparing the absolute maximum and minimum (both of which you've computed anyway) to K.
That being the case, a recursive implementation is easiest, e.g. (in C)
/* assume A, B and N are known globals */
unsigned int numberOfGoodArraysFromBit(
unsigned int bit,
unsigned int runningTotal,
unsigned int limit)
{
// have we ended up in an unacceptable subtree?
if(runningTotal > limit) return 0;
// have we reached the leaf node without at any
// point finding this subtree to be unacceptable?
if(bit >= N) return 1;
// maybe every subtree is acceptable?
if(runningTotal + MAXV[bit] <= limit)
{
return 1 << (N - bit);
}
// maybe no subtrees are acceptable?
if(runningTotal + MINV[bit] > limit)
{
return 0;
}
// if we can't prima facie judge the subtreees,
// we'll need specifically to evaluate them
return
numberOfGoodArraysFromBit(bit+1, runningTotal+A[bit], limit) +
numberOfGoodArraysFromBit(bit+1, runningTotal+B[bit], limit);
}
// work out the minimum and maximum values at each position
for(int i = 0; i < N; i++)
{
MAXV[i] = MAX(A[i], B[i]);
MINV[i] = MIN(A[i], B[i]);
}
// hence work out the cumulative totals from right to left
for(int i = N-2; i >= 0; i--)
{
MAXV[i] += MAXV[i+1];
MINV[i] += MINV[i+1];
}
// to kick it off
printf("Total valid combinations is %u", numberOfGoodArraysFromBit(0, 0, K));
I'm just thinking extemporaneously; it's likely that better solutions exist.

Most efficient way of randomly choosing a set of distinct integers

I'm looking for the most efficient algorithm to randomly choose a set of n distinct integers, where all the integers are in some range [0..maxValue].
Constraints:
maxValue is larger than n, and possibly much larger
I don't care if the output list is sorted or not
all integers must be chosen with equal probability
My initial idea was to construct a list of the integers [0..maxValue] then extract n elements at random without replacement. But that seems quite inefficient, especially if maxValue is large.
Any better solutions?
Here is an optimal algorithm, assuming that we are allowed to use hashmaps. It runs in O(n) time and space (and not O(maxValue) time, which is too expensive).
It is based on Floyd's random sample algorithm. See my blog post about it for details.
The code is in Java:
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

private static Random rnd = new Random();

public static Set<Integer> randomSample(int max, int n) {
    HashSet<Integer> res = new HashSet<Integer>(n);
    int count = max + 1;
    for (int i = count - n; i < count; i++) {
        Integer item = rnd.nextInt(i + 1);
        if (res.contains(item))
            res.add(i);     // item already taken, so take i instead (Floyd's trick)
        else
            res.add(item);
    }
    return res;
}
For small values of maxValue, such that it is reasonable to generate an array of all the integers in memory, you can use a variation of the Fisher-Yates shuffle, performing only the first n steps.
If n is much smaller than maxValue and you don't wish to generate the entire array, then you can use this algorithm:
Keep a sorted list l of the numbers picked so far, initially empty.
Pick a random number x between 0 and maxValue - (elements in l).
For each number in l, if it is smaller than or equal to x, add 1 to x.
Add the adjusted value of x into the sorted list and repeat.
If n is very close to maxValue then you can instead randomly pick the elements that aren't in the result and then take the complement of that set.
Here is another algorithm that is simpler but has potentially unbounded execution time:
Keep a set s of the elements picked so far, initially empty.
Pick a number at random between 0 and maxValue.
If the number is not in s, add it to s.
Go back to step 2 until s has n elements.
In practice, if n is small and maxValue is large, this will be good enough for most purposes.
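A sketch of that loop (my own illustration, assuming `java.util.HashSet` and `java.util.Random`):
static java.util.Set<Integer> sampleByRejection(int n, int maxValue, java.util.Random rnd) {
    java.util.Set<Integer> s = new java.util.HashSet<Integer>();
    while (s.size() < n) {
        s.add(rnd.nextInt(maxValue + 1));   // duplicates are simply rejected by the set
    }
    return s;
}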
One way to do it without generating the full array:
Say I want a randomly selected subset of m items from a set {x1, ..., xn} where m <= n.
Consider element x1. I add x1 to my subset with probability m/n.
If I do add x1 to my subset then I reduce my problem to selecting (m - 1) items from {x2, ..., xn}.
If I don't add x1 to my subset then I reduce my problem to selecting m items from {x2, ..., xn}.
Lather, rinse, and repeat until m = 0.
This algorithm is O(n) where n is the number of items I have to consider.
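A sketch of that single pass over the candidate range (my own illustration, using `java.util.Random`; not the answerer's code):
// Keep each candidate x with probability (still needed) / (still remaining).
static int[] sampleByScan(int m, int maxValue, java.util.Random rnd) {
    int[] result = new int[m];
    int needed = m;
    int remaining = maxValue + 1;              // candidates are 0..maxValue
    for (int x = 0; x <= maxValue && needed > 0; x++) {
        if (rnd.nextInt(remaining) < needed) { // true with probability needed/remaining
            result[m - needed] = x;
            needed--;
        }
        remaining--;
    }
    return result;
}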
I rather imagine there is an O(m) algorithm where at each step you consider how many elements to remove from the "front" of the set of possibilities, but I haven't convinced myself of a good solution and I have to do some work now!
If you are selecting M elements out of N, the strategy changes depending on whether M is of the same order as N or much less (i.e. less than about N/log N).
If they are similar in size, then you go through each item from 1 to N. You keep track of how many items you've got so far (let's call that m items picked out of n that you've gone through), and then you take the next number with probability (M-m)/(N-n) and discard it otherwise. You then update m and n appropriately and continue. This is a O(N) algorithm with low constant cost.
If, on the other hand, M is significantly less than N, then a resampling strategy is a good one. Here you will want to sort M so you can find them quickly (and that will cost you O(M log M) time--stick them into a tree, for example). Now you pick numbers uniformly from 1 to N and insert them into your list. If you find a collision, pick again. You will collide about M/N of the time (actually, you're integrating from 1/N to M/N), which will require you to pick again (recursively), so you'll expect to take M/(1-M/N) selections to complete the process. Thus, your cost for this algorithm is approximately O(M*(N/(N-M))*log(M)).
These are both such simple methods that you can just implement both--assuming you have access to a sorted tree--and pick the one that is appropriate given the fraction of numbers that will be picked.
(Note that picking numbers is symmetric with not picking them, so if M is almost equal to N, then you can use the resampling strategy, but pick those numbers to not include; this can be a win, even if you have to push all almost-N numbers around, if your random number generation is expensive.)
My solution is the same as Mark Byers'. It takes O(n^2) time, hence it's useful when n is much smaller than maxValue. Here's an implementation in Python:
import bisect
import random

def pick(n, maxValue):
    chosen = []
    for i in range(n):
        r = random.randint(0, maxValue - i)
        for e in chosen:
            if e <= r:
                r += 1
            else:
                break
        bisect.insort(chosen, r)
    return chosen
The trick is to use a variation of shuffle or in other words a partial shuffle.
function random_pick( a, n )
{
    N = len(a);
    n = min(n, N);
    picked = array_fill(0, n, 0); backup = array_fill(0, n, 0);

    // partially shuffle the array, and generate unbiased selection simultaneously
    // this is a variation on fisher-yates-knuth shuffle
    for (i=0; i<n; i++) // O(n) times
    {
        selected = rand( 0, --N ); // unbiased sampling N * N-1 * N-2 * .. * N-n+1
        value = a[ selected ];
        a[ selected ] = a[ N ];
        a[ N ] = value;
        backup[ i ] = selected;
        picked[ i ] = value;
    }

    // restore partially shuffled input array from backup
    // optional step, if needed it can be ignored
    for (i=n-1; i>=0; i--) // O(n) times
    {
        selected = backup[ i ];
        value = a[ N ];
        a[ N ] = a[ selected ];
        a[ selected ] = value;
        N++;
    }
    return picked;
}
NOTE: the algorithm is strictly O(n) in both time and space, produces unbiased selections (it is a partial unbiased shuffling) and does not need hashmaps (which may not be available and/or usually hide a complexity behind their implementation, e.g. fetch time is not O(1); it might even be O(n) in the worst case).
adapted from here
Linear congruential generator modulo maxValue+1. I'm sure I've written this answer before, but I can't find it...
UPDATE: I am wrong. The output of this is not uniformly distributed. Details on why are here.
I think the algorithm below is optimal, i.e. you cannot get better performance than this.
For choosing n numbers out of m numbers, the best algorithm offered so far is presented below. Its worst-case run-time complexity is O(n), and it needs only a single array to store the original numbers. It partially shuffles the first n elements of the original array, and then you pick those first n shuffled numbers as your solution.
This is also a fully working C program. What you will find is:
Function getrand: this is just a PRNG that returns a number from 0 up to (but not including) upto.
Function randselect: this is the function that randomly chooses n unique numbers out of m numbers. It is what this question is about.
Function main: this is only there to demonstrate a use of the other functions, so that you can compile it into a program and have fun.
#include <stdio.h>
#include <stdlib.h>

int getrand(int upto) {
    long int r;
    do {
        r = rand();
    } while (r >= upto); /* reject out-of-range values so every bin stays in 0..end-1 */
    return r;
}

void randselect(int *all, int end, int select) {
    int upto = RAND_MAX - (RAND_MAX % end);
    int binwidth = upto / end;
    int c;
    for (c = 0; c < select; c++) {
        /* randomly choose some bin */
        int bin = getrand(upto) / binwidth;
        /* swap c with bin */
        int tmp = all[c];
        all[c] = all[bin];
        all[bin] = tmp;
    }
}

int main() {
    int end = 1000;
    int select = 5;
    /* initialize all numbers up to end */
    int *all = malloc(end * sizeof(int));
    int c;
    for (c = 0; c < end; c++) {
        all[c] = c;
    }
    /* select select unique numbers randomly */
    srand(0);
    randselect(all, end, select);
    for (c = 0; c < select; c++) printf("%d ", all[c]);
    putchar('\n');
    free(all);
    return 0;
}
Here is the output of an example run where I randomly generate 4-element permutations out of a pool of 8 numbers, 100,000,000 times. I then use those permutations to compute the probability of each unique permutation occurring, and sort them by this probability. You will notice that the numbers are fairly close, which I think means that the selection is uniformly distributed. The theoretical probability should be 1/1680 = 0.000595238095238095; note how close the empirical test is to the theoretical value.
