a dynamic program about subarray sum - algorithm

I've see a problem about dynamic program
like this:
let's say there is a array like this: [600, 500, 300, 220, 210]
I want to find a sub array whose sum is the most closest to 1000 and bigger than it.(>=1000).
how can I write the code? I already understand the 01 backpack problem but still cannot make out this problem

A few things:
First, I think you are referring to "dynamic programming", not "a dynamic program"; read up here if you want to know the difference: https://en.wikipedia.org/wiki/Dynamic_programming
Second, I think you mean "closest to 1000 but NOT bigger than it (< 1000)", since that is the general constraint. If you were allowed to go over 1000, then the problem doesn't make sense because there is no constraint.
Like the backpack problem, this is going to be a non-polynomial (NP) time problem (a problem where the time required to compute increases faster than polynomial growth - usually exponential or faster), where you would normally have to check every possible combination of numbers, which can take a long time for seemingly small set sizes.
I believe that the correct answer from the 5 you provided is 500+220+210, which sums to 930, the largest that you can make without going over 1000.
The basic idea of dynamic programming is to break the problem into smaller similar problems that are more easily computable; for example, if you had a million numbers and wanted to find the subset that is closest to 100000 but not over, you might divide the million into 100,000 subsets of 10 elements, and find the closest to a smaller number of each of those subsets, then use the resulting set of 100,000 sums to repeat with 10,000 sets, etc, until you reduce it to a close-but-not-perfect solution.
In any non-polynomial-time problem, dynamic programming can only be used in building a close approximation, since the solution isn't guaranteed to be optimal.

You can use transaction optimizer from the EmerCoin wallet.
It exacly does, what you're looking for.

An approach to solve this problem can be done in two steps:
define a function which takes a subarray and gives you an evaluation or a score of this subarray so that you can actually compare subarrays and take the best. A function could be simply
if(sum(subarray) < 1000) return INFINITY
else return sum(subarray) - 1000
note that you can also use dynamic programming to compute the sum of subarrays
Assuming that the length of your goal array is N, you will need to solve the problems of size 1 to N. If the array's length is 1 then obviously there is one possibility and it's the best. If size > 1 then we take the solution of the problem with length size - 1 and we compare it with every subarray containing the last element of the array and we take the best subarray as the solution of the problem with length size.
I hope my explanation makes sense

Related

Dynamic algorithm to multiply elements in a sequence two at a time and find the total

I am trying to find a dynamic approach to multiply each element in a linear sequence to the following element, and do the same with the pair of elements, etc. and find the sum of all of the products. Note that any two elements cannot be multiplied. It must be the first with the second, the third with the fourth, and so on. All I know about the linear sequence is that there are an even amount of elements.
I assume I have to store the numbers being multiplied, and their product each time, then check some other "multipliable" pair of elements to see if the product has already been calculated (perhaps they possess opposite signs compared to the current pair).
However, by my understanding of a linear sequence, the values must be increasing or decreasing by the same amount each time. But since there are an even amount of numbers, I don't believe it is possible to have two "multipliable" pairs be the same (with potentially opposite signs), due to the issue shown in the following example:
Sequence: { -2, -1, 0, 1, 2, 3 }
Pairs: -2*-1, 0*1, 2*3
Clearly, since there are an even amount of pairs, the only case in which the same multiplication may occur more than once is if the elements are increasing/decreasing by 0 each time.
I fail to see how this is a dynamic programming question, and if anyone could clarify, it would be greatly appreciated!
A quick google for define linear sequence gave
A number pattern which increases (or decreases) by the same amount each time is called a linear sequence. The amount it increases or decreases by is known as the common difference.
In your case the common difference is 1. And you are not considering any other case.
The same multiplication may occur in the following sequence
Sequence = {-3, -1, 1, 3}
Pairs = -3 * -1 , 1 * 3
with a common difference of 2.
However this is not necessarily to be solved by dynamic programming. You can just iterate over the numbers and store the multiplication of two numbers in a set(as a set contains unique numbers) and then find the sum.
Probably not what you are looking for, but I've found a closed solution for the problem.
Suppose we observe the first two numbers. Note the first number by a, the difference between the numbers d. We then count for a total of 2n numbers in the whole sequence. Then the sum you defined is:
sum = na^2 + n(2n-1)ad + (4n^2 - 3n - 1)nd^2/3
That aside, I also failed to see how this is a dynamic problem, or at least this seems to be a problem where dynamic programming approach really doesn't do much. It is not likely that the sequence will go from negative to positive at all, and even then the chance that you will see repeated entries decreases the bigger your difference between two numbers is. Furthermore, multiplication is so fast the overhead from fetching them from a data structure might be more expensive. (mul instruction is probably faster than lw).

Algorithm to generate k element subsets in order of their sum

If I have an unsorted large set of n integers (say 2^20 of them) and would like to generate subsets with k elements each (where k is small, say 5) in increasing order of their sums, what is the most efficient way to do so?
Why I need to generate these subsets in this fashion is that I would like to find the k-element subset with the smallest sum satisfying a certain condition, and I thus would apply the condition on each of the k-element subsets generated.
Also, what would be the complexity of the algorithm?
There is a similar question here: Algorithm to get every possible subset of a list, in order of their product, without building and sorting the entire list (i.e Generators) about generating subsets in order of their product, but it wouldn't fit my needs due to the extremely large size of the set n
I intend to implement the algorithm in Mathematica, but could do it in C++ or Python too.
If your desired property of the small subsets (call it P) is fairly common, a probabilistic approach may work well:
Sort the n integers (for millions of integers i.e. 10s to 100s of MB of ram, this should not be a problem), and sum the k-1 smallest. Call this total offset.
Generate a random k-subset (say, by sampling k random numbers, mod n) and check it for P-ness.
On a match, note the sum-total of the subset. Subtract offset from this to find an upper bound on the largest element of any k-subset of equivalent sum-total.
Restrict your set of n integers to those less than or equal to this bound.
Repeat (goto 2) until no matches are found within some fixed number of iterations.
Note the initial sort is O(n log n). The binary search implicit in step 4 is O(log n).
Obviously, if P is so rare that random pot-shots are unlikely to get a match, this does you no good.
Even if only 1 in 1000 of the k-sized sets meets your condition, That's still far too many combinations to test. I believe runtime scales with nCk (n choose k), where n is the size of your unsorted list. The answer by Andrew Mao has a link to this value. 10^28/1000 is still 10^25. Even at 1000 tests per second, that's still 10^22 seconds. =10^14 years.
If you are allowed to, I think you need to eliminate duplicate numbers from your large set. Each duplicate you remove will drastically reduce the number of evaluations you need to perform. Sort the list, then kill the dupes.
Also, are you looking for the single best answer here? Who will verify the answer, and how long would that take? I suggest implementing a Genetic Algorithm and running a bunch of instances overnight (for as long as you have the time). This will yield a very good answer, in much less time than the duration of the universe.
Do you mean 20 integers, or 2^20? If it's really 2^20, then you may need to go through a significant amount of (2^20 choose 5) subsets before you find one that satisfies your condition. On a modern 100k MIPS CPU, assuming just 1 instruction can compute a set and evaluate that condition, going through that entire set would still take 3 quadrillion years. So if you even need to go through a fraction of that, it's not going to finish in your lifetime.
Even if the number of integers is smaller, this seems to be a rather brute force way to solve this problem. I conjecture that you may be able to express your condition as a constraint in a mixed integer program, in which case solving the following could be a much faster way to obtain the solution than brute force enumeration. Assuming your integers are w_i, i from 1 to N:
min sum(i) w_i*x_i
x_i binary
sum over x_i = k
subject to (some constraints on w_i*x_i)
If it turns out that the linear programming relaxation of your MIP is tight, then you would be in luck and have a very efficient way to solve the problem, even for 2^20 integers (Example: max-flow/min-cut problem.) Also, you can use the approach of column generation to find a solution since you may have a very large number of values that cannot be solved for at the same time.
If you post a bit more about the constraint you are interested in, I or someone else may be able to propose a more concrete solution for you that doesn't involve brute force enumeration.
Here's an approximate way to do what you're saying.
First, sort the list. Then, consider some length-5 index vector v, corresponding to the positions in the sorted list, where the maximum index is some number m, and some other index vector v', with some max index m' > m. The smallest sum for all such vectors v' is always greater than the smallest sum for all vectors v.
So, here's how you can loop through the elements with approximately increasing sum:
sort arr
for i = 1 to N
for v = 5-element subsets of (1, ..., i)
set = arr{v}
if condition(set) is satisfied
break_loop = true
compute sum(set), keep set if it is the best so far
break if break_loop
Basically, this means that you no longer need to check for 5-element combinations of (1, ..., n+1) if you find a satisfying assignment in (1, ..., n), since any satisfying assignment with max index n+1 will have a greater sum, and you can stop after that set. However, there is no easy way to loop through the 5-combinations of (1, ..., n) while guaranteeing that the sum is always increasing, but at least you can stop checking after you find a satisfying set at some n.
This looks to be a perfect candidate for map-reduce (http://en.wikipedia.org/wiki/MapReduce). If you know of any way of partitioning them smartly so that passing candidates are equally present in each node then you can probably get a great throughput.
Complete sort may not really be needed as the map stage can take care of it. Each node can then verify the condition against the k-tuples and output results into a file that can be aggregated / reduced later.
If you know of the probability of occurrence and don't need all of the results try looking at probabilistic algorithms to converge to an answer.

Finding the repeated element

In an array with integers between 1 and 1,000,000 or say some very larger value ,if a single value is occurring twice twice. How do you determine which one?
I think we can use a bitmap to mark the elements , and then traverse allover again to find out the repeated element . But , i think it is a process with high complexity.Is there any better way ?
This sounds like homework or an interview question ... so rather than giving away the answer, here's a hint.
What calculations can you do on a range of integers whose answer you can determine ahead of time?
Once you realize the answer to this, you should be able to figure it out .... if you still can't figure it out ... (and it's not homework) I'll post the solution :)
EDIT: Ok. So here's the elegant solution ... if the list contains ALL of the integers within the range.
We know that all of the values between 1 and N must exist in the list. Using Guass' formula we can quickly compute the expected value of a range of integers:
Sum(1..N) = 1/2 * (1 + N) * Count(1..N).
Since we know the expected sum, all we have to do is loop through all the values and sum their values. The different between this sum and the expected sum is the duplicate value.
EDIT: As other's have commented, the question doesn't state that the range contains all of the integers ... in this case, you have to decide whether you want to optimize for memory or time.
If you want to perform the operation using O(1) storage, you can perform an in-place sort of the list. As you're sorting you have to check adjacent elements. Once you see a duplicate, you know you can stop. Optimal sorting is an O(n log n) operation on average - which establishes an upper bound for find the duplicate in this manner.
If you want to optimize for speed, you can use an additional O(n) storage. Using a HashSet (or similar structure), insert values from your list until you determine you are inserting a duplicate into the HashSet. Inserting n items into a HashSet is an O(n) operation on average, which establishes that as an upper bound for this method.
you may try to use bits as hashmap:
1 at position k means that number k occured before
0 at position k means that number k did not occured before
pseudocode:
0. assume that your array is A
1. initialize bitarray(there is nice class in c# for this) of 1000000 length filled with zeros
2. for each num in A:
if bitarray[num]
return num
else
bitarray[num] = 1
end
The time complexity of the bitmap solution is O(n) and it doesn't seem like you could do better than that. However it will take up a lot of memory for a generic list of numbers. Sorting the numbers is an obvious way to detect duplicates and doesn't require extra space if you don't mind the current order changing.
Assuming the array is of length n < N (i.e. not ALL integers are present -- in this case LBushkin's trick is the answer to this homework problem), there is no way to solve this problem using less than O(n) memory using an algorithm that just takes a single pass through the array. This is by reduction to the set disjointness problem.
Suppose I made the problem easier, and I promised you that the duplicate elements were in the array such that the first one was in the first n/2 elements, and the second one was in the last n/2 elements. Now we can think of playing a game in which two people each hold a string of n/2 elements, and want to know how many messages they have to send to be sure that none of their elements are the same. Since the first player could simulate the run of any algorithm that takes a pass through the array, and send the contents of its memory to the second player, a lower bound on the number of messages they need to send implies a lower bound on the memory requirements of any algorithm.
But its easy to see in this simple game that they need to send n/2 messages to be sure that they don't hold any of the same elements, which yields the lower bound.
Edit: This generalizes to show that for algorithms that make k passes through the array and use memory m, that m*k = Omega(n). And it is easy to see that you can in fact trade off memory for time in this way.
Of course, if you are willing to use algorithms that don't simply take passes through the array, you can do better as suggested already: sort the array, then take 1 pass through. This takes time O(nlogn) and space O(1). But note curiously that this proves that any sorting algorithm that just makes passes through the array must take time Omega(n^2)! Sorting algorithms that break the n^2 bound must make random accesses.

How to divide a set of values into two sets of fixed size, such that their sums approach a particular value

I have a set of 18 values (it will always be 18) which I need to distribute into two sets, one of 10 items, and one of 8 items.
The rule for distribution is that the values of each set must be equal (or as close as possible) to a particular known value - so in the first set the sum of the values must be as close as possible to 1500000 and in the second set the sum iof the values must be as close as possible to 1000000.
What is the best (and that may mean simplest) algorithm to do this?
Further clarification, the values all range between 110000 and 200000. The values are always multiples of a 100 and are all positive integers, and there can be duplicates.
There are only 43758 such selections. Go through each of them and find the best.
It is an optimization problem. Here you have two optimization criteria, which should be combine to single one. For example like this:
F(A, B) = w1*abs(sum(A) - 1500000) + w2*abs(sum(B) - 1000000)
where A and B your sets, sum() is a sum of elements in a set, and w1 and w2 is weights.
Then you should find a strategy for iteration over possible combinations. The simpliest strategy is to find all 10-combinations of 18, and select that one which minimize F(A,B). There are C(18,10) = 43758 combinations.
While brute force is probably best for this problem size, there are other tricks you can play if you're willing to get an approximate solution or if the brute force method is still too expensive. the basic idea is to snap the values to a small grid, and then do brute force on the (much smaller) set of entries in the grid.
in your case, (pretending I've already divided by 100), all numbers are between 1100 and 2000, so you can "snap" them to the 10 integers 1100, 1200 and so on. The maximum error in doing is at most 50/1100 which is less than 5%. Now you've halved the input size, which makes the brute force run a bit faster.
Again, I wouldn't recommend this unless (a) brute force is really slow right now or (a) the problem size increases beyond 18.
p.s the problem is called SUBSET SUM (or sometimes KNAPSACK depending on the formulation) and is NP-complete. Here's a reference for the approximation idea.
Your problem, as stated, is np unless there is a pattern to the data.
The only way to achieve the best answer is find all permutations of 18 into 10 and 8 and associated sums. Weight according to your preference.
Looks like an optimization problem to me. Randomly separate the values into the two sets, and then start swapping values (use a good heuristic), and accept the change if the result is better.

Tricky programming problem that I'm having trouble getting my head around

First off, let me say that this is not homework (I am an A-Level student, this is nothing close to what we problem solve (this is way harder)), but more of a problem I'm trying to suss out to improve my programming logic.
I thought of a scenario where there is an array of random integers, let's for example say 10 integers. The user will input a number he wants to count to, and the algorithm will try and work out what numbers are needed to make that sum. For example if I wanted to make the sum 44 from this array of integers:
myIntegers = array(1, 5, 9, 3, 7, 12, 36, 22, 19, 63);
The output would be:
36 + 3 + 5 = 44
Or something along those lines. I hope I make myself clear. As an added bonus I would like to make the algorithm pick as few numbers as possible to make the required sum, or give out an error if the sum cannot be made with the numbers supplied.
I thought about using recursion and iterating through the array, adding numbers over and over until the sum is met or gone past. But what I can't get my head around is what to do if the algorithm goes past the sum and needs to be selective about what numbers to pick from the array.
I'm not looking for complete code, or a complete algorithm, I just want your opinions on how I should proceed with this and perhaps share a few tips or something. I'll probably start work on this tonight. :P
As I said, not homework. Just me wanting to do something a bit more advanced.
Thanks for any help you're able to offer. :)
You are looking at the Knapsack Problem
The knapsack problem or rucksack problem is a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most useful items.
Edit: Your special case is the Subset Sum Problem
Will subset sum do? ;]
This is the classic Knapsack problem that you would see in college level algorithms course (or at least I saw it then). Best to work this out on paper and the solution in code should be relatively easy to work out.
EDIT: One thing to consider is dynamic programming.
Your Problem is related to the subset sum problem.
You have to try all possible combinations in the worst case.
No shortcuts here I'm afraid. In addition to what other people have said, about what specific problem this is etc., here's some practical advice to offer you a starting point:
I would sort the array and given the input sum m, would find the first number in the array less than m, call it n (this is your first possible number for the sum), and start from the highest possible complement (m-n), working your way down.
If you don't find a precise match, pick the highest available, call it o, (that now is your 2nd number) and look for the 3rd one starting from (m-n-o) and work your way down again.
If you don't find a precise match, start with the next number n (index of original n at index-1) and do the same. You can keep doing this until you find a precise match for two numbers. If no match for the sum is found for two numbers, start the process again, but expand it to include a 3rd number. And so on.
That could be done recursively. At least this approach ensures that when you find a match, it will be the one with the least possible numbers in the set forming the total input sum.
Potentially though, worst case, you end up going through the whole lot.
Edit: As Venr correctly points out, my first approach was incorrect. Edited approach to reflect this.
There is a very efficient randomized algorithm for this problem. I know you already accepted an answer, but I'm happy to share anyway, I just hope people will still check this question :).
Let Used = list of numbers that you sum.
Let Unused = list of numbers that you DON'T sum.
Let tmpsum = 0.
Let S = desired sum you want to reach.
for ( each number x you read )
toss a coin:
if it's heads and tmpsum < S
add x to Used
else
add x to Unused
while ( tmpsum != S )
if tmpsum < S
MOVE one random number from Unused to Used
else
MOVE one random number from Used to Unused
print the Used list, containing the numbers you need to add to get S
This will be much faster than the dynamic programming solution, especially for random inputs. The only problems are that you cannot reliably detect when there is no solution (you could let the algorithm run for a few seconds and if it doesn't finish, assume there is no solution) and that you cannot be sure you will get the solution with minimum number of elements chosen. Again, you could add some logic to make the algorithm keep going and trying to find a solution with less elements until certain stop conditions are met, but this will make it slower. However, if you are only interested in a solution that works and you have a LOT of numbers and the desired sum can be VERY big, this is probably better than the DP algorithm.
Another advantage of this approach is that it will also work for negative and rational numbers with no modifications, which is not true for the DP solution, because the DP solution involves using partial sums as array indexes, and indexes can only be natural numbers. You can of course use hashtables for example, but that will make the DP solution even slower.
I don't know exactly what's this task is called, but it seems that it's kind of http://en.wikipedia.org/wiki/Knapsack_problem.
Heh, I'll play the "incomplete specification" card (nobody said that numbers couldn't appear more than once!) and reduce this to the "making change" problem. Sort your numbers in decreasing order, find the first one less than your desired sum, then subtract that from your sum (division and remainders could speed this up). Repeat until sum = 0 or no number less than the sum is found.
For completeness, you would need to keep track of the number of addends in each sum, and of course generate the additional sequences by keeping track of the first number you use, skipping that, and repeating the process with the additional numbers. This would solve the (7 + 2 + 1) over (6 + 4) problem.
Repeating the answer of others: it is a Subset Sum problem.
It could be efficiently solved by Dynamic Programing technique.
The following has not been mentioned yet: the problem is Pseudo-P (or NP-Complete in weak sense).
Existence of an algorithm (based on dynamic programming) polynomial in S (where S is the sum) and n (the number of elements) proves this claim.
Regards.
Ok, I wrote a C++ program to solve the above problem. The algorithm is simple :-)
First of all arrange whatever array you have in descending order(I have hard-coded the array in descending form but you may apply any of the sorting algorithms ).
Next I took three stacks n, pos and sum. The first one stores the number for which a possible sum combination is to be found, the second holds the index of the array from where to start the search, the third stores the elements whose addition will give you the number you enter.
The function looks for the largest number in the array which is smaller than or equal to the number entered. If it is equal, it simply pushes the number onto the sum stack. If not, then it pushes the encountered array element to the sum stack(temporarily), and finds the difference between the number to search for and number encountered, and then it performs recursion.
Let me show an example:-
to find 44 in {63,36,22,19,12,9,7,5,3,1}
first 36 will be pushed in sum(largest number less than 44)
44-36=8 will be pushed in n(next number to search for)
7 will be pushed in sum
8-7=1 will be pushed in n
1 will be pushed in sum
thus 44=36+7+1 :-)
#include <iostream>
#include<conio.h>
using namespace std;
int found=0;
void func(int n[],int pos[],int sum[],int arr[],int &topN,int &topP,int &topS)
{
int i=pos[topP],temp;
while(i<=9)
{
if(arr[i]<=n[topN])
{
pos[topP]=i;
topS++;
sum[topS]=arr[i];
temp=n[topN]-arr[i];
if(temp==0)
{
found=1;
break;
}
topN++;
n[topN]=temp;
temp=pos[topP]+1;
topP++;
pos[topP]=temp;
break;
}
i++;
}
if(i==10)
{
topP=topP-1;
topN=topN-1;
pos[topP]+=1;
topS=topS-1;
if(topP!=-1)
func(n,pos,sum,arr,topN,topP,topS);
}
else if(found!=1)
func(n,pos,sum,arr,topN,topP,topS);
}
main()
{
int x,n[100],pos[100],sum[100],arr[10]={63,36,22,19,12,9,7,5,3,1},topN=-1,topP=-1,topS=-1;
cout<<"Enter a number: ";
cin>>x;
topN=topN+1;
n[topN]=x;
topP=topP+1;
pos[topP]=0;
func(n,pos,sum,arr,topN,topP,topS);
if(found==0)
cout<<"Not found any combination";
else{
cout<<"\n"<<sum[0];
for(int i=1;i<=topS;i++)
cout<<" + "<<sum[i];
}
getch();
}
You can copy the code and paste it in your IDE, works fine :-)

Resources