I have obtained a proof that would discredit a generally held idea regarding the 0/1 knapsack problem, and I'm having a hard time convincing myself I am right because I couldn't find anything anywhere to support my claims. So I am going to first state my claims and then prove them, and I would appreciate it if anyone could substantiate my claims further or disprove them. Any collaboration is appreciated.
Assertions:
The size of the bnb (branch and bound) tree for solving the knapsack problem is not independent of K (the capacity of the knapsack).
The complete space of the bnb tree is always O(NK), with N being the number of items, and not O(2^N).
The bnb algorithm is always better than the standard dynamic programming approach, both in time and space.
Pre-assumptions: the bnb algorithm prunes the invalid nodes (if the remaining capacity is less than the weight of the current item, we are not going to extend it). Also, the bnb algorithm is done in a depth-first manner.
Sloppy Proof:
Here is the recursive formula for solving the knapsack problem:
Value(i,k) = max(Value(i-1,k), Value(i-1, k-weight(i)) + value(i))
however, if k < weight(i): Value(i,k) = Value(i-1,k)
Now imagine this example:
K = 9
N = 3
V W:
5 4
6 5
3 2
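Before turning to the table, the recurrence above can be checked directly with a short memoized sketch (the function names are mine, not from the post):

```python
from functools import lru_cache

def knapsack(values, weights, K):
    """Top-down evaluation of the recurrence:
    Value(i, k) = max(Value(i-1, k), Value(i-1, k - weight(i)) + value(i))."""
    @lru_cache(maxsize=None)
    def value(i, k):
        if i == 0:                        # no items left
            return 0
        if k < weights[i - 1]:            # item i doesn't fit
            return value(i - 1, k)
        return max(value(i - 1, k),                                   # skip item i
                   value(i - 1, k - weights[i - 1]) + values[i - 1])  # take item i

    return value(len(values), K)

print(knapsack([5, 6, 3], [4, 5, 2], 9))  # → 11
```

For the example above the optimal value is 11, taking the items of weight 4 and 5.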
Now here is the Dynamic solution and table for this problem:
Now imagine that, regardless of whether it is a good idea or not, we want to do this using only the recursive formula through memoization, not with the table - with something like a map/dictionary or a simple array to store the visited cells. To solve this problem using memoization, we would have to solve the denoted cells:
Now this is exactly like the tree we would obtain using the bnb approach:
and now for the sloppy proofs:
The memoization and bnb trees have the same number of nodes.
The number of memoization nodes depends on the table size.
The table size depends on N and K.
Therefore, bnb is not independent of K.
The memoization space is bounded by NK, i.e. O(NK).
Therefore, the complete space of the bnb tree (or the space if we do bnb in a breadth-first manner) is always O(NK) and not O(2^N), because the whole tree is never constructed; it would be exactly like the memoization.
Memoization has better space than the standard dynamic programming.
bnb has better space than dynamic programming (even if done breadth-first).
The simple bnb without relaxation (just eliminating the infeasible nodes) would have better time than memoization (memoization has to search the lookup table; even if that lookup were negligible, they would at best be the same).
If we disregard memoization's lookup cost, it is better than dynamic programming.
Therefore the bnb algorithm is always better than dynamic programming, both in time and space.
Questions:
If my proofs are by any means correct, some interesting questions arise:
Why bother with dynamic programming? In my experience the best you can do in DP knapsack is to keep only the last two columns, and you can improve that further to a single column if you fill it bottom to top. That gives O(K) space, but it still can't (if the above assertions are correct) beat the bnb approach.
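The single-column trick described here can be sketched as follows (a minimal illustration; the function name is mine):

```python
def knapsack_1d(values, weights, K):
    """Bottom-up 0/1 knapsack with a single array of K+1 entries.
    Iterating k downwards ensures each item is used at most once."""
    best = [0] * (K + 1)
    for v, w in zip(values, weights):
        for k in range(K, w - 1, -1):    # fill the column from the high end
            best[k] = max(best[k], best[k - w] + v)
    return best[K]

print(knapsack_1d([5, 6, 3], [4, 5, 2], 9))  # → 11, using O(K) space
```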
Can we still say bnb is better if we integrate it with relaxation pruning (with regard to time)?
ps: Sorry for the long long post!
Edit:
Since two of the answers focus on memoization, I just want to clarify that I'm not focused on it at all! I merely used memoization as a technique to prove my assertions. My main focus is the branch and bound technique vs. dynamic programming. Here is a complete example of another problem, solved by bnb + relaxation (source: Coursera - Discrete Optimization):
I think there is a misunderstanding on your side, namely that dynamic programming is the state-of-the-art solution for the knapsack problem. This algorithm is taught at universities because it is an easy and nice example of dynamic programming and pseudo-polynomial time algorithms.
I have no expertise in the field and don't know what the state of the art is now, but branch-and-bound approaches have been used for quite some time to solve the knapsack problem: the book Knapsack Problems by Martello and Toth is already pretty old but treats branch and bound quite extensively.
Still, it is a great observation on your side that the branch and bound approach can be used for knapsack - alas, you were born too late to be the first to have this idea :)
There are some points in your proof which I don't understand and which need more explanation in my opinion:
You need memoization, otherwise your tree would have O(2^N) nodes (there will obviously be such cases, otherwise knapsack would not be NP-hard). I don't see anything in your proof that ensures the memoization memory/computation steps are bounded by O(NK).
Dynamic programming needs only O(K) memory, so I don't see why you can claim that the "bnb algorithm is always better than dynamic both in time and space".
Maybe your claims are true, but I'm not able to see it the way the proof goes now.
The other problem is the definition of "better". Is the branch-and-bound approach better if it is better for most problems, or for the common problems, or does it have to be better for the worst case (which would not play any role in real life)?
The book I linked to also has some comparisons of the running times of the algorithms. The dynamic-programming-based algorithms (clearly more complex than the one taught at school) are even better for some kinds of problems - see section 2.10.1. Not bad for a total joke!
First of all, since you are applying memoization, you are still doing DP. That's basically the definition of DP: recursion + memoization. And that is also good. Without memoization your computation costs would explode. Just imagine two items both have weight 2 and a third and a fourth have weight 1. They all end up at the same node in the tree; you would have to do the computation multiple times, and you'd end up with exponential running time.
The main difference is the order of computation. The way of computing the entire matrix is called "bottom-up DP", since you start with (0,0) and work your way upwards. Your way (the tree approach) is called "top-down DP", since you start with the goal and work your way down the tree. But both are dynamic programming.
Now to your questions:
You are overestimating how much you really save. N = 3 is a pretty small example. I quickly tried a bigger one, with N = 20, K = 63 (which is still pretty small) and random values and random weights. This is the first picture that I generated:
values: [4, 10, 9, 1, 1, 2, 1, 2, 6, 4, 8, 9, 8, 2, 8, 8, 4, 10, 2, 6]
weights: [6, 4, 1, 10, 1, 2, 9, 9, 1, 6, 2, 3, 10, 7, 2, 4, 10, 9, 8, 2]
111111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111111
111111111111111111111111111111111111111111111111111111111111111
011111111111111111111111111111111111111111111111111111111111101
000001011111111111111111111111111111111111111111111111111111101
000000010111111111111111111111111111111111111111111111111111101
000000000010101011111111111111111111111111111111111111111010101
000000000000000000001010101111111111111111111111111111111010101
000000000000000000000000000101010101111111111111111111101010101
000000000000000000000000000001010101011111111111111111101010101
000000000000000000000000000000000101000001111100001111100000101
000000000000000000000000000000000000000000010100000111100000101
000000000000000000000000000000000000000000000000000010100000101
000000000000000000000000000000000000000000000000000000000000101
000000000000000000000000000000000000000000000000000000000000001
This picture is a transposed version of your displayed matrix. Rows represent the i values (the first i elements of the array), and the columns represent the k values (allowed weights). The 1s are the positions in the DP matrix that you visit during your tree approach. Of course you see a lot of 0s at the bottom of the matrix, but you visit every position in the upper half. About 68% of the positions in the matrix are visited. A bottom-up DP solution will be faster in such a situation. Recursive calls are slower, since you have to allocate a new stack frame for each call. A speedup of 2x with loops instead of recursive calls is not untypical, and this would already be enough to make the bottom-up approach faster. And we haven't even talked about the memoization costs of the tree approach yet.
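A picture like this can be generated with a short sketch that records every (i, k) state the memoized recursion touches (my own code, shown here on the question's 3-item example rather than the 20-item one):

```python
def visited_matrix(values, weights, K):
    """Mark which (i, k) states a memoized top-down solver actually touches.
    Row i, column k corresponds to Value(i, k) in the question's notation."""
    N = len(values)
    visited = [[0] * (K + 1) for _ in range(N + 1)]
    memo = {}

    def value(i, k):
        visited[i][k] = 1          # this state was touched
        if i == 0:
            return 0
        if (i, k) in memo:
            return memo[(i, k)]
        best = value(i - 1, k)     # skip item i
        if weights[i - 1] <= k:    # take item i if it fits
            best = max(best, value(i - 1, k - weights[i - 1]) + values[i - 1])
        memo[(i, k)] = best
        return best

    value(N, K)
    return visited

# The question's 3-item example (values [5, 6, 3], weights [4, 5, 2], K = 9):
for row in visited_matrix([5, 6, 3], [4, 5, 2], 9):
    print(''.join(map(str, row)))
```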
Notice that I haven't used actual bnb here. I'm not quite sure how you would do the bound part, since you only know the value of a node once you compute it by visiting its children.
With my input data, the bottom-up approach is clearly a winner. But that doesn't mean that your approach is bad. Quite the opposite. It can actually be quite good. It all depends on the input data. Let's just imagine that K = 10^18 and all your weights are about 10^16. The bottom-up approach would not even find enough memory to allocate the matrix, while your approach will succeed in no time.
However, you could probably improve your version by performing A* instead of bnb. You can estimate the best value for each node (i, k) with int(k / max(weight[1..i]) * min(values[1..i])) and prune a lot of nodes using this heuristic.
In practice dynamic programming can be better for integer 0/1 knapsack because:
No recursion means you can never run into a stack overflow
No need to do a lookup search for each node, so often faster
As you note, storing the last two columns means that the memory requirement is lower
The code is simpler (no need for a memoization table)
Related
Let’s use as an example the problem LeetCode 322. Coin Change
I know it is best solved by using Dynamic Programming, but I want to focus on my Brute Force solution:
from typing import List

class Solution:
    def coinChange(self, coins: List[int], amount: int) -> int:
        def helper(amount):
            if amount < 0:
                return float('inf')
            if amount == 0:
                return 0
            # minimum over all coin choices for this amount
            best = float('inf')
            for coin in coins:
                best = min(best, helper(amount - coin) + 1)
            return best

        ans = helper(amount)
        return -1 if ans == float('inf') else ans
The recursion tree looks like: [recursion tree image]
I can say it is Divide and Conquer: we are dividing the problem into smaller sub-problems, solving them individually, and using those individual results to construct the result for the original problem.
I can also say it is Backtracking: we are enumerating all combinations of coin frequencies which satisfy the constraints.
I know both are implemented via Recursion, but I would like to know which paradigm my Brute Force solution belongs to: Divide and Conquer or Backtracking.
A complication in categorizing your algorithm is that there aren’t clear, well-defined boundaries between different classes of algorithms and different people might have slightly different definitions in mind.
For example, generally speaking, divide-and-conquer algorithms involve breaking the problem apart into non-overlapping subproblems. (See, for example, mergesort, quicksort, binary search, closest pair of points, etc.) In that sense, your algorithm doesn’t nicely map onto the divide-and-conquer paradigm, since the subproblems you’re considering involve some degree of overlap in the subproblems they solve. (Then again, not all divide-and-conquer algorithms have this property. See, for example, stoogesort.)
Similarly, backtracking algorithms usually, but not always, work by committing to a decision, recursively searching to see whether a solution exists given that decision, then unwinding the choice if it turns out not to lead to a solution. Your algorithm doesn’t have this property, since it explores all options and then takes the best. (When I teach intro programming, I usually classify algorithms this way. But my colleagues sometimes describe what you’re doing as backtracking!)
I would classify your algorithm as belonging to a different family: exhaustive search. The algorithm you've proposed essentially works by enumerating all possible ways of making change, then returning the one that uses the fewest coins. Exhaustive search algorithms are ones that work by trying all possible options and returning the best, and I think that's the best way of classifying your strategy.
To me this doesn't fit with either paradigm.
Backtracking, to me, is associated with reaching a point where the candidate cannot be developed further; but here we develop it to its end (infinity), and we don't throw it away, we use it in comparisons.
Divide and conquer I associate with a division into a relatively small number of candidate groups (the classic example is two, like binary search). To call each path in a recursion a group for the sake of Divide and Conquer would lose the latter's meaning.
The most practical answer is: it doesn't matter.
The safest answer is recursion. My best interpretation is that it's backtracking.
I think the options here are recursion, backtracking, divide-and-conquer, and dynamic programming.
Recursion is the most general, encapsulating backtracking, D&C, and DP. If the solution indeed has elements of both backtracking and D&C, then recursion would be the best answer, as it contains both.
In Skiena's ADM (Section 5.3.1), it says:
A typical divide-and-conquer algorithm breaks a given problem into smaller pieces, each of which is of size n/b.
By this interpretation it doesn't meet the definition, as we divide our solution by coins, with each coin amount being a different size.
In Erickson's Algorithms (section 1.6), it says:
divide and conquer:
Divide the given instance of the problem into several independent smaller instances of exactly the same problem.
So in this case, according to the recursion tree, the subproblems are not always independent (they overlap).
Which leaves backtracking. Erickson defines the 'recursive strategy' as:
A backtracking algorithm tries to construct a solution to a computational problem incrementally, one small piece at a time.
Which seems general enough to fit all DP problems under it. The provided code can be said to backtrack when a solution path fails.
Additionally, according to Wikipedia:
It is often the most convenient technique for parsing, for the knapsack problem and other combinatorial optimization problems.
Coin Change being an unbounded-knapsack-type problem, it fits the description of backtracking.
In the Subset Sum problem, if we don't use the dynamic programming approach, we have exponential time complexity. But if we draw the recursion tree, it seems that all 2^n branches are unique. If we use dynamic programming, how can we ensure that all the unique branches are explored? If there really exist 2^n possible solutions, how does dynamic programming reduce this to polynomial time while also ensuring all 2^n solutions are explored?
How does dynamic programming reduce it to polynomial time while also ensuring all 2^n solutions are explored?
It is pseudo-polynomial time, not polynomial time. That's a very important distinction. According to Wikipedia, a numeric algorithm runs in pseudo-polynomial time if its running time is a polynomial in the numeric value of the input, but not necessarily in the length of the input, which is the case for polynomial-time algorithms.
What does it matter?
Consider an example [1, 2, 3, 4], sum = 1 + 2 + 3 + 4 = 10.
There do in fact exist 2^4 = 16 subsequences; however, do we need to check them all? The answer is no, since we are only concerned with the sum of a subsequence. To illustrate this, let's iterate from the 1st element to the 4th element:
1st element:
We can choose to take or not take the 1st element, so the possible sum will be [0, 1].
2nd element:
We can choose to take or not to take the 2nd element. Same idea, possible sum will be [0, 1, 2, 3].
3rd element:
We have [0, 1, 2, 3] now. We next consider taking the third element. But wait... if we take the third element and add it to 0, we still get 3, which is already present. Do we need to store this piece of information again? Apparently not. In fact, we only need to know whether a sum is possible at any stage; if multiple subsequences sum to the same value, we keep just one entry. This is the key to the reduction in complexity, if you consider it a reduction.
With that said, a true polynomial-time solution for subset sum is not known, since the problem is NP-complete.
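The walkthrough above can be condensed into a few lines (a sketch; only the set of reachable sums is stored, never the subsequences themselves):

```python
def subset_sum(nums, target):
    """Pseudo-polynomial subset sum: track the set of reachable sums."""
    reachable = {0}                      # the empty subsequence sums to 0
    for x in nums:
        # take x (s + x) or skip it (s); duplicate sums collapse in the set
        reachable |= {s + x for s in reachable if s + x <= target}
    return target in reachable

print(subset_sum([1, 2, 3, 4], 6))   # → True  (e.g. 2 + 4)
print(subset_sum([1, 2, 3, 4], 11))  # → False (the total is only 10)
```

The set can hold at most target + 1 values, so the work is O(n * target): polynomial in the numeric value of target but exponential in its bit length, which is exactly the pseudo-polynomial distinction above.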
I have found out that selection sort uses the brute force strategy. However, I think that it uses the greedy strategy.
Why I think it is greedy: it goes from 0 to n-1 in its outer loop and from i+1 to n-1 in its inner loop. This is really naive. It selects the minimum element in every iteration - it chooses the best locally. Everything is like in greedy, but supposedly it is not.
Can you please explain why it is not as I think? I have not found information about this issue on the Internet.
A selection sort could indeed be described as a greedy algorithm, in the sense that it:
tries to choose an output (a permutation of its inputs) that optimizes a certain measure ("sortedness", which could be measured in various ways, e.g. by number of inversions), and
does so by breaking the task into smaller subproblems (for selection sort, finding the k-th element in the output permutation) and picking the locally optimal solution to each subproblem.
As it happens, the same description could be applied to most other sorting algorithms, as well — the only real difference is the choice of subproblems. For example:
insertion sort locally optimizes the sortedness of the permutation of k first input elements;
bubble sort optimizes the sortedness of adjacent pairs of elements; it needs to iterate over the list several times to reach a global optimum, but this still falls within the broad definition of a greedy algorithm;
merge sort optimizes the sortedness of exponentially growing subsequences of the input sequence;
quicksort recursively divides its input into subsequences on either side of an arbitrarily chosen pivot, optimizing the division to maximize sortedness at each stage.
Indeed, off the top of my head, I can't think of any practical sorting algorithm that wouldn't be greedy in this sense. (Bogosort isn't, but can hardly be called practical.) Furthermore, formulating these sorting algorithms as greedy optimization problems like this rather obscures the details that actually matter in practice when comparing sorting algorithms.
Thus, I'd say that characterizing selection sort, or any other sorting algorithm, as greedy is technically valid but practically useless, since such classification provides no real useful information.
Let A be a list of integers: A = [5, 4, 3, 6, 1, 2, 7]
A greedy algorithm will look for the most promising direction, therefore:
we will compare 5 to 4, see that 4 is indeed smaller than 5, and set 4 as our minimum
compare 4 to 3, and set 3 as our minimum
Now we compare 3 to 6, and here is the tricky part: while in a normal selection sort (brute force) we would keep considering the remaining numbers, in a greedy approach we take 3 as our minimum and do not consider the remaining numbers - hence "best locally".
so a sorted list using this approach will result in a list sorted as such:
[3, 4, 5, 6, 1, 2, 7]
Greedy and brute force describe different traits of the algorithm.
Greedy means that the algorithm at each step selects the option which is locally the best; that is, it has no look-ahead.
Brute force means that the algorithm looks for options in a straightforward manner, considering them all. If, e.g., it searched for an element via binary search, it wouldn't be brute force anymore.
So the algorithm may be both greedy and brute force. These qualities are not mutually exclusive.
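As a concrete illustration that the two traits coexist, here is selection sort written so both are visible (a sketch, my own code):

```python
def selection_sort(a):
    """Greedy: commit to the locally best (smallest) element for each slot.
    Brute force: find it by scanning every remaining candidate."""
    a = list(a)
    n = len(a)
    for k in range(n - 1):
        m = min(range(k, n), key=a.__getitem__)   # exhaustive scan
        a[k], a[m] = a[m], a[k]                   # greedy commitment
    return a

print(selection_sort([5, 4, 3, 6, 1, 2, 7]))  # → [1, 2, 3, 4, 5, 6, 7]
```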
This is a paragraph from the book Introduction to Algorithms, 3rd edition, p. 336:
"These two approaches yield algorithms with the same asymptotic running time, except in unusual circumstances where the top-down approach does not actually recurse to examine all possible subproblems. The bottom-up approach often has much better constant factors, since it has less overhead for procedure calls."
The context: the two approaches are, first, top-down + memoization (DP) and, second, the bottom-up method.
One more question: does the 'overhead' of a procedure call mean that every function call takes time? Even if we solve all subproblems, does top-down take more time because of this overhead?
A bottom-up approach to dynamic programming means solving all the small problems first, and then using them to find answers to the next smallest, and so on. So, for instance, if the solution to a problem of length n depends only on answers to problems of length n-1, you might start by putting in all the solutions for length 0, then you'd iteratively fill in solutions to length 1, 2, 3, and so on, each time using the answers you'd already calculated at the previous level. It is efficient in that it means you don't end up solving a sub-problem twice.
A top-down with memoization approach would look at it the other way. If you want the solution to a problem of length 10, then you do so recursively. You notice that it relies on (say) three problems of length 9, so you recursively solve them, and then you know the answer of length 10. But whenever you solve a sub-problem, you remember the answer, and whenever you need the answer to a sub-problem, you look first to see whether you've already solved it, and if you have, you return the cached answer.
The bottom-up approach is good in that it can be written iteratively (using for loops) rather than recursively, which means you don't run out of stack space on large problems, and loops are also faster. Its disadvantage is that you solve all the sub-problems, and you might not need them all to be solved in order to solve the large problem you want the answer to.
The top-down approach is slower if you need all the sub-problems solved anyway, because of the recursion overhead. But it is faster if the problem you're solving only needs a smallish subset of the sub-problems to be solved, because it only solves the ones that it needs.
It is essentially the same as the difference between eager evaluation (bottom up) and lazy evaluation (top down).
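A toy recurrence (invented purely for illustration) makes the lazy/eager difference measurable: top-down touches only the states reachable from n, while a bottom-up table would fill all n + 1 entries:

```python
from functools import lru_cache

def states_touched(n):
    """Count the distinct subproblems a memoized top-down evaluation
    of f(k) = f(k // 2) + f(k // 3) actually computes."""
    computed = set()

    @lru_cache(maxsize=None)
    def f(k):
        computed.add(k)          # runs only on cache misses
        if k == 0:
            return 1
        return f(k // 2) + f(k // 3)

    f(n)
    return len(computed)

# A bottom-up table for n = 10**6 would fill 1,000,001 entries;
# the top-down evaluation computes only a few hundred of them at most.
print(states_touched(10**6))
```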
What is the main difference between divide and conquer and dynamic programming? If we take merge sort as an example, it is basically solved by divide and conquer, which uses recursion. Dynamic programming is also based on recursion, so why isn't merge sort considered an example of dynamic programming?
The two are similar in that they both break the problem up into small problems and solve those. However, in divide and conquer the subproblems are independent, while in dynamic programming the subproblems are dependent. Both require recombining the subproblems in some way, but the distinction comes from whether or not the subproblems relate to other subproblems (of the same "level").
D&C example: Mergesort
In Mergesort, you break the sorting into a lot of little "sub-sorts"; that is, instead of sorting 100 items, you sort 50, then 25, etc. However, after breaking the original into (for example) 4 "sub-sorts", it doesn't matter which you do first; order is irrelevant because they are independent. All that matters is that they eventually get done. As such, each time you get an entirely independent problem with its own right answer.
DP example: Recursive Fibonacci
Though there are sub-problems, each is built directly on top of the others. If you want the 10th Fibonacci number, you have to solve the problems building up to it (1+2, 2+3, etc.) in a specific order. As such, they are not independent.
D&C is used when sub-problems are independent. Dynamic programming is needed when a recursive function repeats the same recursive calls.
Take the Fibonacci recurrence: f(n) = f(n-1) + f(n-2)
For example:
f(8) = f(7) + f(6)
= ( f(6) + f(5) ) + f(6)
As you can see, f(6) will be calculated twice. From the recurrence relation it is obvious that there are many repeating values. It's better to memoize these values rather than calculating them over and over again. The most important thing in DP is memoizing these calculated values. If you look at DP problems, generally an array or a matrix is used to prevent repetitive calculations.
Compared to DP, D&C generally divides the problem into independent sub-problems, and memoizing values is not necessary.
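The blow-up from the repeated calls can be made visible by counting them with and without a memo (a small sketch of my own):

```python
def fib_calls(n, memo=None):
    """Return (fib(n), number of recursive calls made).
    Pass memo={} to enable memoization."""
    counter = [0]

    def go(k):
        counter[0] += 1
        if memo is not None and k in memo:
            return memo[k]
        r = k if k < 2 else go(k - 1) + go(k - 2)
        if memo is not None:
            memo[k] = r
        return r

    return go(n), counter[0]

print(fib_calls(20))           # → (6765, 21891): exponential blow-up
print(fib_calls(20, memo={}))  # → (6765, 39): linear in n with the memo
```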
So I would say that D&C is a bigger concept and DP is a special kind of D&C. Specifically, when you find that your subproblems need to share some calculations of the same smaller subproblem, you may not want them to calculate the same things again and again; you cache the intermediate results to speed things up, and that's where DP comes in. So, essentially, I would say DP is a fast version of D&C.