While reading CLRS Section 15.3, about when to use dynamic programming, I found that in the explanation of optimal substructure they give two examples: the unweighted longest simple path and the unweighted shortest path.
They said,
The subproblems in finding the longest simple path are not independent, whereas for shortest paths they are.
That's why we can't solve the unweighted longest simple path problem using dynamic programming, and I have no problem with that. But they also said:
Both problems examined in Sections 15.1 and 15.2 have independent subproblems ... In rod cutting, to determine the best way to cut up a rod of length n, we look at the best ways of cutting up rods of length i for i = 0, 1, ..., n-1. Because an optimal solution to the length-n problem includes just one of these subproblem solutions (after we have cut off the first piece), independence of subproblems is not an issue.
The last sentence
independence of subproblems is not an issue.
Is the independence of subproblems an issue or not? And if it is not an issue, why did they say the first quote? Or am I just misunderstanding something?
The problem is that you cannot simply merge the answers to two dependent subproblems.
If problems are independent, that is fine. If you only look at one subproblem, likewise that is fine. But if you need to combine 2 answers and they are dependent, you have to maintain some additional state.
Where, as in the case of the longest simple path, "some additional state" may blow up on you exponentially. :-)
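To make that "additional state" concrete, here is a minimal sketch of a memoized longest-simple-path search (the graph is a made-up example, not from the question). Note that the memo key must include the set of vertices already visited, and there are exponentially many such sets:

```python
from functools import lru_cache

# Hypothetical 4-vertex graph, just for illustration.
GRAPH = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}

@lru_cache(maxsize=None)
def longest_simple_path(u, target, visited):
    """Length in edges of the longest simple path from u to target.
    The memo is keyed on (u, visited): `visited` is the extra state,
    and it can take exponentially many distinct values."""
    if u == target:
        return 0
    best = float("-inf")
    for v in GRAPH[u]:
        if v not in visited:
            best = max(best, 1 + longest_simple_path(v, target, visited | {v}))
    return best

print(longest_simple_path(0, 3, frozenset({0})))  # → 3 (e.g. 0-1-2-3)
```

Dropping `visited` from the key would make the subproblems reusable, but then the "subproblems" are no longer well-defined, which is exactly the dependence the answer describes.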
Related
There are two key attributes that a problem must have in order for dynamic programming to be applicable: optimal substructure and overlapping subproblems [1]. For this question, we are going to focus on the latter property only.
There are various definitions for overlapping subproblems, two of which are:
A problem is said to have overlapping subproblems if the problem can be broken down into subproblems which are reused several times OR a recursive algorithm for the problem solves the same subproblem over and over rather than always generating new subproblems [2].
A second ingredient that an optimization problem must have for dynamic programming to apply is that the space of subproblems must be "small" in the sense that a recursive algorithm for the problem solves the same subproblems over and over, rather than always generating new subproblems (Introduction to Algorithms by CLRS)
Both definitions (and lots of others on the internet) seem to boil down to a problem having overlapping subproblems if finding its solution involves solving the same subproblems multiple times. In other words, there are many small subproblems which are computed many times while finding the solution to the original problem. A classic example is the Fibonacci algorithm that lots of examples use to help people understand this property.
Life was great until, a couple of days ago, I discovered Kadane's algorithm, which made me question the overlapping subproblems definition. This was mostly due to the fact that people have different views on whether or NOT it is a DP algorithm:
Dynamic programming aspect in Kadane's algorithm
Is Kadane's algorithm consider DP or not? And how to implement it recursively?
Is Kadane's Algorithm Greedy or Optimised DP?
Dynamic Programming vs Memoization (see my comment)
The most compelling reason why someone wouldn't consider Kadane's algorithm a DP algorithm is that each subproblem would only appear and be computed once in a recursive implementation [3], hence it doesn't entail the overlapping subproblems property. However, lots of articles on the internet consider Kadane's algorithm to be a DP algorithm, which made me question my understanding of what overlapping subproblems means in the first place.
People seem to interpret the overlapping subproblems property differently. It's easy to see it with simple problems such as the Fibonacci algorithm but things become very unclear once you introduce Kadane's algorithm for instance. I would really appreciate it if someone could offer some further explanation.
You've read so much about this already. The only thing I have to add is this:
The overlapping subproblems in Kadane's algorithm are here:
max_subarray = max( from i=1 to n [ max_subarray_to(i) ] )
max_subarray_to(i) = max(max_subarray_to(i-1) + array[i], array[i])
As you can see, max_subarray_to() is evaluated twice for each i. Kadane's algorithm memoizes these, turning it from O(n²) to O(n).
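For concreteness, the recurrence above translates directly into a runnable sketch (a minimal Python version; the variable names are mine):

```python
def max_subarray(array):
    """Kadane's algorithm. max_subarray_to(i) is carried forward in
    `best_ending_here` instead of being recomputed -- the memoization step."""
    best_ending_here = array[0]   # max_subarray_to(0)
    best = array[0]
    for x in array[1:]:
        # max_subarray_to(i) = max(max_subarray_to(i-1) + array[i], array[i])
        best_ending_here = max(best_ending_here + x, x)
        best = max(best, best_ending_here)
    return best

print(max_subarray([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # → 6 (subarray [4, -1, 2, 1])
```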
... But as #Stef says, it doesn't matter what you call it, as long as you understand it.
I was revisiting my notes on dynamic programming. It's basically a memoized recursion technique, which stores away the solutions to smaller subproblems for later reuse in computing solutions to relatively larger subproblems.
The question I have is that in order to apply DP to a recursive problem, it must have optimal substructure. This basically necessitates that an optimal solution to a problem contains optimal solutions to its subproblems.
Is it possible otherwise? I mean, have you ever seen a case where the optimal solution to a problem does not contain optimal solutions to its subproblems?
Please share some examples if you know any, to deepen my understanding.
In dynamic programming, a given problem has the optimal substructure property if an optimal solution to the problem can be obtained by using optimal solutions of its subproblems.
For example, the shortest path problem has the following optimal substructure property: if a node X lies on the shortest path from a source node U to a destination node V, then the shortest path from U to V is the combination of the shortest path from U to X and the shortest path from X to V.
But the longest path problem doesn't have the optimal substructure property,
i.e. the longest path between two nodes need not be composed of longest paths between the intermediate nodes.
For example, the longest path q->r->t is not a combination of the longest path from q to r and the longest path from r to t, because the longest path from q to r is q->s->t->r.
So here, the optimal solution to a problem does not contain optimal solutions to its subproblems.
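This counterexample can be checked by brute force. Assuming the four-vertex graph from the CLRS figure (undirected edges q-r, q-s, s-t, r-t), a tiny exhaustive search shows that the longest q-to-r path is not a sub-path of any longest q-to-t path:

```python
# Undirected graph assumed from the CLRS example: edges q-r, q-s, s-t, r-t.
GRAPH = {"q": ["r", "s"], "r": ["q", "t"], "s": ["q", "t"], "t": ["s", "r"]}

def longest_path(u, v):
    """Brute-force longest simple path from u to v (fine for 4 vertices)."""
    best = []
    def dfs(node, path):
        nonlocal best
        if node == v:                 # a simple path must stop at the target
            if len(path) > len(best):
                best = path
            return
        for w in GRAPH[node]:
            if w not in path:         # keep the path simple
                dfs(w, path + [w])
    dfs(u, [u])
    return best

print(longest_path("q", "t"))  # → ['q', 'r', 't']
print(longest_path("q", "r"))  # → ['q', 's', 't', 'r'], not a prefix of the above
```

Splicing the longest q-to-r path together with a longest r-to-t path does not even give a simple path, since vertices repeat.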
For more details you can read
Longest path problem from wikipedia
Optimal substructure from wikipedia
You're perfectly right that the definitions are imprecise. DP is a technique for getting algorithmic speedups rather than an algorithm in itself. The term "optimal substructure" is a vague concept. (You're right again here!) To wit, every loop can be expressed as a recursive function: each iteration solves a subproblem for the successive one. Is every algorithm with a loop a DP? Clearly not.
What people actually mean by "optimal substructure" and "overlapping subproblems" is that subproblem results are used often enough to decrease the asymptotic complexity of solutions. In other words, the memoization is useful! In most cases the subtle implication is a decrease from exponential to polynomial time, O(k^n) to O(n^p), or similar.
Ex: There is an exponential number of paths between two nodes in a dense graph. DP finds the shortest path looking at only a polynomial number of them because the memos are extremely useful in this case.
On the other hand, traveling salesman can be expressed as a memoized function (e.g. see this discussion), where the memos save roughly a factor of O(2^n) of the work. But the number of TS paths through n cities is O(n!). This is so much bigger that the asymptotic run time is still super-exponential: n!/2^n is still super-exponential. Such an algorithm is generally not called a dynamic program even though it follows very much the same pattern as the DP for shortest paths. Apparently it's only a DP if it gives a nice result!
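For reference, the standard memoized formulation for TSP (Held-Karp) can be sketched as below; it memoizes O(n·2^n) subproblem results instead of examining all n! orderings (the distance matrix and city numbering here are illustrative assumptions, not from the answer):

```python
from functools import lru_cache

def held_karp(dist):
    """Memoized TSP (Held-Karp). best(mask, last) = cheapest way to leave
    city 0, visit exactly the cities in the bitmask `mask`, and stop at
    `last`. There are O(n * 2^n) distinct (mask, last) pairs to memoize."""
    n = len(dist)

    @lru_cache(maxsize=None)
    def best(mask, last):
        if mask == 1 << last:              # only `last` visited: came straight from 0
            return dist[0][last]
        rest = mask & ~(1 << last)
        return min(best(rest, p) + dist[p][last]
                   for p in range(1, n) if rest & (1 << p))

    full = (1 << n) - 2                    # bits 1..n-1 set (city 0 is the start)
    return min(best(full, last) + dist[last][0] for last in range(1, n))

# Illustrative symmetric 4-city instance.
dist = [[0, 10, 15, 20],
        [10, 0, 35, 25],
        [15, 35, 0, 30],
        [20, 25, 30, 0]]
print(held_karp(dist))  # → 80 (tour 0-1-3-2-0)
```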
To my understanding, this 'optimal substructure' property is necessary not only for Dynamic Programming, but to obtain a recursive formulation of the solution in the first place. Note that in addition to the Wikipedia article on Dynamic Programming, there is a separate article on the optimal substructure property. To make things even more involved, there is also an article about the Bellman equation.
You could solve the Traveling Salesman Problem by choosing the nearest city at each step, but that is the wrong method.
The whole idea is to narrow the problem down to a relatively small set of candidates for the optimal solution and use "brute force" to solve that.
So it had better be the case that solutions of the smaller subproblems are sufficient to solve the bigger problem.
This is expressed via a recursion, as a function of the optimal solutions of smaller subproblems.
Answering this question:
Is it possible otherwise? I mean, have you ever seen a case where the optimal solution to a problem does not contain an optimal solution to subproblems?
No, it is not possible, and it can even be proven.
You can try to apply dynamic programming to any recursive problem, but you will not get any better result if it doesn't have the optimal substructure property. In other words, the dynamic programming methodology is not useful for a problem which doesn't have the optimal substructure property.
I am reading the dynamic programming chapter of Introduction to Algorithms by Cormen et al. I am trying to understand how to characterize the space of subproblems. They give two examples of dynamic programming. Both of these problems have an input of size n:
Rod cutting problem (Cut the rod of size n optimally)
Matrix parenthesization problem (parenthesize the matrix product A1 . A2 . A3 ... An optimally to get the least number of scalar multiplications)
For the first problem, they choose a subproblem of the form where they make a cut of length k, assuming that the left piece resulting from the cut cannot be cut any further while the right piece can, thereby giving us a single subproblem of size (n-k).
But for the second problem, they choose subproblems of the type Ai...Aj where 1 <= i <= j <= n. Why did they choose to keep both ends open for this problem? Why not close one end and consider only subproblems of size (n-k)? Why are both i and j needed here instead of a single split point k?
It is an art. There are many types of dynamic programming problems, and it is not easy to define one way to work out what dimensions of space we want to solve sub-problems for.
It depends on how the sub-problems interact, and very much on the size of each dimension of space.
Dynamic programming is a general term describing the caching or memoization of sub-problems to solve larger problems more efficiently. But there are so many different problems that can be solved by dynamic programming in so many different ways, that I cannot explain it all, unless you have a specific dynamic programming problem that you need to solve.
All that I can suggest when solving a problem is:
if you know how to solve one problem, you can use similar techniques for similar problems.
try different approaches, and estimate the order of complexity (in time and memory) in terms of input size for each dimension, then given the size of each dimension, see if it executes fast enough, and within memory limits.
Some algorithms that can be described as dynamic programming, include:
shortest path algorithms (Dijkstra, Floyd-Warshall, ...)
string algorithms (longest common subsequence, Levenshtein distance, ...)
and much more...
Vazirani's technical note on Dynamic Programming
http://www.cs.berkeley.edu/~vazirani/algorithms/chap6.pdf has some useful ways to create subproblems given an input. I have added some other ways to the list below:
Input x_1, x_2, ..., x_n. Subproblem is x_1, ..., x_i.
Input x_1, x_2, ..., x_n. Subproblem is x_i, ..., x_j.
Input x_1, x_2, ..., x_n and y_1, y_2, ..., y_m. Subproblem is x_1, ..., x_i and y_1, ..., y_j.
Input is a rooted tree. Subproblem is a rooted subtree.
Input is a matrix. Subproblems are submatrices of different sizes that share a corner with the original matrix.
Input is a matrix. Subproblems are all possible submatrices.
Which subproblems to use usually depends on the problem. Try out these known variations and see which one suits your needs best.
I've seen references to cut-and-paste proofs in certain texts on algorithm analysis and design. It is often mentioned within the context of dynamic programming when proving optimal substructure for an optimization problem (see Chapter 15.3 of CLRS). It also shows up in graph manipulation.
What is the main idea of such proofs? How do I go about using them to prove the correctness of an algorithm or the convenience of a particular approach?
The term "cut and paste" shows up in algorithms sometimes when doing dynamic programming (and other things too, but that is where I first saw it). The idea is that in order to use dynamic programming, the problem you are trying to solve probably has some kind of underlying redundancy. You use a table or similar technique to avoid solving the same optimization problems over and over again. Of course, before you start trying to use dynamic programming, it would be nice to prove that the problem has this redundancy in it, otherwise you won't gain anything by using a table. This is often called the "optimal substructure" property (e.g., in CLRS).
The "cut and paste" technique is a way to prove that a problem has this property. In particular, you want to show that when you come up with an optimal solution to a problem, you have necessarily used optimal solutions to the constituent subproblems. The proof is by contradiction. Suppose you came up with an optimal solution to a problem by using suboptimal solutions to subproblems. Then, if you were to replace ("cut") those suboptimal subproblem solutions with optimal subproblem solutions (by "pasting" them in), you would improve your optimal solution. But, since your solution was optimal by assumption, you have a contradiction. There are some other steps involved in such a proof, but that is the "cut and paste" part.
The 'cut-and-paste' technique can be used in proving both the correctness of greedy algorithms (both the optimal substructure and greedy-choice properties) and the correctness of dynamic programming algorithms.
Greedy Correctness
These lecture notes on the correctness of MST from the MIT 2005 undergraduate algorithms class exhibit the 'cut-and-paste' technique to prove both optimal substructure and the greedy-choice property.
These lecture notes on the correctness of MST from MIT 6.046J / 18.410J Spring 2015 use the 'cut-and-paste' technique to prove the greedy-choice property.
Dynamic Programming Correctness
A more authoritative explanation of 'cut-and-paste' is given in CLRS (3rd edition), Chapter 15.3 'Elements of dynamic programming,' at page 379:
"4. You show that the solutions to the subproblems used within the optimal solution to the problem must themselves be optimal by using a “cut-and-paste” technique. You do so by supposing that each of the subproblem solutions is not optimal and then deriving a contradiction. In particular, by “cutting out” the nonoptimal subproblem solution and “pasting in” the optimal one, you show that you can get a better solution to the original problem, thus contradicting your supposition that you already had an optimal solution. If there is more than one subproblem, they are typically so similar that the cut-and-paste argument for one can be modified for the others with little effort."
Cut-and-paste is a technique used in proving graph theory concepts. The idea is this: assume you have a solution for problem A, and you want to say that some edge/node must be part of the solution. You assume you have a solution without the specified edge/node, then you reconstruct a solution by cutting out an edge/node and pasting in the specified edge/node, and you show the new solution's benefit is at least the same as the previous solution's.
One of the most important examples is proving MST attributes (proving that the greedy choice is good enough); see the presentation on MST from the CLRS book.
Proof by contradiction
P is assumed to be false, that is !P is true.
It is shown that !P implies two mutually contradictory assertions, Q and !Q.
Since Q and !Q cannot both be true, the assumption that P is false must be wrong, and P must be true.
It is not a new proof technique per se. It is just a fun way to articulate a proof by contradiction.
An example of cut-and-paste:
Suppose you are solving the shortest path problem in graphs with vertices x_1, x_2... x_n. Suppose you find a shortest path from x_1 to x_n and it goes through x_i and x_j (in that order). Then, clearly, the sub path from x_i to x_j must be a shortest path between x_i and x_j as well. Why? Because Math.
Proof using Cut and Paste:
Suppose there exists a strictly shorter path from x_i to x_j. "Cut" out the existing subpath and "paste" that shorter one into the overall path from x_1 to x_n. Then you will have another path from x_1 to x_n that is strictly shorter than the previously known path, which contradicts the "shortest" nature of that path.
Plain Old Proof by Contradiction:
Suppose P: The presented path from x_1 to x_n is a shortest path.
Q: The subpath from x_i to x_j is a shortest path.
Then, not Q => not P (using the argument presented above).
Hence, P => Q.
So, "Cut" and "Paste" makes it a bit more visual and easy to explain.
I heard the only difference between dynamic programming and backtracking is that DP allows overlapping of subproblems, e.g.:
fib(n) = fib(n-1) + fib (n-2)
Is it right? Are there any other differences?
Also, I would like to know some common problems solved using these techniques.
There are two typical implementations of Dynamic Programming approach: bottom-to-top and top-to-bottom.
Top-to-bottom Dynamic Programming is nothing else than ordinary recursion, enhanced with memorizing the solutions for intermediate sub-problems. When a given sub-problem arises a second (third, fourth, ...) time, it is not solved from scratch; instead, the previously memorized solution is used right away. This technique is known under the name memoization (no 'r' before 'i').
This is actually what your example with the Fibonacci sequence is supposed to illustrate. Just use the recursive formula for the Fibonacci sequence, but build the table of fib(i) values along the way, and you get a top-to-bottom DP algorithm for this problem (so that, for example, if you need to calculate fib(5) a second time, you get it from the table instead of calculating it again).
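As a sketch, that top-to-bottom Fibonacci table in Python:

```python
from functools import lru_cache

@lru_cache(maxsize=None)          # the "table of fib(i) values"
def fib(n):
    """Ordinary recursion plus memoization: each fib(i) is computed once,
    then served from the cache on every later request."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(50))  # → 12586269025, instead of ~2^50 recursive calls
```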
In Bottom-to-top Dynamic Programming the approach is also based on storing sub-solutions in memory, but they are solved in a different order (from smaller to bigger), and the resultant general structure of the algorithm is not recursive. The LCS algorithm is a classic Bottom-to-top DP example.
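A minimal bottom-to-top sketch of that LCS example: the table is filled from smaller subproblems to bigger ones, with no recursion at all:

```python
def lcs_length(a, b):
    """Bottom-up DP for longest common subsequence:
    table[i][j] = LCS length of a[:i] and b[:j], filled smallest-first."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

print(lcs_length("ABCBDAB", "BDCABA"))  # → 4 (e.g. "BCBA")
```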
Bottom-to-top DP algorithms are usually more efficient, but they are generally harder (and sometimes impossible) to build, since it is not always easy to predict which primitive sub-problems you are going to need to solve the whole original problem, and which path you have to take from small sub-problems to get to the final solution in the most efficient way.
Dynamic programming also requires "optimal substructure".
According to Wikipedia:
Dynamic programming is a method of solving complex problems by breaking them down into simpler steps. It is applicable to problems that exhibit the properties of 1) overlapping subproblems which are only slightly smaller and 2) optimal substructure.

Backtracking is a general algorithm for finding all (or some) solutions to some computational problem, that incrementally builds candidates to the solutions, and abandons each partial candidate c ("backtracks") as soon as it determines that c cannot possibly be completed to a valid solution.
For a detailed discussion of "optimal substructure", please read the CLRS book.
Common problems for backtracking I can think of are:
Eight queen puzzle
Map coloring
Sudoku
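For the first of those, a minimal backtracking sketch: partial placements that already contain an attack are abandoned immediately, which is exactly the "backtrack" step described above:

```python
def count_queens(n):
    """Backtracking for the n-queens puzzle: place one queen per row and
    abandon any partial placement that attacks an earlier queen."""
    solutions = 0
    def place(row, cols, diag1, diag2):
        nonlocal solutions
        if row == n:
            solutions += 1            # every row filled: a complete solution
            return
        for col in range(n):
            if col in cols or (row - col) in diag1 or (row + col) in diag2:
                continue              # partial candidate cannot be completed: prune
            place(row + 1, cols | {col}, diag1 | {row - col}, diag2 | {row + col})
    place(0, set(), set(), set())
    return solutions

print(count_queens(8))  # → 92
```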
DP problems:
This website at MIT has a good collection of DP problems with nice animated explanations.
A chapter from a book from a professor at Berkeley.
One more difference could be that dynamic programming problems usually rely on the principle of optimality. The principle of optimality states that in an optimal sequence of decisions or choices, each subsequence must also be optimal.
Backtracking problems are usually NOT optimal along the way! Backtracking can only be applied to problems which admit the concept of a partial candidate solution.
Say that we have a solution tree, whose leaves are the solutions for the original problem, and whose non-leaf nodes are the suboptimal solutions for part of the problem. We try to traverse the solution tree for the solutions.
Dynamic programming is more like BFS: we find all possible suboptimal solutions represented by the non-leaf nodes, and only grow the tree by one layer under those non-leaf nodes.
Backtracking is more like DFS: we grow the tree as deep as possible and prune the tree at one node if the solutions under the node are not what we expect.
Then there is one inference derived from the aforementioned theory: Dynamic programming usually takes more space than backtracking, because BFS usually takes more space than DFS (O(N) vs O(log N)). In fact, dynamic programming requires memorizing all the suboptimal solutions in the previous step for later use, while backtracking does not require that.
DP allows for solving a large, computationally intensive problem by breaking it down into subproblems whose solution requires only knowledge of the immediate prior solution. You will get a very good idea by picking up Needleman-Wunsch and solving a sample because it is so easy to see the application.
Backtracking seems to be more complicated: the solution tree is pruned if it is known that a specific path will not yield an optimal result.
Therefore one could say that Backtracking optimizes for memory since DP assumes that all the computations are performed and then the algorithm goes back stepping through the lowest cost nodes.
IMHO, the difference is very subtle since both (DP and BCKT) are used to explore all possibilities to solve a problem.
As of today, I see two subtleties:
BCKT is a brute force solution to a problem. DP is not a brute force solution. Thus, you might say: DP explores the solution space more optimally than BCKT. In practice, when you want to solve a problem using DP strategy, it is recommended to first build a recursive solution. Well, that recursive solution could be considered also the BCKT solution.
There are hundreds of ways to explore a solution space (welcome to the world of optimization) "more optimally" than a brute force exploration. DP is DP because at its core it is implementing a mathematical recurrence relation, i.e., the current value is a combination of past values (bottom-to-top). So we might say that DP is DP because the problem space makes it possible to explore its solution space by using a recurrence relation. If you explore the solution space based on another idea, then that won't be a DP solution. As in any problem, the problem itself may facilitate using one optimization technique or another, based on the problem structure itself. The structure of some problems enables the use of the DP optimization technique. In this sense, BCKT is more general, though not all problems allow BCKT either.
Example: Sudoku enables BCKT to explore its whole solution space. However, it does not allow the use of DP to explore its solution space more efficiently, since there is no recurrence relation anywhere that can be derived. However, there are other optimization techniques that fit the problem and improve upon brute force BCKT.
Example: Just get the minimum of a classic mathematical function. This problem does not allow BCKT to explore the state space of the problem.
Example: Any problem that can be solved using DP can also be solved using BCKT. In this sense, the recursive solution of the problem could be considered the BCKT solution.
Hope this helps a bit.
In a very simple sentence I can say: dynamic programming is a strategy to solve optimization problems, where an optimization problem asks for a minimum or maximum result (a single result). Backtracking, in contrast, uses a brute-force approach and is not for optimization problems; it is for when you have multiple results and you want all or some of them.
Depth first node generation of state space tree with bounding function is called backtracking. Here the current node is dependent on the node that generated it.
Depth first node generation of the state space tree with a memory function is called top-down dynamic programming. Here the current node is dependent on the nodes it generates.