What are overlapping subproblems in Dynamic Programming (DP)?

There are two key attributes that a problem must have in order for dynamic programming to be applicable: optimal substructure and overlapping subproblems [1]. For this question, we're going to focus on the latter property only.
There are various definitions for overlapping subproblems, two of which are:
A problem is said to have overlapping subproblems if the problem can be broken down into subproblems which are reused several times OR a recursive algorithm for the problem solves the same subproblem over and over rather than always generating new subproblems [2].
A second ingredient that an optimization problem must have for dynamic programming to apply is that the space of subproblems must be "small" in the sense that a recursive algorithm for the problem solves the same subproblems over and over, rather than always generating new subproblems (Introduction to Algorithms by CLRS)
Both definitions (and lots of others on the internet) seem to boil down to a problem having overlapping subproblems if finding its solution involves solving the same subproblems multiple times. In other words, there are many small subproblems that are computed many times while finding the solution to the original problem. The classic example used to make people understand this property is the recursive Fibonacci algorithm.
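For instance, a minimal sketch (my own, not from the cited references) that makes the overlap visible:

from functools import lru_cache

def fib_naive(n):
    if n < 2:
        return n
    # fib_naive(n-2) is recomputed inside fib_naive(n-1): the same
    # subproblems are solved over and over, exponentially many times.
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    # Same recursion, but each fib_memo(i) is now solved exactly once
    # and then reused, which is what the overlap makes profitable.
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)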
Life was great until, a couple of days ago, I discovered Kadane's algorithm, which made me question the definition of overlapping subproblems. This was mostly due to the fact that people have different views on whether or NOT it is a DP algorithm:
Dynamic programming aspect in Kadane's algorithm
Is Kadane's algorithm considered DP or not? And how to implement it recursively?
Is Kadane's Algorithm Greedy or Optimised DP?
Dynamic Programming vs Memoization (see my comment)
The most compelling reason why someone wouldn't consider Kadane's algorithm a DP algorithm is that each subproblem would only appear and be computed once in a recursive implementation [3], hence it doesn't exhibit the overlapping subproblems property. However, lots of articles on the internet consider Kadane's algorithm to be a DP algorithm, which made me question my understanding of what overlapping subproblems means in the first place.
People seem to interpret the overlapping subproblems property differently. It's easy to see it with simple problems such as the Fibonacci algorithm but things become very unclear once you introduce Kadane's algorithm for instance. I would really appreciate it if someone could offer some further explanation.

You've read so much about this already. The only thing I have to add is this:
The overlapping subproblems in Kadane's algorithm are here:
max_subarray = max( from i=1 to n [ max_subarray_to(i) ] )
max_subarray_to(i) = max(max_subarray_to(i-1) + array[i], array[i])
As you can see, max_subarray_to() is evaluated twice for each i. Kadane's algorithm memoizes these, turning it from O(n^2) to O(n).
... But as @Stef says, it doesn't matter what you call it, as long as you understand it.
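In Python, a minimal sketch of the memoized (iterative) form, my own illustration:

def max_subarray(array):
    # best_ending_here plays the role of the memoized max_subarray_to(i):
    # only the previous value is needed, so a single variable suffices.
    best_ending_here = array[0]
    best_overall = array[0]
    for x in array[1:]:
        # Either extend the best subarray ending at the previous index,
        # or start a fresh subarray at x.
        best_ending_here = max(best_ending_here + x, x)
        best_overall = max(best_overall, best_ending_here)
    return best_overall

print(max_subarray([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6, from [4, -1, 2, 1]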


Divide and Conquer vs Backtracking

Let’s use as an example the problem LeetCode 322. Coin Change
I know it is best solved by using Dynamic Programming, but I want to focus on my Brute Force solution:
from typing import List

class Solution:
    def coinChange(self, coins: List[int], amount: int) -> int:
        def helper(amount):
            # Fewest coins needed to make `amount`; inf if impossible.
            if amount < 0:
                return float('inf')
            if amount == 0:
                return 0
            curr_min = float('inf')  # local, so each subproblem is scored on its own
            for coin in coins:
                curr_min = min(curr_min, helper(amount - coin) + 1)
            return curr_min

        ans = helper(amount)
        return -1 if ans == float('inf') else ans
The recursion tree looks like: [recursion tree image]
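(For contrast, a minimal memoized sketch of the same recursion, my own illustration of the DP version mentioned above:)

from functools import lru_cache
from typing import List

def coin_change(coins: List[int], amount: int) -> int:
    @lru_cache(maxsize=None)
    def helper(remaining):
        if remaining < 0:
            return float('inf')  # overshot: dead end
        if remaining == 0:
            return 0
        # Caching means each 'remaining' value is solved only once.
        return min(helper(remaining - coin) + 1 for coin in coins)

    ans = helper(amount)
    return -1 if ans == float('inf') else ans

print(coin_change([1, 2, 5], 11))  # 3, i.e. 5 + 5 + 1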
I can say it is Divide and Conquer: we are dividing the problem into smaller subproblems, solving them individually and using those individual results to construct the result for the original problem.
I can also say it is Backtracking: we are enumerating all combinations of coin frequencies which satisfy the constraints.
I know both are implemented via Recursion, but I would like to know which paradigm my Brute Force solution belongs to: Divide and Conquer or Backtracking.
A complication in categorizing your algorithm is that there aren’t clear, well-defined boundaries between different classes of algorithms and different people might have slightly different definitions in mind.
For example, generally speaking, divide-and-conquer algorithms involve breaking the problem apart into non-overlapping subproblems. (See, for example, mergesort, quicksort, binary search, closest pair of points, etc.) In that sense, your algorithm doesn’t nicely map onto the divide-and-conquer paradigm, since the subproblems you’re considering involve some degree of overlap in the subproblems they solve. (Then again, not all divide-and-conquer algorithms have this property. See, for example, stoogesort.)
Similarly, backtracking algorithms usually, but not always, work by committing to a decision, recursively searching to see whether a solution exists given that decision, then unwinding the choice if it turns out not to lead to a solution. Your algorithm doesn’t have this property, since it explores all options and then takes the best. (When I teach intro programming, I usually classify algorithms this way. But my colleagues sometimes describe what you’re doing as backtracking!)
I would classify your algorithm as belonging to a different family: exhaustive search. The algorithm you've proposed essentially works by enumerating all possible ways of making change, then returning the one that uses the fewest coins. Exhaustive search algorithms are ones that work by trying all possible options and returning the best, and I think that's the best way of classifying your strategy.
To me this doesn't fit with either paradigm.
Backtracking to me is associated with reaching a point where the candidate cannot be further developed, but here we develop it to its end, infinity, and we don't throw it away, we use it in comparisons.
Divide and conquer I associate with a division into a relatively small number of candidate groups (the classic example is two, like binary search). To call each path in a recursion a group for the sake of Divide and Conquer would lose the latter's meaning.
The most practical answer is it doesn't matter.
Safest answer: recursion. My best interpretation is that it's backtracking.
I think the options here are recursion, backtracking, divide-and-conquer, and dynamic programming.
Recursion is the most general, encompassing backtracking, D&C, and DP. If the solution indeed has elements of both backtracking and D&C, then recursion would be the best answer, as it contains both.
In Skiena's ADM (Section 5.3.1), it says:
A typical divide-and-conquer algorithm breaks a given problem into smaller pieces, each of which is of size n/b.
By this interpretation it doesn't meet the definition, as we divide the problem by coins and each resulting subproblem has a different size.
In Erickson's Algorithms (section 1.6), it says:
divide and conquer:
Divide the given instance of the problem into several independent smaller instances of exactly the same problem.
So in this case, according to the recursion tree, the smaller instances are not always independent (they overlap).
Which leaves backtracking. Erickson defines the 'recursive strategy' as:
A backtracking algorithm tries to construct a solution to a computational problem incrementally, one small piece at a time.
Which seems general enough to fit all DP problems under it. The provided code can be said to backtrack when a solution path fails.
Additionally, according to Wikipedia:
It is often the most convenient technique for parsing, for the knapsack problem and other combinatorial optimization problems.
Coin Change being an unbounded-knapsack-type problem, it fits into the description of backtracking.

I need to solve an NP-hard problem. Is there hope?

There are a lot of real-world problems that turn out to be NP-hard. If we assume that P ≠ NP, there aren't any polynomial-time algorithms for these problems.
If you have to solve one of these problems, is there any hope that you'll be able to do so efficiently? Or are you just out of luck?
If a problem is NP-hard, under the assumption that P ≠ NP there is no algorithm that is
deterministic,
exactly correct on all inputs all the time, and
efficient on all possible inputs.
If you absolutely need all of the above guarantees, then you're pretty much out of luck. However, if you're willing to settle for a solution to the problem that relaxes some of these constraints, then there very well still might be hope! Here are a few options to consider.
Option One: Approximation Algorithms
If a problem is NP-hard and P ≠ NP, it means that there is no algorithm that will always efficiently produce the exactly correct answer on all inputs. But what if you don't need the exact answer? What if you just need answers that are close to correct? In some cases, you may be able to combat NP-hardness by using an approximation algorithm.
For example, a canonical example of an NP-hard problem is the traveling salesman problem. In this problem, you're given as input a complete graph representing a transportation network. Each edge in the graph has an associated weight. The goal is to find a cycle that goes through every node in the graph exactly once and which has minimum total weight. In the case where the edge weights satisfy the triangle inequality (that is, the best route from point A to point B is always to follow the direct link from A to B), then you can get back a cycle whose cost is at most 3/2 optimal by using the Christofides algorithm.
As another example, the 0/1 knapsack problem is known to be NP-hard. In this problem, you're given a bag and a collection of objects with different weights and values. The goal is to pack the maximum value of objects into the bag without exceeding the bag's weight limit. Even though computing an exact answer requires exponential time in the worst case, it's possible to approximate the correct answer to an arbitrary degree of precision in polynomial time. (The algorithm that does this is called a fully polynomial-time approximation scheme or FPTAS).
Unfortunately, we do have some theoretical limits on the approximability of certain NP-hard problems. The Christofides algorithm mentioned earlier gives a 3/2 approximation to TSP where the edges obey the triangle inequality, but interestingly enough it's possible to show that if P ≠ NP, there is no polynomial-time approximation algorithm for general TSP (without the triangle inequality) that can get within any constant factor of optimal. Usually, you need to do some research to learn more about which problems can be well-approximated and which ones can't, since many NP-hard problems can be approximated well and many can't. There doesn't seem to be a unified theme.
Option Two: Heuristics
For many NP-hard problems, standard approaches like greedy algorithms won't always produce the right answer, but often do reasonably well on "reasonable" inputs. In many cases, it's reasonable to attack NP-hard problems with heuristics. The exact definition of a heuristic varies from context to context, but typically a heuristic is either an approach to a problem that "often" gives back good answers at the cost of sometimes giving back wrong answers, or is a useful rule of thumb that helps speed up searches even if it might not always guide the search the right way.
As an example of the first type of heuristic, let's look at the graph-coloring problem. This NP-hard problem asks, given a graph, to find the minimum number of colors necessary to paint the nodes in the graph such that no edge's endpoints are the same color. This turns out to be a particularly tough problem to solve with many other approaches (the best known approximation algorithms have terrible bounds, and it's not suspected to have a parameterized efficient algorithm). However, there are many heuristics for graph coloring that do quite well in practice. Many greedy coloring heuristics exist for assigning colors to nodes in a reasonable order, and these heuristics often do quite well in practice. Unfortunately, sometimes these heuristics give terrible answers back, but provided that the graph isn't pathologically constructed the heuristics often work just fine.
As an example of the second type of heuristic, it's helpful to look at SAT solvers. SAT, the Boolean satisfiability problem, was the first problem proven to be NP-complete. The problem asks, given a propositional formula (often written in conjunctive normal form), to determine whether there is a way to assign values to the variables such that the overall formula evaluates to true. Modern SAT solvers are getting quite good at solving SAT in many cases by using heuristics to guide their search over possible variable assignments. One famous SAT-solving algorithm, DPLL, essentially tries all possible assignments to see if the formula is satisfiable, using heuristics to speed up the search. For example, if it finds that a variable is either always true or always false, DPLL will try assigning that variable its forced value before trying other variables. DPLL also finds unit clauses (clauses with just one literal) and sets those variables' values before trying other variables. The net effect of these heuristics is that DPLL ends up being very fast in practice, even though it's known to have exponential worst-case behavior.
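To make that concrete, here is a minimal DPLL sketch (my own code, with only unit propagation as the heuristic; real solvers add much more):

def dpll(clauses, assignment=None):
    # clauses: list of sets of ints; positive n means variable n,
    # negative n means its negation. Returns a satisfying assignment
    # (dict var -> bool) or None if the formula is unsatisfiable.
    if assignment is None:
        assignment = {}
    clauses = [set(c) for c in clauses]

    # Unit propagation: a one-literal clause forces that literal's value.
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if len(clause) == 1:
                lit = next(iter(clause))
                assignment[abs(lit)] = lit > 0
                clauses = simplify(clauses, lit)
                if clauses is None:
                    return None  # propagation hit a contradiction
                changed = True
                break

    if not clauses:
        return assignment  # every clause satisfied
    # Branch on some remaining variable, trying it true, then false.
    lit = next(iter(clauses[0]))
    for choice in (lit, -lit):
        reduced = simplify(clauses, choice)
        if reduced is not None:
            new_assignment = dict(assignment)
            new_assignment[abs(choice)] = choice > 0
            result = dpll(reduced, new_assignment)
            if result is not None:
                return result
    return None

def simplify(clauses, lit):
    # Assume lit is true: drop satisfied clauses, remove -lit elsewhere.
    out = []
    for clause in clauses:
        if lit in clause:
            continue
        rest = clause - {-lit}
        if not rest:
            return None  # empty clause: contradiction
        out.append(rest)
    return out

print(dpll([{1, 2}, {-1, 2}, {-2, 3}]))  # a satisfying assignment, e.g. {1: True, 2: True, 3: True}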
Option Three: Pseudopolynomial-Time Algorithms
If P ≠ NP, then no NP-hard problem can be solved in polynomial time. However, in some cases, the definition of "polynomial time" doesn't necessarily match the standard intuition of polynomial time. Formally speaking, polynomial time means polynomial in the number of bits necessary to specify the input, which doesn't always sync up with what we consider the input to be.
As an example, consider the set partition problem. In this problem, you're given a set of numbers and need to determine whether there's a way to split the set into two smaller sets, each of which has the same sum. The naive solution to this problem runs in time O(2^n) and works by just brute-force testing all subsets. With dynamic programming, though, it's possible to solve this problem in time O(nN), where n is the number of elements in the set and N is the maximum value in the set. Technically speaking, the runtime O(nN) is not polynomial time because the numeric value N is written out in only log_2 N bits, but assuming that the numeric value of N isn't too large, this is a perfectly reasonable runtime.
This algorithm is called a pseudopolynomial-time algorithm because the runtime O(nN) "looks" like a polynomial, but technically speaking is exponential in the size of the input. Many NP-hard problems, especially ones involving numeric values, admit pseudopolynomial-time algorithms and are therefore easy to solve assuming that the numeric values aren't too large.
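A minimal sketch of that idea (my own code; it tracks reachable subset sums, so it runs in time proportional to n times the total sum, the same pseudopolynomial flavor as the O(nN) bound above):

def can_partition(nums):
    total = sum(nums)
    if total % 2 != 0:
        return False  # an odd total can never split evenly
    target = total // 2
    reachable = {0}  # subset sums reachable with the items seen so far
    for x in nums:
        reachable |= {s + x for s in reachable if s + x <= target}
    return target in reachable

print(can_partition([3, 1, 1, 2, 2, 1]))  # True: {3, 2} vs {1, 1, 2, 1}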
For more information on pseudopolynomial time, check out this earlier Stack Overflow question about pseudopolynomial time.
Option Four: Randomized Algorithms
If a problem is NP-hard and P ≠ NP, then there is no deterministic algorithm that can solve that problem in worst-case polynomial time. But what happens if we allow for algorithms that introduce randomness? If we're willing to settle for an algorithm that gives a good answer on expectation, then we can often get relatively good answers to NP-hard problems in not much time.
As an example, consider the maximum cut problem. In this problem, you're given an undirected graph and want to find a way to split the nodes in the graph into two nonempty groups A and B with the maximum number of edges running between the groups. This has some interesting applications in computational physics (unfortunately, I don't understand them at all, but you can peruse this paper for some details about this). This problem is known to be NP-hard, but there's a simple randomized approximation algorithm for it. If you just toss each node into one of the two groups completely at random, you end up with a cut that, on expectation, is within 50% of the optimal solution.
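That randomized approximation is short enough to sketch in full (my own code; for simplicity it ignores the corner case where one group comes out empty):

import random

def random_cut(nodes, edges):
    # Toss each node into group A with probability 1/2. Each edge then
    # crosses the cut with probability 1/2, so the expected cut size is
    # |E| / 2, which is at least half the optimum.
    group_a = {v for v in nodes if random.random() < 0.5}
    cut = sum(1 for u, v in edges if (u in group_a) != (v in group_a))
    return group_a, cut

nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
print(random_cut(nodes, edges))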
Returning to SAT, many modern SAT solvers use some degree of randomness to guide the search for a satisfying assignment. The WalkSAT and GSAT algorithms, for example, work by picking a random clause that isn't currently satisfied and trying to satisfy it by flipping some variable's truth value. This often guides the search toward a satisfying assignment, causing these algorithms to work well in practice.
It turns out there's a lot of open theoretical problems about the ability to solve NP-hard problems using randomized algorithms. If you're curious, check out the complexity class BPP and the open problem of its relation to NP.
Option Five: Parameterized Algorithms
Some NP-hard problems take in multiple different inputs. For example, the long path problem takes as input a graph and a length k, then asks whether there's a simple path of length k in the graph. The subset sum problem takes in as input a set of numbers and a target number k, then asks whether there's a subset of the numbers that adds up to exactly k.
Interestingly, in the case of the long path problem, there's an algorithm (the color-coding algorithm) whose runtime is O((n^3 log n) · b^k), where n is the number of nodes, k is the length of the requested path, and b is some constant. This runtime is exponential in k, but is only polynomial in n, the number of nodes. This means that if k is fixed and known in advance, the runtime of the algorithm as a function of the number of nodes is only O(n^3 log n), which is quite a nice polynomial. Similarly, in the case of the subset sum problem, there's a dynamic programming algorithm whose runtime is O(nW), where n is the number of elements of the set and W is the target value. If W is fixed in advance as some constant, then this algorithm will run in time O(n), meaning that it will be possible to exactly solve subset sum in linear time.
Both of these algorithms are examples of parameterized algorithms, algorithms for solving NP-hard problems that split the hardness of the problem into two pieces - a "hard" piece that depends on some input parameter to the problem, and an "easy" piece that scales gracefully with the size of the input. These algorithms can be useful for finding exact solutions to NP-hard problems when the parameter in question is small. The color-coding algorithm mentioned above, for example, has proven quite useful in practice in computational biology.
However, some problems are conjectured to not have any nice parameterized algorithms. Graph coloring, for example, is suspected to not have any efficient parameterized algorithms. In the cases where parameterized algorithms exist, they're often quite efficient, but you can't rely on them for all problems.
For more information on parameterized algorithms, check out this earlier Stack Overflow question.
Option Six: Fast Exponential-Time Algorithms
Exponential-time algorithms don't scale well - their runtimes approach the lifetime of the universe for inputs as small as 100 or 200 elements.
What if you need to solve an NP-hard problem, but you know the input is reasonably small, say, somewhere between 50 and 70 elements? Standard exponential-time algorithms are probably not going to be fast enough to solve these problems. What if you really do need an exact solution to the problem and the other approaches here won't cut it?
In some cases, there are "optimized" exponential-time algorithms for NP-hard problems. These are algorithms whose runtime is exponential, but not as bad an exponential as the naive solution. For example, a simple exponential-time algorithm for the 3-coloring problem (given a graph, determine if you can color the nodes one of three colors each so that no edge's endpoints are the same color) might work by checking each possible way of coloring the nodes in the graph, testing if any of them are 3-colorings. There are 3^n possible ways to do this, so in the worst case the runtime of this algorithm will be O(3^n · poly(n)) for some small polynomial poly(n). However, using more clever tricks and techniques, it's possible to develop an algorithm for 3-colorability that runs in time O(1.3289^n). This is still an exponential-time algorithm, but it's a much faster exponential-time algorithm. For example, 3^19 is about 10^9, so if a computer can do one billion operations per second, it can use our initial brute-force algorithm to (roughly speaking) solve 3-colorability in graphs with up to 19 nodes in one second. Using the O(1.3289^n)-time exponential algorithm, we could solve instances of up to about 73 nodes in about a second. That's a huge improvement - we've grown the size we can handle in one second by more than a factor of three!
As another famous example, consider the traveling salesman problem. There's an obvious O(n! · poly(n))-time solution to TSP that works by enumerating all permutations of the nodes and testing the paths resulting from those permutations. However, by using a dynamic programming algorithm similar to that used by the color-coding algorithm, it's possible to improve the runtime to "only" O(n^2 · 2^n). Given that 13! is about six billion, the naive solution would let you solve TSP for 12- or 13-node graphs in a matter of seconds. For comparison, the DP solution lets you solve TSP on 28-node graphs in about one second.
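That DP is the Held-Karp algorithm; here is a minimal sketch of it (my own code) for a complete graph given as a distance matrix:

from itertools import combinations

def held_karp(dist):
    # O(n^2 * 2^n) TSP: best[(S, j)] is the cheapest path that starts at
    # node 0, visits every node in subset S, and ends at node j in S.
    n = len(dist)
    best = {(frozenset([j]), j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            S = frozenset(subset)
            for j in S:
                best[(S, j)] = min(best[(S - {j}, k)] + dist[k][j]
                                   for k in S if k != j)
    full = frozenset(range(1, n))
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))

dist = [[0, 2, 9, 10],
        [1, 0, 6, 4],
        [15, 7, 0, 8],
        [6, 3, 12, 0]]
print(held_karp(dist))  # 21: the tour 0 -> 2 -> 3 -> 1 -> 0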
These fast exponential-time algorithms are often useful for boosting the size of the inputs that can be exactly solved in practice. Of course, they still run in exponential time, so these approaches are typically not useful for solving very large problem instances.
Option Seven: Solve an Easy Special Case
Many problems that are NP-hard in general have restricted special cases that are known to be solvable efficiently. For example, while in general it’s NP-hard to determine whether a graph has a k-coloring, in the specific case of k = 2 this is equivalent to checking whether a graph is bipartite, which can be checked in linear time using a modified depth-first search. Boolean satisfiability is, generally speaking, NP-hard, but it can be solved in polynomial time if you have an input formula with at most two literals per clause, or where the formula is formed from clauses using XOR rather than inclusive-OR, etc. Finding the largest independent set in a graph is generally speaking NP-hard, but if the graph is bipartite this can be done efficiently due to König’s theorem.
As a result, if you find yourself needing to solve what might initially appear to be an NP-hard problem, first check whether the inputs you actually need to solve that problem on have some additional restricted structure. If so, you might be able to find an algorithm that applies to your special case and runs much faster than a solver for the problem in its full generality.
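For instance, the k = 2 special case from above fits in a few lines (my own sketch, using BFS):

from collections import deque

def is_bipartite(adj):
    # adj: dict mapping each node to a list of neighbors.
    # 2-coloring succeeds exactly when the graph has no odd cycle.
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]  # give neighbors the opposite color
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # two adjacent nodes forced to the same color
    return True

print(is_bipartite({1: [2], 2: [1, 3], 3: [2]}))        # True: a path
print(is_bipartite({1: [2, 3], 2: [1, 3], 3: [1, 2]}))  # False: a triangle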
Conclusion
If you need to solve an NP-hard problem, don't despair! There are lots of great options available that might make your intractable problem a lot more approachable. No one of the above techniques works in all cases, but by using some combination of these approaches, it's usually possible to make progress even when confronted with NP-hardness.

What is the difference between dynamic programming and greedy approach?

What is the main difference between dynamic programming and greedy approach in terms of usage?
As far as I understood, the greedy approach sometimes gives an optimal solution; in other cases, the dynamic programming approach gives an optimal solution.
Are there any particular conditions which must be met in order to use one approach (or the other) to obtain an optimal solution?
Based on Wikipedia's articles.
Greedy Approach
A greedy algorithm is an algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage with the hope of finding a global optimum. In many problems, a greedy strategy does not in general produce an optimal solution, but nonetheless a greedy heuristic may yield locally optimal solutions that approximate a global optimal solution in a reasonable time.
Dynamic programming
The idea behind dynamic programming is quite simple. In general, to solve a given problem, we need to solve different parts of the problem (subproblems), then combine the solutions of the subproblems to reach an overall solution. Often when using a more naive method, many of the subproblems are generated and solved many times. The dynamic programming approach seeks to solve each subproblem only once, thus reducing the number of computations: once the solution to a given subproblem has been computed, it is stored or "memo-ized": the next time the same solution is needed, it is simply looked up. This approach is especially useful when the number of repeating subproblems grows exponentially as a function of the size of the input.
Difference
Greedy choice property
We can make whatever choice seems best at the moment and then solve the subproblems that arise later. The choice made by a greedy algorithm may depend on choices made so far but not on future choices or all the solutions to the subproblem. It iteratively makes one greedy choice after another, reducing each given problem into a smaller one. In other words, a greedy algorithm never reconsiders its choices.
This is the main difference from dynamic programming, which is exhaustive and is guaranteed to find the solution. After every stage, dynamic programming makes decisions based on all the decisions made in the previous stage, and may reconsider the previous stage's algorithmic path to solution.
For example, let's say that you have to get from point A to point B as fast as possible, in a given city, during rush hour. A dynamic programming algorithm will look into the entire traffic report, looking into all possible combinations of roads you might take, and will only then tell you which way is the fastest. Of course, you might have to wait for a while until the algorithm finishes, and only then can you start driving. The path you will take will be the fastest one (assuming that nothing changed in the external environment).
On the other hand, a greedy algorithm will start you driving immediately and will pick the road that looks the fastest at every intersection. As you can imagine, this strategy might not lead to the fastest arrival time, since you might take some "easy" streets and then find yourself hopelessly stuck in a traffic jam.
Some other details...
In mathematical optimization, greedy algorithms solve combinatorial problems having the properties of matroids.
Dynamic programming is applicable to problems exhibiting the properties of overlapping subproblems and optimal substructure.
I would like to cite a paragraph which describes the major difference between greedy algorithms and dynamic programming algorithms stated in the book Introduction to Algorithms (3rd edition) by Cormen, Chapter 15.3, page 381:
One major difference between greedy algorithms and dynamic programming is that instead of first finding optimal solutions to subproblems and then making an informed choice, greedy algorithms first make a greedy choice, the choice that looks best at the time, and then solve a resulting subproblem, without bothering to solve all possible related smaller subproblems.
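A small, concrete illustration of that difference (my own sketch, using coin change with denominations chosen so that the greedy choice fails):

def greedy_coins(coins, amount):
    # Always grab the largest coin that still fits: a greedy choice,
    # made once and never reconsidered.
    count = 0
    for coin in sorted(coins, reverse=True):
        count += amount // coin
        amount %= coin
    return count if amount == 0 else None

def dp_coins(coins, amount):
    # best[a] = fewest coins needed to make amount a, built from all
    # smaller subproblems before committing to anything.
    best = [0] + [float('inf')] * amount
    for a in range(1, amount + 1):
        for coin in coins:
            if coin <= a:
                best[a] = min(best[a], best[a - coin] + 1)
    return best[amount]

print(greedy_coins([1, 3, 4], 6))  # 3 coins: 4 + 1 + 1
print(dp_coins([1, 3, 4], 6))      # 2 coins: 3 + 3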
Differences between the greedy method and dynamic programming are given below:
The greedy method never reconsiders its choices, whereas dynamic programming may reconsider previous states.
A greedy algorithm is usually faster, whereas dynamic programming is usually more expensive but guaranteed to find an optimal solution where it applies.
A greedy algorithm makes a local choice at each subproblem, whereas dynamic programming solves all the subproblems and then selects the one that leads to an optimal solution.
A greedy algorithm commits to each decision once, whereas dynamic programming weighs the decisions at every stage.
In simple words, in dynamic programming (say, when routing a message across a network) one can first examine all the paths, find the one that takes the shortest time, and only then start the journey.
A greedy algorithm, on the other hand, takes the seemingly optimal decision on the spot without thinking about the next step, and at the next step changes its decision again, and so on...
Note: dynamic programming is reliable, while greedy algorithms are not always reliable.
Paraphrasing Biswajit Roy: dynamic programming first plans, then goes; a greedy algorithm first goes, then continuously plans.
The major difference between the greedy method and dynamic programming is that in the greedy method only one decision sequence is ever generated, while in dynamic programming many decision sequences may be generated.

Divide and conquer, dynamic programming and greedy algorithms!

When I have a problem with optimal substructure and no subproblem shares sub-subproblems, can I use a divide-and-conquer algorithm to solve it?
But when the subproblems share sub-subproblems (overlapping subproblems), can I use dynamic programming to solve the problem?
Is this correct?
And how are greedy algorithms similar to dynamic programming?
When I have a problem with optimal substructure and no subproblem shares sub-subproblems, can I use a divide-and-conquer algorithm to solve it?
Yes, as long as you can find an optimal algorithm for each kind of subproblem.
But when the subproblems share sub-subproblems (overlapping subproblems), can I use dynamic programming to solve the problem?
Is this correct?
Yes. Dynamic programming is basically a special case of the family of Divide & Conquer algorithms, where all subproblems are the same.
And how are greedy algorithms similar to dynamic programming?
They're different.
Dynamic programming gives you the optimal solution.
A greedy algorithm usually gives a good/fair solution in a small amount of time, but it doesn't guarantee reaching the optimum.
It is, let's say, similar because it usually divides the solution construction into several stages, in which it makes choices that are locally optimal. But if the stages are not optimal substructures of the original problem, then normally it doesn't lead to the best solution.
EDIT:
As pointed out by @rrenaud, there are some greedy algorithms that have been proven to be optimal (e.g. Dijkstra's, Kruskal's, Prim's, etc.).
So, to be more correct, the main difference between greedy and dynamic programming is that the former is not exhaustive on the space of solutions while the latter is.
In fact greedy algorithms are short-sighted on that space, and each choice made during solution construction is never reconsidered.
Dynamic programming uses a bottom-up approach: it saves previous solutions and refers back to them, which lets us pick the optimal solution among all the available ones. The greedy approach works top-down: it takes the locally best choice without using the solutions of previous levels, which can lead to a less optimal result.
Dynamic = bottom-up, optimal solution.
Greedy = top-down, less optimal, less time-consuming.

Difference between back tracking and dynamic programming

I heard the only difference between dynamic programming and backtracking is that DP allows overlapping subproblems, e.g.:
fib(n) = fib(n-1) + fib(n-2)
Is that right? Are there any other differences?
I would also like to know some common problems solved using these techniques.
There are two typical implementations of Dynamic Programming approach: bottom-to-top and top-to-bottom.
Top-to-bottom Dynamic Programming is nothing else than ordinary recursion, enhanced with memorizing the solutions for intermediate sub-problems. When a given sub-problem arises a second (third, fourth...) time, it is not solved from scratch, but instead the previously memorized solution is used right away. This technique is known under the name memoization (no 'r' before 'i').
This is actually what your example with Fibonacci sequence is supposed to illustrate. Just use the recursive formula for Fibonacci sequence, but build the table of fib(i) values along the way, and you get a Top-to-bottom DP algorithm for this problem (so that, for example, if you need to calculate fib(5) second time, you get it from the table instead of calculating it again).
In Bottom-to-top Dynamic Programming the approach is also based on storing sub-solutions in memory, but they are solved in a different order (from smaller to bigger), and the resulting general structure of the algorithm is not recursive. The LCS algorithm is a classic Bottom-to-top DP example.
Bottom-to-top DP algorithms are usually more efficient, but they are generally harder (and sometimes impossible) to build, since it is not always easy to predict which primitive sub-problems you are going to need to solve the whole original problem, and which path you have to take from small sub-problems to get to the final solution in the most efficient way.
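Since LCS was mentioned as the classic bottom-to-top example, here is a minimal sketch of it (my own code): the table is filled from smaller prefixes to bigger ones, with no recursion at all.

def lcs_length(a, b):
    # table[i][j] = length of the LCS of the prefixes a[:i] and b[:j]
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1  # extend a common subsequence
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

print(lcs_length("ABCBDAB", "BDCABA"))  # 4, e.g. "BCBA"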
Dynamic programming also requires "optimal substructure".
According to Wikipedia:
Dynamic programming is a method of solving complex problems by breaking them down into simpler steps. It is applicable to problems that exhibit the properties of 1) overlapping subproblems which are only slightly smaller and 2) optimal substructure.
Backtracking is a general algorithm for finding all (or some) solutions to some computational problem, that incrementally builds candidates to the solutions, and abandons each partial candidate c ("backtracks") as soon as it determines that c cannot possibly be completed to a valid solution.
For a detailed discussion of "optimal substructure", please read the CLRS book.
Common problems for backtracking I can think of are (a sketch of the first follows the list):
Eight queen puzzle
Map coloring
Sudoku
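A minimal backtracking sketch for the eight queens puzzle (my own code): place queens row by row and abandon any partial placement as soon as it conflicts.

def solve_queens(n, placed=()):
    # placed[r] is the column of the queen already put in row r.
    row = len(placed)
    if row == n:
        return [placed]  # a complete, conflict-free placement
    solutions = []
    for col in range(n):
        if all(col != c and abs(col - c) != row - r
               for r, c in enumerate(placed)):
            solutions += solve_queens(n, placed + (col,))
        # else: this partial candidate cannot be completed; backtrack
    return solutions

print(len(solve_queens(8)))  # 92 solutions to the eight queens puzzle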
DP problems:
This website at MIT has a good collection of DP problems with nice animated explanations.
A chapter from a book from a professor at Berkeley.
One more difference could be that dynamic programming problems usually rely on the principle of optimality. The principle of optimality states that in an optimal sequence of decisions or choices, each subsequence must also be optimal.
Backtracking problems are usually NOT optimal on their way! They can only be applied to problems which admit the concept of a partial candidate solution.
Say that we have a solution tree, whose leaves are the solutions for the original problem, and whose non-leaf nodes are the suboptimal solutions for part of the problem. We try to traverse the solution tree for the solutions.
Dynamic programming is more like BFS: we find all possible suboptimal solutions represented by the non-leaf nodes, and only grow the tree by one layer under those non-leaf nodes.
Backtracking is more like DFS: we grow the tree as deep as possible and prune the tree at one node if the solutions under the node are not what we expect.
Then there is one inference derived from the aforementioned theory: dynamic programming usually takes more space than backtracking, because BFS usually takes more space than DFS (O(N) vs O(log N)). In fact, dynamic programming requires memoizing all the suboptimal solutions from the previous step for later use, while backtracking does not require that.
DP allows for solving a large, computationally intensive problem by breaking it down into subproblems whose solution requires only knowledge of the immediate prior solution. You will get a very good idea by picking up Needleman-Wunsch and solving a sample because it is so easy to see the application.
Backtracking seems to be more complicated, where the solution tree is pruned when it is known that a specific path will not yield an optimal result.
Therefore one could say that backtracking optimizes for memory, since DP assumes that all the computations are performed and then the algorithm goes back stepping through the lowest-cost nodes.
IMHO, the difference is very subtle since both (DP and BCKT) are used to explore all possibilities to solve a problem.
As of today, I see two subtleties:
BCKT is a brute force solution to a problem. DP is not a brute force solution. Thus, you might say: DP explores the solution space more optimally than BCKT. In practice, when you want to solve a problem using the DP strategy, it is recommended to first build a recursive solution. Well, that recursive solution could also be considered the BCKT solution.
There are hundreds of ways to explore a solution space (welcome to the world of optimization) "more optimally" than a brute force exploration. DP is DP because at its core it is implementing a mathematical recurrence relation, i.e., the current value is a combination of past values (bottom-to-top). So, we might say that DP is DP because the problem space satisfies exploring its solution space by using a recurrence relation. If you explore the solution space based on another idea, then that won't be a DP solution. As in any problem, the problem itself may facilitate using one optimization technique or another, based on the problem structure itself. The structure of some problems enables the use of the DP optimization technique. In this sense, BCKT is more general, though not all problems allow BCKT either.
Example: Sudoku enables BCKT to explore its whole solution space. However, it does not allow using DP to explore its solution space more efficiently, since no recurrence relation can be derived. However, there are other optimization techniques that fit the problem and improve brute-force BCKT.
Example: Just get the minimum of a classic mathematical function. This problem does not allow BCKT to explore the state space of the problem.
Example: Any problem that can be solved using DP can also be solved using BCKT. In this sense, the recursive solution of the problem could be considered the BCKT solution.
Hope this helps a bit.
In a very simple sentence: dynamic programming is a strategy for solving optimization problems, where an optimization problem asks for a minimum or maximum result (a single result). Backtracking, in contrast, is a brute-force approach, not for optimization problems; it is for when you have multiple results and you want all or some of them.
Depth-first node generation of the state space tree with a bounding function is called backtracking. Here the current node is dependent on the node that generated it.
Depth-first node generation of the state space tree with a memory function is called top-down dynamic programming. Here the current node is dependent on the nodes it generates.

Resources