Huffman coding is based on what Greedy Approach or Dynamic Programming - algorithm

Can we solve Problem of Huffman Coding by using Dynamic Programming, Is there any algorithm

Huffman Coding works by creating a binary tree of nodes. These can be stored in a regular array, the size of which depends on the number of symbols, n. Huffman Coding can be implemented in O(n logn) time by using the Greedy Algorithm approach. Huffman Coding is not suitable for a Dynamic Programming solution as the problem does not contain overlapping sub problems.

As per my knowledge about algorithms, I beleive huffman coding is based on Greedy approach.
As we do in Activity selection problem. Task scheduling and knapsack problem are another examples of it.

[Update]: I think the solution I mentioned below generates an optimal prefix code, but not necessarily same as Huffman code. Huffman code, by definition, is generated by taking the greedy approach. While it is optimal, it is not the only solution. (For example, once the Huffman tree is generated, you could interchange the leaves in the same level to give them different codes while still being optimal)
I think this can be solved using Dynamic Programming, although it won't be that efficient. The approach here is very similar to finding the Optimal Binary Search Tree, since as you go one level down you add one more bit to the leaves.
Here is the link for the code to calculate the minimum number of total bits. Of course this is exponential, but if you use DP it can be solved in O(n^2) time. I haven't tried generating the encoding, but should be possible I believe.

Related

What is the difference between dynamic programming and greedy approach?

What is the main difference between dynamic programming and greedy approach in terms of usage?
As far as I understood, the greedy approach sometimes gives an optimal solution; in other cases, the dynamic programming approach gives an optimal solution.
Are there any particular conditions which must be met in order to use one approach (or the other) to obtain an optimal solution?
Based on Wikipedia's articles.
Greedy Approach
A greedy algorithm is an algorithm that follows the problem solving heuristic of making
the locally optimal choice at each stage with the hope of finding a global optimum. In
many problems, a greedy strategy does not in general produce an optimal solution, but nonetheless a greedy heuristic may yield locally optimal solutions that approximate a global optimal solution in a reasonable time.
We can make whatever choice seems best at the moment and then solve the subproblems that arise later. The choice made by a greedy algorithm may depend on choices made so far but not on future choices or all the solutions to the subproblem. It iteratively makes one greedy choice after another, reducing each given problem into a smaller one.
Dynamic programming
The idea behind dynamic programming is quite simple. In general, to solve a given problem, we need to solve different parts of the problem (subproblems), then combine the solutions of the subproblems to reach an overall solution. Often when using a more naive method, many of the subproblems are generated and solved many times. The dynamic programming approach seeks to solve each subproblem only once, thus reducing the number of computations: once the solution to a given subproblem has been computed, it is stored or "memo-ized": the next time the same solution is needed, it is simply looked up. This approach is especially useful when the number of repeating subproblems grows exponentially as a function of the size of the input.
Difference
Greedy choice property
We can make whatever choice seems best at the moment and then solve the subproblems that arise later. The choice made by a greedy algorithm may depend on choices made so far but not on future choices or all the solutions to the subproblem. It iteratively makes one greedy choice after another, reducing each given problem into a smaller one. In other words, a greedy algorithm never reconsiders its choices.
This is the main difference from dynamic programming, which is exhaustive and is guaranteed to find the solution. After every stage, dynamic programming makes decisions based on all the decisions made in the previous stage, and may reconsider the previous stage's algorithmic path to solution.
For example, let's say that you have to get from point A to point B as fast as possible, in a given city, during rush hour. A dynamic programming algorithm will look into the entire traffic report, looking into all possible combinations of roads you might take, and will only then tell you which way is the fastest. Of course, you might have to wait for a while until the algorithm finishes, and only then can you start driving. The path you will take will be the fastest one (assuming that nothing changed in the external environment).
On the other hand, a greedy algorithm will start you driving immediately and will pick the road that looks the fastest at every intersection. As you can imagine, this strategy might not lead to the fastest arrival time, since you might take some "easy" streets and then find yourself hopelessly stuck in a traffic jam.
Some other details...
In mathematical optimization, greedy algorithms solve combinatorial problems having the properties of matroids.
Dynamic programming is applicable to problems exhibiting the properties of overlapping subproblems and optimal substructure.
I would like to cite a paragraph which describes the major difference between greedy algorithms and dynamic programming algorithms stated in the book Introduction to Algorithms (3rd edition) by Cormen, Chapter 15.3, page 381:
One major difference between greedy algorithms and dynamic programming is that instead of first finding optimal solutions to subproblems and then making an informed choice, greedy algorithms first make a greedy choice, the choice that looks best at the time, and then solve a resulting subproblem, without bothering to solve all possible related smaller subproblems.
Difference between greedy method and dynamic programming are given below :
Greedy method never reconsiders its choices whereas Dynamic programming may consider the previous state.
Greedy algorithm is less efficient whereas Dynamic programming is more efficient.
Greedy algorithm have a local choice of the sub-problems whereas Dynamic programming would solve the all sub-problems and then select one that would lead to an optimal solution.
Greedy algorithm take decision in one time whereas Dynamic programming take decision at every stage.
In simple words we can say that in Dynamic Programming (having problem sending message on network) one can first examine the path which takes the shortest time and then start journey,
On the other hand Greedy algorithm take the optimal decision on the spot without thinking for the next step and on the next step change its decision again and so on...
Notes: Dynamic programming is reliable while Greedy Algorithms is not reliable always.
With the reference of Biswajit Roy:
Dynamic Programming firstly plans then Go.
and
Greedy algorithm uses greedy choice, it firstly Go then continuously Plans.
the major difference between greedy method and dynamic programming is in greedy method only one optimal decision sequence is ever generated and in dynamic programming more than one optimal decision sequence may be generated.

Memoization or Tabulation approach for Dynamic programming

There are many problems that can be solved using Dynamic programming e.g. Longest increasing subsequence. This problem can be solved by using 2 approaches
Memoization (Top Down) - Using recursion to solve the sub-problem and storing the result in some hash table.
Tabulation (Bottom Up) - Using Iterative approach to solve the problem by solving the smaller sub-problems first and then using it during the execution of bigger problem.
My question is which is better approach in terms of time and space complexity?
Short answer: it depends on the problem!
Memoization usually requires more code and is less straightforward, but has computational advantages in some problems, mainly those which you do not need to compute all the values for the whole matrix to reach the answer.
Tabulation is more straightforward, but may compute unnecessary values. If you do need to compute all the values, this method is usually faster, though, because of the smaller overhead.
First understand what is dynamic programming?
If a problem at hand can be broken down to sub-problems whose solutions are also optimal and can be combined to reach solution of original/bigger problem. For such problems, we can apply dynamic programming.
It's way of solving a problem by storing the results of sub-problems in program memory and reuse it instead of recalculating it at later stage.
Remember the ideal case of dynamic programming usage is, when you can reuse the solutions of sub-problems more than one time, otherwise, there is no point in storing the result.
Now, dynamic programming can be applied in bottom-up approach(Tabulation) and top-down approach(Memoization).
Tabulation: We start with calculating solutions to smallest sub-problem and progress one level up at a time. Basically follow bottom-up approach.
Here note, that we are exhaustively finding solutions for each of the sub-problems, without knowing if they are really neeeded in future.
Memoization: We start with the original problem and keep breaking it one level down till the base case whose solution we know. In most cases, such breaking down(top-down approach) is recursive. Hence, time taken is slower if problem is using each steps sub-solutions due to recursive calls. But, in case when all sub-solutions are not needed then, Memoization performs better than Tabulation.
I found this short video quite helpful: https://youtu.be/p4VRynhZYIE
Asymptotically a dynamic programming implementation that is top down is the same as going bottom up, assuming you're using the same recurrence relation. However, bottom up is generally more efficient because of the overhead of recursion which is used in memoization.
If the problem has overlapping sub-problems property then use Memoization, else it depends on the problem

How is "dynamic" programming different than "normal" programming?

Whenever I look at solutions to computer contests, I always see the term "dynamic programming". I Googled the term and read a few articles, but none of them provide a simple example of programming VS "dynamic" programming. So how is "dynamic" programming different than "normal" programming? (simple terms please!)
Dynamic Programming uses programming more in the sense used with Linear Programming -- a mechanism of solving a problem.
One description I recently read (but can no longer recall the source -- [citation needed]) suggested that the usual approach of divide and conquer used in recursion is a top-down approach to solving problems, while dynamic programming is a bottom-up approach to solving problems.
The Wikipedia article suggests computing the Fibonocci sequence is an excellent use of dynamic programming -- you memoize results as you compute them for further use in the algorithm, to avoid re-computing similar results.
Knuth's algorithm for line-breaking paragraphs is another good example of dynamic programming: if you consider the possibility of inserting line breaks between every word (and even breaking lines inside words, at hyphenation points), it feels like the only algorithms will be exponential -- or worse. However, by keeping track of the "badness" associated with previous line breaks, Knuth's algorithm actually runs in linear time with the size of the input. (I must admit that I don't fully understand Knuth's algorithm -- only that it is supremely clever.)
By "normal" programming, I think you mean C++/Java/C# programming, right?
Dynamic programming is not "programming" in that sense. It is not about writing code, but the word "programming" there is used in the context of solving complex problems by breaking them down into simpler problems.
Dynamic programming is not really 'programming' but table lookup [and storage] - That is sacrifice a bit of space to improve time complexity [quite a bit].
I knew this is an old post but I was having the same questions so I am answering myself here.
Dynamic programming has two properties:
Optimal substructure: a globally optimal solution can be found by combining optimal solutions to local subproblems. For example, fib(x) = fib(x - 1) + fib(x – 2)
Overlapping subproblems: finding an optimal solution
involves solving the same problem multiple times. For example, compute fib(x) many times in Fibonocci sequence
By the above definition/properties, it is not clear whether certain questions like "Does an element belong to a set"? or "How to find the sum of a set" can be classified as dynamic programming? I can divide the set into subset (solve globally) and add it up (get global answer). Also, within the subset, I did the summation many times.
I found a paragraph in a book and I think it provides very helpful tips to distinguish "Dynamic Programming"(DP) and "Divide and conquer algorithm"(D&C).
In D&C subproblems, they are substantially smaller then the original problem. In contrast, DP involves solving problems that are only slightly smaller than the original problem. For example, computing Fib(19) is not a substantially smaller problem than computing Fib(20). While compute sum of ten elements is substantially smaller than sum of ten million elements.
Efficiency of D&C algorithms does not depend upon structuring the algorithm so that identical problems are solved repeatedly. In contrast, DP is efficient only when the number of distinct subproblems is significantly smaller than the total number of subproblems.

Iterate through all trees of a given size

I am often faced with the problem of checking some property of trees (the graph ones) of a given size by brute force. Do you have any nice tricks for doing this? Ideally, I'd like to examine each isomorphism class only once (but after all, speed is all that matters).
Bit twiddling tricks are most welcome since n is usually less than 32 :)
I'm asking for slightly more refined algorithms than the likes of "loop through all (n-1)-edge subsets and check if they form a tree" for trees on n nodes.
This is in Knuth's The Art of Computer Programming volume on Combinatorial Algorithms. If I remember correctly, it's an exercise there. Since he has the solutions for such, I point you there.
Some googling turned up the following algorithm description: http://www.cs.auckland.ac.nz/compsci720s1c/lectures/mjd/treenotes.pdf. They adapt an algorithm for enumerating rooted trees to enumerating unrooted trees.
Apparently others have proved that this requires only amortised constant time per tree, and the PDF shows some performance measurements demonstrating this.

Difference between back tracking and dynamic programming

I heard the only difference between dynamic programming and back tracking is DP allows overlapping of sub problems, e.g.
fib(n) = fib(n-1) + fib (n-2)
Is it right? Are there any other differences?
Also, I would like know some common problems solved using these techniques.
There are two typical implementations of Dynamic Programming approach: bottom-to-top and top-to-bottom.
Top-to-bottom Dynamic Programming is nothing else than ordinary recursion, enhanced with memorizing the solutions for intermediate sub-problems. When a given sub-problem arises second (third, fourth...) time, it is not solved from scratch, but instead the previously memorized solution is used right away. This technique is known under the name memoization (no 'r' before 'i').
This is actually what your example with Fibonacci sequence is supposed to illustrate. Just use the recursive formula for Fibonacci sequence, but build the table of fib(i) values along the way, and you get a Top-to-bottom DP algorithm for this problem (so that, for example, if you need to calculate fib(5) second time, you get it from the table instead of calculating it again).
In Bottom-to-top Dynamic Programming the approach is also based on storing sub-solutions in memory, but they are solved in a different order (from smaller to bigger), and the resultant general structure of the algorithm is not recursive. LCS algorithm is a classic Bottom-to-top DP example.
Bottom-to-top DP algorithms are usually more efficient, but they are generally harder (and sometimes impossible) to build, since it is not always easy to predict which primitive sub-problems you are going to need to solve the whole original problem, and which path you have to take from small sub-problems to get to the final solution in the most efficient way.
Dynamic problems also requires "optimal substructure".
According to Wikipedia:
Dynamic programming is a method of
solving complex problems by breaking
them down into simpler steps. It is
applicable to problems that exhibit
the properties of 1) overlapping
subproblems which are only slightly
smaller and 2) optimal substructure.
Backtracking is a general algorithm
for finding all (or some) solutions to
some computational problem, that
incrementally builds candidates to the
solutions, and abandons each partial
candidate c ("backtracks") as soon as
it determines that c cannot possibly
be completed to a valid solution.
For a detailed discussion of "optimal substructure", please read the CLRS book.
Common problems for backtracking I can think of are:
Eight queen puzzle
Map coloring
Sudoku
DP problems:
This website at MIT has a good collection of DP problems with nice animated explanations.
A chapter from a book from a professor at Berkeley.
One more difference could be that Dynamic programming problems usually rely on the principle of optimality. The principle of optimality states that an optimal sequence of decision or choices each sub sequence must also be optimal.
Backtracking problems are usually NOT optimal on their way! They can only be applied to problems which admit the concept of partial candidate solution.
Say that we have a solution tree, whose leaves are the solutions for the original problem, and whose non-leaf nodes are the suboptimal solutions for part of the problem. We try to traverse the solution tree for the solutions.
Dynamic programming is more like BFS: we find all possible suboptimal solutions represented the non-leaf nodes, and only grow the tree by one layer under those non-leaf nodes.
Backtracking is more like DFS: we grow the tree as deep as possible and prune the tree at one node if the solutions under the node are not what we expect.
Then there is one inference derived from the aforementioned theory: Dynamic programming usually takes more space than backtracking, because BFS usually takes more space than DFS (O(N) vs O(log N)). In fact, dynamic programming requires memorizing all the suboptimal solutions in the previous step for later use, while backtracking does not require that.
DP allows for solving a large, computationally intensive problem by breaking it down into subproblems whose solution requires only knowledge of the immediate prior solution. You will get a very good idea by picking up Needleman-Wunsch and solving a sample because it is so easy to see the application.
Backtracking seems to be more complicated where the solution tree is pruned is it is known that a specific path will not yield an optimal result.
Therefore one could say that Backtracking optimizes for memory since DP assumes that all the computations are performed and then the algorithm goes back stepping through the lowest cost nodes.
IMHO, the difference is very subtle since both (DP and BCKT) are used to explore all possibilities to solve a problem.
As for today, I see two subtelties:
BCKT is a brute force solution to a problem. DP is not a brute force solution. Thus, you might say: DP explores the solution space more optimally than BCKT. In practice, when you want to solve a problem using DP strategy, it is recommended to first build a recursive solution. Well, that recursive solution could be considered also the BCKT solution.
There are hundreds of ways to explore a solution space (wellcome to the world of optimization) "more optimally" than a brute force exploration. DP is DP because in its core it is implementing a mathematical recurrence relation, i.e., current value is a combination of past values (bottom-to-top). So, we might say, that DP is DP because the problem space satisfies exploring its solution space by using a recurrence relation. If you explore the solution space based on another idea, then that won't be a DP solution. As in any problem, the problem itself may facilitate to use one optimization technique or another, based on the problem structure itself. The structure of some problems enable to use DP optimization technique. In this sense, BCKT is more general though not all problems allow BCKT too.
Example: Sudoku enables BCKT to explore its whole solution space. However, it does not allow to use DP to explore more efficiently its solution space, since there is no recurrence relation anywhere that can be derived. However, there are other optimization techniques that fit with the problem and improve brute force BCKT.
Example: Just get the minimum of a classic mathematical function. This problem does not allow BCKT to explore the state space of the problem.
Example: Any problem that can be solved using DP can also be solved using BCKT. In this sense, the recursive solution of the problem could be considered the BCKT solution.
Hope this helps a bit.
In a very simple sentence I can say: Dynamic programming is a strategy to solve optimization problem. optimization problem is about minimum or maximum result (a single result). but in, Backtracking we use brute force approach, not for optimization problem. it is for when you have multiple results and you want all or some of them.
Depth first node generation of state space tree with bounding function is called backtracking. Here the current node is dependent on the node that generated it.
Depth first node generation of state space tree with memory function is called top down dynamic programming. Here the current node is dependant on the node it generates.

Resources