Iterate through all trees of a given size - algorithm

I am often faced with the problem of checking some property of trees (the graph ones) of a given size by brute force. Do you have any nice tricks for doing this? Ideally, I'd like to examine each isomorphism class only once (but after all, speed is all that matters).
Bit twiddling tricks are most welcome since n is usually less than 32 :)
I'm asking for slightly more refined algorithms than the likes of "loop through all (n-1)-edge subsets and check if they form a tree" for trees on n nodes.

This is in Knuth's The Art of Computer Programming volume on Combinatorial Algorithms. If I remember correctly, it's an exercise there. Since he has the solutions for such, I point you there.

Some googling turned up the following algorithm description: http://www.cs.auckland.ac.nz/compsci720s1c/lectures/mjd/treenotes.pdf. They adapt an algorithm for enumerating rooted trees to enumerating unrooted trees.
Apparently others have proved that this requires only amortised constant time per tree, and the PDF shows some performance measurements demonstrating this.

Related

Is there a way to compute if some elements of an array sums up to a given target without bruteforce

I've recently implemented 2sum and 3sum in leetcode and been wondering if it's possible to find if elements can sum up to a given target without bruteforce.
You're asking if the "subset sum problem" has a non-bruteforce solution. It's not really clear what is and what isn't a bruteforce solution, but NP complete programs (which subset sum is) have no known way to solve them in polynomial time in the worst case, but there are very sophisticated approaches to solving them that work efficiently some of the time.
The wikipedia page has good details about solving the subset sum (either approximately or exactly), and links for further reading.
At its most general, depending on your precise definition of "brute force", this is an open problem in computer science; nobody knows. There are some algorithms that are often fast in practice, but whether there's a fundamentally fast algorithm or not, that's an active area of research
Look up "subset sum problem" and "NP-complete"

Original design of DecreaseKey of Fibonacci heaps in Fredman and Tarjan's paper

I am now implementing the Fibonacci heap according to the original Paper of Fredman and Tarjan. If I understand it correctly, according to the paper, to perform the DecreseKey operation of a node x, we simply cut it from its parent. But if the key after decreasing is still larger than its parent, it would be inefficient (I think). Also, I see many designs that cut a node only when its key becomes smaller than its parent, like in CLRS.
So I am a bit confused about the original design of it. Why didn't they apply a more efficient way to do DecreaseKey. Or maybe it makes the amortised analysis easier? Any response is appreciated. Thanks in advance.
I can't speak for Fredman and Tarjan (though I audited one of Tarjan's classes once), but presumably they were focused on the worst-case amortized complexity of DecreaseKey, on which that optimization has no effect.

Huffman coding is based on what Greedy Approach or Dynamic Programming

Can we solve Problem of Huffman Coding by using Dynamic Programming, Is there any algorithm
Huffman Coding works by creating a binary tree of nodes. These can be stored in a regular array, the size of which depends on the number of symbols, n. Huffman Coding can be implemented in O(n logn) time by using the Greedy Algorithm approach. Huffman Coding is not suitable for a Dynamic Programming solution as the problem does not contain overlapping sub problems.
As per my knowledge about algorithms, I beleive huffman coding is based on Greedy approach.
As we do in Activity selection problem. Task scheduling and knapsack problem are another examples of it.
[Update]: I think the solution I mentioned below generates an optimal prefix code, but not necessarily same as Huffman code. Huffman code, by definition, is generated by taking the greedy approach. While it is optimal, it is not the only solution. (For example, once the Huffman tree is generated, you could interchange the leaves in the same level to give them different codes while still being optimal)
I think this can be solved using Dynamic Programming, although it won't be that efficient. The approach here is very similar to finding the Optimal Binary Search Tree, since as you go one level down you add one more bit to the leaves.
Here is the link for the code to calculate the minimum number of total bits. Of course this is exponential, but if you use DP it can be solved in O(n^2) time. I haven't tried generating the encoding, but should be possible I believe.

Difference between back tracking and dynamic programming

I heard the only difference between dynamic programming and back tracking is DP allows overlapping of sub problems, e.g.
fib(n) = fib(n-1) + fib (n-2)
Is it right? Are there any other differences?
Also, I would like know some common problems solved using these techniques.
There are two typical implementations of Dynamic Programming approach: bottom-to-top and top-to-bottom.
Top-to-bottom Dynamic Programming is nothing else than ordinary recursion, enhanced with memorizing the solutions for intermediate sub-problems. When a given sub-problem arises second (third, fourth...) time, it is not solved from scratch, but instead the previously memorized solution is used right away. This technique is known under the name memoization (no 'r' before 'i').
This is actually what your example with Fibonacci sequence is supposed to illustrate. Just use the recursive formula for Fibonacci sequence, but build the table of fib(i) values along the way, and you get a Top-to-bottom DP algorithm for this problem (so that, for example, if you need to calculate fib(5) second time, you get it from the table instead of calculating it again).
In Bottom-to-top Dynamic Programming the approach is also based on storing sub-solutions in memory, but they are solved in a different order (from smaller to bigger), and the resultant general structure of the algorithm is not recursive. LCS algorithm is a classic Bottom-to-top DP example.
Bottom-to-top DP algorithms are usually more efficient, but they are generally harder (and sometimes impossible) to build, since it is not always easy to predict which primitive sub-problems you are going to need to solve the whole original problem, and which path you have to take from small sub-problems to get to the final solution in the most efficient way.
Dynamic problems also requires "optimal substructure".
According to Wikipedia:
Dynamic programming is a method of
solving complex problems by breaking
them down into simpler steps. It is
applicable to problems that exhibit
the properties of 1) overlapping
subproblems which are only slightly
smaller and 2) optimal substructure.
Backtracking is a general algorithm
for finding all (or some) solutions to
some computational problem, that
incrementally builds candidates to the
solutions, and abandons each partial
candidate c ("backtracks") as soon as
it determines that c cannot possibly
be completed to a valid solution.
For a detailed discussion of "optimal substructure", please read the CLRS book.
Common problems for backtracking I can think of are:
Eight queen puzzle
Map coloring
Sudoku
DP problems:
This website at MIT has a good collection of DP problems with nice animated explanations.
A chapter from a book from a professor at Berkeley.
One more difference could be that Dynamic programming problems usually rely on the principle of optimality. The principle of optimality states that an optimal sequence of decision or choices each sub sequence must also be optimal.
Backtracking problems are usually NOT optimal on their way! They can only be applied to problems which admit the concept of partial candidate solution.
Say that we have a solution tree, whose leaves are the solutions for the original problem, and whose non-leaf nodes are the suboptimal solutions for part of the problem. We try to traverse the solution tree for the solutions.
Dynamic programming is more like BFS: we find all possible suboptimal solutions represented the non-leaf nodes, and only grow the tree by one layer under those non-leaf nodes.
Backtracking is more like DFS: we grow the tree as deep as possible and prune the tree at one node if the solutions under the node are not what we expect.
Then there is one inference derived from the aforementioned theory: Dynamic programming usually takes more space than backtracking, because BFS usually takes more space than DFS (O(N) vs O(log N)). In fact, dynamic programming requires memorizing all the suboptimal solutions in the previous step for later use, while backtracking does not require that.
DP allows for solving a large, computationally intensive problem by breaking it down into subproblems whose solution requires only knowledge of the immediate prior solution. You will get a very good idea by picking up Needleman-Wunsch and solving a sample because it is so easy to see the application.
Backtracking seems to be more complicated where the solution tree is pruned is it is known that a specific path will not yield an optimal result.
Therefore one could say that Backtracking optimizes for memory since DP assumes that all the computations are performed and then the algorithm goes back stepping through the lowest cost nodes.
IMHO, the difference is very subtle since both (DP and BCKT) are used to explore all possibilities to solve a problem.
As for today, I see two subtelties:
BCKT is a brute force solution to a problem. DP is not a brute force solution. Thus, you might say: DP explores the solution space more optimally than BCKT. In practice, when you want to solve a problem using DP strategy, it is recommended to first build a recursive solution. Well, that recursive solution could be considered also the BCKT solution.
There are hundreds of ways to explore a solution space (wellcome to the world of optimization) "more optimally" than a brute force exploration. DP is DP because in its core it is implementing a mathematical recurrence relation, i.e., current value is a combination of past values (bottom-to-top). So, we might say, that DP is DP because the problem space satisfies exploring its solution space by using a recurrence relation. If you explore the solution space based on another idea, then that won't be a DP solution. As in any problem, the problem itself may facilitate to use one optimization technique or another, based on the problem structure itself. The structure of some problems enable to use DP optimization technique. In this sense, BCKT is more general though not all problems allow BCKT too.
Example: Sudoku enables BCKT to explore its whole solution space. However, it does not allow to use DP to explore more efficiently its solution space, since there is no recurrence relation anywhere that can be derived. However, there are other optimization techniques that fit with the problem and improve brute force BCKT.
Example: Just get the minimum of a classic mathematical function. This problem does not allow BCKT to explore the state space of the problem.
Example: Any problem that can be solved using DP can also be solved using BCKT. In this sense, the recursive solution of the problem could be considered the BCKT solution.
Hope this helps a bit.
In a very simple sentence I can say: Dynamic programming is a strategy to solve optimization problem. optimization problem is about minimum or maximum result (a single result). but in, Backtracking we use brute force approach, not for optimization problem. it is for when you have multiple results and you want all or some of them.
Depth first node generation of state space tree with bounding function is called backtracking. Here the current node is dependent on the node that generated it.
Depth first node generation of state space tree with memory function is called top down dynamic programming. Here the current node is dependant on the node it generates.

How to test an algorithm for perfect optimization?

Is there any way to test an algorithm for perfect optimization?
There is no easy way to prove that any given algorithm is asymptotically optimal.
Proving optimality (if ever) sometimes follows years and/or decades after the algorithm has been written. A classic example is the Union-Find/disjoint-set data structure.
Disjoint-set forests are a data structure where each set is represented by a tree data structure, in which each node holds a reference to its parent node. They were first described by Bernard A. Galler and Michael J. Fischer in 1964, although their precise analysis took years.
[...] These two techniques complement each other; applied together, the amortized time per operation is only O(α(n)), where α(n) is the inverse of the function f(n) = A(n,n), and A is the extremely quickly-growing Ackermann function.
[...] In fact, this is asymptotically optimal: Fredman and Saks showed in 1989 that Ω(α(n)) words must be accessed by any disjoint-set data structure per operation on average.
For some algorithms optimality can be proven after very careful analysis, but generally speaking, there's no easy way to tell if an algorithm is optimal once it's written. In fact, it's not always easy to prove if the algorithm is even correct.
See also
Wikipedia/Matrix multiplication
The naive algorithm is O(N3), Strassen's is roughly O(N2.807), Coppersmith-Winograd is O(N2.376), and we still don't know what is optimal.
Wikipedia/Asymptotically optimal
it is an open problem whether many of the most well-known algorithms today are asymptotically optimal or not. For example, there is an O(nα(n)) algorithm for finding minimum spanning trees. Whether this algorithm is asymptotically optimal is unknown, and would be likely to be hailed as a significant result if it were resolved either way.
Practical considerations
Note that sometimes asymptotically "worse" algorithms are better in practice due to many factors (e.g. ease of implementation, actually better performance for the given input parameter range, etc).
A typical example is quicksort with a simple pivot selection that may exhibit quadratic worst-case performance, but is still favored in many scenarios over a more complicated variant and/or other asymptotically optimal sorting algorithms.
For those among us mortals that merely want to know if an algorithm:
reasonably works as expected;
is faster than others;
there is an easy step called 'benchmark'.
Pick up the best contenders in the area and compare them with your algorithm.
If your algorithm wins then it better matches your needs (the ones defined by
your benchmarks).

Resources