Analyzing Recursive Algorithms

I've often been slightly stumped by recursive algorithms whose analyses seem to require magical leaps of logic (delivered in a cloud of shrunken notation, as if born of an ink shortage).
I realize that the alternative is to simply memorize the Big O notation for all the common algorithms, but at a certain point that approach fails. For example, I am happy to recite the performance of bubble sort, insertion sort, binary tree insertion/removal, mergesort, and quicksort, but don't ask me to come up with the performance of AVL trees or Dijkstra's shortest-path algorithm off the top of my head.
Where can I go to get:
1. A discussion of recursive algorithm analysis that uses words instead of a profusion of symbols
2. Practice problems to confirm that my newly obtained understanding is actually correct
Example:
Bad:
Σ_{v ∈ T} (1 + c_v)
Possible 'good' equivalent:
The amount of work required for one node in the tree (which is 1 + the number of children of that node), summed over every node in the subtree rooted at the original node.
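To see why that sum is linear, here's a small sketch (the `TreeWork`/`totalWork` names are mine, purely illustrative): every node contributes 1 plus its child count, and since every node except the root is somebody's child, the total for a tree of n nodes is n + (n - 1) = 2n - 1, i.e. O(n).

```java
import java.util.ArrayList;
import java.util.List;

class TreeWork {
    static class Node {
        List<Node> children = new ArrayList<>();
        Node addChild() { Node c = new Node(); children.add(c); return c; }
    }

    // Work for one node is (1 + number of children); the total is that
    // quantity summed over every node in the subtree rooted at `root`.
    static int totalWork(Node root) {
        int work = 1 + root.children.size();
        for (Node c : root.children) work += totalWork(c);
        return work;
    }

    public static void main(String[] args) {
        // A root with two children, one of which has one child: 4 nodes.
        Node root = new Node();
        Node a = root.addChild();
        root.addChild();
        a.addChild();
        // Sum = (1+2) + (1+1) + (1+0) + (1+0) = 7, i.e. 2n - 1 for n = 4.
        System.out.println(totalWork(root)); // prints 7
    }
}
```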
Side commentary:
I could simply watch a video for every single algorithm because there's no way to make one's voice turn into a subscript (or any of the other contortions) but I suspect that would take an inordinate amount of time compared to reading textual descriptions.
Update:
Here's 1 source of solved problems: http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-introduction-to-algorithms-sma-5503-fall-2005/ (this tackles #2 above)

TopCoder has a great set of tutorials with thorough explanations. Have you tried them out?
http://www.topcoder.com/tc?d1=tutorials&d2=alg_index&module=Static

To 'officially' answer this question...
Source of sample problems: http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-introduction-to-algorithms-sma-5503-fall-2005/
An explanation of analysis for CS: http://en.wikipedia.org/wiki/Concrete_Mathematics.


Why is chess, checkers, Go, etc. in EXP but conjectured to be in NP?

If I tell you the moves for a game of chess and declare who wins, why can't it be checked in polynomial time if the winner does really win? This would make it an NP problem from my understanding.
First of all: the number of positions you can set up with 32 pieces on an 8x8 board is limited. We need to consider any pawn being promoted to any other piece and include every such position, too. Of course, among all of these there are some positions that cannot be reached by following the rules of chess, but this does not matter. The important thing is: we have a limit. Let's simply name this limit MaxPositions.
Now for any given position, let's build up a tree as follows:
The given position is the root.
Add any position (legal chess position or not) as child.
For any of these children, add any position as child again.
Continue this way, until your tree reaches a depth of MaxPositions.
I'm now too tired to work out whether we need one additional level of depth for this idea (proof?), but heck, let's just add it. The important thing is: a tree constructed like this is finite.
Next step: Of this tree, remove any sub-tree that is not reachable from the root via legal chess moves. Repeat this step for the remaining children, grand-children, ..., until there is no unreachable position left in the whole tree. The number of steps must be limited, as the tree is limited.
Now do a breadth-first search and turn any node into a leaf if its position has already been seen earlier in the traversal. It must be marked as such (a draw candidate?). Do the same for any mate position.
How do we find out whether there is a forced mate? In any subtree, if it is your turn, there must be at least one child leading to a forced mate. If it is the opponent's move, every child must lead to a mate. This applies recursively, of course. And since the tree is finite, this whole algorithm is finite.
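A sketch of that recursion (all names here are hypothetical, since the answer doesn't fix a position encoding; the depth cap stands in for the MaxPositions bound that makes the tree finite):

```java
import java.util.List;

class MateSearch {
    // Hypothetical position encoding; the answer doesn't fix one.
    interface Position {
        boolean isMateAgainstOpponent(); // the opponent is checkmated here
        boolean isTerminal();            // no legal successors (mate/stalemate)
        List<Position> successors();     // positions reachable by legal moves
    }

    // On our turn, ONE child must lead to a forced mate; on the opponent's
    // turn, EVERY child must. depthLeft stands in for the MaxPositions bound
    // that makes the game tree (and hence this recursion) finite.
    static boolean forcedMate(Position p, boolean ourTurn, int depthLeft) {
        if (p.isMateAgainstOpponent()) return true;
        if (p.isTerminal() || depthLeft == 0) return false;
        if (ourTurn) {
            for (Position q : p.successors())
                if (forcedMate(q, false, depthLeft - 1)) return true;
            return false;
        } else {
            for (Position q : p.successors())
                if (!forcedMate(q, true, depthLeft - 1)) return false;
            return true;
        }
    }
}
```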
[censored], this whole algorithm is finite! There is some constant limiting the whole thing. So although the limit is incredibly high (and far beyond what up-to-date hardware can handle), it is a limit (please do not ask me to calculate it...). So: our problem actually is O(1)!!!
The same for checkers, go, ...
This covers the forced mate, so far. What is the best move? First, check whether we can find a forced mate. If so, fine, we have found the best move. If there are several, select the one requiring the fewest moves (there might still be more than one...).
If there is no such forced mate, then we need some way to measure the 'best' move. Perhaps count the number of remaining lines that lead to mate. Other suggestions for a measure? As long as we operate on this tree from top to bottom, we remain bounded. So again, we are O(1).
Now what did we miss? Have a look at the link in your comment again. It talks about NxN checkers! The author varies the size of the board!
So look back at how we constructed the tree. I think it is obvious that the tree grows exponentially with the size of the board (try to prove it yourself...).
I know very well that this answer is not a proof that the problem is EXPTIME. Actually, I admit, it is not really an answer at all. But I think what I illustrated still gives quite a good impression of the complexity of the problem. And as long as no one provides a better answer, I dare claim that this is better than nothing at all...
Addendum, considering your comment:
Let me refer to Wikipedia. Actually, it should be sufficient to transform the other problem in exponential time, not polynomial time as in the link, since applying the transformation and then solving the resulting problem still remains exponential. But I'm not sure about the exact definition...
It is sufficient to show this for a problem that you already know to be EXPTIME-complete (transforming any other problem to that one and then to the chess problem remains exponential if both transformations are exponential).
Apparently, J. M. Robson found a way to do this for NxN checkers. It must be possible for generalized chess too, probably by simply modifying Robson's algorithm. I do not think it is possible for classical 8x8 chess, though...
O(1) applies to classical chess only, not to generalized chess. But it is the latter that we conjecture not to be in NP! Actually, my answer up to this addendum is missing one proof: that the size of the bounded tree (for fixed N) grows no faster than exponentially with growing N (so the answer actually is incomplete!).
And to prove that generalized chess is not in NP, we would have to prove that there is no polynomial algorithm solving the problem on a non-deterministic Turing machine. This I leave open again, and my answer remains even less complete...
If I tell you the moves for a game of chess and declare who wins, why can't it be checked in polynomial time if the winner does really win? This would make it an NP problem from my understanding.
Because in order to check whether the winner (white) really wins, you would also have to evaluate all possible moves that the loser (black) could have made in order to win instead. That makes the checking exponential as well.

Tips on solving binary tree/binary search tree problems using recursion (Java)

I am reviewing how to solve binary trees/binary search tree problems in Java using recursion and I feel like I'm struggling a little bit. Ex: manipulating/changing the nodes in the tree, counting certain patterns (# of even ints, height of tree), etc. I pretty much always use private helper methods.
I'd like to hear some rules of thumb to follow to make my life easier to solve these problems. Thanks in advance!
edit: to be more specific... I'm talking about ANY kind of problem involving using recursion to solve a binary tree/BST problem, not any one problem in particular. I want to know STRATEGIES FOR SOLVING these problems when creating METHODS to solve them, like certain things to always include or think of when solving them. I can't get more specific than that. Thanks for the downvotes.
First off, always remember that you're working on a BST, not an unsorted binary tree or a non-binary general tree. That means at all times you can rely on the BST invariant: every value in the left subtree < this node < every value in the right subtree (with equality included on one side in some implementations).
Example of where this is relevant: when searching a BST, if the value you're trying to find is less than this node, there's no point looking in the right subtree for it; it's not there.
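A minimal sketch of that pruning, with illustrative names of my own:

```java
class BstSearch {
    static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // The BST invariant means each comparison discards one whole subtree:
    // if the target is smaller than this node, it cannot be on the right.
    static boolean contains(Node node, int target) {
        if (node == null) return false;           // trivial case: empty tree
        if (target == node.value) return true;
        return target < node.value
                ? contains(node.left, target)     // right subtree ruled out
                : contains(node.right, target);   // left subtree ruled out
    }
}
```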
Other than that, treat it like you would any recursion problem. That means, for a given problem:
1) Determine what cases are trivial and don't require recursion. Write code that correctly identifies these cases and returns the trivial result. Some possibilities include a height-0 tree, or no tree (null root).
2) For all other cases, determine which of the following makes more sense (both are usually possible, but one is often cleaner):
What non-recursive work you could do to reduce the problem to a sub-problem, returning that sub-problem's solution (recursion at the end, potentially tail recursion)
What recursive work you would need to do first in order to solve this problem (recursion at the start, not tail recursion)
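As one worked instance of steps 1) and 2), here's a sketch of a hypothetical `countEven` (counting even ints was one of the examples in the question; the class and method names are mine):

```java
class EvenCounter {
    static class Node {
        int value;
        Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value; this.left = left; this.right = right;
        }
    }

    // Step 1: the trivial case is the empty tree (null); no recursion needed.
    // Step 2: do the non-recursive work (check this node), then reduce the
    // problem to the two subtrees and combine their answers.
    static int countEven(Node node) {
        if (node == null) return 0;                   // base case
        int here = (node.value % 2 == 0) ? 1 : 0;     // non-recursive work
        return here + countEven(node.left) + countEven(node.right);
    }
}
```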
Having private helper methods is not necessarily a bad thing; so long as they serve a distinct and useful function you shouldn't feel bad for writing them. There are certainly some recursion problems that are much cleaner and less redundant when split into 3-4 methods rather than cramming them into one.
Take some BST problems and see if you can identify the general structure of the solution before you sit down to code it. Hope this helps! Also you may want to have a java tag on your question if you're specifically asking about BST stuff for Java.

really hard to understand suffix tree

I've been searching for tutorials about suffix trees for quite a while. On SO, I found 2 posts about understanding suffix trees: 1, 2.
But I can't say that I understand how to build one, oops. In Skiena's book The Algorithm Design Manual, he says:
Since linear time suffix tree construction algorithms are nontrivial,
I recommend using an existing implementation.
Well, is the on-line construction algorithm for suffix tree so hard? Anybody can put me in the right direction to understand it?
Anyway, cutting to the chase: besides the construction, there is one more thing I don't understand about suffix trees. Because the edges in a suffix tree are just pairs of integers (right?) specifying the starting and ending positions of the substring, if I want to search for a string x in this suffix tree, how should I do it? De-reference those integers in the suffix tree, then compare the characters one by one with x? Surely it can't work that way.
Firstly, there are many ways to construct a suffix tree. There is the original O(n) method by Weiner (1973), the improved one by McCreight (1976), the most well-known by Ukkonen (1991/1992), and a number of further improvements, largely related to implementation and storage efficiency considerations. Most notable among those is perhaps the Efficient implementation of lazy suffix trees by Giegerich and Kurtz.
Moreover, since the direct construction of suffix arrays has become possible in O(n) time in 2003 (e.g. using the Skew algorithm, but there are others as well), and since there are well-studied methods for
emulating suffix trees using suffix arrays (e.g. Abouelhoda/Kurtz 2004)
compressing suffix arrays (see Navarro/Mäkinen 2007 for a survey)
suffix arrays are usually preferred over suffix trees. Therefore, if your intention is to build a highly optimised implementation for a specific purpose, you might want to look into studying suffix array construction algorithms.
However, if your interest is in suffix tree construction, and in particular the Ukkonen algorithm, I would like to suggest that you take a close look at the description in this SO post, which you mentioned already, and we try to improve that description together. It's certainly far from a perfectly intuitive explanation.
To answer the question about how to compare input string to edge labels: For efficiency reasons during construction and look-up, the initial character of every edge label is usually stored in the node. But the rest must be looked up in the main text string, just like you said, and indeed this can cause issues, in particular when the string is so large that it cannot readily be held in memory. That (plus the fact that, like any direct implementation of a tree, the suffix tree is a data structure that contains a lot of pointers, which consume much memory and make it hard to maintain locality of reference and to benefit from memory caching) is one of the main reasons why suffix trees are so much harder to handle than e.g. inverted indexes.
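To make that look-up concrete, here's a toy sketch (my own names, and deliberately not a real linear-time construction): each edge stores only a (start, end) pair into the one shared text, plus its first character as the key in the parent node, and the search de-references those indices on the fly.

```java
import java.util.HashMap;
import java.util.Map;

class EdgeLabelLookup {
    static class Node {
        // One outgoing edge per distinct first character of an edge label.
        Map<Character, Edge> edges = new HashMap<>();
    }

    static class Edge {
        final int start, end; // label = text.substring(start, end)
        final Node child;
        Edge(int start, int end, Node child) {
            this.start = start; this.end = end; this.child = child;
        }
    }

    // Walk down from the root, de-referencing each edge's (start, end)
    // pair into the shared text to compare characters with the pattern.
    static boolean containsSubstring(String text, Node root, String pattern) {
        Node node = root;
        int i = 0;
        while (i < pattern.length()) {
            Edge e = node.edges.get(pattern.charAt(i));
            if (e == null) return false; // no edge starts with this character
            for (int j = e.start; j < e.end && i < pattern.length(); j++, i++)
                if (text.charAt(j) != pattern.charAt(i)) return false;
            node = e.child;
        }
        return true; // pattern exhausted on a path from the root: it occurs
    }
}
```

For the text "ab" (suffixes "ab" and "b"), the root would hold edges 'a' -> (0, 2) and 'b' -> (1, 2), and a search for "ba" fails after matching the first edge.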
If you combine the suffix array with an LCP table and a child table, which of course you should do, you essentially get a suffix tree. This point is made in the paper Linearized Suffix Trees by Kim, Park and Kim. The LCP table enables a rather awkward bottom-up traversal, and the child table enables an easy traversal of either kind. So the story about suffix trees using pointers and causing locality-of-reference problems is, in my opinion, obsolete information. The suffix tree is therefore "the right and easy way to go," as long as you implement the tree using an underlying suffix array.
The paper by Kim, Park and Kim describes a variant of the approach in Abouelhoda et al.'s misleadingly titled paper: Replacing suffix trees with enhanced suffix arrays. The Kim et al. paper gets it right that this is an implementation of suffix trees, not a replacement. Moreover, the details of Abouelhoda et al.'s construction are described more simply and intuitively in Kim et al.
There's an implementation of Ukkonen's linear-time construction of suffix trees (plus suffix arrays and the LCP array) here: http://code.google.com/p/text-indexing/ . The visualization provided along with suffixtree.js may help.

Bidirectional A* (A-star) Search

I'm implementing a bidirectional A* search (bidirectional as in the search is performed from both the origin and destination simultaneously, and when these two searches meet, I'll have my shortest path - at least with a bit of extra logic thrown in).
Does anyone have any experience with taking a unidirectional A* and bidirectionalising(!) it, and what sort of performance gain can I expect? I'd reckoned on it more or less halving the search time, at a minimum, but might I see bigger gains than that? I'm using the algorithm to determine shortest routes on a road network, if that's in any way relevant (I've read about MS's "Reach" algorithm, but want to take baby steps towards this rather than jumping straight in).
In the best possible case it'd run in O(b^(n/2)) instead of O(b^n), but that's only if you're lucky :)
(where b is your branching factor and n is the number of nodes a unidirectional A* would consider)
It all depends on how easily the two searches meet. If they find each other at a good halfway point early in the search, you've done away with a lot of search time; but if they branch off in wildly different directions, you may end up with something slower than plain A* (because of all the extra bookkeeping).
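To show the meeting logic without the A* machinery, here's a sketch using bidirectional BFS on an unweighted graph as a stand-in (names are mine; a real bidirectional A* adds heuristics, priority queues, and a more careful termination condition):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BidirectionalBfs {
    // Shortest path length between src and dst, searching from both ends.
    static int shortestPath(Map<Integer, List<Integer>> graph, int src, int dst) {
        if (src == dst) return 0;
        Map<Integer, Integer> distF = new HashMap<>(), distB = new HashMap<>();
        Deque<Integer> qF = new ArrayDeque<>(), qB = new ArrayDeque<>();
        distF.put(src, 0); qF.add(src);
        distB.put(dst, 0); qB.add(dst);
        while (!qF.isEmpty() && !qB.isEmpty()) {
            // Expand the smaller frontier first, a common cheap heuristic.
            boolean fwd = qF.size() <= qB.size();
            int meet = expandLevel(graph, fwd ? qF : qB,
                                   fwd ? distF : distB, fwd ? distB : distF);
            if (meet >= 0) return meet; // frontiers met: combined distance
        }
        return -1; // no path
    }

    // Expand one full BFS level; if any generated node is already known to
    // the other search, return the best combined distance through it.
    static int expandLevel(Map<Integer, List<Integer>> g, Deque<Integer> q,
                           Map<Integer, Integer> dist, Map<Integer, Integer> other) {
        int best = Integer.MAX_VALUE;
        for (int n = q.size(); n > 0; n--) {
            int u = q.poll();
            for (int v : g.getOrDefault(u, List.of())) {
                if (other.containsKey(v))
                    best = Math.min(best, dist.get(u) + 1 + other.get(v));
                if (!dist.containsKey(v)) {
                    dist.put(v, dist.get(u) + 1);
                    q.add(v);
                }
            }
        }
        return best == Integer.MAX_VALUE ? -1 : best;
    }
}
```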
You may want to try https://github.com/sroycode/tway .
There is a benchmarking script that compares this with standard A* (for the NY city roads data it seems to give a 30% time benefit).
You may want to consider clustering as a much more efficient booster. Also see this article.

I need an algorithm to find the best path

I need an algorithm to find the best solution of a path finding problem. The problem can be stated as:
At the starting point I can proceed along multiple different paths.
At each step there are another multiple possible choices where to proceed.
There are two checks possible at each step:
A boundary condition that determines whether a path is acceptable or not.
A condition that determines whether the path has reached the final destination and can be selected as the best one.
At each step a number of paths can be eliminated, letting only the "good" paths grow.
I hope this sufficiently describes my problem, and also a possible brute force solution.
My question is: is brute force the best/only solution to the problem? I would also appreciate some hints about the best code structure for the algorithm.
Take a look at A*, and use the length as boundary condition.
http://en.wikipedia.org/wiki/A%2a_search_algorithm
You are looking for some kind of state space search algorithm. Without knowing more about the particular problem, it is difficult to recommend one over another.
If your space is open-ended (infinite tree search), or nearly so (chess, for example), you want an algorithm that prunes unpromising paths, as well as selects promising ones. The alpha-beta algorithm (used by many OLD chess programs) comes immediately to mind.
The A* algorithm can give good results. The key to getting good results out of A* is choosing a good heuristic (weighting function) to evaluate the current node and the various successor nodes, to select the most promising path. Simple path length is probably not good enough.
Elaine Rich's AI textbook (oldie but goodie) spent a fair amount of time on various search algorithms. Full Disclosure: I was one of the guinea pigs for the text, during my undergraduate days at UT Austin.
Did you try breadth-first search (BFS)? That is, if length is a criterion for the best path.
You will also have to modify the algorithm to disregard "unacceptable paths".
If your problem is exactly as you describe it, you have two choices: depth-first search and breadth-first search.
Depth-first search considers a possible path, pursues it all the way to the end (or as far as it remains acceptable), and only then compares it with other paths.
Breadth-first search is probably more appropriate: at each junction you consider all possible next steps and use some score to rank the order in which each possible step is taken. This allows you to prioritise your search and find good solutions faster (but proving you have found the best solution takes just as long as depth-first searching, and it is less easy to parallelise).
However, your problem may also be suitable for Dijkstra's algorithm depending on the details of your problem. If it is, that is a much better approach!
This would also be a good starting point to develop your own algorithm that performs much better than iterative searching (if such an algorithm is actually possible, which it may not be!)
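If Dijkstra's algorithm does fit, a minimal sketch looks like this (assuming non-negative integer edge weights and an adjacency-list representation; all names are illustrative):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class DijkstraSketch {
    static class Edge {
        final int to, weight;
        Edge(int to, int weight) { this.to = to; this.weight = weight; }
    }

    // Classic Dijkstra: repeatedly settle the unvisited node with the
    // smallest tentative distance. Only valid for non-negative weights.
    static int[] shortestDistances(List<List<Edge>> adj, int source) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[source] = 0;
        // Queue entries are {node, distance}; stale entries are skipped below.
        PriorityQueue<int[]> pq =
            new PriorityQueue<>(Comparator.comparingInt((int[] a) -> a[1]));
        pq.add(new int[] { source, 0 });
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            int u = top[0];
            if (top[1] > dist[u]) continue; // outdated queue entry
            for (Edge e : adj.get(u)) {
                int nd = dist[u] + e.weight;
                if (nd < dist[e.to]) {
                    dist[e.to] = nd;
                    pq.add(new int[] { e.to, nd });
                }
            }
        }
        return dist;
    }
}
```

Note that Dijkstra assumes the edge weights are known up front; if the "boundary condition" can only be evaluated per path, a best-first search with pruning is closer to what's needed.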
A* plus flood fill and dynamic programming. It is hard to implement, too hard to describe in a simple post, and too valuable to just give away, so sorry, I can't provide more. But searching on flood fill and dynamic programming will put you on the path if you want to go that route.
