Viterbi algorithm in linear time

I have a problem where given a Hidden Markov model and the states S I need to find an algorithm that returns the most probable path through the Hidden Markov model for a given sequence X in time O(|S|).
I was thinking of building a graph with one vertex per state per position in X and running a shortest path algorithm on it. However, that graph has n|S|^2 edges and n|S| vertices (where n is the length of X).
The best algorithm I have found is the acyclic shortest path, which runs in time O(|E|+|V|), which is O(n|S|^2) in my case, i.e. O(|S|^2) per position of X.
Is there an algorithm I can develop for it to run in time O(|S|)? All I need is the general idea.
Thanks

I think that if you want to retrieve the exact most likely sequence, you cannot do it in linear time on all instances. However, if your symbol space is discrete, the average-case time complexity may be reduced. Take a look at Ukkonen's optimization for computing edit distances and its generalizations. Also take a look at compression-based techniques, which are also based on Ukkonen's work.
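For reference, here is a minimal sketch of the standard Viterbi recurrence that the question is trying to beat. It is not a linear-time method: the inner maximisation over predecessor states costs O(|S|) per state, so each position of X costs O(|S|^2). The function name, the dictionary layout of the model and the two-state example are assumptions made for illustration only.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for the observation sequence obs.

    log_start[s], log_trans[r][s] and log_emit[s][x] are log-probabilities.
    """
    # best[s] = log-probability of the best path that ends in state s
    best = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # one dict of back-pointers per position after the first
    for x in obs[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            # O(|S|) maximisation, repeated for each of the |S| states
            r = max(states, key=lambda q: prev[q] + log_trans[q][s])
            best[s] = prev[r] + log_trans[r][s] + log_emit[s][x]
            pointers[s] = r
        back.append(pointers)
    # Follow the back-pointers from the best final state
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path

def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}

# Hypothetical two-state model, used only to exercise the function
states = ["Healthy", "Fever"]
log_start = logs({"Healthy": 0.6, "Fever": 0.4})
log_trans = {"Healthy": logs({"Healthy": 0.7, "Fever": 0.3}),
             "Fever": logs({"Healthy": 0.4, "Fever": 0.6})}
log_emit = {"Healthy": logs({"normal": 0.5, "cold": 0.4, "dizzy": 0.1}),
            "Fever": logs({"normal": 0.1, "cold": 0.3, "dizzy": 0.6})}
log_prob, path = viterbi(["normal", "cold", "dizzy"], states,
                         log_start, log_trans, log_emit)
```

Working in log space avoids underflow on long sequences; the sketch assumes every probability in the model is strictly positive.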

Related

Shortest route using machine learning/AI

Assume that I have a set of points scattered on the XY plane, and two more points, say a start and an end point, anywhere in the XY plane. I want to find the shortest path between the start and end points without touching the scattered points. The path has to maintain a certain offset (i.e. assume the path has some width).
How should I approach this kind of problem in programming? Are there any algorithms in machine learning for it?
So you need a greedy algorithm for the shortest path?
Try Dijkstra's Algorithm.
http://www.geeksforgeeks.org/greedy-algorithms-set-6-dijkstras-shortest-path-algorithm/
The shortest solution for the lowest price.
You can also consider the A* algorithm.
This finds the same solution as Dijkstra's algorithm, but often at a lower computational cost (which might be important in your case, since after the space discretization you might end up with a large grid).
This is because A* uses a heuristic to bias the search, so that it looks into more promising directions first (e.g. moving towards the target is in principle a good idea, so this is attempted first).
You can see some visualizations of A* running here and (side by side with Dijkstra's algorithm - thanks @Thrawn for the link), here.
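As a rough illustration of the A* idea described above, here is a minimal sketch on a discretized grid. The 4-connected unit-cost moves, the Manhattan heuristic and the `blocked` set of forbidden cells are all assumptions of this sketch, not something the question specifies.

```python
import heapq

def astar(start, goal, width, height, blocked):
    """Shortest 4-connected path on a width x height grid, or None.

    start and goal are (x, y) cells; blocked is a set of cells to avoid.
    """
    def h(cell):
        # Manhattan distance never overestimates for unit-cost 4-way moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    g = {start: 0}
    parent = {start: None}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if cost > g[cell]:
            continue  # stale heap entry, a cheaper one was already expanded
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            if nxt in blocked:
                continue
            new_cost = cost + 1
            if new_cost < g.get(nxt, float("inf")):
                g[nxt] = new_cost
                parent[nxt] = cell
                heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None
```

With the heuristic replaced by a function that always returns 0, the same loop behaves like Dijkstra's algorithm, which is one way to compare the two on the same grid.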
This is not a machine learning problem but an optimization problem.
So you need a greedy algorithm for the shortest path
Indeed it could be solved this way but the challenge is to represent your grid as a graph...
For example, decompose the grid into an n x n matrix. In your shortest path algorithm, a node is an element of the matrix (so you exclude the elements of the matrix that contain the scattered points) and the weights of the arcs are the distances.
However, n should be kept moderate: the graph has on the order of n^2 nodes, and the search cost grows with it (shortest path algorithms themselves run in polynomial time; they are not NP-hard).
Maybe other algorithms exist for this specific problem, but I'm not aware of any.
Like others already stated: this is not a typical "Artificial Intelligence" problem. It is kind of a path planning problem.
There are different algorithms available. If your path doesn't need to satisfy any constraints, e.g. smoothness, you can use an A* algorithm with distance as the heuristic.
You have to represent your XY space as a graph where each node has a coordinate. Further, you need to make sure that no nodes lie near the points you want to avoid.
If your path needs to satisfy constraints, this turns into a more complicated path planning problem where you could apply optimization or RRTs.
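One hedged way to make sure "no nodes lie near the points you want to avoid" is to mark every grid cell whose centre is within some clearance of a scattered point, and exclude those cells from the graph. In this sketch the function name, the square cells of side `cell_size`, and the choice of `clearance` (for instance the required offset plus half the path width) are assumptions; testing only cell centres is an approximation that gets better as the cells get smaller.

```python
import math

def blocked_cells(points, clearance, cell_size, width, height):
    """Return the set of (x, y) cells whose centre lies within
    `clearance` of any point in `points` (given in plane coordinates)."""
    blocked = set()
    # Number of cells that the clearance radius can span in each direction
    reach = int(math.ceil(clearance / cell_size))
    for px, py in points:
        cx, cy = int(px // cell_size), int(py // cell_size)
        for x in range(max(0, cx - reach), min(width, cx + reach + 1)):
            for y in range(max(0, cy - reach), min(height, cy + reach + 1)):
                centre_x = (x + 0.5) * cell_size
                centre_y = (y + 0.5) * cell_size
                if math.hypot(centre_x - px, centre_y - py) <= clearance:
                    blocked.add((x, y))
    return blocked
```

The resulting set can be handed to any grid search as the collection of cells that may not be entered.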

Approximate Edit distance tree - Exact Edit path

I've been looking for an algorithm to efficiently compute an edit path between two trees, a path that does not have to correspond to shortest edit distance but preferably a relatively short one.
The case is that I have two directory trees, consisting of directories and files, and want to compute a sequence of deletes, inserts and renames that will transform one to the other.
I have tried searching both Stack Overflow and the wider web, but all I find are algorithms for computing the shortest edit distance, and they all scale poorly.
So my question is: is there a more efficient way than, for example, "Zhang and Shasha" when I don't need the optimum distance?
Kind regards
There is the Klein algorithm, which performs slightly better than "Zhang and Shasha"; however, its complexity in both space and time remains very high for practical purposes.
There is an algorithm here that is in fact a heuristic, since the authors misused the term approximation.
It reduces the problem to a series of maximum weighted clique problems, for which several approximations and heuristics exist; even a greedy approach could perform reasonably well here.
What is true for graphs is true for trees; you could therefore use a graph kernel convolution approach.
If you are looking for an off-the-shelf implementation (of an unspecified algorithm, I would guess Zhang or Klein), you can check here.

Approximation algorithm for TSP variant, fixed start and end anywhere but starting point + multiple visits at each vertex ALLOWED

NOTE: Due to the fact that the trip does not end at the same place it started and also the fact that every point can be visited more than once as long as I still visit all of them, this is not really a TSP variant, but I put it due to lack of a better definition of the problem.
So..
Suppose I am going on a hiking trip with n points of interest. These points are all connected by hiking trails. I have a map showing all trails with their distances, giving me a directed graph.
My problem is how to approximate a tour that starts at a point A and visits all n points of interest, while ending the tour anywhere but the point where I started and I want the tour to be as short as possible.
Due to the nature of hiking, I figured this would sadly not be a symmetric problem (or can I convert my asymmetric graph to a symmetric one?), since going from high to low altitude is obviously easier than the other way around.
Also I believe it has to be an algorithm that works for non-metric graphs, where the triangle inequality is not satisfied, since going from a to b to c might be faster than taking a really long and weird road that goes from a to c directly. I did consider whether the triangle inequality still holds, since there are no restrictions regarding how many times I visit each point, as long as I visit all of them, meaning I would always choose the shortest of two distinct paths from a to c and thus never take the long and weird road.
I believe my problem is easier than TSP, so those algorithms do not fit this problem. I thought about using a minimum spanning tree, but I have a hard time convincing myself that they can be applied to a non-metric asymmetric directed graph.
What I really want are some pointers as to how I can come up with an approximation algorithm that will find a near optimal tour through all n points
To reduce your problem to asymmetric TSP, introduce a new node u and make arcs of length L from u to A and from all nodes but A to u, where L is very large (large enough that no optimal solution revisits u). Delete u from the tour to obtain a path from A to some other node via all others. Unfortunately this reduction preserves the objective only additively, which makes the approximation guarantees worse by a constant factor.
The target of the reduction Evgeny pointed out is non-metric symmetric TSP, so that reduction is not useful to you, because the approximations known all require metric instances. Assuming that the collection of trails forms a planar graph (or is close to it), there is a constant-factor approximation due to Gharan and Saberi, which may unfortunately be rather difficult to implement, and may not give reasonable results in practice. Frieze, Galbiati, and Maffioli give a simple log-factor approximation for general graphs.
If there are a reasonable number of trails, branch and bound might be able to give you an optimal solution. Both G&S and branch and bound require solving the Held-Karp linear program for ATSP, which may be useful in itself for evaluating other approaches. For many symmetric TSP instances that arise in practice, it gives a lower bound on the cost of an optimal solution within 10% of the true value.
You can simplify this problem to a normal TSP problem with n+1 vertices. To do this, take node 'A' and all the points of interest and compute a shortest path between each pair of these points. You can use the all-pairs shortest path algorithm on the original graph. Or, if n is significantly smaller than the original graph size, run a single-source shortest path algorithm from each of these n+1 vertices. You can also set the length of all the paths ending at 'A' to some constant larger than any other path, which allows the trip to end anywhere (this may be needed only for TSP algorithms that find a round-trip path).
As a result, you get a complete graph, which is metric, but still asymmetric. All you need now is to solve a normal TSP problem on this graph. If you want to convert this asymmetric graph to a symmetric one, Wikipedia explains how to do it.
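The shortest-path preprocessing described above could look roughly like the following sketch, which runs Dijkstra's algorithm from each point of interest. The adjacency-list format, the function names and the assumption of non-negative trail lengths are choices made for this example only.

```python
import heapq

def dijkstra(adj, source):
    """Shortest distances from source in a directed graph.

    adj maps node -> list of (neighbour, non-negative length).
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def metric_closure(adj, terminals):
    """closure[a][b] = shortest path length from a to b, for every
    ordered pair of distinct terminals (inf if b is unreachable)."""
    closure = {}
    for a in terminals:
        d = dijkstra(adj, a)
        closure[a] = {b: d.get(b, float("inf")) for b in terminals if b != a}
    return closure
```

Because each entry is a shortest path length, the resulting complete graph satisfies the triangle inequality, but `closure[a][b]` and `closure[b][a]` can still differ, so it stays asymmetric.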

exact graph algorithms

In data structures and algorithms, what is meant by "exact graph algorithms"? Can you give me some examples?
I suppose it refers to whether the algorithm yields a result that is optimal for the given problem, or "just" an approximate result.
For example, if you are looking in a graph for the shortest path from one node to another, there are a bunch of algorithms (Dijkstra, Floyd-Warshall,... you name them) that solve the problem exactly, i.e. they yield a shortest path between the two given nodes.
On the other hand, consider the Travelling Salesman problem. It asks for a shortest circular path containing some given nodes. This problem is NP-hard (its decision version is NP-complete), and thus (supposedly) not solvable exactly in a reasonable amount of time, at least in the general case. However, there exist approximation algorithms that run in a reasonable amount of time and yield a solution of length at most 2*length(best route), at least for metric TSP. The solution from such an algorithm is not an exact one, just an approximation.
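One well-known algorithm with this factor-2 guarantee builds a minimum spanning tree and visits the vertices in preorder. The sketch below assumes a symmetric distance matrix that satisfies the triangle inequality; the function names are chosen for this example. The guarantee follows because the tree weighs no more than the best tour, and skipping already visited vertices on the doubled tree walk cannot lengthen it under the triangle inequality.

```python
def tsp_two_approx(dist):
    """Tour (list of vertex indices starting at 0) for a symmetric metric
    distance matrix, via a preorder walk of a minimum spanning tree."""
    n = len(dist)
    # Prim's algorithm in O(n^2), recording the tree as child lists
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [-1] * n
    best[0] = 0.0
    children = [[] for _ in range(n)]
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    # Iterative preorder walk of the tree rooted at vertex 0
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour

def tour_length(dist, tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))
```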

Algorithm: shortest path between all points

Suppose I have 10 points. I know the distance between each point.
I need to find the shortest possible route passing through all points.
I have tried a couple of algorithms (Dijkstra, Floyd Warshall,...) and they all give me the shortest path between start and end, but they don't make a route with all points on it.
Permutations work fine, but they are too resource-expensive.
What algorithms can you advise me to look into for this problem? Or is there a documented way to do this with the above-mentioned algorithms?
Have a look at travelling salesman problem.
You may want to look into some of the heuristic solutions. They may not be able to give you 100% exact results, but often they can come up with good enough solutions (2 to 3 % away from optimal solutions) in a reasonable amount of time.
This is obviously the Travelling Salesman problem. Specifically for N=10, you can either try the naive O(N!) algorithm or, using dynamic programming, reduce this to O(N^2 2^N) by trading space for time.
Beyond that, since this is an NP-hard problem, you can only hope for an approximation or heuristic, given the usual caveats.
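The dynamic programming approach mentioned above is usually attributed to Held and Karp. Here is a minimal sketch of it that returns only the length of the best round trip starting and ending at vertex 0; the function name and the matrix input format are assumptions of this example, and recovering the route itself would need extra back-pointers.

```python
from itertools import combinations

def held_karp(dist):
    """Length of the shortest round trip through all vertices, starting
    and ending at vertex 0. dist is a square matrix, possibly asymmetric."""
    n = len(dist)
    if n == 1:
        return 0
    # cost[(mask, j)] = length of the shortest path that starts at 0,
    # visits exactly the vertices in mask (bits 1..n-1) and ends at j
    cost = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev_mask = mask & ~(1 << j)
                cost[(mask, j)] = min(
                    cost[(prev_mask, k)] + dist[k][j]
                    for k in subset if k != j
                )
    full = (1 << n) - 2  # every vertex except 0
    return min(cost[(full, j)] + dist[j][0] for j in range(1, n))
```

For 10 points the table holds a few thousand entries, so this runs almost instantly, whereas the memory needed grows exponentially with the number of points.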
As others have mentioned, this is an instance of the TSP. I think Concorde, developed at Georgia Tech, is the current state-of-the-art solver. It can handle upwards of 10,000 points within a few seconds. It also has an API that's easy to work with.
I think this is what you're looking for, actually:
Floyd Warshall
In computer science, the Floyd–Warshall algorithm (sometimes known as
the WFI Algorithm, Roy–Floyd algorithm or just Floyd's algorithm) is a
graph analysis algorithm for finding shortest paths in a weighted graph
(with positive or negative edge weights). A single execution of the
algorithm will find the lengths (summed weights) of the shortest paths
between all pairs of vertices though it does not return details of the
paths themselves.
In the "Path reconstruction" subsection it explains the data structure you'll need to store the "paths" (actually you just store the next node to go to and then trivially reconstruct whichever path is required as needed).
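A minimal sketch of that "next node" idea follows. The matrix input format and the function names are assumptions of this example, and it assumes the graph has no negative cycles. Note that it reconstructs shortest paths between pairs of nodes; it does not by itself produce a route through all points.

```python
def floyd_warshall(weights):
    """weights[i][j] is the edge length i -> j, or float('inf') if absent.

    Returns (dist, nxt), where nxt[i][j] is the node that follows i on a
    shortest i -> j path, or None if j cannot be reached from i.
    """
    n = len(weights)
    inf = float("inf")
    dist = [row[:] for row in weights]
    nxt = [[j if weights[i][j] != inf else None for j in range(n)]
           for i in range(n)]
    for i in range(n):
        dist[i][i] = 0
        nxt[i][i] = i
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]  # first step is the step toward k
    return dist, nxt

def reconstruct(nxt, i, j):
    """List of nodes on a shortest path from i to j, or [] if none exists."""
    if nxt[i][j] is None:
        return []
    path = [i]
    while i != j:
        i = nxt[i][j]
        path.append(i)
    return path
```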
