Optimal search algorithm without admissible heuristic

Please forgive me if I'm not using the correct terms or have overlooked an existing solution. I'm not experienced in search algorithms and the theory behind them. I just would like to solve a problem.
I've previously used what I was told to be the A* algorithm to solve a different problem. But reading up on it I've realized that what I learned is not quite what Wikipedia tells me.
What I learned was:
Start at your origin node
Open a new solution for each path you can take
Recursively create a new subsolution for each path you can take from there
When you arrive at the same place with multiple solutions, drop those that took longer than the fastest
Now if I understand Wikipedia correctly, this is what I was supposed to do:
Start at your origin node
Open a new solution for each path you can take
Order the solutions by "cost of path taken" + "estimated cost to target"
Take cheapest solution and create subsolutions for each possible path
Order those solutions in with the others, then rinse and repeat
I can see how this would help with not calculating quite as many solutions, but my problem is that I see no way to create an "optimistic" estimate.
I'm not searching for a path on a geographical map. I'm trying to find the best sequence of actions. There's a minimum sequence of - say - ABCDEFGH. You cannot do F before E, but repeating previous actions in particular ordering might make later actions more efficient.
Do I need a different search algorithm? Do I do what I originally learned and just live with the fact that doing more work is the price for not having a good heuristic function?
I believe my teacher recognized this problem, and what I learned was simply A* with a heuristic function of h(n) = 0 (at which point A* reduces to Dijkstra's algorithm, also known as uniform-cost search).
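For what it's worth, here is a minimal sketch of that special case: A* with h(n) = 0 is exactly uniform-cost search run over the implicit graph of states. The names start, is_goal and successors are placeholders for your problem's specifics (states must be hashable):

import heapq
from itertools import count

def uniform_cost_search(start, is_goal, successors):
    # A* with h(n) = 0: expand states in order of the cheapest path found so far.
    tie = count()                         # tie-breaker so the heap never compares states
    frontier = [(0, next(tie), start, [])]
    best = {start: 0}                     # cheapest known cost per state
    while frontier:
        cost, _, state, path = heapq.heappop(frontier)
        if cost > best.get(state, float("inf")):
            continue                      # stale queue entry; a cheaper path was found
        if is_goal(state):
            return path, cost
        for action, step_cost, nxt in successors(state):
            new_cost = cost + step_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, next(tie), nxt, path + [action]))
    return None, float("inf")

With h = 0 this expands more states than a well-informed A* would, but the result is still optimal.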

"I'm not searching for a path on a geographical map. I'm trying to find the best sequence of actions. There's a minimum sequence of - say - ABCDEFGH. You cannot do F before E but repeating previous actions in particular ordering might make later actions more efficient."
It is not clear to me whether you can repeat one action, i.e., a solution is ABCDEFGH, but would ABBBBCDEFGH be possible?
If not, then you might be able to use the A* algorithm, implemented like this:
1. At some stage (say the first, "empty"), you have one of several actions available.
2. The cost of going from Empty City to A City is the cost of action A.
3. The cost of going from Empty City to B City is the cost of action B.
When you've reached B, the cost of doing C is constant (if it is not, then you can't use A* as is) and you insert the cost of going from B City to C City as the cost of C.
So you can handle the case in which an action has different costs, provided that this difference is completely described by the previous state. For example, if you can only do C if you have done A or B, and the cost of C is 5 after A and 8 after B, you enter the "distance" between A and C as 5, and B to C as 8.
If the cost of, say, D depends on the two previous states, you can still use a more complicated A* implementation where you define the virtual "cities" BC, AB and AC, and the distance from BC to D is "the cost of D having done B and C", and so on. The cost of reaching BC from A is "the cost of B given A, and the cost of C given A and B". So if these costs depend on the previous states, things get even more complicated.
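To make the "virtual cities" idea concrete, here is a minimal sketch that folds the previous action into the search state and runs Dijkstra over the augmented (progress, last action) pairs. legal_next and cost_of are hypothetical hooks standing in for your domain rules; richer cost dependencies would mean fatter state tuples, which is exactly the blow-up described next:

import heapq
from itertools import count

def plan(goal_len, legal_next, cost_of):
    # Dijkstra over augmented states (number of actions done, last action),
    # so the cost of an action may depend on the action done just before it.
    # legal_next(done, prev) and cost_of(prev, action) are hypothetical hooks.
    tie = count()
    start = (0, None)                     # nothing done yet, no previous action
    frontier = [(0, next(tie), start, [])]
    best = {start: 0}
    while frontier:
        cost, _, (done, prev), path = heapq.heappop(frontier)
        if done == goal_len:
            return path, cost             # cheapest complete sequence
        for action in legal_next(done, prev):
            nxt = (done + 1, action)
            new_cost = cost + cost_of(prev, action)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, next(tie), nxt, path + [action]))
    return None, float("inf")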
In the end, the complexity of this revised A* will grow until it becomes your algorithm, where every state depends potentially on the sequence of all preceding states. The more this is true, the more your algorithm is convenient; the more every state is a cost unto itself, the more A* is convenient.
And of course the possibility of closed loops (visiting the same state/action twice, making this a cyclic graph) blows A* straight out of the water.

Related

Algorithms for Deducing a Timeline / Chronology

I'm looking for leads on algorithms to deduce the timeline/chronology of a series of novels. I've split the texts into days and created a database of relationships between them, e.g.: X is a month before Y, Y and Z are consecutive, date of Z is known, X is on a Tuesday, etc. There is uncertainty ('month' really only means roughly 30 days) and also contradictions. I can mark some relationships as more reliable than others to help resolve ambiguity and contradictions.
What kind of algorithms exist to deduce a best-fit chronology from this kind of data, assigning a highest-probability date to each day? At least time is 1-dimensional but dealing with a complex relationship graph with inconsistencies seems non-trivial. I have a CS background so I can code something up but some idea about the names of applicable algorithms would be helpful. I guess what I have is a graph with days as nodes and relationships as edges.
A simple, crude first approximation to your problem would be to store information like "A happened before B" in a directed graph with edges like "A -> B". Test the graph to see whether it is a Directed Acyclic Graph (DAG). If it is, the information is consistent in the sense that there is a consistent chronology of what happened before what else. You can get a sample linear chronology by printing a "topological sort" (topsort) of the DAG. If events C and D happened simultaneously or there is no information to say which came before the other, they might appear in the topsort as ABCD or ABDC. You can even get the topsort algorithm to print all possibilities (so both ABCD and ABDC) for further analysis using more detailed information.
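As a minimal sketch of that first step, Python's standard library (3.9+) can do both the consistency check and a topological sort; the "before" facts below are hypothetical:

from graphlib import TopologicalSorter, CycleError

# "A -> B" meaning "A happened before B"; TopologicalSorter wants
# node -> set of predecessors, i.e. everything known to come before it.
before = {"B": {"A"}, "C": {"B"}, "D": {"B"}}

try:
    order = list(TopologicalSorter(before).static_order())
    print("one consistent chronology:", order)    # e.g. ['A', 'B', 'C', 'D']
except CycleError as e:
    print("contradiction found, cycle:", e.args[1])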
If the graph you obtain is not a DAG, you can use an algorithm like Tarjan's algorithm to quickly identify "strongly connected components", which are areas of the graph which contain chronological contradictions in the form of cycles. You could then analyze them more closely to determine which less reliable edges might be removed to resolve contradictions. Another way to identify edges to remove to eliminate cycles is to search for "minimum feedback arc sets". That's NP-hard in general but if your strongly connected components are small the search could be feasible.
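Here's a sketch of that contradiction-finding step using Tarjan's algorithm; any strongly connected component with more than one node contains a cycle of contradictory "before" edges (the edge data is again hypothetical):

def tarjan_scc(graph):
    # Strongly connected components of {node: [successors]}.
    # Recursive version; fine for modestly sized graphs.
    index, low, on_stack, stack = {}, {}, set(), []
    sccs, counter = [], [0]

    def visit(v):
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop(); on_stack.discard(w); comp.append(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs

# Components with more than one node contain chronological contradictions.
edges = {"A": ["B"], "B": ["C"], "C": ["A"], "D": []}
print([c for c in tarjan_scc(edges) if len(c) > 1])   # [['C', 'B', 'A']]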
Constraint programming is what you need. In propagation-based CP, you alternate between (a) making a decision at the current choice point in the search tree and (b) propagating the consequences of that decision as far as you can. Notionally you do this by maintaining a domain D of possible values for each problem variable x such that D(x) is the set of values for x which have not yet been ruled out along the current search path. In your problem, you might be able to reduce it to a large set of Boolean variables, x_ij, where x_ij is true iff event i precedes event j. Initially D(x) = {true, false} for all variables. A decision is simply reducing the domain of an undecided variable (for a Boolean variable this means reducing its domain to a single value, true or false, which is the same as an assignment). If at any point along a search path D(x) becomes empty for any x, you have reached a dead-end and have to backtrack.
If you're smart, you will try to learn from each failure and also retreat as far back up the search tree as required to avoid redundant search (this is called backjumping -- for example, if you identify that the dead-end you reached at level 7 was caused by the choice you made at level 3, there's no point in backtracking just to level 6 because no solution exists in this subtree given the choice you made at level 3!).
Now, given you have different degrees of confidence in your data, you actually have an optimisation problem. That is, you're not just looking for a solution that satisfies all the constraints that must be true, but one which also best satisfies the other "soft" constraints according to the degree of trust you have in them. What you need to do here is decide on an objective function assigning a score to a given set of satisfied/violated partial constraints. You then want to prune your search whenever you find the current search path cannot improve on the best previously found solution.
If you do decide to go for the Boolean approach, you could profitably look into SAT solvers, which tear through these kinds of problems. But the first place I'd look is at MiniZinc, a CP language which maps on to a whole variety of state of the art constraint solvers.
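To make the propagation idea concrete, here is a toy sketch of the decide/propagate/backtrack loop on the Boolean x_ij encoding. It propagates only antisymmetry and transitivity and ignores the soft constraints entirely, so it's a stand-in to show the mechanics, not a substitute for a real solver like MiniZinc:

import copy

def propagate(dom, n):
    # Prune domains of x[i, j] ("event i before event j") using antisymmetry
    # and transitivity. Returns False on a contradiction (dead end).
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(n):
                if i == j or dom[i, j] != {True}:
                    continue
                if dom[j, i] == {True}:
                    return False              # i<j and j<i: contradiction
                if dom[j, i] != {False}:
                    dom[j, i] = {False}       # antisymmetry
                    changed = True
                for k in range(n):            # transitivity: i<j, j<k => i<k
                    if k != i and k != j and dom[j, k] == {True}:
                        if dom[i, k] == {False}:
                            return False
                        if dom[i, k] != {True}:
                            dom[i, k] = {True}
                            changed = True
    return True

def search(dom, n):
    # Decide one undecided variable, propagate, recurse; backtrack on failure.
    if not propagate(dom, n):
        return None
    undecided = [(i, j) for i in range(n) for j in range(n)
                 if i != j and len(dom[i, j]) == 2]
    if not undecided:
        return dom
    i, j = undecided[0]
    for choice in (True, False):
        trial = copy.deepcopy(dom)
        trial[i, j] = {choice}
        result = search(trial, n)
        if result is not None:
            return result
    return None

n = 3
dom = {(i, j): {True, False} for i in range(n) for j in range(n) if i != j}
dom[0, 1] = {True}                            # hard fact: event 0 before event 1
dom[1, 2] = {True}                            # hard fact: event 1 before event 2
solution = search(dom, n)
print(solution[0, 2])                         # {True}, forced by transitivity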
Best of luck!

Calculating taxi movements

Let's say I have N taxis, and N customers waiting to be picked up by the taxis. The initial positions of both customers and taxis are random/arbitrary.
Now I want to assign each taxi to exactly one customer.
The customers are all stationary, and the taxis all move at identical speed. For simplicity, let's assume there are no obstacles, and the taxis can move in straight lines to assigned customers.
I now want to minimize the time until the last customer enters his/her taxi.
Is there a standard algorithm to solve this? I have tens of thousands of taxis/customers. Solution doesn't have to be optimal, just ‘good’.
The problem can almost be modelled as the standard “Assignment Problem”, solvable using the Hungarian algorithm (the Kuhn–Munkres algorithm or Munkres assignment algorithm). However, I want to minimize the cost of the costliest assignment, not minimize the sum of costs of the assignments.
Since you mentioned the Hungarian Algorithm, I guess one thing you could do is use a different measure of distance rather than the Euclidean distance and then run the Hungarian Algorithm on it. For example, instead of using
d = sqrt((x0 - x1) ^ 2 + (y1 - y0) ^ 2)
use
d = ((x0 - x1) ^ 2 + (y1 - y0) ^ 2) ^ 10
that could cause the algorithm to penalize big distances heavily, which could keep the maximum assigned distance down.
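Here is a sketch of that trick using SciPy's assignment solver (assuming SciPy is available). Raising distances to a high power overflows quickly, so the distances are normalised first; the exponent p is a tunable knob:

import numpy as np
from scipy.optimize import linear_sum_assignment

def assign(taxis, customers, p=10):
    # Minimise the sum of distance**p; for large p this approximates
    # minimising the single worst distance (a crude bottleneck matching).
    taxis = np.asarray(taxis, dtype=float)
    customers = np.asarray(customers, dtype=float)
    d = np.linalg.norm(taxis[:, None, :] - customers[None, :, :], axis=2)
    d = d / d.max()                       # normalise so d**p does not overflow
    rows, cols = linear_sum_assignment(d ** p)
    return list(zip(rows.tolist(), cols.tolist())), d[rows, cols].max()

pairs, worst = assign([(0, 0), (5, 5)], [(1, 1), (6, 5)])
print(pairs)                              # [(0, 0), (1, 1)]: taxi i -> customer j

This only approximates the bottleneck objective; the paper mentioned in the EDIT below targets it directly.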
EDIT: This paper "Geometry Helps in Bottleneck Matching and Related Problems" may contain a better algorithm. However, I am still in the process of reading it.
I'm not sure that the Hungarian algorithm will work for your problem here. According to the link, it runs in n ^ 3 time. Plugging in 25,000 as n would yield 25,000 ^ 3 = 15,625,000,000,000. That could take quite a while to run.
Since the solution does not need to be optimal, you might consider using simulated annealing or possibly a genetic algorithm instead. Either of these should be much faster and still produce close to optimal solutions.
If using a genetic algorithm, the fitness function can be designed to minimize the longest period of time that an individual would need to wait. But, you would have to be careful because if that is the sole criteria, then the solution won't work too well for cases when there is just one cab that is closest to the passenger that is furthest away. So, the fitness function would need to take into account the other waiting times as well. One idea to solve this would be to run the model iteratively and remove the longest cab trip (both cab & person) after each iteration. But, doing that for all 10,000+ cabs/people could be expensive time wise.
I don't think any cab owner or manager would even consider minimizing the waiting time for the last customer entering his cab over minimizing the sum of the waiting time for all cabs - simply because they make more money overall when minimizing the sum of the waiting times. At least Louie DePalma would never do that... So, I suspect that the real problem you have has little or nothing to do with cabs...
A "good" algorithm that would solve your problem is a Greedy Algorithm. Since taxis and people have a position, these positions can be related to a "central" spot. Sort the taxis and people needing to get picked up in order (in relation to the "centre"). Then start assigning taxis, in order, to pick up people in order. This greedy rule will ensure taxis closest to the centre will pick up people closest to the centre and taxis farthest away pick up people farthest away.
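A minimal sketch of that greedy rule, taking the "centre" to be the centroid of all positions; this is a cheap heuristic with no optimality guarantee:

from math import dist

def greedy_assign(taxis, customers):
    # Pair taxis and customers by their rank in distance from the centroid.
    pts = list(taxis) + list(customers)
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    radius = lambda p: dist(p, (cx, cy))
    return list(zip(sorted(taxis, key=radius), sorted(customers, key=radius)))

print(greedy_assign([(0, 0), (9, 9)], [(1, 1), (8, 8)]))
# [((0, 0), (1, 1)), ((9, 9), (8, 8))]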
A better way might be to use Dynamic Programming; however, I am not sure, nor do I have the time to invest. A good tutorial for Dynamic Programming can be found here
For an optimal solution: construct a weighted bipartite graph with a vertex for each taxi and customer and an edge from each taxi to each customer whose weight is the travel time. Scan the edges in order of nondecreasing weight, maintaining a maximum matching of the subgraph containing the edges scanned so far. Stop when the matching is perfect.
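A sketch of that idea; for simplicity this version binary-searches the sorted edge weights and re-tests feasibility with Kuhn's augmenting-path matching, rather than maintaining one matching incrementally while scanning edges as described above:

def has_perfect_matching(adj, n_left, n_right):
    # Kuhn's augmenting-path algorithm; adj[u] lists the right-vertices
    # reachable from left-vertex u.
    match_r = [-1] * n_right
    def try_augment(u, seen):
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                if match_r[v] == -1 or try_augment(match_r[v], seen):
                    match_r[v] = u
                    return True
        return False
    return all(try_augment(u, [False] * n_right) for u in range(n_left))

def bottleneck_assignment(cost):
    # Smallest T such that a perfect matching exists using only edges with
    # cost <= T; binary search over the sorted distinct edge costs.
    n = len(cost)
    weights = sorted({c for row in cost for c in row})
    lo, hi = 0, len(weights) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        adj = [[j for j in range(n) if cost[i][j] <= weights[mid]]
               for i in range(n)]
        if has_perfect_matching(adj, n, n):
            hi = mid
        else:
            lo = mid + 1
    return weights[lo]

print(bottleneck_assignment([[1, 9], [9, 2]]))    # 2: both pairs served by T = 2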

Finding fastest path at a cost less than or equal to a specified maximum

Here's a visualisation of my problem (image not included): each edge of the graph has both a length and a cost, and I need the fastest path whose total cost stays within a given budget.
I've been trying to use Dijkstra's algorithm on it; however, it hasn't worked.
The complication, as I see it, is that Dijkstra's algorithm throws away information that you need to keep around: if you are trying to get from A to E in
B
/ \
A D - E
\ /
C
and ABD is shorter than ACD, Dijkstra's will forget that ACD was ever a possibility (it uses ABD as the canonical route from A to D). But if ABD has a higher cost than ACD, and ABDE is above the quota while ACDE is below, the now-eliminated ACD was the correct choice. The problem is that Dijkstra's algorithm assumes that if one path is at least as long as another, it is weakly dominated: there is no reason to prefer it. And in one dimension of comparison, paths are weakly ordered: given any two paths, one weakly dominates the other.
But here we have two dimensions of comparison, and so that ordering does not hold: one path can be shorter, the other cheaper. Since we can only discard dominated paths, we must keep all paths that do not already exceed the budget and are not dominated. I have put a bit of work into implementing this approach; it looks doable, but I cannot find an argument for a worst-case bound below exponential complexity (although normal performance should be much better, since in sane graphs most paths are dominated).
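A minimal sketch of that dominance-pruning approach: each node keeps every (length, cost) label that is not dominated, labels over budget are discarded immediately, and non-negative lengths and costs are assumed:

import heapq

def cheapest_fast_path(graph, src, dst, budget):
    # graph[u] -> list of (v, length, cost). Returns the minimum length of a
    # src-dst path with total cost <= budget, keeping all Pareto-optimal
    # (length, cost) labels per node instead of a single tentative distance.
    labels = {src: [(0, 0)]}
    frontier = [(0, 0, src)]
    while frontier:
        length, cost, u = heapq.heappop(frontier)
        if u == dst:
            return length                 # first dst pop = shortest feasible path
        for v, l, c in graph.get(u, ()):
            nl, nc = length + l, cost + c
            if nc > budget:
                continue                  # over budget: discard
            old = labels.get(v, [])
            if any(pl <= nl and pc <= nc for pl, pc in old):
                continue                  # dominated by an existing label: discard
            labels[v] = [(pl, pc) for pl, pc in old
                         if not (nl <= pl and nc <= pc)] + [(nl, nc)]
            heapq.heappush(frontier, (nl, nc, v))
    return None

# The diamond from above: A-B-D is shorter but too expensive,
# A-C-D is longer but within the budget.
g = {"A": [("B", 1, 5), ("C", 2, 1)], "B": [("D", 1, 5)],
     "C": [("D", 1, 1)], "D": [("E", 1, 1)]}
print(cheapest_fast_path(g, "A", "E", budget=6))      # 4, via A-C-D-E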
You can also, as Billiska notes, use k-th shortest routes algorithms and then proceed through their results until you find one below the budget. That uses time O(m + K*n*log(m/n)); but unless someone sees an upper bound on K such that K is guaranteed to include a path under the budget (if one exists), we need to set K to be the total number of paths, again yielding exponential complexity (although again a strategy of incrementally increasing K would likely yield a reasonable average runtime, at least if length and cost are reasonably correlated).
EDIT:
Complicating (perhaps fatally) the implementation of my proposed modification is that Dijkstra's algorithm relies on an ordering of the accessibility of nodes, such that we know that if we take the unexplored node to which we have the shortest path, we will never find a better route to it (since all other routes are already known to be longer). If that shortest route is also expensive, that need not hold; even after exploring a node, we must be prepared to update paths out of it on the basis of longer but cheaper routes into it. I suspect that this will prevent it from reaching polynomial time in the worst case.
Basically you need to find the first shortest-path, check if it works, then find the second shortest-path, check if it works, and so on...
Dijkstra's algorithm isn't designed to work with such a task.
And just from a Google search on this new definition of the problem, I arrived at a Stack Overflow question on finding kth-shortest-paths.
I haven't read into it yet, so don't ask me.
I hope this helps.
I think you can do it with Dijkstra, but you have to change the way you calculate the tentative distance in each step. Instead of taking only the distance into account, consider the cost as well: the new distance should be a 2-d number (dist, cost). When you choose the minimal distance, take the one with minimal dist AND cost <= 6 (the budget in the example), that's it.
I hope this is correct.

Optimal selection election algorithm

Given a bunch of sets of people (similar to):
[p1,p2,p3]
[p2,p3]
[p1]
[p1]
Select 1 from each set, trying to minimize the maximum number of times any one person is selected.
For the sets above, the max number of times a given person MUST be selected is 2.
I'm struggling to get an algorithm for this. I don't think it can be done with a greedy algorithm, more thinking along the lines of a dynamic programming solution.
Any hints on how to go about this? Or do any of you know any good websites about this stuff that I could have a look at?
This is neither dynamic nor greedy. Let's look at a different problem first -- can it be done by selecting every person at most once?
You have P people and S sets. Create a graph with S+P vertices, representing sets and people. There is an edge between person pi and set si iff pi is an element of si. This is a bipartite graph and the decision version of your problem is then equivalent to testing whether the maximum cardinality matching in that graph has size S.
As detailed on that page, this problem can be solved by using a maximum flow algorithm (note: if you don't know what I'm talking about, then take your time to read it now, as you won't understand the rest otherwise): first create a super-source and add an edge linking it to all people with capacity 1 (representing that each person may only be used once), keep capacity 1 on the edges from each person to the sets containing them, then create a super-sink and add edges linking every set to that sink with capacity 1 (representing that each set may only be used once), and run a suitable max-flow algorithm between source and sink.
Now, let's consider a slightly different problem: can it be done by selecting every person at most k times?
If you paid attention to the remarks in the last paragraph, you should know the answer: just change the capacity of the edges leaving the super-source to indicate that each person may be used more than once in this case.
Therefore, you now have an algorithm to solve the decision problem in which people are selected at most k times. It's easy to see that if you can do it with k, then you can also do it with any value greater than k, that is, it's a monotonic function. Therefore, you can run a binary search on the decision version of the problem, looking for the smallest k possible that still works.
Note: You could also get rid of the binary search by testing each value of k sequentially, and augmenting the residual network obtained in the last run instead of starting from scratch. However, I decided to explain the binary search version as it's conceptually simpler.
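A sketch of the whole construction; following the note above it tests values of k sequentially (though it rebuilds the network each time instead of reusing the residual network), with a small Edmonds-Karp max flow as the feasibility test:

from collections import defaultdict, deque

def max_flow(cap, s, t):
    # Edmonds-Karp: repeatedly find a shortest augmenting path with BFS
    # and push flow along it. cap[u][v] is the residual capacity u -> v.
    flow = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        flow += push

def feasible(sets, people, k):
    # Can we pick one person per set with nobody picked more than k times?
    cap = defaultdict(lambda: defaultdict(int))
    for p in people:
        cap["source"][("person", p)] = k        # each person usable k times
    for i, s in enumerate(sets):
        for p in s:
            cap[("person", p)][("set", i)] = 1  # person can cover this set once
        cap[("set", i)]["sink"] = 1             # each set needs exactly one pick
    return max_flow(cap, "source", "sink") == len(sets)

sets = [["p1", "p2", "p3"], ["p2", "p3"], ["p1"], ["p1"]]
people = ["p1", "p2", "p3"]
print(min(k for k in range(1, len(sets) + 1) if feasible(sets, people, k)))  # 2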

Hill climbing and single-pair shortest path algorithms

I have a bit of a strange question. Can anyone tell me where to find information about, or give me a little bit of an introduction to, using shortest path algorithms that use a hill climbing approach? I understand the basics of both, but I can't put the two together. Wikipedia has an interesting part about solving the Travelling Salesperson Problem with hill climbing, but doesn't provide a more in-depth explanation of how to go about it exactly.
For example, hill climbing can be applied to the traveling salesman problem. It is easy to find a solution that visits all the cities but will be very poor compared to the optimal solution. The algorithm starts with such a solution and makes small improvements to it, such as switching the order in which two cities are visited. Eventually, a much better route is obtained.
As far as I understand it, you should pick any path and then iterate through it and make optimisations along the way. For instance going back and picking a different link from the starting node and checking whether that gives a shorter path.
I am sorry - I did not make myself very clear. I understand how to apply the idea to Travelling Salesperson. I would like to use it on a shortest distance algorithm.
You could just randomly exchange two cities.
Your first path is: A B C D E F A with length 200
Now you change it by swapping C and D: A B D C E F A with length 350 - Worse!
Next step: A B C D F E A: length 150 - You improved your solution. ;-)
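A minimal sketch of that swap-and-compare loop, on random city coordinates; since tour length is being minimised, "climbing" here means keeping a swap only when the tour gets shorter:

import random
from math import dist

def tour_length(tour, pts):
    return sum(dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def hill_climb(pts, steps=20000):
    tour = list(range(len(pts)))
    random.shuffle(tour)                          # arbitrary starting route
    best = tour_length(tour, pts)
    for _ in range(steps):
        i, j = random.sample(range(len(pts)), 2)
        tour[i], tour[j] = tour[j], tour[i]       # try swapping two cities
        length = tour_length(tour, pts)
        if length < best:
            best = length                         # improvement: keep the swap
        else:
            tour[i], tour[j] = tour[j], tour[i]   # worse or equal: undo it
    return tour, best

cities = [(random.random(), random.random()) for _ in range(30)]
print(hill_climb(cities)[1])                      # tour length after climbing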
Hill climbing algorithms are really easy to implement but have several problems with local maxima! [A better approach based on the same idea is simulated annealing.]
Hill climbing is a very simple kind of evolutionary optimization; a much more sophisticated class of algorithms is genetic algorithms.
Another good metaheuristic for solving the TSP is ant colony optimization.
Examples would be genetic algorithms or expectation maximization in data clustering. These iterate in single steps, trying to reach a better solution with every step. The problem is that they only find a local maximum/minimum; it is never assured that they find the global maximum/minimum.
A solution for the travelling salesman problem as a genetic algorithm, for which we need:
Representation of the solution as order of visited cities, e.g. [New York, Chicago, Denver, Salt Lake City, San Francisco]
Fitness function as the travelled distance
Selection of the best results is done by selecting items randomly depending on their fitness; the higher the fitness, the higher the probability that the solution is chosen to survive
Mutation would be switching two cities in a list, like [A,B,C,D] becomes [A,C,B,D]
Crossing of two possible solutions [B,A,C,D] and [A,B,D,C] results in [B,A,D,C], i.e. cutting both lists in the middle and using the beginning of one parent and the end of the other parent to form the child
The algorithm then:
initialization of the starting set of solutions
calculation of the fitness of every solution
until the desired maximum fitness is reached or no changes happen any more:
selection of the best solutions
crossing and mutation
fitness calculation of every solution
It is possible that the result is different with every execution of the algorithm, so it should be executed more than once.
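To tie the pieces above together, here is a minimal sketch of that loop. Truncation selection stands in for the fitness-proportional selection described above, and the crossover fills the child's tail with the other parent's cities in order, so the result stays a valid tour (the simple cut-in-the-middle crossover can duplicate cities):

import random
from math import dist

def tour_length(tour, pts):
    return sum(dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def crossover(a, b):
    # Head of parent a, then the remaining cities in parent b's order;
    # this keeps the child a valid permutation.
    head = a[:len(a) // 2]
    return head + [c for c in b if c not in head]

def mutate(tour):
    i, j = random.sample(range(len(tour)), 2)
    tour[i], tour[j] = tour[j], tour[i]           # swap two cities

def genetic_tsp(pts, pop_size=100, generations=200, mutation_rate=0.3):
    pop = [random.sample(range(len(pts)), len(pts)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: tour_length(t, pts))   # shortest (fittest) first
        survivors = pop[:pop_size // 2]               # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            child = crossover(*random.sample(survivors, 2))
            if random.random() < mutation_rate:
                mutate(child)
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda t: tour_length(t, pts))

cities = [(random.random(), random.random()) for _ in range(25)]
best = genetic_tsp(cities)
print(tour_length(best, cities))                  # varies from run to run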
I'm not sure why you would want to use a hill-climbing algorithm, since Dijkstra's algorithm has polynomial complexity O(|E| + |V| log |V|) using Fibonacci heaps:
http://en.wikipedia.org/wiki/Dijkstra's_algorithm
If you're looking for a heuristic approach to the single-pair shortest path problem, then you can use A*:
http://en.wikipedia.org/wiki/A*_search_algorithm
but an improvement in efficiency is dependent on having an admissible heuristic estimate of the distance to the goal.
To hillclimb the TSP you should have a starting route. Of course picking a "smart" route wouldn't hurt.
From that starting route you make one change and compare the result. If the new route is better you keep it; if it's worse, you keep the old one. Repeat this until you reach a point from where you can't climb any more, and that becomes your best result.
Obviously, with TSP, you will more than likely hit a local maximum. But it is possible to get decent results.
