Improving A* algorithm - algorithm

Say I am finding a path in a house using A* algorithm. Now the running time could be O(n^2).
I was thinking will it improve the performance if I knew which doors to follow and according I shall apply A* on it i.e. if I have the starting position S and final position as F, and instead of applying the A* on these two end points, will be be better if I applied the A* on
`S` and `A1`
`A1` and `A2`
`A2` and F.
Where A1 and A2 are my intermediates(doors) that shall be followed for the shortest path? Will it be worth the improvement to find the intermediates and then follow the path and not just apply A* directly on starting and ending.
Considering it takes linear time to find the intermediates.

Yes, that will help a lot in case the algorithm takes O(n^2) behavior at runtime. Instead of one big problem you get two smaller problems with each being 1/4 as expensive to compute.
I'm sure there are pathological cases where it doesn't help or even hurt but in your scenario (house) it would probably help a lot.
I imagine that you are using the fact that one has to go up an elevator or stairs to change floors. That would help A* a lot because the cost function now has to work only within a single floor. It will be very representative of the real cost. In contrast to that the cost function would be greatly underestimating the distance if you wanted to move into the same room but one floor higher. Euclidean distance would fail totally in that case (and the algorithm would degrade into an exhaustive search). First moving to the stairs and then moving from the stairs to the desired room would work much better.

Related

Comparison of A* heuristics for solving an N-puzzle

I am trying to solve the N-puzzle using the A* algorithm with 3 different heuristic functions. I want to know how to compare each of the heuristics in terms of time complexity. The heuristics I am using are: manhattan distance , manhattan distance + linear conflict, N-max swap. And specifically for an 8-puzzle and an 15-puzzle.
The N-puzzle is, in general, NP hard to find the shortest solution, so no matter what heuristic you use it's unlikely you'll be able to find any difference in complexity between them, since you won't be prove the tightness of any bound.
If you restrict yourself to the 8-puzzle or 15-puzzle, an A* algorithm with any admissible heuristic will run in O(1) time since there are a finite (albeit large) number of board positions.
As #Harold said in his comment, the approach to compare time complexity of heuristic functions is tipically by experimental tests. In your case, generate a set of n random problems for the 8-puzzle and the 15-puzzle and solve them using the different heuristic functions. Things to be aware of are:
The comparison will always depend on several factors, like hardware expecs, programming language, your skills when implementing the algorithm, ...
Generally speaking, a more informed heuristic will always expand less nodes than a less informed one, and will probably be faster.
And finally, in order to compare the three heuristics for each problem set, I would suggest a graphic with average running times (repeat for example 5 times each problem) where:
The problems are in the x-axis sorted by difficulty.
The running times are in the y-axis for each heuristic function (perhaps in logarithmic scale if the difference between the alternatives cannot be easily seen).
and a similar graphic with the number of explored states.

Best algorithm for 5x5 tictactoe ai using 4 in a row

What would be the best algorithm to use if i am creating a 5x5 tictactoe ai using 4 in row. The original algorithm that we were supposed to use was the minimax but we are only given 10 seconds per turn.
If you want to build a high-quality player there are many important enhancements you can implement.
First, of course, is alpha-beta pruning, but there are several other techniques that will make alpha-beta pruning even more effective.
Second, as the time limit is important, you should add iterative deepening. That is, you search first to depth 1, then to depth 2, etc. When you run out of time you take the best move from the previously completed iteration. Because the tree grows exponentially, you don't really lose anything from your previously iterations. (With a branching factor of 2 the overhead is a factor of 2, but as the branching factor increases further this overhead drops to nothing.)
Third, use the history heuristic to order your search. On the small iterations you'll learn a better ordering of states so that you get closer to the optimal ordering (for alpha-beta pruning) on the later iterations.
Fourth, use a transposition table to avoid duplicate states. There are many transpositions that occur when searching the tree, and detecting them early will result in significant savings. (This will probably have a bigger impact than the history heuristic.)
Finally, build the best evaluation function possible. The better you are at evaluating states the better you play. (In the limit a perfect evaluation would then only require a 1-ply search to play perfectly.)
Of course, if you can just solve the game, do that. There are only 3^25 (847,288,609,443) possible states in 5x5 tic-tac-toe, so with a decently powered machine you can solve the game, giving you the perfect evaluation function.
Since you have mentioned about minimax algorithm, i want to suggest you a slightly complicated variant of minimax which is called alpha-beta pruning. Alpha-beta pruning avoids a part of the search space which is called pruning. I encourage you to read this article.
You can walk through the search tree till depth 5 or whatever your time allows you. Please not that, i am not saying alpha-beta pruning is the best but i am suggesting that the pruning technique might give you advantage over minimax in terms of time complexity.
Additional resources:
Alpha-beta pruning in Tic-Tac-Toe

How to calculate the average time complexity of the nearest neighbor search using kd-tree?

We know the complexity of the nearest neighbor search of kd-tree is O(logn). But how to calculate it? The main problem is the average time complexity of the back tracing. I have tried to read the paper "An Algorithm for Finding Best Matches in Logarithmic Expected Time", but it is too complicate for me. Does anyone know a simple way to calculate that?
The calculation in the paper is about as simple as possible for a rigorous analysis.
(NB This is the price of being a true computer scientist and software engineer. You must put the effort into learning the math. Knowing the math is what separates people who think they can write solid programs from those who actually can. Jon Bentley, the guy who invented kd-trees, did so when he was in high school. Take this as inspiration.)
If you want a rough intuitive idea that is not rigorous, here is one.
Assume we are working in 2d. The sizes of the geometric areas represented by the 2d-tree are the key.
In the average case, one point partitions the domain into 2 roughly equal-sized rectangles. 3 points into 4. 7 points into 8 parts. Etc. In general N points lead to N-1 roughly equal-sized rectangles.
It not hard to see that if the domain is 1x1, the length of a side of these parts is on average O(sqrt(1/N)).
When you search for a nearest neighbor, you descend the tree to the rectangle containing the search point. After doing this, you have used O(log N) effort to find a point within R = O(sqrt(1/N)) of the correct one. This is just a point contained in the leaf that you discovered.
But this rectangle is not the only one that must be searched. You must still look at all others containing a point no more than distance R away from the search point, refining R each time you find a closer point.
Fortunately, the O(sqrt(1/N)) limit on R provides a tight bound on the average number of other rectangles this can be. In the average case, it's about 8 because each equal-sized rectangle has no more than 8 neighbors.
So the total effort to search is O(8 log n) = O(log n).
Again, I repeat this is not a rigorous analysis, but it ought to give you a feel for why the algorithm is O(log N) in the average case.

How to design an optimal trip planner

I'm developing a trip planer program. Each city has a property called rateOfInterest. Each road between two cities has a time cost. The problem is, given the start city, and the specific amount of time we want to spend, how to output a path which is most interesting (i.e. the sum of the cities' rateOfInterest). I'm thinking using some greedy algorithm, but is there any algorithm that can guarantee an optimal path?
EDIT Just as #robotking said, we allow visit places multiple times and it's only interesting the first visit. We have 50 cities, and each city approximately has 5 adjacent cities. The cost function on each edge is either time or distance. We don't have to visit all cities, just with the given cost function, we need to return an optimal partial trip with highest ROI. I hope this makes the problem clearer!
This sounds very much like an instance of a TSP in a weighted manner meaning there is some vertices that are more desirable than other...
Now you could find an optimal path trying every possible permutation (using backtracking with some pruning to make it faster) depending on the number of cities we are talking about. See the TSP problem is a n! problem so after n > 10 you can forget it...
If your number of cities is not that small then finding an optimal path won't be doable so drop the idea... however there is most likely a good enough heuristic algorithm to approximate a good enough solution.
Steven Skiena recommends "Simulated Annealing" as the heuristics of choice to approximate such hard problem. It is very much like a "Hill Climbing" method but in a more flexible or forgiving way. What I mean is that while in "Hill Climbing" you always only accept changes that improve your solution, in "Simulated Annealing" there is some cases where you actually accept a change even if it makes your solution worse locally hoping that down the road you get your money back...
Either way, whatever is used to approximate a TSP-like problem is applicable here.
From http://en.wikipedia.org/wiki/Travelling_salesman_problem, note that the decision problem version is "(where, given a length L, the task is to decide whether any tour is shorter than L)". If somebody gives me a travelling salesman problem to solve I can set all the cities to have the same rate of interest and then the decision problem is whether a most interesting path for time L actually visits all the cities and returns.
So if there was an efficient solution for your problem there would be an efficient solution for the travelling salesman problem, which is unlikely.
If you want to go further than a greedy search, some of the approaches of the travelling salesman problem may be applicable - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.26.5150 describes "Iterated Local Search" which looks interesting, with reference to the TSP.
If you want optimality, use a brute force exhaustive search where the leaves are the one where the time run out. As long as the expected depth of the search tree is less than 10 and worst case less than 15 you can produce a practical algorithm.
Now if you think about the future and expect your city network to grow, then you cannot ensure optimality. In this case you are dealing with a local search problem.

better heuristic then A*

I am enrolled in Stanford's ai-class.com and have just learned in my first week of lecture about a* algorithm and how it's better used then other search algo.
I also show one of my class mate implement it on 4x4 sliding block puzzle which he has published at: http://george.mitsuoka.org/StanfordAI/slidingBlocks/
While i very much appreciate and thank George to implement A* and publishing the result for our amusement.
I (and he also) were wondering if there is any way to make the process more optimized or if there is a better heuristic A*, like better heuristic function than the max of "number of blocks out of place" or "sum of distances to goals" that would speed things up?
and Also if there is a better algo then A* for such problems, i would like to know about them as well.
Thanks for the help and in case of discrepancies, before down grading my profile please allow me a chance to upgrade my approach or even if req to delete the question, as am still learning the ways of stackoverflow.
It depends on your heuristic function. for example, if you have a perfect heuristic [h*], then a greedy algorithm(*), will yield better result then A*, and will still be optimal [since your heuristic is perfect!]. It will develop only the nodes needed for the solution. Unfortunately, it is seldom the case that you have a perfect heuristic.
(*)greedy algorithm: always develop the node with the lowest h value.
However, if your heuristic is very bad: h=0, then A* is actually a BFS! And A* in this case will develop O(B^d) nodes, where B is the branch factor and d is the number of steps required for solving.
In this case, since you have a single target function, a bi-directional search (*) will be more efficient, since it needs to develop only O(2*B^(d/2))=O(B^(d/2)) nodes, which is much less then what A* will develop.
bi directional search: (*)run BFS from the target and from the start nodes, each iteration is one step from each side, the algorithm ends when there is a common vertex in both fronts.
For the average case, if you have a heuristic which is not perfect, but not completely terrbile, A* will probably perform better then both solutions.
Possible optimization for average case: You also can run bi-directional search with A*: from the start side, you can run A* with your heuristic, and a regular BFS from the target side. Will it get a solution faster? no idea, you should probably benchmark the two possibilities and find which is better. However, the solution found with this algorithm will also be optimal, like BFS and A*.
The performance of A* is based on the quality of the expected cost heuristic, as you learned in the videos. Getting your expected cost heuristic to match as closely as possible to the actual cost from that state will reduce the total number of states that need to be expanded. There are also a number of variations that perform better under certain circumstances, like for instance when faced with hardware restrictions in large state space searching.

Resources