Greedy Algorithm vs Nearest Neighbor Algorithm - nearest-neighbor

I'm doing an assignment on Travelling Salesman and I want to know the difference between the Greedy Algorithm and the Nearest Neighbor algorithm. I looked it up and they seemed almost the same to me, and there weren't any good resources that compared them. What is the difference?

Actually, I'm having a hard time finding the similarity :) You talk about two very different algorithms here, for different kinds of problems.
A greedy algorithm is mainly used on graphs, as it is meant to solve staged problems, where each stage requires us to make a decision. For example, when trying to find the shortest way from one point to another, it would each time choose the point closest to the one it currently stands at.
K-NN is a lazy classification algorithm that is used a lot in machine learning problems. It determines the class of a value from its distance to the k closest points in the set.
Thinking about it, you can actually say that each stage of the greedy algorithm uses a 1-Nearest-Neighbours algorithm to find the closest point, but it's pretty ridiculous... :)
Hope it's understandable!
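In the travelling-salesman setting the question asks about, the greedy step described above (always move to the closest point not yet visited) is what is usually called the nearest neighbour heuristic. A minimal sketch, assuming cities are plain (x, y) tuples and distances are Euclidean; the function name and sample data are made up for illustration:

```python
import math

def nearest_neighbor_tour(points, start=0):
    """Build a visiting order by always moving to the closest unvisited point."""
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    tour = [start]
    while unvisited:
        current = points[tour[-1]]
        # Greedy choice: the unvisited point closest to where we stand now.
        nxt = min(unvisited, key=lambda i: math.dist(current, points[i]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

# Hypothetical sample data: four cities on a line.
cities = [(0, 0), (5, 0), (1, 0), (2, 0)]
tour = nearest_neighbor_tour(cities)
print(tour)
```

The result is a list of indices into `cities`; the tour is closed by returning from the last index to the first.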

Related

Solving maze with "islands"

I have this layout of a maze that I am having trouble thinking of how to implement a solution for:
I know there are many resources for maze solving algorithms e.g. http://www.astrolog.org/labyrnth/algrithm.htm but I am not sure which algorithm is best suited for the given maze.
There are three areas labelled “*” which are the locations that MazeSolver needs to go to before being able to exit the maze from the entrance at the top of the map.
I would appreciate pseudocode for solving the islands part of the maze. I am looking for a simple solution, and optimal time is not really an issue. The thing is, even though an overview of the maze is provided to the solver beforehand, it may not be completely accurate by the time the solver actually runs the maze, so it's a little more complicated than coding the route beforehand or using an algorithm that relies on an omniscient view of the maze. The solution also needs to be "half" human/doable, if you get what I mean...
While the robot/robot programmer will be supplied with a map of the mine for each rescue, the map may be out of date due to new mining or due to damage from the event.
For this application the robot is required to first of all locate all the rescue areas and determine if they are occupied. The robot will have to be totally autonomous. When these have been investigated the robot should then do a check of all the passageways for humans.
The robot should also be self-navigating. While a GPS system is a natural choice, in this case it cannot be used due to the thickness of the rock ceiling preventing any GPS signals, therefore you are also required to design a navigation system for the robot. For this end, small hardware alterations, such as additional sensors or deployable radio beacons, may be added to the robot. Please note that at least one of the shelters is located in an “Island”.
Assuming you are not looking for a shortest path to get out of the maze - just any path, create some order for your Islands: island1,island2,...,islandk.
Now, assuming you know how to solve a "regular" maze, you need to find paths from:
start->island1, island1->island2, ...., islandk->end
Some comments:
Solving a "regular" maze can be done using BFS or DFS (the latter does not guarantee a shortest path, though).
If you are looking for a more efficient solution, you can use all-to-all shortest paths rather than multiple "regular" maze solves.
If you are looking for a shortest path, this is a variation of Traveling Salesman Problem. Possible solution is discussed here.
If you want to also pass through all passages, you can do it using a DFS that continues until all nodes are discovered. Again, this won't be the shortest such path, but finding the shortest path is going to be NP-Hard.
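The start->island1->...->end chaining described above could be sketched like this. It is only an illustration under stated assumptions: the maze is a list of strings where '#' is a wall, every other character is walkable, and the visiting order of the islands has already been fixed. `bfs_path`, `visit_in_order` and the sample maze are hypothetical names and data, not part of the original problem.

```python
from collections import deque

def bfs_path(grid, start, goal):
    """Shortest path between two cells as a list of (row, col), or None."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk the predecessor links back to start
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[nr])
                    and grid[nr][nc] != '#' and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # goal is unreachable

def visit_in_order(grid, stops):
    """Concatenate the BFS paths between consecutive stops."""
    route = [stops[0]]
    for a, b in zip(stops, stops[1:]):
        leg = bfs_path(grid, a, b)
        if leg is None:
            return None
        route.extend(leg[1:])  # skip the first cell, it is already in the route
    return route

# Hypothetical maze: S = entrance, * = shelters, E = exit.
maze = [
    "S..#",
    ".#.*",
    "...#",
    "*#.E",
]
stops = [(0, 0), (1, 3), (3, 0), (3, 3)]
route = visit_in_order(maze, stops)
print(len(route) - 1, "steps")
```

Because each leg is solved separately, a leg can simply be recomputed with BFS from the current cell if the map turns out to be out of date.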
This problem is related to the Travelling Salesman Problem, which is NP-hard, so I wouldn't expect any quick solutions for a larger number of islands.
For a small number of islands, you can do this: for each pair of islands (including your starting position), compute the shortest path between them. Since you are interested in distances between a relatively small fraction of the vertices, I recommend Dijkstra's algorithm, since it is relatively easy and can be done by hand (for a reasonably sized graph).
Now you have the shortest distances between all points of interest, and this is when you need to find the optimal Hamiltonian path between them. Fortunately, the distances satisfy a metric, so you can use a 2-approximation (easy, even by hand) or even a 3/2-approximation (not so easy) algorithm, but no polynomial-time exact algorithms are known.
For a perfect solution with 3 islands you only have to check 6 orders in which to visit them (easy), but 6 islands can be visited in 720 orders, and n islands in n!, so good luck with that.
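For a handful of islands, checking every visiting order is only a few lines. A minimal sketch, assuming the pairwise shortest distances have already been computed (for example with Dijkstra) and stored in a nested dict; the labels and numbers below are invented for illustration:

```python
from itertools import permutations

def best_visit_order(dist, start, islands, end):
    """Try every order of the islands and keep the cheapest one."""
    best_cost, best_order = float("inf"), None
    for order in permutations(islands):
        stops = (start, *order, end)
        cost = sum(dist[a][b] for a, b in zip(stops, stops[1:]))
        if cost < best_cost:
            best_cost, best_order = cost, order
    return best_cost, best_order

# Hypothetical symmetric distances between the entrance S and islands A, B, C.
pairs = {("S", "A"): 2, ("S", "B"): 9, ("S", "C"): 4,
         ("A", "B"): 3, ("A", "C"): 6, ("B", "C"): 1}
dist = {}
for (a, b), d in pairs.items():
    dist.setdefault(a, {})[b] = d
    dist.setdefault(b, {})[a] = d

# The robot enters and leaves through the same entrance.
cost, order = best_visit_order(dist, "S", ["A", "B", "C"], "S")
print(cost, order)
```

With 3 islands the loop runs 6 times; the same code works for more islands but slows down factorially.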

Is A* really better than Dijkstra in real-world path finding?

I'm developing a path finding program. In theory, A* is said to be better than Dijkstra; in fact, the latter is a special case of the former. However, when testing in the real world, I begin to doubt whether A* is really better.
I used data of New York City, from 9th DIMACS Implementation Challenge - Shortest Paths, in which each node's latitude and longitude is given.
When applying A*, I need to calculate the spherical distance between two points using the Haversine formula, which involves sin, cos, arcsin and square root. All of these are very time-consuming.
The result is,
Using Dijkstra: 39.953 ms, expanded 256540 nodes.
Using A*, 108.475 ms, expanded 255135 nodes.
Note that with A* we expanded 1405 fewer nodes. However, the time spent computing the heuristic is much more than the time saved.
To my understanding, the reason is that in a very large real graph, the weight of the heuristic will be very small, and the effect of it can be ignored, while the computing time is dominating.
I think you're more or less missing the point of A*. It is intended to be a very performant algorithm, partially by intentionally doing more work but with cheap heuristics, and you're kind of tearing that to bits when burdening it with a heavy extremely accurate prediction heuristic.
For node selection in A* you should just use an approximation of distance. Simply using (latdiff^2)+(lngdiff^2) as the approximated distance should make your algorithm much more performant than Dijkstra, and not give much worse results in any real world scenario. Actually the results should even be exactly the same if you do calculate the travelled distance on a selected node properly with the Haversine. Just use a cheap algorithm for selecting potential next traversals.
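As a rough illustration of a cheaper estimate, the sketch below compares Haversine with a flat-earth (equirectangular) approximation. Note that this swaps the raw squared differences suggested above for a scaled version, so that the estimate stays in metres like the edge weights would; that choice, the Earth radius constant and the sample coordinates are all assumptions of this example. A flat approximation is not strictly guaranteed to be a lower bound, so admissibility would still need to be checked against the actual data.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, assumed for this sketch

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (the expensive reference)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def make_flat_heuristic(goal_lat, goal_lon):
    """Return h(lat, lon): a cheap distance estimate to a fixed goal."""
    goal_phi = math.radians(goal_lat)
    goal_lmb = math.radians(goal_lon)
    cos_goal = math.cos(goal_phi)  # the only trig call, done once per query
    def h(lat, lon):
        x = (math.radians(lon) - goal_lmb) * cos_goal
        y = math.radians(lat) - goal_phi
        return EARTH_RADIUS_M * math.sqrt(x * x + y * y)
    return h

# Two hypothetical points a few kilometres apart in New York City.
h = make_flat_heuristic(40.7306, -73.9352)
exact = haversine_m(40.7128, -74.0060, 40.7306, -73.9352)
approx = h(40.7128, -74.0060)
print(round(exact), round(approx))
```

Per node, `h` needs no sin, cos or arcsin, only a multiplication and a square root, which is where the savings over Haversine would come from.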
A* can be reduced to Dijkstra by setting some trivial parameters. There are three possible ways in which it does not improve on Dijkstra:
The heuristic used is incorrect: it is not a so-called admissible heuristic. A* should use a heuristic which does not overestimate the distance to the goal as part of its cost function.
The heuristic is too expensive to calculate.
The real-world graph structure is special.
In the case of the latter you should try to build on existing research, e.g. "Highway Dimension, Shortest Paths, and Provably Efficient Algorithms" by Abraham et al.
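To make the "A* reduces to Dijkstra" remark concrete, here is a small sketch in which the heuristic is a parameter and defaults to zero. The graph format (a dict of adjacency lists) and the toy graph are assumptions for illustration; the closed-set shortcut relies on the heuristic being consistent.

```python
import heapq

def a_star(graph, start, goal, h=lambda node: 0):
    """graph: {node: [(neighbor, weight), ...]}.

    Returns (cost, number of expanded nodes). With the default h this
    behaves like Dijkstra's algorithm.
    """
    best = {start: 0}
    heap = [(h(start), start)]
    closed = set()
    while heap:
        _, node = heapq.heappop(heap)
        if node in closed:
            continue  # stale queue entry
        closed.add(node)
        if node == goal:
            return best[node], len(closed)
        for nxt, w in graph.get(node, ()):
            g = best[node] + w
            if g < best.get(nxt, float("inf")):
                best[nxt] = g
                heapq.heappush(heap, (g + h(nxt), nxt))
    return None, len(closed)

# Toy graph: integer points -3..3 on a line, unit edges between neighbours.
graph = {n: [(m, 1) for m in (n - 1, n + 1) if -3 <= m <= 3] for n in range(-3, 4)}

dijkstra_result = a_star(graph, 0, 3)                         # h = 0
astar_result = a_star(graph, 0, 3, h=lambda n: abs(3 - n))    # admissible h
print(dijkstra_result, astar_result)
```

Both calls find the same cost; the heuristic only changes how many nodes are expanded on the way.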
Like everything else in the universe, there's a trade-off. You can take dijkstra's algorithm to precisely calculate the heuristic, but that would defeat the purpose wouldn't it?
A* is a great algorithm in that it makes you lean towards the goal having a general idea of which direction to expand first. That said, you should keep the heuristic as simple as possible because all you need is a general direction.
In fact, more precise geometric calculations that are not based on the actual data do not necessarily give you a better direction. As long as they are not based on the data, all those heuristics give you just a direction which are all equally (in)correct.
In general A* is more performant than Dijkstra's, but it really depends on the heuristic function you use in A*. You'll want an h(n) that is optimistic: for A* to find the lowest-cost path, h(n) should never exceed the true remaining cost. If h(n) > cost, then you'll end up in a situation like the one you've described.
Could you post your heuristic function?
Also bear in mind that the performance of these algorithms highly depends on the nature of the graph, which is closely related to the accuracy of the heuristic.
If you compare Dijkstra vs A* when navigating out of a labyrinth where each passage corresponds to a single graph edge, there is probably very little difference in performance. On the other hand, if the graph has many edges between "far-away" nodes, the difference can be quite dramatic. Think of a robot (or an AI-controlled computer game character) navigating in almost open terrain with a few obstacles.
My point is that, even though the New York dataset you used is definitely a good example of "real-world" graph, it is not representative of all real-world path finding problems.

Travelling Salesman Constructive Heuristics

Say, we have a circular list representing a solution of the traveling salesman problem. This list is initially empty.
If the user is allowed to enter cities and their coordinates one by one, what heuristics could be used to insert those coordinates into the already existing tour?
One example is the nearest neighbor heuristic: it inserts the new coordinate after the nearest coordinate already in the tour.
What are some other options (pseudo-code if possible)?
There are plenty of construction heuristics you can use, such as First Fit, First Fit Decreasing, Best Fit, Best Fit Decreasing and Cheapest Insertion.
Those construction heuristics are normally applied to bin packing, but they can be adapted to TSP too. Documentation about those heuristics is here.
Since you're only inserting 1 unassigned entity at a time, all of these basically revert to what you call the nearest neighbor heuristic (with a slight variation on ties), but note that that is not what is usually called Nearest Neighbor. Nearest Neighbor always adds to the end of the line, picking the nearest neighbor among all unassigned entities.
Now, what you really want is a decent solution without having to restart your entire construction heuristic. That's harder: welcome to repeated planning and real-time planning (and this documentation). I am working on an open source example for TSP and vehicle routing that does real-time planning.
You can of course generalize the idea you have mentioned:
Define k'th_path(v) = minimum weight of a path including min{k, number of not-yet-visited cities} cities
Note that calculating the k'th path is O(|V|^k) [this bound is not tight]
Special cases:
For k=1 you get the nearest neighbor, as you suggested.
For k=|V| you get an optimal solution [note it will be very expensive to calculate].
There are no other heuristics, because TSP is always about finding the nearest coordinate. At least I don't know of an algorithm that can insert a coordinate and knows the nearest coordinate, but there are plenty of algorithms for finding a good tour. A good heuristic is, for example, the Christofides algorithm: it only works when the distances satisfy the triangle inequality (as they do in Euclidean space), but it guarantees that the solution is within 3/2 of the optimum. It's not very easy to code; the matching step in particular (Edmonds' blossom algorithm, e.g. Blossom V) takes expert skill. The importance of a guarantee can't be rated highly enough: without one, how would you explain that your method can deliver nonsense in some rare situation?

Fastest path to walk over all given nodes

I'm coding a simple game and currently doing the AI part. NPC gets a list of his 'interest points' which he needs to visit. Each point has a coordinate on the map. I need to find a fastest path for the character to visit all of the given points.
As far as I understand it, the task could be described as 'finding fastest traverse path in a strongly connected weighted undirected graph'.
I'd like to get either the name of some algorithm to calculate that or if there is no name - some keypoints on programming it myself.
Thanks in advance.
This is very similar to the Travelling Salesman problem, although I'm not going to try to prove equivalency offhand. The TSP is NP-complete, which means that solving the problem exactly may be impractical, depending on the number of interest points. There are approximation algorithms that you may find more useful.
See previous post regarding tree traversals:
Tree traversal algorithm for directory structures with a lot of files
I would use an algorithm like the ant colony algorithm.
Not directly on point but what I did in an MMO emulator was to store waypoint indices along with the rest of the pathing data. If your requirement is to demonstrate solutions to TSP then ignore this. If not, it's worth consideration IMO.
In my case it was the best solution as otherwise the server could have potentially hundreds of mobs (re)spawning and along with all the other AI logic, would have to burn cycles computing route logic.

Algorithm: shortest path between all points

Suppose I have 10 points. I know the distance between each point.
I need to find the shortest possible route passing through all points.
I have tried a couple of algorithms (Dijkstra, Floyd Warshall,...) and they all give me the shortest path between start and end, but they don't make a route with all points on it.
Permutations work fine, but they are too resource-expensive.
What algorithms can you advise me to look into for this problem? Or is there a documented way to do this with the above-mentioned algorithms?
Have a look at travelling salesman problem.
You may want to look into some of the heuristic solutions. They may not be able to give you 100% exact results, but often they can come up with good enough solutions (2 to 3 % away from optimal solutions) in a reasonable amount of time.
This is obviously the Travelling Salesman Problem. Specifically for N=10, you can either try the naive O(N!) algorithm or, using dynamic programming, reduce this to O(N^2 2^N) by trading space for time.
Beyond that, since this is an NP-hard problem, you can only hope for an approximation or heuristic, given the usual caveats.
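The dynamic-programming approach with the O(N^2 2^N) bound is commonly known as the Held-Karp algorithm. A compact sketch, assuming the distances are given as an N x N matrix (N >= 2) and the route is a closed tour starting and ending at point 0; the 4-point matrix is made up for illustration:

```python
from itertools import combinations

def held_karp(dist):
    """Length of the shortest closed tour through all points, starting at 0."""
    n = len(dist)
    # best[(mask, j)]: cheapest path that starts at 0, visits exactly the
    # points whose bits are set in mask, and ends at point j.
    best = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev_mask = mask & ~(1 << j)
                best[(mask, j)] = min(best[(prev_mask, k)] + dist[k][j]
                                      for k in subset if k != j)
    full = sum(1 << j for j in range(1, n))
    # Close the tour by returning to point 0.
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))

# Hypothetical symmetric distance matrix for 4 points.
dist = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
shortest = held_karp(dist)
print(shortest)
```

For 10 points the table holds a few thousand entries, which is far fewer than the 9! orderings a permutation search would examine.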
As others have mentioned, this is an instance of the TSP. I think Concorde, developed at Georgia Tech, is the current state-of-the-art solver. It can handle upwards of 10,000 points within a few seconds. It also has an API that's easy to work with.
I think this is what you're looking for, actually:
Floyd Warshall
In computer science, the Floyd–Warshall algorithm (sometimes known as the WFI Algorithm, Roy–Floyd algorithm or just Floyd's algorithm) is a graph analysis algorithm for finding shortest paths in a weighted graph (with positive or negative edge weights). A single execution of the algorithm will find the lengths (summed weights) of the shortest paths between all pairs of vertices though it does not return details of the paths themselves.
In the "Path reconstruction" subsection it explains the data structure you'll need to store the "paths" (actually you just store the next node to go to and then trivially reconstruct whichever path is required as needed).
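A minimal sketch of Floyd–Warshall with the "next node" table described above, assuming vertices are numbered 0..n-1 and edges are given as directed (u, v, weight) triples with no negative cycles; the function names and the small example graph are illustrative:

```python
def floyd_warshall(n, edges):
    """Return (dist, nxt): all-pairs distances and the next-hop table."""
    INF = float("inf")
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    nxt = [[None] * n for _ in range(n)]
    for u, v, w in edges:
        if w < dist[u][v]:
            dist[u][v] = w
            nxt[u][v] = v
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]  # first hop is the first hop towards k
    return dist, nxt

def reconstruct_path(nxt, u, v):
    """Follow the next-hop table from u to v; empty list if v is unreachable."""
    if u == v:
        return [u]
    if nxt[u][v] is None:
        return []
    path = [u]
    while u != v:
        u = nxt[u][v]
        path.append(u)
    return path

# Hypothetical directed graph with 4 vertices.
edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 5)]
dist, nxt = floyd_warshall(4, edges)
path = reconstruct_path(nxt, 0, 3)
print(dist[0][3], path)
```

Only the next hop is stored per pair, so the table takes O(n^2) space and any individual path is rebuilt on demand.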

Resources