How to design an optimal trip planner - algorithm

I'm developing a trip planer program. Each city has a property called rateOfInterest. Each road between two cities has a time cost. The problem is, given the start city, and the specific amount of time we want to spend, how to output a path which is most interesting (i.e. the sum of the cities' rateOfInterest). I'm thinking using some greedy algorithm, but is there any algorithm that can guarantee an optimal path?
EDIT Just as #robotking said, we allow visit places multiple times and it's only interesting the first visit. We have 50 cities, and each city approximately has 5 adjacent cities. The cost function on each edge is either time or distance. We don't have to visit all cities, just with the given cost function, we need to return an optimal partial trip with highest ROI. I hope this makes the problem clearer!

This sounds very much like an instance of a TSP in a weighted manner meaning there is some vertices that are more desirable than other...
Now you could find an optimal path trying every possible permutation (using backtracking with some pruning to make it faster) depending on the number of cities we are talking about. See the TSP problem is a n! problem so after n > 10 you can forget it...
If your number of cities is not that small then finding an optimal path won't be doable so drop the idea... however there is most likely a good enough heuristic algorithm to approximate a good enough solution.
Steven Skiena recommends "Simulated Annealing" as the heuristics of choice to approximate such hard problem. It is very much like a "Hill Climbing" method but in a more flexible or forgiving way. What I mean is that while in "Hill Climbing" you always only accept changes that improve your solution, in "Simulated Annealing" there is some cases where you actually accept a change even if it makes your solution worse locally hoping that down the road you get your money back...
Either way, whatever is used to approximate a TSP-like problem is applicable here.

From http://en.wikipedia.org/wiki/Travelling_salesman_problem, note that the decision problem version is "(where, given a length L, the task is to decide whether any tour is shorter than L)". If somebody gives me a travelling salesman problem to solve I can set all the cities to have the same rate of interest and then the decision problem is whether a most interesting path for time L actually visits all the cities and returns.
So if there was an efficient solution for your problem there would be an efficient solution for the travelling salesman problem, which is unlikely.
If you want to go further than a greedy search, some of the approaches of the travelling salesman problem may be applicable - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.26.5150 describes "Iterated Local Search" which looks interesting, with reference to the TSP.

If you want optimality, use a brute force exhaustive search where the leaves are the one where the time run out. As long as the expected depth of the search tree is less than 10 and worst case less than 15 you can produce a practical algorithm.
Now if you think about the future and expect your city network to grow, then you cannot ensure optimality. In this case you are dealing with a local search problem.

Related

The greedy algorithm and implementation

Hello I've just started learning greedy algorithm and I've first looked at the classic coin changing problem. I could understand the greediness (i.e., choosing locally optimal solution towards a global optimum.) in the algorithm as I am choosing the highest value of coin such that the
sum+{value of chosen coin}<=total value . Then I started to solve some greedy algorithm problem in some sites. I could solve most of the problems but couldn't figure out exactly where the greediness is applied in the problem. I coded the only solution i could think of, for the problems and got it accepted. The editorials also show the same way of solving problem but i could not understand the application of greedy paradigm in the algorithm.
Are greedy algorithms the only way of solving a particular range of problems? Or they are one way of solving problems which could be more efficient?
Could you give me pseudo codes of a same problem with and without the application of greedy paradigm?
There are lots of real life examples of greedy algorithms. One of the obvious is the coin changing problem, to make change in a certain currency, we repeatedly dispense the largest denomination, thus , to give out seventeen dollars and sixty one cents in change, we give out a ten-dollar bill, a five-dollar bill, two one-dollar bills, two quarters , one dime, and one penny. By doing this, we are guaranteed to minimize the number of bills and coins. This algorithm does not work in all monetary systems...more here
I think that there is always another way to solve a problem, but sometimes, as you've stated, it probably will be less efficient.
For example, you can always check all the options (coins permutations), store the results and choose the best, but of course the efficiency is terrible.
Hope it helps.
Greedy algorithms are just a class of algorithms that iteratively construct/improve a solution.
Imagine the most famous problem - TSP. You can formulate it as Integer Linear Programming problem and give it to an ILP solver and it will give you globally optimal solution (if it has enought time). But you could do it in a greedy way. You can construct some solution (e.g. randomly) and then look for changes (e.g. switch an order of two cities) that improve your solution and you keep doing these changes until there is no such change possible.
So the bottom line is: greedy algorithms are only a method of solving hard problems efficiently (in time, but not necessary in the quality of solution), but there are other classes of algorithms for solving such problems.
For coins, greedy algorithm is also the optimal one, therefore the "greediness" is not as visible as with some other problems.
In some cases you prefer solution, which is not the best one, but you can compute it much faster (computing the real best solution can takes years for example).
Then you choose heuristic, that should give you the best results - based on average input data, its structure and what you want to want to accomplish.
On wikipedia, there is good solution on finding the biggest sum of numbers in tree
Imagine that you have for example 2^1000 nodes in this tree. To find optimal solution, you have to visit each node once. Personal computer today is not able to do this in your lifetime, therefore you want some heuristic. Greedy alghoritm however find solution in just 1000 steps (which does not take more than one milisecond)

Heuristics in Graph Traversal

I'm trying to use A* to find the optimal path in a Graph.
The context is that a tourist starts at his hotel, visits landmarks and returns to his hotel at the end of the day. The nodes (landmarks) have two values: importance and time spent. Edges have two values: time spent and cost(currency).
I want to minimize cost, maximize importance and make sure total time is under a certain value. I could find a balance between cost, importance and time for the past path-cost. But what about the future cost? I know how to do it with simpler pathfinding, but is there a method I could follow to find the heuristic I need?
You have a multi-dimensional objective (cost and importance) and so your problem is ill-posed because you haven't defined how to trade off cost and importance while you are "minimizing" cost and "maximizing" importance. You'll only get a partial ordering on sites to vist, because some pairs of sites may be incomparable because one site may have a higher cost and higher importance, while the other may have lower cost and lower importance. Look up multi-objective knapsack problem if you want concrete ways to formalize and solve your problem. And beware -- it can get a bit hairy the deeper you go into the literature.

In regards to genetic algorithms

Currently, I'm studying genetic algorithms (personal, not required) and I've come across some topics I'm unfamiliar or just basically familiar with and they are:
Search Space
The "extreme" of a Function
I understand that one's search space is a collection of all possible solutions but I also wish to know how one would decide the range of their search space. Furthermore I would like to know what an extreme is in relation to functions and how it is calculated.
I know I should probably understand what these are but so far I've only taken Algebra 2 and Geometry but I have ventured into physics, matrix/vector math, and data structures on my own so please excuse me if I seem naive.
Generally, all algorithms which are looking for a specific item in a collection of items are called search algorithms. When the collection of items is defined by a mathematical function (opposed to existing in a database), it is called a search space.
One of the most famous problems of this kind is the travelling salesman problem, where an algorithm is sought which will, given a list of cities and their distances, find the shortest route for visiting each city only once. For this problem, the exact solution can be found only by examining all possible routes (the entire search space), and finding the shortest one (the route which has the minimum distance, which is the extreme value in the search space). The best time complexity of such an algorithm (called an exhaustive search) is exponential (although it is still possible that there may be a better solution), meaning that the worst-case running time increases exponentially as the number of cities increases.
This is where genetic algorithms come into play. Similar to other heuristic algorithms, genetic algorithms try to get close to the optimal solution by improving a candidate solution iteratively, with no guarantee that an optimal solution will actually be found.
This iterative approach has the problem that the algorithm can easily get "stuck" in a local extreme (while trying to improve a solution), not knowing that there is a potentially better solution somewhere further away:
The figure shows that, in order to get to the actual, optimal solution (the global minimum), an algorithm currently examining the solution around the local minimum needs to "jump over" a large maximum in the search space. A genetic algorithm will rapidly locate such local optimums, but it will usually fail to "sacrifice" this short-term gain to get a potentially better solution.
So, a summary would be:
exhaustive search
examines the entire search space (long time)
finds global extremes
heuristic (e.g. genetic algorithms)
examines a part of the search space (short time)
finds local extremes
Genetic algorithms are not good in tuning to a local optimum. If you want to find a global optimum at least you should be able to approach or find a strategy to approach the local optimum. Recently some improvements have been developed to better find the local optima.
"GENETIC ALGORITHM FOR INFORMATIVE BASIS FUNCTION SELECTION
FROM THE WAVELET PACKET DECOMPOSITION WITH APPLICATION TO
CORROSION IDENTIFICATION USING ACOUSTIC EMISSION"
http://gbiomed.kuleuven.be/english/research/50000666/50000669/50488669/neuro_research/neuro_research_mvanhulle/comp_pdf/Chemometrics.pdf
In general, "search space" means, what type of answers are you looking for. For example, if you are writing a genetic algorithm which builds bridges, tests them out, and then builds more, the answers you are looking for are bridge models (in some form). As another example, if you're trying to find a function which agrees with a set of sample inputs on some number of points, you might try to find a polynomial which has this property. In this instance your search space might be polynomials. You might make this simpler by putting a bound on the number of terms, maximum degree of the polynomial, etc... So you could specify that you wanted to search for polynomials with integer exponents in the range [-4, 4]. In genetic algorithms, the search space is the set of possible solutions you could generate. In genetic algorithms you need to carefully limit your search space so you avoid answers which are completely dumb. At my former university, a physics student wrote a program which was a GA to calculate the best configuration of atoms in a molecule to have low energy properties: they found a great solution having almost no energy. Unfortunately, their solution put all the atoms at the exact center of the molecule, which is physically impossible :-). GAs really hone in on good solutions to your fitness functions, so it's important to choose your search space so that it doesn't produce solutions with good fitness but are in reality "impossible answers."
As for the "extreme" of a function. This is simply the point at which the function takes its maximum value. With respect to genetic algorithms, you want the best solution to the problem you're trying to solve. If you're building a bridge, you're looking for the best bridge. In this scenario, you have a fitness function that can tell you "this bridge can take 80 pounds of weight" and "that bridge can take 120 pounds of weight" then you look around for solutions which have higher fitness values than others. Some functions have simple extremes: you can find the extreme of a polynomial using simple high school calculus. Other functions don't have a simple way to calculate their extremes. Notably, highly nonlinear functions have extremes which might be difficult to find. Genetic algorithms excel at finding these solutions using a clever search technique which looks around for high points and then finds others. It's worth noting that there are other algorithms that do this as well, hill climbers in particular. The things that make GAs different is that if you find a local maximum, other types of algorithms can get "stuck," blinded by a locally good solution, so that they never see a possibly much better solution farther away in the search space. There are other ways to adapt hill climbers to this as well, simulated annealing, for one.
The range space usually requires some intuitive understanding of the problem you're trying to solve-- some expertise in the domain of the problem. There's really no guaranteed method to pick the range.
The extremes are just the minimum and maximum values of the function.
So for instance, if you're coding up a GA just for practice, to find the minimum of, say, f(x) = x^2, you know pretty well that your range should be +/- something because you already know that you're going to find the answer at x=0. But then of course, you wouldn't use a GA for that because you already have the answer, and even if you didn't, you could use calculus to find it.
One of the tricks in genetic algorithms is to take some real-world problem (often an engineering or scientific problem) and translate it, so to speak, into some mathematical function that can be minimized or maximized. But if you're doing that, you probably already have some basic notion where the solutions might lie, so it's not as hopeless as it sounds.
The term "search space" does not restrict to genetic algorithms. I actually just means the set of solutions to your optimization problem. An "extremum" is one solution that minimizes or maximizes the target function with respect to the search space.
Search space simply put is the space of all possible solutions. If you're looking for a shortest tour, the search space consists of all possible tours that can be formed. However, beware that it's not the space of all feasible solutions! It only depends on your encoding. If your encoding is e.g. a permutation, than the search space is that of the permutation which is n! (factorial) in size. If you're looking to minimize a certain function the search space with real valued input the search space is bounded by the hypercube of the real valued inputs. It's basically infinite, but of course limited by the precision of the computer.
If you're interested in genetic algorithms, maybe you're interested in experimenting with our software. We're using it to teach heuristic optimization in classes. It's GUI driven and windows based so you can start right away. We have included a number of problems such as real-valued test functions, traveling salesman, vehicle routing, etc. This allows you to e.g. look at how the best solution of a certain TSP is improving over the generations. It also exposes the problem of parameterization of metaheuristics and lets you find better parameters that will solve the problems more effectively. You can get it at http://dev.heuristiclab.com.

better heuristic then A*

I am enrolled in Stanford's ai-class.com and have just learned in my first week of lecture about a* algorithm and how it's better used then other search algo.
I also show one of my class mate implement it on 4x4 sliding block puzzle which he has published at: http://george.mitsuoka.org/StanfordAI/slidingBlocks/
While i very much appreciate and thank George to implement A* and publishing the result for our amusement.
I (and he also) were wondering if there is any way to make the process more optimized or if there is a better heuristic A*, like better heuristic function than the max of "number of blocks out of place" or "sum of distances to goals" that would speed things up?
and Also if there is a better algo then A* for such problems, i would like to know about them as well.
Thanks for the help and in case of discrepancies, before down grading my profile please allow me a chance to upgrade my approach or even if req to delete the question, as am still learning the ways of stackoverflow.
It depends on your heuristic function. for example, if you have a perfect heuristic [h*], then a greedy algorithm(*), will yield better result then A*, and will still be optimal [since your heuristic is perfect!]. It will develop only the nodes needed for the solution. Unfortunately, it is seldom the case that you have a perfect heuristic.
(*)greedy algorithm: always develop the node with the lowest h value.
However, if your heuristic is very bad: h=0, then A* is actually a BFS! And A* in this case will develop O(B^d) nodes, where B is the branch factor and d is the number of steps required for solving.
In this case, since you have a single target function, a bi-directional search (*) will be more efficient, since it needs to develop only O(2*B^(d/2))=O(B^(d/2)) nodes, which is much less then what A* will develop.
bi directional search: (*)run BFS from the target and from the start nodes, each iteration is one step from each side, the algorithm ends when there is a common vertex in both fronts.
For the average case, if you have a heuristic which is not perfect, but not completely terrbile, A* will probably perform better then both solutions.
Possible optimization for average case: You also can run bi-directional search with A*: from the start side, you can run A* with your heuristic, and a regular BFS from the target side. Will it get a solution faster? no idea, you should probably benchmark the two possibilities and find which is better. However, the solution found with this algorithm will also be optimal, like BFS and A*.
The performance of A* is based on the quality of the expected cost heuristic, as you learned in the videos. Getting your expected cost heuristic to match as closely as possible to the actual cost from that state will reduce the total number of states that need to be expanded. There are also a number of variations that perform better under certain circumstances, like for instance when faced with hardware restrictions in large state space searching.

What is the difference between a heuristic and an algorithm?

What is the difference between a heuristic and an algorithm?
An algorithm is the description of an automated solution to a problem. What the algorithm does is precisely defined. The solution could or could not be the best possible one but you know from the start what kind of result you will get. You implement the algorithm using some programming language to get (a part of) a program.
Now, some problems are hard and you may not be able to get an acceptable solution in an acceptable time. In such cases you often can get a not too bad solution much faster, by applying some arbitrary choices (educated guesses): that's a heuristic.
A heuristic is still a kind of an algorithm, but one that will not explore all possible states of the problem, or will begin by exploring the most likely ones.
Typical examples are from games. When writing a chess game program you could imagine trying every possible move at some depth level and applying some evaluation function to the board. A heuristic would exclude full branches that begin with obviously bad moves.
In some cases you're not searching for the best solution, but for any solution fitting some constraint. A good heuristic would help to find a solution in a short time, but may also fail to find any if the only solutions are in the states it chose not to try.
An algorithm is typically deterministic and proven to yield an optimal result
A heuristic has no proof of correctness, often involves random elements, and may not yield optimal results.
Many problems for which no efficient algorithm to find an optimal solution is known have heuristic approaches that yield near-optimal results very quickly.
There are some overlaps: "genetic algorithms" is an accepted term, but strictly speaking, those are heuristics, not algorithms.
Heuristic, in a nutshell is an "Educated guess". Wikipedia explains it nicely. At the end, a "general acceptance" method is taken as an optimal solution to the specified problem.
Heuristic is an adjective for
experience-based techniques that help
in problem solving, learning and
discovery. A heuristic method is used
to rapidly come to a solution that is
hoped to be close to the best possible
answer, or 'optimal solution'.
Heuristics are "rules of thumb",
educated guesses, intuitive judgments
or simply common sense. A heuristic is
a general way of solving a problem.
Heuristics as a noun is another name
for heuristic methods.
In more precise terms, heuristics
stand for strategies using readily
accessible, though loosely applicable,
information to control problem solving
in human beings and machines.
While an algorithm is a method containing finite set of instructions used to solving a problem. The method has been proven mathematically or scientifically to work for the problem. There are formal methods and proofs.
Heuristic algorithm is an algorithm that is able to produce an
acceptable solution to a problem in
many practical scenarios, in the
fashion of a general heuristic, but
for which there is no formal proof of
its correctness.
An algorithm is a self-contained step-by-step set of operations to be performed 4, typically interpreted as a finite sequence of (computer or human) instructions to determine a solution to a problem such as: is there a path from A to B, or what is the smallest path between A and B. In the latter case, you could also be satisfied with a 'reasonably close' alternative solution.
There are certain categories of algorithms, of which the heuristic algorithm is one. Depending on the (proven) properties of the algorithm in this case, it falls into one of these three categories (note 1):
Exact: the solution is proven to be an optimal (or exact solution) to the input problem
Approximation: the deviation of the solution value is proven to be never further away from the optimal value than some pre-defined bound (for example, never more than 50% larger than the optimal value)
Heuristic: the algorithm has not been proven to be optimal, nor within a pre-defined bound of the optimal solution
Notice that an approximation algorithm is also a heuristic, but with the stronger property that there is a proven bound to the solution (value) it outputs.
For some problems, noone has ever found an 'efficient' algorithm to compute the optimal solutions (note 2). One of those problems is the well-known Traveling Salesman Problem. Christophides' algorithm for the Traveling Salesman Problem, for example, used to be called a heuristic, as it was not proven that it was within 50% of the optimal solution. Since it has been proven, however, Christophides' algorithm is more accurately referred to as an approximation algorithm.
Due to restrictions on what computers can do, it is not always possible to efficiently find the best solution possible. If there is enough structure in a problem, there may be an efficient way to traverse the solution space, even though the solution space is huge (i.e. in the shortest path problem).
Heuristics are typically applied to improve the running time of algorithms, by adding 'expert information' or 'educated guesses' to guide the search direction. In practice, a heuristic may also be a sub-routine for an optimal algorithm, to determine where to look first.
(note 1): Additionally, algorithms are characterised by whether they include random or non-deterministic elements. An algorithm that always executes the same way and produces the same answer, is called deterministic.
(note 2): This is called the P vs NP problem, and problems that are classified as NP-complete and NP-hard are unlikely to have an 'efficient' algorithm. Note; as #Kriss mentioned in the comments, there are even 'worse' types of problems, which may need exponential time or space to compute.
There are several answers that answer part of the question. I deemed them less complete and not accurate enough, and decided not to edit the accepted answer made by #Kriss
Actually I don't think that there is a lot in common between them. Some algorithm use heuristics in their logic (often to make fewer calculations or get faster results). Usually heuristics are used in the so called greedy algorithms.
Heuristics is some "knowledge" that we assume is good to use in order to get the best choice in our algorithm (when a choice should be taken). For example ... a heuristics in chess could be (always take the opponents' queen if you can, since you know this is the stronger figure). Heuristics do not guarantee you that will lead you to the correct answer, but (if the assumptions is correct) often get answer which are close to the best in much shorter time.
An Algorithm is a clearly defined set of instructions to solve a problem, Heuristics involve utilising an approach of learning and discovery to reach a solution.
So, if you know how to solve a problem then use an algorithm. If you need to develop a solution then it's heuristics.
Heuristics are algorithms, so in that sense there is none, however, heuristics take a 'guess' approach to problem solving, yielding a 'good enough' answer, rather than finding a 'best possible' solution.
A good example is where you have a very hard (read NP-complete) problem you want a solution for but don't have the time to arrive to it, so have to use a good enough solution based on a heuristic algorithm, such as finding a solution to a travelling salesman problem using a genetic algorithm.
Algorithm is a sequence of some operations that given an input computes something (a function) and outputs a result.
Algorithm may yield an exact or approximate values.
It also may compute a random value that is with high probability close to the exact value.
A heuristic algorithm uses some insight on input values and computes not exact value (but may be close to optimal).
In some special cases, heuristic can find exact solution.
A heuristic is usually an optimization or a strategy that usually provides a good enough answer, but not always and rarely the best answer. For example, if you were to solve the traveling salesman problem with brute force, discarding a partial solution once its cost exceeds that of the current best solution is a heuristic: sometimes it helps, other times it doesn't, and it definitely doesn't improve the theoretical (big-oh notation) run time of the algorithm
I think Heuristic is more of a constraint used in Learning Based Model in Artificial Intelligent since the future solution states are difficult to predict.
But then my doubt after reading above answers is
"How would Heuristic can be successfully applied using Stochastic Optimization Techniques? or can they function as full fledged algorithms when used with Stochastic Optimization?"
http://en.wikipedia.org/wiki/Stochastic_optimization
One of the best explanations I have read comes from the great book Code Complete, which I now quote:
A heuristic is a technique that helps you look for an answer. Its
results are subject to chance because a heuristic tells you only how
to look, not what to find. It doesn’t tell you how to get directly
from point A to point B; it might not even know where point A and
point B are. In effect, a heuristic is an algorithm in a clown suit.
It’s less predict- able, it’s more fun, and it comes without a 30-day,
money-back guarantee.
Here is an algorithm for driving to someone’s house: Take Highway 167
south to Puy-allup. Take the South Hill Mall exit and drive 4.5 miles
up the hill. Turn right at the light by the grocery store, and then
take the first left. Turn into the driveway of the large tan house on
the left, at 714 North Cedar.
Here’s a heuristic for getting to someone’s house: Find the last
letter we mailed you. Drive to the town in the return address. When
you get to town, ask someone where our house is. Everyone knows
us—someone will be glad to help you. If you can’t find anyone, call us
from a public phone, and we’ll come get you.
The difference between an algorithm and a heuristic is subtle, and the
two terms over-lap somewhat. For the purposes of this book, the main
difference between the two is the level of indirection from the
solution. An algorithm gives you the instructions directly. A
heuristic tells you how to discover the instructions for yourself, or
at least where to look for them.
They find a solution suboptimally without any guarantee as to the quality of solution found, it is obvious that it makes sense to the development of heuristics only polynomial. The application of these methods is suitable to solve real world problems or large problems so awkward from the computational point of view that for them there is not even an algorithm capable of finding an approximate solution in polynomial time.

Resources