Please explain the difference between "hill climbing" and "greedy" algorithms.
It seems both are similar, and I have doubts that "hill climbing" is an algorithm; it seems to be an optimization technique. Is this correct?
Hill-climbing and greedy algorithms are both heuristics that can be used for optimization problems. In an optimization problem, we generally seek some optimum combination or ordering of problem elements. A given combination or ordering is a solution. In either case, a solution can be evaluated to compare it against other solutions.
In a hill-climbing heuristic, you start with an initial solution. Generate one or more neighboring solutions. Pick the best and continue until there are no better neighboring solutions. This will generally yield one solution. In hill-climbing, we need to know how to evaluate a solution, and how to generate a "neighbor."
In a greedy heuristic, we need to know something special about the problem at hand. A greedy algorithm uses that information to build a single solution directly, making the locally best choice at each step.
A good example of an optimization problem is the 0-1 knapsack problem. In this problem, there is a knapsack with a certain weight limit, and a bunch of items to put in the knapsack. Each item has a weight and a value. The objective is to maximize the total value of the items in the knapsack while keeping the weight within the limit.
A greedy algorithm would pick the objects of highest density (value per unit of weight) and put them in until the knapsack is full. For example, compared to a brick, a diamond has a high value and a small weight, so we would put the diamond in first.
Here is an example of where a greedy algorithm would fail: say you have a knapsack with capacity 100. You have the following items:
Diamond, value 1000, weight 90 (density = 11.1)
5 gold coins, each with value 210 and weight 20 (density each = 10.5)
The greedy algorithm would put in the diamond and then be done, since no coin fits in the remaining capacity of 10, giving a value of 1000. But the optimal solution would be to include all 5 gold coins (total weight 100), giving a value of 1050.
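A minimal Python sketch of the example above, assuming each item is a hypothetical `(value, weight)` pair and that the greedy pass skips any item that no longer fits (one reasonable reading of "put them in until the knapsack is full"). The brute-force check tries every subset, so it is only practical for a handful of items.

```python
from itertools import product
from typing import List, Tuple

Item = Tuple[int, int]  # (value, weight)


def greedy_by_density(items: List[Item], capacity: int) -> int:
    """Take items in decreasing value/weight order, skipping any that no longer fit."""
    order = sorted(range(len(items)), key=lambda i: items[i][0] / items[i][1], reverse=True)
    total_value = total_weight = 0
    for i in order:
        value, weight = items[i]
        if total_weight + weight <= capacity:
            total_value += value
            total_weight += weight
    return total_value


def best_by_brute_force(items: List[Item], capacity: int) -> int:
    """Check all 2**n subsets; only feasible for very small n."""
    best = 0
    for picks in product((False, True), repeat=len(items)):
        chosen = [item for item, take in zip(items, picks) if take]
        if sum(w for _, w in chosen) <= capacity:
            best = max(best, sum(v for v, _ in chosen))
    return best


# The diamond and the five gold coins from the example, capacity 100.
items = [(1000, 90)] + [(210, 20)] * 5
print(greedy_by_density(items, 100))    # 1000
print(best_by_brute_force(items, 100))  # 1050
```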
The hill-climbing algorithm would generate an initial solution: just randomly choose some items (ensuring they are within the weight limit). Then evaluate the solution, that is, determine its value. Generate a neighboring solution, for example by exchanging one item for another (ensuring you are still within the weight limit). If this has a higher value, use this selection and start over.
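The steps above can be sketched in Python as follows. This is an illustration only: `hill_climb`, `neighbors`, and the chosen neighborhood (add one item, or exchange one chosen item for an unchosen one) are assumptions, items are hypothetical `(value, weight)` pairs, and an explicitly given start solution is assumed to be within the weight limit.

```python
import random
from typing import FrozenSet, Iterator, List, Optional, Set, Tuple

Item = Tuple[int, int]  # (value, weight)


def evaluate(items: List[Item], chosen: FrozenSet[int]) -> Tuple[int, int]:
    """Total (value, weight) of the chosen item indices."""
    return sum(items[i][0] for i in chosen), sum(items[i][1] for i in chosen)


def neighbors(n: int, chosen: FrozenSet[int]) -> Iterator[FrozenSet[int]]:
    """Add one unchosen item, or exchange one chosen item for an unchosen one."""
    outside = [j for j in range(n) if j not in chosen]
    for j in outside:
        yield chosen | {j}
    for i in chosen:
        for j in outside:
            yield (chosen - {i}) | {j}


def random_start(items: List[Item], capacity: int, rng: random.Random) -> FrozenSet[int]:
    """Randomly pick some items, skipping any that would exceed the weight limit."""
    chosen: Set[int] = set()
    weight = 0
    for i in rng.sample(range(len(items)), len(items)):
        if rng.random() < 0.5 and weight + items[i][1] <= capacity:
            chosen.add(i)
            weight += items[i][1]
    return frozenset(chosen)


def hill_climb(items: List[Item], capacity: int,
               start: Optional[Set[int]] = None, seed: int = 0) -> Tuple[int, FrozenSet[int]]:
    """Move to the best feasible neighbor until no neighbor has a higher value."""
    if start is None:
        current = random_start(items, capacity, random.Random(seed))
    else:
        current = frozenset(start)  # assumed to be within the weight limit
    current_value = evaluate(items, current)[0]
    while True:
        best, best_value = None, current_value
        for candidate in neighbors(len(items), current):
            value, weight = evaluate(items, candidate)
            if weight <= capacity and value > best_value:
                best, best_value = candidate, value
        if best is None:  # no better neighbor: a local optimum
            return current_value, current
        current, current_value = best, best_value


items = [(1000, 90)] + [(210, 20)] * 5
print(hill_climb(items, 100, start=set())[0])      # 1000 (stuck at the diamond)
print(hill_climb(items, 100, start={1, 2, 3})[0])  # 1050
```

On the diamond/coin instance, starting from an empty knapsack climbs to the diamond and stops at 1000, because no single addition or exchange improves on it; starting from three coins climbs to the optimum of 1050. The result is a local optimum that depends on where the search starts.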
Hill climbing is not a greedy algorithm.
Yes you are correct. Hill climbing is a general mathematical optimization technique (see: http://en.wikipedia.org/wiki/Hill_climbing). A greedy algorithm is any algorithm that simply picks the best choice it sees at the time and takes it.
An example of this is making change while minimizing the number of coins (at least with USD). You take the most of the highest denomination of coin, then the most of the next highest, until you reach the amount needed.
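A short sketch of the change-making strategy just described, assuming US coin denominations of 25, 10, 5, and 1 cents; `greedy_change` is a hypothetical helper name.

```python
from typing import Dict, List

US_COINS = [25, 10, 5, 1]  # quarter, dime, nickel, penny (in cents)


def greedy_change(amount: int, denominations: List[int]) -> Dict[int, int]:
    """Take as many of the largest coin as possible, then the next largest, and so on."""
    counts: Dict[int, int] = {}
    for coin in sorted(denominations, reverse=True):
        counts[coin], amount = divmod(amount, coin)
    if amount:
        raise ValueError("amount cannot be made exactly with these denominations")
    return counts


print(greedy_change(67, US_COINS))  # {25: 2, 10: 1, 5: 1, 1: 2}
```

The "at least with USD" caveat matters: with hypothetical denominations 4, 3, and 1, `greedy_change(6, [4, 3, 1])` uses 3 coins (4 + 1 + 1), although 3 + 3 needs only 2.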
In this sense, hill climbing is a greedy algorithm: at each step it takes the best move it can see.
Let’s use LeetCode 322, Coin Change, as an example.
I know it is best solved by using Dynamic Programming, but I want to focus on my Brute Force solution:
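from typing import List

class Solution:
    def coinChange(self, coins: List[int], amount: int) -> int:
        def helper(amount: int) -> float:
            # Minimum number of coins that make `amount` (inf if impossible).
            if amount < 0:
                return float('inf')
            if amount == 0:
                return 0
            # Keep the running minimum local to this call; a single shared
            # nonlocal variable would mix results from different sub-amounts
            # (e.g. coins=[1], amount=2 would wrongly return 1).
            curr_min = float('inf')
            for coin in coins:
                curr_min = min(curr_min, helper(amount - coin) + 1)
            return curr_min

        ans = helper(amount)
        return -1 if ans == float('inf') else ans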
The recursion tree looks like this: [figure: Recursion Tree]
I can say it is Divide and Conquer: we are dividing the problem into smaller sub-problems, solving them individually, and using those individual results to construct the result for the original problem.
I can also say it is Backtracking: we are enumerating all combinations of coin frequencies which satisfy the constraints.
I know both are implemented via Recursion, but I would like to know which paradigm my Brute Force solution belongs to: Divide and Conquer or Backtracking.
A complication in categorizing your algorithm is that there aren’t clear, well-defined boundaries between different classes of algorithms and different people might have slightly different definitions in mind.
For example, generally speaking, divide-and-conquer algorithms involve breaking the problem apart into non-overlapping subproblems. (See, for example, mergesort, quicksort, binary search, closest pair of points, etc.) In that sense, your algorithm doesn’t nicely map onto the divide-and-conquer paradigm, since the subproblems you’re considering involve some degree of overlap in the subproblems they solve. (Then again, not all divide-and-conquer algorithms have this property. See, for example, stoogesort.)
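To see the overlap concretely, this sketch (hypothetical instrumentation, not part of the question's code) counts how often the plain brute-force recursion is called for each sub-amount.

```python
from collections import Counter
from typing import List


def count_subproblem_calls(coins: List[int], amount: int) -> Counter:
    """Run the plain brute-force recursion and count calls per sub-amount."""
    calls: Counter = Counter()

    def helper(a: int) -> float:
        calls[a] += 1
        if a < 0:
            return float('inf')
        if a == 0:
            return 0
        return min(helper(a - coin) + 1 for coin in coins)

    helper(amount)
    return calls


calls = count_subproblem_calls([1, 2], 4)
print(calls[2], calls[1], calls[0])  # 2 3 5
```

With coins `[1, 2]` and amount 4, the sub-amount 2 is solved twice and the sub-amount 1 three times. The repeats grow quickly with the amount, which is the redundancy that memoization (dynamic programming) removes.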
Similarly, backtracking algorithms usually, but not always, work by committing to a decision, recursively searching to see whether a solution exists given that decision, then unwinding the choice if it turns out not to lead to a solution. Your algorithm doesn’t have this property, since it explores all options and then takes the best. (When I teach intro programming, I usually classify algorithms this way. But my colleagues sometimes describe what you’re doing as backtracking!)
I would classify your algorithm as belonging to a different family: exhaustive search. The algorithm you’ve proposed essentially works by enumerating all possible ways of making change, then returning the one that uses the fewest coins. Exhaustive search algorithms are ones that work by trying all possible options and returning the best, and I think that’s the best way of classifying your strategy.
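An exhaustive search of this kind can also be written with an explicit commit/undo step, which makes the backtracking structure visible. This is a sketch, not the question's code: it enumerates each multiset of coins once (coins are taken in non-increasing order along a path) and keeps the shortest one. `fewest_coins_exhaustive` is a hypothetical name, and coin values are assumed positive.

```python
from typing import List, Optional


def fewest_coins_exhaustive(coins: List[int], amount: int) -> Optional[List[int]]:
    """Enumerate every multiset of coins summing to `amount`; keep one with the fewest coins."""
    denoms = sorted(set(coins), reverse=True)
    best: Optional[List[int]] = None
    chosen: List[int] = []

    def search(start: int, remaining: int) -> None:
        nonlocal best
        if remaining == 0:
            if best is None or len(chosen) < len(best):
                best = list(chosen)
            return
        # Only coins at index >= start, so each multiset is generated once.
        for i in range(start, len(denoms)):
            if denoms[i] <= remaining:
                chosen.append(denoms[i])          # commit to this coin
                search(i, remaining - denoms[i])  # explore everything that follows
                chosen.pop()                      # undo the choice

    search(0, amount)
    return best  # None if the amount cannot be made


print(fewest_coins_exhaustive([1, 2, 5], 11))  # [5, 5, 1]
print(fewest_coins_exhaustive([2], 3))         # None
```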
To me this doesn't fit with either paradigm.
Backtracking, to me, is associated with reaching a point where the candidate cannot be developed further, but here we develop it to its end, infinity, and we don't throw it away; we use it in comparisons.
Divide and conquer I associate with a division into a relatively small number of candidate groups (the classic example is two, like binary search). To call each path in a recursion a group for the sake of Divide and Conquer would lose the latter's meaning.
The most practical answer is it doesn't matter.
The safest answer is recursion. My best interpretation is that it's backtracking.
I think the options here are recursion, backtracking, divide-and-conquer, and dynamic programming.
Recursion is the most general of these, encapsulating backtracking, D&C, and DP. If the code indeed has both backtracking and D&C aspects, then recursion would be the best answer, as it contains both.
In Skiena's The Algorithm Design Manual (ADM), Section 5.3.1, it says:
A typical divide-and-conquer algorithm breaks a given problem into a smaller pieces, each of which is of size n/b.
By this interpretation, the code doesn't meet the definition, as we divide the problem by coins and each coin amount produces a subproblem of a different size.
In Erickson's Algorithms (section 1.6), it says:
divide and conquer:
Divide the given instance of the problem into several independent smaller instances of exactly the same problem.
So in this case, according to the recursion tree, the subproblems are not always independent (they overlap).
Which leaves backtracking. Erickson defines the 'recursive strategy' as:
A backtracking algorithm tries to construct a solution to a computational problem incrementally, one small piece at a time.
This seems general enough to fit all DP problems under it. The provided code can be said to backtrack when a solution path fails.
Additionally, according to Wikipedia:
It is often the most convenient technique for parsing, for the knapsack problem and other combinatorial optimization problems.
Since Coin Change is an Unbounded Knapsack type of problem, it fits the description of backtracking.
I'm reading up on the one dimensional bin packing problem and the different solutions that can be used to solve it.
Bin Packing Problem Definition: Given a list of objects and their weights, and a collection of bins of fixed size, find the smallest number of bins so that all of the objects are assigned to a bin.
Solutions I'm studying: Next Fit, First Fit, Best Fit, Worst Fit, First Fit Decreasing, Best Fit Decreasing
I notice that some articles I read call these "approximation algorithms", and others call these "heuristics". I know that there is a difference between approximation algorithms and heuristics:
Heuristic: With some hard problems, it's difficult to get an acceptable solution in a decent run time, so we can get an "okay" solution by applying some educated guesses, or arbitrarily choosing.
Approximation Algorithm: This gives an approximate solution, with some "guarantee" on its performance (maybe a ratio, or something like that)
So, my question is: are these solutions that I'm studying heuristics or approximation algorithms? I'm more inclined to believe that they are heuristics because we're choosing the next item to be placed in a bin by some "guess". We're not guaranteed an optimal solution. So why do some people call them approximation algorithms?
If these aren't heuristic algorithms, then what are examples of heuristic algorithms to solve the bin packing problem?
An algorithm can be both a heuristic and an approximation algorithm -- the two terms don't conflict. If some "good but not always optimal" strategy (the heuristic) can be proven to be "not too bad" (the approximation guarantee), then it qualifies as both.
All of the algorithms you listed are heuristic because they prescribe a "usually good" strategy, which is the heuristic. For any of the algorithms where there is an approximation guarantee (the "error" must be bounded somehow), then you can also say it's an approximation algorithm.
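A compact sketch of the listed strategies, assuming integer item sizes and a fixed integer bin capacity; the function names are hypothetical. As an example of a guarantee, Next Fit is known never to use more than twice the optimal number of bins, which is what lets it be called an approximation algorithm as well as a heuristic.

```python
from typing import Callable, List, Optional

Picker = Callable[[List[int], int, int], Optional[int]]


def _pack(items: List[int], capacity: int, pick: Picker) -> List[List[int]]:
    """Place items one at a time; `pick` returns an open bin's index, or None for a new bin."""
    bins: List[List[int]] = []
    loads: List[int] = []
    for x in items:
        if x > capacity:
            raise ValueError(f"item {x} is larger than the bin capacity {capacity}")
        i = pick(loads, x, capacity)
        if i is None:
            bins.append([x])
            loads.append(x)
        else:
            bins[i].append(x)
            loads[i] += x
    return bins


def _fitting(loads: List[int], x: int, c: int) -> List[int]:
    return [i for i, load in enumerate(loads) if load + x <= c]


def next_fit(items: List[int], capacity: int) -> List[List[int]]:
    # Only the most recently opened bin is considered.
    return _pack(items, capacity,
                 lambda loads, x, c: len(loads) - 1 if loads and loads[-1] + x <= c else None)


def first_fit(items: List[int], capacity: int) -> List[List[int]]:
    # Lowest-numbered bin with room.
    return _pack(items, capacity, lambda loads, x, c: next(iter(_fitting(loads, x, c)), None))


def best_fit(items: List[int], capacity: int) -> List[List[int]]:
    # Fullest bin that still has room (the tightest fit).
    return _pack(items, capacity, lambda loads, x, c:
                 max(_fitting(loads, x, c), key=lambda i: loads[i], default=None))


def worst_fit(items: List[int], capacity: int) -> List[List[int]]:
    # Emptiest bin that has room.
    return _pack(items, capacity, lambda loads, x, c:
                 min(_fitting(loads, x, c), key=lambda i: loads[i], default=None))


def first_fit_decreasing(items: List[int], capacity: int) -> List[List[int]]:
    return first_fit(sorted(items, reverse=True), capacity)


def best_fit_decreasing(items: List[int], capacity: int) -> List[List[int]]:
    return best_fit(sorted(items, reverse=True), capacity)


sizes = [2, 5, 4, 7, 1, 3, 8]
for strategy in (next_fit, first_fit, best_fit, worst_fit,
                 first_fit_decreasing, best_fit_decreasing):
    print(strategy.__name__, len(strategy(sizes, 10)))
```

On this made-up instance (capacity 10, total size 30, so at least 3 bins), Next Fit uses 5 bins, First Fit, Best Fit, and Worst Fit use 4, and both decreasing variants reach 3.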
I am trying to solve the N-puzzle using the A* algorithm with 3 different heuristic functions. I want to know how to compare each of the heuristics in terms of time complexity. The heuristics I am using are: Manhattan distance, Manhattan distance + linear conflict, and N-max swap. Specifically, I am interested in the 8-puzzle and the 15-puzzle.
Finding the shortest solution to the N-puzzle is NP-hard in general, so no matter what heuristic you use, it's unlikely you'll be able to find any difference in complexity between them, since you won't be able to prove the tightness of any bound.
If you restrict yourself to the 8-puzzle or 15-puzzle, an A* algorithm with any admissible heuristic will run in O(1) time since there are a finite (albeit large) number of board positions.
As @Harold said in his comment, the way to compare the time complexity of heuristic functions is typically by experimental tests. In your case, generate a set of n random problems for the 8-puzzle and the 15-puzzle (checking that each generated position is solvable, since only half of all tile arrangements are) and solve them using the different heuristic functions. Things to be aware of are:
The comparison will always depend on several factors, like hardware specs, programming language, your skill in implementing the algorithm, and so on.
Generally speaking, a more informed heuristic (one that dominates another admissible heuristic) will expand no more nodes than a less informed one, apart from tie-breaking, and will probably be faster.
And finally, in order to compare the three heuristics for each problem set, I would suggest a plot of average running times (for example, running each problem 5 times) where:
The problems are in the x-axis sorted by difficulty.
The running times are in the y-axis for each heuristic function (perhaps in logarithmic scale if the difference between the alternatives cannot be easily seen).
and a similar plot with the number of explored states.
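A minimal sketch of such an experiment for the 8-puzzle, assuming a standard A* with a pluggable heuristic `h` that reports the number of expanded nodes. Only the Manhattan distance and a zero heuristic (as a least-informed baseline) are implemented here; linear conflict and N-max swap would be plugged in as further `h` functions. The names `astar`, `manhattan`, and `no_heuristic` are hypothetical.

```python
import heapq
from itertools import count
from typing import Callable, Dict, Iterator, Optional, Tuple

State = Tuple[int, ...]
N = 3  # board width: 3 for the 8-puzzle (4 would be the 15-puzzle)
GOAL: State = tuple(range(1, N * N)) + (0,)  # 0 marks the blank


def manhattan(state: State) -> int:
    """Sum over tiles of |row - goal_row| + |col - goal_col| (the blank is not counted)."""
    total = 0
    for idx, tile in enumerate(state):
        if tile:
            goal_idx = tile - 1
            total += abs(idx // N - goal_idx // N) + abs(idx % N - goal_idx % N)
    return total


def no_heuristic(state: State) -> int:
    """h = 0: A* degenerates to uniform-cost search."""
    return 0


def neighbors(state: State) -> Iterator[State]:
    blank = state.index(0)
    r, c = divmod(blank, N)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < N and 0 <= nc < N:
            j = nr * N + nc
            board = list(state)
            board[blank], board[j] = board[j], board[blank]
            yield tuple(board)


def astar(start: State, h: Callable[[State], int]) -> Tuple[Optional[int], int]:
    """Return (solution length or None, number of expanded nodes)."""
    tie = count()  # tie-breaker so heapq never has to compare states
    frontier = [(h(start), 0, next(tie), start)]
    best_g: Dict[State, int] = {start: 0}
    expanded = 0
    while frontier:
        _, g, _, state = heapq.heappop(frontier)
        if state == GOAL:
            return g, expanded
        if g > best_g[state]:
            continue  # stale queue entry; a cheaper path was found later
        expanded += 1
        for nxt in neighbors(state):
            ng = g + 1
            if ng < best_g.get(nxt, ng + 1):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + h(nxt), ng, next(tie), nxt))
    return None, expanded


start = (1, 2, 3, 4, 5, 6, 0, 7, 8)
print(astar(start, manhattan), astar(start, no_heuristic))
```

To produce the plots described above, one would time each call with `time.perf_counter()` and record the returned expansion count for every generated problem and heuristic.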
I am going over a review for an upcoming test and was wondering if someone could restate part b of the question. This is the text from the review sheet that was passed out, but I am not sure what part b is asking exactly. More specifically, what does it mean by "yields a solution that is less than 1% of optimal for the 0/1 Knapsack problem"?
a) Solve the following instance of the Knapsack problem, i.e., give fraction of each object chosen and value of optimal Knapsack. Show steps:
Capacity of Knapsack is C = 100
** Here he lists the objects, their values, and their weights in a table **
b) [10pts] Give an example with two objects that shows that the same greedy method used for the fractional Knapsack problem (slightly modified to leave out the last object chosen by the greedy method if it doesn’t fit) yields a solution that is less than 1% of optimal for the 0/1 Knapsack problem.
Usually the greedy heuristic works pretty well for the knapsack problem. If you just come up with a small problem instance at random, it's likely that applying the greedy heuristic will produce a good, or possibly even optimal solution. (The quality of a solution is measured by taking the total value of the objects it includes, and computing the ratio of that to the total value of the objects included in an optimal solution.)
This question is asking you to come up with a nasty problem instance (i.e. a list of objects with values and weights) that confuses the greedy heuristic so much that applying it yields a knapsack containing less than 1% of the value that an optimal solution would contain.
I understand part b) as asking you to show that the greedy algorithm is not optimal and, furthermore, can yield less than 1% of the optimal value. This already happens with small instances. Consider the following instance:
Item 1: profit 2, weight 1 (efficiency is 2/1 = 200/100 = 2)
Item 2: profit 400, weight 400 (efficiency is 400/400 = 1)
Knapsack capacity: 400
Note that the items are given in non-increasing order of efficiency, which is the order in which the greedy algorithm processes them. The greedy algorithm would choose item 1 since it fits, but then item 2 can no longer be chosen. This yields a profit of 2. However, choosing item 2 alone yields a profit of 400. In total, the greedy algorithm yields a profit of 2 where the optimal value is at least 400 (2/400 = 0.5%); hence the greedy algorithm yields less than 1 percent of the optimal profit.
The Knapsack problem belongs to NP. I think the second part asks for a polynomial-time approximation algorithm based on a greedy strategy (where the exact algorithm has exponential time complexity) that gives a solution which differs from the optimal solution by just 1%.
For example, if the optimal answer is 100, the approximation algorithm should give a result of 99 or 101.
I'm developing a trip planner program. Each city has a property called rateOfInterest. Each road between two cities has a time cost. The problem is: given the start city and the specific amount of time we want to spend, how do we output the path that is most interesting (i.e., that maximizes the sum of the cities' rateOfInterest)? I'm thinking of using some greedy algorithm, but is there any algorithm that can guarantee an optimal path?
EDIT: As @robotking said, we allow visiting places multiple times, and only the first visit counts as interesting. We have 50 cities, and each city has approximately 5 adjacent cities. The cost function on each edge is either time or distance. We don't have to visit all cities; with the given cost function, we just need to return an optimal partial trip with the highest ROI. I hope this makes the problem clearer!
This sounds very much like an instance of the TSP in a weighted form, meaning that some vertices are more desirable than others...
Now you could find an optimal path by trying every possible permutation (using backtracking with some pruning to make it faster), depending on the number of cities we are talking about. Note that brute-force TSP is an n! problem, so once n > 10 you can forget it...
If your number of cities is not that small, then finding an optimal path won't be doable, so drop the idea. However, there is most likely a heuristic algorithm that can approximate a good enough solution.
Steven Skiena recommends "Simulated Annealing" as the heuristic of choice for approximating such hard problems. It is very much like a "Hill Climbing" method, but more flexible or forgiving. What I mean is that while in "Hill Climbing" you only ever accept changes that improve your solution, in "Simulated Annealing" there are some cases where you actually accept a change even if it makes your solution worse locally, hoping that down the road you get your money back...
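The difference between the two acceptance rules fits in a few lines. This is a generic sketch under assumptions: maximization, the common `exp(delta / T)` acceptance rule, geometric cooling, and plain integer states for simplicity; `simulated_annealing`, its parameters, and the toy objective are hypothetical. Hill climbing corresponds to accepting only moves that increase the score.

```python
import math
import random
from typing import Callable, Tuple


def acceptance_probability(delta: float, temperature: float) -> float:
    """Maximization: always accept an improvement; accept a worse move with prob exp(delta / T)."""
    if delta >= 0:
        return 1.0
    return math.exp(delta / temperature)


def simulated_annealing(initial: int,
                        neighbor: Callable[[int, random.Random], int],
                        score: Callable[[int], float],
                        t0: float = 10.0, cooling: float = 0.95,
                        steps: int = 1000, seed: int = 0) -> Tuple[int, float]:
    """Return the best state seen and its score."""
    rng = random.Random(seed)
    current, current_score = initial, score(initial)
    best, best_score = current, current_score
    t = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        candidate_score = score(candidate)
        # Hill climbing would accept only candidate_score > current_score.
        if rng.random() < acceptance_probability(candidate_score - current_score, t):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
        t = max(t * cooling, 1e-9)  # geometric cooling, keeping T > 0
    return best, best_score


# Toy usage: maximize -(x - 7)**2 over the integers by stepping x up or down by 1.
best, best_score = simulated_annealing(
    0, lambda x, rng: x + rng.choice((-1, 1)), lambda x: -(x - 7) ** 2)
```

Early on, when `T` is large, worse moves are accepted often; as `T` shrinks the rule approaches plain hill climbing.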
Either way, whatever is used to approximate a TSP-like problem is applicable here.
From http://en.wikipedia.org/wiki/Travelling_salesman_problem, note that the decision problem version is "(where, given a length L, the task is to decide whether any tour is shorter than L)". If somebody gives me a travelling salesman problem to solve I can set all the cities to have the same rate of interest and then the decision problem is whether a most interesting path for time L actually visits all the cities and returns.
So if there was an efficient solution for your problem there would be an efficient solution for the travelling salesman problem, which is unlikely.
If you want to go further than a greedy search, some of the approaches to the travelling salesman problem may be applicable: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.26.5150 describes "Iterated Local Search", which looks interesting, with reference to the TSP.
If you want optimality, use a brute-force exhaustive search where the leaves are the points at which the time runs out. As long as the expected depth of the search tree is less than 10 and the worst case is less than 15, you can produce a practical algorithm.
Now if you think about the future and expect your city network to grow, then you cannot ensure optimality. In this case you are dealing with a local search problem.