I'm running both gradient ascent and hill climbing on a landscape to assess which one can reach the greatest height in fewer steps. The outcome of my test is that hill climbing always manages to reach greater heights in fewer increments than gradient ascent. What is the reason behind this? I thought gradient ascent would be more efficient. Does anyone with experience of these algorithms have something to say about this outcome?
Thank you.
In hill climbing for 1 dimension, I try two neighbors: a small delta to the left and one to the right of my current point, and then keep the one that gives a higher value of the objective function. How do I extend this to an n-dimensional space? How does one define a neighbor in an n-dimensional space? Do I have to try 2^n neighbors (a delta applied to each dimension)?
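To make the counting concrete: stepping +delta or -delta along one coordinate at a time gives 2n neighbors, whereas perturbing every coordinate by +/-delta at once gives the 2^n corners of a hypercube. A small Python sketch (the function names are illustrative only):

```python
import itertools

def axis_neighbours(x, delta):
    """Change one coordinate by +/-delta at a time: 2 * n neighbours."""
    result = []
    for i in range(len(x)):
        for sign in (1, -1):
            y = list(x)
            y[i] += sign * delta
            result.append(y)
    return result

def corner_neighbours(x, delta):
    """Change every coordinate by +/-delta at once: 2 ** n neighbours."""
    return [[xi + s * delta for xi, s in zip(x, signs)]
            for signs in itertools.product((1, -1), repeat=len(x))]

x = [0.0, 0.0, 0.0]
print(len(axis_neighbours(x, 0.1)), len(corner_neighbours(x, 0.1)))  # 6 8
```

With n = 20 the corner set already has over a million points (2^20 = 1,048,576), so enumerating it quickly becomes impractical.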
You don't need to compare each pair of neighbors; you need to compute a set of neighbors, e.g. on a circle (a sphere/hypersphere in higher dimensions) with a radius of delta, and then take the one with the highest value to "climb up". In any case, you will discretize the neighborhood of your current solution and compute the score function for each neighbor. If you can differentiate your function, gradient ascent/descent algorithms may solve your problem:
1) Compute the gradient (direction of steepest ascent)
2) Take a small step in the direction of the gradient
3) Stop when the solution no longer changes
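The three steps above can be sketched in Python roughly as follows, assuming you can supply the gradient as a function. The step size, tolerance and iteration cap are arbitrary illustrative values, not recommendations.

```python
import numpy as np

def gradient_ascent(grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """Follow the gradient uphill until the position practically stops changing."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # 1) Compute the gradient (direction of steepest ascent).
        g = grad(x)
        # 2) Take a small step in the direction of the gradient.
        x_new = x + step * g
        # 3) Stop when the solution no longer (meaningfully) changes.
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Example landscape f(x, y) = -(x - 1)^2 - (y + 2)^2, whose peak is at (1, -2).
peak = gradient_ascent(lambda p: np.array([-2 * (p[0] - 1), -2 * (p[1] + 2)]), [0.0, 0.0])
```

If the step is too large the iterates can overshoot or diverge; if it is too small, many steps are needed, which is one reason step-count comparisons between methods depend heavily on this parameter.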
A common problem with these algorithms is that they often find only local maxima/minima. You can find a great overview of gradient descent/ascent algorithms here: http://sebastianruder.com/optimizing-gradient-descent/
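The hypersphere neighborhood from the first paragraph can be sketched by drawing random unit directions (normalised Gaussian vectors) and scaling them by delta. Two choices here go beyond the answer and are assumptions: a fixed number k of sampled neighbors per step, and halving delta whenever no sampled neighbor improves, so that the search can refine its position near an optimum.

```python
import numpy as np

def sphere_neighbours(x, delta, k, rng):
    """k random points on the (hyper)sphere of radius delta around x."""
    d = rng.normal(size=(k, len(x)))
    d /= np.linalg.norm(d, axis=1, keepdims=True)  # random unit directions
    return x + delta * d

def hill_climb(f, x0, delta=0.5, k=20, min_delta=1e-6, seed=0):
    """Maximise f by repeatedly moving to the best sampled neighbour."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    while delta > min_delta:
        cands = sphere_neighbours(x, delta, k, rng)
        vals = np.array([f(c) for c in cands])
        best = int(np.argmax(vals))
        if vals[best] > fx:
            x, fx = cands[best], vals[best]  # climb up
        else:
            delta /= 2  # no better neighbour found: shrink the radius (assumption)
    return x, fx
```

Sampling k directions keeps the cost per step at k evaluations regardless of the dimension n, at the price of possibly missing the best direction.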
If you are using IEEE-754 floating-point numbers, then the obvious answer is something like (2^52*(log_2(delta)+1023))^(n-1)+1 if delta>=2^(-1022) (more or less, depending on your search space), as that is the only way you can be certain that there are no more neighboring solutions within a distance of delta.
Even if you instead take a random fixed-size sample of all points within a distance delta, let's say delta=.1, you would still have the problem that, if the distance from the local optimum were .0001, the probability of finding an improvement in just 1 dimension would be less than .0001/.1/2 = 0.05%, so you would need to take more and more random samples as you get closer to the local optimum (whose value you don't know).
Obviously hill climbing is not intended for the real number space or theoretical graph spaces with infinite degree. You should instead be using a global search algorithm.
One example of a multidimensional search algorithm which needs only O(n) neighbours instead of O(2^n) neighbours is the Torczon simplex method described in Multidirectional search: A direct search algorithm for parallel machines (1989). I chose this over the more widely known Nelder-Mead method because the Torczon simplex method has a convergence proof (convergence to a local optimum given some reasonable conditions).
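Below is a simplified sketch of Torczon-style multidirectional search for maximisation. It is written from the general reflect/expand/contract scheme rather than from the paper's exact pseudocode; the expansion factor 2, contraction factor 1/2, axis-aligned starting simplex and stopping rule are assumptions. Each iteration evaluates at most 2n new points (n reflections plus possibly n expansions), which is the O(n) behaviour mentioned above.

```python
import numpy as np

def multidirectional_search(f, x0, step=1.0, tol=1e-8, max_iter=10_000):
    """Maximise f with a simplified Torczon-style multidirectional search."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    # Initial simplex: x0 plus one step along each coordinate axis (n + 1 vertices).
    simplex = [x0] + [x0 + step * np.eye(n)[i] for i in range(n)]
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        # Put the best vertex first.
        order = np.argsort(values)[::-1]
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        best = simplex[0]
        size = max(np.linalg.norm(v - best) for v in simplex[1:])
        if size < tol:
            break
        # Reflection: mirror the other n vertices through the best one.
        reflected = [2 * best - v for v in simplex[1:]]
        r_vals = [f(v) for v in reflected]
        if max(r_vals) > values[0]:
            # Expansion: try going twice as far in the same directions.
            expanded = [3 * best - 2 * v for v in simplex[1:]]
            e_vals = [f(v) for v in expanded]
            if max(e_vals) > max(r_vals):
                simplex[1:], values[1:] = expanded, e_vals
            else:
                simplex[1:], values[1:] = reflected, r_vals
        else:
            # Contraction: shrink the simplex halfway towards the best vertex.
            contracted = [(best + v) / 2 for v in simplex[1:]]
            simplex[1:], values[1:] = contracted, [f(v) for v in contracted]
    i = int(np.argmax(values))
    return simplex[i], values[i]
```

Because every step is a reflection, expansion or contraction about the best vertex, the simplex keeps its shape and cannot collapse into a degenerate one, unlike some Nelder-Mead runs.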
For an optimization course, we are encouraged to think of an algorithm for solving the Travelling Salesman Problem. I have an idea for a solution, and I think it's pretty rad. The steps of the algorithm are listed below; the order of the steps corresponds to the order of the images.
1) Find the center of the points (red dot)
2) Travel around the center
3) Remove the longest line
The main sub-problem here is step 2. How does one go about traveling around a center point? Is there a good algorithm to do this?
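One common way to do step 2 is to sort the points by their polar angle around the center, using atan2. The sketch below strings the three steps together under two assumptions: the center is the centroid (mean) of the points, and the "longest line" means the longest edge of the closed loop. Removing an edge leaves an open path rather than a closed tour.

```python
import math

def centre_tour(points):
    """Sketch of the three steps: centroid, angular sweep, drop the longest edge."""
    # Step 1: take the centre to be the centroid (mean) of the points.
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    # Step 2: "travel around the centre" by visiting points in order of polar angle.
    order = sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    # Closing the loop gives n edges; edge i joins order[i] and order[i + 1].
    edges = [(order[i], order[(i + 1) % len(order)]) for i in range(len(order))]
    # Step 3: remove the longest edge, so the path starts just after it.
    longest = max(range(len(edges)), key=lambda i: math.dist(*edges[i]))
    return order[longest + 1:] + order[:longest + 1]

print(centre_tour([(0, 0), (2, 0), (2, 1), (0, 1)]))
```

The angular sort is O(n log n), but the resulting order ignores distances entirely, so point sets that are far from convex (for example, points clustered on one side of the centroid) can produce long zig-zagging paths.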
I have another question about the algorithm's performance. How bad is this algorithm? Can you show me an example where it would give a worst-case answer?
I'm trying to understand the difference between hill climbing and gradient descent and how they differ in solving a problem. I have looked at the algorithms and their internals, but it would be good to hear from others who already have experience with them. Specifically, I would like to know how they would behave differently on the same problem.
Thank you.
Difference
The main difference between the two is the direction in which they move to reach a local minimum (or maximum).
In hill climbing we change only one element of the vector at a time, then calculate the value of the function and keep the change if the value improves. We keep changing one element of the vector until no such move improves the position. In 3D space the move can be visualised as moving along any one of the axial directions: the x, y or z axis.
In gradient descent we take steps in the direction of the negative gradient at the current point to reach the minimum (the positive gradient in the case of a maximum). For example, in 3D space the direction need not be an axial direction.
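To see the difference on the same problem, here is a minimal sketch on the example bowl f(x, y) = (x - 1)^2 + (y - 2)^2, chosen only for illustration. The hill-climbing step follows the description above (try +/-delta on one coordinate at a time and keep the first improving move); the gradient-descent step moves along the negative gradient. From the origin, the first hill-climbing move is along the x axis only, while the first gradient step points towards (1, 2), a non-axial direction.

```python
import numpy as np

def f(p):
    # Example bowl with its minimum at (1, 2).
    return (p[0] - 1) ** 2 + (p[1] - 2) ** 2

def grad_f(p):
    return np.array([2 * (p[0] - 1), 2 * (p[1] - 2)])

def hill_climb_step(f, p, delta):
    """Try +/-delta on one coordinate at a time; return the first improving move."""
    for i in range(len(p)):
        for sign in (1, -1):
            q = p.copy()
            q[i] += sign * delta
            if f(q) < f(p):
                return q  # moved along a single axis
    return p  # no axial move improves: a local minimum at this delta

def gradient_descent_step(grad, p, lr):
    """Move along the negative gradient, which is generally not axis-aligned."""
    return p - lr * grad(p)

p0 = np.array([0.0, 0.0])
print(hill_climb_step(f, p0, 0.5))           # changes x only
print(gradient_descent_step(grad_f, p0, 0.1))  # changes x and y together
```

Repeating either step moves towards (1, 2); the hill climber gets there in a staircase of axial moves, the gradient method along a path that follows the slope.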
In addition to radbrawler's answer: the two are, however, similar in the greedy approach both use to find local minima/maxima. I would consider gradient descent to be the continuous version of the discrete hill climbing technique.
I want to implement a robot path planning program using the hill climbing algorithm.
I understand the basics of the hill climbing algorithm, but I can't think of how to apply it!
I also Googled the hill climbing algorithm, but I cannot find any information about robot path planning with it.
I find it hard to implement the start function, the neighbor-choosing function, and the function that checks/draws the path using Bresenham's line algorithm.
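For the last part, here is a sketch of the integer-only, all-octant form of Bresenham's line algorithm, plus a hypothetical helper that uses it to check whether a straight segment crosses an obstacle. The occupancy grid format (a list of rows, 0 = free, 1 = blocked, indexed as grid[y][x]) is an assumption.

```python
def bresenham(x0, y0, x1, y1):
    """Grid cells on the line from (x0, y0) to (x1, y1), in order, for any octant."""
    cells = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy  # combined error term
    while True:
        cells.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy
    return cells

def segment_is_free(grid, a, b):
    """Hypothetical helper: True if no cell on the line from a to b is blocked."""
    return all(grid[y][x] == 0 for x, y in bresenham(*a, *b))

print(bresenham(0, 0, 3, 1))  # [(0, 0), (1, 0), (2, 1), (3, 1)]
```

The same cell list can be used to draw the path, and a hill climber could use segment_is_free to reject neighbor waypoints that cannot be reached in a straight line.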
It all depends on which pathfinding algorithm you are using, of course, but essentially you just add a multiplier to the 'cost' associated with climbing hills. Something as simple as:
//Pseudo-code
MovementCost = FlatDistance + (HillClimbAltitude * 2)
//Where 2 is the 'effort' involved in climbing compared to a flat distance
Would suffice. This also easily accommodates a cost reduction where a decline (downhill) is involved. You could fancy it up by making the cost increase with the angle of the incline, etc.
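A slightly fuller sketch of the same idea, with a separate discount for downhill moves and the result clamped at zero, since Dijkstra's algorithm and A* both assume non-negative edge costs. The weights climb_effort and descent_discount are hypothetical tuning values.

```python
import math

def movement_cost(a, b, climb_effort=2.0, descent_discount=0.5):
    """Cost of moving from a to b, where each point is (x, y, altitude).

    climb_effort and descent_discount are hypothetical tuning weights.
    """
    flat_distance = math.hypot(b[0] - a[0], b[1] - a[1])
    rise = b[2] - a[2]
    if rise >= 0:
        # Uphill: pay extra in proportion to the altitude gained.
        cost = flat_distance + climb_effort * rise
    else:
        # Downhill: a partial refund for the altitude lost.
        cost = flat_distance - descent_discount * (-rise)
    # Dijkstra's algorithm and A* assume non-negative edge costs.
    return max(cost, 0.0)
```

For an angle-based version, the rise could be replaced by a function of atan2(rise, flat_distance).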
I am studying and implementing the Greedy algorithm for active contours as described in the paper by Donna Williams, "A Fast Algorithm For Active Contours And Curvature Estimation".
One of its advantages over another implementation (by Kass et al.) should be a uniform distribution of points along the contour curve. In every iteration, each point tries to move so that its distance to the previous point is as close to the average as possible.
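To make the mechanism concrete, here is a simplified sketch of one greedy pass in the style described: each point looks at its 3x3 pixel neighborhood and moves to the cell with the lowest combined energy. Assumptions: only a continuity term (distance to the previous point versus the average spacing) and a squared second-difference curvature term are included, the image term is left out, each term is normalised by its largest value in the neighborhood, and the weights alpha and beta are illustrative.

```python
import numpy as np

def greedy_snake_step(points, alpha=1.0, beta=1.0):
    """One greedy pass over a closed contour (continuity + curvature terms only)."""
    pts = np.array(points, dtype=float)
    n = len(pts)
    # Average distance between consecutive points, computed once per pass.
    d_avg = np.mean(np.linalg.norm(pts - np.roll(pts, 1, axis=0), axis=1))
    # (0, 0) first, so that ties keep the point where it is.
    offsets = [(0, 0)] + [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                          if (dx, dy) != (0, 0)]
    for i in range(n):
        prev_pt, next_pt = pts[i - 1], pts[(i + 1) % n]
        cands = pts[i] + np.array(offsets, dtype=float)
        # Continuity: distance to the previous point should be close to d_avg.
        e_cont = np.abs(d_avg - np.linalg.norm(cands - prev_pt, axis=1))
        # Curvature: squared discrete second derivative.
        e_curv = np.sum((prev_pt - 2 * cands + next_pt) ** 2, axis=1)
        energy = (alpha * e_cont / max(e_cont.max(), 1e-12)
                  + beta * e_curv / max(e_curv.max(), 1e-12))
        # Greedy: move to the lowest-energy cell (earlier points are already updated).
        pts[i] = cands[np.argmin(energy)]
    return pts
```

Because d_avg is recomputed from the current contour on every pass, the continuity term on its own only pulls spacings towards the current average; any shrinking has to come from the other terms.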
The contour is expected to be drawn around an object in image and then to shrink around it until it is "attached" to the object edges.
But the problem is that the contour won't shrink. It evolves so that the points are equally spaced along the contour, but it cannot shrink around the image object because the distances between points would go below the average and the algorithm would move them back.
Do you have any thoughts on this? What am I missing? Other implementations of active contours do shrink, but have other drawbacks, and the Greedy algorithm is supposed to be better and more stable.
Researchers rarely emphasize the disadvantages of their new solutions.
Don't trust the paper too much if you haven't heard from other sources that this algorithm works.
I would implement only an algorithm if it is well accepted in literature (or if I have invented it ;-) ).
Companies need a robust solution that works; a researcher must publish something new, which may be less usable in practice and sometimes only works well on specific test sets.