Interpolation advice (linear, cubic?) [closed]

I need to find good approximations of the points where an undefined function intersects a threshold value. I'm stepping through my space, and whenever I find that two subsequent steps are on different sides of the threshold, I add a point somewhere in between:
[figure: sampled steps on either side of the threshold, with an estimated crossing point added in between]
My first approach was to just pick the mid-point, but this is obviously a terrible solution:
[figure: crossing estimated at the midpoint of the step]
I'm now using linear interpolation, which gives a reasonable result, but the underlying function will practically never be linear. Thus, this only works well if my step size is small enough:
[figure: crossing estimated by linear interpolation]
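For reference, the linear interpolation step I'm using looks roughly like this (a minimal sketch; the function name is just for illustration):

```python
def linear_crossing(x0, y0, x1, y1, threshold):
    """Estimate where the segment from (x0, y0) to (x1, y1)
    crosses `threshold`, assuming the two samples straddle it."""
    t = (threshold - y0) / (y1 - y0)  # fraction of the step at the crossing
    return x0 + t * (x1 - x0)
```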
Sampling the base function can be quite expensive, but adding one or two additional samples in order to get a much better approximation is something I'd like to try. Is it possible to use cubic interpolation here? Like so:
[figure: crossing estimated by cubic interpolation through nearby samples]
Or are there better ways?
Much obliged,
David Rutten
ps. I'm writing in C#, but this is a language agnostic problem.

The magic word is "root solver"; a mathematical root is the value where the function equals zero. By subtracting the threshold from your function, you can use root finders.
If you have a clue what function you are interpolating, you can set up a very fast root finder. If you don't have a clue, as your post suggests ("undefined"), the best method is Brent's method, a combination of the secant method and bisection, or the secant method alone. Wikipedia has an entry for this method.
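As a sketch of the idea (assuming SciPy is available; the function name is illustrative): subtract the threshold and hand the bracket from your stepping loop to Brent's method:

```python
from scipy.optimize import brentq  # Brent's method: bisection + secant + inverse quadratic

def find_crossing(f, threshold, a, b):
    """Locate x in [a, b] where f crosses `threshold`.
    Assumes f(a) and f(b) lie on opposite sides of the threshold."""
    return brentq(lambda x: f(x) - threshold, a, b)
```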
Contrary to your suggestion, it is not a good idea to use more complicated interpolation functions. The main performance hurdle is function evaluations, which increase when you use more points, compute derivatives, or fit more complex interpolation functions.
The Newton-Raphson method is VERY BAD if you are near a maximum/minimum/inflection point, because the near-zero derivative sends you far away from the root, and it has some other problems too. Do not use it until you know what you are doing.

Your last picture shows only three points, which suffice to define a quadratic polynomial, not a cubic. For cubic interpolation, you'll need four points. A cubic polynomial can be fitted in different ways; here are two.
The most straightforward way is to simply let the (unique) polynomial pass through all four points.
Another way is to use tangents. Again, we need four points. Let the left two points define a slope. Have the polynomial pass through the second point (in general, it doesn't pass through the first point) and match the computed slope at that point. Do the same on the right side with the fourth and third points.
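As an illustration of the first way (a sketch, not the asker's code; it assumes NumPy, sorted samples, and that the crossing lies between the middle two of the four samples):

```python
import numpy as np

def cubic_crossing(xs, ys, threshold):
    """Fit the unique cubic through four samples and return its
    threshold crossings between the middle two samples."""
    coeffs = np.polyfit(xs, ys, 3)  # 4 points, degree 3: exact interpolation
    coeffs[-1] -= threshold         # shift so crossings become roots
    roots = np.roots(coeffs)
    real = roots[np.isreal(roots)].real
    return [r for r in real if xs[1] <= r <= xs[2]]
```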
By the way, higher-order polynomials are probably a bad idea, because they tend to become very unstable in the presence of even a little input noise.
If you give some more details about your problem domain, I might be able to give a more specific answer. For example, where do your data points come from, what kind of curve can you generally expect, and can you go back and sample more if required? I can provide equations and pseudo-code too, if needed.
Update: silly me left a sentence referring to two ways without typing them out. Typed them out now.

My maths is incredibly rusty, but you may find the Newton-Raphson method gives you good results. In general it converges very quickly on an accurate solution, assuming the iteration begins "sufficiently near" the desired root.

Related

Upgrading a binary search algorithm to something more sophisticated

I solved an analytically unsolvable problem with numerical methods. I am searching for X, based on a desired Y value. f(x)=y is possible, x=f^-1(y) is not.
Currently the algorithm does a binary search. It starts at X=50%, calculates Y, and returns Y_err=Y-Y_demand. It keeps stepping in intervals of 5% in the direction of shrinking Y_err until Y_err changes sign, then it reduces the step and steps in the opposite direction. This works, but it's embarrassingly slow and inefficient.
Below, an example chart of x=f^-1(y). I chose one with high coefficients for the nonlinear part.
[figure: example chart of x = f^-1(y)]
It varies depending on the coefficients, but always has this pseudoparabolic shape. It's of course nonlinear, and even 9th-order polynomial approximations don't offer satisfactory precision.
For simplicity's sake, let's say the inflection point is at X=50%, and I am looking only for solutions where X>50%.
How should I proceed? I'm looking to optimise as much as possible. What are some good algorithms? Thanks.
EDIT: Thank you for pointing out that this is not in fact a binary search. I've updated the code and now have much better results by comparison.
I'm not sure if Newton's method applies here, or at least I don't know how to apply it. One-way trial and error is all I can do. When I have some more time I will try to learn and implement regula falsi.
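For the record, this is the minimal regula falsi (false position) sketch I plan to start from, assuming f is my forward function and [a, b] brackets the target:

```python
def regula_falsi(f, a, b, y_target, tol=1e-8, max_iter=100):
    """False position: keep a bracket around the target and replace
    one endpoint with the secant-line estimate each iteration."""
    fa, fb = f(a) - y_target, f(b) - y_target
    x = a
    for _ in range(max_iter):
        x = b - fb * (b - a) / (fb - fa)  # secant through bracket endpoints
        fx = f(x) - y_target
        if abs(fx) < tol:
            break
        if fa * fx < 0:
            b, fb = x, fx   # target is in [a, x]
        else:
            a, fa = x, fx   # target is in [x, b]
    return x
```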

How to find neighboring solutions in simulated annealing?

I'm working on an optimization problem and attempting to use simulated annealing as a heuristic. My goal is to optimize placement of k objects given some cost function. Solutions take the form of a set of k ordered pairs representing points in an M*N grid. I'm not sure how to best find a neighboring solution given a current solution. I've considered shifting each point by 1 or 0 units in a random direction. What might be a good approach to finding a neighboring solution given a current set of points?
Since I'm also trying to learn more about SA, what makes a good neighbor-finding algorithm and how close to the current solution should the neighbor be? Also, if randomness is involved, why is choosing a "neighbor" better than generating a random solution?
I would split your question into several smaller ones:
Also, if randomness is involved, why is choosing a "neighbor" better than generating a random solution?
Usually, you pick multiple points from a neighborhood, and you can explore all of them. For example, you generate 10 points randomly and choose the best one. By doing so you can efficiently explore more possible solutions.
Why is it better than a random guess? Good solutions tend to have a lot in common (e.g. they are close to each other in the search space). So by introducing small incremental changes you can find a good solution, while a random guess could send you to a completely different part of the search space, where you'll never find an appropriate solution. And because of the curse of dimensionality, random jumps are no better than brute force: there will be too many places to jump.
What might be a good approach to finding a neighboring solution given a current set of points?
I regret to tell you that this question seems to be unsolvable in general. :( It's a mix of art and science. Choosing the right way to explore a search space is too problem-specific. Even for solving a placement problem under varying constraints, different heuristics may lead to completely different results.
You can try the following:
Random shifts by a fixed number of steps (1, 2, ...); that's your current approach (see the sketch near the end of this answer)
Swapping two points
You can memorize bad moves for some time (similar to tabu search), so you will use only 'good' ones for the next 100 steps
Use a greedy approach to generate a suboptimal placement, then improve it with methods above.
Try random restarts. At some stage, drop all of your progress so far (except for the best solution found), raise the temperature and start again from a random initial point. You can do this every 10,000 steps or so
Fix some points. Put an object at point (x,y) and do not move it at all, try searching for the best possible solution under this constraint.
Prohibit some combinations of objects, e.g. "distance between p1 and p2 must be larger than D".
Mix all steps above in different ways
Try to understand your problem in all its tiniest details. You can derive useful information/constraints/insights from your problem description. Assume that you can't solve the placement problem in general, so try to reduce it to a more specific (== simpler, == smaller search space) problem.
I would say that the last bullet is the most important. Look closely at your problem and consider its practical aspects only. For example, the size of your problem might allow you to enumerate something, or maybe some placements are not possible for you, and so on and so forth. There is no way for SA to derive such domain-specific knowledge by itself, so help it!
How do you know that your heuristic is a good one? Only by practical evaluation. Prepare a decent set of tests with obvious/well-known answers and try different approaches. Use well-known benchmarks if there are any.
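A minimal sketch of a neighbor move combining the first two bullets (random shift, swap); the helper name, the swap probability, and the grid clamping are all illustrative assumptions:

```python
import random

def neighbor(solution, grid_m, grid_n, swap_prob=0.1):
    """Return a neighboring placement: usually nudge one point by
    at most one cell, occasionally swap two points."""
    new = list(solution)
    if len(new) >= 2 and random.random() < swap_prob:
        i, j = random.sample(range(len(new)), 2)
        new[i], new[j] = new[j], new[i]              # swap two points
    else:
        i = random.randrange(len(new))
        dx = random.choice([-1, 0, 1])
        dy = random.choice([-1, 0, 1])
        x = min(max(new[i][0] + dx, 0), grid_m - 1)  # stay inside the M*N grid
        y = min(max(new[i][1] + dy, 0), grid_n - 1)
        new[i] = (x, y)
    return new
```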
I hope that this is helpful. :)

How to simplify a spline?

I have an interesting algorithmic challenge in a project I am working on. I have a sorted list of coordinate points pointing at buildings on either side of a street; sufficiently zoomed in, the polyline zigzags from one side of the street to the other.
I would like to take this zigzag and smooth it out to linearize the underlying street.
I can think of a couple of solutions:
Calculate centroids using rolling averages of six or so points, and use those.
Spline regression.
Is there a better or best way to approach this problem? (I am using Python 3.5)
Based on your description and your comments, you are looking for a line simplification algorithm.
The Ramer-Douglas-Peucker algorithm (suggested in the comments) is probably the best-known algorithm in this family, but there are many more.
For example, Visvalingam's algorithm works by repeatedly removing the point whose removal changes the shape the least, measured as the area of the triangle formed by the point and its two neighbors. This makes it easy to code and intuitively understandable. If the research paper is hard to read, you can read this easy article instead.
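A naive sketch of the idea (O(n^2) for clarity; a real implementation keeps the triangle areas in a priority queue):

```python
def visvalingam(points, n_keep):
    """Repeatedly drop the interior point whose triangle with its
    two neighbors has the smallest area, until n_keep points remain."""
    def area(p, q, r):
        return 0.5 * abs((q[0] - p[0]) * (r[1] - p[1])
                         - (r[0] - p[0]) * (q[1] - p[1]))
    pts = list(points)
    n_keep = max(n_keep, 2)  # both endpoints are always kept
    while len(pts) > n_keep:
        areas = [area(pts[i - 1], pts[i], pts[i + 1])
                 for i in range(1, len(pts) - 1)]
        del pts[1 + areas.index(min(areas))]  # least-significant point
    return pts
```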
Other algorithms in this family are:
Opheim
Lang
Zhao
Read about them, understand what they are trying to minimize, and select the one most suitable for you.
Dali's post correctly surmises that a line simplification algorithm is useful for this task. Before posting this question I actually examined a few such algorithms, but I wasn't quite comfortable with them: even though they produced the simplified geometry I liked, they didn't directly address the issue of points being on either side of the feature and never in the middle.
Thus I used a two-step process:
I computed the centroids of the polyline using a rolling average of the coordinates of the five surrounding points (sketched below). This didn't help much with smoothing the line, but it did mostly succeed in remapping the points to the middle of the street.
I applied Visvalingam’s algorithm to the new polyline, with n=20 points specified (using this wonderful implementation).
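Step 1 looked roughly like this (a simplified sketch, not my exact code):

```python
def rolling_centroids(points, window=5):
    """Replace each point with the centroid of the `window`
    points around it (window truncated near the ends of the list)."""
    half = window // 2
    out = []
    for i in range(len(points)):
        chunk = points[max(0, i - half):i + half + 1]
        out.append((sum(p[0] for p in chunk) / len(chunk),
                    sum(p[1] for p in chunk) / len(chunk)))
    return out
```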
The result wasn't quite perfect, but it was good enough.
Thanks for the help everyone!

Optimization of a multivariate function with an initial solution close to the optimum

I was wondering if anyone knows what kind of algorithm could be used in my case. I have already run the optimizer on my multivariate function and found a solution to my problem, assuming that my function is regular enough. I now slightly perturb the problem and would like to find the optimum solution close to my last solution. Is there a very fast algorithm for this case, or should I just fall back to a regular one?
We probably need a bit more information about your problem, but since you know you're near the right solution: if derivatives are easy to calculate, Newton-Raphson is a sensible choice; if not, conjugate gradient may make sense.
If you already have an iterative optimizer (for example, based on Powell's direction set method, or CG), why don't you use your initial solution as a starting point for the next run of your optimizer?
EDIT: regarding your comment: if calculating the Jacobian or the Hessian matrix gives you performance problems, try BFGS (http://en.wikipedia.org/wiki/BFGS_method); it avoids calculating the Hessian completely. A (free-for-non-commercial) implementation, along with a good description of the details, is available at http://www.alglib.net/optimization/lbfgs.php.
And don't expect to get anything from finding your initial solution with a less sophisticated algorithm.
So this is all about unconstrained optimization. If you need information about constrained optimization, I suggest you google for "SQP".
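A sketch of the warm-start idea with SciPy's L-BFGS-B (the quadratic cost function here is only a stand-in for your own):

```python
import numpy as np
from scipy.optimize import minimize

def reoptimize(cost, x_prev):
    """Re-run L-BFGS-B on the perturbed problem, starting from the
    previous optimum instead of a cold start."""
    return minimize(cost, x0=np.asarray(x_prev, dtype=float),
                    method="L-BFGS-B").x

# stand-in example: a quadratic bowl whose minimum moved slightly
cost = lambda x: (x[0] - 1.01) ** 2 + (x[1] + 0.49) ** 2
x_new = reoptimize(cost, x_prev=[1.0, -0.5])  # converges in a few iterations
```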
There are a bunch of algorithms for finding the roots of equations. If you know approximately where the root is, there are algorithms that will get you arbitrarily close very quickly: bisection gains one bit of accuracy per iteration, so it needs only O(log(1/eps)) steps to reach tolerance eps, and Newton's method converges even faster (quadratically) near the root.
One is Newton's method
another is the Bisection Method
Note that these algorithms are for single-variable functions, but they can be extended to multivariate functions.
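For example, a minimal Newton's method sketch (it assumes you can evaluate the derivative, and that the starting guess is near the root):

```python
def newton(f, df, x0, tol=1e-10, max_iter=50):
    """Newton's method: quadratic convergence near a simple root."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)  # undefined where df(x) == 0
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")

# example: sqrt(2) as the root of x^2 - 2
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
```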
Every minimization algorithm performs better (read: performs at all) if you have a good initial guess. In your case, the initial guess for the perturbed problem is the minimum point of the unperturbed problem.
Then, you have to specify your requirements: you want speed. What accuracy do you want? Does space efficiency matter? Most importantly, what information do you have: only the value of the function, or also its derivatives (possibly second derivatives)?
Some background on the problem would help too. Optimizing a smooth function that has been discretized is very different from optimizing hundreds of unrelated parameters.
Global information (i.e. is the function convex, is there a guaranteed global minimum or many local ones, etc.) can be left aside for now. If you have trouble finding the minimum point of the perturbed problem, though, this is something you will have to investigate.
Answering these questions will allow us to select a particular algorithm. There are many choices (and trade-offs) for multivariate optimization.
Also, which method is quicker will depend very much on the problem rather than on the algorithm alone, and should be determined by experimentation.
Though I don't know much about using computers in this capacity, I remember an article that used neuroevolutionary techniques to find "best-fit" equations relatively efficiently, given a known function complexity (linear, Nth-degree polynomial, exponential, logarithmic, etc.) and a set of plotted points. As I recall, it was one of the earliest uses of what we now know as computational neuroevolution; because the functional complexity (and thus the number of terms) of the equation is known and fixed, a static neural net can be used, seeded with your closest values, then "mutated" and tested for fitness, with heuristics that keep new nets close to existing nets with high fitness. Using multithreading, many nets can be created, tested and evaluated in parallel.

Project Euler #163 understanding

I spent quite a long time searching for a solution to this problem. I drew tons of cross-hatched triangles, counted the triangles in simple cases, and searched for some sort of pattern. Unfortunately, I hit the wall. I'm pretty sure my programming/math skills did not meet the prereq for this problem.
So I found a solution online in order to gain access to the forums. I didn't understand most of the methods at all, and some just seemed too complicated.
Can anyone give me an understanding of this problem? One of the methods, found here: http://www.math.uni-bielefeld.de/~sillke/SEQUENCES/grid-triangles (Problem C)
allowed for a single function to be used.
How did they come up with that solution? At this point, I'd really just like to understand some of the concepts behind this interesting problem. I know looking up the solution was not part of the Euler spirit, but I'm fairly sure I would not have solved this problem anyhow.
This is essentially a problem in enumerative combinatorics, which is the art of counting combinations of things. It's a beautiful subject, but probably takes some warming up to before you can appreciate the ninja tricks in the reference you gave.
On the other hand, the comments in the solutions thread for the problem indicate that many have solved the problem using a brute force approach. One of the most common tricks involves taking all possible combinations of three lines in the diagram, and seeing whether they yield a triangle that is inside the largest triangle.
You can cut down the search space considerably by noting that the lines are in one of six directions. Since a combination of lines that includes two lines that are parallel will not yield a triangle, you can iterate over line triples so that each line in the triple has a different direction.
Given three lines, calculate their intersection points. There are three possibilities:
1) the lines are concurrent: all three pass through a common point
2) two of the lines intersect at a point outside the triangle
3) all three points of intersection are distinct, and they all lie within the outer triangle
Just count the combos satisfying condition (3) and you are done. The number of line combos you have to test is O(n^3), which is not prohibitive.
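A sketch of that brute force (the (a, b, c) line representation and the `inside` test are assumptions; with floating point, the distinctness test should really use exact arithmetic):

```python
from itertools import combinations, product

def count_triangles(lines_by_dir, inside):
    """Count line triples whose three pairwise intersections are
    distinct and lie inside the outer triangle (condition 3)."""
    def meet(l1, l2):
        (a1, b1, c1), (a2, b2, c2) = l1, l2  # lines as a*x + b*y = c
        det = a1 * b2 - a2 * b1
        if det == 0:
            return None                      # parallel; excluded by construction
        return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

    count = 0
    # pick three of the six directions, then one line from each
    for d1, d2, d3 in combinations(sorted(lines_by_dir), 3):
        for l1, l2, l3 in product(lines_by_dir[d1], lines_by_dir[d2], lines_by_dir[d3]):
            pts = {meet(l1, l2), meet(l2, l3), meet(l1, l3)}
            if len(pts) == 3 and all(p and inside(p) for p in pts):
                count += 1                   # a proper interior triangle
    return count
```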
EDIT1: rereading your question, I get the impression you might be more interested in getting an explanation of the combinatorics solution/formula than an outline of a brute force approach. If that's the case, say so and I'll delete this answer. But I'd also say that the question in that case would not be suitable for this site.
EDIT2: See also a combinatorics solution by Bill Daly and others. It is mathematically a little gentler than the other one.
I have not solved this problem for Project Euler and am going off of the question and the solution you provided. In the case of the single function, the methodology presented was ultimately simple pattern finding. The solver broke the presented question into three parts, based on the types of triangles present from the intersections. It's a fairly standard approach to this kind of problem: break the larger pattern down into smaller ones to make solving easier. The functions used to express the various forms of triangles were, I can only assume, generated either by a very acute pattern-finding mind or by some number theory / geometry; that is beyond the scope of this explanation and my knowledge. This problem has little to do with programming; it's basically entirely mathematics. If you read through the site you linked, you can see the logic used to reach the solution.
