Curve Fitting - DataSet - algorithm

I am given the following problem.
I have a set of functions which are linear combinations of the following functions (f1, f2, f3, ..., fn) and a noisy dataset of pairs (x, y). I want to find the function from my set which best approximates the dataset.
The key to the solution is to find coefficients a1, a2, ..., an so that the resulting function f = a1*f1 + ... + an*fn approximates y well given the input x. If the data weren't noisy, I could just choose n points (one per coefficient) and solve the resulting system of equations, but I don't think this would work well with noisy data.
How would one find the coefficients?
(I am asking for an algorithm, not for a program, for example Matlab, that does the job for me.)

In the presence of noise you need to find an approximate solution that minimizes the discrepancy from the ideal solution.
Such best-fit problems are usually solved by optimization algorithms.
A widely used one is the Levenberg–Marquardt algorithm.
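Because the model here is linear in the coefficients a1...an, the fit reduces to plain linear least squares; Levenberg–Marquardt is the more general tool for models that are nonlinear in their parameters. A minimal sketch of the linear case, assuming NumPy and some made-up example basis functions:

import numpy as np

def fit_coefficients(basis, x, y):
    # Least-squares coefficients a1..an for f(x) = a1*f1(x) + ... + an*fn(x).
    # basis: list of callables f1..fn, each mapping an array of x values to
    #        an array of function values; x, y: 1-D arrays of noisy data.
    A = np.column_stack([f(x) for f in basis])   # design matrix, one column per fj
    a, *_ = np.linalg.lstsq(A, y, rcond=None)    # minimizes ||A @ a - y||^2
    return a

# Example: fit y ~ a1*1 + a2*x + a3*sin(x) to noisy samples.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 2.0 + 0.5 * x + 3.0 * np.sin(x) + rng.normal(0, 0.2, x.size)
print(fit_coefficients([np.ones_like, lambda t: t, np.sin], x, y))  # roughly [2.0, 0.5, 3.0]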

Related

SGM Disparity subpixel estimation - how to?

Some weeks ago I implemented a simple block-matching stereo algorithm, but the results were bad. So I searched the Internet for better algorithms and found semi-global matching (SGM), published by Heiko Hirschmueller. It achieves some of the best results relative to its processing time.
I've implemented the algorithm and got really good results compared to simple block matching. I've also reprojected the 2D points to 3D using the calculated disparity values.
At the end of SGM I have an array with aggregated costs for each pixel. The disparity corresponds to the index with the lowest cost value.
The problem is that searching for the minimum only returns discrete values. This results in individual layers in the point cloud. In other words: round surfaces are cut into many layers (see point cloud).
Heiko mentioned in his paper that it would be easy to get sub-pixel accuracy by fitting a polynomial function to the cost array and taking its lowest point as the disparity.
The problem is not specific to stereo vision, so in other words the task is the following:
Given: an array of values, representing a polynomial function.
Wanted: the lowest point of that polynomial function.
I don't have any idea how to do this. I need a fast algorithm, because I have to run this code for every pixel in the image (for example: 500x500 pixels with 60-200 costs each => the algorithm has to run 15,000,000-50,000,000 times!).
I don't need a real-time solution! My current SGM implementation (L2R and R2L matching, no CUDA or multi-threading yet) takes about 20 seconds to process an image with 500x500 pixels ;).
I'm not asking for libraries! I'm trying to implement my own independent computer vision library :).
Thank you for your help!
With kind regards,
Andreas
Finding the exact lowest point of a general polynomial is a hard problem, since it is equivalent to finding the roots of the polynomial's derivative. In particular, if your polynomial is of degree 6, the derivative is a quintic polynomial, which is known not to be solvable by radicals. You therefore need to either: fit the function using restricted families for which the roots of the derivative can be computed in closed form, e.g. integrals of prod_i(x - ri) * p(x) where deg(p) <= 4, OR
use an iterative method to find an APPROXIMATE minimum (Newton's method, gradient descent).
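For the specific case described in the question (refining a discrete cost minimum), a common shortcut is to fit a parabola through the cost at the minimum index and its two neighbours and take the vertex; its position has a closed form, so no iteration is needed. A small sketch of that idea (not the restricted-family approach above), assuming a NumPy cost array per pixel:

import numpy as np

def subpixel_disparity(costs):
    # costs: 1-D array of aggregated costs for one pixel.
    d = int(np.argmin(costs))
    if d == 0 or d == len(costs) - 1:
        return float(d)                      # no neighbour on one side; keep the integer value
    c_m, c_0, c_p = costs[d - 1], costs[d], costs[d + 1]
    denom = c_m - 2.0 * c_0 + c_p
    if denom <= 0:                           # flat or degenerate cost curve
        return float(d)
    return d + 0.5 * (c_m - c_p) / denom     # vertex of the parabola through the three points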

eps estimation for DBSCAN without using the algorithm suggested in the original research paper

I have to implement DBSCAN in Python, and the epsilon estimation has been posing problems. The method suggested in the original research paper assumes a blob-like distribution of the dataset, whereas in my case the data is more curve-fittable, with jumps at some intervals. The jumps cause DBSCAN to form different clusters for the data in the intervals between jumps (which is good enough for me), but calculating epsilon dynamically for different datasets does not produce the desired results, because the points tend to lie on a straight line for many intervals and changing the 'k' value causes a considerable change in the eps value.
Try using the OPTICS algorithm; you won't need to estimate eps with it.
Also, I would suggest recursive regression, where you use Python's scipy.optimize.curve_fit to get the best curve and then find the RMS error of all the points with respect to that curve. Then remove 'n' percent of the points and repeat until your RMS error is less than your threshold.
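A rough sketch of that trimming loop; the quadratic model function, the threshold and the trim fraction below are placeholders you would replace for your own data:

import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    # Placeholder model; swap in whatever shape fits your data.
    return a * x**2 + b * x + c

def trimmed_fit(x, y, rms_threshold=0.1, trim_fraction=0.05, max_iter=50):
    # Repeatedly fit, drop the worst-fitting points, and refit until the
    # RMS error of the remaining points falls below the threshold.
    x, y = np.asarray(x, float), np.asarray(y, float)
    for _ in range(max_iter):
        params, _ = curve_fit(model, x, y)
        residuals = y - model(x, *params)
        rms = np.sqrt(np.mean(residuals**2))
        if rms < rms_threshold:
            break
        keep_count = max(len(params) + 1, int(len(x) * (1 - trim_fraction)))
        if keep_count >= len(x):                              # nothing left to trim
            break
        keep = np.argsort(np.abs(residuals))[:keep_count]     # drop the largest residuals
        x, y = x[keep], y[keep]
    return params, x, y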

Excel Polynomial Curve-Fitting Algorithm

What is the algorithm that Excel uses to calculate a 2nd-order polynomial regression (curve fitting)? Is there sample code or pseudo-code available?
I found a solution that returns the same formula that Excel gives:
Put together an augmented matrix of values used in a Least-Squares Parabola. See the sum equations in http://www.efunda.com/math/leastsquares/lstsqr2dcurve.cfm
Use Gaussian elimination to solve the matrix. Here is C# code that will do that http://www.codeproject.com/Tips/388179/Linear-Equation-Solver-Gaussian-Elimination-Csharp
After running that, the left-over values in the matrix (M) will equal the coefficients given in Excel.
Maybe I can find the R^2 somehow, but I don't need it for my purposes.
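For reference, a sketch of the same procedure in Python: build the 3x3 normal-equations matrix from the power sums described on the efunda page and solve it (np.linalg.solve standing in for the Gaussian-elimination step). It should reproduce Excel's 2nd-order trendline coefficients up to rounding:

import numpy as np

def parabola_fit(x, y):
    # Least-squares parabola y ~ c0 + c1*x + c2*x^2 via the normal equations.
    x, y = np.asarray(x, float), np.asarray(y, float)
    S = [np.sum(x**k) for k in range(5)]        # sum(x^0) .. sum(x^4); sum(x^0) is just n
    T = [np.sum(y * x**k) for k in range(3)]    # sum(y), sum(y*x), sum(y*x^2)
    M = np.array([[S[0], S[1], S[2]],
                  [S[1], S[2], S[3]],
                  [S[2], S[3], S[4]]])
    return np.linalg.solve(M, T)                # [c0, c1, c2]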
The polynomial trendlines in charts use least squares based on a QR decomposition method like the LINEST worksheet function ( http://support.microsoft.com/kb/828533 ). A second order or quadratic trend for given (x,y) data could be calculated using =LINEST(y,x^{1,2}).
You can call worksheet formulas from C# using the Worksheet.Evaluate method.
It depends, because there are a lot of ways to do such a thing depending on the data you supply and how important it is to have the curve pass through those points.
I'm guessing that you have many more points than you do coefficients in the polynomial (e.g. more than three points for a 2nd order curve).
If that's true, then the best you can do is least-squares fitting, which calculates the coefficients that minimize the mean squared error between all the points and the resulting curve.
Since this is second order, my recommendation would be to just create the damn second-order terms and do a linear regression.
For example, fitting z ~ second_order(x, y) is equivalent to fitting z ~ first_order(x, y, x^2, y^2, x*y).
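A small sketch of that trick, assuming NumPy: expand the second-order terms by hand, then run an ordinary linear least-squares fit on the expanded columns:

import numpy as np

def quadratic_surface_fit(x, y, z):
    # Fit z ~ b0 + b1*x + b2*y + b3*x^2 + b4*y^2 + b5*x*y by building the
    # second-order terms explicitly and solving a plain linear regression.
    x, y, z = (np.asarray(v, float) for v in (x, y, z))
    A = np.column_stack([np.ones_like(x), x, y, x**2, y**2, x * y])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs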

Accurate least-squares fit algorithm needed

I've experimented with the two ways of implementing a least-squares fit (LSF) algorithm shown here.
The first code is simply the textbook approach, as described by Wolfram's page on LSF. The second code re-arranges the equation to minimize machine errors. Both codes produce similar results for my data. I compared these results with Matlab's p=polyfit(x,y,1) function, using correlation coefficients to measure the "goodness" of fit and compare each of the 3 routines. I observed that while all 3 methods produced good results, at least for my data, Matlab's routine had the best fit (the other 2 routines had similar results to each other).
Matlab's p=polyfit(x,y,1) function uses a Vandermonde matrix, V (n x 2 matrix) and QR factorization to solve the least-squares problem. In Matlab code, it looks like:
V = [x(:), ones(numel(x),1)]; % n-by-2 Vandermonde matrix: one column of x values, one of ones
[Q,R] = qr(V,0);
p = R\(Q'*y); % performs same as p = V\y
I'm not a mathematician, so I don't understand why it would be more accurate. Although the difference is slight, in my case I need to obtain the slope from the LSF and multiply it by a large number, so any improvement in accuracy shows up in my results.
For reasons I can't get into, I cannot use Matlab's routine in my work. So, I'm wondering if anyone has a more accurate equation-based approach recommendation I could use that is an improvement over the above two approaches, in terms of rounding errors/machine accuracy/etc.
Any comments appreciated! thanks in advance.
For a polynomial fit, you can create a Vandermonde matrix and solve the linear system, as you have already done.
Another solution is to use a method like Gauss-Newton to fit the data (since the system is linear, one iteration should do fine). There are differences between the methods; one possible reason is Runge's phenomenon.
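For comparison with the closed-form slope/intercept formulas, here is a sketch of the QR route in NumPy, mirroring the Matlab snippet above. Roughly, the normal equations square the condition number of the Vandermonde matrix, while QR works with the matrix directly, so rounding errors are amplified less:

import numpy as np

def linear_fit_qr(x, y):
    # Degree-1 least-squares fit via QR factorization, i.e. p = R \ (Q'*y).
    x, y = np.asarray(x, float), np.asarray(y, float)
    V = np.column_stack([x, np.ones_like(x)])    # Vandermonde matrix [x, 1]
    Q, R = np.linalg.qr(V)                       # reduced QR: Q is n-by-2, R is 2-by-2
    slope, intercept = np.linalg.solve(R, Q.T @ y)
    return slope, intercept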

3 dimensional bin packing algorithms

I'm faced with a 3-dimensional bin-packing problem and am currently conducting some preliminary research into which algorithms/heuristics currently yield the best results. Since the problem is NP-hard, I do not expect to find the optimal solution in every case, but I was wondering:
1) what are the best exact solvers? Branch and Bound? What problem instance sizes can I expect to solve with reasonable computing resources?
2) what are the best heuristic solvers?
3) What off-the-shelf solutions exist to conduct some experiments with?
As far as off-the-shelf solutions go, check out MAXLOADPRO for loading trucks. It may be possible to configure it to load any rectangular volume, but I haven't tried that yet. In general, 3D bin-packing problems have the added complication that the objects can be rotated into different positions, so for any object with a given length, width and height you effectively have to create three variables representing each orientation, but you only use one of them in the solution.
In general, stand-alone MIP formulations (or branch and bound) don't work well for the 2D or 3D problem, but constraint programming has met with some success producing exact solutions for the 2D problem. Check out this abstract. Without looking at the paper, I like the decomposition approach for the problem where you're trying to minimize the number of same-sized bins. I haven't seen as many results for the 3D problem, but let us know if you find any that are implementable.
Good luck !
I've written a program which tests three different algorithms. This is also a good source of information: A Thousand Ways to Pack the Bin - A Practical Approach to Two-Dimensional Rectangle Bin Packing. It covers the two-dimensional rectangle bin, but you can always extend the approach to 3D.
From wikipedia:
Although these simple strategies are often good enough, efficient approximation algorithms have been demonstrated that can solve the bin packing problem within any fixed percentage of the optimal solution for sufficiently large inputs
Here are the two sources they give for this:
Approximation Algorithms
Bin packing can be solved within 1 + ε in linear time
Best exact solver: Use dynamic programming.
State variables:
Items you have packed and discarded.
Space filled in the container.
If the container is a parallelepiped grid, and the items "fit" in exact cells of the grid, you can use a 3-dimensional array to represent state variable 2. Otherwise, you will have to use more complex data structures.
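For intuition, here is a 1-D simplification of that DP: a bitmask plays the role of state variable 1 (items packed), and the free space in the bin currently being filled stands in for state variable 2; the real 3-D version would need the grid/array described above instead. This is a sketch, practical only for small item counts:

def min_bins(sizes, capacity):
    # Exact minimum bin count for the 1-D problem via bitmask dynamic programming.
    # O(2^n * n) time, so only viable for roughly n <= 20 items.
    n = len(sizes)

    def better(a, b):
        # Prefer fewer bins; break ties by more free space in the open bin.
        return a[0] < b[0] or (a[0] == b[0] and a[1] > b[1])

    dp = {0: (0, 0)}                             # bitmask -> (bins used, free space in open bin)
    for mask in range(1 << n):
        if mask not in dp:
            continue
        bins_used, free = dp[mask]
        for i in range(n):
            if mask & (1 << i):
                continue                         # item i already packed
            if sizes[i] <= free:                 # item fits in the currently open bin
                cand = (bins_used, free - sizes[i])
            else:                                # open a new bin for item i
                cand = (bins_used + 1, capacity - sizes[i])
            nxt = mask | (1 << i)
            if nxt not in dp or better(cand, dp[nxt]):
                dp[nxt] = cand
    return dp[(1 << n) - 1][0]

print(min_bins([4, 8, 1, 4, 2, 1], 10))          # -> 2 (bins {8,1,1} and {4,4,2})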
Best heuristic solvers
I don't know. Perhaps Variable Neighborhood Search. There are some similarities between your problem and the timetable construction problem (which I'm working on), so the same heuristic might be good for both.
Off-the-shelf solutions to conduct experiments
I'm sorry, I don't even have a clue.
Your question is similar to:
3d bin packing algorithm
Although, because you disallow rotation, you can get pretty good results. I suggest looking more towards a FIRST-FIT-DECREASING solution.
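A minimal first-fit-decreasing sketch, simplified to budgeting volume only; a real 3-D packer must also check geometric placement (and rotations, if they are allowed):

def first_fit_decreasing(items, bin_volume):
    # items: list of (length, width, height) tuples; bins are budgeted by volume only,
    # so this is a lower-bound-style simplification of the 3-D problem.
    volumes = [l * w * h for l, w, h in items]
    order = sorted(range(len(items)), key=lambda i: volumes[i], reverse=True)
    bins, free = [], []                          # item indices per bin, free volume per bin
    for i in order:
        for b, space in enumerate(free):
            if volumes[i] <= space:              # first bin with enough room
                bins[b].append(i)
                free[b] -= volumes[i]
                break
        else:                                    # no existing bin fits: open a new one
            bins.append([i])
            free.append(bin_volume - volumes[i])
    return bins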
3dbinpacking is a commercial solution (not an algorithm) exposing an API to consume with nice visualization. It offers:
Single bin packing
Multi bin packing
Find third dimension
Find bin dimensions
