I have, as an output of a machine learning algorithm, a surface in z, which has known increments along x and y. These points along x and y match exactly to a surface which I am comparing the output of my algorithm against in order to get a metric of fit, or error. I have been struggling to find an optimal way of calculating this, and can't find any good resources on different options that I have. I have tried simple pointwise subtraction of the surfaces, which I take the absolute value and summation of, and I have tried squared versions of this, as well as divided versions, but each of these encounters different problems. I was wondering if any of you knew of any good resources on different options and which of these work in different situations. Thanks!
If your problem is outliers, compute all absolute height differences and discard the N/2 largest (or another fraction, depending on the usual proportion of outliers). Then take the average of the remaining ones (or the RMS). This is called a trimmed average.
Related
So I have this issue where I have to find the best distribution that, when passed through a function, matches a known surface. I have written a script that creates the distribution given some parameters and spits out a metric that compares the given surface to the known, but this script takes a non-negligible time, so I can't just run through a very large set of parameters to find the optimal set of parameters. I looked into the simplex method, and it seems to be the right path, but its not quite what I need, because I dont exactly have a set of linear equations, and dont know the constraints for the parameters, but rather one method that gives a single output (an thats all). Can anyone point me in the right direction to how to solve this problem? Thanks!
To quickly go over my process / problem again, I have a set of parameters (at this point 2 but will be expanded to more later) that defines a distribution. This distribution is used to create a surface, which is compared to a known surface, and an error metric is produced. I want to find the optimal set of parameters, but cannot run through an arbitrarily large number of parameters due to the time constraint.
One situation consistent with what you have asked is a model in which you have a reasonably tractable probability distribution which generates an unknown value. This unknown value goes through a complex and not mathematically nice process and generates an observation. Your surface corresponds to the observed probability distribution on the observations. You would be happy finding the parameters that give a good least squares fit between the theoretical and real life surface distribution.
One approximation for the fitting process is that you compute a grid of values in the space output by the probability distribution. Each set of parameters gives you a probability for each point on this grid. The not nice process maps each grid point here to a nearest grid point in the space of the surface. The least squares fit is a quadratic in the probabilities calculated for the first grid, because the probabilities calculated for a grid point in the surface are the sums of the probabilities calculated for values in the first grid that map to something nearer to that point in the surface than any other point in the surface. This means that it has first (and even second) derivatives that you can calculate. If your probability distribution is nice enough you can use the chain rule to calculate derivatives for the least squares fit in the initial parameters. This means that you can use optimization methods to calculate the best fit parameters which require not just a means to calculate the function to be optimized but also its derivatives, and these are generally more efficient than optimization methods which require only function values, such as Nelder-Mead or Torczon Simplex. See e.g. http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math4/optim/package-summary.html.
Another possible approach is via something called the EM Algorithm. Here EM stands for Expectation-Maximization. It can be used for finding maximum likelihood fits in cases where the problem would be easy if you could see some hidden state that you cannot actually see. In this case the output produced by the initial distribution might be such a hidden state. One starting point is http://www-prima.imag.fr/jlc/Courses/2002/ENSI2.RNRF/EM-tutorial.pdf.
I have a set of angles. The distribution could be roughly described as:
there a are usually several values very close (0.0-1.0 degree apart) to the correct solution
there are also noisy values being very far from the correct result, even opposite direction
is there a common solution/strategy for such a problem?
For multidimensional data, I would use RANSAC - but I have the impression that it is unusual to apply Ransac on 1-dimensional data. Another problem is computing the mean of an angle. I read some other posts about how to calculate the mean of angles by using vectors, but I just wonder if there isn't a particular fitting solution which deals with both issues already.
You can use RANSAC even in this case, all the necessary conditions (minimal samples, error of a data point, consensus set) are met. Your minimal sample will be 1 point, a randomly picked angle (although you can try all of them, might be fast enough). Then, all the angles (data points) with error (you can use just absolute distance, modulo 360) less than some threshold (e.g. 1 degree), will be considered as inliers, i.e. within the consensus set.
If you want to play with it a bit more, you can make the results more stable by adding some local optimisation, see e.g.:
Lebeda, Matas, Chum: Fixing the Locally Optimized RANSAC, BMVC 2012.
You could try another approaches, e.g. median, or fitting a mixture of Gaussian and uniform distribution, but you would have to deal with the periodicity of the signal somehow, so I guess RANSAC should be your choice.
I'm observing a sinusoidally-varying source, i.e. f(x) = a sin (bx + d) + c, and want to determine the amplitude a, offset c and period/frequency b - the shift d is unimportant. Measurements are sparse, with each source measured typically between 6 and 12 times, and observations are at (effectively) random times, with intervals between observations roughly between a quarter and ten times the period (just to stress, the spacing of observations is not constant for each source). In each source the offset c is typically quite large compared to the measurement error, while amplitudes vary - at one extreme they are only on the order of the measurement error, while at the other extreme they are about twenty times the error. Hopefully that fully outlines the problem, if not, please ask and i'll clarify.
Thinking naively about the problem, the average of the measurements will be a good estimate of the offset c, while half the range between the minimum and maximum value of the measured f(x) will be a reasonable estimate of the amplitude, especially as the number of measurements increase so that the prospects of having observed the maximum offset from the mean improve. However, if the amplitude is small then it seems to me that there is little chance of accurately determining b, while the prospects should be better for large-amplitude sources even if they are only observed the minimum number of times.
Anyway, I wrote some code to do a least-squares fit to the data for the range of periods, and it identifies best-fit values of a, b and d quite effectively for the larger-amplitude sources. However, I see it finding a number of possible periods, and while one is the 'best' (in as much as it gives the minimum error-weighted residual) in the majority of cases the difference in the residuals for different candidate periods is not large. So what I would like to do now is quantify the possibility that the derived period is a 'false positive' (or, to put it slightly differently, what confidence I can have that the derived period is correct).
Does anybody have any suggestions on how best to proceed? One thought I had was to use a Monte-Carlo algorithm to construct a large number of sources with known values for a, b and c, construct samples that correspond to my measurement times, fit the resultant sample with my fitting code, and see what percentage of the time I recover the correct period. But that seems quite heavyweight, and i'm not sure that it's particularly useful other than giving a general feel for the false-positive rate.
And any advice for frameworks that might help? I have a feeling this is something that can likely be done in a line or two in Mathematica, but (a) I don't know it, an (b) don't have access to it. I'm fluent in Java, competent in IDL and can probably figure out other things...
This looks tailor-made for working in the frequency domain. Apply a Fourier transform and identify the frequency based on where the power is located, which should be clear for a sinusoidal source.
ADDENDUM To get an idea of how accurate is your estimate, I'd try a resampling approach such as cross-validation. I think this is the direction that you're heading with the Monte Carlo idea; lots of work is out there, so hopefully that's a wheel you won't need to re-invent.
The trick here is to do what might seem at first to make the problem more difficult. Rewrite f in the similar form:
f(x) = a1*sin(b*x) + a2*cos(b*x) + c
This is based on the identity for the sin(u+v).
Recognize that if b is known, then the problem of estimating {a1, a2, c} is a simple LINEAR regression problem. So all you need to do is use a 1-variable minimization tool, working on the value of b, to minimize the sum of squares of the residuals from that linear regression model. There are many such univariate optimizers to be found.
Once you have those parameters, it is easy to find the parameter a in your original model, since that is all you care about.
a = sqrt(a1^2 + a2^2)
The scheme I have described is called a partitioned least squares.
If you have a reasonable estimate of the size and the nature of your noise (e.g. white Gaussian with SD sigma), you can
(a) invert the Hessian matrix to get an estimate of the error in your position and
(b) should be able to easily derive a significance statistic for your fit residues.
For (a), compare http://www.physics.utah.edu/~detar/phys6720/handouts/curve_fit/curve_fit/node6.html
For (b), assume that your measurement errors are independent and thus the variance of their sum is the sum of their variances.
this is more a mathematical problem. nonethelesse i am looking for the algorithm in pseudocode to solve it.
given is a one dimensional coordinate system, with a number of points. the coordinates of the points may be in floating point.
now i am looking for a factor that scales this coordinate system, so that all points are on fixed number (i.e. integer coordinate)
if i am not mistaken, there should be a solution for this problem as long as the number of points is not infinite.
if i am wrong and there is no analytical solution for this problem, i am interested in an algorithm that approximates the solution as close as possible. (i.e. the coordinates will look like 15.0001)
if you are interested for the concrete problem:
i would like to overcome the well known pixelsnapping problem in adobe flash, which cuts of half-pixels at the border of bitmaps if the whole stage is scaled. i would like to find out an ideal scaling factor for the stage which makes my bitmaps being placed on whole (screen-)pixel coordinates.
since i am placing two bitmaps on the stage, the number of points will be 4 in each direction (x,y).
thanks!
As suggested, you have to convert your floating point numbers to rational ones. Fix a tolerance epsilon, and for each coordinate, find its best rational approximation within epsilon.
An algorithm and definitions is outlined there in this section.
Once you have converted all the coordinates into rational numbers, the scaling is given by the least common multiple of the denominators.
Note that this latter number can become quite huge, so you may want to experiment with epsilon so that to control the denominators.
My own inclination, if I were in your situation, would be to use rational numbers not with floating point.
And the algorithms you are looking for is finding the lowest common denominator.
A floating point number is an integer, multiplied by a power of two (the power might be negative).
So, find the largest necessary power of two among your inputs, and that gives you a scale factor that will work. The power of two isn't just -1 times the exponent of the float, it's a few more than that (according to where the least significant 1 bit is in the significand).
It's also optimal, because if x times a power of 2 is an odd integer then x in its float representation was already in simplest rational form, there's no smaller integer that you can multiply x by to get an integer.
Obviously if you have a mixture of large and small values among your input, then the resulting integers will tend to be bigger than 64 bit. So there is an analytical solution, but perhaps not a very good one given what you want to do with the results.
Note that this approach treats floats as being precise representations, which they are not. You may get more sensible results by representing each float as a rational number with smaller denominator (within some defined tolerance), then taking the lowest common multiple of all the denominators.
The problem there though is the approximation process - if the input float is 0.334[*] then I can't in general be sure whether the person who gave it to me really mean 0.334, or whether it's 1/3 with some inaccuracy. I therefore don't know whether to use a scale factor of 3 and say the scaled result is 1, or use a scale factor of 500 and say the scaled result is 167. And that's just with 1 input, never mind a bunch of them.
With 4 inputs and allowed final tolerance of 0.0001, you could perhaps find the 10 closest rationals to each input with a certain maximum denominator, then try 10^4 different possibilities and see whether the resulting scale factor gives you any values that are too far from an integer. Brute force seems nasty, but you might a least be able to bound the search a bit as you go. Also "maximum denominator" might be expressed in terms of the primes present in the factorization, rather than just the number, since if you can find a lot of common factors among them then they'll have a smaller lcm and hence smaller deviation from integers after scaling.
[*] Not that 0.334 is an exact float value, but that sort of thing. Decimal examples are easier.
If you are talking about single precision floating point numbers, then the number can be expressed like this according to wikipedia:
From this formula you can deduce that you always get an integer if you multiply by 2127+23. (Actually, when e is 0 you have to use another formula for the special range of "subnormal" numbers so 2126+23 is sufficient. See the linked wikipedia article for details.)
To do this in code you will probably need to do some bit twiddling to extract the factors in the above formula from the bits in the floating point value. And then you will need some kind of support for unlimited size numbers to express the integer result of the scaling (e.g. BigInteger in .NET). Normal primitive types in most languages/platforms are typically limited to much smaller sizes.
It's really a problem in statistical inference combined with noise reduction. This is the method I'm going to try out soon. I'm assuming you're trying to get a regularly spaced 2-D grid but a similar method could work on a regularly spaced grid of 3 or more dimensions.
First tabulate all the differences and note that (dx,dy) and (-dx,-dy) denote the same displacement, so there's an equivalence relation. Group those differenecs that are within a pre-assigned threshold (epsilon) of one another. Epsilon should be large enough to capture measurement errors due to random noise or lack of image resolution, but small enough not to accidentally combine clusters.
Sort the clusters by their average size (dr = root(dx^2 + dy^2)).
If the original grid was, indeed, regularly spaced and generated by two independent basis vectors, then the two smallest linearly independent clusters will indicate so. The smallest cluster is the one centered on (0, 0). The next smallest cluster (dx0, dy0) has the first basis vector up to +/- sign (-dx0, -dy0) denotes the same displacement, recall.
The next smallest clusters may be linearly dependent on this (up to the threshold epsilon) by virtue of being multiples of (dx0, dy0). Find the smallest cluster which is NOT a multiple of (dx0, dy0). Call this (dx1, dy1).
Now you have enough to tag the original vectors. Group the vector, by increasing lexicographic order (x,y) > (x',y') if x > x' or x = x' and y > y'. Take the smallest (x0,y0) and assign the integer (0, 0) to it. Take all the others (x,y) and find the decomposition (x,y) = (x0,y0) + M0(x,y) (dx0, dy0) + M1(x,y) (dx1,dy1) and assign it the integers (m0(x,y),m1(x,y)) = (round(M0), round(M1)).
Now do a least-squares fit of the integers to the vectors to the equations (x,y) = (ux,uy) m0(x,y) (u0x,u0y) + m1(x,y) (u1x,u1y)
to find (ux,uy), (u0x,u0y) and (u1x,u1y). This identifies the grid.
Test this match to determine whether or not all the points are within a given threshold of this fit (maybe using the same threshold epsilon for this purpose).
The 1-D version of this same routine should also work in 1 dimension on a spectrograph to identify the fundamental frequency in a voice print. Only in this case, the assumed value for ux (which replaces (ux,uy)) is just 0 and one is only looking for a fit to the homogeneous equation x = m0(x) u0x.
I've been trying to come up with an interest point detection algorithm and this is what I came up with:
You go through the X and the Y axises 3n pixels at a time creating 3n x 3n squares.
For the the n x n square in the middle of the 3n x 3n square (let's call it square Z), the R, G, and B values are averaged and rounded to preset values to limit the number of colors, and that is the color that square will be treated as.
The same is done for the 8 surrounding n x n squares.
After that, the color of square Z is compared to the surrounding squares, if it matches x out of the 8 surrounding squares where x <= 3 or x => 5 then that is an interest point (a corner is detected).
And so on till all the image is covered.
The bigger n is, the faster the image will be scanned and the the less accurate the detection is, and vice versa.
This, supposedly, detects "literal corners", that is corners you can actually SEE on the image.
What do you think of this algorithm? Is it efficient? Can it be used on a live video stream (say from the camera) on a hand-held device?
I'm sorry to say that I don't think this is likely to be very good. Your algorithm looks a bit like a simplistic version of Moravec's algorithm, which is itself one of the simplest corner detection algorithms. The hardcoded limits you test against effectively make your edge test a stepped function, unlike an approach such as summed square differences. This will almost certainly give you discontinuities in your detection function (corners that don't match when they should have), for some values.
You also have the same problem as Moravec, namely that if the edge lies at an angle to the direction of neighbours being considered, then it won't be detected.
Developing algorithms is fun, and if this isn't a business-critical project, then by all means, carry on tinkering and experimenting (and don't be put off by my comments!). But the fact is, for almost any practical problem, a better algorithm for the task you want to solve almost certainly already exists. The real challenge is identifying how you can best model your problem in such a way that you can solve it using an existing, well-understood approach, designed by experts.
In particular, robust identification and analysis of edge-cases and worst-case runtimes is a tricky business; unless you are a professional algorist, you are likely to find the going difficult. But I certainly encourage you to discover this for yourself by trying. nlucaroni mentions some excellent questions to use as starting points for your analysis.
Why not try it and see if it works the way you expect? It sounds like it should. How does the performance compare with other methods? What is the complexity of the algorithm? Is it efficient compared to others? Where can it be improved? What kind of false-positives and false negatives are expected? Are they within reason based on the data I plan to use this on? What threshold should be used to compare surrounding squares? ....
this is stuff you should be doing, not us.
I would suggest you look at the SIFT algorithm. Its the defacto standard for points of interest in an image. Unfortunately, its also patented, because its so good.
If you are interested in a real time version of SIFT you can get it to run on a GPU, but its highly experimental at this point. Note if you are developing a commercial application you'd have to first purchase a license for using SIFT or get approval from David Lowe.