Visualizing Level surfaces - algorithm

I'm trying to develop a level surface visualizer using this method (don't know if this is the standard method or if there's something better):
1. Take any function f(x,y,z)=k (where k is constant), and bounds for x, y, and z. Also take in two grid parameters stepX and stepZ.
2. To reduce to a level-curve problem, iterate from zMin to zMax in intervals of stepZ. So f(x,y,z)=k => f(x,y,fixedZ)=k
3. Do the same procedure with stepX, reducing the problem to f(fixedX, y, fixedZ)=k
4. Solve f(fixedX, y, fixedZ) - k = 0 for all values of y that satisfy the equation (using some kind of root-finding algorithm).
5. For all points generated, plot them as a level curve (the inner loop generates a level curve at a given z; over the different z values you get a stack of level curves).
6 (optional). Generate a mesh from these level curves/points which belong to the level set.
The problem I'm running into is with step 4: I have no way of knowing beforehand how many values of y will satisfy that equation (more specifically, how many unique, real values of y).
Also, I'm trying to keep the program as general as possible, so I don't want to constrain the original function f(x,y,z)=k to be smooth or polynomial; the only constraint is that k must be constant, as a level surface requires.
Is there an algorithm (without using a CAS/symbolic solving) that can identify the roots of a function even when it has several? I know that bisection methods have a hard time with this because there may be no sign change over the region, but how do the secant and Newton's methods fare? On what class of functions can they be used, and can they detect and find all unique real roots within two given bounds? Or is there a better method altogether for generating/visualizing level surfaces?
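For concreteness, here is a minimal sketch of steps 2-5 (Python, with NumPy/SciPy assumed; all names are mine). The sign-change bracketing in the inner loop is exactly the weak spot of step 4: it finds each simple root, but misses tangential/even-multiplicity roots.

```python
# Minimal sketch of steps 2-5. Step 4 samples g(y) and brackets
# sign changes, so even-multiplicity roots are missed.
import numpy as np
from scipy.optimize import brentq

def level_surface_points(f, k, x_bounds, y_bounds, z_bounds,
                         step_x, step_z, y_samples=200):
    points = []
    for z in np.arange(z_bounds[0], z_bounds[1], step_z):       # step 2
        for x in np.arange(x_bounds[0], x_bounds[1], step_x):   # step 3
            g = lambda y, x=x, z=z: f(x, y, z) - k              # step 4
            ys = np.linspace(y_bounds[0], y_bounds[1], y_samples)
            gs = np.array([g(y) for y in ys])
            for i in np.nonzero(gs[:-1] * gs[1:] < 0)[0]:
                # each bracketed sign change holds exactly one root
                points.append((x, brentq(g, ys[i], ys[i + 1]), z))  # step 5
    return points

# example: the unit sphere x^2 + y^2 + z^2 = 1
pts = level_surface_points(lambda x, y, z: x*x + y*y + z*z, 1.0,
                           (-1, 1), (-1, 1), (-1, 1), 0.1, 0.1)
```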

I think I've found the solution to my problem. I did a bit more research and discovered that "level surface" is synonymous with "isosurface", so in theory something like the marching cubes method should work.

If you need an example of the Marching Cubes algorithm, check out
http://stemkoski.github.com/Three.js/Marching-Cubes.html
(it uses JavaScript/Three.js for the graphics).
For more details on the theory, check out the article at
http://paulbourke.net/geometry/polygonise/
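If Python is an option rather than JavaScript, scikit-image ships a marching cubes implementation, so you don't have to write the table-driven core yourself. A minimal sketch (the sphere volume is just a stand-in for your sampled f):

```python
# Sketch: extract the isosurface f(x, y, z) = k from a sampled volume
# with scikit-image's marching cubes (assumes scikit-image is installed).
import numpy as np
from skimage import measure

n = 64
ax = np.linspace(-1.5, 1.5, n)
X, Y, Z = np.meshgrid(ax, ax, ax, indexing="ij")
volume = X**2 + Y**2 + Z**2          # stand-in for f(x, y, z)

k = 1.0
verts, faces, normals, values = measure.marching_cubes(volume, level=k)
# verts/faces form a triangle mesh of the level surface (step 6), in
# index coordinates; rescale by the grid spacing to get world coordinates.
```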

A simple way:
2D: plot (x,y) with color = floor(q*f(x,y)) in grayscale, where q is some arbitrary scaling factor.
3D: plot (x, y, floor(q*f(x,y))).
Effectively, heights of the function that are equivalent will be represented on the same level surface.
If you want the level curves, you can use the 2D method plus edge detection/region categorization to get the points (x,y) on the same level.
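A minimal sketch of the 2D variant (Matplotlib/NumPy assumed; the function and the factor q are stand-ins):

```python
# Quantized grayscale bands of f(x, y); q controls how many bands appear.
import numpy as np
import matplotlib.pyplot as plt

ax = np.linspace(-2, 2, 400)
X, Y = np.meshgrid(ax, ax)
F = X**2 + Y**2                      # stand-in for f(x, y)

q = 2.0
plt.imshow(np.floor(q * F), cmap="gray",
           extent=(-2, 2, -2, 2), origin="lower")
plt.colorbar(label="floor(q * f)")
plt.show()
# Pixels with equal quantized values lie in the same band; the band
# edges are the level curves that edge detection would then pick out.
```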

Related

Algorithm to determine the global minimum of a blackbox function

I recently got this question in an interview, and it's kind of making me mad thinking about it.
Suppose you have a family of functions that each take a fixed number of parameters (different functions can take different numbers of parameters), each with the following properties:
Each input is between 0 and 1
Each output is between 0 and 1
The function is continuous
The function is a black box (i.e. you cannot look at its equation)
He then asked me to create an algorithm to find the global minimum of such a function.
To me, this question felt like it was asking for the foundation of machine learning. Obviously, if there were some way to guarantee finding the global minimum of an arbitrary function, we'd have perfect machine learning algorithms. Obviously we don't, so this question seems kind of impossible.
Anyway, the answer I gave was a mixture of divide and conquer and stochastic gradient descent. Since all the functions are continuous, I figured I could always estimate the partial gradient with respect to each dimension numerically. I would split each dimension in half and, once a certain granularity was reached, apply gradient descent: initialize a start point, evaluate the function a small delta to either side of that point in every dimension to get the slope there, then update the point using a learning rate, and recompute the partial derivatives until the distance between the old and new points falls below a threshold. Then I'd re-merge and return the minimum of the two sections, up through all the divisions. My hope was to get around the fact that SGD can get stuck in local minima, so I thought dividing the dimension space would reduce the chance of that happening.
He seemed pretty unimpressed with my algorithm in the end. Does anybody have a faster/more accurate way of solving this problem?
Since the range is [0, 1], any x in R^n with f(x) = 0 attains the global minimum, so a search can stop as soon as it sees an output of 0. Moreover, knowing only the domain, range, and continuity gives no guarantee that the function is convex.
For example, f(x) = sqrt(x) is concave (its minimum is not at a stationary point but on the boundary of the domain), and [0, 1] lies within its domain.
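For illustration, a minimal sketch along the lines of the asker's own idea: coarse grid sampling over [0, 1]^n, then derivative-free local refinement (Nelder-Mead via SciPy, since the function is only guaranteed continuous, not smooth). The early exit uses the observation above that an output of 0 cannot be beaten. Note the grid grows as grid^dim, so this is a sketch, not a cure for the curse of dimensionality:

```python
# Coarse grid search over [0, 1]^dim followed by Nelder-Mead refinement.
import itertools
import numpy as np
from scipy.optimize import minimize

def blackbox_min(f, dim, grid=5):
    best_x, best_val = None, np.inf
    for idx in itertools.product(range(grid), repeat=dim):
        x = (np.array(idx) + 0.5) / grid      # cell centers of the grid
        v = f(x)
        if v < best_val:
            best_x, best_val = x, v
        if best_val == 0.0:                   # range is [0, 1]: can't do better
            return best_x, 0.0
    # refine the best grid point with a derivative-free local search
    res = minimize(f, best_x, method="Nelder-Mead")
    return res.x, res.fun

# toy example (in real use f is a black box)
f = lambda x: (np.sin(5 * x) ** 2).mean()
x_min, v_min = blackbox_min(f, dim=2)
```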

Algorithm to approximate non-linear equation system solution

I'm looking for an algorithm to approximate the solution of the following equation system:
X_m = X_r + f_yx(Y_r)
Y_m = Y_r + f_xy(X_r)
The equations have to be solved on an embedded system, in C++.
Background:
We measure the 2 variables X_m and Y_m, so they are known
We want to compute the real values: X_r and Y_r
X and Y are real numbers
We measure the functions f_xy and f_yx during calibration. We have at most 18 sample points for each function.
It's possible to store the functions as look-up tables.
I tried approximating the functions with 2nd-order polynomials and computing the solution, but it was not accurate enough because of the fitting error.
I'm looking for an algorithm to approximate the results on an embedded system in C++, but I don't even know what to search for. I found some papers on the theory, but I think there must be an easier way to do it in my case.
Also: how can I determine during calibration whether the functions can be solved with the algorithm?
Fitting a second-order polynomial through f_xy is generally not viable. One alternative is a finite-difference estimate of the derivative: pick two known values to the left and two to the right of your argument, with weights 1, 2, 2, 1. This gets you an estimate of d(f_xy)/dx, which you can then use for interpolation.
The normal way is Newton's iteration, starting from the initial approximation (Xm, Ym) [assuming that the f are mere corrections]. Due to the particular shape of the equations, you can reduce the system to two independent single-variable equations:
Xr = Xm - Fyx(Ym - Fxy(Xr))
Yr = Ym - Fxy(Xm - Fyx(Yr))
Newton's iterations then read
Xr <-- Xr - (Xr - Xm + Fyx(Ym - Fxy(Xr))) / (1 - Fyx'(Ym - Fxy(Xr)).Fxy'(Xr))
Yr <-- Yr - (Yr - Ym + Fxy(Xm - Fyx(Yr))) / (1 - Fxy'(Xm - Fyx(Yr)).Fyx'(Yr))
So you should tabulate the derivatives of the f as well, though accuracy there is less critical than for the f themselves.
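A minimal sketch of this iteration (Python/SciPy for brevity, although the target is C++; the calibration samples below are placeholders). Cubic splines give you both the tabulated functions and their derivatives in one go:

```python
# Newton iteration for Xr = Xm - Fyx(Ym - Fxy(Xr)), with calibration
# tables interpolated by cubic splines; .derivative() provides Fxy', Fyx'.
import numpy as np
from scipy.interpolate import CubicSpline

# calibration data (up to 18 points each) -- placeholder values here
x_cal = np.linspace(0.0, 10.0, 18)
Fxy = CubicSpline(x_cal, 0.05 * np.sin(x_cal))   # samples of f_xy
Fyx = CubicSpline(x_cal, 0.03 * np.cos(x_cal))   # samples of f_yx
dFxy, dFyx = Fxy.derivative(), Fyx.derivative()

def solve_xr(Xm, Ym, tol=1e-9, max_iter=20):
    Xr = Xm                                  # start from the measurement
    for _ in range(max_iter):
        u = Ym - Fxy(Xr)
        g = Xr - Xm + Fyx(u)                 # residual of Xr = Xm - Fyx(u)
        dg = 1.0 - dFyx(u) * dFxy(Xr)
        step = g / dg
        Xr -= step
        if abs(step) < tol:
            break
    return Xr
```

The Yr equation is handled the same way with the roles of Fxy and Fyx swapped.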
If the calibration points aren't too noisy, I would recommend cubic spline interpolation, for which you can precompute all coefficients. The same coefficients also give you an estimate of the derivative (as the corresponding quadratic, which is continuous).
In principle (unless the points are uniformly spaced), you need a dichotomic search to determine the interval in which the argument lies. But here you will evaluate the functions at nearby values, so a linear search from the previous location should do better.
A different way to address the problem is to consider the bivariate solution surfaces Xr = G(Xm, Ym) and Yr = H(Xm, Ym), which you compute on a grid of points. If the surfaces are smooth enough, a coarse grid suffices.
So, by any method (such as the one above), you precompute the solutions at every grid node, as well as the coefficients of some interpolant in the X and Y directions. Again, I recommend a cubic spline.
Now, to interpolate inside a grid cell, you combine the two univariate interpolants into a bivariate one by means of the Coons formula (https://en.wikipedia.org/wiki/Coons_patch).
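A minimal sketch of the bilinearly blended Coons formula on a single cell, with u, v normalized to [0, 1]; the edge functions c0, c1, d0, d1 and corner values Pab stand for whatever univariate interpolants and node solutions you precomputed:

```python
# Bilinearly blended Coons patch on one grid cell (see the link above):
# blend the four boundary curves, then subtract the bilinear corner term.
def coons(u, v, c0, c1, d0, d1, P00, P01, P10, P11):
    """c0(u), c1(u): interpolants along the bottom/top edges;
    d0(v), d1(v): interpolants along the left/right edges;
    Pab: corner values (must match where the edges meet)."""
    return ((1 - v) * c0(u) + v * c1(u)
            + (1 - u) * d0(v) + u * d1(v)
            - ((1 - u) * (1 - v) * P00 + u * (1 - v) * P10
               + (1 - u) * v * P01 + u * v * P11))
```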

Math - 3d positioning/multilateration

I have a problem involving 3D positioning - sort of like GPS. Given a set of known 3D coordinates (x,y,z) and their distances d from an unknown point, I want to find the unknown point. There can be any number of reference points, but there will be at least four.
So, for example, points are in the format (x,y,z,d). I might have:
(1,0,0,1)
(0,2,0,2)
(0,0,3,3)
(0,3,4,5)
And here the unknown point would be (0,0,0,0).
What would be the best way to go about this? Is there an existing library that supports 3D multilateration? (I have been unable to find one.) Since it's unlikely that my data will have an exact solution (the 4+ spheres probably won't intersect in a single point), the algorithm needs to be able to approximate it.
So far, my idea is to take each subset of three points, trilaterate the unknown from those three, and then average all of the results. Is there a better way to do this?
You could take a non-linear optimisation approach, by defining a "cost" function that incorporates the distance error from each of your observation points.
Setting the unknown point at (x,y,z), and considering a set of N observation points (xi,yi,zi,di), the following function can be used to characterise the total distance error:
C(x,y,z) = sum over i = 1..N of ( (x-xi)^2 + (y-yi)^2 + (z-zi)^2 - di^2 )^2
This is the sum of the squared distance errors for all points in the set. (It's actually based on the error in the squared distance, so that there are no square roots!)
When this function is at a minimum the target point (x,y,z) would be at an optimal position. If the solution gives C(x,y,z) = 0 all observations would be exactly satisfied.
One approach to minimising this type of function is Newton's method. You'd have to provide an initial starting point for the iteration - possibly the mean of the observation points (if they encircle (x,y,z)) or an initial trilaterated value from any three observations.
Edit: Newton's method is an iterative algorithm that can be used for optimisation. A simple version would work along these lines:
H(X(k)) * dX = G(X(k));   // solve a system of linear equations for the
                          // increment dX in the solution vector X
X(k+1) = X(k) - dX;       // update the solution vector by dX
The G(X(k)) denotes the gradient vector evaluated at X(k), in this case:
G(X(k)) = [ dC/dx
            dC/dy
            dC/dz ]
The H(X(k)) denotes the Hessian matrix evaluated at X(k), in this case the symmetric 3x3 matrix:
H(X(k)) = [ d^2C/dx^2   d^2C/dxdy   d^2C/dxdz
            d^2C/dydx   d^2C/dy^2   d^2C/dydz
            d^2C/dzdx   d^2C/dzdy   d^2C/dz^2 ]
You should be able to differentiate the cost function analytically, and therefore end up with analytical expressions for G,H.
Another approach - if you don't like derivatives - is to approximate G,H numerically using finite differences.
Hope this helps.
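If hand-coding Newton's method isn't appealing, the same cost can be handed to a ready-made least-squares solver. A minimal sketch with SciPy, using the example data from the question (least_squares squares and sums the residuals itself, so the residual function returns the inner term of C, not C):

```python
# Minimize C(x,y,z) via scipy.optimize.least_squares on the
# squared-distance residuals, starting from the observation mean.
import numpy as np
from scipy.optimize import least_squares

obs = np.array([(1, 0, 0, 1), (0, 2, 0, 2),
                (0, 0, 3, 3), (0, 3, 4, 5)], dtype=float)
P, d = obs[:, :3], obs[:, 3]

def residuals(p):
    # per-point error in the squared distance (no square roots)
    return ((P - p) ** 2).sum(axis=1) - d ** 2

x0 = P.mean(axis=0)                 # initial guess: mean of observations
sol = least_squares(residuals, x0)
print(sol.x)                        # ~ (0, 0, 0) for the example data
```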
Non-linear solution procedures are not required; you can easily linearise the system. If you take pairwise differences
$(x-x_i)^2-(x-x_j)^2+(y-y_i)^2-(y-y_j)^2+(z-z_i)^2-(z-z_j)^2=d_i^2-d_j^2$
then a bit of algebra yields the linear equations
$(x_i-x_j)\,x + (y_i-y_j)\,y + (z_i-z_j)\,z = -\tfrac{1}{2}\left(d_i^2 - d_j^2 - ds_i^2 + ds_j^2\right)$,
where $ds_i$ is the distance from the $i^{th}$ sensor to the origin. These are the equations of the planes defined by intersecting the $i^{th}$ and $j^{th}$ spheres.
For four sensors you obtain an over-determined linear system of $\binom{4}{2} = 6$ equations. If $A$ is the resulting matrix and $b$ the corresponding vector of right-hand sides, then you can solve the normal equations
$A^T A r = A^T b$
for the position vector $r$. This will work as long as your sensors are not coplanar.
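A minimal sketch of this linearised approach with NumPy, again on the question's example data (lstsq solves the least-squares problem via SVD, which is numerically preferable to forming $A^TA$ explicitly):

```python
# Linearised multilateration: one equation per sensor pair (i, j),
# solved in the least-squares sense.
import numpy as np
from itertools import combinations

obs = np.array([(1, 0, 0, 1), (0, 2, 0, 2),
                (0, 0, 3, 3), (0, 3, 4, 5)], dtype=float)
P, d = obs[:, :3], obs[:, 3]
ds2 = (P ** 2).sum(axis=1)          # squared sensor distances to the origin

rows, rhs = [], []
for i, j in combinations(range(len(P)), 2):
    rows.append(P[i] - P[j])
    rhs.append(0.5 * (ds2[i] - ds2[j] - d[i] ** 2 + d[j] ** 2))

r, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
print(r)                            # ~ (0, 0, 0) for the example data
```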
If you can spend the time, an iterative solution should approach the correct answer fairly quickly. Pick any point at the correct distance from site A, then go round the set: work out the distance from the current point to each site and adjust the point so that it lies in the same direction from that site but at the correct distance. Continue until the required precision is met (or until the point no longer moves far enough per iteration to meet that precision, given approximate input data).
For an analytic approach, I can't think of anything better than what you already propose.

Trilateration of a signal using Time Difference of Arrival

I am having some trouble finding or implementing an algorithm to locate a signal source. The objective of my work is to find the position of a sound emitter.
To accomplish this I am using three microphones. The technique I am using is multilateration, based on the time difference of arrival.
The time differences of arrival between the microphones are found using cross-correlation of the received signals.
I have already implemented the algorithm to find the time differences of arrival, but my problem is with how multilateration itself works; it's unclear to me from my reference, and I couldn't find any other good reference that is free/open.
If you have references on how to implement a multilateration algorithm, or some other trilateration algorithm based on time difference of arrival, that would be a great help.
Thanks in advance.
The point you are looking for is the intersection of three hyperbolas. I am assuming 2D here, since you only use three receivers. Technically you can find a unique 3D solution, but as you likely have noise, I assume that if you had wanted a 3D result you would have used four microphones (or more).
The Wikipedia page does some of the computations for you. They work in 3D; you just have to set z = 0 and solve system of equations (7).
The system is overdetermined, so you will want to solve it in the least-squares sense (that is actually the point of using three receivers).
I can help you with multilateration in general.
Basically, if you want a solution in 3D, you need at least four points and the distances from them: two points give you a circle on which the solution lies (the intersection of two spheres), three points give you two possible solutions (the intersection of three spheres), so to get a single solution you need four spheres. Once you have the points (4+) and the distances to them (there is an easy way to transform the TDOAs into a set of equations in plain length-type distances, not times), you need a way to solve the resulting system. First you need a cost function (or solution-error function, as I call it), something like
err(x,y,z) = sum(i=1..n) | sqrt[(x-xi)^2 + (y-yi)^2 + (z-zi)^2] - di |
where x, y, z are the coordinates of the current point in the numerical solution, and xi, yi, zi and di are the coordinates of, and the distance to, the i-th reference point. To solve this, my advice is NOT to use Gauss-Newton or Newton methods: they need the first and second derivatives of the function above, and those have discontinuities at some points in space - it is not a smooth function, so these methods won't work. What will work is the direct-search family of optimization algorithms (for finding minima and maxima; in our case, the minimum of the error/cost function).
That should help anyone looking to solve a similar problem.
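For example, a minimal direct-search sketch (Nelder-Mead, a derivative-free simplex method, via SciPy). It assumes the TDOAs have already been converted to plain distances d_i, and reuses the sphere data from the previous question for lack of real measurements:

```python
# Derivative-free minimization of the absolute-error cost err(x, y, z).
import numpy as np
from scipy.optimize import minimize

P = np.array([(1.0, 0, 0), (0, 2.0, 0), (0, 0, 3.0), (0, 3.0, 4.0)])
d = np.array([1.0, 2.0, 3.0, 5.0])

def err(p):
    # sum of |distance-to-sensor - measured distance| over all sensors
    return np.abs(np.linalg.norm(P - p, axis=1) - d).sum()

sol = minimize(err, x0=P.mean(axis=0), method="Nelder-Mead")
print(sol.x)                        # ~ (0, 0, 0) for this data
```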

Help me understand linear separability in a binary SVM

I'm cross-posting this from math.stackexchange.com because I'm not getting any feedback and it's a time-sensitive question for me.
My question pertains to linear separability with hyperplanes in a support vector machine.
According to Wikipedia:
...formally, a support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (so-called functional margin), since in general the larger the margin the lower the generalization error of the classifier.
The linear separation of classes by hyperplanes intuitively makes sense to me, and I think I understand linear separability in two-dimensional geometry. However, I'm implementing an SVM using a popular SVM library (libSVM), and when messing around with the numbers I fail to understand how an SVM can create a curve between classes, or enclose central points in category 1 within a circular boundary surrounded by points in category 2, if a hyperplane in an n-dimensional space V is a "flat" subset of dimension n − 1 (for two-dimensional space, a 1D line).
Here is what I mean (the screenshot shows points of category 1 enclosed by a circular boundary, surrounded by points of category 2):
That's not a hyperplane. That's circular. How does this work? Or are there more dimensions inside the SVM than the two input features?
This example application can be downloaded here.
Edit:
Thanks for your comprehensive answers. So the SVM can separate weird data well by using a kernel function. Would it help to linearize the data before sending it to the SVM? For example, one of my input features (a numeric value) has a turning point (e.g. 0) at which it neatly fits into category 1, but above and below zero it fits into category 2. Now, since I know this, would it help classification to send the SVM the absolute value of this feature?
As mokus explained, support vector machines use a kernel function to implicitly map data into a feature space where they are linearly separable. Different kernel functions are used for various kinds of data. Note that the transformation adds an extra dimension (feature), although this feature is never materialized in memory.
(Illustration from Chris Thornton, U. Sussex.)
Check out this YouTube video that illustrates an example of linearly inseparable points that become separable by a plane when mapped to a higher dimension.
I am not intimately familiar with SVMs, but from what I recall from my studies they are often used with a "kernel function" - essentially, a replacement for the standard inner product that effectively non-linearizes the space. It's loosely equivalent to applying a nonlinear transformation from your space into some "working space" where the linear classifier is applied, and then pulling the results back into your original space, where the linear subspaces the classifier works with are no longer linear.
The wikipedia article does mention this in the subsection "Non-linear classification", with a link to http://en.wikipedia.org/wiki/Kernel_trick which explains the technique more generally.
This is done by applying what is known as the Kernel Trick (http://en.wikipedia.org/wiki/Kernel_trick).
Basically, if something is not linearly separable in the existing input space (2-D in your case), it is projected to a higher dimension where it becomes separable. A kernel function (which can be non-linear) is applied to modify your feature space. All computations are then performed in this feature space, which can even be infinite-dimensional.
Each point in your input is transformed using this kernel function, and all further computations are performed as if this were your original input space. Thus, your points may be separable in a higher dimension (possibly infinite), and the hyperplane that is linear in the higher dimensions need not be linear in the original ones.
For a simple example, consider XOR. If you plot Input1 on the X-axis and Input2 on the Y-axis, the output classes are:
Class 0: (0,0), (1,1)
Class 1: (0,1), (1,0)
As you can observe, it's not linearly separable in 2-D. But if you take these ordered pairs into 3-D (by just moving one point in 3-D), say:
Class 0: (0,0,1), (1,1,0)
Class 1: (0,1,0), (1,0,0)
Now you can easily observe that there is a plane in 3-D separating these two classes linearly.
Thus, if you project your inputs into a sufficiently high dimension (possibly infinite), you'll be able to separate your classes linearly in that dimension.
One important point to notice here (and maybe this answers your other question too) is that you don't have to construct the mapping yourself (like I did above). The good thing is that the kernel function automatically takes care of your input and figures out how to "linearize" it.
For the SVM example in the question, given the 2-D space, let x1 and x2 be the two axes. You can define a transformation F = x1^2 + x2^2 and turn this into a 1-D problem. If you look carefully, you can see that in the transformed space the points are easily linearly separable (by thresholds on the F axis). Here the transformed space was [F] (one-dimensional). In most cases, you would instead be increasing the dimensionality to obtain linearly separable hyperplanes.
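A minimal numeric check of that F = x1^2 + x2^2 transform (NumPy; the circular data is synthetic):

```python
# Points inside/outside a circle are not linearly separable in 2-D,
# but a single threshold on F = x1^2 + x2^2 separates them in 1-D.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
labels = X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0   # class 1: inside the unit circle

F = X[:, 0] ** 2 + X[:, 1] ** 2              # transformed 1-D feature
print((labels == (F < 1.0)).all())           # True: the threshold separates
```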
SVM clustering
HTH
My answer to a previous question might shed some light on what is happening in this case. The example I give is very contrived and not really what happens in an SVM, but it should give you some intuition.
