I'm currently trying to create an SPH fluid simulator. To get started I've tried to implement the paper from Mรผller. So the whole algorithm is based on calculating three different forces (pressure, viscosity and surface tension).
The pressure-force can be calculated with equation 9 where the derivitive of the kernel function is the partial derivitive whit respect to r_{i,x} r_{i,y} r_{i,z}. So we get a three dimensional vector out of it.
But for viscosity and surface tension we need the second derivitive of W which should be a three dimensional vector too but equation 14 and 19 expect a scalar?
Anyone got a hint for me?
I cannot see any major problems in equations 14, and 19 (but I do not claim to understand the paper too thoroughly). Could it be that the notation has just lead you astray?
The kernel function W(r) is a scalar field (vector parameter, scalar result). If we take its gradient ๐W, we get a vector field. However, if we take the Laplacian (๐ยฒ) of W, it is the same as calculating the divergence of the vector field, i.e., ๐ยท๐W. This, in turn, gives a scalar field by the definition of divergence.
So, with this in mind it seems that both equations 9 and 14 look reasonably sane.
The key takeaway is the Laplacian maps the divergence of the field it is applied to.
This divergence involves a dot product, which describes how aligned vectors are in the field. Thus the laplacian operator describes sum of second order partial derivatives of the field. The components summed together.
The kernel only needs to apply a second order differential because the "gradient" dealing with viscosity, which are the relative velocity vectors, is given.
Laplacian
Related
The following is the plot of a curve f(r), where r is the radial coordinate, and plotted for different values of a parameter as shown:
However, I don't know the functional form of the curve and I am interested to find the same. Are there any numerical methods which can be used to find the functional form of f(r) in terms of the radial coordinate and the parameter?
I had found a solution of the problem based on the suggestion by ja72 to use the Eureqa software which churns through the data to create accurate predictive models using evolutionary search algorithm.
In the question, the different curves corresponds to different values of . So, initially I obtained the best fit equation for different values of and found that the following model equation is suitable for my purpose:
Then, I repeated the process for a large number of values of and calculated the values of the four functions for different values of and then individually fitted these four functions. The following are the results that I obtained:
N.B.: Eureqa gave several other better fitting formulas than those mentioned in the answer. But the formulas that I mentioned are sufficiently accurate for my purpose and have minimum complexity.
A blind curve fit without an underlying model is a dangerous thing.
You need to have an understanding of the physical model behind the data to create a successful fit. The reason is that if r is distance and the best fit curve uses r^0.4072 for example, that dimension raised to a decimal power bears no meaning and it hides any underlying assumptions.Like some other dimension l not included in the model, whereas only the dimensionless quantity (r/l) would make sense to raise to the decimal power.
From a function analysis standpoint
These curves are not the result of any standard math function. Well I am not that familiar with bessel functions, gamma functions and legendre polynomials. But none of the standard functions you find in a scientific calculator jumps out here.
If r is assumed to be dimensionless, then you try to match the asymptotic behavior when r -> 0 and when r -> โ. The would be the baseline curve. To me it does not look hyperbolic, but rather close to 1/LN(1+r).
So change the variables make g=1/LN(1+r) and plot f(r) against g(r) and see what that looks like. Then try another round of curve fitting in the new curves ... and so on.
Nobody can answer this question
Nobody else could effectively answer this question but you, because a) you have the data, and b) you need to make assumptions about what region is important or not, and what is acceptable deviation.
I've been searching everywhere and I've only found how to create a covariance matrix from one vector to another vector, like cov(xi, xj). One thing I'm confused about is, how to get a covariance matrix from a cluster. Each cluster has many vectors. how to get them into one covariance matrix. Any suggestions??
info :
input : vectors in a cluster, Xi = (x0,x1,...,xt), x0 = { 5 1 2 3 4} --> a column vector
(actually it's an MFCC feature vector which has 12 coefficients per vector, after clustering them with k-means, 8 cluster, now i want to get the covariance matrix for each cluster to use it as the covariance matrix in Gaussian Mixture Model)
output : covariance matrix n x n
The question you are asking is: Given a set of N points of dimension D (e.g. the points you initially clustered as "speaker1"), fit a D-dimensional gaussian to those points (which we will call "the gaussian which represents speaker1"). To do so, merely calculate the sample mean and sample covariance: http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Estimation_of_parameters or http://en.wikipedia.org/wiki/Sample_mean_and_covariance
Repeat for the other k=8 speakers. I believe you may be able to use a "non-parametric" stochastic process, or modify the algorithm (e.g. run it a few times on many speakers), to remove your assumption of k=8 speakers. Note that the standard k-means clustering algorithms (and other common algorithms like EM) are very fickle in that they will give you different answers depending on how you initialize, so you may wish to perform appropriate regularization to penalize "bad" solutions as you discover them.
(below is my answer before you clarified your question)
covariance is a property of two random variables, which is a rough measure of how much changing one affects the other
a covariance matrix is merely a representation for the NxM separate covariances, cov(x_i,y_j), each element from the set X=(x1,x2,...,xN) and Y=(y1,y2,...,yN)
So the question boils down to, what you are actually trying to do with this "covariance matrix" you are searching for? Mel-Frequency Cepstral Coefficients... does each coefficient correspond to each note of an octave? You have chosen k=12 as the number of clusters you'd like? Are you basically trying to pick out notes in music?
I'm not sure how covariance generalizes to vectors, but I would guess that the covariance between two vectors x and y is just E[x dot y] - (E[x] dot E[y]) (basically replace multiplication with dot product) which would give you a scalar, one scalar per element of your covariance matrix. Then you would just stick this process inside two for-loops.
Or perhaps you could find the covariance matrix for each dimension separately. Without knowing exactly what you're doing though, one cannot give further advice than that.
I'm cross-posting this from math.stackexchange.com because I'm not getting any feedback and it's a time-sensitive question for me.
My question pertains to linear separability with hyperplanes in a support vector machine.
According to Wikipedia:
...formally, a support vector machine
constructs a hyperplane or set of
hyperplanes in a high or infinite
dimensional space, which can be used
for classification, regression or
other tasks. Intuitively, a good
separation is achieved by the
hyperplane that has the largest
distance to the nearest training data
points of any class (so-called
functional margin), since in general
the larger the margin the lower the
generalization error of the
classifier.classifier.
The linear separation of classes by hyperplanes intuitively makes sense to me. And I think I understand linear separability for two-dimensional geometry. However, I'm implementing an SVM using a popular SVM library (libSVM) and when messing around with the numbers, I fail to understand how an SVM can create a curve between classes, or enclose central points in category 1 within a circular curve when surrounded by points in category 2 if a hyperplane in an n-dimensional space V is a "flat" subset of dimension n โ 1, or for two-dimensional space - a 1D line.
Here is what I mean:
That's not a hyperplane. That's circular. How does this work? Or are there more dimensions inside the SVM than the two-dimensional 2D input features?
This example application can be downloaded here.
Edit:
Thanks for your comprehensive answers. So the SVM can separate weird data well by using a kernel function. Would it help to linearize the data before sending it to the SVM? For example, one of my input features (a numeric value) has a turning point (eg. 0) where it neatly fits into category 1, but above and below zero it fits into category 2. Now, because I know this, would it help classification to send the absolute value of this feature for the SVM?
As mokus explained, support vector machines use a kernel function to implicitly map data into a feature space where they are linearly separable:
Different kernel functions are used for various kinds of data. Note that an extra dimension (feature) is added by the transformation in the picture, although this feature is never materialized in memory.
(Illustration from Chris Thornton, U. Sussex.)
Check out this YouTube video that illustrates an example of linearly inseparable points that become separable by a plane when mapped to a higher dimension.
I am not intimately familiar with SVMs, but from what I recall from my studies they are often used with a "kernel function" - essentially, a replacement for the standard inner product that effectively non-linearizes the space. It's loosely equivalent to applying a nonlinear transformation from your space into some "working space" where the linear classifier is applied, and then pulling the results back into your original space, where the linear subspaces the classifier works with are no longer linear.
The wikipedia article does mention this in the subsection "Non-linear classification", with a link to http://en.wikipedia.org/wiki/Kernel_trick which explains the technique more generally.
This is done by applying what is know as a [Kernel Trick] (http://en.wikipedia.org/wiki/Kernel_trick)
What basically is done is that if something is not linearly separable in the existing input space ( 2-D in your case), it is projected to a higher dimension where this would be separable. A kernel function ( can be non-linear) is applied to modify your feature space. All computations are then performed in this feature space (which can be possibly of infinite dimensions too).
Each point in your input is transformed using this kernel function, and all further computations are performed as if this was your original input space. Thus, your points may be separable in a higher dimension (possibly infinite) and thus the linear hyperplane in higher dimensions might not be linear in the original dimensions.
For a simple example, consider the example of XOR. If you plot Input1 on X-Axis, and Input2 on Y-Axis, then the output classes will be:
Class 0: (0,0), (1,1)
Class 1: (0,1), (1,0)
As you can observe, its not linearly seperable in 2-D. But if I take these ordered pairs in 3-D, (by just moving 1 point in 3-D) say:
Class 0: (0,0,1), (1,1,0)
Class 1: (0,1,0), (1,0,0)
Now you can easily observe that there is a plane in 3-D to separate these two classes linearly.
Thus if you project your inputs to a sufficiently large dimension (possibly infinite), then you'll be able to separate your classes linearly in that dimension.
One important point to notice here (and maybe I'll answer your other question too) is that you don't have to make a kernel function yourself (like I made one above). The good thing is that the kernel function automatically takes care of your input and figures out how to "linearize" it.
For the SVM example in the question given in 2-D space let x1, x2 be the two axes. You can have a transformation function F = x1^2 + x2^2 and transform this problem into a 1-D space problem. If you notice carefully you could see that in the transformed space, you can easily linearly separate the points(thresholds on F axis). Here the transformed space was [ F ] ( 1 dimensional ) . In most cases , you would be increasing the dimensionality to get linearly separable hyperplanes.
SVM clustering
HTH
My answer to a previous question might shed some light on what is happening in this case. The example I give is very contrived and not really what happens in an SVM, but it should give you come intuition.
I'm trying to develop a level surface visualizer using this method (don't know if this is the standard method or if there's something better):
1. Take any function f(x,y,z)=k (where k is constant), and bounds for x, y, and z. Also take in two grid parameters stepX and stepZ.
2. to reduce to a level curve problem, iterate from zMin to zMax with stepZ intervals. So f(x,y,z)=k => f(x,y,fixedZ)=k
3. Do the same procedure with stepX, reducing the problem to f(fixedX, y, fixedZ)=k
4. Solve f(fixedX, y, fixedZ) - k = 0 for all values of y which will satisfy that equation (using some kind of a root finding algorithm).
5. For all points generated, plot those as a level curve (the inner loop generates level curves at a given z, then for different z values there are just stacks of level curves)
6 (optional). Generate a mesh from these level curves/points which belong to the level set.
The problem I'm running into is with step 4. I have no way of knowing before-hand how many possible values of y will satisfy that equation (more specifically, how many unique and real values of y).
Also, I'm trying to keep the program as general as possible so I'm trying to not limit the original function f(x,y,z)=k to any constraints such as smoothness or polynomial other than k must be constant as required for a level surface.
Is there an algorithm (without using a CAS/symbolic solving) which can identify the root(s) of a function even if it has multiple roots? I know that bisection methods have a hard time with this because of the possibility of no sign changes over the region, but how does the secant/newtons method fare? What set of functions can the secant/newtons method be used on, and can it detect and find all unique real roots within two given bounds? Or is there a better method for generating/visualizing level surfaces?
I think I've found the solution to my problem. I did a little bit more research and discovered that level surface is synonymous with isosurface. So in theory something like a marching cubes method should work.
In case you're in need of an example of the Marching Cubes algorithm, check out
http://stemkoski.github.com/Three.js/Marching-Cubes.html
(uses JavaScript/Three.js for the graphics).
For more details on the theory, you should check out the article at
http://paulbourke.net/geometry/polygonise/
A simple way,
2D: plot (x,y) with color = floor(q*f(x,y)) in grayscale where q is some arbitrary factor.
3D: plot (x,y, floor(q*f(x,y))
Effectively heights of the function that are equivalent will be representing on the same level surface.
If you to get the level curves you can use the 2D method and edge detection/region categorization to get the points (x,y) on the same level.
The kernel trick maps a non-linear problem into a linear problem.
My questions are:
1. What is the main difference between a linear and a non-linear problem? What is the intuition behind the difference of these two classes of problem? And How does kernel trick helps use the linear classifiers on a non-linear problem?
2. Why is the dot product so important in the two cases?
Thanks.
When people say linear problem with respect to a classification problem, they usually mean linearly separable problem. Linearly separable means that there is some function that can separate the two classes that is a linear combination of the input variable. For example, if you have two input variables, x1 and x2, there are some numbers theta1 and theta2 such that the function theta1.x1 + theta2.x2 will be sufficient to predict the output. In two dimensions this corresponds to a straight line, in 3D it becomes a plane and in higher dimensional spaces it becomes a hyperplane.
You can get some kind of intuition about these concepts by thinking about points and lines in 2D/3D. Here's a very contrived pair of examples...
This is a plot of a linearly inseparable problem. There is no straight line that can separate the red and blue points.
However, if we give each point an extra coordinate (specifically 1 - sqrt(x*x + y*y)... I told you it was contrived), then the problem becomes linearly separable since the red and blue points can be separated by a 2-dimensional plane going through z=0.
Hopefully, these examples demonstrate part of the idea behind the kernel trick:
Mapping a problem into a space with a larger number of dimensions makes it more likely that the problem will become linearly separable.
The second idea behind the kernel trick (and the reason why it is so tricky) is that it is usually very awkward and computationally expensive to work in a very high-dimensional space. However, if an algorithm only uses the dot products between points (which you can think of as distances), then you only have to work with a matrix of scalars. You can implicitly perform the calculations in the higher-dimensional space without ever actually having to do the mapping or handle the higher-dimensional data.
Many classifiers, among them the linear Support Vector Machine (SVM), can only solve problems that are linearly separable, i.e. where the points belonging to class 1 can be separated from the points belonging to class 2 by a hyperplane.
In many cases, a problem that is not linearly separable can be solved by applying a transform phi() to the data points; this transform is said to transform the points to feature space. The hope is that, in feature space, the points will be linearly separable. (Note: This is not the kernel trick yet... stay tuned.)
It can be shown that, the higher the dimension of the feature space, the greater the number of problems that are linearly separable in that space. Therefore, one would ideally want the feature space to be as high-dimensional as possible.
Unfortunately, as the dimension of feature space increases, so does the amount of computation required. This is where the kernel trick comes in. Many machine learning algorithms (among them the SVM) can be formulated in such a way that the only operation they perform on the data points is a scalar product between two data points. (I will denote a scalar product between x1 and x2 by <x1, x2>.)
If we transform our points to feature space, the scalar product now looks like this:
<phi(x1), phi(x2)>
The key insight is that there exists a class of functions called kernels that can be used to optimize the computation of this scalar product. A kernel is a function K(x1, x2) that has the property that
K(x1, x2) = <phi(x1), phi(x2)>
for some function phi(). In other words: We can evaluate the scalar product in the low-dimensional data space (where x1 and x2 "live") without having to transform to the high-dimensional feature space (where phi(x1) and phi(x2) "live") -- but we still get the benefits of transforming to the high-dimensional feature space. This is called the kernel trick.
Many popular kernels, such as the Gaussian kernel, actually correspond to a transform phi() that transforms into an infinte-dimensional feature space. The kernel trick allows us to compute scalar products in this space without having to represent points in this space explicitly (which, obviously, is impossible on computers with finite amounts of memory).
The main difference (for practical purposes) is: A linear problem either does have a solution (and then it's easily found), or you get a definite answer that there is no solution at all. You do know this much, before you even know the problem at all. As long as it's linear, you'll get an answer; quickly.
The intuition beheind this is the fact that if you have two straight lines in some space, it's pretty easy to see whether they intersect or not, and if they do, it's easy to know where.
If the problem is not linear -- well, it can be anything, and you know just about nothing.
The dot product of two vectors just means the following: The sum of the products of the corresponding elements. So if your problem is
c1 * x1 + c2 * x2 + c3 * x3 = 0
(where you usually know the coefficients c, and you're looking for the variables x), the left hand side is the dot product of the vectors (c1,c2,c3) and (x1,x2,x3).
The above equation is (pretty much) the very defintion of a linear problem, so there's your connection between the dot product and linear problems.
Linear equations are homogeneous, and superposition applies. You can create solutions using combinations of other known solutions; this is one reason why Fourier transforms work so well. Non-linear equations are not homogeneous, and superposition does not apply. Non-linear equations usually have to be solved numerically using iterative, incremental techniques.
I'm not sure how to express the importance of the dot product, but it does take two vectors and returns a scalar. Certainly a solution to a scalar equation is less work than solving a vector or higher-order tensor equation, simply because there are fewer components to deal with.
My intuition in this matter is based more on physics, so I'm having a hard time translating to AI.
I think following link also useful ...
http://www.simafore.com/blog/bid/113227/How-support-vector-machines-use-kernel-functions-to-classify-data