I have m points which I wish to distribute uniformly in n-dimensional space. By "uniformly" I mean that all shortest-distance pairs have similar values.
In other words, I would like the points to fill the space as evenly as possible.
Please, does anyone know how to achieve this? Does this problem have a name?
Edit:
For example, with 4 points in a 2D plane the coordinates should be [0, 1], [1, 0], [0, -1], [-1, 0]: just a square. In 3D it's a cube. But I'm not sure what to do when the point count is something other than 2^n.
Another way of thinking about it is to treat the points as charged particles which repel each other. But running such a simulation is very slow...
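For reference, here is a minimal sketch of that repulsion idea, assuming the points are confined to the unit hypercube and a simple inverse-square force; all names and parameters are illustrative:

```python
import numpy as np

def repel(points, steps=500, step_size=0.01, eps=1e-9):
    """Iteratively push points apart with an inverse-square repulsion,
    clamping them to the unit hypercube after every step."""
    p = np.array(points, dtype=float)
    for _ in range(steps):
        diff = p[:, None, :] - p[None, :, :]          # pairwise difference vectors
        dist = np.linalg.norm(diff, axis=-1) + eps    # pairwise distances
        np.fill_diagonal(dist, np.inf)                # ignore self-interaction
        force = (diff / dist[..., None] ** 3).sum(axis=1)
        p += step_size * force
        p = np.clip(p, 0.0, 1.0)                      # stay inside the cube
    return p

# 10 points in 3D, started at random positions
print(repel(np.random.rand(10, 3)))
```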
I believe you might be interested in low discrepancy sequences. These are used as a deterministic analog to the uniform distribution described in n.m.'s comment. They're often used in so-called "quasi-Monte Carlo" algorithms, where instead of sampling randomly one uses some kind of grid of points distributed more or less evenly over the domain.
Such sequences of points do not necessarily satisfy the condition you gave that "all shortest-distance-pairs have similar values," but I interpreted this more as an attempt at description rather than a hard requirement of the problem. If it's really important then this likely does not solve your problem.
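As a sketch of what such a sequence looks like in code, here is a basic Halton sequence generator (one of the simplest low discrepancy sequences; the function names and the choice of prime bases are just for illustration):

```python
def halton(index, base):
    """Return the index-th element of the Halton sequence for a given base."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

def halton_points(m, dims, bases=(2, 3, 5, 7, 11, 13)):
    """Generate m quasi-random points in the unit cube of the given dimension.
    The first few primes are enough bases for low-dimensional use."""
    return [[halton(i + 1, bases[d]) for d in range(dims)] for i in range(m)]

# 5 points spread over the unit square
print(halton_points(5, 2))
```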
I think you probably want to look into Sphere Packing.
Here's another idea (it's not perfect, but I don't think anything here is, and you may need to choose based on the details of your particular case): use binary space partitioning (more info here).
The general idea is that you take your n-dimensional space and split it into two using an (n-1)-dimensional surface. Then you split those two new spaces, and so on. If you choose your surfaces carefully (so that they divide into approximately equal volumes and avoid funny shapes, for some definition of funny), then you can see that this will be an approximation to what you're asking.
The main advantage of this approach is that it's typically very fast (it's used in video games and spatial simulations). It's not going to be as fast (or as uniform) as low discrepancy sequences (which sound really cool), but I imagine it would work inside arbitrary convex hulls.
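A rough sketch of that idea, assuming an axis-aligned bounding box split recursively along its longest axis until there is one cell per point; the helper names are hypothetical:

```python
import numpy as np

def split_boxes(low, high, m):
    """Recursively halve the box along its longest axis until m cells remain,
    then return the center of each cell as a point."""
    boxes = [(np.asarray(low, float), np.asarray(high, float))]
    while len(boxes) < m:
        # split the box with the largest longest edge
        i = max(range(len(boxes)), key=lambda k: (boxes[k][1] - boxes[k][0]).max())
        lo, hi = boxes.pop(i)
        axis = int(np.argmax(hi - lo))
        mid = 0.5 * (lo[axis] + hi[axis])
        hi_left, lo_right = hi.copy(), lo.copy()
        hi_left[axis], lo_right[axis] = mid, mid
        boxes += [(lo, hi_left), (lo_right, hi)]
    return [0.5 * (lo + hi) for lo, hi in boxes]

# 5 points spread over the unit square
print(split_boxes([0, 0], [1, 1], 5))
```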
Related
I have a set of angles. The distribution could be roughly described as:
there are usually several values very close (0.0-1.0 degrees apart) to the correct solution
there are also noisy values that are very far from the correct result, even in the opposite direction
Is there a common solution/strategy for such a problem?
For multidimensional data, I would use RANSAC - but I have the impression that it is unusual to apply RANSAC to 1-dimensional data. Another problem is computing the mean of an angle. I read some other posts about how to calculate the mean of angles using vectors, but I wonder if there isn't a particular fitting solution which deals with both issues already.
You can use RANSAC even in this case; all the necessary conditions (minimal samples, error of a data point, consensus set) are met. Your minimal sample will be 1 point, a randomly picked angle (although you could try all of them, which might be fast enough). Then all the angles (data points) with error (you can use just the absolute distance, modulo 360) less than some threshold (e.g. 1 degree) will be considered inliers, i.e. part of the consensus set.
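A minimal sketch of this, assuming angles in degrees and trying every input angle as a candidate model; the circular mean of the winning inlier set is used as the final estimate, and all names are illustrative:

```python
import numpy as np

def angular_error(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def ransac_angle(angles, threshold=1.0):
    """Pick the candidate angle with the largest consensus set, then return
    the circular mean of its inliers."""
    best_inliers = []
    for candidate in angles:
        inliers = [a for a in angles if angular_error(a, candidate) < threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    rad = np.radians(best_inliers)
    return np.degrees(np.arctan2(np.sin(rad).mean(), np.cos(rad).mean())) % 360.0

print(ransac_angle([10.2, 9.8, 10.5, 185.0, 270.0, 10.1]))   # ~10.15
```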
If you want to play with it a bit more, you can make the results more stable by adding some local optimisation, see e.g.:
Lebeda, Matas, Chum: Fixing the Locally Optimized RANSAC, BMVC 2012.
You could try other approaches, e.g. the median, or fitting a mixture of a Gaussian and a uniform distribution, but you would have to deal with the periodicity of the signal somehow, so I guess RANSAC should be your choice.
I have an n-by-n symmetric matrix F of non-negative integers: F[i,j] is a measure of how close the guys i and j are. I want to locate n points in the plane representing the n guys in such a way that
two guys which are close are represented by points which are close and,
ideally, two guys which are not that close and which are not even connected by a chain of close friendships are represented by points which are far away.
Is there a standard algorithm to do this?
What you're describing is generically referred to as multi-dimensional scaling (MDS) or Principal Coordinates Analysis (PCA--but be aware that there are other techniques also known as PCA).
There are quite a few well-known algorithms for carrying out MDS. That's mostly because the classical methods are pretty slow -- O(N²). Most others are attempts at reducing the run-time while minimizing loss of accuracy.
At least in my experience, Landmark multidimensional scaling (LMDS) maintains pretty close to the accuracy of full MDS, but reduces run-time substantially. The basic idea here is to compute MDS on sub-groups of the points, then compute a way to fit the pieces together.
If you really want maximum speed, and don't care a whole lot about accuracy, you could consider the FastMap algorithm.
For what it's worth: what I've generally found most useful is to reduce the raw data to around 17-21 degrees of freedom using LMDS, then (if you want to display the results) reduce from there to 3 or 2 dimensions using FastMap. I haven't used the full MDS much, but if you're working with few enough points for it to be practical, it's generally the preferred solution.
Here are a few relevant links:
MDS
LMDS
FastMap
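For reference, the classical MDS computation mentioned above is short enough to sketch with numpy. This assumes the closeness matrix F is first converted to a dissimilarity matrix; the D = F.max() - F conversion below is an assumption for illustration, not part of the algorithm:

```python
import numpy as np

def classical_mds(D, dims=2):
    """Classical MDS: double-center the squared distances and use the
    top eigenvectors as coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dims]      # keep the largest ones
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Hypothetical closeness matrix for 4 "guys" (larger = closer)
F = np.array([[0, 9, 8, 1],
              [9, 0, 7, 2],
              [8, 7, 0, 1],
              [1, 2, 1, 0]], dtype=float)
D = F.max() - F                              # crude closeness-to-distance conversion
np.fill_diagonal(D, 0.0)
print(classical_mds(D))                      # 2D coordinates for the 4 guys
```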
I'm trying to implement an SVM algorithm, but I'm having a hard time understanding how d-dimensional data sets are actually handled. In my particular case, each 'point' has nearly 400 identifying features.
In the two dimensional space, it basically tries to find a line between the two groups that maximizes the margin from any point on either side. I can sort of imagine what such a 'line' would look like in a d-dimensional space, but I'm completely lost on how the classification would actually work.
There is a similar question here, but I'm not getting it. I sort of get how the separation would occur after you have the classifier, but I'm lost on how to actually get the classifier.
If you can imagine how the line of the 2D case would become a d-dimensional hyperplane for higher dimensions, then you are pretty much done. The actual classification occurs when you test a point over the hyperplane, which will give you a positive number if the point belongs to class 1 or negative if it belongs to class 2.
Notice that in the formula there is no restriction on the dimension of each point:
[Image courtesy of wikipedia]
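As a sketch, the decision rule is just the sign of a dot product plus an offset, and nothing in it depends on the number of dimensions; the values of w, b and the test point below are made up:

```python
import numpy as np

def classify(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane w.x + b = 0
    the point x falls on; works for any number of dimensions."""
    return 1 if np.dot(w, x) + b >= 0 else -1

w = np.random.randn(400)        # a learned normal vector, e.g. for 400 features
b = 0.5                         # learned bias
x = np.random.randn(400)        # one data point with 400 features
print(classify(w, b, x))
```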
And in case you are curious about what happens in the non-linear case when you use the kernel trick, I would like to share a video that illustrates the idea very well.
http://www.youtube.com/watch?v=3liCbRZPrZA
I'm displaying a small Google map on a web page using the Google Maps Static API.
I have a set of 15 co-ordinates, which I'd like to represent as points on the map.
Due to the map being fairly small (184 x 90 pixels) and the upper limit of 2000 characters on a Google Maps URL, I can't represent every point on the map.
So instead I'd like to generate a small list of co-ordinates that represents an average of the big list.
So instead of having 15 sets, I'd end up with 5 sets whose positions approximate the positions of the original 15. Say there are 3 points that are closer to each other than to any other point on the map; those points would be collapsed into 1 point.
So I guess I'm looking for an algorithm that can do this.
Not asking anyone to spell out every step, but perhaps point me in the direction of a mathematical principle or general-purpose function for this kind of thing?
I'm sure a similar function is used in, say, graphics software, when pixellating an image.
(If I solve this I'll be sure to post my results.)
I recommend K-means clustering when you need to cluster N objects into a known number K < N of clusters, which seems to be your case. Note that one cluster may end up with a single outlier point and another with say 5 points very close to each other: that's OK, it will look closer to your original set than if you forced exactly 3 points into every cluster!-)
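A minimal sketch of this with scikit-learn, assuming the 15 coordinates cover a small enough area that treating latitude/longitude as plain 2D points is acceptable; the coordinates below are made up:

```python
import numpy as np
from sklearn.cluster import KMeans

# 15 hypothetical (lat, lng) pairs
coords = np.random.uniform([51.4, -0.2], [51.6, 0.1], size=(15, 2))

# Collapse them into 5 representative points
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(coords)
print(kmeans.cluster_centers_)   # the 5 averaged positions to plot
print(kmeans.labels_)            # which cluster each original point belongs to
```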
If you are searching for such functions/classes, have a look at MarkerClusterer and MarkerManager utility classes. MarkerClusterer closely matches the described functionality, as seen in this demo.
In general I think the area you need to look into is "Vector Quantization". I've got an old book titled Vector Quantization and Signal Compression by Allen Gersho and Robert M. Gray which provides a bunch of examples.
From memory, the Lloyd iteration is a good algorithm for this sort of thing. It takes the input set and reduces it to a fixed-size set of points. Basically: distribute your output points uniformly or randomly around the space, and map each of your inputs to the nearest quantized point. Then compute the error (e.g. sum of distances or root mean square). Then, for each output point, set it to the center of the set of inputs that maps to it. This will move the point and possibly even change the set that maps to it. Repeat until no changes are detected from one iteration to the next.
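A from-scratch sketch of that iteration, under the same assumptions (plain Euclidean distance, random initialisation from the input points; the names are illustrative):

```python
import numpy as np

def lloyd(points, k, iters=100, seed=0):
    """Reduce a set of points to k representatives by Lloyd iteration:
    assign each point to its nearest center, then move each center to the
    mean of its assigned points, until the assignment stops changing."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    assignment = None
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
        new_assignment = dists.argmin(axis=1)
        if assignment is not None and np.array_equal(assignment, new_assignment):
            break                                  # converged: nothing changed
        assignment = new_assignment
        for j in range(k):
            members = points[assignment == j]
            if len(members):                       # leave empty clusters where they are
                centers[j] = members.mean(axis=0)
    return centers, assignment

centers, labels = lloyd(np.random.rand(15, 2), k=5)
print(centers)
```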
Hope this helps.
The kernel trick maps a non-linear problem into a linear problem.
My questions are:
1. What is the main difference between a linear and a non-linear problem? What is the intuition behind the difference between these two classes of problems? And how does the kernel trick help us use linear classifiers on a non-linear problem?
2. Why is the dot product so important in the two cases?
Thanks.
When people say linear problem with respect to a classification problem, they usually mean a linearly separable problem. Linearly separable means that there is some function that can separate the two classes that is a linear combination of the input variables. For example, if you have two input variables, x1 and x2, there are some numbers theta1 and theta2 such that the function theta1*x1 + theta2*x2 will be sufficient to predict the output. In two dimensions this corresponds to a straight line, in 3D it becomes a plane, and in higher-dimensional spaces it becomes a hyperplane.
You can get some kind of intuition about these concepts by thinking about points and lines in 2D/3D. Here's a very contrived pair of examples...
This is a plot of a linearly inseparable problem. There is no straight line that can separate the red and blue points.
However, if we give each point an extra coordinate (specifically 1 - sqrt(x*x + y*y)... I told you it was contrived), then the problem becomes linearly separable since the red and blue points can be separated by a 2-dimensional plane going through z=0.
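Here is a small sketch of that lifting step, assuming (as made-up data) that the blue points lie inside the unit circle and the red points in a ring around it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Blue points near the origin, red points in a ring around them
blue = rng.normal(scale=0.4, size=(50, 2))
red = rng.normal(size=(50, 2))
red = 1.5 * red / np.linalg.norm(red, axis=1, keepdims=True)

def lift(p):
    """Add the contrived third coordinate z = 1 - sqrt(x^2 + y^2)."""
    return np.column_stack([p, 1.0 - np.linalg.norm(p, axis=1)])

# In 3D the plane z = 0 separates (most of) the two classes
print((lift(blue)[:, 2] > 0).mean())   # fraction of blue points with z > 0
print((lift(red)[:, 2] < 0).mean())    # fraction of red points with z < 0
```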
Hopefully, these examples demonstrate part of the idea behind the kernel trick:
Mapping a problem into a space with a larger number of dimensions makes it more likely that the problem will become linearly separable.
The second idea behind the kernel trick (and the reason why it is so tricky) is that it is usually very awkward and computationally expensive to work in a very high-dimensional space. However, if an algorithm only uses the dot products between points (which you can think of as distances), then you only have to work with a matrix of scalars. You can implicitly perform the calculations in the higher-dimensional space without ever actually having to do the mapping or handle the higher-dimensional data.
Many classifiers, among them the linear Support Vector Machine (SVM), can only solve problems that are linearly separable, i.e. where the points belonging to class 1 can be separated from the points belonging to class 2 by a hyperplane.
In many cases, a problem that is not linearly separable can be solved by applying a transform phi() to the data points; this transform is said to transform the points to feature space. The hope is that, in feature space, the points will be linearly separable. (Note: This is not the kernel trick yet... stay tuned.)
It can be shown that, the higher the dimension of the feature space, the greater the number of problems that are linearly separable in that space. Therefore, one would ideally want the feature space to be as high-dimensional as possible.
Unfortunately, as the dimension of feature space increases, so does the amount of computation required. This is where the kernel trick comes in. Many machine learning algorithms (among them the SVM) can be formulated in such a way that the only operation they perform on the data points is a scalar product between two data points. (I will denote a scalar product between x1 and x2 by <x1, x2>.)
If we transform our points to feature space, the scalar product now looks like this:
<phi(x1), phi(x2)>
The key insight is that there exists a class of functions called kernels that can be used to optimize the computation of this scalar product. A kernel is a function K(x1, x2) that has the property that
K(x1, x2) = <phi(x1), phi(x2)>
for some function phi(). In other words: We can evaluate the scalar product in the low-dimensional data space (where x1 and x2 "live") without having to transform to the high-dimensional feature space (where phi(x1) and phi(x2) "live") -- but we still get the benefits of transforming to the high-dimensional feature space. This is called the kernel trick.
Many popular kernels, such as the Gaussian kernel, actually correspond to a transform phi() that transforms into an infinite-dimensional feature space. The kernel trick allows us to compute scalar products in this space without having to represent points in this space explicitly (which, obviously, is impossible on computers with finite amounts of memory).
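A tiny sketch that makes the identity K(x1, x2) = <phi(x1), phi(x2)> concrete for the 2D homogeneous polynomial kernel K(x1, x2) = (<x1, x2>)^2, whose explicit feature map phi is known in closed form; the example values are made up:

```python
import numpy as np

def kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: K(x, y) = (<x, y>)^2."""
    return np.dot(x, y) ** 2

def phi(x):
    """Explicit feature map for that kernel (2D input -> 3D feature space)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, -1.0])

print(kernel(x1, x2))                # evaluated in the low-dimensional data space
print(np.dot(phi(x1), phi(x2)))      # same number, via the feature space
```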
The main difference (for practical purposes) is: a linear problem either has a solution (and then it's easily found), or you get a definite answer that there is no solution at all. You know this much before you even look at the specific problem. As long as it's linear, you'll get an answer, and quickly.
The intuition behind this is the fact that if you have two straight lines in some space, it's pretty easy to see whether they intersect or not, and if they do, it's easy to know where.
If the problem is not linear -- well, it can be anything, and you know just about nothing.
The dot product of two vectors just means the following: The sum of the products of the corresponding elements. So if your problem is
c1 * x1 + c2 * x2 + c3 * x3 = 0
(where you usually know the coefficients c, and you're looking for the variables x), the left hand side is the dot product of the vectors (c1,c2,c3) and (x1,x2,x3).
The above equation is (pretty much) the very definition of a linear problem, so there's your connection between the dot product and linear problems.
Linear equations are homogeneous, and superposition applies. You can create solutions using combinations of other known solutions; this is one reason why Fourier transforms work so well. Non-linear equations are not homogeneous, and superposition does not apply. Non-linear equations usually have to be solved numerically using iterative, incremental techniques.
I'm not sure how to express the importance of the dot product, but it takes two vectors and returns a scalar. Certainly a solution to a scalar equation is less work than solving a vector or higher-order tensor equation, simply because there are fewer components to deal with.
My intuition in this matter is based more on physics, so I'm having a hard time translating to AI.
I think the following link is also useful...
http://www.simafore.com/blog/bid/113227/How-support-vector-machines-use-kernel-functions-to-classify-data