CGAL exact predicate inexact constructions - computational-geometry

I know that it is possible to get exact comparison results without constructing the objects exactly by using the Exact_predicates_inexact_constructions_kernel in CGAL. What I wonder is to what degree I can use the inexact objects and still get an exact result. To be more clear, let me give an example: I want to compare two distances, the distance between the points p and q1 and the distance between the points p and q2. The problem is that neither q1 nor q2 has been computed yet; they will be found after a few intersection/projection operations. Moreover, those geometric operations require some vector/direction computations. In the end, I do not need any of those intermediate objects (none of the vectors, intersection lines, or even q1 and q2). The only thing I want to know is which distance is less than the other. If I define each object with the above kernel, find each intermediate object with the projection/intersection operations of that kernel, and also call the comparison function of the same kernel, will the result I reach be exact?

Related

Testing if a vector is in a population of vectors

Suppose that I have an object that has N different scalar qualities, each of which I've measured (for example, the (x,y) coordinates at the tips of the major arms of a leaf). Together, I have N such measurements for each object, which I'll save as a 1D list of N reals.
Now I'm given a large number R of such objects, each with its corresponding N-element list. Let's call this the population. We can represent this as a matrix M with R rows, each of N elements.
I'm now given a new object B, with its 1D N-element list. I'd like to hand Mathematica my matrix M and my new object B, and get back a single number that tells me how confident I can be that B belongs to the population represented by M.
I'd also be happy with a probability, or any other number with a simple interpretation. I'm willing to assume that everything is uncorrelated, that the values in columns of M are normally distributed, and other such typical assumptions.
When N=1, Student's t-test seems the right tool. There seem to be tools built into Mathematica that can solve precisely this problem when N>1, but the documentation (and web references) presume more statistical depth than I have, so I don't have confidence that I know what to do. I feel like the solution is tantalizingly just out of reach. If anyone can provide a code example that solves this problem, I would be very grateful.
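The question asks for Mathematica, which I won't attempt here; but as a sketch of the underlying statistic, this is roughly what the multivariate analogue of the t-test looks like in Python/NumPy: Hotelling's T² applied to a single new observation. The function name and the example data are invented for illustration; M and B follow the question's notation, and a full covariance matrix is used rather than assuming uncorrelated columns.

```python
import numpy as np
from scipy import stats

def membership_p_value(M, B):
    """Hotelling's T^2 test that a single new observation B belongs to the
    population whose sample is the rows of M (R observations, N features).
    Small p-values mean B is unlikely to come from the same
    (multivariate normal) population."""
    M = np.asarray(M, dtype=float)
    B = np.asarray(B, dtype=float)
    R, N = M.shape                       # R objects, N measurements each

    mean = M.mean(axis=0)                # sample mean of each column
    S = np.cov(M, rowvar=False)          # sample covariance matrix (N x N)

    diff = B - mean
    # Mahalanobis-type quadratic form; solve instead of forming an explicit inverse.
    t2 = (R / (R + 1.0)) * diff @ np.linalg.solve(S, diff)

    # Under normality, T^2 * (R - N) / (N * (R - 1)) follows F(N, R - N).
    f_stat = t2 * (R - N) / (N * (R - 1.0))
    return stats.f.sf(f_stat, N, R - N)

# Hypothetical usage: 200 objects with 6 measurements each, one new object B.
rng = np.random.default_rng(0)
M = rng.normal(size=(200, 6))
B = rng.normal(size=6)
print(membership_p_value(M, B))
```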

Two algorithms to find nearest neighbor with Locality-sensitive hashing, which one?

Currently I'm studying how to find a nearest neighbor using Locality-sensitive hashing. However, while reading papers and searching the web I found two algorithms for doing this:
1- Use L hash tables with L random LSH functions, thus increasing the chance that two documents that are similar get the same signature. For example, if two documents are 80% similar, then there's an 80% chance that they will get the same signature from one LSH function. However, if we use multiple LSH functions, then there's a higher chance that the documents get the same signature from at least one of them. This method is explained on Wikipedia, and I hope my understanding is correct:
http://en.wikipedia.org/wiki/Locality-sensitive_hashing#LSH_algorithm_for_nearest_neighbor_search
2- The other algorithm uses a method from a paper (section 5) called "Similarity Estimation Techniques from Rounding Algorithms" by Moses S. Charikar. It's based on using one LSH function to generate the signature, then applying P permutations to it, and then sorting the lists. Actually I don't understand the method very well, and I hope someone can clarify it.
My main question is: why would anyone use the second method rather than the first? I find the first easier and faster.
I really hope someone can help!!!
EDIT:
Actually I'm not sure if @Raff.Edward was mixing up the "first" and the "second". Only the second method uses a radius; the first just uses a new hash family g composed of the hash family F. Please check the Wikipedia link. They just use many g functions to generate different signatures, and then for each g function there is a corresponding hash table. In order to find the nearest neighbor of a point you just run the point through the g functions and check the corresponding hash tables for collisions. That is how I understood it: more functions ... more chance of collisions.
I didn't find any mention of a radius for the first method.
For the second method they generate only one signature for each feature vector and then apply P permutations to it. Now we have P lists of permutations, each containing n signatures. They then sort each of the P lists. After that, given a query point q, they generate its signature, apply the P permutations to it, and then use binary search on each permuted and sorted list to find the signature most similar to the query q. I concluded this after reading many papers about it, but I still don't understand why anyone would use such a method, because it doesn't seem fast at finding the Hamming distance!
For my part, I would simply do the following to find the nearest neighbor for a query point q: given a list of N signatures, I would generate the signature for the query point q, then scan the list and compute the Hamming distance between each element and the signature of q. Thus I would end up with the nearest neighbor for q. And it takes O(N)!
Your understanding of the first one is a little off. The probability of a collision is not proportional to the similarity; what matters is whether or not the distance is less than the pre-defined radius. The goal is that anything within the radius will have a high chance of colliding, and anything outside radius * (1+eps) will have a low chance of colliding (and the region in between is a little murky).
The first algorithm is actually fairly difficult to implement well, but can get good results. In particular, the first algorithm is for the L1 and L2 (and technically a few more) metrics.
The second algorithm is very simple to implement, though a naive implementation may use up too much memory to be useful depending on your problem size. In this case, the probability of collision is proportional to the similarity of the inputs. However, it only works for the Cosine Similarity (or distance metrics based on a transform of the similarity.)
So which one you would use is based primarily on which distance metric you are using for Nearest Neighbor (or whatever other application).
The second one is actually much easier to understand and implement than the first one, the paper is just very wordy.
The short version: take a random vector V and give each index an independent standard normal value. Create as many such vectors as you want the signature length to be. The signature is the sign of each entry of the matrix-vector product of those vectors with your data point. Now the Hamming distance between any two signatures is related to the cosine similarity between the respective data points.
Because you can encode the signature into an int array and use XOR with a bit-count instruction to get the Hamming distance very quickly, you can get approximate cosine similarity scores very quickly.
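To make that concrete, here is a minimal NumPy sketch of this signature scheme: random hyperplanes, signs of the matrix-vector product, packed bits, XOR plus popcount for the Hamming distance. The sizes and names are arbitrary illustration values, not anything from the papers or from JSAT.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_planes(dim, n_bits):
    # One random vector per signature bit, entries drawn i.i.d. standard normal.
    return rng.standard_normal((n_bits, dim))

def signature(planes, x):
    # Sign of each entry of the matrix-vector product -> bit array, packed into bytes.
    bits = (planes @ x) >= 0
    return np.packbits(bits)

def hamming(sig_a, sig_b):
    # XOR the packed signatures, then count the set bits.
    return int(np.unpackbits(np.bitwise_xor(sig_a, sig_b)).sum())

def approx_cosine(sig_a, sig_b, n_bits):
    # For sign random projections, P(bit differs) = angle(x, y) / pi,
    # so cos(pi * hamming / n_bits) estimates the cosine similarity.
    return np.cos(np.pi * hamming(sig_a, sig_b) / n_bits)

# Example: two similar vectors and one unrelated vector.
dim, n_bits = 300, 1024
planes = make_planes(dim, n_bits)
x = rng.standard_normal(dim)
y = x + 0.1 * rng.standard_normal(dim)   # near-duplicate of x
z = rng.standard_normal(dim)             # unrelated

sx, sy, sz = (signature(planes, v) for v in (x, y, z))
print(approx_cosine(sx, sy, n_bits))     # close to 1
print(approx_cosine(sx, sz, n_bits))     # close to 0
```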
LSH algorithms don't have a lot of standardization, and the two papers (and others) use different definitions, so it's all a bit confusing at times. I only recently implemented both of these algorithms in JSAT, and I'm still working on fully understanding them both.
EDIT: Replying to your edit. The Wikipedia article is not great for LSH. If you read the original paper, the first method you are talking about only works for a fixed radius. The hash functions are then created based on that radius and concatenated to increase the probability of getting nearby points into a collision. They then construct a system for doing k-NN on top of this by determining the maximum value of k they want, and then finding the largest reasonable distance at which they would find the k'th nearest neighbor. That way, a radius search will very likely return the set of k-NNs. To speed this up, they also create a few extra smaller radii, since the density is often not uniform, and the smaller the radius you use, the faster the results.
The Wikipedia section you linked is taken from the paper's description of the "Stable Distribution" case, which presents the hash function for a search of radius r = 1.
For the second paper, the "sorting" you describe is not part of the hashing, but part of one scheme for searching the Hamming space more quickly. As I mentioned, I recently implemented this, and you can see in a quick benchmark I did that even a brute-force search over the signatures is still much faster than the naive NN method. Again, you would also pick this method if you need cosine similarity rather than the L2 or L1 distance. You will find many other papers proposing different schemes for searching the Hamming space created by the signatures.
If you need help convincing yourself that it can be faster even if you were still doing brute force, just look at it this way: let's say that the average sparse document has 40 words in common with another document (a very conservative number in my experience). You have n documents to compare against. Brute-force cosine similarity would then involve about 40*n floating-point multiplications (plus some extra work). If you have a 1024-bit signature, that's only 32 integers. That means we could do a brute-force LSH search in 32*n integer operations, which are considerably faster than floating-point operations.
There are also other factors at play. For a sparse data set we have to keep both the doubles and the integer indices to represent the non-zero entries, so the sparse dot product does a lot of additional integer operations to see which indices the two vectors have in common. LSH also allows us to save memory, because we don't need to store all of these integers and doubles for each vector; instead we can just keep its hash around, which is only a few bytes.
Reduced memory use can help us better exploit the CPU cache.
Your O(N) scan is the naive way I used in my blog post, and it is fast. However, if you sort the bits beforehand, you can do the binary search in O(log(n)). Even if you have L of these lists, L << n, so it should be faster. The only issue is that it gets you approximate Hamming NNs, which are already approximating the cosine similarity, so the results can become a bit worse. It depends on what you need.
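For reference, a rough sketch of the sort-beforehand-and-binary-search idea as I understand it from the description above; the number of permutations and the window of lexicographic neighbours inspected per list are arbitrary illustration values.

```python
import bisect
import numpy as np

rng = np.random.default_rng(0)

def build_index(signatures, n_perms=8):
    """signatures: list of equal-length 0/1 numpy arrays.
    Returns, for each random bit permutation, a lexicographically sorted list
    of (permuted signature, original index)."""
    n_bits = len(signatures[0])
    index = []
    for _ in range(n_perms):
        perm = rng.permutation(n_bits)
        rows = sorted((tuple(sig[perm]), i) for i, sig in enumerate(signatures))
        index.append((perm, rows))
    return index

def query(index, signatures, q_sig, window=4):
    best_i, best_d = None, float("inf")
    for perm, rows in index:
        key = tuple(q_sig[perm])
        pos = bisect.bisect_left(rows, (key, -1))
        # Only inspect a small window of lexicographic neighbours in each sorted list.
        for _, i in rows[max(0, pos - window): pos + window]:
            d = int(np.sum(signatures[i] != q_sig))   # Hamming distance
            if d < best_d:
                best_i, best_d = i, d
    return best_i, best_d

# Hypothetical usage with random 64-bit signatures.
sigs = [rng.integers(0, 2, 64) for _ in range(1000)]
print(query(build_index(sigs), sigs, sigs[123]))      # finds index 123 at distance 0
```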

Collision detection complexity <O(n²): Simpler approach than grid, quadtrees, BSP?

I have a large number of objects (balls, for a start) that move stepwise in space, one at a time, and shall not overlap. Currently, for every move I check for collision with every other object. Several other questions here deal with this; however, I thought of a seemingly simple solution that does not seem to come up in this context, and I wonder why.
Why not simply keep 2 collections (for 2D, or 3 in three dimensions) of all objects, sorted by the x and y (and z) coordinate, respectively, and at every move look up all other objects within a given distance (ball diameter here) in each dimension and do the actual collision check only on objects in both (or all 3) result sets?
I realize this only works for equally-sized objects, but alternatively one could use twice as many collections, sorted by the (1) highest (2) lowest coordinate of each object for each dimension. Any reason why this would not work, or give significantly less of an improvement compared to going from O(n) "pairwise check" to "grid method" or "quad/octrees"? I see the update of these sorted collections as the costly operation here, but using e.g. a TreeSet (my implementation would be in Java) it should still be significantly less than O(n), right?
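To make the proposal concrete (the question mentions Java and TreeSet; this is just an illustrative Python sketch with invented names): two sorted per-axis lists, intersect the two coordinate strips, then do the exact distance check. It rebuilds the sorted lists from scratch, whereas the real cost question is about keeping them updated as objects move.

```python
import bisect

def build_axis_index(positions, axis):
    # Sorted list of (coordinate, object id) for one axis.
    return sorted((p[axis], oid) for oid, p in positions.items())

def ids_within(axis_index, lo, hi):
    # All object ids whose coordinate lies in [lo, hi].
    i = bisect.bisect_left(axis_index, (lo, -1))
    j = bisect.bisect_right(axis_index, (hi, float("inf")))
    return {oid for _, oid in axis_index[i:j]}

def collisions(positions, moved_id, diameter):
    """Candidate filtering as described above: objects within `diameter`
    along x AND along y, then an exact circle-circle test on the candidates."""
    x_index = build_axis_index(positions, 0)
    y_index = build_axis_index(positions, 1)
    px, py = positions[moved_id]
    candidates = (ids_within(x_index, px - diameter, px + diameter)
                  & ids_within(y_index, py - diameter, py + diameter))
    candidates.discard(moved_id)
    hits = []
    for oid in candidates:
        qx, qy = positions[oid]
        if (px - qx) ** 2 + (py - qy) ** 2 < diameter ** 2:
            hits.append(oid)
    return hits

# Hypothetical usage: three balls of diameter 1.0; only ball 1 collides with ball 0.
positions = {0: (0.0, 0.0), 1: (0.5, 0.1), 2: (5.0, 5.0)}
print(collisions(positions, 0, 1.0))     # -> [1]
```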
The check for which objects are in both result sets involves looking at all objects in two strips of the plane. That is a much larger area, and therefore involves more objects, than the enclosing square that a quadtree lets you immediately narrow down to. More objects means that it is slower.
You want to use a spatial index or a space-filling curve (SFC) instead of a quadtree. An SFC reduces the 2D problem to a 1D one, and it differs from a quadtree in that it can only store one object per (x, y) pair. Maybe this works for your problem? You may want to search for Nick's Hilbert curve quadtree spatial index blog.

Tricky algorithm for sorting symbols in an array while preserving relationships via order

The problem
I have multiple groups which specify the relationships of symbols; for example:
[A B C]
[A D E]
[X Y Z]
What these groups mean is that (for the first group) the symbols A, B, and C are related to each other; (for the second group) the symbols A, D, and E are related to each other; and so forth.
Given all these data, I would need to put all the unique symbols into a 1-dimension array wherein the symbols which are somehow related to each other would be placed closer to each other. Given the example above, the result should be something like:
[B C A D E X Y Z]
or
[X Y Z D E A B C]
In this resulting array, since the symbol A has multiple relationships (namely with B and C in one group and with D and E in another) it's now located between those symbols, somewhat preserving the relationship.
Note that the order is not important. In the result, X Y Z can be placed first or last since those symbols are not related to any other symbols. However, the closeness of the related symbols is what's important.
What I need help in
I need help in determining an algorithm that takes groups of symbol relationships and outputs the 1-dimensional array using the logic above. I'm pulling my hair out over how to do this since, with real data, the number of symbols in a relationship group can vary, there is no limit to the number of relationship groups, and a symbol can have relationships with any other symbol.
Further example
To further illustrate the trickiness of my dilemma, suppose you add another relationship group to the example above, let's say:
[C Z]
The result now should be something like:
[X Y Z C B A D E]
Notice that the symbols Z and C are now closer together since their relationship was reinforced by the additional data. All previous relationships are still retained in the result also.
The first thing you need to do is to precisely define the result you want.
You do this by defining how good a result is, so that you know which one is the best. Mathematically you do this with a cost function. In this case one would typically choose the sum of the distances between related elements, the sum of the squares of these distances, or the maximal distance. Then a list with a small value of the cost function is the desired result.
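In symbols (my notation, not the answer's): writing \(\pi(s)\) for the position of symbol s in the candidate list and G for the set of relationship groups, the three cost functions mentioned are

$$C_{\text{sum}}(\pi) = \sum_{g \in G} \sum_{\{a,b\} \subseteq g} \lvert\pi(a) - \pi(b)\rvert, \qquad C_{\text{sq}}(\pi) = \sum_{g \in G} \sum_{\{a,b\} \subseteq g} \bigl(\pi(a) - \pi(b)\bigr)^2, \qquad C_{\max}(\pi) = \max_{g \in G} \max_{\{a,b\} \subseteq g} \lvert\pi(a) - \pi(b)\rvert.$$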
It is not clear whether in this case it is feasible to compute the best solution by some special method (maybe if you choose the maximal distance or the sum of the distances as the cost function).
In any case it should be easy to find a good approximation by standard methods.
A simple greedy approach would be to insert each element in the position where the resulting cost function for the whole list is minimal.
Once you have a good starting point you can try to improve it further by modifying the list towards better solutions, for example by swapping elements or rotating parts of the list (local search, hill climbing, simulated annealing, other).
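A minimal Python sketch of this greedy-insert-then-hill-climb recipe, using the sum-of-distances cost from above; the function names are made up, and the swap-based local search is only meant for modest input sizes.

```python
from itertools import combinations

def cost(order, groups):
    # Sum of distances between related symbols (the first cost function mentioned above).
    pos = {s: i for i, s in enumerate(order)}
    return sum(abs(pos[a] - pos[b])
               for g in groups
               for a, b in combinations(g, 2)
               if a in pos and b in pos)

def greedy_order(groups):
    symbols = []
    for g in groups:                      # collect unique symbols in first-seen order
        for s in g:
            if s not in symbols:
                symbols.append(s)
    order = []
    for s in symbols:
        # Insert each symbol at the position where the partial cost stays minimal.
        order = min((order[:i] + [s] + order[i:] for i in range(len(order) + 1)),
                    key=lambda o: cost(o, groups))
    return order

def improve_by_swaps(order, groups):
    # Hill climbing: keep swapping pairs of positions while that lowers the cost.
    best = cost(order, groups)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(order)), 2):
            order[i], order[j] = order[j], order[i]
            c = cost(order, groups)
            if c < best:
                best, improved = c, True
            else:
                order[i], order[j] = order[j], order[i]   # undo: the swap did not help
    return order

groups = [["A", "B", "C"], ["A", "D", "E"], ["X", "Y", "Z"], ["C", "Z"]]
order = improve_by_swaps(greedy_order(groups), groups)
print(order, cost(order, groups))
```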
I think that with large amounts of data and a lack of additional criteria, it's going to be very, very difficult to make something that finds the best option. Have you considered a greedy algorithm (construct your solution incrementally in a way that gives you something close to the ideal solution)? Here's my idea:
Sort your sets of related symbols by size, and start with the largest one. Keep those all together, because without any other criteria, we might as well say their proximity is the most important since it's the biggest set. Consider every symbol in that first set an "endpoint", an endpoint being a symbol you can rearrange and put at either end of your array without damaging your proximity rule (everything in the first set is an endpoint initially because they can be rearranged in any way). Then go through your list and as soon as one set has one or more symbols in common with the first set, connect them appropriately. The symbols that you connected to each other are no longer considered endpoints, but everything else still is. Even if a bigger set only has one symbol in common, I'm going to guess that's better than smaller sets with more symbols in common, because this way, at least the bigger set stays together as opposed to possibly being split up if it was put in the array later than smaller sets.
I would go on like this, updating the list of endpoints that existed so that you could continue making matches as you went through your set. I would keep track of if I stopped making matches, and in that case, I'd just go to the top of the list and just tack on the next biggest, unmatched set (doesn't matter if there are no more matches to be made, so go with the most valuable/biggest association). Ditch the old endpoints, since they have no matches, and then all the symbols of the set you just tacked on are the new endpoints.
This may not have a good enough runtime, I'm not sure. But hopefully it gives you some ideas.
Edit: Obviously, as part of the algorithm, ditch duplicates (trivial).
The problem as described is essentially the problem of drawing a graph in one dimension.
Using the relationships, construct a graph. Treat the unique symbols as the vertices of the graph. Place an edge between any two vertices that co-occur in a relationship; more sophisticated would be to construct a weight based on the number of relationships in which the pair of symbols co-occur.
Algorithms for drawing graphs place well-connected vertices closer to one another, which is equivalent to placing related symbols near one another. Since only an ordering is needed, the symbols can just be ranked based on their positions in the drawing.
There are a lot of algorithms for drawing graphs. In this case, I'd go with Fiedler ordering, which orders the vertices using a particular eigenvector (the Fiedler vector) of the graph Laplacian. Fiedler ordering is straightforward, effective, and optimal in a well-defined mathematical sense.
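For illustration, a small NumPy sketch of Fiedler ordering on the example groups (including the extra [C Z] group so the graph is connected; for a disconnected graph the components would need to be ordered separately):

```python
from itertools import combinations
import numpy as np

def fiedler_order(groups):
    # Vertices are the unique symbols; edge weight counts how many groups a pair shares.
    symbols = sorted({s for g in groups for s in g})
    idx = {s: i for i, s in enumerate(symbols)}
    n = len(symbols)
    W = np.zeros((n, n))
    for g in groups:
        for a, b in combinations(sorted(set(g)), 2):
            W[idx[a], idx[b]] += 1
            W[idx[b], idx[a]] += 1
    L = np.diag(W.sum(axis=1)) - W              # graph Laplacian L = D - W
    vals, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    fiedler_vector = vecs[:, 1]                 # eigenvector of the 2nd smallest eigenvalue
    return [symbols[i] for i in np.argsort(fiedler_vector)]

groups = [["A", "B", "C"], ["A", "D", "E"], ["X", "Y", "Z"], ["C", "Z"]]
print(fiedler_order(groups))
```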
It sounds like you want to do topological sorting: http://en.wikipedia.org/wiki/Topological_sorting
Regarding the initial ordering, it seems like you are trying to enforce some kind of stability condition, but it is not really clear to me what this should be from your question. Could you try to be a bit more precise in your description?

Difference between a linear problem and a non-linear problem? Essence of Dot-Product and Kernel trick

The kernel trick maps a non-linear problem into a linear problem.
My questions are:
1. What is the main difference between a linear and a non-linear problem? What is the intuition behind the difference between these two classes of problems? And how does the kernel trick help us use linear classifiers on a non-linear problem?
2. Why is the dot product so important in the two cases?
Thanks.
When people say linear problem with respect to a classification problem, they usually mean linearly separable problem. Linearly separable means that there is some function that can separate the two classes and that is a linear combination of the input variables. For example, if you have two input variables, x1 and x2, there are some numbers theta1 and theta2 such that the function theta1*x1 + theta2*x2 is sufficient to predict the output. In two dimensions this corresponds to a straight line, in 3D it becomes a plane, and in higher-dimensional spaces it becomes a hyperplane.
You can get some kind of intuition about these concepts by thinking about points and lines in 2D/3D. Here's a very contrived pair of examples...
This is a plot of a linearly inseparable problem. There is no straight line that can separate the red and blue points.
However, if we give each point an extra coordinate (specifically 1 - sqrt(x*x + y*y)... I told you it was contrived), then the problem becomes linearly separable since the red and blue points can be separated by a 2-dimensional plane going through z=0.
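To make the contrived example concrete, here is a tiny NumPy sketch of the same construction: an inner red cluster, an outer blue ring, and the extra z = 1 - sqrt(x*x + y*y) coordinate that makes the plane z = 0 a separator. The data itself is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Red points near the origin, blue points on a ring outside the unit circle:
# no straight line in 2D separates them.
red = rng.uniform(-0.5, 0.5, size=(100, 2))
angle = rng.uniform(0, 2 * np.pi, 100)
blue = np.column_stack([1.5 * np.cos(angle), 1.5 * np.sin(angle)])

def lift(points):
    # Add the extra coordinate z = 1 - sqrt(x^2 + y^2) from the answer above.
    z = 1 - np.sqrt((points ** 2).sum(axis=1))
    return np.column_stack([points, z])

red3, blue3 = lift(red), lift(blue)
# In 3D the plane z = 0 separates the classes: red has z > 0, blue has z < 0.
print(red3[:, 2].min() > 0, blue3[:, 2].max() < 0)   # True True
```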
Hopefully, these examples demonstrate part of the idea behind the kernel trick:
Mapping a problem into a space with a larger number of dimensions makes it more likely that the problem will become linearly separable.
The second idea behind the kernel trick (and the reason why it is so tricky) is that it is usually very awkward and computationally expensive to work in a very high-dimensional space. However, if an algorithm only uses the dot products between points (which you can think of as distances), then you only have to work with a matrix of scalars. You can implicitly perform the calculations in the higher-dimensional space without ever actually having to do the mapping or handle the higher-dimensional data.
Many classifiers, among them the linear Support Vector Machine (SVM), can only solve problems that are linearly separable, i.e. where the points belonging to class 1 can be separated from the points belonging to class 2 by a hyperplane.
In many cases, a problem that is not linearly separable can be solved by applying a transform phi() to the data points; this transform is said to transform the points to feature space. The hope is that, in feature space, the points will be linearly separable. (Note: This is not the kernel trick yet... stay tuned.)
It can be shown that, the higher the dimension of the feature space, the greater the number of problems that are linearly separable in that space. Therefore, one would ideally want the feature space to be as high-dimensional as possible.
Unfortunately, as the dimension of feature space increases, so does the amount of computation required. This is where the kernel trick comes in. Many machine learning algorithms (among them the SVM) can be formulated in such a way that the only operation they perform on the data points is a scalar product between two data points. (I will denote a scalar product between x1 and x2 by <x1, x2>.)
If we transform our points to feature space, the scalar product now looks like this:
<phi(x1), phi(x2)>
The key insight is that there exists a class of functions called kernels that can be used to optimize the computation of this scalar product. A kernel is a function K(x1, x2) that has the property that
K(x1, x2) = <phi(x1), phi(x2)>
for some function phi(). In other words: We can evaluate the scalar product in the low-dimensional data space (where x1 and x2 "live") without having to transform to the high-dimensional feature space (where phi(x1) and phi(x2) "live") -- but we still get the benefits of transforming to the high-dimensional feature space. This is called the kernel trick.
Many popular kernels, such as the Gaussian kernel, actually correspond to a transform phi() that transforms into an infinite-dimensional feature space. The kernel trick allows us to compute scalar products in this space without having to represent points in this space explicitly (which, obviously, is impossible on computers with finite amounts of memory).
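The Gaussian kernel's phi() can't be written out explicitly, but the same identity K(x1, x2) = <phi(x1), phi(x2)> is easy to check for, say, the two-dimensional polynomial kernel K(x1, x2) = <x1, x2>^2, whose phi() maps into only three dimensions (a standard textbook example, not something from the answer above):

```python
import numpy as np

def phi(x):
    # Explicit feature map for the 2D polynomial kernel K(x, y) = <x, y>^2.
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def K(x, y):
    return np.dot(x, y) ** 2

x1 = np.array([0.7, -1.3])
x2 = np.array([2.0, 0.4])

lhs = K(x1, x2)                      # evaluated in the 2D data space
rhs = np.dot(phi(x1), phi(x2))       # evaluated in the 3D feature space
print(np.isclose(lhs, rhs))          # True: same number, no explicit mapping needed
```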
The main difference (for practical purposes) is: a linear problem either has a solution (and then it's easily found), or you get a definite answer that there is no solution at all. You know this much before you even look at the particular problem. As long as it's linear, you'll get an answer, and quickly.
The intuition behind this is the fact that if you have two straight lines in some space, it's pretty easy to see whether they intersect or not, and if they do, it's easy to know where.
If the problem is not linear -- well, it can be anything, and you know just about nothing.
The dot product of two vectors just means the following: The sum of the products of the corresponding elements. So if your problem is
c1 * x1 + c2 * x2 + c3 * x3 = 0
(where you usually know the coefficients c, and you're looking for the variables x), the left hand side is the dot product of the vectors (c1,c2,c3) and (x1,x2,x3).
The above equation is (pretty much) the very definition of a linear problem, so there's your connection between the dot product and linear problems.
Linear equations are homogeneous, and superposition applies. You can create solutions using combinations of other known solutions; this is one reason why Fourier transforms work so well. Non-linear equations are not homogeneous, and superposition does not apply. Non-linear equations usually have to be solved numerically using iterative, incremental techniques.
I'm not sure how to express the importance of the dot product, but it does take two vectors and returns a scalar. Certainly a solution to a scalar equation is less work than solving a vector or higher-order tensor equation, simply because there are fewer components to deal with.
My intuition in this matter is based more on physics, so I'm having a hard time translating to AI.
I think the following link is also useful:
http://www.simafore.com/blog/bid/113227/How-support-vector-machines-use-kernel-functions-to-classify-data

Resources