Why does this RethinkDB query
r.polygon(r.point(0,0), r.point(0,10), r.point(10,10), r.point(10,0)).includes(r.point(10,10))
return true, while this query
r.polygon(r.point(0,0), r.point(0,10), r.point(10,10), r.point(10,0)).includes(r.point(10,9))
returns false?
This is most likely a floating-point rounding artifact, either in the values themselves or in the intersection algorithm. In general, intersection and inclusion aren't well defined for points that lie exactly on the boundary edges of a polygon. It's like using equality comparison on limited-precision floating-point numbers.
If you replace the longitude of the point with a number minimally smaller than 10, you should get stable results.
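For illustration, here is a minimal sketch of that workaround using the Python driver (the import style, host, and port are assumptions; the query mirrors the ReQL above):

from rethinkdb import RethinkDB  # import style varies by driver version

r = RethinkDB()
conn = r.connect("localhost", 28015)  # assumes a locally running RethinkDB instance

poly = r.polygon(r.point(0, 0), r.point(0, 10), r.point(10, 10), r.point(10, 0))

# Exactly on the boundary: the result depends on floating-point rounding.
print(poly.includes(r.point(10, 9)).run(conn))

# Longitude nudged slightly inside the boundary: should give a stable True.
print(poly.includes(r.point(9.999999, 9)).run(conn))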
Related
I know that it is possible to get exact comparison results without constructing the objects exactly by using the Exact_predicates_inexact_constructions_kernel in CGAL. What I wonder is to what degree I can use the inexact objects and still get an exact result. To be clearer, let me give an example: I want to compare two distances, the distance between points p and q1 and the distance between points p and q2. The problem is that q1 and q2 are not computed yet; they will be found after a few intersection/projection operations. Moreover, those geometric operations require some vector/direction computations. In the end, I do not need any of the intermediate objects (none of the vectors, intersection lines, or even q1 and q2). The only thing I want to know is which distance is smaller. If I define each object with the above kernel, find each intermediate object with the projection/intersection operations of that kernel, and also call the comparison function of the same kernel, would the result I get be exact?
I am trying to analyze how floating-point computation becomes more inaccurate as the precision of the representation decreases. To do that, I want to perform simple matrix operations with different floating-point representations, such as float64, float32, and float16. Since float64 computation will give the most precise and accurate result of the three, I treat the float64 result as the expected result (i.e., error = 0).
The issue is that when I compare the calculated result with the expected result, I don't have a clear idea of how to combine all the individual element errors into a single metric. I know about certain ways to go about it, such as the mean error or the sum of squared errors (SSE), but I wanted to know if there is a standard way of calculating the overall error of a given matrix computation.
Perhaps a variant of the condition number can be helpful? See here: https://en.wikipedia.org/wiki/Condition_number#Matrices
"if there was a standard way of calculating the overall error of a given matrix computation"
Consider the case where the matrix has size 1. Then we are in the familiar one-dimensional domain.
How should y_computed_as_float be compared with y_expected? Even in this case, there is no standard for how these should be compared as floating-point numbers. Subtract? Divide? It is often context sensitive. So "no" to the OP's question.
Yet there are common practices, so a potential "yes" to the OP's question for select cases.
Floating-point computations are often assessed by the difference between the computed and mathematically expected values, scaled by the unit in the last place (ULP)*.
error = (y_computed_as_float - y_expected)/ulpf((float) y_expected);
For an N×N matrix, the matrix error could use a root mean square of the N² element errors.
* Scaling by ULP has some issues near each power of 2 and more near 0.0. There are ways to mitigate that, but we are getting into the weeds.
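A minimal NumPy sketch of this ULP-scaled RMS error (np.spacing plays the role of ulpf; the matrices, function name, and dtype choice are just illustrative):

import numpy as np

def ulp_rms_error(computed, expected, dtype=np.float32):
    """RMS of per-element errors, each scaled by the ULP of the expected value
    measured in the reduced-precision format (a sketch of the idea above)."""
    computed = np.asarray(computed, dtype=np.float64)
    expected = np.asarray(expected, dtype=np.float64)
    # ULP of each expected value in the lower-precision format
    ulp = np.spacing(expected.astype(dtype)).astype(np.float64)
    err = (computed - expected) / ulp
    return np.sqrt(np.mean(err ** 2))

# Example: compare a float32 matrix product against a float64 reference.
rng = np.random.default_rng(0)
a = rng.standard_normal((50, 50))
b = rng.standard_normal((50, 50))
expected = a @ b                                                   # float64 reference
computed = (a.astype(np.float32) @ b.astype(np.float32)).astype(np.float64)
print(ulp_rms_error(computed, expected, dtype=np.float32))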
I am looking for a hash function that will assign a scalar value to a small binary matrix (7x7). I want it to give different values for two different matrices unless one matrix is a 90-, 180-, or 270-degree rotation of the other.
Do you have any suggestions on how I could do this? I was expecting to find a method in image processing as this would be equivalent to a 7x7 binary image but I could not find anything.
Converting my comment to an answer:
If you're trying to find a way to test if two objects are equivalent after doing some sort of transformation, it often helps to pick a single "canonical form" for the object that can easily be computed. In your case, it would probably help a lot to pick a single rotation of the matrix as the "canonical" rotation and compare things that way. One simple option would be to pick the lexicographically first matrix out of all the rotations possible, then use that.
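A minimal sketch of that canonicalization, assuming NumPy and a helper name of my own choosing:

import numpy as np

def rotation_invariant_key(mat):
    """Canonical form: the lexicographically smallest of the four rotations,
    flattened to a tuple, as described above."""
    mat = np.asarray(mat)
    rotations = [np.rot90(mat, k) for k in range(4)]
    return min(tuple(r.flatten()) for r in rotations)

# Two matrices that are rotations of each other get the same key,
# so hash(rotation_invariant_key(m)) serves as the scalar identifier.
a = np.array([[1, 0], [0, 0]])
b = np.rot90(a)
assert rotation_invariant_key(a) == rotation_invariant_key(b)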
I've got a MongoDB with about 1 million documents in it. These documents all have a 256-bit string of 1s and 0s, like:
0110101010101010110101010101
Ideally, I'd like to query for near binary matches, i.e., documents whose strings differ from a given string in only a small number of positions. Yes, this is Hamming distance.
This is NOT currently supported in Mongo. So, I'm forced to do it in the application layer.
So, given this, I am trying to find a way to avoid doing individual Hamming distance comparisons between all pairs of documents; that would make the running time basically impossible.
I have a LOT of RAM, and in Ruby there seems to be a great gem (algorithms) that can create a number of trees that would reduce the number of queries I'd need to make, though I can't seem to make any of them work (yet).
Ideally, I'd like to make 1 million queries, find the near duplicate strings, and be able to update them to reflect that.
Anyone's thoughts would be appreciated.
I ended up retrieving all the documents into memory (a subset with just the id and the string).
Then I used a BK-tree to compare the strings.
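In case it helps, here is a compact Python sketch of a BK-tree over the Hamming metric (class and method names are illustrative, not the Ruby gem's API):

def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))

class BKTree:
    """Minimal BK-tree over the Hamming metric (a sketch of the approach above)."""
    def __init__(self, distance=hamming):
        self.distance = distance
        self.root = None  # each node is (value, {edge_distance: child_node})

    def add(self, value):
        if self.root is None:
            self.root = (value, {})
            return
        node = self.root
        while True:
            d = self.distance(value, node[0])
            child = node[1].get(d)
            if child is None:
                node[1][d] = (value, {})
                return
            node = child

    def query(self, value, max_dist):
        """Return (distance, stored_value) pairs within max_dist of value."""
        if self.root is None:
            return []
        results, stack = [], [self.root]
        while stack:
            node_value, children = stack.pop()
            d = self.distance(value, node_value)
            if d <= max_dist:
                results.append((d, node_value))
            # Triangle inequality: only children whose edge distance lies in
            # [d - max_dist, d + max_dist] can contain matches.
            for child_dist, child in children.items():
                if d - max_dist <= child_dist <= d + max_dist:
                    stack.append(child)
        return results

tree = BKTree()
for s in ("0110", "0111", "1111", "0000"):
    tree.add(s)
print(tree.query("0110", max_dist=1))  # strings within Hamming distance 1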
The Hamming distance defines a metric space, so you could use the O(n log n) algorithm to find the closest pair of points, which is of the typical divide-and-conquer nature.
You can then apply this repeatedly until you have "enough" pairs.
Edit: I see now that Wikipedia doesn't actually give the algorithm, so here is one description.
Edit 2: The algorithm can be modified to give up if there are no pairs at distance less than n. For the case of the Hamming distance: simply count the level of recursion you are in. If you haven't found something at level n in any branch, then give up (in other words, never enter n + 1). If you are using a metric where splitting on one dimension doesn't always yield a distance of 1, you need to adjust the level of recursion where you give up.
As far as I understand, you have an input string X and you want to query the database for documents containing a string field b such that the Hamming distance between X and document.b is less than some small number d.
You can do this in linear time, simply by scanning all of your N=1M documents and calculating the distance (which takes a small, fixed amount of time per document). Since you only want documents with distance smaller than d, you can give up on a comparison after d unmatched characters; you only need to compare all 256 characters when most of them match.
You can try to scan fewer than N documents, that is, to get better than linear time.
Let ones(s) be the number of 1s in string s. For each document, store ones(document.b) as a new indexed field ones_count. Then you can query only documents whose number of ones is close enough to ones(X); specifically, ones(X) - d <= document.ones_count <= ones(X) + d. The Mongo index should kick in here.
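A rough sketch of that index-assisted filter with pymongo (the database and collection names are placeholders; the fields b and ones_count are the ones described above):

from pymongo import MongoClient

def hamming_within(a, b, d):
    """Early-exit Hamming check: stop as soon as more than d characters differ."""
    mismatches = 0
    for c1, c2 in zip(a, b):
        if c1 != c2:
            mismatches += 1
            if mismatches > d:
                return False
    return True

coll = MongoClient().mydb.docs          # placeholder database/collection names
coll.create_index("ones_count")

def near_matches(x, d):
    ones_x = x.count("1")
    # The indexed range query prunes documents whose popcount is too far off.
    cursor = coll.find({"ones_count": {"$gte": ones_x - d, "$lte": ones_x + d}})
    return [doc for doc in cursor if hamming_within(x, doc["b"], d)]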
If you want to find all close enough pairs in the set, see @Philippe's answer.
This sounds like an algorithmic problem of some sort. You could try comparing those with a similar number of 1 or 0 bits first, then work down through the list from there. Those that are identical will, of course, come out on top. I don't think having tons of RAM will help here.
You could also try and work with smaller chunks. Instead of dealing with 256-bit sequences, could you treat that as 32 8-bit sequences? 16 16-bit sequences? At that point you can look up differences in a precomputed table and use that as a sort of index.
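A small sketch of that byte-chunk lookup-table idea, assuming the 256-bit strings are packed into 32 bytes:

# Precompute the popcount of every possible byte value once.
POPCOUNT = [bin(i).count("1") for i in range(256)]

def hamming_bytes(a: bytes, b: bytes) -> int:
    """Hamming distance of two equal-length byte strings, one table lookup per byte pair."""
    return sum(POPCOUNT[x ^ y] for x, y in zip(a, b))

# A 256-bit value becomes 32 bytes, so each comparison is 32 XORs and lookups.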
Depending on how "different" you care to match on, you could just permute changes on the source binary value and do a keyed search to find the others that match.
Say I have a matrix of ones and zeros, and I would like an 'identifier' for this matrix that takes the same value regardless of whether the matrix is rotated by 90, 180, or 270 degrees, i.e., a 4-to-1 mapping. Ideally, this identifier should be 1/4 the size of the matrix. Is it possible to write a function that does this mapping?
Background: I was looking at this problem on the UVa problem set. I don't exactly need such a function to solve the problem, but it seems reasonable that it would exist, and using it would make for a more elegant solution.
Yes. You can take your original matrix A and rotate it into all the possible configurations A', A'', and A'''. You can then sort these using some ordering of your choosing (just be consistent), pick the first, and hash that using any hash function of your choosing (again, the actual hash function doesn't matter, just be consistent).
Obviously this can be optimized heavily by not actually doing the full rotation and sorting - you can do the comparisons lazily, stopping as soon as you know which rotation sorts first - but the principle is the same.
You can just bitwise-XOR all the rotations; that will give a rotation-symmetric identifier.
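A quick sketch of that XOR idea (the helper name is illustrative; note the mapping is lossy, so distinct matrices can share an identifier):

import numpy as np

def xor_rotation_identifier(mat):
    """Elementwise XOR of the four rotations. A matrix and any of its rotations
    have the same set of four rotations, so the XOR is rotation-invariant."""
    mat = np.asarray(mat, dtype=np.uint8)
    acc = np.zeros_like(mat)
    for k in range(4):
        acc ^= np.rot90(mat, k)
    return acc.tobytes()  # hashable identifier

a = np.array([[1, 0], [0, 0]], dtype=np.uint8)
assert xor_rotation_identifier(a) == xor_rotation_identifier(np.rot90(a))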