Get spiral index from location in 1d, 3d - algorithm

Based on a known {x,y,z,...} coordinate, I'm looking for the index of that location. A 2-dimensional (2d) solution is provided here.
I'm now trying to extend this to two other dimensions: 1d and 3d (and possibly to generalize to higher dimensions).
For 1d, I ended up with the following algorithm (Matlab code), where the walk alternates between the right and left side of the axis:
n = 20;               %number of values
X = -n/2:n/2;         %X values (1d)
%we want 'p' the index of the location:
p = zeros(size(X));   %preallocate the index vector
for i = 1:numel(X)
    if (X(i) > 0)
        p(i) = 2*X(i) - 1;   %positive side: odd indexes 1, 3, 5, ...
    else
        p(i) = -2*X(i);      %zero and negative side: even indexes 0, 2, 4, ...
    end
end
resulting in indexes that alternate between the two sides of the axis: 0 for x = 0, then 1 and 2 for x = +1 and -1, then 3 and 4 for x = +2 and -2, and so on.
However, I have difficulty visualizing how the indexing should take place in 3d (i.e. how the index should walk through the nodes in 3d). I'm primarily interested in a C/C++ solution, but any other language is fine.
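For reference, the 1d mapping above translates directly to C/C++ (a minimal sketch; the function name is just illustrative):

// Index of a 1d location: 0 at the origin, odd indexes on the positive side,
// even indexes on the negative side (same mapping as the Matlab loop above).
int spiral_index_1d(int x)
{
    return (x > 0) ? 2 * x - 1 : -2 * x;
}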
EDIT
Reflecting @Spektre's comments and suggestions: I aim at finding the indexes of a set of 3d coordinates {x,y,z}. This can be seen as a way to map the 3d coordinates onto a set of 1d indexes. The spiral provides a convenient way to perform this task in 2d, but cannot be directly extended to 3d.

Well, since you chose a line/square/cube-like coil of "screws" to fill your 1D/2D/3D space like this:
I would:
Implement line/square/cube maps
These will map between a 1D index ix and a coordinate for a known "radius". It will be similar to this:
template<class T,int N> class cube_map
See the member functions ix2dir and dir2ix, which map between an index and a direction vector. There is no need to store the surface data; you just need these conversion functions. However, you need to tweak them so the order of points/indexes represents the pattern you want... In 1D and 2D this is easy, but for 3D I would choose something like a surface spiral on a cube, similar to this:
How to distribute points evenly on the surface of hyperspheres in higher dimensions?
Do not forget to handle even and odd screws in 3D differently (mirror them) so the screws join at the correct locations...
Just to be complete: the cube map is like a single screw in your 3D system. It holds the surface of a cube and can convert between a direction vector dir and a 1D index ix, back and forward. It is used to speed up algorithms in both graphics and geometry; I think its first use was for fast vector normalization in Bump Mapping. By using cube_map you can do the analogous thing in any dimensionality, even 2D and 1D, easily: you just use square_map and Line_map instead, without any algorithmic changes. I have tested it on OBB in 2D/3D; after conversion to cube/square maps the algorithm stayed the same, just the vectors had a different number of coordinates (without the maps the algorithms were very different).
Create/derive an equation or LUT that describes how many points each radius covers (inner screws included)
So a function that returns the number of points contained in a coil of r screws together. It is a series, so the equation should be easy to derive/infer. Let's call this:
points = volume(r);
Now the conversion ix -> (x,y,z,...):
First find which radius r your ix falls into, so that:
volume(r) <= ix < volume(r+1)
A simple for loop will do, but a binary search for r would be even better. Then convert your ix into the index iy within the line/square/cube map:
iy = ix-volume(r);
Now just use the ix2dir(iy) function to obtain your point position...
Reverse conversion (x,y,z...) -> ix
r = max(|x|,|y|,|z|,...)
iy = dir2ix(x,y,z,...)
ix = volume(r-1) + iy
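Putting the pieces together, a minimal C++ sketch of both conversions could look like this. Here volume(), ix2dir() and dir2ix() are the hypothetical helpers described above (not a ready-made API), and volume(r) is assumed to count every point with radius <= r, matching the reverse formula:

#include <algorithm>
#include <array>
#include <cstdlib>

using Point3 = std::array<int, 3>;

// Hypothetical helpers, to be implemented per dimension as described above:
long long volume(int r);                 // number of points covered by screws 0..r
Point3    ix2dir(int r, long long iy);   // iy-th point on the screw of radius r
long long dir2ix(const Point3& p);       // index of p within its own screw

// ix -> (x,y,z): find the screw radius r, then look the point up inside that screw.
Point3 index_to_point(long long ix)
{
    int r = 0;
    while (volume(r) <= ix) ++r;         // linear scan; a binary search also works
    long long iy = ix - (r > 0 ? volume(r - 1) : 0);
    return ix2dir(r, iy);
}

// (x,y,z) -> ix: the screw radius is the Chebyshev norm max(|x|,|y|,|z|).
long long point_to_index(const Point3& p)
{
    int r = std::max({std::abs(p[0]), std::abs(p[1]), std::abs(p[2])});
    long long iy = dir2ix(p);
    return (r > 0 ? volume(r - 1) : 0) + iy;
}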

Related

Data structure and algorithms for 1D velocity model using layers?

This is for a geophysical analysis program I am creating. I already have code to do all this, but I am looking for inspiration and ideas (good data structures and algorithms).
What I want to model:
Velocity as a function of depth (z)
The model is built up from multiple layers (<10)
Every layer is accessible by an index going from 0 for the topmost layer to n for the bottommost layer
Every layer has velocity as a linear function of depth (gradient a_k and axis intercept b_k of the kth layer)
Every layer has a top and bottom depth (z_k-1 and z_k)
The model is complete, there is no space between layers. The point directly between two layers belongs to the lower layer
Requirements:
Get velocity at an arbitrary depth within the model. This will be done on the order of 1k to 10k times, so it should be well optimized.
Access to the top and bottom depths, gradients and intercepts of a layer by the layer index
What I have so far:
I have working Python code where every layer is saved as a numpy array with the values of z_k (bottom depth), z_k-1 (top depth), a_k (velocity gradient) and b_k (axis intercept). To evaluate the model at a certain depth, I get the layer index, use that to get the parameters of the layer, and pass them to a function that evaluates the linear velocity gradient.
So you have a piecewise linear dependence, where the z-coordinates of the piece boundaries are irregular, and you want to get the function value at a given z.
Note that there is no point in using binary search for 10 pieces (3-4 rounds of binary search might be slower than 9 simple comparisons).
But what precision do your depth queries have? Note that you could store a lookup table at 1-meter resolution, or even at 1-millimeter resolution - only 10^7 entries would give O(1) access to any precalculated velocity value.
For a limited number of pieces it is also possible to build one long formula (involving integer division), but it would probably be slower.
Example for an arbitrary three-piece polyline with border points 2 and 4.5:
f = f0 + 0.2*int(z/2.0)*(z-2.0) + 0.03*int(z/4.5)*(z-4.5)
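For the plain comparison approach, a minimal C++ sketch could look like this (the Layer struct and its field names are assumptions mirroring the question's z_k, a_k and b_k, not existing code):

#include <vector>

// One layer: top/bottom depth plus the linear velocity law v(z) = a*z + b.
struct Layer {
    double z_top;     // z_{k-1}
    double z_bottom;  // z_k
    double a;         // velocity gradient
    double b;         // axis intercept
};

// Evaluate velocity at depth z; a point exactly on a boundary belongs to the lower layer.
// With fewer than ~10 layers a simple linear scan is as fast as a binary search.
double velocity_at(const std::vector<Layer>& layers, double z)
{
    for (const Layer& L : layers)
        if (z < L.z_bottom)          // first layer whose bottom lies below z
            return L.a * z + L.b;
    const Layer& last = layers.back();
    return last.a * z + last.b;      // clamp to the bottom layer
}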

Heuristics to sort array of 2D/3D points according their mutual distance

Consider an array of points in 2D, 3D, (4D, ...) space (e.g. the nodes of an unstructured mesh). Initially the index of a point in the array is not related to its position in space. In the simple case, assume I already know some nearest-neighbor connectivity graph.
I would like some heuristic which increases the probability that two points which are close to each other in space also have similar indexes (are close in the array).
I understand that an exact solution is very hard (perhaps similar to the travelling salesman problem), but I don't need an exact solution, just something which increases the probability.
My ideas for a solution:
A naive solution would be something like:
1. For each point i, compute a fitness E_i given by the (negative) sum of index-wise distances to its spatial nearest neighbors k:
E_i = -Sum_k ( abs( index(i)-index(k) ) )
2. For pairs of points (i, j) which have low fitness (E_i, E_j), try to swap them; if the fitness improves, accept the swap.
However, the detailed implementation and its performance optimization are not so clear; a rough sketch of the swap loop is given below.
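A minimal C++ sketch of that swap heuristic (the data layout, the neighbor representation and all names are assumptions for illustration only):

#include <cstdlib>
#include <vector>

// order[i]     = current array position (index) of point i.
// neighbors[i] = spatial nearest neighbors of point i (precomputed).
using Neighbors = std::vector<std::vector<int>>;

// Fitness of point i: negative sum of index-wise distances to its spatial neighbors.
long long fitness(int i, const std::vector<int>& order, const Neighbors& nb)
{
    long long e = 0;
    for (int k : nb[i]) e -= std::abs(order[i] - order[k]);
    return e;
}

// Greedy pass: try swapping the indexes of random pairs, keep improving swaps.
// (A more faithful evaluation would also recompute the fitness of i's and j's neighbors.)
void improve_ordering(std::vector<int>& order, const Neighbors& nb, int iterations)
{
    const int n = static_cast<int>(order.size());
    for (int it = 0; it < iterations; ++it) {
        int i = std::rand() % n, j = std::rand() % n;
        if (i == j) continue;
        long long before = fitness(i, order, nb) + fitness(j, order, nb);
        std::swap(order[i], order[j]);
        long long after = fitness(i, order, nb) + fitness(j, order, nb);
        if (after < before) std::swap(order[i], order[j]);   // revert if the swap is worse
    }
}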
Another solution, which does not need precomputed nearest neighbors, would be based on some locality-sensitive hashing (LSH).
I think this could be quite a common problem, and good solutions may already exist; I do not want to reinvent the wheel.
Application:
improve cache locality, considering that memory access is often the bottleneck of graph traversal
it could accelerate interpolation on an unstructured grid, more specifically the search for nodes near the sample point (e.g. the centers of radial basis functions).
I'd say space-filling curves (SFC) are the standard solution to map proximity in space to a linear ordering. The most common ones are Hilbert curves and Z-curves (Morton order).
Hilbert curves have the best proximity mapping, but they are somewhat expensive to calculate. Z-ordering still has a good proximity mapping but is very easy to calculate: it is sufficient to interleave the bits of each dimension. Assuming integer values, if you have a 64-bit 3D point (x,y,z), the z-value is x_0, y_0, z_0, x_1, y_1, z_1, ..., x_63, y_63, z_63, i.e. a 192-bit value consisting of the first bit of every dimension, followed by the second bit of every dimension, and so on. If your array is ordered according to that z-value, points that are close in space are usually also close in the array.
Here are example functions that interleave (merge) values into a z-value (nBitsPerValue is usually 32 or 64):
public static long[] mergeLong(final int nBitsPerValue, long[] src) {
    final int DIM = src.length;
    int intArrayLen = (src.length*nBitsPerValue+63) >>> 6;
    long[] trg = new long[intArrayLen];

    long maskSrc = 1L << (nBitsPerValue-1);   // current bit in the source values (MSB first)
    long maskTrg = 0x8000000000000000L;       // current bit in the target words
    int srcPos = 0;
    int trgPos = 0;
    for (int j = 0; j < nBitsPerValue*DIM; j++) {
        if ((src[srcPos] & maskSrc) != 0) {
            trg[trgPos] |= maskTrg;
        } else {
            trg[trgPos] &= ~maskTrg;
        }
        maskTrg >>>= 1;
        if (maskTrg == 0) {                   // target word full: move to the next one
            maskTrg = 0x8000000000000000L;
            trgPos++;
        }
        if (++srcPos == DIM) {                // cycled through all dimensions: next bit
            srcPos = 0;
            maskSrc >>>= 1;
        }
    }
    return trg;
}
You can also interleave the bits of floating point values (if encoded with IEEE 754, as they usually are on standard computers), but this results in non-Euclidean distance properties. You may have to transform negative values first, see here, section 2.3.
EDIT
To answer the questions from the comments:
1) I understand how to make a space filling curve for a regular rectangular grid. However, if I have randomly positioned floating points, several points can map into one box. Would that algorithm work in that case?
There are several ways to use floating point (FP) values. The simplest is to convert them to integer values by multiplying them by a large constant. For example multiply everything by 10^6 to preserve 6 digit precision.
Another way is to use the bit-level representation of the FP value to turn it into an integer. This has the advantage that no precision is lost and you don't have to determine a multiplication constant. The disadvantage is that the Euclidean distance metric does not work anymore.
It works as follows: the trick is that floating point values do not have infinite precision, but are limited to 64 bits. Hence they automatically form a grid. The difference to integer values is that floating point values do not form a square grid but a rectangular grid, where the rectangles get bigger with growing distance from (0,0). The grid size is determined by how much precision is available at a given point. Close to (0,0) the precision (= grid size) is about 10^-28; close to (1,1) it is about 10^-16 (see here). This distorted grid still has the proximity mapping, but distances are not Euclidean anymore.
Here is the code to do the transformation (Java, taken from here; in C++ you can do the same by reinterpreting the bits of the double as a 64-bit integer):
public static long toSortableLong(double value) {
    long r = Double.doubleToRawLongBits(value);
    return (r >= 0) ? r : r ^ 0x7FFFFFFFFFFFFFFFL;
}

public static double toDouble(long value) {
    return Double.longBitsToDouble(value >= 0 ? value : value ^ 0x7FFFFFFFFFFFFFFFL);
}
These conversions preserve the ordering of the converted values, i.e. for any two FP values the resulting integers have the same ordering with respect to <, >, =. The non-Euclidean behaviour is caused by the exponent, which is encoded in the bit string. As mentioned above, this is also discussed here, section 2.3; however, that code is slightly less optimized.
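For reference, a C++ equivalent of the same bit-level trick could be sketched as follows (assuming IEEE-754 doubles; std::memcpy is used instead of a plain cast to avoid undefined behaviour):

#include <cstdint>
#include <cstring>

// Map a double to a 64-bit integer whose <, >, == ordering matches the doubles'.
int64_t toSortableInt64(double value)
{
    int64_t r;
    std::memcpy(&r, &value, sizeof r);              // reinterpret the bit pattern
    return (r >= 0) ? r : r ^ 0x7FFFFFFFFFFFFFFFLL; // flip non-sign bits for negatives
}

double toDouble(int64_t value)
{
    int64_t r = (value >= 0) ? value : value ^ 0x7FFFFFFFFFFFFFFFLL;
    double d;
    std::memcpy(&d, &r, sizeof d);
    return d;
}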
2) Is there some algorithm to do an iterative update of such a space filling curve if my points move in space? (i.e. without reordering the whole array each time)
The space filling curve imposes a specific ordering, so for every set of points there is only one valid ordering. If a point is moved, it has to be reinserted at the new position determined by its z-value.
The good news is that a small movement likely means the point can often stay in the same 'area' of your array. So if you really use a flat array, you only have to shift small parts of it.
If you have a lot of moving objects and the array is too cumbersome, you may want to look into 'moving object indexes' (MX-CIF quadtree, etc.). I personally can recommend my very own PH-Tree. It is a kind of bitwise radix-quadtree that uses a z-curve for internal ordering. It is quite efficient for updates (and other operations). However, I usually recommend it only for larger datasets; for small datasets a simple quadtree is usually good enough.
The problem you are trying to solve only has meaning if, given a point p and its nearest neighbor (NN) q, the NN of q is in turn p.
That is not trivial: for example, the two points could represent positions in a landscape, where one point is high on a mountain, so going from the bottom up to the mountain costs more than the other way around (from the mountain down to the bottom). So make sure you check that this is not your case.
Since TilmannZ has already proposed a solution, I would like to comment on the LSH you mentioned. I would not choose that: your points lie in a really low-dimensional space (not even 100 dimensions), so why use LSH?
I would go for CGAL's algorithms in that case, such as 2D NNS, or even a simple kd-tree. And if speed is critical, but space is not, then why not go for a quadtree (an octree in 3D)? I have built one; it won't go beyond about 10 dimensions within 8 GB of RAM.
If, however, you feel that your data may live in a higher-dimensional space in the future, then I would suggest using:
LSH from Andoni, really cool guy.
FLANN, which offers another approach.
kd-GeRaF, which is developed by me.

What exactly is the output of the SURF algorithm and how can I use them for classification (SVM, etc.)?

I am working on a project that tracks humans in aerial videos. One of the algorithms that we will use is SURF. Now I understand that SURF uses interest points, but I'm quite confused about what comes after that. How exactly can I use the interest points for classification? I want to identify which detected objects in the video are humans and which are other objects, so of course I need a training set, but what will I use? I've read somewhere that BoW should be used, but are there any other ways of extracting these SURF features? If I read the original SURF paper by Herbert Bay correctly, it does not mention how the features were extracted, what the output was, or how they were prepared for classification.
I'm really confused. Please help. Thank you!
Let's say you have an image and you divide it into smaller rectangular areas (called patches). Each patch is a rectangular area (x, y, width, height). Say you want to describe the colors inside a patch: you calculate a histogram over it, and the result is a concatenation of numbers, i.e. a vector (e.g. [5 11 2 4 5]). This output vector is a description vector (a descriptor). If you use all patches to extract descriptors, the method is called dense sampling. If only some of the patches are important, then you use keypoints to specify which are significant and which are not.
Keypoints are only points of greater significance than other points in an image.
A descriptor is a vector that encodes color/shape/texture information of a small area (patch).
Edit: The output of SURF is a cv::Mat in which each row is a descriptor of 64 values (L2-normalized). You can compare two L2-normalized vectors with the L2 norm (Euclidean distance).
Edit2: A classifier is a different story. I suggest you study the tutorial http://docs.opencv.org/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html, while keeping in mind that every 2D point in the tutorial corresponds, in your case, to a descriptor of 64 values.
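As an illustration, extracting the descriptor matrix with OpenCV might look roughly like this (a sketch assuming OpenCV 3.x built with the xfeatures2d contrib module; everything outside the OpenCV calls is made up for the example):

#include <opencv2/core.hpp>
#include <opencv2/xfeatures2d.hpp>
#include <vector>

// Returns one 64-value (L2-normalized) SURF descriptor per detected keypoint.
cv::Mat extractSurfDescriptors(const cv::Mat& grayImage)
{
    cv::Ptr<cv::xfeatures2d::SURF> surf = cv::xfeatures2d::SURF::create();
    std::vector<cv::KeyPoint> keypoints;
    cv::Mat descriptors;                 // rows = keypoints, cols = 64
    surf->detectAndCompute(grayImage, cv::noArray(), keypoints, descriptors);
    return descriptors;
}

Each row of the returned matrix (or, more commonly, a bag-of-words histogram built from those rows) can then serve as a feature vector for an SVM.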
I am also working on an object detection project. I'm new to all of this, but this might be helpful: http://cs229.stanford.edu/proj2011/SchmittMcCoy-ObjectClassificationAndLocalizationUsingSURFDescriptors.pdf
Sorry, I just saw this now. SURF has two parts:
1. The interest points, which are extracted using the determinant of the Hessian matrix.
2. A description vector describing the neighbourhood of each interest point.
For classifiers you are interested in part 2.
As for the output format of the original SURF implementation
(1 + length of descriptor)
number of points
x y a b c l des
x y a b c l des
...
x, y = position of interest point
a, b, c = [a b; b c] entries of second moment matrix. SURF only has circular regions, hence b = 0; a = c -> radius = 1 / a^2
l = sign of the Laplacian (-1 or +1). This value is very useful as it tells whether the detected blob is dark on a light background (-1) or light on a dark background (+1)
des = descriptor vector itself. See the paper for more.
Hope that helps.

GPS Data time to distance base transformation

I am developing an application that logs a GPS trace over time.
After the trace is complete, I need to convert the time-based data to distance-based data. That is to say, where the original trace has a lon/lat record every second, I need to convert that into a lon/lat record every 20 meters.
Smoothing the original data seems to be a well-understood problem, and I suppose I need something like a smoothing algorithm, but I'm struggling to see how to convert from a time-based data set to a distance-based data set.
This is an excellent question, and what makes it so interesting is that the data points should be assumed to be random. This means you cannot expect the trace from beginning to end to follow a well-behaved polynomial (like a sine or cosine wave). So you will have to work in small increments such that the values on your x-axis (so to speak) do not oscillate, meaning X_n cannot be less than X_n-1. The next consideration is the case of overlap or near overlap of data points. Imagine I'm recording my GPS coordinates and we have stopped to chat or rest, and I walk randomly within a twenty-five foot circle for the next five minutes. The question is how to ignore this type of "data noise".
For simplicity, let's consider linear calculations where there is no approximation between two points; it's a straight line. This will probably be more than sufficient for your calculations. Given the comment above regarding random data points, you will want to traverse your data sequentially from the start point to the end point. Traversal terminates when you run past the last data point or when you have exceeded the overall distance you want to cover (like a subset). Let's assume your plot precision is X; this would be your 20 meters. As you traverse, there will be three conditions:
1. The distance between the two points is greater than your precision. Save the start point plus the precision X; this point also becomes your new start point.
2. The distance between the two points is equal to your precision. Save the start point plus the precision X (or, equivalently, the end point); this point also becomes your new start point.
3. The distance between the two points is less than your precision. Reduce the remaining precision by that distance; the end point becomes your new start point.
Here is pseudo-code that might help get you started. Note: point y minus point x = the distance between them, and point x plus a value = a new point on the line between point x and point y at that distance.
recordedPoints = received from trace;
newPlotPoints  = empty list of coordinates;
plotPrecision  = 20;               // desired spacing in meters
immedPrecision = plotPrecision;    // distance still to cover before the next plot point
startPoint     = recordedPoints[0];

for (int i = 1; i < recordedPoints.Length; i++)
{
    Delta = recordedPoints[i] - startPoint;   // distance from startPoint to the next recorded point

    if (immedPrecision < Delta)
    {
        // The next plot point lies between startPoint and recordedPoints[i].
        newPlotPoints.Add(startPoint + immedPrecision);
        startPoint = startPoint + immedPrecision;
        immedPrecision = plotPrecision;
        i--;                                   // re-examine the same recorded point
    }
    else if (immedPrecision == Delta)
    {
        newPlotPoints.Add(startPoint + immedPrecision);
        startPoint = startPoint + immedPrecision;
        immedPrecision = plotPrecision;
    }
    else // immedPrecision > Delta
    {
        // Store the last data point regardless.
        if (i == recordedPoints.Length - 1)
        {
            newPlotPoints.Add(startPoint + Delta);
        }
        startPoint = recordedPoints[i];
        immedPrecision = immedPrecision - Delta;   // carry the remaining precision forward
    }
}
Previously I mentioned "data noise". You can wrap the "if" and "else if" blocks in another "if" which scrubs this factor. The easiest way is to ignore a data point if it has not moved a given distance. Keep in mind this magic number must be small enough that sequentially ignored data points don't add up to something large and meaningful, so putting a limit on the number of ignored data points might be a benefit.
With all this said, there are many ways to perform this operation accurately. One suggestion to take this subject to the next level is interpolation. For .NET there is an open source library at http://www.mathdotnet.com. You can use their Numerics library, which contains interpolation routines, at http://numerics.mathdotnet.com/interpolation/. If you choose such a route, your next major hurdle will be deciding on the appropriate interpolation technique. If you are not a math guru, here is a bit of information to get you started: http://en.wikipedia.org/wiki/Interpolation. Frankly, polynomial interpolation using two adjacent points would be more than sufficient for your approximations, provided you keep the constraint that X_n is not less than X_n-1; otherwise your approximation will be skewed.
The last item to note: these calculations are two-dimensional and do not consider altitude (elevation) or the curvature of the earth. Here is some additional information in that regard: Calculate distance between two latitude-longitude points? (Haversine formula).
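For reference, the great-circle distance between two lat/lon records via the haversine formula can be sketched in C++ as follows (no external libraries assumed):

#include <cmath>

// Great-circle distance in meters between two points given in degrees,
// using the haversine formula and a mean Earth radius of 6371 km.
double haversineMeters(double lat1, double lon1, double lat2, double lon2)
{
    const double kEarthRadius = 6371000.0;              // meters
    const double kDegToRad    = 3.14159265358979323846 / 180.0;

    double dLat = (lat2 - lat1) * kDegToRad;
    double dLon = (lon2 - lon1) * kDegToRad;
    double a = std::sin(dLat / 2) * std::sin(dLat / 2)
             + std::cos(lat1 * kDegToRad) * std::cos(lat2 * kDegToRad)
             * std::sin(dLon / 2) * std::sin(dLon / 2);
    return 2.0 * kEarthRadius * std::asin(std::sqrt(a));
}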
Nevertheless, hopefully this points you in the right direction. This is without doubt a non-trivial problem, so keeping the data point range as small as possible while still being accurate will be to your benefit.
One other consideration might be to approximate using the actual data points, using the precision only to discard excessive data; that way you avoid keeping two full lists of coordinates.
Cheers,
Jeff

Do randomly generated 2D points clump together, and how do I stop it?

Say you have a 2D area and you want to generate random points within it, by setting
x = random() * width
y = random() * height
do the points clump around the centre of the area? I remember reading something saying they would, but I can't quite figure out why, or how to prevent it.
Yes. The fewer points you have, the more they will appear to form clusters.
To avoid this, you can use "stratified sampling". It basically means you divide your surface evenly into smaller areas and place your points within them.
For your example, you would divide the square into n*n subsquares. Each point would be placed randomly inside its subsquare. You can even adjust a randomness factor to make the pattern more or less random/regular:
// I assume random() returns a number in the range [0, 1).
float randomnessFactor = 0.5;
int n = 100;                                              // n*n subsquares over the unit square
for (int ySub = 0; ySub < n; ++ySub) {
    for (int xSub = 0; xSub < n; ++xSub) {
        float regularity = 0.5 * (1 - randomnessFactor);  // centers the jitter inside the cell
        // Offset within the cell, then divide by n so the point stays in its subsquare.
        float x = (xSub + regularity + randomnessFactor * random()) / (float) n;
        float y = (ySub + regularity + randomnessFactor * random()) / (float) n;
        plot(x, y);
    }
}
The reason this works is that you don't actually want randomness. (Clumps are random.) You want the points evenly spread, but without the regular pattern. Placing the points on a grid and offsetting them a bit hides the regularity.
Truly random points will create clusters (or clumps) - it's an effect that can cause confusion when plotting real-world data (like cancer cases) and lead people to think there are "hot spots" which must be caused by something.
However, you also need to be careful when generating random numbers that you don't create a new generator every time you want a new number - each new generator will use the same seed value, which will result in all the values clustering around a point.
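A minimal C++ illustration of that pitfall (the seeding scheme is only an example):

#include <ctime>
#include <random>

// Wrong: a fresh generator per call, seeded from the clock, tends to return
// the same value many times in a row when called in quick succession.
double badRandom()
{
    std::mt19937 gen(static_cast<unsigned>(std::time(nullptr)));
    return std::uniform_real_distribution<double>(0.0, 1.0)(gen);
}

// Better: create the generator once and reuse it for every number.
double goodRandom()
{
    static std::mt19937 gen(std::random_device{}());
    static std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(gen);
}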
It depends on the distribution of the random number generator. Assuming a perfectly uniform distribution, the points are likely to be spread out in a reasonably uniform way.
Also, asking if they clump around the middle presupposes that you don't have the ability to test this!
In my experience, randomly generated points do not clump in the center of the area, since every pixel of your screen has the same probability of being selected.
While numbers generated with random() are not truly random, they will be sufficient for placing objects randomly on your screen.
If the random number generator's random() function yields a Gaussian distribution, then yes.
You get a clump at the origin if you use polar coordinates instead of Cartesian:
r = rand() * Radius;
phi = rand() * 2 * Pi;
The reason is that, statistically, the disc r = [0,1] will contain as many points as the ring r = [1,2], even though the ring's area is three times larger.
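A standard fix, not part of the original answer, is to draw the radius from the square root of a uniform variable, which makes the density uniform in area:

#include <cmath>
#include <cstdlib>

// Uniform point in a disc of radius R: the sqrt compensates for the fact
// that the area of a ring grows linearly with r.
void uniformDiscPoint(double R, double& x, double& y)
{
    double u   = std::rand() / (double) RAND_MAX;                            // in [0, 1]
    double phi = std::rand() / (double) RAND_MAX * 2.0 * 3.14159265358979;   // in [0, 2*pi]
    double r   = R * std::sqrt(u);
    x = r * std::cos(phi);
    y = r * std::sin(phi);
}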
Pseudorandom points won't necessarily clump "around the center" of an area, but they will tend to cluster in various random points in an area; in fact these clumps often occur more frequently than people think. A more even distribution of space is often achieved by using so-called quasirandom or low-discrepancy sequences, such as the Sobol sequence, whose Wikipedia article shows a graphic illustrating the difference between Sobol and pseudorandom sequences.
They will not clump, but will form various interesting patterns, in 2d or 3d, depending on the generator you use.

Resources