I am interested in finding the diameter of two point sets in 128 dimensions. The first has 10000 points and the second 1000000. For that reason I would like to do something better than the naive approach, which takes O(n²) time. The algorithm should be able to handle any number of points and dimensions, but I am currently most interested in these two particular data sets.
I am willing to trade accuracy for speed, so my idea is to find the (approximate) bounding box of the point set by computing the min and max value per coordinate, which takes O(n*d) time. Then, if I can find the diameter of this box, the problem is solved.
In the 3D case, I could find the diagonal of one face, since I know its two edges, and then apply the Pythagorean theorem with the remaining edge, which is perpendicular to that face. I am not sure about this, however, and in any case I can't see how to generalize it to d dimensions.
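For what it's worth, the box-diagonal part does seem to generalize: applying the Pythagorean theorem once per dimension gives a squared diagonal equal to the sum of the squared per-coordinate extents. A minimal NumPy sketch of this idea (function and variable names are my own):

import numpy as np

def bounding_box_diameter(points):
    # points: (n, d) array; min/max per coordinate is O(n*d)
    mins = points.min(axis=0)
    maxs = points.max(axis=0)
    # d-dimensional diagonal: square root of the sum of squared extents
    return np.sqrt(((maxs - mins) ** 2).sum())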
An interesting answer can be found here, but it seems to be specific to 3 dimensions, and I want a method for d dimensions.
Interesting paper: On computing the diameter of a point set in high dimensional Euclidean space. Link. However, implementing the algorithm seems too much for me at this stage.
The classic 2-approximation algorithm for this problem, with running time O(nd), is to choose an arbitrary point and then return the maximum distance to another point. The diameter is no smaller than this value and no larger than twice this value.
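A minimal sketch of this heuristic (NumPy assumed; names are my own):

import numpy as np

def diameter_2_approx(points):
    # Pick an arbitrary reference point and return the largest distance to it.
    # The true diameter lies between this value and twice this value.
    ref = points[0]
    dists = np.sqrt(((points - ref) ** 2).sum(axis=1))  # O(n*d)
    return dists.max()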
I would like to add a comment, but not enough reputation for that...
I just want to warn other readers that the "bounding box" solution is very inaccurate. Take for example the Euclidean ball of radius one. This set has diameter two, but its bounding box is [-1, 1]^d, which has diameter twice the square root of d. For d = 128, this is already a very bad approximation.
For a crude estimate, I would stay with David Eisenstat's answer.
There is a precision-based algorithm which performs very well in any dimension, and which is based on computing the extents of an axis-aligned bounding box.
The idea is that it's possible to find lower and upper bounds on the axis-aligned bounding box length function, since its partial derivatives are bounded and depend on the angle between the axes.
The limit of the local-maxima derivatives between two axes in 2D space can be computed as:
sin(a/2)*(1 + tan(a/2))
That means that, for example, for 90 degrees between axes the bound is 1.42 (sqrt(2)).
This reduces to a/2 as a → 0, so the upper bound is proportional to the angle.
For the multidimensional case the formula varies slightly, but it is still easy to compute.
So the search for local maxima converges in logarithmic time.
The good news is that we can run the search of such local maxima in parallel.
Also, based on the best result achieved so far, we can filter out both regions of the search, as well as the points themselves that are below the lower limit of the search in the worst region.
The worst case for the algorithm is when all of the points lie on the surface of a sphere.
This can be further improved: when we detect a local search which operates on just a few points, we switch to brute force for that particular axis. It works fast, because we only need the points which are subject to that particular local search, which can be determined as the points actually bounded by two opposite spherical cones of a particular angle sharing the same axis.
It's hard to give the big-O complexity, because it depends on the desired precision and the distribution of the points (bad when most of the points are on a sphere's surface).
The algorithm I use is as follows:
1) Set the initial angle a = pi/2.
2) Take one axis for each dimension. The angle and the axes form the initial 'bucket'.
3) For each axis, compute the span on that axis by projecting all the points onto the axis and finding the min and max of the projected coordinates.
4) Compute the upper and lower bounds of the diameter of interest. They are based on the formula sin(a/2)*(1 + tan(a/2)), multiplied by an asymmetry coefficient computed from the lengths of the current axis projections.
5) For the next step, kill all of the points which fall under the lower bound in each dimension at the same time.
6) For each axis, if the number of points above the upper bound is less than some reasonable amount (determined experimentally), then compute the diameter by brute force (N^2) on the set of points in question, adjust the lower bound, and kill the axis for the next step.
7) For the next step, kill all of the axes which have all of their points under the lower bound.
8) If the precision is satisfactory, i.e. (upper bound - lower bound) < epsilon, then return the upper bound as the result.
9) For each of the surviving axes, there is a virtual cone on that axis (actually, two opposite cones) which covers some area of a virtual sphere enclosing a face of the cube. If I'm not mistaken, its angle would be a * sqrt(2). Set the new angle to a / sqrt(2). Create a whole bucket of new axes (2 * number of dimensions) so that the new cone areas cover the initial cone area. This is the hard part for me, as I don't have enough imagination for the n > 3-dimensional case.
10) Continue from step (3).
You can parallelize the procedure, synchronizing the limits computed so far for the points, from steps (5) through (7).
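To make step (3) concrete, here is a minimal sketch of the per-axis projection span (NumPy assumed; this shows only the span computation, not the bucket/cone bookkeeping):

import numpy as np

def axis_span(points, axis):
    # Project an (n, d) point array onto a unit-length direction and return
    # the min, max and span of the projected coordinates.
    axis = axis / np.linalg.norm(axis)
    proj = points @ axis
    return proj.min(), proj.max(), proj.max() - proj.min()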
I'm going to summarize the algorithm proposed by Timothy Shields.
1) Pick random point x.
2) Pick point y furthest from x.
3) If not done, let x = y, and go to step 2.
The more times you repeat, the more accurate the result will be... ??
EDIT: actually this algorithm is not very good. Think about a 2D rectangle with vertices ABCD. There are two maxima: between AC and BD, which are separated by a sizable valley. This algorithm will get stuck at one or the other 50/50. If AC is slightly larger than BD, you'll be getting the wrong answer 50% of the time no matter how many times you iterate. Other regular polygons have the same issue, and in higher dimensions it is even worse.
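For reference, a minimal sketch of the iterated furthest-point heuristic described above (NumPy assumed; names are my own):

import numpy as np

def iterated_furthest_point(points, iterations=5):
    # Hop repeatedly to the furthest point found and keep the largest distance seen.
    # The result is a lower bound on the diameter (and at least half of it).
    x = points[np.random.randint(len(points))]
    best = 0.0
    for _ in range(iterations):
        dists = np.sqrt(((points - x) ** 2).sum(axis=1))
        j = dists.argmax()
        best = max(best, dists[j])
        x = points[j]
    return best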
I would like to write a small program simulating many particle collisions, starting in 2D (I would extend it to 3D later on), to simulate (in 3D) the convergence towards the Boltzmann distribution and also to see how the distribution evolves in 2D.
I have not yet started programming, so please don't ask for code samples; it is a rather general question that should help me get started. The physics behind this problem is not an issue for me; rather, it is the fact that I will have to simulate at least 200-500 particles to achieve a reasonably good speed distribution. And I would like to do that in real time.
Now, for every time step, I would first update the positions of all the particles and then check for collisions, to update the new velocity vectors. That, however, involves a lot of checks, since I would have to see whether every single particle collides with every other particle.
I found this post about more or less the same problem, and the approach used there is also the only one I can think of. I am afraid, however, that it will not work very well in real time, because it would involve too many collision checks.
So now: even if this approach works performance-wise (getting, say, 40 fps), can anybody think of a way to avoid unnecessary collision checks?
My own idea was to split the board (or, in 3D, the space) into squares (cubes) whose dimensions are at least the diameter of the particles, and to only check for collisions if the centres of two particles lie in adjacent squares of the grid...
I would be happy to hear more ideas, as I would like to increase the number of particles as much as I can while still having a real-time calculation/simulation going on.
Edit: All collisions are purely elastic, with no other forces doing work on the particles. The initial situation will be determined by some user-chosen variables that set the random starting positions and velocities.
Edit2: I found a good and very helpful paper on the simulation of particle collisions here. Hopefully it will help some people who are interested in more depth.
If you think about it, particles moving in a plane are really a 3D system where the three dimensions are x, y and time (t).
Let's say a "time step" goes from t0 to t1. For each particle, you create a 3D line segment going from P0(x0, y0, t0) to P1(x1, y1, t1) based on current particle position, velocity and direction.
Partition the 3D space into a 3D grid, and link each 3D line segment to the cells it crosses.
Now, each grid cell should be checked. If it is linked to 0 or 1 segments, it needs no further check (mark it as checked). If it contains 2 or more segments, you need to check for collisions between them: compute the 3D collision point Pt, shorten the two segments to end at this point (and remove the links to cells they no longer cross), then create two new segments going from Pt to newly computed P1 points according to the new direction/velocity of the particles. Add these new line segments to the grid and mark the cell as checked. Adding a line segment to the grid turns all of the cells it crosses back to the unchecked state.
When there are no more unchecked cells in your grid, you've resolved your time step.
EDIT
For 3D particles, adapt above solution to 4D.
Octrees are a nice form of 3D space partitioning grid in this case, as you can "bubble up" checked/unchecked status to quickly find cells requiring attention.
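A minimal sketch of the binning step, just to show how (x, y, t) segments could be linked to grid cells (the cell size, the sampling shortcut and all names are my own; the collision resolution itself is omitted):

from collections import defaultdict

def cells_crossed(p0, p1, cell_size, samples=16):
    # Approximate the set of grid cells an (x, y, t) segment crosses by sampling
    # points along it; a production version would use a proper voxel traversal.
    cells = set()
    for k in range(samples + 1):
        s = k / samples
        cells.add(tuple(int((a + s * (b - a)) // cell_size) for a, b in zip(p0, p1)))
    return cells

def bin_segments(segments, cell_size):
    # Map each grid cell to the indices of the segments linked to it.
    grid = defaultdict(list)
    for idx, (p0, p1) in enumerate(segments):
        for cell in cells_crossed(p0, p1, cell_size):
            grid[cell].append(idx)
    return grid  # cells holding 2+ segments are the ones needing a collision check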
A good high level example of spatial division is to think about the game of pong, and detecting collisions between the ball and a paddle.
Say the paddle is in the top left corner of the screen, and the ball is near the bottom left corner of the screen...
--------------------
|▌                 |
|                  |
|                  |
|  ○               |
--------------------
It's not necessary to check for collision each time the ball moves. Instead, split the playing field into two right down the middle. Is the ball in the left hand side of the field? (simple point inside rectangle algorithm)
   Left       Right
         |
---------|----------
|▌       |         |
|        |         |
|        |         |
|  ○     |         |
---------|----------
         |
If the answer is yes, split the left hand side again, this time horizontally, so we have a top left and a bottom left partition.
   Left       Right
         |
---------|----------
|▌       |         |
|        |         |
---------|         |
|        |         |
|  ○     |         |
---------|----------
         |
Is the ball in the same top-left partition of the screen as the paddle? If not, there is no need to check for collision! Only objects which reside in the same partition need to be tested for collision with each other. By doing a series of simple (and cheap) point-inside-rectangle checks, you can easily save yourself from doing a more expensive shape/geometry collision check.
You can continue splitting the space down into smaller and smaller chunks until an object spans two partitions. This is the basic principle behind BSP (a technique pioneered in early 3D games like Quake). There is a whole bunch of theory on the web about spatial partitioning in 2 and 3 dimensions.
http://en.wikipedia.org/wiki/Space_partitioning
In 2 dimensions you would often use a BSP or quadtree. In 3 dimensions you would often use an octree. However the underlying principle remains the same.
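A minimal sketch of the same principle using a uniform grid (the simplest relative of quadtrees/octrees), where only particles in the same or adjacent cells are paired up for the expensive test; cell size and names are my own:

from collections import defaultdict
from itertools import product

def candidate_pairs(positions, cell_size):
    # Broad phase: bucket particle centres into grid cells, then only pair up
    # particles whose cells are identical or adjacent.
    grid = defaultdict(list)
    for i, (x, y) in enumerate(positions):
        grid[(int(x // cell_size), int(y // cell_size))].append(i)
    pairs = set()
    for (cx, cy), members in grid.items():
        for dx, dy in product((-1, 0, 1), repeat=2):  # this cell plus its 8 neighbours
            for j in grid.get((cx + dx, cy + dy), ()):
                for i in members:
                    if i < j:
                        pairs.add((i, j))
    return pairs  # run the exact (narrow-phase) collision test only on these pairs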
You can think along the lines of 'divide and conquer'. The idea is to identify orthogonal parameters which don't impact each other. For example, one can think of splitting the momentum into components along the 2 axes in 2D (3 axes in 3D) and computing collisions/positions independently. Another way to identify such parameters can be grouping particles which are moving perpendicular to each other, so that even if they impact, the net momentum along those lines doesn't change.
I agree the above doesn't fully answer your question, but it conveys a fundamental idea which you may find useful here.
Let us say that at time t, for each particle, you have:
P position
V speed
and an N*(N-1)/2 array of information between particle A(i) and A(j), where i < j; you use symmetry to evaluate an upper triangular matrix instead of a full N*(N-1) grid.
MAT[i][j] = { dx, dy, dz, sx, sy, sz }.
which means that, with respect to particle i, particle j has a distance made up of three components dx, dy and dz, and a delta-vee multiplied by dt, which is sx, sy, sz.
To move to instant t+dt you tentatively update the positions of all particles based on their speed
px[i] += dx[i] // px,py,pz make up vector P; dx,dy,dz is vector V premultiplied by dt
py[i] += dy[i] // Which means that we could use "particle 0" as a fixed origin
pz[i] += dz[i] // except you can't collide with the origin, since it's virtual
Then you check the whole N*(N-1)/2 array and tentatively calculate the new relative distance between every couple of particles.
dx1 = dx + sx
dy1 = dy + sy
dz1 = dz + sz
DN = dx1*dx1+dy1*dy1+dz1*dz1 # This is the new distance
If DN < D^2, with D the diameter of the particles, you have had a collision during the dt just past.
You then calculate exactly where this happened, i.e. you calculate the exact d't of collision, which you can do from the old distance squared D2 (dx*dx+dy*dy+dz*dz) and the new DN: it's
d't = [(SQRT(D2)-D)/(SQRT(D2)-SQRT(DN))]*dt
(Time needed to reduce the distance from SQRT(D2) to D, at a speed that covers the distance SQRT(D2)-SQRT(DN) in time dt). This makes the hypothesis that particle j, seen from the reference frame of particle i, hasn't "overshot".
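A minimal sketch of that check and the d't estimate for a single pair (plain Python; names are my own):

import math

def collision_fraction(old_d2, new_d2, diameter, dt):
    # If the pair ends the step closer than one diameter, estimate the time d't
    # at which the distance first equalled the diameter, assuming no "overshoot".
    if new_d2 >= diameter * diameter:
        return None  # no collision during this dt
    d_old = math.sqrt(old_d2)
    d_new = math.sqrt(new_d2)
    return (d_old - diameter) / (d_old - d_new) * dt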
It is a more hefty calculation, but you only need it when you get a collision.
Knowing d't, and d"t = dt-d't, you can repeat the position calculation on Pi and Pj using dx*d't/dt etc. and obtain the exact position P of particles i and j at the instant of collision; you update speeds, then integrate it for the remaining d"t and get the positions at the end of time dt.
Note that if we stopped here this method would break if a three-particle collision took place, and would only handle two-particle collisions.
So instead of running the calculations we just mark that a collision occurred at d't for particles (i,j), and at the end of the run, we save the minimum d't at which a collision occurred, and between whom.
I.e., say we check particles 25 and 110 and find a collision at 0.7 dt; then we find a collision between 110 and 139 at 0.3 dt. There are no collisions earlier than 0.3 dt.
We enter collision updating phase, and "collide" 110 and 139 and update their position and speed. Then repeat the 2*(N-2) calculations for each (i, 110) and (i, 139).
We will discover that there probably still is a collision with particle 25, but now at 0.5 dt, and maybe, say, another between 139 and 80 at 0.9 dt. 0.5 dt is the new minimum, so we repeat collision calculation between 25 and 110, and repeat, suffering a slight "slow down" in the algorithm for each collision.
Thus implemented, the only risk now is that of "ghost collisions", i.e., a particle is at D > diameter from a target at time t-dt, and is at D > diameter on the other side at time t.
This you can only avoid by choosing a dt so that no particle ever travels more than half its own diameter in any given dt. Actually, you might use an adaptive dt based on the speed of the fastest particle. Ghost glancing collisions are still possible; a further refinement is to reduce dt based on the nearest distance between any two particles.
This way, it is true that the algorithm slows down considerably in the vicinity of a collision, but it speeds up enormously when collisions aren't likely. If the minimum distance (which we calculate at almost no cost during the loop) between two particles is such that the fastest particle (which also we find out at almost no cost) can't cover it in less than fifty dts, that's a 4900% speed increase right there.
Anyway, in the no-collision generic case we have now done five sums (actually more like thirty-four due to array indexing), three products and several assignments for every particle couple. If we include the (k,k) couple to take into account the particle update itself, we have a good approximation of the cost so far.
This method has the advantage of being O(N^2) - it scales with the number of particles - instead of being O(M^3) - scaling with the volume of space involved.
I'd expect a C program on a modern processor to be able to manage in real time a number of particles in the order of the tens of thousands.
P.S.: this is actually very similar to Nicolas Repiquet's approach, including the necessity of slowing down in the 4D vicinity of multiple collisions.
Until a collision between two particles (or between a particle and a wall) happens, the integration is trivial. The approach here is to calculate the time of the first collision, integrate until then, then calculate the time of the second collision, and so on. Let's define tw[i] as the time the i-th particle takes to hit the first wall. It is quite easy to calculate, although you must take into account the diameter of the sphere.
The calculation of the time tc[i,j] of the collision between two particles i and j takes a little more time, and follows from the study in time of their distance d:
d^2=Δx(t)^2+Δy(t)^2+Δz(t)^2
We study whether there exists a positive t such that d^2 = D^2, where D is the diameter of the particles (or the sum of the two radii, if you want them to differ). Now, consider the first term of the sum on the RHS,
Δx(t)^2 = (x[i](t) - x[j](t))^2
        = (x[i](t0) - x[j](t0) + (u[i] - u[j])t)^2
        = (x[i](t0) - x[j](t0))^2 + 2(x[i](t0) - x[j](t0))(u[i] - u[j])t + (u[i] - u[j])^2 t^2
where the new terms appearing define the law of motion of the two particles for the x coordinate,
x[i](t)=x[i](t0)+u[i]t
x[j](t)=x[j](t0)+u[j]t
and t0 is the time of the initial configuration. Let then (u[i],v[i],w[i]) be the three components of velocities of the i-th particle. Doing the same for the other three coordinates and summing up, we get to a 2nd order polynomial equation in t,
at^2+2bt+c=0,
where
a=(u[i]-u[j])^2+(v[i]-v[j])^2+(w[i]-w[j])^2
b=(x[i](t0)-x[j](t0))(u[i]-u[j]) + (y[i](t0)-y[j](t0))(v[i]-v[j]) + (z[i](t0)-z[j](t0))(w[i]-w[j])
c=(x[i](t0)-x[j](t0))^2 + (y[i](t0)-y[j](t0))^2 + (z[i](t0)-z[j](t0))^2-D^2
Now, there are many criteria to evaluate the existence of a real solution, etc... You can evaluate that later if you want to optimize it. In any case you get tc[i,j], and if it is complex or negative you set it to plus infinity. To speed up, remember that tc[i,j] is symmetric, and you also want to set tc[i,i] to infinity for convenience.
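A minimal sketch of that computation for one pair of particles (plain Python; variable names are my own):

import math

def collision_time(pi, pj, vi, vj, D):
    # Smallest positive root of a*t^2 + 2*b*t + c = 0, or infinity if the two
    # particles never reach distance D. pi, pj, vi, vj are 3-component tuples.
    dp = [p - q for p, q in zip(pi, pj)]   # relative position at t0
    dv = [p - q for p, q in zip(vi, vj)]   # relative velocity
    a = sum(x * x for x in dv)
    b = sum(x * y for x, y in zip(dp, dv))
    c = sum(x * x for x in dp) - D * D
    disc = b * b - a * c                   # reduced discriminant
    if a == 0 or disc < 0 or b >= 0:       # no relative motion, no real root, or moving apart
        return math.inf
    t = (-b - math.sqrt(disc)) / a         # earliest root
    return t if t > 0 else math.inf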
Then you take the minimum tmin of the array tw and of the matrix tc, and integrate in time for the time tmin.
You now subtract tmin from all elements of tw and of tc.
In case of an elastic collision of the i-th particle with a wall, you just flip the component of that particle's velocity perpendicular to the wall, and recalculate only tw[i] and tc[i,k] for every other k.
In case of a collision between two particles, you recalculate tw[i],tw[j] and tc[i,k],tc[j,k] for every other k. The evaluation of an elastic collision in 3D is not trivial, maybe you can use this
http://www.atmos.illinois.edu/courses/atmos100/userdocs/3Dcollisions.html
As for how the process scales: you have an initial overhead that is O(n^2). Then the integration between two timesteps is O(n), and hitting a wall or a collision requires O(n) recalculation. But what really matters is how the average time between collisions scales with n, and there should be an answer somewhere in statistical physics for this :-)
Don't forget to add further intermediate timesteps if you want to plot a property against time.
You can define a repulsive force between particles, proportional to 1/(distance squared). At each iteration, calculate all the forces between particle pairs, add up the forces acting on each particle, compute the particle's acceleration, then its velocity and finally its new position. Collisions will be handled naturally this way. But dealing with interactions between particles and walls is another problem and must be handled in another way.
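A minimal sketch of one such integration step (NumPy; the force constant, the softening term and the unit mass are placeholder assumptions of mine):

import numpy as np

def step(positions, velocities, dt, k=1.0, eps=1e-6):
    # One explicit Euler step with pairwise 1/r^2 repulsion between all particles.
    # positions, velocities: (n, 2) arrays; returns updated copies.
    diff = positions[:, None, :] - positions[None, :, :]    # pairwise offsets, shape (n, n, 2)
    r2 = (diff ** 2).sum(axis=-1) + eps                     # softened squared distances
    np.fill_diagonal(r2, np.inf)                            # no self-force
    forces = (k * diff / r2[..., None] ** 1.5).sum(axis=1)  # sum of k * r_hat / r^2 per particle
    velocities = velocities + forces * dt                   # unit mass assumed
    positions = positions + velocities * dt
    return positions, velocities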
Suppose you have a GPS trajectory, i.e. a series of spatio-temporal coordinates, where every coordinate is an (x, y, t) triple with x the longitude, y the latitude and t the time stamp.
Suppose each trajectory consists of 1000 (x, y) points; a compressed trajectory is a trajectory with fewer points than the original, for instance 300 points. A compression algorithm (Douglas-Peucker, Bellman, etc.) decides which points will be kept in the compressed trajectory and which points will be discarded.
Each algorithm makes its own choice. Better algorithms choose the points not only by spatial characteristics (x, y) but also by spatio-temporal characteristics (x, y, t).
Now I need a way to compare two compressed trajectories against the original to understand which compression algorithm better reduces a spatio-temporal trajectory (the temporal component is really important).
I've thought of the DTW algorithm to check trajectory similarity, but it probably doesn't take the temporal component into account. What algorithm can I use to perform this check?
Which compression algorithm is best depends to a large extent on what you are trying to achieve with it, and on other external variables. Typically, we are going to identify and remove spikes, and then remove redundant data. For example:
Known minimum and maximum velocity, acceleration, and ability to turn will let you remove spikes. If we look at the distance between a pair of points divided by the elapsed time, where
velocity = sqrt((xb - xa)^2 + (yb - ya)^2)/(tb - ta)
we can eliminate points where the distance couldn't be travelled in the elapsed time given the speed constraint. We can do the same with acceleration constraints, and with change-in-direction constraints for a given velocity. These constraints change depending on whether the GPS receiver is static, hand-held, in a car, in an aeroplane, etc.
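A minimal sketch of that velocity-based spike filter (plain Python; the maximum-speed value and names are my own):

import math

def remove_speed_spikes(track, max_speed):
    # Drop points that would require travelling faster than max_speed to reach
    # from the previously accepted point. track: list of (x, y, t) tuples.
    kept = [track[0]]
    for x, y, t in track[1:]:
        xa, ya, ta = kept[-1]
        if math.hypot(x - xa, y - ya) / (t - ta) <= max_speed:
            kept.append((x, y, t))   # plausible move: keep the point
        # otherwise treat the point as a spike and skip it
    return kept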
We can remove redundant points using a moving window of three points: an interpolated (x, y, t) for the middle point is compared with the observed point, and the observed point is removed if it lies within a specified distance and time tolerance of the interpolated point. We can also curve-fit the data and consider the distance to the curve rather than using a moving three-point window.
The compression may also have different goals depending on the constraints given, e.g. simply reducing the data size by removing redundant observations and spikes, or smoothing the data as well.
For the former, after checking for spikes based on the defined constraints, we simply check the 3D distance of each point to the polyline connecting the compressed points. This is achieved by finding the pair of points before and after the point that has been removed, interpolating a position on the line connecting those points based on the observed time, and comparing the interpolated position with the observed position. The number of points removed will increase as we allow this distance tolerance to increase.
For the latter we also have to consider how well the smoothed result models the data, the weights imposed by the constraints, and the design shape / curve parameters.
Hope this makes some sense.
Maybe you could use mean square distance between trajectories over time.
Probably, simply looking at the distance at times 1s, 2s, ... will be enough, but you can also do it more precisely by integrating (x1(t)-x2(t))^2 + (y1(t)-y2(t))^2 between time stamps. Note that between two time stamps both trajectories are straight lines.
I've found what I need to compute the spatio-temporal error.
As written in the paper "Compression and Mining of GPS Trace Data: New Techniques and Applications" by Lawson, Ravi & Hwang:
Synchronized Euclidean distance (sed) measures the distance between
two points at identical time stamps. In Figure 1, five time steps (t1
through t5) are shown. The simplified line (which can be thought of as
the compressed representation of the trace) is comprised of only two
points (P't1 and P't5); thereby, it does not include points P't2, P't3
and P't4. To quantify the error introduced by these missing points,
distance is measured at the identical time steps. Since three points
were removed between P't1 and P't5, the line is divided into four
equal sized line segments using the three points P't2, P't3 and P't4
for the purposes of measuring the error. The total error is measured
as the sum of the distance between all points at the synchronized time
instants, as shown below. (In the following expression, n represents
the total number of points considered.)
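A minimal sketch of that synchronized Euclidean distance measure, under my reading of the description above (both trajectories are lists of (x, y, t) points, and the compressed one is linearly interpolated at the original time stamps):

import math

def interpolate_at(traj, t):
    # Linearly interpolate an (x, y) position on a trajectory of (x, y, t) points.
    for (x0, y0, t0), (x1, y1, t1) in zip(traj, traj[1:]):
        if t0 <= t <= t1:
            s = (t - t0) / (t1 - t0)
            return x0 + s * (x1 - x0), y0 + s * (y1 - y0)
    raise ValueError("time stamp outside the trajectory")

def sed_error(original, compressed):
    # Sum of distances between the original points and the compressed trajectory
    # evaluated at the same (synchronized) time stamps.
    total = 0.0
    for x, y, t in original:
        cx, cy = interpolate_at(compressed, t)
        total += math.hypot(x - cx, y - cy)
    return total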
I'm looking for an algorithm that would move a point around an arbitrary closed, non-self-crossing polygon in a given time N. For example, move a point smoothly around a circle in 3 seconds.
1) Calculate the polygon's perimeter (or estimate it if the exact time of circling is not critical).
2) Divide the perimeter by the time desired for circling to get the speed.
3) Move the point around the polygon at this speed.
Edit following ire_and_curses' comment.
Maybe I read the question too literally; I fail to see the difficulty or the points raised by ire_and_curses.
The following describes more specifically the logic I imagine for step #3. A more exact description would require knowing details about the coordinate system, the structure used to describe the polygon, and an indication of the desired/allowed animation refresh frequency.
The "travellng point" which goes around the polygon would start on any edge of the polygop (maybe on a vertex, as so to make the start and end point more obvious) and would stay on an edge at all time.
From this starting point, be it predetermined or randomly selected), the traveling point would move towards towards a vertex, at the calculated speed. Once there it would go towards the other vertex of the edge it just arrived to, and proceed till it returns to the starting point.
The equations for calculating the points on a given edge are the same that for tracing a polygon: simple trig or even pythagoras (*). The visual effect is based on refreshing the position of the traveling point at least 15 times or so per second. The refresh frequency (or rather its period) can be used to determine the distance of two consecutive points of the animation.
The only less trivial task is to detect the end of a given edge, i.e. when the traveling point needs to "turn" to follow the next edge. On these occasions, a fractional travel distance needs to be computed so that the next point in the animation is on the next edge. Special mention also for extremely short edges, as these may require the fractional distance logic to be repeated (or done differently).
Sorry for such a verbose explanation of a rather straightforward (literally ;-) algorithm...
Correction: as pointed out by Jefromi in a comment on another response, all that is needed with regard to the tracing is merely to decompose the x and y components of the motion. We do still need Pythagoras for calculating the distance between vertices for the perimeter calculation, and we do need to extrapolate because the number of animation steps on an edge is not [necessarily] an integer.
For the record, a circle is not a polygon: it's the limit of a regular polygon as the number of sides goes to infinity, but it's not a polygon. What I'm giving you isn't going to work if you don't have defined points.
Assuming you have your polygon stored in some format like a list of adjacent vertices, do an O(n) pass to calculate the perimeter by iterating through them and computing the distance between each pair of consecutive points. Divide that by the time to get the velocity at which you should travel.
Now, if you want to compute the path, iterate back through your vertices again and calculate from your current position where your actual position should be on the next timestep, whatever your refresh time step may be (if you need to move down a different edge, calculate how much time it would take to get to the end of your first edge, then continue on from there..). To travel along the edge, decompose your velocity vector into its components (because you know the slope of the edge from its two endpoints).
A little code might answer this with fewer words (though I'm probably too late for any votes here). Below is Python code that moves a point around a polygon at constant speed.
from turtle import *
import numpy as nx
import time

total_time = 10.  # time in seconds
step_time = .02   # time for graphics to make each step

# define the polygon by the corner points
# repeat the start point for a closed polygon
p = nx.array([[0.,0.], [0.,200.], [50.,150.], [200.,200.], [200.,0.], [0.,0.]])

perim = sum([nx.sqrt(sum((p[i]-p[i+1])**2)) for i in range(len(p)-1)])
distance_per_step = (step_time/total_time)*perim

seg_start = p[0]                  # segment start point
goto(seg_start[0], seg_start[1])  # start the graphic at this point

for i in range(len(p)-1):
    seg_end = p[i+1]              # final point on the segment
    seg_len = nx.sqrt(sum((seg_start-seg_end)**2))
    n_steps_this_segment = max(1, int(seg_len/distance_per_step))  # avoid a zero-step segment
    step = (seg_end-seg_start)/n_steps_this_segment                # the vector step
    #
    last_point = seg_start
    for j in range(n_steps_this_segment):
        x = last_point + step
        goto(x[0], x[1])
        last_point = x
        time.sleep(step_time)
    seg_start = seg_end
Here I calculated the step size from step_time (anticipating a graphics delay), but one could calculate the step size from whatever was needed, for example the desired speed.
My question might be a little strange. I've "developed" an algorithm and don't know if there's a similar algorithm already out there.
The situation: I've got a track defined by track points (2D). The track points represent turns, for instance. Between the track points there are only straight lines. Now I'm given a set of coordinates in this 2D space. I calculate the distance from the first track point to the new coordinates, and the distance of the interval between the first two track points. If the distance to the measured coordinates is shorter than the distance from the first to the second track point, I assume that this point lies within this interval, and I then do a linear interpolation on it. If it's bigger, I check the next interval.
So it's basically taking interval distances and trying to fit them in there. I'm trying to track an object moving approximately along this track.
Does this sound familiar to somebody? Can somebody come up with a suggestion for a similar existing algorithm?
EDIT: Given what I've stated so far, I want to clarify that a position is not associated with multiple track points. Consider the fine ASCII drawing Jonathan made:
The X position is found to be within Segment 1 and 2 (S12). Now the next position is Y, which is not to be considered close enough to be on S12. I'll move on to S23, and check if it's in.
If it's in, I won't be checking S12 for any other value, because I found one in the next segment already. The algorithm "doesn't look back".
But if it doesn't find the right segment from there on, because the position happened to be too far away from the first segment but still further away from any other segment anyhow, I will drop the value, and the next position will be looked for back in S12 again.
The loop still remains a problem. Suppose I get Y for S23 and then skip two or three positions (as they are too far off); I might lose track. I could determine a position to be in S34 when it is actually already in S56.
Maybe I can come up with some average speed to vaguely tell which segment it should be in.
It seems the bigger the segments are, the better the chance of making the right decision.
What concerns me about the algorithm you've described is that it is 'greedy' and could choose the 'wrong' track segment (or, at least, a track segment that is not the closest to the point).
Time to push ASCII art to the limits. Consider the following path (numbers represent the sequence in the list of track points), and the coordinate X (and, later, Y).
1-------------2
              |
              |     Y
       X      |
        5-----+-----6
        |     |
        |     |
        4-----3
How are we supposed to interpret your description?
[C]alculate the distance from the first track point to the new coordinates and the distance for the interval for the first two track points. If the distance to the measured coordinates is shorter than the distance from the first to the second track point, [assume] that this point lies in between this interval; [...] [i]f it's bigger, [...] check with the next interval.
I think the first sentence means:
Calculate the distance from TP1 (track point 1) to TP2 - call it D12.
Calculate the distance from TP1 to X (call it D1X) and from TP2 to X (call it D2X).
The tricky part is the interpretation of the conditional sentence.
My impression is that if either D1X or D2X is less than D12, then X will be assumed to be on (or closest to) the track segment TP1 to TP2 (call it segment S12).
Looking at the position of X in the diagram, it is moderately clear that both D1X and D2X are smaller than D12, so my interpretation of your algorithm would interpret X as being associated with S12, yet X is clearly closer to S23 or S56 than it is to S12 (but those are discarded without even being considered).
Have I misunderstood something about your algorithm?
Thinking about it a bit: what I've interpreted your algorithm to mean is that if the point X lies within either the circle of radius D12 centred at TP1 or the circle of radius D12 centred at TP2, then you associate X with S12. However, if we also consider point Y, the algorithm I suggest you are using would also associate it with S12.
If the algorithm is refined to say MAX(D1Y, D2Y) < D12, then it does not consider Y as being related to S12. However, X is probably still considered to be related to S12 rather than S23 or S56.
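For comparison, a non-greedy alternative would be to associate each position with the segment it is genuinely nearest to; a minimal point-to-segment distance sketch (plain Python, names are my own):

import math

def point_segment_distance(p, a, b):
    # Euclidean distance from point p to the segment a-b (all 2D tuples).
    (px, py), (ax, ay), (bx, by) = p, a, b
    abx, aby = bx - ax, by - ay
    denom = abx * abx + aby * aby
    t = 0.0 if denom == 0 else max(0.0, min(1.0, ((px - ax) * abx + (py - ay) * aby) / denom))
    cx, cy = ax + t * abx, ay + t * aby   # closest point on the segment
    return math.hypot(px - cx, py - cy)

def nearest_segment(p, track_points):
    # Index i such that the segment (track_points[i], track_points[i+1]) is closest to p.
    return min(range(len(track_points) - 1),
               key=lambda i: point_segment_distance(p, track_points[i], track_points[i + 1]))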
The first part of this algorithm reminds me of moving through a discretised space. An example of representing such a space is the Z-order space-filling curve. I've used this technique to represent a quadtree, the data structure for an adaptive mesh refinement code I once worked on, and used an algorithm very like the one you describe to traverse the grid and determine distances between particles.
The similarity may not be immediately obvious. Since you are only concerned about interval locations, you are effectively treating all points on the interval as equivalent in this step. This is the same as choosing a space which only has discretised points - you're effectively 'snapping' your points to a grid.