Strategies to detect and delete cluttering aggregations of GPS points?

My problem is that I have a large set of GPS tracks from different GPS loggers used in cars. When not turned off, these cheap devices log phantom movements even while standing still:
As you can see in the image above, about a thousand points get visualized in a kind of congestion. Now I want to remove all of these points so that the red track coming from the left ends before the jitter starts.
My approach is to "draw" two or three circles around each point in the track, check how many other points are located within these circles and check the ratio:
(#points / covered area) > threshold?
If the ratio exceeds a certain threshold (purple circles), I could delete all points within. So: an easy method, but it has huge disadvantages, e.g. computation time, deleting "innocent" tracks that only pass through the circle, and failing to detect outliers like the single points at the bottom of the picture.
I am looking for a better way to detect large heaps of points like in the picture. It should not produce false positives by removing small aggregations (of perhaps 5 or 10 points; these don't matter to me). Also, it should not simplify the rest of the track!
Edit: The result in the given example should look like this:

My first step would be to investigate the speeds implied by the 'movements' of your stationary car and the changes in altitude. If either of these changes too quickly or too slowly (you'll have to decide the thresholds here) then you can probably conclude that they are due to the GPS jitter.
What information, other than position at time, does your GPS device report?
EDIT (after OP's comment)
The problem is to characterise part of the log as 'car moving' and part of the log as 'car not moving but GPS location jittering'. I suggested one approach, Benjamin suggested another. If speed doesn't discriminate accurately enough, try acceleration. Try rate of change of heading. If none of these simple approaches works, I think it's time for you to break out your stats textbooks and start figuring out autocorrelation of random processes and the like. At this point I quietly slink away ...
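A minimal sketch of the speed check, assuming the log can be read as a list of (time in seconds, latitude, longitude) tuples. The function names and the 1 m/s cut-off are made up for illustration; a real threshold would have to be tuned against the loggers' actual jitter.

```python
import math

EARTH_RADIUS_M = 6_371_000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def implied_speeds(track):
    """track: list of (t_seconds, lat, lon). Returns one speed in m/s per consecutive pair."""
    speeds = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(track, track[1:]):
        dt = t1 - t0
        # Guard against duplicate or out-of-order timestamps.
        speeds.append(haversine_m(la0, lo0, la1, lo1) / dt if dt > 0 else 0.0)
    return speeds

def slow_segments(track, max_speed=1.0):
    """True for every segment whose implied speed is below max_speed (assumed threshold)."""
    return [s < max_speed for s in implied_speeds(track)]
```

A long run of consecutive True flags would then be a candidate for the stationary part of the log; short runs (a stop at a traffic light, say) could be ignored by requiring a minimum run length.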

Similarly to High Performance Mark's answer, you could look for line intersections that happen within a short number of points. When driving on a road, the route of the last n points rarely intersects with itself, but it does in your stationary situation because of the jitter. A single intersection could be a person doubling-back or circling around a block, but multiple intersections should be rarer. The angle of intersection will also be sharper for the jitter case.
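As a rough sketch of that idea, the following counts how often a short polyline crosses itself. It assumes the points have already been projected to planar (x, y) coordinates; the function names are hypothetical, and only proper crossings are counted (segments that merely touch at an endpoint are ignored).

```python
def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1 or 0 for collinear points."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def segments_cross(p1, p2, p3, p4):
    """True if segment p1-p2 properly crosses segment p3-p4."""
    return (_orient(p1, p2, p3) * _orient(p1, p2, p4) < 0
            and _orient(p3, p4, p1) * _orient(p3, p4, p2) < 0)

def count_self_intersections(points):
    """Count crossings between non-adjacent segments of a polyline of (x, y) points."""
    segs = list(zip(points, points[1:]))
    count = 0
    for i in range(len(segs)):
        # Start at i + 2: the directly following segment shares an endpoint with segment i.
        for j in range(i + 2, len(segs)):
            if segments_cross(*segs[i], *segs[j]):
                count += 1
    return count
```

Run over a sliding window of the last n points, a count above some small number would mark the window as jitter. The pairwise loop is quadratic in n, which is fine for windows of a few dozen points.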

What is the sampling interval of the GPS points? It seems to be in seconds. There may be one other check to add to the logic previously mentioned.
sum_of_distance(d0,d1,d2....dn)>=80% of sum_of_distance(d0,dn)
The range from 0 to n can be evaluated in smaller and larger chunks, as the distance travelled within that range will not be large. So you could start with a window of perhaps 60 points and, within that window, step through the data 10 points at a time.
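One possible reading of this check, sketched below, compares the straight-line distance between the first and last point of a window with the summed step distances inside it: a moving car gives a ratio near 1, jitter gives a ratio near 0. That interpretation, the planar (x, y) input, the function names and the default values are all assumptions.

```python
import math

def straightness(points):
    """End-to-end distance divided by summed step distance; 1.0 means a straight line."""
    path = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if path == 0:
        return 0.0  # no movement at all in this window
    return math.dist(points[0], points[-1]) / path

def jitter_windows(points, window=10, min_ratio=0.8):
    """Start indices of consecutive, non-overlapping windows with straightness below min_ratio."""
    starts = []
    for start in range(0, len(points) - window + 1, window):
        if straightness(points[start:start + window]) < min_ratio:
            starts.append(start)
    return starts
```

Note that a legitimate U-turn or a drive around the block also lowers the ratio, so a single flagged window is weak evidence; several flagged windows in a row are more convincing.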

Related

how to imitate water on a landscape

I have a double array that contains the ground height and the water height of each 'block' of land, and I am trying to create a function move_water() that will mutate this array so that repeated calls of the function will imitate water moving along the terrain...
My first instinct was:
For each block, look at the nearby 4 other blocks and compare water levels.
Give 1/2 of the water from the middle block to the other 4 blocks (split evenly, but only if they are lower).
This doesn't really work very well, though, and creates some weird wave patterns, as the water level on any given block seems to oscillate between 2 values.
The water simulation doesn't have to be perfect; I just want it to flow to the lowest point.
Since you say it doesn't have to be perfect, updating in steps defined in terms of how much water has moved might not be a problem, even though the amount of time it takes for half the water to move will vary according to the slope and the amount of water. It may therefore still look odd that half of a large amount of water on a steep slope takes the same amount of time as a smaller amount on a less steep slope. But your method may still have potential.
It's not clear to me, though, whether you update one block per call or all of them for each call to move_water. I'm going to assume it's not just one, because that will look odd.
Assuming you process all the blocks, your rule will give different results depending on the order in which you process them. If you just process them in order of increasing x coordinate, I can imagine why you might see unnatural waves (a lower block can gain from another block, then give to another block, then gain again). If, on the other hand, you processed the highest points first, or processed in order of the highest height difference, you may get better results.
You need to consider the combined height of the land and water, and I would suggest trying moving half of the height difference, not half of the total water.
If you haven't already done this, you might find it helps to consider 1 dimension, flat terrain, placing different amounts of water in the block to start - just to make it easier to work out what's happening.
Finally, just moving water to 4 of the surrounding blocks will look a bit odd, if you mean up, down, left and right without water moving diagonally. Once you've got the flow working well in one dimension, consider moving to all 8 nearby blocks in the 2D case (assuming the blocks are in a rectangular grid).
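To make the one-dimensional experiment concrete, here is a small sketch. It assumes two parallel lists, ground and water, one entry per block; that layout is my assumption and not the asker's array. Each neighbouring pair is visited in index order, and half of the difference in surface height (ground plus water) moves from the higher block to the lower one, limited by the water actually available.

```python
def move_water(ground, water):
    """One pass over a 1D strip, updating `water` in place.

    For each neighbouring pair, half the difference in surface height
    (ground + water) flows downhill, capped by the water in the source block.
    """
    for i in range(len(ground) - 1):
        j = i + 1
        diff = (ground[i] + water[i]) - (ground[j] + water[j])
        src, dst = (i, j) if diff > 0 else (j, i)
        flow = min(abs(diff) / 2, water[src])
        water[src] -= flow
        water[dst] += flow
```

Because blocks are updated in place and in index order, the result still depends on processing order, as discussed above; the pass only guarantees that water is conserved and that each pair is no more uneven after its exchange than before.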
If you are not concerned about erosion or where the sources of the water are located then I'd go with the simple solution you got from your last question. You'd have to build a one-dimensional array from your landscape and after you got the new mean (see my answer there) you run through your two-dimensional array and adjust the heights that fall below that mean value.

Distance matrix between 500,000 sets of coordinates

I'm working on a project with 500,000 participants. We have in our database the precise coordinates of their home, and we want to release this data to someone who needs it to evaluate how close our participants live to one another.
We are very reluctant to release the precise coordinates, because this is an anonymized project and the risk for re-identification would be very high. Rounded coordinates (to something like 100m or 1km) are apparently not precise enough for what they're trying to achieve.
A nice workaround would have been to send them a 500,000 by 500,000 matrix with the absolute distance between each pair of participants, but this means 250 billion entries, or rather 125 billion if we remove half the matrix since |A-B| = |B-A|.
I've never worked with this type of data before, so I was wondering if anyone had a clever idea on how to deal with this? (Something that would not involve sending them 2 TB of data!)
Thanks.
Provided that the recipient of the data is happy to perform the great circle calculation to calculate the distance themselves, then you only need to send the 500,000 lines, but with transposed latitudes and longitudes.
First of all identify an approximate geospatial centre of your dataset, and then work out the offsets needed to transpose this centre to 0°N and 0°E. Then apply these same offsets to the users' latitudes and longitudes. This will centre the results around the equator and the prime meridian.
Provided your real data isn't too close to the poles, the distance calculated between real points A and B will be very close to the corresponding offset points.
Obviously the offsets applied need to be kept secret.
This approach may not work if it is known that your data is based around a particular place - the recipient may be able to deduce where the real points are - but that is something you'll need to decide yourself.
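A small sketch for trying this out, assuming coordinates are (latitude, longitude) pairs in degrees and a spherical Earth of radius 6371 km. The shift helper and the offsets are hypothetical. Comparing a few distances before and after the shift is a cheap way to check how close they stay for your latitudes, because a change in latitude alters the east-west scale of a degree of longitude.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def shift(points, lat_offset, lon_offset):
    """Apply the same secret offsets (in degrees) to every (lat, lon) pair."""
    return [(lat + lat_offset, lon + lon_offset) for lat, lon in points]

def pair_distance(points):
    """Distance in km between the first two points of a list."""
    (lat1, lon1), (lat2, lon2) = points[0], points[1]
    return haversine_km(lat1, lon1, lat2, lon2)
```

For example, calling pair_distance on a handful of real pairs and on the same pairs after shift shows directly whether the discrepancy is acceptable for the recipient's purpose.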

What type of smoothing to use?

Not sure whether this is valid here on SO, but I was hoping someone could advise on the correct algorithm to use.
I have the following RAW data.
In the image you can see "steps". Essentially I wish to get these steps, but then get a moving average of all the data between. In the following image, you can see the moving average:
However you will notice that at the "steps", the moving average decreases the gradient where I wish to keep the high vertical gradient.
Is there any smoothing technique that will take into account a large vertical "offset", but smooth the other data?
Yup, I had to do something similar with images from a spacecraft.
Simple technique #1: use a median filter with a modest width - say about 5 samples, or 7. This provides an output value that is the median of the corresponding input value and several of its immediate neighbors on either side. It will get rid of those spikes, and do a good job preserving the step edges.
The median filter is provided in all number-crunching toolkits that I know of such as Matlab, Python/Numpy, IDL etc., and libraries for compiled languages such as C++, Java (though specific names don't come to mind right now...)
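For a quick experiment without any toolkit, a sliding median is only a few lines. This is a plain-Python sketch using the standard library; the function name and the choice to shorten the window at the ends of the series are my own.

```python
from statistics import median

def median_filter(values, width=5):
    """Sliding median of odd `width`; the window is truncated at both ends of the series."""
    half = width // 2
    return [median(values[max(0, i - half):i + half + 1]) for i in range(len(values))]

# A single spike is removed while the step stays where it was.
data = [0, 0, 9, 0, 0, 0, 0, 5, 5, 5, 5, 5]
filtered = median_filter(data, width=5)
```

With width 5, any spike of one or two samples is suppressed, while an edge between two long flat runs is left in place.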
Technique #2, perhaps not quite as good: Use a Savitzky-Golay smoothing filter. This works by effectively making least-square polynomial fits to the data, at each output sample, using the corresponding input sample and a neighborhood of points (much like the median filter). The SG smoother is known for being fairly good at preserving peaks and sharp transitions.
The SG filter is usually provided by most signal processing and number crunching packages, but might not be as common as the median filter.
Technique #3, the most work and requiring the most experience and judgement: Go ahead and use a smoother - moving box average, Gaussian, whatever - but then create an output that blends between the original with the smoothed data. The blend, controlled by a new data series you create, varies from all-original (blending in 0% of the smoothed) to all-smoothed (100%).
To control the blending, start with an edge detector to detect the jumps. You may want to first median-filter the data to get rid of the spikes. Then broaden (dilation in image processing jargon) or smooth and renormalize the edge detector's output, and flip it around so it gives 0.0 at and near the jumps, and 1.0 everywhere else. Perhaps you want a smooth transition joining them. It is an art to get this right, which depends on how the data will be used - for me, it's usually images to be viewed by humans. An automated embedded control system might work best if tweaked differently.
The main advantage of this technique is you can plug in whatever kind of smoothing filter you like. It won't have any effect where the blend control value is zero. The main disadvantage is that the jumps, the small neighborhood defined by the manipulated edge detector output, will contain noise.
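A bare-bones sketch of technique #3 in plain Python. Everything here is a simplification chosen for illustration: the edge detector is just a thresholded first difference, the dilation is a fixed number of samples on either side, the blend weight is a hard 0 or 1 rather than a smooth transition, and the spike-removing median pre-filter is left out.

```python
def moving_average(values, width=5):
    """Box average of odd `width`; the window is truncated at both ends."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out

def blend_weights(values, jump=2.0, spread=2):
    """1.0 = use the smoothed value, 0.0 = keep the original (near a detected jump)."""
    weights = [1.0] * len(values)
    for i in range(1, len(values)):
        if abs(values[i] - values[i - 1]) > jump:  # crude edge detector
            lo = max(0, i - 1 - spread)
            hi = min(len(values), i + spread + 1)
            for k in range(lo, hi):                # dilate around the jump
                weights[k] = 0.0
    return weights

def edge_preserving_smooth(values, width=5, jump=2.0, spread=2):
    """Blend original and smoothed series using the jump-aware weights."""
    smoothed = moving_average(values, width)
    weights = blend_weights(values, jump, spread)
    return [w * s + (1 - w) * v for v, s, w in zip(values, smoothed, weights)]
```

The spread should be at least half the smoothing width, otherwise samples just outside the protected zone are averaged across the step.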
I recommend first detecting the steps and then smoothing each step individually.
You know how to do the smoothing, and edge/step detection is pretty easy also (see here, for example). A typical edge detection scheme is to smooth your data and then multiply/convolve/cross-correlate it with some filter (for example the array [-1,1], which will show you where the steps are). In a mathematical context this can be viewed as studying the derivative of your plot to find inflection points (for some of the filters).
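Sketched in plain Python, correlating with [-1, 1] is just the difference between neighbouring samples. The threshold, the window width and the function names are assumptions, and the pre-smoothing step before detection is omitted to keep the example short.

```python
def detect_steps(values, threshold):
    """Indices i where |values[i] - values[i-1]| exceeds threshold.

    The difference is what correlating the series with the kernel [-1, 1] produces.
    """
    return [i for i in range(1, len(values)) if abs(values[i] - values[i - 1]) > threshold]

def smooth_between_steps(values, threshold, width=3):
    """Moving average applied separately to each run between detected steps."""
    bounds = [0] + detect_steps(values, threshold) + [len(values)]
    half = width // 2
    out = []
    for lo, hi in zip(bounds, bounds[1:]):
        for i in range(lo, hi):
            # The window never reaches across a step boundary.
            window = values[max(lo, i - half):min(hi, i + half + 1)]
            out.append(sum(window) / len(window))
    return out
```

Because each window is clipped to its own run, the vertical jumps keep their full height while the data between them is averaged.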
An alternative "hackish" solution would be to do a moving average but exclude outliers from the smoothing. You can decide what an outlier is by using some threshold t. In other words, for each point p with value v, take x points surrounding it and find the subset of those points which are between v - t and v + t, and take the average of these points as the new value of p.
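That rule translates almost directly into code. In this sketch I take "x points surrounding it" to mean x points on each side, which is my reading and not necessarily what was meant; t is the outlier threshold from the description.

```python
def thresholded_average(values, t, x=2):
    """Average each point with those neighbours (x on each side) whose value is within +/- t of it."""
    out = []
    for i, v in enumerate(values):
        window = values[max(0, i - x):i + x + 1]
        close = [w for w in window if v - t <= w <= v + t]
        # `close` always contains v itself, so it is never empty.
        out.append(sum(close) / len(close))
    return out
```

Points on the far side of a step differ from v by more than t, so they are excluded and the step stays sharp, provided t is larger than the noise but smaller than the step height.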

SNAKES: Active Contour Model

I got the code for the Snakes algorithm from here (implemented in MATLAB):
http://www.mathworks.com/matlabcentral/fileexchange/28109-snakes-active-contour-models
When you give it initial indices surrounding the contour, it runs perfectly. Unfortunately, that isn't what I want.
Imagine that there is a mountain and I want to detect its contour, but I only have the index of the top of the mountain. So the initial indices are the indices surrounding this pixel. When I run the algorithm, however, the snake gets smaller and smaller until it vanishes.
I want the snake to grow until it finds the contour. Is that feasible?
I'm not an expert, but I have done a little reading on this topic. From what I understand many snake algorithms tend to shrink in the absence of any image forcing because they punish the first derivative (the integral of |x'|^2) and that inadvertently punishes area.
If you can access it, this paper discusses the problem and tries to alter the model to get an expanding snake by adding a volume term to the cost function.
http://www.springerlink.com/index/10.1007/s00791-012-0178-8
Hope that helps.
You want to increase the weight of the External Forces (the force field generated by the mountain contour upon the snake points) and decrease the weight of the Internal Forces (the elasticity of the snake, the "rubber band" effect).
If you do that, the snake will be less elastic (less of a rubber band) and more plastic (more like a string of beads).

Automatic tracking algorithm

I'm trying to write a simple tracking routine to track some points on a movie.
Essentially I have a series of 100-frames-long movies, showing some bright spots on dark background.
I have ~100-150 spots per frame, and they move over the course of the movie. I would like to track them, so I'm looking for an efficient routine to do that (ideally one that isn't overkill to implement).
A few more details:
the spots are a few (e.g. 5x5) pixels in size
the movements are not big. A spot generally does not move more than 5-10 pixels from its original position. The movements are generally smooth.
the "shape" of these spots is generally fixed, they don't grow or shrink BUT they become less bright as the movie progresses.
the spots don't move in a particular direction. They can move right and then left and then right again
the user will select a region around each spot and then this region will be tracked, so I do not need to automatically find the points.
As the videos are b/w, I thought I should rely on brightness. For instance, I thought I could move around the region and calculate the correlation of the region's area in the previous frame with that in the various positions in the next frame. I understand that this is quite a naïve solution, but do you think it may work? Does anyone know specific algorithms that do this? It doesn't need to be superfast; as long as it is accurate I'm happy.
Thank you
nico
Sounds like a job for Blob detection to me.
I would suggest Pearson's product-moment correlation coefficient. Given a model (which could be any template image), you can measure the correlation of the template with any section of the frame.
The result is a correlation coefficient between -1 and 1 that measures how well the samples match the template. It is especially applicable to 2D cases.
It has the advantage of being independent of the samples' absolute values, since the result depends on the covariance relative to the mean of the samples.
Once you detect a high correlation, you can track the spot through successive frames by searching the neighbourhood of its previous position and selecting the best correlation coefficient.
The size and rotation of the template do matter, but as far as I understand that is not an issue in your case. You can customize the detection for any shape, since the template image can represent any configuration.
Here is a single-pass algorithm implementation that I've used and that works correctly.
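Independently of that implementation, the idea can be sketched in plain Python. This is a straightforward two-pass version, not the single-pass one mentioned above; frames and templates are assumed to be 2D lists of brightness values, and all function names are made up for the example.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences; 0.0 if either is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def patch(frame, top, left, h, w):
    """Flatten the h x w patch whose top-left corner is at (top, left)."""
    return [frame[r][c] for r in range(top, top + h) for c in range(left, left + w)]

def best_match(frame, template, top, left, radius):
    """Search within `radius` pixels of (top, left); return (score, top, left) of the best patch."""
    h, w = len(template), len(template[0])
    flat_t = [v for row in template for v in row]
    best = (-2.0, top, left)  # below any possible correlation value
    for r in range(max(0, top - radius), min(len(frame) - h, top + radius) + 1):
        for c in range(max(0, left - radius), min(len(frame[0]) - w, left + radius) + 1):
            score = pearson(flat_t, patch(frame, r, c, h, w))
            if score > best[0]:
                best = (score, r, c)
    return best
```

Since the coefficient ignores overall scale and offset, a spot that has faded to half its brightness still correlates perfectly with the original template, which suits spots that dim over the course of the movie.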
This has got to be a well-researched topic, and I suspect there won't be any 100% accurate solution.
Some links which might be of use:
Learning patterns of activity using real-time tracking. A paper by two guys from MIT.
Kalman Filter. Especially the Computer Vision part.
Motion Tracker. A student project, which also has code and sample videos I believe.
Of course, this might be overkill for you, but hope it helps giving you other leads.
Simple is good. I'd start doing something like:
1) over a small rectangle, that surrounds a spot:
2) apply a weighted average of all the pixel coordinates in the area
3) call the averaged X and Y values the object's position
4) while scanning these pixels, do something to approximate the bounding box size
5) repeat on the next frame with a slightly enlarged bounding box so you don't clip a spot that moves
The weight for the average should go to zero for pixels below some threshold. Number 4 can be as simple as tracking the min/max position of anything brighter than the same threshold.
This will of course have issues with spots that overlap or cross paths. But for some reason I keep thinking you're tracking stars with some unknown camera motion, in which case this should be fine.
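The five steps might look like this in plain Python. The frame is assumed to be a 2D list of brightness values, boxes are (top, left, bottom, right) with inclusive bounds, and the weight of a pixel is simply its brightness when above the threshold and zero otherwise; all of these are choices made for the sketch.

```python
def track_spot(frame, box, threshold):
    """Brightness-weighted centroid of the pixels above `threshold` inside `box`.

    Returns ((row, col), tight_box), or None if no pixel exceeds the threshold.
    """
    top, left, bottom, right = box
    total = sum_r = sum_c = 0.0
    rows, cols = [], []
    for r in range(max(0, top), min(len(frame) - 1, bottom) + 1):
        for c in range(max(0, left), min(len(frame[0]) - 1, right) + 1):
            v = frame[r][c]
            if v > threshold:  # pixels at or below the threshold get zero weight
                total += v
                sum_r += v * r
                sum_c += v * c
                rows.append(r)
                cols.append(c)
    if total == 0:
        return None
    tight = (min(rows), min(cols), max(rows), max(cols))
    return (sum_r / total, sum_c / total), tight

def grow(box, margin=2):
    """Enlarge a box by `margin` pixels on every side for the next frame's search."""
    top, left, bottom, right = box
    return (top - margin, left - margin, bottom + margin, right + margin)
```

Per frame, the loop would be: call track_spot with the current box, record the centroid, then pass grow(tight_box) as the search box for the next frame. As the spots fade, the threshold may need to drop with them.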
I'm afraid that blob tracking is not simple, not if you want to do it well.
Start with blob detection as genpfault says.
Now you have spots on every frame and you need to link them up. If the blobs are moving independently, you can use some sort of correspondence algorithm to link them up. See for instance http://server.cs.ucf.edu/~vision/papers/01359751.pdf.
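As a much simpler stand-in for a proper correspondence algorithm (this is not the method from the linked paper), a greedy nearest-neighbour match between two frames can be sketched as follows. Spots are assumed to be (x, y) tuples, and max_dist is an assumed upper bound on per-frame movement.

```python
import math

def link_spots(prev, curr, max_dist=10.0):
    """Greedy nearest-neighbour matching between two lists of (x, y) spots.

    Returns a dict mapping an index in `prev` to an index in `curr`.
    """
    # All candidate pairs, closest first.
    pairs = sorted(
        (math.dist(p, c), i, j)
        for i, p in enumerate(prev)
        for j, c in enumerate(curr)
    )
    links, used = {}, set()
    for d, i, j in pairs:
        if d > max_dist:
            break  # every remaining pair is even farther apart
        if i not in links and j not in used:
            links[i] = j
            used.add(j)
    return links
```

Spots left unmatched in either frame simply get no link. Greedy matching can pick the wrong partner when two spots pass close to each other, which is exactly the collision case.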
Now you may have collisions. You can use a mixture of Gaussians to try to separate them, give up and let the tracks cross, or use any other before-and-after information to resolve the collisions (e.g. if A and B collide and A is brighter before and will be brighter after, you can keep track of A; if A and B move along predictable trajectories, you can use that also).
Or you can collaborate with a lab that does this sort of stuff all the time.

Resources