Google Maps Timeline - How does the segmentation algorithm work?

Google Timeline shows a very nice segmentation of my location history. It clearly identifies periods of time (i.e. segments) in which I stayed in the same location, and periods of time in which I moved from one location to another - ignoring the jitter that happens from GPS inaccuracy and small movements.
Does anybody know the algorithm that Google uses for the segmentation? Can you suggest an algorithm that could do it, preferably with a link to an academic paper? We had some ideas of our own, but I would like to hear better suggestions that take into account things like GPS inaccuracy, slow movement, jitter, etc.
Notice that the algorithm is not a simple clustering algorithm, because it considers the order of the points: a sequence of nearby points is considered as staying in the same location, and the points between such sequences are considered as movement from one place to another (I suppose the time gaps between points also have some effect).
Thanks!

You probably only need a simple filter and threshold approach.
Filter the data: take the average position over the last 10 minutes.
Threshold: if the position changed by more than, e.g., 50 meters, consider the user to be moving.
Filter again: remove any stationary or moving interval that is too short.
The whole thing is O(n) in the number of points, which is as good as it gets.
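A minimal Python sketch of the filter/threshold/filter idea above. It is one possible interpretation, not Google's method: points are assumed to be (t, x, y) tuples in seconds and metres in a local planar projection, "position changed" is read as the displacement of the smoothed track over one window, and too-short runs are merged into the preceding run. The function and parameter names are hypothetical.

```python
import math

# Illustrative sketch; function and parameter names are hypothetical.


def smooth(points, window_s):
    """Trailing mean position over the last `window_s` seconds.

    `points` is a time-sorted list of (t_seconds, x_metres, y_metres).
    Running sums keep the whole pass O(n).
    """
    out = []
    start = 0
    sum_x = sum_y = 0.0
    for i, (t, x, y) in enumerate(points):
        sum_x += x
        sum_y += y
        # Drop points that have fallen out of the trailing window.
        while points[start][0] < t - window_s:
            sum_x -= points[start][1]
            sum_y -= points[start][2]
            start += 1
        n = i - start + 1
        out.append((sum_x / n, sum_y / n))
    return out


def label_moving(points, smoothed, window_s, threshold_m):
    """True where the smoothed track moved more than `threshold_m` over the last window."""
    labels = []
    j = 0
    for i, (t, _, _) in enumerate(points):
        while points[j][0] < t - window_s:
            j += 1
        labels.append(math.dist(smoothed[i], smoothed[j]) > threshold_m)
    return labels


def segment(points, window_s=600.0, threshold_m=50.0, min_duration_s=300.0):
    """Return ("stationary" | "moving", first_index, last_index) runs."""
    smoothed = smooth(points, window_s)
    labels = label_moving(points, smoothed, window_s, threshold_m)

    # Group consecutive points that share a label into runs.
    runs = []
    for i, moving in enumerate(labels):
        if runs and runs[-1][0] == moving:
            runs[-1][2] = i
        else:
            runs.append([moving, i, i])

    # Second filter: absorb too-short runs into the preceding run.
    merged = []
    for moving, first, last in runs:
        too_short = points[last][0] - points[first][0] < min_duration_s
        if merged and (too_short or merged[-1][0] == moving):
            merged[-1][2] = last
        else:
            merged.append([moving, first, last])
    return [("moving" if m else "stationary", a, b) for m, a, b in merged]
```

Because the averaging window trails the data, the end of a movement is reported late (by up to about two window lengths with this particular reading), which is the usual price of this kind of smoothing.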

Related

Is GPS inaccuracy consistent over short time spans?

I'm interested in developing a semi-autonomous RC lawnmower.
That is, the operator would decide when to stop, turn, etc., but could request "slightly overlap previous cut" and the mower would do so automatically. (Having operated high-end RC mowers at trade shows, I know this is the tedious part. Overcoming that, plus the high cost (which I believe is possible), would make for a commercial success.)
This feature would require accurate horizontal positioning. I have investigated ultrasonic, laser, optical, and GPS. Each has its problems in this application. (I'll resist the temptation to go off on these tangents here.)
So... my question...
I know GPS horizontal accuracy is only 3-4m. Not good enough, but:
I don't need to know where I am on the planet. I only need to know where I am relative to where I was a minute ago.
So, my question is: is the inaccuracy consistent in the short term? If so, I think it would work for me. If it varies wildly by ±1.5 m from one second to the next, then it will not work.
I have tried to find this information but have had no success (possibly because of the ubiquity of other GPS-accuracy discussion), so I appreciate any guidance.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Edit ~~~~~~~~~~~~~~~~~~~~~~
It's looking to me like GPS is not just skewed but granular. I'd be interested in hearing from anyone who can give better insight into this, but for now I'm going to explore other options.
I realized that even though my intended application is "outdoor", this question is technically in the field of "indoor positioning systems" so I am adding that tag.
My latest thinking is to have 3 "intelligent" high-dB ultrasonic (US) speaker units. The mower emits RF requests for a tone from each speaker in rapid sequence and measures the time it takes to "hear" each unit's response, thereby calculating the distance to each of these fixed points and using trilateration to get its position. If the fixed-point speakers are 300' away from the mower, the mower may have moved several feet between the 1st and 3rd responses, so the software would have to allow for this. If it is possible to differentiate 3 different US frequencies, the tones could be requested and received "simultaneously". You still run into issues when you're close to one fixed unit and far from another, though, so some software correction may still be necessary. If we can assume the mower is moving in a straight line, this isn't too complicated.
Another variation is the mower does not request the tones. The fixed units send RF "here comes tone from unit A" etc., and the mower unit just monitors both RF info and US tones. This may simplify things somewhat, but it seems it really requires the ability to determine which speaker a tone is coming from.
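A sketch of the distance and trilateration steps described in the edit, in Python. It assumes a speed of sound of 343 m/s (dry air at about 20 °C), negligible RF travel time, a known response latency for each speaker unit, and a flat 2D lawn. It ignores the mower's motion between responses, which the text notes would need correcting. The function names are made up for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 °C


def delay_to_distance(delay_s, unit_latency_s):
    """Distance to a speaker from the delay between the RF request and hearing its tone.

    Hypothetical helper: assumes RF travel time is negligible and that each
    unit's response latency is known.
    """
    return SPEED_OF_SOUND_M_S * (delay_s - unit_latency_s)


def trilaterate(anchors, distances):
    """2D position from three fixed points and the distance to each.

    Subtracting the first circle equation from the other two leaves a
    2x2 linear system, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = distances
    a11, a12 = 2 * (x1 - x2), 2 * (y1 - y2)
    a21, a22 = 2 * (x1 - x3), 2 * (y1 - y3)
    b1 = r2**2 - r1**2 + x1**2 - x2**2 + y1**2 - y2**2
    b2 = r3**2 - r1**2 + x1**2 - x3**2 + y1**2 - y3**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-9:
        raise ValueError("the three fixed points must not lie on one line")
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det)
```

Placing the three speakers far from a straight line keeps the system well conditioned; nearly collinear speakers make small timing errors turn into large position errors.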
This seems like the kind of thing you could (and should) measure empirically. Just set a GPS of your liking down in the middle of a field on a clear day and wait an hour. Then come back and see what you find.
Because I'm in a city, I can't run out and do this for you. However, I found a paper entitled iGeoTrans – A novel iOS application for GPS positioning in geosciences.
That paper includes a figure that duplicates the test I propose. You'll note that both the iPhone 4 and the Garmin eTrex 10 perform pretty poorly compared to the accuracy you say you need.
But the authors do some Math Magic™ to reduce the uncertainty in the position, presumably by using some kind of averaging. That gets them down to an RMSE of 3.53 m.
If you have real-time differential GPS, you can do better. But this requires relatively expensive hardware and software.
Even aside from the above, you have the potential issue of GPS reflection and multipath error. What if your mower has to go under a deck, or thick trees, or near the wall of a house? These common yard features will likely break the assumptions needed to make a good averaging algorithm work and even frustrate attempts at DGPS by blocking critical signals.
To my mind, this seems like a computer vision problem. And not just because that'll give you more accurate row overlaps... you definitely don't want to run over a dog!
In my opinion a standard GPS is nowhere near accurate enough for this application. A typical consumer-grade receiver that I have used has a position accuracy defined as a CEP of 2.5 metres. This means that for a stationary receiver in a "perfect" sky-view environment, over time 50% of the position fixes will lie within a circle with a radius of 2.5 metres. If you look at the position that the receiver reports, it appears to wander at random around the true position, sometimes moving a number of metres away from its true location. When I have monitored the position data from a number of stationary units, they could appear to be moving at speeds of up to 0.5 metres per second. In your application this would mean that the lawnmower could be out of position by a not insignificant distance (with disastrous consequences for your prized flowerbeds).
There is a way that this can be done, as has been proved by the tractor manufacturers, who can position seed drills and agricultural sprayers to centimetre accuracy. These systems use differential GPS, where a fixed reference station is positioned in the neighbourhood of the tractor being controlled. This reference station transmits error corrections to the mobile unit, allowing it to correct its reported position to a high degree of accuracy. Unfortunately, this sort of positioning system is very expensive.
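To check numbers like these against your own receiver, the CEP definition above can be estimated from logged fixes: it is the median horizontal distance of the fixes from the reference point. A Python sketch, assuming the fixes have already been converted to local east/north metres; the function names are hypothetical.

```python
import math
import statistics


def cep50(fixes, reference=None):
    """Estimate CEP (50%): the radius around `reference` containing half the fixes.

    Hypothetical helper. `fixes` are (east, north) positions in metres. If no
    surveyed reference point is supplied, the mean fix is used instead,
    which hides any constant bias in the receiver.
    """
    if reference is None:
        reference = (statistics.fmean(e for e, _ in fixes),
                     statistics.fmean(n for _, n in fixes))
    return statistics.median(math.dist(f, reference) for f in fixes)


def apparent_speeds(fixes, interval_s):
    """Speeds implied by consecutive fixes from a receiver that is actually stationary."""
    return [math.dist(a, b) / interval_s for a, b in zip(fixes, fixes[1:])]
```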

How to calculate C-SCAN algorithm?

I'm learning operating system disk management algorithms. There is a specific algorithm called C-SCAN, which is a variant of the SCAN (or elevator) disk scheduling algorithm. In the example section of the Wikipedia article, there is an example comparing the SCAN and C-SCAN methods.
In the C-SCAN case, when the head moves from 100 back to 0, the example counts the whole seek time of 100. But if C-SCAN is imagined as scanning a cylinder, why would we count the jump as seek time?
There is another article where the jump is not counted. So which is right? I would be glad if anyone could answer my question with a proper example.
The algorithms aren't treating the '100' as a time, but rather as the distance the head will move. The numbers they refer to are track numbers (a cylinder is logically all the tracks stacked above and below each other when viewed from above the disk). So tracks 20 and 21 are neighbours in the examples given.
The elevator technique is pretty simple: once you start going in a direction, keep going until there are no more requests in that direction, then reverse and go all the way in the other direction. Think of it like a book: you are on page (track) 20, so keep going forward until the end, then work through the book backwards.
C-SCAN is similar, except that rather than reversing direction at the end, it jumps back to the lowest track and starts again.
There are subtle differences in I/O latency under high load between these two variations, even though they appear very close.
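To make the seek-distance arithmetic concrete, here is a small Python sketch that measures total head movement in tracks. It assumes the head starts out moving toward higher track numbers and that both algorithms sweep to the last track (rather than stopping at the last request, as LOOK and C-LOOK do). The `count_return` flag covers both conventions from the question; the function names and the request queue are purely illustrative.

```python
# Illustrative sketch; function names and the request queue are hypothetical.


def scan_distance(start, requests, max_track):
    """SCAN (elevator): sweep up to the last track, then reverse down to the lowest request."""
    lower = [r for r in requests if r < start]
    distance = max_track - start
    if lower:
        distance += max_track - min(lower)
    return distance


def c_scan_distance(start, requests, max_track, count_return=True):
    """C-SCAN: sweep up to the last track, jump back to track 0, then sweep up again.

    count_return chooses whether the max_track -> 0 jump counts as head movement,
    which is exactly where the two sources in the question disagree.
    """
    lower = [r for r in requests if r < start]
    distance = max_track - start
    if lower:
        if count_return:
            distance += max_track  # the return jump from max_track to 0
        distance += max(lower)  # sweep from 0 up to the highest remaining request
    return distance


queue = [82, 170, 43, 140, 24, 16, 190]
print(scan_distance(50, queue, 199))                        # 332
print(c_scan_distance(50, queue, 199))                      # 391
print(c_scan_distance(50, queue, 199, count_return=False))  # 192
```

Both numbers are "right"; they just answer different questions. Counting the jump reflects the physical head travel, while leaving it out treats the return as a special fast move that is accounted for separately.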

Techniques to evaluate the "twistiness" of a road in Google Maps?

As per the title: given a Google Maps URL, I want to generate a twistiness rating based on how winding the roads are. Are there any techniques available I can look into?
What do I mean by twistiness? Well, I'm not sure exactly. I suppose it's characterized by a high turn-to-distance ratio, as well as a high angle-change-per-turn number. I'd also say that the elevation change of a road comes into it as well.
I think that once you know exactly what you want to measure, the implementation is quite straightforward.
I can think of several measurements:
the ratio of the road length to the distance between start and end (this would make a long single curve "twisty", so it is most likely not the complete answer)
the number of inflection points per unit length (this would make an almost straight road with a lot of little swaying "twisty", so it is most likely not the complete answer)
These two could be combined by multiplication, so that you would have:
$$\frac{\text{road-length} \times \text{inflection-points}}{\text{start-end-distance} \times \text{road-length}}$$
The road length cancels, so this reduces to "inflection points per start-end distance", which does seem like a good indicator of "twistiness" to me.
As for taking elevation into account, I think that making the whole calculation in three dimensions is enough for a first attempt.
You might want to handle left-right inflections separately from up-down inflections, though, in order to make it possible to scale the elevation inflections by some factor.
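A 2D Python sketch of the two measurements and their combination, assuming the road is available as a list of (x, y) vertices in a projected, metre-based coordinate system. Elevation is left out, and a left/right inflection is detected as a sign change in the cross product of consecutive segments. Function names are illustrative only.

```python
import math

# Illustrative sketch; function names are hypothetical.


def path_length(pts):
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))


def length_ratio(pts):
    """Road length divided by the straight-line start-end distance."""
    return path_length(pts) / math.dist(pts[0], pts[-1])


def inflection_points(pts):
    """Count changes between left and right turns along the polyline."""
    signs = []
    for a, b, c in zip(pts, pts[1:], pts[2:]):
        # z-component of the cross product: > 0 for a left turn, < 0 for a right turn
        cross = (b[0] - a[0]) * (c[1] - b[1]) - (b[1] - a[1]) * (c[0] - b[0])
        if cross:
            signs.append(1 if cross > 0 else -1)
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def twistiness(pts):
    """Inflection points per unit of start-end distance."""
    return inflection_points(pts) / math.dist(pts[0], pts[-1])
```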
Try http://www.hardingconsultants.co.nz/transportationconference2007/images/Presentations/Technical%20Conference/L1%20Megan%20Fowler%20Canterbury%20University.pdf as a starting point.
I'd assume that you'd have to somehow capture the road centreline from Google Maps as a vectorised dataset and analyse it using GIS software to do what you describe. Maybe do a screen grab, then a raster-to-vector conversion, to start with.
Cumulative turn angle per km is a commonly used measure in road assessment. Vertex density is also useful. Note that these measures assume that vertices were placed at roughly equal density along the line when it was captured, rather than being placed manually. Running a GIS tool such as a "bendsimplify" algorithm on the line should solve this. I have written Python scripts for ArcGIS 10 to compute these measures if anyone wants them.
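A rough Python equivalent of these two measures (not the ArcGIS scripts mentioned above), assuming (x, y) vertices in metres that have already been resampled to a fairly even spacing. Function names are hypothetical.

```python
import math

# Illustrative sketch; function names are hypothetical.


def path_length_m(pts):
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))


def cumulative_turn_deg_per_km(pts):
    """Sum of absolute heading changes (degrees) divided by road length (km)."""
    total_deg = 0.0
    for a, b, c in zip(pts, pts[1:], pts[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])
        # Wrap the heading change into [-pi, pi) so a small turn is never read as ~360 degrees.
        delta = (h2 - h1 + math.pi) % (2 * math.pi) - math.pi
        total_deg += abs(math.degrees(delta))
    return total_deg / (path_length_m(pts) / 1000.0)


def vertex_density_per_km(pts):
    return len(pts) / (path_length_m(pts) / 1000.0)
```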
Sinuosity is sometimes used for measuring bends in rivers; see the help pages for Hawth's Tools for ArcGIS for a good description. It could be misleading for roads that have major changes in course along their length, though.

Smoothing values over time: moving average or something better?

I'm coding something at the moment where I'm taking a bunch of values over time from a hardware compass. This compass is very accurate and updates very often, with the result that if it jiggles slightly, I end up with the odd value that's wildly inconsistent with its neighbours. I want to smooth those values out.
Having done some reading around, it would appear that what I want is a high-pass filter, a low-pass filter or a moving average. Moving average I can get down with, just keep a history of the last 5 values or whatever, and use the average of those values downstream in my code where I was once just using the most recent value.
That should, I think, smooth out those jiggles nicely, but it strikes me that it's probably quite inefficient, and this is probably one of those Known Problems to Proper Programmers to which there's a really neat Clever Math solution.
I am, however, one of those awful self-taught programmers without a shred of formal education in anything even vaguely related to CompSci or Math. Reading around a bit suggests that this may be a high- or low-pass filter, but I can't find anything that explains, in terms comprehensible to a hack like me, what the effect of these algorithms would be on an array of values, let alone how the math works. The answer given here, for instance, technically does answer my question, but only in terms comprehensible to those who would probably already know how to solve the problem.
It would be a very lovely and clever person indeed who could explain the sort of problem this is, and how the solutions work, in terms understandable to an Arts graduate.
If you are trying to remove the occasional odd value, a low-pass filter is the best of the three options that you have identified. Low-pass filters allow low-speed changes such as the ones caused by rotating a compass by hand, while rejecting high-speed changes such as the ones caused by bumps on the road, for example.
A moving average will probably not be sufficient, since the effects of a single "blip" in your data will affect several subsequent values, depending on the size of your moving average window.
If the odd values are easily detected, you may even be better off with a glitch-removal algorithm that completely ignores them:
if (abs(thisValue - averageOfLast10Values) > someThreshold)
{
    thisValue = averageOfLast10Values;
}
Here is a quick graph to illustrate:
The first graph is the input signal, with one unpleasant glitch. The second graph shows the effect of a 10-sample moving average. The final graph is a combination of the 10-sample average and the simple glitch detection algorithm shown above. When the glitch is detected, the 10-sample average is used instead of the actual value.
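A runnable Python version of the glitch check. The snippet does not say whether a replaced value should enter the history; this sketch assumes it does, so a glitch cannot drag the average, and it only starts checking once 10 samples have been seen. `remove_glitches` and its parameters are illustrative names.

```python
from collections import deque


def remove_glitches(samples, threshold, history=10):
    """Replace a sample by the average of the previous `history` samples
    when it differs from that average by more than `threshold`.

    Hypothetical helper: replaced values (not raw glitches) go into the history.
    """
    recent = deque(maxlen=history)  # oldest value drops out automatically
    out = []
    for value in samples:
        if len(recent) == history:
            avg = sum(recent) / history
            if abs(value - avg) > threshold:
                value = avg
        recent.append(value)
        out.append(value)
    return out
```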
If your moving average has to be long in order to achieve the required smoothing, and you don't really need any particular shape of kernel, then you're better off if you use an exponentially decaying moving average:
a(i+1) = tiny*data(i+1) + (1.0-tiny)*a(i)
where you choose tiny to be an appropriate constant (e.g. if you choose tiny = 1/N, it will give roughly the same amount of averaging as a window of size N, but distributed differently over older points).
Anyway, since the next value of the moving average depends only on the previous one and your data, you don't have to keep a queue or anything. And you can think of this as doing something like, "Well, I've got a new point, but I don't really trust it, so I'm going to keep 80% of my old estimate of the measurement, and only trust this new data point 20%". That's pretty much the same as saying, "Well, I only trust this new point 20%, and I'll use 4 other points that I trust the same amount", except that instead of explicitly taking the 4 other points, you're assuming that the averaging you did last time was sensible so you can use your previous work.
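A small Python sketch of the exponentially decaying average, using the `tiny` naming from the formula above. Seeding with the first sample is an assumption (the formula needs some starting a(0)), and the class name is hypothetical. If you want to match a window of N samples more closely, a commonly used choice is tiny = 2/(N+1), which gives the same average age of data as the N-sample window.

```python
class ExponentialMovingAverage:
    """a(i+1) = tiny*data(i+1) + (1 - tiny)*a(i); only the previous average is stored.

    Illustrative sketch; the class name is hypothetical.
    """

    def __init__(self, tiny):
        if not 0.0 < tiny <= 1.0:
            raise ValueError("tiny must be in (0, 1]")
        self.tiny = tiny
        self.value = None

    def update(self, x):
        if self.value is None:
            self.value = x  # assumption: seed with the first sample
        else:
            self.value = self.tiny * x + (1.0 - self.tiny) * self.value
        return self.value
```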
"Moving average I can get down with ... but it strikes me that it's probably quite inefficient."
There's really no reason a moving average should be inefficient. You keep the number of data points you want in some buffer (like a circular queue). On each new data point, you pop the oldest value and subtract it from a sum, and push the newest and add it to the sum. So every new data point really only entails a pop/push, an addition and a subtraction. Your moving average is always this shifting sum divided by the number of values in your buffer.
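The circular-buffer approach, sketched in Python with `collections.deque` standing in for the circular queue. The class name is illustrative.

```python
from collections import deque


class MovingAverage:
    """Simple moving average with O(1) work per sample (running sum + queue).

    Illustrative sketch; the class name is hypothetical.
    """

    def __init__(self, size):
        self.size = size
        self.buffer = deque()
        self.total = 0.0

    def update(self, x):
        self.buffer.append(x)
        self.total += x
        if len(self.buffer) > self.size:
            self.total -= self.buffer.popleft()  # pop the oldest and subtract it
        return self.total / len(self.buffer)
```

With floating-point data the running sum can slowly accumulate rounding error over very long runs, so recomputing it from the buffer now and then is a cheap safeguard.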
It gets a little trickier if you're receiving data concurrently from multiple threads, but since your data is coming from a hardware device that seems highly doubtful to me.
Oh and also: awful self-taught programmers unite! ;)
An exponentially decaying moving average can be calculated "by hand", keeping only the running trend, if you use convenient values. See http://www.fourmilab.ch/hackdiet/e4/ for an idea of how to do this quickly with pen and paper for an “exponentially smoothed moving average with 10% smoothing”. But since you have a computer, you probably want to be doing binary shifting as opposed to decimal shifting ;)
This way, all you need is a variable for your current value and one for the average. The next average can then be calculated from that.
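A Python sketch of the binary-shift version, assuming integer samples such as raw sensor counts. The smoothing factor is 1/2**k, so the multiply becomes a shift. The function name is hypothetical.

```python
def shift_ema(samples, k):
    """Integer exponential moving average with smoothing factor 1/2**k.

    Illustrative sketch; assumes integer samples. `acc` holds the average
    scaled by 2**k, so the update a += (x - a) / 2**k becomes
    acc += x - (acc >> k), using only a shift. Keeping the scaled value lets
    the output reach a constant input instead of stalling up to 2**k - 1
    counts short, as plain integer division would.
    """
    acc = samples[0] << k  # seed with the first sample
    out = []
    for x in samples:
        acc += x - (acc >> k)
        out.append(acc >> k)
    return out
```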
There's a technique called a range gate that works well with infrequent spurious samples. Assuming the use of one of the filter techniques mentioned above (moving average, exponential), once you have "sufficient" history (one time constant), you can test each new incoming data sample for reasonableness before it is added to the computation.
Some knowledge of the maximum reasonable rate of change of the signal is required. The raw sample is compared to the most recent smoothed value, and if the absolute value of that difference is greater than the allowed range, the sample is thrown out (or replaced with some heuristic, e.g. a prediction based on the slope, or the "trend" prediction from double exponential smoothing).
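A Python sketch of a range gate wrapped around an exponential average. Assumptions: `max_step` stands for the maximum reasonable change between samples, rejected samples are simply dropped (no prediction is substituted), and gating starts after roughly one time constant, taken here as about 1/tiny samples. Names are illustrative.

```python
def range_gated_ema(samples, tiny, max_step, warmup=None):
    """Exponential average that discards samples too far from the current smoothed value.

    Illustrative sketch; names and the warm-up rule are assumptions.
    """
    if warmup is None:
        warmup = round(1 / tiny)  # roughly one time constant, in samples
    avg = None
    out = []
    for i, x in enumerate(samples):
        if avg is None:
            avg = x  # seed with the first sample
        elif i < warmup or abs(x - avg) <= max_step:
            avg = tiny * x + (1 - tiny) * avg
        # otherwise the sample is rejected and the smoothed value carries forward
        out.append(avg)
    return out
```

One consequence of this simple form: a genuine sudden step larger than `max_step` is rejected indefinitely, so you may want to add a timeout or re-seed after several consecutive rejections.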

Another Game of Life question (infinite grid)?

I have been playing around with Conway's Game of life and recently discovered some amazingly fast implementations such as Hashlife and Golly. (download Golly here - http://golly.sourceforge.net/)
One thing that I can't get my head around is how coders implement the infinite grid. We can't keep an infinite array of anything, yet if you run Golly, get a few gliders to fly off past the edges, wait for a few minutes and zoom right out, you will see the gliders still out there in space running away. So how in God's name is this concept of infinity dealt with programmatically? Is there a well-documented pattern?
Many thanks
It is possible to represent living nodes with some type of sparse matrix in this situation. For instance, if we store a list of (LivingNode, Coordinate) pairs instead of an array of Nodes where each is either living or dead, we are simply changing the Coordinates rather than increasing an array's size. Thus, the space required for this is proportional to the number of LivingNodes.
This solution doesn't work for states where the number of living nodes is constantly increasing, but it works very well for gliders.
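The sparse idea above, sketched in Python by storing only the coordinates of live cells in a set, so the grid has no edges. This is the simple approach from this answer, not Hashlife; the `step` name is illustrative.

```python
from collections import Counter


def step(live):
    """One generation of Conway's Life on an unbounded grid.

    Illustrative sketch: `live` is a set of (x, y) coordinates of live cells.
    Only cells next to a live cell can change, so memory and work scale
    with the number of live cells, not with the size of the "board".
    """
    counts = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}
```

A glider simply keeps producing new coordinates further out; nothing needs to be resized.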
EDIT: So that was off the top of my head. It turns out Wikipedia has an article that shows a much better thought-out solution. Oh well! :) Enjoy.
Wikipedia explains it.
The basic idea is that Conway's Game of Life exhibits locality, since information travels at a slow speed compared to the pattern size and the maximum density of filled cells is around 1/2 of the cells in any region. (More will kill off cells due to overcrowding.)
Since there is locality, you can separate the field in different sections and simulate each section independently. If you choose your locality well, you will often see the same patterns. You can simulate how those evolve and store the results in a lookup table, so that other instances of the same pattern do not need to be simulated more than once. Combining adjacent patterns into larger 'metapatterns' allows you to precalculate those as well, and so on.

Resources