Circular graph that plots weekdays and events? - time

I'm trying to find/create a graph that allows me to plot the days of the week as circles within circles. So the innermost ring would be Sunday, the next outer ring would be Monday, etc. And in a clockwise fashion, it would be the 24 hours of the day. The goal is to see recurring patterns over the course of a day's time (the clockwise portion) as well as recurring activities (the circles part).
The goal being to analyze data so that I could say something like "every Monday at 9AM our power output spikes; maybe we should check into that anomaly and see why it happens".
It doesn't necessarily have to be a circular graph like I described, I just believe that would be the best way to visualize what I'm looking for, which is the ability to find patterns over the course of time.
I haven't really found any good graphs that meet this need. But I'm not really sure what this type of graph would be called, either. If anyone has any thoughts they'd be greatly appreciated!

A "radar chart" would probably meet your needs: http://en.wikipedia.org/wiki/Radar_chart
Basically the angle is the hour, and the radius is whatever magnitude you're plotting. I've used these to plot cyclical data in the past with some success.
examples: http://images.google.com/search?q=radar+chart&hl=en&safe=off&prmd=ivns&tbm=isch
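As a rough illustration of the "angle is the hour, radius is the magnitude" idea, here is a minimal Python sketch that averages readings per (weekday, hour) and turns one weekday into (angle, radius) pairs. The function names and the sample readings are made up for the example; the resulting pairs could be handed to any plotting library that supports polar axes, one series (or one ring offset) per weekday.

```python
import math
from collections import defaultdict
from datetime import datetime

def hour_to_angle(hour):
    """Map an hour of day to an angle in radians, one full turn per 24 hours."""
    return 2 * math.pi * (hour % 24) / 24

def mean_by_weekday_hour(readings):
    """Average (timestamp, value) readings into {(weekday, hour): mean}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in readings:
        key = (ts.weekday(), ts.hour)  # Monday is 0 in Python's datetime
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def polar_series(means, weekday):
    """Return (angle, radius) pairs for one weekday, ordered by hour."""
    return [(hour_to_angle(h), means[(d, h)])
            for (d, h) in sorted(means) if d == weekday]

# Made-up sample data: two Mondays at 9 AM and one Monday at 3 PM.
readings = [
    (datetime(2024, 1, 1, 9), 10.0),
    (datetime(2024, 1, 8, 9), 14.0),
    (datetime(2024, 1, 1, 15), 4.0),
]
means = mean_by_weekday_hour(readings)
monday = polar_series(means, 0)
```

Averaging per (weekday, hour) bucket is what makes a recurring "Monday at 9AM" spike stand out from one-off events.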

Related

Google Maps Timeline - How does the segmentation algorithm work?

Google Timeline shows a very nice segmentation of my location history. It clearly identifies periods of time (i.e. segments) in which I stayed in the same location, and periods of time in which I moved from one location to another - ignoring the jitter that happens from GPS inaccuracy and small movements.
Does anybody know the algorithm that Google uses for the segmentation? Can you suggest an algorithm that could do it, preferably with a link to an academic paper? We had some ideas of our own, but I would like to hear better suggestions that would consider things like GPS inaccuracy, slow movement, jitter, etc.
Notice that the algorithm is not a simple clustering algorithm, because it considers the order of the points - a sequence of points nearby is considered as staying in the same location, and the points between such sequences are considered as a movement from one place to another (I suppose the time gaps between points also have some effect).
Thanks!
You probably only need a simple filter and threshold approach.
Filter the data. Take the average position of the last 10 minutes.
Threshold: if the position changed by more than e.g. 50 meters, consider the user to be moving.
Filter again: Remove any too-short stationary or moving interval.
O(n) in complexity, as good as it gets.
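The three steps could be sketched in Python roughly as follows. This is only one reading of the answer, under several assumptions: points are (seconds, x, y) tuples already projected to metres, "position changed" is taken to mean the shift of the smoothed position over one window, and the 5-minute minimum interval and all function names are invented for the example.

```python
import math
from collections import deque

def smooth(points, window_s=600):
    """Trailing moving average of position over the last window_s seconds."""
    out, win = [], deque()
    sx = sy = 0.0
    for t, x, y in points:
        win.append((t, x, y))
        sx += x
        sy += y
        while win[0][0] < t - window_s:  # drop samples older than the window
            _, ox, oy = win.popleft()
            sx -= ox
            sy -= oy
        out.append((t, sx / len(win), sy / len(win)))
    return out

def label_moving(smoothed, window_s=600, threshold_m=50.0):
    """True where the smoothed position shifted more than threshold_m in one window."""
    labels, j = [], 0
    for t, x, y in smoothed:
        while smoothed[j][0] < t - window_s:
            j += 1
        _, px, py = smoothed[j]
        labels.append(math.hypot(x - px, y - py) > threshold_m)
    return labels

def segments(points, labels, min_duration_s=300):
    """Group equal labels into [start, end, moving] runs and absorb short runs."""
    runs = []
    for (t, _, _), moving in zip(points, labels):
        if runs and runs[-1][2] == moving:
            runs[-1][1] = t
        else:
            runs.append([t, t, moving])
    merged = []
    for start, end, moving in runs:
        if merged and (end - start < min_duration_s or merged[-1][2] == moving):
            merged[-1][1] = end  # absorb a short run, or rejoin what it split
        else:
            merged.append([start, end, moving])
    return merged

# Synthetic track sampled once a minute: 30 min still, 30 min at 2 m/s, 30 min still.
track = []
for k in range(90):
    x = 0.0 if k < 30 else min(k - 30, 30) * 120.0
    track.append((k * 60, x, 0.0))
result = segments(track, label_moving(smooth(track)))
```

Each pass walks the samples once with a sliding window, so the whole thing stays linear in the number of points. Note that the trailing average delays the detected start and end of movement by a few minutes.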

Techniques to evaluate the "twistiness" of a road in Google Maps?

As per the title. I want to, given a Google maps URL, generate a twistiness rating based on how windy the roads are. Are there any techniques available I can look into?
What do I mean by twistiness? Well I'm not sure exactly. I suppose it's characterized by a high turn-to-distance ratio, as well as a high angle-change-per-turn number. I'd also say that the elevation change of a road comes into it as well.
I think that once you know exactly what you want to measure, the implementation is quite straightforward.
I can think of several measurements:
the ratio of the road length to the distance between start and end (this would make a long single curve "twisty", so it is most likely not the complete answer)
the number of inflection points per unit length (this would make an almost straight road with a lot of little swaying "twisty", so it is most likely not the complete answer)
These two could be combined by multiplication, so that you would have:
road-length * inflection-points
--------------------------------------
start-end-distance * road-length
You can see that this can be shortened to "inflection-points per start-end-distance", which does seem like a good indicator for "twistiness" to me.
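A minimal 2D Python sketch of "inflection points per start-end distance", assuming the road is available as a list of (x, y) vertices in some planar unit. Here an inflection is counted whenever the turn direction flips between left and right; that definition and the function names are assumptions for the example.

```python
import math

def turn_signs(points):
    """Sign of the turn at each interior vertex: +1 left, -1 right, 0 straight."""
    signs = []
    for (ax, ay), (bx, by), (cx, cy) in zip(points, points[1:], points[2:]):
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        signs.append((cross > 0) - (cross < 0))
    return signs

def inflection_count(points):
    """Count changes of turn direction, ignoring straight vertices."""
    turns = [s for s in turn_signs(points) if s != 0]
    return sum(1 for a, b in zip(turns, turns[1:]) if a != b)

def twistiness(points):
    """Inflection points per unit of straight-line start-to-end distance."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    chord = math.hypot(x1 - x0, y1 - y0)
    if chord == 0:
        raise ValueError("start and end coincide; the ratio is undefined")
    return inflection_count(points) / chord

zigzag = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]
straight = [(0, 0), (1, 0), (2, 0)]
```

With this definition a long single curve scores zero, because its turn direction never flips, which matches the concern raised about the length ratio on its own.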
As for taking elevation into account, I think that making the whole calculation in three dimensions is enough for a first attempt.
You might want to handle left-right inflections separately from up-down inflections, though, in order to make it possible to scale the elevation inflections by some factor.
Try http://www.hardingconsultants.co.nz/transportationconference2007/images/Presentations/Technical%20Conference/L1%20Megan%20Fowler%20Canterbury%20University.pdf as a starting point.
I'd assume that you'd have to somehow capture the road centreline from Google Maps as a vectorised dataset & analyse using GIS software to do what you describe. Maybe do a screen grab then a raster-to-vector conversion to start with.
Cumulative turn angle per Km is a commonly-used measure in road assessment. Vertex density is also useful. Note that these measures depend upon an assumption that vertices have been placed at some form of equal density along the line length whilst they were captured, rather than being manually placed. Running a GIS tool such as a "bendsimplify" algorithm on the line should solve this. I have written scripts in Python for ArcGIS 10 to define these measures if anyone wants them.
Sinuosity is sometimes used for measuring bends in rivers - see the help pages for Hawths Tools for ArcGIS for a good description. It could be misleading for roads that have major changes in course along their length though.
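For reference, the two measures mentioned here can be sketched in plain Python. This is not the ArcGIS script referred to above, just an illustration that assumes the centreline is a list of (x, y) vertices in metres, and that takes sinuosity as path length divided by straight-line distance.

```python
import math

def path_length(points):
    """Total length of the polyline, in the same unit as the coordinates."""
    return sum(math.hypot(bx - ax, by - ay)
               for (ax, ay), (bx, by) in zip(points, points[1:]))

def cumulative_turn_per_km(points):
    """Sum of absolute heading changes in degrees, per kilometre of road."""
    total_turn = 0.0
    prev_heading = None
    for (ax, ay), (bx, by) in zip(points, points[1:]):
        if (ax, ay) == (bx, by):
            continue  # skip duplicate vertices, they have no heading
        heading = math.atan2(by - ay, bx - ax)
        if prev_heading is not None:
            delta = heading - prev_heading
            delta = (delta + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
            total_turn += abs(delta)
        prev_heading = heading
    length_m = path_length(points)
    if length_m == 0:
        raise ValueError("polyline has zero length")
    return math.degrees(total_turn) / (length_m / 1000.0)

def sinuosity(points):
    """Path length divided by straight-line distance between the end points."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    chord = math.hypot(x1 - x0, y1 - y0)
    if chord == 0:
        raise ValueError("start and end coincide; sinuosity is undefined")
    return path_length(points) / chord

# A 1 km L-shaped road with a single right-angle bend.
l_shape = [(0, 0), (500, 0), (500, 500)]
```

Both numbers depend on how densely the vertices were captured, which is the caveat about vertex placement made above.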

Classical task-scheduling assignment

I am working on a flight scheduling app (disclaimer: it's for a college project, so no code answers, please). Please read this question w/ a quantum of attention before answering as it has a lot of peculiarities :(
First, some terminology issues:
You have planes and flights, and you have to pair them up. For simplicity's sake, we'll assume that a plane is free as soon as the flight using it prior lands.
Flights are seen as tasks:
They have a duration
They have dependencies
They have an expected date/time for beginning
Planes can be seen as resources to be used by tasks (or flights, in our terminology).
Flights have a specific type of plane needed. e.g. flight 200 needs a plane of type B.
Planes obviously are of one and only one specific type, e.g., Plane Airforce One is of type C.
A "project" is the set of all the flights by an airline in a given time period.
The functionality required is:
Finding the shortest possible duration for a said project
The earliest and latest possible start for a task (flight)
The critical tasks, with basis on provided data, complete with identifiers of preceding tasks.
Automatically pair up flights and planes, so as to get all flights paired up with a plane. (Note: the duration of flights is fixed)
Get a Gantt diagram with the project's scheduling, in which all flights begin as early as possible, showing all previously referred data graphically (dependencies, time info, etc.)
So the question is: How in the world do I achieve this? Particularly:
We are required to use a graph. What do the graph's edges and nodes respectively symbolise?
Are we required to discard tasks to achieve the critical tasks set?
If you could also recommend some algorithms for us to look up, that'd be great.
Here are some suggestions.
In principle you can have a graph where every node is a flight and there is an edge from flight A to flight B if B depends on A, i.e. B can't take off before A has landed. You can use this dependency graph to calculate the shortest possible duration for the project --- find the path through the graph that has maximum duration when you add the durations of all the flights on the path together. This is the "critical path" of your project.
However, the fact that you need to pair with planes makes it more difficult, esp. as I guess it is assumed that the planes are not allowed to fly without passengers, i.e. a plane must take off from the same city where it landed last.
If you have an excess of planes, allocating them to the flights can most likely be done easily with a combinatorial optimization algorithm like simulated annealing. If the plan is very tight, i.e. you don't have excess planes, it could be a hard problem.
To set the actual take-off times for your flights, you can for example formulate the allowed schedules as a linear programming problem, or as a semi-definite / quadratic programming problem.
Here are some references:
http://en.wikipedia.org/wiki/Simulated_annealing
http://en.wikipedia.org/wiki/Linear_programming
http://en.wikipedia.org/wiki/Quadratic_programming
http://en.wikipedia.org/wiki/Gradient_descent
http://en.wikipedia.org/wiki/Critical_path_method
Start with drawing out a domain model (class diagram) and make a clear separation in your mind between:
planning-immutable facts: PlaneType, Plane, Flight, FlightBeforeFlightConstraint, ...
planning variables: PlaneToFlightAssignment
Wrap all those instances in that Project class (= a Solution).
Then define a score function (AKA fitness function) on such a Solution. For example, if there are 2 PlaneToFlightAssignments which are not ok with a FlightBeforeFlightConstraint (= flight dependency), then lower the score.
Then it's just a matter of finding the Solution with the best score, by changing the PlaneToFlightAssignment instances. There are several algorithms you can use to find that best solution. If your data set is really really small (say 10 planes), you might be able to use brute force.

Algorithm for finding the best routes for food distribution in game

I'm designing a city building game and got into a problem.
Imagine Sierra's Caesar III game mechanics: you have many city districts with one market each. There are several granaries at a distance, connected by a directed weighted graph. The difference: people (here, cars) are units that form traffic jams (this is where the graph weights come in).
Note: in the Caesar game series, people harvested food and stockpiled it in several big granaries, whereas many markets (small shops) took food from the granaries and delivered it to the citizens.
The task: tell each district where they should be getting their food from while taking the least time and minimizing congestion on the city's roads.
Map example
Suppose that yellow districts need 7, 7 and 4 apples respectively.
Bluish granaries have 7 and 11 apples respectively.
Suppose edge weights to be proportional to their length. Then, the solution should be something like the gray numbers indicated on the edges. E.g., the first district gets 4 apples from the 1st and 3 apples from the 2nd granary, while the last district gets 4 apples from only the 2nd granary.
Here, vertical roads are first occupied to the max, and then the remaining workers are sent to the diagonal paths.
Question
What practical and very fast algorithm should I use? I was looking at some papers (Congestion Games: Optimization in Competition etc.) describing congestion games, but could not get the big picture.
You want to look into the Max-flow problem. Seems like in this case it is a bipartite graph, which should make things easier to visualize.
This is a Multi-source Multi-sink Maximum Flow Problem which can easily be converted into a simple Maximum Flow Problem by creating a super source and a super sink as described in the link. There are many efficient solutions to Maximum Flow Problems.
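To make the super source / super sink construction concrete, here is a small Python sketch using a plain Edmonds-Karp maximum flow. The node names, the road capacities and which granary connects to which district are assumptions, since the map itself is not reproduced here; only the stock (7 and 11) and the demands (7, 7 and 4) come from the question.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly push flow along a shortest augmenting path."""
    # Residual graph, with zero-capacity reverse edges added.
    residual = {u: dict(vs) for u, vs in capacity.items()}
    for u, vs in capacity.items():
        for v in vs:
            residual.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:  # breadth-first search
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no augmenting path left
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck
    flows = {(u, v): cap - residual[u][v]
             for u, vs in capacity.items() for v, cap in vs.items()
             if cap - residual[u][v] > 0}
    return total, flows

# Super source "S" feeds each granary its stock; each district drains its
# demand into the super sink "T". Road capacities are placeholders.
capacity = {
    "S": {"G1": 7, "G2": 11},
    "G1": {"D1": 7, "D2": 7},
    "G2": {"D1": 11, "D2": 11, "D3": 11},
    "D1": {"T": 7}, "D2": {"T": 7}, "D3": {"T": 4},
}
total, flows = max_flow(capacity, "S", "T")
```

Plain maximum flow only answers whether every district can be fed and by which routes; it ignores the edge lengths, so preferring short or uncongested roads would need a cost-aware variant, which is not shown here.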
One thing you could do, which would address the incremental update problem discussed in another answer and which might also be cheaper to compute, is forget about a globally optimal solution. Let each villager participate in something like ant colony optimization.
Consider preventing the people on the bottom-right-hand yellow node in your example from squeezing out those on the far-right-hand yellow node by allowing the people at the far-right-hand yellow node to bid up the "price" of buying resources from the right-hand blue node, which would encourage some of those from the bottom-right-hand yellow node to take the slightly longer walk to the left-hand blue node.
I agree with Larry and mathmike, it certainly seems like this problem is a specialization of network flow.
On another note, the problem may get easier if your final algorithm finds a spanning tree for each market to its resources (granaries), consumes those resources greedily based on shortest path first, then moves onto the next resource pile.
It may help to think about it in terms of using a road to max capacity first (maximizing road efficiency), rather than trying to minimize congestion.
This goes to the root of the problem - in general, it's easier to find close to optimal solutions in graph problems and in terms of game dev, close to optimal is probably good enough.
Edit: Wanted to also point out that mathmike's link to Wikipedia also talks about Maximum Flow Problem with Vertex Capacities where each of your granaries can be thought of as vertices with finite capacity.
Something you have to note is that your game is continuous. If you have a solution X at time t, and some small change occurs (e.g. the player builds another road, or one of the cities gains more population), the solution that the Max Flow algorithms give you may change drastically, but you'd probably want the solution at t+1 to be similar to X. A totally different solution at each time step is unrealistic (1 new road is built at the southern end of the map, and all routes are automatically re-calculated).
I would use some algorithm to calculate the initial solution (or when a major change happens, like an earthquake destroying 25% of the roads), but most of the time only update it incrementally: meaning, define some form of valid transformation on a solution (e.g. 1 city tries to get 1 food unit from a different granary than it does now) - you try the update (simulate the expected congestion), and keep the updated solution if it's better than the existing solution. Run this step N times after each game turn or some unit of time.
It's both efficient computationally (you don't need to run full Max Flow every second) and will get you more realistic, smooth changes in behavior.
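A minimal Python sketch of that incremental step, assuming a solution is a dict mapping (granary, district) routes to units of food. The congestion model (route length times units squared), the route lengths and the function names are placeholders invented for the example; a real game would plug in its own congestion simulation.

```python
import random

def cost(assign, length):
    """Placeholder congestion model: each route costs length * units**2."""
    return sum(length[route] * units ** 2 for route, units in assign.items())

def improve(assign, length, stock, steps, rng):
    """Try `steps` single-unit moves; keep a move only if it lowers the cost."""
    assign = dict(assign)
    routes = list(length)
    for _ in range(steps):
        src, dst = rng.sample(routes, 2)
        # Only move a unit between two routes that serve the same district.
        if src[1] != dst[1] or assign.get(src, 0) == 0:
            continue
        used = sum(u for (g, _), u in assign.items() if g == dst[0])
        if used >= stock[dst[0]]:
            continue  # the target granary has nothing left to give
        candidate = dict(assign)
        candidate[src] -= 1
        candidate[dst] = candidate.get(dst, 0) + 1
        if cost(candidate, length) < cost(assign, length):
            assign = candidate
    return assign

# Routes are (granary, district) pairs; lengths are made up.
length = {("G1", "D1"): 1, ("G2", "D1"): 2, ("G1", "D2"): 2, ("G2", "D2"): 1}
stock = {"G1": 7, "G2": 11}
before = {("G1", "D1"): 7, ("G2", "D2"): 7}
after = improve(before, length, stock, steps=200, rng=random.Random(0))
```

Because every accepted move changes a single unit, consecutive solutions stay close to each other, which is the smooth behaviour argued for above.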
It might be more fun to have a dynamic that models a behavior resulting in a good reasonable solution, rather than finding an ideal solution to drive the behavior. Suppose you plan each trip individually. If you're a driver and you need to get from point A to point B, how would you get there? You might consider a few things:
I know about typical traffic conditions at this hour and I'll try to find ways around roads that are usually busy. You might model this as an averaged traffic value at different times, as the motorists don't necessarily have perfect information about the current traffic, but may learn and identify trends over time.
I don't like long, confusing routes with a lot of turns. When planning a trip, you might penalize those with many edges.
If speed limits and traffic lights are included in your model, I'd want to avoid long stretches with low speed limits and/or a lot of traffic lights. I'd prefer freeways or highways for longer trips, even if they have more traffic.
There may be other interesting dynamics that evolve from considering the problem behaviorally rather than as a pure optimization. In real life, traffic rarely converges on optimal solutions, so a big part of the challenge in transportation engineering is coming up with incentives, penalties and designs that encourage a better solution from the natural dynamics playing out in the drivers' decisions.

Building an activity chart combining two kinds of data

I'm building, for example, an application that monitors your health. Each day, you're jogging and doing push-ups and you enter the information on a web site.
What I would like to do is build a chart combining the hours you jogged and the number of push-ups/sit-ups you did. Let's say on the first day you jogged 1 hour and did 10 push-ups, and on the second day you jogged 50 minutes and did 20 push-ups; you would see a progression in your training.
I know it may sound strange, but I want an overall view of your health, not different views for jogging and push-ups. I don't want a double y-axis chart because if I have, for example, 6 runners, I will end up with 12 lines on the chart.
First I would redefine your terms. You are not tracking "health" here, you are tracking level of exertion through exercise.
Max Exertion != Max Health. If you exert yourself to the max and don't eat or drink, you will actually damage your health. :-)
To combine and plot your total "level of exertion" for multiple exercises you need to convert them to a common unit ... something like "calories burned".
I'm pretty sure there are many sources for reference tables with rough conversion factors for how many calories various exercises burn.
Does that help any?
Then you need a model of how push-ups and jogging affect yourself, and for this you should be asking a doctor or fitness expert, not a programmer :-). This question should probably be taken elsewhere.
Sounds like a double y-axis chart.
You can just do a regular Excel-type chart with 2 lines, scaled appropriately, one for push-ups, one for jogging time. There are graphics libraries that let you do that in the back-end language of your choice. The x-axis is the date.
You may want to have 2 scaled graphs, one for last week and one for last year (ala Yahoo Finance charts for different intervals).
Show the first set of values as a line graph above the x axis, and the second set below the x axis. If both sets of values increase over time this will show as an "expansion" of the graph; should be easy to recognize if one set is growing but the other is not.
Because the two quantities have no intrinsic relationship, you're stuck with either displaying them independently, such as two curves with two y-axes, or making up a measure that combines them, such as an estimate of calories burned, muscles used, mental anguish from exercising, etc. But it's tricky... taking from your example, I suspect one will never approach the calories burned from a 50 mile run by doing push-ups. Combining these in a meaningful way depends not on mathematics but on approximations and knowledge of the quantities that you start with and are interested in.
One compromise might be a graph with a single y-axis that shows some combined quantity, but where the independent values at each point are also graphically represented, for example, by a line where the local color represents the ratio of miles to pushups, or any of the many variants that display information in the shapes or colors in the plot.
Another option is to do a 3D plot, and then rotate it around and look for trends or whatever interests you.
If you want one overall measure of exercise levels, you could try using total exercise time. Another alternative is to define a points system, whereby users score points for each exercise.
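A points system could be as simple as the following Python sketch. The weights (1 point per minute of jogging, 2 points per push-up) are arbitrary placeholders, not real conversion factors; the two days are the ones from the question.

```python
# Placeholder weights: points per unit of each exercise.
POINTS_PER_UNIT = {"jogging_minutes": 1.0, "push_ups": 2.0}

def daily_score(day, weights=POINTS_PER_UNIT):
    """Combine one day's exercises into a single score using the given weights."""
    return sum(weights[name] * amount for name, amount in day.items())

log = [
    {"jogging_minutes": 60, "push_ups": 10},  # day 1 from the question
    {"jogging_minutes": 50, "push_ups": 20},  # day 2 from the question
]
scores = [daily_score(day) for day in log]
```

Plotting one such score per person per day gives a single line per runner, at the price of having to justify the weights.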
I do think that there is virtue in letting the users see how much of each individual exercise they have done - in this case use a different graph for different exercises rather than using dual y-axes, if the scales are not comparable (e.g. time jogging and number of push-ups). There is a very good article on the problems with dual y-axes by business intelligence guru Stephen Few, here (pdf).
If you want to know more about presenting data well, I can also recommend his book "Now you see it", and the classic "The Visual Display of Quantitative Information" by Edward Tufte.

Resources