Group locations on map based on distance - algorithm

I've been given the following problem. A set of locations (e.g. around 200 soccer clubs) is spread over a map. I want to group the locations based on their distance to each other. The result should be a list of groups (around 10 to 20) so that the distance each soccer club has to drive to visit all other clubs within their group is minimized.
I'm pretty sure an algorithm exists already. I probably only need the "official" name of this problem.
Can anyone please help me?

You're probably looking for Data Clustering Algorithms. Since you have an idea of the number of clusters, a simple algorithm is k-means clustering.
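As a rough illustration, here is a minimal sketch of k-means (Lloyd's algorithm) on 2-D coordinates. The function name `kmeans` and its parameters are made up for this example, and it assumes plain Euclidean distance is good enough. Note that k-means minimises squared distance to each group's centre, which only approximates the driving distance within a group.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels

clubs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres, labels = kmeans(clubs, k=2)
```

The result depends on the random start, so in practice it is common to run it with several seeds and keep the grouping with the smallest total distance.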

If you want to choose the maximum distance d at the outset (and then determine how few groups suffice to guarantee that no team needs to drive more than this distance to get to another team in their own group) then you can formulate the problem as a graph colouring problem: make a vertex for each team, and put an edge between two vertices whenever the distance between them exceeds d. The solution to a graph colouring problem assigns a "colour" (just a label) to each vertex so that (a) no two vertices connected by an edge are assigned the same colour and (b) the number of distinct colours used is minimal. (In other words, edges represent "conflicts", indicating that the two endpoints cannot belong to the same group.) So here, each colour corresponds to a group, which is guaranteed to consist only of teams that are all <= d from each other, and the solution will try to minimise the total number of groups. You might need to rerun with a few different values of d until you get a solution with acceptably few groups.
Note that this is an NP-hard problem, so it might take a long time to find an exact (minimal-group-count) solution. There are many heuristics that are much quicker and still do a decent job, though.
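One of those quick heuristics is greedy colouring. The sketch below builds the conflict graph described above (an edge whenever two teams are more than d apart) and colours the most-constrained vertices first. The name `group_by_max_distance` is made up for this example, Euclidean distance is an assumption, and the number of groups it returns is not guaranteed to be minimal.

```python
import math

def group_by_max_distance(points, d):
    """Greedy colouring of the conflict graph (edge iff distance > d).

    Returns a list of groups, each a list of indices into `points`.
    """
    n = len(points)
    conflicts = [
        {j for j in range(n)
         if j != i and math.dist(points[i], points[j]) > d}
        for i in range(n)
    ]
    colour = {}
    # Largest-degree-first ordering: handle the most-constrained teams first.
    for v in sorted(range(n), key=lambda v: -len(conflicts[v])):
        used = {colour[u] for u in conflicts[v] if u in colour}
        c = 0
        while c in used:  # smallest colour not used by a conflicting team
            c += 1
        colour[v] = c
    groups = {}
    for v, c in colour.items():
        groups.setdefault(c, []).append(v)
    return list(groups.values())

teams = [(0, 0), (1, 0), (10, 0), (11, 0)]
groups = group_by_max_distance(teams, d=2)
```

Every pair inside a returned group is at most d apart, because two teams that conflict can never receive the same colour.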

Related

Merge adjacent vertices of a graph until single vertex left in the fewest steps possible

I have a game system that can be represented as an undirected, unweighted graph where each vertex has one (relevant) property: a color. The goal of the game in terms of the graph representation is to reduce it down to one vertex in the fewest "steps" possible. In each step, the player can change the color of any one vertex, and all adjacent vertices of the same color are merged with it. (Note that in the example below I just happened to show the user only changing one specific vertex the whole game, but the user can pick any vertex in each step.)
What I am after is a way to compute the fewest number of steps necessary to "beat" a given graph per the procedure described above, and also provide the specific moves needed to do so. I'm familiar with the basics of path-finding, BFS, and things of that nature, but I'm having a hard time framing this problem in terms of a "fastest path" solution.
I am unable to find this same problem anywhere on Google, or even a graph-theory term that encapsulates the problem. Does anyone have an idea of at least how to get started approaching this problem? Can anyone point me in the right direction?
EDIT Since this problem seems to be really difficult to solve efficiently, perhaps I could change the aim of my question. Could someone describe how I would even set up a brute-force, breadth-first search for this? (Brute force could possibly be okay, since in practice these graphs will only be 20 vertices at most.) I know how to write a BFS for a normal linked graph data structure... but in this case it seems quite weird since each vertex would have to contain a whole graph within itself, and the next vertices in the search graph would have to be generated based on possible moves to make in the graph within the vertex. How would one set up the data structure and search algorithm to accomplish this?
EDIT 2 This is an old question, but I figured it might help to just state outright what the game was. The game was essentially to be a rip-off of Kami 2 for iOS, except my custom puzzle editor would automatically figure out the quickest possible way to solve your puzzle, instead of having to find the shortest move number by trial and error yourself. I'm not sure if Kami was a completely original game concept, or if there is a whole class of games like it with the same "flood-fill" mechanic that I'm unaware of. If this is a common type of game, perhaps knowing the name of it could allow finding more literature on the algorithm I'm seeking.
EDIT 3 This Stack Overflow question seems like it may have some relevant insights.
Intuitively, the solution seems global. If you take a larger graph, for example, which dot you select first will have an impact on the direct neighbours which will have an impact on their neighbours and so on.
It sounds as if it were of the same breed of problems as the map colouring problem. Not because of the colours but because of the implications of a local selection to the other end of the graph down the road. In the map colouring, you have to decide what colour to draw a country and its neighbouring countries so two countries that touch don't have the same colour. That first set of selections have an impact on whether there is a solution in the subsequent iterations.
Just to show how complex the problem is.
Let's look at a simpler problem where the graph is replaced with a tree and only the root vertex can change colour. In that case the path to a leaf can be represented as the sequence of colours of the vertices on that path. A sequence A of colour changes collapses a leaf if the leaf's sequence is a subsequence of A.
The problem can then be stated as: for a given set of sequences, find a minimal-length sequence S such that each initial sequence is a subsequence of S. This is called the shortest common supersequence problem, and it is NP-complete.
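To make the reduction concrete, here is a small sketch of the check involved: whether a given sequence of root colour changes contains every leaf's colour sequence as a subsequence. The names `is_subsequence` and `collapses_all` are made up for this example, and colours are written as single letters purely for convenience. It only verifies a candidate sequence; finding the shortest one is the hard part.

```python
def is_subsequence(needle, haystack):
    """True if `needle` can be obtained from `haystack` by deleting items."""
    remaining = iter(haystack)
    # `colour in remaining` consumes the iterator up to the first match,
    # so the matches are forced to appear in order.
    return all(colour in remaining for colour in needle)

def collapses_all(leaf_sequences, moves):
    """True if the move sequence is a common supersequence of all leaves."""
    return all(is_subsequence(seq, moves) for seq in leaf_sequences)

leaves = ["RGB", "RB", "GB"]
ok = collapses_all(leaves, "RGB")
not_ok = collapses_all(leaves, "RBG")
```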
Your problem is for sure more complex than this one :-/
Edit *
This is a comment on the question's edit. Check this page for the terms.
The minimal possible number of moves is >= the graph radius. With that in mind, a good strategy seems to be to:
use central vertices for moves,
use moves that reduce graph radius, or at least reduce distance from central vertices to 'large' set of vertices.
I would go with a strategy that keeps track of the central vertices and the distances of all graph vertices to these central vertices. Each step checks all meaningful moves and chooses the one that reduces the radius, or the distance to the central vertices, the most. I think BFS can be used to calculate the distances and how a move influences them. There are tricky parts, like when the central vertices change after a move. Maybe it is a good idea to use not only central vertices but also vertices close to the centre.
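The distance bookkeeping for that strategy can be sketched with one BFS per vertex, as below. This only computes the radius and the central vertices of a connected, unweighted graph; it does not choose moves. The names `eccentricity` and `radius_and_centres` and the adjacency-dict representation are assumptions made for this example.

```python
from collections import deque

def eccentricity(adj, source):
    """Largest BFS distance from `source` (graph assumed connected)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return max(dist.values())

def radius_and_centres(adj):
    """Return (radius, list of central vertices) of the graph."""
    ecc = {v: eccentricity(adj, v) for v in adj}
    radius = min(ecc.values())
    return radius, [v for v in adj if ecc[v] == radius]

# A path 0 - 1 - 2 - 3 - 4: the middle vertex is the only centre.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
radius, centres = radius_and_centres(path)
```

For graphs of at most 20 vertices, recomputing this from scratch after every candidate move is cheap.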
I think the graph term you are looking for is the "valence" (degree) of a vertex, which is the number of edges that the vertex is connected to. It looks like you want to change the color based on which node has the highest valence. Then in the resulting graph change the color of the node that has the highest valence, etc., until you have just one node left.
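For the brute-force search asked about in the question's first edit, one possible setup is to avoid storing a graph inside each search node at all. In the sketch below a search state is just a tuple with one colour per original vertex, and a merged vertex is implicitly a connected same-coloured region. This is a sketch under assumptions that are not in the original question: adjacent same-coloured vertices count as already merged, moves only use colours present in the starting graph, and the graph is connected. The names `region` and `solve` are made up.

```python
from collections import deque

def region(adj, state, v):
    """All vertices connected to v through vertices of v's colour."""
    seen = {v}
    stack = [v]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen and state[w] == state[v]:
                seen.add(w)
                stack.append(w)
    return seen

def solve(adj, colours):
    """BFS over colourings; returns a shortest list of (vertex, colour) moves."""
    start = tuple(colours)
    palette = set(start)
    parent = {start: None}  # state -> (previous state, move), None at start
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if len(set(state)) == 1:  # one colour left: everything has merged
            moves = []
            while parent[state] is not None:
                state, move = parent[state]
                moves.append(move)
            return moves[::-1]
        handled = set()
        for v in range(len(state)):
            if v in handled:
                continue  # this region was already tried via another vertex
            reg = region(adj, state, v)
            handled |= reg
            for c in palette - {state[v]}:
                nxt = list(state)
                for u in reg:
                    nxt[u] = c  # recolouring merges with same-coloured neighbours
                nxt = tuple(nxt)
                if nxt not in parent:
                    parent[nxt] = (state, (v, c))
                    queue.append(nxt)
    return None

# Path 0 - 1 - 2 - 3 coloured R, G, B, R.
adj = [[1], [0, 2], [1, 3], [2]]
moves = solve(adj, ("R", "G", "B", "R"))
```

Because BFS explores states in order of move count, the first all-one-colour state it reaches gives a shortest solution within that move set. The number of states can grow as (colours)^(vertices), so this is only realistic for small puzzles.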

Given a pile of items, split them into meaningful groups by comparing them

Given that I have a 'pile' of items that need to be split into groups, and given that I can express how much these items differ, relative to each other, in a number, a score if you will, how would I separate this input into meaningful groups?
I recognise that this is a bit of an abstract question, so to try and make it clearer here is what I have tried so far:
I have tried representing the input as a weighted graph in which every vertex is connected to every other vertex, with the 'strength' of the edge being their relative score. Then I'd take the longest edge of the graph, and separate every other vertex by 'closeness' to the vertices at the end of that longest edge. This works reasonably well, but has the disadvantage of always yielding two groups for a result, which might not necessarily be logical.
For example: say I can express the differentness of fruits in a number. Then given a pile of apples, the different brands of apples would form different categories, like Elstar, Jonagold, what have you... But when I have a pile consisting of apples, pears, and oranges, then the apples would be relatively similar and should fall into the same category.
I'm guessing I'd have to remove every edge of the graph longer than the mean plus the standard deviation or something like that, and then see how many disjoint subgraphs appear, but I'd like to hear the approach of someone with more mathematical knowledge than me.
This is a bit long for a comment.
What you are referring to is clustering. You seem to have a "distance" matrix between two items, although this is probably some inverse of the "strength" metric. A distance metric is non-negative and 0 when two things are equal. The larger the value the further apart the items.
When you have a generic "distance" matrix, a typical clustering method is hierarchical/agglomerative clustering ("distance" is in quotes because it might not meet all the formal qualities of a distance). A good place to start in understanding this technique is the Wikipedia page. The ideas behind hierarchical clustering can be applied to non-fully connected graphs.
I would expect almost every statistics package to include some form of hierarchical clustering.
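To show the idea without a statistics package, here is a minimal sketch of agglomerative clustering with single linkage on a precomputed "distance" matrix. It keeps merging the two closest clusters until the closest pair is further apart than a threshold, so the number of groups is not fixed in advance. The name `single_linkage`, the threshold stopping rule and the toy matrix are assumptions for this example, and the sketch favours readability over speed.

```python
def single_linkage(dist, threshold):
    """Agglomerative clustering on a symmetric distance matrix.

    Returns a list of clusters, each a list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def gap(a, b):
        # Single linkage: distance between the closest members.
        return min(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best_gap, x, y = min(
            (gap(a, b), x, y)
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if x < y
        )
        if best_gap > threshold:
            break  # the remaining clusters are all too different to merge
        merged = clusters.pop(y)  # y > x, so index x is unaffected
        clusters[x].extend(merged)
    return clusters

# Items 0 and 1 are similar, 2 and 3 are similar, the two pairs are not.
scores = [
    [0, 1, 10, 10],
    [1, 0, 10, 10],
    [10, 10, 0, 1],
    [10, 10, 1, 0],
]
piles = single_linkage(scores, threshold=2)
```

Choosing the threshold is the judgment call here: a low value separates the apple brands, a higher one lumps all apples together.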

How can I sort this "matrix" of data?

I run a sporting website, and it's often useful to rank people against each other based on their previous meetings.
See this example set of data:
On the left is the raw "unsorted" view. On the right is the correctly sorted (in my opinion) list.
Each square shows the number of times they've competed against each other, and the percentage of victories. They're shaded based on the percentage.
I have this in a webpage, with "up" and "down" controls alongside each row, and I can manually nudge them around until I get what I want.
I'm just not quite sure of the best way to automate this.
The numbers at the end of each row are a quick idea I roughed out, and equal the sum across the row of ((percentage - 50) * column number). As you can see, they do a fairly good job, with only the first two rows being "wrong". They don't give any weight to the number of meetings though, only the win percentage.
The final-column numbers change depending on the row order as well, as can be seen by comparing the left and right tables in the image, so sorting on the initial values wouldn't work that well. Looping a sort + re-calc a couple of times might do the job.
I expect I can cobble something together to make this work... but I feel SO will have some much better ideas, and I'm all ears.
A tournament is a graph in which there is exactly one directed edge between every pair of vertices; to create such a graph from your input, you could make a graph with a vertex for each player, and then "point" each edge between two players in the direction of the player who won a majority of games (you will need to break ties somehow).
If you do this, and if this graph contains no cycles A > B > ... > A of the type I mentioned in my comment, then the graph is transitive, and you can order the players using a topological sort of the graph. This takes just linear time in the number of edges, i.e. O(n^2) for n players.
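A minimal sketch of the cycle-free case, using `graphlib.TopologicalSorter` from the Python standard library (3.9+). The input format, a nested dict where `wins[a][b]` is the number of games a won against b, and the function name `rank_players` are assumptions for this example. Pairs with a tied record get no edge here, which leaves their relative order arbitrary instead of breaking the tie.

```python
from graphlib import CycleError, TopologicalSorter

def rank_players(wins):
    """Order players best-first, or return None if results form a cycle."""
    players = list(wins)
    # beaten_by[p] holds everyone with a winning record against p;
    # those players must be ranked before p.
    beaten_by = {p: set() for p in players}
    for a in players:
        for b in players:
            if a != b and wins[a].get(b, 0) > wins[b].get(a, 0):
                beaten_by[b].add(a)
    try:
        return list(TopologicalSorter(beaten_by).static_order())
    except CycleError:
        return None  # no perfect ordering exists

results = {
    "A": {"B": 3, "C": 2},
    "B": {"A": 1, "C": 4},
    "C": {"A": 0, "B": 1},
}
ranking = rank_players(results)
```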
If there are such cycles, then there's no "perfect" ordering of players: any ordering will place at least one player after some player that they have beaten. In that case, a reasonable alternative is to look for orderings that minimise the number of these "edge violations". This turns out to be a well-studied NP-hard problem in computer science called (Minimum) Feedback Arc Set in Tournaments (FAST). A feedback arc set is a set of directed edges ("arcs") which, if deleted from the graph, would leave a graph with no directed cycles -- which can then be easily turned into an order using topological sort as before.
This paper describes a recent attack on the problem. I haven't read the paper, but this is an active area of research and so the algorithm is probably quite complicated -- but it might give you ideas about how to create a simpler algorithm that runs slower (but acceptably fast on such small instances), or how to create a heuristic.
Just to add to j_random's answer, you can isolate cycles using something like Tarjan's strongly connected components algorithm. Within a strongly connected component, you could use another method for sorting the items.

Marauders dilemma algorithm

I'm making this repost after the earlier one here with more details.
PROBLEM :
The problem consists of a marauder who has to travel to different cities spread over a map. The starting location is known. Each city has a fixed loot associated with it. The marauder travels across varied terrain, by which I mean that the cost of travel differs between each pair of cities. He has to maximize the booty gained.
What we have done:
We have generated an adjacency matrix (with the booty-path cost in place for each node) and then employed a heuristic analysis. It gave some output which is reasonable.
Now the problem is that each city has some vehicles in it, which can be bought (by paying) and used to travel. What a vehicle actually does is reduce the path cost. Once a vehicle is bought, it remains in use until the next vehicle is bought. It is up to the marauder to decide whether to buy a vehicle or not, and how.
I need help at this point. How can we integrate the idea of vehicles into what we already have? Plus, any further ideas which may help us to maximize the profit are welcome. I can post the code, if required. Thanks!
One way to do it would be to have a directed edge bearing the cost of the vehicle towards a duplicate graph with the reduced costs. You can even make it so that the reduction is finer than just a percentage if you want to.
The downside is that this will probably increase the size of the graph a lot (as many copies as you have different vehicles, plus the links between them), and if your heuristic is not optimal, you may have to modify it so that it considers the new edge positively.
It sounds as though beam search would suit this problem. Beam search uses a heuristic function H and a parameter k and works like this:
1. Initialize the set S to the initial game position.
2. Set T to the empty set.
3. For each game position in S, generate all possible successor positions to S after one move by the marauder. (A move being to loot, to purchase a vehicle, to move to an adjacent city, or whatever else a marauder can do.) Add each such successor position to the set T.
4. For each position p in T, evaluate H(p) for a heuristic function H. (The heuristic function can take into account the amount of loot, the possession of a vehicle, the number of remaining unlooted cities, and whatever else you think is relevant and easy to compute.)
5. If you've run out of search time, return the best-scoring position in T.
6. Otherwise, set S to the best-scoring k positions in T and go back to step 2.
The algorithm works well if you store T in the form of a heap with k elements.
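The steps above can be sketched as a small generic function. Everything specific in the example is made up for illustration: the three-city map, the loot and cost numbers, and the position layout (current city, profit so far, visited cities). Vehicles are not modelled, the "search time" is replaced by a fixed step limit, and the sketch returns the best position seen at any depth instead of only the last layer.

```python
import heapq

def beam_search(start, successors, score, k, max_steps):
    """Keep only the k best positions at each depth; return the best seen."""
    beam = [start]
    best = start
    for _ in range(max_steps):
        candidates = [nxt for pos in beam for nxt in successors(pos)]
        if not candidates:
            break  # no moves left anywhere in the beam
        beam = heapq.nlargest(k, candidates, key=score)
        if score(beam[0]) > score(best):
            best = beam[0]
    return best

# Hypothetical toy map: loot per city and symmetric travel costs.
LOOT = {"A": 0, "B": 5, "C": 9}
COST = {frozenset("AB"): 1, frozenset("AC"): 6, frozenset("BC"): 2}

def moves(pos):
    """Successor positions: travel to any city not yet looted."""
    city, profit, visited = pos
    for nxt in LOOT:
        if nxt not in visited:
            gain = LOOT[nxt] - COST[frozenset((city, nxt))]
            yield (nxt, profit + gain, visited | {nxt})

start = ("A", 0, frozenset("A"))
result = beam_search(start, moves, score=lambda p: p[1], k=2, max_steps=5)
```

To include vehicles, the position tuple could carry the currently owned vehicle, and `moves` could yield an extra "buy" successor per vehicle on offer in the current city.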

A special case of grouping coordinates

I'm trying to write a program to place students in cars for carpooling to an event. I have the addresses for each student, and can geocode each address to get coordinates (the addresses are close enough that I can simply use Euclidean distances between coordinates). Some of the students have cars and can drive others. How can I efficiently group students in cars? I know that grouping is usually done using algorithms like k-means, but I can only find algorithms to group N points into M arbitrary-sized groups. My groups are of a specific size and positioning. Where can I start? A simple greedy algorithm will ensure the first cars assigned have minimum pick-up distance, but the average will be high, I imagine.
Say that you are trying to minimize the total distance traveled. Clearly the traveling salesman problem is a special case of your problem, so your problem is NP-hard. That puts us in the heuristics/approximation algorithms domain.
The problem also needs some more specification, for example how many students can fit in a given car. Let's say, as many as you want.
How about you solve it as a minimum spanning tree rooted at the final destination. Then each student with a car is responsible for collecting all of its child nodes. So the total distance traveled is at most 2x the total length of the spanning tree, which is a 2x bound right there. Of course this is ridiculous because the nodes next to the root will be driving a mega bus instead of a car in this case.
So then you start playing the packing game where you try to fill the cars greedily.
I know this is not a solution, but this might help you specify the problem better.
This is an old question, but since I found it, others will as well.
Group students together by distance. Find the distance between all sets of two students. Start with the closest students and add them in a group, and continue adding until all students are in groups. If students are beyond a threshold distance, like 50 miles, don't combine them into a group (this will cause a few students to go solo). If students have different sized cars, stop adding them when the max car size has been reached between the students in the group (and whichever one you're trying to add).
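One way to read that procedure is sketched below: sort all pairs of students by distance, then walk the list and merge the two students' groups whenever the pair is within the threshold and the merged group still fits in a car. This is a simplification under stated assumptions: a single car capacity for everyone, no tracking of which students actually own a car, and Euclidean distance. The name `group_students` is made up.

```python
import math
from itertools import combinations

def group_students(coords, capacity, max_distance):
    """Greedy closest-pair grouping; returns lists of student indices."""
    group_of = {i: [i] for i in range(len(coords))}  # student -> its group
    pairs = sorted(
        combinations(range(len(coords)), 2),
        key=lambda p: math.dist(coords[p[0]], coords[p[1]]),
    )
    for a, b in pairs:
        if math.dist(coords[a], coords[b]) > max_distance:
            break  # every remaining pair is even further apart
        ga, gb = group_of[a], group_of[b]
        if ga is gb or len(ga) + len(gb) > capacity:
            continue  # already together, or the car would be too full
        ga.extend(gb)
        for s in gb:
            group_of[s] = ga
    # Each group list is shared by its members; keep one copy of each.
    return list({id(g): g for g in group_of.values()}.values())

homes = [(0, 0), (0, 1), (1, 0), (50, 50), (50, 51), (200, 200)]
cars = group_students(homes, capacity=2, max_distance=5)
```

With these toy coordinates the third student near the origin and the far-away student end up alone, which is the "go solo" case mentioned above.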
Finding the optimal (you asked for efficient) solution would require a more defined problem, which it seems like you don't have. If you wanted to eliminate individual drivers though, taking the above solution and special casing the outliers, working them individually into groups and swapping people around adjacent groups to fit them in, could find a very strong solution.

Resources