I'm trying to write a program to place students in cars for carpooling to an event. I have the addresses for each student, and can geocode each address to get coordinates (the addresses are close enough that I can simply use Euclidean distances between coordinates). Some of the students have cars and can drive others. How can I efficiently group students in cars? I know that grouping is usually done using algorithms like k-means, but I can only find algorithms to group N points into M arbitrary-sized groups. My groups are of a specific size and positioning. Where can I start? A simple greedy algorithm will ensure the first cars assigned have minimum pick-up distance, but the average will be high, I imagine.
Say that you are trying to minimize the total distance traveled. Clearly the traveling salesman problem is a special case of your problem, so your problem is NP-hard. That puts us in the heuristics/approximation algorithms domain.
The problem also needs some more specification, for example how many students can fit in a given car. Let's say, as many as you want.
How about you solve it as a minimum spanning tree rooted at the final destination? Then each student with a car is responsible for collecting all of its child nodes. So the total distance traveled is at most 2x the total length of the spanning tree, which is a 2x bound right there. Of course this is ridiculous, because the nodes next to the root will be driving a mega bus instead of a car in this case.
So then you start playing the packing game where you try to fill the cars greedily.
I know this is not a solution, but this might help you specify the problem better.
This is an old question, but since I found it, others will as well.
Group students together by distance. Find the distance between all sets of two students. Start with the closest students and add them in a group, and continue adding until all students are in groups. If students are beyond a threshold distance, like 50 miles, don't combine them into a group (this will cause a few students to go solo). If students have different sized cars, stop adding them when the max car size has been reached between the students in the group (and whichever one you're trying to add).
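A rough sketch of the grouping described above, in Python. The function name `group_students`, the `capacity` mapping (seats per car including the driver, 0 for students without a car) and the `max_dist` cutoff are assumptions made for illustration, not part of the original answer. A merge is accepted only if the combined group fits in its largest car.

```python
from itertools import combinations
from math import dist

def group_students(coords, capacity, max_dist=float("inf")):
    """Greedily merge the closest students into groups.

    coords:   {student: (x, y)}
    capacity: {student: seats in that student's car (driver included), 0 if no car}
    """
    group_of = {s: frozenset([s]) for s in coords}
    # Every pair of students, closest pair first.
    pairs = sorted(combinations(coords, 2),
                   key=lambda p: dist(coords[p[0]], coords[p[1]]))
    for a, b in pairs:
        if dist(coords[a], coords[b]) > max_dist:
            break  # all remaining pairs are even farther apart
        ga, gb = group_of[a], group_of[b]
        if ga == gb:
            continue  # already travelling together
        merged = ga | gb
        # Accept the merge only if the biggest car in the group can hold everyone.
        if len(merged) <= max(capacity[s] for s in merged):
            for s in merged:
                group_of[s] = merged
    return set(group_of.values())
```

Note that a student without a car who never gets merged is left alone in this sketch, so the outliers still need the special handling mentioned in the answer.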
Finding the optimal (you asked for efficient) solution would require a more defined problem, which it seems like you don't have. If you wanted to eliminate individual drivers though, taking the above solution and special casing the outliers, working them individually into groups and swapping people around adjacent groups to fit them in, could find a very strong solution.
Related
Given that I have a 'pile' of items that need to be split in groups, and given that I can express how much these items differ, relative to each other, in a number, a score if you will, how would I separate this input into meaningful groups?
I recognise that this is a bit of an abstract question, so to try and make it clearer here is what I have tried so far:
I have tried representing the input as a weighted graph in which every vertex is connected to every other vertex, with the 'strength' of the edge being their relative score. Then I'd take the longest edge of the graph, and separate every other vertex by 'closeness' to the vertices at the end of that longest edge. This works reasonably well, but has the disadvantage of always yielding two groups for a result, which might not necessarily be logical.
For example: say I can express the differentness of fruits in a number. Then given a pile of apples, the different brands of apples would form different categories, like Elstar, Jonagold, what have you... But when I have a pile consisting of apples, pears, and oranges, then the apples would be relatively similar and should fall into the same category.
I'm guessing I'd have to remove every edge of the graph bigger than the mean plus the standard deviation or something like that, and then see how many disjointed subgraphs appear, but I'd like to hear the approach of someone with more mathematical knowledge than me.
This is a bit long for a comment.
What you are referring to is clustering. You seem to have a "distance" matrix between pairs of items, although this is probably some inverse of the "strength" metric. A distance metric is non-negative and 0 when two things are equal. The larger the value, the further apart the items.
When you have a generic "distance" matrix, a typical clustering method is hierarchical/agglomerative clustering ("distance" is in quotes because it might not meet all the formal qualities of a distance). A good place to start in understanding this technique is the Wikipedia page. The ideas behind hierarchical clustering can be applied to non-fully connected graphs.
I would expect almost every statistics package to include some form of hierarchical clustering.
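To make the idea concrete, here is a minimal pure-Python sketch of agglomerative clustering with single linkage, assuming the input is a symmetric matrix of pairwise distances. The function name `agglomerate` and the `threshold` stopping rule are assumptions for illustration; a real statistics package will offer more linkage choices and run much faster.

```python
def agglomerate(dist, threshold):
    """Single-linkage agglomerative clustering on a symmetric distance matrix.

    Repeatedly merges the two closest clusters until the closest pair
    is farther apart than `threshold`. Returns a list of index lists.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Single linkage: distance between the two closest members.
        return min(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        d, x, y = min((linkage(clusters[x], clusters[y]), x, y)
                      for x in range(len(clusters))
                      for y in range(x + 1, len(clusters)))
        if d > threshold:
            break
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return clusters
```

With single linkage, stopping at a threshold gives the same groups as deleting every edge longer than the threshold and taking the connected pieces, so the number of groups falls out of the data rather than being fixed at two.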
I've been given the following problem. A set of locations (e.g. around 200 soccer clubs) is spread over a map. I want to group the locations based on their distance to each other. The result should be a list of groups (around 10 to 20) so that the distance each soccer club has to drive to visit all other clubs within their group is minimized.
I'm pretty sure an algorithm exists already. I probably only need the "official" name of this problem.
Can anyone please help me?
You're probably looking for Data Clustering Algorithms. Since you have an idea of the number of clusters, a simple algorithm is k-means clustering.
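As a rough illustration, here is a bare-bones version of k-means (Lloyd's algorithm) in plain Python. Seeding the centres with the first k points is an assumption made to keep the sketch deterministic; real implementations usually pick better starting centres. Also note that k-means minimises distance to each group's centre, which is only a proxy for the driving distance between clubs.

```python
from math import dist

def k_means(points, k, iterations=100):
    """Lloyd's algorithm; the first k points seed the centres (an assumption)."""
    centres = list(points[:k])
    groups = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centres[c]))
            groups[nearest].append(p)
        # Update step: move each centre to the mean of its group.
        new_centres = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centres[i]
            for i, g in enumerate(groups)
        ]
        if new_centres == centres:
            break  # converged
        centres = new_centres
    return groups
```

Calling `k_means(club_coordinates, 15)` on a list of (x, y) tuples would return 15 lists of coordinates, one per group; `club_coordinates` is a hypothetical name here.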
If you want to choose the maximum distance d at the outset (and then determine how few groups suffice to guarantee that no team needs to drive more than this distance to get to another team in their own group) then you can formulate the problem as a graph colouring problem: make a vertex for each team, and put an edge between two vertices whenever the distance between them exceeds d. The solution to a graph colouring problem assigns a "colour" (just a label) to each vertex so that (a) no two vertices connected by an edge are assigned the same colour and (b) the number of distinct colours used is minimal. (In other words, edges represent "conflicts", indicating that the two endpoints cannot belong to the same group.) So here, each colour corresponds to a group, which is guaranteed to consist only of teams that are all <= d from each other, and the solution will try to minimise the total number of groups. You might need to rerun with a few different values of d until you get a solution with acceptably few groups.
Note that this is an NP-hard problem, so it might take a long time to find an exact (minimal-group-count) solution. There are many heuristics that are much quicker and still do a decent job, though.
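One of the simplest such heuristics is greedy colouring. The sketch below builds the conflict graph for a chosen d and colours the most-conflicted teams first. It is not guaranteed to use the minimum number of groups, and the name `group_by_colouring` and the input format are assumptions for illustration.

```python
from math import dist

def group_by_colouring(coords, d):
    """Greedy colouring of the conflict graph (a heuristic, not minimal).

    coords: {team: (x, y)}. Two teams conflict if they are more than d apart.
    """
    names = list(coords)
    conflicts = {a: {b for b in names
                     if b != a and dist(coords[a], coords[b]) > d}
                 for a in names}
    colour = {}
    # Colour the teams with the most conflicts first.
    for v in sorted(names, key=lambda n: len(conflicts[n]), reverse=True):
        used = {colour[u] for u in conflicts[v] if u in colour}
        # Smallest colour not used by any conflicting team.
        colour[v] = next(c for c in range(len(names)) if c not in used)
    groups = {}
    for v, c in colour.items():
        groups.setdefault(c, []).append(v)
    return list(groups.values())
```

Every group returned contains only teams that are at most d from each other, because two teams farther apart than d can never receive the same colour.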
I run a sporting website, and it's often useful to rank people against each other based on their previous meetings.
See this example set of data:
On the left is the raw "unsorted" view. On the right is the correctly sorted (in my opinion) list.
Each square shows the number of times they've competed against each other, and the percentage of victories. They're shaded based on the percentage.
I have this in a webpage, with "up" and "down" controls alongside each row, and I can manually nudge them around until I get what I want.
I'm just not quite sure of the best way to automate this.
The numbers at the end of each row are a quick idea I roughed out, and equal to the sum across the row of ((percentage - 50) * column number). As you can see, they do a fairly good job, with only the first two rows being "wrong". They don't give any weight to number of meetings though, only the win percentage.
The final-column numbers change depending on the row order as well, as can be seen by comparing the left and right tables in the image, so sorting on the initial values wouldn't work that well. Looping a sort + re-calc a couple of times might do the job.
I expect I can cobble something together to make this work... but I feel SO will have some much better ideas, and I'm all ears.
A tournament is a graph in which there is exactly one directed edge between every pair of vertices; to create such a graph from your input, you could make a graph with a vertex for each player, and then "point" each edge between two players in the direction of the player who won a majority of games (you will need to break ties somehow).
If you do this, and if there are no cycles A > B > ... > A of the type I mentioned in my comment in this graph, then the graph is transitive, and you can order the players using a topological sort of the graph. This takes just linear time in the number of edges, i.e. O(n^2) for n players.
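Here is a small sketch of that approach using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). The input format `wins[(a, b)]` and the function name `rank_players` are assumptions for illustration, and tied pairs are simply left without an edge instead of being broken one way or the other.

```python
from graphlib import TopologicalSorter, CycleError

def rank_players(wins):
    """wins[(a, b)] = number of games a won against b.

    Returns the players best-first, or None if the results contain a cycle.
    """
    players = sorted({p for pair in wins for p in pair})
    # beaten_by[p] = players who won the majority against p. These must
    # come before p, which is the predecessor format TopologicalSorter expects.
    beaten_by = {p: set() for p in players}
    for (a, b), w in wins.items():
        if w > wins.get((b, a), 0):
            beaten_by[b].add(a)
    try:
        return list(TopologicalSorter(beaten_by).static_order())
    except CycleError:
        return None
```

If some pairs are tied or never met, the graph is not a full tournament and several orders may be valid; the sort then returns one of them.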
If there are such cycles, then there's no "perfect" ordering of players: any ordering will place at least one player after some player that they have beaten. In that case, a reasonable alternative is to look for orderings that minimise the number of these "edge violations". This turns out to be a well-studied NP-hard problem in computer science called (Minimum) Feedback Arc Set in Tournaments (FAST). A feedback arc set is a set of directed edges ("arcs") which, if deleted from the graph, would leave a graph with no directed cycles -- which can then be easily turned into an order using topological sort as before.
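For a handful of players, counting edge violations and trying every ordering is feasible. The sketch below is a brute-force illustration only (it is factorial in the number of players, so it is unusable beyond roughly eight or nine); `beats` is an assumed set of (winner, loser) pairs and both function names are made up for the example.

```python
from itertools import permutations

def violations(order, beats):
    """Count pairs where the loser is placed ahead of the winner."""
    position = {p: i for i, p in enumerate(order)}
    return sum(1 for winner, loser in beats
               if position[winner] > position[loser])

def best_order(players, beats):
    """Try every ordering and keep one with the fewest violations."""
    return min(permutations(players), key=lambda o: violations(o, beats))
```

The arcs counted by `violations` for the chosen order form a feedback arc set: deleting them leaves a graph with no directed cycles.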
This paper describes a recent attack on the problem. I haven't read the paper, but this is an active area of research and so the algorithm is probably quite complicated -- but it might give you ideas about how to create a simpler algorithm that runs slower (but acceptably fast on such small instances), or how to create a heuristic.
Just to add to j_random's answer, you can isolate cycles using something like Tarjan's strongly connected components algorithm. Within a strongly connected component, you could use another method for sorting the items.
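A compact recursive sketch of Tarjan's algorithm is shown below, assuming the graph is a dict mapping each player to the players they beat. The function name is an assumption, and the recursion is fine for a few hundred players but would need an explicit stack for very large graphs.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph: {vertex: iterable of successors}."""
    index, low, on_stack, stack, result = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            result.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return result
```

Components come out in reverse topological order, so reversing the result lists the strongest group of players first; any component with more than one player contains a cycle and needs the secondary sorting method.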
If you're not familiar with it, the game consists of a collection of cars of varying sizes, set either horizontally or vertically, on an NxM grid that has a single exit.
Each car can move forward/backward in the directions it's set in, as long as another car is not blocking it. You can never change the direction of a car.
There is one special car, usually the red one. It's set in the same row that the exit is in, and the objective of the game is to find a series of moves (a move being a slide of one car N steps backward or forward) that will allow the red car to drive out of the maze.
I've been trying to think how to generate instances for this problem, generating levels of difficulty based on the minimum number of moves needed to solve the board.
Any idea of an algorithm or a strategy to do that?
Thanks in advance!
The board given in the question has at most 4*4*4*5*5*3*5 = 24,000 possible configurations, given the placement of cars.
A graph with 24,000 nodes is not very large for today's computers. So a possible approach would be to
1. construct the graph of all positions (nodes are positions, edges are moves),
2. find the minimum number of moves to a winning position for every node (e.g. using Dijkstra, or plain breadth-first search, since every move costs the same), and
3. select a node with a large distance from the goal.
One possible approach would be creating it in reverse.
1. Generate a random board that has the red car in the winning position.
2. Build the graph of all reachable positions.
3. Select a position that has the largest distance from every winning position.
The number of reachable positions is not that big (probably always below 100k), so (2) and (3) are feasible.
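The last two steps of the reverse approach can be sketched as below. The board encoding is an assumption made for the example: each car is a tuple `(row, col, length, horizontal)`, the first car is the red one, it is horizontal, and the exit is at the right end of its row. One move slides one car any number of free cells.

```python
from collections import deque

def cells(car):
    r, c, n, h = car
    return {(r, c + i) if h else (r + i, c) for i in range(n)}

def neighbours(state, width, height):
    """Yield every board reachable by sliding one car along its axis."""
    for i, (r, c, n, h) in enumerate(state):
        blocked = set().union(*(cells(o) for j, o in enumerate(state) if j != i))
        for step in (-1, 1):
            k = step
            while True:
                moved = (r, c + k, n, h) if h else (r + k, c, n, h)
                spots = cells(moved)
                inside = all(0 <= y < height and 0 <= x < width for y, x in spots)
                if not inside or spots & blocked:
                    break
                yield state[:i] + (moved,) + state[i + 1:]
                k += step

def hardest_position(cars, width, height):
    """Return (position, moves) for a reachable position farthest from any win."""
    start = tuple(cars)
    # Collect every position reachable from the start.
    reachable, frontier = {start}, deque([start])
    while frontier:
        for nxt in neighbours(frontier.popleft(), width, height):
            if nxt not in reachable:
                reachable.add(nxt)
                frontier.append(nxt)
    # Multi-source BFS outward from all winning positions. Moves are
    # reversible, so this gives each position's minimum number of moves.
    distance = {s: 0 for s in reachable if s[0][1] + s[0][2] == width}
    frontier = deque(distance)
    while frontier:
        state = frontier.popleft()
        for nxt in neighbours(state, width, height):
            if nxt not in distance:
                distance[nxt] = distance[state] + 1
                frontier.append(nxt)
    return max(distance.items(), key=lambda item: item[1]) if distance else None
```

Because every move costs the same, breadth-first search is enough here and no priority queue is needed.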
How to create harder instances through local search
It's possible that the above approach will not yield hard instances, as most random instances don't give rise to a complex interlocking behavior of the cars.
You can do some local search, which requires
1. a way to generate other boards from an existing one
2. an evaluation/fitness function
(2) is simple, maybe use the length of the longest solution, see above. Though this is quite costly.
(1) requires some thought. Possible modifications are:
add a car somewhere
remove a car (I assume this will always make the board easier)
Those two are enough to reach all possible boards. But one might want to add other modifications, because removing a car makes the board easier. Here are some ideas:
move a car perpendicularly to its driving direction
swap cars within the same lane (aaa..bb.) -> (bb..aaa.)
Hill climbing/steepest ascent is probably bad because of the large branching factor. One can try to subsample the set of possible neighbouring boards, i.e., don't look at all but only at a few random ones.
I know this is ancient but I recently had to deal with a similar problem so maybe this could help.
Constructing instances by applying random operators from a terminal state (i.e., reverse) will not work well. This is due to the symmetry in the state space. On average you end up in a state that is too close to the terminal state.
Instead, what worked better was to generate initial states (by placing random cars on the grid) and then to try to solve it with some bounded heuristic search algorithm such as IDA* or branch and bound. If an instance cannot be solved under the bound, discard it.
Try to avoid plain A*. If you have a definition of what counts as a "hard" instance (I find 16 moves to be pretty difficult), you can use A* with a pruning rule that prevents expansion of nodes x with g(x)+h(x)>T (T being your threshold, e.g. 16).
Heuristic function: since you don't have to be optimal when solving it, you can use any simple inadmissible heuristic, such as the number of obstacle squares to the goal. Alternatively, if you need a stronger heuristic function, you can implement a Manhattan distance function by generating the entire set of winning states for the generated puzzle and then using the minimal distance from the current state to any of the terminal states.
I'm making this repost after the earlier one here with more details.
PROBLEM :
The problem consists of a marauder who has to travel to different cities spread over a map. The starting location is known. Each city has a fixed loot associated with it. The marauder travels across varied terrain, by which I mean that the cost of travel differs between each pair of cities. He has to maximize the booty gained.
What we have done:
We have generated an adjacency matrix (booty-path cost in place for each node) and then employed a heuristic analysis. It gave some output which is reasonable.
Now the problem is that each city has one or more vehicles in it, which can be bought (by paying) and then used to travel. What a vehicle actually does is reduce the path cost. Once a vehicle is bought, it remains in use until the next vehicle is bought. It is up to the marauder to decide whether to buy a vehicle or not, and how.
I need help at this point. How to integrate the idea of vehicle into what we already have? Plus, any further ideas which may help us to maximize the profit. I can post the code, if required. Thanks!
One way to do it would be to have a directed edge bearing the cost of the vehicle towards a duplicate graph with the reduced costs. You can even make it so that the reduction is finer than just a percentage if you want to.
The downside is that this will probably increase the size of the graph a lot (as many copies as you have different vehicles, plus the links between them), and if your heuristic is not optimal, you may have to modify it so that it considers the new edge positively.
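The duplicated graph does not have to be built explicitly: a shortest-path search can treat (city, vehicle) pairs as nodes. The sketch below uses Dijkstra's algorithm and covers only the travel-cost side of the problem, not the loot. The input dictionaries, the `"foot"` entry for travelling without a vehicle, and the idea of a per-vehicle cost multiplier are all assumptions for illustration.

```python
import heapq

def cheapest_travel(roads, for_sale, multiplier, start, vehicle="foot"):
    """Dijkstra over (city, vehicle) states.

    roads:      {city: {neighbour: base_cost}}
    for_sale:   {city: {vehicle: price}}
    multiplier: {vehicle: factor applied to base_cost}, e.g. {"foot": 1.0}
    Returns {city: cheapest total cost (travel plus purchases) from start}.
    """
    best = {}
    settled = set()
    heap = [(0.0, start, vehicle)]
    while heap:
        cost, city, veh = heapq.heappop(heap)
        if (city, veh) in settled:
            continue
        settled.add((city, veh))
        best.setdefault(city, cost)  # the first time a city is popped is its cheapest
        # Edges inside one copy of the graph: travel with the current vehicle.
        for nxt, base in roads.get(city, {}).items():
            heapq.heappush(heap, (cost + base * multiplier[veh], nxt, veh))
        # Edges between copies: buy a vehicle here and switch to its copy.
        for new_veh, price in for_sale.get(city, {}).items():
            heapq.heappush(heap, (cost + price, city, new_veh))
    return best
```

The number of states is cities times vehicles, which is the graph growth the answer warns about.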
It sounds as though beam search would suit this problem. Beam search uses a heuristic function H and a parameter k and works like this:
1. Initialize the set S to the initial game position.
2. Set T to the empty set.
3. For each game position in S, generate all possible successor positions after one move by the marauder. (A move being to loot, to purchase a vehicle, to move to an adjacent city, or whatever else a marauder can do.) Add each such successor position to the set T.
4. For each position p in T, evaluate H(p) for a heuristic function H. (The heuristic function can take into account the amount of loot, the possession of a vehicle, the number of remaining unlooted cities, and whatever else you think is relevant and easy to compute.)
5. If you've run out of search time, return the best-scoring position in T.
6. Otherwise, set S to the best-scoring k positions in T and go back to step 2.
The algorithm works well if you store T in the form of a heap with k elements.
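A generic sketch of the procedure in Python follows. The `successors` and `score` callables stand for the move generator and the heuristic H and have to be supplied for the actual game; `max_steps` is an assumed stand-in for the time limit, and the sketch remembers the best position seen in any round, a small deviation from returning the best of the final T.

```python
import heapq

def beam_search(initial, successors, score, k, max_steps):
    """Keep only the k best-scoring positions after every move."""
    beam = [initial]   # the set S
    best = initial
    for _ in range(max_steps):
        # The set T: every position reachable in one move from S.
        candidates = [nxt for pos in beam for nxt in successors(pos)]
        if not candidates:
            break      # nothing left to do, e.g. every city is looted
        # heapq.nlargest keeps a heap of k elements internally.
        beam = heapq.nlargest(k, candidates, key=score)
        if score(beam[0]) > score(best):
            best = beam[0]
    return best
```

A position can be any value, for example a tuple of (current city, profit so far, visited cities), as long as `successors` and `score` agree on its shape.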