How can I sort this "matrix" of data? - sorting

I run a sporting website, and it's often useful to rank people against each other based on their previous meetings.
See this example set of data:
On the left is the raw "unsorted" view. On the right is the correctly sorted (in my opinion) list.
Each square shows the number of times they've competed against each other, and the percentage of victories. They're shaded based on the percentage.
I have this in a webpage, with "up" and "down" controls alongside each row, and I can manually nudge them around until I get what I want.
I'm just not quite sure of the best way to automate this.
The numbers at the end of each row are a quick idea I roughed out: each is the sum across the row of ((percentage - 50) * column number). As you can see, they do a fairly good job, with only the first two rows being "wrong". They don't give any weight to the number of meetings though, only the win percentage.
The final-column numbers change depending on the row order as well, as can be seen by comparing the left and right tables in the image, so sorting on the initial values wouldn't work that well. Looping a sort + re-calc a couple of times might do the job.
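The "sort + re-calc" loop could be sketched roughly as follows. This is only an illustration: the `pct` dictionary layout, the 1-based column numbering, the reading of the formula as (percentage - 50) * column, and the choice to put the highest score first are all assumptions, not something taken from the actual page.

```python
def row_scores(order, pct):
    """Score each player for the current row/column order.

    pct[(a, b)] is a's win percentage (0-100) against b; pairs that
    never met are simply absent (hypothetical layout).
    """
    scores = {}
    for a in order:
        scores[a] = sum(
            (pct[(a, b)] - 50) * col          # column numbers start at 1
            for col, b in enumerate(order, start=1)
            if (a, b) in pct
        )
    return scores


def iterate_sort(order, pct, max_rounds=10):
    """Re-sort by score and recompute until the order stops changing."""
    order = list(order)
    for _ in range(max_rounds):
        scores = row_scores(order, pct)
        new_order = sorted(order, key=lambda p: scores[p], reverse=True)
        if new_order == order:
            break
        order = new_order
    return order
```

There is no guarantee that this loop settles on a stable order, which is why the number of rounds is capped.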
I expect I can cobble something together to make this work... but I feel SO will have some much better ideas, and I'm all ears.

A tournament is a graph in which there is exactly one directed edge between every pair of vertices; to create such a graph from your input, you could make a graph with a vertex for each player, and then "point" each edge between two players in the direction of the player who won a majority of games (you will need to break ties somehow).
If you do this, and the resulting graph contains no cycles A > B > ... > A of the type I mentioned in my comment, then the graph is transitive, and you can order the players using a topological sort of the graph. This takes just linear time in the number of edges, i.e. O(n^2) for n players.
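A minimal sketch of that idea, using the standard-library `graphlib` module (Python 3.9+). The `wins` dictionary layout and the tie-break rule (the player listed first wins a tie) are assumptions made for illustration only.

```python
from graphlib import CycleError, TopologicalSorter


def rank_players(players, wins):
    """Return players from strongest to weakest, or None if there is a cycle.

    wins[(a, b)] is the number of games a won against b (hypothetical layout).
    """
    ts = TopologicalSorter()
    for p in players:
        ts.add(p)
    for i, a in enumerate(players):
        for b in players[i + 1:]:
            wa, wb = wins.get((a, b), 0), wins.get((b, a), 0)
            # Assumed tie-break: the player listed first wins ties.
            winner, loser = (a, b) if wa >= wb else (b, a)
            ts.add(loser, winner)   # winner must come before loser
    try:
        return list(ts.static_order())
    except CycleError:
        return None                 # not transitive: no perfect ordering
```

When the tournament is transitive the ordering is unique, so the result does not depend on the order in which edges were added.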
If there are such cycles, then there's no "perfect" ordering of players: any ordering will place at least one player after some player that they have beaten. In that case, a reasonable alternative is to look for orderings that minimise the number of these "edge violations". This turns out to be a well-studied NP-hard problem in computer science called (Minimum) Feedback Arc Set in Tournaments (FAST). A feedback arc set is a set of directed edges ("arcs") which, if deleted from the graph, would leave a graph with no directed cycles -- which can then be easily turned into an order using topological sort as before.
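For a handful of players, the "fewest edge violations" objective can simply be brute-forced over all orderings. The sketch below is exponential and is not one of the published FAST algorithms; `beats` is an assumed set of (winner, loser) pairs.

```python
from itertools import permutations


def count_violations(order, beats):
    """Number of (winner, loser) pairs where the winner is ranked below the loser."""
    pos = {p: i for i, p in enumerate(order)}
    return sum(1 for winner, loser in beats if pos[winner] > pos[loser])


def best_order_bruteforce(players, beats):
    """Try every ordering; only sensible for roughly 8 players or fewer."""
    return min(permutations(players), key=lambda o: count_violations(o, beats))
```

With n players this examines n! orderings, so it is only a baseline to compare a heuristic against.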
This paper describes a recent attack on the problem. I haven't read the paper, but this is an active area of research and so the algorithm is probably quite complicated -- but it might give you ideas about how to create a simpler algorithm that runs slower (but acceptably fast on such small instances), or how to create a heuristic.

Just to add to j_random's answer, you can isolate cycles using something like Tarjan's strongly connected components algorithm. Within a strongly connected component, you could use another method for sorting the items.
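A compact recursive sketch of Tarjan's algorithm, assuming the graph is given as a dict from each node to its successors (here, the players that node has beaten). Any component with more than one player contains a cycle and needs some other tie-breaking method.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; graph maps node -> iterable of successor nodes."""
    index, low = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            components.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return components
```

The recursion depth grows with the number of players, which is fine for a league table but would need an explicit stack for very large graphs.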

Related

Given a pile of items, split them into meaningful groups by comparing them

Given that I have a 'pile' of items that need to be split into groups, and given that I can express how much these items differ relative to each other as a number (a score, if you will), how would I separate this input into meaningful groups?
I recognise that this is a bit of an abstract question, so to try and make it clearer here is what I have tried so far:
I have tried representing the input as a weighted graph in which every vertex is connected to every other vertex, with the 'strength' of the edge being their relative score. Then I'd take the longest edge of the graph, and separate every other vertex by 'closeness' to the vertices at the end of that longest edge. This works reasonably well, but has the disadvantage of always yielding two groups for a result, which might not necessarily be logical.
For example: say I can express the differentness of fruits in a number. Then given a pile of apples, the different brands of apples would form different categories, like Elstar, Jonagold, what have you... But if I had a pile consisting of apples, pears, and oranges, then the apples would be relatively similar and should fall into the same category.
I'm guessing I'd have to remove every edge of the graph bigger than the mean plus the standard deviation or something like that, and then see how many disjointed subgraphs appear, but I'd like to hear the approach of someone with more mathematical knowledge than me.
This is a bit long for a comment.
What you are referring to is clustering. You seem to have a "distance" matrix between two items, although this is probably some inverse of the "strength" metric. A distance metric is non-negative and 0 when two things are equal. The larger the value the further apart the items.
When you have a generic "distance" matrix, a typical clustering method is hierarchical/agglomerative clustering ("distance" is in quotes because it might not meet all the formal qualities of a distance). A good place to start in understanding this technique is the Wikipedia page. The ideas behind hierarchical clustering can be applied to non-fully connected graphs.
I would expect almost every statistics package to include some form of hierarchical clustering.
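To make the idea concrete, here is a deliberately naive single-linkage agglomerative clustering sketch in plain Python. The `dist` callback and the `threshold` stopping rule are assumptions for illustration; a real statistics package would be faster and offer other linkage criteria.

```python
def single_linkage(items, dist, threshold):
    """Merge the two closest clusters until the closest pair exceeds threshold."""
    clusters = [[x] for x in items]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster distance = closest pair of members.
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break                      # remaining clusters are too far apart
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters
```

Unlike the "split on the longest edge" approach, the number of groups is not fixed at two; it falls out of the threshold.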

Generating Random Puzzle Boards for Rush Hour Game

If you're not familiar with it, the game consists of a collection of cars of varying sizes, set either horizontally or vertically, on a NxM grid that has a single exit.
Each car can move forward/backward in the directions it's set in, as long as another car is not blocking it. You can never change the direction of a car.
There is one special car, usually it's the red one. It's set in the same row that the exit is in, and the objective of the game is to find a series of moves (a move - moving a car N steps back or forward) that will allow the red car to drive out of the maze.
I've been trying to think how to generate instances for this problem, with levels of difficulty based on the minimum number of moves needed to solve the board.
Any idea of an algorithm or a strategy to do that?
Thanks in advance!
The board given in the question has at most 4*4*4*5*5*3*5 = 24,000 possible configurations, given the placement of cars.
A graph with 24,000 nodes is not very large for today's computers. So a possible approach would be to
1. construct the graph of all positions (nodes are positions, edges are moves),
2. find the number of moves needed to win from every node (e.g. using Dijkstra), and
3. select a node with a large distance from the goal.
One possible approach would be creating it in reverse.
1. Generate a random board that has the red car in the winning position.
2. Build the graph of all reachable positions.
3. Select a position that has the largest distance from every winning position.
The number of reachable positions is not that big (probably always below 100k), so (2) and (3) are feasible.
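Steps (2) and (3) could be sketched like this. The `neighbors` and `is_win` callbacks are hypothetical stand-ins for a real Rush Hour move generator, and the sketch assumes every move is reversible, so distances measured outward from the winning positions equal distances to them. Since every move costs 1, plain breadth-first search is enough here.

```python
from collections import deque


def hardest_position(start, neighbors, is_win):
    """Return (position, distance) for the reachable position farthest from any win."""
    # Step 2: collect every position reachable from the start board.
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in neighbors(s):
            if t not in seen:
                seen.add(t)
                queue.append(t)

    # Step 3: multi-source BFS outward from all winning positions.
    dist = {s: 0 for s in seen if is_win(s)}
    queue = deque(dist)
    while queue:
        s = queue.popleft()
        for t in neighbors(s):
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)

    if not dist:
        return None                    # no winning position is reachable
    return max(dist.items(), key=lambda kv: kv[1])
```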
How to create harder instances through local search
It's possible that the above approach will not yield hard instances, as most random instances don't give rise to complex interlocking behavior of the cars.
You can do some local search, which requires
1. a way to generate other boards from an existing one
2. an evaluation/fitness function
(2) is simple: maybe use the length of the longest solution, see above. Though this is quite costly.
(1) requires some thought. Possible modifications are:
- add a car somewhere
- remove a car (I assume this will always make the board easier)
Those two are enough to reach all possible boards. But one might want to add other modifications, because removing a car makes the board easier. Here are some ideas:
- move a car perpendicularly to its driving direction
- swap cars within the same lane (aaa..bb.) -> (bb..aaa.)
Hill climbing/steepest ascent is probably bad because of the large branching factor. One can try to subsample the set of possible neighbouring boards, i.e., don't look at all of them but only at a few random ones.
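A rough sketch of hill climbing with a subsampled neighbourhood. The `neighbours` and `fitness` callbacks are placeholders for the board modifications and the difficulty measure discussed above; `steps` and `sample_size` are arbitrary illustrative defaults.

```python
import random


def local_search(board, neighbours, fitness, steps=100, sample_size=5, rng=None):
    """Repeatedly move to the best of a few randomly sampled neighbouring boards."""
    rng = rng or random.Random()
    best, best_score = board, fitness(board)
    for _ in range(steps):
        candidates = list(neighbours(best))
        if not candidates:
            break
        # Look at only a few random neighbours instead of all of them.
        sample = rng.sample(candidates, min(sample_size, len(candidates)))
        challenger = max(sample, key=fitness)
        score = fitness(challenger)
        if score > best_score:
            best, best_score = challenger, score
    return best, best_score
```

Because the fitness function is the expensive part (it solves the board), it is worth caching its results in a real implementation.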
I know this is ancient but I recently had to deal with a similar problem so maybe this could help.
Constructing instances by applying random operators from a terminal state (i.e., reverse) will not work well. This is due to the symmetry in the state space. On average you end up in a state that is too close to the terminal state.
Instead, what worked better was to generate initial states (by placing random cars on the grid) and then to try to solve it with some bounded heuristic search algorithm such as IDA* or branch and bound. If an instance cannot be solved under the bound, discard it.
Try to avoid unbounded A*. If you have a definition of what counts as a "hard" instance (I find 16 moves to be pretty difficult), you can use A* with a pruning rule that prevents expansion of nodes x with g(x)+h(x) > T, where T is your threshold (e.g., 16).
Heuristic function - Since you don't have to be optimal when solving it, you can use any simple inadmissible heuristic, such as the number of obstacle squares to the goal. Alternatively, if you need a stronger heuristic function, you can implement a Manhattan distance function by generating the entire set of winning states for the generated puzzle and then using the minimal distance from the current state to any of the terminal states.
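The pruning rule could look roughly like this. It is a generic sketch with unit move costs; `neighbors`, `is_goal` and `h` are hypothetical callbacks, and with an inadmissible `h` the returned length is not guaranteed to be minimal and some solvable boards may be pruned away.

```python
import heapq
from itertools import count


def bounded_astar(start, neighbors, is_goal, h, threshold):
    """A* that never expands a node with g + h above threshold.

    Returns the length of the solution found, or None if none fits the bound.
    """
    tie = count()                       # tie-breaker so states are never compared
    best_g = {start: 0}
    frontier = [(h(start), next(tie), 0, start)]
    while frontier:
        _, _, g, state = heapq.heappop(frontier)
        if g > best_g.get(state, float("inf")):
            continue                    # stale queue entry
        if is_goal(state):
            return g
        for nxt in neighbors(state):
            ng = g + 1
            f = ng + h(nxt)
            if f > threshold:
                continue                # pruning rule: g(x) + h(x) > T
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier, (f, next(tie), ng, nxt))
    return None
```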

Marauders dilemma algorithm

I'm making this repost after the earlier one here with more details.
PROBLEM :
The problem consists of a marauder who has to travel to different cities spread over a map. The starting location is known. Each city has a fixed loot associated with it. The marauder travels across various kinds of terrain, by which I mean that the cost of travel varies between each pair of cities. He has to maximize the booty gained.
What we have done:
We have generated an adjacency matrix (booty-path cost in place for each node) and then employed a heuristic analysis. It gave some output which is reasonable.
The problem now is that each city has some number of vehicles in it, which can be bought (by paying) and used to travel. What a vehicle actually does is reduce the path cost. Once a vehicle is bought, it remains in use until the next vehicle is bought. It is up to the marauder to decide whether to buy a vehicle, and which one.
I need help at this point. How to integrate the idea of vehicle into what we already have? Plus, any further ideas which may help us to maximize the profit. I can post the code, if required. Thanks!
One way to do it would be to have a directed edge bearing the cost of the vehicle towards a duplicate graph with the reduced costs. You can even make it so that the reduction is finer than just a percentage if you want to.
The downside is that this will probably increase the size of the graph a lot (as many copies as you have different vehicles, plus the links between them), and if your heuristic is not optimal, you may have to modify it so that it considers the new edge positively.
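One way to picture the duplicated graph is to search over (city, vehicle) pairs instead of building the copies explicitly. The sketch below only answers a cheapest-route query with Dijkstra; it does not model loot, and the `cost`, `shop` and `factor` layouts are invented for illustration.

```python
import heapq
from itertools import count

INF = float("inf")


def cheapest_trip(cost, shop, factor, start, goal):
    """Cheapest travel cost from start to goal, allowing vehicle purchases.

    cost[a][b]   base travel cost from city a to city b
    shop[city]   {vehicle: price} for vehicles sold in that city
    factor[v]    multiplier applied to travel costs while owning v
    """
    tie = count()
    dist = {(start, None): 0.0}
    heap = [(0.0, next(tie), start, None)]
    while heap:
        d, _, city, veh = heapq.heappop(heap)
        if d > dist.get((city, veh), INF):
            continue
        if city == goal:
            return d
        mult = 1.0 if veh is None else factor[veh]
        moves = [((nxt, veh), d + c * mult) for nxt, c in cost.get(city, {}).items()]
        # Buying a vehicle is an edge into that vehicle's copy of the graph.
        moves += [((city, v), d + price)
                  for v, price in shop.get(city, {}).items() if v != veh]
        for state, nd in moves:
            if nd < dist.get(state, INF):
                dist[state] = nd
                heapq.heappush(heap, (nd, next(tie), state[0], state[1]))
    return None
```

The number of states is (cities) x (vehicles + 1), which matches the size increase mentioned above.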
It sounds as though beam search would suit this problem. Beam search uses a heuristic function H and a parameter k and works like this:
1. Initialize the set S to the initial game position.
2. Set T to the empty set.
3. For each game position in S, generate all possible successor positions after one move by the marauder. (A move being to loot, to purchase a vehicle, to move to an adjacent city, or whatever else a marauder can do.) Add each such successor position to the set T.
4. For each position p in T, evaluate H(p) for a heuristic function H. (The heuristic function can take into account the amount of loot, the possession of a vehicle, the number of remaining unlooted cities, and whatever else you think is relevant and easy to compute.)
5. If you've run out of search time, return the best-scoring position in T.
6. Otherwise, set S to the best-scoring k positions in T and go back to step 2.
The algorithm works well if you store T in the form of a heap with k elements.
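The steps above can be sketched as follows. A fixed number of `rounds` stands in for "running out of search time", and `successors` is a hypothetical callback returning the positions reachable in one move; `heapq.nlargest` does the "keep the best k" part.

```python
import heapq


def beam_search(initial, successors, H, k, rounds):
    """Keep only the k best-scoring positions after each round of moves."""
    S = [initial]
    for _ in range(rounds):
        T = [q for p in S for q in successors(p)]   # all one-move successors
        if not T:
            break                                   # no moves left anywhere
        S = heapq.nlargest(k, T, key=H)             # best-scoring k positions
    return max(S, key=H)
```

Beam search is not guaranteed to find the best sequence of moves; a larger k trades running time for a lower chance of discarding the eventual winner early.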

Calculate minimum moves to solve a puzzle

I'm in the process of creating a game where the user will be presented with 2 sets of colored tiles. In order to ensure that the puzzle is solvable, I start with one set, copy it to a second set, then swap tiles from one set to another. Currently, (and this is where my issue lies) the number of swaps is determined by the level the user is playing - 1 swap for level 1, 2 swaps for level 2, etc. This same number of swaps is used as a goal in the game. The user must complete the puzzle by swapping a tile from one set to the other to make the 2 sets match (by color). The order of the tiles in the (user) solved puzzle doesn't matter as long as the 2 sets match.
The problem I have is that as the number of swaps I used to generate the puzzle approaches the number of tiles in each set, the puzzle becomes easier to solve. Basically, you can just drag from one set in whatever order you need for the second set and solve the puzzle with plenty of moves left. What I am looking to do is after I finish building the puzzle, calculate the minimum number of moves required to solve the puzzle. Again, this is almost always less than the number of swaps used to create the puzzle, especially as the number of swaps approaches the number of tiles in each set.
My goal is to calculate the best case scenario and then give the user a "fudge factor" (i.e. 1.2 times the minimum number of moves). Solving the puzzle in under this number of moves will result in passing the level.
A little background as to how I currently have the game configured:
Levels 1 to 10: 9 tiles in each set. 5 different color tiles.
Levels 11 to 20: 12 tiles in each set. 7 different color tiles.
Levels 21 to 25: 15 tiles in each set. 10 different color tiles.
Swapping within a set is not allowed.
For each level, there will be at least 2 tiles of a given color (one for each set in the solved puzzle).
Is there any type of algorithm anyone could recommend to calculate the minimum number of moves to solve a given puzzle?
The minimum moves to solve a puzzle is essentially the shortest path from that unsolved state to a solved state. Your game implicitly defines a graph where the vertices are legal states, and there's an edge between two states if there's a legal move that enables that transition.
Depending on the size of your search space, a simple breadth-first search would be feasible, and would give you the minimum number of steps to reach any given state. In fact, you can generate the problems this way too: instead of making random moves to arrive at a state and checking its "distance" from the initial state, simply explore the search space in breadth-first/level-order, and pick a state at a given "distance" for your puzzle.
Related questions
Rush Hour - Solving the Game
BFS is used to solve Rush Hour, with source code in Java
Alternative
If the search space is too huge for BFS (and I'm not yet convinced that it is), you can use iterative deepening depth-first search instead. It's space-efficient like DFS, but (cumulatively) level-order like BFS. Even though nodes would be visited many times, it is still asymptotically identical to BFS, while requiring much less space.
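A generic sketch of iterative deepening, with hypothetical `neighbors` and `is_goal` callbacks. It only remembers the states on the current path, which is where the space saving comes from.

```python
def depth_limited(state, neighbors, is_goal, limit, on_path):
    """Depth-first search that gives up below `limit` moves; returns path length or None."""
    if is_goal(state):
        return 0
    if limit == 0:
        return None
    for nxt in neighbors(state):
        if nxt in on_path:
            continue                    # do not walk in circles on this path
        on_path.add(nxt)
        found = depth_limited(nxt, neighbors, is_goal, limit - 1, on_path)
        on_path.discard(nxt)
        if found is not None:
            return found + 1
    return None


def iddfs(start, neighbors, is_goal, max_depth):
    """Try depth limits 0, 1, 2, ... so the first solution found is a shortest one."""
    for limit in range(max_depth + 1):
        found = depth_limited(start, neighbors, is_goal, limit, {start})
        if found is not None:
            return found
    return None
```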
I didn't quite understand the puzzle from your description, but two general ideas that are often useful in solving that kind of puzzle are backtracking and branch and bound.
The A* search algorithm. The idea is that you have some measure of how close a position is to the solution. A* is then a "best first" search in the sense that at each step it considers moves from the best position found so far. It's up to you to come up with some kind of measure of how close you are to a solution. (It doesn't have to be accurate, it's just a heuristic to guide the search.) In practice it often performs much better than a pure breadth first search because it's always guided by your closeness scoring function. But without understanding your problem description, it's hard to say. (A rule of thumb is that if there's a sense of "making progress" while doing a puzzle, rather than it all suddenly coming together at the end, then A* is a good choice.)

How hard is this graph problem?

I have a problem to solve for a social-networks application, and it sounds hard: I'm not sure if it's NP-complete or not. It smells like it might be NP-complete, but I don't have a good sense for these things. In any case, an algorithm would be much better news for me.
Anyhow, the input is some graph, and what I want to do is partition the nodes into two sets so that neither set contains a triangle. If it helps, I know this particular graph is 3-colorable, though I don't actually know a coloring.
Heuristically, a "greedy" algorithm seems to converge quickly: I just look for triangles in either side of the partition, and break them when I find them.
The decision version of the problem is NP-complete for general graphs: http://users.soe.ucsc.edu/~optas/papers/G-free-complex.pdf and the result applies not just to triangles.
Of course, that still does not help resolve the question for the search version of 3-colourable graphs and triangle freeness (the decision version is trivially in P).
Here's an idea that might work. I doubt this is the ideal algorithm, so other people, please build off of this.
Go through your graph and find all the triangles first. I know it seems ridiculous, but it wouldn't be too bad complexity-class wise, I think. You can find any triangles a given node is part of just by following all its edges three hops and seeing if you get to where you started. (I suspect there's a way to get all of the triangles in a graph that's faster than just finding the triangles for each node.)
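Finding every triangle is straightforward if the graph is stored as a dict of neighbour sets (an assumed representation): for each node, check every pair of its neighbours for an edge between them.

```python
from itertools import combinations


def find_triangles(adj):
    """Return all triangles in an undirected graph.

    adj maps each node to the set of its neighbours.
    """
    triangles = set()
    for u in adj:
        for v, w in combinations(adj[u], 2):
            if w in adj[v]:
                triangles.add(frozenset((u, v, w)))   # each triangle stored once
    return triangles
```

Each triangle is encountered three times (once per corner), but the set keeps only one copy.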
Once you have the triangles, you should be able to split them any way you please. By definition, once you split them up, there are no more triangles left, so I don't think you have to worry about connections between the triangles or adjacent triangles or anything horrible like that.
This is not possible for any set with 5 tightly interconnected nodes, and I can prove it with a simple thought experiment. 5 tightly interconnected nodes is very common in social networks; a cursory glance at my Facebook profile found one among my family members and one among a group of coworkers.
By 'tightly interconnected graph', I mean a set of nodes where the nodes have a connection to every other node. 5 nodes like this would look like a star within a pentagon.
Let's start with a set of 5 cousins named Anthony, Beatrice, Christopher, Daniel, and Elisabeth. As cousins, they are all connected to each other.
1) Lets put Anthony in Collection #1.
2) Lets put Beatrice in Collection #1.
3) Along comes Christopher through our algorithm... we can't put him in collection #1, since that would form a triangle. We put him in Collection #2.
4) Along comes Daniel. We can't put him in collection #1, because that would form a triangle, so we put him in Collection #2.
5) Along comes Elisabeth. We can't put her in Collection #1, because that would form a triangle with Anthony and Beatrice. We can't put her in Collection #2, because that would form a triangle with Christopher and Daniel.
Even if we varied the algorithm to put Beatrice in Collection #2, the thought experiment concludes with a similar problem. Reordering the people causes the same problem. No matter how you place them, the 5th person cannot go anywhere - this is a variation of the 'pigeonhole principle'.
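The thought experiment can be checked exhaustively, since 5 people can only be split in 2^5 = 32 ways. The sketch below is a brute-force check for tiny graphs, not an algorithm for the general problem; edges are assumed to be stored as a set of two-element frozensets.

```python
from itertools import combinations, product


def has_triangle(nodes, edges):
    """True if some three of the given nodes are all pairwise connected."""
    return any(
        all(frozenset(pair) in edges for pair in combinations(trio, 2))
        for trio in combinations(nodes, 3)
    )


def triangle_free_bipartitions(nodes, edges):
    """All ways to split nodes into two collections with no triangle in either."""
    result = []
    for mask in product((0, 1), repeat=len(nodes)):
        first = [n for n, m in zip(nodes, mask) if m == 0]
        second = [n for n, m in zip(nodes, mask) if m == 1]
        if not has_triangle(first, edges) and not has_triangle(second, edges):
            result.append((first, second))
    return result
```

For five fully connected people the result is empty, because one collection always ends up with at least three of them.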
Even if you loosened the requirement to ask "what is the smallest number of graphs I can partition a graph into so that there are no triangles?", I think this would turn into a variation of the Travelling Salesman problem, with no definitive solution.
MY ANSWER IS WRONG
I'll keep it up for discussion. Please don't down vote, the idea might still be helpful.
I'm going to argue that it's NP-hard based on the claim that coloring a 3-colorable graph with 4 colors is NP-hard (On the hardness of 4-coloring a 3-colorable graph).
We give a new proof showing that it is NP-hard to color a 3-colorable graph using just four colors. This result is already known, but our proof is novel as [...]
Suppose we can partition a 3-colorable graph into 2 sets A, B, such that neither has a triangle, in polynomial time. Then we can solve the 4-coloring as follows:
color set A with C1,C2 and set B with C3,C4.
each set is 2-colorable since it has no triangle <- THIS IS WHERE I GOT IT WRONG
2-coloring a 2-colorable graph is polynomial
we have then 4-colored a 3-colorable graph in polynomial time
Through this reduction, I claim that what you are doing must be NP-hard.
This problem has an O(n^5) algorithm I think, where n is the number of vertices.
