Build top-list from player-player duel result - algorithm

I have a list of duels between players. The data consists of 2 user IDs, where the first one is the winner.
How can I build a graph of this list to find the best players?
Furthermore, how do I decide what it means to be best?
Perhaps players should be ranked by the number of opponents beaten, and the rank of those opponents (recursively).
I have previously tried doing this using the PageRank algorithm, but it does not account for losses in a good way (i.e. the rank should go down from a loss).
For example:
1 won against 3
1 won against 4
1 won against 5
2 won against 1
This list should put 2 at the top, because it beat 1.
This presents one problem - it should be required to duel those with a high rank.
Those who have not dueled players above a certain rank should be told to do so, in order to be in the top-list.

Define player X beating player Y as a relation such that there exist vertices X and Y and an edge from Y to X.
Then, after processing all game information, you may run DFS on the graph, recording in some array A the nodes from which you can not traverse deeper.
As you did not specify that the given graph is a tree, considering also that the edges are directed, there are no guarantees that DFS starting from any node will converge to a single root, so you need to keep some sort of a list of such nodes that beat others.
Once that initial traversal is done, reverse all the edges and run a DFS on each tree in the forest that your graph now forms, where the root of each tree is an element of A. As you traverse the tree rooted at A[i], record in each node its depth relative to the root node, A[i].
From then on, depending on your definition of top players, you may traverse the roots in A and go as deep as that definition allows, picking every element you encounter. If the final list should also order nodes that descend from different roots in A, you may sort it using the depth as the comparison criterion. Aside from that final sorting, all we have done so far is DFS, so this approach is O(V+E), V being the number of vertices and E the number of edges. If you take the sorting of elements from different trees into account, the overall complexity is O((V+E) + VlogV).
If you are willing to sacrifice a bit more performance, you may connect the roots in A to a global root R (i.e. add node R to the graph and edges from R to each A[i]) and run Dijkstra's algorithm, visiting the nodes with smaller depth first and appending each visited node to your list until you consider your list large enough, based on your definition of top players.
Note that this solution does not work if you have cycles in the graph, regardless of whether you use DFS or Dijkstra's for the final traversal. However, it may be adapted to players having multiple matches by using edges with positive weights. An edge from X to Y with weight k would then indicate the number of times X defeated Y, which you would take into account while updating the depth of a node during your DFS traversal.
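A minimal sketch of the depth idea, assuming the duels arrive as (winner, loser) pairs. It uses a multi-source BFS from the unbeaten players, which is what the global-root variant reduces to when every edge has unit weight. The function name rank_players is made up for illustration, and ties in depth are broken by player ID, which is an arbitrary choice.

```python
from collections import defaultdict, deque

def rank_players(duels):
    """Order players by their depth below the unbeaten players (the array A)."""
    beaten_by = defaultdict(set)      # winner -> set of players they defeated
    players, losers = set(), set()
    for winner, loser in duels:
        beaten_by[winner].add(loser)
        players.update((winner, loser))
        losers.add(loser)

    roots = sorted(players - losers)  # players who never lost: depth 0
    depth = {p: 0 for p in roots}
    queue = deque(roots)
    while queue:
        p = queue.popleft()
        for q in beaten_by[p]:
            if q not in depth:        # first visit gives the smallest depth
                depth[q] = depth[p] + 1
                queue.append(q)

    # Players that are only reachable through a cycle never get a depth.
    return sorted(depth, key=lambda p: (depth[p], p))

print(rank_players([(1, 3), (1, 4), (1, 5), (2, 1)]))  # [2, 1, 3, 4, 5]
```

With the example from the question, player 2 ends up at the top because it beat 1. Players that sit in a cycle with no unbeaten player above them are simply left out, which matches the caveat about cycles.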

Related

AI minimax algorithm to process a state reachable from root through multiple paths, of different lengths

A standard minimax algorithm considers the root level as MAX and subsequent levels as alternating between MIN and MAX. Consider a tree node that can be reached through more than one path. If the difference in path lengths is odd, it implies different levels, so should that node be MIN or MAX? Is it more likely if the branching factor is > 2? If it is not possible, please explain why.
Consider the state node N which is reachable from the root through multiple paths. The downward path from N to the goal states should be identical.
Keeping in mind the purpose of the minimax algorithm, the node N (and its downstream subtree) should be duplicated in the tree to create a loop-free star topology. The two nodes (and the downstream nodes in the subtrees) will now operate as MIN or MAX as per their individual path lengths.
Let us define a "node" as a full state, which includes which player is to move (the MAX or the MIN player). Then, it is not possible to have a single node that is reachable both by MIN and MAX, because they would be, by definition, different nodes.
In chess, you can reach the exact same position of pieces in the board with the only difference being whether white or black is to move. But those are fundamentally different game-states, and therefore different nodes in the tree!
So, to answer your questions:
player-to-move is an important part of node identity
this makes it impossible to reach the same node through odd-length paths from the root, by the very definition of node identity. If it is reachable through an odd-length path, it is a different node.
high branching factors (assuming same total number of possible nodes) make it more likely to find previously-encountered nodes (more positions would be repeated in chess if, say, pawns could go backwards) -- but do not alter the above.
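To make the node-identity point concrete, here is a small sketch on a toy game that is not from the question: a pile of stones, each player takes 1 or 2, and whoever takes the last stone wins. The cache key is the pair (stones, max_to_move), so the same pile reached by paths of different parity is stored as two different nodes.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def value(stones, max_to_move):
    """Minimax value of the toy game: +1 if MAX wins, -1 if MIN wins."""
    if stones == 0:
        # The previous player took the last stone, so the player to move lost.
        return -1 if max_to_move else 1
    children = [value(stones - take, not max_to_move)
                for take in (1, 2) if take <= stones]
    return max(children) if max_to_move else min(children)

# A pile of 1 can be reached from 3 in one move (take 2) or two moves
# (take 1, take 1); the player to move differs, and so does the value.
print(value(1, True), value(1, False))  # 1 -1
```

Keying the cache on the pile size alone would merge those two nodes and give wrong values.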

Depth First Search - Advantages of traversing random neighbours

Depth First Search allows adjacent vertices to be traversed in an arbitrary order.
Are there any advantages in choosing random neighbours vs choosing neighbours in ascending fashion?
Consider the exploration order of following graph:
0 -> 9 -> 8 -> 7 ..
0 -> 1 -> 8 -> 7 ..
Can random choice lead to more favourable results?
Can I find a situation in which there is an advantage?
Yes. Easily. If I have a connected graph, traversing with random choices is guaranteed to eventually reach every node. Traversing in order is guaranteed to eventually wind up in a loop.
Is there an advantage in general?
No. For example in the connected example, a simple "keep track of where we have been" makes both reach everything. Which one will find a target node first will be a question of chance.
First of all, the DFS is not responsible for the arbitrary order.
The way of traversing the nodes depends on two things:
The order in which you pushed the adjacent nodes in the adjacency list (Generally influenced by the order of the edges provided in the input).
The custom order which you decide for traversal (as you mentioned, the sorted order).
Answer to your question:
Choosing the order in which you pushed the adjacent nodes in the adjacency list does not affect the complexity.
If you decide to traverse adjacent nodes in sorted order, then you need to keep each adjacency list sorted, which comes at the cost of an extra sorting step. Each list has at most V entries and the lists hold O(E) entries in total, so that step is O(E * log(V)).
Overall time complexity = O(V+E) + O(E * log(V)).
O(V+E) => for DFS.
O(E * log(V)) => for sorting the adjacency lists (or keeping them in priority queues).
Here, V = number of nodes in the graph and E = number of edges.
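As a rough illustration of both answers, here is a DFS where the neighbour order is a parameter. The graph is assumed to be a dict mapping each node to a list of neighbours, and the names dfs_order, order and seed are made up for this sketch. With a visited set, sorted and random orders reach the same nodes and only the visiting sequence differs.

```python
import random

def dfs_order(graph, start, order="sorted", seed=None):
    """Return the nodes in the order a recursive DFS first visits them."""
    rng = random.Random(seed)
    visited, result = set(), []

    def visit(node):
        visited.add(node)
        result.append(node)
        neighbours = sorted(graph[node])   # the sort is the extra log factor
        if order == "random":
            rng.shuffle(neighbours)
        for nxt in neighbours:
            if nxt not in visited:
                visit(nxt)

    visit(start)
    return result

graph = {0: [9, 1], 1: [8], 9: [8], 8: [7], 7: []}
print(dfs_order(graph, 0))                    # [0, 1, 8, 7, 9]
print(sorted(dfs_order(graph, 0, "random")))  # [0, 1, 7, 8, 9]
```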

how to partition the nodes of an undirected graph into k sets

I have an undirected graph G=(V,E) where each vertex represents a router in a large network. Each edge represents a network hop from one router to another; therefore, all edges have the same weight. I wish to partition this network of routers into 3 (or k) different sets clustered by hop count.
Motivation:
The idea is to replicate some data in routers contained in each of these 3 sets. This is so that whenever a node (or client or whatever) in the network graph requests a certain data item, I can search for it in the 3 sets and get a responsible node (one that has cached that particular data) from each set. Then I'd select the node which is the minimum hop count away from the requesting node and continue with my algorithms and tests.
The cache distribution and request response are not in the scope of this question. I just need a way to partition the network into 3 sets so that I can perform the above operations on it.
Which clustering algorithm could be used in such a scenario? I have almost 9000 nodes in the graph and I wish to get 3 sets of ~3000 nodes each.
In the graph case, a clustering method based on minimum spanning trees can be used.
The regular algorithm is the following:
Find the minimum spanning tree of the graph.
Remove the k - 1 longest edges in the spanning tree, where k is the desired number of clusters.
However, this works only if the edges differ in length (or weight). In the case of edges of equal length, every spanning tree is a minimum one so this would not work well. However, putting a little thinking into it, a different algorithm came to my mind which uses BFS.
The algorithm:
1. for i = 1..k do // for each cluster
2.     choose the number of nodes N in cluster i
3.     choose an arbitrary node n
4.     run breadth-first search (BFS) from n until N nodes have been reached
5.     assign the first N nodes (incl. n) tapped by the BFS to the i-th cluster
6.     remove these nodes (and the incident edges) from the graph
7. done
The results of this algorithm depend hugely on how step 3, i.e. choosing the "root" node of a cluster, is implemented. For the sake of simplicity I choose an arbitrary node, but it could be more sophisticated. The best nodes are those that are at the "end" of the graph. You could find a center of the graph (a node that has the lowest sum of lengths of paths to all other nodes) and then use the nodes that are the furthest from this center.
The real issue is that your edges are equal (if I understood your problem statement well) and you have no information about the nodes (i.e. their coordinates - then you could use e.g. k-means).
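The BFS steps above could look roughly like this in Python. This is only a sketch: the graph is assumed to be a dict of adjacency lists, the "arbitrary" root is simply the smallest remaining node, and restarting from a new root when a BFS runs out of reachable nodes is an addition of mine, not part of the steps as written.

```python
from collections import deque

def bfs_partition(graph, sizes):
    """Split the nodes into clusters of the given sizes using repeated BFS."""
    remaining = set(graph)            # nodes not yet assigned to a cluster
    clusters = []
    for size in sizes:
        cluster = []
        while len(cluster) < size and remaining:
            start = min(remaining)    # stand-in for "choose an arbitrary node"
            remaining.discard(start)
            cluster.append(start)
            queue = deque([start])
            while queue and len(cluster) < size:
                node = queue.popleft()
                for nb in graph[node]:
                    if nb in remaining and len(cluster) < size:
                        remaining.discard(nb)   # "remove" it from the graph
                        cluster.append(nb)
                        queue.append(nb)
        clusters.append(cluster)
    return clusters

# A path of six routers split into three clusters of two.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
print(bfs_partition(path, [2, 2, 2]))  # [[0, 1], [2, 3], [4, 5]]
```

Swapping min(remaining) for a smarter root choice, such as a node far from the center, only changes the line that picks start.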

Explore every node in a graph

I am given a connected graph with N nodes (numbered from 1..N) and M bidirectional edges, each given as a pair (A,B). Edges are unweighted.
I have K people starting at node 1 and I want to explore every node of the graph. It takes one unit of time for a person to travel from one node to one of its neighbors.
How long will it take to explore every node? I am searching for an efficient algorithm to compute the minimum traversal time, but I am afraid it is an NP-complete problem. (The constraints on the number of edges and number of people are small though).
Suppose K were 1. Then the minimisation problem reduces to finding a minimum length path that touches every node at least once.
If we construct a new weighted graph G' with the same nodes and with edges between every two nodes whose weight is the minimum distance between those nodes in the original graph, then the minimum length path through all the nodes in G is the minimum length Hamiltonian path through G', the travelling salesperson problem, which is well-known to be NP-complete.
So for at least one value of K, the problem is NP-complete. However, for large values of K (say, ≥ N), we can produce a minimum solution in much less time, since we can just construct the minimum spanning tree and find the distance of the furthest element. I doubt whether there is any such simplified solution for small values of K, but I'd definitely use the MST as a heuristic for finding a reasonable solution.
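For the large-K case, a sketch under the assumption that edges are unweighted: a BFS tree from node 1 plays the role of the spanning tree, and the time is the distance to the farthest node. The same number is a lower bound for any K, since somebody has to walk to that farthest node. The function name is made up.

```python
from collections import deque

def max_distance_from_start(n, edges, start=1):
    """Distance from `start` to the farthest node, via BFS on unit edges."""
    adj = {v: [] for v in range(1, n + 1)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return max(dist.values())

print(max_distance_from_start(4, [(1, 2), (2, 3), (3, 4)]))  # 3
```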
To me this seems like BFS.
You can view the graph like a tree, where the starting node is the root. From this perspective, if the number of leaves <= the number of people, the answer would be the depth of the deepest leaf (the one farthest away from the root).
This is correct because if every leaf is visited by some person, then all the nodes are visited in the process.
However, if there are still nodes left unvisited after every person visits a leaf, then you have to add to the answer the max time (or distance) it takes for the closest person to visit an unvisited node.
This is not the complete answer, however. It's more complicated than that.
If you have the following image:
You wouldn't want to blindly bfs. You would want to visit the nodes in order from least deep to deepest, as that way you don't have to go down and up the path again. For example, 0 -> 1 -> 0 -> 2 -> 0 -> 3 -> 0 -> 4 is more efficient than 0 -> 4 -> 0 -> 3 -> 0 -> 2 -> 0 -> 1. The reason this is correct is because you can only save time on your last traversal, so you want to make that one the longest.
Furthermore, getting a person from a different branch to visit the unvisited node may be more efficient (to help the person from the current branch), so you want to assign unvisited nodes to the people of the surrounding branches if the time it takes for that person to get to 0 is less than the time it takes for the person(s) in the current branch to visit all the nodes in this branch. If one person from a branch can be assigned to multiple other branches, you want to take the branch that has the greatest number of unvisited nodes. These "helper" people are also why you want to visit the nodes in order from least deep to deepest instead of just visiting the deepest node last.
All that might sound confusing, but the key is BFS. That's the solution to your problem. It's basically BFS with modifications.
As to implementation, you can use recursion (or stacks), which is usually used for tree traversals. And note that for the case where the number of unvisited nodes > the number of people, you don't have to simulate the traveling to the remaining unvisited nodes.

Find all subtrees of size N in an undirected graph

Given an undirected graph, I want to generate all subgraphs which are trees of size N, where size refers to the number of edges in the tree.
I am aware that there are a lot of them (exponentially many at least for graphs with constant connectivity) - but that's fine, as I believe the number of nodes and edges makes this tractable for at least smallish values of N (say 10 or less).
The algorithm should be memory-efficient - that is, it shouldn't need to have all graphs or some large subset of them in memory at once, since this is likely to exceed available memory even for relatively small graphs. So something like DFS is desirable.
Here's what I'm thinking, in pseudo-code, given the starting graph graph and desired length N:
Pick any arbitrary node, root as a starting point and call alltrees(graph, N, root)
alltrees(graph, N, root)
    given that node root has degree M, find all M-tuples with integer, non-negative values whose values sum to N (for example, for 3 children and N=2, you have (0,0,2), (0,2,0), (2,0,0), (0,1,1), (1,0,1), (1,1,0), I think)
    for each tuple (X1, X2, ... XM) above
        create a subgraph "current" initially empty
        for each integer Xi in X1...XM (the current tuple)
            if Xi is nonzero
                add edge i incident on root to the current tree
                add alltrees(graph with root removed, N-1, node adjacent to root along edge i)
        add the current tree to the set of all trees
    return the set of all trees
This finds only trees containing the chosen initial root, so now remove this node and call alltrees(graph with root removed, N, new arbitrarily chosen root), and repeat until the size of the remaining graph < N (since no trees of the required size will exist).
I also forgot that each visited node (each root for some call of alltrees) needs to be marked, and the set of children considered above should only be the adjacent unmarked children. I guess we need to account for the case where no unmarked children exist, yet depth > 0. This means that this "branch" failed to reach the required depth and cannot form part of the solution set (so the whole inner loop associated with that tuple can be aborted).
So will this work? Any major flaws? Any simpler/known/canonical way to do this?
One issue with the algorithm outlined above is that it doesn't satisfy the memory-efficient requirement, as the recursion will hold large sets of trees in memory.
This needs an amount of memory that is proportional to what is required to store the graph. It will return every subgraph that is a tree of the desired size exactly once.
Keep in mind that I just typed it into here. There could be bugs. But the idea is that you walk the nodes one at a time, for each node searching for all trees that include that node, but none of the nodes that were searched previously. (Because those have already been exhausted.) That inner search is done recursively by listing edges to nodes in the tree, and for each edge deciding whether or not to include it in your tree. (If it would make a cycle, or add an exhausted node, then you can't include that edge.) If you include it in your tree then the set of used nodes grows, and you have new possible edges to add to your search.
To reduce memory use, the list of edges that are left to look at is manipulated in place by all of the levels of the recursive call rather than the more obvious approach of duplicating that data at each level. If that list was copied, your total memory usage would get up to the size of the tree times the number of edges in the graph.
def find_all_trees(graph, tree_length):
    # Assumption: graph is a dict mapping each node to a list of adjacent nodes.
    exhausted_node = set()     # roots whose trees have all been produced
    used_node = set()          # nodes in the tree being built
    used_edge = set()          # edges in the tree being built
    current_edge_groups = []   # one list of incident edges per used node

    def neighbors(node):
        # An edge is a (node, neighbor) pair; this representation is assumed.
        return [(node, other) for other in graph[node]]

    def finish_all_trees(remaining_length, edge_group, edge_position):
        while edge_group < len(current_edge_groups):
            edges = current_edge_groups[edge_group]
            while edge_position < len(edges):
                edge = edges[edge_position]
                edge_position += 1
                (node1, node2) = edge
                if node1 in exhausted_node or node2 in exhausted_node:
                    continue
                node = node1
                if node1 in used_node:
                    if node2 in used_node:
                        continue   # adding this edge would close a cycle
                    else:
                        node = node2
                # Include the edge, recurse, then undo the change in place.
                used_node.add(node)
                used_edge.add(edge)
                current_edge_groups.append(neighbors(node))
                if 1 == remaining_length:
                    yield set(used_edge)   # copy, since used_edge keeps changing
                else:
                    for tree in finish_all_trees(remaining_length - 1,
                                                 edge_group, edge_position):
                        yield tree
                current_edge_groups.pop()
                used_edge.remove(edge)
                used_node.remove(node)
            edge_position = 0
            edge_group += 1

    for node in graph:
        used_node.add(node)
        current_edge_groups.append(neighbors(node))
        for tree in finish_all_trees(tree_length, 0, 0):
            yield tree
        current_edge_groups.pop()
        used_node.remove(node)
        exhausted_node.add(node)
Assuming you can destroy the original graph or make a destroyable copy, I came up with something that could work but could be utter sadomaso, because I did not calculate its big-O complexity. It would probably work for small subtrees.
do it in steps, at each step:
sort the graph nodes so you get a list of nodes sorted by number of adjacent edges ASC
process all nodes with the same number of edges of the first one
remove those nodes
For an example for a graph of 6 nodes finding all size 2 subgraphs (sorry for my total lack of artistic expression):
Well the same would go for a bigger graph, but it should be done in more steps.
Assuming:
Z number of edges of the most connected node
M desired subtree size
S number of steps
Ns number of nodes in step
assuming quicksort for sorting nodes
Worst case:
S*(Ns^2 + MNsZ)
Average case:
S*(NslogNs + MNs(Z/2))
Problem is: I cannot calculate the real big-O because the number of nodes in each step will decrease depending on how the graph is...
Solving the whole thing with this approach could be very time consuming on a graph with very connected nodes. However, it could be parallelized, and you could do one or two steps to remove dislocated nodes, extract all subgraphs, and then choose another approach on the remainder, but you would have removed a lot of nodes from the graph so it could decrease the remaining run time...
Unfortunately this approach would benefit the GPU not the CPU, since a LOT of nodes with the same number of edges would go in each step.... and if parallelization is not used this approach is probably bad...
Maybe an inverse would go better with the CPU, sort and proceed with nodes with the maximum number of edges... those will be probably less at start, but you will have more subgraphs to extract from each node...
Another possibility is to calculate the least occurring edge count in the graph and start with nodes that have it; that would alleviate the memory usage and iteration count for extracting subgraphs...
Unless I'm reading the question wrong people seem to be overcomplicating it.
This is just "all possible paths within N edges" and you're allowing cycles.
Thus, for two nodes A, B and one edge, your result would be:
AA, AB, BA, BB
For two nodes, two edges your result would be:
AAA, AAB, ABA, ABB, BAA, BAB, BBA, BBB
I would recurse into a for each and pass in a "template" tuple
N=edge count
TempTuple = Tuple_of_N_Items ' (01,02,03,...0n) (Could also be an ordered list!)
ListOfTuple_of_N_Items ' Paths (could also be an ordered list!)
edgeDepth = N
Method (Nodes, edgeDepth, TupleTemplate, ListOfTuples, EdgeTotal)
    edgeDepth -=1
    For Each Node In Nodes
        if edgeDepth = 0 'Last Edge
            ListOfTuples.Add New Tuple from TupleTemplate + Node ' (x,y,z,...,Node)
        else
            NewTupleTemplate = TupleTemplate + Node ' (x,y,z,Node,...,0n)
            Method(Nodes, edgeDepth, NewTupleTemplate, ListOfTuples, EdgeTotal)
    next
This will create every possible combination of vertices for a given edge count
What's missing is the factory to generate tuples given an edge count.
You end up with a list of possible paths, and the number of operations is Nodes^(N+1)
If you use ordered lists instead of tuples then you don't need to worry about a factory to create the objects.
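In Python the same enumeration is what itertools.product gives you, so a sketch needs no tuple factory. Nodes are assumed to be single-character strings here so the output matches the AA, AB, ... notation above, and the function name is made up. Like the pseudocode, this lists every node sequence and does not check that consecutive nodes are actually joined by an edge.

```python
from itertools import product

def all_node_sequences(nodes, edge_count):
    """Every sequence of edge_count + 1 nodes, repeats allowed."""
    return ["".join(seq) for seq in product(nodes, repeat=edge_count + 1)]

print(all_node_sequences("AB", 1))       # ['AA', 'AB', 'BA', 'BB']
print(len(all_node_sequences("AB", 2)))  # 8
```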
If memory is the biggest problem you can use a NP-ish solution using tools from formal verification. I.e., guess a subset of nodes of size N and check whether it's a graph or not. To save space you can use a BDD (http://en.wikipedia.org/wiki/Binary_decision_diagram) to represent the original graph's nodes and edges. Plus you can use a symbolic algorithm to check if the graph you guessed is really a graph - so you don't need to construct the original graph (nor the N-sized graphs) at any point. Your memory consumption should be (in big-O) log(n) (where n is the size of the original graph) to store the original graph, and another log(N) to store every "small graph" you want.
Another tool (which is supposed to be even better) is to use a SAT solver. I.e., construct a SAT formula that is true iff the sub-graph is a graph and supply it to a SAT solver.
For a graph of Kn there are approximately n! paths between any two pairs of vertices. I haven't gone through your code but here is what I would do.
Select a pair of vertices.
Start from a vertex and try to reach the destination vertex recursively (something like dfs but not exactly). I think this would output all the paths between the chosen vertices.
You could do the above for all possible pairs of vertices to get all simple paths.
It seems that the following solution will work.
Go over all partitions of the set of all vertices into two parts. Then count the number of edges whose endpoints lie in different parts (k); these edges correspond to the edge of the tree that connects the subtrees for the first and the second parts. Calculate the answer for both parts recursively (p1, p2). Then the answer for the entire graph can be calculated as the sum over all such partitions of k*p1*p2. But all trees will be counted N times: once for each edge. So, the sum must be divided by N to get the answer.
Your solution as is doesn't work I think, although it can be made to work. The main problem is that the subproblems may produce overlapping trees so when you take the union of them you don't end up with a tree of size n. You can reject all solutions where there is an overlap, but you may end up doing a lot more work than needed.
Since you are ok with exponential runtime, and potentially writing 2^n trees out, having V.2^V algorithms is not bad at all. So the simplest way of doing it would be to generate all possible subsets of n nodes, and then test each one to see if it forms a tree. Since testing whether a subset of nodes forms a tree can take O(E.V) time, we are potentially talking about V^2.V^n time, unless you have a graph with O(1) degree. This can be improved slightly by enumerating subsets in a way that two successive subsets differ in exactly one node being swapped. In that case, you just have to check if the new node is connected to any of the existing nodes, which can be done in time proportional to the number of outgoing edges of the new node by keeping a hash table of all existing nodes.
The next question is how you enumerate all the subsets of a given size such that no more than one element is swapped between successive subsets. I'll leave that as an exercise for you to figure out :)
I think there is a good algorithm (with Perl implementation) at this site (look for TGE), but if you want to use it commercially you'll need to contact the author. The algorithm is similar to yours in the question but avoids the recursion explosion by making the procedure include a current working subtree as a parameter (rather than a single node). That way each edge emanating from the subtree can be selectively included/excluded, and recurse on the expanded tree (with the new edge) and/or reduced graph (without the edge).
This sort of approach is typical of graph enumeration algorithms -- you usually need to keep track of a handful of building blocks that are themselves graphs; if you try to only deal with nodes and edges it becomes intractable.
This algorithm is big and not an easy one to post here. But here is a link to the reservation search algorithm, with which you can do what you want. This pdf file contains both algorithms. Also, if you understand Russian you can take a look at this.
So you have a graph with edges e_1, e_2, ..., e_E.
If I understand correctly, you are looking to enumerate all subgraphs which are trees and contain N edges.
A simple solution is to generate each of the E choose N subgraphs and check if they are trees.
Have you considered this approach? Of course if E is too large then this is not viable.
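A sketch of that brute-force idea, assuming the graph is given as a list of (u, v) edge pairs with no duplicates. The tree check uses a small union-find: N edges with no cycle form a forest, and the forest is a single tree exactly when it touches N+1 nodes. The function names are made up.

```python
from itertools import combinations

def is_tree(edge_subset):
    """True if the edges are acyclic and connected."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in edge_subset:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False                    # this edge would close a cycle
        parent[ru] = rv
    # Acyclic with n edges: connected iff exactly n + 1 nodes are touched.
    return len(parent) == len(edge_subset) + 1

def trees_by_brute_force(edges, n):
    """All n-edge subsets of `edges` that form a tree."""
    return [subset for subset in combinations(edges, n) if is_tree(subset)]

triangle = [(1, 2), (2, 3), (1, 3)]
print(len(trees_by_brute_force(triangle, 2)))  # 3
```

combinations is lazy, so turning trees_by_brute_force into a generator keeps memory small even though the running time stays at E choose N checks.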
EDIT:
We can also use the fact that a tree is a combination of trees, i.e. that each tree of size N can be "grown" by adding an edge to a tree of size N-1. Let E be the set of edges in the graph. An algorithm could then go something like this.
T = E
n = 1
while n<N
    newT = empty set
    for each tree t in T
        for each edge e in E
            if t+e is a tree of size n+1 which is not yet in newT
                add t+e to newT
    T = newT
    n = n+1
At the end of this algorithm, T is the set of all subtrees of size N. If space is an issue, don't keep a full list of the trees, but use a compact representation, for instance implement T as a decision tree using ID3.
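A direct sketch of the growing loop, assuming each edge appears once as a (u, v) pair and N >= 1. Trees are stored as frozensets of edges so that duplicates collapse automatically, and "t+e is a tree" is tested by checking that exactly one endpoint of e is already in t. It keeps the whole set T in memory, so it does not address the space concern.

```python
def grow_trees(edges, n):
    """All subtrees with n edges, grown one edge at a time."""
    edges = [tuple(e) for e in edges]
    trees = {frozenset([e]) for e in edges}          # T = E, size 1
    for _ in range(n - 1):
        new_trees = set()
        for t in trees:
            nodes = {x for e in t for x in e}
            for u, v in edges:
                # Exactly one endpoint new: stays connected, no cycle.
                if (u in nodes) != (v in nodes):
                    new_trees.add(t | {(u, v)})
        trees = new_trees
    return trees

square = [(1, 2), (2, 3), (3, 4), (4, 1)]
print(len(grow_trees(square, 3)))  # 4
```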
I think the problem is under-specified. You mentioned that the graph is undirected and that the subgraph you are trying to find is of size N. What is missing is the number of edges and whether the trees you are looking for are binary or are allowed to be multi-way trees. Also, are you interested in mirrored reflections of the same tree, or in other words, does the order in which siblings are listed matter at all?
I will assume a single node in a tree you are trying to find is allowed to have more than 2 siblings, which should be allowed given that you don't specify any restriction on the initial graph and you mentioned that the resulting subgraph should contain all nodes.
You can enumerate all subgraphs that have the form of a tree by performing a depth-first traversal. You need to repeat the traversal of the graph for every sibling during the traversal. Then you'll need to repeat the operation with every node as a root.
Discarding symmetric trees, you will end up with
N^(N-2)
trees if your graph is a fully connected mesh; otherwise you need to apply Kirchhoff's matrix-tree theorem.
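For the counting part, here is a sketch of the matrix-tree theorem: the number of spanning trees equals the determinant of the graph Laplacian with one row and the matching column deleted. Nodes are assumed to be numbered 0..n-1, the function name is made up, and exact fractions are used so the determinant does not pick up rounding error.

```python
from fractions import Fraction

def count_spanning_trees(n, edges):
    """Number of spanning trees of an undirected graph on nodes 0..n-1."""
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:                 # Laplacian = degree matrix - adjacency
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    m = [row[:-1] for row in lap[:-1]]  # drop the last row and column
    size = n - 1
    det = Fraction(1)
    for col in range(size):             # Gaussian elimination
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0                    # singular: the graph is disconnected
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)

k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
print(count_spanning_trees(4, k4))  # 16, i.e. 4^(4-2)
```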

Resources