Explore every node in a graph - algorithm

I am given a connected graph with N nodes (numbered 1..N) and M bidirectional edges, each given as a pair (A,B). Edges are unweighted.
I have K people starting at node 1 and I want to explore every node of the graph. It takes a person one unit of time to travel from a node to one of its neighbors.
How long will it take to explore every node? I am searching for an efficient algorithm to compute the minimum traversal time, but I am afraid it is an NP-complete problem. (The constraints on the number of edges and number of people are small though).

Suppose K were 1. Then the minimisation problem reduces to finding a minimum length path that touches every node at least once.
If we construct a new weighted graph G' with the same nodes and with an edge between every two nodes whose weight is the minimum distance between those nodes in the original graph G, then the minimum length walk through all the nodes of G corresponds to the minimum length Hamiltonian path through G' starting at node 1. That is a path variant of the travelling salesperson problem, which is well known to be NP-complete.
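For small N, the K = 1 case can be solved exactly along these lines. The Python sketch below is illustrative, not taken from the answer, and the function names are made up. It builds the metric closure G' with one BFS per node. It then runs the Held-Karp bitmask DP, whose state is "which nodes have been visited so far, and where am I now". The sketch assumes 0-indexed nodes, so the question's node 1 is index 0. Its running time is exponential in N.

```python
from collections import deque


def bfs_distances(adj, src):
    """Unweighted shortest-path distances from src (adj: list of neighbour lists)."""
    dist = [-1] * len(adj)
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def min_single_walker_time(n, edges, start=0):
    """Minimum number of steps for one person starting at `start` to visit
    all n nodes of a connected graph (Held-Karp DP on the metric closure G')."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    # Metric closure G': d[u][v] = shortest-path distance in the original graph.
    d = [bfs_distances(adj, u) for u in range(n)]
    INF = float("inf")
    # dp[mask][v]: cheapest way to visit the nodes in `mask` (in some order,
    # walking shortest paths between them), ending at node v.
    dp = [[INF] * n for _ in range(1 << n)]
    dp[1 << start][start] = 0
    for mask in range(1 << n):  # supersets always come later, so this order is safe
        for v in range(n):
            cur = dp[mask][v]
            if cur == INF:
                continue
            for w in range(n):
                if not mask & (1 << w):
                    nxt = mask | (1 << w)
                    dp[nxt][w] = min(dp[nxt][w], cur + d[v][w])
    return min(dp[(1 << n) - 1])
```

On the star with centre 0 and leaves 1, 2, 3, the sketch returns 5 (0 -> 1 -> 0 -> 2 -> 0 -> 3).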
So for at least one value of K, the problem is NP-complete. More directly: one person can explore all N nodes in N-1 steps exactly when the graph has a Hamiltonian path starting at node 1, and deciding that is NP-complete. However, for large values of K (say, ≥ N), we can produce a minimum solution in much less time. We can just construct a shortest-path (BFS) tree rooted at node 1 and find the distance of the furthest node. I doubt whether there is any such simplified solution for small values of K, but I'd definitely use the MST as a heuristic for finding a reasonable solution.
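The large-K case is just a BFS from the starting node. Each person can be sent along a shortest path to a different node, and nobody can reach a node faster than its BFS distance. Here is a minimal sketch. It again assumes 0-indexed nodes with the start at index 0, and `furthest_distance` is a hypothetical helper name.

```python
from collections import deque


def furthest_distance(n, edges, start=0):
    """Largest shortest-path distance from `start` in a connected graph; with
    K >= N people this is the minimum time to explore every node."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = [-1] * n
    dist[start] = 0
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == -1:  # first time reached = shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist)
```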

To me this seems like BFS.
You can view the graph as a tree (for example a BFS tree) whose root is the starting node. From this perspective, if the number of leaves is at most the number of people, the answer is the depth of the deepest leaf (the leaf farthest from the root).
This is correct because if each person walks to a different leaf, all the nodes are visited along the way.
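Here is a rough Python sketch of this first case. It assumes the tree is a BFS tree rooted at the start node (index 0) and that nodes are 0-indexed. The helper name `leaves_bound` is made up for illustration. The function returns `None` when there are more leaves than people, which is the harder case.

```python
from collections import deque


def leaves_bound(n, edges, k, root=0):
    """Build one BFS tree rooted at `root`. If it has at most k leaves, one
    person per root-to-leaf path explores everything in time equal to the
    deepest leaf's depth; otherwise return None."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    depth = [-1] * n
    children = [0] * n  # number of BFS-tree children of each node
    depth[root] = 0
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if depth[v] == -1:
                depth[v] = depth[u] + 1
                children[u] += 1
                queue.append(v)
    leaves = [v for v in range(n) if children[v] == 0]
    if len(leaves) > k:
        return None
    return max(depth[v] for v in leaves)
```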
However, if nodes are still left unvisited after every person has visited a leaf, then you have to add to the answer the maximum time (or distance) it takes for the closest person to reach an unvisited node.
This is not the complete answer, however. It's more complicated than that.
If you have the following image:
You wouldn't want to blindly run BFS. You would want to visit the branches in order from least deep to deepest, so that you don't have to walk down and back up the deepest path. For example, 0 -> 1 -> 0 -> 2 -> 0 -> 3 -> 0 -> 4 is more efficient than 0 -> 4 -> 0 -> 3 -> 0 -> 2 -> 0 -> 1. This works because the last branch you enter is the only one you don't have to walk back out of, so you want that one to be the longest.
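For a single person on a tree, this ordering argument gives a closed form. Every edge must be walked down and back up, except the edges on the path to the node where the walk ends. Ending at the deepest node (visiting the deepest branch last) therefore costs 2(n - 1) - (maximum depth) steps. The sketch below assumes the input graph is already a tree with 0-indexed nodes rooted at 0; the function name is illustrative.

```python
def single_walker_tree_time(n, edges, root=0):
    """Minimum steps for one person starting at `root` to visit every node of
    a tree: 2 * (n - 1) minus the depth of the deepest node."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    depth = [-1] * n
    depth[root] = 0
    stack = [root]  # iterative DFS; in a tree any traversal gives the true depths
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if depth[v] == -1:
                depth[v] = depth[u] + 1
                stack.append(v)
    return 2 * (n - 1) - max(depth)
```

Suppose nodes 1 to 4 are plain leaves of 0, as in the example above. Then the sketch gives 7, which is the length of either walk listed. With subtrees of different depths, only the deepest-last order reaches the bound.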
Furthermore, it may be more efficient to have a person from a different branch visit an unvisited node, to help the person in the current branch. So you want to assign unvisited nodes to people from the surrounding branches if the time it takes that person to get to 0 is less than the time it takes the person(s) in the current branch to visit all the nodes in this branch. If one person from a branch could be assigned to several other branches, you want to pick the branch with the greatest number of unvisited nodes. This "helper" person is also why you want to visit the nodes in order from least deep to deepest, instead of just visiting the deepest node last.
All that might sound confusing, but the key is BFS. That's the solution to your problem. It's basically BFS with modifications.
As for implementation, you can use recursion (or an explicit stack), as is usual for tree traversals. Also note that when the number of unvisited nodes exceeds the number of people, you don't have to simulate travelling to the remaining unvisited nodes.

Related

Center of a graph

Given an undirected, unweighted tree with N vertices and N-1 edges, and a number K, find K nodes such that every node of the tree is within distance S of at least one of the K nodes. S must also be as small as possible: for any S' < S, at least one node would be unreachable in S' steps.
I tried solving this problem; however, I feel that my solution is not very fast.
My solution:
set x=1
for every node, find the nodes within distance x of it
let the node that covers the most nodes be one of the K nodes
recompute for every node, this time not counting nodes that are already covered
repeat until K nodes have been chosen; then, if every node is covered, we are done, else increase x
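The procedure above can be sketched in Python roughly as follows. It is a direct, unoptimized transcription. Breaking ties between equally good nodes by the smallest index is an assumption, and it also assumes K ≥ 1. Because it picks centers greedily, it is only a heuristic: it can return a larger x than the true minimum S. Take a spider with centre 0, three neighbours 1, 2, 3, and one leaf hanging off each neighbour, with K = 3. The first pick is the centre, so x = 1 fails and the sketch answers 2, although the three middle nodes achieve S = 1.

```python
from collections import deque


def greedy_radius(n, edges, k):
    """The question's heuristic: for x = 1, 2, ... pick up to k centers, each time
    the node whose radius-x ball covers the most still-uncovered nodes, and stop
    at the first x for which everything is covered. Returns (x, centers)."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    def ball(src, radius):
        # All nodes within `radius` edges of src (BFS cut off at that depth).
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            if dist[u] == radius:
                continue
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return set(dist)

    x = 1
    while True:  # terminates: in a connected tree, radius n - 1 covers everything
        balls = [ball(v, x) for v in range(n)]
        uncovered = set(range(n))
        centers = []
        for _ in range(k):
            if not uncovered:
                break
            # ties are broken by the smallest node id (an assumption)
            best = max(range(n), key=lambda v: len(balls[v] & uncovered))
            centers.append(best)
            uncovered -= balls[best]
        if not uncovered:
            return x, centers
        x += 1
```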
This problem is called p-center, and you can find several papers online about it, such as this. It is NP-hard for general graphs, but solvable in polynomial time on trees, both weighted and unweighted.
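Here is one simple polynomial approach for the unweighted tree case; it is not necessarily the method in the papers referred to. Binary-search on S. For a fixed S, place centers greedily: take the deepest node that is not yet covered and put a center at its ancestor S levels up (or at the root). This greedy uses the minimum number of centers for that S. That number can only shrink as S grows, which is what makes the binary search valid. The Python sketch below assumes 0-indexed nodes and K ≥ 1, and the function name is illustrative.

```python
from collections import deque


def tree_k_center_radius(n, edges, k):
    """Smallest S such that k centers cover every node of an unweighted tree
    within distance S (assumes k >= 1). Returns (S, centers)."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    # Root the tree at node 0; BFS order lists nodes by non-decreasing depth.
    parent = [-1] * n
    seen = [False] * n
    seen[0] = True
    order = [0]
    for u in order:  # appending while iterating turns this loop into a BFS
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                order.append(v)

    def centers_needed(s):
        near = [n + 1] * n  # distance to the nearest chosen center so far
        chosen = []
        for v in reversed(order):  # deepest nodes first
            if near[v] <= s:
                continue
            u = v
            for _ in range(s):  # climb s levels, stopping at the root
                if parent[u] == -1:
                    break
                u = parent[u]
            chosen.append(u)
            dist = {u: 0}  # BFS from the new center, up to radius s
            queue = deque([u])
            while queue:
                x = queue.popleft()
                near[x] = min(near[x], dist[x])
                if dist[x] < s:
                    for y in adj[x]:
                        if y not in dist:
                            dist[y] = dist[x] + 1
                            queue.append(y)
        return chosen

    lo, hi = 0, n - 1  # S = n - 1 always works with a single center
    while lo < hi:
        mid = (lo + hi) // 2
        if len(centers_needed(mid)) <= k:
            hi = mid
        else:
            lo = mid + 1
    return lo, centers_needed(lo)
```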
To me it looks like a clustering problem. Try the k-means (Wikipedia) algorithm with k equal to your K. Since you have a tree and all vertices are connected, you can use the distance (number of edges) between vertices as the distance measure.
When the algorithm converges, you get the K nodes you are looking for. Then you can determine S by iterating through all k clusters. For every cluster, calculate the maximum distance from any node in the cluster to its center node. The overall maximum is S.
Update: I see now that the k-means algorithm does not produce a global optimum, so this approach would not necessarily produce the best result either.
You say N vertices and N-1 edges, so your graph is a tree. You are actually looking for a connected K-subset of nodes minimizing the longest edge.
A polynomial algorithm may be:
Sort all your edges by increasing distance.
Then loop over the edges:
if neither of the 2 nodes is in a group, create a new group.
else if one node is in an existing group, add the other node to that group.
else the two nodes are in 2 different groups; fuse the groups.
When a group reaches K nodes, break the loop: you have your connected K-subset.
Nevertheless, note that your group can contain more than K nodes. Imagine 4 nodes that are close together in two pairs, with the pairs far apart. There would be no exact 3-subset solution to your problem.
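The procedure above is essentially Kruskal's algorithm with a union-find structure, stopped as soon as one group reaches K nodes. If every node starts as a group of its own, the three cases (new group, join a group, fuse two groups) all become a single union. The Python sketch below assumes edges are given as `(weight, a, b)` tuples with 0-indexed nodes, and K ≥ 2. The function name is illustrative.

```python
def first_group_of_size_k(n, weighted_edges, k):
    """Merge groups along edges in increasing weight order; stop when a group
    has at least k nodes. Returns (group nodes, weight of the last edge used)."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for w, a, b in sorted(weighted_edges):
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        if size[ra] < size[rb]:  # union by size
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]
        if size[ra] >= k:
            return [v for v in range(n) if find(v) == ra], w
    return None  # no group reaches k nodes
```

On the two-pairs example with K = 3, the first group to reach 3 nodes already has all 4 nodes.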

Build top-list from player-player duel result

I have a list of duels between players. The data consists of 2 user IDs, where the first one is the winner.
How can I build a graph of this list to find the best players?
Furthermore, how do I decide what it means to be best?
Perhaps players should be ranked by the number of opponents beaten, and the rank of those opponents (recursively).
I have previously tried doing this using the PageRank algorithm, but it does not account for losses in a good way (i.e. the rank should go down from a loss).
For example:
1 won against 3
1 won against 4
1 won against 5
2 won against 1
This list should put 2 at the top, because it beat 1.
This presents one problem - it should be required to duel those with a high rank.
Those who have not dueled players above a certain rank should be told to do so, in order to be in the top-list.
Model "player X beat player Y" as an edge from vertex Y to vertex X.
Then, after processing all game information, you may run DFS on the graph. Record in some array A the nodes from which you cannot traverse deeper, i.e. nodes with no outgoing edges: players who never lost.
You did not specify that the given graph is a tree, and the edges are directed. So there is no guarantee that DFS starting from any node will end at a single root, and you need to keep a list of all such unbeaten nodes.
Once that initial traversal is done, reverse all the edges, and run a DFS for each tree in the resulting forest, rooted at each element of A. As you traverse the tree rooted at A[i], record in each node its depth relative to the root node A[i].
From then on, depending on your definition of top players, you may traverse the roots in A and go as deep as that definition allows, picking every element you encounter. The final list may also need to order nodes descended from different roots in A. In that case, sort it using the depth as the comparison criterion. Aside from that final sort, everything so far is DFS, so this approach is O(V+E), V being the number of vertices and E the number of edges. If you include the sorting, the overall complexity is O((V+E) + V log V).
If you are willing to sacrifice a bit more performance, you may connect the roots in A to a global root R (i.e. add a node R to the graph and edges from R to each A[i]). Then run Dijkstra's algorithm, visiting the nodes with smaller depth first and appending each visited node to your list, until you consider your list large enough based on your definition of top players.
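Below is a compact variant of this idea in Python. Two substitutions are assumptions rather than the answer's exact method. First, the edges are stored directly in the reversed (winner -> loser) direction. Second, the depths come from a multi-source BFS starting at all unbeaten players, which is what Dijkstra from the global root R computes when every edge has weight 1. Like the answer, it does not handle cycles meaningfully: players not reachable from any unbeaten player are simply left out.

```python
from collections import deque


def rank_players(duels):
    """duels: list of (winner, loser). Unbeaten players get depth 0; everyone
    else gets the fewest 'beaten by' steps from an unbeaten player.
    Returns the players sorted by (depth, id)."""
    beaten = {}  # winner -> players they beat (the reversed edges)
    players, losers = set(), set()
    for winner, loser in duels:
        beaten.setdefault(winner, []).append(loser)
        players.update((winner, loser))
        losers.add(loser)
    roots = sorted(players - losers)  # the array A: players who never lost
    depth = {p: 0 for p in roots}
    queue = deque(roots)  # multi-source BFS = unit-weight Dijkstra from R
    while queue:
        p = queue.popleft()
        for q in beaten.get(p, []):
            if q not in depth:
                depth[q] = depth[p] + 1
                queue.append(q)
    return sorted(depth, key=lambda p: (depth[p], p))
```

On the example from the question, the sketch returns [2, 1, 3, 4, 5].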
Note that this solution does not work if you have cycles in the graph, regardless of whether you use DFS or Dijkstra's for the final traversal. However, it may be adapted to players having multiple matches by using edges with positive weights. An edge from X to Y (in the reversed graph) with weight k would then indicate that X defeated Y k times, which you take into account while updating the depth of each node during your DFS traversal.

how to partition the nodes of an undirected graph into k sets

I have an undirected graph G=(V,E) where each vertex represents a router in a large network. Each edge represents a network hop from one router to another; therefore, all edges have the same weight. I wish to partition this network of routers into 3 (or k) different sets, clustered by hop count.
Motivation:
The idea is to replicate some data in routers contained in each of these 3 sets. Then, whenever a node (or client or whatever) in the network graph requests a certain data item, I can search for it in the 3 sets and get a responsible node (one that has cached that particular data) from each set. Then I'd select the node at the minimum hop count from the requesting node and continue with my algorithms and tests.
The cache distribution and request response are not in the scope of this question. I just need a way to partition the network into 3 sets so that I can perform the above operations on it.
Which clustering algorithm could be used in such a scenario? I have almost 9000 nodes in the graph and I wish to get 3 sets of ~3000 nodes each.
In the graph case, a clustering method based on minimum spanning trees can be used.
The regular algorithm is the following:
Find the minimum spanning tree of the graph.
Remove the k - 1 longest edges in the spanning tree, where k is the desired number of clusters.
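For graphs whose edges do have different lengths, the two steps above can be done in one pass. Running Kruskal's algorithm and stopping when k components remain yields the same clusters as building the MST and deleting its k - 1 longest edges (up to how ties are broken). The Python sketch below assumes `(weight, a, b)` edge tuples and 0-indexed nodes. It returns one cluster label per node, and the function name is illustrative.

```python
def mst_clusters(n, weighted_edges, k):
    """Single-linkage clustering: Kruskal's algorithm, stopped when only k
    components remain. Returns a cluster label (a root id) for every node."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for w, a, b in sorted(weighted_edges):
        if components == k:  # the remaining MST edges would be the k-1 longest
            break
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
            components -= 1
    return [find(v) for v in range(n)]
```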
However, this works only if the edges differ in length (or weight). When all edges have the same length, every spanning tree is a minimum one, so this would not work well. After some thought, though, I came up with a different algorithm that uses BFS.
The algorithm:
1. for i = 1..k do // for each cluster
2. choose the number of nodes N in cluster i
3. choose an arbitrary node n
4. run breadth-first search (BFS) from n until N nodes have been reached
5. assign the first N nodes (incl. n) tapped by the BFS to the i-th cluster
6. remove these nodes (and the incident edges) from the graph
7. done
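The BFS-based steps above can be sketched in Python as follows. The root choice of step 3 is passed in as a parameter: `pick_root` is a hypothetical hook that by default picks the smallest remaining node id. The graph is an adjacency list with 0-indexed nodes. One thing the steps leave open: if the remaining nodes stop being connected, a BFS can run out before reaching N nodes. This sketch then just returns a smaller cluster.

```python
from collections import deque


def bfs_partition(adj, sizes, pick_root=min):
    """For each requested cluster size, BFS from a root among the remaining
    nodes, take the first `size` nodes reached, and remove them from the graph."""
    remaining = set(range(len(adj)))
    clusters = []
    for size in sizes:
        if not remaining:
            break
        root = pick_root(remaining)  # step 3 (hypothetical hook)
        cluster, seen = [], {root}
        queue = deque([root])
        while queue and len(cluster) < size:
            u = queue.popleft()
            cluster.append(u)
            for v in adj[u]:  # only walk through nodes not yet assigned
                if v in remaining and v not in seen:
                    seen.add(v)
                    queue.append(v)
        remaining -= set(cluster)  # step 6: remove these nodes
        clusters.append(cluster)
    return clusters
```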
The results of this algorithm depend heavily on how step 3 (choosing the "root" node of a cluster) is implemented. For the sake of simplicity I choose an arbitrary node, but it could be more sophisticated. The best nodes are those at the "ends" of the graph. You could find a center of the graph (a node that has the lowest sum of path lengths to all other nodes) and then use the nodes that are furthest from this center.
The real issue is that your edges are equal (if I understood your problem statement well) and you have no information about the nodes (i.e. their coordinates - then you could use e.g. k-means).

How to detect if the given graph has a cycle containing all of its nodes? Does the suggested algorithm have any flaws?

I have a connected, undirected graph with N nodes and 2N-3 edges. You can think of the graph as built on top of an initial graph, which has 3 nodes and 3 edges. Every node added to the graph has 2 connections to nodes already in the graph. When all nodes are added (N-3 nodes in total), the final graph is constructed.
Originally I'm asked: what is the maximum number of nodes in this graph that can be visited exactly once (except for the initial node)? In other words, what is the maximum number of nodes contained in the largest Hamiltonian path of the given graph? (Okay, "largest Hamiltonian path" is not a valid phrase, but given the question's nature, I need to find the maximum number of nodes that are visited once on a trip that ends at the initial node. I thought it could be considered as a Hamiltonian subgraph with the maximum number of nodes, hence the "largest possible Hamiltonian path".)
Since I'm not asked to find a path, I should first check whether a Hamiltonian path exists for the given number of nodes. I know that planar graphs and cycle graphs (Cn) are Hamiltonian graphs. I also know Ore's theorem for Hamiltonian graphs, but the graph I will be working on will very probably not be dense, which makes Ore's theorem pretty much useless in my case. Therefore I need an algorithm that checks whether the graph is a cycle graph, i.e. whether there exists a cycle that contains all the nodes of the given graph.
Since DFS is used for detecting cycles, I thought some minor modification of DFS could help me detect what I am looking for. For example, I could keep track of explored nodes and finally check whether the last node visited has a connection to the initial node. Unfortunately, I could not succeed with that approach.
Another approach I tried was excluding a node and then trying to reach one of its adjacent nodes, starting from another of its adjacent nodes. That algorithm may not give correct results, depending on which adjacent nodes are chosen.
I'm pretty much stuck here. Can you help me think of another algorithm to tell me if the graph is a cycle graph?
Edit
I realized with the help of this comment (thank you, n.m.):
A cycle graph consists of a single cycle and has N edges and N vertices. If there exist a cycle which contains all the nodes of the given graph, that's a Hamiltonian cycle. – n.m.
that I am actually searching for a Hamiltonian cycle, which I did not intend to do :)
On second thought, I think checking the Hamiltonian property of the graph while building it will be more efficient, and time efficiency is what I'm after.
After some thinking, I thought whatever the number of nodes will be, the graph seems to be Hamiltonian due to node addition criteria. The problem is I can't be sure and I can't prove it. Does adding nodes in that fashion, i.e. adding new nodes with 2 edges which connect the added node to the existing nodes, alter the Hamiltonian property of the graph? If it doesn't alter the Hamiltonian property, how so? If it does alter, again, how so? Thanks.
EDIT #2
I, again, realized that building the graph the way I described might alter the Hamiltonian property. Consider an input given as follows:
1 3
2 3
1 5
1 3
This input says that the 4th node is connected to node 1 and node 3, the 5th to node 2 and node 3, and so on.
The 4th and 7th nodes are connected to the same nodes, which lowers the maximum number of nodes that can be visited exactly once by 1. I can detect these collisions and lower the maximum number of nodes accordingly, starting from N. That does NOT include an input such as 3 3, which is an example you suggested, since the problem states that each newly added node is connected to 2 other nodes. I believe this way I can get the right result.
See, I do not choose the connections, they are given to me and I have to find the max. number of nodes.
I think counting the same connections while building the graph and subtracting the number of same connections from N will give the right result? Can you confirm this or is there a flaw with this algorithm?
What we have in this problem is a connected, non-directed graph with N nodes and 2N-3 edges. Consider the graph given below,
A
/ \
B _ C
( )
D
This graph does not have a Hamiltonian cycle, yet it is constructed according to your rules for adding nodes. So searching for a Hamiltonian cycle may not give you the solution. Moreover, even where it is possible, Hamiltonian cycle detection is an NP-complete problem with O(2^N) complexity, so the approach may not be ideal.
What I suggest is to use a modified version of Floyd's Cycle Finding algorithm (Also called the Tortoise and Hare Algorithm).
The modified algorithm is,
1. Initialize a List CYC_LIST to ∅.
2. Add the root node to the list CYC_LIST and set it as unvisited.
3. Call the function Floyd() twice with the unvisited node in the list CYC_LIST for each of the two edges. Mark the node as visited.
4. Add all the previously unvisited vertices traversed by the Tortoise pointer to the list CYC_LIST.
5. Repeat steps 3 and 4 until no more unvisited nodes remains in the list.
6. If the list CYC_LIST contains N nodes, then the Graph contains a Cycle involving all the nodes.
The algorithm calls Floyd's cycle-finding algorithm at most 2N times. Floyd's cycle-finding algorithm takes linear time (O(N)), so the complexity of the modified algorithm is O(N^2). That is much better than the exponential time taken by the Hamiltonian cycle based approach.
One possible problem with this approach is that it will detect closed paths along with cycles unless stricter checking criteria are implemented.
Reply to Edit #2
Consider the Graph given below,
A------------\
/ \ \
B _ C \
|\ /| \
| D | F
\ / /
\ / /
E------------/
According to your algorithm this graph does not have a cycle containing all the nodes.
But there is a cycle in this graph containing all the nodes.
A-B-D-C-E-F-A
So I think there is some flaw in your approach. But if your algorithm were correct, it would be far better than mine, since mine takes O(n^2) time, whereas yours takes just O(n).
To add some clarification to this thread: finding a Hamiltonian Cycle is NP-complete, which implies that finding a longest cycle is also NP-complete because if we can find a longest cycle in any graph, we can find the Hamiltonian cycle of the subgraph induced by the vertices that lie on that cycle. (See also for example this paper regarding the longest cycle problem)
We can't use Dirac's criterion here. Dirac only tells us that minimum degree >= n/2 implies a Hamiltonian cycle, which is the implication in the opposite direction from what we would need. The converse is definitely wrong: take a cycle over n vertices. Every vertex in it has degree exactly 2, no matter the size of the cycle, but it has (is) an HC. What you can tell from Dirac is that no Hamiltonian cycle implies minimum degree < n/2. That is of no use here, since we don't know whether our graph has an HC or not, so we can't use the implication. Besides, every graph constructed as the OP described has a vertex of degree 2, namely the last vertex added, so for arbitrary n the minimum degree is 2.
The problem is that you can construct both graphs of arbitrary size that have an HC and graphs of arbitrary size that do not have an HC. For the first part: if the original triangle is A,B,C and the vertices added are numbered 1 to k, then connect the 1st added vertex to A and C and the k+1-th vertex to A and the k-th vertex for all k >= 1. The cycle is A,B,C,1,2,...,k,A. For the second part, connect both vertices 1 and 2 to A and B; that graph does not have an HC.
What is also important to note is that the property of having an HC can change from one added vertex to the next during construction. You can both create and destroy the HC property when you add a vertex, so you would have to check for it every time you add a vertex. A simple example: take the graph after the 1st vertex was added, and add a second vertex along with edges to the same two vertices of the triangle that the 1st vertex was connected to. This turns a graph with an HC into a graph without an HC. The other way around: now add a 3rd vertex and connect it to 1 and 2; this turns a graph without an HC into a graph with an HC.
Storing the last known HC during construction doesn't really help you because it may change completely. You could have an HC after the 20th vertex was added, then not have one for k in [21,2000], and have one again for the 2001st vertex added. Most likely the HC you had on 23 vertices will not help you a lot.
If you want to figure out how to solve this problem efficiently, you'll have to find criteria that work for all your graphs that can be checked for efficiently. Otherwise, your problem doesn't appear to me to be simpler than the Hamiltonian Cycle problem is in the general case, so you might be able to adjust one of the algorithms used for that problem to your variant of it.
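To experiment with small instances of this construction, an exact exponential check is enough. Below is a Python sketch of the standard bitmask DP for Hamiltonian cycles. It runs in O(2^n n^2) time, so it is only for small n, and the function name is illustrative. `reach[mask]` records at which vertices a path can end if it starts at vertex 0 and visits exactly the vertices in `mask`.

```python
def has_hamiltonian_cycle(n, edges):
    """Exact Hamiltonian-cycle check by bitmask DP (small n only)."""
    if n < 3:
        return False
    adj = [0] * n  # adj[v]: bitmask of the neighbours of v
    for a, b in edges:
        adj[a] |= 1 << b
        adj[b] |= 1 << a
    reach = [0] * (1 << n)
    reach[1] = 1  # the path consisting of vertex 0 alone
    for mask in range(1, 1 << n, 2):  # only masks that contain vertex 0
        ends = reach[mask]
        for v in range(n):
            if (ends >> v) & 1:
                free = adj[v] & ~mask  # unvisited neighbours of the path end
                while free:
                    w = (free & -free).bit_length() - 1  # lowest set bit
                    free &= free - 1
                    reach[mask | (1 << w)] |= 1 << w
    # A Hamiltonian path from 0 that ends next to 0 closes into a cycle.
    return (reach[(1 << n) - 1] & adj[0]) != 0
```

Number A, B, C as vertices 0, 1, 2 and the added vertices from 3. The sketch then confirms the examples above. Joining vertices 3 and 4 both to A and B gives no HC. Adding vertex 5 joined to 3 and 4 restores one (C, A, 3, 5, 4, B, C).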
Below I have added three extra nodes (3,4,5) in the original graph and it does seem like I can keep adding new nodes indefinitely while keeping the property of Hamiltonian cycle. For the below graph the cycle would be 0-1-3-5-4-2-0
  1---3---5
 / \ / \ /
0---2---4
As there were no extra restrictions on how a new node with two edges can be added, I think that by construction you can have a graph that keeps the Hamiltonian cycle property.

Spanning tree that minimizes a dynamic 'metric'

Let us have a graph. When we remove an edge, 2 'cars' are created, one at each vertex of the edge. When these 2 cars meet, they stop. The problem is to create a spanning tree such that the sum of the numbers of cars that pass through each vertex is minimal.
The added difficulty is that if n cars pass through a vertex, the cost is K^n and not n*K.
Some thoughts: we could find the shortest chordless cycles as a start. But the position of those chordless cycles (i.e. whether they touch each other) changes the metric, and thus what the shortest cycle is.
This is not a minimum spanning tree problem. I want to solve this because each car represents a variable, and the spanning tree is the most efficient way to compute an optimization problem. When 2 cars from the same edge meet and stop, I get a reduction of one variable from the optimization.
edit:
The process is like this. We remove a number of edges to make the graph a spanning tree. Each removed edge creates 2 cars, one at each vertex of the removed edge, that need to meet each other. We fix a path for each of those twin cars. We then check how many cars (from all the edges that we removed) pass through each vertex. If the number of cars that pass through a vertex is n, the cost is K^n, where K is a constant. We then add up all the costs, and that is the global cost that needs to be minimized.
Please tell me if something is unclear.
https://mathoverflow.net/questions/86301/spanning-tree-that-minimizes-a-dynamic-metric
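Here's one insight: cars never cross an articulation point into a different block. So you can break the graph up into its blocks (biconnected components) and solve for each block separately. The minimum overall cost is then the sum of the minimum costs for each block, except that an articulation point shared by several blocks collects cars from all of them.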
To find the minimum cost for a block - you could enumerate all the spanning trees for that block and calculate the cost for each one - a brute force approach, but it should work.
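Below is a brute-force version of this in Python. It rests on assumptions the question leaves open. Each removed edge (u, v) sends its cars along the unique path between u and v in the spanning tree, and every vertex on that path (endpoints included) is counted as passed by one car. The cost sums K**n over all vertices; vertices with no cars contribute K**0 = 1, which is the same for every tree. The sketch assumes a simple, connected graph with 0-indexed vertices. It enumerates every (N-1)-edge subset, so it is only feasible for very small blocks. The function name is illustrative.

```python
from collections import deque
from itertools import combinations


def min_car_cost(n, edges, K):
    """Try every spanning tree of a small connected graph and return
    (minimum cost, tree edges). ASSUMPTION: each removed edge (u, v) adds one
    car to every vertex on the tree path u..v (endpoints included)."""
    best_cost, best_tree = None, None
    for tree in combinations(edges, n - 1):
        adj = [[] for _ in range(n)]
        for a, b in tree:
            adj[a].append(b)
            adj[b].append(a)
        # BFS from vertex 0: parents/depths, plus a connectivity check
        # (n - 1 edges that connect all n vertices form a spanning tree).
        parent, depth = [-1] * n, [0] * n
        seen = [False] * n
        seen[0] = True
        queue = deque([0])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    parent[v], depth[v] = u, depth[u] + 1
                    queue.append(v)
        if not all(seen):
            continue
        cars = [0] * n
        in_tree = set(tree)
        for u, v in edges:
            if (u, v) in in_tree:
                continue
            # Walk both ends up to their lowest common ancestor,
            # counting every vertex on the tree path once.
            while u != v:
                if depth[u] < depth[v]:
                    u, v = v, u
                cars[u] += 1
                u = parent[u]
            cars[u] += 1  # the common ancestor itself
        cost = sum(K ** c for c in cars)
        if best_cost is None or cost < best_cost:
            best_cost, best_tree = cost, tree
    return best_cost, best_tree
```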
