Algorithm to Maximize Degree Centrality of Subgraph - algorithm

Say I have some graph with nodes and undirected edges (the edges may have a weight associated to them).
I want to find all (or at least one) connected subgraphs that maximize the sum of the degree centrality of all nodes in the subgraph (the degree centrality is based on the original graph) under the constraint that the sum of the weighted edges is < X.
Is there an algorithm that will do this?

A quick search took me to this description of degree centrality. It turns out that the "degree centrality" of a vertex is simply its degree (neighbour count).
Unfortunately your problem is NP-hard, so it's very unlikely that any algorithm exists that can solve every instance quickly. First notice that, assuming edge weights are positive, the edges in any optimal solution necessarily form a tree, since in any non-tree you can delete at least 1 edge without destroying connectivity, and doing so will decrease the total edge weight of the subgraph. (So, as a positive spinoff: If you compute the minimum spanning tree of your input graph and find that it happens to have total weight < X, then you can simply include every vertex in the graph in your solution.)
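The "positive spinoff" can be checked directly. The following is a minimal sketch, assuming vertices are numbered 0..n-1 and edges are given as (u, v, weight) triples; the function name mst_weight and the sample graph are made up for illustration.

```python
def mst_weight(n, edges):
    """Total weight of a minimum spanning tree (Kruskal), or None if the
    graph on vertices 0..n-1 is not connected."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, used = 0, 0
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:          # edge joins two different components
            parent[ru] = rv
            total += w
            used += 1
    return total if used == n - 1 else None


# Hypothetical instance: 4 vertices, budget X = 7.
edges = [(0, 1, 2), (1, 2, 3), (0, 2, 10), (2, 3, 1)]
X = 7
w = mst_weight(4, edges)
take_everything = w is not None and w < X
print(w, take_everything)
```

If take_everything is true, the whole vertex set is an optimal answer and no search is needed; otherwise the hardness argument below applies.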
Let's formulate a decision version of your problem. Given a graph G = (V, E) with positive (I'll assume) weights on the edges, a number X and a number Y, we want to know: Does there exist a connected subgraph G' = (V', E') of G such that the sum of the edge weights in E' is at most X, and the sum of the degrees of V' (w.r.t. G) is at least Y? (Clearly this is no harder than your original problem: If you had an algorithm to solve your problem, then you could just run it, add up the degrees of the vertices in the subgraph it found and compare this to Y to answer "my" problem.)
Here's a reduction from the NP-hard Steiner Tree in Graphs problem, where we are given a graph G = (V, E) with positive weights on the edges, a subset S of its vertices, and a number k, and the task is to determine whether it's possible to connect the vertices in S using a subset of edges with total weight at most k. (As I showed above, the solution will necessarily be a tree.) If the sum of all degrees in G is d, then all we need to do to transform G into an input for your problem is the following: For each vertex s_i in S we add enough new "ballast" vertices that are each connected only to s_i, via edges with weight X+1, to bring the degree of s_i up to d+1. We set X to k, and set Y to |S|(d+1).
Now suppose that the solution to the Steiner Tree problem is YES -- that is, there exists a subset of edges having total weight <= k that does connect all the vertices in S. In that case, it's clear that the same subgraph in the instance of your problem constructed above connects (possibly among others) all the vertices in S, and since each vertex in S has degree d+1, the total degree is at least |S|(d+1), so the answer to your decision problem is also YES.
In the other direction, suppose that the answer to your decision problem is YES -- that is, there exists a subset of edges having total weight <= X ( = k) that connects a set of vertices having total degree at least |S|(d+1). We need to show that this implies a YES answer to the original Steiner Tree problem. Clearly it suffices to show that the vertex set V' of any subgraph satisfying the conditions above (i.e. edges have total weight <= k and vertices have total degree >= |S|(d+1)) contains S (possibly among other vertices). So let V' be the vertex set of such a solution, and suppose to the contrary that there is some vertex u in S that is not in V'. But then the largest sum of degrees that we could possibly make would be to include all other non-ballast vertices in the graph in V', which would give a degree total of at most (|S|-1)(d+1) + d (the first term is the degree sum for the other vertices in S; the second is an upper bound on the degree sum of all non-S vertices in G; note that none of the ballast vertices we added in could be in the subgraph, because the only way to include any of them is to use an edge of weight X+1, which we obviously can't do). But clearly (|S|-1)(d+1) + d = |S|(d+1) - 1, which is strictly less than |S|(d+1), contradicting our assumption that V' has a degree total at least |S|(d+1). So it follows that S is a subset of V', and thus that it is possible to use the same subset of edges to connect the vertices in S for a total weight of at most k, i.e. that the answer to the Steiner Tree problem is also YES.
So a YES answer to either problem implies a YES answer to the other one, in turn implying that a NO answer to either implies a NO answer to the other. Thus if it were possible to solve the decision version of your problem in polynomial time, it would imply a polynomial-time solution to the NP-hard Steiner Tree in Graphs problem. This means the decision version of your problem is itself NP-hard, and so is the optimisation version (which as I said above is at least as hard). (The decision form is also NP-complete, since a YES answer can be easily verified in polynomial time.)
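To make the construction concrete, here is a small sketch of the transformation from a Steiner Tree instance (edges, S, k) to an instance (edges', X, Y) of the decision problem. The function name, the edge-list format and the ("ballast", s, i) labels are assumptions chosen for illustration.

```python
from collections import Counter


def steiner_to_degree_instance(edges, S, k):
    """edges: list of (u, v, weight); S: terminal vertices; k: weight budget.
    Returns (new_edges, X, Y) as described in the reduction."""
    deg = Counter()
    for u, v, _ in edges:
        deg[u] += 1
        deg[v] += 1
    d = sum(deg.values())          # sum of all degrees in G
    X = k
    new_edges = list(edges)
    for s in S:
        # Attach ballast leaves until s has degree d + 1.
        for i in range(d + 1 - deg[s]):
            new_edges.append((s, ("ballast", s, i), X + 1))
    Y = len(S) * (d + 1)
    return new_edges, X, Y


# Hypothetical Steiner instance: triangle a-b-c, terminals a and c, k = 3.
g = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5)]
new_edges, X, Y = steiner_to_degree_instance(g, ["a", "c"], 3)
print(len(new_edges), X, Y)
```

The number of added vertices is at most |S|(d+1), so the construction is polynomial in the size of the input.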
Sidenote: At first I thought I had a very straightforward reduction from the NP-hard Knapsack problem: Given a list of n weights w_1, ..., w_n and a list of n profits p_1, ..., p_n, make a single central vertex c, and n other vertices v_1, ..., v_n. For each v_i, attach it to c with an edge of weight w_i, and add p_i other leaf vertices, each attached only to v_i with an edge of weight X+1. However this reduction doesn't actually work, because the profits can be exponential in the input size n, meaning that the constructed instance of your problem might need to have an exponential number of vertices, which isn't allowed for a polynomial-time reduction.

Related

Backtrack after running Johnson algorithm

I have a question which I was asked in some past exams at my school and I can't find an answer to it.
Is it possible knowing the final matrix after running the Johnson Algorithm on a graph, to know if it previously had negative cycles or not? Why?
Johnson Algorithm
Johnson's Algorithm is a technique for computing shortest paths between all pairs of vertices in a graph. It can handle negative edge weights, as long as there is no cycle with negative total weight.
The algorithm consists of (from Wikipedia):
First, a new node q is added to the graph, connected by zero-weight edges to each of the other nodes.
Second, the Bellman–Ford algorithm is used, starting from the new vertex q, to find for each vertex v the minimum weight h(v) of a path from q to v. If this step detects a negative cycle, the algorithm is terminated.
Next the edges of the original graph are reweighted using the values computed by the Bellman–Ford algorithm: an edge from u to v, having length w(u, v), is given the new length w(u,v) + h(u) − h(v).
Finally, q is removed, and Dijkstra's algorithm is used to find the shortest paths from each node s to every other vertex in the reweighted graph.
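The four steps can be sketched as follows. This is only an illustrative implementation under assumptions: the graph is given as a vertex list plus directed (u, v, w) triples, the function name johnson is made up, and the virtual node q is represented implicitly by starting every h(v) at 0.

```python
import heapq
from itertools import count


def johnson(vertices, edges):
    """Return dist[s][t] for all reachable pairs, or None on a negative cycle."""
    vertices = list(vertices)
    if not vertices:
        return {}
    # Steps 1-2: Bellman-Ford from q. h[v] = 0 is the state after relaxing
    # the zero-weight edges q -> v.
    h = {v: 0 for v in vertices}
    for _ in range(len(vertices)):
        changed = False
        for u, v, w in edges:
            if h[u] + w < h[v]:
                h[v] = h[u] + w
                changed = True
        if not changed:
            break
    else:
        return None  # still improving after |V| passes: negative cycle

    # Step 3: reweight every edge to w + h(u) - h(v), which is >= 0.
    adj = {v: [] for v in vertices}
    for u, v, w in edges:
        adj[u].append((v, w + h[u] - h[v]))

    # Step 4: Dijkstra from every source, then undo the reweighting.
    tie = count()  # tie-breaker so the heap never compares vertex labels
    dist = {}
    for s in vertices:
        d, done = {s: 0}, set()
        heap = [(0, next(tie), s)]
        while heap:
            du, _, u = heapq.heappop(heap)
            if u in done:
                continue
            done.add(u)
            for v, w in adj[u]:
                if v not in d or du + w < d[v]:
                    d[v] = du + w
                    heapq.heappush(heap, (d[v], next(tie), v))
        dist[s] = {t: d[t] - h[s] + h[t] for t in d}
    return dist


result = johnson("abc", [("a", "b", 2), ("b", "c", -3), ("a", "c", 1)])
print(result["a"]["c"], result["b"]["c"])
```

The returned matrix holds distances in the original weights, which is why negative entries can appear in it even though Dijkstra only ever sees non-negative weights.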
If I understood your question correctly, it should have been phrased as follows:
Is it possible knowing the final pair-wise distances matrix after running the Johnson Algorithm on a graph, to know if it originally had any negative-weight edges or not? Why?
As others commented here, we must first assume the graph has no negative-weight cycles, since otherwise the Johnson algorithm halts and returns False (due to the internal Bellman-Ford detection of negative-weight cycles).
The answer then is that if any negative-weight edge e = (u, v) exists in the graph, then the shortest weighted distance u --> v must be < 0 (since in the worst case you can travel the negative edge e itself between those vertices). Conversely, if every edge weight is non-negative, then no path can have negative total weight.
Therefore, at least one of the edges had negative weight in the original graph iff any value in the final pair-wise distance matrix is < 0.
If the question is supposed to be interpreted as:
Is it possible, knowing the updated non-negative edge weights after running the Johnson Algorithm on a graph, to know if it originally had any negative-weight edges or not? Why?
Then no, you can't tell.
Running the Johnson algorithm on a graph that has only non-negative edge weights will leave the weights unchanged. This is because all of the shortest distances q -> v will be 0. Therefore, given the edge weights after running Johnson, the initial weights could have been exactly the same.
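A tiny experiment illustrates why the reweighted edges alone cannot reveal the answer. The sketch below assumes a graph without negative cycles, given as (u, v, w) triples; the helper name reweight and both sample graphs are made up.

```python
def reweight(vertices, edges):
    """Johnson's reweighting step only: returns edges with w + h(u) - h(v).
    Assumes there is no negative cycle."""
    h = {v: 0 for v in vertices}       # distances from the virtual node q
    for _ in range(len(h)):            # enough Bellman-Ford passes
        for u, v, w in edges:
            if h[u] + w < h[v]:
                h[v] = h[u] + w
    return [(u, v, w + h[u] - h[v]) for u, v, w in edges]


with_negative = [("a", "b", 2), ("b", "c", -3), ("a", "c", 1)]
non_negative = [("a", "b", 2), ("b", "c", 0), ("a", "c", 4)]

print(reweight("abc", with_negative))
print(reweight("abc", non_negative))
```

Both calls produce the same list of reweighted edges, so two different original graphs (one with a negative edge, one without) are indistinguishable after this step.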

Find Path of a Specific Weight in a Weighted DAG

Given a DAG where all edges have a positive weight, and given a value N:
Is there an algorithm to find a simple path (no cycles or repeated nodes) with total weight exactly N?
I am aware of the algorithm for finding a path of a given length (number of edges), but I am somewhat confused about how to do this for a given path weight.
Can Dijkstra be modified for this case? Or anything else?
This is NP-complete, so don't expect any reasonably fast (polynomial-time) algorithm. Here's a reduction from the NP-complete Subset Sum problem, where we are given a multiset of n integers X = {x_1, x_2, ..., x_n} and a number k, and asked if there is any submultiset of the n numbers that sum to exactly k:
Create a graph G with n+1 vertices v_1, v_2, ..., v_{n+1}. For each vertex v_i, add edges to every higher-numbered vertex v_j, and give all these edges weight x_i. This graph has O(n^2) edges and can be constructed in O(n^2) time. Clearly it contains no cycles.
Suppose the answer to the Subset Sum problem is YES: That is, there exists a submultiset Y of X such that the numbers in Y total to exactly k. Actually, let Y = {y_1, y_2, ..., y_m} consist of the m <= n indices 1 <= i <= n of the selected elements of X. Then there is a corresponding path in the graph G with exactly the same weight -- namely the path that starts at v_{y_1}, takes the edge to v_{y_2} (which is of weight x_{y_1}), then takes the edge to v_{y_3}, and so on, finally arriving at v_{y_m} and taking a final edge (which is of weight x_{y_m}) to the terminal vertex v_{n+1}.
In the other direction, suppose that there is a simple path in G of total weight exactly k. Since the path is simple, each vertex appears at most once. Thus each edge in the path leaves a unique vertex. For each vertex v_i in the path except the last, add x_i to a set of chosen numbers: these numbers correspond to the edge weights in the path, so clearly they sum to exactly k, implying that the solution to the Subset Sum problem is also YES. (Notice that the position of the final vertex in the path doesn't matter, since we only care about the vertex that it leaves, and all edges leaving a vertex have the same weight.)
A YES answer to either problem implies a YES answer to the other problem, so a NO answer to either problem implies a NO answer to the other problem. Thus the answer to any Subset Sum problem instance can be found by first constructing the specified instance of your problem in polynomial time, and then using any algorithm for your problem to solve that instance -- so if an algorithm exists that can solve any instance of your problem in polynomial time, the NP-hard Subset Sum problem can also be solved in polynomial time.
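The reduction is easy to try out on small inputs. Below is a rough sketch, assuming positive integers and 0-based vertex numbers (vertex n plays the role of v_{n+1}); the function names are made up, and the path search is a plain exponential-time DFS, not a clever algorithm.

```python
def subset_sum_to_dag(xs):
    """Vertex i has an edge of weight xs[i] to every higher-numbered vertex;
    vertex len(xs) is the terminal vertex with no outgoing edges."""
    n = len(xs)
    return {i: [(j, xs[i]) for j in range(i + 1, n + 1)] for i in range(n + 1)}


def has_path_of_weight(dag, target):
    """Exhaustive search for a path of total weight exactly target
    (assumes positive weights and target > 0)."""
    def dfs(u, remaining):
        if remaining == 0:
            return True
        return any(w <= remaining and dfs(v, remaining - w) for v, w in dag[u])

    return any(dfs(start, target) for start in dag)


dag = subset_sum_to_dag([3, 5, 8])
print(has_path_of_weight(dag, 11))   # 3 + 8
print(has_path_of_weight(dag, 4))    # no subset sums to 4
```

Every path in the DAG is automatically simple, so the search does not need to track visited vertices.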

Finding MST such that a specific vertex has a minimum degree

Given an undirected, connected graph G=(V,E), a vertex v in V(G), and a weight function f:E->R+ (positive real numbers), I need to find an MST such that v's degree is minimal. I've already noticed that if all the edges have unique weights, the MST is unique, so I believe it has something to do with repeated weights on edges. I thought about running Kruskal's algorithm, but when sorting the edges, I'll always place edges incident to v last among edges of equal weight. For example, if (a,b),(c,d),(v,e) are the only edges of weight k, then the possible orderings of these edges in the sorted edge array are {(a,b),(c,d),(v,e)} or {(c,d),(a,b),(v,e)}. I've run this variation on several graphs and it seems to work, but I couldn't prove it. Does anyone know how to prove the algorithm is correct (meaning that v's degree is minimal), or give a counterexample where the algorithm fails?
First note that Kruskal's algorithm can be applied to any weighted graph, whether or not it is connected. In general it results in a minimum-weight spanning forest (MSF), with one MST for each connected component. To prove that your modification of Kruskal's algorithm succeeds in finding the MST for which v has minimal degree, it helps to prove the slightly stronger result that if you apply your algorithm to a possibly disconnected graph then it succeeds in finding the MSF where the degree of v is minimized.
The proof is by induction on the number, k, of distinct weights.
Basis Case (k = 1). In this case weights can be ignored and we are trying to find a spanning forest in which the degree of v is minimized. In this case, your algorithm can be described as follows: pick edges for as long as possible according to the following two rules:
1) No selected edge forms a cycle with previously selected edges
2) An edge involving v isn't selected unless every remaining edge which doesn't involve v violates rule 1.
Let G' denote the graph obtained from G by removing v and all its incident edges. It is easy to see that the algorithm in this special case works as follows. It starts by creating a spanning forest for G'. Then it takes those trees in the forest that are contained in v's connected component in the original graph G and connects each of them to v by a single edge. Since the trees connected to v in the second stage can be connected to each other in no other way (if any connecting edge not involving v existed, it would have been selected first under rule 2), it is easy to see that the degree of v is minimal.
Inductive Case: Suppose that the result is true for k, and that G is a weighted graph with k+1 distinct weights and v is a specified vertex in G. Sort the distinct weights in increasing order (so that weight k+1 is the largest of the distinct weights -- say w_{k+1}). Let G' be the sub-graph of G with the same vertex set but with all edges of weight w_{k+1} removed. Since the edges are sorted in order of increasing weight, note that the modified Kruskal's algorithm in effect starts by applying itself to G'. Thus, by the induction hypothesis, prior to considering edges of weight w_{k+1} the algorithm has succeeded in constructing an MSF F' of G' for which the degree d' of v in F' is minimized.
As a final step, modified Kruskal's applied to the overall graph G will merge certain of the trees in F' together by adding edges of weight w_{k+1}. One way to conceptualize the final step is to think of F' as a graph whose nodes are the trees, where two trees are connected exactly when there is an edge of weight w_{k+1} from some node in the first tree to some node in the second tree. We have (almost) the basis case with F'. Modified Kruskal's will add edges of weight w_{k+1} until it can't do so anymore -- and won't add an edge connecting to v unless there is no other way to connect two trees in F' that need to be connected to get a spanning forest for the original graph G.
The final degree of v in the resulting MSF is d = d'+d" where d" is the number of edges of weight w_{k+1} incident to v that are added at the final step. Neither d' nor d" can be made any smaller, hence it follows that d can't be made any smaller (since the degree of v in any spanning forest can be written as the sum of the number of edges whose weight is less than w_{k+1} coming into v and the number of edges of weight w_{k+1} coming into v).
QED.
There is still an element of hand-waving in this, especially with the final step -- but Stack Overflow isn't a peer-reviewed journal. Anyway, the overall logic should be clear enough.
One final remark -- it seems fairly clear that Prim's algorithm can be similarly modified for this problem. Have you looked into that?
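For reference, the modified Kruskal's algorithm discussed in this thread might look like the sketch below. It assumes vertices numbered 0..n-1 and (a, b, weight) edge triples; the function name and the sample graph are invented for illustration.

```python
def mst_min_degree_at(n, edges, v):
    """Kruskal's algorithm where, among edges of equal weight, edges
    incident to v are considered last. Returns the chosen tree edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Sort key: weight first, then False (not incident to v) before True.
    order = sorted(edges, key=lambda e: (e[2], v in (e[0], e[1])))
    tree = []
    for a, b, w in order:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree.append((a, b, w))
    return tree


# All weights equal, so every spanning tree is an MST; v = 0.
sample = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (1, 2, 1), (2, 3, 1)]
tree = mst_min_degree_at(4, sample, 0)
degree_of_v = sum(1 for a, b, _ in tree if 0 in (a, b))
print(tree, degree_of_v)
```

On this sample, unmodified Kruskal's with the edges in the listed order would pick all three edges at vertex 0, whereas the tie-breaking rule leaves vertex 0 with a single edge.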

Minimal spanning tree with degree constraint

I have to solve this problem:
Given a weighted connected undirected graph G=(V,E) and a vertex u in V, describe an algorithm that finds an MST for G such that the degree of u is minimal; that is, the output T of the algorithm is an MST, and for every other minimum spanning tree T' the degree of u in T is less than or equal to the degree of u in T'.
I thought about this algorithm (after some googling I found this solution for similar problem here):
Temporarily delete vertex u.
For each of the resulting connected components C1,…,Cm find a MST using e.g. Kruskal's or Prim's algorithm.
Re-add vertex u and for each Ci add the cheapest edge between u and Ci.
EDIT:
I realized that this algorithm may produce a wrong MST (see @AndyG's comment), so I thought of another one:
Let k be the minimal difference between any two distinct weights in G, and add some x with 0 < x < k to the weight of each edge incident to u (e.g. if all the weights are natural numbers, then k=1 and x is a fraction).
Find an MST using Kruskal's algorithm.
This solution is based on the fact that Kruskal's algorithm iterates over the edges ordered by weight, so the only difference between the MSTs of G is which edges were chosen from among all edges of the same weight. Therefore, if we increase the weight of the edges incident to u, the algorithm will prefer other edges of the same original weight over the edges incident to u, unless such an edge is necessary for the MST, and so the degree of u will be minimal among all the MSTs of G.
I still don't know if it works and how to prove the correctness of this algorithm.
I will appreciate any help.
To summarize the suggested algorithm [with tightened requirements on epsilon (which you called x)]:
Pick a tiny epsilon (such that epsilon * deg(u) is less than d, the smallest non-zero weight difference between any pair of subgraphs). In the case all the original weights are natural numbers, epsilon = 1/(deg(u)+1) suffices.
Add the epsilon to the weights of all edges incident to u
Find a minimal spanning tree.
We'll prove that this procedure finds an MST of the original graph that minimizes the number of edges incident to u.
Let W be the weight of any minimal spanning tree in the original graph.
First, we'll show every MST of the new graph is an MST of the original graph. Any non-MST in the original graph must have weight at least W + d. Any MST in the new graph must have weight at most W + deg(u)*epsilon (since any MST in the original graph has at most this weight in the new graph). Since we chose epsilon such that deg(u)*epsilon < d, we conclude that any MST in the new graph is also an MST in the original graph.
Second, we'll show that the MST of the new graph is the MST of the original graph that minimizes the number of edges incident to u. An MST, T, of the original graph has weight W + k * epsilon in the new graph, where k is the number of edges of T incident to u. We've already shown that every MST of the new graph is also an MST of the original graph. Therefore, the MST of the new graph is the MST of the original graph that minimizes k (the number of edges incident to u).
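A sketch of the epsilon procedure for the natural-number case is given below. It is an illustration under assumptions: integer weights, vertices numbered 0..n-1, a connected graph, and exact arithmetic via Fraction so that epsilon = 1/(deg(u)+1) is not subject to floating-point rounding. The function name and sample graph are made up.

```python
from fractions import Fraction


def mst_min_degree_by_perturbation(n, edges, u):
    """Add eps to every edge incident to u, then run plain Kruskal.
    Returns tree edges with their ORIGINAL weights."""
    deg_u = sum(1 for a, b, _ in edges if u in (a, b))
    eps = Fraction(1, deg_u + 1)       # deg(u) * eps < 1 <= d
    perturbed = [
        (Fraction(w) + (eps if u in (a, b) else 0), a, b, w)
        for a, b, w in edges
    ]
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for _, a, b, w in sorted(perturbed, key=lambda t: t[0]):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree.append((a, b, w))
    return tree


sample = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (1, 2, 1), (2, 3, 1)]
t = mst_min_degree_by_perturbation(4, sample, 0)
deg = sum(1 for a, b, _ in t if 0 in (a, b))
print(t, deg)
```

Because the perturbation only breaks ties between edges of equal original weight, any standard MST routine could replace the Kruskal loop here.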

Directed maximum weighted bipartite matching allowing sharing of start/end vertices

Let G = (U ∪ V, E) be a weighted directed bipartite graph (i.e. U and V are the two sets of nodes of the bipartite graph and E contains directed weighted edges from U to V or from V to U). Here is an example:
In this case:
U = {A,B,C}
V = {D,E,F}
E = {(A->E,7), (B->D,1), (C->E,3), (F->A,9)}
Definition: DirectionalMatching (I made up this term just to make things clearer): set of directed edges that may share the start or end vertices. That is, if U->V and U'->V' both belong to a DirectionalMatching then V /= U' and V' /= U but it may be that U = U' or V = V'.
My question: How to efficiently find a DirectionalMatching, as defined above, for a bipartite directional weighted graph which maximizes the sum of the weights of its edges?
By efficiently, I mean polynomial complexity or faster, I already know how to implement a naive brute force approach.
In the example above the maximum weighted DirectionalMatching is: {F->A,C->E,B->D}, with a value of 13.
Formally demonstrating the equivalence of this problem to any other well known problem in graph theory would also be valuable.
Thanks!
Note 1: This question is based on Maximum weighted bipartite matching _with_ directed edges but with the extra relaxation that it is allowed for edges in the matching to share the origin or destination. Since that relaxation makes a big difference, I created an independent question.
Note 2: This is a maximum weight matching. Cardinality (how many edges are present) and the number of vertices covered by the matching is irrelevant for a correct result. Only the maximum weight matters.
Note 3: During my research to solve the problem I found this paper, which I think would be helpful to others trying to find a solution: Alternating cycles and paths in edge-coloured multigraphs: a survey
Note 4: In case it helps, you can also think of the graph as its equivalent 2-edge coloured undirected bipartite multigraph. The problem formulation would then turn into: Find the set of edges without colour-alternating paths or cycles which has maximum weight sum.
Note 5: I suspect that the problem might be NP-hard, but I am not that experienced with reductions so I haven't managed to prove it yet.
Yet another example:
Imagine you had
4 vertices: {u1, u2} {v1, v2}
4 edges: {u1->v1, u1->v2, u2->v1, v2->u2}
Then, regardless of their weights, u1->v2 and v2->u2 cannot be in the same DirectionalMatching, neither can v2->u2 and u2->v1. However u1->v1 and u1->v2 can, and so can u1->v1 and u2->v1.
Define a new undirected graph G' from G as follows.
G' has a node (A, B) with weight w for each directed edge (A, B) with weight w in G
G' has undirected edge ((A, B),(B, C)) if (A, B) and (B, C) are both directed edges in G
http://en.wikipedia.org/wiki/Line_graph#Line_digraphs
Now find a maximal (weighted) independent vertex set in G'.
http://en.wikipedia.org/wiki/Vertex_independent_set
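The construction of G' can be sketched and checked against the example from the question. The code below is an illustration only: the function names are made up, arcs are (tail, head, weight) triples, and the independent set is found by brute force over all subsets, which is exponential and only suitable for tiny inputs.

```python
from itertools import combinations


def conflict_graph(arcs):
    """Pairs (i, j) of arc indices where arc i ends at the vertex where
    arc j starts, i.e. the two arcs form a directed path of length 2."""
    return {
        (i, j)
        for i, (_, head, _) in enumerate(arcs)
        for j, (tail, _, _) in enumerate(arcs)
        if i != j and head == tail
    }


def best_directional_matching(arcs):
    """Brute-force maximum-weight independent set in the conflict graph."""
    conflicts = conflict_graph(arcs)
    best, best_w = [], 0
    for r in range(1, len(arcs) + 1):
        for subset in combinations(range(len(arcs)), r):
            if any((i, j) in conflicts for i in subset for j in subset):
                continue
            w = sum(arcs[i][2] for i in subset)
            if w > best_w:
                best, best_w = [arcs[i] for i in subset], w
    return best, best_w


example = [("A", "E", 7), ("B", "D", 1), ("C", "E", 3), ("F", "A", 9)]
chosen, weight = best_directional_matching(example)
print(chosen, weight)
```

On the question's example the only conflict is between F->A and A->E, and the best selection has weight 13, matching the value stated in the question.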
Edit: the material after this point only works if all of the edge weights are the same. When the edge weights have different values it's a more difficult problem (google "maximum weight independent vertex set" for possible algorithms).
Typically this would be an NP-hard problem. However, G' is a bipartite graph -- it contains only even cycles. Finding the maximal (weighted) independent vertex set in a bipartite graph is not NP-hard.
The algorithm you will run on G' is as follows.
Find the connected components of G', say H_1, H_2, ..., H_k
For each H_i do a 2-coloring (say red and blue) of the nodes. The cookbook approach here is to do a depth-first search on H_i alternating colors. A simple approach would be to color each vertex in H_i based on whether the corresponding edge in G goes from U to V (red) or from V to U (blue).
The two options for which nodes to select from H_i are either all the red nodes or all the blue nodes. Choose the colored node set with higher weight. For example, the red node set has weight equal to H_i.nodes.where(node => node.color == red).sum(node => node.w). Call the higher-weight node set N_i.
Your maximal weighted independent vertex set is now union(N_1, N_2, ..., N_k).
Since each vertex in G' corresponds to one of the directed edges in G, you have your maximal DirectionalMatching.
This problem can be solved in polynomial time using the Hungarian Algorithm. The "proof" by Vor above is wrong.
The method of structuring the problem for the above example is as follows:
    D  E  F
A   #  7  9
B   1  #  #
C   #  3  #
where "#" means negative infinity. You then solve the matrix using the Hungarian algorithm to determine the maximum matching. You can multiply the numbers by -1 if you want to find a minimum matching.

Resources