minimum weight vertex cover of a tree - algorithm

There's an existing question dealing with trees where the weight of a vertex is its degree, but I'm interested in the case where the vertices can have arbitrary weights.
This isn't homework but it is one of the questions in the algorithm design manual, which I'm currently reading; an answer set gives the solution as
1. Perform a DFS; at each step update Score[v][include], where v is a vertex and include is either true or false.
2. If v is a leaf, set Score[v][false] = 0 and Score[v][true] = wv, where wv is the weight of vertex v.
3. During the DFS, when moving up from the last child of node v, update Score[v][include]:
   Score[v][false] = Sum for c in children(v) of Score[c][true]
   Score[v][true] = wv + Sum for c in children(v) of min(Score[c][true], Score[c][false])
4. Extract the actual cover by backtracking through Score.
However, I can't actually translate that into something that works. (In response to the comment: what I've tried so far is drawing some smallish graphs with weights and running through the algorithm on paper, up until step four, where the "extract actual cover" part is not transparent.)
In response to Ali's answer: suppose I have this graph, with the vertices labeled A etc. and their weights in parentheses:
A(9)---B(3)---C(2)
  \      \
  E(1)   D(4)
The right answer is clearly {B,E}.
Going through this algorithm, we'd set values like so:
score[D][false] = 0; score[D][true] = 4
score[C][false] = 0; score[C][true] = 2
score[B][false] = 6; score[B][true] = 3
score[E][false] = 0; score[E][true] = 1
score[A][false] = 4; score[A][true] = 12
Ok, so, my question is basically, now what? Doing the simple thing and iterating through the score vector and deciding what's cheapest locally doesn't work; you only end up including B. Deciding based on the parent and alternating also doesn't work: consider the case where the weight of E is 1000; now the correct answer is {A,B}, and they're adjacent. Perhaps it is not supposed to be confusing, but frankly, I'm confused.

There's no actual backtracking done (or needed). The solution uses dynamic programming to avoid backtracking, since that'd take exponential time. My guess is "backtracking Score" means the Score contains the partial results you would get by doing backtracking.
A vertex cover of a tree may include alternating as well as adjacent vertices. It may not exclude two adjacent vertices, because it must cover every edge.
The answer is given by the way Score is recursively calculated. The cost of not including a vertex is the cost of including all of its children. The cost of including a vertex, however, is its weight plus, for each child, whichever is less costly: including that child or not including it, because both are allowed.
As your solution suggests, it can be done with DFS in post-order, in a single pass. The trick is to include a vertex if the Score says it must be included, and include its children if it must be excluded, otherwise we'd be excluding two adjacent vertices.
Here's some pseudocode:
find_cover_vertex_of_minimum_weight(v)
    for c in children(v)
        find_cover_vertex_of_minimum_weight(c)
    Score[v][false] = Sum for c in children(v) of Score[c][true]
    Score[v][true] = weight(v) + Sum for c in children(v) of min(Score[c][true], Score[c][false])
    if Score[v][true] < Score[v][false] then
        add v to the cover
    else
        for c in children(v)
            add c to the cover
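A minimal Python sketch of the Score recurrence, assuming the tree is given as a hypothetical `children` dict (vertex to list of children) plus a `weight` dict and a chosen root. It extracts the cover with a second, top-down pass: a vertex is forced into the cover when its parent is out, and otherwise takes the cheaper of its two scores. Deciding locally during the post-order pass, as the pseudocode above does, can over-count: on the path R(100)-X(10)-Y(1) the local rule returns {X, Y} (weight 11) instead of {X} (weight 10).

```python
from typing import Dict, List, Set, Tuple


def min_weight_vertex_cover(
    children: Dict[str, List[str]], weight: Dict[str, int], root: str
) -> Tuple[int, Set[str]]:
    """Minimum-weight vertex cover of a rooted tree (hypothetical helper)."""
    # score[v] = (cost of v's subtree if v is excluded, cost if v is included)
    score: Dict[str, Tuple[int, int]] = {}

    # Iterative post-order DFS: a vertex is scored after all of its children.
    stack = [(root, False)]
    while stack:
        v, expanded = stack.pop()
        kids = children.get(v, [])
        if not expanded:
            stack.append((v, True))
            stack.extend((c, False) for c in kids)
            continue
        excluded = sum(score[c][1] for c in kids)  # every child must be included
        included = weight[v] + sum(min(score[c]) for c in kids)  # children are free
        score[v] = (excluded, included)

    # Top-down extraction: a vertex's choice depends on its parent's choice.
    cover: Set[str] = set()
    todo = [(root, True)]  # the root has no parent edge, so it is unconstrained
    while todo:
        v, parent_in = todo.pop()
        excluded, included = score[v]
        # If the parent is out, the edge (parent, v) forces v in.
        take = (not parent_in) or included <= excluded
        if take:
            cover.add(v)
        todo.extend((c, take) for c in children.get(v, []))
    return min(score[root]), cover
```

On the question's tree, `min_weight_vertex_cover({"A": ["B", "E"], "B": ["C", "D"]}, {"A": 9, "B": 3, "C": 2, "D": 4, "E": 1}, "A")` gives cost 4 with cover {B, E}; changing E's weight to 1000 gives cost 12 with cover {A, B}.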

It doesn't actually mean anything confusing; it is just dynamic programming, and you seem to understand almost all of the algorithm. To make it clearer:
First, perform a DFS on your graph and find the leaves.
For every leaf, assign values as the algorithm says.
Now start from the leaves and assign values to each leaf's parent using that formula.
Keep assigning values to parents of nodes that already have values until you reach the root of your graph.
That's it. "Backtracking" in your algorithm means that you assign a value to each node whose children already have values. As I said above, this way of solving a problem is called dynamic programming.
Edit, to address the change in your question: for the following graph the answer is clearly {B, E}. You thought this algorithm gives only B, but that is incorrect: it gives B and E.
A(9)---B(3)---C(2)
  \      \
  E(1)   D(4)
score[D][false] = 0; score[D][true] = 4
score[C][false] = 0; score[C][true] = 2
score[B][false] = 6 this means we use C and D; score[B][true] = 3 this means we use B
score[E][false] = 0; score[E][true] = 1
score[A][false] = 4 This means we use B and E; score[A][true] = 12 this means we use B and A.
You select 4, so you must use B and E. If it were just B, your answer would be 3; but as you found, the answer is 4 = 3 + 1 = B + E.
Also when E = 1000
A(9)---B(3)---C(2)
  \      \
  E(1000) D(4)
it is 100% correct that the answer is A and B: it would be wrong to use E just to avoid selecting adjacent nodes. This algorithm finds A and B, and you can confirm it by checking. Consider these candidate sets:
C D A = 15
C D E = 1006
A B = 12
Although the first two sets have no adjacent nodes, they are heavier than the last one, which does. So it is best to use A and B for the cover.

Related

Removing unnecessary nodes in graph

I have a graph that has two distinct classes of nodes, class A nodes and class B nodes.
Class A nodes are not connected to any other A nodes and class B nodes aren’t connected to any other B nodes, but B nodes are connected to A nodes and vice versa. Some B nodes are connected to lots of A nodes and most A nodes are connected to lots of B nodes.
I want to eliminate as many of the A nodes as possible from the graph.
I must keep all of the B nodes, and they must still be connected to at least one A node (preferably only one A node).
I can eliminate an A node when it has no B nodes connected only to it. Are there any algorithms that could find an optimal, or at least close to optimal, solution for which A nodes I can remove?
Old, Incorrect Answer, But Start Here
First, you need to recognize that you have a bipartite graph. That is, you can colour the nodes red and blue such that no edges connect a red node to a red node or a blue node to a blue node.
Next, recognize that you're trying to solve a vertex cover problem. From Wikipedia:
In the mathematical discipline of graph theory, a vertex cover (sometimes node cover) of a graph is a set of vertices such that each edge of the graph is incident to at least one vertex of the set. The problem of finding a minimum vertex cover is a classical optimization problem in computer science and is a typical example of an NP-hard optimization problem that has an approximation algorithm.
Since you have a special graph, it's reasonable to think that maybe the NP-hard doesn't apply to you. This thought brings us to Kőnig's theorem which relates the maximum matching problem to the minimum vertex cover problem. Once you know this, you can apply the Hopcroft–Karp algorithm to solve the problem in O(|E|√|V|) time, though you'll probably need to jigger it a bit to ensure you keep all the B nodes.
New, Correct Answer
It turns out this jiggering amounts to the "constrained bipartite graph vertex cover problem", which asks whether there is a vertex cover that uses fewer than a A-nodes and fewer than b B-nodes. That problem is NP-complete, so that's a no-go. The jiggering was harder than I thought!
But using less than the minimum number of nodes isn't the constraint we want. We want to ensure that the minimum number of A-nodes is used and the maximum number of B-nodes.
Kőnig's theorem, above, is a special case of the maximum flow problem. Thinking about the problem in terms of flows brings us pretty quickly to minimum-cost flow problems.
In these problems we're given a graph whose edges have specified capacities and unit costs of transport. The goal is to find the minimum cost needed to move a supply of a given quantity from an arbitrary set of source nodes to an arbitrary set of sink nodes.
It turns out your problem can be converted into a minimum-cost flow problem. To do so, let us generate a source node that connects to all the A nodes and a sink node that connects to all the B nodes.
Now, let us make the cost of using a Source->A edge equal to 1 and give all other edges a cost of zero. Further, let us make the capacity of the Source->A edges equal to infinity and the capacity of all other edges equal to 1.
This looks like the following:
The red edges have Cost=1, Capacity=Inf. The blue edges have Cost=0, Capacity=1.
Now, solving the minimum-cost flow problem becomes equivalent to using as few red edges as possible. Any red edge that isn't used allocates 0 flow to its corresponding A node, and that node can be removed from the graph. Conversely, each B node can only pass 1 unit of flow to the sink, so all B nodes must be preserved in order for the problem to be solved.
Since we've recast your problem into this standard form, we can leverage existing tools to get a solution; namely, Google's Operations Research tools (OR-Tools).
Doing so gives the following answer to the above graph:
The red edges are unused and the black edges are used. Note that if a red edge emerges from the source the A-node it connects to generates no black edges. Note also that each B-node has at least one in-coming black edge. This satisfies the constraints you posed.
We can now detect the A-nodes to be removed by looking for Source->A edges with zero usage.
Source Code
The source code necessary to generate the foregoing figures and associated solutions is as follows:
#!/usr/bin/env python3
#Documentation: https://developers.google.com/optimization/flow/mincostflow
#Install dependency: pip3 install ortools
#Note: this uses the legacy pywrapgraph module; newer OR-Tools releases expose
#SimpleMinCostFlow through ortools.graph.python.min_cost_flow instead.
from __future__ import print_function
from ortools.graph import pywrapgraph
import matplotlib.pyplot as plt
import networkx as nx
import random
import sys

def GenerateGraph(Acount,Bcount):
    assert Acount>5
    assert Bcount>5
    G = nx.DiGraph() #Directed graph
    source_node = Acount+Bcount
    sink_node = source_node+1
    for a in range(Acount):
        #randint needs integer bounds. Connect to 20-30% of the Bnodes
        for i in range(random.randint(int(0.2*Bcount), int(0.3*Bcount))):
            b = Acount+random.randint(0,Bcount-1) #In the half-open range [0,Bcount). Offset from A's indices
            G.add_edge(source_node, a, capacity=99999, unit_cost=1, usage=1)
            G.add_edge(a, b, capacity=1, unit_cost=0, usage=1)
            G.add_edge(b, sink_node, capacity=1, unit_cost=0, usage=1)
            G.nodes[a]['type'] = 'A' #G.node no longer exists in current networkx
            G.nodes[b]['type'] = 'B'
    G.nodes[source_node]['type'] = 'source'
    G.nodes[sink_node]['type'] = 'sink'
    #Only B nodes that received at least one edge exist in G; using Bcount here
    #would make the flow problem infeasible whenever some B index was never drawn.
    num_b = len([1 for p,d in G.nodes(data=True) if d['type']=='B'])
    G.nodes[source_node]['supply'] = num_b
    G.nodes[sink_node]['supply'] = -num_b
    return G

def VisualizeGraph(graph, color_type):
    Acount = len([1 for p,d in graph.nodes(data=True) if d['type']=='A'])
    Bcount = len([1 for p,d in graph.nodes(data=True) if d['type']=='B'])
    if color_type=='usage':
        edge_color = ['black' if d['usage']>0 else 'red' for u,v,d in graph.edges(data=True)]
    elif color_type=='unit_cost':
        edge_color = ['red' if d['unit_cost']>0 else 'blue' for u,v,d in graph.edges(data=True)]
    #Lay out the nodes in four columns: source, A nodes, B nodes, sink
    Ai = 0
    Bi = 0
    pos = dict()
    for p,d in graph.nodes(data=True):
        if d['type']=='source':
            pos[p] = (0, Acount/2)
        elif d['type']=='sink':
            pos[p] = (3, Bcount/2)
        elif d['type']=='A':
            pos[p] = (1, Ai)
            Ai += 1
        elif d['type']=='B':
            pos[p] = (2, Bi)
            Bi += 1
    nx.draw(graph, pos=pos, edge_color=edge_color, arrows=False)
    plt.show()

def GenerateMinCostFlowProblemFromGraph(graph):
    min_cost_flow = pywrapgraph.SimpleMinCostFlow()
    #One arc per graph edge, carrying its capacity and unit cost
    for node,neighbor,data in graph.edges(data=True):
        min_cost_flow.AddArcWithCapacityAndUnitCost(node, neighbor, data['capacity'], data['unit_cost'])
    #Only the source and sink have non-zero supply
    for p, d in graph.nodes(data=True):
        if (d['type']=='source' or d['type']=='sink') and 'supply' in d:
            min_cost_flow.SetNodeSupply(p, d['supply'])
    return min_cost_flow

def ColorGraphEdgesByUsage(graph, min_cost_flow):
    #Copy the solved flow on each arc back onto the matching graph edge
    for i in range(min_cost_flow.NumArcs()):
        graph[min_cost_flow.Tail(i)][min_cost_flow.Head(i)]['usage'] = min_cost_flow.Flow(i)

def main():
    """Build a random A/B graph, solve it as a min-cost flow, and plot the result."""
    Acount = 20
    Bcount = 20
    graph = GenerateGraph(Acount, Bcount)
    VisualizeGraph(graph, 'unit_cost')
    min_cost_flow = GenerateMinCostFlowProblemFromGraph(graph)
    # Find the minimum cost flow from the source node to the sink node.
    if min_cost_flow.Solve() != min_cost_flow.OPTIMAL:
        print('Unable to find a solution! It is likely that one does not exist for this input.')
        sys.exit(-1)
    print('Minimum cost:', min_cost_flow.OptimalCost())
    ColorGraphEdgesByUsage(graph, min_cost_flow)
    VisualizeGraph(graph, 'usage')

if __name__ == '__main__':
    main()
Although this is an old question, I see it has not been answered correctly yet.
An analogous question to this one has also been answered earlier in this post.
The problem you are presenting here is indeed the Minimum Set Cover Problem, which is one of the well-known NP-hard problems. From Wikipedia, the Minimum Set Cover Problem can be formulated as:
Given a set of elements {1,2,...,n} (called the universe) and a collection S of m sets whose union equals the universe, the set cover problem is to identify the smallest sub-collection of S whose union equals the universe. For example, consider the universe U={1,2,3,4,5} and the collection of sets S={{1,2,3},{2,4},{3,4},{4,5}}. Clearly the union of S is U. However, we can cover all of the elements with the following, smaller number of sets: {{1,2,3},{4,5}}.
In your formulation, B nodes represent the elements in the universe, A nodes represent the sets and edges between A nodes and B nodes determine which elements (B nodes) belong to each set (A node). Then, the minimum set cover is equivalent to the minimum number of A nodes so that they are connected to all B nodes. Consequently, the maximum number of A nodes which can be removed from the graph while being connected to every B node are those which do not belong to the minimum set cover.
Since it is NP-hard, no polynomial-time algorithm for computing the optimum is known, but a simple greedy algorithm can efficiently provide approximate solutions with tight bounds relative to the optimum. From Wikipedia:
There is a greedy algorithm for polynomial time approximation of set covering that chooses sets according to one rule: at each stage, choose the set that contains the largest number of uncovered elements.
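A minimal sketch of that greedy rule in Python. It assumes a hypothetical `sets` dict mapping each A node to the set of B nodes it is connected to, with `universe` being all B nodes; the A nodes that are not returned are the ones that can be removed. This is the textbook greedy approximation, not an exact solver.

```python
from typing import Dict, Hashable, List, Set


def greedy_set_cover(universe: Set[Hashable], sets: Dict[str, Set[Hashable]]) -> List[str]:
    """Greedy set cover: repeatedly pick the set covering the most uncovered elements."""
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        # The set (A node) that covers the largest number of still-uncovered elements.
        best = max(sets, key=lambda name: len(sets[name] & uncovered))
        gain = sets[best] & uncovered
        if not gain:
            raise ValueError("the sets do not cover the universe")
        chosen.append(best)
        uncovered -= gain
    return chosen
```

With the Wikipedia example, `greedy_set_cover({1, 2, 3, 4, 5}, {"s1": {1, 2, 3}, "s2": {2, 4}, "s3": {3, 4}, "s4": {4, 5}})` returns `["s1", "s4"]`, so s2 and s3 are the removable A nodes.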

Dijkstra's algorithm: is my implementation flawed?

In order to train myself both in Python and graph theory, I tried to implement the Dijkstra algo using Python 3, and submitted it against several online judges, to see if it was correct.
It works well in many cases, but not always.
For example, I am stuck with this one: the test case works fine and I also have tried custom test cases of my own, but when I submit the following solution, the judge keeps telling me "wrong answer", and the expected result is very different from my output, indeed.
Notice that the judge tests it against quite a complex graph (10000 nodes with 100000 edges), while all the cases I tried before never exceeded 20 nodes and around 20-40 edges.
Here is my code.
Given al, an adjacency list in the following form:
al[n] = [(a1, w1), (a2, w2), ...]
where
n is the node id;
a1, a2, etc. are its adjacent nodes and w1, w2, etc. the respective weights for the given edge;
and supposing that maximum distance never exceeds 1 billion, I implemented Dijkstra's algorithm this way:
import queue
distance = [1000000000] * (N+1) # this is the array where I store the shortest path between 1 and each other node
distance[1] = 0 # starting from node 1 with distance 0
pq = queue.PriorityQueue()
pq.put((0, 1)) # same as above
visited = [False] * (N+1)
while not pq.empty():
    n = pq.get()[1]
    if visited[n]:
        continue
    visited[n] = True
    for edge in al[n]:
        if distance[edge[0]] > distance[n] + edge[1]:
            distance[edge[0]] = distance[n] + edge[1]
            pq.put((distance[edge[0]], edge[0]))
Could you please help me understand whether my implementation is flawed, or whether I simply ran into a buggy online judge?
Thank you very much.
UPDATE
As requested, I'm providing the snippet I use to populate the adjacency list al for the linked problem.
N,M = input().split()
N,M = int(N), int(M)
al = [[] for n in range(N+1)]
for m in range(M):
    try:
        a,b,w = input().split()
        a,b,w = int(a), int(b), int(w)
        al[a].append((b, w))
        al[b].append((a, w))
    except:
        pass
(Please don't mind the ugly "except: pass", I was using it just for debugging purposes... :P)
Primary problem in interpreting the question:
According to your parsing code, you are treating the input data as an undirected graph, i.e. each edge from A to B also is an edge from B to A.
It seems this premise is not valid and it should instead be a directed graph, i.e. you have to remove this line:
al[b].append((a, w)) # no back reference!
Previous problem, now already fixed in the code:
Currently, you are using the never-changing weight of the edges in your queue:
pq.put((edge[1], edge[0]))
This way, the nodes always end up at the same position in the queue, no matter at what stage of the algorithm and how far the path to reach that node actually is.
Instead, you should use the new distance to the target node edge[0], i.e. distance[edge[0]] as the priority in the queue:
pq.put((distance[edge[0]], edge[0]))
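A sketch of the fixed loop, assuming (as this answer suggests) that edges are directed, so each input line only adds al[a].append((b, w)). It uses the standard-library heapq module instead of queue.PriorityQueue, which adds locking meant for multi-threaded use, and float("inf") instead of the 1-billion sentinel. The function name `dijkstra` is illustrative.

```python
import heapq
from typing import List, Tuple


def dijkstra(al: List[List[Tuple[int, int]]], source: int) -> List[float]:
    """Shortest distances from `source` over a directed adjacency list.

    al[n] = [(neighbor, weight), ...]; unreachable nodes keep distance inf.
    """
    dist = [float("inf")] * len(al)
    dist[source] = 0
    pq = [(0, source)]
    while pq:
        d, n = heapq.heappop(pq)
        if d > dist[n]:
            continue  # stale entry: n was already settled with a shorter distance
        for neighbor, weight in al[n]:
            nd = d + weight
            if nd < dist[neighbor]:
                dist[neighbor] = nd
                heapq.heappush(pq, (nd, neighbor))  # priority is the new distance
    return dist
```

For example, with nodes 1..3 (index 0 unused) and `al = [[], [(2, 5), (3, 1)], [], [(2, 1)]]`, `dijkstra(al, 1)` returns `[inf, 0, 2, 1]`: node 2 is reached more cheaply through node 3.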

Pathfinding while forcing unique node attributes -- which algorithm should I use?

Update 2011-12-28: Here's a blog post with a less vague description of the problem I was trying to solve, my work on it, and my current solution: Watching Every MLB Team Play A Game
I'm trying to solve a kind of strange pathfinding challenge. I have a directed acyclic graph, and every edge has a distance value. And I want to find a shortest path. Simple, right? Well, there are a couple of reasons I can't just use Dijkstra's or A*.
I don't care at all what the starting node of my path is, nor the ending node. I just need a path that includes exactly 10 nodes. But:
Each node has an attribute, let's say it's color. Each node has one of 20 different possible colors.
The path I'm trying to find is the shortest path with exactly 10 nodes, where each node is a different color. I don't want any of the nodes in my path to have the same color as any other node.
It'd be nice to be able to force my path to have one value for one of the attributes ("at least one node must be blue", for instance), but that's not really necessary.
This is a simplified example. My full data set actually has three different attributes for each node that must all be unique, and I have 2k+ nodes, each with an average of 35 outgoing edges. Since getting a perfect "shortest path" may take exponential or factorial time, an exhaustive search is really not an option. What I'm really looking for is some approximation of a "good path" that meets the distinct-color criterion above.
Can anyone point me towards an algorithm that I might be able to use (even modified)?
Some stats on my full data set:
Total nodes: 2430
Total edges: 86524
Nodes with no incoming edges: 19
Nodes with no outgoing edges: 32
Most outgoing edges: 42
Average edges per node: 35.6 (in each direction)
Due to the nature of the data, I know that the graph is acyclic
And in the full data set, I'm looking for a path of length 15, not 10
This is a case where the question already contains most of the answer.
Do a breadth-first search starting from all root nodes. When the number of paths searched in parallel exceeds some limit, drop the longest paths. Path length may be weighted: the last edges may have weight 10, edges passed 9 hops ago weight 1. It is also possible to assign a lower weight to all paths having the preferred attribute, or to paths going through weakly connected nodes. Store the last 10 nodes of each path in a hash table to avoid duplication, and keep track of the minimum sum of the last 9 edge lengths along with the shortest path.
If the number of possible values is low, you can use Floyd's algorithm with a slight modification: for each path, store a bitmap that represents the different values already visited (in your case the bitmap will be 20 bits wide per path).
Then, when you perform the length comparison, you also AND your bitmaps to check whether the path is valid; if it is, you OR them together and store the result as the new bitmap for the path.
Have you tried a straightforward approach and failed? From your description of the problem, I see no reason why a simple greedy algorithm like depth-first search wouldn't be just fine:
Pick a start node.
Check the immediate neighbors, are there any nodes that are ok to append to the path? Expand the path with one of them and repeat the process for that node.
If you fail, backtrack to the last successful state and try a new neighbor.
If you run out of neighbors to check, this node cannot be the start node of a path. Try a new one.
If you have 10 nodes, you're done.
Good heuristics for picking a start node are hard to give without any knowledge of how the attributes are distributed, but it may be beneficial to try nodes with high degree first.
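The depth-first steps above can be sketched in Python, reusing the bitmap idea from the previous answer to track which colors a path already contains. This assumes hypothetical inputs: `succ` maps every node to its successors and `color` maps every node to a small integer color (0..19 here). Like the described approach, it returns the first valid path found and does not minimize total distance.

```python
from typing import Dict, List, Optional


def find_colorful_path(
    succ: Dict[int, List[int]], color: Dict[int, int], length: int
) -> Optional[List[int]]:
    """Backtracking DFS for a path of `length` nodes whose colors are all distinct."""

    def extend(path: List[int], used: int) -> Optional[List[int]]:
        if len(path) == length:
            return path  # enough nodes: done
        for nxt in succ.get(path[-1], []):
            bit = 1 << color[nxt]
            if used & bit:
                continue  # color already on the path: try another neighbor
            found = extend(path + [nxt], used | bit)
            if found is not None:
                return found
        return None  # dead end: backtrack to the previous node

    # Heuristic from the answer: try start nodes with the most outgoing edges first.
    for start in sorted(color, key=lambda v: len(succ.get(v, [])), reverse=True):
        found = extend([start], 1 << color[start])
        if found is not None:
            return found
    return None  # no node can start a valid path
```

On a toy DAG `succ = {1: [2, 3], 2: [4], 3: [4], 4: []}` with `color = {1: 0, 2: 0, 3: 1, 4: 2}`, asking for 3 nodes skips node 2 (same color as node 1) and returns `[1, 3, 4]`; asking for 4 nodes returns `None`.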
It looks like a greedy depth first search will be your best bet. With a reasonable distribution of attribute values, I think finding a single valid sequence is E[O(1)] time, that is expected constant time. I could probably prove that, but it might take some time. The proof would use the assumption that there is a non-zero probability that a valid next segment of the sequence could be found at every step.
The greedy search would backtrack whenever the unique attribute value constraint is violated. The search stops when a 15-segment path is found. If we accept my hunch that each sequence can be found in E[O(1)], then it is a matter of determining how many parallel searches to undertake.
For those who want to experiment, here is a (postgres) sql script to generate some fake data.
SET search_path='tmp';
-- DROP TABLE nodes CASCADE;
CREATE TABLE nodes
( num INTEGER NOT NULL PRIMARY KEY
, color INTEGER
-- Redundant fields to flag {begin,end} of paths
, is_root boolean DEFAULT false
, is_terminal boolean DEFAULT false
);
-- DROP TABLE edges CASCADE;
CREATE TABLE edges
( numfrom INTEGER NOT NULL REFERENCES nodes(num)
, numto INTEGER NOT NULL REFERENCES nodes(num)
, cost INTEGER NOT NULL DEFAULT 0
);
-- Generate some nodes, set color randomly
INSERT INTO nodes (num)
SELECT n
FROM generate_series(1,2430) n
WHERE 1=1
;
UPDATE nodes SET COLOR= 1+TRUNC(20*random() );
-- (partial) cartesian product nodes*nodes. The ordering guarantees a DAG.
INSERT INTO edges(numfrom,numto,cost)
SELECT n1.num ,n2.num, 0
FROM nodes n1 ,nodes n2
WHERE n1.num < n2.num
AND random() < 0.029
;
UPDATE edges SET cost = 1+ 1000 * random();
ALTER TABLE edges
ADD PRIMARY KEY (numfrom,numto)
;
ALTER TABLE edges
ADD UNIQUE (numto,numfrom)
;
-- A root (begin of a path) has no incoming edges; a terminal (end) has no outgoing edges.
UPDATE nodes no SET is_root = true
WHERE NOT EXISTS (
SELECT * FROM edges ed
WHERE ed.numto = no.num
);
UPDATE nodes no SET is_terminal = true
WHERE NOT EXISTS (
SELECT * FROM edges ed
WHERE ed.numfrom = no.num
);
SELECT COUNT(*) AS nnode FROM nodes;
SELECT COUNT(*) AS nedge FROM edges;
SELECT color, COUNT(*) AS cnt FROM nodes GROUP BY color ORDER BY color;
SELECT COUNT(*) AS nterm FROM nodes no WHERE is_terminal = true;
SELECT COUNT(*) AS nroot FROM nodes no WHERE is_root = true;
WITH zzz AS (
SELECT numto, COUNT(*) AS fanin
FROM edges
GROUP BY numto
)
SELECT zzz.fanin , COUNT(*) AS cnt
FROM zzz
GROUP BY zzz.fanin
ORDER BY zzz.fanin
;
WITH zzz AS (
SELECT numfrom, COUNT(*) AS fanout
FROM edges
GROUP BY numfrom
)
SELECT zzz.fanout , COUNT(*) AS cnt
FROM zzz
GROUP BY zzz.fanout
ORDER BY zzz.fanout
;
COPY nodes(num,color,is_root,is_terminal)
TO '/tmp/nodes.dmp';
COPY edges(numfrom,numto, cost)
TO '/tmp/edges.dmp';
The problem may be solving by dynamic programming as follows. Let's start by formally defining its solution.
Given a DAG G = (V, E), let C be the set of colors of vertices visited so far, and let w[i, j] and c[i] be, respectively, the weight (distance) associated with edge (i, j) and the color of vertex i. Note that w[i, j] is infinite if the edge (i, j) does not belong to E.
Now define the distance d for going from vertex i to vertex j taking into account C as
d[i, j, C] = w[i, j] if i is not equal to j and c[j] does not belong to C
= 0 if i = j
= infinite if i is not equal to j and c[j] belongs to C
We are now ready to define our subproblems as follows:
A[i, j, k, C] = shortest path from i to j that uses exactly k edges and respects the colors in C so that no two vertices in the path are colored using the same color (one of the colors in C)
Let m be the maximum number of edges permitted in the path and assume that the vertices are labeled 1, 2, ..., n. Let P[i,j,k] be the predecessor vertex of j in the shortest path satisfying the constraints from i to j. The following algorithm solves the problem.
for k = 1 to m
    for i = 1 to n
        for j = 1 to n
            A[i,j,k,C] = min over x belonging to V {d[i,x,C] + A[x,j,k-1,C union c[x]]}
            P[i,j,k] = the vertex x that minimized A[i,j,k,C] in the previous statement
Set the initial conditions as follows:
A[i,j,k,C] = 0 for k = 0
A[i,j,k,C] = 0 if i is equal to j
A[i,j,k,C] = infinite in all of the other cases
The overall computational complexity of the algorithm is O(m n^3); taking into account that in your particular case m = 14 (since you want exactly 15 nodes), it follows that m = O(1), so the complexity actually is O(n^3). To represent the set C, use a hash table so that insertion and membership testing require O(1) on average. Note that in the algorithm the operation C union c[x] is actually an insert operation in which you add the color of vertex x into the hash table for C. However, since you are inserting just one element, the set union operation leads to exactly the same result (if the color is not in the set, it is added; otherwise, it is simply discarded and the set does not change). Finally, to represent the DAG, use the adjacency matrix.
Once the algorithm is done, to find the minimum shortest path among all possible vertices i and j, simply find the minimum among the values A[i,j,m,C]. Note that if this value is infinite, then no valid shortest path exists. If a valid shortest path exists, then you can determine it by using the P[i,j,k] values and tracing backwards through predecessor vertices. For instance, starting from a = P[i,j,m], the last edge on the shortest path is (a,j); the previous edge is given by b = P[i,a,m-1], and it is (b,a), and so on.

Finding the path with the maximum minimal weight

I'm trying to work out an algorithm for finding a path across a directed graph. It's not a conventional path and I can't find any references to anything like this being done already.
I want to find the path which has the maximum minimum weight.
I.e., if there are two paths with weights 10->1->10 and 2->2->2, then the second path is considered better than the first because its minimum weight (2) is greater than the minimum weight of the first (1).
If anyone can work out a way to do this, or just point me in the direction of some reference material it would be incredibly useful :)
EDIT:: It seems I forgot to mention that I'm trying to get from a specific vertex to another specific vertex. Quite important point there :/
EDIT2:: As someone below pointed out, I should highlight that edge weights are non negative.
I am copying this answer and also adding my proof of correctness for the algorithm:
I would use some variant of Dijkstra's. I took the pseudo code below directly from Wikipedia and only changed 5 small things:
Renamed dist to width (from line 3 on)
Initialized each width to -infinity (line 3)
Initialized the width of the source to infinity (line 8)
Set the finish criterion to -infinity (line 14)
Modified the update function and sign (line 20 + 21)
1  function Dijkstra(Graph, source):
2      for each vertex v in Graph:                  // Initializations
3          width[v] := -infinity ;                  // Unknown width function from
4                                                   // source to v
5          previous[v] := undefined ;               // Previous node in optimal path
6      end for                                      // from source
7
8      width[source] := infinity ;                  // Width from source to source
9      Q := the set of all nodes in Graph ;         // All nodes in the graph are
10                                                  // unoptimized – thus are in Q
11     while Q is not empty:                        // The main loop
12         u := vertex in Q with largest width in width[] ;  // Source node in first case
13         remove u from Q ;
14         if width[u] = -infinity:
15             break ;                              // all remaining vertices are
16         end if                                   // inaccessible from source
17
18         for each neighbor v of u:                // where v has not yet been
19                                                  // removed from Q.
20             alt := max(width[v], min(width[u], width_between(u, v))) ;
21             if alt > width[v]:                   // Relax (u,v,a)
22                 width[v] := alt ;
23                 previous[v] := u ;
24                 decrease-key v in Q;             // Reorder v in the Queue
25             end if
26         end for
27     end while
28     return width;
29 endfunction
Some (handwaving) explanation of why this works: you start with the source. From there, you have infinite capacity to itself. Now you check all neighbors of the source. Assume the edges don't all have the same capacity (in your example, say (s, a) = 300). Then, there is no better way to reach b than via (s, b), so you know the best-case capacity of b. You continue going to the best neighbors of the known set of vertices, until you reach all vertices.
Proof of correctness of algorithm:
At any point in the algorithm, there will be 2 sets of vertices A and B. The vertices in A will be the vertices to which the correct maximum minimum capacity path has been found. And set B has vertices to which we haven't found the answer.
Inductive Hypothesis: At any step, all vertices in set A have the correct values of maximum minimum capacity path to them, i.e., all previous iterations are correct.
Correctness of base case: When the set A has the vertex S only. Then the value to S is infinity, which is correct.
In current iteration, we set
val[W] = max(val[W], min(val[V], width_between(V-W)))
Inductive step: Suppose, W is the vertex in set B with the largest val[W]. And W is dequeued from the queue and W has been set the answer val[W].
Now, we need to show that every other S-W path has a width <= val[W]. This will be always true because all other ways of reaching W will go through some other vertex (call it X) in the set B.
And for all other vertices X in set B, val[X] <= val[W]
Thus any other path to W will be constrained by val[X], which is never greater than val[W].
Thus the current estimate of val[W] is optimum and hence algorithm computes the correct values for all the vertices.
You could also use the "binary search on the answer" paradigm. That is, do a binary search on the weights, testing for each weight w whether you can find a path in the graph using only edges of weight at least w.
The largest w for which you can (found through binary search) gives the answer. Note that you only need to check whether a path exists, so each test is just an O(|E|) breadth-first/depth-first search, not a shortest-path computation. So it's O(|E|*log(max W)) in all, comparable to the Dijkstra/Kruskal/Prim O(|E| log |V|) (and I can't immediately see a proof of those either).
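A sketch of the binary search on the answer, assuming the graph is a hypothetical list of directed `(u, v, weight)` triples. It searches over the distinct edge weights rather than the numeric range, and each feasibility test is a BFS over the edges of weight at least the threshold, so the total cost is O(|E| log |E|).

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def max_min_weight(
    edges: List[Tuple[int, int, float]], source: int, target: int
) -> Optional[float]:
    """Largest w such that target is reachable from source using only edges
    of weight >= w (the best bottleneck), or None if target is unreachable."""

    def reachable(w: float) -> bool:
        # Keep only edges that are at least as heavy as the threshold.
        adj: Dict[int, List[int]] = {}
        for u, v, weight in edges:
            if weight >= w:
                adj.setdefault(u, []).append(v)
        seen = {source}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if u == target:
                return True
            for v in adj.get(u, []):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        return False

    candidates = sorted({weight for _, _, weight in edges})
    lo, hi, best = 0, len(candidates) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        if reachable(candidates[mid]):
            best = candidates[mid]  # feasible: try a larger threshold
            lo = mid + 1
        else:
            hi = mid - 1  # infeasible: lower the threshold
    return best
```

For the question's example, two paths 0->1->2->3 with weights 10, 1, 10 and 0->4->5->3 with weights 2, 2, 2, the function returns 2.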
Use either Prim's or Kruskal's algorithm. Just modify them so they stop when they find out that the vertices you ask about are connected.
EDIT: You ask for maximum minimum, but your example looks like you want minimum maximum. In case of maximum minimum Kruskal's algorithm won't work.
EDIT: The example is okay, my mistake. Only Prim's algorithm will work then.
I am not sure that Prim will work here. Take this counterexample:
V = {1, 2, 3, 4}
E = {(1, 2), (2, 3), (1, 4), (4, 2)}
weight function w:
w((1,2)) = .1
w((2,3)) = .3
w((1,4)) = .2
w((4,2)) = .25
If you apply Prim's algorithm to find the max-min path from 1 to 3, starting from 1, it selects the 1 --> 2 --> 3 path, while the max-min value is attained by the path that goes through 4.
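To check which path actually attains the max-min value in this counterexample, a brute-force sketch can enumerate every simple path from 1 to 3 and keep the one whose smallest edge weight is largest. It reads the edges as undirected, which is an assumption; reading them as directed gives the same two paths from 1 to 3. Exhaustive enumeration is only practical for tiny graphs like this one.

```python
def simple_paths(adj, u, target, path):
    """Yield every simple path from u to target (path already ends at u)."""
    if u == target:
        yield list(path)
        return
    for v in adj[u]:
        if v not in path:
            path.append(v)
            yield from simple_paths(adj, v, target, path)
            path.pop()

# The counterexample's edges and weights, read as undirected.
w = {(1, 2): .1, (2, 3): .3, (1, 4): .2, (4, 2): .25}
adj = {v: [] for v in (1, 2, 3, 4)}
for a, b in w:
    adj[a].append(b)
    adj[b].append(a)

def bottleneck(path):
    """Smallest edge weight along a path."""
    return min(w.get((a, b), w.get((b, a))) for a, b in zip(path, path[1:]))

paths = list(simple_paths(adj, 1, 3, [1]))
best = max(paths, key=bottleneck)
print(best, bottleneck(best))   # [1, 4, 2, 3] 0.2
```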
This can be solved using a BFS-style algorithm, but you need two variations:
Instead of marking each node as "visited", you mark it with the minimum weight along the path you took to reach it.
For example, if I and J are neighbors, I has value w1, and the weight of the edge between them is w2, then J is marked with min(w1, w2).
If you reach an already marked node with value w1, you may need to re-mark it and process it again, if the new value w2 is larger (w2 > w1). This is required to make sure you get the maximum of all minimums.
For example, if I and J are neighbors, I has value w1, J has value w2, and the weight of the edge between them is w3, then if min(w1, w3) > w2 you must re-mark J with min(w1, w3) and process all its neighbors again.
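The two variations can be sketched in Python as follows, assuming an adjacency mapping {node: [(neighbor, edge_weight), ...]}; the function name bfs_max_min is hypothetical. The start node is marked with infinity, and a node goes back into the queue every time its mark increases. Marks only grow and are drawn from the finite set of edge weights, so the loop terminates, although a node may be processed several times.

```python
from collections import deque

def bfs_max_min(adj, start):
    """Return mark[node] = best known minimum edge weight along a path from
    start (nodes missing from the result are unreachable)."""
    mark = {start: float("inf")}       # start is reached with no edge yet
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j, edge_weight in adj[i]:
            candidate = min(mark[i], edge_weight)
            # (re)mark j only if this path's minimum beats its current mark
            if candidate > mark.get(j, float("-inf")):
                mark[j] = candidate
                queue.append(j)        # j's neighbors must be processed again
    return mark
```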
OK, answering my own question here, just to get some feedback on the tentative solution I worked out before posting:
Each node stores a "path fragment": the entire path to itself so far.
0) set current vertex to the starting vertex
1) Generate all path fragments from this vertex and add them to a priority queue
2) Take the fragment off the top of the priority queue, and set the current vertex to the ending vertex of that path
3) If the current vertex is the target vertex, then return the path
4) goto 1
I'm not sure this will find the best path, though; I think the exit condition in step 3 is a little ambitious. I can't think of a better exit condition, though: since this algorithm doesn't close vertices (a vertex can appear in as many path fragments as it likes), you can't just wait until all vertices are closed (as in Dijkstra's, for example).
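A sketch of the path-fragment idea, with two assumptions of this sketch. First, the post doesn't say how the priority queue is ordered, so this version pops the fragment whose smallest edge weight (its bottleneck) is largest; Python's heapq is a min-heap, hence the negated widths. Second, fragments never revisit a vertex already on their own path; without that, a cycle of heavy edges could keep producing fragments forever. The function name is hypothetical, and the number of fragments can grow exponentially in the worst case.

```python
import heapq

def best_first_fragments(adj, start, target):
    """adj: {node: [(neighbor, edge_weight), ...]}. Returns (path, width) for
    the first popped fragment that ends at target, or (None, None)."""
    heap = [(-float("inf"), [start])]   # (negated bottleneck, path fragment)
    while heap:
        neg_width, path = heapq.heappop(heap)
        current = path[-1]
        if current == target:           # step 3: exit on reaching the target
            return path, -neg_width
        for neighbor, edge_weight in adj[current]:
            if neighbor not in path:    # simple fragments only (assumption)
                width = min(-neg_width, edge_weight)
                heapq.heappush(heap, (-width, path + [neighbor]))
    return None, None
```

Because extending a fragment can never raise its bottleneck, every fragment still in the queue when one ending at the target is popped is at most as wide.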
You can still use Dijkstra's!
Instead of using +, use the min() operator.
In addition, you'll want to orient the heap/priority_queue so that the biggest things are on top.
Something like this should work (I've probably missed some implementation details):
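import heapq

def max_min_widths(graph, start):
    # graph: {node: [(neighbor, edge_weight), ...]}
    # heapq is a min-heap, so widths are negated to keep the biggest on top
    pq = [(-float("inf"), start)]    # (negated minimum edge so far, node)
    best = {}                        # node -> final max-min width
    while pq:
        neg_width, node = heapq.heappop(pq)
        if node in best:
            continue                 # stale entry: node already settled
        best[node] = -neg_width      # first pop of a node is optimal
        for neighbor, edge_weight in graph[node]:
            if neighbor not in best:
                # push instead of decrease_key; stale copies are skipped above
                heapq.heappush(pq, (-min(best[node], edge_weight), neighbor))
    return best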
It is guaranteed that whenever a node is popped for the first time, you followed an optimal path to it (since you find all possibilities in decreasing order, and you can never improve a path by adding an edge).
The time bounds are the same as Dijkstra's: O(E log V) with a binary heap.
EDIT: oh wait, this is basically what you posted. LOL.

Algorithm to establish ordering amongst a set of items

I have a set of students (referred to as items in the title for generality). Some of these students have a reputation for being rambunctious. We are told about a set of hate relationships of the form 'i hates j'; 'i hates j' does not imply 'j hates i'. We have to arrange the students in rows (the front-most row is numbered 1) such that if 'i hates j', then i is put in a row with a strictly lower number than j's (in other words, in some row in front of j's row), so that i doesn't throw anything at j (turning back is not allowed). What would be an efficient algorithm to find the minimum number of rows needed (each row need not have the same number of students)?
We will make the following assumptions:
1) If we model this as a directed graph, there are no cycles in the graph. The most basic case: if 'i hates j' is true, then 'j hates i' is false, because otherwise I think the ordering would become impossible.
2) Every student in the group is hated by at least one other student OR hates at least one other student. Of course, some students are both hated by some and in turn hate others. This means there are no stray students who aren't part of the graph.
Update: I have already thought of constructing a directed graph with an edge i --> j if 'i hates j' and doing a topological sort. However, a plain topological sort would suit better if I had to line all the students up in a single line. Since rows are involved here, I am trying to figure out how to adapt the topological sort so it gives me what I want.
When you answer, please state the complexity of your solution. If anybody is giving code and you don't mind the language, then I'd prefer Java but of course any other language is just as fine.
JFYI This is not for any kind of homework (I am not a student btw :)).
It sounds to me that you need to investigate topological sorting.
This problem is essentially the longest-path problem in a directed graph in disguise. The number of rows is the number of nodes on that path (its number of edges + 1).
Assuming the graph is acyclic, the solution is topological sort.
Acyclic is a bit stronger than your assumption 1: not only is A -> B together with B -> A invalid, but so is A -> B, B -> C, C -> A, and any cycle of any length.
HINT: the question is how many rows are needed, not which student in which row. The answer to the question is the length of the longest path.
This comes from project management theory (or scheduling theory; I don't know the exact term). There the task is to order jobs (a vertex is a job, an arc is an ordering relationship between jobs).
Obviously we have a connected directed graph without cycles. There is an arc from vertex a to vertex b if and only if a hates b. Let's assume there is a source vertex (without incoming arcs) and a destination vertex (without outgoing arcs); if that is not the case, just add imaginary ones. Now we want to find the length of a longest path from source to destination (it will be the number of rows - 1, but mind the imaginary vertices).
We define the vertex rank r[v] as the number of arcs on a longest path from the source to vertex v. Obviously we want to know r[destination]. Algorithm for finding the rank:
0) r_0[v] := 0 for all vertices v
repeat
t) r_t[end(j)] := max( r_{t-1}[end(j)], r_{t-1}[start(j)] + 1 ) for all arcs j
until r_t[end(j)] = r_{t-1}[end(j)] for all arcs j // i.e. no changes on this iteration
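On each step at least one vertex increases its rank. Each step costs O(|E|), and after t steps every vertex whose longest incoming path has at most t arcs already has its final rank, so at most n steps are needed. Therefore, in this form the complexity is O(n*|E|) = O(n^3).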
By the way, this algorithm also gives you student distribution among rows. Just group students by their respective ranks.
Edit: another version of the same idea, possibly easier to understand.
# Python
# V is a list of vertex indices, let it be something like V = range(N)
# source has index 0, destination has index N-1
# E is a list of edges, i.e. tuples of the form (start vertex, end vertex)
R = [0] * len(V)
changes = True
while changes:  # Python has no do-while: repeat until a pass changes nothing
    changes = False
    for start, end in E:
        if R[end] < R[start] + 1:
            changes = True
            R[end] = R[start] + 1
# The answer is derived from the value of R[N-1]
Of course this is the simplest implementation. It can be optimized, and the time bound can be improved.
Edit2: an obvious optimization is to update only vertices adjacent to those that were updated on the previous step, i.e. introduce a queue of vertices whose rank was updated. Also, edges should be stored in adjacency lists. Each vertex can enter the queue at most (its rank) times, and a vertex's rank never exceeds N, the number of vertices. Therefore the total number of queue operations will not exceed O(N^2) (with O(N*|E|) edge checks in total).
Essentially, the important thing in assumption #1 is that there must not be any cycles in this graph. If there are any cycles, you can't solve this problem.
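A sketch of the Edit2 optimization, assuming vertices 0..n-1 and a list of (start, end) arcs of an acyclic graph; ranks_with_queue is a hypothetical name. Every vertex starts in the queue, and a vertex is enqueued again only when its rank increases, so only arcs leaving updated vertices are re-examined. Without an explicit source vertex, r[v] is the number of arcs on the longest path ending at v.

```python
from collections import deque

def ranks_with_queue(n, edges):
    """Longest-path ranks using a queue of vertices whose rank changed.
    Vertices are 0..n-1; edges are (start, end) arcs of a DAG."""
    out = [[] for _ in range(n)]          # adjacency lists
    for start, end in edges:
        out[start].append(end)
    r = [0] * n
    queue = deque(range(n))               # initially every vertex counts as updated
    while queue:
        u = queue.popleft()
        for v in out[u]:
            if r[v] < r[u] + 1:
                r[v] = r[u] + 1
                queue.append(v)           # v's rank changed: revisit its arcs
    return r
```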
I would start by seating all the students who do not hate any other student in the back row. Then you can seat the students who hate these students in the row in front of it, and so on.
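This back-to-front seating can be sketched as repeatedly peeling off the students who hate nobody still unseated. The input format (an iterable of student ids plus a set of (i, j) pairs meaning 'i hates j') and the name seat_from_back are assumptions. Rows are numbered from the front as in the question, so the first layer peeled off gets the largest number. Each pass scans all remaining pairs, so this straightforward version is O(N^3) in the worst case.

```python
def seat_from_back(students, hates):
    """Return {student: row}, row 1 being the front. Raises ValueError if
    the hate relation contains a cycle."""
    remaining = set(students)
    layers = []                        # layers[0] is the back row
    while remaining:
        # students who hate nobody still unseated can sit in the current back row
        back = {s for s in remaining
                if not any((s, j) in hates for j in remaining)}
        if not back:
            raise ValueError("hate relation has a cycle")
        layers.append(back)
        remaining -= back
    rows = {}
    for depth, layer in enumerate(layers):
        for s in layer:
            rows[s] = len(layers) - depth   # back layer gets the largest number
    return rows
```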
The number of rows is the length of the longest path in the directed graph, plus one. As a limit case, if there is no hate relationship everyone can fit on the same row.
To allocate the rows, put everyone who is not hated by anyone else on the row one. These are the "roots" of your graph. Everyone else is put on row N + 1 if N is the length of the longest path from any of the roots to that person (this path is of length one at least).
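The row rule above can be computed in one pass over a topological order (Kahn's algorithm), which also gives the complexity the question asks for: O(V + E) for V students and E hate relations. This is a sketch; the name rows_by_longest_path and the input format (student ids plus (i, j) pairs meaning 'i hates j') are assumptions, and it raises an error if the relation has a cycle.

```python
from collections import deque

def rows_by_longest_path(students, hates):
    """Row 1 for everyone nobody hates; otherwise 1 + the length of the
    longest hate path ending at that student."""
    succ = {s: [] for s in students}
    indegree = {s: 0 for s in students}
    for i, j in hates:                  # edge i -> j means 'i hates j'
        succ[i].append(j)
        indegree[j] += 1
    row = {s: 1 for s in students}
    queue = deque(s for s in students if indegree[s] == 0)   # the roots
    processed = 0
    while queue:
        i = queue.popleft()
        processed += 1
        for j in succ[i]:
            row[j] = max(row[j], row[i] + 1)   # j must sit behind i
            indegree[j] -= 1
            if indegree[j] == 0:
                queue.append(j)
    if processed != len(row):
        raise ValueError("hate relation has a cycle")
    return row
```

The minimum number of rows is then max(row.values()).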
A simple O(N^3) algorithm is the following:
# S: list of students; hates: set of (q, s) pairs meaning "q hates s" (assumed input)
row = {s: -1 for s in S}              # -1 = not yet allocated
rownum = 0                            # start from first row below
flag = True                           # when to finish
while flag:
    rownum = rownum + 1               # proceed to next row
    flag = False
    for s in S:
        if row[s] != -1:
            continue                  # already allocated
        # Check if there is a student q who will sit
        # on this or a later row and who hates s
        ok = not any(row[q] in (-1, rownum) and (q, s) in hates
                     for q in S)
        if ok:                        # can put s here
            row[s] = rownum
            flag = True
# the last pass seats nobody, so rownum - 1 rows are used
Simple answer: 1 row.
Put all students in the same row.
Actually, that might not solve the question as stated: it asks for a lower-numbered row, not the same row...
Put all students in row 1.
For each hate relation, move the hated student to a row behind the hating student.
Iterate until nothing changes, or iterate Num(relation) times.
But I'm sure there are better algorithms; look at acyclic graphs.
Construct a relationship graph in which 'i hates j' gives a directed edge from i to j. The end result is a directed graph. It must be a DAG; otherwise there is no solution, since circular hate relationships can't be resolved.
Now simply do a DFS, and in the post-node callback (i.e. once the DFS of all the children is done and before returning from the DFS call for this node), check the row numbers of all the children and assign this node the maximum child row + 1. If someone doesn't hate anyone (a node with an empty adjacency list), simply assign them row 0.
Once all the nodes are processed, reverse the row numbers. This is easy: find the maximum assigned row and set each row number to max - (already assigned row number).
Here is the sample code.
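#include <algorithm>
#include <vector>
using namespace std;
using Graph = vector<vector<int>>;   // g[i] = students that i hates
vector<int> row;                      // row assigned to each student
vector<bool> visited;

// Post-node callback: runs after the DFS of all of node's children is done.
void postNodeCb(const Graph &g, int node)
{
    int r = 0;                        // no adjacency list: row 0
    for (int child : g[node])         // otherwise: max child row + 1
        r = max(r, row[child] + 1);
    row[node] = r;
}

void graphTraverseDfs(const Graph &g, int node)
{
    visited[node] = true;
    for (int child : g[node])
        if (!visited[child]) graphTraverseDfs(g, child);
    postNodeCb(g, node);
}

int main()
{
    Graph g;                          // ... build the hate graph here ...
    const int NUM_VER = g.size();
    row.assign(NUM_VER, 0);
    visited.assign(NUM_VER, false);
    for (int i = 0; i < NUM_VER; i++)
        if (!visited[i]) graphTraverseDfs(g, i);
}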