I have a graph in the form of a rectangular grid, i.e. N nodes and about 2N edges, where all adjacent nodes are connected.
This means it is two-colourable, and hence it is possible to do bipartite matching on it.
Each (undirected) edge has a weight assigned to it - either -2, -1, 0, 1 or 2. No other values are allowed.
How would I go about finding the matching on this graph that maximises the sum of the weights in the matching? Pseudocode would be nice; don't bother with specific languages.
Ideally, I am looking for an algorithm that runs in quadratic time - maybe O(n^2 log n) at worst.
Before you propose a solution: I have tried doing a maximum matching using edges of weight 2, then of weight 1 (without going over edges of weight 2). I have scored 98% with this implementation (the problem is from an informatics olympiad), and I am wondering what the 100% solution is.
Not sure why you are thinking of min cut. A cut is not guaranteed to give you a matching in this case. What you need to do is solve the assignment problem. The successive shortest path algorithm solves it in O(EV log V), which in your case is O(n^2 log n).
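Not part of the answer above, but a handy way to sanity-check a 100% solution: a general-purpose maximum-weight matching routine gives the exact optimum directly (it simply never uses edges that would lower the sum). A minimal sketch, assuming the networkx library is available; the grid size and the random weights here are made up:

    import random
    import networkx as nx

    ROWS, COLS = 6, 8                      # hypothetical grid size
    G = nx.Graph()
    for r in range(ROWS):
        for c in range(COLS):
            if r + 1 < ROWS:               # vertical edge
                G.add_edge((r, c), (r + 1, c), weight=random.choice([-2, -1, 0, 1, 2]))
            if c + 1 < COLS:               # horizontal edge
                G.add_edge((r, c), (r, c + 1), weight=random.choice([-2, -1, 0, 1, 2]))

    # Blossom-based maximum-weight matching; O(n^3), so only useful for checking,
    # not the O(n^2 log n) assignment-style approach suggested above.
    matching = nx.max_weight_matching(G, maxcardinality=False)
    print(sum(G[u][v]["weight"] for u, v in matching))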
I am given a directed acyclic graph G = (V,E), which can be assumed to be topologically ordered (if needed). The edges in G have two types of costs - a nominal cost w(e) and a spiked cost p(e).
The goal is to find the shortest path from a node s to a node t which minimizes the following cost:
sum_e (w(e)) + max_e (p(e)), where the sum and maximum are taken over all edges in the path.
Standard dynamic programming methods show that this problem is solvable in O(E^2) time. Is there a more efficient way to solve it? Ideally, an O(E*polylog(E,V)) algorithm would be nice.
---- EDIT -----
This is the O(E^2) solution I found using dynamic programming.
First, order all costs p(e) in ascending order. This takes O(E log E) time.
Second, define the state space consisting of states (x,i) where x is a node in the graph and i is in 1,2,...,|E|. It represents "We are in node x, and the highest edge weight p(e) we have seen so far is the i-th largest".
Let V(x,i) be the length of the shortest path (in the classical sense) from s to x, where the highest p(e) encountered was the i-th largest. It's easy to compute V(x,i) given V(y,j) for any predecessor y of x and any j in 1,...,|E| (there are two cases to consider: either the edge y->x is the one with the i-th largest weight, or it is not).
At every state (x,i), this computation finds the minimum of about deg(x) values. Thus the complexity is O(|E| * sum_{x in V} deg(x)) = O(|E|^2), as each node is associated with |E| different states.
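For concreteness, here is a sketch of that O(E^2) dynamic program. It assumes the DAG is given as a list of directed (u, v, w, p) edge tuples over nodes 0..n-1, where w is the nominal cost and p the spiked cost; ranks are taken ascending rather than descending (which changes nothing), the table uses quadratic memory, and all names are illustrative.

    import math

    def min_sum_plus_max(n, edges, s, t):
        """edges: directed (u, v, w, p) tuples of a DAG on nodes 0..n-1.
        Returns min over s-t paths of sum(w) + max(p)."""
        E = len(edges)
        # Rank edges by spiked cost p, ascending; ranks are unique per edge.
        order = sorted(range(E), key=lambda e: edges[e][3])
        rank = [0] * E
        for i, e in enumerate(order):
            rank[e] = i
        p_of_rank = [edges[e][3] for e in order]

        # Incoming edge lists and a topological order (Kahn's algorithm).
        pred = [[] for _ in range(n)]
        succ = [[] for _ in range(n)]
        indeg = [0] * n
        for idx, (u, v, w, p) in enumerate(edges):
            pred[v].append(idx)
            succ[u].append(v)
            indeg[v] += 1
        topo, stack = [], [x for x in range(n) if indeg[x] == 0]
        while stack:
            x = stack.pop()
            topo.append(x)
            for y in succ[x]:
                indeg[y] -= 1
                if indeg[y] == 0:
                    stack.append(y)

        INF = math.inf
        # V[x][i]: min sum of w over s->x paths whose largest-p edge has rank i.
        V = [[INF] * E for _ in range(n)]
        # prefix[x][i] = min over j <= i of V[x][j], filled once x is processed.
        prefix = [None] * n
        for x in topo:
            for eidx in pred[x]:
                y, _, w, _ = edges[eidx]
                r = rank[eidx]
                # Case 1: edge y->x is the largest-p edge of the path.
                below = 0 if y == s else prefix[y][r]
                V[x][r] = min(V[x][r], w + below)
                # Case 2: the largest-p edge (some rank i > r) came earlier.
                for i in range(r + 1, E):
                    V[x][i] = min(V[x][i], V[y][i] + w)
            running, pm = INF, [INF] * E
            for i in range(E):
                running = min(running, V[x][i])
                pm[i] = running
            prefix[x] = pm

        return min((V[t][i] + p_of_rank[i] for i in range(E) if V[t][i] < INF),
                   default=INF)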
I don't see any way to get the complexity you want. Here's an algorithm that I think would be practical in real life.
First, reduce the graph to only the vertices and edges that lie on some path between s and t, and do a topological sort so that you can easily find shortest paths in O(E) time.
Let W(m) be the minimum sum(w(e)) cost over paths with max(p(e)) <= m, and let P(m) be the smallest max(p(e)) among those shortest paths. The problem solution corresponds to W(m)+P(m) for some cost m. Note that we can find W(m) and P(m) simultaneously in O(E) time by finding a shortest W-cost path, using P-cost to break ties.
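A sketch of that O(E) step, assuming the reduced graph comes as directed (u, v, w, p) edge tuples with a precomputed topological order topo; the function name and signature are illustrative:

    import math

    def w_and_p(n, edges, topo, s, t, m):
        """W(m) and P(m) in one O(E) pass: among s-t paths restricted to edges
        with p <= m, minimise sum(w) first and max(p) second."""
        adj = [[] for _ in range(n)]
        for u, v, w, p in edges:
            if p <= m:
                adj[u].append((v, w, p))
        UNREACHED = (math.inf, math.inf)
        best = [UNREACHED] * n            # (sum of w, max of p) per node
        best[s] = (0, -math.inf)          # empty path at the source
        for x in topo:                    # nodes in topological order
            if best[x] == UNREACHED:
                continue
            wx, px = best[x]
            for v, w, p in adj[x]:
                cand = (wx + w, max(px, p))
                if cand < best[v]:        # lexicographic comparison
                    best[v] = cand
        return best[t]                    # (W(m), P(m)); (inf, inf) if t is cut off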
The relevant values for m are the p(e) costs that actually occur, so make a sorted list of those. Then use a variant of Kruskal's algorithm to find the smallest m that connects s to t, and calculate P(infinity) to find the largest relevant m.
Now we have an interval [l,h] of m-values that might be the best. The best possible result in the interval is W(h)+P(l). Make a priority queue of intervals ordered by best possible result, and repeatedly remove the interval with the best possible result, and:
stop if the best possible result equals an actual result W(l)+P(l) or W(h)+P(h);
stop if there are no p(e) costs between l and P(h);
stop if the difference between the best possible result and an actual result is within some acceptable tolerance; or
stop if you have exceeded some computation budget;
otherwise, pick a p(e) cost c between l and P(h), find a shortest path to get W(c) and P(c), split the interval into [l,c] and [c,h], put the two halves back in the priority queue, and repeat.
The worst-case complexity to get an exact result is still O(E^2), but there are many potential savings and a lot of flexibility in how to stop.
This is only a 2-approximation, not an approximation scheme, but perhaps it inspires someone to come up with a better answer.
Using binary search, find the minimum spiked cost θ* such that, letting C(θ) be the minimum nominal cost of an s-t path using edges with spiked cost ≤ θ, we have C(θ*) = θ*. Every solution has either nominal or spiked cost at least as large as θ*, hence θ* leads to a 2-approximate solution.
Each test in the binary search involves running Dijkstra on the subgraph of edges with spiked cost ≤ θ, hence this algorithm takes time O(|E| log^2 |E|) - well, if you want to be technical about it and use Fibonacci heaps, O((|E| + |V| log |V|) log |E|).
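Each binary-search test then amounts to a threshold-restricted Dijkstra, something like the sketch below (assuming non-negative nominal costs, as Dijkstra requires; the graph format and names are illustrative):

    import heapq, math

    def restricted_nominal_cost(n, edges, s, t, theta):
        """C(theta): minimum sum of nominal costs w over s-t paths that only use
        edges whose spiked cost p is <= theta (math.inf if no such path exists).
        edges is a list of directed (u, v, w, p) tuples over nodes 0..n-1."""
        adj = [[] for _ in range(n)]
        for u, v, w, p in edges:
            if p <= theta:
                adj[u].append((v, w))
        dist = [math.inf] * n
        dist[s] = 0
        heap = [(0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue                  # stale queue entry
            if u == t:
                return d                  # t settled: done
            for v, w in adj[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (d + w, v))
        return dist[t]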
Some definitions first: an "accumulated graph" is a graph to which edges are added over time. The topic is the orientation problem (making an undirected graph directed), where the goal is to make the maximum outdegree as low as possible.
Now, they gave this algorithm: "When the edge (u,v) is added, we will direct it from the vertex with the lower outdegree to the one with the higher outdegree. If they are equal, choose randomly."
For example, if outdegree(u) = 3 and outdegree(v) = 4, and we just added (u,v), the algorithm will direct it u --> v. If their outdegrees were equal, both u --> v and v --> u would be possible.
The question is to prove an upper bound of O(log n) for the maximum outdegree possible. The second question is to form a graph where this algorithm will lead to Omega(log n) maximum outdegree.
So far, all I can point out is that the question is wrong.
First of all, my friend suggested that they actually meant O(log m) [m = # of edges], since for n vertices in an undirected graph we have at most n(n-1)/2 edges, which is O(n^2). If we assume that in a complete graph every node's outdegree is log(n), we get a total of n*log(n), which is significantly smaller than n^2 (and doesn't make sense, because all the outdegrees should sum to the # of edges).
I came up with this construction, which shows that because we choose randomly when the outdegrees are equal, the highest possible outdegree is O(n):
Direct edges from n-1 vertices to the last one (possible in the worst-case scenario, where all the tie-breaks go in the same direction). We now have n-1 vertices with an outdegree of 1, and one with 0. Then repeat recursively among the remaining vertices, so the outdegrees accumulate as (n-1)+(n-2)+...+1+0.
I might be understanding the problem wrong but that's what I got so far.
EDIT: The instructor edited the question and said the graph in this question is a forest (which means you can have at most V-1 edges).
I believe I managed to prove the first question:
Imagine this: since the only way (in the worst case) to reach an outdegree of 2 is to connect two nodes that each already have an outdegree of 1, we need 4 nodes, where A is connected to B and C is connected to D, so that adding the edge between A and C gives one of them an outdegree of 2. Because this is the smallest tree that can force an outdegree of 2, the only way to force an outdegree of 3 is to build two identical such trees and connect them together again. If we repeat this process until we reach n vertices, the number of vertices doubles each time the maximum outdegree grows by one (1, 2, 4, 8, ...), so the outdegree we end up with is about log n.
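As a quick sanity check of that doubling construction (not part of the original exercise), here is a small simulation; ties are broken in the worst possible way, standing in for unlucky random choices, and the maximum outdegree it reports grows as log2 of the number of vertices. All names are illustrative.

    def build_edges(lo, hi, edges):
        """Recursively lay out the doubling construction on nodes lo..hi-1
        (hi - lo is a power of two); the 'root' of the block is node lo."""
        if hi - lo == 1:
            return
        mid = lo + (hi - lo) // 2
        build_edges(lo, mid, edges)
        build_edges(mid, hi, edges)
        edges.append((lo, mid))            # finally connect the two roots

    def orient_online(n, edges):
        """The greedy rule from the question: direct each new edge away from the
        endpoint with the smaller outdegree; ties go adversarially to u -> v."""
        outdeg = [0] * n
        for u, v in edges:
            if outdeg[u] <= outdeg[v]:
                outdeg[u] += 1             # direct u -> v
            else:
                outdeg[v] += 1             # direct v -> u
        return max(outdeg)

    for k in range(1, 11):
        n = 2 ** k
        edges = []
        build_edges(0, n, edges)
        print(n, orient_online(n, edges))  # prints k = log2(n) as the max outdegree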
Right now I keep thinking about the second question.
First of all, because m = O(n^2), we have O(log m) = O(log(n^2)) = O(2 log n) = O(log n). In other words, the problem doesn't contradict the reasoning suggested by your friend.
But you are right that the problem (as you've stated it) is not correct -- the worst-case outdegree is O(n), using the process you described. (But the highest outdegree is n-1, not n as you've written.)
Perhaps the problem actually asks you to bound the maximum expected outdegree?
I strongly suspect that the intended problem was to bound the competitive ratio between this online algorithm and the offline optimum (e.g., if the offline optimum has maximum out degree 1, then the online solution has maximum out degree O(log n); if the offline optimum has maximum out degree √n, then the online solution has maximum out degree O((√n) log n)). The lower bound is easy enough (see the lower bound for union-by-rank union-find for inspiration), and I conjecture that the upper bound is attainable as well.
Here's the problem.
A weighted undirected connected graph G is given. The weights are constant. The task is to come up with an algorithm that would find the total weight of a spanning tree for G that fulfills these two conditions (ordered by priority):
The spanning tree has to have maximum number of edges with the same weight (the actual repeated weight value is irrelevant);
The total spanning tree weight should be minimized. That means, for example, that the spanning tree T1 with weight 120 that has at most 4 edges with the same weight (and the weight of each of those four is 15) should be preferred over the spanning tree T2 with weight 140 that has at most 4 edges with the same weight (and the weight of each of those four is 8).
I've been stuck on that for quite a while now. I've already implemented Boruvka's MST-search algorithm for the graph, and now I'm not sure whether I should perform additional operations after the MST is found, or whether it's better to modify the MST-search algorithm itself.
Any suggestions welcome!
This can be done naively in O(m^2), and without much effort in O(mn). It seems like it is possible to do it even faster, maybe something like O(m log^2(n)) or even O(m log(n)) with a little work.
The basic idea is this. For any weight k, let MST(k) be a spanning tree which contains the maximum possible number of edges of weight k, and has minimum overall weight otherwise. There may be multiple such trees, but they will all have the same total weight, so it doesn't matter which one you pick. MST(k) can be found by using any MST algorithm and treating all edges of weight k as having weight -Infinity.
The solution to this problem will be MST(k) for some k. So the naive solution is to generate all the MST(k) and pick the one with the maximum number of identical edge weights and then the minimum overall weight. Using Kruskal's algorithm, you can do this in O(m^2), since you only have to sort the edges once.
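A sketch of that naive version, assuming the graph comes as (u, v, w) edge triples over nodes 0..n-1 and is connected; for brevity it re-sorts the edges for every k, whereas the O(m^2) bound above comes from sorting only once. All names are illustrative.

    class DSU:
        def __init__(self, n):
            self.parent = list(range(n))
        def find(self, x):
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]   # path halving
                x = self.parent[x]
            return x
        def union(self, a, b):
            ra, rb = self.find(a), self.find(b)
            if ra == rb:
                return False
            self.parent[ra] = rb
            return True

    def mst_favouring(n, edges, k):
        """Kruskal's algorithm with weight-k edges treated as -infinity.
        Returns (#weight-k edges used, total weight under the original weights)."""
        dsu = DSU(n)
        count = total = 0
        for u, v, w in sorted(edges, key=lambda e: (e[2] != k, e[2])):
            if dsu.union(u, v):
                total += w
                count += (w == k)
        return count, total

    def best_spanning_tree(n, edges):
        """Try every distinct weight k; prefer more repeated edges, then lower total."""
        candidates = [mst_favouring(n, edges, k) for k in {w for _, _, w in edges}]
        return max(candidates, key=lambda c: (c[0], -c[1]))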
This can be improved to O(mn) by first finding the MST using the original weights, and then for each k, modifying the tree to MST(k) by reducing the weight of each edge of weight k to -Infinity. Updating an MST for a reduced edge weight is an O(n) operation, since you just need to find the maximum weight edge in the corresponding fundamental cycle.
To do better than O(mn) using this approach, you would have to preprocess the original MST in such a way that these edge weight reductions can be performed more quickly. It seems that something like a heavy path decomposition should work here, but there are some details to work out.
Say there exists a directed graph G(V, E) (V represents vertices and E represents edges), where each edge (x, y) is associated with a weight w(x, y), and the weight is an integer between 1 and 10.
Assume s and t are some vertices in V.
I would like to compute the shortest path from s to t in time O(m + n), where n is the number of vertices and m is the number of edges.
Would I be on the right track in implementing topological sort to accomplish this? Or is there another technique that I am overlooking?
The algorithm you need to use for finding the minimal path from a given vertex to another in a weighted graph is Dijkstra's algorithm. Unfortunately its complexity is O(n*log(n) + m), which may be more than what you are trying to achieve.
However, in your case the edges are special - their weights can take only 10 valid values. Thus you can implement a special data structure (a kind of heap that takes advantage of the small range of weights) to make all operations constant time.
One possible way to do that is to have 10 lists - one for each weight. Adding an edge to the data structure is simply an append to a list. Finding the minimum element is an iteration over the 10 lists to find the first one that is non-empty; this is still constant, as no more than 10 iterations will be performed. Removing the minimum element is also pretty straightforward - a simple removal from a list.
Using Dijkstra's algorithm with a data structure of that kind will give you the asymptotic complexity you need.
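One concrete way to realise that idea is a bucket queue keyed by tentative distance (Dial's algorithm), a close cousin of the 10-list structure described above. A sketch, with illustrative names, assuming integer weights in 1..10:

    def dial_shortest_path(n, adj, s, t, C=10):
        """Dijkstra with a cyclic bucket queue: integer edge weights in 1..C mean
        all unsettled tentative distances fit in a window of C+1 buckets.
        adj[u] is a list of (v, w) pairs; returns the s-t distance (inf if none)."""
        INF = float("inf")
        dist = [INF] * n
        dist[s] = 0
        buckets = [[] for _ in range(C + 1)]
        buckets[0].append(s)
        for d in range(C * (n - 1) + 1):          # largest possible distance
            bucket = buckets[d % (C + 1)]
            while bucket:
                u = bucket.pop()
                if dist[u] != d:
                    continue                      # stale entry, u already settled
                if u == t:
                    return d
                for v, w in adj[u]:
                    if d + w < dist[v]:
                        dist[v] = d + w
                        buckets[(d + w) % (C + 1)].append(v)
        return dist[t]

With C fixed at 10, the whole thing runs in O(m + 10n) = O(m + n).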
Suppose that the number of edges of a connected graph is known and the weight of each edge is distinct. Would it be possible to create a minimum spanning tree in linear time?
To do this we must look at each edge, and during this loop we cannot perform any searches, otherwise it would result in at least n log n time. I'm not sure how to do this without searching in the loop. It would mean that, somehow, we must only look at each edge once and decide whether to include it or not based on some "static" previous values that do not involve a growing data structure.
So... let's say we keep the endpoints of the edge in question, then look at the next edge; if the next edge has the same endpoints as the previous one, compare the weights of the two and keep the lower one. If the current edge's endpoints are not equal to the previous edge's, then it is in a different component... and now I am stuck, because we cannot create a hash or array to keep track of the component nodes that have already been added while looking through each edge in linear time.
Another approach I thought of is to find the edge with the minimal weight; since the edge weights are distinct, this edge will be part of any MST. Then... I am stuck, since we cannot repeat this for n - 1 edges in linear time.
Any hints?
EDIT
What if we know the number of nodes, the number of edges, and also that each edge weight is distinct? Say, for example, there are n nodes and n + 6 edges.
Then we would only have to find and remove the correct 7 edges, correct?
To the best of my knowledge there is no way to compute an MST faster just by knowing how many edges there are in the graph and that their weights are distinct. In the worst case, you have to look at every edge in the graph before finding the minimum-cost edge (which must be in the MST), and that alone takes Ω(m) time. Therefore, I'll claim that any MST algorithm must take Ω(m) time in the worst case.
However, since we're already doing Ω(m) work in the worst case, we could add the following preprocessing step to any MST algorithm:
Scan over the edges and count up how many there are.
Add an epsilon value to each edge weight to ensure the edges are unique.
This can be done in O(m) time as well. Consequently, if there were a way to speed up MST computation by knowing the number of edges and that the edge costs are distinct, we could just run this preprocessing step on top of any current MST algorithm to try to get faster performance. Since, to the best of my knowledge, no MST algorithm actually tries to do this for performance reasons, I suspect that there isn't a (known) way to get a faster MST algorithm out of this extra knowledge.
Hope this helps!
There's a famous randomised algorithm for minimum spanning trees whose expected running time is linear in the number of edges. See "A randomized linear-time algorithm to find minimum spanning trees" by Karger, Klein, and Tarjan.
The key result in the paper is their "sampling lemma": if you independently select each edge with probability p and find the minimum spanning forest of the sampled subgraph, then in expectation only |V|/p of the remaining edges are lighter than the heaviest edge on the forest path connecting their endpoints.
As templatetypedef noted, you can't beat linear time. That all edge weights are distinct is a common assumption that simplifies the analysis; if anything, it makes MST algorithms run a little slower.
The fact that the number of edges (N) is known does not influence the complexity in any way. N is still a finite but unbounded variable, and each graph will have a different N. If you place an upper bound on N, say 1 million, then the complexity is O(1 million * log(1 million)) = O(1).
The fact that each edge has a distinct weight does not influence the program either, because it says nothing about the graph's structure. Therefore, knowledge about the current case cannot influence further processing, as we cannot predict what the graph's structure will look like in the next step.
If the number of edges is close to n, like in this case n + 6 (after the edit), we know that we only need to remove 7 edges, as every spanning tree has exactly n-1 edges.
The cycle property says that the most expensive edge in a cycle does not belong to any minimum spanning tree (assuming all edge weights are distinct) and thus should be removed.
Now you can simply apply BFS or DFS to identify a cycle and remove its most expensive edge. So, overall, we need to run BFS 7 times. This takes 7*n time and gives us a time complexity of O(n). Again, this is only true when the number of edges is close to the number of nodes.
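A sketch of that procedure, assuming a simple connected graph given as (u, v, w) triples with distinct weights over nodes 0..n-1 (function names are illustrative):

    from collections import deque

    class DSU:
        def __init__(self, n):
            self.parent = list(range(n))
        def find(self, x):
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]
                x = self.parent[x]
            return x
        def union(self, a, b):
            ra, rb = self.find(a), self.find(b)
            if ra == rb:
                return False
            self.parent[ra] = rb
            return True

    def drop_one_cycle_edge(n, edges):
        """Find one cycle (via the first edge that closes one) and delete its heaviest edge."""
        dsu = DSU(n)
        forest = [[] for _ in range(n)]               # tree edges added so far
        for idx, (u, v, w) in enumerate(edges):
            if dsu.union(u, v):
                forest[u].append((v, idx))
                forest[v].append((u, idx))
                continue
            # Edge idx closes a cycle: the u-v path in the forest plus this edge.
            parent = {u: (None, None)}
            queue = deque([u])
            while v not in parent:                    # BFS over tree edges only
                x = queue.popleft()
                for y, eidx in forest[x]:
                    if y not in parent:
                        parent[y] = (x, eidx)
                        queue.append(y)
            cycle = [idx]
            node = v
            while node != u:
                node, eidx = parent[node]
                cycle.append(eidx)
            heaviest = max(cycle, key=lambda e: edges[e][2])   # cycle property
            return [e for i, e in enumerate(edges) if i != heaviest]
        return edges                                  # already a spanning tree

    def mst_when_m_close_to_n(n, edges):
        """Delete m - (n - 1) edges (7 of them when m = n + 6), one cycle at a time."""
        edges = list(edges)
        for _ in range(len(edges) - (n - 1)):
            edges = drop_one_cycle_edge(n, edges)
        return edges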