Directed graph (topological sort) - performance

Say there exists a directed graph, G(V, E) (V is the set of vertices and E the set of edges), where each edge (x, y) is associated with an integer weight w(x, y) between 1 and 10.
Assume s and t are some vertices in V.
I would like to compute the shortest path from s to t in time O(n + m), where n is the number of vertices and m is the number of edges.
Would I be on the right track in implementing topological sort to accomplish this? Or is there another technique that I am overlooking?

The standard algorithm for finding a shortest path from a given vertex to another in a weighted graph is Dijkstra's algorithm. Unfortunately its complexity is O(n log n + m), which is more than the bound you are trying to achieve.
However, in your case the edges are special: their weights take only 10 distinct values. Thus you can implement a special data structure (a kind of heap that takes advantage of the small range of weights) to make all operations constant time.
One possible way to do that is to keep 10 lists, one for each weight. Adding an element to the data structure is simply an append to a list. Finding the minimum element is an iteration over the 10 lists to find the first one that is non-empty; this is still constant time, as at most 10 iterations are performed. Removing the minimum element is also straightforward: a simple removal from a list.
Running Dijkstra's algorithm on top of such a constant-time data structure gives you the bound you need.
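To make the idea concrete, here is a minimal Python sketch (the function name and adjacency-list format are my own). It implements the bucket idea as Dial's algorithm, indexing buckets by tentative distance rather than by edge weight, which achieves the same constant-time operations for bounded weights:

def dial_shortest_path(n, adj, s, t, max_w=10):
    """Dijkstra with a bucket queue (Dial's algorithm).

    adj[u] is a list of (v, w) pairs with 1 <= w <= max_w. Every
    finite shortest distance is at most max_w * (n - 1), so buckets
    indexed by tentative distance replace the heap; each queue
    operation is O(1) and the bucket scan is O(n * max_w), giving
    O(n + m) total for constant max_w.
    """
    INF = float("inf")
    dist = [INF] * n
    dist[s] = 0
    buckets = [[] for _ in range(max_w * max(n - 1, 1) + 1)]
    buckets[0].append(s)
    for d, bucket in enumerate(buckets):
        for u in bucket:                    # only later buckets may grow
            if d != dist[u]:                # stale entry: skip it
                continue
            for v, w in adj[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    buckets[d + w].append(v)
    return dist[t]

Stale bucket entries are skipped lazily instead of being deleted, which keeps every operation an O(1) list append or read.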

Related

Shortest Path in a Directed Acyclic Graph with two types of costs

I am given a directed acyclic graph G = (V,E), which can be assumed to be topologically ordered (if needed). The edges in G have two types of costs - a nominal cost w(e) and a spiked cost p(e).
The goal is to find the shortest path from a node s to a node t which minimizes the following cost:
sum_e (w(e)) + max_e (p(e)), where the sum and maximum are taken over all edges in the path.
Standard dynamic programming methods show that this problem is solvable in O(E^2) time. Is there a more efficient way to solve it? Ideally, an O(E*polylog(E,V)) algorithm would be nice.
---- EDIT -----
This is the O(E^2) solution I found using dynamic programming.
First, order all costs p(e) in ascending order. This takes O(E log E) time.
Second, define the state space consisting of states (x,i) where x is a node in the graph and i is in 1,2,...,|E|. It represents "We are in node x, and the highest edge weight p(e) we have seen so far is the i-th largest".
Let V(x,i) be the length of the shortest path (in the classical sense) from s to x, where the highest p(e) encountered was the i-th largest. It's easy to compute V(x,i) given V(y,j) for any predecessor y of x and any j in 1,...,|E| (there are two cases to consider: either the edge y->x has the j-th largest weight, or it does not).
At every state (x,i), this computation finds the minimum of about deg(x) values. Thus the complexity is O(|E| * sum_{x in V} deg(x)) = O(|E|^2), as each node is associated with |E| different states.
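For reference, here is a minimal Python sketch of an equivalent O(E^2) formulation (my restatement, not the exact state space above): for each distinct spiked cost theta, compute the ordinary shortest w-path using only edges with p(e) <= theta, and take the best dist(t) + theta over all theta. It assumes vertices 0..n-1 are numbered in topological order.

def min_sum_plus_max(n, edges, s, t):
    """edges: list of (u, v, w, p) with u < v, vertices 0..n-1 given
    in topological order. Returns the minimum of sum(w) + max(p) over
    all s-t paths, or None if t is unreachable. One O(E) DAG pass per
    distinct p value gives O(E^2) overall."""
    INF = float("inf")
    ordered = sorted(edges)                 # relax edges in topological order
    best = INF
    for theta in sorted({p for (_, _, _, p) in edges}):
        dist = [INF] * n
        dist[s] = 0
        for u, v, w, p in ordered:
            if p <= theta and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
        best = min(best, dist[t] + theta)
    return None if best == INF else best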
I don't see any way to get the complexity you want. Here's an algorithm that I think would be practical in real life.
First, reduce the graph to only vertices and edges between s and t, and do a topological sort so that you can easily find shortest paths in O(E) time.
Let W(m) be the minimum sum(w(e)) cost over paths with max(p(e)) <= m, and let P(m) be the smallest max(p(e)) among those minimum-sum paths. The problem solution corresponds to W(m)+P(m) for some cost m. Note that we can find W(m) and P(m) simultaneously in O(E) time by finding a shortest W-cost path, using P-cost to break ties.
The relevant values for m are the p(e) costs that actually occur, so make a sorted list of those. Then use a Kruskal's algorithm variant to find the smallest m that connects s to t, and calculate P(infinity) to find the largest relevant m.
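As an aside, here is a minimal sketch of that simultaneous W(m)/P(m) computation, under my reading of the tie-breaking step (vertices assumed numbered 0..n-1 in topological order):

def W_and_P(n, edges, s, t, m):
    """edges: list of (u, v, w, p) with u < v. Returns (W(m), P(m)):
    the minimum sum of w over s-t paths restricted to edges with
    p <= m, tie-broken by the smallest max p. Lexicographic relaxation
    of (sum_w, max_p) pairs is safe here because max() is monotone in
    both of its arguments."""
    INF = float("inf")
    label = [(INF, INF)] * n                # (sum of w, max of p) per vertex
    label[s] = (0, 0)
    for u, v, w, p in sorted(edges):        # topological edge order
        if p <= m and label[u][0] != INF:
            candidate = (label[u][0] + w, max(label[u][1], p))
            if candidate < label[v]:        # lexicographic comparison
                label[v] = candidate
    return label[t]                         # (INF, INF) if t is unreachable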
Now we have an interval [l,h] of m-values that might be the best. The best possible result in the interval is W(h)+P(l). Make a priority queue of intervals ordered by best possible result, and repeatedly remove the interval with the best possible result, and:
- stop if the best possible result = an actual result W(l)+P(l) or W(h)+P(h);
- stop if there are no p(e) costs between l and P(h);
- stop if the difference between the best possible result and an actual result is within some acceptable tolerance; or
- stop if you have exceeded some computation budget;
- otherwise, pick a p(e) cost t between l and P(h), find a shortest path to get W(t) and P(t), split the interval into [l,t] and [t,h], put them back in the priority queue, and repeat.
The worst case complexity to get an exact result is still O(E^2), but there are many economies and a lot of flexibility in how to stop.
This is only a 2-approximation, not an approximation scheme, but perhaps it inspires someone to come up with a better answer.
Using binary search, find the minimum spiked cost θ* such that, letting C(θ) be the minimum nominal cost of an s-t path using edges with spiked cost ≤ θ, we have C(θ*) = θ*. Every solution has either nominal or spiked cost at least as large as θ*, hence θ* leads to a 2-approximate solution.
Each test in the binary search involves running Dijkstra on the subgraph of edges with spiked cost ≤ θ, hence this algorithm takes time O(|E| log^2 |E|), or, if you want to be technical about it and use Fibonacci heaps, O((|E| + |V| log |V|) log |E|).
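A minimal sketch of this 2-approximation (the adjacency-list format and names are assumptions of mine; the binary search runs over the sorted distinct p values for the smallest theta with C(theta) <= theta):

import heapq

def dijkstra_capped(n, adj, s, t, theta):
    # C(theta): minimum nominal cost of an s-t path restricted to
    # edges with spiked cost <= theta (infinity if no such path).
    INF = float("inf")
    dist = [INF] * n
    dist[s] = 0
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                        # stale heap entry
        for v, w, p in adj[u]:
            if p <= theta and d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist[t]

def two_approx_threshold(n, adj, s, t):
    # Smallest spiked cost theta* (among the occurring p values) with
    # C(theta*) <= theta*; the path realizing C(theta*) then costs at
    # most C(theta*) + theta* <= 2 * theta*, per the argument above.
    ps = sorted({p for u in range(n) for (_, _, p) in adj[u]})
    lo, hi, best = 0, len(ps) - 1, None
    while lo <= hi:                         # C is non-increasing in theta
        mid = (lo + hi) // 2
        if dijkstra_capped(n, adj, s, t, ps[mid]) <= ps[mid]:
            best, hi = ps[mid], mid - 1
        else:
            lo = mid + 1
    return best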

Finding size of largest connected component of a graph

Suppose we have a random undirected graph G = (V,E) with n vertices, where for any two vertices u, v ∈ V, the edge {u, v} is in E with probability 1/n. We need to figure out C(n), the size of the largest connected component of the graph.
C(n) should be Θ(n^a), and we need to run some experiments to estimate a.
I am a bit confused about how to link the probability 1/n to the largest connected component; is there any way I can do so?
The process you're simulating here is called the Erdős–Rényi model. You have a collection of n nodes, and each pair of nodes has probability p of being linked. The (expected) shape of the resulting graph depends heavily on the choice of p, and there are a lot of famous results about this.
As for how to do this: one option would be to create a collection of n nodes, iterate over all pairs of nodes, and link them with probability 1/n. You can then run an algorithm like BFS or DFS over the graph to find and size the connected components.
Another option would be to use the above approach, except that instead of doing a BFS or DFS, you use a disjoint-set forest to perform the links and find the largest connected component.
Alternatively, because each edge is present with the same probability 1/n, independently of every other edge, the number of edges you have is binomially distributed and pretty tightly concentrated around (n-1)/2 total edges. You could therefore generate roughly that many random edges, add them into the graph, then use the above techniques. (This will be much faster, as it does O(n) work rather than O(n^2) work to process the edges.)
Once you've gotten this worked out, you can vary n over a large range and run some sort of regression on it (e.g., a linear fit of log C(n) against log n) to find the best-fit curve. That's something you could either code up yourself, or do by importing your data into Excel and using its regression tools.
As a spoiler, when you're done you'll find that the number of nodes in the largest connected component is Θ(n^(2/3)). If you search for "Erdős–Rényi critical case," you can find online proofs of this result. It's not a trivial result to prove (and definitely isn't obvious!), but it'll drop out of your empirical analysis.
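A minimal end-to-end sketch of the experiment (all names are mine, and the fixed edge count of (n-1)/2 is only a close stand-in for the exact binomial count):

import random

class DSU:
    """Disjoint-set forest with union by size and path halving."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

def largest_component(n):
    """One sample of (approximately) G(n, 1/n): the edge count is
    binomial with mean (n - 1) / 2, so drawing that many uniformly
    random pairs is a cheap O(n) stand-in for the exact model."""
    dsu = DSU(n)
    for _ in range((n - 1) // 2):
        u, v = random.sample(range(n), 2)   # a random distinct pair
        dsu.union(u, v)
    return max(dsu.size[dsu.find(v)] for v in range(n))

# estimate a in C(n) = Theta(n^a) from several graph sizes
for n in (1000, 10_000, 100_000):
    avg = sum(largest_component(n) for _ in range(10)) / 10
    print(n, avg)

Fitting a line to the log of the average component size against log n should give a slope near 2/3.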

Design an algorithm which finds a minimum spanning tree of this graph in linear time

I am working on a problem in which I am given an undirected graph G on n vertices and with m edges, such that each edge e has a weight w(e) ∈ {1, 2, 3}. The task is to design an algorithm which finds a minimum spanning tree of G in linear time (O(n + m)).
These are my thoughts so far:
In the Algorithmic Graph Theory course which I am currently studying, we have covered Kruskal's and Prim's MST Algorithms. Perhaps I can modify these in some way, in order to gain linear time.
Sorting the edges generally takes log-linear (O(m log m)) time; however, since all edge weights are either 1, 2 or 3, bucket sort can be used to sort the edges in time linear in the number of edges (O(m)).
I am using the following version of Kruskal's algorithm:
Kruskal(G)
  for each vertex v ∈ V do MAKE-SET(v)
  sort all edges in non-decreasing order of weight
  for each edge (u, v) ∈ E (in non-decreasing order) do
    if FIND(u) ≠ FIND(v) then
      colour (u, v) blue
      UNION(u, v)
  od
  return the tree formed by the blue edges
Also, MAKE-SET(x), UNION(x, y) and FIND(x) are defined as follows:
MAKE-SET(x)
  create a new tree rooted at x
  PARENT(x) := x

UNION(x, y)
  PARENT(FIND(x)) := FIND(y)

FIND(x)
  y := x
  while y ≠ PARENT(y) do
    y := PARENT(y)
  return y
The issue I have at the moment is that, although I can implement the first two lines of Kruskal's in linear time, I have not managed to do the same for the next four lines of the algorithm (from 'for each edge (u, v) ...' until 'UNION(u, v)').
I would appreciate hints as to how to implement the rest of the algorithm in linear time, or how to find a modification of Kruskal's (or some other minimum spanning tree algorithm) in linear time.
Thank you.
If you use the disjoint-set data structure with both path compression and union by rank, you get a data structure in which each operation's amortized complexity grows extremely slowly: it is bounded by the inverse Ackermann function, which stays tiny even for sizes such as the estimated number of atoms in the universe. Effectively, then, each operation is considered constant time, and so the rest of the algorithm runs in linear time as well.
From the same Wikipedia article:
Since α(n) is the inverse of this function, α(n) is less than 5 for all remotely practical values of n. Thus, the amortized running time per operation is effectively a small constant.
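Putting the pieces together, here is a minimal sketch of the linear-time Kruskal for weights in {1, 2, 3} (the names and edge format are mine; the graph is assumed connected): bucket sort replaces the comparison sort, and union by rank plus path compression makes each FIND/UNION effectively constant time.

def kruskal_small_weights(n, edges):
    """edges: list of (w, u, v) with w in {1, 2, 3}.
    Returns the list of MST edges in O(n + m) time."""
    # bucket sort: one bucket per possible weight
    buckets = {1: [], 2: [], 3: []}
    for w, u, v in edges:
        buckets[w].append((u, v))

    parent = list(range(n))
    rank = [0] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression (halving)
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if rank[ra] < rank[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        if rank[ra] == rank[rb]:
            rank[ra] += 1

    tree = []
    for w in (1, 2, 3):                     # edges in non-decreasing order
        for u, v in buckets[w]:
            if find(u) != find(v):          # a blue edge joins two components
                tree.append((u, v))
                union(u, v)
    return tree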

Path from s to e in a weighted DAG graph with limitations

Consider a directed graph with n nodes and m edges. Each edge is weighted. There is a start node s and an end node e. We want to find the path from s to e that has the maximum number of nodes such that:
- the total distance is less than some constant d, and
- starting from s, each node on the path is closer to e than the previous one, in terms of the edge weight of the remaining path (i.e., as you traverse the path, you keep getting closer to your destination e).
We can assume there are no cycles in the graph. There are no negative weights. Does an efficient algorithm already exist for this problem? Is there a name for this problem?
Whatever you end up doing, do a BFS/DFS starting from s first to see if e can even be reached; this only takes O(n+m), so it won't add to the complexity of the problem (you need to look at all vertices and edges anyway). Also, delete all edges with weight 0 before you do anything else, since those never fulfill your second criterion.
EDIT: I figured out an algorithm; it's polynomial, depending on the size of your graphs it may still not be sufficiently efficient though. See the edit further down.
Now for some complexity. The first thing to think about here is an upper bound on how many paths we can actually have, so depending on the choice of d and the weights of the edges, we also have an upper bound on the complexity of any potential algorithm.
How many edges can there be in a DAG? The answer is n(n-1)/2, which is a tight bound: take n vertices, order them from 1 to n; for two vertices i and j, add an edge i->j to the graph iff i<j. This sums to a total of n(n-1)/2, since this way, for every pair of vertices, there is exactly one directed edge between them, meaning we have as many edges in the graph as we would have in a complete undirected graph over n vertices.
How many paths can there be from one vertex to another in the graph described above? The answer is 2^(n-2). Proof by induction:
Base case: take the graph over 2 vertices as described above; there is 1 = 2^0 = 2^(2-2) path from vertex 1 to vertex 2: (1->2).
Induction step: assuming there are 2^(n-2) paths from the vertex numbered 1 of an n-vertex graph as described above to the vertex numbered n, increment the number of each vertex and add a new vertex 1 along with the required n edges. It has its own edge to the vertex now labeled n+1. Additionally, it has 2^(i-2) paths to that vertex for every i in [2;n] (it has all the paths the other vertices have to the vertex n+1 collectively, each "prefixed" with the edge 1->i). This gives us 1 + sum_{k=2..n} 2^(k-2) = 1 + sum_{k=0..n-2} 2^k = 1 + (2^(n-1) - 1) = 2^(n-1) = 2^((n+1)-2).
So we see that there are DAGs that have 2^(n-2) distinct paths between some pairs of their vertices; this is a bit of a bleak outlook, since depending on the weights and your choice of d, you may have to consider them all. This in itself doesn't mean we can't choose some form of optimum (which is what you're looking for) efficiently, though.
EDIT: Ok so here goes what I would do:
Delete all edges with weight 0 (and smaller, but you ruled that out), since they can never fulfill your second criterion.
Do a topological sort of the graph; in the following, let's only consider the part of the topological ordering from s to e, and call it the integer interval [s;e]. Delete everything from the graph that isn't strictly within that interval, meaning all vertices outside of it along with their incident edges. During the topological sort, you'll also be able to see whether there is a path from s to e at all, so you'll know whether there are any paths s-...->e. The complexity of this part is O(n+m).
Now the actual algorithm:
- Traverse the vertices of [s;e] in the order imposed by the topological sorting.
- For every vertex v, store a two-dimensional array of information; let's call it prev_v[][], since it stores information about the predecessors of a node on the paths leading towards it.
- In prev_v[i][j], store how long the total path of length i (counted in vertices) is, as a sum of the edge weights, if j is the predecessor of the current vertex on that path. For example, prev_{s+1}[1][s] would hold the weight of the edge s->s+1, while all other entries of prev_{s+1} would be 0/undefined.
- When calculating the array for a new vertex v, all we have to do is check its incoming edges and iterate over the arrays of the start vertices of those edges. For example, say vertex v has an incoming edge from vertex w with weight c, and consider what the entry prev_v[i][w] should be. We have an edge w->v, so we set prev_v[i][w] to min(prev_w[i-1][k] over all k, ignoring 0/undefined entries) + c (notice the subscript of the array!); we effectively take the cost of a path of length i-1 that leads to w, and add the cost of the edge w->v. Why the minimum? The vertex w can have many predecessors on paths of length i-1; however, we want to stay below a cost limit, and greedy minimization at each vertex does that for us. We need to do this for all i in [1; v-s].
- While calculating the array for a vertex, do not set entries that would give a path with cost above d; since all edges have positive weights, paths can only get more costly with each edge, so just ignore those.
- Once you have reached e and finished calculating prev_e, you're done with this part of the algorithm.
- Iterate over prev_e, starting with prev_e[e-s]; since we have no cycles, all paths are simple, and therefore the longest path from s to e can have at most e-s edges. Find the largest i such that prev_e[i] has a non-zero (meaning defined) entry; if none exists, there is no path fitting your criteria. You can reconstruct any existing path using the arrays of the other vertices.
Now that gives you a space complexity of O(n^3) and a time complexity of O(n^2 * m): the arrays have O(n^2) entries, and we have to iterate over O(m) arrays, one array for each edge. I think it's very obvious, though, where the wasteful use of data structures here can be optimized using hashing structures and other things than plain arrays.
In particular, you could use a one-dimensional array and only store the current minimum instead of recomputing it every time (you'll have to store the sum of edge weights of the path together with the predecessor vertex, since you need the predecessor to reconstruct the path). That shrinks the arrays from n^2 to n entries, since you now only need one entry per number-of-nodes-on-the-path-to-the-vertex, bringing the space complexity down to O(n^2) and the time complexity down to O(n*m). You can also try some form of topological sort that discards the vertices from which e cannot be reached, since those can safely be ignored as well.
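A minimal Python sketch of the optimized one-dimensional version (names are mine; it assumes zero-weight edges were already deleted and vertices are numbered 0..n-1 in topological order; the maximum number of nodes is the returned edge count plus one):

def longest_path_under_budget(n, edges, s, e, d):
    """edges: list of (u, v, c) with c > 0 and u before v topologically.
    best[v][i] = minimum total weight of an s->v path with i edges
    (None if no such path fits the budget d). Returns the maximum
    number of edges on an s->e path of total weight < d, or None.
    O(n*m) time, O(n^2) space."""
    best = [[None] * n for _ in range(n)]   # best[v][i]
    best[s][0] = 0
    for u, v, c in sorted(edges):           # topological edge order
        for i in range(n - 1):
            cost = best[u][i]
            if cost is not None and cost + c < d:
                if best[v][i + 1] is None or cost + c < best[v][i + 1]:
                    best[v][i + 1] = cost + c
    hits = [i for i in range(n) if best[e][i] is not None]
    return max(hits) if hits else None

For path reconstruction you would additionally store the predecessor alongside each minimum, as described above.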

Prim and Kruskal's algorithms complexity

Given an undirected connected graph with weights w: E -> {1,2,3,4,5,6,7}, meaning there are only 7 possible weights.
I need to find a spanning tree using Prim's algorithm in O(n+m) and Kruskal's algorithm in O(m*α(m,n)).
I have no idea how to do this and really need some guidance about how the weights can help me here.
You can sort the edge weights faster.
In Kruskal's algorithm you don't need an O(M lg M) sort; you can use counting sort (or any other O(M) algorithm). The final complexity is then O(M) for sorting and O(M*α(M,N)) for the union-find phase, so O(M*α(M,N)) in total.
For the case of Prim's algorithm, you don't need a heap; you need 7 lists/queues/arrays/anything with constant-time insert and retrieval, one for each weight. Then, when you are looking for the cheapest outgoing edge, you check which of these lists is non-empty, starting from the cheapest, and take an edge from it. Since 7 is a constant, the whole algorithm runs in O(N + M) time.
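A minimal sketch of that bucketed Prim (names are mine; stale queue entries are skipped lazily when popped):

from collections import deque

def prim_bucketed(n, adj, max_w=7):
    """adj[u]: list of (v, w) pairs with 1 <= w <= max_w.
    One queue of candidate edges per weight replaces the heap: finding
    the cheapest crossing edge scans at most max_w queues, and every
    edge enters a queue at most once, so the total running time is
    O(n + m) for constant max_w."""
    buckets = [deque() for _ in range(max_w + 1)]
    in_tree = [False] * n
    tree = []

    def add_vertex(u):
        in_tree[u] = True
        for v, w in adj[u]:
            if not in_tree[v]:
                buckets[w].append((u, v, w))

    add_vertex(0)                           # arbitrary start vertex
    while len(tree) < n - 1:
        edge = None
        for w in range(1, max_w + 1):       # cheapest bucket first
            while buckets[w] and in_tree[buckets[w][0][1]]:
                buckets[w].popleft()        # drop stale entries lazily
            if buckets[w]:
                edge = buckets[w].popleft()
                break
        if edge is None:                    # graph is disconnected
            break
        u, v, w = edge
        tree.append((u, v, w))
        add_vertex(v)
    return tree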
As I understand it, it is not popular to answer homework assignments, but this could hopefully be useful for other people than just you ;)
Prim:
Prim is an algorithm for finding a minimum spanning tree (MST), just as Kruskal is.
An easy way to visualize the algorithm is to draw the graph out on a piece of paper.
Then you create a movable line (the cut) over the nodes you have selected. In the steps below, the set A contains the nodes inside the cut. You then choose the smallest edge running through the cut, i.e., from a node inside the line to a node outside of it; always choose the edge with the lowest weight. After adding the new node, you move the cut so that it contains the newly added node. Then you repeat until all nodes are within the cut.
A short summary of the algorithm is:
1. Create a set A, which will contain the chosen vertices. It initially contains a random starting node, chosen by you.
2. Create another set B. This is initially empty and will be used to mark all chosen edges.
3. Choose an edge (u, v), that is, an edge from node u to node v, with the smallest weight such that u is in A and v is not in A. (If there are several edges with equal weight, any of them can be chosen at random.)
4. Add the edge (u, v) to B and v to A.
5. Repeat steps 3 and 4 until A = V, where V is the set of all vertices.
The sets A and B now describe your spanning tree: the MST contains the nodes in A, and B describes how they connect.
Kruskal:
Kruskal is similar to Prim, except that there is no cut: you always choose the smallest remaining edge.
1. Create a set A, which is initially empty. It will be used to store the chosen edges.
2. Choose the edge e with minimum weight from E that is not already in A and does not form a cycle together with the edges already in A. (The graph is undirected, so (u,v) = (v,u) and each edge is considered only once.)
3. Add e to A.
4. Repeat steps 2 and 3 until A spans all the vertices, that is, until A contains |V| - 1 edges.
I am unsure about the exact performance of these algorithms, but I assume Kruskal is O(E log E), and the performance of Prim depends on which data structure you use to store the edges. If you use a binary heap, searching for the smallest edge is faster than if you store the edges in an adjacency matrix.
Hope this helps!
