Do DFS and BFS in O(n+m) change?

We know there is an O(n+m) solution (DFS or BFS) for checking whether there is a path from s to t in an undirected graph G with n vertices and m edges; that would be implemented via an adjacency list.
If I implement my program with an adjacency matrix, will the runtime be affected? Is this a good or a bad choice?
Edit: I need to calculate the time complexity in this case. Any ideas?

Assume the input to your code is n and m (the number of nodes and the number of edges), followed by m lines of the form a b, signifying there is an edge between vertex a and vertex b. You take an adjacency matrix M[][] such that M[i][j] = 1 if there is an edge between i and j and M[i][j] = 0 otherwise (as the graph is undirected the matrix is symmetric, so you could store only the upper/lower half and cut the memory in half). You have to initialize the matrix to 0 (all the cells), and while scanning the edges mark M[a][b] = M[b][a] = 1. The initialization is O(n^2); scanning and marking the edges is O(m).

Now let's look at the BFS/DFS routine. When you are at a node you try to see all its unvisited neighbors. Say we want the neighbors of vertex a: you have to do for(int i=0;i<n;i++) if (M[a][i]==1) (assuming 0-based indexing). This has to be done for each vertex, so the routine becomes O(n^2) even if m < n(n-1)/2 (in a simple graph with no multiple edges and no loops, m can be at most n(n-1)/2). Thus your overall complexity becomes O(n^2).

Then what's the use of an adjacency matrix? Well, the DFS/BFS might be just one part of a bigger algorithm; your algorithm might also need to tell whether there is an edge between nodes a and b, which the adjacency matrix answers in O(1) time. So whether to choose an adjacency list or an adjacency matrix really depends on your algorithm (the maximum memory you can spend, the time complexity of routines like DFS/BFS, the cost of answering queries such as whether two vertices are adjacent, etc.).
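To make that concrete, here is a minimal sketch (names are mine) of BFS over an adjacency matrix; the inner row scan costs O(n) regardless of a vertex's actual degree, which is exactly where the O(n^2) comes from:

from collections import deque

def bfs_reachable(M, s, t):
    # M is an n x n 0/1 adjacency matrix; returns True iff t is
    # reachable from s.
    n = len(M)
    visited = [False] * n
    visited[s] = True
    q = deque([s])
    while q:
        u = q.popleft()
        if u == t:
            return True
        for v in range(n):                     # O(n) per vertex, O(n^2) total
            if M[u][v] == 1 and not visited[v]:
                visited[v] = True
                q.append(v)
    return False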
Hope I answered your query.

Related

Time Complexity Analysis of BFS

I know that there are a ton of questions out there about the time complexity of BFS, which is O(V+E).
However, I still struggle to understand why the time complexity is O(V+E) and not O(V*E).
I know that O(V+E) stands for O(max[V,E]), and my only guess is that it has something to do with the density of the graph and not with the algorithm itself, unlike, say, Merge Sort, whose time complexity is always O(n*logn).
Examples I've thought of:
A directed graph with |E| = |V|-1, where the time complexity will be O(V)
A directed graph with |E| = |V|*(|V|-1), where the complexity would in fact be O(|E|) = O(|V|*|V|), as each vertex has an outgoing edge to every other vertex besides itself
Am I in the right direction? Any insight would be really helpful.
Your "examples of thought" illustrate that the complexity is not O(V*E), but O(E). True, E can be a large number in comparison with V, but it doesn't matter when you say the complexity is O(E).
When the graph is connected, then you can always say it is O(E). The reason to include V in the time complexity, is to cover for the graphs that have many more vertices than edges (and thus are disconnected): the BFS algorithm will not only have to visit all edges, but also all vertices, including those that have no edges, just to detect that they don't have edges. And so we must say O(V+E).
The complexity falls out easily if you walk through the algorithm. Let Q be the FIFO queue, initially containing the source node. BFS basically does the following:
while Q not empty
    pop u from Q
    for each adjacency v of u
        if v is not marked
            mark v
            push v into Q
Since each node is added once and removed once, the while loop runs O(V) times. Also, each time we pop u we perform |adj[u]| operations, where |adj[u]| is the number of adjacencies of u.
Therefore the total complexity is the sum of (1 + |adj[u]|) over all u in V, which is O(V+E), since the sum of the |adj[u]| is O(E) (2E for an undirected graph, E for a directed one).
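A minimal runnable version of that routine over an adjacency list (variable names are mine) makes the accounting explicit:

from collections import deque

def bfs(adj, source):
    # adj[u] is the list of neighbors of u; len(adj) == V.
    marked = [False] * len(adj)
    marked[source] = True
    q = deque([source])
    order = []
    while q:
        u = q.popleft()           # each vertex is popped once: O(V) total
        order.append(u)
        for v in adj[u]:          # each adjacency list is scanned once:
            if not marked[v]:     # 2E entries for an undirected graph
                marked[v] = True
                q.append(v)
    return order

print(bfs([[1, 2], [0, 2], [0, 1], []], 0))  # [0, 1, 2]; vertex 3 untouched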
Consider a situation where you have a graph, maybe even with cycles: you start the search from the root and your target is the last leaf. In this case you will traverse all the edges before you reach your destination.
E.g.
0 - 1
1 - 2
0 - 2
0 - 3
In this scenario you will check 4 edges before you actually find node #3.
It depends on how the adjacency list is implemented. A properly implemented adjacency list is a list/array of vertices with a list of related edges attached to each vertex entry.
The key is that the edge entries point directly to their corresponding vertex array/list entry; they never have to search through the vertex array/list for a matching entry, they can just look it up directly. This ensures that the total number of edge accesses is 2E and the total number of vertex accesses is V+2E, making the total time O(E+V).
In improperly implemented adjacency lists, the vertex array/list is not directly indexed, so to go from an edge entry to a vertex entry you have to search through the vertex list, which is O(V), and the total time becomes O(E*V).
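To make the contrast concrete, here is a tiny sketch (names and structure are mine, not from any particular library) of a properly implemented adjacency list, where vertices are array indices so an edge entry doubles as a direct lookup:

# Vertices are the integers 0..V-1, so an edge entry (the neighbor's
# index) is itself a direct index into `vertices`: no searching needed.
V = 4
vertices = [{"label": chr(ord("A") + i), "edges": []} for i in range(V)]

def add_edge(u, v):
    vertices[u]["edges"].append(v)   # store the index, not a name
    vertices[v]["edges"].append(u)

add_edge(0, 1)
add_edge(0, 2)
add_edge(1, 3)

# Following an edge is a direct O(1) array access, never an O(V) scan:
for v in vertices[0]["edges"]:
    print(vertices[v]["label"])      # B, C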

Finding size of largest connected component of a graph

Suppose we have a random undirected graph G = (V,E) with n vertices, where for any two vertices u and v in V, the probability that the edge (u,v) is in E is 1/n. We need to figure out the size C(n) of the largest connected component of the graph.
C(n) should be equal to Θ(n^a), and we need to run some experiments to estimate a.
I am a bit confused about how to link the probability 1/n to the largest connected component; is there any way I can do so?
The process you're simulating here is called the Erdős–Rényi model. You have a collection of n nodes, and each pair of nodes has probability p of being linked. The (expected) shape of the resulting graph depends heavily on the choice of p, and there are a lot of famous results about this.
As for how to do this: one option would be to create a collection of n nodes, iterate over all pairs of nodes, and link them with probability 1/n. You can then run an algorithm like BFS or DFS over the graph to find and size the connected components.
Another would be to use the above approach, except that instead of doing a BFS or DFS, you use a disjoint-set forest to perform the links and find the largest connected component.
Alternatively, because each edge is present or absent with the same probability, independently of every other edge, the number of edges is binomially distributed and pretty tightly concentrated around n/2 total edges (there are n(n-1)/2 possible edges, each present with probability 1/n). You could therefore generate about n/2 random edges, add them to the graph, then use the above techniques. (This will be much faster, as it does O(n) work rather than O(n^2) work to process the edges.)
Once you've gotten this worked out, you can vary n over a large range and run some sort of polynomial regression on it to find the best-fit curve. That's something you could either code up yourself, or do by importing your data into Excel and using its regression tools.
As a spoiler, when you're done you'll find that the number of nodes in the largest connected component is Θ(n^(2/3)). If you search for "Erdős–Rényi critical case," you can find online proofs of this result. It's not a trivial result to prove (and definitely isn't obvious!), but it'll drop out of your empirical analysis.
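If you want to experiment, here is a minimal simulation sketch under those assumptions, using the disjoint-set approach (all names are mine):

import random

def largest_component(n, p):
    # Sample G(n, p) by iterating over all pairs, linking each with
    # probability p, and track components with a disjoint-set forest.
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra                     # union by size
        size[ra] += size[rb]

    for u in range(n):
        for v in range(u + 1, n):
            if random.random() < p:
                union(u, v)
    return max(size[find(v)] for v in range(n))

# Estimate a in C(n) = Theta(n^a) from the slope of a log-log fit;
# the values of n here are arbitrary illustrations:
for n in (500, 1000, 2000, 4000):
    print(n, largest_component(n, 1.0 / n))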

Graph In-degree Calculation from Adjacency-list

I came across this question, in which it was required to calculate the in-degree of each node of a graph from its adjacency-list representation.
for each u
    for each Adj[i] where i != u
        if (i,u) ∈ E
            in-degree[u] += 1
Now according to me its time complexity should be O(|V||E|+|V|^2), but the solution I referred to instead described it as O(|V||E|).
Please help me figure out which one is correct.
Rather than O(|V||E|), the complexity of computing indegrees is O(|E|). Let us consider the following pseudocode for computing indegrees of each node:
for each u
    indegree[u] = 0;

for each u
    for each v ∈ Adj[u]
        indegree[v]++;
The first loop has linear complexity O(|V|). For the second part: for each u, the inner loop executes at most |E| times, while the outer loop executes |V| times, so the second part appears to have complexity O(|V||E|). In fact, the code executes an operation exactly once for each edge, so a more accurate bound is O(|E|).
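A runnable version of that pseudocode (list-of-lists adjacency, names mine):

def indegrees(adj):
    # adj[u] is the list of vertices u points to; len(adj) == |V|.
    indegree = [0] * len(adj)      # first loop: O(|V|)
    for u in range(len(adj)):      # every edge (u, v) is touched exactly
        for v in adj[u]:           # once in total, so this part is O(|E|)
            indegree[v] += 1
    return indegree

print(indegrees([[1, 2], [2], []]))  # [0, 1, 2]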
According to http://www.cs.yale.edu/homes/aspnes/pinewiki/C(2f)Graphs.html, Section 4.2, with an adjacency list representation,
Finding predecessors of a node u is extremely expensive, requiring looking through every list of every node in time O(n+m), where m is the total number of edges.
So, in the notation used here, the time complexity of computing the in-degree of a node is O(|V| + |E|).
This can be reduced at the cost of using extra space, however. The Wiki also states that
adding a second copy of the graph with reversed edges lets us find all predecessors of u in O(d-(u)) time, where d-(u) is u's in-degree.
An example of a package which implements this approach is the Python package Networkx. As you can see from the constructor of the DiGraph object for directional graphs, networkx keeps track of both self._succ and self._pred, which are dictionaries representing the successors and predecessors of each node, respectively. This allows it to compute each node's in_degree efficiently.
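For example (this uses the standard networkx API on a tiny toy graph of my own):

import networkx as nx

G = nx.DiGraph()
G.add_edges_from([(0, 1), (0, 2), (1, 2)])
print(G.in_degree(2))       # 2, read off the predecessor dict in O(d-(2))
print(dict(G.in_degree()))  # {0: 0, 1: 1, 2: 2}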
O(|V|+|E|) is the correct answer: you visit each vertex once, which is O(|V|), and each visit scans only that vertex's own edges, so all the scans together are O(|E|). Also, usually |E| >> |V|, in which case O(|E|) is correct as well.

Directed graph (topological sort)

Say there exists a directed graph G(V, E) (V represents vertices and E represents edges), where each edge (x, y) is associated with a weight w(x, y), an integer between 1 and 10.
Assume s and t are some vertices in V.
I would like to compute the shortest path from s to t in time O(m + n), where n is the number of vertices and m is the number of edges.
Would I be on the right track in implementing topological sort to accomplish this? Or is there another technique that I am overlooking?
The algorithm you need for finding the minimal path from a given vertex to another in a weighted graph is Dijkstra's algorithm. Unfortunately its complexity is O(n*log(n) + m), which may be more than you are trying to accomplish.
However, in your case the edges are special: their weights take only 10 valid values. Thus you can implement a special data structure (a kind of heap that takes advantage of the small set of possible weights) in which all operations are constant time.
One possible way to do that is to keep 10 lists, one for each weight. Adding an element to the data structure is simply an append to a list. Finding the minimum element is an iteration over the 10 lists to find the first one that is non-empty; this is still constant time, as no more than 10 iterations are performed. Removing the minimum element is also straightforward: a simple removal from a list.
Using Dijkstra's algorithm with a data structure of this kind is what you need.
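One standard way to realize this constant-time bucket idea inside Dijkstra's algorithm is Dial's algorithm: a circular array of max_weight + 1 buckets keyed by tentative distance. A minimal sketch with my own names (0-indexed vertices; adj[u] is a list of (v, w) pairs with 1 <= w <= 10):

from collections import deque

def dial_shortest_path(n, adj, s, t, max_w=10):
    # Runs in O(n * max_w + m), i.e. O(n + m) when max_w is a constant.
    INF = float("inf")
    dist = [INF] * n
    dist[s] = 0
    C = max_w + 1
    buckets = [deque() for _ in range(C)]   # circular array of buckets
    buckets[0].append(s)
    pending = 1                             # entries still queued somewhere
    d = 0                                   # distance currently being settled
    while pending:
        bucket = buckets[d % C]
        while bucket:
            u = bucket.popleft()
            pending -= 1
            if dist[u] != d:                # stale entry, already improved
                continue
            for v, w in adj[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    buckets[(d + w) % C].append(v)
                    pending += 1
        d += 1
    return dist[t]                          # INF if t is unreachable

print(dial_shortest_path(3, [[(1, 3)], [(2, 10)], []], 0, 2))  # 13

Since every queued entry has a tentative distance within max_w of the distance currently being settled, the max_w + 1 buckets never collide, which is why the circular array suffices.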

Path from s to e in a weighted DAG with limitations

Consider a directed graph with n nodes and m edges. Each edge is weighted. There is a start node s and an end node e. We want to find the path from s to e that has the maximum number of nodes, such that:
the total distance is less than some constant d
starting from s, each node in the path is closer than the previous one to the node e (as in: when you traverse the path, you keep getting closer to your destination e, in terms of the edge weight of the remaining path)
We can assume there are no cycles in the graph and no negative weights. Does an efficient algorithm already exist for this problem? Is there a name for this problem?
Whatever you end up doing, do a BFS/DFS starting from s first to see if e can even be reached; this only takes you O(n+m) so it won't add to the complexity of the problem (since you need to look at all vertices and edges anyway). Also, delete all edges with weight 0 before you do anything else since those never fulfill your second criterion.
EDIT: I figured out an algorithm; it's polynomial, though depending on the size of your graphs it may still not be sufficiently efficient. See the edit further down.
Now for some complexity. The first thing to think about here is an upper bound on how many paths we can actually have, so depending on the choice of d and the weights of the edges, we also have an upper bound on the complexity of any potential algorithm.
How many edges can there be in a DAG? The answer is n(n-1)/2, which is a tight bound: take n vertices, order them from 1 to n; for two vertices i and j, add an edge i->j to the graph iff i<j. This sums to a total of n(n-1)/2, since this way, for every pair of vertices, there is exactly one directed edge between them, meaning we have as many edges in the graph as we would have in a complete undirected graph over n vertices.
How many paths can there be from one vertex to another in the graph described above? The answer is 2^(n-2). Proof by induction:
Take the graph over 2 vertices as described above; there is 1 = 2^0 = 2^(2-2) path from vertex 1 to vertex 2: (1->2).
Induction step: assuming there are 2^(n-2) paths from the vertex with number 1 of an n-vertex graph as described above to the vertex with number n, increment the number of each vertex and add a new vertex 1 along with the required n edges. It has its own edge to the vertex now labeled n+1. Additionally, it has 2^(i-2) paths to that vertex for every i in [2;n] (it has all the paths the other vertices collectively have to the vertex n+1, each "prefixed" with the edge 1->i). This gives us 1 + Σ_{k=2}^{n} 2^(k-2) = 1 + Σ_{k=0}^{n-2} 2^k = 1 + (2^(n-1) - 1) = 2^(n-1) = 2^((n+1)-2).
So we see that there are DAGs that have 2^(n-2) distinct paths between some pairs of their vertices; this is a bit of a bleak outlook, since depending on the weights and your choice of d, you may have to consider them all. This in itself doesn't mean we can't choose some form of optimum (which is what you're looking for) efficiently, though.
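If you want to sanity-check the 2^(n-2) count, a short brute-force sketch over the construction above (code is mine) confirms it for small n:

from functools import lru_cache

def count_paths(n):
    # Complete DAG on vertices 1..n with an edge i->j iff i < j;
    # count the distinct paths from vertex 1 to vertex n.
    @lru_cache(maxsize=None)
    def paths(i):
        if i == n:
            return 1
        return sum(paths(j) for j in range(i + 1, n + 1))
    return paths(1)

for n in range(2, 10):
    print(n, count_paths(n), 2 ** (n - 2))  # the last two columns agree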
EDIT: OK, so here is what I would do:
Delete all edges with weight 0 (and smaller, but you ruled that out), since they can never fulfill your second criterion.
Do a topological sort of the graph; in the following, let's only consider the part of the topological order between s and e, and call that the integer interval [s;e]. Delete everything from the graph that isn't strictly inside that interval, meaning all vertices outside of it along with their incident edges. During the topological sort you'll also see whether there is a path from s to e at all, so you'll know whether any paths s->...->e exist. The complexity of this part is O(n+m).
Now the actual algorithm:
Traverse the vertices of [s;e] in the order imposed by the topological sorting.

For every vertex v, store a two-dimensional array of information; let's call it pre_v[][], since it's going to store information about the predecessors of a node on the paths leading towards it.

In pre_v[i][j], store the total edge weight of the path of length i (counted in vertices) leading to v, given that j is the predecessor of v on that path. For example, pre_{s+1}[1][s] would hold the weight of the edge s->s+1, while all other entries of pre_{s+1} would be 0/undefined.

When calculating the array for a new vertex v, all we have to do is check its incoming edges and iterate over the arrays of the start vertices of those edges. For example, let's say vertex v has an incoming edge from vertex w with weight c, and consider what the entry pre_v[i][w] should be. We have an edge w->v, so we need to set pre_v[i][w] to min(pre_w[i-1][k] over all k, ignoring undefined entries) + c (notice the subscript of the array!); we effectively take the cost of a path of length i-1 that leads to w and add the cost of the edge w->v. Why the minimum? The vertex w can have many predecessors on paths of length i-1; however, we want to stay below a cost limit, which greedy minimization at each vertex will do for us. We need to do this for all i in [1;v-s].

While calculating the array for a vertex, do not set entries that would give you a path with cost above d; since all edges have positive weights, paths can only get more costly with each additional edge, so just ignore those.

Once you have reached e and finished calculating pre_e, you're done with this part of the algorithm.

Iterate over pre_e, starting with pre_e[e-s]; since we have no cycles, all paths are simple, and therefore the longest path from s to e can have e-s edges. Find the largest i such that pre_e[i] has a non-zero (meaning defined) entry; if none exists, there is no path fitting your criteria. You can reconstruct any existing path using the arrays of the other vertices.
That gives you a space complexity of O(n^3) and a time complexity of O(n^2 * m): the arrays have O(n^2) entries, and we have to iterate over O(m) of them, one array per edge. I think it's fairly obvious where the wasteful use of data structures can be optimized, using hashing structures and other things than plain arrays. Alternatively, you can use a one-dimensional array per vertex and only store the current minimum instead of recomputing it every time (you'll have to store the sum of the edge weights of the path together with the predecessor vertex, since you need the predecessor to reconstruct the path). That changes the size of the arrays from n^2 to n, since you now only need one entry per number-of-nodes-on-the-path-to-the-vertex, bringing the space complexity down to O(n^2) and the time complexity down to O(n*m). You can also try a form of topological sort that removes the vertices from which you can't reach e, since those can be safely ignored as well.
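A minimal sketch of that optimized O(n*m) variant (all names are mine; zero-weight edges are assumed to have been deleted already, and path reconstruction is omitted for brevity):

from collections import deque, defaultdict

def max_nodes_path(n, edges, s, e, d):
    # edges: list of (u, v, w) with w > 0; the graph must be a DAG.
    # dp[v][i] = minimum total weight of a path from s to v with i edges.
    # Returns the largest node count of a path s->e with weight < d,
    # or None if no such path exists.
    INF = float("inf")
    adj = defaultdict(list)
    indeg = [0] * n
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1

    # Topological order via Kahn's algorithm: O(n + m).
    q = deque(u for u in range(n) if indeg[u] == 0)
    topo = []
    while q:
        u = q.popleft()
        topo.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)

    dp = [[INF] * n for _ in range(n)]
    dp[s][0] = 0
    for u in topo:                     # O(m) edges overall ...
        for v, w in adj[u]:
            for i in range(n - 1):     # ... times O(n) path lengths
                c = dp[u][i] + w
                if c < d and c < dp[v][i + 1]:
                    dp[v][i + 1] = c

    lengths = [i for i in range(n) if dp[e][i] < INF]
    return max(lengths) + 1 if lengths else None

print(max_nodes_path(3, [(0, 1, 2), (1, 2, 2), (0, 2, 5)], 0, 2, 5))  # 3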
