Graph represented as adjacency list, as binary tree, is it possible?

Apologies first, English is not my first language.
So here's my understanding of a graph that's represented as an adjacency list: it's usually used for sparse graphs, which is the case for most graphs, and it uses V (number of vertices) lists. So, V head pointers + 2E (E = number of edges) nodes for an undirected graph. Therefore, space complexity = O(E + V).
Since any node can have up to V-1 edges (excluding itself), it takes O(V) time to check a node's adjacency.
Checking all the edges takes O(2E + V), so O(V + E).
Now, since it's mostly used for sparse graphs, checking adjacency is rarely O(V), but simply proportional to the number of edges a given vertex has (which is O(V) at worst, since V-1 is the possible maximum).
What I'm wondering is, is it possible to make the list (the edge nodes) a binary tree? Then, to find out whether node A is adjacent to node B, the time complexity would be O(log n) and not linear O(n).
If it is possible, is it actually done quite often? Also, what is that kind of data structure called? I've been googling whether such combinations are possible but couldn't find anything. I would be very grateful if anyone could explain this to me in detail, as I'm new to data structures. Thank you.
Edit: I know binary search can be performed on arrays. I'm talking about linked list representation, I thought I made it obvious when I said heads to the lists but wow

There's no reason the adjacency list for each vertex couldn't be stored as a binary tree, but there are tradeoffs.
As you say, this adjacency list representation is often used for sparse graphs. Often, "sparse graph" means that a particular vertex is adjacent to few others. So your "adjacency list" for a particular vertex would be very small. Whereas it's true that binary search is O(log n) and sequential search is O(n), when n is very small sequential search is faster. I've seen cases where sequential search beats binary search when n is smaller than 16. It depends on the implementation, of course, but don't count on binary search being faster for small lists.
Another thing to think about is memory. Linked list overhead is one pointer per node. Unless, of course, you're using a doubly linked list. Binary tree overhead is two pointers per node. Perhaps not a big deal, but if you're trying to represent a very large graph, that extra pointer will become important.
If the graph will be updated frequently at run time, you have to take that into account, too. Adding a new edge to a linked list of edges is an O(1) operation. But adding an edge to a binary tree will require O(log n). And you want to make sure you keep that tree balanced. An unbalanced tree starts to act like a linked list.
So, yes, you could make your adjacency lists binary trees. You have to decide whether it's worth the extra effort, based on your application's speed requirements and the nature of your data.
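As a rough illustration of the idea discussed above, here is a minimal Python sketch in which each vertex's neighbors are kept in a plain binary search tree instead of a linked list. The class and method names (`TreeAdjacencyGraph`, `add_edge`, `is_adjacent`) are made up for this example, and the tree is deliberately left unbalanced to keep the sketch short; a real implementation would need a self-balancing tree to guarantee O(log n) lookups.

```python
class _Node:
    """One edge node in a per-vertex binary search tree of neighbors."""
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key = key
        self.left = None   # two child pointers per node,
        self.right = None  # versus one "next" pointer in a linked list


class TreeAdjacencyGraph:
    """Undirected graph; each vertex's neighbors live in an unbalanced BST."""

    def __init__(self, num_vertices):
        self.roots = [None] * num_vertices  # one "head pointer" per vertex

    def _insert(self, u, v):
        node = self.roots[u]
        if node is None:
            self.roots[u] = _Node(v)
            return
        while True:
            if v == node.key:
                return  # edge already present
            side = "left" if v < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, _Node(v))
                return
            node = child

    def add_edge(self, u, v):
        # Undirected: store the edge in both endpoints' trees (2E nodes total).
        self._insert(u, v)
        self._insert(v, u)

    def is_adjacent(self, u, v):
        # Walk down u's tree; cost is the tree height, not the degree of u.
        node = self.roots[u]
        while node is not None:
            if v == node.key:
                return True
            node = node.left if v < node.key else node.right
        return False


g = TreeAdjacencyGraph(5)
for u, v in [(0, 3), (0, 1), (0, 4), (2, 3)]:
    g.add_edge(u, v)
```

Note that if edges arrive in sorted order this tree degenerates into a linked list, which is exactly the balancing concern raised above.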

Related

Efficient way to check whether a graph has become disconnected?

Using a disjoint set forest, we can efficiently check incrementally whether a graph has become connected while adding edges to it. (Or rather, it allows us to check whether two vertices already belong to the same connected component.) This is useful, for example, in Kruskal's algorithm for computing the minimum spanning tree.
Is there a similarly efficient data structure and algorithm to check whether a graph has become disconnected while removing edges from it?
More formally, I have a connected graph G = (V, E) from which I'm iteratively removing edges one by one. Without loss of generality, let's say these are e_1, e_2, and so on. I want to stop right before G becomes disconnected, so I need a way to check whether the removal of e_i will make the graph disconnected. This can of course be done by removing e_i and then doing a breadth-first or depth-first search from an arbitrary vertex, but this is an O(|E|) operation, making the entire algorithm O(|E|²). I'm hoping there is a faster way by somehow leveraging results from the previous iteration.
I've looked at partition refinement but it doesn't quite solve this problem.
There’s a family of data structures that are designed to support queries of the form “add an edge between two nodes,” “delete an edge between two nodes,” and “is there a path from node x to node y?” This is called the dynamic connectivity problem and there are many data structures that you can use to solve it. Unfortunately, these data structures tend to be much more complicated than a disjoint-set forest, so you may want to search around to see if you can find a ready-made implementation somewhere.
The layered forest structure of Holm et al. works by maintaining a hierarchy of spanning forests for the graph and doing some clever bookkeeping with edges to avoid rescanning them when seeing if two parts of the graph are still linked. It supports adding and deleting edges in amortized time O(log^2 n) and queries of the form “are these two nodes connected?” in time O(log n / log log n).
A more recent randomized structure called the randomized cutset was developed by Kapron et al. It has worst-case O(log^5 n) costs for all three operations (IIRC).
There might be a better way to solve your particular problem that doesn’t require these heavyweight approaches, but I figured I’d mention these in case they’re helpful.
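For reference, the straightforward per-edge check described in the question (not one of the dynamic connectivity structures above) might look like the following Python sketch. It assumes an undirected simple graph stored as a dict mapping each vertex to a set of neighbors, and that the graph is connected before the removal; the function name `stays_connected_without` is made up for this example.

```python
from collections import deque


def stays_connected_without(adj, u, v):
    """Return True if v is still reachable from u once edge (u, v) is ignored.

    adj maps each vertex to a set of neighbors (undirected, no parallel edges).
    If the graph was connected beforehand, it stays connected after removing
    (u, v) exactly when this returns True.
    """
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if (x, y) in ((u, v), (v, u)):
                continue  # pretend the edge has already been removed
            if y == v:
                return True
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False


# Triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
safe = stays_connected_without(adj, 0, 1)      # another path 0-2-1 exists
unsafe = stays_connected_without(adj, 2, 3)    # (2, 3) is the only link to 3
```

Each call is O(|V| + |E|) in the worst case, which is the per-removal cost the question is trying to avoid.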

Can binomial heap be used to find connected components in a graph?

How can a binomial heap be useful in finding the connected components of a graph? If it cannot be used, why not?
I've never seen binomial heaps used this way, since graph connected components are usually found using a depth-first search or breadth-first search, and neither algorithm requires you to use any sort of priority queue. You could, of course, do a sort of "priority-first search" to find connected components by replacing the stack or queue of DFS or BFS with a priority queue, but there's little reason to do so. That would slow the cost of finding connected components down to O(m + n log n) rather than the O(m + n) you'd get from a vanilla BFS or DFS.
There is one way in which you can tenuously say that binomial heaps might be useful, and that's in a different strategy for finding connected components. You can, alternatively, use a disjoint-set forest to identify connected components. You begin with each node in its own partition, then call the union operation for each edge to link nodes together. When you've finished, you will end up with a collection of trees, each of which represents one connected component.
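The disjoint-set approach just described can be sketched in Python as follows. This is only an illustration: the function name `connected_components` is made up, nodes are assumed to be numbered 0 to n-1, and union by size with path halving is one common choice of linking and compression strategy among several.

```python
def connected_components(num_nodes, edges):
    """Group nodes 0..num_nodes-1 into components with a disjoint-set forest."""
    parent = list(range(num_nodes))  # each node starts in its own partition
    size = [1] * num_nodes

    def find(x):
        # Follow parent pointers to the representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # already in the same component
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra  # point the smaller tree at the larger one
        size[ra] += size[rb]

    groups = {}
    for node in range(num_nodes):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


components = connected_components(6, [(0, 1), (1, 2), (3, 4)])
```

Here each resulting list is one connected component; node 5 has no edges and ends up alone.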
There are many strategies for determining how to link trees in a disjoint-set forest. One of them is union-by-size, in which whenever you need to pick which representative to change, you pick the tree of smaller size and point it at the tree of larger size. You can prove that the smallest tree of height k that can be formed this way is a binomial tree of rank k. That's formed by pairing off all the nodes, then taking the representatives and pairing them off, etc. (Try it for yourself - isn't that cool?)
But that, to me, feels more like a coincidence than anything else. This is less about binomial heaps and more about binomial trees, and this particular shape only arises if you're looking for a pathological case rather than as a matter of course in the execution of the algorithm.
So the best answer I have is "technically you could do this, but you shouldn't, and technically binomial trees arise in this other context that's related to connected components, but that's not the same as using binomial heaps."
Hope this helps!

What is the time complexity for finding a universal sink given the adjacency list representation

There are many variants of this question asking the solution in O(|V|) time.
But what is the worst case bound if I wanna compute if there is a universal sink in the graph and I have graph represented in adjacency lists. This is important because all other algorithms seem to be better for adjacency lists, so if finding universal sink is not too frequent operation that I need, I will definitely go ahead for lists rather than matrix.
In my opinion, the time complexity would be the size of the graph, that is O(|V| + |E|). The algorithm for finding a universal sink of a graph is as follows. Assuming in-neighbor lists, start from index 1 of the graph. Check the length of the adjacency list at index 1; if it is |V| - 1, traverse the list to check whether there is a self-loop. If the list does not have a self-loop and all other vertices are part of the list, store the list index. Then we must go through the other lists to check whether this vertex is part of any of them. If it is, the stored vertex cannot be a universal sink. Continue the search from the next index. Even if the lists are out-neighbor lists, we will have to find the vertices whose list has length 0, then search all other lists to check that this vertex exists in each of them.
As it can be concluded from above explanation, no matter what form of adjacency list is considered, in worst case, finding the universal sink must traverse through all the vertices and edges once, hence the complexity is the size of the graph, i.e. O(|V|+|E|)
But my friend, who has recently joined a university as an assistant professor, mentioned it has to be O(|V|*|V|). I am reviewing his notes before he starts teaching the course in the spring, but before correcting them I wanna be one hundred percent sure.
You're quite correct. We can build the structures we need to track all of the intermediate results, but the basic complexity is still straightforward: we go through all of our edges once, marking and counting references. We can even build a full transition matrix in O(E) time.
Depending on the data structures, we may find an improvement by a second pass over all edges, but 2 * O(E) is still O(E).
Then we traverse each node once, looking for in/out counts and a self-loop.
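A minimal Python sketch of that two-phase idea follows. It assumes out-neighbor lists, vertices numbered 0 to n-1, and no parallel edges (otherwise the in-degree count would be inflated); the function name `find_universal_sink` is made up for this example.

```python
def find_universal_sink(out_adj):
    """Return the universal sink of a directed graph, or None if there is none.

    out_adj[u] is the list of vertices v with an edge u -> v.
    A universal sink has out-degree 0 and in-degree n - 1.
    """
    n = len(out_adj)
    in_degree = [0] * n
    for neighbors in out_adj:        # phase 1: touch every edge once, O(|E|)
        for v in neighbors:
            in_degree[v] += 1
    for v in range(n):               # phase 2: touch every vertex once, O(|V|)
        # An empty out-list also rules out a self-loop on v.
        if not out_adj[v] and in_degree[v] == n - 1:
            return v
    return None


sink = find_universal_sink([[2], [2], []])       # 0 -> 2 and 1 -> 2
no_sink = find_universal_sink([[1], [2], []])    # 0 -> 1 -> 2, vertex 0 misses 2
```

The two loops together do O(|V| + |E|) work, matching the bound argued above.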

Do I have to implement Adjacency matrix with BFS?

I am trying to implement the BFS algorithm using a queue, and for learning purposes I do not want to look at any online code. All I am doing is following the algorithm and trying to implement it. I have a question regarding the adjacency matrix (a data structure for graphs).
I know one common graph data structure is the adjacency matrix. So, my question is: do I have to implement an adjacency matrix along with the BFS algorithm, or does it not matter?
I really got confused.
One of the things that confused me is the data for the graph: where should this data be stored if there is no data structure?
Sincerely
Breadth-first search assumes you have some kind of way of representing the graph structure that you're working with, and its efficiency depends on the choice of representation you have, but you aren't constrained to use an adjacency matrix. Many implementations of BFS have the graph represented implicitly somehow (for example, as a 2D array storing a maze or as some sort of game) and work just fine. You can also use an adjacency list, which is particularly efficient for use in BFS.
The particular code you'll be writing will depend on how the graph is represented, but don't feel constrained to do it one way. Choose whatever's easiest for your application.
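As one possible illustration, here is a short Python sketch of BFS over an adjacency list, assuming the graph is given as a dict that maps each vertex to a list of its out-neighbors. The function name `bfs_order` is made up for this example, and the same loop would work with any representation that can enumerate a vertex's neighbors.

```python
from collections import deque


def bfs_order(adj, start):
    """Return the vertices reachable from start, in the order BFS visits them."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:            # the only place the representation matters
            if v not in visited:
                visited.add(v)
                queue.append(v)
    return order


adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
order = bfs_order(adj, 0)
```

With an adjacency matrix, the inner loop would instead scan a whole row of the matrix for each vertex.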
The best way to choose data structures is in terms of the operations. With a complete list of operations in hand, evaluate implementations wrt criteria important to the problem: space, speed, code size, etc.
For BFS, the operations are pretty simple:
Set<Node> getSources(Graph graph) // all in graph with no in-edges
Set<Node> getNeighbors(Node node) // all reachable from node by out-edges
Now we can evaluate graph data structure options in terms of n=number of nodes:
Adjacency matrix:
getSources is O(n^2) time
getNeighbors is O(n) time
Vector of adjacency lists (alone):
getSources is O(n) time
getNeighbors is O(1) time
"Clever" vector of adjacency lists:
getSources is O(1) time
getNeighbors is O(1) time
The cleverness is just maintaining the sources set as the graph is constructed, so the cost is amortized by edge insertion. I.e., as you create a node, add it to the sources set because it has no in-edges yet. As you add an edge, remove the to-node from the sources set.
Now you can make an informed choice based on run time. Do the same for space, simplicity, or whatever other considerations are in play. Then choose and implement.
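The "clever" variant might look roughly like this in Python. The class name `SourceTrackingGraph` and its method names are made up for this sketch; the assumption is a directed graph that only grows (no edge or node removal), so a node never needs to be put back into the sources set.

```python
class SourceTrackingGraph:
    """Directed graph as adjacency lists, with the source set kept up to date."""

    def __init__(self):
        self.neighbors = {}    # node -> list of out-neighbors
        self.sources = set()   # nodes that currently have no in-edges

    def add_node(self, node):
        if node not in self.neighbors:
            self.neighbors[node] = []
            self.sources.add(node)   # a new node has no in-edges yet

    def add_edge(self, src, dst):
        self.add_node(src)
        self.add_node(dst)
        self.neighbors[src].append(dst)
        self.sources.discard(dst)    # dst now has an in-edge

    def get_sources(self):
        return self.sources          # O(1): already maintained

    def get_neighbors(self, node):
        return self.neighbors[node]  # O(1): just hand back the list


g = SourceTrackingGraph()
g.add_edge("a", "b")
g.add_edge("b", "c")
g.add_node("d")
```

After these calls, "a" and "d" are the only nodes without in-edges, so they are what `get_sources` returns.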

Time/Space Complexity of Depth First Search

I've looked at various other StackOverflow answers and they are all different from what my lecturer has written in his slides.
Depth First Search has a time complexity of O(b^m), where b is the
maximum branching factor of the search tree and m is the maximum depth
of the state space. Terrible if m is much larger than d, but if search
tree is "bushy", may be much faster than Breadth First Search.
He goes on to say..
The space complexity is O(bm), i.e. space linear in length of action
sequence! Need only store a single path from the root to the leaf
node, along with remaining unexpanded sibling nodes for each node on
path.
Another answer on StackOverflow states that it is O(n + m).
Time Complexity: If you can access each node in O(1) time, then with a branching factor of b and max depth of m, the total number of nodes in this tree would be, in the worst case, 1 + b + b^2 + … + b^(m-1). Using the formula for summing a geometric sequence (or even solving it ourselves) tells us that this sums to (b^m - 1)/(b - 1), resulting in a total time to visit each node proportional to b^m. Hence the complexity is O(b^m).
On the other hand, if instead of using the branching factor and max depth you have the number of nodes n, then you can directly say that the complexity will be proportional to n or equal to O(n).
The other answers that you have linked in your question are similarly using different terminologies. The idea is same everywhere. Some solutions have added the edge count too to make the answer more precise, but in general, node count is sufficient to describe the complexity.
Space Complexity: The length of longest path = m. For each node, you have to store its siblings so that when you have visited all the children, and you come back to a parent node, you can know which sibling to explore next. For m nodes down the path, you will have to store b nodes extra for each of the m nodes. That’s how you get an O(bm) space complexity.
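To make both bounds concrete, here is a small Python sketch that runs a stack-based DFS over an implicit complete b-ary tree and reports how many nodes it visits and how large the stack gets. The function name `dfs_complete_tree` and the parameter `levels` (the number of levels, playing the role of m in the sum 1 + b + … + b^(m-1)) are assumptions made for this example.

```python
def dfs_complete_tree(b, levels):
    """DFS over an implicit complete b-ary tree with `levels` levels.

    Returns (nodes_visited, max_stack_size).
    """
    stack = [0]      # each entry is the depth of a pending node; root is depth 0
    visited = 0
    max_stack = 1
    while stack:
        depth = stack.pop()
        visited += 1
        if depth + 1 < levels:                # not a leaf: push its b children
            stack.extend([depth + 1] * b)
            max_stack = max(max_stack, len(stack))
    return visited, max_stack


visited, max_stack = dfs_complete_tree(3, 4)
```

For b = 3 and four levels, the visit count equals (3^4 - 1)/(3 - 1) = 40, which grows exponentially in the depth, while the stack peaks at (b - 1)(levels - 1) + 1 = 7 entries, which grows only linearly in b times the depth.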
The complexity is O(n + m) where n is the number of nodes in your tree, and m is the number of edges.
The reason why your teacher represents the complexity as O(b ^ m), is probably because he wants to stress the difference between Depth First Search and Breadth First Search.
If your tree has a very large amount of spread compared to its depth, and you're expecting results to be found at the leaves, then clearly DFS makes much more sense than BFS, as it reaches leaves faster, even though they both reach the last node in the same amount of time (work).
When a tree is very deep, and non-leaves can give information about deeper nodes, BFS can detect ways to prune the search tree in order to reduce the amount of nodes necessary to find your goal. Clearly, the higher up the tree you discover you can prune a sub tree, the more nodes you can skip.
This is harder when you're using DFS, because you prioritize reaching a leaf over exploring nodes that are closer to the root.
I suppose this DFS time/space complexity is taught in an AI class rather than in an algorithms class.
The DFS search tree here has a slightly different meaning:
A node is a bookkeeping data structure used to represent the search
tree. A state corresponds to a configuration of the world. ...
Furthermore, two different nodes can contain the same world state if
that state is generated via two different search paths.
Quoted from book 'Artificial Intelligence - A Modern Approach'
So the time/space complexity here is focused on how you visit nodes and check whether each one is the goal state. Another answer here already gives a very clear explanation.
The O(m + n) bound, by contrast, comes from an algorithms class, where the focus is the algorithm itself: the graph is stored as an adjacency list and the question is how we discover nodes.
