I am trying to implement the BFS algorithm using a queue, and for learning purposes I don't want to look at any online code. All I am doing is following the algorithm and trying to implement it myself. I have a question regarding the adjacency matrix (a data structure for graphs).
I know one common graph data structure is the adjacency matrix. So my question is: do I have to implement an adjacency matrix along with the BFS algorithm, or does it not matter?
I really got confused.
One of the things that confused me: where should the graph's data be stored if there is no data structure for it?
Sincerely
Breadth-first search assumes you have some way of representing the graph structure you're working with, and its efficiency depends on the representation you choose, but you aren't constrained to use an adjacency matrix. Many implementations of BFS have the graph represented implicitly somehow (for example, as a 2D array storing a maze, or as the states of some sort of game) and work just fine. You can also use an adjacency list, which is particularly efficient for use in BFS.
The particular code you'll be writing will depend on how the graph is represented, but don't feel constrained to do it one way. Choose whatever's easiest for your application.
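To make this concrete, here is a minimal BFS sketch in Java over one possible representation, a list of neighbor lists with vertices numbered 0..n-1. The names are just illustrative, and the neighbor-lookup line is the only part you would change for a different representation:

    import java.util.*;

    // Minimal BFS sketch over an adjacency-list graph (vertices are 0..n-1).
    // Swap the neighbor lookup for whatever structure fits your application
    // (2D grid, matrix, object graph, ...).
    public class BfsSketch {
        static List<Integer> bfs(List<List<Integer>> adj, int source) {
            boolean[] visited = new boolean[adj.size()];
            List<Integer> order = new ArrayList<>();
            Deque<Integer> queue = new ArrayDeque<>();
            visited[source] = true;
            queue.add(source);
            while (!queue.isEmpty()) {
                int u = queue.poll();
                order.add(u);
                for (int v : adj.get(u)) {      // the only graph-specific step
                    if (!visited[v]) {
                        visited[v] = true;
                        queue.add(v);
                    }
                }
            }
            return order;
        }

        public static void main(String[] args) {
            // 0 - 1 - 2, 0 - 3
            List<List<Integer>> adj = List.of(
                List.of(1, 3), List.of(0, 2), List.of(1), List.of(0));
            System.out.println(bfs(adj, 0));    // [0, 1, 3, 2]
        }
    }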
The best way to choose data structures is in terms of the operations you need. With a complete list of operations in hand, evaluate implementations with respect to the criteria important to the problem: space, speed, code size, etc.
For BFS, the operations are pretty simple:
Set<Node> getSources(Graph graph) // all in graph with no in-edges
Set<Node> getNeighbors(Node node) // all reachable from node by out-edges
Now we can evaluate graph data structure options in terms of n=number of nodes:
Adjacency matrix:
getSources is O(n^2) time
getNeighbors is O(n) time
Vector of adjacency lists (alone):
getSources is O(n) time
getNeighbors is O(1) time
"Clever" vector of adjacency lists:
getSources is O(1) time
getNeighbors is O(1) time
The cleverness is just maintaining the sources set as the graph is constructed, so the cost is amortized over edge insertion. I.e., as you create a node, add it to the sources set because it has no in-edges yet. As you add an edge, remove the to-node from the sources set.
Now you can make an informed choice based on run time. Do the same for space, simplicity, or whatever other considerations are in play. Then choose and implement.
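For illustration, here is a rough Java sketch of the "clever" variant; the Digraph, addNode and addEdge names are mine, not from any library, and the point is only that the sources set is maintained as edges are inserted:

    import java.util.*;

    // Adjacency lists plus a sources set maintained during construction,
    // so getSources() and getNeighbors() are both O(1).
    public class Digraph<N> {
        private final Map<N, List<N>> out = new HashMap<>();
        private final Set<N> sources = new HashSet<>();

        public void addNode(N node) {
            if (out.putIfAbsent(node, new ArrayList<>()) == null) {
                sources.add(node);              // no in-edges yet
            }
        }

        public void addEdge(N from, N to) {
            addNode(from);
            addNode(to);
            out.get(from).add(to);
            sources.remove(to);                 // 'to' now has an in-edge
        }

        public Set<N> getSources() {
            return Collections.unmodifiableSet(sources);
        }

        public List<N> getNeighbors(N node) {
            return out.getOrDefault(node, List.of());
        }
    }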
I understand there are 3 common ways to represent graphs:
Adjacency Matrix
Adjacency List
Edge list
That said, problems I’ve solved on LeetCode often use matrices and the solution requires DFS or BFS. For example, given the matrix below, find if a target string exists when you go left, right, up, and down (but not diagonal).
[
  ['a','p','p'],
  ['e','a','l'],
  ['r','t','e']
]
This required a DFS approach. Is this because the matrix represents a graph, or do DFS and BFS apply to matrices too, and not just to trees and graphs?
Are DFS and BFS always/mostly used against matrices (2D arrays) in implementations, or are there cases where they're used against a Graph class?
Graph algorithms are often used to solve problems on data structures that do not explicitly represent a graph, like your matrix. The graph is not in the data; it's in your head when you solve the problem. For example: "If I think of this as a graph, then I can solve the problem with DFS or BFS." Then you write a BFS or DFS algorithm, mapping the traversal operations to whatever is equivalent in the data structure you do have.
This is called operating on the "implicit graph": https://en.wikipedia.org/wiki/Implicit_graph
If you actually made a graph data structure out of your data -- an explicit graph -- then you could write a BFS or DFS on that directly, but it's often unnecessary and in fact wasteful.
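As a sketch of working on the implicit graph, here is one way the grid word search above could be done with DFS in Java. Each cell plays the role of a node and its four orthogonal neighbors are its edges, but no Graph class is ever built (the WordSearch/exists names are just illustrative):

    // DFS over the implicit graph of a character grid.
    public class WordSearch {
        public static boolean exists(char[][] grid, String word) {
            for (int r = 0; r < grid.length; r++)
                for (int c = 0; c < grid[0].length; c++)
                    if (dfs(grid, word, r, c, 0)) return true;
            return false;
        }

        private static boolean dfs(char[][] g, String w, int r, int c, int i) {
            if (i == w.length()) return true;                        // whole word matched
            if (r < 0 || r >= g.length || c < 0 || c >= g[0].length) return false;
            if (g[r][c] != w.charAt(i)) return false;
            char saved = g[r][c];
            g[r][c] = '#';                                           // mark visited on this path
            boolean found = dfs(g, w, r + 1, c, i + 1) || dfs(g, w, r - 1, c, i + 1)
                         || dfs(g, w, r, c + 1, i + 1) || dfs(g, w, r, c - 1, i + 1);
            g[r][c] = saved;                                         // backtrack
            return found;
        }
    }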
I want to implement (in Java) a Graph class using adjacency lists; I'd use this class with Prim's algorithm to build a minimum spanning tree.
I read that there are many ways of doing this, but I can't use data structures built upon simpler primitive data types (LinkedList, Stack and so on), so I thought that maybe a good solution would be to use a HashTable combined with an ArrayList instead of a LinkedList.
I read that the goal of combining a LinkedList with a HashTable is to merge the advantages of the LinkedList (efficient enumeration of a vertex's adjacency list) and the HashTable (fast searching and adding of edges).
I'm wondering about two things:
Would I keep those properties by using an ArrayList instead of a LinkedList?
Would it be better to use a HashTable that maps to another HashTable?
Any other suggestions? If I use a HashTable, what would be the best way to resolve collisions? I was thinking about separate chaining.
I assume that your desired Graph structure would be a HashTable<Vertex, ArrayList<Pair<Vertex, Float>>> mapping each vertex to its adjacent vertices together with the corresponding edge weights.
You can use an ArrayList since you don't need to remove processed edges from the adjacency list.
In general I would not recommend linking the HashTable to a second one, due to memory usage, because the algorithm processes all adjacent edges of a vertex anyway. It would only help if you wanted to remove a processed edge, since it lets you find and remove the edge for the other direction quickly.
Note that while the HashMap + ArrayList approach is space efficient and sufficient for this algorithm to run in O(V^2), it is not recommended for dense graphs when many edge lookups are required. Checking whether an edge from A to B exists is linear in the number of adjacent vertices of A or B. If you want to retrieve them in O(1), you would want a second HashTable to store the edges. An example is given in the JGraphT Library.
Note also that it's generally recommended to use HashMap over the legacy Hashtable class.
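For concreteness, a minimal sketch of the HashMap + ArrayList representation described above might look like this in Java; WeightedGraph and Edge are illustrative names, and Edge is just a small record standing in for the Pair<Vertex, Float> from the question:

    import java.util.*;

    // Each vertex maps to an ArrayList of (neighbor, weight) pairs.
    public class WeightedGraph<V> {
        public record Edge<T>(T to, float weight) {}

        private final Map<V, ArrayList<Edge<V>>> adj = new HashMap<>();

        public void addVertex(V v) {
            adj.computeIfAbsent(v, k -> new ArrayList<>());
        }

        public void addEdge(V a, V b, float weight) {   // undirected
            addVertex(a);
            addVertex(b);
            adj.get(a).add(new Edge<>(b, weight));
            adj.get(b).add(new Edge<>(a, weight));
        }

        public List<Edge<V>> neighbors(V v) {           // what Prim's algorithm iterates over
            return adj.getOrDefault(v, new ArrayList<>());
        }
    }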
Apologies first, English is not my first language.
So here's my understanding of a graph represented as an adjacency list: it's usually used for sparse graphs, which is the case for most graphs, and it uses V (number of vertices) lists, so V head pointers + 2E (E = number of edges) nodes for an undirected graph. Therefore, space complexity = O(V + E).
Since any node can have up to V-1 edges (excluding itself), it takes O(V) time to check a node's adjacency.
To check all the edges, it takes O(2E + V), so O(V + E).
Now, since it's mostly used for sparse graphs, checking adjacency is rarely O(V) in practice; it's proportional to the number of edges a given vertex has (which is O(V) at worst, since V-1 is the possible maximum).
What I'm wondering is: is it possible to make the lists (the edge nodes) binary trees? Then, to find out whether node A is adjacent to node B, the time complexity would be O(log n) rather than linear O(n).
If it is possible, is it actually done often? Also, what is that kind of data structure called? I've been googling whether such combinations are possible but couldn't find anything. I would be very grateful if anyone could explain this to me in detail, as I'm new to data structures. Thank you.
Edit: I know binary search can be performed on arrays. I'm talking about the linked list representation; I thought I made that obvious when I said heads to the lists.
There's no reason the adjacency list for each vertex couldn't be stored as a binary tree, but there are tradeoffs.
As you say, this adjacency list representation is often used for sparse graphs. Often, "sparse graph" means that a particular vertex is adjacent to few others, so your adjacency list for a particular vertex would be very small. While it's true that binary search is O(log n) and sequential search is O(n), when n is very small sequential search is faster. I've seen cases where sequential search beats binary search when n is smaller than 16. It depends on the implementation, of course, but don't count on binary search being faster for small lists.
Another thing to think about is memory. Linked list overhead is one pointer per node. Unless, of course, you're using a doubly linked list. Binary tree overhead is two pointers per node. Perhaps not a big deal, but if you're trying to represent a very large graph, that extra pointer will become important.
If the graph will be updated frequently at run time, you have to take that into account, too. Adding a new edge to a linked list of edges is an O(1) operation. But adding an edge to a binary tree will require O(log n). And you want to make sure you keep that tree balanced. An unbalanced tree starts to act like a linked list.
So, yes, you could make your adjacency lists binary trees. You have to decide whether it's worth the extra effort, based on your application's speed requirements and the nature of your data.
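In Java you would typically not hand-roll the binary tree but keep each adjacency list in a TreeSet, which is backed by a red-black tree, giving O(log n) adjacency checks and insertions while staying balanced automatically. A rough sketch with illustrative names:

    import java.util.*;

    // Adjacency structure where each vertex's neighbors live in a balanced tree.
    public class TreeAdjacencyGraph {
        private final Map<Integer, TreeSet<Integer>> adj = new HashMap<>();

        public void addEdge(int a, int b) {             // undirected, O(log deg)
            adj.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
            adj.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
        }

        public boolean adjacent(int a, int b) {         // O(log deg(a))
            TreeSet<Integer> neighbors = adj.get(a);
            return neighbors != null && neighbors.contains(b);
        }
    }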
As an exercise I have to build a satnav system which plans the shortest and fastest routes from location to location. It has to be as fast as I can possibly make it without using too much memory.
I am having trouble deciding which structure to use to represent the graph. I understand that a matrix is better for dense graphs and that a list would be better for sparse graphs. I'm leaning more towards using a list, as I'm assuming that adding vertices will be the most taxing part of this program.
I just want to get some of your opinions. If I were to look at a typical road map as a graph, with various locations being nodes and the roads being edges, would you consider it to be sparse or dense? Which structure seems better in this scenario?
I would go for lists, because it's only a one-time investment. The good thing about lists is that iterating over all the adjacent vertices of a vertex is faster than with a matrix, and that is an important and frequent step in most shortest-path algorithms.
So where the matrix takes O(N) per vertex, the adjacency list takes only O(k), where k is the number of adjacent vertices.
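A toy Java comparison of that inner loop under both representations (illustrative only): the matrix version always scans a full row of length N, while the list version touches only the k actual neighbors.

    import java.util.*;

    public class NeighborIteration {
        public static void main(String[] args) {
            int n = 5;
            // The same sparse graph both ways: edges 0-1, 0-2, 3-4.
            int[][] matrix = new int[n][n];
            List<List<Integer>> list = new ArrayList<>();
            for (int i = 0; i < n; i++) list.add(new ArrayList<>());
            int[][] edges = {{0, 1}, {0, 2}, {3, 4}};
            for (int[] e : edges) {
                matrix[e[0]][e[1]] = matrix[e[1]][e[0]] = 1;
                list.get(e[0]).add(e[1]);
                list.get(e[1]).add(e[0]);
            }

            int u = 0;
            // Matrix: always scans all n entries of row u -> O(N).
            for (int v = 0; v < n; v++)
                if (matrix[u][v] != 0) System.out.println("matrix neighbor: " + v);

            // List: touches only the k actual neighbors of u -> O(k).
            for (int v : list.get(u)) System.out.println("list neighbor: " + v);
        }
    }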
What kind of problems on graphs is faster (in terms of big-O) to solve using incidence matrix data structures instead of more widespread adjacency matrices?
The space complexities of the structures are:
Adjacency: O(V^2)
Incidence: O(VE)
With the consequence that an incidence structure saves space if there are many more vertices than edges.
You can look at the time complexity of some typical graph operations:
Find all vertices adjacent to a vertex:
Adj: O(V)
Inc: O(VE)
Check if two vertices are adjacent:
Adj: O(1)
Inc: O(E)
Count the valence of a vertex:
Adj: O(V)
Inc: O(E)
And so on. For any given algorithm, you can use building blocks like the above to calculate which representation gives you better overall time complexity.
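For example, here is a small Java sketch of the "check if two vertices are adjacent" building block under both matrix representations, matching the O(1) vs O(E) costs listed above (method names are illustrative):

    public class MatrixRepresentations {
        // Adjacency matrix: adj[u][v] != 0 means there is an edge u-v. O(1) lookup.
        static boolean adjacentAdj(int[][] adj, int u, int v) {
            return adj[u][v] != 0;
        }

        // Incidence matrix: inc[v][e] != 0 means vertex v is incident to edge e.
        // We must scan all E columns looking for one shared by u and v -> O(E).
        static boolean adjacentInc(int[][] inc, int u, int v) {
            for (int e = 0; e < inc[0].length; e++) {
                if (inc[u][e] != 0 && inc[v][e] != 0) return true;
            }
            return false;
        }

        public static void main(String[] args) {
            // The triangle 0-1, 1-2, 2-0 in both representations.
            int[][] adj = {{0, 1, 1}, {1, 0, 1}, {1, 1, 0}};
            int[][] inc = {{1, 0, 1}, {1, 1, 0}, {0, 1, 1}}; // columns = edges
            System.out.println(adjacentAdj(adj, 0, 2));      // true, O(1)
            System.out.println(adjacentInc(inc, 0, 2));      // true, O(E)
        }
    }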
As a final note, using a matrix of any kind is extremely space-inefficient for all but the most dense of graphs, and I recommend against using either unless you've consciously dismissed alternatives like adjacency lists.
I personally have never found a real application of the incidence matrix representation in a programming contest or research problem. I think it may be useful for proving some theorems or for some very special problems. One book gives "counting the number of spanning trees" as an example of a problem for which this representation is useful.
Another issue with this representation is that it makes little sense to store it explicitly, because it is easy to compute any given cell on the fly from the list of edges.
It may seem more useful for hypergraphs, but only if they are dense.
So my opinion is: it is useful only for theoretical work.