What is the distinction between sparse and dense graphs? - data-structures

I read it is ideal to represent sparse graphs by adjacency lists and dense graphs by an adjacency matrix. But I would like to understand the main difference between sparse and dense graphs.

A dense graph is a graph in which the number of edges is close to the maximal number of edges.
A sparse graph is a graph in which the number of edges is close to the minimal number of edges. A sparse graph can be a disconnected graph.

As the names indicate, sparse graphs are sparsely connected (e.g. trees). Usually the number of edges is O(n), where n is the number of vertices. Therefore adjacency lists are preferred, since they require constant space for every edge.
Dense graphs are densely connected. Here the number of edges is usually O(n^2), so an adjacency matrix is preferred.
To give a comparison, let us assume the graph has 1000 vertices.
Irrespective of whether the graph is dense or sparse, an adjacency matrix requires 1000^2 = 1,000,000 values to be stored.
If the graph is minimally connected (i.e. it is a tree), the adjacency list requires storing 2,997 values. If the graph is fully connected it requires storing 3,000,000 values.
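The comparison can be reproduced with a small sketch. It assumes an undirected graph and counts one neighbor entry per edge endpoint, which is a different accounting from the per-edge figures quoted above, so the absolute numbers differ while the trend is the same. The function names are illustrative.

```python
def matrix_cells(n):
    """Cells in an n x n adjacency matrix; independent of the edge count."""
    return n * n


def list_entries(num_edges):
    """Neighbor entries in an undirected adjacency list.

    Assumption: each edge is stored once at each of its two endpoints.
    """
    return 2 * num_edges


n = 1000
tree_edges = n - 1                  # a tree on n vertices has n - 1 edges
complete_edges = n * (n - 1) // 2   # every pair of vertices connected

print(matrix_cells(n))               # 1000000
print(list_entries(tree_edges))      # 1998
print(list_entries(complete_edges))  # 999000
```

For the tree the list is several hundred times smaller than the matrix; for the complete graph the two are of the same order.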

From Data Structures and Algorithms with Object-Oriented Design Patterns in C++, p. 534, by Bruno R. Preiss:
Informally, a graph with relatively few edges is sparse, and a graph with many edges is dense.
Definition (Sparse Graph): A sparse graph is a graph G = (V, E) in which |E| = O(|V|).
Definition (Dense Graph): A dense graph is a graph G = (V, E) in which |E| = Θ(|V|^2).

The main integral characteristics of a graph are the number of vertices V and the number of edges E. The relation between these two determines whether the graph is sparse or dense (wiki page here).
The whole theory behind choosing graph in-memory representation is about determining the optimal access time vs memory footprint tradeoff, considering subject domain and usage specifics.
Generally you want to have O(1) access time (and thus store the graph as a dense adjacency matrix) unless you can't tolerate memory footprint, in which case you choose the most appropriate sparse matrix representation (wiki page here).

In mathematics, a dense graph is a graph in which the number of edges is close to the maximal number of edges. The opposite, a graph with only a few edges, is a sparse graph. The distinction between sparse and dense graphs is rather vague, and depends on the context.
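One way to make "close to the maximal number of edges" concrete is the ratio of actual edges to possible edges. The sketch below assumes a simple undirected graph (no self-loops, no parallel edges); the function name `density` is illustrative, and where to draw the line between sparse and dense remains a judgment call.

```python
def density(num_vertices, num_edges):
    """Fraction of possible edges present in a simple undirected graph."""
    if num_vertices < 2:
        return 0.0  # no vertex pairs, so no edges are possible
    max_edges = num_vertices * (num_vertices - 1) // 2
    return num_edges / max_edges


print(density(1000, 999))     # a tree: roughly 0.002
print(density(1000, 499500))  # a complete graph: 1.0
```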

Sparse Graphs - Graphs with relatively few edges (generally |E| < |V| log |V|) are called sparse graphs.
Dense Graphs - Graphs with relatively few of the possible edges missing (compared to the complete graph) are called dense graphs.

If the number of edges is close to the maximum number of edges in a graph, then that graph is a dense graph. In a dense graph, nearly every pair of vertices is connected by an edge.
The sparse graph is the opposite. If a graph has only a few edges (the number of edges is far below the maximum number of edges), then it is a sparse graph.
There is no strict distinction between the sparse and the dense graphs. Typically, a sparse (connected) graph has about as many edges as vertices, and a dense graph has nearly the maximum number of edges.

Adding to the above answers, a simpler explanation can be:
Dense Graph
A graph in which the vertices (nodes) are densely connected, i.e. a node is connected to its neighboring nodes by nearly all possible edges. Here, roughly, total number of edges > total number of nodes.
Sparse Graph
It is the reverse of a dense graph. Here a node is not fully connected to its neighboring nodes (i.e. many possible edges are missing). Here, roughly, total number of edges <= total number of nodes.

Related

faster graph traversal algorithms compared to dfs

I have an undirected unweighted graph represented using an adjacency matrix, where each node of the graph represents a space partition (e.g. a state) while the edges represent the neighborhood relationship (i.e. neighboring states sharing common boundaries). My baseline algorithm uses DFS to traverse the graph and forms a subgraph after each step (i.e. adding the newly visited node, which results in a set of contiguous states). With that subgraph I perform a statistical significance test on certain patterns which exist in the nodes of the graph (i.e. within the states).
At this point I am essentially trying to make the traversal step faster.
I was wondering if you all could suggest any algorithm or resources (e.g. research paper) which performs graph traversal computationally faster than DFS.
Thanks for your suggestion and your time!
Most graph algorithms contain "for a given vertex u, list all its neighbors v" as a primitive. I am not sure, but it sounds like you might want to speed up this piece. Indeed, each country has only a few neighbors, typically far fewer than the total number of countries. If this is the case, replace the adjacency matrix representation with adjacency lists.
Note that the algorithm itself (DFS or other) will likely remain the same, with just a few changes where it uses this primitive.
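A minimal sketch of that change, assuming the matrix is a list of 0/1 rows and vertices are numbered from 0 (the names `matrix_to_lists` and `dfs_order` are illustrative). With lists, the neighbor loop costs time proportional to the vertex's degree instead of scanning a full matrix row.

```python
def matrix_to_lists(matrix):
    """Convert a 0/1 adjacency matrix into adjacency lists."""
    return [[v for v, connected in enumerate(row) if connected]
            for row in matrix]


def dfs_order(adj, start):
    """Iterative DFS; returns vertices in the order they are first visited."""
    visited = [False] * len(adj)
    order = []
    stack = [start]
    while stack:
        u = stack.pop()
        if visited[u]:
            continue
        visited[u] = True
        order.append(u)
        # Push in reverse so lower-numbered neighbors are explored first.
        for v in reversed(adj[u]):
            if not visited[v]:
                stack.append(v)
    return order


matrix = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]
adj = matrix_to_lists(matrix)
print(adj)                # [[1, 2], [0, 3], [0], [1]]
print(dfs_order(adj, 0))  # [0, 1, 3, 2]
```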

Is there any other Data structure to represent Graph other than Adjacency List or Adjacency Matrix?

I was looking for different data structures for representing a graph when I came across the Nvidia CUDA Toolkit and found a new way to represent a graph with the help of source_indices and destination_offsets.
Fascinated by this innovative representation, I searched for other ways of representing graphs, but did not find anything new.
I was wondering if there was any other way to represent Graph other than Adjacency Matrix or Lists...
There are alternatives to the adjacency list or the adjacency matrix, such as the edge list, adjacency map or forward star, to name a few. Given this graph (images taken from here):
[image: example graph]
this is the adjacency matrix representation:
[image: adjacency matrix]
this is the adjacency list representation:
[image: adjacency list]
this would be another alternative, the edge list:
[image: edge list]
and another pretty common one is the forward star representation:
[image: forward star]
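As a rough stand-in for those figures, here is a minimal sketch of the four representations for a small hypothetical directed graph. The graph, the variable names and the `forward_star` helper are illustrative and not taken from the original answer. Forward star is shown as an offsets array plus a targets array, where the out-neighbors of u are `targets[offsets[u]:offsets[u + 1]]`.

```python
n = 4
# Edge list: one (source, target) pair per edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Adjacency matrix: matrix[u][v] == 1 when there is an edge u -> v.
matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = 1

# Adjacency list: adj_list[u] holds the targets of edges leaving u.
adj_list = [[] for _ in range(n)]
for u, v in edges:
    adj_list[u].append(v)


def forward_star(num_vertices, edge_list):
    """Build (offsets, targets) arrays grouped by source vertex."""
    offsets = [0] * (num_vertices + 1)
    for u, _ in edge_list:
        offsets[u + 1] += 1              # count out-degree of u
    for i in range(num_vertices):
        offsets[i + 1] += offsets[i]     # prefix sums give start positions
    targets = [0] * len(edge_list)
    cursor = offsets[:-1]                # next free slot per source (a copy)
    for u, v in edge_list:
        targets[cursor[u]] = v
        cursor[u] += 1
    return offsets, targets


offsets, targets = forward_star(n, edges)
print(matrix)    # [[0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
print(adj_list)  # [[1, 2], [2], [3], []]
print(offsets)   # [0, 2, 3, 4, 4]
print(targets)   # [1, 2, 2, 3]
```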
If you get into this research field you will find a good number of approaches, mainly optimizations for specific cases, taking into account factors such as:
Graph size (number of nodes)
Density of the graph
Directed or undirected graph
Static or dynamic graph
Graph known at compile time or constructed at runtime
Node IDs (labeled sequentially or not)
...
These optimizations can, for example, support reordering of the nodes in a preprocessing stage to increase locality of reference. There is a lot of work on shortest path algorithms, especially for calculating shortest paths in a world map.
One example of optimization would be a dynamic graph structure (Packed-Memory Graph (PMG)) which is suited for large-scale transportation networks.
There is another representation of graphs using an adjacency set. It is very similar to an adjacency list, but instead of linked lists, sets are used to hold each vertex's neighbors (an ordered set such as a balanced search tree matches the logarithmic edge check below).
If E is the number of edges and V is the number of vertices in the graph, then the adjacency set representation takes O(E+V) space.
Complexities of other operations with the adjacency set representation:
Checking for an edge between vertices v and w: O(log(Degree(v)))
Iterating over the edges incident to vertex v: O(Degree(v))
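A minimal sketch of the idea, assuming an undirected graph with vertices numbered from 0. Python has no built-in balanced tree, so a sorted list with binary search stands in for the ordered set: the edge check is logarithmic in the degree, but insertion into a list is linear, unlike a real balanced tree. The class name `AdjacencySetGraph` is illustrative.

```python
from bisect import bisect_left, insort


class AdjacencySetGraph:
    """Undirected graph; each vertex keeps its neighbors in sorted order."""

    def __init__(self, num_vertices):
        self.neighbors = [[] for _ in range(num_vertices)]

    def has_edge(self, v, w):
        """Binary search in v's neighbor list: O(log(Degree(v)))."""
        row = self.neighbors[v]
        i = bisect_left(row, w)
        return i < len(row) and row[i] == w

    def add_edge(self, v, w):
        """Insert the edge in both directions, ignoring duplicates."""
        for a, b in ((v, w), (w, v)):
            if not self.has_edge(a, b):
                insort(self.neighbors[a], b)

    def incident(self, v):
        """Iterate over the neighbors of v: O(Degree(v))."""
        return iter(self.neighbors[v])


g = AdjacencySetGraph(4)
g.add_edge(0, 2)
g.add_edge(0, 1)
g.add_edge(0, 1)             # duplicate, ignored
print(g.has_edge(1, 0))      # True
print(g.has_edge(1, 2))      # False
print(list(g.incident(0)))   # [1, 2]
```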

Is there a way to count the degree of a vertex in a weighted graph using an adjacency matrix?

If the graph was unweighted you would just go to the row/column (in degree/out degree) of that vertex and count the 1s. But what about a weighted graph? My professor said just count all the non-zero edges but to my understanding it is possible for an edge to have a weight of zero, correct?
So in short: Given an adjacency matrix of a weighted graph with weights of zero allowed, how can you count the degree of a certain vertex?
It usually doesn't make sense to have a zero-weight edge in a weighted graph, although that depends on what your weights mean for the system you are modelling.
You could potentially have a graph where some edges have weights and some do not. In that case, you can't record it as a plain adjacency matrix, since you can't distinguish between 'no edge' and 'unweighted edge'.
If you really needed to have a graph where some edges have no weight encoded in a matrix, then I guess you could do some kind of simple trick like add 1 to all weights when storing in the matrix, then subtract one when you want to calculate properties over those weights.
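The add-one trick could look like the sketch below. It assumes an undirected graph with non-negative weights and no self-loops, so that a stored 0 can only mean "no edge"; the helper names are illustrative.

```python
def make_matrix(n, weighted_edges):
    """Store weight + 1 in each cell so that 0 is reserved for 'no edge'."""
    matrix = [[0] * n for _ in range(n)]
    for u, v, w in weighted_edges:
        matrix[u][v] = w + 1
        matrix[v][u] = w + 1   # undirected: mirror the entry
    return matrix


def degree(matrix, v):
    """Count the non-zero cells in row v; zero-weight edges are included."""
    return sum(1 for cell in matrix[v] if cell != 0)


def weight(matrix, u, v):
    """Recover the real weight, or None when there is no edge."""
    cell = matrix[u][v]
    return None if cell == 0 else cell - 1


m = make_matrix(4, [(0, 1, 0), (0, 2, 5), (1, 2, 3)])
print(degree(m, 0))     # 2, even though edge (0, 1) has weight 0
print(degree(m, 3))     # 0
print(weight(m, 0, 1))  # 0
print(weight(m, 0, 3))  # None
```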

Maximum weights bipartite matching with weighted vertices

I have a bipartite graph with two sets of vertices A and B. Edges have no weights. However, vertices in one of the sets(say set B) have positive weights assigned to them (wb1,wb2...)
I want to find a matching in this bipartite graph so as to maximize the sum of weights of vertices matched from set B.
After an extensive online search, this is what I have come up with : Assign weight wbi to all edges incident on vertex bi and run the Hungarian algorithm.
Is there a more efficient way to look at this problem, since it's different from weighted maximum matching (here vertices have weights as opposed to edges)
If my language is not clear, feel free to edit. Thank you.
If an improvement from O(V^3) to O(V E) and a simpler algorithm is worth it (it isn't an asymptotic improvement for the densest graphs), you could exploit the matroid structure of matchings as follows. Instantiate Ford-Fulkerson by repeatedly choosing an augmenting path to an unmatched vertex in B whose weight is as large as possible.
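One way to read that suggestion is as a greedy loop: visit the vertices of B in order of decreasing weight and keep each one for which an augmenting path exists. The sketch below follows that reading using a simple recursive augmenting-path search; the input format (`adj_b[b]` lists the A-neighbors of B-vertex `b`) and all names are assumptions for illustration.

```python
def max_weight_vertex_matching(adj_b, weights_b):
    """Greedy augmenting-path matching, heaviest B-vertices first.

    adj_b[b]     -- list of A-vertices adjacent to B-vertex b
    weights_b[b] -- positive weight of B-vertex b
    Returns (total matched weight, {b: a} matching).
    """
    match_a = {}  # A-vertex -> B-vertex currently matched to it

    def try_augment(b, seen):
        for a in adj_b[b]:
            if a in seen:
                continue
            seen.add(a)
            # Take a if it is free, or if its current partner can move on.
            if a not in match_a or try_augment(match_a[a], seen):
                match_a[a] = b
                return True
        return False

    order = sorted(range(len(adj_b)), key=lambda b: weights_b[b], reverse=True)
    total = 0
    for b in order:
        if try_augment(b, set()):
            total += weights_b[b]
    return total, {b: a for a, b in match_a.items()}


# B-vertices 0 and 1 compete for A-vertex 0; B-vertex 2 can also use A-vertex 1.
total, matching = max_weight_vertex_matching([[0], [0], [0, 1]], [5, 4, 3])
print(total)     # 8
print(matching)  # {0: 0, 2: 1}
```

An augmenting path never unmatches a B-vertex that was already matched, so each accepted vertex stays in the matching and each of the V searches costs O(E).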

Which algorithm+representation should I use for finding minimum path distance in a huge sparse undirected graph?

I need to find the minimum distance between two nodes in an undirected graph. Here are some details:
The graph is undirected and huge (the number of nodes is on the order of 100,000)
The graph is sparse; the number of edges is less than the number of nodes
I am not interested in the actual path, just the distance.
What representation and algorithm should I use for a) Space efficiency b)time efficiency?
EDIT: If it matters,
The weights are all non-zero positive integers.
No node connects to itself.
Only single edge between two adjacent nodes
It depends on the weights of the edges. If they are non-negative, Dijkstra suits you. If your graph is planar, you can use A*.
If you have negative weights, you have to use Bellman-Ford.
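For the case in the question (positive integer weights, sparse graph, only the distance needed), a minimal Dijkstra sketch over adjacency lists with a binary heap might look like this. The dictionary-based graph format and the function name are assumptions; the search stops as soon as the target is popped, and no predecessor information is kept because the path itself is not required.

```python
import heapq


def shortest_distance(adj, source, target):
    """Dijkstra's algorithm for non-negative weights.

    adj maps a node to a list of (neighbor, weight) pairs.
    Returns the distance, or None if target is unreachable.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter route was already found
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None


# Undirected graph: each edge is listed at both endpoints.
adj = {
    0: [(1, 4), (2, 1)],
    1: [(0, 4), (2, 2), (3, 5)],
    2: [(0, 1), (1, 2)],
    3: [(1, 5)],
}
print(shortest_distance(adj, 0, 3))  # 8
print(shortest_distance(adj, 0, 4))  # None
```

With adjacency lists the memory use is proportional to the number of nodes plus edges, which matters at 100,000 nodes, where a full matrix would need 10^10 cells.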

Resources