Storing a big factor graph using linked lists

Could you please tell me whether we can use adjacency linked lists to store a big factor graph (an undirected factor graph with 500,000 nodes and around 1,000,000 edges), considering that each node is a vector? If we cannot, what is the best option for implementing a factor graph?
Thanks in advance.

I don't see why not, although using arrays will be more compact than linked lists.
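To make the array suggestion concrete, here is a minimal C++ sketch (C++17 assumed; the class and member names are hypothetical, not from any library). It stores every node's vector in one contiguous `std::vector<double>` of size nodes × dim, assuming all node vectors share the same dimension, and keeps each node's neighbours in a growable array instead of a linked list. It treats all nodes alike; a real factor graph would also distinguish variable nodes from factor nodes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container: node vectors live in one flat array and adjacency
// uses std::vector (contiguous arrays) rather than linked lists.
// Assumes every node vector has the same dimension `dim`.
class FactorGraph {
public:
    FactorGraph(std::size_t numNodes, std::size_t dim)
        : dim_(dim), values_(numNodes * dim, 0.0), adj_(numNodes) {}

    // Undirected edge: record it in both endpoints' neighbour arrays.
    void addEdge(int u, int v) {
        adj_[u].push_back(v);
        adj_[v].push_back(u);
    }

    const std::vector<int>& neighbors(int u) const { return adj_[u]; }

    // Pointer to the first of the `dim` entries belonging to node u.
    double* vec(int u) { return values_.data() + static_cast<std::size_t>(u) * dim_; }

    std::size_t dim() const { return dim_; }

private:
    std::size_t dim_;
    std::vector<double> values_;          // numNodes * dim doubles, contiguous
    std::vector<std::vector<int>> adj_;   // one array of neighbour ids per node
};
```

With 500,000 nodes and about 1,000,000 undirected edges this stores about 2,000,000 neighbour ids in total, with no per-element "next" pointer as a linked list would need; packing all lists into a single array with per-node offsets is more compact still.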
If the factor graph has a fixed regular structure (e.g. a grid or a long HMM) then you could do away with the adjacency lists entirely if you index your nodes to take advantage of the structure.
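As a hypothetical illustration of indexing by structure: if the variables form a width × height grid and node (r, c) gets index r * width + c, neighbours can be computed from the index alone, and a chain (such as a long HMM) only needs t - 1 and t + 1. The function names are made up for this sketch, and it ignores the distinction between variable and factor nodes.

```cpp
#include <vector>

// Grid of width x height nodes; node (r, c) has index r * width + c.
// Neighbours come from index arithmetic, so no adjacency lists are stored.
std::vector<int> gridNeighbors(int index, int width, int height) {
    int r = index / width;
    int c = index % width;
    std::vector<int> result;
    if (r > 0)          result.push_back(index - width);  // up
    if (r + 1 < height) result.push_back(index + width);  // down
    if (c > 0)          result.push_back(index - 1);      // left
    if (c + 1 < width)  result.push_back(index + 1);      // right
    return result;
}

// Chain of `length` nodes: node t is linked only to t - 1 and t + 1.
std::vector<int> chainNeighbors(int t, int length) {
    std::vector<int> result;
    if (t > 0)          result.push_back(t - 1);
    if (t + 1 < length) result.push_back(t + 1);
    return result;
}
```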

Related

Searching for shortest path

When searching for a shortest path, isn't it always better to use lists of connected nodes instead of a grid?
When using a grid, you have to iterate over the grid every time, whereas using lists saves a lot of time.
With an adjacency matrix, enumerating the neighbors of a node usually costs O(n) time, so it may be a bit slower than a list of connected nodes. However, you can do some fancy things with it. For example, if you want to delete a lot of edges, each deletion takes O(1) with an adjacency matrix (it may take a lot longer with a list of nodes, depending on what data structure you use for it). An adjacency matrix is also a matrix in the algebraic sense. What do I mean by that? If you want to know in how many ways you can get from node A to node B in exactly k steps, you can raise the matrix to the power k and read entry (A, B), which you cannot do directly with a list.
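The matrix-power trick can be sketched as follows (plain C++, hypothetical helper names). Entry [i][j] of A^k counts walks of exactly k edges from i to j; walks may revisit nodes, and `long long` will overflow for large counts.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;

// Naive O(n^3) matrix product.
Matrix multiply(const Matrix& a, const Matrix& b) {
    std::size_t n = a.size();
    Matrix c(n, std::vector<long long>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// A^k by repeated squaring; result[i][j] = number of walks of length k from i to j.
Matrix power(Matrix a, unsigned k) {
    std::size_t n = a.size();
    Matrix result(n, std::vector<long long>(n, 0));
    for (std::size_t i = 0; i < n; ++i) result[i][i] = 1;  // identity = walks of length 0
    while (k > 0) {
        if (k & 1) result = multiply(result, a);
        a = multiply(a, a);
        k >>= 1;
    }
    return result;
}
```

Repeated squaring needs O(log k) matrix products, each O(n^3) with this naive multiply.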

Implementing Graph by AdjacencyLists

I want to implement (in Java) a Graph class using adjacency lists; I'd use this class to find a minimum spanning tree with Prim's algorithm.
I read that there are many ways of doing this, but I can't use data structures built upon simpler primitive data types (LinkedList, Stack and so on), so I thought a good solution might be to combine HashTable with ArrayList instead of LinkedList.
I read that the goal of combining LinkedList with HashTable is to merge the advantages of LinkedList (optimal enumeration of a vertex's adjacency list) and HashTable (fast searching and adding of edges).
I'm wondering about two things:
Would I keep those properties by using ArrayList instead of LinkedList?
Would it be better to use a HashTable linked to another HashTable?
Any other suggestions? If I use HashTable, what would be the best way to resolve collisions? I was thinking about separate chaining.
I assume that your desired Graph structure would be a HashTable<Vertex,ArrayList<Pair<Vertex,Float>>> mapping each vertex to its adjacent vertices together with the edge weights.
You can use an ArrayList since you don't need to remove processed edges from the adjacency list.
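A rough C++ analogue of that structure, with `std::unordered_map` standing in for HashMap/HashTable and `std::vector` for ArrayList, is enough for the O(V^2) form of Prim's algorithm: each vertex's list is only iterated, never searched or shrunk. This is a sketch under assumptions: vertices are `int`s, every vertex appears as a key (each undirected edge is stored in both directions), `start` is in the graph, C++17 is available, and the names `Graph` and `primMstWeight` are hypothetical.

```cpp
#include <limits>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Analogue of HashMap<Vertex, ArrayList<Pair<Vertex, Float>>> with int vertices.
using Graph = std::unordered_map<int, std::vector<std::pair<int, float>>>;

// O(V^2) Prim: pick the cheapest vertex outside the tree by scanning all
// vertices, then relax its adjacency list. Returns the total MST weight of
// the component containing `start`.
float primMstWeight(const Graph& g, int start) {
    const float INF = std::numeric_limits<float>::infinity();
    std::unordered_map<int, float> best;  // cheapest known edge linking each vertex to the tree
    std::unordered_set<int> inTree;
    for (const auto& entry : g) best[entry.first] = INF;
    best[start] = 0.0f;

    float total = 0.0f;
    for (;;) {
        // O(V) scan for the cheapest vertex not yet in the tree.
        bool found = false;
        int u = 0;
        float uCost = INF;
        for (const auto& [v, cost] : best) {
            if (!inTree.count(v) && cost < uCost) {
                u = v;
                uCost = cost;
                found = true;
            }
        }
        if (!found) break;  // done; any remaining vertices are unreachable
        inTree.insert(u);
        total += uCost;
        // One pass over u's list: plain iteration, no removals needed.
        for (const auto& [v, w] : g.at(u)) {
            auto it = best.find(v);
            if (it != best.end() && !inTree.count(v) && w < it->second) it->second = w;
        }
    }
    return total;
}
```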
In general I would not recommend linking the HashTable to a second one, because of the extra memory it uses and because the algorithm processes all adjacent edges of a vertex anyway. It would only help if you wanted to remove a processed edge: it would let you quickly remove the edge in the other direction as well.
Note that while the HashMap + ArrayList approach is space efficient and sufficient for this algorithm to run in O(V^2), it is not recommended for dense graphs when many edge lookups are required. Checking whether an edge from A to B exists is linear in the number of vertices adjacent to A or B. If you want to look up an edge in (expected) O(1) time, you would want a second HashTable to store the edges. An example is given in the JGraphT library.
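The idea of a second hash table for edges can be sketched like this; it is a hypothetical class written for illustration, not JGraphT's actual implementation. Each undirected edge goes into the adjacency lists (for iteration) and into a hash map keyed by the vertex pair (for expected O(1) lookup), at the price of extra memory. Parallel edges are assumed not to occur.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

class WeightedGraph {
public:
    void addEdge(std::int32_t a, std::int32_t b, float w) {
        adj_[a].push_back({b, w});   // for iterating neighbours
        adj_[b].push_back({a, w});
        edges_[key(a, b)] = w;       // for expected O(1) edge lookup
    }

    bool hasEdge(std::int32_t a, std::int32_t b) const {
        return edges_.count(key(a, b)) != 0;
    }

    // Edge weight, or `fallback` if there is no such edge.
    float weightOr(std::int32_t a, std::int32_t b, float fallback) const {
        auto it = edges_.find(key(a, b));
        return it == edges_.end() ? fallback : it->second;
    }

    const std::vector<std::pair<std::int32_t, float>>& neighbors(std::int32_t v) const {
        return adj_.at(v);
    }

private:
    // Pack (min, max) into one 64-bit key so that {a, b} and {b, a} match.
    static std::uint64_t key(std::int32_t a, std::int32_t b) {
        if (a > b) std::swap(a, b);
        return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(a)) << 32) |
               static_cast<std::uint32_t>(b);
    }

    std::unordered_map<std::int32_t, std::vector<std::pair<std::int32_t, float>>> adj_;
    std::unordered_map<std::uint64_t, float> edges_;
};
```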
Note also that it's generally recommended to use HashMap over Hashtable.

Do I have to implement Adjacency matrix with BFS?

I am trying to implement the BFS algorithm using a queue, and for learning purposes I do not want to look at any online code. All I am doing is following the algorithm and trying to implement it. I have a question regarding the adjacency matrix (a data structure for graphs).
I know one common graph data structure is the adjacency matrix. So my question is: do I have to implement an adjacency matrix along with the BFS algorithm, or does it not matter?
I am really confused.
One of the things that confuses me is the graph data: where should it be stored if there is no data structure?
Sincerely
Breadth-first search assumes you have some way of representing the graph structure you're working with, and its efficiency depends on the representation you choose, but you aren't constrained to use an adjacency matrix. Many implementations of BFS represent the graph implicitly (for example, as a 2D array storing a maze, or as some sort of game) and work just fine. You can also use an adjacency list, which is particularly efficient for use in BFS.
The particular code you'll be writing will depend on how the graph is represented, but don't feel constrained to do it one way. Choose whatever's easiest for your application.
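To show how the code depends on the representation, here are two minimal BFS sketches in C++ (function names hypothetical, C++17 assumed): one over an explicit adjacency list and one over a maze stored as a 2D character grid, where the neighbours are computed on the fly and no separate graph structure is built. The maze version assumes rectangular rows and an open start cell.

```cpp
#include <queue>
#include <string>
#include <utility>
#include <vector>

// BFS over an explicit adjacency list: distance in edges from `source`
// to every node, -1 for unreachable nodes.
std::vector<int> bfsAdjacencyList(const std::vector<std::vector<int>>& adj, int source) {
    std::vector<int> dist(adj.size(), -1);
    std::queue<int> q;
    dist[source] = 0;
    q.push(source);
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        for (int v : adj[u]) {
            if (dist[v] == -1) {
                dist[v] = dist[u] + 1;
                q.push(v);
            }
        }
    }
    return dist;
}

// The same BFS on an implicit graph: a maze of text rows, '#' = wall.
// Returns the number of steps from (sr, sc) to (tr, tc), or -1 if unreachable.
int bfsMazeSteps(const std::vector<std::string>& maze, int sr, int sc, int tr, int tc) {
    int rows = static_cast<int>(maze.size());
    int cols = static_cast<int>(maze[0].size());
    std::vector<std::vector<int>> dist(rows, std::vector<int>(cols, -1));
    std::queue<std::pair<int, int>> q;
    dist[sr][sc] = 0;
    q.push({sr, sc});
    const int dr[] = {-1, 1, 0, 0};
    const int dc[] = {0, 0, -1, 1};
    while (!q.empty()) {
        auto [r, c] = q.front();
        q.pop();
        if (r == tr && c == tc) return dist[r][c];
        for (int d = 0; d < 4; ++d) {
            int nr = r + dr[d], nc = c + dc[d];
            if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
            if (maze[nr][nc] == '#' || dist[nr][nc] != -1) continue;
            dist[nr][nc] = dist[r][c] + 1;
            q.push({nr, nc});
        }
    }
    return -1;
}
```

The queue logic is identical in both; only the way neighbours are produced changes.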
The best way to choose data structures is in terms of the operations. With a complete list of operations in hand, evaluate implementations with respect to the criteria important to the problem: space, speed, code size, etc.
For BFS, the operations are pretty simple:
Set<Node> getSources(Graph graph) // all nodes in graph with no in-edges
Set<Node> getNeighbors(Node node) // all nodes reachable from node by out-edges
Now we can evaluate graph data structure options in terms of n=number of nodes:
Adjacency matrix:
getSources is O(n^2) time
getNeighbors is O(n) time
Vector of adjacency lists (alone):
getSources is O(n + e) time, where e is the number of edges (every list has to be scanned)
getNeighbors is O(1) time
"Clever" vector of adjacency lists:
getSources is O(1) time
getNeighbors is O(1) time
The cleverness is just maintaining the sources set as the graph is constructed, so the cost is amortized over edge insertions. I.e., as you create a node, add it to the sources set, because it has no in-edges yet. As you add an edge, remove the to-node from the sources set.
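The "clever" variant can be sketched as a small C++ class (hypothetical names). It assumes edges are only ever added; supporting edge removal would also require per-node in-degree counts so that a node can re-enter the sources set.

```cpp
#include <unordered_set>
#include <vector>

// Vector of adjacency lists that also keeps the set of source nodes
// (nodes with no in-edges) up to date as the graph is built.
class DirectedGraph {
public:
    // A new node has no edges yet, so it starts out as a source.
    int addNode() {
        int id = static_cast<int>(out_.size());
        out_.emplace_back();
        sources_.insert(id);
        return id;
    }

    // from -> to gives `to` an in-edge, so it is no longer a source.
    void addEdge(int from, int to) {
        out_[from].push_back(to);
        sources_.erase(to);
    }

    // O(1): the set is already maintained; returned by reference.
    const std::unordered_set<int>& getSources() const { return sources_; }

    // O(1) to obtain; iterating it costs O(out-degree).
    const std::vector<int>& getNeighbors(int node) const { return out_[node]; }

private:
    std::vector<std::vector<int>> out_;
    std::unordered_set<int> sources_;
};
```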
Now you can make an informed choice based on run time. Do the same for space, simplicity, or whatever other considerations are in play. Then choose and implement.

Adjacency matrix vs adjacency list for directed weighted graph

As an exercise I have to build a satnav system which plans the shortest and fastest routes from location to location. It has to be as fast as I can possibly make it without using too much memory.
I am having trouble deciding which structure to use to represent the graph. I understand that a matrix is better for dense graphs and a list is better for sparse graphs. I'm leaning towards using a list, as I'm assuming that adding vertices will be the most taxing part of this program.
I just want to get some of your opinions. If I were to look at a typical road map as a graph, with locations as nodes and roads as edges, would you consider it sparse or dense? Which structure seems better in this scenario?
I would go for lists because building them is only a one-time investment. The good thing about them is that they let you iterate over all the adjacent vertices faster than a matrix does, which is an important and very frequent step in most shortest-path algorithms.
So where a matrix takes O(N) per vertex, an adjacency list takes only O(k), where k is the number of adjacent vertices.
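As an illustration of why neighbour iteration matters, here is a sketch of Dijkstra's algorithm over adjacency lists (the answer does not name a specific algorithm; Dijkstra's is a common choice for road networks with non-negative costs). Names are hypothetical and C++17 is assumed; the same code serves shortest and fastest routes if `cost` holds distance or travel time respectively.

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Edge {
    int to;
    double cost;  // e.g. distance for "shortest", travel time for "fastest"
};

// Dijkstra with a binary heap. Each vertex's adjacency list is scanned once
// (O(k) per vertex), instead of an O(N) matrix row per vertex.
// Assumes non-negative costs.
std::vector<double> dijkstra(const std::vector<std::vector<Edge>>& adj, int source) {
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<double> dist(adj.size(), INF);
    using Item = std::pair<double, int>;  // (distance, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[source] = 0.0;
    pq.push({0.0, source});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue;  // stale heap entry, already improved
        for (const Edge& e : adj[u]) {
            double nd = d + e.cost;
            if (nd < dist[e.to]) {
                dist[e.to] = nd;
                pq.push({nd, e.to});
            }
        }
    }
    return dist;
}
```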

Object and Pointer Graph representations

I keep seeing everywhere that there are 3 ways to represent graphs:
Objects and pointers
Adjacency matrix
Adjacency lists
However, I just plain don't understand what these objects-and-pointers representations are, yet every recruiter and many blogs cite Steve Yegge's blog as saying that they are indeed a separate representation.
This widely accepted answer to a very similar question seems to suggest that the vertex structures themselves have no internal pointers to other vertices, and instead all edges are represented by edge structures which contain pointers to the adjacent vertices.
How does this representation offer any discernible analytical advantage in any scenario?
This is off the top of my head, so I hope I have the facts correct.
Conceptually, a graph represents how a set of nodes (or vertices) are related (connected) to each other (via edges).
However, in the actual physical device (memory), we have a contiguous array of memory cells.
So, in order to represent the graph, we can choose to use a matrix.
In this case, we use the vertex indices as the row and column indices, and entry (i, j) has value 1 if vertices i and j are adjacent, 0 otherwise.
Alternatively, you can represent a graph by allocating an object for each node/vertex that points to a list of all the nodes adjacent to it.
The matrix representation has the advantage when the graph is dense, meaning when most of the nodes/vertices are connected to each other. This is because in such cases, storing each connection as a matrix entry saves us from having to allocate an extra pointer (which needs a word of memory) for each connection.
For sparse graph, the list approach is better because you don't need to account for the 0 entries when there is no connection between the vertices.
Hope it helps.
For now I have a hard time finding a pro with respect to typical "graph algorithms". But it sure is possible to represent a graph with objects and pointers, and it is a very natural thing to do if you think of it as a representation of something you just drew on a whiteboard.
Think of a scenario where you want to combine nodes of a graph in a certain order.
Nodes have payloads that contain domain data, the graph structure itself is not a core aspect of your program.
Sure, you can update your lists / matrix for every operation, but given an "objects and pointers" structure, you can do the merging locally. Further, if nodes have payloads, the lists/matrix will contain node IDs that identify the actual node objects. A merge would then mean updating your graph representation, following the node identifiers, and doing the actual processing. It may feel more intuitive to work on your actual node objects and simply remove pointers when collapsing a neighbor (and delete that node).
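A minimal C++ sketch of such a local merge, assuming a hypothetical `Node` with a string payload and undirected neighbour pointers: collapsing `victim` into `keeper` touches only those two nodes and the victim's neighbours. Node ownership is left to the caller, who can delete the victim afterwards.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical node: domain payload plus pointers to its neighbours.
struct Node {
    std::string payload;
    std::vector<Node*> neighbors;
};

// Remove every pointer to `target` from `from`'s neighbour list.
void unlink(Node* from, Node* target) {
    auto& n = from->neighbors;
    n.erase(std::remove(n.begin(), n.end(), target), n.end());
}

// Collapse `victim` into `keeper`: keeper inherits victim's neighbours
// (without duplicates), payloads are combined, and victim ends up unlinked.
void collapse(Node* keeper, Node* victim) {
    unlink(keeper, victim);
    for (Node* nb : victim->neighbors) {
        if (nb == keeper) continue;
        unlink(nb, victim);
        auto& kn = keeper->neighbors;
        if (std::find(kn.begin(), kn.end(), nb) == kn.end()) {
            kn.push_back(nb);
            nb->neighbors.push_back(keeper);
        }
    }
    keeper->payload += "+" + victim->payload;
    victim->neighbors.clear();
}
```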
Besides, there are more ways to represent a graph:
E.g. just as triples, like Turtle does
Or as an offset representation (offsets per node into an edge array), e.g. this Boost data structure (disclaimer: I have not tested the linked implementation myself)
etc.
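The offset representation can be built from an edge list in two passes, a counting pass and a fill pass. This is a hand-written sketch with hypothetical names (C++17 assumed), not the API of the linked Boost structure.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// All adjacency lists concatenated into one `targets` array; node u's
// neighbours are targets[offsets[u]] .. targets[offsets[u + 1] - 1].
struct OffsetGraph {
    std::vector<std::size_t> offsets;  // numNodes + 1 entries
    std::vector<int> targets;          // one entry per directed edge
};

OffsetGraph buildOffsetGraph(int numNodes, const std::vector<std::pair<int, int>>& edges) {
    OffsetGraph g;
    g.offsets.assign(numNodes + 1, 0);
    for (const auto& [from, to] : edges) g.offsets[from + 1]++;          // count out-degrees
    for (int u = 0; u < numNodes; ++u) g.offsets[u + 1] += g.offsets[u]; // prefix sums
    g.targets.resize(edges.size());
    // Per-node write cursor, starting at each node's offset.
    std::vector<std::size_t> next(g.offsets.begin(), g.offsets.end() - 1);
    for (const auto& [from, to] : edges) g.targets[next[from]++] = to;
    return g;
}
```

Two flat arrays replace one list (or vector) object per node, which is what makes this layout compact for large, static graphs; the trade-off is that adding an edge later means rebuilding.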
Here is a way I have been using to create a graph with this concept:
#include <vector>

class Node
{
public:
    Node();
    void setLink(Node *n); // n: address of the node to link to
    virtual ~Node(void);
private:
    std::vector<Node*> m_links; // non-owning pointers to adjacent nodes
};
And the function responsible for creating the link between vertices is:
// Adds a link from this node to n; call it on both nodes for an undirected edge.
void Node::setLink(Node *n)
{
    m_links.push_back(n);
}
The objects-and-pointers representation reduces space complexity to exactly V+E, where V is the number of vertices and E the number of edges (down from V+2E for an adjacency list, or even 2V+2E if you store the index->vertex mapping in a separate hash map), at the cost of time complexity: looking up a particular edge takes O(E), which is O(V^2) in a dense graph (up from O(V) for an adjacency list). The space saving comes from removing the duplicated edges that appear in the adjacency list.
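The edge-object layout described in the question can be sketched in C++ (hypothetical names, C++14 or later): vertices carry only their payload, each undirected edge is stored once as a record pointing to its endpoints, and since there is no per-vertex index, `hasEdge` has to scan all E edges. `std::unique_ptr` keeps vertex addresses stable as the vector grows.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <vector>

// Vertices hold no pointers to other vertices.
struct Vertex {
    std::string name;
};

// Each undirected edge is one record pointing at its two endpoints.
struct EdgeRecord {
    Vertex* a;
    Vertex* b;
};

struct PointerGraph {
    std::vector<std::unique_ptr<Vertex>> vertices;  // V records, owned here
    std::vector<EdgeRecord> edges;                  // E records, no duplicates

    Vertex* addVertex(const std::string& name) {
        vertices.push_back(std::make_unique<Vertex>(Vertex{name}));
        return vertices.back().get();
    }

    void addEdge(Vertex* a, Vertex* b) { edges.push_back({a, b}); }

    // No per-vertex index, so lookup scans every edge: O(E).
    bool hasEdge(const Vertex* x, const Vertex* y) const {
        return std::any_of(edges.begin(), edges.end(), [&](const EdgeRecord& e) {
            return (e.a == x && e.b == y) || (e.a == y && e.b == x);
        });
    }
};
```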
