I am trying to find a suitable algorithm for this problem: suppose I have some nodes of an oriented graph. Each node may or may not have a parent (at most one parent). Use the notation (id, id_parent) for a node: some nodes will be (id_i, NULL), while others will be (id_j, id_i), i.e. "sons" of id_i. Given an array of these nodes in arbitrary order, I want to sort them as: parent, son, son of son, next son, and so on.
Example: nodes (1, NULL), (2,NULL), (3,1), (4,3), (5,2), (6,3)
The sorted array will be: (1,NULL), (3,1), (4,3), (6,3), (2,NULL), (5,2). A kind of depth-first tree exploration.
Which algorithm would be suitable for achieving this? Thanks
If the graph has no cycles, it is a DAG, and you are looking for a topological sort.
If it has cycles, there is no such ordering, since in a cycle there will be a node whose son is also its ancestor.
EDIT:
If the graph is a forest (a disjoint union of trees), then a simple DFS from the sources will do. Just construct the graph (it is O(n log n) to sort the nodes if they are not already sorted, or O(n) using radix sort), find the list of sources (nodes with a NULL parent), and run a DFS from each source, storing each node in an output array as it is visited. Iterate while there are undiscovered vertices.
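A minimal sketch of that idea (class and method names like Node and preorderSort are mine, just for illustration):

```java
import java.util.*;

class Node {
    int id;
    Integer parent; // null for root nodes

    Node(int id, Integer parent) { this.id = id; this.parent = parent; }
}

class ForestSort {
    // Returns the nodes in parent, son, son-of-son order (preorder of each tree),
    // visiting roots and children in the order they appear in the input array.
    static List<Node> preorderSort(List<Node> nodes) {
        Map<Integer, List<Node>> children = new LinkedHashMap<>();
        List<Node> roots = new ArrayList<>();
        for (Node n : nodes) {
            if (n.parent == null) roots.add(n);
            else children.computeIfAbsent(n.parent, k -> new ArrayList<>()).add(n);
        }
        List<Node> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        for (int i = roots.size() - 1; i >= 0; i--) stack.push(roots.get(i));
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            out.add(n);
            List<Node> kids = children.getOrDefault(n.id, Collections.emptyList());
            for (int i = kids.size() - 1; i >= 0; i--) stack.push(kids.get(i)); // keep input order
        }
        return out;
    }
}
```

On the example input (1,NULL), (2,NULL), (3,1), (4,3), (5,2), (6,3) this produces 1, 3, 4, 6, 2, 5, matching the expected order.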
Depth First Search allows traversing adjacent vertices in an arbitrary order.
Are there any advantages in choosing random neighbours vs choosing neighbours in ascending fashion?
Consider the exploration order of the following graph:
0 -> 9 -> 8 -> 7 ..
0 -> 1 -> 8 -> 7 ..
Can random choice lead to more favourable results?
Can I find a situation in which there is an advantage?
Yes, easily. If I have a connected graph and do not track visited nodes, traversing with random choices is guaranteed to eventually reach every node, while traversing in a fixed order is guaranteed to eventually wind up in a loop.
Is there an advantage in general?
No. In the connected example, for instance, a simple "keep track of where we have been" makes both strategies reach everything. Which one finds a target node first is a matter of chance.
First of all, DFS itself is not responsible for the arbitrary order.
The way the nodes are traversed depends on two things:
The order in which you pushed the adjacent nodes into the adjacency list (generally determined by the order of the edges in the input).
The custom order which you decide for traversal (as you mentioned, the sorted order).
Answer to your question:
The order in which you push the adjacent nodes into the adjacency list does not affect the complexity.
If you decide to traverse adjacent nodes in sorted order, then you need to keep each adjacency list sorted, which comes at an extra cost of O(E*log(V)) overall (the lists contain E entries in total, and each has at most V-1 entries).
Overall time complexity = O(V+E) + O(E*log(V)).
O(V+E) => for DFS.
O(E*log(V)) => for sorting the adjacency lists (or using a priority queue).
Here, V = number of nodes in the graph and E = number of edges.
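A small sketch of a DFS that always explores the smallest-numbered unvisited neighbor first (illustrative only; the adjacency lists are sorted up front):

```java
import java.util.*;

class SortedDfs {
    // Iterative DFS that visits neighbors in ascending order.
    static List<Integer> dfs(int n, List<List<Integer>> adj, int start) {
        for (List<Integer> neighbors : adj) {
            Collections.sort(neighbors);           // O(E log V) over all lists
        }
        boolean[] visited = new boolean[n];
        List<Integer> order = new ArrayList<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            int u = stack.pop();
            if (visited[u]) continue;
            visited[u] = true;
            order.add(u);
            List<Integer> neighbors = adj.get(u);
            // Push in reverse so the smallest neighbor is popped (visited) first.
            for (int i = neighbors.size() - 1; i >= 0; i--) {
                if (!visited[neighbors.get(i)]) stack.push(neighbors.get(i));
            }
        }
        return order;
    }
}
```

The traversal itself is still O(V+E); only the one-time sorting of the lists adds the logarithmic factor.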
I was trying to answer the following question from a textbook.
Given n nodes of a forest and their edges, describe and prove an algorithm that finds the number of trees in the forest.
Take for instance this graph:
Suppose that we explore it with a DFS, starting at vertex 0, that visits the vertices in the following order: 0, 2, 1, 5, 4, 3, 6.
The corresponding DFS forest will look like this:
[figure: the resulting DFS forest]
The above forest contains two trees and I think we can use the below algorithm to count the number of trees in a forest. [Source of algorithm]
Apply DFS starting from an unvisited node.
Increment the count by one once every node connected to that source has been visited.
Perform another DFS traversal if some nodes are not yet visited.
The final count gives the number of trees in the forest.
My question is: How can I prove that this algorithm is correct, and is there a more efficient algorithm to get the number of trees in a forest?
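For reference, here is a rough sketch of that counting procedure, assuming the forest is given as an undirected adjacency list (the names are mine, not from the textbook):

```java
import java.util.*;

class ForestCount {
    // Counts the trees in a forest: every DFS started from an
    // unvisited node marks exactly one tree (connected component).
    static int countTrees(int n, List<List<Integer>> adj) {
        boolean[] visited = new boolean[n];
        int count = 0;
        for (int s = 0; s < n; s++) {
            if (visited[s]) continue;
            count++;                              // new, previously untouched tree
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(s);
            visited[s] = true;
            while (!stack.isEmpty()) {
                int u = stack.pop();
                for (int v : adj.get(u)) {
                    if (!visited[v]) { visited[v] = true; stack.push(v); }
                }
            }
        }
        return count;
    }
}
```

Every vertex and edge is processed once, so the whole procedure runs in O(V+E).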
Given an undirected connected graph with edges of cost either x or y (where x < y and both are positive integers), find an MST in O(V+E).
The idea seems to involve two DFS runs, collapsing the components connected by lower-weight edges into supernodes after the first run, but I'm not entirely certain. Any help is appreciated. I have seen such a solution hinted at in several answers, but couldn't find an explanation of it anywhere.
I think your intuition is correct that an MST of this graph can be found in O(V+E). Kruskal's algorithm computes the MST of an undirected graph in O(V + E*α(V)) once the edges are available in sorted order, where α(V) is the inverse of Ackermann's function, which grows extremely slowly. Kruskal's reaches this bound by using the union-find data structure, which tracks elements partitioned into disjoint subsets. When an element is looked up (find(x)), the tree is compressed: every node on the path between x and the root has its pointer switched from its parent to the root. The union(x,y) operation uses find to determine whether the two nodes already belong to the same subset (compressing the trees in the process); if they are in separate trees, the trees are combined, with the tree of lower rank (height) made to point to the root of the higher-rank tree.
Kruskal's uses the union-find data structure to check whether two vertices are already connected. In general it works by adding all the vertices to the union-find structure and then repeatedly taking the lowest remaining edge (assuming the edges are sorted in increasing order): if its endpoints are not yet connected, the edge is added to the MST and a union is performed on the two vertices. The key point for your problem is that with only two distinct costs x < y, no comparison sort is needed: a single O(E) pass partitions the edges into x-edges and y-edges, and processing all x-edges before all y-edges is already a valid sorted order, so the whole algorithm runs in O(V + E*α(V)), which is O(V+E) for all practical purposes.
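A rough sketch of that approach (illustrative names; union by rank with path compression, and the "sort" replaced by a linear two-bucket pass):

```java
import java.util.*;

class TwoWeightMST {
    static int[] parent, rank_;

    static int find(int v) {                      // path compression
        if (parent[v] != v) parent[v] = find(parent[v]);
        return parent[v];
    }

    static boolean union(int a, int b) {          // union by rank
        int ra = find(a), rb = find(b);
        if (ra == rb) return false;
        if (rank_[ra] < rank_[rb]) { int t = ra; ra = rb; rb = t; }
        parent[rb] = ra;
        if (rank_[ra] == rank_[rb]) rank_[ra]++;
        return true;
    }

    // edges[i] = {u, v, w} with w equal to either x or y (x < y).
    static long mstWeight(int n, int[][] edges, int x, int y) {
        parent = new int[n];
        rank_ = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;

        // Linear "sort": all x-edges first, then all y-edges.
        List<int[]> ordered = new ArrayList<>();
        for (int[] e : edges) if (e[2] == x) ordered.add(e);
        for (int[] e : edges) if (e[2] == y) ordered.add(e);

        long total = 0;
        for (int[] e : ordered) {
            if (union(e[0], e[1])) total += e[2];  // take edge only if it joins two components
        }
        return total;
    }
}
```

The two passes over the edge array replace the usual O(E log E) sorting step, which is exactly what brings Kruskal's down to near-linear time here.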
In my algorithms class I've been told that a drawback of adjacency lists for graph representation is the O(n) lookup time for iterating through the array of adjacent nodes corresponding to each node. I implement my adjacency list using a HashMap that maps each node to a HashSet of its adjacent nodes; wouldn't that only take O(1) lookup time? Is there something I'm missing?
As you know, looking up a value by key in a HashMap is O(1). However, in an adjacency list the value stored in the HashMap is itself a collection of adjacent nodes (in your case, a HashSet). The main purpose of an adjacency list is to iterate over the adjacent nodes, for example in graph traversal algorithms like DFS and BFS. If the HashSet contains n elements, iterating over all of them is still O(n).
So the total complexity is O(1) + O(n), where O(1) is the HashMap lookup and O(n) is iterating over all the adjacent elements.
Generally, an adjacency list is preferable for a sparse graph, i.e. a graph with only a few edges, because then the number of adjacent elements for each node (each key of the HashMap) is small, so iterating over them does not cost much.
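To make the distinction concrete, here is a small sketch (illustrative only, using Integer node ids):

```java
import java.util.*;

class HashAdjacency {
    // node -> set of neighboring nodes
    Map<Integer, Set<Integer>> adj = new HashMap<>();

    void addEdge(int u, int v) {                  // undirected edge
        adj.computeIfAbsent(u, k -> new HashSet<>()).add(v);
        adj.computeIfAbsent(v, k -> new HashSet<>()).add(u);
    }

    boolean hasEdge(int u, int v) {               // O(1) expected: two hash lookups
        return adj.getOrDefault(u, Collections.emptySet()).contains(v);
    }

    void visitNeighbors(int u) {                  // O(deg(u)): iteration cannot be faster
        for (int v : adj.getOrDefault(u, Collections.emptySet())) {
            System.out.println(u + " -> " + v);
        }
    }
}
```

The O(1) applies to the membership query hasEdge; DFS and BFS spend their time in visitNeighbors, which is inherently proportional to the number of neighbors.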
I implement my adjacency list by using a HashMap that maps nodes to a HashSet of their adjacent nodes, wouldn't that only take O(1) look up time? [emphasis mine]
Right — but "adjacency list" normally implies a representation as an array or a linked-list rather than a HashSet: in other words, adjacency lists are optimized for iterating over a vertex's neighbors rather than for querying if two vertices are neighbors.
It may be possible to produce more time-efficient graph representations than adjacency lists, particularly for graphs where vertices often have many edges.
With a map of vertices where each vertex contains a map of neighbor vertices and/or edge objects, we can check whether two nodes are connected in O(1) time by indexing a vertex id and then indexing a neighbor. That's potentially a big saving over an adjacency list, where we might have to loop over many edges to find a specific neighbor. Furthermore, a map-of-maps data structure allows us to store arbitrary data in the edge objects, which is useful for weighted graphs and for attaching features to edges/actions.
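A sketch of that map-of-maps idea (illustrative; the Edge class here is mine):

```java
import java.util.*;

class Edge {
    double weight;
    // any other per-edge data could live here
    Edge(double weight) { this.weight = weight; }
}

class MapOfMapsGraph {
    // vertex -> (neighbor -> edge data)
    Map<Integer, Map<Integer, Edge>> adj = new HashMap<>();

    void addEdge(int u, int v, double w) {        // undirected, weighted
        adj.computeIfAbsent(u, k -> new HashMap<>()).put(v, new Edge(w));
        adj.computeIfAbsent(v, k -> new HashMap<>()).put(u, new Edge(w));
    }

    // O(1) expected: two hash lookups, no iteration over edges
    Edge getEdge(int u, int v) {
        Map<Integer, Edge> nbrs = adj.get(u);
        return nbrs == null ? null : nbrs.get(v);
    }
}
```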
I'm trying to implement the following graph reduction algorithm:
The graph is an undirected weighted graph.
I want to strip away all nodes with only two neighbors and update the weights accordingly.
Have a look at the following illustration:
[image: algorithm to reduce the graph] http://public.kungi.org/graph-reduction.png
The algorithm should transform the upper graph into the lower one: eliminate node 2 and update the weight of the new edge to w(1-3) = w(1-2) + w(2-3).
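For reference, this is roughly what the reduction step looks like on a small in-memory adjacency map (just to pin down the semantics; the MapReduce/HBase version is the actual question):

```java
import java.util.*;

class GraphReduction {
    // node -> (neighbor -> edge weight), undirected
    Map<Integer, Map<Integer, Double>> adj = new HashMap<>();

    void addEdge(int u, int v, double w) {
        adj.computeIfAbsent(u, k -> new HashMap<>()).put(v, w);
        adj.computeIfAbsent(v, k -> new HashMap<>()).put(u, w);
    }

    // Remove every node n with exactly two neighbors a and b,
    // replacing its two edges by one edge a-b with the summed weight.
    void reduce() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Integer n : new ArrayList<>(adj.keySet())) {
                Map<Integer, Double> nbrs = adj.get(n);
                if (nbrs == null || nbrs.size() != 2) continue;
                Iterator<Map.Entry<Integer, Double>> it = nbrs.entrySet().iterator();
                Map.Entry<Integer, Double> ea = it.next(), eb = it.next();
                int a = ea.getKey(), b = eb.getKey();
                double w = ea.getValue() + eb.getValue();   // w(a-b) = w(a-n) + w(n-b)
                adj.get(a).remove(n);
                adj.get(b).remove(n);
                adj.get(a).put(b, w);   // note: overwrites any existing a-b edge
                adj.get(b).put(a, w);
                adj.remove(n);
                changed = true;
            }
        }
    }
}
```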
Since I have a very large graph I'm doing this with MapReduce.
My Question is how to represent the graph in HBase. I thought about building an adjacency list structure in HBase like this:
Column families: nodes, neighbors
1 -> 2, 6, 7
...
Is there a nicer way to do this?
Adjacency lists are the most frequently recommended structure.
You could use each node ID as the row ID and neighbor IDs as column qualifiers, with the weights as values.
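With the standard HBase Java client this could look roughly like the following sketch (the table name graph is an assumption; the column family matches the neighbors family from the question):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class GraphRows {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("graph"))) {

            // Row key = node id, column qualifier = neighbor id, value = edge weight.
            Put put = new Put(Bytes.toBytes("1"));
            put.addColumn(Bytes.toBytes("neighbors"), Bytes.toBytes("2"), Bytes.toBytes(4.0d));
            put.addColumn(Bytes.toBytes("neighbors"), Bytes.toBytes("6"), Bytes.toBytes(1.5d));
            put.addColumn(Bytes.toBytes("neighbors"), Bytes.toBytes("7"), Bytes.toBytes(2.0d));
            table.put(put);

            // Reading node 1's neighbors back is a single Get on its row.
            Result row = table.get(new Get(Bytes.toBytes("1")));
            row.getFamilyMap(Bytes.toBytes("neighbors")).forEach((q, v) ->
                System.out.println("1 -> " + Bytes.toString(q)
                        + " (w=" + Bytes.toDouble(v) + ")"));
        }
    }
}
```

This keeps each node's entire neighborhood in one row, so the map phase of the reduction can process a node with a single row read and emit updated edges as new Puts.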