How to find id of union find operation - algorithm

I am studying Union Find.
I understand how these union operations come together to make this graph but I do not understand how the ID variable is assigned. At first, I thought it was the size of each graph but this is not true because the size of the first graph is 5 and the size of the second one is 3. Any help would be appreciated.

Normally, in the ID array, each index represents a node, and the value stored at that index is the root of the graph that node belongs to. So in the example here:
Node 0 (the first element) is associated with 6, because 0 belongs to the graph whose root is 6.
Node 1 is also associated with 6, because 1 belongs to the graph whose root is 6.
[...]
In the same way, 4, 5 and 7 are associated with 4, because these nodes belong to the graph whose root is 4.
It's a way to quickly check whether two nodes are connected: they are connected exactly when their ID values are equal.
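To make the ID array concrete, here is a minimal quick-find sketch. The class name `QuickFind` and the sequence of union calls are hypothetical. The figure is not reproduced here, so it is only assumed that nodes 2 and 3 belong to the graph rooted at 6; this is consistent with the sizes 5 and 3 mentioned in the question.

```python
class QuickFind:
    """Union Find where id[i] is the identifier (root) of node i's component."""

    def __init__(self, n):
        # Initially every node is its own component.
        self.id = list(range(n))

    def connected(self, p, q):
        # Two nodes are connected exactly when their id entries match.
        return self.id[p] == self.id[q]

    def union(self, p, q):
        """Merge p's component into q's component."""
        pid, qid = self.id[p], self.id[q]
        if pid == qid:
            return
        for i in range(len(self.id)):
            if self.id[i] == pid:
                self.id[i] = qid


uf = QuickFind(8)
# Hypothetical union sequence that yields roots 6 and 4.
for p, q in [(0, 6), (1, 6), (2, 6), (3, 6), (5, 4), (7, 4)]:
    uf.union(p, q)
print(uf.id)  # [6, 6, 6, 6, 4, 4, 6, 4]
```

Reading the result index by index: nodes 0, 1, 2, 3 and 6 carry the value 6, and nodes 4, 5 and 7 carry the value 4, so the ID array records membership, not size.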

Related

Computing level of nodes in a graph

I want to compute the level of each node in a directed graph. I'm currently applying a depth-first search algorithm on vertices that have no incoming edges. Considering the graph below, for instance:
The expected result is:
Vertex | Level
1 | 0
2 | 1
3 | 2
4 | 1
5 | 3
6 | 4
In this particular case, if we start by applying DFS on 4, then all results for vertices 4, 3, 5 and 6 are going to be wrong, since 1 has level 0. I've tried to always consider the greatest result for each one of the nodes, so in this case the results for 3, 5, and 6 are replaced when applying DFS on 1. It works, but I can't find a way to correctly compute the level of vertex 4.
I'm working only with directed acyclic graphs.
I'm not including any code here because it is a pretty straightforward DFS implementation and I'm not struggling implementation-wise.
Any hint would be much appreciated.
You can compute the levels starting from each vertex that has no incoming edge, and keep the maximum value seen for each vertex until the end. For example, vertex 3 will get the values 2 and 1 when traversed from the starting points vertex 1 and vertex 4, respectively, so it keeps 2. Finally, you can update the vertices that have no incoming edge, setting each to (number on its child - 1). If such a vertex has multiple children, you might want to select the child with the maximum number for the replacement, and then run the algorithm from that vertex again to see whether it changes the numbers assigned to any of the other children.
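A rough sketch of this idea follows. The function name `compute_levels` is hypothetical, and the edge list is only a guess at the missing figure, chosen so that it reproduces the expected table. The sketch processes vertices in topological order instead of running one DFS per starting vertex, which yields the same maximum per vertex. It does not implement the final "run the algorithm again" step for a source with several children.

```python
from collections import defaultdict, deque


def compute_levels(vertices, edges):
    children = defaultdict(list)
    indegree = {v: 0 for v in vertices}
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1

    sources = [v for v in vertices if indegree[v] == 0]

    # Pass 1: keep the maximum distance from any source for each vertex,
    # visiting vertices in topological order (Kahn's algorithm).
    level = {v: 0 for v in vertices}
    remaining = dict(indegree)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in children[u]:
            level[v] = max(level[v], level[u] + 1)
            remaining[v] -= 1
            if remaining[v] == 0:
                queue.append(v)

    # Pass 2: move each source to just above its child
    # (the child with the maximum number, as the answer suggests).
    for s in sources:
        if children[s]:
            level[s] = max(level[c] for c in children[s]) - 1
    return level


# Assumed edges; the original figure is not available.
edges = [(1, 2), (2, 3), (4, 3), (3, 5), (5, 6)]
print(compute_levels([1, 2, 3, 4, 5, 6], edges))
# {1: 0, 2: 1, 3: 2, 4: 1, 5: 3, 6: 4}
```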

Algorithm for node assignment in graph

There are N nodes (1 ≤ N ≤ 2⋅10^5) and M (1 ≤ M ≤ 2⋅10^5) directed edges in a graph. Every node has an assigned number (an integer in the range 1...N) that we are trying to determine.
All nodes with a certain assigned number will have directed edges leading to other nodes with another certain assigned number. This also implies that if one node has multiple directed edges coming out of it, then the nodes that it leads to all have the same assigned number. We have to use this information to determine an assignment of numbers such that the number of distinct numbers among all nodes is maximized.
Because there are multiple possible answers, the output should be the assignment that minimizes the numbers assigned to nodes 1…N, in that order. Essentially the answer is the lexicographically smallest one.
Example:
In a graph of 9 nodes and 12 edges, here are the edges. For the two integers i and j on each line, there is a directed edge from i to j.
3 4
6 9
4 2
2 9
8 3
7 1
3 5
5 8
1 2
4 6
8 7
9 4
The correct assignment is that nodes 1, 4, 5 have the assigned number 1; nodes 2, 6, 8 have the assigned number 2; and nodes 3, 7, 9 have the assigned number 3. This makes sense because nodes 1, 4, 5 lead to nodes 2, 6, 8, which lead to nodes 3, 7, 9.
To solve this problem, I thought that you could create a graph with disconnected subgraphs each representing a group of nodes that have the same assigned number. To do this, I could simply scan through all the nodes, and if a node has multiple directed edges to other nodes, you should add them to your graph as a connected component. If some of the nodes were already in the graph, you could simply add edges in between the current components.
Then, for the rest of the nodes, you could find which nodes they have directed edges to, and somehow use that information to add them to your new graph.
Would this strategy work? If so, how can I properly implement the second portion of my algorithm?
EDIT 1: Earlier I interpreted the problem statement incorrectly; I have now posted the correct interpretation and my new way of approaching the problem.
EDIT 2: So once I go through all the nodes once, adding edges in the way I described above, I would determine the components for each node. Then I would iterate through the nodes again, this time making sure to add the rest of the edges into the graph recursively. For example, if a node with an assigned number has a directed edge to a node that hasn't been assigned a number, I can add that node to its designated component. I can also use Union Find to maintain the components.
While this will be fast enough, I'm worried that there may be errors - for example, when I do this recursive solution, it is possible that when a node is assigned a number, other nodes with assigned numbers that are connected to that node may not work with it. Basically, there would be a contradiction. I would have to come up with a solution for that.
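The first pass described above (grouping all nodes that share a common predecessor) could be sketched with Union Find roughly as follows. The names `find` and `group_successors` are hypothetical, and this covers only the first pass; it does not resolve the second pass or the possible contradictions mentioned above.

```python
from collections import defaultdict


def find(parent, x):
    # Follow parent pointers to the root, halving the path on the way.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def group_successors(n, edges):
    parent = list(range(n + 1))  # nodes are numbered 1..n
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)

    # All nodes reached from the same node must share an assigned number,
    # so merge them into one component.
    for outs in successors.values():
        for v in outs[1:]:
            ra, rb = find(parent, outs[0]), find(parent, v)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)

    groups = defaultdict(list)
    for node in range(1, n + 1):
        groups[find(parent, node)].append(node)
    return sorted(groups.values())


edges = [(3, 4), (6, 9), (4, 2), (2, 9), (8, 3), (7, 1),
         (3, 5), (5, 8), (1, 2), (4, 6), (8, 7), (9, 4)]
print(group_successors(9, edges))
# [[1], [2, 6], [3, 7], [4, 5], [8], [9]]
```

On the example input this already pairs 4 with 5, 2 with 6, and 3 with 7, which agrees with the expected assignment; nodes 1, 8 and 9 are still unplaced after this pass.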
For each node, print rand() % rand() + 1 and pray. With dedication, you might pass all cases.

How to count the maximum number of times any node has been visited while traveling through a tree several times?

We travel through a given tree (not binary) several times. How do we calculate the most number of times any node in the tree has been visited?
For example: in the tree:
    1
   / \
  2   3
     / \
    4   5
Suppose we are told to travel 2 times, from 2 to 3, then 5 to 3. The travel paths will be (2->1->3 and 5->3). The maximum number of times a node has been visited is 2 (the node is 3). All travels are independent from each other. A given travel starts from a given node A and ends at B.
How to efficiently travel (if we even need to) in order to calculate that, considering that we have over 50,000 nodes and 75,000 paths to cover (like 2 to 3 and 3 to 4 in the example)?
Based on what you are saying, the answer is the amount of children that node has...
Also in your example, going off what you have said, both 1 and 3 are visited the most.
In your example each node is only going to get visited once. The only way you could get multiple visits to one node would be with a tree like:
  1   3
   \ /
    2
Edit: the most efficient case for traversing is a perfect binary tree:
     4
   2   6
  1 3 5 7
where the number of levels is (log base 2 of the number of nodes), rounded down, plus 1, which gives 3 for the 7 nodes above.
Why not store the travel count of each node separately?
Maintain a HashMap<Node, Long> keeping track of how many times each node has been visited.
Then maintain a TreeMap<Long, List<Node>> that is keyed on the count and maps each count to the list of nodes that have that count.
This way, the TreeMap's last entry (the one with the greatest key) would contain all the nodes that have the highest count, because there can definitely be more than one node with that highest visit count.
All you now need to do is add bookkeeping code for properly updating the two maps whenever a node is visited as part of a tree traversal.
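The bookkeeping might look like the sketch below. Python has no built-in TreeMap, so this version assumes a plain dictionary of sets plus a running maximum, which is enough because visit counts only ever grow. The class name `VisitTracker` and its methods are hypothetical.

```python
from collections import defaultdict


class VisitTracker:
    def __init__(self):
        self.count = defaultdict(int)           # node -> number of visits
        self.nodes_by_count = defaultdict(set)  # number of visits -> nodes
        self.max_count = 0

    def visit(self, node):
        old = self.count[node]
        new = old + 1
        self.count[node] = new
        # Move the node from its old bucket to the new one.
        if old:
            self.nodes_by_count[old].discard(node)
        self.nodes_by_count[new].add(node)
        if new > self.max_count:
            self.max_count = new

    def most_visited(self):
        # Returns the highest count and every node that reached it.
        return self.max_count, sorted(self.nodes_by_count[self.max_count])


tracker = VisitTracker()
for node in [2, 1, 3, 5, 3]:  # the travels 2->1->3 and 5->3 from the question
    tracker.visit(node)
print(tracker.most_visited())  # (2, [3])
```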
There's an XY problem here.
Your question states you want to store number of node visits. What you really want though is an efficient traversal strategy.
You have options here. Since the edges are bi-directional the best strategy IMO would be a bi-directional search.
But the search strategy itself is a toss up.
Consider a slightly more elaborate tree such as
1 -> 2,3,4; 2 -> 5,6,7; 3 -> 8,9; 4 -> 10,11; 10 -> 12,13. If you have an efficient path from 5 to 4, it doesn't mean you can just start from there to find an efficient path from 5 to 13, because you don't know that 13 comes under 4 unless you've already found an efficient path from 4 to 13.
So I would suggest memoizing your traversals in a dictionary of the form <Node Pair>: [Traversal list]
Here you start at the target node and perform a breadth-first search. Each time you visit a node, check your memoization structure for an entry for <curnode,targetnode>. If an entry exists, you are done. If not, proceed to the current node's sibling or child.
CAVEAT: THIS IS UNDER THE ASSUMPTION THAT ALL NODES HAVE ONLY 1 PARENT AND CYCLES DON'T HAPPEN
I think people are misunderstanding the question. He/she wants to know which node is visited the maximum number of times given x travels. From his tree, the edges are {(2-1), (1-3), (3-4), (3-5)}. Now suppose, for example, that we travel the following paths: {(1,5), (2,4), (1,3), (1,2), (1,3), (4,5)}. In this example 3 is visited 5 times, 1 is also visited 5 times, and so on.
Since it is a tree, exactly one path exists from one node to another. Find the paths for all combinations of nodes in a DP way and store them.
Then count each visit. I know there is a more efficient way of counting, but I cannot think of it right now.
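As a simple illustration of counting visits along the unique tree paths, the sketch below walks each path directly using parent pointers and depths, without precomputing all pairs. The function name `count_visits` and the choice of node 1 as the root are assumptions. Each travel costs time proportional to its path length, so this is the straightforward version, not the more efficient one alluded to above.

```python
from collections import Counter, deque


def count_visits(edges, root, trips):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    # BFS from the root to record each node's parent and depth.
    parent = {root: None}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in parent:
                parent[v] = u
                depth[v] = depth[u] + 1
                queue.append(v)

    visits = Counter()
    for a, b in trips:
        # Climb from the deeper endpoint until both pointers meet.
        while a != b:
            if depth[a] >= depth[b]:
                visits[a] += 1
                a = parent[a]
            else:
                visits[b] += 1
                b = parent[b]
        visits[a] += 1  # the meeting point lies on the path as well
    return visits


edges = [(2, 1), (1, 3), (3, 4), (3, 5)]
visits = count_visits(edges, 1, [(2, 3), (5, 3)])
print(max(visits.values()))  # 2 (node 3)
```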

How do I find the minimum extra amount of edges needed to complete a connection?

Let's say we have been given the number of nodes and edges, N and M respectively, and then we are told which of the nodes are connected. How do we find the minimum number of extra edges needed to complete the connection, so that you can visit every node? Once those edges are added, you should be able to reach every node from any other, either directly or by going through other nodes.
Example on input:
4 2 (Nodes and edges)
0 1 (node 0 and node 1 is connected)
2 3 (node 2 and node 3 is connected)
Which then should give us the answer 1, we need one extra edge to complete the connection.
All that you need is:
1) Find the connected components. This can be done with DFS or BFS. In your example the components are {0, 1} and {2, 3}.
2) Then iterate through all the components and connect one vertex from each pair of consecutive components. In this way you connect the first and second components, then the second and third components, and so on. In your example you can connect either of vertices 0, 1 with either of vertices 2, 3. For example, you can connect vertices 0 and 2.
It's easy to see that if the total number of components is C, then the answer is C - 1 additional edges.
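The two steps above can be sketched as follows, using an iterative DFS to find the components. The function name `extra_edges` is hypothetical, and the representative chosen for each component (its lowest-numbered vertex) is an arbitrary choice.

```python
def extra_edges(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    seen = [False] * n
    representatives = []  # one vertex per connected component
    for start in range(n):
        if seen[start]:
            continue
        representatives.append(start)
        seen[start] = True
        stack = [start]
        while stack:  # iterative DFS over this component
            u = stack.pop()
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)

    # Link consecutive components: C components need C - 1 new edges.
    return list(zip(representatives, representatives[1:]))


new_edges = extra_edges(4, [(0, 1), (2, 3)])
print(len(new_edges), new_edges)  # 1 [(0, 2)]
```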
The minimum number of connections needed for your graph to be connected is N-1. But this holds only if there are no nodes with 0 connections.
Try to picture a path resembling a linked list. Every node has a degree of exactly 2, except for the two ends. That way (supposing your connections are not directed), starting from any node, you can reach your target by simply visiting the next not-yet-visited node.
If M>N-1, then you can search for nodes that have more connections than needed and carry on from there.
Try to count the extra connections and compare the count with the minimum number needed (N-1).

Graph theory - depth first search algorithm (will need to program this at some point but just can't seem to understand it)

I have a mid-term (mock) exam in a few days and this is the only topic that I can't seem to get my head around. For example I have no idea how to do this question
How can I tell which vertices are visited in what order? I've looked online and found information about DFS, but nowhere have I seen the DFS-CC and DFS-PROC stuff we're doing.
DFS has to start somewhere, so by "the vertices of G are considered in natural order" it starts at node 1. There are two nodes adjacent to node 1, namely nodes 2 and 3. Again by the same rule it chooses to visit node 2 first. Node 2 is connected to 1, 3, 4, and 5. But 1 has already been visited, so it chooses 3. Node 3 is connected to 1, 2, and 5. Both 1 and 2 are visited, so it goes to 5. And from 5 to 4.
Therefore 1, 2, 3, 5, 4.
Now all connected nodes are already visited so the process starts again with a new node. Again by "considering in natural order" that means starting at 6. The rest of the traversal follows the same pattern. I hope you get the idea now - if not ask a more specific question.
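The walk-through above can be reproduced with a short recursive sketch. The adjacency lists for nodes 1 to 5 are inferred from the description; the edge between 6 and 7 is purely hypothetical, since graph G is not shown here. The function names `dfs_components` and `visit` are also hypothetical; whether they line up with the course's DFS-CC and DFS-PROC is only a guess.

```python
def dfs_components(adj):
    visited = set()
    components = []

    def visit(u, order):
        visited.add(u)
        order.append(u)
        # "Natural order": try the neighbours from smallest to largest.
        for v in sorted(adj[u]):
            if v not in visited:
                visit(v, order)

    # Also pick the starting vertices in natural order.
    for start in sorted(adj):
        if start not in visited:
            order = []
            visit(start, order)
            components.append(order)
    return components


adj = {
    1: [2, 3],
    2: [1, 3, 4, 5],
    3: [1, 2, 5],
    4: [2, 5],
    5: [2, 3, 4],
    6: [7],  # hypothetical second component
    7: [6],
}
print(dfs_components(adj))  # [[1, 2, 3, 5, 4], [6, 7]]
```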
You should definitely ask your prof about the meaning of DFS-CC and DFS-PROC. I have a book called "Digraphs: Theory, Algorithms and Applications" written by Jørgen Bang-Jensen and Gregory Z. Gutin, and in it I found the following definition:
Hope this might help.
In a program, you could use a square matrix to represent which vertices are connected. The rows and the columns represent the vertices. A zero would mean not connected and a 1 would mean connected. For example in graph G, row 1 column 3 would be 1 but row 1 column 7 would be 0. For graph G, there would be symmetry in the matrix because each connection has two directions. But you could have a more complicated graph with certain paths only going in one direction.
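A small sketch of that matrix representation follows. The edge list is an assumption (graph G is not shown here), and the helper name `adjacency_matrix` is hypothetical. Vertices are numbered from 1 in the text, so the code subtracts 1 to get list indices.

```python
def adjacency_matrix(n, edges):
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        # Undirected edge: mark both directions, which makes the matrix symmetric.
        matrix[u - 1][v - 1] = 1
        matrix[v - 1][u - 1] = 1
    return matrix


# Assumed edges for graph G.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (4, 5), (6, 7)]
g = adjacency_matrix(7, edges)
print(g[0][2])  # row 1, column 3 -> 1
print(g[0][6])  # row 1, column 7 -> 0
```

For a graph with one-way connections, only `matrix[u - 1][v - 1]` would be set, and the symmetry would no longer hold.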
