Priority order in BFS (Breadth First Search Algorithm)

Starting from the topmost node, i.e. 1: at node 2 there are two adjacent nodes to visit, i.e. 3 and 4. Which one should we put in the queue and print first? Please also explain why.

By its definition BFS should always process 2 and 5 before processing 3 and 4.
In other words the order is determined by the distance from the origin.
For plain vanilla BFS it makes no difference if 2 is processed before 5 or after 5, as it makes no difference if 3 is processed before 4 or after it.
Note that this is not true for Depth First Search.
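A minimal Python sketch of this point. The original figure is not reproduced here, so the graph below is an assumption: node 1 is adjacent to 2 and 5, and node 2 is adjacent to 3 and 4. Listing the neighbours in a different order changes the visiting order inside a level, but not the distances from the origin.

```python
from collections import deque

def bfs(adj, start):
    """Return (visit order, distance from start) for an adjacency-list graph."""
    dist = {start: 0}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in adj[node]:
            if nxt not in dist:          # first discovery fixes the BFS distance
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return order, dist

# Hypothetical graph (assumed edges, not taken from the original figure).
adj_a = {1: [2, 5], 2: [1, 3, 4], 3: [2], 4: [2], 5: [1]}
# Same graph, with every neighbour list reversed.
adj_b = {node: list(reversed(neigh)) for node, neigh in adj_a.items()}

order_a, dist_a = bfs(adj_a, 1)
order_b, dist_b = bfs(adj_b, 1)
print(order_a)            # [1, 2, 5, 3, 4]
print(order_b)            # [1, 5, 2, 4, 3]
print(dist_a == dist_b)   # True
```

Both orders are valid BFS orders: 2 and 5 always come before 3 and 4, and only the order within a level depends on how the neighbours happen to be stored.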

Related

Does Dijkstra's algorithm work with negative edges if there is no "processed" check?

Typically, in Dijkstra's algorithm, for each encountered node, we check whether that node was processed before attempting to update the distances of its neighbors and adding them to the queue. This method is under the assumption that if a distance to a node is set once then the distance to that node cannot improve for the rest of the algorithm, and so if the node was processed once already, then the distances to its neighbors cannot improve. However, this is not true for graphs with negative edges.
If there are no negative cycles and we remove that "processed" check, will the algorithm always work for graphs with negative edges?
Edit: an example of a graph where the algorithm would fail would be nice
Edit 2: Java code https://pastebin.com/LSnfzBW4
Example usage:
3 3 1 <-- 3 nodes, 3 edges, starting point at node 1
1 2 5 <-- edge of node 1 and node 2 with a weight of 5 (unidirectional)
2 3 -20 <-- more edges
1 3 2
The algorithm will produce the correct answer, but since nodes can now be visited multiple times the time complexity will be exponential.
Here's an example demonstrating the exponential complexity:
w(1, 3) = 4
w(1, 2) = 100
w(2, 3) = -100
w(3, 5) = 2
w(3, 4) = 50
w(4, 5) = -50
w(5, 7) = 1
w(5, 6) = 25
w(6, 7) = -25
If the algorithm is trying to find the shortest path from node 1 to node 7, it will first reach node 3 via the edge with weight 4 and then explore the rest of the graph. Then, it will find a shorter path to node 3 by going to node 2 first, and then it will explore the rest of the graph again.
Every time the algorithm reaches one of the odd indexed nodes, it will first go to the next odd indexed node via the direct edge and explore the rest of the graph. Then it will find a shorter path to the next odd indexed node via the even indexed node and explore the rest of the graph again. This means that every time one of the odd indexed nodes is reached, the rest of the graph will be explored twice, leading to a complexity of at least O(2^(|V|/2)).
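A rough Python sketch of the variant discussed here, since the linked Java code is not reproduced in this thread. The function name and the returned pop counter are assumptions made for illustration. There is no "processed" set; a node is re-queued whenever its distance improves, and queue entries that are already out of date are skipped by comparing against the current best distance.

```python
import heapq

def dijkstra_no_processed_check(n, edges, source):
    """Dijkstra-style search without a 'processed' set.

    Nodes are numbered 1..n, edges are directed (u, v, w) triples.
    Assumes there is no negative cycle reachable from source,
    otherwise the loop would never terminate.
    """
    adj = {v: [] for v in range(1, n + 1)}
    for u, v, w in edges:
        adj[u].append((v, w))
    dist = {v: float("inf") for v in adj}
    dist[source] = 0
    heap = [(0, source)]
    pops = 0                      # how many queue entries were taken out
    while heap:
        d, u = heapq.heappop(heap)
        pops += 1
        if d > dist[u]:
            continue              # stale entry: a shorter path was found later
        for v, w in adj[u]:
            if d + w < dist[v]:   # re-queue v every time its distance improves
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist, pops

# The 7-node example from the answer above.
chain = [(1, 3, 4), (1, 2, 100), (2, 3, -100),
         (3, 5, 2), (3, 4, 50), (4, 5, -50),
         (5, 7, 1), (5, 6, 25), (6, 7, -25)]
dist, pops = dijkstra_no_processed_check(7, chain, 1)
print(dist[7], pops)
```

The shortest distance to node 7 comes out as 0 (via 1, 2, 3, 4, 5, 6, 7), and the pop counter shows how often entries are re-examined as the chain of gadgets grows.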
If I understand your question correctly, I don't think it's possible. Without the processed check the algorithm would fall into an infinite loop. For example, take a bidirectional graph with two nodes, "a" and "b", joined by a single edge that can be traversed from "a" to "b" or from "b" to "a". The algorithm first inserts node "a" into the priority queue; then, as there is an edge from "a" to "b", it inserts node "b" and pops node "a". Then, as node "a" is not marked as processed when node "b" is handled, it inserts node "a" into the priority queue again, and so on, which leads to an infinite loop.
For finding shortest paths in graphs with negative edges, the Bellman-Ford algorithm would be the right way.
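For reference, a minimal Bellman-Ford sketch in Python, run on the 3-node example from the question. The function name and the 1-based node numbering are assumptions for illustration.

```python
def bellman_ford(n, edges, source):
    """Shortest distances from source for nodes 1..n.

    edges are directed (u, v, w) triples; negative weights are allowed.
    Raises ValueError if a negative cycle is reachable from source.
    """
    INF = float("inf")
    dist = [INF] * (n + 1)        # index 0 is unused
    dist[source] = 0
    for _ in range(n - 1):        # at most n - 1 rounds of relaxation
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break                 # nothing improved, distances are final
    for u, v, w in edges:         # one more pass detects negative cycles
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from source")
    return dist

dist = bellman_ford(3, [(1, 2, 5), (2, 3, -20), (1, 3, 2)], 1)
print(dist[1:])   # [0, 5, -15]
```

The running time is O(|V|·|E|), which is slower than Dijkstra on non-negative graphs but does not depend on the edge weights being non-negative.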
If the negative edges all leave the start node, Dijkstra's algorithm works. In other situations it usually doesn't work with negative edges.

Algorithm for node assignment in graph

There are N nodes (1 ≤ N ≤ 2⋅10^5) and M (1 ≤ M ≤ 2⋅10^5) directed edges in a graph. Every node has an assigned number (an integer in the range 1...N) that we are trying to determine.
All nodes with a certain assigned number will have directed edges leading to other nodes with another certain assigned number. This also implies that if one node has multiple directed edges coming out of it, then the nodes that it leads to all have the same assigned number. We have to use this information to determine an assignment of numbers such that the number of distinct numbers among all nodes is maximized.
Because there are multiple possible answers, the output should be the assignment that minimizes the numbers assigned to nodes 1…N, in that order. Essentially the answer is the lexicographically smallest one.
Example:
In a graph of 9 nodes and 12 edges, here are the edges. For the two integers i and j on each line, there is a directed edge from i to j.
3 4
6 9
4 2
2 9
8 3
7 1
3 5
5 8
1 2
4 6
8 7
9 4
The correct assignment is that nodes 1, 4, 5 have the assigned number 1; nodes 2, 6, 8 have the assigned number 2; and nodes 3, 7, 9 have the assigned number 3. This makes sense because nodes 1, 4, 5 lead to nodes 2, 6, 8, which lead to nodes 3, 7, 9.
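One way to sanity-check a candidate assignment, sketched in Python. This only verifies the stated property (every assigned number leads to exactly one other assigned number); it does not solve the maximization problem, and the function name is an assumption.

```python
def is_consistent(labels, edges):
    """Check that all edges leaving nodes with label a reach one single label."""
    target = {}                           # label -> label it must lead to
    for i, j in edges:
        a, b = labels[i], labels[j]
        if target.setdefault(a, b) != b:  # label a already leads elsewhere
            return False
    return True

edges = [(3, 4), (6, 9), (4, 2), (2, 9), (8, 3), (7, 1),
         (3, 5), (5, 8), (1, 2), (4, 6), (8, 7), (9, 4)]
# Assignment from the example: {1,4,5} -> 1, {2,6,8} -> 2, {3,7,9} -> 3.
labels = {1: 1, 4: 1, 5: 1, 2: 2, 6: 2, 8: 2, 3: 3, 7: 3, 9: 3}
print(is_consistent(labels, edges))   # True
```

For the example, label 1 always leads to 2, label 2 to 3, and label 3 back to 1, so the check passes.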
To solve this problem, I thought that you could create a graph with disconnected subgraphs each representing a group of nodes that have the same assigned number. To do this, I could simply scan through all the nodes, and if a node has multiple directed edges to other nodes, you should add them to your graph as a connected component. If some of the nodes were already in the graph, you could simply add edges in between the current components.
Then, for the rest of the nodes, you could find which nodes they have directed edges to, and somehow use that information to add them to your new graph.
Would this strategy work? If so, how can I properly implement the second portion of my algorithm?
EDIT 1: Earlier I interpreted the problem statement incorrectly; I have now posted the correct interpretation and my new way of approaching the problem.
EDIT 2: So once I go through all the nodes once, adding edges in the way I described above, I would determine the components for each node. Then I would iterate through the nodes again, this time making sure to add the rest of the edges into the graph recursively. For example, if a node with an assigned number has a directed edge to a node that hasn't been assigned a number, I can add that node to its designated component. I can also use Union Find to maintain the components.
While this will be fast enough, I'm worried that there may be errors - for example, when I do this recursive solution, it is possible that when a node is assigned a number, other nodes with assigned numbers that are connected to that node may not work with it. Basically, there would be a contradiction. I would have to come up with a solution for that.
For each node, print rand() % rand() + 1 and pray. With dedication, you might pass all cases.

How to count the maximum number of times any node has been visited while traveling through a tree several times?

We travel through a given tree (not binary) several times. How do we calculate the most number of times any node in the tree has been visited?
For example: in the tree:
    1
   / \
  2   3
     / \
    4   5
Suppose we are told to travel 2 times, from 2 to 3, then 5 to 3. The travel paths will be (2->1->3 and 5->3). The maximum number of times a node has been visited is 2 (the node is 3). All travels are independent from each other. A given travel starts from a given node A and ends at B.
How to efficiently travel (if we even need to) in order to calculate that, considering that we have over 50,000 nodes and 75,000 paths to cover (like 2 to 3 and 3 to 4 in the example)?
Based on what you are saying, the answer is the amount of children that node has...
Also in your example, going off what you have said, both 1 and 3 are visited the most.
In your example each node is only going to get visited once. The only way you could get multiple visits to one node would be with a tree like:
1   3
 \ /
  2
Edit: the most efficient way of traversing is if you have a perfect binary tree:
      4
   2     6
  1 3   5 7
Where the max depth is number of ((log base 2 of (number of nodes + 1)) + 1) rounded down
Why not store the travel count of each node separately?
Maintain a HashMap<Node, Long> keeping track of how many times each node has been visited.
Then maintain a TreeMap<Long, List<Node>> that is keyed on count and contains the list of nodes with that count.
This way, the TreeMap's last entry (or its first entry, if you construct it with a reversed comparator) would contain all the nodes that have the highest count, because there can definitely be more than one node with that highest visit count.
All you now need to do is add bookkeeping code for properly updating the two maps whenever a node is visited as part of a tree traversal.
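The bookkeeping could look roughly like this in Python, with two dictionaries standing in for the two maps. The class and method names are assumptions; because the highest count can only grow, a single running maximum is tracked here in place of a sorted map.

```python
from collections import defaultdict

class VisitTracker:
    """Keeps per-node visit counts plus the nodes grouped by count."""

    def __init__(self):
        self.count = defaultdict(int)      # node -> number of visits
        self.by_count = defaultdict(set)   # visit count -> nodes with that count
        self.max_count = 0

    def visit(self, node):
        old = self.count[node]
        if old:
            self.by_count[old].discard(node)   # node leaves its old bucket
        new = old + 1
        self.count[node] = new
        self.by_count[new].add(node)
        self.max_count = max(self.max_count, new)

    def most_visited(self):
        """Return (highest count, set of nodes that have it)."""
        return self.max_count, set(self.by_count[self.max_count])

tracker = VisitTracker()
for node in [2, 1, 3] + [5, 3]:    # the two trips from the question
    tracker.visit(node)
print(tracker.most_visited())      # (2, {3})
```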
There's an XY problem here.
Your question states you want to store number of node visits. What you really want though is an efficient traversal strategy.
You have options here. Since the edges are bi-directional the best strategy IMO would be a bi-directional search.
But the search strategy itself is a toss up.
Consider a slightly more elaborate tree such as
1 -> 2,3,4; 2 -> 5,6,7; 3 -> 8,9; 4 -> 10,11; 10 -> 12,13. If you have an efficient path from 5 to 4, it doesn't mean you can just start from there to find an efficient path from 5 to 13, because you don't know that 13 comes under 4 unless you've already found an efficient path from 4 to 13.
So I would suggest memoizing your traversals in a dictionary of the form <Node Pair>: [Traversal list]
You start at a target node and perform a breadth-first search. Each time you visit a node, check whether your memoization structure has an entry for <curnode,targetnode>. If an entry exists, you are done. If not, proceed to the current node's sibling or child.
CAVEAT: THIS IS UNDER THE ASSUMPTION THAT ALL NODES HAVE ONLY 1 PARENT AND CYCLES DON'T HAPPEN
I think people are misunderstanding the question. He/she wants to ask which node is visited the maximum number of times given x trips. From his tree, the edges are {(2-1), (1-3), (3-4), (3-5)}. Now suppose, for example, that we travel along the following paths: {(1,5), (2,4), (1,3), (1,2), (1,3), (4,5)}. In this example node 3 is visited 5 times, node 1 is also visited 5 times (it lies on every path except (4,5)), and so on.
Since it is a tree, there is only one path from one node to another. Find the paths for all combinations of nodes in a DP way and store them.
Then count each visit. I know there is a more efficient way of counting but I cannot think of it right now.
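A simple Python sketch of the counting step, assuming the tree is rooted at node 1 and that a trip visits every node on the unique path between its endpoints, including both ends. It walks each path with parent pointers instead of storing all pairs, so each trip costs time proportional to the path length. This is not the more efficient method alluded to above (for very deep trees, something like lowest common ancestors combined with difference counting would scale better).

```python
from collections import Counter, deque

def count_visits(edges, root, trips):
    """Count how often each node lies on the tree path of a trip."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    # BFS from the root to get each node's parent and depth.
    parent, depth = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in depth:
                parent[v], depth[v] = u, depth[u] + 1
                queue.append(v)
    visits = Counter()
    for a, b in trips:
        while a != b:
            if depth[a] < depth[b]:
                a, b = b, a        # always climb from the deeper endpoint
            visits[a] += 1
            a = parent[a]
        visits[a] += 1             # the node where both climbs meet
    return visits

tree = [(2, 1), (1, 3), (3, 4), (3, 5)]
visits = count_visits(tree, 1, [(1, 5), (2, 4), (1, 3), (1, 2), (1, 3), (4, 5)])
print(visits[3], visits[1])        # 5 5
print(max(visits.values()))        # 5
```

With the two trips from the question, `count_visits(tree, 1, [(2, 3), (5, 3)])` gives node 3 a count of 2, matching the expected answer there.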

How do I find the minimum extra amount of edges needed to complete a connection?

Let's say we have been given the number of nodes and edges, N and M respectively, and then we are given which of the nodes are connected. How do we find the minimum number of extra edges needed to complete the connection, so that you can visit every node? With the extra edges in place you should be able to reach every node, either directly or by going through other nodes.
Example on input:
4 2 (Nodes and edges)
0 1 (node 0 and node 1 are connected)
2 3 (node 2 and node 3 are connected)
Which then should give us the answer 1, we need one extra edge to complete the connection.
All that you need is:
1) Find the connected components. This can be done with DFS or BFS. In your example the components are {0, 1} and {2, 3}.
2) Then iterate through all the components and connect any two vertices from each pair of consecutive components. In this way you connect the first and second components, then the second and third, and so on. In your example you can connect either of vertices 0, 1 with either of vertices 2, 3. For example, you can connect vertices 0 and 2.
It's easy to see that if the total number of components is equal to C then the answer will be C - 1 additional edges.
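The two steps above can be sketched in Python as follows, using an iterative DFS to count the components. The function name is an assumption, nodes are taken to be numbered 0..N-1 as in the example input, and N is assumed to be at least 1.

```python
def extra_edges_needed(n, edges):
    """Return C - 1, where C is the number of connected components."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = [False] * n
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        components += 1            # found a node in a new component
        seen[start] = True
        stack = [start]
        while stack:               # iterative DFS over that component
            u = stack.pop()
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
    return components - 1

print(extra_edges_needed(4, [(0, 1), (2, 3)]))   # 1
```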
The minimum number of connections needed in order for your graph to be connected is N-1. But this holds if there are no nodes with 0 connections.
Try to picture a path resembling a linked list. Every node has a degree of exactly 2, except for the two ends. That way (let's suppose your connections are not directed), starting from any node, you can reach your target by simply visiting the next node that has not already been visited.
If M > N-1 then you can search for nodes that have more connections than needed and carry on from there.
Try counting the extra connections and comparing that with the minimum number needed (N-1).

Graph theory - depth first search algorithm (will need to program this at some point but just can't seem to understand it)

I have a mid-term (mock) exam in a few days and this is the only topic that I can't seem to get my head around. For example I have no idea how to do this question
How can I tell which vertices are visited in what order? I've looked online and found information about DFS, but nowhere have I seen the DFS-CC and DFS-PROC stuff we're doing.
DFS has to start somewhere, so by "the vertices of G are considered in natural order" it starts at node 1. There are two nodes adjacent to node 1, namely nodes 2 and 3. Again by the same rule it chooses to visit node 2 first. Node 2 is connected to 1, 3, 4, and 5. But 1 has already been visited, so it chooses 3. Node 3 is connected to 1, 2, and 5. Both 1 and 2 have been visited, so it goes to 5. And from 5 it goes to 4.
Therefore 1, 2, 3, 5, 4.
Now all connected nodes are already visited so the process starts again with a new node. Again by "considering in natural order" that means starting at 6. The rest of the traversal follows the same pattern. I hope you get the idea now - if not ask a more specific question.
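A small Python sketch of this procedure. The edges among nodes 1 to 5 are the ones that can be read off the walkthrough above; the exam's figure is not reproduced here, so the edge between 6 and 7 is purely hypothetical and only there to show the restart at node 6. The function name is also an assumption.

```python
def dfs_components(n, edges):
    """DFS over nodes 1..n in natural order; returns one visit list per component."""
    adj = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    for v in adj:
        adj[v].sort()                  # natural order among neighbours
    visited = set()
    components = []

    def visit(u, order):
        visited.add(u)
        order.append(u)
        for v in adj[u]:
            if v not in visited:
                visit(v, order)

    for start in range(1, n + 1):      # natural order among start nodes
        if start not in visited:
            order = []
            visit(start, order)
            components.append(order)
    return components

edges = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (4, 5),
         (6, 7)]                       # (6, 7) is a hypothetical edge
print(dfs_components(7, edges))        # [[1, 2, 3, 5, 4], [6, 7]]
```

Each inner list is one connected component, listed in the order its vertices were first visited.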
You should definitely ask your prof about the meaning of DFS-CC and DFS-PROC. I have a book called "Digraphs: Theory, Algorithms and Applications" written by Jørgen Bang-Jensen and Gregory Z. Gutin, and in it I found the following definition:
Hope this might help.
In a program, you could use a square matrix to represent which vertices are connected. The rows and the columns represent the vertices. A zero would mean not connected and a 1 would mean connected. For example in graph G, row 1 column 3 would be 1 but row 1 column 7 would be 0. For graph G, there would be symmetry in the matrix because each connection has two directions. But you could have a more complicated graph with certain paths only going in one direction.
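A minimal Python sketch of such a matrix. The edge list is an assumption: the edges among nodes 1 to 5 follow the DFS walkthrough above, while the edge between 6 and 7 is hypothetical because the figure is not reproduced here. Row and column 0 are left unused so that indices match the vertex numbers.

```python
def adjacency_matrix(n, edges, directed=False):
    """Build an (n+1) x (n+1) 0/1 matrix for vertices 1..n."""
    matrix = [[0] * (n + 1) for _ in range(n + 1)]
    for u, v in edges:
        matrix[u][v] = 1
        if not directed:
            matrix[v][u] = 1       # undirected: mirror the entry
    return matrix

edges = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (4, 5),
         (6, 7)]                   # (6, 7) is a hypothetical edge
m = adjacency_matrix(7, edges)
print(m[1][3], m[1][7])            # 1 0
# Symmetry check for the undirected graph.
print(all(m[i][j] == m[j][i] for i in range(8) for j in range(8)))  # True
```

Passing `directed=True` stores each edge in one direction only, which gives the non-symmetric matrix mentioned for graphs with one-way paths.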
