Algorithm for node assignment in graph - algorithm

There are N nodes (1 ≤ N ≤ 2⋅10^5) and M (1 ≤ M ≤ 2⋅10^5) directed edges in a graph. Every node has an assigned number (an integer in the range 1...N) that we are trying to determine.
All nodes with a certain assigned number will have directed edges leading to other nodes with another certain assigned number. This also implies that if one node has multiple directed edges coming out of it, then the nodes that it leads to all have the same assigned number. We have to use this information to determine an assignment of numbers such that the number of distinct numbers among all nodes is maximized.
Because there are multiple possible answers, the output should be the assignment that minimizes the numbers assigned to nodes 1…N, in that order. Essentially the answer is the lexicographically smallest one.
Example:
In a graph of 9 nodes and 12 edges, here are the edges. For the two integers i and j on each line, there is a directed edge from i to j.
3 4
6 9
4 2
2 9
8 3
7 1
3 5
5 8
1 2
4 6
8 7
9 4
The correct assignment is that nodes 1, 4, 5 have the assigned number 1; nodes 2, 6, 8 have the assigned number 2; and nodes 3, 7, 9 have the assigned number 3. This makes sense because nodes 1, 4, 5 lead to nodes 2, 6, 8, which lead to nodes 3, 7, 9.
To solve this problem, I thought that you could create a graph with disconnected subgraphs each representing a group of nodes that have the same assigned number. To do this, I could simply scan through all the nodes, and if a node has multiple directed edges to other nodes, you should add them to your graph as a connected component. If some of the nodes were already in the graph, you could simply add edges in between the current components.
Then, for the rest of the nodes, you could find which nodes they have directed edges to, and somehow use that information to add them to your new graph.
Would this strategy work? If so, how can I properly implement the second portion of my algorithm?
EDIT 1: Earlier I interpreted the problem statement incorrectly; I have now posted the correct interpretation and my new way of approaching the problem.
EDIT 2: So once I go through all the nodes once, adding edges in the way I described above, I would determine the components for each node. Then I would iterate through the nodes again, this time making sure to add the rest of the edges into the graph recursively. For example, if a node with an assigned number has a directed edge to a node that hasn't been assigned a number, I can add that node to its designated component. I can also use Union Find to maintain the components.
While this will be fast enough, I'm worried that there may be errors - for example, when I do this recursive solution, it is possible that when a node is assigned a number, other nodes with assigned numbers that are connected to that node may not work with it. Basically, there would be a contradiction. I would have to come up with a solution for that.
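The Union Find idea from EDIT 2 can be sketched as follows. This is a minimal sketch, not a verified solution: it assumes the only constraint is "all nodes reachable by one edge from the same group must share a number", and it repeats the merging until no group has more than one distinct target. The names `assign_numbers`, `out`, and `merge` are made up for illustration. Merging the shorter target list into the longer one keeps the total work near-linear.

```python
from collections import deque

def assign_numbers(n, edges):
    """Group nodes 1..n so that each group's outgoing edges reach one group."""
    parent = list(range(n + 1))
    out = [[] for _ in range(n + 1)]  # out[r]: edge targets of the group rooted at r
    for u, v in edges:
        out[u].append(v)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Groups that currently point at more than one target entry.
    queue = deque(v for v in range(1, n + 1) if len(out[v]) > 1)

    def merge(a, b):
        a, b = find(a), find(b)
        if a == b:
            return
        if len(out[a]) < len(out[b]):
            a, b = b, a
        parent[b] = a              # fold the smaller target list into the larger
        out[a].extend(out[b])
        out[b] = []
        if len(out[a]) > 1:
            queue.append(a)

    while queue:
        g = queue[0]
        if len(out[g]) <= 1:
            queue.popleft()
            continue
        u = out[g].pop()           # two targets of the same group must be merged
        merge(u, out[g][-1])

    # Number the groups in order of their smallest node, which gives the
    # lexicographically smallest labelling of this particular grouping.
    label, result = {}, []
    for v in range(1, n + 1):
        r = find(v)
        if r not in label:
            label[r] = len(label) + 1
        result.append(label[r])
    return result

example_edges = [(3, 4), (6, 9), (4, 2), (2, 9), (8, 3), (7, 1),
                 (3, 5), (5, 8), (1, 2), (4, 6), (8, 7), (9, 4)]
print(assign_numbers(9, example_edges))
```

On the example above this groups the nodes as {1, 4, 5}, {2, 6, 8}, {3, 7, 9}. Under the stated assumption, the contradiction worry does not arise, because conflicting groups are simply merged rather than rejected.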

For each node, print rand() % rand() + 1 and pray. With dedication, you might pass all cases.

Related

Computing level of nodes in a graph

I want to compute the level of each node in a directed graph. I'm currently applying a depth-first search algorithm on vertices that have no incoming edges. Considering the graph below, for instance:
The expected result is:
Vertex | Level
1 | 0
2 | 1
3 | 2
4 | 1
5 | 3
6 | 4
In this particular case, if we start by applying DFS on 4, then all results for vertices 4, 3, 5 and 6 are going to be wrong, since 1 has level 0. I've tried to always consider the greatest result for each one of the nodes, so in this case the results for 3, 5, and 6 are replaced when applying DFS on 1. It works, but I can't find a way to correctly compute the level of vertex 4.
I'm working only with directed acyclic graphs.
I'm not including any code here because it is a pretty straightforward DFS implementation and I'm not struggling implementation-wise.
Any hint would be much appreciated.
You can compute the levels starting from each vertex that has no incoming edge, and keep the maximum value seen for each vertex until the end. For example, vertex 3 will have the values 1 and 2 when traversed from starting points vertex 1 and vertex 4 respectively. Finally, update each vertex that has no incoming edge to (level of its child - 1). If such a vertex has multiple children, you might want to pick the child with the maximum level for this replacement, and then run the algorithm from that vertex again to see whether it changes the levels assigned to any of the other children.
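A minimal sketch of this idea, with two assumptions that are not in the original posts. First, the graph image is missing, so the edge list below is a guess that happens to reproduce the expected table. Second, instead of re-running a DFS per source, the maximum is taken in topological order (Kahn's algorithm), and each source is then moved to one less than its lowest child so that no other level has to change. That differs from the "maximum child" suggestion above.

```python
from collections import deque

def dag_levels(n, edges):
    """Longest-path level for every vertex 1..n of a DAG, then adjust sources."""
    adj = {v: [] for v in range(1, n + 1)}
    indegree = {v: 0 for v in adj}
    for u, v in edges:
        adj[u].append(v)
        indegree[v] += 1

    sources = [v for v in adj if indegree[v] == 0]
    level = {v: 0 for v in adj}
    remaining = dict(indegree)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            level[v] = max(level[v], level[u] + 1)  # keep the greatest value
            remaining[v] -= 1
            if remaining[v] == 0:                   # all parents are final
                queue.append(v)

    # Pull each source up to just below its lowest child (assumption, see above).
    for s in sources:
        if adj[s]:
            level[s] = min(level[c] for c in adj[s]) - 1
    return level

# Hypothetical edges; the original figure is not available.
guessed_edges = [(1, 2), (2, 3), (4, 3), (3, 5), (5, 6)]
print(dag_levels(6, guessed_edges))
```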

Does Dijkstra's algorithm work with negative edges if there is no "processed" check?

Typically, in Dijkstra's algorithm, for each encountered node, we check whether that node was processed before attempting to update the distances of its neighbors and adding them to the queue. This method is under the assumption that if a distance to a node is set once then the distance to that node cannot improve for the rest of the algorithm, and so if the node was processed once already, then the distances to its neighbors cannot improve. However, this is not true for graphs with negative edges.
If there are no negative cycles and we remove that "processed" check, will the algorithm always work for graphs with negative edges?
Edit: an example of a graph where the algorithm would fail would be nice
Edit 2: Java code https://pastebin.com/LSnfzBW4
Example usage:
3 3 1 <-- 3 nodes, 3 edges, starting point at node 1
1 2 5 <-- edge of node 1 and node 2 with a weight of 5 (unidirectional)
2 3 -20 <-- more edges
1 3 2
The algorithm will produce the correct answer, but since nodes can now be visited multiple times the time complexity will be exponential.
Here's an example demonstrating the exponential complexity:
w(1, 3) = 4
w(1, 2) = 100
w(2, 3) = -100
w(3, 5) = 2
w(3, 4) = 50
w(4, 5) = -50
w(5, 7) = 1
w(5, 6) = 25
w(6, 7) = -25
If the algorithm is trying to find the shortest path from node 1 to node 7, it will first reach node 3 via the edge with weight 4 and then explore the rest of the graph. Then, it will find a shorter path to node 3 by going to node 2 first, and then it will explore the rest of the graph again.
Every time the algorithm reaches one of the odd indexed nodes, it will first go to the next odd indexed node via the direct edge and explore the rest of the graph. Then it will find a shorter path to the next odd indexed node via the even indexed node and explore the rest of the graph again. This means that every time one of the odd indexed nodes is reached, the rest of the graph will be explored twice, leading to a complexity of at least O(2^(|V|/2)).
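To make this concrete, here is a minimal sketch of Dijkstra without the "processed" check. It is not the code from the pastebin link; the function name and the pop counter are made up for illustration. A node is pushed again whenever its distance improves, and every popped entry is expanded.

```python
import heapq

def dijkstra_no_processed_check(n, edges, source):
    """Return (dist, pops) for nodes 1..n; edges are (u, v, weight), directed."""
    adj = {v: [] for v in range(1, n + 1)}
    for u, v, w in edges:
        adj[u].append((v, w))

    dist = {v: float("inf") for v in adj}
    dist[source] = 0
    heap = [(0, source)]
    pops = 0
    while heap:
        d, u = heapq.heappop(heap)
        pops += 1                      # no "already processed" check here
        for v, w in adj[u]:
            if d + w < dist[v]:        # push again on every improvement
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist, pops

# The small input from the question.
small = [(1, 2, 5), (2, 3, -20), (1, 3, 2)]
print(dijkstra_no_processed_check(3, small, 1)[0])

# The 7-node graph from this answer.
chain = [(1, 3, 4), (1, 2, 100), (2, 3, -100),
         (3, 5, 2), (3, 4, 50), (4, 5, -50),
         (5, 7, 1), (5, 6, 25), (6, 7, -25)]
dist, pops = dijkstra_no_processed_check(7, chain, 1)
print(dist[7], pops)
```

On the 7-node graph the distances come out right, but the number of pops is well above the number of nodes. Each additional triangle appended to the chain roughly doubles the repeated work.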
If I understand your question correctly, I don't think it's possible. Without the processed check the algorithm would fall into an infinite loop. For example, take a bidirectional graph with two nodes, "a" and "b", joined by one edge that can be traversed from "a" to "b" or from "b" to "a". The algorithm first inserts node "a" into the priority queue; then, since there is an edge between "a" and "b", it inserts node "b" and pops node "a". Since node "a" is not marked as processed when "b" is handled, it inserts node "a" into the priority queue again, and so on, which leads to an infinite loop.
For finding shortest paths in graphs with negative edges, the Bellman-Ford algorithm would be the right way.
If the negative edges only leave the start node, Dijkstra's algorithm works. In other situations it usually doesn't work with negative edges.

How do I find the minimum extra amount of edges needed to complete a connection?

Let's say we are given the number of nodes and edges, N and M respectively, and then which of the nodes are connected. How do we find the minimum number of extra edges needed to complete the connection, so that you can visit every node? With the answer applied, you should be able to reach every node, either directly or by going through other nodes.
Example on input:
4 2 (Nodes and edges)
0 1 (nodes 0 and 1 are connected)
2 3 (nodes 2 and 3 are connected)
Which then should give us the answer 1, we need one extra edge to complete the connection.
All that you need is:
1) Find the connected components. This can be done with DFS or BFS. In your example the components are {0, 1} and {2, 3}.
2) Then iterate through all the components and connect any two vertices from every two consecutive components. This way you connect the first and second components, then the second and third components, and so on. In your example you can connect either of the vertices 0, 1 with either of the vertices 2, 3. For example, you can connect vertices 0 and 2.
It's easy to see that if the total number of components is equal to C then the answer will be C - 1 additional edges.
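The two steps above can be sketched as follows. This is a minimal sketch assuming nodes are numbered 0 to N-1 and the edges are undirected; the function name is made up for illustration. It counts components with an iterative DFS and returns C - 1.

```python
def extra_edges_needed(n, edges):
    """Minimum number of edges to add so that all n nodes are connected."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    seen = [False] * n
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        components += 1            # found a new component; flood-fill it
        seen[start] = True
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
    return max(components - 1, 0)

print(extra_edges_needed(4, [(0, 1), (2, 3)]))  # the example input
```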
The minimum number of connections needed in order for your graph to be connected is N-1. But this holds if there are no nodes with 0 connections.
Try to picture a path resembling a linked list. Every node has a degree of exactly 2, except for the two ends. That way (supposing your connections are not directed), starting from any node, you can reach your target by simply visiting the next not-yet-visited node.
If M>N-1 then you can search for nodes that have more connections than needed and carry on from there.
Try and count the extra connections and compare it with the minimum number needed(N-1).

Calculate maximum profit under given path cost

A rooted graph is given. Each node is a "home" that contains some valuable item. The entry node, i.e., the root of the graph, is given.
A cost is also given for moving from one node to another, i.e., the edge weight.
Question -
You have to collect the maximum total value of items, and the total cost must not exceed a given cost.
Constraint -
1. There is no cycle.
2. We can also use an adjacency matrix. (The total number of vertices is up to 1000.)
Example
Edges given with their weight and values present in destination node.
0 1 10 1
0 2 10 15
1 3 50 10
1 4 30 30
Given Cost = 70.
Solution - The maximum is obtained by collecting the items of nodes 1, 2 and 4. [1+15+30 = 46]
My efforts
I think this problem can be solved by DP, by maintaining some state at every node. But I am not able to come up with an algorithm. Please help.
Edit 1
I think this question may be solved by building a special graph from the original graph, adding some state to each node.
Second approach is, Dynamic programming.
I don't think you're going to find an easy solution for this problem.
Consider a graph made by just a root node connected to N leaves. Each leaf has a value of 1 and the edges have cost c1, c2, ... cN.
As you can see this graph problem has the knapsack problem as a special case.
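For small budgets, a knapsack-style DP over the tree is still workable. The sketch below rests on assumptions that the question does not confirm: the graph is a tree, each edge's cost is paid once when its child is first entered, and a child can only be reached through its parent. The function and variable names are hypothetical. It runs in roughly O(V · B²) time for budget B, so it is only an illustration of the state "best value per remaining budget at each node".

```python
def max_value(children, budget, root=0):
    """children[u] = list of (child, edge_cost, item_value); costs are integers."""
    def solve(u):
        # best[b]: most value collectable below u while spending at most b.
        # Recursive, so very deep trees would need an explicit stack instead.
        best = [0] * (budget + 1)
        for child, cost, value in children.get(u, []):
            sub = solve(child)
            merged = best[:]       # option: skip this child entirely
            for b in range(cost, budget + 1):
                for spent in range(cost, b + 1):
                    # spend `spent` on this child's edge and subtree
                    cand = best[b - spent] + value + sub[spent - cost]
                    if cand > merged[b]:
                        merged[b] = cand
            best = merged
        return best

    return solve(root)[budget]

# The example from the question: (parent, child, cost, value) rows.
example = {0: [(1, 10, 1), (2, 10, 15)], 1: [(3, 50, 10), (4, 30, 30)]}
print(max_value(example, 70))
```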

Using disjoint-set data structure, graph theory

I'm practicing solving programming problems in free time. This problem I spotted some time ago and still don't know how to solve it:
For a given undirected graph with n vertices and m edges (both less than 2 × 10^6)
I need to split its vertices into as many groups as possible, but with one
condition: each pair of vertices from different groups are connected by edge.
Each vertex is in exactly one group. At the end I need to know the size of
each group.
I was proud when I came up with this solution: consider the complement graph of the original graph and use a disjoint-set data structure on it. It gives the right answer (not difficult to prove). But it's only a theoretical solution: with the given constraints the complement graph can be far too large to build, so it is much too slow. I believe this approach can somehow be smartly fixed. But how?
Can anyone help?
EDIT: for a graph with vertices from 1 to 7 and 16 edges:
1 3
1 4
1 5
2 3
3 4
4 5
4 7
4 6
5 6
6 7
2 4
2 7
2 5
3 5
3 7
1 7
we have 3 groups with sizes: 1, 2 and 4.
These groups are: {4}, {5,7}, {1,2,3,6} respectively. There are edges connecting each pair of vertices from different groups and we can't create more groups.
I think the only ingredient you're missing is how to deal with sparse graphs.
Let's think about this in terms of finding the biggest possible complete graph where the only operation I can do is group a set of nodes (say v_1, ..., v_k) together and give the new supernode edges only to those nodes u that were connected to all of v_1, ..., v_k.
If your graph has fewer than n^2/4 edges, randomly sample n node pairs, noting which pairs are not joined by an edge. Union-find is an easy way to code this up. Now rebuild the graph using as groups the sets you found by this random sampling. Recurse on this reduced graph. (I'm not quite sure how to analyse this step, but I believe each sample-rebuild cycle reduces the graph size by at least a constant factor with high probability, so this whole process takes near-linear time.)
Once you have a fairly dense graph (at least n^2/4 edges), you can convert to an adjacency matrix representation and do exactly what you were suggesting --- check all node pairs, do a union whenever you see that two nodes aren't joined by an edge, and read off the sets.
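The dense-graph step can be sketched as follows. This is a minimal sketch of only that final step (adjacency matrix plus union-find over every non-adjacent pair); it is quadratic in n and does not include the random-sampling reduction. The function name is made up for illustration.

```python
from collections import Counter

def group_sizes(n, edges):
    """Sizes of the groups for vertices 1..n, via components of the complement."""
    adjacent = [[False] * (n + 1) for _ in range(n + 1)]
    for u, v in edges:
        adjacent[u][v] = adjacent[v][u] = True

    parent = list(range(n + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Two vertices without an edge between them must share a group.
    for u in range(1, n + 1):
        for v in range(u + 1, n + 1):
            if not adjacent[u][v]:
                parent[find(u)] = find(v)

    sizes = Counter(find(v) for v in range(1, n + 1))
    return sorted(sizes.values())

example = [(1, 3), (1, 4), (1, 5), (2, 3), (3, 4), (4, 5), (4, 7), (4, 6),
           (5, 6), (6, 7), (2, 4), (2, 7), (2, 5), (3, 5), (3, 7), (1, 7)]
print(group_sizes(7, example))  # the 7-vertex example from the question
```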
