How to merge nodes given an adjacency matrix - algorithm

I have an adjacency matrix, and I can't find a quick way to combine multiple nodes and determine the final number of "super-nodes". I thought an easy solution would be to sum the upper triangular part of the adjacency matrix and subtract that sum from the total number of nodes, but it turns out to be a bit trickier.
Suppose:
I have 6 nodes from 1 to 6:
Nodes 1,2,3 are connected all to each other.
Nodes 4 and 6 are connected to each other,
Node 5 is not connected to anything.
At this point it seems trivial that of the 6 initial nodes only 3 super-nodes will remain. The problem with my method shows up when the first super-node is instead connected like this:
1 to 2
2 to 3,
but not 1 to 3 directly. (even though they merge).
Here sum(upptriangle(Adj)) = 3 works, but in the first case the extra connection 1-3 acts like a "dummy" edge and gives sum(upptriangle(Adj)) = 4, even though this connection should not affect the final result.
Is there a standard algorithm I am missing that solves this problem (and compensates for the over-counting in fully connected sub-graphs), instead of iteratively going over every node to see if it has been merged?
In other words, is there a fast way to compute the node merging using only the adjacency matrix?

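If your problem is not dynamic (the graph does not change), you can use BFS or DFS to find the connected components; the number of components is the number of super-nodes. See:
Using BFS or DFS to determine the connectivity in a non connected graph?
But if your problem is dynamic (edges are added over time), you should use a union-find data structure. See:
Union-find data structure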
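As a rough illustration of the union-find suggestion, here is a minimal sketch that counts super-nodes directly from the adjacency matrix. It assumes a symmetric 0/1 matrix given as a list of lists with 0-based node indices; the function name count_components is made up for this example.

```python
def count_components(adj):
    """Count connected components ("super-nodes") of an undirected graph
    given as a symmetric 0/1 adjacency matrix (list of lists)."""
    n = len(adj)
    parent = list(range(n))  # every node starts as its own component

    def find(x):
        # Walk up to the representative, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for i in range(n):
        for j in range(i + 1, n):      # upper triangle only
            if adj[i][j]:
                ri, rj = find(i), find(j)
                if ri != rj:           # a merge happens only for a new link
                    parent[ri] = rj
                    components -= 1
    return components
```

Only edges that join two different components reduce the count, so the redundant 1-3 connection from the question no longer changes the result: both variants of the 6-node example give 3. Scanning the matrix costs O(n^2), which is the same as reading it once.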

Related

Is it possible to find a spanning tree for a directed and unweighted graph?

Let me try to explain my problem better.
The input gives the number of nodes N and the number of edges P. N represents the cities and P the water pipes.
For example, if the input is 4 and 2, my graph has the vertices {0,1,2,3,4}, and the 2 is the number of already existing water pipes. The following lines of the file list those existing connections, so in this case the file would contain, for example, 3-1 and 2-1. At the beginning I therefore have a graph with 5 vertices, an edge that goes from node 3 to node 1 and another edge that goes from node 2 to node 1.
Now I have to determine how many connections to make, and which ones, to bring water to all the cities. In this case a solution could be 0-4, 0-3, 0-2.
NOTE: 0 represents the dam basin.
This problem seems very similar to MST to me, but my graph is directed and unweighted, so I cannot apply Prim's or Kruskal's algorithm.
Since it doesn't make sense to apply MST, maybe I could do this with a DFS or a BFS, but I don't quite understand how.
I believe that if I run a DFS from the source (node 0), find all the nodes that I was not able to reach, and connect those nodes to the source, I have the solution, but I have not understood how to do it.
Or would it be better to try to make MST work on this type of graph by cloning it and adding weights of 1 to the edges?
" I know I can't apply it for the type of graph I have"
How do you know this?
Unweighted. Make your graph weighted by applying a weight of 1 to every edge.
Undirected. Make your graph directed by replacing every edge with two directed edges, one going each way.
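The DFS idea from the question can also be sketched directly. The sketch below assumes that a pipe "a-b" carries water only from a to b, that every new pipe starts at the dam (node 0), and that the goal is to add as few pipes as possible; the name pipes_to_add is hypothetical. It runs two passes: the first records the roots from which unreached nodes were discovered, the second walks those roots in reverse order and adds a pipe only to roots that are still dry.

```python
def pipes_to_add(n, pipes, source=0):
    """Return new (source, city) pipes so every vertex 0..n-1 is reachable
    from `source`. `pipes` holds directed pairs (a, b) meaning a -> b."""
    succ = {v: [] for v in range(n)}
    for a, b in pipes:
        succ[a].append(b)

    def flood(root, seen):
        # Iterative DFS: mark everything reachable from root.
        stack = [root]
        seen.add(root)
        while stack:
            node = stack.pop()
            for nxt in succ[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)

    # Pass 1: remember each vertex from which a new search had to start.
    seen = set()
    flood(source, seen)
    roots = []
    for v in range(n):
        if v not in seen:
            roots.append(v)
            flood(v, seen)

    # Pass 2: latest roots first; earlier roots they reach need no pipe.
    seen = set()
    flood(source, seen)
    new_pipes = []
    for v in reversed(roots):
        if v not in seen:
            new_pipes.append((source, v))
            flood(v, seen)
    return new_pipes
```

On the example from the question, pipes_to_add(5, [(3, 1), (2, 1)]) yields (0, 4), (0, 3), (0, 2), the same three connections given there. Connecting every unreached node, as first proposed, would also work but would add the unnecessary pipe 0-1.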

How to solve a graph theory question similar to shortest-path?

I'm looking at several problems of similar format but different difficulty. Help would be appreciated for polynomial (preferably relatively fast, but not necessarily), and even brute-force solutions of any of them.
The idea of all of the problems is that you have a weighted, undirected graph, and that an agent controls some of the nodes of the graph at the start. The agent can gain control of a node if they already control two adjacent nodes. The agent is trying to minimise the time they take to control a certain number of nodes. The problems differ on some details.
(1) You gain control of nodes in order (ie. you cannot take over multiple nodes simultaneously). The time taken to take control of a node is defined as the minimum of the edges from the two nodes used to take control of it. The goal is to take control of every single node in the graph.
(2) Again, you gain nodes in order and the goal is to take control of every single node in the graph. The time taken to take control of a node is defined as the maximum of the edges from the two nodes used to take control of it.
(3) Either (1) or (2), but with the goal of taking control of a certain number of nodes, not necessarily all of them.
(4) (3), but you can take control of multiple nodes simultaneously. Basically, say nodes 2 and 4 are being used to take over node 3 in time of 5. During that time of 5, nodes 2 and 4 cannot be used to take over a node that is not node 3. However, nodes 5 and 6 may for example be simultaneously taking over node 1.
(5) (4), but with an unweighted graph.
I started with the problem (4). I progressively made the problem easier from (4) to (3) to (2) to (1) with the hopes I could construct the solution for (4) from that. Finally, I have solved (1) but do not know how to solve any other one. My solution to (1) is this: of all candidate nodes which have two adjacent nodes that we control, simply take the one which takes the shortest amount of time. This is similar to Dijkstra's shortest path algorithm. However, this kind of solution should not solve any of the others. I believe that possibly a dynamic programming solution might work though, but I have no idea how to formulate one. I also have not found brute force solutions for any of the 4 problems. It is also possible that some of the problems are not polynomially solvable, and I would be curious to know why if that is the case.
Idea for the questions are my own, and I'm solving for my own entertainment. But I would not be surprised if it can be found elsewhere.
This isn't an answer to the problem. It is a demonstration that the greedy approach fails for problem 1.
Suppose that we have a graph with 7 nodes. We start by controlling A and B. The cost from A to B and B to C and C to D are all 1. Both E and F connect to A, B, and D with cost 10. G connects to A, B, C, and D with cost 100.
The greedy strategy that you describe will connect to E and F at cost 10 each, then D at cost 10, then C at cost 1, then G at cost 100 for a total cost of 131.
The best strategy is to connect to G at cost 100, then C and D at cost 1, then E and F at cost 10 for a total cost of 122 < 131.
And this example demonstrates that greedy is not always going to produce the right answer.
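The numbers in this counterexample can be checked with a small script. This is only a sketch for problem (1): it assumes that, when the two controlling nodes can be chosen freely, the time to take a node equals the cheapest edge to any controlled neighbour, provided at least two neighbours are controlled. The function names are invented for illustration, and the brute force tries every capture order, so it is only usable on tiny graphs.

```python
from itertools import permutations

def build_graph(edges):
    """Undirected weighted graph as {node: {neighbour: weight}}."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

def capture_cost(graph, controlled, node):
    """Cheapest time to take `node`, or None if fewer than two of its
    neighbours are controlled."""
    weights = [w for nbr, w in graph[node].items() if nbr in controlled]
    return min(weights) if len(weights) >= 2 else None

def greedy_total(graph, start):
    """Always take the currently cheapest eligible node."""
    controlled, total = set(start), 0
    while len(controlled) < len(graph):
        options = []
        for node in graph:
            if node not in controlled:
                cost = capture_cost(graph, controlled, node)
                if cost is not None:
                    options.append((cost, node))
        if not options:
            return None  # stuck: some nodes can never be taken
        cost, node = min(options)
        controlled.add(node)
        total += cost
    return total

def brute_force_total(graph, start):
    """Minimum total over every feasible capture order (factorial time)."""
    free = [n for n in graph if n not in start]
    best = None
    for order in permutations(free):
        controlled, total = set(start), 0
        for node in order:
            cost = capture_cost(graph, controlled, node)
            if cost is None:
                break  # this order is infeasible
            controlled.add(node)
            total += cost
        else:
            if best is None or total < best:
                best = total
    return best

edges = [("A", "B", 1), ("B", "C", 1), ("C", "D", 1)]
edges += [(x, y, 10) for x in "EF" for y in "ABD"]
edges += [("G", y, 100) for y in "ABCD"]
graph = build_graph(edges)
print(greedy_total(graph, "AB"), brute_force_total(graph, "AB"))  # 131 122
```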
I haven't been able to come up with a reduction yet, but these problems have the flavor of NP-hard network design and maximum coverage problems, so I would be quite surprised if variants (3) through (5) were tractable.
My practical suggestion would be to apply the Biased Random-Key Genetic Algorithm framework. The linked slide deck covers the generic part (an individual is a map from nodes to numbers; at each step, we rank individuals, retain the top x% "elite" individuals as is, produce y% offspring by crossing a random elite individual with a random non-elite individual, biased toward selecting the elite chromosomes, and fill out the rest of the population with random individuals). The non-generic part is translating an individual into a solution. My recommended starting point would be to choose to explore the lowest-numbered eligible node each time.

Maximal number of vertex pairs in undirected not weighted graph

Given an undirected, unweighted graph with any type of connectivity, i.e. it can consist of one or several components, with or without isolated nodes; each node can have zero or more connections, and cycles are allowed (but no loops from a node to itself).
I need to find the maximal number of vertex pairs, assuming that each vertex can be used only once. E.g. if the graph has nodes 1, 2, 3 and node 3 is connected to nodes 1 and 2, the answer is one (1-3 or 2-3).
I am thinking about the following approach:
1. Remove all single nodes.
2. Find an edge connecting a node with the minimal number of edges to a node with the maximal number of edges (if there are several, take any of them); count this pair and remove both nodes from the graph.
3. Repeat step 2 while the graph has connected nodes.
My questions are:
1. Does it provide the maximal number of pairs in every case? I am worried about some extremes, like cycles connected by a single path or several paths, etc.
2. Is there any faster and correct algorithm?
I can use java or python, but pseudocode or just algo description is perfectly fine.
Your approach is not guaranteed to provide the maximum number of vertex pairs even in the case of a cycle-free graph. For example, in the following graph your approach is going to select the edge (B,C). After that unfortunate choice, there are no more vertex pairs to choose from, and therefore you'll end up with a solution of size 1. Clearly, the optimal solution contains two vertex pairs, and hence your approach is not optimal.
The problem you're trying to solve is the Maximum Matching Problem (not to be confused with the Maximal Matching Problem which is trivial to solve):
Find the largest subset of edges S such that no vertex is incident to more than one edge in S.
The Blossom Algorithm solves this problem in O(EV^2).
The way the algorithm works is not straightforward and it introduces nontrivial notions (like a contracted matching, forest expansions and blossoms) to establish the optimal matching. If you just want to use the algorithm without fully understanding its intricacies you can find ready-to-use implementations of it online (such as this Python implementation).
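For very small graphs the definition above can be checked by exhaustive search. The sketch below is not the Blossom Algorithm: it simply tries, for each edge, both skipping it and taking it, so it runs in exponential time and is meant only for validating a heuristic on tiny inputs. The name max_matching_size is made up for this example.

```python
def max_matching_size(edges):
    """Size of a maximum matching, found by exhaustive search.
    `edges` is an iterable of (u, v) pairs of an undirected graph."""
    edges = list(edges)

    def best(i, used):
        if i == len(edges):
            return 0
        u, v = edges[i]
        skip = best(i + 1, used)          # leave edge i out of S
        if u in used or v in used:
            return skip                   # edge i would reuse a vertex
        take = 1 + best(i + 1, used | {u, v})
        return max(skip, take)

    return best(0, frozenset())

# The example from the question: node 3 is connected to nodes 1 and 2.
print(max_matching_size([(1, 3), (2, 3)]))  # 1
```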

Storing data for trees and graphs?

In most tree or graph problems I have tried to solve, the input is generally the entire tree or graph structure in a node1->leaves or node1->adjacent nodes format.
Is there a list of commonly used structures for keeping this data in memory in a way that later helps the intended algorithm? For example:
Say i have a list of graph nodes like:
1 3 8 2 4.....# 1 is connected to 3 8 2 4...nodes
2 5 1 3... # 2 is connected to 5 1 3...nodes
3 1 2... #likewise
. ...
8 ......
So if I want to use the random contraction algorithm (in which I have to contract edges, say 1 and 8), I use a multi-linked list structure in which each entry in an adjacency list points to its corresponding row, i.e. the 8 in the first line points to the row of node 8.
Now the question: why did I choose this structure to store the data?
Contracting effectively makes 1 and 8 one single entity, so I read 1's adjacency list starting from 3: I go to 3's adjacency list and change 1 to 8, next I go to 8's row and change 1 to 8, then go to 2's list and change 1 to 8, and so on. Finally I append 1's list to 8's and remove duplicates. So in the end 1 is deleted from the graph after contracting 1 and 8.
I want to know all the commonly or rarely used structures for storing trees and graphs and, if they are associated with particular algorithms, the algorithm names as well. Thank you.
One common way to store graphs is to use an n-by-n matrix, where n is the number of vertices in the graph. If you simply want to store adjacency and X is the matrix, then X[i][j] = 1 if there is an edge from vertex i to vertex j, and 0 otherwise. You can also store edge costs or edge capacities in this manner. The disadvantage is of course the amount of memory used, O(n^2) instead of O(n+m) where m is the number of edges, but the advantage is O(1) lookup for every possible vertex pair.
Floyd's algorithm for solving the All Pairs Shortest Paths problem can naturally make use of such a matrix, as well as more complex sub-cubic algorithms for solving various graph paths problems that utilize faster matrix multiplication over a ring.
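To make the trade-off concrete, here is a small sketch that reads input in the "vertex followed by its neighbours" format from the question into an adjacency list (a dict of lists) and then converts it to the n-by-n matrix described above. The helper names are invented, and the sketch assumes every neighbour also appears as the first number of some line.

```python
def parse_adjacency_lists(text):
    """Parse lines like '1 3 8 2 4' into {1: [3, 8, 2, 4], ...}."""
    graph = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        vertex, *neighbours = map(int, line.split())
        graph[vertex] = neighbours
    return graph

def to_matrix(graph):
    """Return (vertices, X) where X[i][j] = 1 if there is an edge from
    vertices[i] to vertices[j]. Uses O(n^2) memory."""
    vertices = sorted(graph)
    index = {v: k for k, v in enumerate(vertices)}
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for v, neighbours in graph.items():
        for w in neighbours:
            matrix[index[v]][index[w]] = 1
    return vertices, matrix

graph = parse_adjacency_lists("1 2 3\n2 1 3\n3 1 2 4\n4 3")
vertices, X = to_matrix(graph)
```

The dict of lists uses O(n+m) memory and suits traversals such as BFS and DFS, while the matrix answers "is i adjacent to j?" in O(1).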

How to find all paths in a graph between two nodes up to a given number of intermediate nodes?

I have a huge directed graph with about a million nodes and more than ten million edges. The edges are not weighted. The graph is a small-world like graph. In fact I see that every node is (on average) connected to another node over three intermediate nodes.
Given this graph can you think of a fast algorithm that returns all paths (without cycles) between a start and a destination node, but only up to a given maximum number N of intermediate nodes (and in my case N most of the time will be between 0 and 3)?
If your graph was undirected, you would certainly want to do a bidirectional breadth first search. For length 2 paths, enumerate edges from the start node and the end node and see where they intersect. For the length 3 paths, go 2 deep from the end point with smaller degree, and one deep on the node with greater degree.
Since your graph is directed, you might want to also keep reverse edges so you can do the same trick.
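One way to use the reverse edges is sketched below, under some assumptions: the graph fits in memory as a dict mapping each node to a list of successors, and bounded_paths is a name invented for this example. A backward breadth-first search from the destination, limited to N+1 edges, records how far each node is from the destination; a forward depth-first search from the start then only steps onto nodes that can still reach the destination within the remaining budget. This is a pruned variant of the idea rather than a full meet-in-the-middle search.

```python
from collections import deque

def bounded_paths(succ, start, goal, max_intermediate):
    """All cycle-free paths from start to goal with at most
    `max_intermediate` nodes strictly between them."""
    max_edges = max_intermediate + 1

    # Reverse adjacency, so we can search backwards from the goal.
    pred = {}
    for u, targets in succ.items():
        for v in targets:
            pred.setdefault(v, []).append(u)

    # Backward BFS: dist[v] = fewest edges from v to goal (if <= max_edges).
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        node = queue.popleft()
        if dist[node] == max_edges:
            continue
        for p in pred.get(node, ()):
            if p not in dist:
                dist[p] = dist[node] + 1
                queue.append(p)

    paths, path, on_path = [], [start], {start}

    def extend(node, edges_left):
        if node == goal:
            paths.append(list(path))
            return
        for nxt in succ.get(node, ()):
            if nxt in on_path:
                continue  # no cycles
            if dist.get(nxt, max_edges + 1) > edges_left - 1:
                continue  # goal is out of reach from nxt within the budget
            path.append(nxt)
            on_path.add(nxt)
            extend(nxt, edges_left - 1)
            path.pop()
            on_path.remove(nxt)

    extend(start, max_edges)
    return paths
```

For example, with succ = {"a": ["b", "c", "d"], "b": ["d"], "c": ["b"], "d": []}, asking for paths from "a" to "d" with at most one intermediate node gives a-d and a-b-d, while allowing two also gives a-c-b-d.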
Perhaps breadth-first from both directions at once? Take the neighbours of A and the neighbours of B. If you haven't found a link yet, add A to "neighbours of A" and B to "neighbours of B", then find any link between the two sets.
To extend it a bit further than three links, the "neighbours of A/B" lists need to contain a bit more. You will not be able to do it in memory; you'll need a scratch table with
whatever TRANSACTION_ID; (or use an Oracle one-per-session temp table)
boolean MY_BFS_WAS_ROOTED_AT_A;
int NODE_ID;
int PREVIOUS_NODE_ID;
(you don't need to track depth if you check for loops in your insert statement)
You have found a path when the following returns any row:
select * from pathfinder a, pathfinder b
where a.TRANSACTION_ID = foo and b.TRANSACTION_ID = foo
and a.MY_BFS_WAS_ROOTED_AT_A = false
and b.MY_BFS_WAS_ROOTED_AT_A = true
and a.NODE_ID = b.NODE_ID
Don't forget to clean out the table when you are done! Doing it all as one transaction and rolling it back might be the easiest way.
