How to connect all connected components in shortest way - algorithm

Given an N*M array of 0s and 1s.
A lake is a set of cells (1) which are horizontally or vertically adjacent.
We are going to connect all lakes on the map by updating some cells (0) to 1.
The task is to find a way that minimizes the number of updated cells within a given time limit.
I found this similar question: What is the minimum cost to connect all the islands?
The solution in that topic has some problems:
1) It uses a library (pulp) to solve the task
2) It takes a long time to produce output
Is there a more efficient solution for this problem?
Thank you in advance

I think this is a tricky question but if you really draw it out and look at this matrix as a graph it makes it simpler.
Consider each cell as a node and each connection to its top/right/bottom/left to be an edge.
Start by connecting the edges of the lakes to the nearby vertices. Keep doing the same for each one, and only connect two vertices if doing so doesn't create a cycle.
At this stage carry out the same process for the immediate neighbours of the lakes. Keep doing the same and stop whenever it would create a cycle.
After this you should have a connected tree.
Once you have a connected tree you can find all the articulation points (cut vertices) of the tree. (A vertex in an undirected connected graph is an articulation point (or cut vertex) iff removing it (and the edges through it) disconnects the graph. Articulation points represent vulnerabilities in a connected network: single points whose failure would split the network into 2 or more disconnected components.)
The number of cut vertices in the tree (excluding the initial lakes) would be the smallest number of cells that you need to change.
There are many efficient ways to find the cut vertices of a graph; a quick search will turn them up.
Finding articulation points takes O(V+E).
Preprocessing takes O(V+E), as it is somewhat similar to a BFS.
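The preprocessing idea above (cells as nodes, lakes as groups of adjacent 1-cells) can be sketched as a BFS flood fill. This is only a minimal sketch of the labelling step, not of the full connection algorithm; the function name `label_lakes` and the sample grid are made up for illustration.

```python
from collections import deque

def label_lakes(grid):
    """Label each lake (4-connected group of 1-cells) with an id starting at 1."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 means "not part of any lake"
    lake_count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or labels[r][c] != 0:
                continue
            # Found an unlabelled lake cell: flood-fill its whole lake.
            lake_count += 1
            labels[r][c] = lake_count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = lake_count
                        queue.append((ny, nx))
    return lake_count, labels

# Hypothetical 3x4 map with three lakes.
example = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
]
count, labels = label_lakes(example)
print(count)  # 3
```

Every cell is enqueued at most once, so the labelling runs in time linear in the number of cells.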

Don't know whether you are still interested, but I have an idea: what about a min cost flow algorithm?
Assume you have an m*n 2-d input array and i islands. Create a graph where each position in the 2-d array is a node and has 4 edges, one to each neighbour. Each edge will be assigned a cost later on. Each edge has minimum capacity 0 and maximum capacity infinite.
Choose a random island to be the source. Create an extra target node and connect it to all other islands except the source, with each new edge having maximum and minimum flow capacity 1 and cost 0.
Now assign costs to the old edges, so that an edge connecting two island nodes costs nothing, but an edge between an island node and a water node, or an edge between two water nodes, costs 1.
Calculate the min cost flow over this graph. The initial graph can be generated in O(nm) and the min cost flow algorithm runs in O((nm)^3).

Related

Impossible DFS of Directed graph question

Consider the following directed graph. For a given n, the vertices of the graph correspond to the
integers 1 through n. There is a directed edge from vertex i to vertex j if i divides j.
Draw the graph for n = 12. Perform a DFS of the above graph with n = 12. Record the discovery
and finish times of each vertex according to your DFS and classify all the edges of the graph into tree, back, forward, and cross edges. You can pick any start vertex (vertices) and any order of visiting the vertices.
I do not see how it is possible to traverse this graph because of the included stipulations. It is not possible to get a back edge because dividing a smaller number by a larger number does not produce an integer and will never be valid.
Say we go by this logic and create a directed graph with the given instructions. Vertex 1 is able to travel to vertex 2, because 2 / 1 is a whole number. However, it is impossible to get to vertex 3 as vertex 2 can only travel to vertex 4, 6, 8, 10, or 12. Since you cannot divide by a bigger number it will never be possible to visit a lower vertex once taking one of these paths and therefore not possible to reach vertex 3.
Your assumption about the back edges is correct. You cannot have back edges because you don't have any cycles. Consider the following example:
When we start at 2 we will find the tree edges 2->4, 4->8, 4->12, 2->6 and 2->10. The edges 2->8 and 2->12 are forward edges; they are like shortcuts to get forward much faster. The edge 6->12 is a cross edge because you are switching from one branch to another branch. And since we have no cycles to somehow get back to a previous node, we don't have any back edges.
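The classification described above can be checked with a short DFS sketch. It assumes edges i -> j only for proper multiples (i divides j and i != j, so no self-loops), visits neighbours in increasing order, and restarts the DFS from the next unvisited vertex when a search finishes, which is how vertex 3 gets reached. The function name `classify_edges` is made up for illustration.

```python
def classify_edges(n, start_order):
    """DFS on the divisibility graph 1..n; returns discovery/finish times and edge kinds."""
    # Assumption: edge i -> j exists only when j is a proper multiple of i.
    adj = {i: list(range(2 * i, n + 1, i)) for i in range(1, n + 1)}
    discovery, finish, kinds = {}, {}, {}
    clock = 0

    def visit(u):
        nonlocal clock
        clock += 1
        discovery[u] = clock
        for v in adj[u]:
            if v not in discovery:
                kinds[(u, v)] = "tree"
                visit(v)
            elif v not in finish:
                kinds[(u, v)] = "back"      # v is still on the recursion stack
            elif discovery[u] < discovery[v]:
                kinds[(u, v)] = "forward"   # v is an already finished descendant of u
            else:
                kinds[(u, v)] = "cross"     # v finished in an earlier branch
        clock += 1
        finish[u] = clock

    # Restart from every vertex not yet reached, so the whole graph is covered.
    for s in start_order:
        if s not in discovery:
            visit(s)
    return discovery, finish, kinds

# Start at 2 as in the answer, then continue with the remaining vertices.
discovery, finish, kinds = classify_edges(12, [2] + list(range(1, 13)))
for edge in [(2, 4), (2, 8), (2, 12), (6, 12)]:
    print(edge, kinds[edge])
```

With this visiting order the sketch reports 2->8 and 2->12 as forward edges, 6->12 as a cross edge, and no back edges at all.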

Add maximum possible edges to the graph with nodes capacity

Problem: given N nodes, each of them has a limit on its own degree; for example, the degree of node (1) cannot be higher than 10 (but can be less, of course), the degree of node (2) cannot be higher than 3, etc. Build a graph on these nodes with the maximum possible number of edges.
Would be happy to see any hints/recommendations.
EDIT: Graph should be simple :)
If there's no other constraint on which vertices can be connected, a greedy algorithm should work here: Connect whichever two unconnected vertices have the highest remaining degree, until no such pair exists. This can be done efficiently with an array of vertices dynamically sorted by remaining degree.
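A minimal sketch of one way to read this greedy rule: repeatedly take the vertex with the most remaining degree and join it to the non-adjacent vertex with the most remaining degree. The function name `greedy_bounded_degree_edges` is hypothetical, ties are broken by lowest index (an arbitrary choice), and the sketch uses plain scans instead of a dynamically sorted array to stay short. It is not a proof that the greedy is optimal in every case.

```python
def greedy_bounded_degree_edges(limits):
    """Build a simple graph where vertex i has degree at most limits[i]."""
    remaining = list(limits)
    n = len(remaining)
    adjacent = [set() for _ in range(n)]
    edges = []
    # Vertices that can still accept an edge.
    active = {i for i in range(n) if remaining[i] > 0}
    while len(active) >= 2:
        # Highest remaining degree first; lowest index wins ties.
        u = max(active, key=lambda i: (remaining[i], -i))
        candidates = [v for v in active if v != u and v not in adjacent[u]]
        if not candidates:
            # u is already joined to every other usable vertex; it can never gain an edge.
            active.discard(u)
            continue
        v = max(candidates, key=lambda i: (remaining[i], -i))
        edges.append((u, v))
        adjacent[u].add(v)
        adjacent[v].add(u)
        for w in (u, v):
            remaining[w] -= 1
            if remaining[w] == 0:
                active.discard(w)
    return edges

print(len(greedy_bounded_degree_edges([2, 2, 2])))   # 3 (a triangle)
print(len(greedy_bounded_degree_edges([10, 3, 1])))  # 2
```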
If the graph doesn't have to be simple (the question doesn't specify) then just add duplicate self loops to exhaust all but at most one available endpoint at each node. Then, pair off nodes. You will be left with at most one unused endpoint; the number of edges is trivially the sum of endpoint allowances, divided by two, rounded down.

Finding all independent connections in graph

I have an undirected graph. Is there any efficient algorithm on how to find all independent connections between two nodes? By independent, I mean that these connections could have common nodes but cannot have common edges.
In this example, there are 2 independent connections from 0 to 8 (0-2-3-4-8 or 0-5-6-7-8). I tried using Breadth-first search algorithm continuously while "pseudo-erasing" edges which I've already seen. Problem is that I can go through graph this way: 0-5-4-8 which is not right because I can't make any other path.
Thanks for any help!
What you are interested in is solving the min cut problem between a source and a sink (the first of your two nodes of interest is the source and the second is the sink).
Here you can read about the approach to this task. Basically I link to a theorem proving that the max flow between the source and the sink equals the min cut. You are interested in the min cut as this is exactly the minimum number of edges that need to be removed in order to get your source and sink disconnected.
If you run the Ford-Fulkerson max flow algorithm, you can reconstruct the different paths from the source to the sink by checking which reverse edges have capacity after the algorithm has finished. One last note: Ford-Fulkerson is classically described for directed graphs. To make it work for your undirected case, represent each edge as two separate directed edges facing opposite directions. All your initial capacities should be equal to 1.
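A minimal sketch of the unit-capacity max flow idea, using BFS to find augmenting paths (the Edmonds-Karp variant of Ford-Fulkerson). The figure from the question is not available here, so the edge list below is an assumption reconstructed from the paths the question mentions (0-2-3-4-8, 0-5-6-7-8, plus the 5-4 link); the function name `count_edge_disjoint_paths` is made up.

```python
from collections import defaultdict, deque

def count_edge_disjoint_paths(edges, source, sink):
    """Max flow with unit capacities = number of edge-disjoint source-sink paths."""
    cap = defaultdict(int)   # residual capacity per directed edge
    adj = defaultdict(set)
    for u, v in edges:
        # Each undirected edge becomes two directed edges of capacity 1.
        cap[(u, v)] += 1
        cap[(v, u)] += 1
        adj[u].add(v)
        adj[v].add(u)
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Push one unit along the path, opening the reverse edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1

# Assumed edge list, reconstructed from the paths named in the question.
graph = [(0, 2), (2, 3), (3, 4), (4, 8), (0, 5), (5, 6), (6, 7), (7, 8), (5, 4)]
print(count_edge_disjoint_paths(graph, 0, 8))  # 2
```

The reverse edges are what fix the problem described in the question: even if the first path found is 0-5-4-8, the second augmentation can route back over 4-5 and undo that choice, so the count still comes out as 2.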

Making a fully connected graph using a distance metric

Say I have a series of several thousand nodes. For each pair of nodes I have a distance metric. This distance metric could be a physical distance ( say x,y coordinates for every node ) or other things that make nodes similar.
Each node can connect to up to N other nodes, where N is small - say 6.
How can I construct a graph that is fully connected (i.e. I can travel between any two nodes following graph edges) while minimizing the total distance between all graph nodes?
That is I don't want a graph where the total distance for any traversal is minimized, but where for any node the total distance of all the links from that node is minimized.
I don't need an absolute minimum - as I think that is likely NP complete - but a relatively efficient method of getting a graph that is close to the true absolute minimum.
I'd suggest a greedy heuristic where you select edges until all vertices have 6 neighbors. For example, start with a minimum spanning tree. Then, for some random pairs of vertices, find a shortest path between them that uses at most one of the unselected edges (using Dijkstra's algorithm on two copies of the graph with the selected edges, connected by the unselected edges). Then select the edge that yields the largest total decrease in distance.
You can use a kernel to create edges only for nodes under a certain cutoff distance.
If you want non-weighted edges, you could simply use a basic cutoff to start with: add an edge between 2 points if d(v1,v2) < R.
You can tweak your cutoff R to get the right average number of edges between nodes.
If you want a weighted graph, the preferred kernel is often the gaussian one, with
K(x,y) = e^(-d(x,y)^2/d_0)
with a cutoff to discard node pairs whose values are too low. d_0 is the parameter to tweak to get the weights that suit you best.
While looking for references, I found this blog post that I didn't know about, but that seems very explanatory, with many more details: http://charlesmartin14.wordpress.com/2012/10/09/spectral-clustering/
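A minimal sketch of the Gaussian kernel with a cutoff, assuming Euclidean distance between coordinate tuples. The function name `gaussian_kernel_graph` and the parameter `min_weight` (the cutoff on the kernel value) are made up for illustration. Note that nothing in this sketch enforces connectivity or a maximum number of neighbours per node; those would have to be checked separately.

```python
import math

def gaussian_kernel_graph(points, d0, min_weight):
    """Return {(i, j): weight} for pairs whose kernel value reaches min_weight."""
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])   # Euclidean distance (Python 3.8+)
            weight = math.exp(-d * d / d0)        # K(x,y) = e^(-d(x,y)^2 / d_0)
            if weight >= min_weight:
                edges[(i, j)] = weight
    return edges

# Hypothetical points: two close together and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
edges = gaussian_kernel_graph(points, d0=1.0, min_weight=0.1)
print(sorted(edges))  # [(0, 1)]
```

This is quadratic in the number of nodes; for several thousand nodes that is a few million pair evaluations, and a spatial index could cut that down if the metric is a physical distance.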
This method is used in graph-based semi-supervised machine learning tasks, for instance in image recognition, where you tag a small part of an object, and have an efficient label propagation to identify the whole object.
You can search on google for : semi supervised learning with graph

How to detect if the given graph has a cycle containing all of its nodes? Does the suggested algorithm have any flaws?

I have a connected, undirected graph with N nodes and 2N-3 edges. You can consider the graph as being built onto an existing initial graph, which has 3 nodes and 3 edges. Every node added to the graph has 2 connections to nodes already in the graph. When all nodes have been added (N-3 nodes added in total), the final graph is constructed.
Originally I'm asked, what is the maximum number of nodes in this graph that can be visited exactly once (except for the initial node), i.e., what is the maximum number of nodes contained in the largest Hamiltonian path of the given graph? (Okay, saying largest Hamiltonian path is not a valid phrase, but considering the question's nature, I need to find a max. number of nodes that are visited once and the trip ends at the initial node. I thought it can be considered as a sub-graph which is Hamiltonian, and consists max. number of nodes, thus largest possible Hamiltonian path).
Since i'm not asked to find a path, I should check if a hamiltonian path exists for given number of nodes first. I know that planar graphs and cycle graphs (Cn) are hamiltonian graphs (I also know Ore's theorem for Hamiltonian graphs, but the graph I will be working on will not be a dense graph with a great probability, thus making Ore's theorem pretty much useless in my case). Therefore I need to find an algorithm for checking if the graph is cycle graph, i.e. does there exist a cycle which contains all the nodes of the given graph.
Since DFS is used for detecting cycles, I thought some minor manipulation to the DFS can help me detect what I am looking for, as in keeping track of explored nodes, and finally checking if the last node visited has a connection to the initial node. Unfortunately I could not succeed with that approach.
Another approach I tried was excluding a node, and then try to reach to its adjacent node starting from its other adjacent node. That algorithm may not give correct results according to the chosen adjacent nodes.
I'm pretty much stuck here. Can you help me think of another algorithm to tell me if the graph is a cycle graph?
Edit
I realized by the help of the comment (thank you for it n.m.):
A cycle graph consists of a single cycle and has N edges and N vertices. If there exist a cycle which contains all the nodes of the given graph, that's a Hamiltonian cycle. – n.m.
that I am actually searching for a Hamiltonian path, which I did not intend to do :)
On second thought, I think checking the Hamiltonian property of the graph while building it will be more efficient, which is also what I'm looking for: time efficiency.
After some thinking, I thought whatever the number of nodes will be, the graph seems to be Hamiltonian due to node addition criteria. The problem is I can't be sure and I can't prove it. Does adding nodes in that fashion, i.e. adding new nodes with 2 edges which connect the added node to the existing nodes, alter the Hamiltonian property of the graph? If it doesn't alter the Hamiltonian property, how so? If it does alter, again, how so? Thanks.
EDIT #2
I, again, realized that building the graph the way I described might alter the Hamiltonian property. Consider an input given as follows:
1 3
2 3
1 5
1 3
This input says that the 4th node is connected to node 1 and node 3, the 5th to node 2 and node 3, and so on.
The 4th and 7th nodes are connected to the same nodes, thus lowering the maximum number of nodes that can be visited exactly once by 1. If I detect these collisions (NOT including an input such as 3 3, which is an example that you suggested, since the problem states that the newly added edges are connected to 2 other nodes) and lower the maximum number of nodes, starting from N, I believe I can get the right result.
See, I do not choose the connections, they are given to me and I have to find the max. number of nodes.
I think counting the same connections while building the graph and subtracting the number of same connections from N will give the right result? Can you confirm this or is there a flaw with this algorithm?
What we have in this problem is a connected, undirected graph with N nodes and 2N-3 edges. Consider the graph given below,
    A
   / \
  B _ C
   ( )
    D
The graph does not have a Hamiltonian cycle, but it is constructed according to your rules for adding nodes. So searching for a Hamiltonian cycle may not give you the solution. Moreover, even if it is possible, Hamiltonian cycle detection is an NP-complete problem with O(2^N) complexity. So the approach may not be ideal.
What I suggest is to use a modified version of Floyd's Cycle Finding algorithm (Also called the Tortoise and Hare Algorithm).
The modified algorithm is,
1. Initialize a List CYC_LIST to ∅.
2. Add the root node to the list CYC_LIST and set it as unvisited.
3. Call the function Floyd() twice with the unvisited node in the list CYC_LIST for each of the two edges. Mark the node as visited.
4. Add all the previously unvisited vertices traversed by the Tortoise pointer to the list CYC_LIST.
5. Repeat steps 3 and 4 until no more unvisited nodes remains in the list.
6. If the list CYC_LIST contains N nodes, then the Graph contains a Cycle involving all the nodes.
The algorithm calls Floyd's Cycle Finding algorithm a maximum of 2N times. Floyd's Cycle Finding algorithm takes linear time, O(N). So the complexity of the modified algorithm is O(N^2), which is much better than the exponential time taken by the Hamiltonian cycle based approach.
One possible problem with this approach is that it will detect closed paths along with cycles unless stricter checking criteria are implemented.
Reply to Edit #2
Consider the Graph given below,
    A------------\
   / \            \
  B _ C            \
  |\ /|             \
  | D |              F
  \   /             /
   \ /             /
    E------------/
According to your algorithm this graph does not have a cycle containing all the nodes.
But there is a cycle in this graph containing all the nodes.
A-B-D-C-E-F-A
So I think there is a flaw in your approach. But if your algorithm were correct, it would be far better than mine, since mine takes O(n^2) time whereas yours takes just O(n).
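The claimed cycle is easy to check mechanically. The sketch below assumes the diagram encodes the edges A-B, A-C, B-C, B-D, C-D, B-E, C-E, A-F and E-F (my reading of the ASCII art; it gives 9 = 2N-3 edges for N = 6, with D and E both attached to B and C). The function name `is_hamiltonian_cycle` is made up for illustration.

```python
def is_hamiltonian_cycle(vertices, edges, cycle):
    """Check that `cycle` starts and ends at the same vertex, uses only
    existing edges, and visits every vertex exactly once."""
    edge_set = {frozenset(e) for e in edges}
    if len(cycle) < 2 or cycle[0] != cycle[-1]:
        return False
    inner = cycle[:-1]
    if len(inner) != len(vertices) or set(inner) != set(vertices):
        return False
    return all(frozenset((a, b)) in edge_set for a, b in zip(cycle, cycle[1:]))

# Assumed edge list, read off the diagram above.
vertices = list("ABCDEF")
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"),
         ("B", "E"), ("C", "E"), ("A", "F"), ("E", "F")]
print(is_hamiltonian_cycle(vertices, edges, list("ABDCEFA")))  # True
```

Under that reading, D and E share the same pair of neighbours, so counting duplicate connections would subtract one from N even though a cycle through all six nodes exists.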
To add some clarification to this thread: finding a Hamiltonian Cycle is NP-complete, which implies that finding a longest cycle is also NP-complete because if we can find a longest cycle in any graph, we can find the Hamiltonian cycle of the subgraph induced by the vertices that lie on that cycle. (See also for example this paper regarding the longest cycle problem)
We can't use Dirac's criterion here: Dirac only tells us minimum degree >= n/2 -> Hamiltonian Cycle, that is the implication in the opposite direction of what we would need. The other way around is definitely wrong: take a cycle over n vertices, every vertex in it has exactly degree 2, no matter the size of the circle, but it has (is) an HC. What you can tell from Dirac is that no Hamiltonian Cycle -> minimum degree < n/2, which is of no use here since we don't know whether our graph has an HC or not, so we can't use the implication (nevertheless every graph we construct according to what OP described will have a vertex of degree 2, namely the last vertex added to the graph, so for arbitrary n, we have minimum degree 2).
The problem is that you can construct both graphs of arbitrary size that have an HC and graphs of arbitrary size that do not have an HC. For the first part: if the original triangle is A,B,C and the vertices added are numbered 1 to k, then connect the 1st added vertex to A and C and the k+1-th vertex to A and the k-th vertex for all k >= 1. The cycle is A,B,C,1,2,...,k,A. For the second part, connect both vertices 1 and 2 to A and B; that graph does not have an HC.
What is also important to note is that the property of having an HC can change from one vertex to the other during construction. You can both create and destroy the HC property when you add a vertex, so you would have to check for it every time you add a vertex. A simple example: take the graph after the 1st vertex was added, and add a second vertex along with edges to the same two vertices of the triangle that the 1st vertex was connected to. This constructs from a graph with an HC a graph without an HC. The other way around: add now a 3rd vertex and connect it to 1 and 2; this builds from a graph without an HC a graph with an HC.
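For graphs this small, the create/destroy behaviour can be confirmed by brute force. This is only a sketch for tiny examples (it tries every vertex ordering, so it is exponential); the function name `has_hamiltonian_cycle` and the vertex labels are chosen for illustration.

```python
from itertools import permutations

def has_hamiltonian_cycle(vertices, edges):
    """Brute-force check; only sensible for a handful of vertices."""
    vertices = list(vertices)
    if len(vertices) < 3:
        return False
    edge_set = {frozenset(e) for e in edges}
    first, rest = vertices[0], vertices[1:]
    # Fix the starting vertex and try every order of the remaining ones.
    for order in permutations(rest):
        cycle = (first, *order, first)
        if all(frozenset(pair) in edge_set for pair in zip(cycle, cycle[1:])):
            return True
    return False

triangle = [("A", "B"), ("B", "C"), ("A", "C")]
step1 = triangle + [("1", "A"), ("1", "C")]   # 1st vertex joined to A and C
step2 = step1 + [("2", "A"), ("2", "C")]      # 2nd vertex joined to the same pair
step3 = step2 + [("3", "1"), ("3", "2")]      # 3rd vertex joined to 1 and 2

print(has_hamiltonian_cycle("ABC1", step1))    # True
print(has_hamiltonian_cycle("ABC12", step2))   # False
print(has_hamiltonian_cycle("ABC123", step3))  # True
```

After the second step, B, 1 and 2 each have only A and C as neighbours, so A would need three cycle edges; the third vertex repairs that by linking 1 and 2 through a new route.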
Storing the last known HC during construction doesn't really help you because it may change completely. You could have an HC after the 20th vertex was added, then not have one for k in [21,2000], and have one again for the 2001st vertex added. Most likely the HC you had on 23 vertices will not help you a lot.
If you want to figure out how to solve this problem efficiently, you'll have to find criteria that work for all your graphs that can be checked for efficiently. Otherwise, your problem doesn't appear to me to be simpler than the Hamiltonian Cycle problem is in the general case, so you might be able to adjust one of the algorithms used for that problem to your variant of it.
Below I have added three extra nodes (3,4,5) in the original graph and it does seem like I can keep adding new nodes indefinitely while keeping the property of Hamiltonian cycle. For the below graph the cycle would be 0-1-3-5-4-2-0
  1---3---5
 / \ / \ /
0---2---4
As there were no extra restrictions about how you can add a new node with two edges, I think by construction you can have a graph that holds the property of hamiltonian cycle.
