Add maximum possible edges to the graph with nodes capacity - algorithm

Problem: given N nodes, each node has a limit on its own degree. For example, the degree of node (1) cannot be higher than 10 (but it can be less, of course), the degree of node (2) cannot be higher than 3, etc. Build a graph on these nodes with the maximum possible number of edges.
I would be happy to see any hints or recommendations.
EDIT: The graph should be simple :)

If there's no other constraint on which vertices can be connected, a greedy algorithm should work here: Connect whichever two unconnected vertices have the highest remaining degree, until no such pair exists. This can be done efficiently with an array of vertices dynamically sorted by remaining degree.
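Here is a minimal Python sketch of this greedy. It assumes the degree limits come as a list `caps` indexed by node, and the function name `greedy_max_edges` is hypothetical. For brevity it re-sorts the vertices every round instead of maintaining a dynamically sorted array, and it breaks ties by node index.

```python
def greedy_max_edges(caps):
    """Greedy sketch: caps[i] is the degree limit of node i; returns the edges of a simple graph."""
    n = len(caps)
    remaining = list(caps)  # how many more edges each node may still take
    edges = set()
    while True:
        # Re-sort every round; a real implementation would keep the array sorted incrementally.
        order = sorted(range(n), key=lambda v: -remaining[v])
        best = None
        for a, u in enumerate(order):
            if remaining[u] == 0:
                break  # order is descending, so no later node has capacity left
            for v in order[a + 1:]:
                if remaining[v] == 0:
                    break
                pair = (min(u, v), max(u, v))
                if pair not in edges:  # keep the graph simple: no duplicate edges
                    best = pair
                    break
            if best is not None:
                break
        if best is None:
            return sorted(edges)
        edges.add(best)
        remaining[best[0]] -= 1
        remaining[best[1]] -= 1


print(greedy_max_edges([10, 3, 2]))  # → [(0, 1), (0, 2), (1, 2)]
```

The result of this particular sketch depends on how ties are broken. With `caps = [1, 1, 2, 2]` it connects (2, 3) and then (0, 1) and stops at 2 edges, whereas (0, 2), (1, 3), (2, 3) gives 3.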

If the graph doesn't have to be simple (the question doesn't specify), then just add duplicate self-loops to exhaust all but at most one available endpoint at each node. Then pair off the nodes that still have a free endpoint. You will be left with at most one unused endpoint; the number of edges is trivially the sum of endpoint allowances, divided by two, rounded down.
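A short Python sketch of that construction follows. The name `multigraph_max_edges` is hypothetical. It also lets you check the edge count against `sum(caps) // 2`.

```python
def multigraph_max_edges(caps):
    """caps[i] is the degree limit of node i; self-loops and parallel edges are allowed."""
    edges = []
    odd_nodes = []
    for v, cap in enumerate(caps):
        # Each self-loop uses two of the node's endpoints.
        edges.extend([(v, v)] * (cap // 2))
        if cap % 2 == 1:
            odd_nodes.append(v)  # exactly one endpoint left over
    # Pair off the nodes that still have a free endpoint; at most one stays unused.
    for i in range(0, len(odd_nodes) - 1, 2):
        edges.append((odd_nodes[i], odd_nodes[i + 1]))
    return edges


caps = [3, 2, 4]
print(len(multigraph_max_edges(caps)), sum(caps) // 2)  # → 4 4
```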


How to connect all connected components in shortest way

Given an N*M array of 0s and 1s.
A lake is a set of cells containing 1 that are connected horizontally or vertically.
We want to connect all lakes on the map by changing some cells from 0 to 1.
The task is to find a way to do this that minimizes the number of changed cells, within a given time limit.
I found this similar question: What is the minimum cost to connect all the islands?
The solution there has some problems:
1) It uses a library (PuLP) to solve the task
2) It takes a long time to produce output
Is there a more efficient solution for this problem?
Thank you in advance
I think this is a tricky question, but if you draw it out and look at this matrix as a graph, it becomes simpler.
Consider each cell as a node and each connection to its top/right/bottom/left to be an edge.
Start by connecting the edges of the lakes to the nearby vertices. Keep doing the same for each edge, and only connect two vertices if doing so doesn't create a cycle.
At this stage, carry out the same process for the immediate neighbours of the lakes. Keep doing the same, and stop if it creates cycles.
After this you should have a connected tree.
Once you have a connected tree you can find all the articulation point (Cut vertex) of the tree. (A vertex in an undirected connected graph is an articulation point (or cut vertex) iff removing it (and edges through it) disconnects the graph. Articulation points represent vulnerabilities in a connected network – single points whose failure would split the network into 2 or more disconnected components)
The number of cut vertices in the tree (excluding the initial lakes) would be the smallest number of cells that you need to change.
There are many efficient ways to find the cut vertices of a graph.
Finding articulation points takes O(V+E).
Preprocessing takes O(V+E), as it is similar to a BFS.
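For the cut-vertex step, here is a minimal Python sketch of the standard Tarjan-style DFS with low-link values. It assumes the graph is given as a dict mapping each node to its neighbour list, and the function name is hypothetical. It is recursive for brevity, so a large grid would need an iterative version.

```python
def articulation_points(adj):
    """Cut vertices of an undirected graph given as {node: [neighbours]}."""
    disc, low = {}, {}
    cuts = set()
    timer = 0

    def dfs(u, parent):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        children = 0
        for v in adj[u]:
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # A non-root u is a cut vertex if some child subtree cannot reach above u.
                if parent is not None and low[v] >= disc[u]:
                    cuts.add(u)
            elif v != parent:
                low[u] = min(low[u], disc[v])  # back edge to an ancestor
        # The DFS root is a cut vertex iff it has more than one DFS child.
        if parent is None and children > 1:
            cuts.add(u)

    for u in adj:
        if u not in disc:
            dfs(u, None)
    return cuts
```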
I don't know whether you are still interested, but I have an idea: what about a min-cost flow algorithm?
Assume you have an m*n 2-D input array and i islands. Create a graph where each position in the 2-D array is a node with an edge to each of its (up to) 4 neighbours. Each edge will be assigned a cost later on. Each edge has minimum capacity 0 and maximum capacity infinity.
Choose a random island to be the source. Create an extra target node and connect all other islands (except the source) to it, with each new edge having minimum and maximum flow capacity 1 and cost 0.
Now assign costs to the old edges, so that an edge connecting two island nodes costs nothing, but an edge between an island and a water node, or between two water nodes, costs 1.
Calculate the min-cost flow over this graph. Generating the initial graph takes O(nm), and the min-cost flow algorithm runs in O((nm)^3).
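Below is a rough Python sketch of this construction. It assumes the grid is a list of 0/1 lists and uses a plain successive-shortest-path min-cost flow (with SPFA) rather than any particular library. A few simplifications apply:

- The first island found is used as the source instead of a random one.
- A capacity equal to the number of islands stands in for "infinite".
- The function names are hypothetical.

With these edge costs, a path crossing k water cells costs k + 1.

```python
from collections import deque


def min_cost_flow(num_nodes, edges, source, sink, required):
    """Successive shortest paths (SPFA); edges are (u, v, capacity, cost) with cost >= 0."""
    graph = [[] for _ in range(num_nodes)]
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])     # forward residual edge
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])  # reverse residual edge
    flow = total_cost = 0
    while flow < required:
        dist = [float("inf")] * num_nodes
        prev = [None] * num_nodes  # (node, edge index) used to reach each node
        in_queue = [False] * num_nodes
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            in_queue[u] = False
            for i, (v, cap, cost, _) in enumerate(graph[u]):
                if cap > 0 and dist[u] + cost < dist[v]:
                    dist[v] = dist[u] + cost
                    prev[v] = (u, i)
                    if not in_queue[v]:
                        in_queue[v] = True
                        queue.append(v)
        if dist[sink] == float("inf"):
            return None  # the required flow cannot be routed
        push, v = required - flow, sink
        while v != source:  # bottleneck capacity along the path
            u, i = prev[v]
            push = min(push, graph[u][i][1])
            v = u
        v = sink
        while v != source:  # augment along the path
            u, i = prev[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            v = u
        flow += push
        total_cost += push * dist[sink]
    return total_cost


def lake_connection_cost(grid):
    """Builds the graph described above for a 0/1 grid and returns its min-cost flow value."""
    rows, cols = len(grid), len(grid[0])

    def cell(r, c):
        return r * cols + c

    # Label the islands with BFS and keep one representative cell per island.
    seen, reps = set(), []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and (r, c) not in seen:
                reps.append((r, c))
                seen.add((r, c))
                queue = deque([(r, c)])
                while queue:
                    a, b = queue.popleft()
                    for na, nb in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                        if 0 <= na < rows and 0 <= nb < cols and grid[na][nb] == 1 and (na, nb) not in seen:
                            seen.add((na, nb))
                            queue.append((na, nb))
    if len(reps) <= 1:
        return 0
    big = len(reps)  # never binding: the total flow is len(reps) - 1
    target = rows * cols
    edges = []
    for r in range(rows):
        for c in range(cols):
            for nr, nc in ((r + 1, c), (r, c + 1)):
                if nr < rows and nc < cols:
                    cost = 0 if grid[r][c] == 1 and grid[nr][nc] == 1 else 1
                    edges.append((cell(r, c), cell(nr, nc), big, cost))
                    edges.append((cell(nr, nc), cell(r, c), big, cost))
    # Each non-source island gets a capacity-1 edge to the target; demanding
    # len(reps) - 1 units of flow forces every one of them to carry exactly 1.
    for r, c in reps[1:]:
        edges.append((cell(r, c), target, 1, 0))
    return min_cost_flow(rows * cols + 1, edges, cell(*reps[0]), target, len(reps) - 1)
```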

Maximal number of vertex pairs in undirected not weighted graph

Given an undirected, unweighted graph with any type of connectivity, i.e. it can contain one or several components, with or without isolated nodes. Each node can have zero or more connections, and cycles are allowed (but no loops from a node to itself).
I need to find the maximal number of vertex pairs, assuming that each vertex can be used only once. For example, if the graph has nodes 1, 2, 3 and node 3 is connected to nodes 1 and 2, the answer is one (1-3 or 2-3).
I am thinking about the following approach:
1. Remove all isolated nodes.
2. Find an edge connecting a node with the minimal number of edges to a node with the maximal number of edges (if there are several, take any of them), count this pair, and remove both nodes from the graph.
3. Repeat step 2 while the graph still has connected nodes.
My questions are:
Does it provide the maximal number of pairs in every case? I am worried about some extremes, like cycles connected by one or several paths, etc.
Is there a faster and correct algorithm?
I can use Java or Python, but pseudocode or just an algorithm description is perfectly fine.
Your approach is not guaranteed to provide the maximum number of vertex pairs even in the case of a cycle-free graph. For example, in the following graph your approach is going to select the edge (B,C). After that unfortunate choice, there are no more vertex pairs to choose from, and therefore you'll end up with a solution of size 1. Clearly, the optimal solution contains two vertex pairs, and hence your approach is not optimal.
The problem you're trying to solve is the Maximum Matching Problem (not to be confused with the Maximal Matching Problem which is trivial to solve):
Find the largest subset of edges S such that no vertex is incident to more than one edge in S.
The Blossom Algorithm solves this problem in O(EV^2).
The way the algorithm works is not straightforward and it introduces nontrivial notions (like a contracted matching, forest expansions and blossoms) to establish the optimal matching. If you just want to use the algorithm without fully understanding its intricacies you can find ready-to-use implementations of it online (such as this Python implementation).
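If you'd rather not depend on an external library, here is a compact sketch of Edmonds' blossom algorithm in Python. It follows a common O(V^3) array-based formulation rather than the O(EV^2) version cited above: it runs BFS over alternating trees and contracts odd cycles into their base vertex. Nodes are assumed to be numbered 0..n-1, and the function name is hypothetical.

```python
from collections import deque


def max_matching(n, edges):
    """Maximum-cardinality matching; returns match[v] (partner of v, or -1)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    match = [-1] * n
    parent = [-1] * n        # parent of "odd" vertices in the alternating tree
    base = list(range(n))    # base vertex of the blossom each vertex belongs to

    def find_augmenting_path(root):
        used = [False] * n   # vertices in the tree at "even" depth
        parent[:] = [-1] * n
        base[:] = range(n)
        used[root] = True
        queue = deque([root])

        def lca(a, b):
            seen = [False] * n
            while True:  # walk up from a to the root, marking bases
                a = base[a]
                seen[a] = True
                if match[a] == -1:
                    break
                a = parent[match[a]]
            while True:  # walk up from b until a marked base is hit
                b = base[b]
                if seen[b]:
                    return b
                b = parent[match[b]]

        def mark_path(v, b, child, blossom):
            while base[v] != b:
                blossom[base[v]] = blossom[base[match[v]]] = True
                parent[v] = child
                child = match[v]
                v = parent[match[v]]

        while queue:
            v = queue.popleft()
            for to in adj[v]:
                if base[v] == base[to] or match[v] == to:
                    continue
                if to == root or (match[to] != -1 and parent[match[to]] != -1):
                    # Odd cycle found: contract the blossom into its base vertex.
                    cur = lca(v, to)
                    blossom = [False] * n
                    mark_path(v, cur, to, blossom)
                    mark_path(to, cur, v, blossom)
                    for i in range(n):
                        if blossom[base[i]]:
                            base[i] = cur
                            if not used[i]:
                                used[i] = True
                                queue.append(i)
                elif parent[to] == -1:
                    parent[to] = v
                    if match[to] == -1:
                        return to  # free vertex reached: augmenting path found
                    used[match[to]] = True
                    queue.append(match[to])
        return -1

    for i in range(n):
        if match[i] == -1:
            v = find_augmenting_path(i)
            while v != -1:  # flip matched/unmatched edges along the path
                pv = parent[v]
                ppv = match[pv]
                match[v] = pv
                match[pv] = v
                v = ppv
    return match
```

The number of vertex pairs is `sum(x != -1 for x in match) // 2`.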

Finding the heaviest edge in the graph that forms a cycle

Given an undirected graph, I want an algorithm (in O(|V|+|E|)) that finds the heaviest edge in the graph that lies on a cycle. For example, if my graph is as below and I run DFS(A), then the heaviest such edge will be BC.
(*) In this problem, I have at most 1 cycle.
I'm trying to write a modified DFS, that will return the desired heavy edge, but I'm having some trouble.
Because I have at most 1 cycle, I can save the edges in the cycle in an array, and find the maximum edge easily at the end of the run, but I think this answer seems a bit messy, and I'm sure there's a better recursive answer.
I think the easiest way to solve this is to use a union-find data structure (in a manner similar to Kruskal's MST algorithm):
Put each vertex in its own set
Iterate through the edges in order of weight. For each edge, merge the sets of the adjacent vertices if they're not already in the same set.
Remember the last edge for which you found that its adjacent vertices were already in the same set. That's the one you're looking for.
This works because the last and heaviest edge that you visit in any cycle must already have its adjacent vertices connected by edges you visited earlier.
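Here is a minimal Python sketch of this union-find approach, assuming edges are given as (u, v, weight) tuples. The weights in the usage line are hypothetical stand-ins for the example figure, chosen so that BC is the answer. The sort makes this O(|E| log |E|) rather than the O(|V|+|E|) asked for.

```python
def heaviest_cycle_edge(edges):
    """edges: (u, v, weight) tuples. Returns the heaviest edge lying on a cycle, or None."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    answer = None
    for u, v, w in sorted(edges, key=lambda e: e[2]):  # lightest first, as in Kruskal
        ru, rv = find(u), find(v)
        if ru == rv:
            answer = (u, v, w)  # u and v were already connected: this edge closes a cycle
        else:
            parent[ru] = rv
    return answer


# Hypothetical weights standing in for the example figure.
edges = [("A", "B", 10), ("B", "C", 8), ("C", "D", 5), ("D", "B", 2)]
print(heaviest_cycle_edge(edges))  # → ('B', 'C', 8)
```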
Use Tarjan's algorithm to split the graph into components. For an undirected graph, the components you want are the 2-edge-connected ones, i.e. what remains after removing the bridges. The strongly connected components of an undirected graph are simply its connected components.
Once you have split your graph into these components, assign a COMP_ID to each node that specifies the ID of the component it belongs to. This can be done with a small edit to the algorithm:
- Define a global integer that starts at 1.
- Every time you pop nodes from the stack, they all belong to the same component, so save the value of this variable as the COMP_ID of these nodes.
- When the pop loop ends, increment the integer by one.
Now, iterate over all the edges. You have 2 possibilities:
If this edge links two nodes from two different components, then this edge can't be the answer, since it can't possibly be a part of a cycle.
If this edge links two nodes from the same component, then this edge is a part of some cycle. All you have left to do now is to choose the maximum edge among all the edges of type 2.
The described approach runs in a total complexity of O(|V| + |E|) because every node and edge corresponds to at most one strongly connected component.
In the graph example you provided COMP_ID will be as follows:
COMP_ID[A] = 1
COMP_ID[B] = 2
COMP_ID[C] = 2
COMP_ID[D] = 2
Edge 10 connects COMP_ID 1 with COMP_ID 2, so it can't be the answer. The answer is the maximum among edges {2, 5, 8}, since they all connect COMP_ID 2 with itself; thus the answer is 8.
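For an undirected graph, the "same component" test above amounts to asking whether an edge is a bridge: an edge lies on a cycle exactly when it is not a bridge. Below is a minimal Python sketch using Tarjan's low-link DFS, which runs in O(|V|+|E|). It assumes the same hypothetical weights as a stand-in for the figure, and the function name is also hypothetical.

```python
def heaviest_edge_on_a_cycle(edges):
    """edges: (u, v, weight) tuples. An edge is on a cycle iff it is not a bridge."""
    adj = {}
    for i, (u, v, _) in enumerate(edges):
        adj.setdefault(u, []).append((v, i))
        adj.setdefault(v, []).append((u, i))
    disc, low = {}, {}
    is_bridge = [False] * len(edges)
    timer = 0

    def dfs(u, parent_edge):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v, i in adj[u]:
            if i == parent_edge:
                continue  # skip only the edge we came in on
            if v not in disc:
                dfs(v, i)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    is_bridge[i] = True  # subtree of v cannot reach u or above without edge i
            else:
                low[u] = min(low[u], disc[v])

    for u in adj:
        if u not in disc:
            dfs(u, None)
    on_cycle = [e for e, bridge in zip(edges, is_bridge) if not bridge]
    return max(on_cycle, key=lambda e: e[2]) if on_cycle else None


edges = [("A", "B", 10), ("B", "C", 8), ("C", "D", 5), ("D", "B", 2)]
print(heaviest_edge_on_a_cycle(edges))  # → ('B', 'C', 8)
```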

how to partition the nodes of an undirected graph into k sets

I have an undirected graph G=(V,E) where each vertex represents a router in a large network. Each edge represents a network hop from one router to another; therefore, all edges have the same weight. I wish to partition this network of routers into 3 (or k) different sets, clustered by hop count.
The idea is to replicate some data in the routers contained in each of these 3 sets. Then, whenever a node (or client or whatever) in the network graph requests a certain data item, I can search for it in the 3 sets and get a responsible node (one that has cached that particular data item) from each set. Then I'd select the node that is the minimum hop count away from the requesting node and continue with my algorithms and tests.
The cache distribution and request response are not in the scope of this question. I just need a way to partition the network into 3 sets so that I can perform the above operations on it.
Which clustering algorithm could be used in such a scenario? I have almost 9000 nodes in the graph and I wish to get 3 sets of ~3000 nodes each.
In the graph case, a clustering method based on minimum spanning trees can be used.
The regular algorithm is the following:
Find the minimum spanning tree of the graph.
Remove the k - 1 longest edges in the spanning tree, where k is the desired number of clusters.
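Here is a minimal Python sketch of that regular algorithm. It assumes nodes are numbered 0..n-1 and edges are (u, v, weight) tuples, and the function name is hypothetical. Running Kruskal's algorithm and stopping once k components remain is equivalent to building the minimum spanning tree and deleting its k - 1 heaviest edges (for a connected graph).

```python
def mst_clusters(n, edges, k):
    """Kruskal on (u, v, weight) edges, stopped as soon as k components remain."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if components == k:
            break  # the remaining MST edges are the k - 1 longest ones
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    clusters = {}
    for v in range(n):
        clusters.setdefault(find(v), []).append(v)
    return sorted(clusters.values())
```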
However, this works only if the edges differ in length (or weight). In the case of edges of equal length, every spanning tree is a minimum one so this would not work well. However, putting a little thinking into it, a different algorithm came to my mind which uses BFS.
The algorithm:
1. for i = 1..k do // for each cluster
2. choose the number of nodes N in cluster i
3. choose an arbitrary node n
4. run breadth-first search (BFS) from n until N nodes have been reached
5. assign the first N nodes (incl. n) tapped by the BFS to the i-th cluster
6. remove these nodes (and the incident edges) from the graph
7. done
This algorithm (and its results) depends heavily on how step 3, i.e. choosing the "root" node of a cluster, is implemented. For the sake of simplicity I choose an arbitrary node, but it could be more sophisticated. The best nodes are those at the "end" of the graph. You could find a center of the graph (a node that has the lowest sum of lengths of paths to all other nodes) and then use the nodes that are furthest from this center.
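Here is a minimal Python sketch of the BFS procedure, assuming an adjacency dict and a list of target cluster sizes (the names are hypothetical). A few implementation choices:

- The "arbitrary" root is simply the smallest remaining node label.
- Removing nodes from the graph is emulated by a `remaining` set.
- If the root's remaining component has fewer than N nodes, that cluster ends up smaller.

```python
from collections import deque


def bfs_partition(adj, sizes):
    """adj: dict node -> neighbours (undirected); sizes: desired node count per cluster."""
    remaining = set(adj)  # nodes not yet assigned to a cluster
    clusters = []
    for size in sizes:
        if not remaining:
            break
        root = min(remaining)  # step 3: the "arbitrary" root, here the smallest label
        cluster, seen = [], {root}
        queue = deque([root])
        while queue and len(cluster) < size:  # steps 4-5: take the first N nodes BFS reaches
            u = queue.popleft()
            cluster.append(u)
            for v in adj[u]:
                if v in remaining and v not in seen:
                    seen.add(v)
                    queue.append(v)
        clusters.append(cluster)
        remaining -= set(cluster)  # step 6: drop these nodes from the graph
    return clusters
```

For the router case this would be called as `bfs_partition(adj, [3000, 3000, 3000])`.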
The real issue is that your edges are equal (if I understood your problem statement well) and you have no information about the nodes (i.e. their coordinates - then you could use e.g. k-means).

Finding maximum number k such that for all combinations of k pairs, we have k different elements in each combination

We are given N pairs. Each pair contains two numbers. We have to find the maximum number K such that if we take any combination of J (1<=J<=K) pairs from the given N pairs, there are at least J different numbers in those J selected pairs. The same pair may appear more than once.
For example, consider the pairs
For this case K = 2, because for K > 2, if we select the three (1,2) pairs, we have only two different numbers, i.e. 1 and 2.
Checking for each possible combination starting from one will take a very large amount of time. What would be an efficient algorithm for solving the problem?
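For small inputs, a brute-force check of the definition is still useful as a reference to test faster methods against. Here is a minimal Python sketch (the function name is hypothetical); it is exponential in N.

```python
from itertools import combinations


def max_k_bruteforce(pairs):
    """Largest K such that every J pairs (1 <= J <= K) contain at least J distinct numbers."""
    k = 0
    for j in range(1, len(pairs) + 1):
        if all(len({x for pair in combo for x in pair}) >= j
               for combo in combinations(pairs, j)):
            k = j
        else:
            break  # the condition must hold for every J up to K
    return k


print(max_k_bruteforce([(1, 2), (1, 2), (1, 2)]))  # → 2
```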
Create a graph with one vertex for each number and one edge for each pair.
If this graph is a chain or a tree, the number of "numbers" equals the number of "pairs" plus one. After removing any number of edges from this graph, we never get fewer vertices than edges.
Now add a single cycle to this chain/tree. Now the numbers of vertices and edges are equal. After removing any number of edges from this graph, again we never get fewer vertices than edges.
Now add any number of disconnected components, each containing at most one cycle. Once again, we never get fewer vertices than edges after removing any number of edges.
Now add a second cycle to any of the disconnected components. After removing all other components, we finally have more edges than vertices (more pairs than numbers).
All this leads to the conclusion that K+1 is exactly the number of edges in the smallest possible subgraph, consisting of two cycles and, possibly, a chain, connecting these cycles.
For each connected component, find the shortest cycle going through every node with Floyd-Warshall algorithm.
Then for each non-overlapping pair of cycles (in single component), use Dijkstra’s algorithm, starting from any node with at least 3 edges in one cycle, to find shortest path to other cycle; and compute a sum of lengths of both cycles and a shortest path, connecting them. For each overlapping pair of cycles, just compute the number of their edges.
Now find the minimum size among all these subgraphs and subtract 1.
The above algorithm computes K if there is at least one double-cycle component in the graph. If there are no such components, K = N.
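The "no double-cycle component" case can be detected with union-find. Compare, per connected component, the number of pairs (edges) with the number of distinct numbers (vertices): a connected component has at most one cycle exactly when it has no more edges than vertices. Here is a minimal Python sketch with a hypothetical function name.

```python
def has_double_cycle_component(pairs):
    """True if some connected component of the number graph has more edges than vertices."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    vertices, edges = {}, {}
    for x in list(parent):
        root = find(x)
        vertices[root] = vertices.get(root, 0) + 1
    for a, _ in pairs:
        root = find(a)
        edges[root] = edges.get(root, 0) + 1
    # A connected component has at most one cycle iff #edges <= #vertices.
    return any(edges[r] > vertices[r] for r in edges)
```

If it returns `False`, K = N.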
Seems related to MinCut/MaxFlow. Here is a try to reduce it to MinCut/MaxFlow:
- Produce one vertex for each number
- Produce one vertex for each pair
- Produce an edge from number i to a pair if the number is present in the pair, weight 1
- Produce a source node and connect it to all numbers, weight 1 for each connection
- Produce a sink node and connect all pairs to it, weight 1 for each connection
Running MaxFlow on this should give you the number K, since any set of three pairs that contains only two numbers in total will be "blocked" by the constraints on the outgoing edges from the numbers.
I am not sure whether this is the fastest solution. There might also be a matroid hidden in there somewhere, I think. In that case there is a greedy approach. But I cannot find a proof for the matroid properties of the sets you are constructing.
I made some progress on it, but not yet an efficient solution. However it may point the way.
Make a graph whose points are pairs, and connect any pair of points if they share a number. Then for any subgraph, the number of numbers in it is the number of vertices minus the number of edges. Therefore your problem is the same as locating the smallest subgraph (if any) that has more edges than vertices.
A minimal subgraph that has the same number of edges and vertices is a cycle. Therefore the graphs we're looking for are either 2 cycles that share one or more vertices, or else 2 cycles which are connected by a path. There are no other minimal types possible.
You can locate and enumerate cycles fairly easily with a breadth-first search. There may be a lot of them, but this is doable. Armed with that you can look for subgraphs of these subtypes. (Enumerate minimal cycles, look for either pairs that share points, or which are connected.) But that isn't guaranteed to be polynomial. I suspect it will be something where on average it is pretty good, but the worst case is very bad. However that may be more efficient than what you're doing now.
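As an illustration of the breadth-first-search idea, here is a minimal Python sketch that finds the length of the shortest cycle in a simple undirected graph by running one BFS per start vertex (O(V*E) overall). The adjacency-dict input and the function name are assumptions.

```python
from collections import deque


def shortest_cycle_length(adj):
    """Girth of a simple undirected graph {node: [neighbours]}; None if it has no cycle."""
    best = None
    for s in adj:
        dist, parent = {s: 0}, {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    parent[v] = u
                    queue.append(v)
                elif parent[u] != v:
                    # Non-tree edge: it closes a cycle of length at most dist[u] + dist[v] + 1;
                    # the minimum over all start vertices is exactly the girth.
                    length = dist[u] + dist[v] + 1
                    if best is None or length < best:
                        best = length
    return best
```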
I keep on thinking that some kind of breadth-first search can find these in polynomial time, but I keep failing to see exactly how to do it.
This is equivalent to finding the chord of the smallest cycle in the graph. A very naive algorithm would be:
For each edge, remove it and check whether its two endpoints are still connected. If they are, the edge together with the shortest connecting path forms a cycle; note down the length of the smallest such cycle.
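Here is a minimal Python sketch of this naive check. It assumes the graph is given as a list of (u, v) edges, possibly with parallel edges, and the names are hypothetical. It returns, for each edge index that lies on a cycle, the length of the smallest cycle through that edge.

```python
from collections import deque


def smallest_cycle_through_each_edge(edges):
    """For edge i = (u, v): drop it, BFS from u; if v is reachable, the cycle length is dist + 1."""
    adj = {}
    for i, (u, v) in enumerate(edges):
        adj.setdefault(u, []).append((v, i))
        adj.setdefault(v, []).append((u, i))
    result = {}
    for i, (u, v) in enumerate(edges):
        dist = {u: 0}
        queue = deque([u])
        while queue and v not in dist:
            x = queue.popleft()
            for y, j in adj[x]:
                if j != i and y not in dist:  # ignore the removed edge
                    dist[y] = dist[x] + 1
                    queue.append(y)
        if v in dist:
            result[i] = dist[v] + 1
    return result
```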
