Looking for an algorithm to evaluate nodes of a graph

Let's suppose I have an undirected multi-graph, i.e. a (G, E) pair, where G is a finite set of nodes and E is a finite set of edges. I am looking for an algorithm that would assign a single string value to each node under the following constraints.
1. Every node is given a (possibly empty) set of constraints that restrict permissible values. I'd like to support at least the following types of value constraints:
   - min-length(x): the value is at least x characters long,
   - max-length(x): the value is at most x characters long,
   - regexp(x): the value matches the regular expression x,
   - numeric: the value consists of digits only.
   Ideally it should be possible to add support for new types of constraints in the future.
2. There are two types of edges, "different" and "same", meaning that the connected nodes must be assigned different (non-equal) or the same (equal) strings.
3. Finally, every node can be assigned a (possibly empty) set of constraints of the following types:
   - different-from(x),
   - equal-to(x),
   meaning that the node must be assigned a value different from, or equal to, the given string x.
I expect the algorithm either to report an inconsistency if no such evaluation exists, or otherwise to return any evaluation that meets the criteria (ideally a small one, i.e. one in which the assigned values consist of few characters).
Please note that I don't expect you to provide a detailed description of an algorithm for me. I'd be grateful for any hints you could provide to get me on the right track.

A few suggestions:
You can simplify the problem by combining all nodes connected by "same" edges into a single node. (Note that the constraints for this single node will be the union of all the individual constraints.)
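A minimal sketch of that merging step, assuming nodes are numbered 0..n-1 and edges are given as (u, v, kind) triples with kind "same" or "different" (a representation chosen for illustration; the function names are hypothetical). A union-find structure collapses the "same" edges, and a "different" edge whose endpoints land in the same group already proves the instance inconsistent.

```python
def merge_same_nodes(n, edges):
    """Collapse nodes joined by "same" edges; return (group_of, ok).

    group_of[v] is the representative of v's merged node.
    ok is False if some "different" edge joins two nodes of one group.
    """
    parent = list(range(n))

    def find(v):
        # Path halving keeps the union-find trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v, kind in edges:
        if kind == "same":
            parent[find(u)] = find(v)

    group_of = [find(v) for v in range(n)]
    ok = all(group_of[u] != group_of[v]
             for u, v, kind in edges if kind == "different")
    return group_of, ok


def merged_constraints(group_of, constraints):
    """Union of the per-node constraint lists for each merged node."""
    merged = {}
    for v, cs in enumerate(constraints):
        merged.setdefault(group_of[v], []).extend(cs)
    return merged
```

The "different" edges between merged nodes then form the graph that the rest of this answer treats as a colouring problem.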
The reduced problem seems very similar to graph colouring as you need to choose labels for each node such that the labels are different for connected nodes.
Unfortunately, graph colouring is NP-complete, so you may well struggle to find an efficient algorithm unless your number of nodes is quite small.
Graph coloring is computationally hard. It is NP-complete to decide if a given graph admits a k-coloring for a given k, except for the cases k = 1 and k = 2. In particular, it is NP-hard to compute the chromatic number. The 3-coloring problem remains NP-complete even on planar graphs of degree 4.
It may help to look at greedy colouring algorithms if you don't necessarily need a perfect solution.
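A minimal sketch of a greedy assignment for the reduced problem, assuming each merged node carries its constraints as plain Python predicates (str -> bool) and the "different" edges between merged nodes are given as pairs. All names, the default alphabet "0123456789ab" and the max_len cap are illustrative assumptions. Candidate strings are tried shortest first, which favours small evaluations. Being greedy, it can return None even when a valid evaluation exists, and it only finds values spelled with the chosen alphabet (an equal-to(x) value outside it will be missed), so None is not a proof of inconsistency.

```python
import itertools
import re


# Constraint factories: each returns a predicate str -> bool.
def min_length(k):
    return lambda s: len(s) >= k


def max_length(k):
    return lambda s: len(s) <= k


def regexp(pattern):
    compiled = re.compile(pattern)
    return lambda s: compiled.fullmatch(s) is not None


def different_from(x):
    return lambda s: s != x


def equal_to(x):
    return lambda s: s == x


numeric = regexp(r"[0-9]*")  # digits only (the empty string counts as numeric here)


def candidates(alphabet, max_len):
    """All strings over alphabet, shortest first, up to max_len characters."""
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            yield "".join(chars)


def greedy_assign(nodes, predicates, different_edges,
                  alphabet="0123456789ab", max_len=4):
    """nodes: merged-node ids; predicates: id -> list of predicates;
    different_edges: (u, v) pairs that must receive unequal strings."""
    neighbours = {v: set() for v in nodes}
    for u, v in different_edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    # Highest-degree nodes first, a common greedy-colouring heuristic.
    order = sorted(nodes, key=lambda v: -len(neighbours[v]))
    value = {}
    for v in order:
        taken = {value[w] for w in neighbours[v] if w in value}
        for s in candidates(alphabet, max_len):
            if s not in taken and all(p(s) for p in predicates.get(v, [])):
                value[v] = s
                break
        else:
            return None  # greedy gave up; this does NOT prove inconsistency
    return value
```

For example, greedy_assign([0, 1, 2], {0: [min_length(2), numeric], 1: [numeric]}, [(0, 1), (1, 2), (0, 2)]) assigns "00", "" and "0". Because constraints are just predicates, a new constraint type is one more small factory function.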

Related

Max size of a cyclic graph with conditional edges

We have a directed cyclic graph, with some of the edges conditioned on a binary variable, and we need to find the variable assignment that results in the largest graph size (the sum of visited node sizes).
There could be k such variables, and the same variable may reappear multiple times within the graph.
The variables are independent of each other.
What are the possible ways to solve this problem efficiently?
What does the complexity depend on?
What would be an efficient way to sample graph sizes from the space of all possible variable assignments? (with the goal of understanding the distribution)
What are known algorithms / graph theory concepts that could be related to this problem?
Attached is an example graph and the resulting decision tree that enumerates all the possibilities. The numbers represent node size. The max assignment in this case is [A=false, B=false, C=true], which includes nodes 1,2,3,5,6,7,8,9 for the total size of 41.
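The decision tree can be reproduced by brute force: enumerate all 2^k assignments and, for each, sum the sizes of the nodes reachable from a start node along edges whose condition holds. A minimal sketch, assuming a single start node, edges stored as {node: [(target, condition)]} where condition is None or (variable, required_value), and node sizes in a dict; all names are hypothetical. Each evaluation is one breadth-first traversal, so the exhaustive search costs O(2^k (V + E)). The sampling helper treats each variable as a fair coin, which is an assumption since the question gives no probabilities.

```python
import itertools
import random
from collections import deque


def visited_size(start, edges, sizes, assignment):
    """Sum of sizes of nodes reachable from start, following only edges
    whose condition holds under assignment (dict variable -> bool)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v, cond in edges.get(u, []):
            if cond is not None:
                var, required = cond
                if assignment[var] != required:
                    continue  # edge disabled by this assignment
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return sum(sizes[v] for v in seen)


def best_assignment(start, edges, sizes, variables):
    """Exhaustive search over all 2**k assignments (the full decision tree)."""
    best = None
    for values in itertools.product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        total = visited_size(start, edges, sizes, assignment)
        if best is None or total > best[0]:
            best = (total, assignment)
    return best


def sample_sizes(start, edges, sizes, variables, n_samples, seed=0):
    """Monte Carlo: graph sizes for independent fair-coin assignments."""
    rng = random.Random(seed)
    return [visited_size(start, edges, sizes,
                         {v: rng.random() < 0.5 for v in variables})
            for _ in range(n_samples)]
```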

Finding common elements of a matrix between rows

Suppose there is a 4x2 matrix:
A = [1,2;3,4;5,6;7,1]
I need to find the rows that have at least one element in common with another row. For example, in the matrix above, rows 1 and 4 have the element 1 in common. The matrix can have a very large number of rows. What would be the best algorithm/logic for this?
I tried the following algorithm:
for (int i = 0; i < N; i++) {
    for (int j = i + 1; j < N; j++) {
        if (ipArr[i][0] == ipArr[j][0] || ipArr[i][0] == ipArr[j][1] ||
            ipArr[i][1] == ipArr[j][0] || ipArr[i][1] == ipArr[j][1]) {
            // rows i and j have at least one element in common
        }
    }
}
In my case the matrix always has exactly 2 columns, and it has N rows.
It did not work out.
I don't have a detailed algorithm for you, but I would approach this as a graph algorithm problem. Think of each row as a vertex of a graph. There is an edge between vertices if the two rows have at least one element in common. Then, if I understand your problem correctly, you are trying to find the connected components of the graph. (A connected component of a graph is a subgraph that has the property that all vertices in the subgraph are connected to each other by paths, and are not connected to any other vertices of the supergraph.)
This breaks down into two parts:
1. Find a way to compute whether two rows are joined by an edge, and build a graph representation based on that.
2. Find the connected components of the graph.
For the second part, there are standard algorithms, as discussed in this Wikipedia article. So let's turn to the first part.
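For this particular graph there is a shortcut that avoids comparing rows pairwise: rows that share a value end up in the same component, so a union-find structure keyed on values yields the components in roughly linear time. This is a swapped-in technique rather than the two-step approach described in this answer, offered as a sketch; row_groups is a hypothetical name, and row indices are 0-based, so the question's rows 1 and 4 appear as 0 and 3.

```python
def row_groups(rows):
    """Group row indices into connected components, where two rows are
    connected if they share an element (directly or through a chain)."""
    parent = list(range(len(rows)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_row_with = {}  # value -> index of the first row containing it
    for i, row in enumerate(rows):
        for x in row:
            if x in first_row_with:
                parent[find(i)] = find(first_row_with[x])
            else:
                first_row_with[x] = i

    groups = {}
    for i in range(len(rows)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

For the example matrix, row_groups([[1, 2], [3, 4], [5, 6], [7, 1]]) returns [[0, 3], [1], [2]]. It works unchanged for any number of columns.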
One way to decide whether two rows have an element in common is to dump the elements into two set structures and check whether the intersection of the two sets is empty. Many programming languages have built-in collection data structures (usually based on hashes) to do this reasonably easily (in terms of programming effort). However, this is not going to be very efficient, particularly for large numbers of rows. But, it might be good enough for your purposes.
If time complexity is important, I would be inclined to try a slightly different method: sort each of the rows. This creates additional work at the beginning, but pays off as you compare all the rows pairwise. For instance, by comparing min and max values, you can quickly detect if two rows have disjoint ranges of values (and hence can't possibly have elements in common). Also, if the rows are sorted, you can (with some careful bookkeeping) do a coupled linear scan of both rows to search for common elements in linear time.
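The sorted-row comparison can be sketched as a merge-style scan. A minimal version, assuming each row has already been sorted (e.g. with sorted(row)); have_common_element is a hypothetical name. The min/max range test rejects disjoint rows immediately, and otherwise the two indices advance together, so the scan is linear in the total row length.

```python
def have_common_element(a, b):
    """True if the sorted sequences a and b share at least one element."""
    if not a or not b or a[-1] < b[0] or b[-1] < a[0]:
        return False  # an empty row, or disjoint value ranges
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            return True
        if a[i] < b[j]:
            i += 1  # advance whichever side holds the smaller value
        else:
            j += 1
    return False
```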
This solution assumes that your main purpose is to find the similarities between people, as you mentioned in the comments.
Let each person (number) be a node and each row be an edge of weight 1.
Now build an undirected graph from these.
Let each node also store its 'similarity' to every other node, given by the length of the shortest path between them. (This requires O(n) space for each node.)
Use the Floyd-Warshall algorithm to compute the shortest paths between all pairs of nodes.
If the shortest path is infinite, there is no similarity; the shorter the path, the greater the similarity.
Time complexity: O(n^3), where n is the number of people/numbers.
Space complexity: O(n^2)
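A minimal Floyd-Warshall sketch of the steps above, assuming people are numbered 0..n-1 and each row (a, b) is an undirected edge of weight 1; similarity_distances is a hypothetical name. With unit weights, a breadth-first search from every node would give the same distances.

```python
import math


def similarity_distances(n, rows):
    """All-pairs shortest path lengths via Floyd-Warshall.

    dist[i][j] == math.inf means i and j are not connected (no similarity);
    smaller finite values mean greater similarity.
    """
    dist = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for a, b in rows:
        if a != b:
            dist[a][b] = dist[b][a] = 1  # each row is an edge of weight 1
    for k in range(n):
        for i in range(n):
            dik = dist[i][k]
            if dik == math.inf:
                continue  # no path from i through k
            for j in range(n):
                if dik + dist[k][j] < dist[i][j]:
                    dist[i][j] = dik + dist[k][j]
    return dist
```

For the question's matrix, similarity_distances(8, [(1, 2), (3, 4), (5, 6), (7, 1)]) gives distance 2 between persons 2 and 7 (via 1) and infinity between 2 and 3.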

Looking for algorithms: Minimum cut to produce bipartite graph

Given an undirected weighted graph (or a single connected component of a larger disjoint graph) which typically will contain numerous odd and even cycles, I am searching for algorithms to remove the smallest possible number of edges necessary in order to produce one or more bipartite subgraphs. Are there any standard algorithms in the literature such as exist for minimum cut, etc.?
The problem I am trying to solve looks like this in the real world:
Presentations of about 1 hour each on different subjects are given to students in one or two time blocks. Students can sign up for one, two or three presentations of their choice (the 3rd choice is an alternative in case one of the others isn't going to be presented). They must all be different choices. If there are fewer than three sign-ups for a given presentation, it will not be given. If there are 18 or more, it will be given twice, once in each block. I have to schedule the presentations such that the maximum number of sign-ups are satisfied.
Scheduling is trivial in the following cases:
Sign-ups for only one presentation can always be satisfied if the presentation is given (i.e. sign-ups >= 3);
Sign-ups for two given presentations are always satisfiable if at least one of them is given twice.
First, all sign-ups are aggregated to determine which ones are given once and which are given twice. If a student has signed up for a presentation with too few other sign-ups, the alternative presentation is chosen if it will also be given.
At the end of the day, I am left with an undirected weighted graph where the vertices are the presentations and the edges represent students who have signed up for that combination of presentations, each of which is only presented once. The weight corresponds to the number of sign-ups for the unique combination of presentations (thus avoiding parallel edges).
If the number of vertices, or presentations, is around 20 or less, I have come up with a brute force solution which finishes in acceptable time. However, each additional vertex will double the runtime of that solution. After 28 or so, it rapidly becomes unmanageable.
This year we had 37 presentations, thirty of which were only given once and thus ended up in the graph. What I am trying right now for larger graphs is the following:
1. Find all connected components and solve each component individually;
2. For each component, remove leaf nodes and bridge edges recursively;
3. Generate a spanning tree (I am using Kruskal's algorithm, which works very well), saving the removed edges;
4. Generate the fundamental cycle set by adding one removed edge back into the tree at a time and stripping off the rest of the tree;
5. Using the Gibbs-Welch algorithm, generate the complete set of all elementary cycles, starting with the fundamental set obtained in step 4;
6. Count the number of odd and even cycles to which each edge belongs;
7. Create a priority queue of edges (ordering discussed below) and remove each edge successively from its connected component until the resulting component is bipartite.
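Step 7 needs a bipartiteness test after each removal. A minimal sketch, assuming a component is given as a node list plus (u, v) edge pairs, and with the priority list standing in for whatever ordering the rules discussed in the question produce; both function names are hypothetical. Breadth-first 2-colouring reports an odd cycle as soon as an edge joins two nodes of the same colour.

```python
from collections import deque


def is_bipartite(nodes, edges):
    """Breadth-first 2-colouring; edges is an iterable of (u, v) pairs."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colour = {}
    for s in nodes:
        if s in colour:
            continue
        colour[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in colour:
                    colour[w] = 1 - colour[u]
                    queue.append(w)
                elif colour[w] == colour[u]:
                    return False  # edge inside one colour class: odd cycle
    return True


def remove_until_bipartite(nodes, edges, priority):
    """Remove edges in the given priority order until the remaining graph
    is bipartite; return (kept_edges, removed_edges)."""
    kept = list(edges)
    removed = []
    for e in priority:
        if is_bipartite(nodes, kept):
            break
        kept.remove(e)
        removed.append(e)
    return kept, removed
```

Re-running the full test after every removal is simple but quadratic; for components of a few dozen vertices that cost is negligible.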
I cannot find an ordering of the priority queue for which I can prove that the result would be as acceptable as a solution obtained using the brute force method (it is probably NP-hard). However, I am trying something along these lines:
a. If the edge belongs only to odd cycles, remove it first;
b. If the edge belongs to more odd than even cycles, remove it before any other edges which belong to more even cycles than odd;
c. Edges with the smallest weight should be removed first.
If an edge belongs to both an odd and an even cycle, removing it would leave a larger odd cycle behind. That is why I am ordering them like that. Obviously, the larger the number of odd cycles to which an edge belongs, the higher its priority, but only if fewer even cycles are affected.
There are additional criteria which exist but need to be considered outside of the graph problem; for example, removing an edge effectively removes one of the sign-ups for one of the presentations, so an eye has to be kept on not letting the number of sign-ups get too small.
(EDIT: there is also the possibility of splitting presentations into two blocks which have almost enough sign-ups, e.g. 15-16 instead of 18. But this means that whoever is giving the presentation would have to do it twice, so it is a trade-off.)
Thanks in advance for any suggestions!
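This problem is equivalent to the NP-hard weighted max-cut problem, which asks for a partition of the vertices into two parts such that the total weight of the edges going between the parts is maximized. Removing the edges that do not cross the partition leaves a bipartite graph, so minimizing the weight of the removed edges is the same as maximizing the weight of the cut.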
I think the easiest way to solve a problem size such as you have would be to formulate it as a quadratic integer program and then apply an off the shelf solver. The formulation looks like
maximize (1/2) sum_{i<j} w_{ij} (1 - y_i y_j)
subject to
y_i in {±1} for all i
where w_ij is the weight of the undirected edge ij if present, else zero (so the corresponding term can be omitted). Each term contributes w_ij exactly when y_i ≠ y_j, i.e. when the edge crosses the cut, and 0 otherwise.
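To make the formulation concrete, here is a minimal brute-force evaluation of the same objective. It is exponential, so it is only practical for small graphs; for around 30 vertices the off-the-shelf solver suggested above is the realistic route. Vertices are assumed to be numbered 0..n-1 (n ≥ 1) with each undirected edge listed once in a dict of weights, and the function names are hypothetical. Edges with y_i = y_j are the ones that would be removed.

```python
import itertools


def cut_value(weights, y):
    """(1/2) * sum over edges of w_ij * (1 - y_i * y_j), y_i in {-1, +1}.

    weights maps each undirected edge (i, j), listed once, to its weight.
    """
    return sum(w * (1 - y[i] * y[j]) for (i, j), w in weights.items()) / 2


def brute_force_max_cut(n, weights):
    """Try all 2**(n-1) sign vectors; y_0 is fixed to +1 because flipping
    every sign gives the same cut."""
    best_value, best_y = -1, None
    for rest in itertools.product([1, -1], repeat=n - 1):
        y = (1,) + rest
        value = cut_value(weights, y)
        if value > best_value:
            best_value, best_y = value, y
    return best_value, best_y
```

For a triangle with weights {(0, 1): 3, (1, 2): 1, (0, 2): 2}, the best cut separates vertex 0 from the rest with value 5, meaning only the weight-1 edge (1, 2) is removed.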

Correctness of algorithm to calculate maximal independent set

I am trying to find a maximal independent set of an undirected graph, and here is the algorithm I am using to do so:
1) Select the node with the minimum number of edges
2) Eliminate all its neighbors
3) From the rest of the nodes, select the node with the minimum number of edges
4) Repeat the steps until the whole graph is covered
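The steps above can be sketched as follows, assuming that "minimum number of edges" means the degree within the graph that remains after earlier eliminations, and that the selected node itself is removed along with its neighbours (greedy_mis is a hypothetical name; ties go to the first node in dict order).

```python
def greedy_mis(adj):
    """Min-degree greedy maximal independent set.

    adj maps each node to a set of neighbours (undirected, no self-loops).
    """
    remaining = {v: set(ns) for v, ns in adj.items()}
    chosen = []
    while remaining:
        # Steps 1 and 3: pick a node with the fewest edges among remaining nodes.
        v = min(remaining, key=lambda u: len(remaining[u]))
        chosen.append(v)
        # Step 2: delete v and all its neighbours from the remaining graph.
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for ns in remaining.values():
            ns -= removed
    return chosen
```

On the path 0-1-2-3-4 this picks [0, 2, 4]; on a star it picks all the leaves rather than the centre.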
Can someone tell me if this is right? If not, then why is this method wrong to calculate the maximal independent set in a graph?
What you have described will pick a maximal independent set. We can see this as follows:
This produces an independent set. By contradiction, suppose that it didn't. Then there would have to be two nodes joined by an edge that were both added to the set you produced. Take whichever one of them was picked first (call it u, and let the other be v). When u was added to the set, you would have removed all of its neighboring nodes from the graph, including node v. Then v wouldn't have been added to the set, giving a contradiction.
This produces a maximal independent set. By contradiction, suppose that it didn't. This means that there is some node v that can be added to the independent set produced by your algorithm, but was not added. Since this node wasn't added, it must have been removed from the graph by the algorithm. This means that it must have been adjacent to some node added to the set already. But this is impossible, because it would mean that the node v cannot be added to the produced independent set without making the result not an independent set. We have a contradiction.
Hope this helps!
A graph does not in general have a unique maximal independent set; for example, in the cycle on 3 nodes, each single node forms a maximal independent set. Your algorithm will give you one of the maximal independent sets of the graph, without guaranteeing that it has maximum cardinality. On the other hand, finding a maximum independent set in a graph is NP-hard (it is equivalent to finding a maximum clique in the complement graph), so there probably isn't an efficient algorithm.
Now that you have clarified the situation in the comments: your solution is right.
Even better, according to Corollary 3 in these lecture notes http://courses.engr.illinois.edu/cs598csc/sp2011/Lectures/lecture_7.pdf
you get a good approximation of the size of the set:
Greedy gives a 1/(d + 1)-approximation for (unweighted) MIS in graphs of degree at most d.

Is it possible to develop an algorithm to solve a graph isomorphism?

Or will I need to develop an algorithm for every unique graph? The user is given a type of graph, and they are then supposed to use the interface to add nodes and edges to an initial graph. Then they submit the graph and the algorithm is supposed to confirm whether the user's graph matches the given graph.
The algorithm needs to confirm not only the neighbours of each node, but also that each node and each edge has the correct value. The initial graphs will always have a root node, which is where the algorithm can start from.
I am wondering if I can develop the logic for such an algorithm in the general sense, or will I need to actually code a unique algorithm for each unique graph. It isn't a big deal if it's the latter case, since I only have about 20 unique graphs.
Thanks. I hope I was clear.
The graph isomorphism problem might not be hard, but it is very hard to prove that it is not hard.
There are three possibilities for this problem:
1. The graph isomorphism problem is NP-hard.
2. The graph isomorphism problem has a polynomial-time solution.
3. The graph isomorphism problem is neither NP-hard nor in P.
If two graphs are isomorphic, then there exists a permutation of the vertices that realizes the isomorphism. Taking this permutation as a certificate, we can verify in polynomial time that the two graphs are isomorphic. Thus, graph isomorphism lies in NP. However, for more than 30 years no one has been able to prove whether this problem is NP-hard or in P. Thus, despite its simple description, the problem is remarkably hard to pin down.
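The certificate argument can be made concrete: checking a proposed permutation takes only a bijection check and a set comparison. A minimal sketch for simple undirected graphs; the function name and argument layout are assumptions for illustration.

```python
def is_isomorphism(vertices_g, edges_g, vertices_h, edges_h, perm):
    """Check that perm (dict: G-vertex -> H-vertex) is a bijection that maps
    G's edge set exactly onto H's edge set. Runs in polynomial time."""
    vg, vh = set(vertices_g), set(vertices_h)
    if set(perm) != vg or set(perm.values()) != vh or len(vg) != len(vh):
        return False  # not a bijection between the vertex sets
    mapped = {frozenset((perm[u], perm[v])) for u, v in edges_g}
    return mapped == {frozenset(e) for e in edges_h}
```

Finding such a permutation is the hard part; verifying one is cheap, which is exactly what membership in NP requires.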
If I understand the question properly, you can have ONE single algorithm, which works by accepting one of several reference graphs as its input (in addition to the unknown graph whose isomorphism with the reference graph is to be asserted).
It appears that you seek to assert whether a given graph is exactly identical to another graph, rather than whether the graphs are isomorphic relative to a particular set of operations or characteristics. This implies that the algorithm must be supplied with some specific reference graph, rather than working off some set of "abstract" rules, such as whether neither graph has loops or both graphs are fully connected, even though the graphs may differ in some other fashion.
Edit, following confirmation that:
Yeah, the algorithm would be supplied a reference graph (which is the answer), and will then check the user's graph to see if it is isomorphic (including the values of edges and nodes) to the reference
In that case, yes, it is quite possible to develop a relatively simple algorithm that asserts isomorphism of these two graphs. Note that the considerations mentioned in other remarks and answers regarding the possibility that the problem is NP-hard merely indicate that a simple algorithm [or any algorithm, for that matter] may not solve the problem in a reasonable amount of time for graphs whose size and complexity are too great. However, assuming relatively small graphs and taking advantage (!) of the requirement that the weights of edges and nodes also need to match, the following algorithm should generally be applicable.
General idea:
For each sub-graph that is disconnected from the rest of the graph, identify one (or possibly several) node(s) in the user graph which must match a particular node of the reference graph. By following the paths from this node [in an orderly fashion, more on this below], assert the identity of other nodes and/or determine that there are some nodes which cannot be matched (and hence that the two structures are not isomorphic).
Rough pseudo code:
1. For both the reference graph and the user-supplied graph, make the list of their connected components, i.e. the sub-graphs that are disconnected from the rest of the graph. These connected components are found by following a breadth-first or depth-first traversal starting at a given node and "marking" every node reached with an arbitrary [typically incremental] element ID number. Once the traversal is complete, repeat the operation from any other unmarked node, and continue until there are no more unmarked nodes.
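Step 1 can be sketched as follows, assuming nodes in a list and edges as (start, end) pairs that are treated as undirected for connectivity (so for directed graphs this yields weakly connected components); label_components is a hypothetical name.

```python
from collections import deque


def label_components(nodes, edges):
    """Return {node: component_id}, with ids assigned incrementally from 0."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    label = {}
    next_id = 0
    for s in nodes:
        if s in label:
            continue  # already marked by an earlier traversal
        label[s] = next_id
        queue = deque([s])
        while queue:  # breadth-first traversal marking one component
            u = queue.popleft()
            for w in adj[u]:
                if w not in label:
                    label[w] = next_id
                    queue.append(w)
        next_id += 1
    return label
```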
2. Build a "database" of the characteristics of each graph.
This will be useful to identify matching candidates and also to determine, early on, instances of non-isomorphism.
Each "database" would have two kinds of "records", node and edge, with the following fields, respectively:
- node: node_id, Connected_element_Id, node weight, number of outgoing edges, number of incoming edges, sum of outgoing edge weights, sum of incoming edge weights.
- edge: edge_id, Connected_element_Id, edge weight, node_id_of_start, node_id_of_end, weight_of_start_node, weight_of_end_node.
3. Build a database of the Connected elements of each graph
Each record should have the following fields: Connected_element_id, number of nodes, number of edges, sum of node weights, sum of edge weights.
4. [optionally] Dispatch the easy cases of non-isomorphism:
4.a mismatch of the number of connected elements
4.b mismatch of the number of connected elements, grouped by all fields but the id (number of nodes, number of edges, sum of node weights, sum of edge weights)
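Node records as in step 2 lead directly to cheap checks in the spirit of step 4. A minimal sketch at node level, assuming directed edges given as (start, end, weight) triples and node weights in a dict; the function names are hypothetical, and the same idea applies to the component records of step 3. Comparing the multisets of id-free records can only prove non-isomorphism; equal multisets leave the question open.

```python
from collections import Counter


def node_signatures(node_weight, edges):
    """Per-node record without ids: (weight, out-degree, in-degree,
    sum of outgoing edge weights, sum of incoming edge weights)."""
    out_deg, in_deg = Counter(), Counter()
    out_w, in_w = Counter(), Counter()
    for u, v, w in edges:
        out_deg[u] += 1
        in_deg[v] += 1
        out_w[u] += w
        in_w[v] += w
    return {n: (wt, out_deg[n], in_deg[n], out_w[n], in_w[n])
            for n, wt in node_weight.items()}


def obviously_not_isomorphic(graph_a, graph_b):
    """graph_* = (node_weight, edges). True means the graphs cannot be
    isomorphic; False means 'undecided', not 'isomorphic'."""
    sig_a = Counter(node_signatures(*graph_a).values())
    sig_b = Counter(node_signatures(*graph_b).values())
    return sig_a != sig_b
```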
5. For each connected element in the reference graph
5.1 Identify candidates for the matching connected element in the user-supplied graph. The candidates must have the same connected element characteristics (number of nodes, number of edges, sum of nodes weights, sum of edges weights) and contain the same list of nodes and edges, again, counted by grouping by all characteristics but the id.
5.2 For each candidate, confirm whether it is isomorphic to the corresponding connected element in the reference graph. This is done by starting at a candidate node match, i.e. a node, ideally unique, that has exactly the same characteristics in both graphs. If there is no such node, one needs to try each possible candidate in turn, disqualifying them one by one, until isomorphism is confirmed (or all candidates are exhausted). From the candidate node match, walk the graph (say, breadth-first), finding matches for the other nodes on the basis of the direction and weight of the edges and the weight of the nodes.
The main tricks in this algorithm are to keep proper accounting of the candidates (candidate connected elements at the higher level, candidate nodes at the lower level), and to remember and mark other identified items as such (and to un-mark them if the hypothetical candidate eventually proves not to be feasible).
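The candidate walk of step 5.2 amounts to a backtracking search. A minimal sketch, assuming small directed graphs without parallel edges, node weights as {node: weight} and edge weights as {(start, end): weight}; find_isomorphism is a hypothetical name. Reference nodes are visited breadth-first from the root; each is tried against every unused user node with the same weight and degree, and every pair of already-matched nodes must agree on edge presence, direction and weight. Deleting a failed mapping plays the role of the "un-mark" bookkeeping.

```python
from collections import deque


def find_isomorphism(ref_nodes, ref_edges, usr_nodes, usr_edges, root=None):
    """Return a weight-preserving mapping {ref_node: usr_node}, or None."""
    if len(ref_nodes) != len(usr_nodes) or len(ref_edges) != len(usr_edges):
        return None

    def neighbours(nodes, edges):
        adj = {v: set() for v in nodes}
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        return adj

    ref_adj = neighbours(ref_nodes, ref_edges)
    usr_adj = neighbours(usr_nodes, usr_edges)

    # Breadth-first order from the root, restarting for each further component.
    order, seen = [], set()
    for s in ([root] if root is not None else []) + list(ref_nodes):
        if s in seen:
            continue
        seen.add(s)
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in ref_adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)

    mapping, used = {}, set()

    def consistent(r, c):
        if ref_nodes[r] != usr_nodes[c] or len(ref_adj[r]) != len(usr_adj[c]):
            return False
        if ref_edges.get((r, r)) != usr_edges.get((c, c)):  # self-loops
            return False
        for x, y in mapping.items():  # both directions against matched nodes
            if ref_edges.get((r, x)) != usr_edges.get((c, y)):
                return False
            if ref_edges.get((x, r)) != usr_edges.get((y, c)):
                return False
        return True

    def extend(i):
        if i == len(order):
            return True
        r = order[i]
        for c in usr_nodes:
            if c not in used and consistent(r, c):
                mapping[r] = c
                used.add(c)
                if extend(i + 1):
                    return True
                del mapping[r]  # un-mark the infeasible candidate
                used.discard(c)
        return False

    return dict(mapping) if extend(0) else None
```

This sketch is worst-case exponential; distinct weights usually leave a single candidate per node, and the cheap checks of step 4 can be run first to reject obvious mismatches.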
I realize the above falls short of a formal algorithm description, but it should give you an idea of what is required and possibly a starting point, should you decide to implement it.
Note that although the requirement to match node and edge weights may appear to add difficulty to asserting isomorphism, it effectively simplifies the algorithm, because these node/edge characteristics make the nodes more distinctive and hence make it more likely that the algorithm will a) find unique candidate nodes and b) either quickly find the other matches along the path or quickly establish non-isomorphism.
