We have a directed cyclic graph, with some of the edges conditioned on a binary variable, and we need to find the variable assignment that results in the largest graph size (the sum of visited node sizes).
There could be k such variables, and the same variable may reappear multiple times within the graph.
The variables are independent of each other.
What are the possible ways to solve this problem efficiently?
What does the complexity depend on?
What would be an efficient way to sample graph sizes from the space of all possible variable assignments? (with the goal of understanding the distribution)
What are known algorithms / graph theory concepts that could be related to this problem?
Attached is an example graph and the resulting decision tree that enumerates all the possibilities. The numbers represent node size. The max assignment in this case is [A=false, B=false, C=true], which includes nodes 1,2,3,5,6,7,8,9 for the total size of 41.
Related
Given a network of a binary tree of nodes with edge capacities c_e. There are data at the leaf nodes and each has data size s_v. L_e is the set of all leaves in the subtree below edge e. Our aim is to find subset S of leaves such that the number of data size transmitted to the root r is maximized, but for all the edges that the data passes through the capacity constraint must hold. It is assumed that c_e and s_v are non negative integers and let m be their maximum. Using dynamic programming on trees it should run in O(nm^2) time.
I have working on this for hours but haven't really come up woth a working solution. Any hints would be appreciated.
edit:
The data must be transmitted as a whole or not at all. for example if a leaf has 10 the algorithm can only take 10 or 0 at all.
for example,
v4=1, v5=3, v6=2, v7=2.
e1=(v1,r), e2=(v2,v1) and e3=(v3,v1) and so on.
assume that the capacity for e4, e5,e6 and e7 satisfied. But c1=5, c2=3 and c3=4
if we focused on finding the maximum of each subtree, we will end up taking v5 and v6+v7 which is not optimal. how to make dynamic programming rule that can tackle this problem and find the correct optimal solution?
Similar to the dynamic programming solution for subset sum...
For each node, calculate the set of attainable sums for subsets of the leaves, and for each attainable sum, remember the last contributing child and the previous sum. You can use this information to reconstruct the set that produces the sum.
While doing a postorder traversal of the tree, you can calculate this set for each node using only the information on its children.
When you get to the root, pick the maximum attainable sum and reconstruct the leaves that produce it.
Given undirected not weighted graph with any type of connectivity, i.e. it can contain from 1 to several components with or without single nodes, each node can have 0 to many connections, cycles are allowed (but no loops from node to itself).
I need to find the maximal amount of vertex pairs assuming that each vertex can be used only once, ex. if graph has nodes 1,2,3 and node 3 is connected to nodes 1 and 2, the answer is one (1-3 or 2-3).
I am thinking about the following approach:
Remove all single nodes.
Find the edge connected a node with minimal number of edges to node with maximal number of edges (if there are several - take any of them), count and remove this pair of nodes from graph.
Repeat step 2 while graph has connected nodes.
My questions are:
Does it provide maximal number of pairs for any case? I am
worrying about some extremes, like cycles connected with some
single or several paths, etc.
Is there any faster and correct algorithm?
I can use java or python, but pseudocode or just algo description is perfectly fine.
Your approach is not guaranteed to provide the maximum number of vertex pairs even in the case of a cycle-free graph. For example, in the following graph your approach is going to select the edge (B,C). After that unfortunate choice, there are no more vertex pairs to choose from, and therefore you'll end up with a solution of size 1. Clearly, the optimal solution contains two vertex pairs, and hence your approach is not optimal.
The problem you're trying to solve is the Maximum Matching Problem (not to be confused with the Maximal Matching Problem which is trivial to solve):
Find the largest subset of edges S such that no vertex is incident to more than one edge in S.
The Blossom Algorithm solves this problem in O(EV^2).
The way the algorithm works is not straightforward and it introduces nontrivial notions (like a contracted matching, forest expansions and blossoms) to establish the optimal matching. If you just want to use the algorithm without fully understanding its intricacies you can find ready-to-use implementations of it online (such as this Python implementation).
I have a set of N objects and N * N distances between them. I want to cluster this set on subsets, such that in each cluster there are all objects has the same distance and mean(cluster_size) on all clusters is maximized.
I tried to solve this task by such algorithm:
Lets enumerate all unique distances between objects.
For each unique distance X lets build graph based with objects as nodes and adjacency matrix, where edge between A and B is present if distance between objects A and B is exactly X
Lets find maximum clique in this graph. If size of this clique is bigger than current maximum - update maximum and store clique as Result
Delete objects stored in Result from set of objects
Repeat until set of objects is not empty
Is there are any more efficient [approximate] solution?
Mean(cluster size) = total number of points / number of clusters
The only way to maximize this is to minimize the number of clusters. That seems a fairly bad choice as optimization target. You may want to reconsider this objective.
Apart from that I think your algorithm is fairly sensible. As the problem likely is NP hard, you do want to use a greedy approximation.
I suggest to be more lazy in recomputing, and add some bounds.
Build subgraphs for every unique distance.
Sort subgraphs by size, descending.
Unless you have a cached value from a previous iteration, find the largest clique in each subgraph. Remember the overall largest clique. Stop if the current largest is larger than the remaining subgraphs.
Output best subgraph found.
Remove the contained nodes from all graphs, and forget those best cliques that contain any of the nodes just found. Go back to 2.
This is a homework question so I'll be glad to get a hint.
I have a graph G, where each vertex v has a weight w(v).
S(G) is the sum of weights of the all the vertexes in the graph.
I need to find an algorithm that determines if there is a group of vertexes A, when G[A] (G's graph induced by A) is a tree, that conducts S(G[A])=S(G[V\A]).
I know that i should go over all vertexes, sum their weights, and then try to find a tree that reaches half of that sum, but i'm not sure how exactly. I'm pretty sure it involves dynamic programming.
Thank you very much,
Yaron.
This is not really a dynamic programming problem, it is a search problem, the key being that you are trying to find a tree.
If you think about it, you already know an algorithm or two that will will tell you the minimum spanning tree. By the same logic, you can make a maximum spanning tree. For example, if you find the maximum spanning tree and the sum of its weights is less than 50% (or whatever the target value is), then you know the problem is impossible.
So, following this logic, you can go along as though you were making a spanning tree and reject any path that goes over the target amount. This strategy is known as "branch and bound". Let's imagine how we could do this with Kruskal's algorithm:
(1) you will have a set of trees; start with each vertex as a separate "tree"
(2) maintain a queue of edges you have not used yet, sorted from least to greatest
(3) maintain a stack of edges that you have used
(4) look for an edge that (a) connects two different trees, and (b) the sum of the two trees is less than (or equal to the target value, ie a solution)
(4a) if no such edge exists, then pop a value from the stack (remove the edge and seperate the trees) and try the next value in the queue
(4b) if such an edge does exist, then add the edge (combine two of the trees), push it onto the stack and go back to step 4
Obviously there are different ways to do the same process. For example, you could use a variant of Prim's algorithm as well.
Or will I need to develop an algorithm for every unique graph? The user is given a type of graph, and they are then supposed to use the interface to add nodes and edges to an initial graph. Then they submit the graph and the algorithm is supposed to confirm whether the user's graph matches the given graph.
The algorithm needs to confirm not only the neighbours of each node, but also that each node and each edge has the correct value. The initial graphs will always have a root node, which is where the algorithm can start from.
I am wondering if I can develop the logic for such an algorithm in the general sense, or will I need to actually code a unique algorithm for each unique graph. It isn't a big deal if it's the latter case, since I only have about 20 unique graphs.
Thanks. I hope I was clear.
Graph isomorphism problem might not be hard. But it's very hard to prove this problem is not hard.
There are three possibilities for this problem.
1. Graph isomorphism problem is NP-hard.
2. Graph isomorphism problem has a polynomial time solution.
3. Graph isomorphism problem is neither NP-hard or P.
If two graphs are isomorphic, then there exist a permutation for this isomorphism. Take this permutation as a certificate, we could prove this two graphs are isomorphic to each other in polynomial time. Thus, graph isomorphism lies in the territory of NP set. However, it has been more than 30 years that no one could prove whether this problem is NP-hard or P. Thus, this problem is intrinsically hard despite its simple problem description.
If I understand the question properly, you can have ONE single algorithm, which will work by accepting one of several reference graphs as its input (in addition to the input of the unknown graph which isomorphism with the reference graph is to be asserted).
It appears that you seek to assert whether a given graph is exactly identical to another graph rather than asserting if the graphs are isomorph relative to a particular set of operations or characteristics. This implies that the algorithm be supplied some specific reference graph, rather than working off some set of "abstract" rules such as whether neither graphs have loops, or both graphs are fully connected etc. even though the graphs may differ in some other fashion.
Edit, following confirmation that:
Yeah, the algorithm would be supplied a reference graph (which is the answer), and will then check the user's graph to see if it is isomorphic (including the values of edges and nodes) to the reference
In that case, yes, it is quite possible to develop a relatively simple algorithm which would assert isomorphism of these two graphs. Note that the considerations mentioned in other remarks and answers and relative to the fact that the problem may be NP-Hard are merely indicative that a simple algorithm [or any algorithm for that matter] may not be sufficient to solve the problem in a reasonable amount of time for graphs which size and complexity are too big. However, assuming relatively small graphs and taking advantage (!) of the requirement that the weights of edges and nodes also need to match, the following algorithm should generally be applicable.
General idea:
For each sub-graph that is disconnected from the rest of the graph, identify one (or possibly several) node(s) in the user graph which must match a particular node of the reference graph. By following the paths from this node [in an orderly fashion, more on this below], assert the identity of other nodes and/or determine that there are some nodes which cannot be matched (and hence that the two structures are not isomorphic).
Rough pseudo code:
1. For both the reference and the user supplied graph, make the the list of their Connected Components i.e. the list of sub-graphs therein which are disconnected from the rest of the graph. Finding these connected components is done by following either a breadth-first or a depth-first path from starting at a given node and "marking" all nodes on that path with an arbitrary [typically incremental] element ID number. Once a given path has been fully visited, repeat the operation from any other non-marked node, and do so until there are no more non-marked nodes.
2. Build a "database" of the characteristics of each graph.
This will be useful to identify matching candidates and also to determine, early on, instances of non-isomorphism.
Each "database" would have two kinds of "records" : node and edge, with the following fields, respectively:
- node_id, Connected_element_Id, node weight, number of outgoing edges, number of incoming edges, sum of outgoing edges weights, sum of incoming edges weight.
node
- edge_id, Connected_element_Id, edge weight, node_id_of_start, node_id_of_end, weight_of_start_node, weight_of_end_node
3. Build a database of the Connected elements of each graph
Each record should have the following fields: Connected_element_id, number of nodes, number of edges, sum of node weights, sum of edge weights.
4. [optionally] Dispatch the easy cases of non-isomorphism:
4.a mismatch of the number of connected elements
4.b mismatch of of number of connected elements, grouped-by all fields but the id (number of nodes, number of edges, sum of nodes weights, sum of edges weights)
5. For each connected element in the reference graph
5.1 Identify candidates for the matching connected element in the user-supplied graph. The candidates must have the same connected element characteristics (number of nodes, number of edges, sum of nodes weights, sum of edges weights) and contain the same list of nodes and edges, again, counted by grouping by all characteristics but the id.
5.2 For each candidate, finalize its confirmation as an isomorph graph relative to the corresponding connected element in the reference graph. This is done by starting at a candidate node-match, i.e. a node, hopefully unique which has the exact same characteristics on both graphs. In case there is not such a node, one needs to disqualify each possible candidate until isomorphism can be confirmed (or all candidates are exhausted). For the candidate node match, walk the graph, in, say, breadth first, and by finding matches for the other nodes, on the basis of the direction and weight of the edges and weight of the nodes.
The main tricks with this algorithm is are to keep proper accounting of the candidates (whether candidate connected element at higher level or candidate node, at lower level), and to also remember and mark other identified items as such (and un-mark them if somehow the hypothetical candidate eventually proves to not be feasible.)
I realize the above falls short of a formal algorithm description, but that should give you an idea of what is required and possibly a starting point, would you decide to implement it.
You can remark that the requirement of matching nodes and edges weights may appear to be an added difficulty for asserting isomorphism, effectively simplify the algorithm because the underlying node/edge characteristics render these more unique and hence make it more likely that the algorithm will a) find unique node candidates and b) either quickly find other candidates on the path and/or quickly assert non-isomorphism.