How to prune this type of sorted weighted trees to maximize this particular function? - algorithm

Disclaimer #1: I'm not a pro, so many of my nomenclatures might be not standard or useful. Please bear with me / edit me.
Disclaimer #2: As the tags suggest, this may start out as a theoretical question, but I think it's a programming one, though some theory would also be nice.
First, let me describe this type of sorted weighted trees, now called SWR trees. Let T = (V, E, W, U, m, r) be an SWR tree. The only defining properties of T are:
T is an m-ary rooted tree with root r, and every leaf has the same height/level in T
T has predefined and unchanged weights on edges, defined by the function W: E -> R+ (R+ is the set of positive real numbers)
T has predefined and unchanged weights on leaves, defined by the function U: V_L -> R+ (V_L is the set of leaves in V)
For each non-leaf node v of T, its children are sorted in the increasing values of the edges connecting them to v
Now, let me describe the function on T, now called F(T). F will produce a number on T as follows:
Extend the function U to U*: V -> R+ as follows: for each non-leaf node v, assign to v the largest value of the child edges of v (the edges connecting v to its children)
For each height/level h of T, calculate f(h) as the minimum value of the vertices (defined by U*) at that height/level
Sum all of the f(h) to get F(T)
Also, let me describe the proper pruning process on T. Consider the pruning of the edges. When an edge is pruned, its sub-tree is removed. Not only that, all of its larger edges (and their sub-trees) are also removed (keep in mind, due to the sorting, only consider the larger sibling edges). Hence, the remaining tree T' is still an SWR tree and properly inherits all properties from T. Obviously, F(T') has changed (even U* and f have changed).
Therefore, the problem arises. Given an SWR tree T, how can one properly prune it to get an SWR tree T' with the maximum value of F?
Disclaimer #3: I'm aware that the problem seems to have fallen from the sky and is rather messy. Please feel free to reformulate it as you like. Also, just formulating the problem exhausted me a bit, so I have no handle on solving it yet.
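To make the definition of F concrete, here is a minimal Python sketch that evaluates F on a small SWR tree. The Node class and the name swr_value are made up for illustration and are not part of the question; children are assumed to be stored as (edge weight, child) pairs in increasing order of edge weight.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    leaf_weight: float = 0.0  # U(v); only meaningful for leaves
    # (edge_weight, child) pairs, sorted by increasing edge weight
    children: list = field(default_factory=list)


def swr_value(root):
    """Return F(T): the sum over levels of the minimum U* value on that level."""
    total = 0.0
    level = [root]
    while level:
        values, next_level = [], []
        for v in level:
            if v.children:
                # U*(v) for a non-leaf: largest weight among its child edges
                values.append(max(w for w, _ in v.children))
                next_level.extend(child for _, child in v.children)
            else:
                values.append(v.leaf_weight)
        total += min(values)
        level = next_level
    return total


a = Node(children=[(2, Node(5)), (4, Node(1))])
b = Node(children=[(1, Node(7))])
tree = Node(children=[(1, a), (3, b)])
pruned = Node(children=[(1, a)])  # the root's edge of weight 3 has been pruned
print(swr_value(tree), swr_value(pruned))
```

In this example F(T) = 3 + 1 + 1 = 5, and pruning the root's edge of weight 3 gives F(T') = 1 + 4 + 1 = 6, so pruning can indeed increase F.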

Let's first simplify your problem definition slightly by removing the leaf weights. Since none of the weights are negative, we can put a single child under each leaf and move each leaf's weight to its new child edge.
I can write down what seems like a pretty tight integer program that captures this problem. For each edge e, the variable x[e] is 1 if we keep the edge, 0 otherwise. The variable y[e] is 1 if W(e) is the value counted for e's level, that is, the minimum over the sibling groups on that level of the largest kept edge in each group, and 0 otherwise.
maximize sum_{e} W(e) y[e]
subject to
for all e, x[e] ∈ {0, 1}
for all e, y[e] ∈ {0, 1}
for all e sibling of e' with W(e) ≤ W(e'), x[e'] − x[e] ≤ 0
for all e parent of e', x[e'] − x[e] ≤ 0
for all levels ℓ, for all e at level ℓ, for all p at level ℓ−1, y[e] + x[p] − sum_{e' child of p with W(e) ≤ W(e')} x[e'] ≤ 1
for all levels ℓ, sum_{e at level ℓ} y[e] = 1
The two groups of inequalities on x (sibling order and parent/child) enforce the restrictions on pruning. The next constraint group says, essentially, that an edge cannot be the minimum value of the maximum sibling on its level unless each sibling group on its level either has a kept edge at least as valuable or is totally gone. The final constraint is only needed to break ties.
This formulation can be solved as is with an integer program solver, but I strongly suspect that there's a more efficient algorithm.

Related

Choosing k vertices from binary tree vertices set such that sum of cost edges in this new k vertices subset is minimum

Given a binary tree T with a weight function on its edge set w : E -> Z
(Note the weight can be negative too) and a positive integer k. For
a subset T' of V (T), cost(T') is defined as the sum of the weights of
edges (u, v) such that u, v ∈ T'.
Give an algorithm to find a subset T'
of exactly k vertices that minimizes cost(T').
How to solve this problem using dynamic programming?
It is usually easier to solve dynamic programming problems top down. They are often more efficient if solved bottom up. But I'll just get you going on the easier version.
Fill out this recursive function:
def min_cost(root, count_included, include_root):
    # root is the root of the subtree
    # count_included is how many nodes in the subtree to include in T'
    # include_root is whether root will be in T'.
    #
    # It will return the minimum cost, or math.inf if no solution exists
    ...
And with that, your answer will be:
min(min_cost(root_of_tree, k, True), min_cost(root_of_tree, k, False))
This is slow, so memoize for the top down.
I will leave the bottom up as a more difficult exercise.
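One possible way to fill in the top-down version is sketched below. It is only an illustration under assumptions that the answer does not state: the tree is given as a dict mapping each node to a list of (child, edge_weight) pairs, and the wrapper name min_cost_subset is made up here. The inner min_cost follows the signature from the answer and is memoized with functools.lru_cache.

```python
import math
from functools import lru_cache


def min_cost_subset(children, root, k):
    """children: dict node -> list of (child, edge_weight); missing key means leaf."""

    @lru_cache(maxsize=None)
    def min_cost(v, count, include_v):
        # Minimum cost of picking exactly `count` vertices in the subtree of v,
        # where v itself is picked if and only if include_v is True.
        need = count - 1 if include_v else count
        if need < 0:
            return math.inf
        # best[c]: min cost of picking c vertices in the child subtrees seen so far
        best = [0.0] + [math.inf] * need
        for child, w in children.get(v, []):
            new = [math.inf] * (need + 1)
            for already in range(need + 1):
                if best[already] == math.inf:
                    continue
                for take in range(need - already + 1):
                    skip_child = min_cost(child, take, False)
                    # the edge (v, child) only counts when both endpoints are picked
                    pick_child = min_cost(child, take, True) + (w if include_v else 0)
                    cand = best[already] + min(skip_child, pick_child)
                    new[already + take] = min(new[already + take], cand)
            best = new
        return best[need]

    return min(min_cost(root, k, True), min_cost(root, k, False))


# Path a - b - c with edge weights 5 and -3.
children = {"a": [("b", 5)], "b": [("c", -3)]}
print(min_cost_subset(children, "a", 2))
```

For k = 2 the best subset is {b, c} with cost -3. If k exceeds the number of vertices, the function returns math.inf.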

Why, when we change the cost of every edge in G to c' = log17(c), is every MST in G still an MST in G′ (and vice versa)?

Remarks: c' is the logarithm of c with base 17.
MST means minimum spanning tree.
It's easy to prove the conclusion is correct when we use a linear function to transform the cost of every edge.
But the log function is not a linear function, so I could not understand why this conclusion is correct.
Supplementary notes:
I did not consider specific algorithms, such as the greedy algorithm. I simply considered the relationship between the sums of the weights of the two trees after the transformation.
Numerically, if (a + b) > (c + d), (log a + log b) may not be > (log c + log d).
If one spanning tree of G has two edges a and b, another spanning tree of G has edges c and d, a + b < c + d, and the first tree is an MST, then in the transformed graph G' the sum of the edge weights of the second tree may be smaller.
Because of this, I wanted to construct a counterexample based on "if (a + b) > (c + d), (log a + log b) may not be > (log c + log d)", but I failed.
One way to characterize when a spanning tree T is a minimum spanning tree is that, for every edge e not in T, the cycle formed by e and edges of T (the fundamental cycle of e with respect to T) has no edge more expensive than e. Using this characterization, I hope you see how to prove that transforming the costs with any increasing function preserves minimum spanning trees.
There's a one line proof that this condition is necessary. If the fundamental cycle contained a more expensive edge, we could replace it with e and get a spanning tree that costs less than T.
It's less obvious that this condition is sufficient, since at first glance it looks like we're trying to prove global optimality from a local optimality condition. To prove this statement, let T be a spanning tree that satisfies the condition, let T' be a minimum spanning tree, and let G' be the graph whose edges are the union of the edges of T and T'. Run Kruskal's algorithm on G', breaking ties by favoring edges in T over edges not in T. Let T'' be the resulting minimum spanning tree in G'. Since T' is a spanning tree in G', the cost of T'' is not greater than T', hence T'' is a minimum spanning tree in G as well as G'.
Suppose to the contrary that T'' ≠ T. Then there exists an edge in T but not in T''. Let e be the first such edge considered by Kruskal's algorithm. At the time that e was considered, it formed a cycle C in the edges that had been selected from T''. Since T is acyclic, C \ T is nonempty. By the tie breaking criterion, we know that every edge in C \ T costs less than e. Observing that some edge e' in C \ T must have one endpoint in each of the two connected components of T \ {e}, we infer that the fundamental cycle of e' with respect to T contains e, which violates the local optimality condition. In conclusion, T = T'', hence is a minimum spanning tree in G.
If you want a deeper dive, this logic gets abstracted out in the theory of matroids.
Well, it's pretty easy to understand... let's see if I can break it down for you:
c` = log_17(c) // here 17 is base
log may not be linear function...but we can say that:
log_b(x) > log_b(y) if x > y and b > 1 (and of course x > 0 and y > 0)
I hope you get the inequality I've written. In words, it means: consider a base "b" such that b > 1; then log_b(x) is greater than log_b(y) if x > y.
So, if we apply this rule to the edge costs of G, the relative order of the edges does not change, so the edges that were selected for an MST of G are still the cheapest choice for an MST of G' if c' = log_17(c) // here 17 is base.
UPDATE: As I can see you have a problem understanding the proof, I'm elaborating a bit:
I guess you know that MST construction is greedy. We're going to use Kruskal's algorithm to prove why the claim is correct. (In case you don't know how Kruskal's algorithm works, you can look it up; there are plenty of resources.) Now, let me write some steps of Kruskal's edge selection for an MST of G:
// the following edges are sorted by cost..i.e. c_0 <= c_1 <= c_2 ....
c_0: A, F // here, edge c_0 connects A, F, we've to take the edge in MST
c_1: A, B // it is also taken to construct MST
c_2: B, R // it is also taken to construct MST
c_3: A, R // we won't take it to construct to MST, cause (A, R) already connected through A -> B -> R
c_4: F, X // it is also taken to construct MST
...
...
so on...
Now, when constructing MST of G', we've to select edges which are in the form c' = log_17(c) // where 17 is base
Now, if we convert the edges using log of base 17, then c_0 becomes c_0', c_1 becomes c_1' and so on...
But we, know that:
log_b(x) > log_b(y) if x > y and b > 1 (and of course x > 0 and y > 0)
So, we may say that,
log_17(c_0) <= log_17(c_1), cause c_0 <= c_1
in general,
log_17(c_i) <= log_17(c_j), where i <= j
And now, we may say:
c_0` <= c_1` <= c_2` <= c_3` <= ....
So, the edge selection process to construct MST of G' would be:
// the following edges are sorted by cost..i.e. c_0` <= c_1` <= c_2` ....
c_0`: A, F // here, edge c_0` connects A, F, we've to take the edge in MST
c_1`: A, B // it is also taken to construct MST
c_2`: B, R // it is also taken to construct MST
c_3`: A, R // we won't take it to construct to MST, cause (A, R) already connected through A -> B -> R
c_4`: F, X // it is also taken to construct MST
...
...
so on...
Which is same as MST of G...
That proves the theorem ultimately....
I hope you get it...if not ask me in the comment what is not clear to you...
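The argument above can be checked on a small example. The following is a minimal sketch, not a proof: the graph, the vertex numbering and the function name kruskal are made up for illustration, and the edge costs are chosen to be distinct and positive so that the MST is unique and the logarithm is defined.

```python
import math


def kruskal(num_vertices, edges, cost):
    """edges: list of (u, v, c); cost: function applied to c before sorting."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = set()
    for u, v, c in sorted(edges, key=lambda e: cost(e[2])):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so Kruskal keeps it
            parent[ru] = rv
            tree.add((u, v))
    return tree


edges = [(0, 1, 2.0), (1, 2, 3.0), (0, 2, 10.0),
         (2, 3, 4.0), (1, 3, 50.0), (0, 3, 300.0)]
original = kruskal(4, edges, lambda c: c)
transformed = kruskal(4, edges, lambda c: math.log(c, 17))
print(original == transformed)
```

Because the logarithm with base 17 is increasing, both runs sort the edges in the same order and therefore select the same edges.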

Encoding directed graph as numbers

Let's say that I have a directed graph, with a single root and without cycles. I would like to add a type on each node (for example as an integer with some custom ordering) with the following property:
if Node1.type <= Node2.type then there exists a path from Node1 to Node2
Note that topological sorting actually satisfies the reversed property:
if there exists a path from Node1 to Node2 then Node1.type <= Node2.type
so it cannot be used here.
Now note that integers with natural ordering cannot be used here because every 2 integers can be compared, i.e. the ordering of integers is linear while the ordering induced by the graph does not have to be.
So here's an example. Assume that the graph has 4 nodes A, B, C, D and 4 arrows:
A->B, A->C, B->D, C->D
So it's a diamond. Now we can put
A.type = 00
B.type = 01
C.type = 10
D.type = 11
where on the right side we have integers in binary format. The comparison is defined bitwise:
(X <= Y) if and only if (n-th bit of X <= n-th bit of Y for all n)
So I guess such ordering could be used, the question is how to construct values from a given graph? I'm not even sure if the solution always exists. Any hints?
UPDATE: Since there is some misunderstanding about the terminology I'm using, let me be more explicit: I'm interested in a directed acyclic graph such that there is exactly one node without predecessors (a.k.a. the root) and there's at most one arrow between any two nodes. The diamond would be an example. It does not have to have one leaf (i.e. the node without successors). Each node might have multiple predecessors and multiple successors. You might say that this is a partially ordered set with a smallest element (i.e. a unique globally minimal element).
You call the relation <=, but it's not necessarily total (that is: it may be that for a given pair a and b, neither a <= b nor b <= a).
Here's one idea for how to define it.
If your nodes are numbered 0, 1..., N-1, then you can define type like this:
type(i) = (1 << i) + sum(1 << (N + j), for j such that Path(i, j))
And define <= like this:
type1 <= type2 if (type1 >> N) & type2 != 0
That is, type(i) encodes the value of i in the lowest N bits, and the set of all reachable nodes in the highest N bits. The <= relation looks for the target node in the encoded set of reachable nodes.
This definition works whether or not there are cycles in the graph, and in fact just encodes an arbitrary relation on your set of nodes.
You could make the definition a little more efficient by using ceil(log2(N)) bits to encode the node number (for a total of N + ceil(log2(N)) bits per type).
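Here is a small Python sketch of this encoding, applied to the diamond from the question with A, B, C, D numbered 0 to 3. The helper names are made up for illustration, and Path(i, j) is assumed to include the trivial path from a node to itself so that the relation is reflexive.

```python
def reachable_sets(n, arcs):
    """reach[i] = set of nodes reachable from i (including i itself)."""
    succ = {i: [] for i in range(n)}
    for u, v in arcs:
        succ[u].append(v)
    reach = []
    for start in range(n):
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in succ[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        reach.append(seen)
    return reach


def encode_types(n, arcs):
    # low n bits: the node's own index; high n bits: its reachable set
    reach = reachable_sets(n, arcs)
    return [(1 << i) + sum(1 << (n + j) for j in reach[i]) for i in range(n)]


def leq(type1, type2, n):
    # is the node encoded in type2 inside the reachable set stored in type1?
    return ((type1 >> n) & type2) != 0


N = 4
types = encode_types(N, [(0, 1), (0, 2), (1, 3), (2, 3)])
print(leq(types[0], types[3], N), leq(types[1], types[2], N))
```

Here A <= D holds, while B and C are incomparable in both directions.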
For any DAG, you can define x <= y as "there's a path from x to y". This relation is a partial order. I take it that the question is how to represent this relation efficiently.
For each vertex X, define ¡X to be the set of vertices reachable from X (including X itself). The two statements
¡X is a subset of ¡Y
X is reachable from Y
are equivalent.
Encode these sets as bitsets (N-bit binary numbers), and you are set.
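A short Python sketch of the bitset idea, again on the diamond with nodes numbered 0 to 3. The function names are made up for illustration, and the recursion assumes the graph is acyclic.

```python
from functools import lru_cache


def reach_bitsets(n, arcs):
    """Return a list whose x-th entry is the reachable set of x as an n-bit integer."""
    succ = [[] for _ in range(n)]
    for u, v in arcs:
        succ[u].append(v)

    @lru_cache(maxsize=None)
    def reach(x):
        bits = 1 << x  # x reaches itself
        for y in succ[x]:
            bits |= reach(y)  # terminates only because the graph has no cycles
        return bits

    return [reach(x) for x in range(n)]


def is_reachable(x, y, bitsets):
    """True if x is reachable from y, i.e. the set of x is a subset of the set of y."""
    return (bitsets[x] & bitsets[y]) == bitsets[x]


bitsets = reach_bitsets(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
print(is_reachable(3, 0, bitsets), is_reachable(2, 1, bitsets))
```

The subset test is a single AND plus a comparison, at the price of N bits per node.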
The question said (and continues to say) that the input is a tree, but a later edit contradicted this with an example of a diamond graph. In such non-tree cases, my algorithm below won't apply.
The existing answers work for general relations on general directed graphs, which inflates their representation sizes to O(n) bits for n vertices. Since you have a tree, a shorter O(log n)-bit representation is possible.
In a tree directed away from the root, for any two vertices u and v, the sets of leaves L(u) and L(v) reachable from u and v, respectively, must either be disjoint, or one must be a subset of the other. If they are disjoint, then u is not reachable from v (and vice versa); if one is a proper subset of the other, the one with the smaller set is reachable from the other (and in this case, the one with the smaller set will necessarily have strictly greater depth). If L(u) = L(v), then u is reachable from v if and only if depth(v) < depth(u), where depth(u) is the number of edges on the path from the root to u. (In particular, if L(u) = L(v) and depth(u) = depth(v), then u = v.)
We can encode this relationship concisely by noticing that all leaves reachable from a given vertex v occupy a contiguous segment of the leaves output by an inorder traversal of the tree. For any given vertex v, this set of leaves can therefore be represented by a pair of integers (first, last), with first identifying the first leaf (in inorder traversal order) and last the last. The test for whether a path exists from i to j is then very simple -- in pseudo-C++:
bool doesPathExist(int i, int j) {
    return x[i].first <= x[j].first && x[i].last >= x[j].last && depth[i] <= depth[j];
}
Note that if every non-leaf vertex in the tree has at least 2 children, then you don't need to bother with depths, since L(u) = L(v) implies u = v in this case. (My original version of the post made this assumption; I've now fixed it to work even when this is not the case.)
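As a sketch of how the (first, last) pairs and depths could be computed, here is a Python version that numbers the leaves in left-to-right depth-first order. The tree representation (a dict from node to its list of children) and the function names are assumptions made for this example.

```python
def label_tree(children, root):
    """Return (first, last, depth) dicts for a rooted tree given as node -> children."""
    first, last, depth = {}, {}, {}
    next_leaf = 0

    def visit(v, d):
        nonlocal next_leaf
        depth[v] = d
        kids = children.get(v, [])
        if not kids:
            # a leaf covers exactly its own position in the leaf order
            first[v] = last[v] = next_leaf
            next_leaf += 1
            return
        for c in kids:
            visit(c, d + 1)
        first[v] = first[kids[0]]
        last[v] = last[kids[-1]]

    visit(root, 0)
    return first, last, depth


def path_exists(i, j, first, last, depth):
    return first[i] <= first[j] and last[i] >= last[j] and depth[i] <= depth[j]


# r has children a and b; a has the single child c; c has children d and e.
children = {"r": ["a", "b"], "a": ["c"], "c": ["d", "e"]}
first, last, depth = label_tree(children, "r")
print(path_exists("a", "c", first, last, depth), path_exists("c", "a", first, last, depth))
```

Nodes a and c cover the same leaves, so only the depth comparison tells them apart, which is the case the depth check was added for.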

Path finding algorithm on graph considering both nodes and edges

I have an undirected graph. For now, assume that the graph is complete. Each node has a certain value associated with it. All edges have a positive weight.
I want to find a path between any 2 given nodes such that the sum of the values associated with the path nodes is maximum while at the same time the path length is within a given threshold value.
The solution should be "global", meaning that the path obtained should be optimal among all possible paths. I tried a linear programming approach but am not able to formulate it correctly.
Any suggestions or a different method of solving would be of great help.
Thanks!
If you are looking for an algorithm on a general graph, your problem is NP-hard. Assume every edge has weight 1, the path length threshold is n−1, and each vertex has value 1. If you could solve your problem, you could decide whether the given graph has a Hamiltonian path between the two given nodes: the maximum-value path has value n exactly when such a Hamiltonian path exists. I think you can use something like the Held-Karp relaxation to find a good solution.
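Given this hardness, an exact answer is realistic only for small graphs. The sketch below is a plain exhaustive search over simple paths, not the Held-Karp relaxation mentioned above; the function name, the adjacency-dict format and the example graph are assumptions made for illustration.

```python
def best_path(adj, value, source, target, threshold):
    """adj: dict u -> {v: edge_weight}. Returns (best total value, path) or (-inf, None)."""
    best = (float("-inf"), None)

    def extend(path, length, gained):
        nonlocal best
        u = path[-1]
        if u == target:
            if gained > best[0]:
                best = (gained, list(path))
            return  # a simple path to the target ends here
        for v, w in adj[u].items():
            # keep the path simple and within the length threshold
            if v not in path and length + w <= threshold:
                path.append(v)
                extend(path, length + w, gained + value[v])
                path.pop()

    extend([source], 0, value[source])
    return best


adj = {
    "s": {"t": 1, "a": 1, "b": 3},
    "a": {"s": 1, "t": 1, "b": 1},
    "b": {"s": 3, "t": 2, "a": 1},
    "t": {"s": 1, "a": 1, "b": 2},
}
value = {"s": 1, "a": 5, "b": 10, "t": 1}
print(best_path(adj, value, "s", "t", 4))
```

The running time is exponential in the number of nodes, which is consistent with the hardness argument; each node on the path is counted once.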
This might not be perfect, but if the threshold value (T) is small enough, there's a simple algorithm that runs in O(n^3 T^2). It's a small modification of Floyd-Warshall.
d = int array with size n x n x (T + 1)
initialize all d[i][j][k] to -infty
for i in nodes:
    d[i][i][0] = value[i]
for e:(u, v) in edges:
    d[u][v][w(e)] = value[u] + value[v]
for t in 1 .. T:
    for k in nodes:
        for t' in 1..t-1:
            for i in nodes:
                for j in nodes:
                    d[i][j][t] = max(d[i][j][t],
                                     d[i][k][t'] + d[k][j][t-t'] - value[k])
The result is the pair (i, j) with the maximum d[i][j][t] for all t in 0..T
EDIT: this assumes that the paths are allowed to be not simple, they can contain cycles.
EDIT2: This also assumes that if a node appears more than once in a path, it will be counted more than once. This is apparently not what OP wanted!
Integer program (this may be a good idea or maybe not):
For each vertex v, let x_v be 1 if vertex v is visited and 0 otherwise. For each arc a, let y_a be the number of times arc a is used. Let s be the source and t be the destination. The objective is
maximize ∑_v value(v) x_v.
The constraints are
∑_a weight(a) y_a ≤ threshold
∀v, ∑_{a has head v} y_a − ∑_{a has tail v} y_a = −1 if v = s; 1 if v = t; 0 otherwise (conserve flow)
∀v ≠ s, x_v ≤ ∑_{a has head v} y_a (must enter a vertex to visit)
∀v, x_v ≤ 1 (visit each vertex at most once)
∀v ∉ {s, t}, ∀ cuts S that separate vertex v from {s, t}, x_v ≤ ∑_{a such that tail(a) ∉ S ∧ head(a) ∈ S} y_a (benefit only from vertices not on isolated loops).
To solve, do branch and bound with the relaxation values. Unfortunately, the last group of constraints are exponential in number, so when you're solving the relaxed dual, you'll need to generate columns. Typically for connectivity problems, this means using a min-cut algorithm repeatedly to find a cut worth enforcing. Good luck!
If you just add the weight of a node to the weights of its outgoing edges you can forget about the node weights. Then you can use any of the standard algorithms for the shortest path problem.

find the minimum size dominating set for a tree using greedy algorithm

Dominating Set (DS) := given an undirected graph G = (V, E), a set of
vertices S ⊆ V is a dominating set if for every vertex v in V \ S, there is a vertex in
S that is adjacent to v. The entire vertex set V is a trivial dominating set in
any graph.
Find minimum size dominating set for a tree.
I'll attempt to prove this in a more formal way.
OUTLINE
To prove your greedy algorithm is correct, you need to prove two things:
First, that your greedy choice is valid and can always be used in the formation of an optimal solution, and
second, that your problem has an optimal substructure property, that is, you can form an optimal solution from optimal solutions to subproblems of your own problem.
Greedy Choice: In your tree T = (V, E), find a vertex v in the tree with the highest number of adjacent leaves. Add it to your dominant set.
Optimal Substructure
T' = (V', E') such that:
V' = V \ ({a : a ϵ V, a is adjacent to v, and a's degree ≤ 2} ∪ {v})
E' = E - any edge involving any of the removed vertices
In other words
Look for a vertex with the highest number of leaves, remove any of its adjacent vertices with degree less than or equal to 2, then remove v itself, and add it to your dominant set. Repeat this until you have no vertices left.
PROOF
Greedy choice proof
For any leaf l, it must be that either itself or its parent is in the dominant set. In our case, the vertex v we would have chosen is in this situation.
Let A = {v1 , v2 , ... , vk} be a minimum dominant set of T. If A already has v as member, we are done. If it does not, we see two situations:
v has some neighbouring leaf l. Then, l must be part of the dominant set, otherwise our set is not dominating the entire tree. We can simply thus form A' = {A - {l} + {v}} and still be a dominant set. Since |A'| = |A|, A' is still optimal.
v does not have any neighbouring leaves l. Then, because v was chosen such that it has the highest number of leaves, then no vertex in T have any leaves. Then T is not a tree. Contradiction.
Thus, we will always be able to form an optimal solution with our greedy choice.
Optimal Substructure proof
Suppose that A is a minimum dominant set for T = (V, E), but that A' = A \ {v} is not a minimum dominant set for T' as defined above.
Make a minimum dominant set for T', call it B. By assumption, |B| < |A'|. It can be shown that B' = B ∪ {v} is a dominating set for T. Then, since |A'| = |A| - 1 and |B'| = |B| + 1, we get |B'| < |A|. This is contradictory, since we assumed that A is a minimum dominant set. Thus it must be that A' is also a minimum dominant set of T'.
Proving B' = B ∪ {v} is a dominating set for T:
v may have had adjacent vertices that are not in T'. We will show that any vertices that were not considered in T' will be dominated by vertices in B' (this means that we picked our set optimally): Let y be some vertex adjacent to v and not in T'. By definition of T', y can only have degree 1 or 2. Now, y is dominated by v. If y is a leaf, then we are done. However, if y is of degree 2, then y is connected to another node which is necessarily in the dominant set of B. This is because, when we removed v to make T', the degree of y became 1, meaning that y or its parent was necessarily added to the dominant set. Hence, B' is a dominant set for T.
1- Always start from the leaves
2- Add their parents to DS and cut the children
3- Mark the parents of the selected parents as already dominated
4- After completing the process, check whether those marked nodes have any children that are not dominated, and add those children to DS
Good luck
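The leaf-first steps above can be sketched in Python as follows. This is one common way to make the idea precise and not necessarily what either answer's author had in mind: vertices are processed from the deepest level upwards, and whenever a vertex is still undominated, its parent (or the vertex itself, if it is the root) is added to the set. The function name and the adjacency-dict input are assumptions made for this example.

```python
from collections import deque


def tree_dominating_set(adj, root):
    """adj: dict vertex -> list of neighbours of a tree. Returns a dominating set."""
    parent = {root: None}
    order = []
    queue = deque([root])
    while queue:  # breadth-first search to get parents and a top-down order
        v = queue.popleft()
        order.append(v)
        for u in adj[v]:
            if u not in parent:
                parent[u] = v
                queue.append(u)

    chosen, dominated = set(), set()
    for v in reversed(order):  # deepest vertices first
        if v in dominated:
            continue
        pick = parent[v] if parent[v] is not None else v
        chosen.add(pick)
        dominated.add(pick)
        dominated.update(adj[pick])
    return chosen


path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(tree_dominating_set(path, 1), tree_dominating_set(star, 0))
```

On the path with five vertices this picks two vertices, and on the star it picks only the centre.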
