Finding the minimum number of calls on a tree - algorithm

I was asked this question in an interview and struggled to answer it correctly in the time allotted. Nonetheless, I thought it was an interesting problem, and I hadn't seen it before.
Suppose you have a tree where the root can call (on the phone) each of it's children, when a child receives the call, he can call each of his children, etc. The problem is that each call must be done in a number of rounds, and we need to minimize the number of rounds it takes to make the calls. For example, suppose you have the following tree:
A
/ \
/ \
B D
|
|
C
One solution is for A to call D in round one, A to call B in round two, and B to call C in round three. The optimal solution is for A to call B in round one, and A to call D and B to call C in round two.
Note that A cannot call both B and D in the same round, nor can any node call more than one of its children in the same round. However, multiple nodes with a different parent can call simultaneously. For example, given the tree:
A
/ | \
/ | \
B C D
/\ |
/ \ |
E F G
We can have a sequence (where - separates rounds), such as:
A B - B E, A D - B F, A C, D G
(A calls B first round, B calls E and A calls D second, ...)
I'm assuming some type of dynamic programming can be used, but I'm not sure which direction to take this in. My initial inclination is to use DFS to order the longest path from the root to leaves in decreasing order, but when it comes to the nodes actually making calls, I'm not sure how we can achieve optimality given any tree, not how we can output the paths that the optimal calls would make (i.e. in the first example we could output
A B - B C, A D

I think something like this could get the optimal solution:
suppose the value of 'calls' for each of leaves is 1
for each node get the value of calls for all of his children and rank them according to their 'calls' value
consider rank of each child as 'ranks'
to compute the value of 'calls' for each node loop over his children (after computing their ranks) and find the maximum value of 'calls' + 'ranks'
'calls' value of the root node is the answer
It's sorta dynamic programming on trees and you can implement it recursively like this:
int f(node v)
{
int s = 0;
for each u in v.children
{
d[u] = f(u)
}
sort d and rank its values in r (r for the maximum u would be 1)
for each u in v.children
{
s = max(s, d[u] + r[u] + 1)
}
return s
}
Good Luck!

Related

data structure for recombining tree: parents are permutations

I am trying to create a data structure which will describe a certain type of recombining "combinations" tree (similar to here).
First, consider a tree where each node has a particular ID which is described by the sequence needed to reach the node. As an example, take a fixed list Q = [1,2,3] and arrange the possible permutations of Q into a tree T based on the sequence diagram S:
S =
_0_
/ | \
1 2 3
/| / \ |\
2 3 1 3 1 2
| | | | | |
3 2 3 1 2 1
Then giving each node a letter, A, B, C,... the tree T can be represented by:
T =
_0_
/ | \
A B C
/| / \ |\
D E F G H I
| | | | | |
J K L M N O
where
A = {1}
B = {2}
C = {3}
D = {1,2}
E = {1,3}
F = {2,1}
G = {2,3}
H = {3,1}
I = {3,2}
J = {1,2,3}
K = {1,3,2}
L = {2,1,3}
M = {2,3,1}
N = {3,1,2}
O = {3,2,1}
Now, my goal is to come up with a data structure such that the tree that recombines in a way where nodes J and L are the same object (ie recombine), and similarly K and N recombine, and finally M and O recombine. The rule for recombining is that their parents D and F, E and H, and G and I contain the same elements, and the next element in the sequence is identical. In more detail, the rule resulting in equivalence between J and L is that their "parents" D and F are set equivalent (= {1,2}). I'm not sure how this would look for larger lists Q...
Do trees like this have a particular name? Are there any existing resources that I should look into, or any places I should start? Thanks!
It's not really useful to store this graph in any kind of data structure, since everything about it can be easily calculated without any kind of storage. You could store it in any kind of directed graph data structure if you really wanted to.
As a mathematical object, I would call it the "power set lattice" of Q, and anyone who knows the words would know what I mean, since it is the complete lattice over the power set.
A picture like the ones you are drawing is the "Hasse Diagram" of the power set: https://demonstrations.wolfram.com/HasseDiagramOfPowerSets/
You could also call it the "minimal deterministic finite automaton" for the set of permutations, but that leads to algorithms that are way more complicated than anything you will need.
Following up from the edit... I suppose it might be better to represent the various combinations at each level, that is, like:
level 0: |Q| choose 0 nodes
level 1: |Q| choose 1 nodes
level 2: |Q| choose 2 nodes
level 3: |Q| choose 3 nodes
and the various interconnections between them so that there 2^(N-1) * N such edges.

How to generate all trees having n-nodes and m-level depth? The branching factor is variable and need not be constant within the tree itself

Quite straightforward question I believe.^
For example a 5 node 3 level algorithm would eliminate
A
|
B
|
C
|
D
|
E
but not
A
/|\
B C D
\
E
which is the same as (atleast to me)
A
/|\
D C B
/
E
For example a 3-Node 2-Level algorithm would generate the following only:
(1) (2) (3)
A B C
/ \ or / \ or / \
B C A C A B
Notice that I consider the tree below to be the same as (2) above:
B
/ \
C A
The basic structure of the solution will be recursive, although the precise details vary depending on the precise equivalence classes you are assuming for the generation. Here you'll need to consider questions such as whether the nodes are labelled or not, and whether the children of a node are ordered or not. Each of these choices defines a very different but equally interesting generation problem.
The essential algorithm is to figure out an enumeration of possible sets of children, obeying the equivalence classes defined for the particular problem by producing only a canonical entry for each equivalence class. For unlabelled trees with ordered children, for example, we might start by enumerating all the ordered partitions of the node count; for a given partition, we would form the Cartesian product of all the different possible subtrees of the indicated sizes. If we dodn't care about the order of the children, we'd need to find some way of making canonical lists of children: we could first sort the subtrees by total size, but then we'd need a canonical order for two subtrees of the same size. This gets tricky, but there are solutions.
I gather that your problem does not just have a number of nodes but rather specific node labels, and therefore two trees with the same shape but different labellings are to be considered distinct. However, you want the order of children to be irrelevant to the comparison. The fact that all nodes have unique labels makes the problem quite tractable: we can enumerate non-empty subsets of the available labels because we know that the trees generated with one subset of the labels must all be distinct from the trees generated with a different subset. Since the subtrees themselves are unordered, we can canonicalise by keeping the subtree roots sorted (any ordering is as good as any other).
So we end up with the procedure GenTrees(R, D, N), where R is the root label, D is the set of descendant labels, and N is the maximum tree depth, which can be defined as follows:
If D is empty, return a singleton set with the tree consisting only of the indicated root node.
(For efficiency) If N is 1, return a singleton set with the tree consisting of the indicated root node with the remaining nodes as leaves. (The less efficient step is "If N is 0, return the empty set." But it's better to shortcut by checking for N == 1.)
Otherwise, start with an empty set of possible trees, and
For every non-empty subset S of D (these are the children of the root node):
For every ordered partition P of D - S into |S| subsets (these are the remaining descendants of each subtree):
Add to the return set all trees with root R whose children are in GenTrees(Si, Pi, N-1). (This is a cartesian product. Here, the ordering of S is arbitrary but must be consistent; sorting the labels into ascending order would be an obvious solution.)
By way of testing the above a little, I wrote a sample implementation in Python. It makes no attempt to be efficient, and is basically a transcription of the above using python generators. (The advantage of generators is that we don't have to fill memory with the millions of possible trees; we just generate one at a time.)
In case it is clearer than the verbiage, here it is:
from functools import reduce
from itertools import product
class tree(object):
def __init__(self, root, kids=()):
self.root = root
self.kids = tuple(kids)
def __str__(self):
'''Produce an S-expr for the tree.'''
if self.kids:
return "(%s %s)" % (self.root,
' '.join(str(kid) for kid in self.kids))
else:
return self.root
def append(part, wherewhat):
part[wherewhat[0]].append(wherewhat[1])
return part
def gentrees(root, labels, N):
if not labels or N <= 1:
yield tree(root, labels if N else ())
else:
for p in product((0,1), repeat = len(labels)):
if any(p):
nexts, roots = reduce(append, zip(p, labels), ([], []))
for q in product(range(len(roots)), repeat = len(nexts)):
kidsets = reduce(append, zip(q, nexts), tuple([] for i in roots))
yield from (tree(root, kids)
for kids in product(*(gentrees(root, rest, N-1)
for root, rest in zip(roots, kidsets))))
def alltrees(labels, N):
for i, root in enumerate(labels):
yield from gentrees(root, labels[:i] + labels[i+1:], N)
The last function iterates over all possible roots and calls gentrees with each one. Here's a sample output, for three nodes and maximum depth 2 (this function counts depth in links, not nodes):
>>> print('\n'.join(map(str,alltrees("ABC",2))))
(A (C B))
(A (B C))
(A B C)
(B (C A))
(B (A C))
(B A C)
(C (B A))
(C (A B))
(C A B)
Worth noting that the count of generated trees increases exponentially with the number of node labels. Unless you are limiting yourself to very small trees, you should probably make some attempt to prune the generation recursion whenever possible (top-down or bottom-up or both).

Do we have to create a tree all the nodes of which have 3 children?

Steps to build Huffman Tree
Input is array of unique characters along with their frequency of occurrences and output is Huffman Tree.
Create a leaf node for each unique character and build a min heap of all leaf nodes (Min Heap is used as a priority queue. The value of frequency field is used to compare two nodes in min heap. Initially, the least frequent character is at root)
Extract two nodes with the minimum frequency from the min heap.
Create a new internal node with frequency equal to the sum of the two nodes frequencies. Make the first extracted node as its left child and the other extracted node as its right child. Add this node to the min heap.
Repeat steps#2 and #3 until the heap contains only one node. The remaining node is the root node and the tree is complete.
At a heap, a node can have at most 2 children, right?
So if we would like to generalize the Huffman algorithm for coded words in ternary system (i.e. coded words using the symbols 0 , 1 and 2 ) what could we do? Do we have to create a tree all the nodes of which have 3 children?
EDIT:
I think that it would be as follows.
Steps to build Huffman Tree
Input is array of unique characters along with their frequency of occurrences and output is Huffman Tree.
Create a leaf node for each unique character and build a min heap of all leaf nodes
Extract three nodes with the minimum frequency from the min heap.
Create a new internal node with frequency equal to the sum of the three nodes frequencies. Make the first extracted node as its left child, the second extracted node as its middle child and the third extracted node as its right child. Add this node to the min heap.
Repeat steps#2 and #3 until the heap contains only one node. The remaining node is the root node and the tree is complete.
How can we prove that the algorithm yields optimal ternary codes?
EDIT 2: Suppose that we have the frequencies 5,9,12,13,16,45.
Their number is even, so we add a dummy node with frequency 0. Do we put this at the end of the array? So, will it be as follows?
Then will we have the following heap?
Then:
Then:
Or have I understood it wrong?
Yes! you have to create all nodes with 3 children. Why 3? you can also have n-ary huffman coding using nodes with n child. The tree will look something like this-(for n=3)
*
/ | \
* * *
/|\
* * *
Huffman Algorithm for Ternary Codewords
I am giving the algorithms for easy reference.
HUFFMAN_TERNARY(C)
{
IF |C|=EVEN
THEN ADD DUMMY CHARACTER Z WITH FREQUENCY 0.
N=|C|
Q=C; //WE ARE BASICALLY HEAPIFYING THE CHARACTERS
FOR I=1 TO floor(N/2)
{
ALLOCATE NEW_NODE;
LEFT[NEW_NODE]= U= EXTRACT_MIN(Q)
MID[NEW_NODE] = V= EXTRACT_MIN(Q)
RIGHT[NEW_NODE]=W= EXTRACT_MIN(Q)
F[NEW_NODE]=F[U]+F[V]+F[W];
INSERT(Q,NEW_NODE);
}
RETURN EXTRACT_MIN(Q);
} //END-OF-ALGO
Why are we adding extra nodes? To make the number of nodes odd.(Why?) Because we want to get out of the for loop with just one node in Q.
Why floor(N/2)?
At first we take 3 nodes. Then replace with it 1 node.There are N-2 nodes.
After that we always take 3 nodes (if not available 1 node,it is never possible to get 2 nodes because of the dummy node) and replace with 1. In each iteration we are reducing it by 2 nodes. So that's why we are using the term floor(N/2).
Check it yourself in paper using some sample character set. You will understand.
CORRECTNESS
I am taking here reference from "Introduction to Algorithms" by Cormen, Rivest.
Proof: The step by step mathematical proof is too long to post here but it is quite similar to the proof given in the book.
Idea
Any optimal tree has the lowest three frequencies at the lowest level.(We have to prove this).(using contradiction) Suppose it is not the case then we could switch a leaf with a higher frequency from the lowest level with one of the lowest three leaves and obtain a lower average length. Without any loss of generality, we can assume that all the three lowest frequencies are the children of the same node. if they are at the same level, the average length does not change irrespective of where the frequencies are). They only differ in the last digit of their codeword (one will be 0,1 or 2).
Again as the binary codewords we have to contract the three nodes and make a new character out of it having frequency=total of three node's(character's) frequencies. Like binary Huffman codes, we see that the cost of the optimal tree is the sum of the tree
with the three symbols contracted and the eliminated sub-tree which had the nodes before contraction. Since it has been proved that the sub-tree has
to be present in the final optimal tree, we can optimize on the tree with the contracted newly created node.
Example
Suppose the character set contains frequencies 5,9,12,13,16,45.
Now N=6-> even. So add dummy character with freq=0
N=7 now and freq in C are 0,5,9,12,13,16,45
Now using min priority queue get 3 values. 0 then 5 then 9.
Add them insert new char with freq=0+9+5 in priority queue. This way continue.
The tree will be like this
100
/ | \
/ | \
/ | \
39 16 45 step-3
/ | \
14 12 13 step-2
/ | \
0 5 9 step-1
Finally Prove it
I will now go to straight forward mimic of the proof of Cormen.
Lemma 1. Let C be an alphabet in which each character d belonging to C has frequency c.freq. Let
x ,y and z be three characters in C having the lowest frequencies. Then there exists
an optimal prefix code for C in which the codewords for x ,y and z have the same
length and differ only in the last bit.
Proof:
Idea
First consider any tree T generating arbitrary optimal prefix code.
Then we will modify it to make a tree representing another optimal prefix such that the characters x,y,z appears as sibling nodes at the maximum depth.
If we can construct such a tree then the codewords for x,y and z will have the same length and differ only in the last bit.
Proof--
Let a,b,c be three characters that are sibling leaves of maximum depth in T .
Without loss of generality, we assume that a.freq < b:freq < c.freq and x.freq < y.freq < z.freq.
Since x.freq and y.freq and z.freq are the 3 lowest leaf frequencies, in order (means there are no frequencies between them) and a.freq
, b.freq and c.freq are two arbitrary frequencies, in order, we have x.freq < a:freq and
y.freq < b.freq and z.freq< c.freq.
In the remainder of the proof we can have x.freq=a.freq or y.freq=b.freq or z.freq=c.freq.
But if x.freq=b.freq or x.freq=c.freq
or y.freq=c.freq
then all of them are same. WHY??
Let's see. Suppose x!=y,y!=z,x!=z but z=c and x<y<z in order and aa<b<c.
Also x!=a. --> x<a
y!=b. --> y<b
z!=c. --> z<c but z=c is given. This contradicts our assumption. (Thus proves).
The lemma would be trivially true. Thus we will assume
x!=b and x!=c.
T1
* |
/ | \ |
* * x +---d(x)
/ | \ |
y * z +---d(y) or d(z)
/|\ |
a b c +---d(a) or d(b) or d(c) actually d(a)=d(b)=d(c)
T2
*
/ | \
* * a
/ | \
y * z
/|\
x b c
T3
*
/ | \
* * x
/ | \
b * z
/|\
x y c
T4
*
/ | \
* * a
/ | \
b * c
/|\
x y z
In case of T1 costt1= x.freq*d(x)+ cost_of_other_nodes + y.freq*d(y) + z.freq*d(z) + d(a)*a.freq + b.freq*d(b) + c.freq*d(c)
In case of T2 costt2= x.freq*d(a)+ cost_of_other_nodes + y.freq*d(y) + z.freq*d(z) + d(x)*a.freq + b.freq*d(b) + c.freq*d(c)
costt1-costt2= x.freq*[d(x)-d(a)]+0 + 0 + 0 + a.freq[d(a)-d(x)]+0 + 0
= (a.freq-x.freq)*(d(a)-d(x))
>= 0
So costt1>=costt2. --->(1)
Similarly we can show costt2 >= costt3--->(2)
And costt3 >= costt4--->(3)
From (1),(2) and (3) we get
costt1>=costt4.-->(4)
But T1 is optimal.
So costt1<=costt4 -->(5)
From (4) and (5) we get costt1=costt2.
SO, T4 is an optimal tree in which x,y,and z appears as sibling leaves at maximum depth, from which the lemma follows.
Lemma-2
Let C be a given alphabet with frequency c.freq defined for each character c belonging to C.
Let x , y, z be three characters in C with minimum frequency. Let C1 be the
alphabet C with the characters x and y removed and a new character z1 added,
so that C1 = C - {x,y,z} union {z1}. Define f for C1 as for C, except that
z1.freq=x.freq+y.freq+z.freq. Let T1 be any tree representing an optimal prefix code
for the alphabet C1. Then the tree T , obtained from T1 by replacing the leaf node
for z with an internal node having x , y and z as children, represents an optimal prefix
code for the alphabet C.
Proof.:
Look we are making a transition from T1-> T.
So we must find a way to express the T i.e, costt in terms of costt1.
* *
/ | \ / | \
* * * * * *
/ | \ / | \
* * * ----> * z1 *
/|\
x y z
For c belonging to (C-{x,y,z}), dT(c)=dT1(c). [depth corresponding to T and T1 tree]
Hence c.freq*dT(c)=c.freq*dT1(c).
Since dT(x)=dT(y)=dT(z)=dT1(z1)+1
So we have `x.freq*dT(x)+y.freq*dT(y)+z.freq*dT(z)=(x.freq+y.freq+z.freq)(dT1(z)+1)`
= `z1.freq*dT1(z1)+x.freq+y.freq+z.freq`
Adding both side the cost of other nodes which is same in both T and T1.
x.freq*dT(x)+y.freq*dT(y)+z.freq*dT(z)+cost_of_other_nodes= z1.freq*dT1(z1)+x.freq+y.freq+z.freq+cost_of_other_nodes
So costt=costt1+x.freq+y.freq+z.freq
or equivalently
costt1=costt-x.freq-y.freq-z.freq ---->(1)
Now we prove the lemma by contradiction.
We now prove the lemma by contradiction. Suppose that T does not represent
an optimal prefix code for C. Then there exists an optimal tree T2 such that
costt2 < costt. Without loss of generality (by Lemma 1), T2 has x and y and z as
siblings.
Let T3 be the tree T2 with the common parent of x and y and z replaced by a
leaf z1 with frequency z1.freq=x.freq+y.freq+z.freq Then
costt3 = costt2-x.freq-y.freq-z.freq
< costt-x.freq-y.freq-z.freq
= costt1 (From 1)
yielding a contradiction to the assumption that T1 represents an optimal prefix code
for C1. Thus, T must represent an optimal prefix code for the alphabet C.
-Proved.
Procedure HUFFMAN produces an optimal prefix code.
Proof: Immediate from Lemmas 1 and 2.
NOTE.: Terminologies are from Introduction to Algorithms 3rd edition Cormen Rivest

minimum weight vertex cover of a tree

There's an existing question dealing with trees where the weight of a vertex is its degree, but I'm interested in the case where the vertices can have arbitrary weights.
This isn't homework but it is one of the questions in the algorithm design manual, which I'm currently reading; an answer set gives the solution as
Perform a DFS, at each step update Score[v][include], where v is a vertex and include is either true or false;
If v is a leaf, set Score[v][false] = 0, Score[v][true] = wv, where wv is the weight of vertex v.
During DFS, when moving up from the last child of the node v, update Score[v][include]:
Score[v][false] = Sum for c in children(v) of Score[c][true] and Score[v][true] = wv + Sum for c in children(v) of min(Score[c][true]; Score[c][false])
Extract actual cover by backtracking Score.
However, I can't actually translate that into something that works. (In response to the comment: what I've tried so far is drawing some smallish graphs with weights and running through the algorithm on paper, up until step four, where the "extract actual cover" part is not transparent.)
In response Ali's answer: So suppose I have this graph, with the vertices given by A etc. and the weights in parens after:
A(9)---B(3)---C(2)
\ \
E(1) D(4)
The right answer is clearly {B,E}.
Going through this algorithm, we'd set values like so:
score[D][false] = 0; score[D][true] = 4
score[C][false] = 0; score[C][true] = 2
score[B][false] = 6; score[B][true] = 3
score[E][false] = 0; score[E][true] = 1
score[A][false] = 4; score[A][true] = 12
Ok, so, my question is basically, now what? Doing the simple thing and iterating through the score vector and deciding what's cheapest locally doesn't work; you only end up including B. Deciding based on the parent and alternating also doesn't work: consider the case where the weight of E is 1000; now the correct answer is {A,B}, and they're adjacent. Perhaps it is not supposed to be confusing, but frankly, I'm confused.
There's no actual backtracking done (or needed). The solution uses dynamic programming to avoid backtracking, since that'd take exponential time. My guess is "backtracking Score" means the Score contains the partial results you would get by doing backtracking.
The cover vertex of a tree allows to include alternated and adjacent vertices. It does not allow to exclude two adjacent vertices, because it must contain all of the edges.
The answer is given in the way the Score is recursively calculated. The cost of not including a vertex, is the cost of including its children. However, the cost of including a vertex is whatever is less costly, the cost of including its children or not including them, because both things are allowed.
As your solution suggests, it can be done with DFS in post-order, in a single pass. The trick is to include a vertex if the Score says it must be included, and include its children if it must be excluded, otherwise we'd be excluding two adjacent vertices.
Here's some pseudocode:
find_cover_vertex_of_minimum_weight(v)
find_cover_vertex_of_minimum_weight(left children of v)
find_cover_vertex_of_minimum_weight(right children of v)
Score[v][false] = Sum for c in children(v) of Score[c][true]
Score[v][true] = v weight + Sum for c in children(v) of min(Score[c][true]; Score[c][false])
if Score[v][true] < Score[v][false] then
add v to cover vertex tree
else
for c in children(v)
add c to cover vertex tree
It actually didnt mean any thing confusing and it is just Dynamic Programming, you seems to almost understand all the algorithm. If I want to make it any more clear, I have to say:
first preform DFS on you graph and find leafs.
for every leaf assign values as the algorithm says.
now start from leafs and assign values to each leaf parent by that formula.
start assigning values to parent of nodes that already have values until you reach the root of your graph.
That is just it, by backtracking in your algorithm it means that you assign value to each node that its child already have values. As I said above this kind of solving problem is called dynamic programming.
Edit just for explaining your changes in the question. As you you have the following graph and answer is clearly B,E but you though this algorithm just give you B and you are incorrect this algorithm give you B and E.
A(9)---B(3)---C(2)
\ \
E(1) D(4)
score[D][false] = 0; score[D][true] = 4
score[C][false] = 0; score[C][true] = 2
score[B][false] = 6 this means we use C and D; score[B][true] = 3 this means we use B
score[E][false] = 0; score[E][true] = 1
score[A][false] = 4 This means we use B and E; score[A][true] = 12 this means we use B and A.
and you select 4 so you must use B and E. if it was just B your answer would be 3. but as you find it correctly your answer is 4 = 3 + 1 = B + E.
Also when E = 1000
A(9)---B(3)---C(2)
\ \
E(1000) D(4)
it is 100% correct that the answer is B and A because it is wrong to use E just because you dont want to select adjacent nodes. with this algorithm you will find the answer is A and B and just by checking you can find it too. suppose this covers :
C D A = 15
C D E = 1006
A B = 12
Although the first two answer have no adjacent nodes but they are bigger than last answer that have adjacent nodes. so it is best to use A and B for cover.

Algorithm for topological sorting if cycles exist

Some programming languages (like haskell) allow cyclic dependencies between modules. Since the compiler needs to know all definitions of all modules imported while compiling one module, it usually has to do some extra work if some modules import each other mutually or any other kind of cycle occurs. In that case, the compiler may not be able to optimize code as much as in modules that have no import cycles, since imported functions may have not yet been analyzed. Usually only one module of a cycle has to be compiled that way, as a binary object has no dependecies. Let's call such a module loop-breaker
Especially if the import cycles are interleaved it is interesting to know, how to minimize the number of loop-breakers when compiling a big project composed of hundreds of modules.
Is there an algorithm that given a set of dependecies outputs a minimal number of modules that need to be compiled as loop-breakers to compile the program successfully?
Example
I try to clarify what I mean in this example.
Consider a project with the four modules A, B, C and D. This is a list of dependencies between these modules, an entry X y means y depends on x:
A C
A D
B A
C B
D B
The same relation visualized as an ASCII-diagram:
D ---> B
^ / ^
| / |
| / |
| L |
A ---> C
There are two cycles in this dependency-graph: ADB and ACB. To break these cycles one could for instance compile modules C and D as loop-breakers. Obviously, this is not the best approach. Compiling A as a loop-breaker is completely sufficient to break both loops and you need to compile one less module as a loop-breaker.
This is the NP-hard (and APX-hard) problem known as minimum feedback vertex set. An approximation algorithm due to Demetrescu and Finocchi (pdf, Combinatorial Algorithms for Feedback Problems in Directed Graphs (2003)") works well in practice when there are no long simple cycles, as I would expect for your application.
Here is how to do it in Python:
from collections import defaultdict
def topological_sort(dependency_pairs):
'Sort values subject to dependency constraints'
num_heads = defaultdict(int) # num arrows pointing in
tails = defaultdict(list) # list of arrows going out
for h, t in dependency_pairs:
num_heads[t] += 1
tails[h].append(t)
ordered = [h for h in tails if h not in num_heads]
for h in ordered:
for t in tails[h]:
num_heads[t] -= 1
if not num_heads[t]:
ordered.append(t)
cyclic = [n for n, heads in num_heads.iteritems() if heads]
return ordered, cyclic
def is_toposorted(ordered, dependency_pairs):
'''Return True if all dependencies have been honored.
Raise KeyError for missing tasks.
'''
rank = {t: i for i, t in enumerate(ordered)}
return all(rank[h] < rank[t] for h, t in dependency_pairs)
print topological_sort('aa'.split())
ordered, cyclic = topological_sort('ah bg cf ch di ed fb fg hd he ib'.split())
print ordered, cyclic
print is_toposorted(ordered, 'ah bg cf ch di ed fb fg hd he ib'.split())
print topological_sort('ah bg cf ch di ed fb fg hd he ib ba xx'.split())
The runtime is linearly proportional to the number of edges (dependency pairs).
The algorithm is organized around a lookup table called num_heads that keeps a count the number of predecessors (incoming arrows). In the ah bg cf ch di ed fb fg hd he ib example, the counts are:
node number of incoming edges
---- ------------------------
a 0
b 2
c 0
d 2
e 1
f 1
g 2
h 2
i 1
The algorithm works by "visting" nodes with no predecessors. For example, nodes a and c have no incoming edges, so they are visited first.
Visiting means that the nodes are output and removed from the graph. When a node is visited, we loop over its successors and decrement their incoming count by one.
For example, in visiting node a, we go to its successor h to decrement its incoming count by one (so that h 2 becomes h 1.
Likewise, when visiting node c, we loop over its successors f and h, decrementing their counts by one (so that f 1 becomes f 0 and h 1 becomes h 0).
The nodes f and h no longer have incoming edges, so we repeat the process of outputting them and removing them from the graph until all the nodes have been visited. In the example, the visitation order (the topological sort is):
a c f h e d i b g
If num_heads ever arrives at a state when there are no nodes without incoming edges, then it means there is a cycle that cannot be topologically sorted and the algorithm exits.

Resources