Understanding McCabe's Cyclomatic Complexity

McCabe's cyclomatic complexity measure is computed as follows:
V = e - n + 2
where e = number of edges and n = number of nodes.
Why is 2 added to (e - n) in order to obtain the result?

It seems to be similar to Euler's formula for planar graphs, and since I'm no fan of McCabe, I suppose they use it to seem more science-y.
I say "similar" because, if 'n' in the question is the number of nodes, I don't see why that would be different from the number of vertices in Euler's formula.
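As a quick, minimal illustration (my own example, not from the question): the general form of the measure is V = e - n + 2P, where P is the number of connected components, and P = 1 for a single routine. For a function consisting of a single if/else, the control-flow graph has four nodes and four edges, giving V = 2:

def cyclomatic_complexity(edges, nodes, components=1):
    # McCabe's measure: V = E - N + 2P
    return edges - nodes + 2 * components

# Hypothetical CFG of a single if/else:
# nodes: entry, then-branch, else-branch, exit            -> 4 nodes
# edges: entry->then, entry->else, then->exit, else->exit -> 4 edges
print(cyclomatic_complexity(edges=4, nodes=4))  # 2 independent paths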


Longest Simple Path In Sparse Cyclic Graphs

Given: an unweighted, directed graph G = (V, E), which can contain any number of cycles.
Goal: for every vertex, I want the longest simple path to some target vertex X in V.
Algorithm Idea:
For each v in V
    v.distanceToTarget = DepthFirstSearch(v)
Next

DepthFirstSearch(v as Vertex)
    if v = target then
        'Distance towards the target is 0 for the target itself
        return 0
    elseif v.isVisitedInCurrentDFSPath then
        'Cycle found -> I won't find the target when I go around in cycles -> abort
        return -infinity
    else
        'Return the maximum distance over all successors, + 1
        v.isVisitedInCurrentDFSPath = true
        result = max(v.Successors.Select(Function(w) DepthFirstSearch(w))) + 1
        v.isVisitedInCurrentDFSPath = false
        return result
    end if
Is this correct for all cases? (Assuming that the target can be reached from every vertex.)
The number of edges in my graphs is very small.
Assume |E| <= 3*|V| holds. How would I compute the average time complexity?
Thanks!
Time complexity is about which quantities influence your runtime the most. In your case you are evaluating all possible paths between v and the target, so the runtime is basically O(number of routes). Now you need to figure out how to express the number of all possible routes in terms of E and V.
Most likely the result will be something like O(exp(E)) or O(exp(V)), because the number of routes going through each node/vertex grows exponentially as you add new possible routes.
EDIT: I missed the detail that you were asking for average time complexity, which I read as amortized complexity. But since your algorithm always evaluates all possible routes, the worst-case complexity is the same as the average complexity.
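To make the "number of routes" intuition concrete, here is a small sketch (my own addition, not part of the answer) that counts the simple paths from a vertex to the target; the DFS in the question does essentially this much work:

def count_simple_paths(graph, v, target, visited=None):
    # graph is an adjacency dict {vertex: [successors]}; counts cycle-free paths v -> target
    if visited is None:
        visited = set()
    if v == target:
        return 1
    if v in visited:
        return 0  # abort on cycles, as the pseudocode above does
    visited.add(v)
    total = sum(count_simple_paths(graph, w, target, visited) for w in graph.get(v, []))
    visited.remove(v)
    return total

On dense graphs this count, and hence the runtime, grows exponentially with the number of vertices.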

Big O for a specific exponential-complexity case

Consider an algorithm to find all paths between two nodes in a directed, acyclic, unweighted graph that may contain more than one edge between the same two vertices. (This DAG is just an example; I'm not discussing this case specifically, so disregard its correctness, though it's correct, I think.)
We have two factors affecting the complexity:
mc: the maximum number of outgoing edges from a vertex.
ml: the length of the longest path, measured in number of edges.
Solving the problem iteratively, where "complexity" below stands for the count of processing operations performed:
for the first iteration the complexity = mc
for the second iteration the complexity = mc*mc
for the third iteration the complexity = mc*mc*mc
for the (max length path)th iteration the complexity = mc^ml
Total worst complexity is (mc + mc*mc + ... + mc^ml).
1 - Can we say it's O(mc^ml)?
2 - Is this exponential complexity? As far as I know, in exponential complexity the variable only appears in the exponent, not in the base.
3 - Are mc and ml both variables in my algorithm's complexity?
There's a faster way to achieve the answer, in O(V + E), but it seems your question is about calculating the complexity, not about optimizing the algorithm.
Yes, it seems it's O(mc^ml).
Yes, they both can be variables in your algorithm's complexity.
As for the complexity of your algorithm: let's do a transformation, using the fact that a^b = e^(b*ln(a)):
mc^ml = (e^ln(mc))^ml = e^(ml*ln(mc)) < e^(ml*mc) if ml, mc -> infinity
So, basically, your algorithm's complexity upper bound is O(e^(ml*mc)), but we can still simplify it to see whether it's really exponential complexity. Assume that ml, mc <= N, where N is, let's say, max(ml, mc). Then:
e^(ml*mc) <= e^(N^2),
and e^(N^2) dominates C^N for every constant C, so the bound grows at least exponentially in N, where N itself grows no faster than linearly in ml and mc. So, basically, yes, it is exponential complexity.
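Regarding point 1, here is a quick worked bound (my own addition, not part of the answer) showing why the full sum mc + mc^2 + ... + mc^ml is still O(mc^ml): it is a geometric series dominated by its last term. Assuming mc >= 2,

\sum_{i=1}^{ml} mc^{\,i} \;=\; \frac{mc^{\,ml+1} - mc}{mc - 1} \;\le\; \frac{mc}{mc - 1}\, mc^{\,ml} \;\le\; 2\, mc^{\,ml}.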

Maximum non-overlapping intervals in an interval tree

Given a list of time intervals, I need to find a maximum set of non-overlapping intervals.
For example,
if we have the following intervals:
[0600, 0830], [0800, 0900], [0900, 1100], [0900, 1130],
[1030, 1400], [1230, 1400]
Also, it is given that times have to be in the range [0000, 2400].
The maximum non-overlapping set of intervals is [0600, 0830], [0900, 1130], [1230, 1400].
I understand that maximum set packing is NP-complete. I want to confirm whether my problem (with intervals containing only a start and end time) is also NP-complete.
And if so, is there a way to find an optimal solution in exponential time but with smarter preprocessing and pruning of the data? Or is there a relatively easy-to-implement fixed-parameter tractable algorithm? I don't want to go for an approximation algorithm.
This is not an NP-complete problem. I can think of an O(n * log(n)) algorithm using dynamic programming to solve it.
Suppose we have n intervals. Suppose the given range is S (in your case, S = [0000, 2400]). Either assume all intervals are within S, or eliminate all intervals not within S in linear time.
Pre-process:
Sort all intervals by their begin points. Suppose we get an array A[n] of n intervals.
This step takes O(n * log(n)) time
For the end point of each interval, find the index of the smallest begin point that follows it. Suppose we get an array Next[n] of n integers.
If no such begin point exists for the end point of interval i, we may assign n to Next[i].
We can do this in O(n * log(n)) time by enumerating the n end points of all intervals and using a binary search to find each answer. Maybe a linear approach exists, but it doesn't matter, because the previous step already takes O(n * log(n)) time.
DP:
Suppose the maximum number of non-overlapping intervals in the range [A[i].begin, S.end] is f[i]. Then f[0] is the answer we want.
Also let f[n] = 0.
State transition equation:
f[i] = max{f[i+1], 1 + f[Next[i]]}
It is quite obvious that the DP step takes linear time.
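A rough Python sketch of this DP (my own; the function name is an assumption, and bisect is used for the Next[] computation):

import bisect

def max_nonoverlapping_dp(intervals, s=(0, 2400)):
    # keep only intervals inside the range S and sort by begin point
    a = sorted(iv for iv in intervals if iv[0] >= s[0] and iv[1] <= s[1])
    begins = [b for b, _ in a]
    n = len(a)
    # Next[i]: index of the first interval whose begin point is at or after the end of interval i
    nxt = [bisect.bisect_left(begins, a[i][1]) for i in range(n)]
    f = [0] * (n + 1)  # f[n] = 0
    for i in range(n - 1, -1, -1):
        f[i] = max(f[i + 1], 1 + f[nxt[i]])  # either skip interval i, or take it
    return f[0]

print(max_nonoverlapping_dp(
    [(600, 830), (800, 900), (900, 1100), (900, 1130), (1030, 1400), (1230, 1400)]
))  # 3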
The above DP solution is the one I came up with at first glance at the problem. After that, I also thought of a greedy approach which is simpler (but not faster in the sense of big-O notation):
(With the same notation and assumptions as the DP approach above)
Pre-process: Sort all intervals by their end points. Suppose we get an array B[n] of n intervals.
Greedy:
int ans = 0, cursor = S.begin;
for(int i = 0; i < n; i++){
    if(B[i].begin >= cursor){
        ans++;
        cursor = B[i].end;
    }
}
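Here is a rough runnable version of the same greedy idea in Python (my own sketch, using the intervals from the question):

def max_nonoverlapping_greedy(intervals):
    # sort by end point; take every interval whose begin point is at or after
    # the end of the last chosen interval
    ans, cursor = 0, float("-inf")
    for begin, end in sorted(intervals, key=lambda iv: iv[1]):
        if begin >= cursor:
            ans += 1
            cursor = end
    return ans

print(max_nonoverlapping_greedy(
    [(600, 830), (800, 900), (900, 1100), (900, 1130), (1030, 1400), (1230, 1400)]
))  # 3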
The above two solutions came out of my own head, but your problem is also referred to as the activity selection problem, which is described on Wikipedia: http://en.wikipedia.org/wiki/Activity_selection_problem.
Also, Introduction to Algorithms discusses this problem in depth in Section 16.1.

Computing Combinations

I am facing difficulty coming up with a solution for the problem given below:
We are given n boxes, each having a weight (meaning each ball in box B_i has weight C_i).
Each box contains some balls, specifically
{b_1, b_2, b_3, ..., b_n} (b_i is the count of balls in box B_i).
We have to choose m balls out of them such that the sum of the weights of the m chosen balls is less than a given number T.
How many ways are there to do it?
First, let's have a look at a similar problem:
In the similar problem, you are looking to maximize the sum (such that it is still smaller than T); this is a variation of the subset-sum problem, which is NP-hard. The variation with a constant number of items is discussed in this thread: Sum-subset with a fixed subset size.
An alternative way to look at the problem is as a 2-dimensional knapsack problem, where weight = cost, with an extra dimension for the number of elements. This concept is discussed in this thread: What's the fastest way to solve knapsack prob with two properties
Now, look at your problem: finding the number of possible ways to achieve a sum which is smaller than or equal to T is still NP-hard.
Assume you had a polynomial algorithm to do it; call it A.
Running A(T) and A(T-1) gives you two numbers; if A(T) > A(T-1), the answer to the subset-sum problem is true, otherwise it is false. So, given a polynomial solution to this counting problem, we could prove P = NP.
You can solve it by using dynamic programming techniques.
Let f[i][j][k] denote the number of ways to choose j balls from B_1 to B_i with the sum of weights exactly k.
Initially, let f[0][0][0] = 1 and every other entry be 0.
for i = 1 to n
    for j = 0 to m
        for k = 0 to T
            for x = 0 to min(b_i, j)   # choose x balls from B_i
                y = x * C_i
                if y <= k
                    f[i][j][k] += f[i-1][j-x][k-y] * Comb(b_i, x)
The answer you want is the sum of f[n][m][k] over all k < T.
Comb(n,k) is the number of ways to choose k elements from n elements.
The time complexity is O(n * m * T * b), where b is the maximum number of balls in a box.
Note that, because T appears in the bound, this is only pseudo-polynomial, so it does not contradict the NP-hardness argument above. However, in practice, when T is relatively small, this algorithm is perfectly feasible.
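A rough Python sketch of this DP (my own; the function name is an assumption, and weights/counts are assumed to be non-negative integers), rolling the i dimension to save memory:

from math import comb

def count_ways(C, b, m, T):
    # C[i] = per-ball weight of box B_i, b[i] = number of balls in B_i
    # counts ways to choose m balls with total weight strictly less than T
    f = [[0] * (T + 1) for _ in range(m + 1)]  # f[j][k]: j balls, total weight exactly k
    f[0][0] = 1
    for i in range(len(C)):
        g = [[0] * (T + 1) for _ in range(m + 1)]
        for j in range(m + 1):
            for k in range(T + 1):
                if f[j][k] == 0:
                    continue
                for x in range(0, min(b[i], m - j) + 1):  # take x balls from B_i
                    y = x * C[i]
                    if k + y > T:
                        break
                    g[j + x][k + y] += f[j][k] * comb(b[i], x)
        f = g
    return sum(f[m][k] for k in range(T))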

Algorithm to max(min(matching))?

Given two sets A and B of equal size N, and a weighting assigning a real number to each of the N^2 entries of the cross-product A x B, we want to form a perfect matching of A and B such that the lowest weight in the matching is maximized.
As an example, suppose we are organizing a horse race and we have 10 jockeys and 10 horses, and each jockey has a different expected speed riding each horse. We have to pick which jockey rides which horse such that the slowest jockey/horse pair of this match-up is as fast as possible.
Take the weights

    i  j  k
a   9  1  2
b   4  3  1
c   7  3  5

Here the "max-min matching" is { (a,i), (b,j), (c,k) }, with a value of 3.
What is an algorithm to calculate this matching and what is its complexity?
This answer shows how to build an O(n^2 * sqrt(n) * log(n)) solution for this problem.
Naive slow algorithm:
First, note that a naive O(n^4 * sqrt(n)) approach iteratively uses a matching algorithm on the bipartite graph which models the problem, looking for the "highest" set of edges that cannot be removed (meaning: looking for the maximal edge that will be minimal in a matching).
The graph is G = (V, E), where V = A ∪ B and E = A x B.
The algorithm is:
sort the edges according to their weight
while there is a perfect matching:
    remove the edge with the smallest value
return the weight of the last removed edge
Correctness explanation:
It is easy to see that the value is not smaller than the weight of the last removed edge, because just before removing it there is a matching that uses it and no smaller edge.
It is also not higher, because once this edge is removed there is no matching.
Complexity:
Running the matching algorithm O(n^2) times, where each run is O(|E| * sqrt(|V|)) = O(n^2 * sqrt(n)), yields a total of O(n^4 * sqrt(n)).
We would like to reduce the O(n^2) factor, since the matching algorithm itself can probably not be avoided.
Optimizing:
Note that the algorithm is actually looking for where to "cut" the sorted list of edges. We are looking for the smallest edge that must stay in the list in order to still obtain a matching.
One can apply binary search here, where each "compare" actually checks whether there is a matching, and you are looking for the "highest" cut that still yields a matching. This results in O(log(n^2)) = O(log n) iterations of the matching algorithm, giving a total of O(n^2 * sqrt(n) * log(n)).
High-level optimized algorithm:
// the compare op
checkMatching(edges, i):
    edges' <- edges
    remove from edges' all the elements with index j < i
    check if there is a matching in the graph
    return 1 if there is, 0 otherwise

// the algorithm
find_max_min(vertices, edges):
    sort edges by weight, ascending
    binary search in edges for the largest index i such that checkMatching(edges, i) yields 1
    remove from edges all the elements with index j < i
    return a matching of the modified edges
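As a rough Python sketch of the whole approach (my own; it uses Kuhn's augmenting-path algorithm as the matching check instead of Hopcroft-Karp, so each check is O(V*E) rather than O(E*sqrt(V)), but the binary-search structure is the same):

def max_min_matching_value(weight):
    # binary search on the bottleneck value: the answer is the largest w such that
    # the bipartite graph keeping only edges with weight >= w still has a perfect matching
    n = len(weight)

    def has_perfect_matching(threshold):
        match_of_b = [-1] * n  # match_of_b[j] = row matched to column j, or -1

        def try_augment(a, seen):
            for b in range(n):
                if weight[a][b] >= threshold and b not in seen:
                    seen.add(b)
                    if match_of_b[b] == -1 or try_augment(match_of_b[b], seen):
                        match_of_b[b] = a
                        return True
            return False

        return all(try_augment(a, set()) for a in range(n))

    # candidate bottleneck values are the edge weights themselves;
    # with the smallest threshold every edge is allowed, so a matching always exists
    values = sorted({w for row in weight for w in row})
    lo, hi = 0, len(values) - 1
    while lo < hi:  # find the largest feasible threshold
        mid = (lo + hi + 1) // 2
        if has_perfect_matching(values[mid]):
            lo = mid
        else:
            hi = mid - 1
    return values[lo]

# The example from the question:
print(max_min_matching_value([[9, 1, 2], [4, 3, 1], [7, 3, 5]]))  # 3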
This problem is a typical bipartite matching problem. You can have a look at the Hungarian method or the KM (Kuhn-Munkres) algorithm to solve it.
