Cyclomatic Complexity with multiple & different returns - cyclomatic-complexity

I am validating a tool that calculates Cyclomatic Complexity and have had some trouble figuring out how it handles multiple returns that return different variables. The articles I've come across do handle multiple returns, but the returns are all of the same variable, and I am unsure whether this makes a difference when calculating CC. Take this basic example below:
if (!is_enabled(instance))
    return ACCESS;
if (instance->current_cmd != NULL)
    return BUSY;
return 0;
I've run two tools on this and get different results. CCCC gives a CC value of 5, while Metrix++ gives a value of 2 (I've already determined that Metrix++ reports a CC value 1 less than usual because it does not count the default path). Manually I get either a CC value of 3 or a CC value of 5, depending on whether the flow graph should have a single end point or three end points.
Which is the correct flow chart/method for calculating Cyclomatic Complexity?
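For what it's worth, here is the textbook arithmetic for this snippet (my own working, not either tool's internals). Note that which value a return hands back never affects CC; only branching does. A sketch, assuming all three returns are wired to a single merged exit node:
edges = 7  # if1->ret1, if1->if2, if2->ret2, if2->ret3, plus three edges into the exit
nodes = 6  # if1, if2, the three return statements, and one merged exit node
P = 1      # one connected component
print(edges - nodes + 2 * P)  # M = E - N + 2P = 3, matching decisions + 1 = 2 + 1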

Related

Lua: What is the typical approach for using calculated values in a for loop?

What is the typical approach in Lua (before the introduction of integers in 5.3) for dealing with calculated range values in for loops? Mathematical calculations on the start and end values of a numerical for loop put the code at risk of bugs, possibly nasty latent ones, as these will only occur for certain values and/or with changes to calculation ordering. Here's a concocted example of a loop not producing the desired output:
a={"a","b","c","d","e"}
maybethree = 3
maybethree = maybethree / 94
maybethree = maybethree * 94
for i = 1,maybethree do print(a[i]) end
This produces the unfortunate output of two items rather than the desired three (tested on 5.1.4 on 64-bit x86):
a
b
Programmers unfamiliar with this territory might be further confused by print()'s output, as it prints 3!
Applying a rounding function to the nearest whole number could work here. I understand the approximation inherent in floating point and why this fails; I'm interested in what the typical style/solution is for this in Lua.
Related questions:
Lua for loop does not do all iterations
Lua: converting from float to int
The solution is to avoid relying on floating-point math where floating-point precision may become an issue. Or, more realistically, just be aware of when you are using FP and be mindful of the precision issue. This isn't a Lua problem that requires a Lua-specific solution.
maybethree is a misnomer: it is never three. Your code above is deterministic; it will always print just a and b. Since the maybethree variable is less than three, of course the for loop will not execute three times.
The print function is also behaving as defined/expected. Use string.format to show the FP number in all its glory:
print(string.format("%1.16f", maybethree)) -- 2.9999999999999996
Still need to use calculated values to control your for loop? Then you already mentioned the answer: implement a rounding function.
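To illustrate the point (a Python sketch of mine; the pitfall is IEEE-754 doubles, not anything Lua-specific):
maybethree = 3 / 94 * 94
print(maybethree)         # 2.9999999999999996 with typical doubles
print(round(maybethree))  # 3 -- a rounding step restores the intended loop bound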

Matlab - Genetic algorithm for mixed integer optimization

The problem that I am trying to solve is based on the following code:
https://www.mathworks.com/help/gads/examples/solving-a-mixed-integer-engineering-design-problem-using-the-genetic-algorithm.html
My function has many more variables, but it is basically the same: I have a set of variables that needs to be optimized under given constraints. Some of the variables have to be discrete; however, since they can only take the values 0 and 1, I don't have to specify them explicitly, as shown in the example. (I have tried both methods, though.)
First I create the upper and lower boundaries, each of which is a 1x193 vector.
[lb,ub] = GWO_LUBGA(n_var,n_comp,C,n_comp);
Afterwards I set up the constraints. As I have discrete values, I cannot use equality constraints, so I am using the workaround proposed here:
http://www.mathworks.com/help/gads/mixed-integer-optimization.html
ObjCon = @(x) funconGA(x,C,ub,n_comp);
Same for the objective function:
ObjFcn = @(x) CostFcnGA(x,C);
Afterwards I pass it over to the genetic algorithm:
[Pos,Best,~,GWO_cg_curve] = ga(ObjFcn,n_var,[],[],[],[],lb,ub,ObjCon,C.T*6+2:C.T*8+1,opts);
with n_var = 193 and C.T=24
When I run this I receive the following error:
Error using ga (line 366)
Dimensions of matrices being concatenated are not consistent.
Line 366 contains the following code. Unfortunately gaminlp cannot be opened.
% Call appropriate single objective optimization solver
if ~isempty(intcon)
    [x,fval,exitFlag,output,population,scores] = gaminlp(FitnessFcn,nvars, ...
        Aineq,bineq,Aeq,beq,lb,ub,NonconFcn,intcon,options,output,Iterate);
Both anonymous functions work when random values are entered. What could be the reason for this error?

Random number generation from 1 to 7

I was going through Google interview questions; one of them asks you to implement random number generation from 1 to 7.
I wrote some simple code, and I would like to understand: if this question were asked to me in an interview and I wrote the code below, would it be acceptable or not?
import time

def generate_rand():
    ret = str(time.time())  # time in seconds, like 12345.1234
    ret = int(ret[-1])
    if ret == 0 or ret == 1:
        return 1
    elif ret > 7:
        ret = ret - 7
        return ret
    return ret

while 1:
    print(generate_rand())
    time.sleep(1)  # just to see the output in STDOUT
(Since the question seems to ask for analysis of issues in the code and not a solution, I am not providing one.)
The answer is unacceptable because:
You need to wait a second for each random number, while many applications need a few hundred at a time. (If the sleep is just for convenience, note that even microsecond granularity will not yield truly random numbers: the last digit increases monotonically until it wraps every 10 µs, and you may get more than a few calls done within such a span, producing a monotonically increasing run of pseudo-random numbers.)
Random numbers should have a uniform distribution: each element should have the same probability in theory. In this case you skew 1 (three of the ten digits — 0, 1, and 8 — map to it) and 2 (both 2 and 9 map to it) compared to the values 3-7.
Typical answers to this sort of question try to take a large range of numbers and distribute it evenly over 1-7. For example, the above method would have worked fine if you had wanted randomness from 1-5, since 10 is evenly divisible by 5. Note that this only solves (2) above.
For (1), there are other sources of randomness, such as /dev/random on a Linux OS.
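As a concrete sketch combining both points (my own Python 3 code; os.urandom draws from the OS entropy pool, e.g. /dev/urandom):
import os

def rand1to7():
    # Rejection sampling: keep only bytes below 252 (the largest multiple of 7
    # in 0-255) so that every residue 0-6 is equally likely.
    while True:
        b = os.urandom(1)[0]
        if b < 252:
            return b % 7 + 1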
You haven't really specified the constraints of the problem you're trying to solve, but if it's from a collection of interview questions it seems likely that it's something like this.
In any case, the answer shown would not be acceptable for the following reasons:
The distribution of the results is not uniform, even if the samples you read from time.time() are uniform.
The results from time.time() will probably not be uniform. The result depends on the time at which you make the call, and if your calls are not uniformly distributed in time then the results will probably not be uniformly distributed either. In the worst case, if you're trying to randomise an array on a very fast processor then you might complete the entire operation before the time changes, so the whole array would be filled with the same value. Or at least large chunks of it would be.
The changes to the random value are highly predictable and can be inferred from the speed at which your program runs. In the very-fast-computer case you'll get a bunch of x followed by a bunch of x+1, but even if the computer is much slower or the clock is more precise, you're likely to get aliasing patterns which behave in a similarly predictable way.
Since you take the time value in decimal, it's likely that the least significant digit doesn't visit all possible values uniformly. It's most likely a conversion from binary to some arbitrary number of decimal digits, and the distribution of the least significant digit can be quite uneven when that happens.
The code should be much simpler. It's a complicated solution with many special cases, which reflects a piecemeal approach to the problem rather than an understanding of the relevant principles. An ideal solution would make the behaviour self-evident without having to consider each case individually.
The last one would probably end the interview, I'm afraid. Perhaps not if you could tell a good story about how you got there.
You need to understand the pigeonhole principle to begin to develop a solution. It looks like you're reducing the time to its least significant decimal digit for possible values 0 to 9. Legal results are 1 to 7. If you have seven pigeonholes and ten pigeons then you can start by putting your first seven pigeons into one hole each, but then you have three pigeons left. There's nowhere that you can put the remaining three pigeons (provided you only use whole pigeons) such that every hole has the same number of pigeons.
The problem is that if you pick a pigeon at random and ask what hole it's in, the answer is more likely to be a hole with two pigeons than a hole with one. This is what's called "non-uniform", and it causes all sorts of problems, depending on what you need your random numbers for.
You would either need to figure out how to ensure that all holes are filled equally, or you would have to come up with an explanation for why it doesn't matter.
Typically the "doesn't matter" answer is that each hole has either a million or a million and one pigeons in it, and for the scale of problem you're working with the bias would be undetectable.
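To put a number on "undetectable" (my own arithmetic, not part of the interview answer): reduce a 32-bit source mod 7 and see how uneven the holes get.
N = 2 ** 32
q, r = divmod(N, 7)  # 613566756 full rounds with 4 values left over
print(q, r)          # residues 0-3 each get one extra pigeon
print(1 / q)         # relative overrepresentation of ~1.6e-9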
Using the same general architecture you've created, I would do something like this:
import time

def generate_rand():
    ret = int(time.time() * 1e6)  # take the time as an integer (microseconds)
    ret = ret % 8  # will return pseudorandom numbers 0-7
    if ret == 0:
        return 1  # or return the result of another call to generate_rand()
    return ret

while 1:
    print(generate_rand())
    time.sleep(1)

Are pseudo random number generators less likely to repeat?

So they say if you flip a coin 50 times and get heads all 50 times, you're still 50/50 on the next flip and 1/4 for the next two. Do you think/know whether this same principle applies to computer pseudo-random number generators? I theorize they're less likely to repeat the same number for long stretches.
I ran this a few times and the results are believable, but I'm wondering how many times I'd have to run it to get an anomalous output.
import random

def genString(iterations):
    mystring = ''
    for _ in range(iterations):
        mystring += str(random.randint(0, 9))
    return mystring

def repeatMax(mystring):
    tempchar = ''
    count = 0
    longest = 0
    for char in mystring:
        if char == tempchar:
            count += 1
            if count > longest:
                longest = count
        else:
            count = 0
        tempchar = char
    return longest

for _ in range(10):
    stringer = genString(10 ** 6)  # example length; pick to taste
    print(repeatMax(stringer))
I got all 7's and a couple of 6's. If I run this 1000 times, will it approximate a normal distribution, or should I expect it to stay relatively predictable? I'm trying to understand the predictability of pseudo-random number generation.
Failure to produce specific patterns is a typical weakness of PRNGs, but the probability of hitting a substantial run of repeated digits at random is so small it's hard to demonstrate that weakness.
It's perfectly reasonable for a PRNG to use only a 32-bit state, which (traditionally) means producing a sequence of four billion numbers and then repeating from the start again. In that case your sequence of 50 coin-flips coming out the same is probably never going to happen (four billion tries at something that has a one in a quadrillion chance is unlikely to succeed); but if it does, then it's going to appear way too often.
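A quick back-of-envelope check of those numbers (my own arithmetic):
p_run = 0.5 ** 50      # chance that a given 50-flip window is all heads
period = 2 ** 32       # one full cycle of a 32-bit-state generator
print(1 / p_run)       # ~1.1e15, the "one in a quadrillion" above
print(period * p_run)  # ~3.8e-06 expected hits per cycle: effectively never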
Superficially you're looking for k-dimensional equidistribution as a test for whether or not you can expect to find a prescribed pattern in the output without deeper analysis of the specific generator. If your generator claims at least 50-dimensional equidistribution then you're guaranteed to see the 50-heads state at least once.
However, if your generator emits 32-bit results but you only test whether each result maps to heads or tails, you have some chance at success even if the generator fails the k-dimension test, and that chance depends on the specifics of the generator and the mapping function.
If you adjust the implementation of your generator to return just one bit at a time, then you have an opportunity to try to squeeze 50 heads out of just 50 bits of state (or potentially as few as 18, but that generator would probably be faulty). Provided the generator visits all 2**50 possible states, one of those states will produce 50 heads in a row. You may get a few more heads when adjacent states start or end with more zeroes.
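To test the "run it 1000 times" intuition empirically, here is a sketch of my own: the longest run of equal digits among n draws concentrates near 1 + log10(n), so repeated trials should cluster tightly rather than drift toward anomalies.
import random

def longest_run(n):
    # Longest run of equal digits among n random digits.
    best, run, prev = 0, 0, None
    for _ in range(n):
        d = random.randint(0, 9)
        run = run + 1 if d == prev else 1
        best = max(best, run)
        prev = d
    return best

print(sorted(longest_run(10 ** 5) for _ in range(10)))  # values cluster around 6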

Aggregating automatically-generated feature vectors

I've got a classification system, which I will unfortunately need to be vague about for work reasons. Say we have 5 features to consider, it is basically a set of rules:
A B C D E Result
1 2 b 5 3 X
1 2 c 5 4 X
1 2 e 5 2 X
We take a subject and get its values for A-E, then try matching the rules in sequence. If one matches we return the first result.
C is a discrete value, which could be any of a-e. The rest are just integers.
The ruleset has been automatically generated from our old system and has an extremely large number of rules (~25 million). The old rules were if statements, e.g.
result("X") if $A >= 1 && $A <= 10 && $C eq 'A';
As you can see, the old rules often do not even use some features, or accept ranges. Some are more annoying:
result("Y") if ($A == 1 && $B == 2) || ($A == 2 && $B == 4);
The ruleset needs to be much smaller as it has to be human maintained, so I'd like to shrink rule sets so that the first example would become:
A B C D E Result
1 2 bce 5 2-4 X
The upshot is that we can split the ruleset by the Result column and shrink each independently. However, I cannot think of an easy way to identify and shrink down the ruleset. I've tried clustering algorithms but they choke because some of the data is discrete, and treating it as continuous is imperfect. Another example:
A B C Result
1 2 a X
1 2 b X
(repeat a few hundred times)
2 4 a X
2 4 b X
(ditto)
In an ideal world, this would be two rules:
A B C Result
1 2 * X
2 4 * X
That is: not only would the algorithm identify the relationship between A and B, but would also deduce that C is noise (not important for the rule)
Does anyone have an idea of how to go about this problem? Any language or library is fair game, as I expect this to be a mostly one-off process. Thanks in advance.
Check out the Weka machine learning lib for Java. The API is a little bit crufty but it's very useful. Overall, what you seem to want is an off-the-shelf machine learning algorithm, which is exactly what Weka contains. You're apparently looking for something relatively easy to interpret (you mention that you want it to deduce the relationship between A and B and to tell you that C is just noise.) You could try a decision tree, such as J48, as these are usually easy to visualize/interpret.
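If Java isn't a hard requirement, the same experiment is quick to run in Python; this sketch uses scikit-learn's CART trees as a stand-in for Weka's J48, with made-up rows and an ad-hoc 0-4 encoding of C:
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical rows in the question's A-E format; C ('a'-'e') encoded as 0-4.
X = [[1, 2, 1, 5, 3], [1, 2, 2, 5, 4], [1, 2, 4, 5, 2], [2, 4, 0, 5, 3]]
y = ['X', 'X', 'X', 'Y']
tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=['A', 'B', 'C', 'D', 'E']))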
Twenty-five million rules? How many features? How many values per feature? Is it possible to iterate through all combinations in practical time? If you can, you could begin by separating the rules into groups by result.
Then, for each result, do the following. Considering each feature as a dimension, and the allowed values for a feature as the metric along that dimension, construct a huge Karnaugh map representing the entire rule set.
The map has two uses. One: research automated implementations of the Quine-McCluskey algorithm, which performs exactly this kind of reduction. A lot of work has been done in this area. There are even a few programs available, although probably none of them will deal with a Karnaugh map of the size you're going to make.
Two: when you have created your final reduced rule set, iterate over all combinations of all values for all features again, and construct another Karnaugh map using the reduced rule set. If the maps match, your rule sets are equivalent.
-Al.
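To get a feel for the reduction step itself, here is a toy sketch of mine (one merge step, not Quine-McCluskey proper): two rules with the same result that differ in exactly one feature collapse into one rule whose differing feature becomes a set of allowed values, and iterating this to a fixed point is the heart of the approach.
def merge_once(r1, r2):
    # Collapse two rules that differ in exactly one feature into a single rule
    # whose differing feature holds the union of the two value sets.
    as_set = lambda v: v if isinstance(v, frozenset) else frozenset([v])
    diffs = [i for i, (a, b) in enumerate(zip(r1, r2)) if as_set(a) != as_set(b)]
    if len(diffs) != 1:
        return None  # zero or two-plus differing features: not mergeable here
    merged = list(r1)
    merged[diffs[0]] = as_set(r1[diffs[0]]) | as_set(r2[diffs[0]])
    return tuple(merged)

# Two rules that differ only in C fold into one rule with C in {b, c}.
print(merge_once((1, 2, 'b', 5, 3), (1, 2, 'c', 5, 3)))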
You could try a neural network approach, trained via backpropagation, assuming you have or can randomly generate (based on the old ruleset) a large set of data that hit all your classes. Using a hidden layer of appropriate size will allow you to approximate arbitrary discriminant functions in your feature space. This is more or less the same idea as clustering, but due to the training paradigm should have no issue with your discrete inputs.
This may, however, be a little too "black box" for your case, particularly if you have zero tolerance for false positives and negatives (although, it being a one-off process, you get an arbitrary degree of confidence by checking a gargantuan validation set).
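As a sketch of that suggestion (scikit-learn's MLP standing in for hand-rolled backprop; the rows are made up, and C is one-hot encoded so the network never treats it as a continuous value):
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows split into the numeric features A, B, D, E and the
# categorical feature C, which gets one-hot encoded.
numeric = np.array([[1, 2, 5, 3], [1, 2, 5, 4], [1, 2, 5, 2], [2, 4, 5, 3]], float)
C = [['b'], ['c'], ['e'], ['a']]
y = ['X', 'X', 'X', 'Y']

X = np.hstack([numeric, OneHotEncoder().fit_transform(C).toarray()])
model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=5000).fit(X, y)
print(model.predict(X))  # ideally ['X' 'X' 'X' 'Y'] once trained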
