Cyclomatic complexity of code with multiple exit points

How do I find the cyclomatic complexity of a function with multiple exit points?
The wiki page gives the formula p - s + 2, where p is the number of decision points and s is the number of exit points.
But shouldn't more exit points increase the cyclomatic complexity, since they may lead to more independent paths?
Cheers,
Aman

CC measures linearly independent paths. Exit points don't ADD paths to the code, they TERMINATE paths, thus reducing CC (or at the very least, they certainly don't increase it).
To put it another way, the ONLY way to add exit points is to ADD more paths (conditions like IF). Otherwise, the code after the 'naked' exit point is unreachable, so it is the conditionals that add complexity, not the exit points.
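As a tiny illustration (a hypothetical function, not the code from the question), note that the only way to add an exit point that is actually reachable is to add a conditional first:

def sign(x):
    if x < 0:        # decision point: this is what adds an independent path
        return -1    # early exit: it terminates the path the 'if' created
    return 1         # the remaining exit for the fall-through path
    return 0         # a 'naked' extra exit point: unreachable, adds no path

Deleting the unreachable return changes nothing about the control flow; only another conditional would create a new path.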

Why not try a trial of NDepend? It will calculate cyclomatic complexity and many other code metrics.

Thanks Michael. I had realised my mistake after I posted the question. My mistake stemmed from the observation that JavaNCSS (which uses the source file) and XDepend (which uses the jar file) both seemed to overestimate the CC for a piece of code that had multiple exit points. I have posted the code here.
But using the formula p - s + 2 the answer seems to be 4. Is there some simple explanation I am missing?
@Richard: I have tried XDepend (the non-.NET version of NDepend). It looks like a good tool, but when it uses jar files it overestimates the CC (which they have accepted in their documentation). At this stage I am exploring different tools. Are you aware of a better one?
Cheers.


What does the number in SonarLint actually mean?

I am very new to SonarLint. After installing it and analyzing the file, I see number indications in my code. I searched Google about this but didn't find anything. The version of SonarLint I am using is 7.4, on Spring Tool Suite 4 Version 4.14.1.RELEASE, Build Id: 202204250734.
Considering it has been installed over a million times, someone has surely come across this before. So please guide me to understand it.
What you're seeing is the Complexity metric of your code.
This is defined by Sonar as:
Complexity (complexity)
It is the Cyclomatic Complexity calculated based on the number of paths through the code. Whenever the control flow of a function splits, the complexity counter gets incremented by one. Each function has a minimum complexity of 1. This calculation varies slightly by language because keywords and functionalities do.
With specific details for Java (your code looks like Java):
Keywords incrementing the complexity: if, for, while, case, catch, throw, &&, ||, ?
This means that every number you see in red is an additional path your code takes, increasing the total complexity of your code.
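As a rough illustration of the counting scheme (using Python keywords for brevity; this is not Sonar's exact rule set for any particular language, and the function is made up):

def classify(order):                          # +1: every function starts at 1
    if order is None:                         # +1: if
        raise ValueError("empty order")       #     (Java's rule would also count throw)
    total = 0
    for item in order.items:                  # +1: loop
        if item.price > 0 and item.qty > 0:   # +1: if, +1: boolean operator
            total += item.price * item.qty
    return "big" if total > 100 else "small"  # +1: conditional expression
    # Sonar-style complexity for this function: 6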
The general goal of Clean Code is to keep the complexity of your classes and functions low, in order to increase code readability and cohesion and to follow the single responsibility principle.
The corresponding rule in Sonar is, for example: https://rules.sonarsource.com/java/RSPEC-1541
The configuration of Sonar can be set to allow a maximum complexity number, and your goal would be to reduce the complexity of your code to meet that limit (wherever it makes sense).
References:
https://stackoverflow.com/a/58413756/18699445
https://docs.sonarqube.org/latest/user-guide/metric-definitions

Is there a canonical/performant way to reduce arrays/matrices by removing the border values?

A motivating issue, implemented in Matlab:
N = 1000;
R = zeros(2*N);
for i = 0:N-1
    R = R(2:end-1, 2:end-1);
end
For this code, timeit() gives a time of 2.9793 seconds on my machine. It isn't really great.
By canonical I mean not just an acceptable approach, but a performant implementation that holds up when very large matrices are reduced. I would be very appreciative of any answer, or referrals to other discussions or literature.
As for language, I am not really a programmer; this question is motivated by a mathematical inquiry, and I have encountered performance issues implementing any such reduction process in Matlab. Is there a solution to this in Matlab, or must one delve into the scary depths of C/C++?
One note: one may ask, why not just keep the matrix as is and consider parts of it as needed? To clarify, the reduction process in practice of course depends on the actual (nonzero) values of the elements, e.g. by processing the matrix in 2x2 blocks, and the removal of edge values is needed to prepare the matrix for the next reduction step.
R(2:end-1, 2:end-1) is the correct way of extracting the part of the array that is all values except the ones at the edges. This requires copying the data, so it will take some time. There is no legal way around the copy, and no alternative for extracting a part of the array. (subsref might seem like an alternative, but it is the function that is called internally for the given syntax.)
As for illegal ways, you could try James Tursa's sharedchild from the MATLAB FileExchange. It allows you to create an array that references a subset of the data of another array. James is well known in the MATLAB user community as one of the people reverse-engineering the system and bending it to his will. This is solid code. But every version of MATLAB introduces new changes to the infrastructure, so upgrading MATLAB might break your program if you use this code.
You don't need the for loop. If you want to remove L elements from the borders, simply do:
R=R(L+1:end-L, L+1:end-L)
I am surprised you didn't get an error with that code. I think you should end up with an empty matrix at the end of the loop.
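As an aside, the forced copy is a consequence of MATLAB's indexing semantics. In NumPy, for comparison, the same slice is a view and copies nothing; this is shown purely as a contrast, not as a MATLAB solution:

import numpy as np

N = 1000
R = np.zeros((2 * N, 2 * N))

L = N - 1
trimmed = R[L:-L, L:-L]     # a view into R: no data is copied
assert trimmed.base is R    # confirms it shares R's memory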

What is the difference between Genetic Algorithm and Iterated Local Search Algorithm?

I'm basically trying to use a Genetic Algorithm or an Iterated Local Search algorithm to get an optimal solution for a question. Can someone please explain the basic difference between these two algorithms, and are there any situations where one of them is better than the other?
Let me start with the second question. I believe that there is no way to determine the better algorithm for a given problem without trials and tests. The behavior of an algorithm heavily depends on the problem's properties. If we are talking about complex problems with hundreds or thousands of variables, it's just too difficult to predict anything. I'm not counting an engineer's intuition, deep problem understanding, previous experience, etc.; they are not really measurable.
The main difference between global and local search is quite straightforward: local search considers just one or a few possible solutions at a single point in time and tries to improve them with small modifications, so each iteration it looks at only a small portion of the search space (a local neighbourhood). Global search tries to take the whole problem, with all its parameters, into account at the same time. For example, PSO samples a huge number of candidates and tries to move all of them in the direction of the global optimum using some simple formula.
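To make the loop structures concrete, here is a minimal sketch (with a made-up one-dimensional objective and neighbourhood, so purely illustrative): iterated local search repeatedly hill-climbs a single solution and restarts from a perturbed copy of the best one found so far.

import random

def f(x):                                           # hypothetical objective to minimise
    return (x - 3.2) ** 2

def local_search(x, steps=200):
    for _ in range(steps):
        candidate = x + random.uniform(-0.1, 0.1)   # small move in the local neighbourhood
        if f(candidate) < f(x):
            x = candidate
    return x

def iterated_local_search(restarts=20):
    best = local_search(random.uniform(-10, 10))
    for _ in range(restarts):
        perturbed = best + random.uniform(-2, 2)    # kick to escape the local optimum
        candidate = local_search(perturbed)
        if f(candidate) < f(best):
            best = candidate
    return best

print(iterated_local_search())                      # should land near 3.2

A genetic algorithm, by contrast, keeps a whole population of candidate solutions and produces new ones by recombining and mutating existing members, rather than repeatedly restarting a single trajectory.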

Preventing generation of swastika-like images when generating identicons

I am using this PHP script to generate identicons. It uses Don Park's original identicon algorithm.
The script works great and I have adapted it to my own application to generate identicons. The problem is that sometimes swastikas are generated. While swastikas have peaceful origins, people do take offence when seeing those symbols.
What I would like to do is to alter the algorithm so that swastikas are never generated. I have done a bit of digging and found this thread on Microsoft's website where an employee states that they have added a tweak to prevent generation of swastikas, but nothing more.
Has anyone identified what the tweak would be and how to prevent swastikas from being generated?
Identicons appear to me (at a quick glance) always to have four-fold rotational symmetry. Swastikas certainly do. How about just repeating the quarter-block in a different way? If you take a quarter-block that would produce a swastika in the current pattern, and reflect two diagonally-opposite quarters, then you get a sort of space invader.
Basically, nothing with reflectional symmetry can look very much like a swastika. I suppose if there's a small swastika entirely contained within the quarter, then you still have a problem.
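A small sketch of the geometry (NumPy is used only for the array handling; the actual identicon code is PHP, and the quarter-block here is random, so this is just an illustration of rotating versus reflecting the quarters):

import numpy as np

quarter = np.random.randint(0, 2, (4, 4))    # a hypothetical 4x4 quarter-block

# Current scheme: four-fold rotational symmetry (can form swastika-like shapes).
rotational = np.block([
    [quarter,              np.rot90(quarter, -1)],
    [np.rot90(quarter, 1), np.rot90(quarter, 2)],
])

# Alternative: reflect the quarters instead of rotating them, giving mirror symmetry,
# which cannot resemble a swastika.
reflective = np.block([
    [quarter,              np.fliplr(quarter)],
    [np.flipud(quarter),   np.flipud(np.fliplr(quarter))],
])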
On Jeff Atwood's introducing thread, Don Park suggested:
Re Swastika comments, that can be addressed by applying a specialized OCR-like visual analysis to identify all offending codes then crunch them into an effective bloom filter using genetic algorithm. When the filter returns true, a second type of identicon (i.e. 4-block quilt) can be used.
Alternatively, you could avoid the issue entirely by replacing identicons with unicorns.
My original suggestion involving visual analysis was in the context of the particular algorithm in use, namely the 9-block quilt.
If you want to try another algorithm without the swastika problem, try introducing symmetry, like that seen in inkblots, to the popular 16-block quilt identicons.

What is an efficient way to go beyond a greedy algorithm

The domain of this question is scheduling operations on constrained hardware. The resolution of the result is the number of clock cycles the schedule fits within. The search space grows very rapidly: early decisions constrain future decisions, and the total number of possible schedules grows exponentially. Many of the possible schedules are equivalent, because just swapping the order of two instructions usually results in the same timing constraint.
Basically the question is what is a good strategy for exploring the vast search space without spending too much time. I expect to search only a small fraction but would like to explore different parts of the search space while doing so.
The current greedy algorithm sometimes tends to make stupid decisions early on, and the attempt at branch and bound was beyond slow.
Edit:
I want to point out that the result is very binary: the greedy algorithm might end up using 8 cycles while there exists a solution, found with branch and bound, using only 7.
A second point is that there are significant restrictions on data routing between instructions, and dependencies between instructions, which limit the amount of commonality between solutions. Look at it as a knapsack problem with a lot of ordering constraints, where some solutions also fail completely because of routing congestion.
Clarification:
In each cycle there is a limit on how many operations of each type can run, and some operations have two possible types. There is a set of routing constraints which can be varied to be either fairly tight or pretty forgiving, and the limit depends on routing congestion.
Integer linear optimization for NP-hard problems
Depending on your side constraints, you may be able to use the critical path method or (as suggested in a previous answer) dynamic programming. But many scheduling problems are NP-hard, just like the classical traveling salesman problem: a precise solution has a worst case of exponential search time, just as you describe in your problem.
It's important to know that while NP-hard problems still have a very bad worst-case solution time, there is an approach that very often produces exact answers with very short computations (the average case is acceptable, and you often don't see the worst case).
This approach is to convert your problem to a linear optimization problem with integer variables. There are free-software packages (such as lp-solve) that can solve such problems efficiently.
The advantage of this approach is that it may give you exact answers to NP-hard problems in acceptable time. I used this approach in a few projects.
As your problem statement does not include more details about the side constraints, I cannot go into more detail on how to apply the method.
Edit/addition: Sample implementation
Here are some details about how to implement this method in your case (of course, I make some assumptions that may not apply to your actual problem --- I only know the details from your question):
Let's assume that you have 50 instructions cmd(i) (i=1..50) to be scheduled in 10 or less cycles cycle(t) (t=1..10). We introduce 500 binary variables v(i,t) (i=1..50; t=1..10) which indicate whether instruction cmd(i) is executed at cycle(t) or not. This basic setup gives the following linear constraints:
v(i,t) integer variables
0 <= v(i,t); v(i,t) <= 1     # 1000 constraints: i=1..50; t=1..10
sum(v(i,t): t=1..10) == 1    # 50 constraints: i=1..50
Now, we have to specify your side conditions. Let's assume that operations cmd(1)...cmd(5) are multiplication operations and that you have exactly two multipliers --- in any cycle, you may perform at most two of these operations in parallel:
sum(v_it: i=1..5)<=2 # 10 constraints: t=1..10
For each of your resources, you need to add the corresponding constraints.
Also, let's assume that operation cmd(7) depends on operation cmd(2) and needs to be executed after it. To make the equation a little bit more interesting, let's also require a two-cycle gap between them:
sum(t*v(2,t): t=1..10) + 3 <= sum(t*v(7,t): t=1..10) # one constraint
Note: sum(t*v(2,t): t=1..10) is the cycle t where v(2,t) is equal to one.
Finally, we want to minimize the number of cycles. This is somewhat tricky because you get quite big numbers in the way that I propose: we assign each v(i,t) a price that grows exponentially with time, so pushing operations off into the future is much more expensive than performing them early:
sum(6^t * v(i,t): i=1..50; t=1..10) --> minimum. # one target function
I chose 6 to be bigger than 5 to ensure that adding one cycle to the system makes it more expensive than squeezing everything into fewer cycles. A side effect is that the program will go out of its way to schedule operations as early as possible. You may avoid this by performing a two-step optimization: first, use this target function to find the minimal number of necessary cycles; then, ask the same problem again with a different target function, limiting the number of available cycles at the outset and imposing a more moderate price penalty for later operations. You have to play with this; I hope you get the idea.
Hopefully, you can express all your requirements as such linear constraints on your binary variables. Of course, there may be many opportunities to exploit your insight into your specific problem to get by with fewer constraints or fewer variables.
Then, hand your problem off to lp-solve or cplex and let them find the best solution!
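As a sketch of how the formulation above might look in practice (using the Python library PuLP as one possible front end --- an assumption on my part; the lp-solve or CPLEX file formats work just as well --- and the same hypothetical 50 instructions and 10 cycles):

from pulp import LpProblem, LpVariable, LpMinimize, lpSum, value

I, T = range(1, 51), range(1, 11)            # 50 instructions, 10 cycles
prob = LpProblem("schedule", LpMinimize)

# v[i][t] == 1 iff instruction cmd(i) executes in cycle(t)
v = {i: {t: LpVariable(f"v_{i}_{t}", cat="Binary") for t in T} for i in I}

# Each instruction is scheduled exactly once.
for i in I:
    prob += lpSum(v[i][t] for t in T) == 1

# Resource constraint: cmd(1)..cmd(5) are multiplications, only two multipliers.
for t in T:
    prob += lpSum(v[i][t] for i in range(1, 6)) <= 2

# Dependency: cmd(7) runs at least three cycles after cmd(2).
prob += lpSum(t * v[2][t] for t in T) + 3 <= lpSum(t * v[7][t] for t in T)

# Exponentially growing price pushes work into early cycles.
prob += lpSum((6 ** t) * v[i][t] for i in I for t in T)

prob.solve()
for i in I:
    cycle = next(t for t in T if value(v[i][t]) > 0.5)
    print(f"cmd({i}) -> cycle({cycle})")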
At first blush, it sounds like this problem might fit a dynamic programming solution. Several operations may take the same amount of time, so you might end up with overlapping subproblems.
If you can map your problem to the "travelling salesman" problem (e.g. find the optimal sequence to run all operations in minimum time), then you have an NP-complete problem.
A very quick way to solve that is the ant algorithm (or ant colony optimization).
The idea is that you send an ant down every path. The ant spreads a smelly substance on the path, which evaporates over time. Shorter paths mean the scent will be stronger when the next ant comes along, and ants prefer smelly paths over clean ones. Run thousands of ants through the network; the smelliest path is the optimal one (or at least very close).
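A toy sketch of that reinforcement/evaporation loop on a small random instance (the distances and parameters are made up, and a real ant colony optimizer has more machinery; this only shows the core idea):

import random

random.seed(1)
n = 8
dist = [[0 if i == j else random.uniform(1, 10) for j in range(n)] for i in range(n)]
pher = [[1.0] * n for _ in range(n)]           # pheromone ("smell") on every edge

def tour_length(tour):
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))

best_tour, best_len = None, float("inf")
for _ in range(2000):                          # one ant per iteration
    tour, unvisited = [0], set(range(1, n))
    while unvisited:
        cur = tour[-1]
        options = list(unvisited)
        # Ants prefer smelly (high pheromone) and short edges.
        weights = [pher[cur][j] / dist[cur][j] for j in options]
        nxt = random.choices(options, weights)[0]
        tour.append(nxt)
        unvisited.remove(nxt)
    length = tour_length(tour)
    if length < best_len:
        best_tour, best_len = tour, length
    # Evaporate everywhere, then reinforce the edges this ant walked.
    pher = [[0.95 * p for p in row] for row in pher]
    for k in range(n):
        a, b = tour[k], tour[(k + 1) % n]
        pher[a][b] += 1.0 / length
        pher[b][a] += 1.0 / length

print(best_len, best_tour)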
Try simulated annealing, cf. http://en.wikipedia.org/wiki/Simulated_annealing .
