What does the number in SonarLint actually mean?

I am very new to SonarLint. After installing it and analyzing a file, I see number indications in my code. I searched Google for this but didn't find anything. The version of SonarLint I am using is 7.4, on Spring Tool Suite 4 Version: 4.14.1.RELEASE, Build Id: 202204250734.
Considering the plugin has been installed over a million times, someone else must surely have run into this. So please guide me to understand it.

What you're seeing is the Complexity metric of your code.
This is defined by Sonar as:
Complexity (complexity)
It is the Cyclomatic Complexity calculated based on the number of paths through the code. Whenever the control flow of a function splits, the complexity counter gets incremented by one. Each function has a minimum complexity of 1. This calculation varies slightly by language because keywords and functionalities do.
With specific details for Java (your code looks like Java):
Keywords incrementing the complexity: if, for, while, case, catch, throw, &&, ||, ?
This means that every number you see in red is an additional path your code takes, increasing the total complexity of your code.
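For example, in a hypothetical Java method (made up purely for illustration), each of those keywords adds one on top of the base complexity of 1 for the method itself:

// Cyclomatic complexity 4: 1 (method) + 1 (for) + 1 (if) + 1 (&&)
int countPositiveEven(int[] values) {
    int count = 0;
    for (int v : values) {           // +1 for the loop
        if (v > 0 && v % 2 == 0) {   // +1 for the if, +1 for the &&
            count++;
        }
    }
    return count;
}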
The general goal of Clean Code is to keep the complexity of your classes and functions low, in order to increase readability and cohesion and to follow the single responsibility principle.
The corresponding rule in Sonar is for example: https://rules.sonarsource.com/java/RSPEC-1541
Sonar can be configured with a maximum allowed complexity, and your goal would be to reduce the complexity of your code to meet that limit (wherever it makes sense).
References:
https://stackoverflow.com/a/58413756/18699445
https://docs.sonarqube.org/latest/user-guide/metric-definitions

Related

Difference between Running time and Execution time in algorithm?

I'm currently reading the book CLRS (section 2.2, page 25), in which the author describes the running time of an algorithm as
The running time of an algorithm on a particular input is the number of primitive
operations or “steps” executed.
Also, the author uses the running time to analyze algorithms. Then I referred to a book called Data Structures and Algorithms Made Easy by Narasimha Karumanchi, in which he describes the following.
1.7 Goal of the Analysis of Algorithms
The goal of the analysis of algorithms is to compare algorithms (or solutions) mainly in terms of
running time but also in terms of other factors (e.g., memory, developer effort, etc.)
1.9 How to Compare Algorithms:
To compare algorithms, let us define a few objective measures:
Execution times? Not a good measure as execution times are specific to a particular computer.
Number of statements executed? Not a good measure, since the number of statements varies
with the programming language as well as the style of the individual programmer.
Ideal solution? Let us assume that we express the running time of a given algorithm as a function
of the input size n (i.e., f(n)) and compare these different functions corresponding to running
times. This kind of comparison is independent of machine time, programming style, etc.
As you can see, the CLRS author describes the running time as the number of steps executed, whereas the author of the second book says that the number of statements executed is not a good measure for analyzing algorithms. Also, the running time depends on the computer (my assumption), but the author of the second book says we cannot use execution time to analyze algorithms, as it depends entirely on the computer.
I thought execution time and running time were the same!
So,
What is the real meaning or definition of running time and execution time? Are they the same or different?
Does running time describe the number of steps executed or not?
Does running time depend on the computer or not?
Thanks in advance.
What is the real meaning or definition of running time and execution time? Are they the same or different?
The definition of "running time" in 'Introduction to Algorithms' by C,L,R,S [CLRS] is actually not a time, but a number of steps. This is not what you would intuitively use as a definition. Most would agree that "running" and "executing" are the same concept, and that "time" is expressed in a unit of time (like milliseconds). So while we would normally consider these two terms to have the same meaning, in CLRS they have deviated from that and given a different meaning to "running time".
Does running time describe the number of steps executed or not?
It does mean that in CLRS. But the definition that CLRS uses for "running time" is particular, and not the same as you might encounter in other resources.
CLRS assumes here that a primitive operation (i.e. a step) takes O(1) time.
This is typically true for CPU instructions, which take up to a fixed maximum number of cycles (where each cycle represents a unit of time), but it may not be true in higher level languages. For instance, some languages have a sort instruction. Counting that as a single "step" would give useless results in an analysis.
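To illustrate (a made-up snippet): counting high-level statements can hide very different amounts of work, because one statement may expand to many primitive operations.

import java.util.Arrays;

public class StepCounting {
    public static void main(String[] args) {
        int[] data = {5, 3, 8, 1, 9, 2};

        // One "statement", but Arrays.sort performs on the order of n log n primitive operations.
        Arrays.sort(data);

        // An explicit loop makes the primitive operations visible: roughly n comparisons.
        int max = data[0];
        for (int i = 1; i < data.length; i++) {
            if (data[i] > max) {
                max = data[i];
            }
        }
        System.out.println("max = " + max);
    }
}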
Breaking down an algorithm into its O(1) steps does help to analyse the complexity of an algorithm. Counting the steps for different inputs may only give a hint about the complexity though. Ultimately, the complexity of an algorithm requires a (mathematical) proof, based on the loops and the known complexity of the steps used in an algorithm.
Does running time depend on the computer or not?
Certainly the execution time may differ. This is one of the reasons we want to buy a new computer once in a while.
The number of steps may depend on the computer. If both support the same programming language, and you count steps in that language, then: yes. But if you did the counting more thoroughly and counted the CPU instructions that are actually run by the compiled program, then it might be different. For instance, a C compiler on one computer may generate different machine code than a different C compiler on another computer, so the number of CPU instructions may be lower on one than on the other, even though they result from the same C program code.
Practically however, this counting at CPU instruction level is not relevant for determining the complexity of an algorithm. We generally know the time complexity of each instruction in the higher level language, and that is what counts for determining the overall complexity of an algorithm.

Which is the more important code quality metric: cyclomatic complexity or average fan-out?

I was working on TICS code quality metrics for the first time, and I have this question.
Many suggest breaking large functions into two or more smaller functions in order to keep complexity below 15. Doing so would increase the number of functions called by the given function, hence the average fan-out would increase.
Should we make the decision to decrease fan-out or to decrease cyclomatic complexity? Decreasing cyclomatic complexity would improve maintainability, but splitting a function into two or more functions increases the number of function calls, which would cost more memory.
So which of these two metrics is more important to improve?
I doubt you'll find a definitive answer. In general, I'd advise worrying more about cyclomatic complexity than fan out. You can reduce the cyclomatic complexity of a method using the extract method refactoring. Assuming that the new method that you extracted has a specific purpose that is easy-to-name, you will have replaced detailed logic (the various conditions) with a summary of those conditions (the extracted method). This should make the code easier to read and maintain.
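For example (Order and its getters are hypothetical, purely to illustrate the refactoring):

// Order is a hypothetical value class with weight(), isInternational(), isExpress().

// Before: one method holds all the conditions (cyclomatic complexity 5).
double shippingCost(Order order) {
    double cost;
    if (order.isInternational()) {
        cost = order.weight() > 10 ? 40 : 25;
    } else {
        cost = order.weight() > 10 ? 15 : 8;
    }
    if (order.isExpress()) {
        cost += 12;
    }
    return cost;
}

// After: the original method reads as a summary (complexity 1), and each
// extracted method has an easy-to-name purpose.
double shippingCost(Order order) {
    return baseCost(order) + expressSurcharge(order);
}

private double baseCost(Order order) {
    if (order.isInternational()) {
        return order.weight() > 10 ? 40 : 25;
    }
    return order.weight() > 10 ? 15 : 8;
}

private double expressSurcharge(Order order) {
    return order.isExpress() ? 12 : 0;
}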
All software quality metrics (including cyclomatic complexity and fanout) are at best unreliable measures of quality. I would strongly caution against trying to improve any metric just for the sake of it.
Metrics can, however, be reliably used to draw your attention to code that may need attention. If a class or method scores poorly for some metric, it is worth looking at it and deciding if there really is an issue that should be fixed, but it is developer judgment that should make the final call on what action should be taken.

SonarQube: Qualify Cognitive Complexity

I understand what cognitive complexity is and how it is calculated, but I don't know how to determine what a good value for that measure is, i.e. how complex my code should be allowed to get. I need an objective way to estimate it without comparing projects against each other, a kind of formula like "complexity / lines of code" or something like that.
Or, if I define a quality gate for a big project, how can I calculate the values for it?
At a method level, 15 is a recommended maximum.
At the class level, it depends on what you expect in the package.
For instance, in a package that should only hold classes with fields and simple getters or setters, a class with a Cognitive Complexity over 0 (5? 10?) probably deserves another look.
On the other hand, in a package that holds business logic classes, a class score >= ... 150(?) might indicate that it's time to look at splitting the class.
In terms of what the limit should be for a project, that's unanswerable, and brings us back to Fred Brooks' essential vs accidental complexity. Basically, there's a certain amount of logic that's required to get the job done. Complexity beyond that is accidental and could theoretically be eliminated. Figuring out the difference between the two is the crux of the issue, and in looking for the accidental complexity I would concentrate on methods where the complexity crosses the default threshold of 15.
To answer your initial question, "What should the limit for an application be?", I would say there shouldn't be one. Because the essential complexity for a simple calculator app is far, far lower than for a program on the Space Shuttle. And if you try to make the Space Shuttle program fit inside the calculator threshold, you're absolutely going to break something.
(Disclosure: I'm the primary author of Cognitive Complexity)

Measure expected time to execute any function

Often in machine learning, training consumes a lot of time, and though this is measurable, it is only measurable after training has ended.
Is there some method which can be used to estimate the time it might take to complete the training (or, generally, any function), something like a before_call?
Sure, it depends on the machine and even more on the inputs, but perhaps an approximation based on all the I/O the algorithm will perform, measured on simple inputs and then scaled to the size of the actual inputs. Something like this?
PS - JS, Ruby or any other OO language
PPS - I see that in Oracle there is a way, described here. That is cool. How is it done?
Let Ci be the complexity of the i'th learning step. Let Pi be the probability that the thing to be learned will be learned at or before the i'th step. Let k be the step where Pk > 0.5.
In this case the complexity, C is
C = sum(Pi, i=1,k)
The problem is that k is difficult to find. In this case it is a good idea to have a stored set of previously learned similar patterns and compute their average step number, which will be the median. If the set is large enough, it will be pretty accurate.
Pi = the number of instances when things were learned by step i / total number of instances
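As a rough sketch (the historical step counts below are made up), k can be read off the empirical Pi computed from previous runs:

import java.util.Arrays;

public class MedianStepEstimator {
    public static void main(String[] args) {
        // Step at which each previous, similar run finished learning.
        int[] convergenceSteps = {120, 90, 200, 150, 110, 300, 95, 140};
        int maxStep = Arrays.stream(convergenceSteps).max().getAsInt();

        for (int step = 1; step <= maxStep; step++) {
            final int s = step;
            // Pi = instances learned by step i / total number of instances
            double p = Arrays.stream(convergenceSteps).filter(c -> c <= s).count()
                    / (double) convergenceSteps.length;
            if (p > 0.5) {
                System.out.println("Estimated step k (first step with P > 0.5): " + step);
                break;
            }
        }
    }
}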
Unless you set a time or step-count limit (in which case the estimate is trivial), there is no way to estimate the required time in general.
For example, neural network training is basically a problem of global high-dimensional optimization. In this task you are trying to find a set of parameters for a given loss function such that it returns minimal error. This task belongs to the NP-complete class and is very difficult to solve. A common approach is to randomly change some parameters by a small value in the hope that it will improve overall performance. It works great in practice, but the required runtime can vary greatly from problem to problem. I would recommend reading about NP-completeness, stochastic gradient descent and optimization in general.

What is an efficient way to go beyond a greedy algorithm

The domain of this question is scheduling operations on constrained hardware. The resolution of the result is the number of clock cycles the schedule fits within. The search space grows exponentially: early decisions constrain future decisions, and the total number of possible schedules quickly explodes. A lot of the possible schedules are equivalent, because just swapping the order of two instructions usually results in the same timing constraints.
Basically the question is what is a good strategy for exploring the vast search space without spending too much time. I expect to search only a small fraction but would like to explore different parts of the search space while doing so.
The current greedy algorithm sometimes tends to make poor decisions early on, and my attempt at branch and bound was far too slow.
Edit:
I want to point out that the result is very binary: the greedy algorithm may end up using 8 cycles while a solution using only 7 cycles exists and can be found with branch and bound.
The second point is that there are significant restrictions on data routing between instructions, and dependencies between instructions, that limit the amount of commonality between solutions. Look at it as a knapsack problem with a lot of ordering constraints, where some solutions fail completely because of routing congestion.
Clarification:
In each cycle there is a limit on how many operations of each type can be issued, and some operations can be assigned to either of two types. There is a set of routing constraints, which can be varied to be either fairly tight or pretty forgiving, and the limit depends on routing congestion.
Integer linear optimization for NP-hard problems
Depending on your side constraints, you may be able to use the critical path method or
(as suggested in a previous answer) dynamic programming. But many scheduling problems are NP-hard, just like the classical traveling salesman problem --- a precise solution has a worst case of exponential search time, just as you describe in your problem.
It's important to know that while NP-hard problems still have a very bad worst case solution time there is an approach that very often produces exact answers with very short computations (the average case is acceptable and you often don't see the worst case).
This approach is to convert your problem to a linear optimization problem with integer variables. There are free-software packages (such as lp-solve) that can solve such problems efficiently.
The advantage of this approach is that it may give you exact answers to NP-hard problems in acceptable time. I used this approach in a few projects.
As your problem statement does not include more details about the side constraints, I cannot go into more detail how to apply the method.
Edit/addition: Sample implementation
Here are some details about how to implement this method in your case (of course, I make some assumptions that may not apply to your actual problem --- I only know the details from your question):
Let's assume that you have 50 instructions cmd(i) (i=1..50) to be scheduled in 10 or fewer cycles cycle(t) (t=1..10). We introduce 500 binary variables v(i,t) (i=1..50; t=1..10) which indicate whether instruction cmd(i) is executed at cycle(t) or not. This basic setup gives the following linear constraints:
v_it integer variables
0<=v_it; v_it<=1; # 1000 constraints: i=1..50; t=1..10
sum(v_it: t=1..10)==1 # 50 constraints: i=1..50
Now, we have to specify your side conditions. Let's assume that operations cmd(1)...cmd(5) are multiplication operations and that you have exactly two multipliers --- in any cycle, you may perform at most two of these operations in parallel:
sum(v_it: i=1..5)<=2 # 10 constraints: t=1..10
For each of your resources, you need to add the corresponding constraints.
Also, let's assume that operation cmd(7) depends on operation cmd(2) and needs to be executed after it. To make the equation a little bit more interesting, let's also require a two-cycle gap between them:
sum(t*v(2,t): t=1..10) + 3 <= sum(t*v(7,t): t=1..10) # one constraint
Note: sum(t*v(2,t): t=1..10) is the cycle t where v(2,t) is equal to one.
Finally, we want to minimize the number of cycles. This is somewhat tricky because you get quite big numbers in the way that I propose: we assign each v(i,t) a price that grows exponentially with time, so pushing off operations into the future is much more expensive than performing them early:
sum(6^t * v(i,t): i=1..50; t=1..10) --> minimum. # one target function
I chose 6 to be bigger than 5 to ensure that adding one cycle to the system makes it more expensive than squeezing everything into fewer cycles. A side effect is that the program will go out of its way to schedule operations as early as possible. You may avoid this by performing a two-step optimization: first, use this target function to find the minimal number of necessary cycles. Then, ask the same problem again with a different target function --- limiting the number of available cycles at the outset and imposing a more moderate price penalty for later operations. You have to play with this; I hope you got the idea.
Hopefully, you can express all your requirements as such linear constraints in your binary variables. Of course, there may be many opportunities to exploit your insight into your specific problem to do with fewer constraints or fewer variables.
Then, hand your problem off to lp-solve or cplex and let them find the best solution!
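As a sketch of how the model above could be handed to a solver: lp-solve and CPLEX have their own model formats and APIs, so as an assumption on my part the snippet below uses Google OR-Tools' CP-SAT solver instead, which handles this kind of 0/1 model directly. It only encodes the "scheduled exactly once" constraint, the two-multiplier resource constraint and the exponential-cost objective; the other side constraints would be added the same way.

// Assumes the com.google.ortools (CP-SAT) Java dependency and native libraries are available.
import com.google.ortools.Loader;
import com.google.ortools.sat.BoolVar;
import com.google.ortools.sat.CpModel;
import com.google.ortools.sat.CpSolver;
import com.google.ortools.sat.CpSolverStatus;
import com.google.ortools.sat.LinearExpr;

public class ScheduleSketch {
    public static void main(String[] args) {
        Loader.loadNativeLibraries();
        int nCmd = 50, nCycles = 10;
        CpModel model = new CpModel();

        // v[i][t] == 1  <=>  cmd(i) is executed in cycle(t)
        BoolVar[][] v = new BoolVar[nCmd][nCycles];
        for (int i = 0; i < nCmd; i++)
            for (int t = 0; t < nCycles; t++)
                v[i][t] = model.newBoolVar("v_" + i + "_" + t);

        // Each instruction is scheduled exactly once.
        for (int i = 0; i < nCmd; i++)
            model.addEquality(LinearExpr.sum(v[i]), 1);

        // Resource constraint: cmd(0)..cmd(4) are multiplications, at most two per cycle.
        for (int t = 0; t < nCycles; t++) {
            BoolVar[] mulsInCycle = new BoolVar[5];
            for (int i = 0; i < 5; i++) mulsInCycle[i] = v[i][t];
            model.addLessOrEqual(LinearExpr.sum(mulsInCycle), 2);
        }

        // Objective: a price that grows exponentially with the cycle index, so that
        // using an extra cycle is always more expensive than packing operations earlier.
        BoolVar[] flat = new BoolVar[nCmd * nCycles];
        long[] cost = new long[nCmd * nCycles];
        int k = 0;
        for (int i = 0; i < nCmd; i++)
            for (int t = 0; t < nCycles; t++) {
                flat[k] = v[i][t];
                cost[k] = (long) Math.pow(6, t + 1);
                k++;
            }
        model.minimize(LinearExpr.weightedSum(flat, cost));

        CpSolver solver = new CpSolver();
        CpSolverStatus status = solver.solve(model);
        System.out.println("status = " + status);
    }
}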
At first blush, it sounds like this problem might fit into a dynamic programming solution. Several operations may take the same amount of time so you might end up with overlapping subproblems.
If you can map your problem to the "travelling salesman" (like: Find the optimal sequence to run all operations in minimum time), then you have an NP-complete problem.
A very quick way to solve that is the ant algorithm (or ant colony optimization).
The idea is that you send an ant down every path. The ant spreads a smelly substance on the path which evaporates over time. Short paths mean that the path will stink more when the next ant comes along. Ants prefer smelly over clean paths. Run thousands of ants through the network. The most smelly path is the optimal one (or at least very close).
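A minimal, made-up sketch of that idea, choosing between a handful of candidate paths where shorter paths receive more pheromone:

import java.util.Random;

public class AntColonySketch {
    public static void main(String[] args) {
        double[] pathLength = {8.0, 7.0, 9.0};  // candidate schedules, shorter is better
        double[] pheromone  = {1.0, 1.0, 1.0};
        double evaporation = 0.1;
        Random rnd = new Random(42);

        for (int ant = 0; ant < 10_000; ant++) {
            // Pick a path with probability proportional to its pheromone level.
            double total = pheromone[0] + pheromone[1] + pheromone[2];
            double r = rnd.nextDouble() * total, acc = 0;
            int chosen = 0;
            for (int i = 0; i < pheromone.length; i++) {
                acc += pheromone[i];
                if (r <= acc) { chosen = i; break; }
            }
            // Evaporate everywhere, then deposit more pheromone on shorter paths.
            for (int i = 0; i < pheromone.length; i++) pheromone[i] *= (1 - evaporation);
            pheromone[chosen] += 1.0 / pathLength[chosen];
        }

        int best = 0;  // the smelliest path approximates the optimum
        for (int i = 1; i < pheromone.length; i++) if (pheromone[i] > pheromone[best]) best = i;
        System.out.println("Best path: " + best + " (length " + pathLength[best] + ")");
    }
}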
Try simulated annealing, cfr. http://en.wikipedia.org/wiki/Simulated_annealing .
