Detect high cyclomatic complexity before checking code in

I would like to know what sort of tools are available to detect bad code (code with high cyclomatic complexity) before checking it in.
I am working on a legacy project and there is a lot of spaghetti code already. Sonar is not necessarily helpful here, because the purpose is not to refactor the legacy code; only newly added code is expected to be clean(er).
Important: the distinction between cleaner and bad code in this case is simply cyclomatic complexity.
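For reference, cyclomatic complexity is one plus the number of decision points in a method (if, for, while, case, catch, and each && or ||), i.e., the number of independent paths through it. A minimal Java illustration, with hypothetical methods purely to make the metric concrete:

// Cyclomatic complexity = 1: a single straight-line path, no decision points.
static int clamp(int x) {
    return Math.max(0, x);
}

// Cyclomatic complexity = 4: the base path plus three decision points
// (the 'for', the 'if', and the '&&' each add one).
static int countPositiveEvens(int[] xs) {
    int n = 0;
    for (int x : xs) {
        if (x > 0 && x % 2 == 0) {
            n++;
        }
    }
    return n;
}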

There are three top players for static code analysis, namely:
FindBugs: flags standards violations and logical issues, along with the cyclomatic complexity issues you are asking about.
PMD: flags standards violations and also reports cyclomatic complexity.
Checkstyle: flags standards violations along with code formatting issues.
My suggestion is to always keep FindBugs at zero issues, because apart from standards violations, the logical issues it points out are in a class of their own.
Sonar is a good tool, but it has too many dependencies to manage and its setup is not very straightforward. By contrast, FindBugs, PMD and Checkstyle checks can be put in place in the project POMs alone, and can even be configured to terminate the build on any single violation, binding developers to conform to all standards before moving ahead with a code check-in or a build for testing.
Please find below the respective Maven plugin details:
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>findbugs-maven-plugin</artifactId>
</plugin>
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-pmd-plugin</artifactId>
</plugin>
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-checkstyle-plugin</artifactId>
</plugin>

You can use the JArchitect tool for this kind of need; its powerful query language lets you easily create your own rules and integrate them into your build process. For example, you can execute queries like this:
from m in Methods
where m.CyclomaticComplexity > 20
select m

Understand SonarQube and its testing coverage

I am a beginner with SonarQube and have really tried to google and read a lot of community pages to understand which functions SonarQube offers.
What I don't get is: what does the test coverage in SonarQube refer to?
If it says, for example, that the coverage on new code is 30%, what does "new code" mean?
And when does SonarQube say that an issue is a bug? Is the analyzed code compared to a certain standard in order for SonarQube to say that there is a bug?
I hope someone with more knowledge about SonarQube can help me understand it. Thank you very much.
Test coverage (also known as code coverage) is the proportion of the application code (i.e., the code base excluding test and sample code) that is executed by test cases.
SonarQube does not compute code coverage itself. Instead, coverage is computed and uploaded by external code coverage tools (e.g., Cobertura, JaCoCo). SonarQube presents code coverage at different levels (e.g., line coverage, condition coverage); see https://docs.sonarqube.org/latest/user-guide/metric-definitions/#header-9.
Coverage on new code refers to the proportion of code that is both covered and added (or modified) since a certain baseline out of all added and changed code since the same baseline. The baseline can, for example, be the previously analyzed code state or the code state of the previous commit. That is, this metric expresses how extensively changes have been tested. Note that 100% coverage does not mean that code has been perfectly tested; it just says that all code has been executed by test cases.
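A worked example with made-up numbers: if 200 lines were added or changed since the baseline and the tests execute 60 of them, coverage on new code is 60 / 200 = 30%, regardless of how well the untouched legacy code is covered.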
Issues in SonarQube do not necessarily represent bugs. Usually most issues are actually not bugs but are problems affecting code maintainability in the long term (e.g., code duplications) or violations of best practices. Still, some issues can represent bugs (e.g., potential null-dereferences, incorrect concurrency handling).
Note that an issue can also be a false positive and therefore not be a problem at all.
Most issues are identified with static code analysis by searching the code structure for certain patterns. Some can be uncovered by simple code searches (e.g., violations of naming conventions). Other analyses / issue classes may additionally need data-flow analysis (e.g., null-dereferences) or require byte-code information.

SonarQube and Cucumber features

Is there any way to include the test coverage of Cucumber features and other useful statistics in the SonarQube analysis? I have done a bit of research, but couldn't find a proper plugin.
From this thread (written after the OP's question), David Racadon added:
As far as I understand:
It is not possible to run an analysis on a project containing only test code because the 'sonar.sources' property is mandatory.
Measures on test code are not aggregated at project level.
As far as I am concerned, I consider test files part of the project the same way source files are. Thus, measures of test files should be aggregated on top of source files.
For now, SonarQube shows that your project is 1,000 lines even if you have 0 or 10,000 lines of test code on top of those 1,000 lines of source code. For me, SonarQube gives you a biased estimate of the size of your project and the effort of maintenance.
The closest would then be his plugin racodond/sonar-gherkin-plugin which:
analyzes Cucumber Gherkin feature files and:
Computes metrics: lines of code, number of scenarios, etc.
Checks various guidelines to find potential bugs and code smells through more than 40 checks
Provides the ability to write your own checks

Determining which classes would benefit most from unit testing?

I am working on a project where we have only 13% code coverage with our unit tests. I would like to come up with a plan to improve that, focusing first on the areas where increasing coverage would bring the greatest value.
This project is in C#, we're using VS 2008 and TFS 2008, and our unit tests are written using MSTest.
What methodology should I use to determine which classes we should tackle first?
Which metrics (code or usage) should I be looking at (and how can I get those metrics if this is not obvious)?
I would recommend adding unit tests to all the classes you touch, not retrofitting existing classes.
Most of the advantage of unit testing lies in helping programmers code and in ensuring that "fixes" don't actually break anything; for a section of code that is never modified, the benefits of unit tests start to drop off.
You might also want to add unit tests to classes that you rely on if you have nothing better to do.
You absolutely should add tests to new functionality you add, but you should probably also add tests to existing functionality you may break.
If you are doing a big refactor, consider getting 80-100% coverage on that section first.
For some good statistics and deterministic querying of certain methods you could definitely look at NDepend: http://www.ndepend.com/
NDepend exposes a query language called CQL (Code Query Language) that allows you to write queries against your code relating to certain statistics and static analysis.
There is no true way to determine which classes might benefit the most, however by setting your own thresholds in CQL you could establish some rules and conventions.
The biggest value of a unit test is for maintenance, to ensure that the code still works after changes.
So, concentrate on methods/classes that are most likely / most frequently changed.
Next in importance are classes/methods with less-than-obvious logic. The unit tests will make them less fragile while serving as extra "documentation" for their contracted API.
In general, unit tests are a tool to protect against regression, and regression is most likely to occur in classes with the most dependencies. You should not have to choose: you should test all classes. But if you have to, test the classes that have the most dependencies first.
Arrange all your components into levels. Every class in a given level should only depend on components at a lower level. (A "component" is any logical group of one or more classes.)
Write unit tests for all your level 1 components first. You usually don't need mocking frameworks or other such nonsense because these components only rely on the .NET Framework.
Once level 1 is done, start on level 2. If your level 1 tests are good, you won't need to mock those classes when you write your level 2 tests.
Continue in like fashion, working your way up the application stack.
Tip: Break all your components into level specific DLLs. That way you can ensure that low level components don't accidentally take a dependency on a higher level component.
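To make the leveling idea concrete, here is a minimal sketch in Java (the answer above is about .NET, and these class names are hypothetical): the level 1 class depends only on the standard library and can be tested directly without mocks, and the level 2 class builds on it:

// Level 1: depends only on the standard library, so it can be
// unit-tested directly, with no mocking at all.
final class PriceMath {
    static long applyDiscountCents(long priceCents, int percent) {
        return priceCents - (priceCents * percent) / 100;
    }
}

// Level 2: depends only on the level 1 component. Once PriceMath is
// well tested, tests for this class don't need to mock it.
final class InvoiceCalculator {
    long totalCents(long[] lineItemCents, int discountPercent) {
        long total = 0;
        for (long cents : lineItemCents) {
            total += cents;
        }
        return PriceMath.applyDiscountCents(total, discountPercent);
    }
}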

What helps you improve your ability to find a bug?

I want to know if there are methods to quickly find bugs in a program.
It seems that the more you master the architecture of your software, the more quickly you can locate the bugs.
How do programmers improve their ability to find a bug?
Logging, and unit tests. The more information you have about what happened, the easier it is to reproduce it. The more modular you can make your code, the easier it is to check that it really is misbehaving where you think it is, and then check that your fix solves the problem.
Divide and conquer. Whenever you are debugging, you should be thinking about cutting down the possible locations of the problem. Every time you run the app, you should be trying to eliminate a possible source and zero in on the actual location. This can be done with logging, with a debugger, assertions, etc.
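As a minimal sketch of divide and conquer via logging (the pipeline and its stages below are hypothetical stand-ins): print the intermediate result after each stage, and the first log line that looks wrong tells you which stage to zero in on:

// Hypothetical three-stage pipeline with trivial stand-in stages;
// each log line rules the earlier stages in or out as the bug's source.
static String parse(String s)     { return s.trim(); }
static String normalize(String s) { return s.toLowerCase(); }
static String render(String s)    { return "[" + s + "]"; }

static String process(String raw) {
    String parsed = parse(raw);
    System.err.println("after parse:     '" + parsed + "'");
    String normalized = normalize(parsed);
    System.err.println("after normalize: '" + normalized + "'");
    String rendered = render(normalized);
    System.err.println("after render:    '" + rendered + "'");
    return rendered;
}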
Here's a prophylactic method after you have found a bug: I find it really helpful to take a minute and think about the bug.
What was the bug, exactly, in essence?
Why did it occur?
Could you have found it earlier, and more easily?
Is there anything else you learned from the bug?
I find taking a minute to think about these things will make it far less likely that you will produce the same bug in the future.
I will assume you mean logic bugs. The best way I have found to catch logic bugs is to implement some sort of testing scheme. Check out JUnit as the standard. Essentially, you define a set of accepted outputs for your methods, and every time you build your system it checks all of your test cases. If you have introduced new logic that breaks your tests, you will know about it instantly and know exactly what you have to fix.
Test-driven development is a pretty big movement in programming right now. You will be hard-pressed to find a language that doesn't support some kind of testing. Even JavaScript has a multitude of test suites.
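A minimal sketch of such a testing scheme (JUnit 4 syntax; the Calculator class here is a hypothetical example, not from the question):

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class CalculatorTest {
    // Hypothetical class under test.
    static class Calculator {
        int add(int a, int b) { return a + b; }
    }

    @Test
    public void addHandlesNegatives() {
        Calculator calc = new Calculator();
        // Accepted outputs: if a later change breaks them, the build says so.
        assertEquals(5, calc.add(2, 3));
        assertEquals(-1, calc.add(2, -3));
    }
}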
Experience makes you a better debugger. Pay close attention to the bugs that you AND others commonly make. Try to figure out if/how these bugs apply to ALL code that affects you, not the single instance of where the bug was seen.
Raymond Chen is famous for his powers of psychic debugging.
Most of what looks like psychic debugging is really just knowing what people tend to get wrong.
That means that you don't necessarily have to be intimately familiar with the architecture / system. You just need enough knowledge to understand the types of bugs that apply and are easy to make.
I personally take the approach of thinking about where the bug may be in the code before actually opening the code and taking a look. When you first start with this approach it may not work very well, especially if you are unfamiliar with the code base. Over time, though, someone will be able to describe the behavior they are experiencing and you'll have a good idea where the problem is located, or you may even know what to fix in the code before looking at it.
I was on a project for several years that was maintained by a vendor. They were not very good debuggers, and most of the time it was up to us to point them to the area of the code that had the problem. What made our problem worse was that we didn't have a nice way to view the source code, so a lot of our "debugging" was done by feel.
Error checking and reporting. The #1 newbie coder debugging mistake is to turn off error reporting, avoid checking whether what's going on makes sense, and so on. In general, people feel that if they can't see anything going wrong then nothing is going wrong, which of course could not be further from the truth.
Instead, your code should be chock full of error checks that make lots of noise, with detailed reporting, someplace you will see it. (This doesn't mean inside a production web page.) Then, instead of having to trace an error all over the place because it got passed through sixteen layers of execution before it finally broke somewhere, your errors start happening close to the actual issue.
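A small Java sketch of this fail-fast style (the method is hypothetical): check inputs loudly at the boundary so the failure happens near the actual issue rather than sixteen layers later:

static double averageScore(int[] scores) {
    // Noisy, detailed checks at the boundary, close to the real problem.
    if (scores == null) {
        throw new IllegalArgumentException("scores must not be null");
    }
    if (scores.length == 0) {
        throw new IllegalArgumentException("scores must not be empty");
    }
    long sum = 0;
    for (int i = 0; i < scores.length; i++) {
        if (scores[i] < 0 || scores[i] > 100) {
            throw new IllegalArgumentException(
                "scores[" + i + "] out of range 0..100: " + scores[i]);
        }
        sum += scores[i];
    }
    return (double) sum / scores.length;
}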
It seems that the more you master the architecture of your software, the more quickly you can locate the bugs.
After understanding the architecture, one's ability to find bugs in the application increases with their ability to identify and write extensive tests.
Know your tools.
Make sure that you know how to use conditional breakpoints and watches in your debugger.
Use static analysis tools as well - they can point out the more obvious issues.
Sleep and rest.
Use programming methods that produce fewer bugs in the first place.
If implementing a single stand-alone functional requirement takes N separate point-edits to the source code, the number of bugs put into the code is roughly proportional to N, so find programming methods that minimize N. Ways to do this: DRY (don't repeat yourself), code generation, and DSLs (domain-specific languages).
Where bugs are likely, have unit tests.
Obviously. IMHO, the best unit tests are Monte Carlo.
Make intermediate results visible.
For example, compilers have intermediate representations, in the form of 4-tuples. If there is a bug, the intermediate code can be examined. That tells if the bug is in the first or second half of the compiler.
P.S. Most programmers are not aware that they have a choice of how much data structure to use. The less data structure you use, the less are the chances for bugs (and performance issues) caused by it.
I find tracepoints to be an invaluable debugging tool. They are a bit like logging, except you create them during a debugging session to solve a particular issue, like breakpoints.
Printing the stacktrace in a tracepoint can be especially useful. For example, you can print the hash code and stacktrace in the constructor of an object, and then later on when the object is used again you can search for its hashcode to see which client code created it. Same for seeing who disposed it or called a certain method etc.
They are also great for debugging issues related to window focus changes and the like, where the debugger would interfere if you dropped into break mode.
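A rough Java approximation of the constructor trick described above (the class is hypothetical; with tracepoints you would log this from the debugger instead of editing the code): record the identity hash and the creation stack trace in the constructor, then search the log for that hash to see which client created or disposed the object:

public class TrackedConnection {
    public TrackedConnection() {
        // Log who created this instance: its hash plus the creation stack.
        System.err.println("created TrackedConnection@"
                + Integer.toHexString(System.identityHashCode(this)));
        for (StackTraceElement frame : new Throwable().getStackTrace()) {
            System.err.println("    at " + frame);
        }
    }

    public void close() {
        // Later, grep the log for the same hash to see who disposed it.
        System.err.println("closed TrackedConnection@"
                + Integer.toHexString(System.identityHashCode(this)));
    }
}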
Static code tools like FindBugs
Assertions, assertions, and assertions.
Some areas of our code have 4 or 5 assertions for each line of real code. When we get a bug report, the first thing that happens is that the customer data is processed in our debug build; 99 times out of a hundred an assert will fire near the cause of the bug.
Additionally, our debug build performs redundant calculations to ensure that an optimized algorithm is returning the correct result, and debug functions are used to examine the sanity of data structures.
The hardest thing new developers have to contend with is getting their code to survive the assertions of the code they are calling.
Additionally we do not allow any code to be putback to toplevel that causes any integration or unit test to fail.
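A minimal Java sketch of the redundant-calculation idea (hypothetical methods; Java asserts only run when the JVM is started with -ea, which here plays the role of the debug build): the optimized result is checked against a slow, obviously correct reference:

static int sumOfFirstN(int n) {
    assert n >= 0 : "n must be non-negative: " + n;
    int fast = n * (n + 1) / 2;    // optimized closed form
    // Redundant check against the naive algorithm, debug runs only.
    assert fast == slowSum(n)
        : "optimized sum disagrees with naive sum for n=" + n;
    return fast;
}

// Obviously correct reference implementation, used only by the assertion.
static int slowSum(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) total += i;
    return total;
}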
Stepping through the code, examining flow/state where unexpected behavior is occurring. (Then develop a test for it, of course).
Writing Debug.Write(message) in your code and using DebugView is another option. Then run your application and find out what is going on.
"Architecture" in software means something like:
Several components
The components interact across clearly-defined interfaces
Each component has a well-defined responsibility
The responsibility of one component is unlike the responsibilities of other components
So, as you said, the better the architecture the easier it is to find bugs.
First: knowing the bug, you can decide which functionality is broken, and therefore which component implements that functionality. For example, if the bug is that something isn't being logged properly, then the bug should be in one of three places:
In the component that's responsible for logging (your logging library)
Or, above that in the application code which is using this library
Or, below that in the system code which this library is using
Second: examine the data transferred across the interfaces between components. To continue the previous example:
Set a debugger breakpoint on the application code which invokes the logger API, to verify whether the logger API is being used correctly (e.g. whether it's being invoked at all, whether parameters are as-expected, etc.).
Doing this tells you whether the bug is in the component above this interface, or in the component that's below this interface.
Repeat (perhaps using binary search if the call stack is very deep) until you've found which component is at fault.
When you come to the point that you think there must be a bug in the OS, check your assumptions -- and put them into the code with "assert" statements.
Conversely, as you are writing the code, think of the range of valid inputs for your algorithms and put in assertions to make sure you have what you think you have. Same goes for output: Check that you produced what you think you produced.
E.g. if you expect a non-empty list:
l = getList(input)
assert l, "List was empty for input: %s" % str(input)
I'm part of the QA team at work, and knowing the product and how it is developed helps a lot in finding bugs. Also, when I make new QA tools, I pass them to our dev team to test: finding bugs in your own code is just plain hard!
Some people say programmers are tainted, so we cannot see bugs in our own product; and we are not just talking about code here, we are beyond that, into usability and the functionality itself.
Meanwhile, unit testing seems to be a nice way to find bugs in your own code, but it is totally pointless if you were wrong even before writing the unit test. How are you going to find the bugs then? You don't! Let your co-workers find them, or hire a QA person.
Scientific debugging is what I always used, and it greatly helps.
Basically, if you can replicate a bug, you can track down its origin. You then run some experiments, observe the results, and form hypotheses about why the bug happens.
Writing down all your hypotheses, attempts, expected results and observed results can help you track down the bugs, particularly if they're nasty.
There are automated tools that can help you with that process, particularly git-bisect (and similar bisection tools on other revision systems) to quickly find which change introduced the bug, unit testing to reproduce a bug and prevent regressions in your code (can be used in combination with bisect), and delta debugging to find the culprit in your code (similar to git-bisect but whereas git-bisect works on the code history, delta debugging works on the code directly).
But whatever the tools you are using, the most important benefit is in the scientific methodology, as this is the formalization of what most experienced debuggers do.

How does one implement FxCop / static analysis on an existing code base

What are some of the strategies that are used when implementing FxCop / static analysis on existing code bases with existing violations? How can one most effectively reduce the static analysis violations?
Make liberal use of the [SuppressMessage] attribute to begin with, at least at the beginning. Once you get the count to 0 via the attribute, you can then put in a rule that new check-ins may not introduce FxCop violations.
Visual Studio 2008 has a nice code analysis feature that allows you to ensure that code analysis runs on every build and you can treat warnings as errors. That might slow things down a bit so I recommend setting up a continuous integration server (like CruiseControl.NET) and having it run code analysis on every checkin.
Once you get it under control and aren't introducing new violations with every checkin, start to tackle whole classes of FxCop violations at a time, with the goal of removing the SuppressMessage attributes that you used.
The way to keep track of which ones you really want to keep is to always add a Justification value to the ones you really want to suppress.
Rewrite your code in a passing style!
Seriously, an old code base will have hundreds of errors - but that's why we have novice/intern programmers. Correcting FxCop violations is a great way to get an overview of the code base and also learn how to write conforming .NET code.
So just bite the bullet, drink lots of caffeine, and just get through it in a couple days!
NDepend looks like it could do what you're after, but I'm not sure if it can be integrated into a CruiseControl.Net automated build, and fail the build if the code doesn't meet the requirements (which is what I'd like to happen).
Any other ideas?
An alternative to FxCop would be to use the tool NDepend. This tool lets you write code rules over C# LINQ queries (what we call CQLinq). Disclaimer: I am one of the developers of the tool.
More than 200 code rules are proposed by default. Customizing existing rules or creating your own rules is straightforward thanks to the well-known C# LINQ syntax.
To keep the number of false positives low, CQLinq offers the unique capability of defining the set JustMyCode through special code queries prefixed with notmycode. More explanations about this feature can be found here. Here are, for example, two notmycode default queries:
Discard generated and designer Methods from JustMyCode
Discard generated Types from JustMyCode
Also to keep the number of false positives low, CQLinq lets you focus a rule's results only on code added or refactored since a defined baseline in the past. See the following rule, which detects methods that are too complex and were added or refactored since the baseline:
warnif count > 0
from m in Methods
where m.CyclomaticComplexity > 20 &&
      (m.WasAdded() || m.CodeWasChanged())
select new { m, m.CyclomaticComplexity }
Finally, notice that with NDepend, code rules can be verified live in Visual Studio and at build process time, in a generated HTML+JavaScript report.
