What are some of the strategies that are used when implementing FxCop / static analysis on existing code bases with existing violations? How can one most effectively reduce the static analysis violations?
Make liberal use of the [SuppressMessage] attribute, at least at the beginning. Once you get the count to zero via the attribute, put in a rule that new check-ins may not introduce FxCop violations.
Visual Studio 2008 has a nice code analysis feature that allows you to ensure code analysis runs on every build, and you can treat warnings as errors. That might slow things down a bit, so I recommend setting up a continuous integration server (like CruiseControl.NET) and having it run code analysis on every check-in.
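If you go the build-integration route, the switch is just a couple of MSBuild properties in the .csproj; something along these lines (property names from memory, so double-check them against your Visual Studio version):

<PropertyGroup>
  <!-- Run code analysis as part of every build and fail on violations -->
  <RunCodeAnalysis>true</RunCodeAnalysis>
  <CodeAnalysisTreatWarningsAsErrors>true</CodeAnalysisTreatWarningsAsErrors>
</PropertyGroup>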
Once you get it under control and aren't introducing new violations with every check-in, start tackling whole classes of FxCop violations at a time, with the goal of removing the SuppressMessage attributes you added.
The way to keep track of which suppressions you genuinely intend to keep is to always add a Justification value to the ones you really do want to suppress.
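For example, a minimal sketch of what a justified suppression looks like (the rule and method below are just illustrative):

using System.Diagnostics.CodeAnalysis;

public static class LegacyHelpers
{
    // The attribute is only compiled in when the CODE_ANALYSIS symbol is defined.
    [SuppressMessage("Microsoft.Naming",
                     "CA1704:IdentifiersShouldBeSpelledCorrectly",
                     Justification = "Matches the terminology used by the upstream mainframe system.")]
    public static void ProcessFoobar()
    {
        // existing legacy logic...
    }
}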
Rewrite your code in a passing style!
Seriously, an old code base will have hundreds of errors - but that's why we have novice/intern programmers. Correcting FxCop violations is a great way to get an overview of the code base and also learn how to write conforming .NET code.
So just bite the bullet, drink lots of caffeine, and just get through it in a couple days!
NDepend looks like it could do what you're after, but I'm not sure if it can be integrated into a CruiseControl.Net automated build, and fail the build if the code doesn't meet the requirements (which is what I'd like to happen).
Any other ideas?
An alternative to FxCop would be to use the tool NDepend. This tool lets you write code rules over C# LINQ queries (what we call CQLinq). Disclaimer: I am one of the developers of the tool.
More than 200 code rules are proposed by default. Customizing existing rules or creating your own rules is straightforward thanks to the well-known C# LINQ syntax.
To keep the number of false positives low, CQLinq offers the unique capability to define the code set JustMyCode through special code queries prefixed with notmycode. More explanation of this feature can be found here. Here are, for example, two default notmycode queries:
Discard generated and designer Methods from JustMyCode
Discard generated Types from JustMyCode
To keep the number of false positives low, CQLinq also lets you focus rule results only on code added or refactored since a defined baseline in the past. See the following rule, which detects methods that are too complex and were added or refactored since the baseline:
warnif count > 0
from m in Methods
where m.CyclomaticComplexity > 20 &&
      (m.WasAdded() || m.CodeWasChanged())
select new { m, m.CyclomaticComplexity }
Finally, notice that with NDepend, code rules can be verified live in Visual Studio and at build time, in a generated HTML+JavaScript report.
Related
In a large C++ project, Coverity analysis reports issues in files that we won't be fixing, e.g. Boost libraries, STL headers, some third-party libraries, etc.
Ideally there would be a mechanism to completely ignore these and not to increment the total count for such issues.
In Coverity Connect (v8.1) we've set up Components with file path regexes, and that nicely filters the files in question when browsing, but the total number of issues does not drop. Two questions related to this:
is there a way to drop the number of total issues for files we don't care about? e.g. after such an issue has already been captured
if new code we introduce includes one of the offending boost/STL/etc headers, will this clock up the total issue counter? (clearly, that would be less than desirable)
Mandatory disclaimer first: Your customers won’t care that bugs in your code came from a third party. That said, the main answer at the link Yannis mentioned is generally the correct one: “use a component filter.” If it’s not working correctly for you, double-check your configuration. I found it quite robust, even with a negative look-ahead regex with over a hundred disjuncts.
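For illustration only (the exact pattern depends on your tree layout), a component defined by a file-path regex along these lines is usually enough to scoop up the third-party directories:

.*/(boost|stl|3rdparty|third_party)/.*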
Once such an issue is found, you can mark it as a false positive or set it to be ignored entirely. You only have to do this once: in future analyses, when the issue is found again, it will keep that status. And no, if new code includes one of the offending headers in other files too, the total issue counter won't go up, as long as the issue stays in the same file.
Check this:
Can Synopsys Static Analysis (Coverity) automatically ignore issues in third-party or noncritical code?
We use VS2008/2010 with TFS 2010 for our source control, because it also lets us create custom work item types that we can use for project management, such as product backlog items and sprint backlog items.
One item that's not tracked (by machine) is the build regression test tasks for release candidates. Our regression testing is part automated, part manual, and the manual part can take several days. Currently we use an Excel spreadsheet with a list of all the test cases, and the testers just fill in results and notes.
I've been proposing creating a build regression test template that contains each test case, default owner, and then when we want to do regression testing on a build, we can automatically create work items for every test in the template.
My argument is that if the regression test work is mandatory for the project, and the results should be tracked, then writing additional TFS work items makes sense, especially since the work items can hold estimates, giving managers an idea of how much re-test time remains.
The argument against this is that we already have high level work items to capture the overall project test requirements, and the regression testing is basically a "re-test", so new work items would be duplicate.
My question: Is anyone else doing anything like this? Is it reasonable to use TFS to track outstanding re-test tasks?
Note: we don't own Visual Studio Test Professional
I think it's reasonable to go with your suggested solution. You should have another work item type for the "test tasks", which can be linked as children of the test requirement work items. Doing that, like you said, would allow you to track results, progress, reporting, etc. You can also add other fields like build number, tested by, tested date, etc. to the work item type for history, something that cannot be done with just one test requirement work item type.
Essentially, what you proposed is done in the ITestResult object in the Microsoft.TeamFoundation.TestManagement.Client.dll.
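If you do go down this road, creating the per-build test task work items can be scripted against the TFS 2010 work item client API. A rough sketch only; the server URL, project, work item type, field values, and link end name below are assumptions and may differ in your process template:

using System;
using Microsoft.TeamFoundation.Client;
using Microsoft.TeamFoundation.WorkItemTracking.Client;

class RegressionTaskGenerator
{
    static void Main()
    {
        // Connection and names below are placeholders for your own server/project.
        var collection = new TfsTeamProjectCollection(new Uri("http://tfs:8080/tfs/DefaultCollection"));
        var store = collection.GetService<WorkItemStore>();
        Project project = store.Projects["MyProject"];

        // "Task" stands in for whatever custom "Regression Test Task" type you define.
        WorkItemType taskType = project.WorkItemTypes["Task"];
        int parentRequirementId = 1234; // the high-level test requirement work item
        var parentEnd = store.WorkItemLinkTypes.LinkTypeEnds["Parent"];

        foreach (string testCase in new[] { "Install/upgrade", "Login", "Report export" })
        {
            var task = new WorkItem(taskType) { Title = "Regression: " + testCase };
            task.Fields["Assigned To"].Value = "Default Owner"; // placeholder user
            task.Links.Add(new RelatedLink(parentEnd, parentRequirementId));
            task.Save();
        }
    }
}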
Forgive me if this is a dupe but I couldn't find anything that hit this exact question.
I'm working with a legacy application that is very tightly coupled. We are pulling out some of the major functionality because we will be getting that functionality from an external service.
What is the best way to begin removing the now unused code? Should I just start at the extreme base, remove, and refactor my way up the stack? During lunch I'm going to go take a look at Working Effectively with Legacy Code.
If you can, and it makes sense in your problem domain, I would try to keep the legacy code functioning in parallel with the new API during development, and use the results from the legacy API to cross-check that the new API is working as expected.
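For example, a minimal sketch of that cross-check (the interfaces and names here are made up, not part of any real API):

using System;

// Sketch only: ILegacyPricing and IPricingService stand in for the old in-process
// code path and the new external service client.
public interface ILegacyPricing { decimal Calculate(string productId); }
public interface IPricingService { decimal GetPrice(string productId); }

public class ParallelPricingChecker
{
    private readonly ILegacyPricing legacy;
    private readonly IPricingService replacement;

    public ParallelPricingChecker(ILegacyPricing legacy, IPricingService replacement)
    {
        this.legacy = legacy;
        this.replacement = replacement;
    }

    public decimal GetPrice(string productId)
    {
        decimal oldResult = legacy.Calculate(productId);
        decimal newResult = replacement.GetPrice(productId);

        if (oldResult != newResult)
            Console.Error.WriteLine(
                "Pricing mismatch for {0}: legacy={1}, new={2}", productId, oldResult, newResult);

        // Keep trusting the legacy answer until the new service has proven itself.
        return oldResult;
    }
}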
I think the most important thing you can do is to refactor/remove/test in very small chunks. It's tedious and time consuming but it will help limit risks and errors later on.
I would also start with code that is "low risk" to change.
My advice is to use FindBugs and PMD/CPD (copy-paste detector) to remove dead code (code that cannot or will not be called), unused variables, and duplicated code. Getting rid of this junk will make refactoring easier.
Then learn the key mappings for the common refactorings in your IDE. Extract Method and Introduce Variable should be committed to muscle memory after an hour.
Use the primary disadvantage of tightly coupled code to... your advantage!
Step 1: Identify the area which provides the redundant functionality which you want to replace. Break it...do a quick smoke test of some of the critical parts of the application. Get the feel.
Step 2: Depending on what language it is, find the relevant static-code analysis tools and get the refactoring information you need.
Step 3: Repeat Step 1 in incremental levels of narrowing down to the exact pattern.
All this of course, in a sandbox environment. This may seem a bit haphazard but if you limit yourself to critical functionality testing ... you may get many leads in the process. You will definitely identify the pattern of the legacy code, if nothing else.
You absolutely cannot do this with a live development version [new features being added]. You must start with a feature freeze.
I tend to look at all of the components of the system in an overview and see the biggest places of reuse. From there I would implement the appropriate design pattern to solve it, and make the new component reusable. Write test cases to ensure the new code works as expected, then refactor your code around the new change. Then repeat [overview, etc] till you are satisfied.
I would suggest this for many reasons:
Everyone working with you on refactoring will learn something
People learn how to avoid design mistakes down the road
Everyone working on it will get a better understanding of the code base
As antiquated and painful as it is - I work at a company that continues to actively use VB6 for a large project. In fact, 18 months ago we came up against the 32k identifier limit.
Not willing to give up on the large code base and rewrite everything in .NET, we broke our application into a main executable and several supporting DLL files. This week we ran into the 32k limit again.
The problem we have is that no tool we can find will tell us how many unique identifiers our source is using. We have no accurate way to gauge how our efforts are reducing the number of identifiers or how close we are to the limit before we reach it.
Does anyone know of a tool that will scan the source for a project and return some accurate metrics and statistics?
OK. The Project Metrics Viewer, which is part of the Project Analyzer tool from Aivosto, will do exactly what you want. I've included a screenshot and also a link to the metrics list, which includes numbers of variables, etc.
Metrics List (source: aivosto.com)
The company I work for also has a large VB6 project that encountered the identifier limit. I developed a way to accurately count the number of identifiers remaining, and this has been incorporated into our build process for this project.
After trying several tools without success, I finally realized that the VB6 IDE itself knows exactly how many identifiers it has remaining. In fact, the VB6 IDE throws an "out of memory" error when you add one variable past its limit.
Taking advantage of this fact, I wrote a VB6 Add-In project that first compiles the currently loaded project in the IDE, then adds uniquely named variables to the project until it throws an error. When an error is raised, it records the number of identifiers added before the error as the number of identifiers remaining.
This number is stored in a file in a location known to our automated build process, which then reads the number and reports it to the development team. When it gets below a value we feel comfortable with, we schedule some refactoring time and move more code out of this project into DLL projects. We have been using this in production for several years now, and it has proven to be a reliable process.
To directly answer the question, using an Add-In is the only way I know to accurately measure the number of remaining identifiers. While I cannot share the Add-In code our project is using, I can say there is not much code involved, and it did not take long to develop.
Microsoft has a decent guide for how to create an Add-In, which can get you started:
https://support.microsoft.com/en-us/kb/189468
Here are some important details specific to counting identifiers:
The VB6 IDE will not consistently throw an error when out of identifiers until the current loaded project has been compiled. Our Add-In programmatically does this before adding identifiers to guarantee an accurate count. If the project cannot be compiled, then an accurate count cannot be obtained.
There are 32,500 identifiers available to a new, empty VB6 project.
Only unique identifier names count. Two local variables with the same name in two different routines only count as one identifier.
CodeSmart by AxTools is very good.
Cheat - create an unused class with #### unique variables in it. Use Excel or something to generate the alphabetical unique variable names. Remove the class from the project when you hit the limit, or comment out blocks of 100 unique variables.
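If you go that route, generating the filler declarations doesn't even need Excel; a throwaway sketch like this will do (module and variable names are arbitrary):

using System.IO;
using System.Linq;

// Throwaway sketch: writes a VB6 module full of uniquely named, unused variables.
// Trim or delete the generated module as the real code grows toward the limit.
class FillerModuleGenerator
{
    static void Main()
    {
        int count = 1000; // how many identifiers to reserve
        var declarations = Enumerable.Range(1, count)
            .Select(i => string.Format("Private padVar{0:D5} As Long", i));

        File.WriteAllLines("Padding.bas",
            new[] { "Attribute VB_Name = \"Padding\"" }.Concat(declarations));
    }
}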
I'd rather lean on the compiler (which defines how many variables are too many) than on some 3rd party tool anyway.
(oh crud, sorry to necro - didn't notice the dates)
You could get this from a tool that extracted identifiers from VB6 code. Then all you'd have to do is sort the list, eliminate duplicates, and measure the list size. We have a source code search engine that breaks up source code into language tokens ("lexes"), with some of those tokens being exactly those identifiers. That would contain exactly the data you want.
But maybe there's another way to solve your problem: find out which variable names occur rarely and replace them with a set of standard names (e.g., "temp"). So what you really want is a count of occurrences of each variable name, so you can sort for "small numbers of references". The same lexer data can provide this information.
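To give a feel for that counting step, here is a rough C# sketch (a regex is not a real VB6 lexer, so treat the numbers as approximate):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Rough approximation only: the regex will also pick up keywords and names inside
// strings, but it is enough for a first-cut frequency list.
class IdentifierCounter
{
    static void Main(string[] args)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        var identifier = new Regex(@"[A-Za-z_][A-Za-z0-9_]*");

        // Repeat for *.cls and *.frm as well.
        foreach (string file in Directory.EnumerateFiles(args[0], "*.bas", SearchOption.AllDirectories))
            foreach (Match m in identifier.Matches(File.ReadAllText(file)))
                counts[m.Value] = counts.TryGetValue(m.Value, out int n) ? n + 1 : 1;

        // Rarely used names first: candidates for merging into a standard set.
        foreach (var pair in counts.OrderBy(p => p.Value).Take(50))
            Console.WriteLine("{0,6}  {1}", pair.Value, pair.Key);
    }
}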
Then all you need is a tool to rename low-occurrence identifiers to something from the standard set. We offer obfuscators that replace one name by another that could probably do this.
[Oct 2014 update]. Just had a long conversation with somebody with this problem. It turns out there's a pretty good conceptual answer on which to base a tool, and it is called register coloring, which allocates a fixed number of registers to an arbitrary number of operands. This works by computing an "interference graph" over operands; two operands that don't "interfere" can be assigned the same register. One could use that to allocate the 2^16 available variable names to an arbitrary number of identifiers, if the interference graph isn't too dense. My guess is that it is not. YMMV, and somebody still has to build such a tool, likely needing a VB6 parser and machinery to compute such a graph. [Check out my bio].
It seems that Compuware's DevPartner had that kind of code analysis. I don't know if the current version still supports Visual Basic 6.0. (But at least there's a 14-day trial available)
As you work in a legacy codebase what will have the greatest impact over time that will improve the quality of the codebase?
Remove unused code
Remove duplicated code
Add unit tests to improve test coverage where coverage is low
Create consistent formatting across files
Update 3rd party software
Reduce warnings generated by static analysis tools (e.g. FindBugs)
The codebase has been written by many developers with varying levels of expertise over many years, with a lot of areas untested and some untestable without spending a significant time on writing tests.
Read Michael Feathers' book "Working Effectively with Legacy Code".
This is a GREAT book.
If you don't like that answer, then the best advice I can give would be:
First, stop making new legacy code[1]
[1]: Legacy code = code without unit tests and therefore an unknown
Changing legacy code without an automated test suite in place is dangerous and irresponsible. Without good unit test coverage, you can't possibly know what effect those changes will have. Feathers recommends a "stranglehold" approach where you isolate areas of code you need to change, write some basic tests to verify basic assumptions, make small changes backed by unit tests, and work out from there.
NOTE: I'm not saying you need to stop everything and spend weeks writing tests for everything. Quite the contrary, just test around the areas you need to test and work out from there.
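For instance, a first "characterization" test can be as dumb as this sketch (NUnit here; LegacyDiscountCalculator is a made-up stand-in, not something from the book):

using NUnit.Framework;

// Stand-in so the sketch compiles; in real life this is the untested legacy class.
public class LegacyDiscountCalculator
{
    public decimal GetDiscount(decimal orderTotal, int customerYears)
    {
        return orderTotal >= 200m ? orderTotal * 0.05m : 0m;
    }
}

[TestFixture]
public class LegacyDiscountCalculatorTests
{
    [Test]
    public void Existing_behaviour_for_a_typical_order_is_preserved()
    {
        var calculator = new LegacyDiscountCalculator();

        // The expected value is simply whatever the current code returns today; the
        // point is to detect changes in behaviour, not to assert that it is correct.
        decimal discount = calculator.GetDiscount(orderTotal: 250m, customerYears: 3);
        Assert.AreEqual(12.5m, discount);
    }
}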
Jimmy Bogard and Ray Houston did an interesting screencast on a subject very similar to this:
http://www.lostechies.com/blogs/jimmy_bogard/archive/2008/05/06/pablotv-eliminating-static-dependencies-screencast.aspx
I work with a legacy 1M LOC application written and modified by about 50 programmers.
* Remove unused code
Almost useless... just ignore it. You won't get a big Return On Investment (ROI) from that one.
* Remove duplicated code
Actually, when I fix something I always search for duplicates. If I find some, I extract a generic function or add a comment at every occurrence of the duplication (sometimes the effort of extracting a generic function isn't worth it). The main idea is that I hate doing the same action more than once. Another reason is that there's always someone (could be me) who forgets to check for the other occurrences...
* Add unit tests to improve test coverage where coverage is low
Automated unit tests are wonderful... but if you have a big backlog, the task itself is hard to promote unless you have stability issues. Go with the part you are working on and hope that in a few years you have decent coverage.
* Create consistent formatting across files
IMO the differences in formatting are part of the legacy. They give you a hint about who wrote the code, or when. That can give you some clue about how to behave in that part of the code. Reformatting isn't fun, and it doesn't give any value to your customer.
* Update 3rd party software
Do it only if there are really nice new features, or if the version you have is not supported by the new operating system.
* Reduce warnings generated by static analysis tools
It can be worth it. Sometimes a warning can hide a potential bug.
I'd say 'remove duplicated code' pretty much means you have to pull code out and abstract it so it can be used in multiple places. In theory, this makes bugs easier to fix, because you only have to fix one piece of code, as opposed to many, to fix a bug.
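A trivial sketch of what that looks like in practice (the classes and the validation rule are made up):

// Before: the same rule copy-pasted into two classes.
class SignupService
{
    public bool IsValidEmail(string email)
    {
        return !string.IsNullOrEmpty(email) && email.Contains("@");
    }
}

class InvoiceService
{
    public bool IsValidEmail(string email)
    {
        return !string.IsNullOrEmpty(email) && email.Contains("@");
    }
}

// After: one shared helper, so a future fix only has to be made in one place.
static class EmailRules
{
    public static bool IsValid(string email)
    {
        return !string.IsNullOrEmpty(email) && email.Contains("@");
    }
}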
Add unit tests to improve test coverage. Having good test coverage will allow you to refactor and improve functionality without fear.
There is a good book on this written by the author of CppUnit, Working Effectively with Legacy Code.
Adding tests to legacy code is certainly more challenging than creating them from scratch. The most useful concept I've taken away from the book is the notion of "seams", which Feathers defines as
"a place where you can alter behavior in your program without editing in that place."
Sometimes it's worth refactoring to create seams that will make future testing easier (or possible in the first place). The Google Testing Blog has several interesting posts on the subject, mostly revolving around the process of Dependency Injection.
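As a small illustration of a seam via constructor injection (the class names are made up, not from the book or the blog posts): the legacy class talked to the database directly, which made it untestable; extracting an interface and injecting it lets a test substitute a fake.

public interface ICustomerStore
{
    void Save(string name);
}

public class CustomerImporter
{
    private readonly ICustomerStore store;

    // The seam: tests pass in a fake store, production passes the real database-backed one.
    public CustomerImporter(ICustomerStore store)
    {
        this.store = store;
    }

    public int Import(string[] lines)
    {
        int imported = 0;
        foreach (string line in lines)
        {
            if (!string.IsNullOrWhiteSpace(line))
            {
                store.Save(line.Trim());
                imported++;
            }
        }
        return imported;
    }
}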
I can relate to this question, as I currently have one of 'those' old-school codebases in my lap. It's not really legacy, but it certainly hasn't followed the trends of the years.
I'll tell you the things I would love to fix in it as they bug me every day:
Document the input and output variables
Refactor the variable names so they actually mean something other than a Hungarian notation prefix followed by a three-letter acronym with some obscure meaning. CamelCase is the way to go.
I'm scared to death of changing any code as it will affect hundreds of clients that use the software and someone WILL notice even the most obscure side effect. Any repeatable regression tests would be a blessing since there are zero now.
The rest is really peanuts. These are the main problems with a legacy codebase, they really eat up tons of time.
I'd say it largely depends on what you want to do with the legacy code...
If it will indefinitely remain in maintenance mode and it's working fine, doing nothing at all is your best bet. "If it ain't broke, don't fix it."
If it's not working fine, removing the unused code and refactoring the duplicate code will make debugging a lot easier. However, I would only make these changes on the erring code.
If you plan on version 2.0, add unit tests and clean up the code you will bring forward.
Good documentation. As someone who has to maintain and extend legacy code, that is the number one problem. It's difficult, if not downright dangerous, to change code you don't understand. Even if you're lucky enough to be handed documented code, how sure are you that the documentation is right? That it covers all of the implicit knowledge of the original author? That it speaks to all of the "tricks" and edge cases?
Good documentation is what allows those other than the original author to understand, fix, and extend even bad code. I'll take hacked yet well-documented code that I can understand over perfect yet inscrutable code any day of the week.
The single biggest thing that I've done to the legacy code that I have to work with is to build a real API around it. It's a 1970's style COBOL API that I've built a .NET object model around, so that all the unsafe code is in one place, all of the translation between the API's native data types and .NET data types is in one place, the primary methods return and accept DataSets, and so on.
This was immensely difficult to do right, and there are still some defects in it that I know about. It's not terrifically efficient either, with all the marshalling that goes on. But on the other hand, I can build a DataGridView that round-trips data to a 15-year-old application which persists its data in Btrieve (!) in about half an hour, and it works. When customers come to me with projects, my estimates are in days and weeks rather than months and years.
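To make the shape of that concrete, here is a much-simplified sketch of the idea (the interface, record layout, and columns are invented for illustration; the real thing is far messier):

using System.Data;

// ILegacyOrderApi and the record layout are invented; the point is that all knowledge
// of the legacy data format lives in this one gateway class and nowhere else.
public interface ILegacyOrderApi
{
    string[] FetchOrderRecords(string customerId); // e.g. fixed-width records
}

public class OrderGateway
{
    private readonly ILegacyOrderApi legacyApi;

    public OrderGateway(ILegacyOrderApi legacyApi)
    {
        this.legacyApi = legacyApi;
    }

    public DataSet GetOrders(string customerId)
    {
        var table = new DataTable("Orders");
        table.Columns.Add("OrderId", typeof(string));
        table.Columns.Add("Amount", typeof(decimal));

        foreach (string record in legacyApi.FetchOrderRecords(customerId))
        {
            // Translate the fixed-width record into .NET types in one place.
            table.Rows.Add(record.Substring(0, 10).Trim(),
                           decimal.Parse(record.Substring(10, 12)));
        }

        var ds = new DataSet();
        ds.Tables.Add(table);
        return ds;
    }
}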
As a parallel to what Josh Segall said, I would say comment the hell out of it. I've worked on several very large legacy systems that got dumped in my lap, and I found the biggest problem was keeping track of what I already learned about a particular section of code. Once I started placing notes as I go, including "To Do" notes, I stopped re-figuring out what I already figured out. Then I could focus on how those code segments flow and interact.
I would say just leave it alone for the most part. If it's not broken then don't fix it. If it is broken then go ahead and fix and improve the portion of the code that is broken and its immediately surrounding code. You can use the pain of the bug or sorely missing feature to justify the effort and expense of improving that part.
I would not recommend any wholesale kind of rewrite, refactor, reformat, or putting in of unit tests that is not guided by actual business or end-user need.
If you do get the opportunity to fix something, then do it right (the chance of doing it right the first time might have already passed, but since you are touching that part again you might as well do it right this time around), and this includes all the items you mentioned.
So in summary, there's no single or just a few things that you should do. You should do it all but in small portions and in an opportunistic manner.
Late to the party, but the following may be worth doing where a function/method is used or referenced often:
Local variables often tend to be poorly named in legacy code (often owing to their scope expanding when a method is modified, and not being updated to reflect this). Renaming these in line with their actual purpose can help clarify legacy code.
Even just laying out the method slightly differently can work wonders - for instance, putting all the clauses of an if on one line.
There might be stale/confusing code comments there already. Remove them if they're not needed, or amend them if you absolutely have to. (Of course, I'm not advocating removal of useful comments, just those that are a hindrance.)
These might not have the massive headline impact you're looking for, but they are low risk, particularly if the code can't be unit tested.