What is statistical debugging? - debugging

What is statistical debugging? I haven't found a clear, concise explanation yet, but the term certainly sounds impressive.
Is it just a research topic, or is it being used somewhere, for actual development? In other words: Will it help me find bugs in my program?

I created statistical debugging, along with various wonderful collaborators across the years. I wish I'd noticed your question months ago! But if you are still curious, perhaps this late answer will be better than nothing.
At a very high level, statistical debugging is the idea of using statistical models of program success/failure to track down bugs. These statistical models expose relationships between specific program behaviors and eventual success or failure of a run. For example, suppose you notice that there's a particular branch in the program that sometimes goes left, sometimes right. And you also notice that runs where the branch goes left are fine, but runs where the branch goes right are 75% more likely to crash. So there's a statistical correlation here which may be worth investigating more closely. Statistical debugging formalizes and automates that process of finding program (mis)behaviors that correlate with failure, thereby guiding developers to the root causes of bugs.
Getting back to your original question:
Is it just a research topic, or is it being used somewhere, for actual development?
It is mostly a research topic, but it is out there in the "real" world in two ways:
The public deployment of the Cooperative Bug Isolation Project hunts for bugs in various Open Source programs running under Fedora Linux. You can download pre-instrumented packages and every time you use them you're feeding us data to help us find bugs.
Microsoft has released Holmes, an implementation of statistical debugging for .NET. It's nicely integrated into Visual Studio and should be a very easy way for you to use statistical debugging to help find your own bugs in your own code. I've worked closely with Microsoft Research on Holmes, and these are good smart people who know how to put out high-quality tools.
One warning to keep in mind: statistical debugging needs ample raw data to build good statistical models. In CBI's public deployment, that raw data comes from real end users. With Holmes, I think Microsoft assumes that the raw data will come from in-house automated unit tests and manual tests. What won't work is code with no runs at all, or with only failing runs but no successful counterexamples. Statistical debugging works off of the contrast between good and bad runs, so you need to feed it both. If you want bug-hunting tools without runs, then you'll need some sort of static analysis. I do research on that too, but that's not statistical debugging. :-)
I hope this helped and was not too long. I'm happy to answer any follow-up questions. Happy bug-hunting!

that's when you ship software saying "well, it probably works..."
;-)
EDIT: it's a research topic where machine learning and statistical clustering are used to try to find patterns in programs that are good predictors of bugs, to identify where more bugs are likely to hide.

It sounds like statistical sampling. When you buy a product, there's a good chance that not every single product coming off the "assembly line" has been checked for quality.
Statistical sampling calls for checking a certain percentage of products to almost ensure they're all problem-free. It minimizes the effort at the risk of some problems sneaking through and is absolutely necessary where the testing process is a destructive one - if you carry out destructive testing on 100% of your production line, that's not going to leave much for distribution :-)
To be honest, unless you're checking every single execution path and every single possible input value, you're already doing this in your testing. The amount of effort required to test everything for any but the most simplistic systems is not worth it. The extra cost would make your product a non-compete item.
Note that statistical sampling doesn't just involve testing every 100th unit. There are ways to target the sampling to improve the chance of catching problems. For example, if historical data suggests most errors are introduced at a specific phase, target that phase. If one of your developers is more problematic than others, check his stuff more closely.
From what I can see from a cursory glance at some research papers, statistical debugging is just that - targeting areas based on past history of problems.
I know we already do this for our software. Since any bugs that get fixed have to pass unit and system tests that replicate the problem (and our TDD says these tests should be written before trying to fix the bug), those tests are automatically added to the regression test suite so that those areas that cause more problems naturally are tested more often in the future.

Related

How to go about a large refactoring project? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
I am about to start planning a major refactoring of our codebase, and I would like to get some opinions and answers to some questions (I have seen quite a few discussions on similar topics, such as https://stackoverflow.com/questions/108141/how-do-i-work-effectively-with-very-messy-legacy-code, Strategy for large scale refactoring, but I have some specific questions (at the bottom):
We develop a complex application. There are some 25 developers working the codebase. Total man years put into the product to date are roughly 150.
The current codebase is a single project, built with ant. The high level goal of the project I'm embarking on is to modularize the codebase into its various infrastructures and applicative components.
There is currently no good separation between the various logical components, so it's clear that any modularization effort will need to include some API definitions and serious untangling to enable the separation.
Quality standards are low - there are almost no tests, and definitely no tests running as part of the build process.
Another very important point is that this project needs to take place in parallel to active product development and versions being shipped to customers.
Goals of project:
allow reuse of components across different projects
separate application from infrastructure, and allow them to evolve independently
improve testability (by creating APIs)
simplify developers' dev env (less code checked out and compiled)
My thoughts and questions:
What are your thoughts regarding the project's goals? Anything you would change?
do you have experience with such projects? What would some recommendations?
I'm very concerned with the lack of tests - hence lack of control for me to know that the refactoring process is not breaking anything as i go. This is a catch 22, because one of the goals of this project is to make our code more testable...
I was very influenced by Michael Feathers' Working Effectively With Legacy Code . According to it, a bottom up approach is the way to solve my problem - don't jump head first into the codebase and try to fix it, but rather start small by adding unit tests around new code for several months, and see how the code (and team) become much better, to an extent where abstractions will emerge, APIs will surface, etc, and essentially - the modularization will start happening by itself.
Does anyone have experience with such a direction?
As seen in many other questions on this topic - the main problem here is managerial disbelief. "how is testing class by class (and spending a lot of time doing so) gonna bring us to a stable system? It's a nice theory which doesn't work in real life". Any tips on selling this?
Well I guess it's better now than later but you've definitely got a task ahead of you. I was once in a team of three responsible for a refactoring a product of similar size. It was procedural code but I'll describe some of the issues we had that will similarly apply.
We started at the bottom and started easing into it by picking functions that should have been highly reusable but weren't. We'd write a bunch of unit tests on the existing code (none existed at all!), but before long, we faced our first big problem--the existing code had bugs that had been laying dormant.
Do we fix them? If we do, then we've gone beyond a refactoring. So we'd log an issue with the existing code hoping to get a fixed and freshly tested code base, but of course management decided there were more important priorities than fixing bugs that had never surfaced. Understandable.
So we thought we'd try fixing the bugs in our new code. Then we discovered that these bugs in the original code made other code work, so really were 'conceptual bugs' rather than 'functional bugs'. Well maybe. There were occasional intermittent spasms in the original software that had never been tracked down.
So then we changed tack and decided to keep the bugs in place, as a true refactoring should do. It's easy to unintentionally introduce bugs, it's far harder to do it intentionally!
The next problem was that the code was in such as mess that the initial unit tests we wrote had to substantially change to cater for the refactoring. In other words, two moving targets. Not good. Just writing the tests was taking ages and lost us belief in the worthiness of the project. It really was something you just wanted to walk away from.
We found in the end we really had to tone down the extent of the refactoring if we were going to finish this millennium, which meant the codebase we dreamed of wouldn't be achieved. We declared that the most feasible solution was just to clean and trim the code and at least make it conceptually easier to understand for future developers to modify.
The reduced benefits of the limited refactoring was deemed not worth the effort by management, and given that similar reliability issues were being found in the hardware platform (embedded project), the company decided it was their chance to renew the entire product, with the software written from scratch, new language, objects. It was only the extensive system test specs in place from the original product that meant this had a chance.
Clearly the absence of tests is going to make people nervous when you attempt to refactor the code. Where will anybody get any faith that your refactoring doesn't break the application? Most of the answers you'll get, I think, will be "this is gonna be very hard and not very successful", and this is largely because you are facing a huge manual task and no faith in the answer.
There are only two ways out.
Build a bunch of tests. Unfortunately, this will cost a lot of time and most managers don't see any value; after all, you've gotten along without them so far. Pointing back to the faith question won't help; you're still using a lot of time before anything useful happens. If they do let you build tests, you'll have the problem of evolving the tests as you refactor; they may not change functionality one bit, but as you build new APIs the tests will have to change to match the new APIs. That's additional work beyond refactoring the code base.
Automate the refactoring process. If you apply trustworthy automated transformations, you can argue (often unsuccessfully) that the refactored code preserves the original system function. The way to beat the unsucessful argument is to write those tests (see first method) and apply the refactoring process to the application and the tests; as the application changes structures, the tests have to change too. But they are just application code from the point of view of automated machinery.
Not a lot of people do the latter; where do you get the tools that can do such things?
In fact, such tools exist. They are called program transformation tools and are used to carry out massive transformations on code.
Think of these as tools for literally refactoring in the large; because of scale,
they tend not to be interactive.
It does take effort to configure them for the task at hand; you have to write custom rules to accomplish your custom desired result. You likely can't do this in a week, but this is a lot less work than manually modifying a large system. And you should consider that you have 150 man-years invested in the existing software; it took that long to make the mess. It seems reasonable that "some" effort small in comparison should be OK.
I only know of 3 such tools that have a chance of working on real code: TXL, Stratego/XT, and our tool, the DMS Software Reengineering Toolkit. The first two are academic products (although TXL has been used for commercial activities in the past); DMS is commercial.
DMS has been used for a wide variety of large-scale software anaysis and massive transformation tasks. One task was automated translation between languages for the B-2 Stealth Bomber. Another, much closer to your refactoring problem, was automated architecting of a large-scale component-based system C++ for componentts, from a legacy proprietary RTOS with its idiosyncratic rules about how components are organized, to CORBA/RT in which the component APIs had to be changed from ad hoc structures to CORBA-style facet and receptacle interfaces as well as using CORBA/RT services in place of the legacy RTOS services. (These tasks were both done with 1-2 man-years of actual effort, by pretty smart and DMS-savvy guys).
There's still the test-construction problem (Both of these examples above had great system tests already).. Here I'm going to go out on a limb. I believe there is hope in getting such tools to automate test generation by instrumenting running code to collect function input-output results. We've built all kinds of instrumentation for source code (obviously you have to compile it after instrumentation) and think we know how to do this. YMMV.
There is something you do which is considerably less ambitious: identify the reusable parts of the code, by finding out what has been reused in the code. Most software systems contain a lot of cloned code (our experience is 10-20% [and I'm surprised by the PHP report of smaller numbers in another answer; I suspect they are using a weak clone detector). Cloned code is a hint of a missing abstraction in the application software. If you can find the clones and see how they vary, you can pretty easily see how to abstract them into functions (or whatever) to make them explicit and reusable.
Salion Inc. did clone detection and abstraction. The paper doesn't explore the abstraction activity; what Salion actually did was a periodic review of the detected clones, and manual remediation of the egregrious ones or those that made sense into (often library) methods. The net result was the code base actually shrank in size and the programmers became more effective because they had better ("more reusable") libraries.
They used our CloneDR, a tool for finding clones by using the program syntax as a guide. CloneDR finds exact clones and near misses (replacement of identifiers or statements) and provides a specific list of clone locatons and clone paramerizations, regardless of layout and comments. You can see clone reports for a number of languages at the link. (I'm the originator and author of CloneDR among my many hats).
Regarding the "small clone percentage" for the PHP project discussed in another answer: I don't know what was being used for a clone detector. The only clone detector focused on PHP that I know is PHPCPD, which IMHO is a terrible clone detector; it only finds exact clones if I understand the claimed implementation. See the PHP example at our site for comparative purposes.
This is exactly what we've been doing for web2project for the past couple years.. we forked from an existing system (dotproject) that had terrible metrics like high cyclomatic complexity (low: 17, avg: 27, high: 195M), lots of duplicate code (8% of overall code), and zero tests.
Since the split, we've reduced duplicate code (2.1% overall), reduced the total code (200kloc to 155kloc), added nearly 500 unit tests, and improved cyclomatic complexity (low: 1, avg: 11, high: 145M). Yes, we still have a ways to go.
Our strategy is detailed in my slides here:
http://caseysoftware.com/blog/phpbenelux-2011-recap - Project Triage & Recovery; and here:
http://www.phparch.com/2010/11/codeworks-2010-slides/ - Unit Testing Strategies; and in various posts like this one:
http://caseysoftware.com/blog/technical-debt-doesn039t-disappear
And just to warn you.. it's not fun at first. It can be fun and satisfying once your metrics start improving but that takes a while.
Good luck.

What do you think about the omnipresent "Test, Test, Test!" principle?

In the old days programming used to involve less guesswork. I would write some lines of code and be 100% certain about what the code does and what it does not at a glance. Errors were mostly typos, but not about the functionality.
The last years I believe there is a trend for this "trial-and-error" programming : write the code (as if in draft), and then debug iteratively until the program's behavior appears to comply with the requirements. Test, and test again, and then again.
Funny thing is, in my Visual Studio the "Run" button has been replaced by a button labelled "Debug" (= I know you have some bugs!). I have to admit that in several apps that I write I cannot guarantee a bug-free code.
What do you think ? Or maybe our systems are now overly complicated (browser/OS/Service Pack compatibilities, etc etc) and this justifies testing on all types of environments.
I've experienced the opposite, actually. Whereas it used to be a case of running until it worked, I now unit test until the tests pass... and this seems to be at least a reasonably common transition, as far as I can see.
I have to say that code which worked first time with only typos has never been the norm in my experience. The difference is that now I can find the problems much more quickly, and also spot if old problems come back. I can sometimes manage pretty short and simple bits of code with no errors (and posting on Stack Overflow has improved that ability) but large, complex systems? Heck no.
To answer the title of your post - the "test, test, test" principle is a good one, in my view... but I don't associate that with running the whole program repeatedly. I associate it with running unit tests frequently. I rarely need to use the debugger for unit tests - usually a failure makes the cause suitably obvious by inspection, because only a small amount of code is being tested.
The one word answer is "Complexity". The real answer is "Unnecessary Complexity"!
The accounting principles has not changed for the past 30 years. Why then is writing an accounting system is so much more difficult today? It is good to have a Graphic User Interface but do we have to go overboard?
Software development has been caught in a vicious circle for many years. The complexity is feeding itself and instead of reducing it we simply hide it under layers and layers of wrappers. Eventually something is going to give.
When we favor form over function, we have to pay the price.
Could it be that in later years developers have come to the realization that the "100% certainty" might not actually be correct? Developing software is very complex, and even though the tools have evolved over the years, so has our realization that writing good code is hard. True, debugging and automated unit tests have made us more productive, but we still produce bugs, just as we did back then, only now we have different tools to catch them with.
You may write code that you think you know 100% what it does and does not do, but there is always that edge case that you haven't thought of or the odd exception thrown that you don't expect. Some times trial-and-error programming can be a helpful tool to narrow down a problem, with the debuggers help.
Its important to know what tools are available to you to help produce code with minimal bugs.
I have found that the Test-Test approach helps me design the code. Sometimes the work that has to be done is too complex to do it all in one. Testing forces me to split it into smaller parts and as I solve these I am able to put them together into a larger whole.
I think the advantage comes in an indirect way: When you embrace tests and unit tests, you have to write your application in such a way that you can actually write tests:
Classes need to be written in such a way that you can instantiate a single object without the whole application and OS around it, but just a few helper objects. This means you need to minimize the dependencies, and make all communication to the surrounding system explicit.
Implementing the test cases means that you have to find a minimum sequence of commands and calls that makes your class do something meaningful. This often points to awkward design decisions, or shows you that classes are very difficult to use for certain purposes.
All in all, when you embrace tests, you end up with a system that has a minimum of interdependencies between its components, and the test cases serve as documentation of how to use your components.
Testing (executing your system) tells you something about "the presence of bugs but NOT about the absence of them" (afaik this term is coinced by dijkstra). It points to the direction that the strength of your test-suite is the key of testing: "You have so many test cases, that you can say, that many bugs do not exist. This implies that big parts of your software work as expected".
Some examples for having a strong/mighty test-suite:
A lot of code is executed by your unit tests (the traditional coverage term)
You have no false-negative tests (test which show green but in fact should be red). False negative tests are evil, because they give you a wrong sense of test-case quality. For details of good test-asserts and false-negatives see also blog-entry#1 and blog-entry#2.
The requirements are well understood (I have seen a lot of cases where an automated test was testing the wrong thing and the developer misunderstood the requirement from business). For the developer is was green, but for business the system was not working as expected (another kind of false-negative example but on a higher level).
In a sense the correctness of a program is only proven, when it is done with mathematical proofs (which only pays off for life-critical and money-intense systems). Still you can achieve a lot with automated testings (apart from unit-testing, automated integration testing always helped a lot).
Regarding debugging: I use debugging to as often as I used to be, but sometimes when adding new functionality to code (my new test-case shows green) I break other test-cases. By the assert I instantly see that something went wrong, but still didn't locate the bug. For locating the bug debugging is still helpful (with the red test-case I execute the problematic code-paths, with the debugger I locate the bug).
If you're interested in test-automation have a look at masterpiece xUnit Test patterns.
I've read one book ("TDD by example" by Kent Beck) which indeed seems to take that "trial and error" approach to an extreme: but it's more like "make the unit tests work". Still, I couldn't get myself to finish this book - a rare occurence, especially since I really hoped to get a better understanding. Still, committing obviously imbecile code to be improved later makes me shiver.
Science: Automated tests have their advantages. However, they are not the silver bullet they are claimed to be. No single test method is sufficient to findenough defects, and other methods have a better detection rate.
Gut feel: Our problems are facets of ever-increasing complexity. Complexity highly correlates with the amount of code we have to manage. In this light, TDD attempts to solve the problems of to much code by writing even more code.
Advantages: We now have an established formalism to make testing repeatable, accountable and immediately documented. It is definitely a way out of the "works on my machine" and "strange, it worked yesterday, I'll give you the latest DLL" trap.
I currently practice Test Driven Development (TDD), or at least write many unit tests to verify that most/all of my code behaves the way I expect it to behave. Taking this approach forces me to look at my program from the perspective of the consumer. Also, as I write tests, I often think of boundary limits, additional scenarios that I didn't originally envision, etc.
I've now come to the point where I'm afraid to make changes to older programs, as I'm afraid that I'll break something. Regression testing is onerous, compared with running a suite of unit tests.

TDD as a defect-reduction strategy

Can TDD be successful as a defect-reduction strategy without incorporating guidance on test case construction and evaluation?
IMO, my answer would be no. For TDD to be effective, there has to be guidelines around what is a test and what it means to have something be reasonably tested. Without a guideline, there may be some developers that end up with tons of defects because their initial tests cover a very small set of inputs,e.g. only the valid ones, which can cause the idea of using TDD to become worthless.
Test driven development can reduce defects in a QA cycle simply because testing allows developers to find defects prior to releasing their code to the QA team.
But without guidance on how to test you really won't be able to create any kind of long-term benefit since haphazard testing will only prevent defects by blind luck. Good tests based on good guidance can go a long way towards reducing defects.
if you don't have tests to reproduce defects, how do you know that "defect reduction" has taken place?
of course you do have tests - they're just manual, and thus tedious and time-consuming to reproduce...
Here's a study (warning: link to PDF file) done by microsoft on some of their internal teams.
A quote from it:
The results of the case studies indicate that the pre-release defect density of the four products decreased between 40% and 90% relative to similar projects that did not use the TDD practice. Subjectively, the teams experienced a 15–35% increase in initial development time after adopting TDD
That's the only actual empirical study done on TDD/Unit testing that I'm aware of, but there are plenty of people (including myself) that will anecdotally tell you that TDD (and unit testing in general) will definitely provide an increase in the quality of your code.
From my own experience, there is definitely a reduction in the number of defects, but the numbers feel like they would be far less than even the 40% from the Microsoft study; This is (again, based solely on what I've seen) primarily because most corporate developers have little to no experience with Unit Testing (let alone TDD), and will invariably do a bad job of it while they are learning. Actually learning how to do TDD well requires at least a solid year of experience, and I've never worked in (or even seen) a team which actually had a full complement of developers with that experience.
You may want to pickup a copy of Gerard Meszaros' xUnit Test Patterns. Specifically, Chapter 5 might apply most directly to your question where it covers the Principles of Test Automation. Some of those principles that I think apply to your scenario where there needs to be some sort of guidance, common interest, or some sort of implied do this or fear the wrath of :
Principle: Communicate Intent
Tests need to be easy to maintain, readily apparent what the test is doing.
Principle: Keep Tests Independent
Small tests that cover one small piece. Tests should not interfere with each other.
Principle: Minimize Test Overlap
Need to design tests that cover a specific piece, and do not create tests that exercise the same paths repeatedly.
Principle: Verify One Condition Per Test
This one seemed simple enough for me, but is one of the most challenging in my experiences for people to grasp. I may write tests that have some multiple asserts, but I try to keep all those together around the specific area. When it comes to hunting down failures and other test issues, it is MUCH easier to fiddle with a single test that is testing a specific path instead of several different paths all clumped into a single test method.
Further relating to my experiences, if we want to talk about the corporate developer, I have seen some folks that are interested and take the initiative to learn something new, but more often than not, you have folks that like to go with the flow, and like to have things laid out for them. Without some sort of direction, be it a mandate from a senior engineer-type, or some sort of joint-team discussions (see Practices of an Agile Developer for some ideas such as lunch time meetings once a week), I think your chance of success would be limited.
In a team situation, where your code is likely to be used by someone else, the tests have a fringe benefit that can reduce defects without necessarily even improving anyone's code.
Where documentation is poor (which during development is "often"), the tests act a crib for how you expect your code to be called. So, even in cases where the code is really very fragile, TDD can still reduce the number of defects raised against the end-product -- by ensuring your colleagues can see well-written tests before they can use your code, you've ensured they know how you intend your code to be used before they call it. They are thus less-likely to call your code in an unexpected sequence / without having configured something you expected (but forgot to write a check for) as a prerequisite. Thus they are less likely to trigger the failure condition, and you are less likely to see them or the (human) test team raising a defect because something crashed.
Of course, whether you see that "there's a hidden bug in there, it's just not being called yet" as a problem itself is another good question.

Software Rewrite-vs-Running Cost Analysis

The IT department I work in as a programmer revolves around a 30+ year old code base (Fortran and C). The code is in a poor condition partially as a result of 30+ years of ad-hoc poorly thought out changes but I also suspect a lot of it has to do with the capabilities of the programmers who made the changes (and who incidentally are still around).
The business that depends on the software operates 363 days a year and 20 hours a day. Unfortunately there are numerous outages. This is the first place I have worked where there are developers on call to apply operational code fixes to production systems. When I was first, there was actually a copy of the source code and development tools on the production servers so that on the fly changes could be applied; thankfully that practice has now been stopped.
I have hinted a couple of times to management that the costs of the downtime, having developers on call, extra operational staff, unsatisifed customers etc. are costing the business a lot more in the medium, and possibly even short term, than it would to launch a whole hearted effort to re-write/refactor/replace the whole thing (the code base is about 300k lines).
Ideally they'd be some external consultancy that could come in and run the rule over the quality of the code and the costs involved to keep it running vs rewrite/refactor/replace it. The question I have is how should a business go about doing that kind of cost analysis on software AND be able to have confidence in that analysis? The first IT consultants down the street may claim to be able to do the analysis but how could management be made to feel comfortable with it over what they are being told by internal staff?
We recently decided to completely rewrite large portions of our business code from scratch, and it has not gone as well as we had hoped. I've seen a lot of quotes saying you should never try to rewrite anything from scratch, and now I see why. I would recommend starting small - don't try to rewrite the whole thing at once. Identify the large problem areas and focus on refactoring small portions of the system at a time. Since there is 30+ years worth of work in the system, it will take a long time to get it back to a reasonable state. We had about 5-8 years worth of work to rewrite, and it has been difficult. I can't imagine 30+ years of work!
First, the profile of the consultant you need is very specific. Unless you can find someone who worked in a similar domain with the same languages, don't hire him.
Second, there's a 99% probability (I like dramatic numbers) the analysis will go as follow:
Consultant explores the application
Consultant does understand 10% of the application
Time's up, time for the report
Consultant advices a complete rewrite (no refactoring, plain rewrite)
So you may as well make the economy of what the consultant will cost.
You have only two solutions here:
Keep with the actual source code but determine proper methods to fix problems so that you have a very long run refactoring that is progressly made by those who know the application
Get a secondary team to make a new application to replace the old one
If I talk about a secondary team, it's because you cannot bring just one architect to make the new application and have the old team working with him:
They're too busy on the old application
There will be frictions because the newcomer will undoubtedly underestimate the task at hand
I talk from experience, believe me.
If you go the "new application" way don't put your hopes too high. You'll end up with an application that has less than half the functionalities of the current one, simply because you cannot cram 30+ years of special case and exceptional situation fixes into a freshly design software.
Oh, also, if your developers happen to tell you they have a plan, by all means, hear them out. They most probably know what they are talking about.
The first thing that comes to mind is that you are prematurely addressing the rewrite/refactor/replace argument. The first step two steps I would recommend would be:
Unit tests
QA
It's well within engineering scope to implement these. Unit tests are an essential preliminary step before any reasonable refactor or rewrite could possibly take place. By 'unit test' I mean wrap each function call with corresponding code that proves the code works for all known conditions. In complex retrofits this may not actually happen at the most granular level but any automated tests will help immensely.
And QA - have an independent (and aggressive) quality assurance team that rigorously tests beta releases before production. Their test plans and test procedures become essential for any kind of replacement effort.
Once you've got the code under control, then you are in a position where the business can reasonably consider massive changes.
Just a note about your comment about external consultants - no consultancy will ever care enough about the code to provide realistic quality assurance. QA ends up being married to the hip of business defending the company bottom line. It's an internal function ultimately and an external consultant can't provide much more than getting you started really.
I think that your description provides all of the necessary information on code quality (lack thereof). The fact that so many support resources are required also indicates the high costs involved with maintaining the existing system.
As I answered here, a good approach to consider is refactoring one piece of the system at a time until everything works at an acceptable level. I agree with Joel re not throwing away existing code (see Things You Should Never Do. Parts of your code work, so you should leave those in place whenever possible, and focus on the sections that lead to downtime.
Andy also makes a great point about starting small as well.
Another thing to try, is reviewing the processes around the system. When you do this, you should try to determine what failure situations are caused directly or indirectly by user action?, are there configuration or environment problems? If you are having trouble fixing the code directly, then you can still prop it up by dealing with external issues more effectively.
Read the book Working Effectively with Legacy Code (also see the short PDF version) and surround the code with automated tests, as instructed in that book.
Refactor the system little by little. If you rewrite some parts of the code, do it a small subsystem at a time. Don't try to make a Grand Redesign.
The code has been around for 30 years?
Development paradigms have shifted substantially in the last three decades in many ways, and most relevant to your predicament, I feel, is in terms of the amount of time (in man days) required to create something to input->process->output something.
300,000 lines of code 30 years ago, could probably fit into 100,000 lines or less today, and expending fewer man hours(?) This could seem optimistic/ridiculous to some, but on the other hand is achievable, depending on the type of application in question. You have given no indication as to the classification of system - is it a real-time manufacturing process control system of sorts with sensors and actuators tied to it? An airline booking system ? Does it post-process some backlog of data? In other words could it be rebuilt in something like Java and quickly with an agressive, smallish team? Have the requirements been documented, and if so do they need updating or redeveloping from scratch? Is human safety a factor?
Just a quick sanity check, I think whether or not you should rebuild depends on (any order means the same thing):
Number of code dudes required.
Level of expertise of said dudes.
Which languages do not fit.
Which languages do fit.
How much it costs to use chosen language(s) them in terms of hardware and software.
How much does the business depend on this to stay alive.
Is it really too much downtime, or are you just nitpicking? (maybe they really don't care, but pretend to).
Good luck with that!

Debugging is a bad smell - how to persuade them?

I've been working on a project that can't be described as 'small' anymore (40+ months), with a team that can't be defined as 'small' anymore (~30 people). We've been using Agile/Scrum (1) practices all along, and a healthy dose of TDD.
I'm not sure if I picked this up from Agile or TDD, more likely a combination of the two, but I'm now clearly in the camp of people that looks at debugging as a bad smell. By 'debugging' I'm not referring to the more abstract concept of figuring out what might be wrong with the system, but the specific activity of running the system in Debug mode, stepping through the code to figure out details that are otherwise inscrutable.
Since I'm fairly convinced, this question is not about whether debugging is a bad smell or not. Rather, I'd like to know how I can persuade my team-mates about this.
People that believe debugging mode is the 'standard' mode tend to write code that can be understood only by debugging through it, which leads to a lot of time wasted since every time you work an item on top of code developed by someone else, you get to first spend a considerable amount of time debugging it (and, since there's no bug involved.. the term is becoming increasingly ridiculous) - and then silos happen. So I'd love to convince a few of my team-mates that avoiding debug mode is a Good Thing (2). Since they are used to live in Debug mode, however, they don't seem to see the problem; to them, spending hours debugging someone else code before they even start doing anything related to their new item is the norm; they don't see anything wrong with it. Plus, as they spend time 'figuring it out' they know eventually the developer that worked that area will become available and the item will be passed on to them (leading to yet another silo).
Help me come up with a plan to turn them from the Dark Side !
Thanks in advance.
(1) Also referred to as SCRUM (all caps). Capitalization arguments aside, I think an asterisk after the term must be used since - unsurprisingly - our organization 'tweaked' the Agile and Scrum process to fit the perceived needs of all stakeholders involved. So, in all honesty, I won't pretend this has been 100% according to theory, but that's beside the point of my question.
(2) Yes, there will always be times when we'll have to get in debug mode, I'm not trying to absolutely avoid it, just.. trying to minimize the number of times we have to dive into it.
If you want to persuade your coworkers that your programming practices are better, first demonstrate by your productiveness that you are more effective than they are, at least for some tasks. Then they'll believe you when you explain how you get so much done.
It's also sometimes easier to focus on something concrete. Do your coworkers even talk in terms of "code smell"? Perhaps you could focus on specifics like "When the ABC module fails, it takes forever to debug it; it's much faster to use technique XYZ. Here, let me demonstrate." Then afterwards you can mention your basic principle, which is yeah the debugger is a useful tool, but there's usually other more useful ones.
This is a cross-post, because the first time around it was more of an aside on someone else's answer to a different question. To this question it's a direct answer.
Debugging degrades the quality code of
the code we produce because it allows
us to get away with a lower level of
preparation and less mental
discipline. I learnt this from an
accidental controlled experiment in
early 2000, which I now relate:
I took on a contract as a Delphi
coder, and the first task assigned was
to write a template engine
conceptually similar to a reporting
engine - using Java, a language with
which I was unfamiliar.
Bizarrely, the employer was quite
happy to pay me contract rates to
spend months becoming proficient with
a new language, but wouldn't pay for
books or debuggers. I was told to
download the compiler and learn using
online resources (Java Trails were
pretty good).
The golden rule of arts and sciences
is that whoever has the gold makes the
rules, so I proceeded as instructed. I
got my editor macros rigged up so I
could launch the Java compiler on the
current edit buffer with a single
keystroke, I found syntax-colouring
definitions for my editor and I used
regexes to parse the compiler output
and put my cursor on the reported
location of compile errors. When the
dust settled, I had a little IDE with
everything but a debugger.
To trace my code I used the good old
fashioned technique of inserting
writes to the console that logged
position in the code and the state of
any variables I cared to inspect. It
was crude, it was time-consuming, it
had to be pulled out once the code
worked and it sometimes had confusing
side-effects (eg forcing
initialisation earlier than it might
otherwise have occurred resulting in
code that only works while the trace
is present).
Under these conditions my class
methods got shorter and more and more
sharply defined, until typically they
did exactly one very well defined
operation. They also tended to be
specifically designed for easy
testing, with simple and completely
deterministic output so I could test
them independently.
The long and the short of it is that
when debugging is more painful than
designing, the path of least
resistance is better design.
What turned this from an observation
to a certainty was the success of the
project. Suddenly there was budget and
I had a "proper" IDE with an
integrated debugger. Over the course
of the next two weeks I noticed a
reversion to prior habits, with
"sketch" code made to work by
iterative refinement in the debugger.
Having noticed this I recreated some
earlier work using a debugger in place
of thoughtful design. Interestingly,
taking away the debugger slowed
development only slightly, and the
finished code was vastly better
quality particularly from a
maintenance perspective.
Don't get me wrong: there is a place
for debuggers. Personally, I think
that place is in the hands of the team
leader, to be brought out in times of
dire need to figure out a mystery, and
then taken away again before people
lose their discipline.
People won't want to ask for it
because that would be an admission of
weakness in front of their peers, and
the act of explaining the need and the
surrounding context may well induce
peer insights that solve the problem -
or even better designs free from the
problem.
So, FOR, I not only agree with your position, I have real data from a controlled experiment to support it. It is, however, a rather small sample. More elaborate tests are required before my conclusions are supportable.
Why don't you take what I've said to your team and suggest trials. You have more data than they do (I just gave it to you) and in order to have a credible basis for disagreeing with you they basically have to test the idea, and the only way to do that is to give your idea a go.
You should be ready for it to all fall apart, though, because the whole thing is predicated on the assumption that the developers have the talent and experience to rise to the challenge of stronger design in the absence of step-through debugging.
Step-through debugging was created to make debugging easier. The direct effect of lowering the bar is that people with less talent can participate - if you build a tool that even jackasses can use, you will get jackasses using it -- a lot of them, if the newly accessible activity is well-remunerated.
This causes an exodus of people with talent because they generally use that talent to do rare and precious things in order to be well paid without working too hard, and the market doesn't want to pay for excellence because it cannot distinguish talent well enough to know when paying for it is justified.
Another thought: more recent work with problems on production servers, where it was impossible to install a debugger, has shown the importance of having a codebase for which maintenance doesn't depend on the availability of a debugger. Code that's grown in the absence of debuggers is much less hassle. Choose not to use them when you can change your mind, and then when you can't change your mind it won't be so awful.
Since I'm fairly convinced, this question is not about whether debugging is a bad smell or not.
Well, your local Church might be more appropriate place for your question then.
That aside, convince them by arguments. You might want to reconsider your fundamentalist stance, however, because this is the very opposite of persuasive. One thing you might want to do is drop the term “debugging” in your whole discussion and replace it by “stepping through the code” or the likes, emphasizing that you oppose the uninformend guesswork/patchwork practice of probing that you condemn rather than an informed reflection about the code.
(I would still disagree with you, but that's besides the point since you didn't want a discussion.)
I think the real problem here is
People that believe debugging mode is
the 'standard' mode tend to write code
that can be understood only by
stepping through it
This, if true, should be self evidently wrong and there should be no need to discuss it. If it's not evident it's because they don't see how the badly written code could be improved. Show them, do code reviews where you show how that code could be refactored in a way that is clear without stepping through it.
Code stepping will automatically diminish once better code is written, it just doesn't work the other way around. People will still write bad code and if they avoid stepping through it that will only lead to more wasted time (damn I wish I could step through this spaghetti mess), not to better code.
There is something wrong here, but it's hard to put my finger on it. Perhaps the real issue is that the code has other smells that make it difficult to readily understand. I agree that with TDD one ought to use the debugger less rather than more, since you'll be developing the code in small increments. But, if you can't look at the code and understand it, perhaps it's because the design is too coupled -- there are too many interrelated classes required to make things work.
If the code really needs to be so complex that observation won't suffice, then maybe you need to invest in some good commenting, explaining what is happening -- though I would prefer to see things refactored to the point where comments are not needed. My suspicion is that the debugger may be a symptom rather than the problem.
I know that for me, switching from traditional, code-first development to test-first development has resulted in less time spent debugging...and it's not something I miss. Typically I'll only involve the debugger when its not obvious why the code I just wrote to pass a test, didn't.
This is going to sound like the argument you said you don't want to have, but I think if you want to convince your teammates, you're going to have to make a stronger case. I don't understand your objection. I frequently step through code I'm trying to understand with the debugger. It's a great way to see what's going on. You have not established your claim that people who use the debugger in this way tend to write code which is otherwise difficult to understand. The only convincing way to do so would be through some kind of case/control study which tried to measure and compare the readability of code written by people with varying approaches to the debugger. And you have not even told a plausible story explaining why you think using a tool to understand code execution tends to lead to sloppier code construction. For me it's a complete non sequitur.
A "plan" to convince them of the advantage of another approach is by establishing metrics linked to the number of time you debug the same function for different bugs.
By analysis the trend of that metric, you may convince them that non-regression tests are more useful to spend time writing, and will help them to debug more efficiently.
That way, you do not write completely off the "debug" habit, but you convince them of establishing a solid set of test, allowing them to focus on really useful debug session, if needed.
Should you consider this course of action (metrics), you should know its implementation involves the all hierarchy (stakeholder, project manager, architect, developers). They all need to be implicated in those metrics in order to act on them.
Regarding developers, you could try to suggest:
some new ways of closing a bug case (close it only with the test scenario played to reproduce that bug, meaning they need an independent test in order to, if needed, launch their debug session)
a clear relationship between those metrics and their evaluation by the management (it would be a bad practice to debug over and over the same function)
a larger involvement in architectural decisions: sometimes, knowing some functional or applicative features rather than just classes and code can incite a developer to think more in term of black-box test rather than white-box (which can more easily lead to debug session)
a participation into "operational architecture" process (where you need to deploy your app, and make full front-to-back integration test). Again, a larger picture of the all system can help a developer to get more interested in features rather than 'lines of code'
I think a better phrasing of this question would be "Is non-TDD a code smell?" TDD seems to lead to less time spent in the debugger due to more time spent writing/failing/passing tests. Without TDD, you are more likely to spend time in the debugger to diagnose errors.
At least within Visual Studio, using the debugger is not that painful, so the challenge for you would be to explain to your teammates how TDD would make their development more enjoyable, productive and successful. Just avoiding the debugger is probably not reason enough for a team to switch their development methodology.
Right on roadwarrior.
debugging isn't the problem, it's poorly commented and or documented code and bad archetecture. I work on a smaller team but when a bug does surface, I do step through the code. frequently it's a very small job because the app is well planned out and the doc's on the code are clear.
That said lets get to my point. Want the team to not debug... comment, comment comment. Nothing beats down the urge to debug faster. Sure they'll still do it, but they'll be more likely to step over well documented code.
Oh and though it should go without saying, I'll do it anyway. don't have bugs in your code. :)
I agree with those above who expressed the relative irrelevance of this "debugger issue."
IMO, the 2 most important goals of a developer are:
1) Make the software do what it's supposed to do.
2) Write the code so that a maintenance developer 2 years down the road enjoys the experience of changing existing or adding new features.
Before you make a plan, you should decide how important this change is to you. Although I agree that debugging is a smell, it is also a very well accepted and ingrained practice for developers, so convincing them that they should stop doing it won't be easy or quick - and for good reasons. How much energy do you want to put into this topic?
Second, why do you want to persuade them in the first place? If your motivation is to help them, is it really their top priority problem? When you help people in ways they want to be helped, change becomes easy.
Once you have decided that you want to go on with your change initiative, you need to take into account that different people are convinced by different things. Some people will already be convinced by trying something new and exciting. Some will be convinced by numbers (metrics). Some by getting told about it while eating their favorite type of cookie (seriously!), some by hearing about it from their favorite guru. Some by reading about it in a magazine. Some by seeing that "everyone else is doing it, too". Etc. pp.
There is an insightful interview with Linda Rising on this topic at InfoQ: http://www.infoq.com/interviews/Linda-Rising-Fearless-Change. She can say it much better than me. The book is quite good, too.
Whatever you do, don't press too much, but also don't give up. Change can happen - especially if you take resistance as a resource -, and sometimes it happens at unexpected times, so always keep a sense of wonder.
#FOR : You have a second problem too, here it is :
sadly it doesn't seem the devs are interested in being more productive (they get paid the same anyway)
How do you intend to make them want to be more productive when there is nothing (visible) for them to gain?
Designing software by debugging is a good practice.
The number of environments supporting this way of developing is very small: the best known is Smalltalk. In Smalltalk, you can write a test describing your objects protocol without the methods being implemented. Running this test will then trigger the debugger, and you can add the method to the right class in the debugger, and can continue stepping through the code until all functionality is implemented and the test is green.
This needs a compiler to be available at run-time, and first-class invocations. It offers a very short feedback cycle, and is one of the primary reasons for Smalltalks' productivity

Resources