When software problems reported are not really software problems - error-reporting

Apologies if this has already been covered or you think it really belongs on wiki.
I am a software developer at a company that manufactures microarray printing machines for the biosciences industry. I am primarily involved in interfacing with various bits of hardware (pneumatics, hydraulics, stepper motors, sensors etc) via GUI development in C++ to aspirate and print samples onto microarray slides.
On joining the company I noticed that whenever there was a hardware-related problem the whole setup would freeze, with nobody any the wiser as to the specific cause: hardware, software, misuse, etc. Since then I have improved things somewhat by introducing software timeouts and exception handling to better identify and deal with any hardware-related problems that arise, e.g. PLC commands not completing successfully, inappropriate FPGA response commands, and various other deadlock-type conditions. In addition, the software now logs a summary of the specific problem, informs the user and exits the thread gracefully. This software is not embedded; it just interfaces with the hardware over serial ports.
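As a rough sketch of the timeout-and-exception approach described above (all names here, such as PlcTimeoutError and the "M71" command, are hypothetical and purely illustrative):

```cpp
#include <chrono>
#include <iostream>
#include <stdexcept>
#include <string>

// Illustrative only: wrap a PLC command in a timeout so a silent hardware
// failure surfaces as a specific, loggable error instead of freezing the GUI.
class PlcTimeoutError : public std::runtime_error {
public:
    explicit PlcTimeoutError(const std::string& cmd)
        : std::runtime_error("PLC command timed out: " + cmd) {}
};

bool waitForAck(const std::string& cmd, std::chrono::milliseconds timeout) {
    // Stand-in for polling the serial port for the PLC's acknowledgement.
    auto deadline = std::chrono::steady_clock::now() + timeout;
    while (std::chrono::steady_clock::now() < deadline) {
        // if (serialPort.ackReceived(cmd)) return true;   // real I/O goes here
    }
    return false;  // no acknowledgement before the deadline
}

void runPlcCommand(const std::string& cmd) {
    // sendOverSerial(cmd);                                // real I/O goes here
    if (!waitForAck(cmd, std::chrono::milliseconds(2000)))
        throw PlcTimeoutError(cmd);  // fail with a specific diagnosis
}

int main() {
    try {
        runPlcCommand("M71");
    } catch (const PlcTimeoutError& e) {
        std::cerr << "[HARDWARE] " << e.what() << '\n';  // log a summary,
        // inform the user, and let the worker thread exit gracefully
    }
}
```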
In spite of what has been achieved, the non-software guys still do not fully appreciate that in these cases the 'software' problem they are reporting to me is not really a software problem: the software is reporting a problem, not causing it. Don't get me wrong, there is nothing I enjoy more than coming down on software bugs like a ton of bricks and looking for ways to improve robustness. I know the system well enough now that I almost have a sixth sense for these things.
No matter how many times I try to explain this, nothing really penetrates. They still report what are essentially hardware problems (which eventually get fixed) as software ones.
I would like to hear from any others that have endured similar finger-pointing experiences and what methods they used to deal with them.
UPDATE
Some great responses here that pretty much sing from the same hymn sheet: be more descriptive. I guess identifying the failing command and bombing out cleanly when the hardware fails was the first stage, but it was still not quite enough. The next stage will be to map what are, to the layman, fairly meaningless PLC commands to something more suggestive. "PLC Command M71 timeout" becomes "Failure to initialize syringe system. Check adequate vacuum reached", and so on...
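A minimal sketch of that mapping, assuming a lookup table (the codes and wording below are invented for illustration):

```cpp
#include <iostream>
#include <map>
#include <string>

// Hypothetical table from raw PLC error identifiers to operator-friendly text.
const std::map<std::string, std::string> kPlcErrorText = {
    {"M71", "Failure to initialize syringe system. Check adequate vacuum reached."},
    {"M72", "Wash station did not reach position. Check for obstructions."},
};

std::string describePlcError(const std::string& code) {
    auto it = kPlcErrorText.find(code);
    if (it != kPlcErrorText.end())
        return it->second;
    // Fall back to the raw code so nothing is silently swallowed.
    return "PLC command " + code + " failed (no description available).";
}

int main() {
    std::cout << describePlcError("M71") << '\n';  // friendly text
    std::cout << describePlcError("M99") << '\n';  // unmapped code still reported
}
```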

Perhaps when reporting the problem either as a message to the user or an entry in the log file you need to make it explicitly clear that it's the hardware that's at fault:
"Stepper motor not responding".
Unfortunately, because it's the software that people see and interact with they assume that the software is all that there is.

You could try labeling the error messages as "HARDWARE PROBLEM". Might get your point across.

There's no such thing as a non-software problem in a system. Software is the boss, and the boss cannot blame the tools for a failure.
If the underlying hardware is malfunctioning, the software should report to the user exactly what went wrong and with which component. If it doesn't, that is a software problem.
For example, a TCP disconnection means the software has to reconnect. If it's a bad FPGA response, the software should tell the user exactly what the inputs and outputs were, and which component is to blame. If it doesn't, this is a software problem.

"If what you're doing isn't working, stop doing it and try something else"
As pointed out in other comments, it's a communication problem and, to a lesser extent, a perception problem. People will blame what they don't understand FAR more easily, to make themselves feel like a victim. A motor could be sparking, throwing fire and exploding because someone grossly overloaded a feeder (with EVERY warning not to plastered all over it), but if the software stops responding, guess what caused the problem?
Since giving every one of your users an EE and CS class or ten is completely out of the question, fall back on good ole communication. The basis of which is four things (mostly my opinion), in no particular order: what you observe, what you feel, what you think, and what should be done. With this idea, I'll put it into practice by giving this response.
It seems like your users like to blame the software when some piece of the underlying hardware is the key issue (observe). Trying to explain this to the users is impractical and a waste of time; it's not their job and most of them won't care (feel). What you may want to try is talking with the engineering team about the parts they're using and looking into things that work better with software in general. Maybe there are constraints on the inputs that were never considered? (think) Changing out the hardware, or just a better understanding of it, might be the real answer, along with more targeted errors and feedback for those users (done).

I agree with the other posters, but I wanted to add another perspective: It could be worse. They could be attempting to solve the hardware problems for days or weeks, and then find out later, when everyone is under the gun and has been going crazy about it not getting fixed, that they were addressing the wrong problem and it was, in fact, a software problem. So count your blessings. If they always classify it as a software problem, at least you know about it. Only then can you troubleshoot, maybe put in additional problem-solving or problem-identifying code, and make the system a tiny bit better.
Also, this is pretty much the same as every software developer everywhere has ever faced. Except usually it is the software versus the user, not the software versus the hardware. And in that case, it appears there is no known solution. Lots of ways to address the problem, but no way to fix it. Thus the ever-growing list of acronyms describing how to blame the user without being rude: ID-ten-T error, PICNIC, PEBKAC, etc.

Who is it who's reporting the problems?
If it's the end users, I think this is a non-issue. They just know that what they're trying to do is not working. It's not the user's responsibility to diagnose the problem. All they know is, "I tried to do X, Y should have happened, but instead Z happened." Everything beyond that is your problem.
If the hardware folks are insisting that the problem is in the software and the software folks are insisting that the problem is in the hardware, then you need to enhance the software to diagnose errors more precisely, as ChrisF and others have noted.
If the higher-ups are blaming the software group for problems that are the responsibility of the hardware group and you're sick of taking the blame for other people's mistakes, okay, I understand that. Again, as the software guy, you have the power to create more precise error messages. If you can explicitly say, "Stepper motor not responding" or whatever, then you have the "moral authority" to insist that someone run diagnostics on the stepper motor. Just saying, "I'm pretty sure it's a hardware problem" isn't going to win an argument.

Test-oriented development (which doesn't necessarily mean 'test-driven') is what you should resort to.
Basically, every sub-system should have a reasonably thorough set of unit tests to identify problems before integration. Every time a problem occurs, test the hardware so you can know for sure (or almost sure) whether it is a hardware problem. This means the hardware must be designed so that it can be thoroughly tested.
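A sketch of what such a self-test battery might look like (the sub-system names and stub checks are hypothetical):

```cpp
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Illustrative harness: each hardware sub-system registers a check, and on
// any failure the whole battery can be run to point the finger at the faulty
// component rather than at "the software".
struct SubsystemTest {
    std::string name;
    std::function<bool()> check;  // returns true if the sub-system responds
};

bool runSelfTests(const std::vector<SubsystemTest>& tests) {
    bool allOk = true;
    for (const auto& t : tests) {
        bool ok = t.check();
        std::cout << (ok ? "[PASS] " : "[FAIL] ") << t.name << '\n';
        allOk = allOk && ok;
    }
    return allOk;
}

int main() {
    // The checks below are stubs; real ones would exercise the hardware.
    std::vector<SubsystemTest> tests = {
        {"stepper motor", [] { return true; }},
        {"vacuum pump",   [] { return false; }},  // simulated fault
    };
    return runSelfTests(tests) ? 0 : 1;
}
```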
I was the integration head for my college robot team and this tactic helped a lot.
Hope this helps.

First, make sure your users are more likely to read and understand your error messages. Displaying "FPGA command GS_WIDGIT_FROB returned invalid response 0xFF45001C. Shutting down controller id 576D. (Error 1Xf)" might be great for you, but the user is likely to hit "OK" without reading it. Even if they do read it, it gives them no useful information. Either way, you're getting a phone call. Display "Widgit Frobber requires maintenance", but still log all the heavy details somewhere, and you're likely to get fewer calls.
Second, you know it's a hardware problem so do something about it! Have your software email hardware support, or whatever it takes to get the problem fixed. If the user is forced to decide what action to take to fix it, you can bet they'll get it wrong at least some of the time. If the user sees "Widgit Frobber requires maintenance. Hardware support has been notified (ticket #234)" they know that they don't have to do a thing.
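A sketch of that division of labour, with notifyHardwareSupport() standing in for whatever email or ticketing integration is available (everything here is hypothetical):

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Stand-in for email/ticketing integration; returns a new ticket number.
int notifyHardwareSupport(const std::string& details) {
    (void)details;  // e.g. POST to a ticketing system here
    return 234;     // placeholder ticket id
}

// The operator sees one plain sentence; the heavy details go to a log and
// to hardware support automatically.
void reportHardwareFault(const std::string& userText, const std::string& details) {
    std::ofstream log("hardware.log", std::ios::app);
    log << details << '\n';                       // keep the heavy details
    int ticket = notifyHardwareSupport(details);  // get it fixed, too
    std::cout << userText
              << " Hardware support has been notified (ticket #"
              << ticket << ").\n";
}

int main() {
    reportHardwareFault(
        "Widgit Frobber requires maintenance.",
        "FPGA command GS_WIDGIT_FROB returned invalid response 0xFF45001C");
}
```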

Related

Fail Fast vs. Robustness

Our product is a distributed system. The modules I work on are fairly new, quite rigorous, well tested. They were developed with recent best practices in mind. Other modules can be considered as legacy software.
While I'm vigilant about everything that happens within the modules I'm responsible for, I'm under constant pressure to work with bad data sent to me from the other modules. At heart, I'm a "fail fast" principle developer, and as a result, when problems arise I am usually able to eliminate the possibility of error in my modules. It's not so much about blame, just about saving the wasted effort of chasing bugs in the wrong places.
But the argument I keep coming up against is: "We can't let this stuff fail in production, the customer expects this to work, why don't you work around this problem". And this would be an argument for robustness: be liberal in what you accept, conservative in what you send.
I should also note that these are mostly intermittent problems. We see them in integration tests but they are hard to reproduce. Timing and concurrency are involved.
I'm having a hard time balancing between the two principles. Part of it is my worry that if I start allowing and propagating exceptional data, I'm inviting trouble and I won't have as much confidence in my system. But I can't argue against keeping the system working even if other modules are sending me wrong data. The reason other modules aren't getting fixed is that they are too complex and fragile, while mine still appear clear and safe. But if I don't resist the pressure, my modules will slowly be saddled with the same problems I've been rejecting until now.
I should say that the system is not "crashing" in production, but my module may simply display an error to the operator and ask them to contact support. A crash would be a big problem, but if I'm reporting the error clearly, then isn't this the right thing to do? I suspect that my peers just don't want the customer to see any problems, period. But my module is rejecting data from other modules within our product, not customer input. So it seems to me that we are just not tackling problems.
So, do I need to be more pragmatic or hold my ground?
I share the "fail fast" preference/principle. Don't think of this as a conflict of principles, though; it's more a conflict of understanding. Your counterpart has some unspoken requirement ("don't show the user a bad time") that implies some missed requirement. You did not have a chance to think about and implement this requirement beforehand, so it has left a bad taste in your mouth. Forget this viewpoint; re-approach it as a new project with a fixed requirement you can work against.
Maybe the best result is to give an error message like the one you described. But it sounds like you implemented it before having buy-in from your counterpart, while they still had a choice to accept it. Earlier communication about what you were doing could have addressed something like that.
Be careful in how you present the ideas. Constantly referring to the other systems as "too complex and fragile" might be rubbing people the wrong way. Simply say that the systems are new to you and take longer to understand. Do put the time into understanding them, so you do not reduce people's expectations of your capability.
I'd say that it depends on what happens if you don't halt. Does someone's paycheck get processed wrong? Does the wrong order get sent out? That would be worth stopping for.
If possible, have your cake and eat it too - don't report the error to the user, get the customer to agree to send diagnostic reports and report every failure back. Bug the developer(s) who own the faulting module(s) to fix them. And by bug I mean file a bug against them. Or, if management doesn't think it's worth the cost of fixing, don't.
I'd also write up unit tests against those modules that fail, especially if you can tell what the original input was that caused them to generate the wrong output.
What it really comes down to though is what the person who reviews your performance wants from you, especially after you explain the problem to them, via email.
Simply put, this sounds like a case of "don't check for something you can't handle". The fact that you're catching the error and able to report it means you're not propagating it. But it also means that, since you can report it, you have some mechanism to trap the error and therefore potentially handle it yourself, correcting it rather than reporting it.
Mind, I'm assuming that your error report is more interesting than a random exception you caught someplace deep in the system. But even then, if it's an exception you're testing for and creating (i.e. you check whether the denominator is zero and raise an error rather than inadvertently dividing by zero and catching the exception higher up), then that suggests you may well have a way of correcting the problem.
Bottom line, you need both. You need to try to make the data as error free as practical, but also report the unexpected.
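A minimal sketch of that "repair what you safely can, report the unexpected" boundary, using invented sample data (std::optional requires C++17):

```cpp
#include <iostream>
#include <optional>
#include <string>

// Illustrative boundary check: repair what can be repaired safely, reject
// and report what can't, and never let silently-bad data flow onward.
struct Sample {
    double concentration;  // must be non-negative
    std::string label;     // must be non-empty
};

std::optional<Sample> acceptSample(Sample s) {
    if (s.label.empty())
        s.label = "UNLABELLED";           // safe, visible repair
    if (s.concentration < 0.0) {
        std::cerr << "Rejected sample '" << s.label
                  << "': negative concentration " << s.concentration << '\n';
        return std::nullopt;              // unexpected: report, don't guess
    }
    return s;                             // clean data may proceed
}

int main() {
    auto ok  = acceptSample({1.5, ""});    // repaired and accepted
    auto bad = acceptSample({-2.0, "A7"}); // reported and rejected
    std::cout << (ok ? "accepted\n" : "") << (bad ? "" : "rejected\n");
}
```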
I don't think you can lock the door and cross your arms, saying "it's not my problem". The fact that the data is coming from "old, fragile systems" is meaningless. YOUR code is not old and fragile, and it is clearly the efficient place, in terms of the entire integrated system, to "fix" the data once you've detected the problem. Yes, the old modules will continue to GIGO to other, lesser systems, but those legacy modules combined with your new module are a cohesive whole and thus make up "the system".
The typical real problem here is simply the time/value equation of writing all this fix up code vs new features. That's a different debate. But if you have time, and you know things that you can do to clean up incoming data, "be liberal in what you accept" is sound policy.
I won't get into the reasons, but you are right.
In my experience, PHBs are missing the part of the brain required to understand why fail fast has merit and why "robustness", as defined by do-whatever-it-takes-and-eat-errors-if-necessary, is a bad idea. It is hopeless. They just don't have the hardware to grok it. They tend to say things like "OK, you make a good point, but what about the user?" It's just their version of "think of the children", and it signals the end of a conversation with me anytime it's brought up.
My advice is to stand your ground. Eternally.
Thanks everyone. The case that prompted this question ended well, and partly thanks to insights I got from the answers above.
My initial reaction was to stick to fail fast, but I thought about this some more, and had reached the conclusion that one of the roles of my module is to provide a stabilizing anchor to the rest of the system. That does not necessarily mean accepting bad data, but surfacing problems, isolating them and handling them in a transparent manner until we find a solution.
I planned to add a new handler and code path for this case, which would execute properly, as if it were a special use case that had previously been undocumented.
We had a discussion where I reiterated the need to deal with the problem at the boundary, but said I was also willing to help. I outlined my plan to the other side, because I suspected my position was viewed as overly pedantic, and the solution was perceived as me merely having to turn off spurious validation of harmless data, even if that data was incorrect. In reality, though, the way I work is largely data-driven, so I explained why the data has to be correct, how behavior is driven by it, and how, in accommodating this data, I would be implementing a special code path.
I think this gave weight to my position and it led to a more thorough discussion of the other side's aversion to fixing the data. It turned out that it was more of a weariness of dealing with an error prone legacy system than an actual obstacle. There was a relatively simple solution, it was just scary to make a change, a mindset that's fairly entrenched.
But having aired all challenges and possible solutions, we eventually agreed to fix the data, and so far it seems to have solved our problem. Our integration tests are now passing consistently, but we have also added logging and will continue to monitor it.
In summary, I think that for me, the synthesis of both principles is that fail fast is essential for surfacing problems. But once they do surface, robustness means providing a transparent path to continue operation in a way that does not compromise the system. I was able to offer that, and by doing so, won some goodwill from the other side and got the data fixed in the end.
Again, thanks to everyone that responded. I'm too new to rate comments, but I do appreciate all the perspectives presented.
That's a tricky one. If your module receives bad data and it's "OK" for you to just do nothing with it and return, then I would suggest writing to an error log instead of showing an error to the user.
It kind of depends on the class of error you are getting. If the way the system is breaking means you can keep going without feeding bad data to any other parts of the system, you should do everything in your power to work with whatever input is given.
To my mind, though, data purity trumps working systems: you cannot allow bad data to propagate elsewhere and corrupt other systems. To the extent that you can massage the data to be correct and then keep going, you should do so, on the theory that the data is then safe and you must keep the system running...
I like to think of things in terms of data streams. Passing bad data along pollutes the whole stream, and that is bad because, just like real pollution, a drop can spoil a whole river of data (if one element is bad, what else can you trust?). But equally bad is blocking the flow, letting nothing pass because you spotted something you could easily remove. Filter it out, and if everyone at every stage is also filtering, you get clear, clean data out the other end even if a few impurities started somewhere in the middle.
The question from your peers is: "why don't you work around this problem"
You say that it's possible for you to detect the bad data and report an error to the user. This is the normal approach: once you know the data coming into your functions is bad, you should fail fast (and this is the recommendation in the other answers I have read here).
However, your question doesn't specify the domain in which your software is operating. If you know the data coming in is erroneous, is it possible for you to request that data again? Is it actually possible to recover from the situation?
I mentioned that the "domain" here is important. So if you have an app which displays streamed video data for example, and maybe your wireless signal is weak so the stream is corrupt, should the system "fail fast" and display an error message? Or should a poorer image be displayed, and an attempt to reconnect made if needed, depending on the magnitude of the problem?
Depending on your domain, it may be possible for you to detect bad data and make a second request for it without inconveniencing the user. (This is clearly only relevant in cases where you'd expect the data to be better the second time, but you do say the issues you are experiencing are intermittent and possibly concurrency-related)...
So, fail-fast is good, and is definitely something you should do if you can't recover. And you should definitely not propagate bad data. But if you can recover, which in some domains you can, then failing straight away is not necessarily the best thing to do.
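For domains where a retry makes sense, the recovery loop might look something like this (the flaky data source is simulated; std::optional requires C++17):

```cpp
#include <functional>
#include <iostream>
#include <optional>
#include <string>

// Illustrative recovery loop: for intermittent faults it can be worth
// re-requesting the data a few times before failing fast to the user.
std::optional<std::string> fetchWithRetry(
        const std::function<std::optional<std::string>()>& fetch,
        int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (auto data = fetch())
            return data;                  // good data: recovered silently
        std::cerr << "attempt " << attempt << " returned bad data\n";
    }
    return std::nullopt;                  // persistent fault: now fail fast
}

int main() {
    int calls = 0;
    auto flaky = [&]() -> std::optional<std::string> {
        // Simulated intermittent source: fails twice, then succeeds.
        if (++calls < 3) return std::nullopt;
        return std::string("frame");
    };
    if (auto d = fetchWithRetry(flaky, 5))
        std::cout << "recovered: " << *d << '\n';
    else
        std::cout << "unrecoverable: report the error\n";
}
```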

Why is it more costly to discover a defect later in the process? [closed]

Why is it more costly to discover a defect later in the process?
I've heard this a lot but I struggle to understand and put context/examples to this.
You're building a house. You're laying the sewer pipes into the foundations, but unknown to you one of the pipes is blocked by a dead hedgehog.
Would you rather find out:
Just before you pour the concrete
After the house is finished and the new owner tries to use the toilet?
(There's a "Stack Overflow" joke somewhere in this analogy. 8-)
The longer it takes to find a bug, then:
the more the behavior of the bug may have been accepted as correct, and the more other things may have become dependent on that behavior (Windows is notorious for this).
the more tightly integrated the system is likely to have become, and the harder the bug will be to extract.
the higher the likelihood that the bug's erroneous behavior will be duplicated elsewhere by virtue of copy-pasting or in clients that use the erroneous code.
the longer it's been since the code was originally written and the harder it may be to understand it.
the less likely it will be for people who understand that original part of the system to be around to fix it.
This can be illustrated in a simple (if not trivial) example.
Take a simple dialog with a message and just two buttons "OK" and "Cancel".
Assume that the error is a spelling mistake.
If this is found after the product is released then a new version of the product has to be released with all the costs associated with that. Manuals will need to be reprinted.
If this is found in final testing the manual will have to be reprinted. The code will need to be rewritten and tests re-run.
If this is found during development then there is just the cost of fixing the code.
If this is found during design then the code is written correctly first time - no cost.
The later you find a bug, the worse it is. When you find a bug immediately after you code it, you have all the behavior in mind and know exactly which changes caused it. You will be able to focus on the problem once you know where it resides.
When it takes long, developers no longer remember exactly how the code worked, and there are many more places to investigate to find the bug. Perhaps the developer who coded the bug is no longer working at the company, either.
Also, as time goes by, more parts of the code will probably depend on the buggy code, and you may need to fix them as well.
Finally, there are issues involving users. If you find a bug after a release, more users will be frustrated by it, and your product's image will suffer. Users may also have become used to having a workaround for the bug, which may stop working after you fix it.
Summary: When you take long to find a bug
Your scope to investigate is bigger
The developer who created the bug may not be there anymore, and the other developers will have to study the code more to find it, understand it, and fix it
You may also need to fix parts that depend on the buggy code (and there will be more parts like that)
Users will already be frustrated by the bug, and the image of the product will be damaged
No-one ever understands the code as well as you do as you are writing it.
People may have come to depend on the bug being there.
You may have to fix up lots of bad data that the bug has saved away.
You may have to roll out a new version or patch of your software.
Your helpdesk may have to field a whole heap of calls.
You may have to fill in bunches of paperwork explaining why that bug exists and what problems it causes, and what you are going to do to make sure it never, ever happens again.
Because more people will have spent time with the defective software.
If you fix a bug early on, you and maybe a code reviewer will spend a little time on it.
If it gets released to customers and reported as an error, you will have coded it, someone may have reviewed it, someone may have tested it, somebody may even have documented it, and so forth...
There may be other dependencies (internal or external) which will affect the fixing of a defect.
For example - If I resolve this defect, I may have to fix something else
Imagine you're writing an essay on why it's more costly to discover a defect later in the process, and you suddenly realise one of the premises on which most of your essay content is based is false.
If you're still planning, you only have half a page of plan to change. If your essay is nearly finished, you suddenly need to scrap the lot and start over. If you've already handed it in, the error is gonna cost you your grade.
Same reason.
For a shrink-wrapped software product:
If you find a bug after your product hits the stores, you will have to help users through support calls, suggest a workaround or even recall the product/issue a service pack.
For a website:
Site outages and delays cost you money.
Customer loss as a result of poor/malfunctioning site costs you more.
The debugging process is also costly itself.
It is probably an error by the question's author, but the actual question is, "Why is it more costly to discover a defect later in the process?" Within that question is the cost to discover the bug, and we can hope it also means to fix it. Most of the answers do a good job of describing the cost to fix and why it is better to fix early rather than late. And I really don't disagree with any of them. But that isn't the whole question.
I have a regular series of esoteric arguments with some people about the discovery cost. How much testing would have been required to find a specific bug (without hindsight)? Would it have taken three more man-months of automated or manual testing before you would have been likely to hit that test case and scenario?
In practice, test as much as you can, but finding the balance point isn't as easy as many would have you think. Most programs are too big to have 100% code coverage. And 100% code coverage usually exercises only a fraction of all the possible scenarios the code must handle.
Another factor in the cost of a bug is the business cost associated with it. Are there 5 million boxes out there holding the bug? Would you have to do a product recall? Will it generate X calls to your warranty help desk? Will it trigger some clause in a contract holding you liable for damages? In very simple terms, this is why software written for the medical field costs more per LOC than software for website development.
Because of the development process and all the work involved in fixing the defect.
Imagine you find a problem in the function you coded yesterday: you just check out, fix, check in, period. It's still fresh in your mind; you know what it's about and that your fix won't have any side effects.
Now imagine finding the same bug six months from now. Will you remember why the function was coded that way? Will you still be working on this project, or at this company? You have to open a defect report, a new version of your software has to be issued, and QA needs to validate the correction. If the software has been deployed, then all instances have to be upgraded, and customers will call support...
Now, it's true that the curves showing the cost are made up to illustrate the point; it actually depends on the development process.
I would say that the most costly is to find a defect and let it be. The longer you allow the defect to live the more costly it becomes.
I was at a company once where the policy was that once they had taken a decision, they stuck with it. The system I worked on was loaded with bugs because of a stupid corporate framework that we were forced to use and a deep misunderstanding of the proper usage of web services.
To this day, I believe that the cheapest way for that company to get a working, usable system, would be to ditch the entire system and rewrite it from scratch.
So my point is, that I don't think that finding a defect at a late stage is that problematic. But ignoring a defect until a late stage is extremely problematic.

Standard methods of debugging

What's your standard way of debugging a problem? This might seem like a pretty broad question, with some of you replying 'it depends on the problem', but I think a lot of us debug by instinct and haven't actually tried putting our process into words. That's why we say 'it depends'.
I was sort of forced to put my process into words recently because a few developers and I were working on the same problem, and we were debugging it in totally different ways. I wanted them to understand what I was trying to do and vice versa.
After some reflection I realized that my way of debugging is actually quite monotonous. I'll first try to reliably replicate the problem (especially on my local machine). Then, through a process of elimination (and this is where I think it's problem-dependent), I try to identify the cause.
The other guys were trying to do it in a totally different way.
So, just wondering what has been working for you guys out there? And what would you say your process is for debugging if you had to formalize it in words?
BTW, we still haven't found out our problem =)
My approach varies based on my familiarity with the system at hand. Typically I do something like:
Replicate the failure, if at all possible.
Examine the fail state to determine the immediate cause of the failure.
If I'm familiar with the system, I may have a good guess about the root cause. If not, I start to mechanically trace the data back through the software while challenging basic assumptions made by the software.
If the problem seems to have a consistent trigger, I may manually walk forward through the code with a debugger while challenging implicit assumptions that the code makes.
Tracing the root cause is, of course, where things can get hairy. This is where having a dump (or better, a live, broken process) can be truly invaluable.
I think the key point in my debugging process is challenging preconceptions and assumptions. The number of times I've found a bug in a component that I or a colleague would have sworn was working fine is massive.
I've been told by my more intuitive friends and colleagues that I'm quite pedantic when they watch me debug or ask me to help them figure something out. :)
Consider getting hold of the book "Debugging" by David J Agans. The subtitle is "The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems". His list of debugging rules, available in poster form at the web site (and there's a link for the book, too), is:
Understand the system
Make it fail
Quit thinking and look
Divide and conquer
Change one thing at a time
Keep an audit trail
Check the plug
Get a fresh view
If you didn't fix it, it ain't fixed
The last point is particularly relevant in the software industry.
I picked these up from the web or some book I can't recall (it may have been CodingHorror...)
Debugging 101:
Reproduce
Progressively Narrow Scope
Avoid Debuggers
Change Only One Thing At a Time
Psychological Methods:
Rubber-duck debugging
Don't Speculate
Don't be too Quick to Blame the Tools
Understand Both Problem and Solution
Take a Break
Consider Multiple Causes
Bug Prevention Methods:
Monitor Your Own Fault Injection Habits
Introduce Debugging Aids Early
Loose Coupling and Information Hiding
Write a Regression Test to Prevent Recurrence (see the sketch after this list)
Technical Methods:
Insert Trace Statements
Consult the Log Files of Third Party Products
Search the web for the Stack Trace
Introduce Design By Contract
Wipe the Slate Clean
Intermittent Bugs
Exploit Locality
Introduce Dummy Implementations and Subclasses
Recompile / Relink
Probe Boundary Conditions and Special Cases
Check Version Dependencies (third party)
Check Code that Has Changed Recently
Don't Trust the Error Message
Graphics Bugs
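To make the regression-test item concrete, here is a minimal, illustrative example; clampPercent() and its old negative-input bug are invented for the sake of the sketch:

```cpp
#include <cassert>

// Suppose clampPercent() once returned 100 for negative inputs; after the
// fix, this test fails loudly if that bug is ever re-introduced.
int clampPercent(int value) {
    if (value < 0) return 0;
    if (value > 100) return 100;
    return value;
}

int main() {
    assert(clampPercent(-5)  == 0);    // the original bug case
    assert(clampPercent(42)  == 42);   // normal range unaffected
    assert(clampPercent(250) == 100);  // upper bound still clamped
    return 0;
}
```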
When I'm up against a bug that I can't seem to figure out, I like to make a model of the problem. Make a copy of the problematic section of code and start removing features from it, one at a time. Run a unit test against the code after every removal. Through this process you will either remove the feature containing the bug (and hence locate the bug), or you will have isolated the bug down to a core piece of code that contains the essence of the problem. And once you figure out the essence of the problem, it's a lot easier to fix.
I normally start off by forming a hypothesis based on the information I have at hand. Once this is done, I work to prove it correct. If it proves to be wrong, I start off with a different hypothesis.
Most of the Multithreaded synchronization issues get solved very easily with this approach.
Also you need to have a good understanding of the debugger you are using and its features. I work on Windows applications and have found windbg to be extremely helpful in finding bugs.
Reducing the bug to its simplest form often leads to greater understanding of the issue, and has the added benefit of making it possible to involve others if necessary.
Setting up a quick reproduction scenario allows for efficient use of your time when testing any hypothesis you choose.
Creating tools to dump the environment quickly for comparisons.
Creating and reproducing the bug with logging turned up to the maximum level.
Examining the system logs for anything alarming.
Looking at file dates and timestamps to get a feeling if the problem could be a recent introduction.
Looking through the source repository for recent activity in the relevant modules.
Applying deductive reasoning and Occam's Razor.
Being willing to step back and take a break from the problem.
I'm also a big fan of using the process of elimination. Ruling out variables tremendously simplifies the debugging task. It's often the very first thing that should be done.
Another really effective technique is to roll back to your last working version, if possible, and try again. This can be extremely powerful because it gives you solid footing from which to proceed more carefully. A variation on this is to get the code to a point where it is working with less functionality, rather than not working with more functionality.
Of course, it's very important not to just try things. That increases your despair, because it never works. I'd rather make 50 runs to gather information about the bug than take a wild swing and hope it works.
I find the best time to "debug" is while you're writing the code. In other words, be defensive. Check return values, liberally use assert, use some kind of reliable logging mechanism and log everything.
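A minimal sketch of that defensive style, assuming nothing beyond the standard library (the file name and functions are invented):

```cpp
#include <cassert>
#include <cstdio>

// Check return values, assert invariants, and log enough to read the
// failure later; illustrative only.
bool loadConfig(const char* path) {
    std::FILE* f = std::fopen(path, "r");
    if (!f) {                               // check the return value...
        std::fprintf(stderr, "loadConfig: cannot open %s\n", path);
        return false;                       // ...and report, don't ignore it
    }
    // ... parsing would go here ...
    std::fclose(f);
    return true;
}

int divide(int num, int den) {
    assert(den != 0 && "divide: zero denominator");  // document the invariant
    return num / den;
}

int main() {
    if (!loadConfig("settings.cfg"))
        std::fprintf(stderr, "falling back to defaults\n");
    std::printf("%d\n", divide(10, 2));
}
```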
To more directly answer the question, the most efficient way for me to debug problems is to read code. Having a log helps you find the relevant code to read quickly. No logging? Spend the time putting it in. It may not seem like you're finding the bug, and you may not be. The logging might help you find another bug, though, and eventually, once you've gone through enough code, you'll find it... faster than setting up debuggers, trying to reproduce the problem, single-stepping, etc.
While debugging I try to think of what the possible problems could be. I've come up with a fairly arbitrary classification system, but it works for me: all bugs fall into one of four categories. Keep in mind here that I'm talking about runtime problems, not compiler or linker errors. The four categories are:
dynamic memory allocation
stack overflow
uninitialized variable
logic bug
These categories have been most useful to me with C and C++, but I expect they apply pretty well elsewhere. The logic bug category is a big one (e.g. putting a < b when the correct thing was a <= b), and can include things like failing to synchronize access among threads.
Knowing what I'm looking for (one of these four things) helps a lot in finding it. Finding bugs always seems to be much harder than fixing them.
The actual mechanics for debugging are most often:
do I have an automated test that demonstrates the problem?
if not, add a test that fails
change the code so the test passes
make sure all the other tests still pass
check in the change
No automated testing in your environment? No time like the present to set it up. Too hard to organize things so you can test individual pieces of your program? Take the time to make it so. May make it take "too long" to fix this particular bug, but the sooner you start, the faster everything else'll go. Again, you might not fix the particular bug you're looking for but I bet you find and fix others along the way.
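As a toy illustration of that loop, with an invented sumTo() bug standing in for the real thing:

```cpp
#include <cassert>

// Suppose a bug report says sumTo(3) returns 3 instead of 6.
// Step 1 was a failing test; the fixed code and the tests now all pass.
int sumTo(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i)   // the buggy version looped with 'i < n'
        total += i;
    return total;
}

int main() {
    assert(sumTo(3) == 6);   // the new test that demonstrated the bug
    assert(sumTo(0) == 0);   // existing tests must still pass
    assert(sumTo(1) == 1);
    return 0;                // all green: safe to check in the change
}
```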
My method of debugging is different, probably because I am still beginner.
When I encounter a logical bug I seem to end up adding more variables to see which values go where, and then I debug line by line through the piece of code that is causing the problem.
Replicating the problem and generating a repeatable test data set is definitely the first and most important step to debugging.
If I can identify a repeatable bug, I'll typically try and isolate the components involved until I locate the problem. Frequently I'll spend a little time ruling out cases so I can state definitively: The problem is not in component X (or process Y, etc.).
First I try to replicate the error; without being able to replicate it, it is basically impossible in a non-trivial program to guess at the problem.
Then, if possible, I break the code out into a separate standalone project. There are several reasons for this: first, if the original project is big it is quite difficult to debug; second, it eliminates or highlights any assumptions about the code.
I normally always have another copy of VS open, which I use for debugging parts in mini-projects and for testing routines that I later add to the main project.
Once I have reproduced the error in the separate module, the battle is almost won.
Sometimes it is not easy to break out a piece of code, so in those cases I use different methods depending on how complex the issue is. In most cases assumptions about data seem to come back and bite me, so I try to add lots of asserts in the code to make sure my assumptions are correct. I also disable code using #ifdef until the error disappears, eliminating dependencies on other modules and so on... sort of slowly circling in on the bug like a vulture...
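A tiny sketch of that assert-and-#ifdef narrowing (the smoothing stage and flag are invented):

```cpp
#include <cassert>

// Assert the assumptions about the data, and compile suspect code out with
// #ifdef until the error disappears; illustrative only.
#define SUSPECT_FILTERING 1   // flip to 0 to take this stage out of play

double process(double reading) {
    assert(reading >= 0.0 && "process: sensor reading must be non-negative");
#if SUSPECT_FILTERING
    reading *= 0.98;          // suspect smoothing stage under investigation
#endif
    return reading;
}

int main() {
    return process(1.0) > 0.0 ? 0 : 1;
}
```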
I don't think I really have a conscious way of doing it; it varies quite a lot, but the general principle is to eliminate the noise around the issue until it is quite obvious what it is. Hope I didn't sound too confusing :)

How to react when the client's response is negative on delivery? [closed]

I am a junior programmer. My supervisor told me to sit in with the client, so I joined the meeting. I saw the client's unsatisfied face despite the successful (from my programmer's perspective) delivery of the project!
Client: You could have included this!
Us: Was not in the specification!
Client: Common Sense!
As a programmer, how do you respond in this situation?
What you should do to avoid this situation:
Explicitly spec out what will be included and what will not be included.
The problem probably comes down to the unspecified parts of the spec:
The client thinks that unspecified stuff should be in, i.e. it was implied.
The developer thinks that unspecified stuff should not be in.
For future specs, you should have a catch-all statement that explicitly says that anything not specified in the document can be done after the original specification is delivered, at an additional cost.
What you should do in the current situation:
Other than learning from your experiences, you should come to some compromise with the client.
Example: I will do this feature that you feel is common sense, but for all future additions/changes it will have to be spec'ed out explicitly.
I.e. you will have to do a little more work, but it is worth it in return for the explicit catch-all agreement your client will enter into.
Bad spec?
Was it necessarily a bad spec? No.
It is impossible to mention everything your clients may expect, so it is critical to have the catch-all statement mentioned above stated clearly and explicitly in your spec/contract.
Other ways to reduce the problem:
Involve the client early, show them early prototypes. Even if they don't demand it.
Try not to sell the client an end product, but more of a service for working on his product.
Consider an agile development model or something similar so that tasks are well defined, small, paid for, and indisputable.
This would be one of many reasons why I switched to an Agile development philosophy. The only way, in my opinion, to successfully avoid this scenario is to either be omniscient or involve the customer heavily and release early/release often to get feedback as soon as possible. That way you can develop the software the customer really wants, not the software the customer tells you they want.
Client: You could have included this!
Us: Was not in the specification!
Client: Common Sense!!
Us: We do not attempt to go beyond what the client has specified - we follow the specification. It's as important to NOT implement features not specified as it is to implement features specified. We will never second guess our customers, who value the fact that they can completely depend on us to correctly and completely implement the specification on time and under budget.
As others very rightly point out, the situation is almost always more complex than the simple exchange I've described above.
However, the above is valid if the implementer has a specification with the customer's signature on it which essentially implements an agreement that says "once the software provably implements all the features in the spec then it is considered complete", and anything additional is outside the specification and therefore outside the contract.
The contract itself may have some input here as well - if you don't have a signed contract than it doesn't matter what's in the spec - everything so far has been done on a handshake, and the entire deal (including payment) can go down the toilet based on any dissatisfaction on either side.
But if you have a contract and a specification, and the customer has seen and signed both, then they have no wriggle room to ask you to go further.
Now, as to the question of whether you should implement it:
AWESOME! You delivered a product and they only had one complaint. Implement the feature, call it a 'freebie' (make sure they understand you're working outside the spec and contract, and explicitly send them a bill for the work with the discount shown in dollars) and have them sign off on the project as a whole.
It will explicitly demonstrate that the project is ended, that you went above and beyond the call of duty, and that any further 'surprises' are outside the contract/spec, which gives you a nice layer of protection beyond what you already (ostensibly) have.
If it's a UI issue, then you're in murkier water.
Does the spec adequately describe the UI? Does it have mockups? I wouldn't fault a customer for this complaint about the UI if the spec did not very closely describe the layout, usage, and include mockups.
Either way, I think you can understand the customer's position - if they haven't played with UI mockups, then they're going to be disappointed with the result regardless - there's no way, psychologically speaking, that you and your customer could have possibly had the same idea in mind (nevermind the fact that common sense isn't!).
Quite frankly if this is the first time the customer has thought about checking out the UI before the work is finished, then it's at least partially your fault for not explaining good UI design processes to them. This is a key feature for their app, and it's very tightly coupled to what they've imagined - no one can be satisfied in such a situation unless they've 'grown' their internal representation over time to match what the reality is.
This disconnect is solved only through frequent user and customer testing, which is obviously missing. This is a problem regarding client education and communication, not whether the specification was met or not.
-Adam
Expect last minute changes of scope - they always happen, so be ready.
Review progress frequently with client - to minimize surprises.
Contract: Functional Spec, plus Time & Materials with initial cap (so client feels control).
Then when changes come along, re-negotiate the cap if necessary.
Never say they can't have what they want. They can get that answer for free!
Always give them a little more than they asked for, so they know you've got a positive attitude.
Relate to the client as being on the same team with them. Don't accept being legalistically painted as an adversary.
They may think of contractors as not loyal, compared to employees. Show them you're as dedicated to their success as their employees are, and you'll go the extra mile.
Classic case...
There's not definite answer to this one, but it all turns around communication. There should have been preventive measures put in place (like weekly reviews or something like that).
For sure, you can't redo the whole thing for free.
There are two ways: either you tell them to ** off, or you deal with it.
If you choose to deal:
First, empathize, respect the client.
Have a look at what can easily be changed.
Have a look at the contracts.
Maybe create a new agreement.
Don't do too much.
Make them see the progress and the work it takes.
Find workarounds for the missing features (maybe using other great features, or available tools.)
Use your common sense; it is so common, it's not even funny.
This is one of the many drawbacks of a fixed bid arrangement. Any time business needs or priorities change, or there is even a simple misunderstanding, it results in anything from an awkward situation like this to calling lawyers in. If you have an arrangement where you get paid for development time, you can always react to any change and get paid for whatever time it takes to make that change. Also, having a by-the-hour arrangement does not preclude having a plan or making an estimate.
Once you are in a fixed bid pickle, though, your options are:
1) Do it at an additional cost.
2) Do it free.
3) Don't do it.
Option 3 is the worst, and Option 1 is the best. If you have a good, trusting relationship and decent communication with the client, it's usually easy to arrive at Option 1. If the relationship is bad, then you've got bigger problems. At that point, just try to avoid lawyers.
A final point - any project that has something known as "The Delivery Date" inevitably runs into the problem described. Projects with said date usually involve retreating to a cave for several months to develop in hiding followed by an unleashing of the product all at once in front of the stakeholders. This is abrupt and leaves plenty of time for client expectations and the actual product to drift apart. If, instead, you show intermediate versions of the product and gather feedback every few weeks, two things happen. First, you get better feedback, minimize misunderstandings, and make a better product. Second, there is no single point in time on which a massive amount of expectation is laid. The potential difference between what the client is imagining and what actually exists is much smaller. No surprises.
Good luck.
"how do you react?"
Question 1 - do you want to continue this relationship with this customer? Seriously. If they are going to claim that unspecified features are "common sense," this may not be a good relationship to maintain or enhance.
If you want to disengage, then that's easy. Ask them to highlight each part of the specification that you failed to comply with, and play that game. Get specific test criteria for each missing feature. Pull teeth. Be confrontational in determining what's missing. Don't ask why. Just ask for all the details up front. It's slow and unpleasant. But you don't want them anyway.
If you want to engage, well, you're going to have to change the relationship. Currently, you have a Passive Aggressive Customer. They won't say what they want, but they will say what they don't want.
This may be a habit with them; this may be how they win concessions. Or this just may be sloppy specification on their part.
If you want the relationship, your reaction has two parts.
Short-term. Get something they're happy with. They have to identify specific changes. You have to score each change with a "cost to do" and "fit with specification".
Some things are cheap and a good fit. Do those.
Some things are cheap to do, but a bad fit with the specification. Think twice about enabling a bad specification to lead to rework. In a sense, you purchased the specification from them; you may need to raise your standards, also.
The expensive things which (sadly) fit the specification are a problem. You're in trouble with these, and pretty much have to do them.
The expensive things which don't fit well with the specification are lessons learned for everyone. Detail a plan for these, including specification rewrites and approvals.
Long-term: make sure you're not PA'd again. Review early and often; use Agile techniques. Communicate more, prototype more, release more.
Well, it was not successfully delivered. Somewhere along the line there was miscommunication. Without knowing the specifics, I would suggest this is not a developer-injected problem, and it is probably not to be blamed on the customer either: the requirements-gathering task was insufficient. This is a classic example of what happens when the software side does not have domain experts or the requirements discovery process doesn't do all that it could...
If it were me, I would correct the problem and figure out how to avoid similar issues in the future.
How you handle this can very well determine the future of this contract/business with the client. Taking responsibility and correcting the issue is a huge opportunity for your company.
EDIT:
This is a good time to evaluate how this happened, to help correct it. Some companies choose to totally revamp everything they do, which I think is a mistake. So is ignoring it. Blaming people for the problem is also a mistake.
It is a good time to walk through how this happened, what the process is, and maybe how it could have been caught. I would not make huge rule changes or process changes - but coming up with guidelines for future work is a great thing. Your company had a clear lesson about a shortcoming. Losing the opportunity to correct this problem and to correct your process would be a waste of a good chance.
ZiG, I've had to deal with this problem on several occasions at my current place of Employment. My group (3 developers) tries to approach things in an Agile manner. We're used to getting mid-stream and even last-second requests (which we then treat on a case-by-case basis).
However, we make it clear that resources (particularly time) are limited and if it's not in the spec we can't make promises. If it's judged important and it can't fit into the current release, we generally plan a followup release. If it isn't important, it goes on a list.
One thing I've found is that you can get users to agree to Spec S at Time T. However at Time T + N, getting them to remember they agreed to Spec S, or getting them to acknowledge that they did so (with the documentation you've been keeping, I hope!) can be trickier than it should be.
Speaking to the OP's subject and question:
If you are an employed programmer, then I would hope that other resources are in the meeting with you. Possibly "higher ups" in the organization.
If this is the case, then your job is to answer DIRECT questions, and to keep your emotions in check. Yes, you may feel injured because they don't love your code, but showing any emotion with bosses present is not a good thing. Rather, try and look neutral and let the others handle the session.
Now, if they "hang you out to dry", then I would recommend the following questions:
a) "OK. I see. Why exactly do you feel it is common sense to include this feature? I'd like to discover why we didn't include it." (Force them to explain their thought process. Common sense to one person is rarely common sense to anyone else.)
b) "Well, I'm sure we could include that in the next release. I'll leave it up to XXX (the bosses) to come to a mutually agreeable approach" (i.e. don't talk cost or freebies with bosses present. EVER.)
Again, this assumes you are a programmer WORKING for the company that delivered the product. Now, if you are more than that, i.e. you ARE one of the higher-ups, then many of the suggestions here are excellent.
However, if you are the higher-up or a consultant programmer, then first and foremost:
a) Apologize for the process that did not catch this requirement. Promise to work with the client to prevent this from recurring.
Then on to the other strategies. It really doesn't matter if you charge for the fix or not - the apology is the most important action to the client. Again, it bears repeating - you are not apologizing for the missed feature. You are apologizing for the faulty design process that let it slip. Clients are usually pretty accommodating when you start this way and then seek a solution.
Cheers,
-Richard
Use SCRUM-like approaches to avoid this deathtrap: involve the client in the dev process early, frequently, and in informal, restricted committees. This reduces risk and improves agility.
In terms of your literal question, how to react, the best way is to ignore your ego ("what?! After I worked so hard on this and met the spec?!") and instead focus on some active listening and working to consensus.
Client: You could have included this!
Us: Was not in the specification!
Client: Common Sense!!
Us: I understand that you're not happy that we didn't go beyond the bounds of the specification. Seeing how you feel about this, how can we make you happy? Let's see if there's a process we can create together that will help everyone.
Essentially, you don't want to turn this into a "you said/I said" death match. The only way to resolve those involves lawyers, and then nobody wins. If you can agree that the spec or the process was at fault, work together to fix them.
This approach actually just worked for me: wait for the guy who doesn't like your software to leave and be replaced by the guy who does like it.
Obviously you can't really rely on this, but if you're sure that you did a good job and that your software really will satisfy the business needs of the people who hired you, it does pay to wait it out. Sometimes the client's initial reaction will not be their final one, especially if you can quickly incorporate their concerns.
Don't try to make the client feel like it is their fault. It might be their fault, but making them feel that way will not produce constructive results, and could just annoy them.
Instead, you should realize that clients only complain about software they use, in most cases because they like it. Nobody complains about software nobody uses. It is inevitable that a client will complain about the software you deliver, even if you deliver exactly what they ask for. So don't sweat it. Software is never done.
Total failure on the part of the person in charge of requirements collection, no doubt about it. Additional failure of the project management to not iterate the deliverable and have check-in meetings with the client.
However, you have a signed-off spec, and what you've delivered matches the spec. So, your company has two choices: write off the cost in the name of business development and make the change for free, or charge them for the change request.
If it ain't in the spec, it ain't in the spec. As a developer with no specific domain knowledge, 'common sense' is an irrelevant concept. Different industries work in different ways and one approach might be quite appropriate for a particular domain but completely unacceptable in the other.
Writing good specs is an art-form. IMO, you can either take an agile 'analyst/programmer' approach where you make small iterations or write and maintain a detailed, unambiguous specification. Both are highly skilled tasks, and are still iterative. You still have to evolve the specification.
Either way is not as easy as it sounds and both require the ability to establish a good working relationship with the client.
You cannot know what your customer is thinking. This situation occurs often with clients that have no experience with programming projects. What I suggest is simply showing them that "common sense" isn't a very accurate answer in engineering (or programming, if you prefer).
Show them other examples from life demonstrating that you cannot build something that isn't written down. Example: when building a new house, the builder needs a plan with every detail; he won't put in extra electrical outlets just because it's more "common sense" to have some spares in the living room...
I had this once. And luckily it wasn't me that created the design because that proved to be the problem.
It is of vital importance that the communication between your company and the client is as close to perfect as possible. Be sure you understand each other. Ask questions and let them ask questions. Do not leave anything open in the design; whatever is left open will be the point of contention at delivery. And have regular meetings during the project (preferably with a prerelease).
Unfortunately a lot of developers are bad at communication, and a lot of clients are not aware of their own needs. But if you can minimize the gap, you have found yourself a happy (and returning) customer.
This is why I/the teams I worked with always used a prototype-style approach, which means:
1. after collecting the requirements, you show the client an early and basic release of the software
2. the client says "you could have included this" / "it's common sense"
3. you change your design to reflect the client's desiderata
4. iterate from point 1 until the official release
You have to start early on: tell the customer, early and often, that the spec/use-cases/user-stories are a contract that defines what will be delivered. In an agile environment there are plenty of chances for the customer to notice some "common sense" feature they want and ask for it, which is one of the advantages of an agile approach; but if you start accepting "common sense" additions at the end, you are setting yourself up for infinite extensions, probably at your own expense.
Some customers expect this; the more and better you tell them they can't, the easier the eventual arguments will be.
As a junior guy, I realize you can't do this -- yet -- but one of the hard-but-necessary lessons is that sometimes you have to fire a customer.
You learn - everything is a learning experience and nothing is personal.
We are experts in our area; we know better than the customer what he needs. Next time, for the next customer, we will suggest all the useful features in advance and make him happy - and have him pay more, because we are the experts and we know better.

Debugging is a bad smell - how to persuade them?

I've been working on a project that can't be described as 'small' anymore (40+ months), with a team that can't be defined as 'small' anymore (~30 people). We've been using Agile/Scrum (1) practices all along, and a healthy dose of TDD.
I'm not sure if I picked this up from Agile or TDD, more likely a combination of the two, but I'm now clearly in the camp of people that looks at debugging as a bad smell. By 'debugging' I'm not referring to the more abstract concept of figuring out what might be wrong with the system, but the specific activity of running the system in Debug mode, stepping through the code to figure out details that are otherwise inscrutable.
Since I'm fairly convinced, this question is not about whether debugging is a bad smell or not. Rather, I'd like to know how I can persuade my team-mates about this.
People who believe debugging mode is the 'standard' mode tend to write code that can be understood only by debugging through it, which wastes a lot of time: every time you work an item on top of code developed by someone else, you first get to spend a considerable amount of time debugging it (and since there's no bug involved, the term is becoming increasingly ridiculous) - and then silos happen. So I'd love to convince a few of my team-mates that avoiding debug mode is a Good Thing (2). Since they are used to living in Debug mode, however, they don't seem to see the problem; to them, spending hours debugging someone else's code before they even start doing anything related to their new item is the norm, and they don't see anything wrong with it. Plus, as they spend time 'figuring it out', they know the developer who worked that area will eventually become available and the item will be passed on to them (leading to yet another silo).
Help me come up with a plan to turn them from the Dark Side!
Thanks in advance.
(1) Also referred to as SCRUM (all caps). Capitalization arguments aside, I think an asterisk after the term must be used since - unsurprisingly - our organization 'tweaked' the Agile and Scrum process to fit the perceived needs of all stakeholders involved. So, in all honesty, I won't pretend this has been 100% according to theory, but that's beside the point of my question.
(2) Yes, there will always be times when we'll have to get in debug mode, I'm not trying to absolutely avoid it, just.. trying to minimize the number of times we have to dive into it.
If you want to persuade your coworkers that your programming practices are better, first demonstrate by your productiveness that you are more effective than they are, at least for some tasks. Then they'll believe you when you explain how you get so much done.
It's also sometimes easier to focus on something concrete. Do your coworkers even talk in terms of "code smell"? Perhaps you could focus on specifics like "When the ABC module fails, it takes forever to debug it; it's much faster to use technique XYZ. Here, let me demonstrate." Then afterwards you can mention your basic principle, which is: yes, the debugger is a useful tool, but there are usually other, more useful ones.
This is a cross-post, because the first time around it was more of an aside on someone else's answer to a different question. To this question it's a direct answer.
Debugging degrades the quality of the code we produce because it allows us to get away with a lower level of preparation and less mental discipline. I learnt this from an accidental controlled experiment in early 2000, which I now relate:
I took on a contract as a Delphi coder, and the first task assigned was to write a template engine conceptually similar to a reporting engine - using Java, a language with which I was unfamiliar.
Bizarrely, the employer was quite happy to pay me contract rates to spend months becoming proficient with a new language, but wouldn't pay for books or debuggers. I was told to download the compiler and learn using online resources (Java Trails were pretty good).
The golden rule of arts and sciences is that whoever has the gold makes the rules, so I proceeded as instructed. I rigged up my editor macros so I could launch the Java compiler on the current edit buffer with a single keystroke, I found syntax-colouring definitions for my editor, and I used regexes to parse the compiler output and put my cursor on the reported location of compile errors. When the dust settled, I had a little IDE with everything but a debugger.
To trace my code I used the good old-fashioned technique of inserting writes to the console that logged my position in the code and the state of any variables I cared to inspect. It was crude, it was time-consuming, it had to be pulled out once the code worked, and it sometimes had confusing side-effects (e.g. forcing initialisation earlier than it might otherwise have occurred, resulting in code that only worked while the trace was present).
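For concreteness, here is a minimal sketch in Java of the kind of console trace write described above - the class, method, and variable names are invented for illustration, not taken from that project:

    import java.util.Arrays;

    // Sketch of the trace-write technique: log where you are and what the
    // interesting state is, then pull the calls out once the code works.
    public class TraceDemo {
        private static void trace(String where, Object... state) {
            System.err.println("TRACE " + where + " :: " + Arrays.toString(state));
        }

        static int parseCount(String header) {
            trace("parseCount:enter", header);
            int count = Integer.parseInt(header.trim());
            trace("parseCount:exit", count);
            return count;
        }

        public static void main(String[] args) {
            System.out.println(parseCount(" 42 ")); // prints the traces, then 42
        }
    }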
Under these conditions my class methods got shorter and more and more sharply defined, until typically they did exactly one very well defined operation. They also tended to be specifically designed for easy testing, with simple and completely deterministic output so I could test them independently.
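By way of illustration - a sketch under assumptions, not the original code; the template syntax and all names are invented - a method in that style does one well-defined thing and has deterministic output, so a plain assertion can exercise it in isolation:

    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Substitution {
        // One small, deterministic operation: replace ${name} placeholders
        // with values from the map, leaving unknown names empty.
        static String substitute(String template, Map<String, String> values) {
            Matcher m = Pattern.compile("\\$\\{(\\w+)\\}").matcher(template);
            StringBuilder out = new StringBuilder();
            while (m.find()) {
                m.appendReplacement(out,
                    Matcher.quoteReplacement(values.getOrDefault(m.group(1), "")));
            }
            m.appendTail(out);
            return out.toString();
        }

        public static void main(String[] args) {
            // Deterministic output means a plain equality check suffices
            // (run with -ea to enable assertions).
            assert substitute("Hello ${name}", Map.of("name", "World"))
                    .equals("Hello World");
        }
    }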
The long and the short of it is that when debugging is more painful than designing, the path of least resistance is better design.
What turned this from an observation into a certainty was the success of the project. Suddenly there was budget and I had a "proper" IDE with an integrated debugger. Over the course of the next two weeks I noticed a reversion to prior habits, with "sketch" code made to work by iterative refinement in the debugger. Having noticed this, I recreated some earlier work using a debugger in place of thoughtful design. Interestingly, taking away the debugger slowed development only slightly, and the finished code was of vastly better quality, particularly from a maintenance perspective.
Don't get me wrong: there is a place for debuggers. Personally, I think that place is in the hands of the team leader, to be brought out in times of dire need to figure out a mystery, and then taken away again before people lose their discipline.
People won't want to ask for it because that would be an admission of weakness in front of their peers, and the act of explaining the need and the surrounding context may well induce peer insights that solve the problem - or, even better, designs free from the problem.
So, FOR, I not only agree with your position, I have real data from a controlled experiment to support it. It is, however, a rather small sample. More elaborate tests are required before my conclusions are supportable.
Why don't you take what I've said to your team and suggest trials? You have more data than they do (I just gave it to you), and in order to have a credible basis for disagreeing with you they basically have to test the idea - and the only way to do that is to give your idea a go.
You should be ready for it to all fall apart, though, because the whole thing is predicated on the assumption that the developers have the talent and experience to rise to the challenge of stronger design in the absence of step-through debugging.
Step-through debugging was created to make debugging easier. The direct effect of lowering the bar is that people with less talent can participate - if you build a tool that even jackasses can use, you will get jackasses using it -- a lot of them, if the newly accessible activity is well-remunerated.
This causes an exodus of people with talent because they generally use that talent to do rare and precious things in order to be well paid without working too hard, and the market doesn't want to pay for excellence because it cannot distinguish talent well enough to know when paying for it is justified.
Another thought: more recent work with problems on production servers, where it was impossible to install a debugger, has shown the importance of having a codebase for which maintenance doesn't depend on the availability of a debugger. Code that's grown in the absence of debuggers is much less hassle. Choose not to use them when you can change your mind, and then when you can't change your mind it won't be so awful.
Since I'm fairly convinced, this question is not about whether debugging is a bad smell or not.
Well, your local Church might be a more appropriate place for your question, then.
That aside, convince them by arguments. You might want to reconsider your fundamentalist stance, however, because this is the very opposite of persuasive. One thing you might want to do is drop the term "debugging" in your whole discussion and replace it with "stepping through the code" or the like, emphasizing that you oppose the uninformed guesswork/patchwork practice of probing rather than an informed reflection about the code.
(I would still disagree with you, but that's beside the point, since you didn't want a discussion.)
I think the real problem here is
"People that believe debugging mode is the 'standard' mode tend to write code that can be understood only by stepping through it"
This, if true, should be self-evidently wrong, and there should be no need to discuss it. If it's not evident, it's because they don't see how the badly written code could be improved. Show them: do code reviews where you show how that code could be refactored in a way that is clear without stepping through it.
Code stepping will automatically diminish once better code is written, it just doesn't work the other way around. People will still write bad code and if they avoid stepping through it that will only lead to more wasted time (damn I wish I could step through this spaghetti mess), not to better code.
There is something wrong here, but it's hard to put my finger on it. Perhaps the real issue is that the code has other smells that make it difficult to readily understand. I agree that with TDD one ought to use the debugger less rather than more, since you'll be developing the code in small increments. But, if you can't look at the code and understand it, perhaps it's because the design is too coupled -- there are too many interrelated classes required to make things work.
If the code really needs to be so complex that observation won't suffice, then maybe you need to invest in some good commenting, explaining what is happening -- though I would prefer to see things refactored to the point where comments are not needed. My suspicion is that the debugger may be a symptom rather than the problem.
I know that for me, switching from traditional, code-first development to test-first development has resulted in less time spent debugging... and it's not something I miss. Typically I'll only involve the debugger when it's not obvious why the code I just wrote to pass a test didn't.
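To make that rhythm concrete, here is a hedged sketch (JUnit 5 assumed; PriceCalculator and its behaviour are invented for the example). The test is written first and fails; just enough production code is added to make it pass; the debugger only comes out when the test stays red for a reason that isn't obvious:

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;

    class PriceCalculatorTest {
        // Written before PriceCalculator existed (red), then the simplest
        // total() that passes was added (green).
        @Test
        void appliesTenPercentDiscountToOrdersOverOneHundred() {
            PriceCalculator calc = new PriceCalculator();
            assertEquals(108.0, calc.total(120.0, 0.10), 0.001);
        }
    }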
This is going to sound like the argument you said you don't want to have, but I think if you want to convince your teammates, you're going to have to make a stronger case. I don't understand your objection. I frequently step through code I'm trying to understand with the debugger. It's a great way to see what's going on. You have not established your claim that people who use the debugger in this way tend to write code which is otherwise difficult to understand. The only convincing way to do so would be through some kind of case/control study which tried to measure and compare the readability of code written by people with varying approaches to the debugger. And you have not even told a plausible story explaining why you think using a tool to understand code execution tends to lead to sloppier code construction. For me it's a complete non sequitur.
A "plan" to convince them of the advantage of another approach is to establish metrics linked to the number of times you debug the same function for different bugs.
By analysing the trend of that metric, you may convince them that non-regression tests are a better use of their time, and will help them debug more efficiently.
That way, you do not completely write off the "debug" habit, but you convince them to establish a solid set of tests, allowing them to focus on the really useful debug sessions, if needed.
Should you consider this course of action (metrics), you should know its implementation involves the whole hierarchy (stakeholders, project managers, architects, developers). They all need to be involved in those metrics in order to act on them.
Regarding developers, you could try to suggest:
some new ways of closing a bug case (close it only with a test scenario that reproduces the bug, meaning they need an independent test in order to, if needed, launch their debug session - see the sketch after this list)
a clear relationship between those metrics and their evaluation by management (it would be bad practice to debug the same function over and over)
a larger involvement in architectural decisions: sometimes, knowing some functional or applicative features rather than just classes and code can encourage a developer to think more in terms of black-box tests rather than white-box ones (which more easily lead to debug sessions)
participation in the "operational architecture" process (where you need to deploy your app and run full front-to-back integration tests). Again, a larger picture of the whole system can help a developer get more interested in features rather than 'lines of code'
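As a sketch of the first suggestion above - the bug number, the Basket class, and the scenario are all invented, and JUnit 5 is again assumed - closing a bug case would mean checking in the test that reproduces it:

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;

    class Bug4711RegressionTest {
        // Reproduces the (invented) crash from bug #4711: total() used to
        // throw a NullPointerException on an empty basket. This test failed
        // before the fix, passes after it, and keeps the same debug session
        // from ever being needed again.
        @Test
        void emptyBasketTotalsToZeroInsteadOfThrowing() {
            assertEquals(0.0, new Basket().total(), 0.001);
        }
    }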
I think a better phrasing of this question would be "Is non-TDD a code smell?" TDD seems to lead to less time spent in the debugger due to more time spent writing/failing/passing tests. Without TDD, you are more likely to spend time in the debugger to diagnose errors.
At least within Visual Studio, using the debugger is not that painful, so the challenge for you would be to explain to your teammates how TDD would make their development more enjoyable, productive and successful. Just avoiding the debugger is probably not reason enough for a team to switch their development methodology.
Right on, roadwarrior.
Debugging isn't the problem; it's poorly commented and/or documented code and bad architecture. I work on a smaller team, but when a bug does surface, I do step through the code. Frequently it's a very small job because the app is well planned out and the docs on the code are clear.
That said, let's get to my point. Want the team not to debug? Comment, comment, comment. Nothing beats down the urge to debug faster. Sure, they'll still do it, but they'll be more likely to step over well-documented code.
Oh, and though it should go without saying, I'll say it anyway: don't have bugs in your code. :)
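For what it's worth, a small sketch of the kind of documentation meant here (the conversion and its calibration constants are made up for the example) - clear enough that a maintainer steps over the call rather than into it:

    /**
     * Converts a raw sensor reading to degrees Celsius.
     *
     * @param rawMillivolts the reading in millivolts, as received from the device
     * @return the temperature in Celsius, clamped to the sensor's 0-150 range
     */
    static double toCelsius(int rawMillivolts) {
        final double OFFSET_MV = 500.0;      // reading at 0 degrees C (invented calibration)
        final double MV_PER_DEGREE = 10.0;   // sensor gain (invented calibration)
        double celsius = (rawMillivolts - OFFSET_MV) / MV_PER_DEGREE;
        return Math.max(0.0, Math.min(150.0, celsius));
    }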
I agree with those above who expressed the relative irrelevance of this "debugger issue."
IMO, the 2 most important goals of a developer are:
1) Make the software do what it's supposed to do.
2) Write the code so that a maintenance developer 2 years down the road enjoys the experience of changing existing or adding new features.
Before you make a plan, you should decide how important this change is to you. Although I agree that debugging is a smell, it is also a very well accepted and ingrained practice for developers, so convincing them that they should stop doing it won't be easy or quick - and for good reasons. How much energy do you want to put into this topic?
Second, why do you want to persuade them in the first place? If your motivation is to help them, is it really their top priority problem? When you help people in ways they want to be helped, change becomes easy.
Once you have decided that you want to go on with your change initiative, you need to take into account that different people are convinced by different things. Some people will already be convinced by trying something new and exciting. Some will be convinced by numbers (metrics). Some by getting told about it while eating their favorite type of cookie (seriously!), some by hearing about it from their favorite guru. Some by reading about it in a magazine. Some by seeing that "everyone else is doing it, too". And so on.
There is an insightful interview with Linda Rising on this topic at InfoQ: http://www.infoq.com/interviews/Linda-Rising-Fearless-Change. She can say it much better than me. The book is quite good, too.
Whatever you do, don't press too hard, but also don't give up. Change can happen - especially if you treat resistance as a resource - and sometimes it happens at unexpected times, so always keep a sense of wonder.
#FOR: You have a second problem too; here it is:
sadly it doesn't seem the devs are interested in being more productive (they get paid the same anyway)
How do you intend to make them want to be more productive when there is nothing (visible) for them to gain?
Designing software by debugging is a good practice.
The number of environments supporting this way of developing is very small: the best known is Smalltalk. In Smalltalk, you can write a test describing your objects' protocol before the methods are implemented. Running the test then triggers the debugger, where you can add the missing method to the right class and continue stepping through the code until all the functionality is implemented and the test is green.
This needs a compiler to be available at run-time, and first-class invocations. It offers a very short feedback cycle, and is one of the primary reasons for Smalltalk's productivity.

Resources