Legacy Code Nightmare [closed] - refactoring

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Improve this question
I've inherited a project where the class diagrams closely resemble a spider web on a plate of spaghetti. I've written about 300 unit tests in the past two months to give myself a safety net covering the main executable.
I have my library of agile development books within reach at any given moment:
Working Effectively with Legacy Code
Refactoring
Code Complete
Agile Principles Patterns and Practices in C#
etc.
The problem is everything I touch seems to break something else.
The UI classes have business logic and database code mixed in. There are mutual dependencies between a number of classes. There's a couple of god classes that break every time I change any of the other classes. There's also a mutant singleton/utility class with about half instance methods and half static methods (though ironically the static methods rely on the instance and the instance methods don't).
My predecessors even thought it would be clever to use all the datasets backwards. Every database update is sent directly to the db server as parameters in a stored procedure, then the datasets are manually refreshed so the UI will display the most recent changes.
I'm sometimes tempted to think they used some form of weak obfuscation for either job security or as a last farewell before handing the code over.
Is there any good resources for detangling this mess? The books I have are helpful but only seem to cover half the scenarios I'm running into.

It sounds like you're tackling it in the right way.
Test
Refactor
Test again
Unfortunately, this can be a slow and tedious process. There's really no substitute for digging in and understanding what the code is trying to accomplish.
One book that I can recommend (if you don't already have it filed under "etc.") is Refactoring to Patterns. It's geared towards people who are in your exact situation.

I'm working in a similar situation.
If it is not a small utility but a big enterprise project then it is:
a) too late to fix it
b) beyond the capabilities of a single person to attempt a)
c) can only be fixed by a complete rewriting of the stuff which is out of the question
Refactoring can in many cases be only attempted in your private time at your personal risk. If you don't get an explicit mandate to do it as part of you daily job then you're likely not even get any credit for it. May even be criticized for "pointlessly wasting time on something that has perfectly worked for a long time already".
Just continue hacking it the way it has been hacked before, receive your paycheck and so on. When you get completely frustrated or the system reaches the point of being non-hackable any further, find another job.
EDIT: Whenever I attempt to address the question of the true architecture and doing the things the right way I usually get LOL in my face directly from responsible managers who are saying something like "I don't give a damn about good architecture" (attempted translation from German). I have personally brought one very bad component to the point of non-hackability while of course having given advanced warnings months in advance. They then had to cancel some promised features to customers because it was not doable any longer. Noone touches it anymore...

I've worked this job before. I spent just over two years on a legacy beast that is very similar. It took two of us over a year just to stabilize everything (it's still broke, but it's better).
First thing -- get exception logging into the app if it doesn't exist already. We used FogBugz, and it took us about a month to get reporting integrated into our app; it wasn't perfect right away, but it was reporting errors automatically. It's usually pretty safe to implement try-catch blocks in all your events, and that will cover most of your errors.
From there fix the bugs that come in first. Then fight the small battles, especially those based on the bugs. If you fix a bug that unexpectedly affects something else, refactor that block so that it is decoupled from the rest of the code.
It will take some extreme measures to rewrite a big, critical-to-company-success application no matter how bad it is. Even you get permission to do so, you'll be spending too much time supporting the legacy application to make any progress on the rewrite anyway. If you do many small refactorings, eventually either the big ones won't be that big or you'll have really good foundation classes for your rewrite.
One thing to take away from this is that it is a great experience. It will be frustrating, but you will learn a lot.

I have (once) come across code that was so insanely tangled that I couldn't fix it with a functional duplicate in a reasonable amount of time. That was sort of a special case though, as it was a parser and I had no idea how many clients might be "using" some of the bugs it had. Rendering hundreds of "working" source files erroneous was not a good option.
Most of the time it is imminently doable, just daunting. Read through that refactoring book.
I generally start fixing bad code by moving things around a bit (without actually changing implementation code more than required) so that modules and classes are at least somewhat coherent.
When that is done, you can take your more coherent class and rewrite its guts to perform the exact same way, but this time with sensible code. This is the tricky part with management, as they generally don't like to hear that you are going to take weeks to code and debug something that will behave exactly the same (if all goes well).
During this process I guarantee you will discover tons of bugs, and outright design stupidities. It's OK to fix trivial bugs while recoding, but otherwise leave such things for later.
Once this is done with a couple of classes, you will start to see where things can be modularized better, designed better, etc. Plus it will be easier to make such changes without impacting unrelated things because the code is now more modular, and you probably know it thoroughly.

Mostly, that sounds pretty bad. But I don't understand this part:
My predecessors even thought it would
be clever to use all the datasets
backwards. Every database update is
sent directly to the db server as
parameters in a stored procedure, then
the datasets are manually refreshed so
the UI will display the most recent
changes.
That sounds pretty close to a way I frequently write things. What's wrong with this? What's the correct way?

If your refactorings are breaking code, particularly code that seems to be unrelated, then you're trying to do too much at a time.
I recommend a first-pass refactoring where all you do is ExtractMethod: the goal is simply to name each step in the code, without any attempts at consolidation whatsoever.
After that, think about breaking dependencies, replacing singletons, consolidation.

If your refactorings are breaking things, then it means you don't have adequate unit test coverage - as the unit tests should have broken first. I recommend you get better unit test coverage second, after getting exception logging into place.
I then recommend you do small refactorings first - Extract Method to break large methods into understandable pieces; Introduce Variable to remove some duplication within a method; maybe Introduce Parameter if you find duplication between the variables used by your callers and the callee.
And run the unit test suite after each refactoring or set of refactorings. I'd say run them all until you gain confidence about which tests will need to be rerun every time.

No book will be able to cover all possible scenarios. It also depends on what you'll be expected to do with the project and whether there is any kind of external specification.
If you'll only have to do occasional small changes, just do those and don't bother starting to refactor.
If there is a specification (or you can get someone to write it), consider a complete rewrite if it can be justified by the foreseeable amount of changes to the project
If "the implementation is the specification" and there are a lot of changes planned, then you're pretty much hosed. Write LOTS of unit tests and start refactoring in small steps.
Actually, unit tests are going to be invaluable no matter what you do (if you can write them to an interface that's not going to change much with refactorings or a rewrite, that is).

See blog post Anatomy of an Anti-Corruption Layer, Part 1 and Anatomy of an Anti-Corruption Layer, Part 2.
It cites Eric Evans, Domain-Driven Design: Tackling Complexity in the Heart of Software:
Access the crap behind a facade

You could extract and then refactor some part of it, to break the dependencies and isolate layers into different modules, libraries, assemblies, directories. Then you re-inject the cleaned parts in to the application with a strangler application strategy. Lather, rinse, repeat.

Good luck, that is the tough part of being a developer.
I think your approach is good, but you need to focus on delivering business value (number of unit tests is not a measure of business value, but it may give you an indication if you are on or off track). It's important to have identified the behaviors that need to be changed, prioritize, and focus on the top ones.
The other piece of advise is to remain humble. Realize that if you wrote something so large under real deadlines and someone else saw your code, they would probably have problems understanding it as well. There is a skill in writing clean code, and there is a more important skill in dealing with other people's code.
The last piece of advise is to try to leverage the rest of your team. Past members may know information about the system you can learn. Also, they may be able to help test behaviors. I know the ideal is to have automated tests, but if someone can help by verifying things for you manually consider getting their help.

I particularly like the diagram in Code Complete, in which you start with just legacy code, a rectangle of fuzzy grey texture. Then when you replace some of it, you have fuzzy grey at the bottom, solid white at the top, and a jagged line representing the interface between the two.
That is, everything is either 'nasty old stuff' or 'nice new stuff'. One side of the line or the other.
The line is jagged, because you're migrating different parts of the system at different rates.
As you work, the jagged line gradually descends, until you have more white than grey, and eventually just grey.
Of course, that doesn't make the specifics any easier for you. But it does give you a model you can use to monitor your progress. At any one time you should have a clear understanding of where the line is: which bits are new, which are old, and how the two sides communicate.

You might find the following post useful:
http://refactoringin.net/?p=36
As it is said in the post, don't discard a complete overwrite that easily. Also, if at all possible, try to replace whole layers or tiers with third-party solution like for example ORM for persistence or with new code. But most important of all, try to understand the logic (problem domain) behind the code.

Related

How not to rush yourself? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I often find that I do a less than complete work on a feature, especially in the Design phase. I detect several reasons:
I'm over-optimistic
I feel the need to provide quick solutions, so sometimes I fool myself into thinking the design is fool-proof when in fact it's still full of holes, just to get the job done faster. Of course I end up paying dearly later.
I'm aware of this behavior of mine for some time, yet I still find I don't manage to compensate. Have you encountered similar problems? How do you approach solving them?
I use a couple of techniques. The first is a simple paper to-do list. In the morning I write down my tasks for the day. I try to work on a task until I can cross it off. I cross it off only when I'm done to my own satisfaction. My to-do list helps me stay focused. When an interruption comes in, I can consciously choose whether it is important enough to interrupt what I'm doing now.
The second technique I use is to give up on the idea of "done" for a design. Instead, I focus on what I've started calling "successions", where a design goes through predictable stages. Each stage supports the current functionality well and will be succeeded at some point by the next stage. This lets me do a good job, a job I can be proud of, without over-designing.
I have the intuition that there is a small catalog of such successions (like http://www.threeriversinstitute.org/FirstOneThenMany.html) that would cover most of design. In the meantime, I try to remember that "sufficient to the day are the troubles thereof".
I run into this problem a lot.
My solution is a notebook. (The old fashioned paper kind).
I write out how I'm planning on implementing the solution as an bulleted overview list, and then I try and flesh out each point on the list.
Often, during that process, I come across issues I hadn't thought of.
Of course, the 80/20 rule still applies... I still come across things when I'm actually doing the implementation that hadn't occurred to me, but with experience these tend to diminish.
EDIT: If I'm still not sure at the end of this process, I put together a throwaway prototype testbed... It's important to make sure it's throwaway, because otherwise you run the risk of including some nasty hacks in your real codebase.
It's very common to miss edge-cases and detail when you're in the planning phase of a project, especially in the software development field. Please don't feel that this is a personal failing; it's something endemic.
To counter this, many software development methodologies have emerged. Most recently there has been a shift by many development teams to 'agile' methods, where there is a focus on rapid development with little up-front technical design (after all, many complexities are only discovered when you actually begin developing). I'm currently using the Scrum system, which has been excellent in my small team:
http://en.wikipedia.org/wiki/Agile_methods
http://en.wikipedia.org/wiki/Scrum_%28development%29
If you find that your organisation will not accept what they may regard as a radical shift in approach, it may be worth investigating whether they will agree to the development of a prototype system. This means that you could code up a feature to investigate the technologies involved and judge whether it's feasible, without having to commit to full development, a quality bar, testing schedules etc. The prototype should be thrown away once the feasibility has been proved or disproved, then proper development may begin, including all that you've learned in the process.
If your problem is more related to time management, then I'd recommend the Getting Things Done approach (http://en.wikipedia.org/wiki/Getting_things_done). This is pragmatic and simple, concentrating on making you productive without overloading you with information that isn't immediately relevant to your current work. I've found that I get overwhelmed with project/feature ideas at times and it really helps to write everything down and file it for a later time when I have the resources available to work effectively.
I hope this helps and best of luck!
Communication.
The best way to not rush yourself into programming mistakes is communication. Yes, good ol' fashioned accountability. If another person in the office is involved in the process, the better the outcome. If a programmer just takes on the task without any concern for anybody else, then there is a higher possiblity for mistakes.
Accountability Checklist:
How do we support this?
Who needs to know what has changed?
Why are we doing this in the first place?
Will there be anybody who doesn't want this changed?
Will someone else understand how I did this?
How will the user perceive and use this change?
A skepticle comrad is usually good enough to help. Functional Specifications are good, they usually answer all of these thoughts. But, sometimes a conversation with another person can help you with it and you can get changes out the door faster.
I have learned, through years of mistakes (though still making them), that almost anything I want to use repeatedly, or distribute, needs to be designed properly. So getting burned enough times will end your optimism.
When getting pressure from management, I tell them I will have to put in the thought anyway, so I should do it when it's cheap. I think on paper as well, so I can actually prove that I'm doing something and it keeps my fingers on the keyboard, both of which provides a soothing effect to management. ;-)
At the risk of sounding obvious - be pessimistic. I had a few experiences where I thought "that should take a few hours" and it ended up taking a couple days because of all the little things that pop up unexpectedly.
By far the best way I've found to manage things is to (much like Andrew's answer) write out the design and requirements as a starting point. Then I go through and look for weak points in the design, gotchas and additional use cases etc. I try to look at this as a critical exercise - there's no code written yet, so this is the time to be totally ruthless and look for every weak point. Look for error conditions you'll have to handle, and whatever amount of time you think it will take to complete each feature/function, pad that amount by a lot. I've had times where I've doubled my initial estimate and still not been that far off the mark.
It's very hard as a programmer to realistically project debugging time - writing the code is easy to estimate, but debugging that into functioning, valid code is something else entirely. Therefore I find there's no exact science to it but I just pad tasks by a whole bunch, so that I have plenty of breathing room for debugging.
See also Evidence Based Scheduling which is a fascinating concept in scheduling developed by FogCreek for their FogBugz product.
You and the rest of the world.
You need more a more detailed design, more accurate estimate, and the willingness to accept that sometimes the optimal solution is not necessarily the best solution (e.g., you could code some loop in assembler to get optimal performance, but that's going to take a lot longer than just doing
for (i=1; i<=10; i++) {}
). Is the time spent doing it really worth it for an accounting package over a missile system.
I like to designing, but over time I've found that much design up front is a lot like building castles into the sky - it's too much speculation, however well-educated, missing critical feedback from actually implementing and using the design.
So today I'm much more into accepting that while implementing a design I will learn a lot of new stuff about it, and need to feed that learning back into the design. Doing that is a skill that is fun to learn, including the skills to keep a design flexible by keeping it simple, free of duplication and cohesive and decoupled, of changing the design in small, controlled steps (=refactoring), and writing the necessary extensive suite of automated tests that make this kind of changes safe.
This seems to be a much more effective approach to me than getting better at "up front design speculation" - and addtionally it makes me equally well prepared for the inevitable moment when the design needs to be changed due to a simply unforseeable change in the requirements.
Divide, divide, divide. List all the steps that will be required to finish the project, then list all the steps those steps will require to be concluded, and so on until you reach atomic items you are absolutely sure you can finish in a day or less. Add the duration of all these values to arrive at a length of time.
Then double it. Now you have a number that, if depressing, is at least somewhat realistic.
If possible "Sleep on your design" before publishing it. I find after I leave work, I usually think of things I have missed. This usually happens while I am lying in bed before falling asleep or even while showering the next day.
I also find it valuable to have a peer/friend that I trust review what I have before distributing it. Somebody else almost always sees something I didn't think of or miscommunicated.
I like to do as others stated here. Write down in pseudo code what the flow of your app will be. This immediately highlights some detailed areas that may require further attention that where not apparent up front.
Pseudo code is also readable to business users who can verify your approach meets their needs.
Using pseudo code also creates a nice set of methods that could be put to use as an interface in the final solution. Once the pseudo code is fairly tight, look for patterns and review some common GOF patterns. They do not have to be perfect but using them will sheild you from having to rewrite the code later during the revisions that are bound to come along.
Just taking an hour or two write psuedo code, yields some invaluable time saving pieces later on:
1. An object model emerges
2. The program's flow is clearly defined for others
3. It can be used as documentation of your design with some refinement
4. Comments are easier to add and will be clearer for someone else reviewing your code.
Best of luck to you!
I've found that the best way to make sure you've chosen a good design is to make sure that you understand the problem, know the limitations you have, and know what things are must-haves vs. nice-to-haves.
Understanding the problem will involve talking to the people who have the need and keeping them anchored to what needs to get done first instead of how they think it ought to get done. Once you know what actually has to happen, you can go back and talk over requirements about how.
Knowing your limitations may be quite easy: needs to run on the iPhone; has to be a web application; needs to integrate with the already-existing Java code and deployment setup; and so on. It may be quite difficult: you don't know what the potential size of your user base is (hundreds? thousands? millions?); you don't know whether you'll need to localize it (though if you're not sure, assume you will have to).
Must-haves vs, nice-to-haves: this is possibly the most difficult part. Users very often have emotional attachments to "requirements" ("It should look just like Excel") that are not actually part of the "has to happen" stuff. You often have to juggle functionality vs. desires to get an acceptable implementation. You can't always give everyone a pony.
Make sure you write all this down! Even if it evolves along the way, or the design is small, having a "this is what we're planning to do now" guide to refer to when you need ot make a decision about committing resources makes it easier to restrain yourself from implementing a really cool whiz-bang feature instead of a boring must-do.
Since you recognize that you feel the need to provide a quick solution, perhaps it will slow you down to realize that you can probably solve the problem faster and deliver it sooner if you spend more upfront time in design. For instance if you spend 3 hours designing and 30 hours writting code, it probably means that if you spend 6 hours designing you might need to only spend 10 hours writing code. (These are not actual figures just examples). You might try to quantify this for yourself on the next few projects you do. Do a couple where you behave as you normally would and see what ratio of design/codewriting/testing&debugging you actually do. Then on the next project deliberately increase the percentage of time you spend on design phase and see if it does shorten the time needed for the other phases. You will have to try for several projects on this as well to get a true baseline since the projects may be quite different. Do it as a test to see if you can improve your performance on the the other phases and thus deliver a faster product if you spend 20% more time or 50% more time or 100% more time on design.
Remember the later in the process you find the problem with a design the harder (and more time-consuming) it is to fix.

When does a code base become large and unwieldy? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
When do you start to consider a code base to be getting too large and unwieldy?
-when a significant amount of your coding time is devoted to "where do I put this code?"
-when reasoning about side-effects starts to become really hard.
-when there's a significant amount of code that's just "in there", and nobody knows what it does or if it's still running but it's too scary to remove
-when lots of team members spend significant chunks of their time chasing down intermittent bugs caused by some empty string somewhere in the data where it wasn't expected, or something that you think would usually be caught in a well-written application, in some edge case
-when, in considering how to implement a new feature, "complete rewrite" starts to seem like a good answer
-when you dread looking at the mess of code you need to maintain and wish you could find work building something clean and logical instead of dumpster diving through the detritus of someone else's poorly organized thinking
When it's over 100 lines. Joke. This is probably the hardest question to answer, because it's very individual.
But if you structure the application well and use different layers for i.e. interfaces, data, services and front-end you will automaticly get a nice "base"-structure. Then you can dividie each layer into different classes and then inside the classes you point out the appropriet methods for the class.
However, there's not an "x amount of lines per method is bad" but think of it more like this, if there is possibility of replication, split it from the current peice and make it re-usable.
Re-using code is the basics of all good structure.
And splitting up into different layers will help the base to become more and more flexible and modular.
There exist some calculable metrics if that's what you're searching for. Static code analysis tools can help with that:
Here's one list: http://checkstyle.sourceforge.net/config_metrics.html
Other factors can be the time it takes to change/add something.
Other non-calculable factors can be
the risk associated to changes
the level intermingling of features.
if the documentation can keep up with the features / code
if the documentation represent the application.
the level of training needed.
the quantity of repeat instead of reuse.
Ah, the god-program anti-pattern.
When you can't remember at least the
outline of sections of it.
When you have to think about how
changes will affect itself or
dependencies.
When you can't remember all the
things it's dependant on or depend
on it.
When it takes more than a few
minutes(?) to download the source or
compile.
When you have to worry about how to
deploy new versions.
When you encounter classes which are
functionally identical to other
classes elsewhere in the app.
So many possible signs.
I think there are many thoughts to why some code base is too large.
It is hard to remain in a constant naming convention. If classes/methods/atributes can't be named consistently or can't be found consistently, then it's time to reorganize.
When your programmers are surfing the web and going to lunch in order to compile. Keeping compiling/linking time to a minimum is important for management. The last thing you want is a programmer to get distracted by twiddling their thumbs for too long.
When small changes start to affect many MANY other places of code. There is a benefit to consolidation of code, but there is also a cost. If a small change to fix one bug causes a dozen more, and this is commonly happens, then your code base needs to be spread out (versioned libraries) or possibly unconsolidated (yes, duplicate code).
If the learning curve of new programmers to the project is obviously longer than acceptable (usually 90 days), then your code base/training isn't set up right.
..There are many many more, I'm sure. If you think about it from these three perspectives:
Is it hard to support?
Is it hard to change?
Is it hard to learn?
...Then you will have an idea if your code fits the "large and unwieldy" category
For me, code becomes unwieldy when there's been a lot of changes made to the codebase that weren't planned for when the program was initially written or last refactored significantly. At this point, stuff starts to get fitted into the existing codebase in odd places for expediency and you start to get a lot of design artifacts that only make sense if you know the history of the implementation.
Short answer: it depends on the project.
Long answer:
A codebase doesn't have to be large to be unwieldy - spaghetti code can be written from line 1. So, there's not really a magic tripping point from good to bad - it's more of a spectrum of great <---> awful, and it takes daily effort to keep your codebase from heading in the wrong direction. What you generally need is a lead developer that has the ability to review others' code objectively, and keep an eye on the architecture and design of the code as a whole - no one line developer can do that.
When I can't remember what a class does or what other classes it uses off the top of my head. It's really more a function of my cognitive capacity coupled with the code complexity.
I was trying to think of a way of deciding based on how your collegues perceive it to be.
During my first week at a gig a few years ago, I said during a stand-up that I had been tracking a white rabbit around the ContainerManagerBean, the ContainerManagementBean and the ContextManagerBean (it makes me shudder just recalling these words!). At least two of the developers looked at their shoes and I could see them keeping in a snigger.
Right then and there, I knew that this was not a problem with my lack of familiarity with the codebase - all the developers perceived a problem with it.
If over years of development different people code change requests and bug fixes you will sooner or later get parts of code with duplicated functionality, very similar classes, some spaghetti etc.
This is mostly due to the fact that a fix is needed fast and the "new guy" doesn't know the code base. So he happily codes away something which is already there.
But if you have automatic checks in place checking the style, unit test code coverage and similar you can avoid some of it.
A lot of the things that people have identified as indicating problems don't really have to do with the raw size of the codebase, but rather its comprehensibility. How does size relate to comprehensibility? If at all...
I've seen very short programs that are just a mess -- easier to throw away and redo from scratch. I've also seen very large programs whose structure is transparent enough that it is comprehensible even at progressively more detailed views of it. And everything in between...
I think look at this question from the standpoint of an entire codebase is a good one, but it probably pays to work up from the bottom and look first at the comprehensibility of individual classes, to multi-class components, to subsystems, and finally up to an entire system. I would expect the answers at each level of detail to build on each other.
For my money, the simplest benchmark is this: Can you explain the essence of what X does in one sentence? Where X is some granularity of component, and you can assume an understanding of the levels immediately above and below the component.
When you come to need a utility method or class, and have no idea whether someone else has already implemented it or have any idea where to look for one.
Related: when several slightly different implementations of the same functionality exist, because each author was unaware of other authors' work.

Improving really bad systems

How would you begin improving on a really bad system?
Let me explain what I mean before you recommend creating unit tests and refactoring. I could use those techniques but that would be pointless in this case.
Actually the system is so broken it doesn't do what it needs to do.
For example the system should count how many messages it sends. It mostly works but in some cases it "forgets" to increase the value of the message counter. The problem is that so many other modules with their own workarounds build upon this counter that if I correct the counter the system as a whole would become worse than it is currently. The solution could be to modify all the modules and remove their own corrections, but with 150+ modules that would require so much coordination that I can not afford it.
Even worse, there are some problems that has workarounds not in the system itself, but in people's head. For example the system can not represent more than four related messages in one message group. Some services would require five messages grouped together. The accounting department knows about this limitation and every time they count the messages for these services, they count the message groups and multiply it by 5/4 to get the correct number of the messages. There is absolutely no documentation about these deviations and nobody knows how many such things are present in the system now.
So how would you begin working on improving this system? What strategy would you follow?
A few additional things: I'm a one-men-army working on this so it is not an acceptable answer to hire enough men and redesign/refactor the system. And in a few weeks or months I really should show some visible progression so it is not an option either to do the refactoring myself in a couple of years.
Some technical details: the system is written in Java and PHP but I don't think that really matters. There are two databases behind it, an Oracle and a PostgreSQL one. Besides the flaws mentioned before the code itself is smells too, it is really badly written and documented.
Additional info:
The counter issue is not a synchronization problem. The counter++ statements are added to some modules, and are not added to some other modules. A quick and dirty fix is to add them where they are missing. The long solution is to make it kind of an aspect for the modules that need it, making impossible to forget it later. I have no problems with fixing things like this, but if I would make this change I would break over 10 other modules.
Update:
I accepted Greg D's answer. Even if I like Adam Bellaire's more, it wouldn't help me to know what would be ideal to know. Thanks all for the answers.
Put out the fires. If there are any issues of critical priority, whatever they are, you've got to handle them first. Hack it in if you must, with a smelly codebase it's ok. You know you'll improve it going forward. This is your sales technique targeted at whomever you're reporting to.
Pick some low-hanging fruit. I assume you're relatively new to this particular software and that you were re-tasked to deal with it. Find some apparently easy problems in a related subsystem of the code that shouldn't take more than a day or two to resolve apiece, and fix them. This may involve refactoring, or it may not. The goal is to familiarize yourself with the system and with the style of the original author. You may not get really lucky (One of the two incompetents who worked on my system before me always post-fixed his comments with four punctuation marks instead of one, which made it very easy to distinguish who wrote the particular segment of code.), but you'll develop insight into the author's weaknesses so you know what to look out for. Extensive, tight coupling with global state vs poor understanding of language tools, for example.
Set a big goal. If your experience parallels mine, you'll find yourself in a particular bit of spaghetti code more and more often as you perform the prior step. This is the first knot you need to untangle. With the experience you've gained understanding the component and knowledge about what the original author likely did wrong (and thus, what you need to watch out for), you can start envisioning a better model for this subset of the system. Don't worry if you still have to maintain some messy interfaces to maintain functionality, just take it one step at a time.
Lather, rinse, repeat! :)
Given time, consider adding unit tests for your new model one level underneath your interfaces with the rest of the system. Don't engrave the bad interfaces in code via tests that use them, you'll be changing them in a future iteration.
Addressing the particular issues you mention:
When you run into a situation that users are working around manually, talk with the users about changing it. Verify that they'll accept the change if you provide it before sinking the time into it. If they don't want the change, your job is to maintain the broken behavior.
When you run into a buggy component that multiple other components have worked around, I espouse a parallel component technique. Create a counter that works how the existing one should work. Provide a similar (or, if practical, identical) interface and slide the new component into the codebase. When you touch external components that work around the broken one, try to replace the old component with the new one. Similar interfaces ease porting of the code, and the old component is still around if the new one fails. Don't remove the old component until you can.
What is being asked of you right now? Are you being asked to implement functionality, or fix bugs? Do they even know what they want you to do?
If you don't have the manpower, time, or resources to "fix" the system as a whole, then all you can do is bail water. You're saying you should be able to make some "visible progress" in a few months' time. Well, with the system being as bad as you described, you may actually make the system worse. Under pressure to do something noticeable, you'll simply add code, and make the sysem even more convoluted.
You need to refactor, eventually. There is no way around it. If you can find a way to refactor that is visible to your end users, that would be ideal, even if it takes 6-9 months or a year instead of "a few months." But if you can't, then you have a choice to make:
Refactor, and risk being viewed as "not accomplishing anything" despite your efforts
Don't refactor, accomplish "visible" goals, and make the system more convoluted and more difficult to refactor one day. (Maybe after you find a better job, and hope the next developer to come along can never find out where you live.)
Which one is most beneficial to you personally depends on your company's culture. Will they one day decide to hire more developers, or replace this system completely with some other product?
Conversely, if your efforts to "fix things" actually break other things, will they be understanding about the monstrosity you're being asked to tackle single-handedly?
No easy answers here, sorry. You have to evaluate based on your unique, individual situation.
This is a whole book that will basically say unit test and refactor, but with more practical advice on how to do it
http://ecx.images-amazon.com/images/I/51RCXGPXQ8L._SL500_AA240_.jpg
http://www.amazon.com/Working-Effectively-Legacy-Robert-Martin/dp/0131177052
You open the directory that contains this system with Windows Explorer. Then, press Ctrl-A, and then Shift-Delete. That sounds like an improvement in your case.
Seriously though: that counter sounds like it's got thread-safety issues. I'd put a lock around the increasing functions.
And regarding the rest of the system, you can't do the impossible so try to do the possible. You need to attack your system from two fronts. Take care of the more visibly problematic issues first, so you can show progress. At the same time, you should deal with the more infrastructural problems, so that you have a chance at actually fixing this thing some day.
Good luck, and may the source be with you.
Pick one area that would be of medium difficulty to refactor. Create a skeleton of the original code with only the method signatures of the existing ones; maybe use an Interface even. Then start hacking away. You can even point the "new" methods to the old ones until you get to them.
Then, testing, testing, testing. Since there aren't any unit tests, maybe just use good old fashioned Voice-Activated-Unit Tests (people)? Or write your own tests as you go.
Document your progress as you go in some kind of repository, including frustrations and questions, so that when the next poor schmuck who gets this project won't be where you are :).
Once you get the first part done, move on to the next. The key is to build on top of incremental progress, that's why you shouldn't start with the hardest part first; it'll be too easy to get demoralized.
Joel has a couple of articles on rewriting/refactoring:
http://www.joelonsoftware.com/articles/fog0000000069.html
http://www.joelonsoftware.com/articles/fog0000000348.html
I've been working with a legacy system with the same characteristics for almost three years now, and there are no shortcuts that I'm aware of.
What bothers me most with our legacy system is that I'm not allowed to fix some bugs, since many other functions could break if I fixed them. This calls for ugly workarounds or creating new versions of old functions. Calls to the old functions can then be replaced with the new one at a time (while testing).
I'm not sure what the goal of your task is, but I strongly advise you to touch as little of the code as possible. Only do what you need to do.
You may want to get as much as possible documented by interviewing people. This is a huge task, since you don't know which questions to ask, and people will have forgotten a lot of details.
Other than that: make sure you're getting paid and enough moral support. There will be weeping and gnashing of teeth...
Well you need to start somewhere, and it sounds like there are bugs that need fixing. I would work through those bugs, making quick win refactorings, and writing any unit tests possible along the way. I would also use a tool like SourceMonitor to identify some of the most 'complex' parts of code in the system and see if I could simplify their design in any way. Ultimately, you just have to accept that it will be a slow process, and make small steps towards a better system.
I would try to pick a part of the system that could be extracted and rewritten in isolation fairly quickly. Even if it doesn't do much, you could show progress pretty quickly, and you don't have the problem of interfacing with the legacy code directly.
Hopefully, if you could pick off a few such tasks, they will see you making visible progress, and you could put forward an argument for hiring more people to rewrite the bigger modules. When parts of the system rely on broken behaviour, you don't have much choice but to separate before you fix anything.
Hopefully, you could gradually build a team capable of rewriting the whole lot.
All of this would have to go hand in hand with some decent training, otherwise people's old habits will stick, and your work will get the blame when things don't work as expected.
Good luck!
Deprecate everything that currently exists that has problems, and write new ones that work correctly. Document as much as you can about what will change and put big red flashing signs all over the place pointing to this documentation.
By doing it that way, you can keep your existing bugs (the ones that are being compensated for somewhere else) around without slowing down your progress towards getting an actual working system.

What are some reasons why a sole developer should use TDD? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
I'm a contract programmer with lots of experience. I'm used to being hired by a client to go in and do a software project of one form or another on my own, usually from nothing. That means a clean slate, almost every time. I can bring in libraries I've developed to get a quick start, but they're always optional. (and depend on getting the right IP clauses in the contract) Many times I can specify or even design the hardware platform... so we're talking serious freedom here.
I can see uses for constructing automated tests for certain code: Libraries with more than trivial functionality, core functionality with a high number of references, etc. Basically, as the value of a piece of code goes up through heavy use, I can see it would be more and more valuable to automatically test that code so that I know I don't break it.
However, in my situation, I find it hard to rationalize anything more than that. I'll adopt things as they prove useful, but I'm not about to blindly follow anything.
I find many of the things I do in 'maintenance' are actually small design changes. In this case, the tests would not have saved me anything and now they'd have to change too. A highly iterative, stub-first design approach works very well for me. I can't see actually saving myself that much time with more extensive tests.
Hobby projects are even harder to justify... they're usually anything from weekenders up to a say month long. Edge-case bugs rarely matter, it's all about playing with something.
Reading questions such as this one, The most voted on response seems to say that in that poster's experience/opinion TDD actually wastes time if you've got less than 5 people (even assuming a certain level of competence/experience with TDD). However, that appears to be covering initial development time, not maintenance. It's not clear how TDD stacks up over the entire life cycle of a project.
I think TDD could be a good step in the worthwhile goal of improving the quality of the products of our industry as a whole. Idealism on it's own is no longer all that effective at motivating me, though.
I do think TDD would be a good approach in large teams, or any size team containing at least one unreliable programmer. That's not my question.
Why would a sole developer with a good track record adopt TDD?
I'd love to hear of any kind of metrics done (formally or not) on TDD... focusing on solo developers or very small teams.
Failing that, anecdotes of your personal experiences would be nice, too. :)
Please avoid stating opinion without experience to back it. Let's not make this an ideology war. Also the skip greater employment options argument. This is simply an efficiency question.
I'm not about to blindly follow anything.
That's the right attitude. I use TDD all the time, but I don't adhere to it as strictly as some.
The best argument (in my mind) in favor of TDD is that you get a set of tests you can run when you finally get to the refactoring and maintenance phases of your project. If this is your only reason for using TDD, then you can write the tests any time you want, instead of blindly following the methodology.
The other reason I use TDD is that writing tests gets me thinking about my API up front. I'm forced to think about how I'm going to use a class before I write it. Getting my head into the project at this high level works for me. There are other ways to do this, and if you've found other methods (there are plenty) to do the same thing, then I'd say keep doing what works for you.
I find it even more useful when flying solo. With nobody around to bounce ideas off of and nobody around to perform peer reviews, you will need some assurance that you're code is solid. TDD/BDD will provide that assurance for you. TDD is a bit contraversial, though. Others may completely disagree with what I'm saying.
EDIT: Might I add that if done right, you can actually generate specifications for your software at the same time you write tests. This is a great side effect of BDD. You can make yourself look like super developer if you're cranking out solid code along with specs, all on your own.
Ok my turn... I'd do TDD even on my own (for non-spike/experimental/prototype code) because
Think before you leap: forces me to think what I want to get done before i start cranking out code. What am I trying to accomplish here.. 'If I assume I already had this piece.. how would I expect it to work?' Encourages interface-in design of objects.
Easier to change: I can make modifications with confidence.. 'I didn't break anything in step1-10 when i changed step5.' Regression testing is instantaneous
Better designs emerge: I've found better designs emerging without me investing effort in a design activity. test-first + Refactoring lead to loosely coupled, minimal classes with minimal methods.. no overengineering.. no YAGNI code. The classes have better public interfaces, small methods and are more readable. This is kind of a zen thing.. you only notice you got it when you 'get it'.
The debugger is not my crutch anymore : I know what my program does.. without having to spend hours stepping thru my own code. Nowadays If I spend more than 10 mins with the debugger.. mental alarms start ringing.
Helps me go home on time I have noticed a marked decrease in the number of bugs in my code since TDD.. even if the assert is like a Console trace and not a xUnit type AT.
Productivity / Flow: it helps me to identify the next discrete baby-step that will take me towards done... keeps the snowball rolling. TDD helps me get into a rhythm (or what XPers call flow) quicker. I get a bigger chunk of quality work done per unit time than before. The red-green-refactor cycle turns into... a kind of perpetual motion machine.
I can prove that my code works at the touch of a button
Practice makes perfect I find myself learning & spotting dragons faster.. with more TDD time under my belt. Maybe dissonance.. but I feel that TDD has made me a better programmer even when I don't go test first. Spotting refactoring opportunities has become second nature...
I'll update if I think of any more.. this is what i came up with in the last 2 mins of reflection.
I'm also a contract programmer. Here are my 12 Reasons Why I Love Unit Tests.
My best experience with TDD is centered around the pyftpdlib project. Most of the development is done by the original author, and I've made a few small contributions, but it's essentially a solo project. The test suite for the project is very thorough, and tests all the major features of the FTPd library. Before checking in changes or releasing a version, all tests are checked, and when a new feature is added, the test suite is always updated as well.
As a result of this approach, this is the only project I've ever worked on that didn't have showstopper bugs appear after a new release, have changes checked in that broke a major feature, etc. The code is very solid and I've been consistently impressed with how few bug reports have been opened during the life of the project. I (and the original author) attribute much of this success to the comprehensive test suite and the ability to test every major code path at will.
From a logical perspective, any code you write has to be tested, and without TDD then you'll be testing it yourself manually. On the flip side to pyftpdlib, the worst code by number of bugs and frequency of major issues, is code that is/was solely being tested by the developers and QA trying out new features manually. Things don't get tested because of time crunch or falling through the cracks. Old code paths are forgotten and even the oldest stable features end up breaking, major releases end up with important features non-functional. etc. Manual testing is critically important for verification and some randomization of testing, but based on my experiences I'd say that it's essential to have both manual testing and a carefully constructed unit test framework. Between the two approaches the gaps in coverage are smaller, and your likelihood of problems can only be reduced.
It does not matter whether you are the sole developer or not. You have to think of it from the application point of view. All the applications needs to work properly, all the applications need to be maintained, all the applications needs to be less buggy. There are of course certain scenarios where a TDD approach might not suit you. This is when the deadline is approaching very fast and no time to perform unit testing.
Anyways, TDD does not depend on a solo or a team environment. It depends on the application as a whole.
I don't have an enormous amount of experience, but I have had the experience of seeing sharply-contrasted approaches to testing.
In one job, there was no automated testing. "Testing" consisted of poking around in the application, trying whatever popped in your head, to see if it broke. Needless to say, it was easy for flat-out-broken code to reach our production server.
In my current job, there is lots of automated testing, and a full CI-system. Now when code gets broken, it is immediately obvious. Not only that, but as I work, the tests really document what features are working in my code, and what haven't yet. It gives me great confidence to be able to add new features, knowing that if I break existing ones, it won't go unnoticed.
So, to me, it depends not so much on the size of the team, but the size of the application. Can you keep track of every part of the application? Every requirement? Every test you need to run to make sure the application is working? What does it even mean to say that the application is "working", if you don't have tests to prove it?
Just my $0.02.
Tests allow you to refactor with confidence that you are not breaking the system. Writing the tests first allows the tests to define what is working behavior for the system. Any behavior that isn't defined by the test is by definition a by-product and allowed to change when refactoring. Writing tests first also drive the design in good directions. To support testability you find that you need to decouple classes, use interfaces, and follow good pattern (Inversion of Control, for instance) to make your code easily testable. If you write tests afterwards, you can't be sure that you've covered all the behavior expected of your system in the tests. You also find that some things are hard to test because of the design -- since it was likely developed without testing in mind -- and are tempted to skimp on or omit tests.
I generally work solo and mostly do TDD -- the cases where I don't are simply where I fail to live up to my practices or haven't yet found a good way that works for me to do TDD, for example with web interfaces.
TDD is not about testing it's about writing code. As such, it provides a lot of benefits to even a single developer. For many developers it is a mindshift to write more robust code. For example, how often do you think "Now how can this code fail?" after writing code without TDD? For many developers, the answer to that question is none. For TDD practioners it shifts the mindset to to doing things like checking if objects or strings are null before doing something with them because you are writing tests to specifically do that (break the code).
Another major reason is change. Anytime you deal with a customer, they can never seem to make up their minds. The only constant is change. TDD helps as a "safety net" to find all the other areas that could break.Even on small projects this can keep you from burning up precious time in the debugger.
I could go and on, but I think saying that TDD is more about writing code than anything should be enough to justify it's use as a sole developer.
I tend to agree with the validity of your point about the overhead of TDD for 'one developer' or 'hobby' projects not justifying the expenses.
You have to consider however that most best practices are relevant and useful if they are consistently applied for a long period of time.
For example TDD is saving you testing/bugfixing time in a long run, not within 5 minutes after you've created the first unit test.
You're a contract programmer which means that you will leave your current project when it will be finished and will switch to something else, most likely in another company. Your current client will have to maintain and support your application. If you do not leave the support team a good framework to work with they will be stuck. TDD will help the project to be sustainable. It will increase the stability of the code base so other people with less experience will not be able not do too much damage trying to change it.
The same applies for the hobby projects. You may be tired of it and will want to pass it to someone. You might become commercially successful (think Craiglist) and will have 5 more people working besides you.
Investment in proper process always pays-off, even if it is just gained experience. But most of the time you will be grateful that when you started a new project you decided to do it properly
You have to consider OTHER people when doing something. You you have to think ahead, plan for growth, plan for sustainability.
If you don't want to do that - stick to the cowboy coding, it's much simpler this way.
P.S. The same thing applies to other practices:
If you don't comment your code and you have ideal memory you'll be fine but someone else reading your code will not.
If you don't document your discussions with the customer somebody else will not know anything about a crucial decision you made
etc ad infinitum
I no longer refactor anything without a reasonable set of unit tests.
I don't do full-on TDD with unit tests first and code second. I do CALTAL -- Code A LIttle, Test A Little -- development. Generally, code goes first, but not always.
When I find that I've got to refactor, I make sure I've got enough tests and then I hack away at the structure with complete confidence that I don't have to keep the entire old-architecture-becomes-new-architecture plan in my head. I just have to get the tests to pass again.
I refactor the important bits. Get the existing suite of tests to pass.
Then I realize I forgot something, and I'm back to CALTAL development on the new stuff.
Then I see things I forgot to delete -- but are they really unused everywhere? Delete 'em and see what fails in the testing.
Just yesterday -- part way through a big refactoring -- I realized that I still didn't have the exact right design. But the tests still had to pass, so I was free to refactor my refactoring before I was even done with the first refactoring. (whew!) And it all worked nicely because I had a set of tests to validate the changes against.
For flying solo TDD is my copilot.
TDD lets me more clearly define the problem in my head. That helps me focus on implementing just the functionality that is required, and nothing more. It also helps me create a better API, because I'm writing a "client" before I write the code itself. I can also refactor without having to worry about breaking anything.
I'm going to answer this question quite quickly, and hopefully you will start to see some of the reasoning, even if you still disagree. :)
If you are lucky enough to be on a long-running project, then there will be times when you want to, for example, write your data tier first, then maybe the business tier, before moving on up the stack. If your client then makes a requirement change that requires re-work on your data layer, a set of unit tests on the data layer will ensure that your methods don't fail in undesirable ways (assuming you update the tests to reflect the new requirements). However, you are likely to be calling the data layer method from the business layer as well, and possibly in several places.
Let's assume you have 3 calls to a method in the business layer, but you only modify 2. In the third method, you may still be getting data back from your data layer that appears to be valid, but may break some of the assumptions you coded months before. Unit tests at this level (and above) should have been designed to spot broken assumptions, and in failing they should highlight to you that there is a section of code that needs to be revisited.
I'm hoping that this very simplistic example will be enough to get you thinking about TDD a little more, and that it might create a spark that makes you consider using it. Of course, if you still don't see the point, and you are confident in your own abilities to keep track of many thousands of lines of code, then I have no place to tell you you should start TDD.
The point about writing the tests first is that it enforces the requirements and design decisions you are making. When I mod the code, I want to make sure those are still enforced and it is easy enough to "break" something without getting a compiler or run-time error.
I have a test-first approach because I want to have a high degree of confidence in my code. Granted, the tests need to be good tests or they don't enforce anything.
I've got some pretty large code bases that I work on and there is a lot of non-trivial stuff going on. It is easy enough to make changes that ripple and suddenly X happens when X should never happen. My tests have saved me on several occasions from making a critical (but subtle) error that might have gone unnoticed by human testers.
When the tests do fail, they are opportunities to look at them and the production code and make sure that it is correct. Sometimes the design changes and the tests will need to be modified. Sometimes I'll write something that passes 99 out of 100 tests. That 1 test that didn't pass is like a co-worker reviewing my code (in a sense) to make sure I'm still building what I'm supposed to be building.
I feel that as a solo developer on a project, especially a larger one, you tend to be spread pretty thin.
You are in the middle of a large refactoring when all of a sudden a couple of critical bugs are detected that for some reason did not show up during pre-release testing. In this case you have to drop everything and fix them and after having spent two weeks tearing your hair out you can finally get back to whatever you were doing before.
A week later one of your largest customers realizes that they absolutely must have this cool new shiny feature or otherwise they won't place the order for those 1M units they should have already ordered a month ago.
Now, three months later you don't even remember why you started refactoring in the first place let alone what the code you are refactoring was supposed to do. Thank god you did a good job writing those unit tests because at least they tell you that your refactored code is still doing what it was supposed to do.
Lather, rinse, repeat.
..story of my life for the past 6 months. :-/
Sole developer should use TDD on his project (track record does not matter), since eventually this project could be passed to some other developer. Or more developers could be brought in.
New people will have extremely have hard time working with the code without the tests. They will break things.
Does your client own the source code when you deliver the product? If you can convince them that delivering the product with unit tests adds value, then you are up-selling your services and delivering a better product. From the client's perspective, test coverage not only ensures quality, it allows future maintainers to understand the code much more readily since the tests isolate functionality from the UI.
I think TDD as a methodology is not just about "having tests when making changes", thus it does not depend on team- nor on project size. It's about noting one's expectations about what a pice of code/an application does BEFORE one starts to really think about HOW the noted behaviour is implemented. The main focus of TDD is not only having test in place for written code but writing less code because you just do what make the test green (and refactor later).
If you're like me and find it quite hard to think about what a part/the whole application does WITHOUT thinking about how to implement it, I think its fine to write your test after your code and thus letting the code "drive" the tests.
If your question isn't so much about test-first (TDD) or test-after (good coding?) I think testing should be standard practise for any developer, wether alone or in a big team, who creates code which stays in production longer than three months. In my expirience that's the time-span after which even the original author has to think hard about what these twenty lines of complex, super-optimized, but sparsely documented code really code do. If you've got tests (which cover all paths throughth the code), there less to think - and less to ERR about, even years later...
Here are a few memes and my responses:
"TDD made me think about how it would fail, which made me a better programmer"
Given enough experience, being higly concerned with failure modes should naturally become part of your process anyway.
"Applications need to work properly"
This assumes you are able to test absolutely everything. You're not going to be any better at covering all possible tests correctly than you were at writing the functional code correctly in the first place. "Applications need to work better" is a much better argument. I agree with that, but it's idealistic and not quite tangible enough to motivate as much as I wish it would. Metrics/anecdotes would be great here.
"Worked great for my <library component X>"
I said in the question I saw value in these cases, but thanks for the anecdote.
"Think of the next developer"
This is probably one of the best arguments to me. However, it is quite likely that the next developer wouldn't practice TDD either, and it would therefore be a waste or possibly even a burden in that case. Back-door evangelism is what it amounts to there. I'm quite sure a TDD developer would really appeciate it, though.
How much are you going to appreciate projects done in deprecated must-do methodologies when you inherit one? RUP, anyone? Think of what TDD means to next developer if TDD isn't as great as everyone thinks it is.
"Refactoring is a lot easier"
Refactoring is a skill like any other, and iterative development certainly requires this skill. I tend to throw away considerable amounts of code if I think the new design will save time in the long run, and it feels like there would be an awful number of tests thrown away too. Which is more efficient? I don't know.
...
I would probably recommend some level of TDD to anyone new... but I'm still having trouble with the benefits for anyone who's been around the block a few times already. I will probably start adding automated tests to libraries. It's possible that after doing that, I'll see more value in doing it generally.
Motivated self interest.
In my case, sole developer translates to small business owner. I've written a reasonable amount of library code to (ostensibly) make my life easier. A lot of these routines and classes aren't rocket science, so I can be pretty sure they work properly (at least in most cases) by reviewing the code, some some spot testing and debugging into the methods to make sure they behave the way I think they do. Brute force, if you will. Life is good.
Over time, this library grows and gets used in more projects for different customers. Testing gets more time consuming. Especially cases where I'm (hopefully) fixing bugs and (even more hopefully) not breaking something else. And this isn't just for bugs in my code. I have to be careful adding functionality (customers keep asking for more "stuff") or making sure code still works when moved to a new version of my compiler (Delphi!), third party code, runtime environment or operating system.
Taken to the extreme, I could spend more time reviewing old code than working on new (read: billable) projects. Think of it as the angle of repose of software (how high can you stack untested software before it falls over :).
Techniques like TDD gives me methods and classes that are more thoughtfully designed, more thoroughly tested (before the customer gets them) and need less maintenance going forward.
Ultimately, it translates to less time doing maintenance and more time to spend doing things that are more profitable, more interesting (almost anything) and more important (like family).
We are all developers with a good track record. After all, we are all reading Stackoverflow. And many of us use TDD and perhaps those people have a great track record. I get hired because people want someone who writes great test automation and can teach that to others. When working alone, I do TDD on my coding projects at home because I found that if I don’t, I spent time doing manual testing or even debugging, and who needs that. (Perhaps those people have only good track records. I don’t know.)
When it comes to being a good automobile driver, everyone believes they are a “good driver.” This is a cognitive bias all drivers have. Programmers have their own biases. The reasons developers such as the OP don’t do TDD are covered in this Agile Thoughts podcast series. The podcast archive also has content on test automation concepts such as the test pyramid, and an intro about what is TDD and why write tests first starting with episode 9 in the podcast archive.

When is it good (if ever) to scrap production code and start over? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I was asked to do a code review and report on the feasibility of adding a new feature to one of our new products, one that I haven't personally worked on until now. I know it's easy to nitpick someone else's code, but I'd say it's in bad shape (while trying to be as objective as possible). Some highlights from my code review:
Abuse of threads: QueueUserWorkItem and threads in general are used a lot, and Thread-pool delegates have uninformative names such as PoolStart and PoolStart2. There is also a lack of proper synchronization between threads, in particular accessing UI objects on threads other than the UI thread.
Magic numbers and magic strings: Some Const's and Enum's are defined in the code, but much of the code relies on literal values.
Global variables: Many variables are declared global and may or may not be initialized depending on what code paths get followed and what order things occur in. This gets very confusing when the code is also jumping around between threads.
Compiler warnings: The main solution file contains 500+ warnings, and the total number is unknown to me. I got a warning from Visual Studio that it couldn't display any more warnings.
Half-finished classes: The code was worked on and added to here and there, and I think this led to people forgetting what they had done before, so there are a few seemingly half-finished classes and empty stubs.
Not Invented Here: The product duplicates functionality that already exists in common libraries used by other products, such as data access helpers, error logging helpers, and user interface helpers.
Separation of concerns: I think someone was holding the book upside down when they read about the typical "UI -> business layer -> data access layer" 3-tier architecture. In this codebase, the UI layer directly accesses the database, because the business layer is partially implemented but mostly ignored due to not being fleshed out fully enough, and the data access layer controls the UI layer. Most of the low-level database and network methods operate on a global reference to the main form, and directly show, hide, and modify the form. Where the rather thin business layer is actually used, it also tends to control the UI directly. Most of this lower-level code also uses MessageBox.Show to display error messages when an exception occurs, and most swallow the original exception. This of course makes it a bit more complicated to start writing units tests to verify the functionality of the program before attempting to refactor it.
I'm just scratching the surface here, but my question is simple enough: Would it make more sense to take the time to refactor the existing codebase, focusing on one issue at a time, or would you consider rewriting the entire thing from scratch?
EDIT: To clarify a bit, we do have the original requirements for the project, which is why starting over could be an option. Another way to phrase my question is: Can code ever reach a point where the cost of maintaining it would become greater than the cost of dumping it and starting over?
Without any offense intended, the decision to rewrite a codebase from scratch is a common, and serious management mistake newbie software developers make.
There are many disadvantages to be wary of.
Rewrites stop new features from being developed cold for months/years. Few, if any companies can afford to stand-still for this long.
Most development schedules are difficult to nail. This rewrite will be no exception. Amplify the previous point by, now, a delay in development.
Bugs that were fixed in the existing codebase through painful experience will be re-introduced. Joel Spolsky has more examples in this article.
Danger of falling victim to the Second-system effect -- in summary, ``People who have designed something only once before try to do all the things they "didn't get to do last time", loading the project up with all the things they put off while making version one, even if most of them should be put off in version two as well.''
Once this expensive, burdensome rewrite is completed, the very next team to inherit the new codebase is likely to use the same excuses for doing another rewrite. Programmers hate learning someone else's code. No one writes perfect code because perfection is so subjective. Find me any real-world application and I can give you a damning indictment and rationale for doing a from-scratch rewrite.
Whether you ultimately rewrite from scratch or not, beginning a refactoring phase now is a good way to both really sit down and understand the problem so that the rewrite will go more smoothly if truly called for, as well as giving the existing codebase an honest look to really see if a rewrite's needed.
To actually scrap and start over?
When the current code doesn't do what you would like it to do, and would be cost prohibitive to change.
I'm sure someone will now link Joel's article about Netscape throwing their code away and how it's oh-so-terrible and a huge mistake. I don't want to talk about it in detail, but if you do link that article, before you do so, consider this: the IE engine, the engine that allowed MS to release IE 4, 5, 5.5, and 6 in quick succession, the IE engine that totally destroyed Netscape... it was new. Trident was a new engine after they threw away the IE 3 engine because it didn't provide a suitable basis for their future development work. MS did that which Joel says you must never do, and it is because MS did so that they had a browser that allowed them to completely eclipse Netscape. So please... just meditate on that thought for a moment before you link Joel and say "oh you should never do it, it's a terrible idea".
A rule of thumb I've found useful is that if given a code base, if I have to re-write more than 25% of the code to make it work or modify it based upon new requirements, you may as well re-write it from scratch.
The reasoning is that you can only patch a body of code so far; beyond a certain point, it's quicker to do over.
There's an underlying assumption that you have a mechanism (such as thorough unit and/or system tests) that will tell you whether your re-written version is functionally equivalent (where it needs to be) as the original.
If it requires more time to read and understand the code (if that is even possible)
than it would to rewrite the entire application, I say scrap it and start over.
Be very carefull with this:
Are you sure you aren't just being lazy and not bothering to read the code
Are you being arrogant about the great code you will write compared to the rubbish anyone else produced.
Remember tested-working code is worth a lot more than imaginary yet-to-be-written code
In the words of our estemed host and overlord, Joel - things you should never do,
it's not always wrong to abandon working code - but you have to be sure about the reason.
I saw an application re-architected within 2 years of its introduction into production, and others rewritten in different technologies (one was C++ - now Java). Both efforts were were not, to my mind, successful.
I prefer a more evolutionary approach to bad software. If you can "componentize" your old app such that you can introduce your new requirements and interface with the old code, you can ease yourself into the new environment without having to "sell" the zero-value (from a biz perspective) investment in rewriting.
Suggested approach - write unit tests for the functionality with which you wish to interface to 1) ensure the code behaves as you expect and 2) provide a safety net for any refactoring that you may wish to do on the old base.
Bad code is the norm. I think IT gets a bad rap from business for favoring rewrites/rearchitecting/etc. They pay the money and "trust" us (as an industry) to deliver solid, extensible code. Sadly, business pressures frequently result in shortcuts that make the code unmaintainable. Sometimes it's bad programmers... sometimes bad situations.
To answer your rephrased question... can code maintenance costs ever exceed rewriting costs... the answer is clearly yes. I don't see anything in your examples, however, that lead me to believe this is your case. I think those issues can be addressed with tests and refactoring.
In terms of business value, I would think it's extremely rare that a real case can be made for a rewrite due solely to the internal state of the code. If the product's customer-facing and is currently live and bringing in money (i.e. is not a mothballed or unreleased product), then consider that:
You already have customers using it. They're familiar with it, and might have built some of their own assets around it. (Other systems that interface to it; products based on it; processes they'd have to change; staff they'd maybe have to retrain). All of this costs the customer money.
Re-writing it might cost less in the long term than making difficult changes and fixes. But you can't quantify that yet, unless your app is no more complex than Hello World. And a re-write means a re-test and a redeploy, and probably an upgrade path for your customers.
Who says the re-write will be any better? Can you honestly say your firm is writing sparkly code now? Have the practices that turned the original code to spaghetti been corrected? (Even if the main culprit was a single developer, where were his peers and management, ensuring quality through reviews, testing, etc.?)
In terms of technical reasons, I'd suggest it could be time for a major rewrite if the original has some technical dependencies that have become problematic. e.g. a third party dependency that's now out of support, etc.
In general though, I think the most sensible move is to refactor piece by piece (very small pieces if it's really that bad), and improve the internal architecture incrementally rather than in one big drop.
Two threads of thought on this one: Do you have the original requirements? Do you have confidence that the original requirements are accurate? What about test plans or unit tests? If you have those things in place it might be easier.
Putting on my customer hat, does the system work or is it unstable? If you've got something that's unstable you've got an argument to change; otherwise you're best of refactoring it bit by bit.
I think the line in the sand is when basic maintenance is taking 25% - 50% longer than it should. There comes a time when maintaining legacy code becomes too costly. A number of factors contribute to the final decision. Time and cost being the most important factors I think.
If there are clean interfaces and you can cleanly delineate module boundaries, then it might be worth refactoring it module by module or layer by layer in order to allow you to migrate existing customers forward into cleaner more stable codebases, and over time, after you've refactored every module, you will have rewritten everything.
But, based on the codereview, doesn't sound like there would be any clean boundaries.
I wonder if the people who vote for scrapping and starting over have ever successfully refactored a large project, or at least seen a large project in poor condition that they think could use a refactoring?
If anything, I err on the opposite side: I've seen 4 large projects that were a mess, that I advocated refactoring as opposed to rewriting. On a couple, there was barely a single line of original code that remained, and major interfaces changed in significant ways, but the process never involved the entire project failing to function as well as it originally did, for any more than a week. (And top-of-trunk was never broken).
Perhaps a project exists that is so severely broken that to attempt to refactor it would be doomed to failure, or perhaps one of the previous projects I refactored would have been better served by a "clean re-write", but I'm not sure I'd know how to recognize it.
I agree with Martin. You really need to weigh the effort that will be involved in writing the app from scratch against the current state of the app and how many people use it, do they like it, etc. Often we may want to completely start from scratch, but the cost far outweighs the benefit. I come across bits of ugly looking code all the time, but I soon realize that some of these 'ugly' areas are really bug fixes and make the program work correctly.
I would try to consider the architecture of the system and see whether it is possible to scrap and rewrite specific well defined components without starting everything from scratch.
What would usually happen is that you can either do that (and then sell that to the customer/management), or that you find out that the code is such a horrible and tangled mess that you become even more convinced that you need a rewrite and have more convincing arguments for it (including: "if we engineer it right, we would never need to scrap the whole thing and do a third rewrite).
Slow maintenance would eventually cause that architectural drift that would make a rewrite more expensive later.
Scrap old code early and often. When in doubt, throw it out. The hard part is convincing non-technical folks of the cost-to-maintain.
So long as the value derived appears to be greater than the cost to operate and maintain, there's still positive value flowing from the software. The question surrounding a rewrite this: "will we get even more value from a rewrite?" Or alternatively "How much more value will we get from a rewrite?" How many person-hours of maintenance will you save?
Remember, the rewrite investment is once only. The return on the rewrite investment lasts forever. Forever.
Focus the value question down to specific issues. You listed a bunch of them above. Stick with that.
"Will we get more value by reducing cost through
dropping the junk that we don't use
but still have to wade through?"
"Will we get more value from dropping the junk that's unreliable and breaks?"
"Will we get more value if we understand it -- not by documenting, but by replacing with something we built as a team?"
Do you homework. You'll have to confront the following show-stoppers.
These will originate somewhere in your executive foodchain from someone who'll respond as follows:
"Is it broken?" And when you say "It's not crashed as such," They'll say "It's not broke - don't fix it."
"You've done the code analysis, you understand it, you no longer need to fix it."
What's your answer to them?
That's only the first hurdle. Here's the worst possible situation. This doesn't always happen, but it does happen with alarming frequency.
Someone in your executive foodchain will have this thought:
"A rewrite doesn't create enough value. Rather than simply rewrite, let's expand it." The justification is that by creating enough value, users are more likely to buy in to the rewrite.
A project where scope is expanded -- artificially -- to add value is usually doomed.
Instead, do the smallest rewrite you can to replace the darn thing. Then expand to fit real needs and add value.
You can only give a definite yes to rewriting in case if you know completely how your application works (and by completely I mean it, not just having a general idea of how it should work) and you know more or less exactly how to make it better. Any other cases and it's a shot in the dark, it depends on too much things. Perhaps gradual refactoring would be safer if it is possible.
If possible, I typically would prefer to rewrite smaller portions of the code over time when I need to refactor a baseline. There are typically many smaller issues such as magic number, poor commenting, etc. that tend to make the code look worse than it actually is. So, unless the baseline is just awful, keep the code and just make improvements at the same time you are maintaining the code.
If refactoring requires a lot of work, I recommend laying out a small re-design plan/todo list that gives you a list of things to work on in order so that you can bring the baseline to a better state. Starting from scratch is always a risky move and you are not guaranteed that the code will be better when you are finished. Using this technique, you will always have a working system that improves over time.
Code with excessively high cyclomatic complexity (like over 100 in a large number of modules) is a good clue. Also, how many bugs does it have / KLOC? How critical are the bugs? How often are bugs introduced when bug fixes are made. If your answer is a lot (I cant remember norms right now), then a rewrite is warranted.
As early as possible. Whenever you get a premonition that your code is slowly turning into an ugly beast that is very likely to consume your soul and give you headaches, and you know the problem is in the underlying structure of the code (so any fix would be a hack, e.g. introduce a global variable), then it's time to start over.
For some reasons people don't like throwing away precious code, but if you feel your better off starting over, you are probably right. Trust your instinct and remember that it wasn't a waste of time, it taught you one more way of NOT approaching the problem. You could (should) always use a version control system so your baby is never really lost.
I do not have any experience with using metrics for this myself, but the
article
"Software Maintainability Metrics Models in Practice" discusses
more or less the same question asked here for two case studies they did.
It starts with the following editor's note:
In the past, when a maintainer
received new code to maintain, the
rule-of-thumb was "If you have to
change more than 40 percent of someone
else's code, you throw it out and
start over." The Maintainability Index
[MI] addressed here gives a much more
quantifiable method to determine when
to "throw it out and start over." This
work was sponsored by the U.S. Air
Force Information Warfare Center and
the U.S. Department of Energy [DOE],
Idaho Field Office, DOE Contract No.
DE-AC07-94ID13223.)
I think the rule was...
The first version is always a throw away
So, if you learned your lesson(s), or his/her lessons, then you can go ahead and write it fresh now that you understand your problem domain better.
Not that there aren't parts that can/should be kept. Tested code is the most valuable code, so if it isn't deficient in any real way other than style, no reason to toss it all out.
When is it good (if ever) to scrap production code and start over?
Never had to do this, but logic would dictate (to me, anyway) that once you pass the inflection point where you're spending more time reworking and fixing bugs in the existing code base than you are adding new functionality, it's time to trash the old stuff and get a fresh start.
If it requires more time to read and understand the code (if that is even possible) than it would to rewrite the entire application, I say scrap it and start over.
I have never completely thrown out code. Even when going from a foxpro system to a c# system.
If the old system worked then why just throw it out?
I have come across a few really bad system. Threads being used where not needed. Horrible inheritance and abuse of interfaces.
It is best to understand what the old code is doing and why it is doing it. Then change it so that it is not confusing.
Of course if the old code doesn't work. I mean can't even compile. Then you might be justified in just starting over. But how often does that actually happen?
Yes, it totally can happen. I've seen money be saved by doing it.
This is not a tech decision, it's a business decision. Code rewrites are long term gains, while "if it ain't totally broke..." is a short term gain. If you are in a first year startup that is focused on getting a product out the door, the answer is usually to just live with it. If you're in an established company, or the errors with the current systems are causing more workload, therefor more company money.. then they might go for it.
Present the problem as best as you can to your GM, use dollar values where you can. "I don't like dealing with it" means nothing. "It'll take twice the time to do everything until this is fixed" means a lot.
I think there are a number of issues here that depend largely on where you are at.
Is the software working well from a customer perspective? (If yes be very careful about changes). I would think there would be little point re-witting unless you were expanding the feature set if the system was working. And are you planning to expand the features and customer base of the software? If so then you have much more reason to change.
As much as anything just trying to understand some else's code even if well written can be difficult, when badly written I would imagine almost impossible. What you describe sounds like something that would be very difficult to expand.
I would take into consideration if the application does what it is intended to do, is required for you to ever make modifications, and are you confident that the app has been thoroughly tested in all scenarios that it will be used in.
Do not invest the time if the app does not need alterations. However, if it doesn't function as you need and you need to control the hours and time invested to make corrections, scrap it and re-write to the standards that your team can support. There's nothing worse than terrible code that you have to support / decipher but still have to live with. Remember, Murphy's Law says it will 10 at night when you'll have to make things work, and that is never productive.
Production code always has some value. The only case where I would truly throw it all out and start again is if we determine the intellectual property is irrevocably contaminated. For example if someone brought large amounts of code from a previous employer, or a large percentage of the code was ripped from a GPLd codebase.
I'm going to post this book every time I see a discussion on Refactoring. Everyone should read "Working Effectively with Legacy Code" by Michael Feathers. I found it to be an excellent book - if nothing else, it's a fun read, and motivational.
When the code has reached a point that is not maintainable or extensible anymore. Is full of short-term hacky fixes. It has lots of coupling. It has long (100+lines) methods. It has database access in the UI. It generates a lot of random, impossible to debug errors.
Bottom line: When maintaining it is more expensive (i.e. takes longer) than rewriting it.
I used to believe in just re-write from scratch, but it is wrong.
http://www.joelonsoftware.com/articles/fog0000000069.html
Changed my mind.
What I would suggested is figuring out a way to properly refactor the code. Keep all existing functionality and test as you go. We have all seen horrible code bases, but it is important to keep the knowledge over time you application has.

Resources