At what point does refactoring become not worth it? - refactoring

Say you have a program that currently functions the way it is supposed to. The application has very poor code behind it, eats up a lot of memory, is unscalable and would take major rewriting to implement any changes in functionality.
At what point does refactoring become less logical than a total rebuild?

Joel wrote a nice essay about this very topic:
Things You Should Never Do, Part 1
The key lesson I got from this is that although the old code is horrible, hurts your eyes and your aesthetic sense, there's a pretty good chance that a lot of that code is patching undocumented errors and problems. I.e., it has a lot of domain knowledge embedded in it, and it will be difficult or impossible for you to replicate it. You'll constantly be running into bugs of omission.
A book I found immensely useful is Working Effectively With Legacy Code by Michael C. Feathers. It offers strategies and methods for approaching even truly ugly legacy code.

One benefit of refactoring over rebuilding is that IF you can do refactoring step by step, i.e. in increments, you can test the increments in the context of the whole system, making development and debugging faster.
Old and deployed code, even when ugly and slow, has the benefit of having been tested thoroughly, and this benefit is lost if you start from scratch.
An incremental refactoring approach also helps to ensure that there is always a product available which can be shipped (and it's improving constantly).
There is a nice article on the web about how Netscape 6 was written from scratch and it was business-wise a bad idea.

Robert L. Glass suggests that
Modification of reused code is particularly error-prone. If more than 20 to 25 percent of a component is to be revised, it is more efficient and effective to write it from scratch.

Well, the simplest answer is if it will take longer to refactor than it will to rebuild, then you should just rebuild.
If it's a personal project then you might want to rebuild it anyway as you will probably learn more from building from scratch than you would from refactoring, and that's one big objective of personal projects.
However, in a professional time-limited environment, you should always go with whatever costs the company the least amount of money (for the same payoff) in the long run, which means choosing whichever takes less time.
Of course, it can be a little more complicated than that. If other people can be working on features while the refactoring is being done, then that might be a better choice than having everyone wait for a completely new version to be built. In that case rebuilding might take less time than just the refactoring would have taken, but you need to take the entire project and all of its contributors into account.

When you spend more time refactoring than actually writing code.

At the point where the software doesn't do what it's supposed to do. Refactoring (changing the code without changing the functionality) makes sense if and only if the functionality is "as intended".

If you can afford the time to completely rebuild the app, don't need to improve functionality incrementally, and don't wish to retain any of the existing code then rewriting is certainly a viable alternative. You can, on the other hand, use refactoring to do an incremental rewrite by slowly replacing the existing functions with equivalent functions that are better written and more efficient.

If the application is very small, then you can rewrite it from scratch. If the application is big, never do it. Rewrite it progressively, one step at a time validating you didn't break anything.
The application is the specification. If you rewrite it from scratch you will most likely run into a lot of insidious bugs because "no one knew that the call to this function was supposed to return 3 in that very specific case" (undocumented behaviour...).
It's always more fun to rewrite from scratch so your brain might trick you into thinking it's the right choice. Be careful, it's most likely not.

I've worked with such applications in the past. The best approach I've found is a gradual one: When you are working on the code, find things that are done multiple times, group them together in functions. Keep a notebook (you know, a real one, with paper, and a pencil or pen) so that you can mark your progress. Use that in combination with your VCS, not instead of it. The notebook can be used to provide an overview of the new functions you've created as part of the refactoring, and the VCS of course fills in the blanks for the details.
Over time, you will have consolidated a lot of code into more appropriate places. Eliminating all code duplication during this period is going to be next to impossible, so just do it as best you can until you've reached a point where you can really start the refactoring process, auditing the entire code base and working on it as a whole.
If you've not enough time for that process (which will take a very long time), then rewriting from scratch using a test-first approach is probably better.

One option would be to write unit tests to cover the existing application and then start to refactor it bit by bit, using the unit tests to make sure everything works as before.
In an ideal world you'd already have unit tests for the program, but given your comments about the quality of the app I'm guessing you don't...
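A minimal sketch of what "tests that cover the existing application" might look like, assuming Python. `legacy_shipping_cost` is a hypothetical stand-in for some inherited function, and the expected values are simply whatever the current code returns today, quirks included, not a specification:

```python
import unittest


def legacy_shipping_cost(weight_kg, express):
    # Hypothetical legacy function, invented purely for illustration.
    cost = 5 + weight_kg * 2
    if express:
        cost = cost * 1.5
    if weight_kg == 0:
        cost = 3  # undocumented special case that callers may depend on
    return cost


class CharacterizationTests(unittest.TestCase):
    """Pin down what the code does today before changing anything."""

    def test_regular_parcel(self):
        self.assertEqual(legacy_shipping_cost(10, False), 25)

    def test_express_parcel(self):
        self.assertEqual(legacy_shipping_cost(10, True), 37.5)

    def test_zero_weight_quirk(self):
        # Recorded from the running code, not from any documentation.
        self.assertEqual(legacy_shipping_cost(0, True), 3)
```

Running `python -m unittest` on a file like this before and after each small refactoring step is one way to notice when the behaviour has shifted.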

When there is no documentation, no original author, no test cases, and a bunch of remaining bugs.

Uncle Bob weighs in with the following:
When is a redesign the right strategy?
I’m glad you asked that question. Here’s the answer. Never.
Look, you made the mess, now clean it up.

I’ve not had much luck with small incremental changes when the code I inherit is really bad. In theory the small incremental approach sounds good, but in practice all it ends up with is a better, but still poorly designed application that everyone thinks is now YOUR design. When things break, people no longer think it is because of the previous code, it now becomes YOUR fault. So, I would not use the word redesign, refactor or anything else that implies to a manager type that you are changing things to your way unless I was really going to do it my way. Otherwise, even though you may have fixed dozens of problems, any problems that still existed (but weren’t discovered) are now going to be attributed to your rework. And be assured that if the code is bad then your fixes will uncover a lot more bugs that were simply ignored before because the code was so bad to begin with.
If you truly know how to develop software systems then I would do a redesign of the whole system. If you don’t TRULY know how to design GOOD software then I’d say stick with the small incremental changes as you may otherwise end up with a code base that is just as bad as the original.
One mistake that is often made when redesigning is that people ignore the original code base. However, redesign does not have to mean totally ignore the old code. The old code still had to do what your new code has to do, so in many cases the steps you need are already in the old code. Copy and Paste then tweak works wonders when redesigning systems. I have found that in many cases, redesigning and rewriting an application and stealing snippets from the original code is far quicker and much more reliable than small incremental changes.

Related

When should you not refactor?

We all know that refactoring is good and I love it as much as the next guy, but do you have real cases where it is better not to refactor?
Something like time-critical stuff or synchronization? Technical or human reasons are equally welcome. Real-case scenarios and experiences are a plus.
Edit: from the answers thus far, it looks like the only reason not to refactor is money. My question is mostly about something like this: suppose you would like to perform "extract method", but adding the additional function call would make the code slightly slower and hinder a very strict synchronization. Just to give you an idea of what I mean.
Another reason I sometimes heard is that "others used to the current code layout will get annoyed by your changes". Of course, I doubt this is a good reason.
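Whether an extracted method really costs anything measurable is something you can check rather than guess. Below is a rough sketch, assuming Python purely for illustration (all function names are made up); in a compiled language the compiler may well inline the call, so the result can differ completely there:

```python
import timeit


def total_inline(values):
    # Original shape: the arithmetic is written directly in the loop.
    total = 0
    for v in values:
        total += v * v + 1
    return total


def _term(v):
    # The extracted helper (hypothetical name).
    return v * v + 1


def total_extracted(values):
    # After "extract method": same result, one extra call per element.
    total = 0
    for v in values:
        total += _term(v)
    return total


if __name__ == "__main__":
    data = list(range(10_000))
    # Both versions must agree before their timings mean anything.
    assert total_inline(data) == total_extracted(data)
    for fn in (total_inline, total_extracted):
        seconds = timeit.timeit(lambda: fn(data), number=50)
        print(f"{fn.__name__}: {seconds:.3f}s for 50 runs")
```

The numbers depend on the machine and the interpreter, so they are only meaningful relative to each other and to the timing budget the code actually has.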
I'm a big fan of refactoring to keep code clean and maintainable. But you generally want to shy away from refactoring production modules that work fine and don't require change. However, when you do need to work on a module to fix bugs or introduce a new feature, some refactoring is usually worth it and won't cost much since you're already committed to doing a full set of tests and going through the release process. (Unit tests are very helpful, but are only part of the full test suite, as other posters noted.)
More significant refactorings may make it harder for others to find their way around the new code, and they may then react unfavorably to refactoring. To minimize this, bring other team members in on the process using an approach like pair programming.
Update (8/10): Another reason to not refactor is when you aren't approaching the existing code base with proper humility and respect. With these qualities you'll tend to be conservative and do only refactorings that really do make a difference. If you approach the code with too much arrogance, you may wind up just making changes instead of refactoring. Is that new method name really clearer, or did the old one have a name with a very specific meaning in your application domain? Did you really need to mechanically reformat that source file to your personal style, when the existing style met project guidelines? Again pair programming can help.
To reinforce the other answer (and touch on issues you mention): do not refactor a part of the code until it's well covered by all relevant kinds of testing. This doesn't mean "don't refactor it" -- the emphasis is on "add the necessary tests" (doing unit tests properly may well require some refactoring, particularly the introduction of factory and/or dependency injection design patterns in code that's now solidly bolted to concrete dependencies).
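As a small illustration of the dependency injection point, here is a sketch (Python assumed, all class and function names hypothetical) of code bolted to a concrete dependency, the system clock, and the same code with that dependency passed in so a test can control it:

```python
import datetime


class ReportBefore:
    # Bolted to a concrete dependency: the system clock.
    def header(self):
        today = datetime.date.today()
        return f"Report for {today.isoformat()}"


class Report:
    # The clock is injected; production code can rely on the default.
    def __init__(self, today_fn=datetime.date.today):
        self._today_fn = today_fn

    def header(self):
        return f"Report for {self._today_fn().isoformat()}"


def fixed_date():
    # Test double standing in for the real clock.
    return datetime.date(2020, 1, 2)
```

With this shape, `Report(today_fn=fixed_date).header()` returns `"Report for 2020-01-02"` on any day, which makes the output assertable in a unit test.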
Note that this does cover your second paragraph's issues: if a section of the code is time-critical, it should be well covered by load tests. Like the more usual correctness tests, these should cover both specific units (performance-wise; correctness checking is other tests' business) and end-to-end operations, the equivalent of unit tests and integration tests if one were talking about correctness rather than performance.
Multi-tasking code with subtle sync issues can be a nightmare as no test can really make you entirely confident about it -- no other refactoring (that might in any way affect any fragile sync that just appears to be working now) should be considered BEFORE one intended to make the synchronization much, MUCH more robust and sound (message-passing through guaranteed-threadsafe queues being BY FAR my favorite design pattern in this regard;-).
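For readers who haven't seen the message-passing style mentioned above, a minimal sketch in Python might look like the following. The names are invented for illustration; the only library pieces used are `queue.Queue` and `threading.Thread` from the standard library:

```python
import queue
import threading


def counter_worker(inbox, result):
    # Only this thread touches `total`, so no lock is needed around it.
    total = 0
    while True:
        message = inbox.get()
        if message is None:  # sentinel: no more work
            break
        total += message
    result.put(total)


def run(producers=4, per_producer=1000):
    inbox = queue.Queue()   # queue.Queue does its own locking
    result = queue.Queue()
    worker = threading.Thread(target=counter_worker, args=(inbox, result))
    worker.start()

    def produce():
        for _ in range(per_producer):
            inbox.put(1)

    threads = [threading.Thread(target=produce) for _ in range(producers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    inbox.put(None)         # all producers are done; tell the worker to stop
    worker.join()
    return result.get()
```

The shared state lives in exactly one thread and everything else talks to it through the queue, so there is no hand-rolled locking left for a later refactoring to disturb.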
Hmmm - I disagree with the above (1st response). Given code with no tests, you may refactor it to make it more testable.
You do not refactor code when you cannot test the resulting code in time to deliver it such that it is still valuable to the recipient.
You do not refactor code when your refactoring will not improve the quality of the code. Quality is not subjective, although at times, design may be.
You do not refactor code when there is no business justification for making an alteration.
There are probably more, but hopefully you get the idea...
As Martin Fowler writes, you shouldn't refactor if a deadline is near. That time in the project is better spent flushing out bugs than improving the design (refactoring). Do the refactoring you skipped right after the deadline has passed.
Refactoring is not good in and of itself. Rather, its purpose is to improve code quality so that it can be maintained more cheaply and with less risk of adding defects. For actively developed code, the benefits of refactoring are worth the cost. For frozen code that there is no intention to do any further work on, refactoring yields no benefit.
Even for live code, refactoring has its own risks, which unit tests can minimize. It also has its own place in the development cycle, which is towards the front, where it's less disruptive. The best time and place for refactoring is just before you start to make major changes to some otherwise brittle code.
When it is not cost-effective. There's a guy at the place I work who loves refactoring. Making code perfect makes him very happy. He can check out a current project tree and go to town on it, moving functions and classes around and tightening things up so they look great, have better flow, and are more extensible in the future.
Unfortunately, it's not worth the money. If he spends a week refactoring some classes into more functional units that may be easier to work with in the future, that's a week's worth of salary lost to the company with no noticeable bottom-line improvements.
Code will never, ever be absolutely perfect. You learn to live with it, and keep your hands off something that could be done better, but perhaps isn't worth the time.
If the code seems very difficult to refactor without breaking, that's the most important code to refactor!
If there aren't any tests, write some as you refactor.
Honestly, the one case is where you are forbidden to touch some code by management/customer/SomeoneImportant, and when that happens I consider the project broken.
Here is my experience:
Don't refactor:
When you don't have a test suite accompanying the code you want to refactor. You might want to develop the test suite first instead.
When your manager doesn't really care about the maintainability and extensibility of the current code base and cares much more about delivering the product on schedule, especially on a project with a short and tight schedule.
If you stick to the principle that everything that you do should add value for the client/ business, then the times you should not refactor are the following:
Code that works and no new development is planned.
Code that is good enough / works and refactoring simply represents gold plating.
The cost of refactoring is higher than living with the existing code.
The cost of refactoring is higher than rewriting the code from scratch.
Some of the other answers say that you should not refactor code that does not have unit tests. If code needs refactoring, you should refactor it; you must, however, write tests first. If the code is written in a way that makes it difficult to test, it should be rewritten (in a perfect world).
When you've got other stuff to build. I always feel like refactoring an existing system when I'm supposed to be doing something else.
There's always a balance to be had between fixing or adding to code and refactoring. However, this balance is so far in favor of refactoring that I don't think I've ever been on a team that refactored too much. Chances are, if you think you're erring on the side of refactoring too much, you're right on the money.
Of course, the biggest determining factor is how close the deadline is. If a deadline is imminent, requirements come first.
Isn't the need to refactor code largely based on the propensity of people to cut and paste code rather than thinking the solution through, and doing the factoring in advance? In other words, whenever you feel the need to cut & paste some code, merely make that chunk of code a function, and document it.
I have had to maintain way too much code where people found it easier to cut and paste a whole function, only to make one or two trivial changes, which could easily have been parametrized. But as many others have experienced, trying to refactor some of this code would have taken a LOT of time and been very risky.
I have 4 projects wherein a 10K line collection of functions was merely copied and modified as needed. This is a horrid maintenance nightmare. Especially when the code has LOTS of problems, e.g. hard-wired endianness assumptions, tons of global variables, etc. I feel bile in my throat just thinking about it.
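To make the "could easily have been parametrized" point concrete, here is a small hypothetical sketch in Python (none of it comes from the projects described above): two pasted copies that differ only in a hard-wired byte order, and a single function where that difference becomes a parameter.

```python
# Before: two pasted copies that differ only in the byte order they assume.
def read_u16_big(data, offset):
    return (data[offset] << 8) | data[offset + 1]


def read_u16_little(data, offset):
    return data[offset] | (data[offset + 1] << 8)


# After: one function, with the difference turned into a parameter.
def read_u16(data, offset, byteorder="big"):
    chunk = data[offset:offset + 2]
    if len(chunk) != 2:
        # Keep failing on short input, as the indexing versions do.
        raise IndexError("need two bytes at offset")
    return int.from_bytes(chunk, byteorder)
```

A bug fix in `read_u16` now lands in one place instead of in every copy.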
Don't refactor if you don't have the time to test the refactored code before release. Refactoring can introduce bugs. If you have well-tested and relatively bug-free code, why take the risk? Wait until the next development cycle.
If you're stuck maintaining an old flaky code base with no future beyond keeping it running until management can bite the bullet and do a rewrite, then refactoring is a lose-lose situation. First, the developer loses because refactoring bad, flaky code is a nightmare; second, the business loses because, as the developer attempts to refactor, the software breaks in unexpected and unforeseen ways.
When you don't really know what the code is doing in the first place. And yes, I have seen people ignore that rule.
It's just a cost-benefit tradeoff. Estimate the cost to refactor, estimate the benefits, determine if you actually have the time to refactor given other tasks, determine if refactoring is the best time-benefit tradeoff. There may be other tasks more worth doing.

Why do we refactor? [closed]

I would like to know the reasons that we do refactoring and justify it. I read a lot of people upset over the idea of refactoring. Refactoring was variously described as:
1. A result of insufficient upfront design.
2. Undisciplined hacking
3. A dangerous activity that needlessly risked destabilizing working code
4. A waste of resources.
What are the responsible reasons that lead us to refactor our code?
I also found a similar question here, how-often-should-you-refactor, but it doesn't give the reasons for refactoring.
Why do we refactor?
Because there's no actual substitute for writing code. No amount of upfront planning or experience can substitute for actual code writing. This is what an entire generation (called waterfall) learned the hard way.
Once you start writing the code and are in the middle of it, reasoning about the way it works at a lower level, you notice things (performance, usability or correctness issues) that escaped the higher-level design view.
Refactoring is perfecting.
Ask yourself: why do painters do multiple strokes with the brush on the same spot?
Refactoring is the way to pay the technical debt.
I'd like to briefly address three of your points.
1. "A result of insufficient up-front design"
Common sense (and several books and bloggers) tell us we should strive for the simplest, cleanest design possible to address a given problem. While it's quite possible that some code is written without sufficient work on developing an understanding of the requirements and the problem domain, it's probably more common that "poor code" wasn't "poor" when it was written; rather, it is no longer sufficient.
Requirements change, and designs have to support additional features and capabilities. It's not unreasonable to anticipate some future changes up-front, but McConnell et al. rightly caution against high-level, overly-flexible designs when there's no clear and present need for such an approach.
3. "A dangerous activity that needlessly risks destabilising working code"
Well, yes, if done improperly. Before you seek to make any significant modification to a working system, you should put in place proper measures to ensure that you're not causing any harm - a sort of "developmental Hippocratic oath", almost.
Typically, this will be done by a mixture of documentation and testing, and more often than not, the code wins out, because it's the most up-to-date description of the actual behaviour. In practical terms, this translates into having decent coverage with a unit test suite, so that if refactoring does introduce unexpected problems, these are identified and resolved.
Obviously, when you seek to refactor, you're going to break a certain number of tests, not least because you're trying to fix some broken code contracts. It is, however, perfectly possible to refactor with impunity, provided you have that mechanism in place to spot the accidental mistakes.
4. "A waste of resources"
Others have mentioned the concept of technical debt, which is, briefly, the idea that over time, the complexity of such systems builds up, and that some of that build-up has to be reduced, by refactoring and other techniques, in order to reasonably facilitate future development. In other words, sometimes you have to bite the bullet and go ahead with that change you've been putting off, because otherwise you'll be making a bad situation appallingly worse when you come to add something new in that area.
Obviously, there's a time and a place to pay off such things; you wouldn't try and repay a loan until you had the cash to do it, and you can't afford to go around refactoring willy nilly during a critical stage in development. Nevertheless, by making the decision to address some of the problems in your code base, you save future development time, and thus money, and maybe even further into the future, avoid the cost of having to abandon or completely rewrite some component that is beyond your understanding.
In order to keep a maintainable code base?
Code is more read than written, so it is necessary to have a code-base that is readable, understandable and maintainable. When you see something that is poorly written or designed, it can be refactored to improve the design of the code.
You also clean your house regularly, don't you? Although it may be considered a waste of time, it is necessary in order to keep your house clean, so that you have a nice environment to live in.
You may need to refactor if your code is
Inefficient
Buggy
Hard to extend
Hard to maintain
It all boils down to the original code not being very good, so you improve it.
If you have reasonable unit tests it shouldn't be dangerous at all.
Because hindsight is easier than foresight.
Software is one of the most complex things created by humans, so it is not easy to consider everything beforehand. For large projects it can even be impossible for the team (at least for one consisting of humans ;) ) to consider everything before they actually start developing it.
Another reason is that software isn't constructed, it's growing. That means software can and has to adapt to ever changing requirements and environments.
As Martin Fowler says, the only thing surprising about the requirements for software changing is that anyone is surprised by it.
The requirements will change, new features will be requested. This is a good thing. Enhancement efforts succeed most of the time, and when they fail, they fail small, so there is budget to do more. Big up-front design projects fail often (one statistic puts the failure rate at 66%), so avoid them. The way to avoid them is to design enough for the first version, and as enhancements are added, refactor to the point where it looks like the system intended to do that in the first place. The lifespan of a project that can do this (there are issues when you publish data formats or APIs - once you go live you can't always be pristine anymore) is indefinite.
In response to the four points, I would say that a process that shuns refactoring demands:
1. A static world where nothing changes so that the upfront design can hit a non-moving target perfectly.
2. Will result in ugly hacks to work around design flaws that aren't being refactored.
3. Will lead to dangerous code duplication as the fear of changing existing code sets in.
4. Will waste resources over engineering the problem and building large design artifacts in anticipation of requirements that never end up getting built, causing large amounts of code and complication to drag the project down while not providing any value.
One caveat, though. If you don't have the proper support, in an automated tool for simple cases, and thorough unit tests in the more complicated cases, it will hurt, there will be new bugs introduced, and you will develop a (quite rational) fear of doing it more. Refactoring is a great tool, but it requires safety equipment.
Another scenario where you need refactoring is TDD. The textbook approach for TDD is to write only the code you need to pass the test and then refactor it to something nicer afterwards.
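A tiny sketch of that write-then-refactor cycle, assuming Python and a made-up `slugify` requirement: the test comes first, then the quickest code that passes it, then a tidier version that has to pass the very same test.

```python
def test_slug():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Many   spaces ") == "many-spaces"


# Step 1 (green): the quickest code that makes the test pass.
def slugify_first_pass(title):
    result = ""
    for word in title.split():
        if result != "":
            result = result + "-"
        result = result + word.lower()
    return result


# Step 2 (refactor): same behaviour, tidier code, test still passes.
def slugify(title):
    return "-".join(title.lower().split())
```

The test is what makes the second step safe: if the tidier version changed the behaviour, `test_slug` would say so.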
...because coding is like gardening. Your codebase grows and your domain changes as time passes. What was a good idea back then often looks like a poor design now and what is a good design now may well not be optimal in the future.
Code should never be considered a permanent artifact nor should it be considered too sacred to touch. Confidence should be garnered through testing and refactoring is a mechanism to facilitate change.
While a lot of other people have already said perfectly valid reasons, here's mine:
Because it's fun. It's like beating your own time in steeplechase, having the stronger bicep in armwrestling or improving your highscore in a game of your choice.
A straightforward answer is, requirements change. No matter how elegant your design is, some requirements later on will not buy it.
Poor understanding of the requirements:
If developers don't have a clear understanding of the requirements, the resulting design and code cannot satisfy the customer. Later, as the requirements become clearer, refactoring becomes essential.
Supporting new requirements.
If a component is old, in most cases it will not be able to handle radically new requirements. It then becomes essential to go for refactoring.
Lots of bugs in the existing code.
If you have spent long hours in the office fixing quite a few nasty bugs in a particular component, it becomes a natural candidate for refactoring at the earliest opportunity.
Upfront: Refactoring does not need to be dangerous when a) it is supported by tools and b) you have a test suite that you can run after the refactoring in order to check the functioning of your software.
One of the main reasons for refactoring is that at some point you find out that code is used by more than one code path and you don't want to duplicate (copy&paste) but reuse. This is especially important in cases where you find an error in that code. If you have refactored the duplicated code into its own method, you can fix that method and be done. If you copy&paste code around, there is a high chance that you don't fix all places where this code occurs (just think of projects with several members and thousands of lines of code).
You should of course not do refactoring just for the sake of refactoring - then it really is a waste of resources.
For whatever reason, when I create or find a function that scrolls off the screen, I know it's time to sit back and consider whether it should be refactored or not - if I'm having to scroll the whole page to take in the function as a whole, chances are it's not a shining example of readability or maintainability.
To make insane stuff sane.
I mainly refactor when the code has suffered so much under copy + paste and a lack of architectural guidance that the action of understanding the code is akin to re-organising it and removing the duplication.
It is human to err, and you're ALWAYS going to make mistakes when you develop software. Creating a good design from the beginning helps a lot, and having skilled programmers on the team is also a good thing, but they will invariably make mistakes, and there will be code that is hard to read, tightly coupled or non-functional, etc. Refactoring is a tool to mend these flaws when they've already occurred. You should never stop working on preventing these things from happening to begin with, but when they do happen, you can fix them.
Refactoring to me is like cleaning my desk; it creates a better working environment because over time it will get messy.
I refactor because, without refactoring, it becomes harder and harder to add new features to a codebase over time. If I have features A, B, and C to add, feature C will be finished sooner, with less pain and suffering on my part, if I take time to refactor after features A and B. I'm happier, my boss is happier, and our customers are happier.
I think it's worth restating, in any conversation involving refactoring, that refactoring is verifiably behavior-preserving. If at the end of your "refactoring" your program has different outputs, or if you only think, but can't prove, that it has the same outputs, then what you've done isn't refactoring. (That doesn't mean it's worthless or not worth doing -- maybe it's an improvement. But it's not refactoring and shouldn't be confused with it.)
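One way to gather evidence that a change is behaviour-preserving is to run the old and new versions side by side over a range of inputs. The sketch below assumes Python, and every name in it is hypothetical. Unless the input space is fully enumerated this is evidence rather than proof:

```python
import itertools


def price_old(quantity, is_member):
    # Hypothetical original: nested conditionals.
    if is_member:
        if quantity >= 10:
            return quantity * 8
        else:
            return quantity * 9
    else:
        if quantity >= 10:
            return quantity * 9
        else:
            return quantity * 10


def price_new(quantity, is_member):
    # Refactored candidate: the two discounts are made explicit.
    unit_price = 10
    if is_member:
        unit_price -= 1
    if quantity >= 10:
        unit_price -= 1
    return quantity * unit_price


def find_mismatches(old, new, cases):
    # Return every input on which the two versions disagree.
    return [args for args in cases if old(*args) != new(*args)]


cases = list(itertools.product(range(0, 50), [False, True]))
mismatches = find_mismatches(price_old, price_new, cases)
```

An empty `mismatches` list means the two versions agree on every case tried. A non-empty one lists exactly the inputs where the "refactoring" changed the output and so stopped being a refactoring.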
Refactoring is a central component of any agile software development method.
Unless you fully understand all the requirements and technical limitations of your project you can't have a complete upfront design. In this case instead of using a traditional waterfall approach you're probably better off with an agile method - agile methods focus on adapting quickly to changing realities. And how would you adapt your source code without refactoring?
I've found code design and implementation, particularly with unfamiliar and large projects, to be a learning process.
The scope and requirements of a project change over time, which has consequences on the design. It may be that after spending some time implementing your product you discover that your planned design is not optimal. Perhaps new requirements were added by the client. Or perhaps you're adding additional functionality to an older product and you need to refactor the code in order to sufficiently provide this functionality.
In my experience code has been written poorly and the refactoring has become necessary to prevent the product from failing and to ensure it is maintainable/extendable.
I believe an iterative design process, with prototyping early on is a good way to minimise refactoring later on. This also allows you to experiment with differing implementations to determine which is most suitable.
Not only that, but new ideas and methods for what you're doing may become available. Why stick with old, fallible code that could become problematic if it can be improved?
In short, projects will change over time, which necessitates changes in structure to ensure they meet new requirements.
From my own personal experience, I refactor because I find that if I try to make software exactly the way I want it on the first go, it takes a very long time to create anything.
Therefore I value the pragmatism of developing software over clean code. Once I have something running I then begin to refactor it into the way it should be. Needless to say, the code never devolves into a piece of unreadable tripe.
Just a side note - I did my degree in software engineering after reading some material from Steve McConnell as a teen. I love design patterns, good code reuse, nicely thought out designs and so on. But I find when working on my own projects that designing things initially from that point of view just doesn't work unless I'm an absolute expert with the technology I'm using (which is never the case).
Refactoring is done to help make code easier to understand/document.
To give a method a better name - perhaps the previous one wasn't clear or was incorrect.
To give variables more descriptive / better names.
Break up a really long method into many smaller methods representing the steps involved in solving the problem.
Move classes to a new package (namespace) to assist organisation.
Reduce duplicate code.
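A couple of the items above (better names, breaking up a long method) can be sketched like this, assuming Python and an invented example of turning "name,age" lines into a list of adult names:

```python
# Before: one function doing parsing, filtering and formatting.
def proc(lines):
    out = []
    for l in lines:
        p = l.strip().split(",")
        if len(p) == 2 and p[1].strip().isdigit():
            if int(p[1]) >= 18:
                out.append(p[0].strip().title())
    return ", ".join(sorted(out))


# After: each step has a name; the top-level function reads as a summary.
def parse_person(line):
    parts = line.strip().split(",")
    if len(parts) == 2 and parts[1].strip().isdigit():
        return parts[0].strip().title(), int(parts[1])
    return None  # malformed lines are skipped, as before


def is_adult(person):
    _name, age = person
    return age >= 18


def format_adult_names(lines):
    people = [parse_person(line) for line in lines]
    adults = [p for p in people if p is not None and is_adult(p)]
    return ", ".join(sorted(name for name, _age in adults))
```

Both versions return the same string for the same input; only the names and the structure have changed.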
Does point number one even matter? If you're refactoring, the up-front design was obviously flawed. Don't waste time worrying about the flaws in the original design; it's old news. What matters is what you have now, so spend that time refactoring.
I refactor because proper refactoring makes maintenance SO much easier. I've had to maintain a TON of bad, awful code and I don't want to hand down any that I've written for someone else to maintain.
Maintenance costs of smelly code will almost always be higher than maintenance costs for sweet smelling code.
I refactor because:
Often my code is far from optimal first time around.
Hindsight is often 20-20.
My code will be easier to maintain for the next guy.
I have professional pride in the work I leave behind.
I believe time spent now can save a lot more time (and money) further down the track.
All your points are common descriptors of why people do refactor. I would say that the reason people should refactor lies within point #1: A Big Design Up Front (BDUF) is almost always imperfect. You learn about the system as you build it. In trying to anticipate what could happen you often end up building complex solutions to deal with things that never actually happen. (YAGNI - You ain't gonna need it).
Instead of the BDUF approach, a better solution is therefore to design the parts of the system you know you are going to need. Follow the single responsibility principle, and use inversion of control/dependency injection so that you can replace parts of your system when needed.
Write tests for your components. And then, when the requirements for your system change or you discover flaws in your initial design, you can refactor and extend your code. Since you have your unit tests and integration tests in place, you will know if and when the refactoring breaks something.
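A small sketch of what that can look like in practice, in Python. The `MessageSender`, `Notifier` and `FakeSender` names are made up for illustration; the point is only that the dependency is passed in, so a test can swap in a fake and tell you when a refactoring breaks something.

```python
from typing import Protocol


class MessageSender(Protocol):
    """Hypothetical interface: anything with a matching send() will do."""

    def send(self, recipient: str, body: str) -> None: ...


class Notifier:
    """Depends on an injected sender instead of constructing one itself."""

    def __init__(self, sender: MessageSender) -> None:
        self._sender = sender

    def notify_all(self, recipients: list[str], body: str) -> int:
        for recipient in recipients:
            self._sender.send(recipient, body)
        return len(recipients)


class FakeSender:
    """Test double that records calls instead of sending anything."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, recipient: str, body: str) -> None:
        self.sent.append((recipient, body))


def test_notify_all_sends_to_every_recipient() -> None:
    fake = FakeSender()
    assert Notifier(fake).notify_all(["ann", "bob"], "hi") == 2
    assert fake.sent == [("ann", "hi"), ("bob", "hi")]
```

Because `Notifier` never names a concrete sender, replacing the real one later does not touch this class or its test.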
There is a difference between large refactorings (restructuring modules, class hierarchies, interfaces) and "unit" refactorings - within methods and classes.
Whenever I touch a piece of code I do a unit refactoring - renaming variables, extracting methods; because actually seeing the code in front of me gives me more information to make it better. Sometimes refactoring also helps me to better understand what the code is doing. It's like writing or painting, you extract a fuzzy idea out of your head; put a rough skeleton onto paper; then into code. You then refine the rough idea in the code.
With modern refactoring tools like ReSharper in C#, this kind of unit refactoring is extremely easy, quick & low risk.
Large refactorings are harder, break more things, and require communication with your team members. It will become clear to everyone when these need to happen - because requirements have changed so much that the original design no longer works - and then they should be planned like a new feature.
My last rule - only refactor code that you are actually working on. If code's functionality doesn't need to be changed, then it's good enough & doesn't need further work.
Avoid refactoring just for refactoring's sake; that's just refactorbating!

Improving really bad systems

How would you begin improving on a really bad system?
Let me explain what I mean before you recommend creating unit tests and refactoring. I could use those techniques but that would be pointless in this case.
Actually the system is so broken it doesn't do what it needs to do.
For example the system should count how many messages it sends. It mostly works but in some cases it "forgets" to increase the value of the message counter. The problem is that so many other modules with their own workarounds build upon this counter that if I correct the counter the system as a whole would become worse than it is currently. The solution could be to modify all the modules and remove their own corrections, but with 150+ modules that would require so much coordination that I can not afford it.
Even worse, there are some problems that have workarounds not in the system itself, but in people's heads. For example the system can not represent more than four related messages in one message group. Some services would require five messages grouped together. The accounting department knows about this limitation and every time they count the messages for these services, they count the message groups and multiply it by 5/4 to get the correct number of the messages. There is absolutely no documentation about these deviations and nobody knows how many such things are present in the system now.
So how would you begin working on improving this system? What strategy would you follow?
A few additional things: I'm a one-man army working on this, so it is not an acceptable answer to hire enough men and redesign/refactor the system. And in a few weeks or months I really should show some visible progress, so it is not an option either to do the refactoring myself over a couple of years.
Some technical details: the system is written in Java and PHP but I don't think that really matters. There are two databases behind it, an Oracle and a PostgreSQL one. Besides the flaws mentioned before, the code itself smells too; it is really badly written and documented.
Additional info:
The counter issue is not a synchronization problem. The counter++ statements are added to some modules, and are not added to some other modules. A quick and dirty fix is to add them where they are missing. The long solution is to make it kind of an aspect for the modules that need it, making it impossible to forget it later. I have no problems with fixing things like this, but if I made this change I would break over 10 other modules.
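To make the "aspect" idea concrete, here is roughly what it could look like, sketched in Python rather than the system's Java/PHP. The `MessageCounter` class, the `counted` decorator and the two send functions are all hypothetical stand-ins, not the real modules.

```python
import functools


class MessageCounter:
    """Hypothetical stand-in for the system's shared message counter."""

    def __init__(self) -> None:
        self.value = 0

    def increment(self) -> None:
        self.value += 1


def counted(counter: MessageCounter):
    """Decorator factory: bump the counter after each successful call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            counter.increment()  # only reached if func did not raise
            return result
        return wrapper
    return decorator


counter = MessageCounter()


@counted(counter)
def send_sms(recipient: str, text: str) -> str:
    # Placeholder for one module's real send logic.
    return f"sms to {recipient}: {text}"


@counted(counter)
def send_email(recipient: str, text: str) -> str:
    # Placeholder for another module's real send logic.
    return f"email to {recipient}: {text}"
```

The counting then lives in one place, and a module gets it by declaration instead of by remembering to write counter++.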
Update:
I accepted Greg D's answer. Even if I like Adam Bellaire's more, it wouldn't help me to know what would be ideal to know. Thanks all for the answers.
Put out the fires. If there are any issues of critical priority, whatever they are, you've got to handle them first. Hack it in if you must; with a smelly codebase it's OK. You know you'll improve it going forward. This is your sales technique targeted at whomever you're reporting to.
Pick some low-hanging fruit. I assume you're relatively new to this particular software and that you were re-tasked to deal with it. Find some apparently easy problems in a related subsystem of the code that shouldn't take more than a day or two to resolve apiece, and fix them. This may involve refactoring, or it may not. The goal is to familiarize yourself with the system and with the style of the original author. You may not get really lucky (One of the two incompetents who worked on my system before me always post-fixed his comments with four punctuation marks instead of one, which made it very easy to distinguish who wrote the particular segment of code.), but you'll develop insight into the author's weaknesses so you know what to look out for. Extensive, tight coupling with global state vs poor understanding of language tools, for example.
Set a big goal. If your experience parallels mine, you'll find yourself in a particular bit of spaghetti code more and more often as you perform the prior step. This is the first knot you need to untangle. With the experience you've gained understanding the component and knowledge about what the original author likely did wrong (and thus, what you need to watch out for), you can start envisioning a better model for this subset of the system. Don't worry if you still have to maintain some messy interfaces to maintain functionality, just take it one step at a time.
Lather, rinse, repeat! :)
Given time, consider adding unit tests for your new model one level underneath your interfaces with the rest of the system. Don't engrave the bad interfaces in code via tests that use them; you'll be changing them in a future iteration.
Addressing the particular issues you mention:
When you run into a situation that users are working around manually, talk with the users about changing it. Verify that they'll accept the change if you provide it before sinking the time into it. If they don't want the change, your job is to maintain the broken behavior.
When you run into a buggy component that multiple other components have worked around, I espouse a parallel component technique. Create a counter that works how the existing one should work. Provide a similar (or, if practical, identical) interface and slide the new component into the codebase. When you touch external components that work around the broken one, try to replace the old component with the new one. Similar interfaces ease porting of the code, and the old component is still around if the new one fails. Don't remove the old component until you can.
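A minimal sketch of the parallel component technique, in Python for brevity. Everything here is an assumption made for the example: the class names, the `record` interface, and especially the "grp:" bug, which is invented to stand in for whatever the real counter forgets.

```python
class LegacyCounter:
    """Stand-in for the existing counter, including an invented bug."""

    def __init__(self) -> None:
        self.count = 0

    def record(self, message: str) -> None:
        # Hypothetical bug: messages with this prefix are never counted.
        if not message.startswith("grp:"):
            self.count += 1


class FixedCounter:
    """Same interface as LegacyCounter, but counts every message."""

    def __init__(self) -> None:
        self.count = 0

    def record(self, message: str) -> None:
        self.count += 1


legacy = LegacyCounter()
fixed = FixedCounter()


def send(message: str) -> None:
    # Both counters run side by side, so unmigrated modules keep seeing
    # the old numbers that their workarounds were written against.
    legacy.record(message)
    fixed.record(message)


def unmigrated_total(group_messages: int) -> int:
    """A module that still compensates for the legacy bug by hand."""
    return legacy.count + group_messages


def migrated_total() -> int:
    """The same module after porting: no workaround needed."""
    return fixed.count
```

Each module is switched from `legacy` to `fixed` only when you are already touching it, and the old counter stays in place until nothing reads it any more.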
What is being asked of you right now? Are you being asked to implement functionality, or fix bugs? Do they even know what they want you to do?
If you don't have the manpower, time, or resources to "fix" the system as a whole, then all you can do is bail water. You're saying you should be able to make some "visible progress" in a few months' time. Well, with the system being as bad as you described, you may actually make the system worse. Under pressure to do something noticeable, you'll simply add code, and make the system even more convoluted.
You need to refactor, eventually. There is no way around it. If you can find a way to refactor that is visible to your end users, that would be ideal, even if it takes 6-9 months or a year instead of "a few months." But if you can't, then you have a choice to make:
Refactor, and risk being viewed as "not accomplishing anything" despite your efforts
Don't refactor, accomplish "visible" goals, and make the system more convoluted and more difficult to refactor one day. (Maybe after you find a better job, and hope the next developer to come along can never find out where you live.)
Which one is most beneficial to you personally depends on your company's culture. Will they one day decide to hire more developers, or replace this system completely with some other product?
Conversely, if your efforts to "fix things" actually break other things, will they be understanding about the monstrosity you're being asked to tackle single-handedly?
No easy answers here, sorry. You have to evaluate based on your unique, individual situation.
This is a whole book (Working Effectively with Legacy Code) that will basically say unit test and refactor, but with more practical advice on how to do it:
http://www.amazon.com/Working-Effectively-Legacy-Robert-Martin/dp/0131177052
You open the directory that contains this system with Windows Explorer. Then, press Ctrl-A, and then Shift-Delete. That sounds like an improvement in your case.
Seriously though: that counter sounds like it's got thread-safety issues. I'd put a lock around the functions that increment it.
And regarding the rest of the system, you can't do the impossible so try to do the possible. You need to attack your system from two fronts. Take care of the more visibly problematic issues first, so you can show progress. At the same time, you should deal with the more infrastructural problems, so that you have a chance at actually fixing this thing some day.
Good luck, and may the source be with you.
Pick one area that would be of medium difficulty to refactor. Create a skeleton of the original code with only the method signatures of the existing ones; maybe use an Interface even. Then start hacking away. You can even point the "new" methods to the old ones until you get to them.
Then, testing, testing, testing. Since there aren't any unit tests, maybe just use good old fashioned Voice-Activated-Unit Tests (people)? Or write your own tests as you go.
Document your progress as you go in some kind of repository, including frustrations and questions, so that the next poor schmuck who gets this project won't be where you are :).
Once you get the first part done, move on to the next. The key is to build on top of incremental progress, that's why you shouldn't start with the hardest part first; it'll be too easy to get demoralized.
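Here is roughly what the skeleton-then-hack-away approach can look like, sketched in Python. The billing functions and the `Billing` interface are invented for the example; the thing to notice is that one method is already rewritten while the other still points at the old code.

```python
from abc import ABC, abstractmethod


# Stand-ins for the existing legacy functions (hypothetical).
def legacy_calc_price(qty, unit):
    return qty * unit


def legacy_format_invoice(total):
    return "TOTAL:" + str(total)


class Billing(ABC):
    """Skeleton: only the signatures that callers rely on."""

    @abstractmethod
    def price(self, quantity: int, unit_price: float) -> float: ...

    @abstractmethod
    def invoice_line(self, total: float) -> str: ...


class NewBilling(Billing):
    def price(self, quantity: int, unit_price: float) -> float:
        # Rewritten: adds input validation the stand-in above lacks.
        if quantity < 0:
            raise ValueError("quantity must be non-negative")
        return quantity * unit_price

    def invoice_line(self, total: float) -> str:
        # Not rewritten yet: still delegates to the old implementation.
        return legacy_format_invoice(total)
```

Callers only ever see `Billing`, so each method can be moved off the legacy code in its own small step.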
Joel has a couple of articles on rewriting/refactoring:
http://www.joelonsoftware.com/articles/fog0000000069.html
http://www.joelonsoftware.com/articles/fog0000000348.html
I've been working with a legacy system with the same characteristics for almost three years now, and there are no shortcuts that I'm aware of.
What bothers me most with our legacy system is that I'm not allowed to fix some bugs, since many other functions could break if I fixed them. This calls for ugly workarounds or creating new versions of old functions. Calls to the old functions can then be replaced with calls to the new ones, one at a time (while testing).
I'm not sure what the goal of your task is, but I strongly advise you to touch as little of the code as possible. Only do what you need to do.
You may want to get as much as possible documented by interviewing people. This is a huge task, since you don't know which questions to ask, and people will have forgotten a lot of details.
Other than that: make sure you're getting paid and enough moral support. There will be weeping and gnashing of teeth...
Well you need to start somewhere, and it sounds like there are bugs that need fixing. I would work through those bugs, making quick win refactorings, and writing any unit tests possible along the way. I would also use a tool like SourceMonitor to identify some of the most 'complex' parts of code in the system and see if I could simplify their design in any way. Ultimately, you just have to accept that it will be a slow process, and make small steps towards a better system.
I would try to pick a part of the system that could be extracted and rewritten in isolation fairly quickly. Even if it doesn't do much, you could show progress pretty quickly, and you don't have the problem of interfacing with the legacy code directly.
Hopefully, if you could pick off a few such tasks, they will see you making visible progress, and you could put forward an argument for hiring more people to rewrite the bigger modules. When parts of the system rely on broken behaviour, you don't have much choice but to separate before you fix anything.
Hopefully, you could gradually build a team capable of rewriting the whole lot.
All of this would have to go hand in hand with some decent training, otherwise people's old habits will stick, and your work will get the blame when things don't work as expected.
Good luck!
Deprecate everything that currently exists that has problems, and write new ones that work correctly. Document as much as you can about what will change and put big red flashing signs all over the place pointing to this documentation.
By doing it that way, you can keep your existing bugs (the ones that are being compensated for somewhere else) around without slowing down your progress towards getting an actual working system.
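A sketch of how the "big red flashing signs" might be put into the code itself, using Python's standard `warnings` module. The two functions and the "groups times four" behaviour are hypothetical, chosen only to echo the kind of compensated-for bug described in the question.

```python
import warnings


def count_messages_old(groups: int) -> int:
    """Deprecated: left as-is because other modules compensate for it."""
    warnings.warn(
        "count_messages_old is deprecated; use count_messages instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return groups * 4  # hypothetical known-wrong behaviour, untouched


def count_messages(groups: int, messages_per_group: int) -> int:
    """Replacement that takes the group size explicitly."""
    return groups * messages_per_group
```

Old callers keep getting the numbers their workarounds expect, but every call now points at the replacement.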

When is it good (if ever) to scrap production code and start over? [closed]

I was asked to do a code review and report on the feasibility of adding a new feature to one of our new products, one that I haven't personally worked on until now. I know it's easy to nitpick someone else's code, but I'd say it's in bad shape (while trying to be as objective as possible). Some highlights from my code review:
Abuse of threads: QueueUserWorkItem and threads in general are used a lot, and Thread-pool delegates have uninformative names such as PoolStart and PoolStart2. There is also a lack of proper synchronization between threads, in particular accessing UI objects on threads other than the UI thread.
Magic numbers and magic strings: Some Const's and Enum's are defined in the code, but much of the code relies on literal values.
Global variables: Many variables are declared global and may or may not be initialized depending on what code paths get followed and what order things occur in. This gets very confusing when the code is also jumping around between threads.
Compiler warnings: The main solution file contains 500+ warnings, and the total number is unknown to me. I got a warning from Visual Studio that it couldn't display any more warnings.
Half-finished classes: The code was worked on and added to here and there, and I think this led to people forgetting what they had done before, so there are a few seemingly half-finished classes and empty stubs.
Not Invented Here: The product duplicates functionality that already exists in common libraries used by other products, such as data access helpers, error logging helpers, and user interface helpers.
Separation of concerns: I think someone was holding the book upside down when they read about the typical "UI -> business layer -> data access layer" 3-tier architecture. In this codebase, the UI layer directly accesses the database, because the business layer is partially implemented but mostly ignored due to not being fleshed out fully enough, and the data access layer controls the UI layer. Most of the low-level database and network methods operate on a global reference to the main form, and directly show, hide, and modify the form. Where the rather thin business layer is actually used, it also tends to control the UI directly. Most of this lower-level code also uses MessageBox.Show to display error messages when an exception occurs, and most swallow the original exception. This of course makes it a bit more complicated to start writing unit tests to verify the functionality of the program before attempting to refactor it.
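For contrast, a minimal sketch of the layering the last point is asking for. It is in Python rather than the product's .NET code, and every name (`DataAccessError`, `load_customer_name`, `customer_label`) is invented: the lower layer raises and keeps the original exception, and only the UI layer decides how to show it.

```python
class DataAccessError(Exception):
    """Raised by the data layer; the original error is kept as __cause__."""


def load_customer_name(rows: dict[int, str], customer_id: int) -> str:
    """Data layer: no UI calls, and the original exception is chained."""
    try:
        return rows[customer_id]
    except KeyError as exc:
        raise DataAccessError(f"customer {customer_id} not found") from exc


def customer_label(rows: dict[int, str], customer_id: int) -> str:
    """UI layer: the only place that decides how an error is presented."""
    try:
        return f"Customer: {load_customer_name(rows, customer_id)}"
    except DataAccessError as exc:
        return f"Error: {exc}"
```

With the message box out of the lower layer, `load_customer_name` can be unit tested without any form or window existing.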
I'm just scratching the surface here, but my question is simple enough: Would it make more sense to take the time to refactor the existing codebase, focusing on one issue at a time, or would you consider rewriting the entire thing from scratch?
EDIT: To clarify a bit, we do have the original requirements for the project, which is why starting over could be an option. Another way to phrase my question is: Can code ever reach a point where the cost of maintaining it would become greater than the cost of dumping it and starting over?
Without any offense intended, the decision to rewrite a codebase from scratch is a common and serious management mistake that newbie software developers make.
There are many disadvantages to be wary of.
Rewrites stop new feature development cold for months or years. Few companies, if any, can afford to stand still for that long.
Most development schedules are difficult to nail down, and this rewrite will be no exception. Compound the previous point with the resulting delay in development.
Bugs that were fixed in the existing codebase through painful experience will be re-introduced. Joel Spolsky has more examples in this article.
Danger of falling victim to the Second-system effect -- in summary, "People who have designed something only once before try to do all the things they 'didn't get to do last time', loading the project up with all the things they put off while making version one, even if most of them should be put off in version two as well."
Once this expensive, burdensome rewrite is completed, the very next team to inherit the new codebase is likely to use the same excuses for doing another rewrite. Programmers hate learning someone else's code. No one writes perfect code because perfection is so subjective. Find me any real-world application and I can give you a damning indictment and rationale for doing a from-scratch rewrite.
Whether you ultimately rewrite from scratch or not, beginning a refactoring phase now is a good way to both really sit down and understand the problem so that the rewrite will go more smoothly if truly called for, as well as giving the existing codebase an honest look to really see if a rewrite's needed.
To actually scrap and start over?
When the current code doesn't do what you would like it to do, and would be cost prohibitive to change.
I'm sure someone will now link Joel's article about Netscape throwing their code away and how it's oh-so-terrible and a huge mistake. I don't want to talk about it in detail, but if you do link that article, before you do so, consider this: the IE engine, the engine that allowed MS to release IE 4, 5, 5.5, and 6 in quick succession, the IE engine that totally destroyed Netscape... it was new. Trident was a new engine after they threw away the IE 3 engine because it didn't provide a suitable basis for their future development work. MS did that which Joel says you must never do, and it is because MS did so that they had a browser that allowed them to completely eclipse Netscape. So please... just meditate on that thought for a moment before you link Joel and say "oh you should never do it, it's a terrible idea".
A rule of thumb I've found useful: if I have to rewrite more than 25% of a code base to make it work or to modify it for new requirements, I may as well rewrite it from scratch.
The reasoning is that you can only patch a body of code so far; beyond a certain point, it's quicker to do over.
There's an underlying assumption that you have a mechanism (such as thorough unit and/or system tests) that will tell you whether your re-written version is functionally equivalent (where it needs to be) to the original.
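One cheap form such a mechanism can take is to run the old and new code over the same inputs and list where they disagree. A minimal Python sketch, with both discount functions invented purely as stand-ins:

```python
def legacy_discount(total_cents: int) -> int:
    # Stand-in for the old code, quirks included (hypothetical).
    if total_cents > 10000:
        return total_cents - total_cents // 10
    return total_cents


def rewritten_discount(total_cents: int) -> int:
    # Stand-in for the rewrite that is supposed to behave the same.
    return total_cents - (total_cents // 10 if total_cents > 10000 else 0)


def find_mismatches(old, new, inputs):
    """Return every input on which the two implementations disagree."""
    return [x for x in inputs if old(x) != new(x)]
```

An empty result from `find_mismatches(legacy_discount, rewritten_discount, range(0, 20001, 7))` only says the two agree on the inputs you tried, so the choice of inputs matters as much as the comparison.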
If it requires more time to read and understand the code (if that is even possible) than it would to rewrite the entire application, I say scrap it and start over.
Be very careful with this:
Are you sure you aren't just being lazy and not bothering to read the code?
Are you being arrogant about the great code you will write compared to the rubbish anyone else produced?
Remember, tested, working code is worth a lot more than imaginary, yet-to-be-written code.
In the words of our esteemed host and overlord, Joel: Things You Should Never Do.
It's not always wrong to abandon working code, but you have to be sure about the reason.
I saw an application re-architected within 2 years of its introduction into production, and others rewritten in different technologies (one was C++, now Java). Both efforts were not, to my mind, successful.
I prefer a more evolutionary approach to bad software. If you can "componentize" your old app such that you can introduce your new requirements and interface with the old code, you can ease yourself into the new environment without having to "sell" the zero-value (from a biz perspective) investment in rewriting.
Suggested approach - write unit tests for the functionality with which you wish to interface to 1) ensure the code behaves as you expect and 2) provide a safety net for any refactoring that you may wish to do on the old base.
Bad code is the norm. I think IT gets a bad rap from business for favoring rewrites/rearchitecting/etc. They pay the money and "trust" us (as an industry) to deliver solid, extensible code. Sadly, business pressures frequently result in shortcuts that make the code unmaintainable. Sometimes it's bad programmers... sometimes bad situations.
To answer your rephrased question... can code maintenance costs ever exceed rewriting costs... the answer is clearly yes. I don't see anything in your examples, however, that leads me to believe this is your case. I think those issues can be addressed with tests and refactoring.
In terms of business value, I would think it's extremely rare that a real case can be made for a rewrite due solely to the internal state of the code. If the product's customer-facing and is currently live and bringing in money (i.e. is not a mothballed or unreleased product), then consider that:
You already have customers using it. They're familiar with it, and might have built some of their own assets around it. (Other systems that interface to it; products based on it; processes they'd have to change; staff they'd maybe have to retrain). All of this costs the customer money.
Re-writing it might cost less in the long term than making difficult changes and fixes. But you can't quantify that yet, unless your app is no more complex than Hello World. And a re-write means a re-test and a redeploy, and probably an upgrade path for your customers.
Who says the re-write will be any better? Can you honestly say your firm is writing sparkly code now? Have the practices that turned the original code to spaghetti been corrected? (Even if the main culprit was a single developer, where were his peers and management, ensuring quality through reviews, testing, etc.?)
In terms of technical reasons, I'd suggest it could be time for a major rewrite if the original has some technical dependencies that have become problematic. e.g. a third party dependency that's now out of support, etc.
In general though, I think the most sensible move is to refactor piece by piece (very small pieces if it's really that bad), and improve the internal architecture incrementally rather than in one big drop.
Two threads of thought on this one: Do you have the original requirements? Do you have confidence that the original requirements are accurate? What about test plans or unit tests? If you have those things in place it might be easier.
Putting on my customer hat, does the system work or is it unstable? If you've got something that's unstable you've got an argument to change; otherwise you're best off refactoring it bit by bit.
I think the line in the sand is when basic maintenance is taking 25% - 50% longer than it should. There comes a time when maintaining legacy code becomes too costly. A number of factors contribute to the final decision, time and cost being the most important, I think.
If there are clean interfaces and you can cleanly delineate module boundaries, then it might be worth refactoring it module by module or layer by layer in order to allow you to migrate existing customers forward into cleaner more stable codebases, and over time, after you've refactored every module, you will have rewritten everything.
But, based on the code review, it doesn't sound like there would be any clean boundaries.
I wonder if the people who vote for scrapping and starting over have ever successfully refactored a large project, or at least seen a large project in poor condition that they think could use a refactoring?
If anything, I err on the opposite side: I've seen 4 large projects that were a mess, that I advocated refactoring as opposed to rewriting. On a couple, there was barely a single line of original code that remained, and major interfaces changed in significant ways, but the process never involved the entire project failing to function as well as it originally did, for any more than a week. (And top-of-trunk was never broken).
Perhaps a project exists that is so severely broken that to attempt to refactor it would be doomed to failure, or perhaps one of the previous projects I refactored would have been better served by a "clean re-write", but I'm not sure I'd know how to recognize it.
I agree with Martin. You really need to weigh the effort that will be involved in writing the app from scratch against the current state of the app and how many people use it, do they like it, etc. Often we may want to completely start from scratch, but the cost far outweighs the benefit. I come across bits of ugly looking code all the time, but I soon realize that some of these 'ugly' areas are really bug fixes and make the program work correctly.
I would try to consider the architecture of the system and see whether it is possible to scrap and rewrite specific well defined components without starting everything from scratch.
What usually happens is that either you can do that (and then sell that to the customer/management), or you find out that the code is such a horrible and tangled mess that you become even more convinced that you need a rewrite and have more convincing arguments for it (including: "if we engineer it right, we would never need to scrap the whole thing and do a third rewrite").
Slow maintenance would eventually cause that architectural drift that would make a rewrite more expensive later.
Scrap old code early and often. When in doubt, throw it out. The hard part is convincing non-technical folks of the cost-to-maintain.
So long as the value derived appears to be greater than the cost to operate and maintain, there's still positive value flowing from the software. The question surrounding a rewrite is this: "will we get even more value from a rewrite?" Or alternatively "How much more value will we get from a rewrite?" How many person-hours of maintenance will you save?
Remember, the rewrite investment is once only. The return on the rewrite investment lasts forever. Forever.
Focus the value question down to specific issues. You listed a bunch of them above. Stick with that.
"Will we get more value by reducing cost through dropping the junk that we don't use but still have to wade through?"
"Will we get more value from dropping the junk that's unreliable and breaks?"
"Will we get more value if we understand it -- not by documenting, but by replacing with something we built as a team?"
Do your homework. You'll have to confront the following show-stoppers.
These will originate somewhere in your executive foodchain from someone who'll respond as follows:
"Is it broken?" And when you say "It's not crashed as such," They'll say "It's not broke - don't fix it."
"You've done the code analysis, you understand it, you no longer need to fix it."
What's your answer to them?
That's only the first hurdle. Here's the worst possible situation. This doesn't always happen, but it does happen with alarming frequency.
Someone in your executive foodchain will have this thought:
"A rewrite doesn't create enough value. Rather than simply rewrite, let's expand it." The justification is that by creating enough value, users are more likely to buy in to the rewrite.
A project where scope is expanded -- artificially -- to add value is usually doomed.
Instead, do the smallest rewrite you can to replace the darn thing. Then expand to fit real needs and add value.
You can only give a definite yes to rewriting if you know completely how your application works (and by completely I mean it, not just having a general idea of how it should work) and you know more or less exactly how to make it better. In any other case it's a shot in the dark; it depends on too many things. Perhaps gradual refactoring would be safer, if it is possible.
If possible, I typically would prefer to rewrite smaller portions of the code over time when I need to refactor a baseline. There are typically many smaller issues such as magic numbers, poor commenting, etc. that tend to make the code look worse than it actually is. So, unless the baseline is just awful, keep the code and just make improvements at the same time you are maintaining the code.
If refactoring requires a lot of work, I recommend laying out a small re-design plan/todo list that gives you a list of things to work on in order so that you can bring the baseline to a better state. Starting from scratch is always a risky move and you are not guaranteed that the code will be better when you are finished. Using this technique, you will always have a working system that improves over time.
Code with excessively high cyclomatic complexity (like over 100 in a large number of modules) is a good clue. Also, how many bugs does it have per KLOC? How critical are the bugs? How often are bugs introduced when bug fixes are made? If your answer is a lot (I can't remember norms right now), then a rewrite is warranted.
As early as possible. Whenever you get a premonition that your code is slowly turning into an ugly beast that is very likely to consume your soul and give you headaches, and you know the problem is in the underlying structure of the code (so any fix would be a hack, e.g. introduce a global variable), then it's time to start over.
For some reason people don't like throwing away precious code, but if you feel you're better off starting over, you are probably right. Trust your instinct and remember that it wasn't a waste of time; it taught you one more way of NOT approaching the problem. You could (should) always use a version control system so your baby is never really lost.
I do not have any experience with using metrics for this myself, but the article "Software Maintainability Metrics Models in Practice" discusses more or less the same question asked here for two case studies they did.
It starts with the following editor's note:
In the past, when a maintainer received new code to maintain, the rule-of-thumb was "If you have to change more than 40 percent of someone else's code, you throw it out and start over." The Maintainability Index [MI] addressed here gives a much more quantifiable method to determine when to "throw it out and start over." This work was sponsored by the U.S. Air Force Information Warfare Center and the U.S. Department of Energy [DOE], Idaho Field Office, DOE Contract No. DE-AC07-94ID13223.)
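The quoted note does not give the formula itself. As a sketch, the commonly cited three-metric form of the Maintainability Index looks like the following in Python. The coefficients are an assumption taken from that commonly cited form, not from the quote, and variants exist (for example with an extra term for comment density), so check the article before relying on specific values.

```python
import math


def maintainability_index(halstead_volume: float,
                          cyclomatic_complexity: float,
                          lines_of_code: float) -> float:
    """Commonly cited three-metric MI; inputs are per-module averages.

    Assumed form (not taken from the quoted note):
    MI = 171 - 5.2*ln(V) - 0.23*G - 16.2*ln(LOC)
    """
    return (171
            - 5.2 * math.log(halstead_volume)
            - 0.23 * cyclomatic_complexity
            - 16.2 * math.log(lines_of_code))
```

Higher values mean more maintainable code; the index drops as volume, complexity, or module size grows.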
I think the rule was...
The first version is always a throw away
So, if you learned your lesson(s), or his/her lessons, then you can go ahead and write it fresh now that you understand your problem domain better.
Not that there aren't parts that can/should be kept. Tested code is the most valuable code, so if it isn't deficient in any real way other than style, no reason to toss it all out.
When is it good (if ever) to scrap production code and start over?
Never had to do this, but logic would dictate (to me, anyway) that once you pass the inflection point where you're spending more time reworking and fixing bugs in the existing code base than you are adding new functionality, it's time to trash the old stuff and get a fresh start.
If it requires more time to read and understand the code (if that is even possible) than it would to rewrite the entire application, I say scrap it and start over.
I have never completely thrown out code, even when going from a FoxPro system to a C# system.
If the old system worked then why just throw it out?
I have come across a few really bad systems: threads being used where not needed, horrible inheritance, and abuse of interfaces.
It is best to understand what the old code is doing and why it is doing it. Then change it so that it is not confusing.
Of course, if the old code doesn't work (I mean it can't even compile), then you might be justified in just starting over. But how often does that actually happen?
Yes, it totally can happen. I've seen money be saved by doing it.
This is not a tech decision, it's a business decision. Code rewrites are long-term gains, while "if it ain't totally broke..." is a short-term gain. If you are in a first-year startup that is focused on getting a product out the door, the answer is usually to just live with it. If you're in an established company, or the errors in the current system are causing more workload and therefore costing the company more money, then they might go for it.
Present the problem as best as you can to your GM, use dollar values where you can. "I don't like dealing with it" means nothing. "It'll take twice the time to do everything until this is fixed" means a lot.
I think there are a number of issues here that depend largely on where you are at.
Is the software working well from a customer perspective? (If yes, be very careful about changes.) If the system is working, I would think there is little point in rewriting unless you are expanding the feature set. And are you planning to expand the features and customer base of the software? If so, then you have much more reason to change.
As much as anything, just trying to understand someone else's code can be difficult even if it is well written; when it is badly written, I would imagine it is almost impossible. What you describe sounds like something that would be very difficult to expand.
I would take into consideration whether the application does what it is intended to do, whether you will ever be required to make modifications, and whether you are confident that the app has been thoroughly tested in all scenarios that it will be used in.
Do not invest the time if the app does not need alterations. However, if it doesn't function as you need and you need to control the hours and time invested to make corrections, scrap it and rewrite to the standards that your team can support. There's nothing worse than terrible code that you have to support and decipher but still have to live with. Remember, Murphy's Law says it will be 10 at night when you have to make things work, and that is never productive.
Production code always has some value. The only case where I would truly throw it all out and start again is if we determine the intellectual property is irrevocably contaminated. For example if someone brought large amounts of code from a previous employer, or a large percentage of the code was ripped from a GPLd codebase.
I'm going to post this book every time I see a discussion on Refactoring. Everyone should read "Working Effectively with Legacy Code" by Michael Feathers. I found it to be an excellent book - if nothing else, it's a fun read, and motivational.
When the code has reached a point where it is no longer maintainable or extensible: it is full of short-term hacky fixes, has lots of coupling, has long (100+ line) methods, has database access in the UI, and generates a lot of random, impossible-to-debug errors.
Bottom line: When maintaining it is more expensive (i.e. takes longer) than rewriting it.
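One way to put numbers on that bottom line is a simple break-even estimate. This is a deliberately naive sketch in Python: the function name and inputs are hypothetical, costs can be in hours or money as long as the units match, and it assumes the rewrite estimate is accurate and that monthly costs stay flat, which real projects rarely honor.

```python
from typing import Optional


def months_to_break_even(rewrite_cost: float,
                         old_monthly_cost: float,
                         new_monthly_cost: float) -> Optional[float]:
    """Months until a rewrite pays for itself, or None if it never does.

    rewrite_cost: one-off cost of building the replacement.
    old_monthly_cost / new_monthly_cost: expected maintenance cost per
    month for the existing code and for the rewritten code.
    """
    monthly_saving = old_monthly_cost - new_monthly_cost
    if monthly_saving <= 0:
        return None  # the new code is no cheaper to maintain
    return rewrite_cost / monthly_saving
```

If the result is longer than the expected remaining life of the product, maintaining the old code is the cheaper option under these assumptions.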
I used to believe in just rewriting from scratch, but that is wrong.
http://www.joelonsoftware.com/articles/fog0000000069.html
Changed my mind.
What I would suggest is figuring out a way to properly refactor the code. Keep all existing functionality and test as you go. We have all seen horrible code bases, but it is important to keep the knowledge your application has accumulated over time.

Rewrite or repair?

I'm sure you have all been there: you take on a project where there is a creaky old code base which is barely fit for purpose, and you have to decide whether to rewrite it from scratch or repair what already exists.
Conventional wisdom tends to suggest that you should never attempt a re-write from scratch as the risk of failure is very high. So what did you do when faced with this problem, how did you make the decision and how did it turn out?
It really depends on how bad it is.
If it's a small system, and you fully understand it, then a rewrite is not crazy.
On the other hand, if it's a giant legacy monster with ten million lines of undocumented mystery code, then you're really going to have a hard time with a full rewrite.
Points to consider:
If it looks good to the user, they won't care what kind of spaghetti mess it is for you. On the other hand, if it's bad for them too, then it's easier to get agreement (and patience).
If you do rewrite, try to do it one part at a time. A messy, disorganized codebase may make this difficult (i.e., replacing just one part requires a rewrite of large icebergs of dependency code), but if possible, this makes it a lot easier to gradually do the rewrite and get feedback from users along the way.
I would really hesitate to take on a giant rewrite project for a large system without being able to release the new edition one part at a time.
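A minimal sketch of what "one part at a time" can look like in code, in Python: callers go through a single dispatch point, and each operation is switched to its rewritten version only when that version is ready. The class and method names here are hypothetical, and a real system would need the same idea applied at whatever seam it actually has (a module boundary, a URL route, a message handler).

```python
from typing import Any, Callable, Dict


class IncrementalRouter:
    """Dispatches each operation to its rewritten handler if one has been
    migrated, otherwise to the legacy handler."""

    def __init__(self, legacy: Dict[str, Callable[..., Any]]):
        self._legacy = dict(legacy)
        self._rewritten: Dict[str, Callable[..., Any]] = {}

    def migrate(self, name: str, handler: Callable[..., Any]) -> None:
        """Switch one operation over to its new implementation."""
        if name not in self._legacy:
            raise KeyError(f"unknown operation: {name}")
        self._rewritten[name] = handler

    def rollback(self, name: str) -> None:
        """Send an operation back to the legacy code if the rewrite misbehaves."""
        self._rewritten.pop(name, None)

    def call(self, name: str, *args: Any, **kwargs: Any) -> Any:
        handler = self._rewritten.get(name, self._legacy[name])
        return handler(*args, **kwargs)
```

Because every operation still has a working legacy path, the system stays shippable throughout, and a bad replacement can be rolled back without touching the rest.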
Just clean up the code a little bit every time you work with it. If there isn't one already, set up a unit testing framework. All new code should get tests written. For any old code you fix as a result of bugs, try to slide in tests too.
As the cleanups progress, you'll be able to sweep more and more of the nasty code into encapsulated bins. Then you can pick those off one by one in the future.
A tool like javadoc or doxygen, if not already in use, can also help improve code documentation and comprehensibility.
The arguments against a complete rewrite are pretty strong. Those tons of "little bugs" and behaviors that were coded in over the time frame of the original project will sneak right back in again.
See Joel Spolsky's essay Things You Should Never Do. In summary, when you rewrite you lose all the lessons you learned to make your current code work the way it needs to work.
See also: Big Ball of Mud
It is rare for a re-write of anything complex to succeed. It's tempting, but a low percentage strategy.
Get legacy code under unit tests and refactor it, and/or completely replace small portions of it incrementally when opportune.
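One common way to get legacy code under test before refactoring is a characterization test: record what the code does today and pin it, whether or not that behavior looks right. Below is a small Python sketch using `unittest`. The function `legacy_shipping_cost` is a hypothetical stand-in for real legacy code, and the expected values are simply what it currently returns.

```python
import unittest


def legacy_shipping_cost(weight_kg, express):
    # Hypothetical stand-in for untouched legacy code, quirks included.
    cost = 5 + weight_kg * 2
    if express:
        cost = cost * 1.5
    if weight_kg > 20:
        cost = cost + 10  # surcharge is applied after the express multiplier
    return cost


class ShippingCostCharacterization(unittest.TestCase):
    """Pins current behavior. Expected values were recorded by running the
    legacy function, not derived from a specification."""

    def test_recorded_outputs(self):
        recorded = [
            ((1, False), 7),
            ((1, True), 10.5),
            ((25, False), 65),
            ((25, True), 92.5),
        ]
        for args, expected in recorded:
            with self.subTest(args=args):
                self.assertEqual(legacy_shipping_cost(*args), expected)
```

Run it with `python -m unittest` before and after each refactoring step; a failure means behavior changed, which you can then decide to accept or revert.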
Refactor unless it is very bad indeed.
Joel has a lot to say on this...
At the very least, rewrite the code with the old code in front of you and don't just start over from scratch. The old code may be terrible, but it is the way it is for a reason and if you ignore it you'll end up seeing the same bugs that were probably fixed years ago in the old code.
One reason for rewriting at one of my previous jobs was an inability to find developers with enough experience to work on the original code base.
The decision was made to first clean up the underlying database structure, then rewrite in something that would make it easier to find full-time employees and/or contractors.
I haven't heard yet how it worked out :)
I think people have a tendency to go for rewrites because it seems more fun on the surface.
We get to rebuild from scratch!
We'll do it right this time!
etc.
There is a new book coming out, Brownfield Application Development in .NET by Baley and Belcham. The first chapter is free, and talks about these issues from a mostly platform agnostic perspective.
Repair, or more importantly, refactor. Both because Joel said so and also because, if it's your code, you've probably learned a ton more stuff since you touched this code last. If you wrote it in .NET 1.1, you can upgrade it to 3.5 SP1. You get to go in and purge all the old commented out code. You're 100x better as a developer now than when you first wrote this code.
The one exception I think is when the code uses really antiquated technologies - in which case you might be better served by writing a new version. If you're looking at some VB6 app with 10,000 lines of code with an Access database backend obviously set up by someone who didn't know much about how databases work (which could very well be you eight years ago) then you can probably pull off a quicker, C#/SQL-based solution in a fraction of the time and code.
It's not so black and white... it really depends on a lot of factors (the more important being "what does the person paying you want you to do")
Where I work we rewrote a development framework, and on the other hand, we keep modifying some old systems that cannot be migrated (because of the client's technology and time restrictions). In this case, we try to maintain the coding style, and sometimes you have to implement a lot of workarounds because of the way it was built.
Depending on your situation, you might have another option: in-license third-party code.
I've consulted at a couple of companies where that would be the sensible choice, although seemingly "throwing away IP" can be a big barrier for management. At my current company, we seriously considered the viable option of using third-party code to replace our core framework, but that idea was ultimately rejected more for business reasons than technical reasons.
To directly answer your question, we finally chose to rewrite the legacy framework - a decision we didn't take lightly! 14 months on, we don't regret this choice at all. Just considering the time spent fixing bugs, our new framework has nearly paid for itself. On the negative side, it is not quite feature-complete yet so we are in the unenviable position of maintaining two separate frameworks in parallel until we can port the last of our "front-end" applications.
I highly recommend reading "Working Effectively with Legacy Code" by Michael Feathers. It's coaching advice on how to refactor your code so that it is unit testable.

Resources