Code refactoring seems to be a necessary evil. How can one make it less painful? - refactoring

I was working on a project that involved refactoring a large codebase that had been in development for several years. One of the difficulties I encountered was that the codebase was very complex and had a lot of interdependent components, which made it difficult to understand how everything fit together. This posed a challenge because it meant that I had to spend a lot of time trying to get a grasp on the overall architecture of the codebase before I could even begin to consider how to refactor it.
Another difficulty I faced was that the codebase was very large, with thousands of lines of code spread out over hundreds of files. This made it difficult to keep track of all the changes I was making and to ensure that I wasn't breaking anything in the process. It also made it hard to test and verify that the refactored code was working as intended.
Overall, these difficulties have slowed down my progress on the code refactoring project and have made it more challenging than I anticipated. I have had to take extra care and spend more time than expected to ensure that I am refactoring the code correctly and not introducing any new bugs.
I tried refactoring a large codebase by breaking it down into smaller, more manageable chunks and working on one chunk at a time. I expected the refactoring process to be smoother and easier to manage this way. However, I actually encountered more difficulties than I anticipated due to the complexity and interdependence of the code, and it has taken longer than expected to complete the refactoring process.

Related

Why do we refactor? [closed]

I would like to know the reasons why we refactor and how to justify it. I have read about a lot of people upset over the idea of refactoring. Refactoring was variously described as:
A result of insufficient upfront design.
Undisciplined hacking.
A dangerous activity that needlessly risked destabilizing working code.
A waste of resources.
What are the responsible reasons that lead us to refactor our code?
I also found a similar question here, how-often-should-you-refactor, but it doesn't provide the reasons for refactoring.
Why do we refactor?
Because there's no actual substitute for writing code. No amount of upfront planning or experience can substitute for actually writing code. This is what an entire generation (called waterfall) learned the hard way.
Once you start writing the code and are in the middle of it, you reason about the way it works at a lower level, and you notice things (performance, usability or correctness issues) that escaped the higher-level design view.
Refactoring is perfecting.
Ask yourself: why do painters do multiple strokes with the brush on the same spot?
Refactoring is the way to pay the technical debt.
I'd like to briefly address three of your points.
1. "A result of insufficient up-front design"
Common sense (and several books and bloggers) tell us we should strive for the simplest, cleanest design possible to address a given problem. While it's quite possible that some code is written without sufficient work on developing an understanding of the requirements and the problem domain, it's probably more common that "poor code" wasn't "poor" when it was written; rather, it is no longer sufficient.
Requirements change, and designs have to support additional features and capabilities. It's not unreasonable to anticipate some future changes up-front, but McConnell et al. rightly caution against high-level, overly-flexible designs when there's no clear and present need for such an approach.
3. "A dangerous activity that needlessly risks destabilising working code"
Well, yes, if done improperly. Before you seek to make any significant modification to a working system, you should put in place proper measures to ensure that you're not causing any harm - a sort of "developmental Hippocratic oath", almost.
Typically, this will be done by a mixture of documentation and testing, and more often than not, the code wins out, because it's the most up-to-date description of the actual behaviour. In practical terms, this translates into having decent coverage with a unit test suite, so that if refactoring does introduce unexpected problems, these are identified and resolved.
Obviously, when you seek to refactor, you're going to break a certain number of tests, not least because you're trying to fix some broken code contracts. It is, however, perfectly possible to refactor with impunity, provided you have that mechanism in place to spot the accidental mistakes.
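As a rough illustration of that safety net, here is a minimal sketch in Python of tests that pin down the current behaviour before any refactoring starts. The function `legacy_price` and its discount rule are hypothetical, invented purely for this example; only the standard `unittest` module is assumed.

```python
import unittest


def legacy_price(quantity, unit_price_cents):
    # Hypothetical legacy code: clumsy, but its current output is what callers rely on.
    total = 0
    for _ in range(quantity):
        total = total + unit_price_cents
    if quantity >= 10:
        total = total - total // 10  # 10% bulk discount
    return total


class LegacyPriceCharacterization(unittest.TestCase):
    # These tests record what the code does today, so a refactoring
    # that accidentally changes the result is caught immediately.

    def test_no_discount_below_ten_items(self):
        self.assertEqual(legacy_price(9, 200), 1800)

    def test_discount_from_ten_items(self):
        self.assertEqual(legacy_price(10, 200), 1800)

    def test_zero_items(self):
        self.assertEqual(legacy_price(0, 200), 0)
```

Saved as a test module, something like this would typically be run with `python -m unittest` before and after each change to the body of the function.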
4. "A waste of resources"
Others have mentioned the concept of technical debt, which is, briefly, the idea that over time, the complexity of such systems builds up, and that some of that build-up has to be reduced, by refactoring and other techniques, in order to reasonably facilitate future development. In other words, sometimes you have to bite the bullet and go ahead with that change you've been putting off, because otherwise you'll be making a bad situation appallingly worse when you come to add something new in that area.
Obviously, there's a time and a place to pay off such things; you wouldn't try and repay a loan until you had the cash to do it, and you can't afford to go around refactoring willy nilly during a critical stage in development. Nevertheless, by making the decision to address some of the problems in your code base, you save future development time, and thus money, and maybe even further into the future, avoid the cost of having to abandon or completely rewrite some component that is beyond your understanding.
In order to keep a maintainable code base?
Code is more read than written, so it is necessary to have a code-base that is readable, understandable and maintainable. When you see something that is poorly written or designed, it can be refactored to improve the design of the code.
You also clean your house regularly, don't you? Although it may be considered a waste of time, it is necessary in order to keep your house clean, so that you have a nice environment to live in.
You may need to refactor if your code is
Inefficient
Buggy
Hard to extend
Hard to maintain
It all boils down to the original code not being very good, so you improve it.
If you have reasonable unit tests it shouldn't be dangerous at all.
Because hindsight is easier than foresight.
Software is one of the most complex things created by humans, so it is not easy to consider everything beforehand. For large projects it can even be impossible for the team (at least for one consisting of humans ;) ) to consider everything before they actually start developing it.
Another reason is that software isn't constructed, it's growing. That means software can and has to adapt to ever changing requirements and environments.
As Martin Fowler says, the only thing surprising about the requirements for software changing is that anyone is surprised by it.
The requirements will change, new features will be requested. This is a good thing. Enhancement efforts succeed most of the time, and when they fail, they fail small, so there is budget to do more. Big up-front design projects fail often (one statistic puts the failure rate at 66%), so avoid them. The way to avoid them is to design enough for the first version and, as enhancements are added, refactor to the point where it looks like the system was intended to do that in the first place. The lifespan of a project that can do this (there are issues when you publish data formats or APIs - once you go live you can't always be pristine anymore) is indefinite.
In response to the four points, I would say that a process that shuns refactoring demands:
A static world where nothing changes so that the upfront design can hit a non-moving target perfectly.
Will result in ugly hacks to work around design flaws that aren't being refactored.
Will lead to dangerous code duplication as the fear of changing existing code sets in.
Will waste resources over engineering the problem and building large design artifacts in anticipation of requirements that never end up getting built, causing large amounts of code and complication to drag the project down while not providing any value.
One caveat, though. If you don't have the proper support, in an automated tool for simple cases, and thorough unit tests in the more complicated cases, it will hurt, there will be new bugs introduced, and you will develop a (quite rational) fear of doing it more. Refactoring is a great tool, but it requires safety equipment.
Another scenario where you need refactoring is TDD. The textbook approach for TDD is to write only the code you need to pass the test and then refactor it to something nicer afterwards.
...because coding is like gardening. Your codebase grows and your domain changes as time passes. What was a good idea back then often looks like a poor design now and what is a good design now may well not be optimal in the future.
Code should never be considered a permanent artifact nor should it be considered too sacred to touch. Confidence should be garnered through testing and refactoring is a mechanism to facilitate change.
While a lot of other people have already said perfectly valid reasons, here's mine:
Because it's fun. It's like beating your own time in steeplechase, having the stronger bicep in armwrestling or improving your highscore in a game of your choice.
A straightforward answer is, requirements change. No matter how elegant your design is, some requirements later on will not buy it.
Poor understanding of the requirements:
If developers don't have a clear understanding of the requirements, the resulting design and code cannot satisfy the customer. Later, as the requirements become clearer, refactoring becomes essential.
Supporting new requirements.
If a component is old, in most cases it will not be able to handle radical new requirements. It then becomes essential to refactor.
Lots of bugs in the existing code.
If you have spent long hours at the office fixing quite a few nasty bugs in a particular component, it becomes a natural choice for refactoring at the earliest opportunity.
Upfront: Refactoring does not need to be dangerous when a) supported by tools and b) you have a testsuite that you can run after the refactoring in order to check the functioning of your software.
One of the main reasons for refactoring is that at some point you find out that code is used by more than one code path and you don't want to duplicate (copy&paste) but reuse. This is especially important in cases where you find an error in that code. If you have refactored the duplicated code into an own method, you can fix that method and be done. If you copy&paste code around, there is a high chance that you don't fix all places where this code occurs (just think of projects with several members and thousands of lines of code).
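A minimal sketch of that idea in Python, assuming a made-up example where two code paths both need to clean up an email address. All names here (`normalize_email`, `register_user`, `is_registered`) are hypothetical and only serve the illustration.

```python
def normalize_email(raw):
    # The one shared copy of logic that would otherwise be pasted into both callers.
    # A bug found here is fixed once, for every code path that uses it.
    return raw.strip().lower()


def register_user(users, raw_email):
    # First code path: store the cleaned-up address.
    email = normalize_email(raw_email)
    users.add(email)
    return email


def is_registered(users, raw_email):
    # Second code path: look up using exactly the same clean-up rules.
    return normalize_email(raw_email) in users


users = set()
register_user(users, "  Alice@Example.COM ")
print(is_registered(users, "alice@example.com"))
```

If the clean-up rules ever turn out to be wrong, only `normalize_email` has to change, and both callers pick up the fix.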
You should of course not refactor just for the sake of refactoring - then it really is a waste of resources.
For whatever reason, when I create or find a function that scrolls off the screen, I know it's time to sit back and consider whether it should be refactored or not - if I'm having to scroll the whole page to take in the function as a whole, chances are it's not a shining example of readability or maintainability.
To make insane stuff sane.
I mainly refactor when the code has suffered so much under copy + paste and a lack of architectural guidance that the action of understanding the code is akin to re-organising it and removing the duplication.
It is human to err, and you're ALWAYS going to make mistakes when you develop software. Creating a good design from the beginning helps a lot, and having skilled programmers on the team is also a good thing, but they will invariably make mistakes, and there will be code that is hard to read, tightly coupled or non-functional, etc. Refactoring is a tool to mend these flaws when they've already occurred. You should never stop working on preventing these things from happening to begin with, but when they do happen, you can fix them.
Refactoring to me is like cleaning my desk; it creates a better working environment because over time it will get messy.
I refactor because, without refactoring, it becomes harder and harder to add new features to a codebase over time. If I have features A, B, and C to add, feature C will be finished sooner, with less pain and suffering on my part, if I take time to refactor after features A and B. I'm happier, my boss is happier, and our customers are happier.
I think it's worth restating, in any conversation involving refactoring, that refactoring is verifiably behavior-preserving. If at the end of your "refactoring" your program has different outputs, or if you only think, but can't prove, that it has the same outputs, then what you've done isn't refactoring. (That doesn't mean it's worthless or not worth doing -- maybe it's an improvement. But it's not refactoring and shouldn't be confused with it.)
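One cheap way to gather evidence that a change is behaviour-preserving is to keep the old version around for a while and compare it with the new one on many inputs. The sketch below is only an illustration under my own assumptions: `total_before`, `total_after` and `same_behaviour` are hypothetical names, and random comparison gives evidence rather than proof.

```python
import random


def total_before(orders):
    # Original version: index-based loop over (price, quantity) pairs.
    result = 0
    for i in range(len(orders)):
        if orders[i][1] > 0:
            result = result + orders[i][0] * orders[i][1]
    return result


def total_after(orders):
    # Refactored version: same rule, expressed more directly.
    return sum(price * quantity for price, quantity in orders if quantity > 0)


def same_behaviour(old, new, trials=1000, seed=0):
    # Feed both versions the same randomly generated inputs and compare outputs.
    rng = random.Random(seed)
    for _ in range(trials):
        orders = [(rng.randint(0, 100), rng.randint(-5, 5))
                  for _ in range(rng.randint(0, 10))]
        if old(orders) != new(orders):
            return False
    return True


print(same_behaviour(total_before, total_after))
```

A check like this complements, rather than replaces, a proper test suite; if the two versions ever disagree, what you did was a behaviour change and not a refactoring.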
Refactoring is a central component of any agile software development method.
Unless you fully understand all the requirements and technical limitations of your project you can't have a complete upfront design. In this case instead of using a traditional waterfall approach you're probably better off with an agile method - agile methods focus on adapting quickly to changing realities. And how would you adapt your source code without refactoring?
I've found code design and implementation, particularly with large and unfamiliar projects, to be a learning process.
The scope and requirements of a project change over time, which has consequences on the design. It may be that after spending some time implementing your product you discover that your planned design is not optimal. Perhaps new requirements were added by the client. Or perhaps you're adding additional functionality to an older product and you need to refactor the code in order to sufficiently provide this functionality.
In my experience code has been written poorly and the refactoring has become necessary to prevent the product from failing and to ensure it is maintainable/extendable.
I believe an iterative design process, with prototyping early on is a good way to minimise refactoring later on. This also allows you to experiment with differing implementations to determine which is most suitable.
Not only that, but new ideas and methods for what you're doing may become available. Why stick with old, fallible code that could become problematic if it can be improved?
In short, projects will change over time, which necessitates changes in structure to ensure they meet new requirements.
From my own personal experience, I refactor because if I try to make software exactly the way I want it on the first go, it takes a very long time to create anything.
Therefore I value the pragmatism of getting software working over clean code. Once I have something running, I then begin to refactor it into the way it should be. Needless to say, the code never devolves into a piece of unreadable tripe.
Just a side note - I did my degree in software engineering after reading some material from Steve McConnell as a teen. I love design patterns, good code reuse, nicely thought out designs and so on. But I find when working on my own projects that designing things initially from that point of view just doesn't work unless I'm an absolute expert with the technology I'm using (which is never the case).
Refactoring is done to help make code easier to understand/document.
To give a method a better name - perhaps the previous one wasn't clear or was incorrect.
To give variables more descriptive / better names.
Break up a really long method into many smaller methods representing the steps involved in solving the problem.
Move classes to a new package(namespace) to assist organisation.
Reduce duplicate code.
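To make a couple of the items above concrete, here is a small before/after sketch in Python showing better names and a long function split into steps. The functions `report`, `total`, `average` and `format_summary` are hypothetical examples, not taken from any real codebase.

```python
# Before: cryptic names, and one function doing every step itself.
def report(d):
    t = 0
    for x in d:
        t += x
    a = t / len(d) if d else 0
    return "total=" + str(t) + " avg=" + str(a)


# After: descriptive names, and each step extracted into its own small function.
def total(values):
    return sum(values)


def average(values):
    return total(values) / len(values) if values else 0


def format_summary(values):
    return f"total={total(values)} avg={average(values)}"


# Both versions produce the same text for the same input.
print(report([1, 2, 3]))
print(format_summary([1, 2, 3]))
```

The output is unchanged; what improves is how quickly a reader can see which steps are involved.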
Does point number one even matter? If you're refactoring, the up-front design was obviously flawed. Don't waste time worrying about the flaws in the original design; it's old news. What matters is what you have now, so spend that time refactoring.
I refactor because proper refactoring makes maintenance SO much easier. I've had to maintain a TON of bad, awful code and I don't want to hand down any that I've written for someone else to maintain.
Maintenance costs of smelly code will almost always be higher than maintenance costs for sweet smelling code.
I refactor because:
Often my code is far from optimal first time around.
Hindsight is often 20-20.
My code will be easier to maintain for the next guy.
I have professional pride in the work I leave behind.
I believe time spent now can save a lot more time (and money) further down the track.
All your points are common descriptors of why people do refactor. I would say that the reason people should refactor lies within point #1: A Big Design Up Front (BDUF) is almost always imperfect. You learn about the system as you build it. In trying to anticipate what could happen you often end up building complex solutions to deal with things that never actually happen. (YAGNI - You ain't gonna need it).
Instead of the BDUF approach, a better solution is therefore to design the parts of the system you know you are going to need. Follow the single responsibility principle, and use inversion of control/dependency injection so that you can replace parts of your system when needed.
Write tests for your components. And then, when the requirements for your system change or you discover flaws in your initial design, you can refactor and extend your code. Since you have your unit tests and integration tests in place, you will know if and when the refactoring breaks something.
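A minimal sketch of how dependency injection and tests fit together, written in Python. The classes `SystemClock`, `FixedClock` and `SessionChecker` are hypothetical names chosen for this illustration; the point is only that the dependency is passed in, so a test can replace it.

```python
import time


class SystemClock:
    """Real dependency: reads the wall clock."""

    def now(self):
        return time.time()


class FixedClock:
    """Test double: always reports the same moment."""

    def __init__(self, moment):
        self._moment = moment

    def now(self):
        return self._moment


class SessionChecker:
    def __init__(self, clock, max_age_seconds):
        # The clock is injected rather than created here, so it can be swapped out.
        self._clock = clock
        self._max_age_seconds = max_age_seconds

    def is_expired(self, started_at):
        return self._clock.now() - started_at > self._max_age_seconds


def test_session_expiry():
    checker = SessionChecker(FixedClock(1000.0), max_age_seconds=60)
    assert checker.is_expired(900.0)       # session is 100 seconds old
    assert not checker.is_expired(950.0)   # session is 50 seconds old
```

In production code you would pass `SystemClock()`; in tests the fixed clock makes the result deterministic, so a later refactoring of `SessionChecker` can be checked quickly.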
There is a difference between large refactorings (restructuring modules, class hierarchies, interfaces) and "unit" refactorings - within methods and classes.
Whenever I touch a piece of code I do a unit refactoring - renaming variables, extracting methods; because actually seeing the code in front of me gives me more information to make it better. Sometimes refactoring also helps me to better understand what the code is doing. It's like writing or painting, you extract a fuzzy idea out of your head; put a rough skeleton onto paper; then into code. You then refine the rough idea in the code.
With modern refactoring tools like ReSharper in C#, this kind of unit refactoring is extremely easy, quick & low risk.
Large refactorings are harder, break more things, and require communication with your team members. It will become clear to everyone when these need to happen - because requirements have changed so much that the original design no longer works - and then they should be planned like a new feature.
My last rule - only refactor code that you are actually working on. If code's functionality doesn't need to be changed, then it's good enough & doesn't need further work.
Avoid refactoring just for refactoring's sake; that's just refactorbating!

How to refactor rapidly evolving code?

I have some research code that's a real rat's nest, with code duplication everywhere, and clearly needs to be refactored. However, the code base is evolving as I come up with new variations on the theme and fit them into the codebase. The reason I've put off refactoring so long is because I feel like the minute I spend a few days coming up with good abstractions, seeing what design patterns fit where, etc., I'll want to try out some new unforeseen idea that makes my abstractions completely inadequate. In other words, because of the rate at which the code is evolving, I really have no idea where abstraction lines belong, even though there is no shortage of (approximate) duplication and the general messiness of the code makes adding stuff to it a real pain. What are some general best practices for coping with this kind of situation?
Don't spend so long refactoring!
When you're about to make a change in a piece of code, consider refactoring it to make the change easier.
After making the change, refactor again to clean up the damage done by that change.
In both cases, make the refactorings small and do them quickly, and move on.
You don't have to keep your code pristine at all times, but remember that it's easier to go fast if you have well-factored code to work in (and if you have good unit tests, of course).
Test Driven Development:
Red, Green, Refactor. Rinse, repeat.
Since it's one of the steps in every single cycle, you'll notice that's a LOT of usually minor refactoring taking place. That's the way it should be.
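For anyone who has not seen the cycle, here is one round of it sketched in Python. The `slugify` example is hypothetical and deliberately tiny: the test comes first (red), then the simplest code that passes (green), then a tidier version that passes the same test (refactor).

```python
# Red: the test is written first and fails until slugify exists.
def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Many   spaces  ") == "many-spaces"


# Green: a first, clumsy version that is just enough to pass the test.
def slugify_first_pass(title):
    words = []
    for word in title.split(" "):
        if word != "":
            words.append(word.lower())
    return "-".join(words)


# Refactor: same behaviour, simpler code, verified by the same test.
def slugify(title):
    return "-".join(title.lower().split())


test_slugify()
```

The first-pass version is kept here only to show the progression; in a real cycle it would simply be replaced by the refactored one.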
Your situation is pretty familiar to me. While doing investigative coding often you have no idea what the "right" abstraction will be, and as you say it can change with every new idea. Other posters have suggested:
Continuous small refactoring, which helps to avoid getting into the rats-nest situation
Test-Driven Development, which helps to find good, re-usable abstractions. It's important to note that TDD is less about testing than about doing good designs!
However, for investigative research code there is another strategy: the prototype. This seems to be what you are currently doing: coding as quickly as possible to prove a concept. There's nothing wrong with that, but a prototype should always be throw-away. Tweak it until you have all the necessary input and knowledge, then throw away the code and start over with TDD and continuous refactoring, and all your other "doing the things right" strategies.
Don't keep any of the code. Don't copy-paste anything. Don't refer back to it. Just start over with your new knowledge.
Clean up the code a little bit at a time. Always when you touch a class, try to leave the class cleaner than it was before you touched it ("the boy scout rule"). Refactoring is best done in very small steps, but very often.
Things like renaming some variable, splitting a method etc. take only some seconds or minutes. Large refactorings such as splitting or joining classes, may take an hour or two (and you make it in small steps, so that all tests pass at least every five minutes - otherwise you have entered Refactoring Hell and you should revert to the last known working state). If it takes days or weeks for you to refactor something, then it's not anymore "refactoring" - it's more like rewriting.
An article about this topic:
http://blog.objectmentor.com/articles/2007/07/20/whats-your-unit-of-measure
Put it in a distributed SCM like Git at least. That way, when you break something while refactoring, you can step back through the history to find the commit prior to the change, and you can work on changes and commit them in branches without interfering with others' work.
Git's branch merging is great for things like this, and you'll know easily if two people made incompatible changes in parallel without having to worry about the rest of the code.
For the above reasons, I would also create a separate branch in the repository just for refactoring, and keep it updated regularly. This way, not only will others not interfere with your progress, but they can keep an eye on it and see changes that will eventually hit the main branch, so they can pre-emptively code around those changes.
If you already know where there is duplication, you don't need several days to refactor it away.
Sometimes a rewrite is the only choice. This seems to be the case.
The CloneDR finds duplicate code, both exact copies and near-misses, across large source systems, parameterized by language syntax. It supports Java, C#, COBOL, C++, PHP and many other languages.
When it shows a parameterized abstraction of a set of found clones, it is essentially proposing that you refactor the code with that abstraction implemented (as a method, a function, a class, ...).
So running the CloneDR gets a list of potential abstractions to be added to your code, and replacing the clone instances by calls on the abstraction refactors your code thus cleaning it up (somewhat).
Even more remarkably, when it shows the parameter bindings used at each clone site needed to invoke the abstraction, it often shows a bungled clone instance, easily recognized when the bound parameters are conceptually inconsistent. If a parameter is bound to variables named YYYY-MM-DD, and one of them is YY-MM-DD, the "it's a 4-digit year" parameter type looks violated, and in this case there's a broken Y2K remediation. So examining the clone bindings often finds bugs.
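To give a feel for what "near-miss" duplication means, here is a deliberately naive toy in Python. It is not how CloneDR works and makes no claim about that tool's internals; it simply replaces identifiers and numbers with placeholders and groups lines that end up with the same shape. The names `normalize`, `find_clones` and the `KEYWORDS` set are assumptions made for this sketch.

```python
import re
from collections import defaultdict

# A tiny, incomplete keyword list; enough for the illustration only.
KEYWORDS = {"return", "if", "else", "for", "in", "def"}


def normalize(line):
    # Replace every identifier (except keywords) with ID and every number with N,
    # so lines that differ only in names or constants get the same shape.
    def replace_identifier(match):
        word = match.group(0)
        return word if word in KEYWORDS else "ID"

    shape = re.sub(r"[A-Za-z_]\w*", replace_identifier, line.strip())
    return re.sub(r"\d+", "N", shape)


def find_clones(source):
    # Map each normalized shape to the 1-based line numbers where it occurs,
    # keeping only shapes that occur more than once.
    groups = defaultdict(list)
    for number, line in enumerate(source.splitlines(), start=1):
        if line.strip():
            groups[normalize(line)].append(number)
    return {shape: numbers for shape, numbers in groups.items() if len(numbers) > 1}


sample = "a = b + 1\ntotal = count + 2\nprint(a)\n"
print(find_clones(sample))
```

Each group is a candidate for one shared function whose parameters are the names and constants that differ between the copies.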
This is a very common problem in scientific computing. Some of the most effective ideas for reducing the size and complexity of code require leveraging assumptions, and science demands that you constantly change those assumptions.
All you can do is try to refactor your code as you go, and try not to write yourself into any corners. Also work with good people who understand the value of not making a mess.

Do you refactor in small steps?

Having spent a while reading Fowler's "Refactoring", I still often catch myself thinking "I should have done this in smaller steps." -- even when I did not break my code.
Refactoring in small steps is safe, but costs time. It's a trade-off between speed and risk -- I try to be strategic in choosing how I refactor.
Nevertheless: most of the time I refactor in larger steps. If I take some of Fowler's "Mechanics" sections and compare them with how I work, I find that I often leap two or five steps forward at once. This does not mean that I am a refactoring guru. My code may stay broken or uncompilable for 5 - 60 minutes.
Do you refactor in smaller steps and try to produce unbroken code in shorter frequencies? And: Are you successful in doing this?
Martin Fowler seems to lean towards the small, gradual refactoring approach. However, from reading his book, he does occasionally take some drastic steps, but only with unit tests to back up the code.
Refactoring is a controlled technique for improving the design of an existing code base. Its essence is applying a series of small behavior-preserving transformations, each of which "too small to be worth doing". However the cumulative effect of each of these transformations is quite significant. By doing them in small steps you reduce the risk of introducing errors. You also avoid having the system broken while you are carrying out the restructuring - which allows you to gradually refactor a system over an extended period of time. - Martin Fowler
I try :) The one urge I have to resist most while refactoring is actually making other changes along the way. Say I'm refactoring some code and see something unrelated in the code. I have to make a conscious effort not to go "fix" that as well. Make a note of it and move on. For one thing, it's a distraction from the task at hand. It also ends up polluting your change set so your commit message now has to document several seemingly random changes.
Yes, always. I think the real essence of refactoring is picking which steps to start with.
I find the thing with refactoring large changes in a safe manner is always to have a reasonably clear picture of where you want to go. Then consider your existing system and try to find out which pieces you can introduce that have the least likelihood of being a radical change. Then you can introduce these in a controlled and well tested manner.
So what you do is to work in the vicinity of the nastiness. Not always attacking directly from the front, but sometimes just chipping away small pieces. Usually I wait, and only go for the "big prize" after a few rounds of chipping away at minor nastiness. But I know where I want to go.
The nice thing about working this way is that you can maintain progress. You never "stop development to do refactoring". Arguably there are cases where stopping is the correct situation, but most of the time it's not.
The idea here is that if you "start" with cashing in the prize money, you will be spending the next X days doing the drudgery. And there's risk, maybe you chicken out or it doesn't work - or spend 6 months instead of a week. If you do the drudgery first, cashing in the prize will be possible with less risk. And your code will improve as you go. Sometimes you can decide that doing half the job was enough, since your understanding of the problem increases. Sometimes your idea of where you wanted to go was slightly botched, and you can realign your goal as you progress.
But it's tempting to go straight for the reward.
I tend to refactor in large steps most of the time so I can see the forest for the trees. It's a "stream of consciousness" kind of programming. As long as you have your last working version safe in your source control of choice...
That's where the "red, green, refactor" approach is useful. At each stage you have the ability to verify that your code's behaviour is unchanged, and the refactoring only has to integrate the new behaviour.
The rule of thumb I use is to refactor with tests, and only refactor as much code as you are confident to.
At 60 minutes, are you certain that your code is doing exactly what it should be? You'd need a lot of tests to pass. I would just try to get one going and then move on to the next.
If I have a clear picture of what I want to do, and if I can easily verify that I haven't broken anything afterwards, I am taking larger steps.
If the refactoring is more complicated, I try to break it down into smaller steps and do heavy testing after each one.
I usually refactor code as I change it. That is, instead of taking a piece of code and rewriting it while maintaining its function, I rewrite it towards a new functionality and in the process of doing so I improve the design of the code.
Often this means that by the time I've implemented the feature I was after I haven't done a complete and satisfactory refactoring of the old code. It is improved though, and I know I'll have the time to improve it further the next time I'm about to change its function.
For testing this means that I get to test both the refactoring and the new feature at the same time, which should save some time.
It also means that I only spend enough time on refactoring to improve the maintenance situation required for that particular feature. This should help to avoid over-engineering and/or wasting time refactoring stuff that already works and won't benefit from a better design. By focusing only on code I would change anyway, there is also a high probability I will revisit that code in the near future to do further changes while it's in the users' attention span.
Small discrete steps are what I'm most comfortable with, though at some points it can be a test of my self-control to rein in what could be a refactoring blood-bath. If I notice any improvements (no matter how large) that could be made, I make a note of them and consider how they'd be split up into individual refactoring tasks. Plus, having a saga of changes in the commit message doesn't help.
NB. The code-base I work on is quite old, and full of those mystical bugs named after scientists. With large portions still lacking anything near even 50% test coverage it would be careless to get carried away.
Yep. I like to run the tests continually, and so a chain of tiny refactors works well. I get really uncomfortable having my code broken for more than a few minutes at a time, and I generally revert if my code is broken when I go home at night; the rewrite the next morning ALWAYS works better than trying to pick up where I was.

What are some reasons why a sole developer should use TDD? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
I'm a contract programmer with lots of experience. I'm used to being hired by a client to go in and do a software project of one form or another on my own, usually from nothing. That means a clean slate, almost every time. I can bring in libraries I've developed to get a quick start, but they're always optional (and depend on getting the right IP clauses in the contract). Many times I can specify or even design the hardware platform... so we're talking serious freedom here.
I can see uses for constructing automated tests for certain code: Libraries with more than trivial functionality, core functionality with a high number of references, etc. Basically, as the value of a piece of code goes up through heavy use, I can see it would be more and more valuable to automatically test that code so that I know I don't break it.
However, in my situation, I find it hard to rationalize anything more than that. I'll adopt things as they prove useful, but I'm not about to blindly follow anything.
I find many of the things I do in 'maintenance' are actually small design changes. In this case, the tests would not have saved me anything and now they'd have to change too. A highly iterative, stub-first design approach works very well for me. I can't see actually saving myself that much time with more extensive tests.
Hobby projects are even harder to justify... they're usually anything from weekenders up to, say, a month long. Edge-case bugs rarely matter, it's all about playing with something.
Reading questions such as this one, the most voted on response seems to say that in that poster's experience/opinion TDD actually wastes time if you've got fewer than 5 people (even assuming a certain level of competence/experience with TDD). However, that appears to be covering initial development time, not maintenance. It's not clear how TDD stacks up over the entire life cycle of a project.
I think TDD could be a good step in the worthwhile goal of improving the quality of the products of our industry as a whole. Idealism on its own is no longer all that effective at motivating me, though.
I do think TDD would be a good approach in large teams, or any size team containing at least one unreliable programmer. That's not my question.
Why would a sole developer with a good track record adopt TDD?
I'd love to hear of any kind of metrics done (formally or not) on TDD... focusing on solo developers or very small teams.
Failing that, anecdotes of your personal experiences would be nice, too. :)
Please avoid stating opinion without experience to back it. Let's not make this an ideology war. Also, skip the greater employment options argument. This is simply an efficiency question.
I'm not about to blindly follow anything.
That's the right attitude. I use TDD all the time, but I don't adhere to it as strictly as some.
The best argument (in my mind) in favor of TDD is that you get a set of tests you can run when you finally get to the refactoring and maintenance phases of your project. If this is your only reason for using TDD, then you can write the tests any time you want, instead of blindly following the methodology.
The other reason I use TDD is that writing tests gets me thinking about my API up front. I'm forced to think about how I'm going to use a class before I write it. Getting my head into the project at this high level works for me. There are other ways to do this, and if you've found other methods (there are plenty) to do the same thing, then I'd say keep doing what works for you.
I find it even more useful when flying solo. With nobody around to bounce ideas off of and nobody around to perform peer reviews, you will need some assurance that your code is solid. TDD/BDD will provide that assurance for you. TDD is a bit controversial, though. Others may completely disagree with what I'm saying.
EDIT: Might I add that if done right, you can actually generate specifications for your software at the same time you write tests. This is a great side effect of BDD. You can make yourself look like super developer if you're cranking out solid code along with specs, all on your own.
Ok my turn... I'd do TDD even on my own (for non-spike/experimental/prototype code) because
Think before you leap: forces me to think what I want to get done before I start cranking out code. What am I trying to accomplish here.. 'If I assume I already had this piece.. how would I expect it to work?' Encourages interface-in design of objects.
Easier to change: I can make modifications with confidence.. 'I didn't break anything in step1-10 when I changed step5.' Regression testing is instantaneous.
Better designs emerge: I've found better designs emerging without me investing effort in a design activity. test-first + Refactoring lead to loosely coupled, minimal classes with minimal methods.. no overengineering.. no YAGNI code. The classes have better public interfaces, small methods and are more readable. This is kind of a zen thing.. you only notice you got it when you 'get it'.
The debugger is not my crutch anymore: I know what my program does.. without having to spend hours stepping thru my own code. Nowadays if I spend more than 10 mins with the debugger.. mental alarms start ringing.
Helps me go home on time: I have noticed a marked decrease in the number of bugs in my code since TDD.. even if the assert is like a Console trace and not an xUnit type AT.
Productivity / Flow: it helps me to identify the next discrete baby-step that will take me towards done... keeps the snowball rolling. TDD helps me get into a rhythm (or what XPers call flow) quicker. I get a bigger chunk of quality work done per unit time than before. The red-green-refactor cycle turns into... a kind of perpetual motion machine.
I can prove that my code works at the touch of a button
Practice makes perfect: I find myself learning & spotting dragons faster.. with more TDD time under my belt. Maybe dissonance.. but I feel that TDD has made me a better programmer even when I don't go test first. Spotting refactoring opportunities has become second nature...
I'll update if I think of any more.. this is what I came up with in the last 2 mins of reflection.
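To make the "think before you leap" and red-green-refactor points above concrete, here is a minimal sketch in Python. The `Cart` class and its methods are hypothetical names invented for illustration; the point is only that the test is written first, as the first "client" of the API, and the implementation follows.

```python
# Step 1 (red): write the tests first, against an API that does not exist yet.
# Cart, add() and total() are hypothetical names chosen for this sketch.
def test_cart_total_applies_quantity():
    cart = Cart()
    cart.add("widget", unit_price=250, quantity=2)  # prices in cents
    cart.add("gadget", unit_price=100)
    assert cart.total() == 600


def test_empty_cart_totals_zero():
    assert Cart().total() == 0


# Step 2 (green): the smallest implementation that makes the tests pass.
class Cart:
    def __init__(self):
        self._lines = []

    def add(self, name, unit_price, quantity=1):
        self._lines.append((name, unit_price, quantity))

    def total(self):
        return sum(price * qty for _, price, qty in self._lines)


# Step 3 (refactor): restructure freely, re-running the tests after each step.
if __name__ == "__main__":
    test_cart_total_applies_quantity()
    test_empty_cart_totals_zero()
    print("all tests pass")
```

The test functions use plain `assert`, so they can be collected by a runner such as pytest or simply called directly as shown.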
I'm also a contract programmer. Here are my 12 Reasons Why I Love Unit Tests.
My best experience with TDD is centered around the pyftpdlib project. Most of the development is done by the original author, and I've made a few small contributions, but it's essentially a solo project. The test suite for the project is very thorough, and tests all the major features of the FTPd library. Before checking in changes or releasing a version, all tests are checked, and when a new feature is added, the test suite is always updated as well.
As a result of this approach, this is the only project I've ever worked on that didn't have showstopper bugs appear after a new release, have changes checked in that broke a major feature, etc. The code is very solid and I've been consistently impressed with how few bug reports have been opened during the life of the project. I (and the original author) attribute much of this success to the comprehensive test suite and the ability to test every major code path at will.
From a logical perspective, any code you write has to be tested, and without TDD you'll be testing it yourself manually. On the flip side to pyftpdlib, the worst code by number of bugs and frequency of major issues is code that is/was solely being tested by the developers and QA trying out new features manually. Things don't get tested because of time crunch or falling through the cracks. Old code paths are forgotten and even the oldest stable features end up breaking, and major releases end up with important features non-functional. Manual testing is critically important for verification and some randomization of testing, but based on my experiences I'd say that it's essential to have both manual testing and a carefully constructed unit test framework. Between the two approaches the gaps in coverage are smaller, and your likelihood of problems can only be reduced.
It does not matter whether you are the sole developer or not. You have to think of it from the application's point of view. All applications need to work properly, all applications need to be maintained, and all applications need to be less buggy. There are of course certain scenarios where a TDD approach might not suit you, such as when a deadline is approaching very fast and there is no time to perform unit testing.
Anyways, TDD does not depend on a solo or a team environment. It depends on the application as a whole.
I don't have an enormous amount of experience, but I have had the experience of seeing sharply-contrasted approaches to testing.
In one job, there was no automated testing. "Testing" consisted of poking around in the application, trying whatever popped in your head, to see if it broke. Needless to say, it was easy for flat-out-broken code to reach our production server.
In my current job, there is lots of automated testing, and a full CI-system. Now when code gets broken, it is immediately obvious. Not only that, but as I work, the tests really document which features are working in my code, and which aren't yet. It gives me great confidence to be able to add new features, knowing that if I break existing ones, it won't go unnoticed.
So, to me, it depends not so much on the size of the team, but the size of the application. Can you keep track of every part of the application? Every requirement? Every test you need to run to make sure the application is working? What does it even mean to say that the application is "working", if you don't have tests to prove it?
Just my $0.02.
Tests allow you to refactor with confidence that you are not breaking the system. Writing the tests first allows the tests to define what is working behavior for the system. Any behavior that isn't defined by the test is by definition a by-product and allowed to change when refactoring. Writing tests first also drives the design in good directions. To support testability you find that you need to decouple classes, use interfaces, and follow good patterns (Inversion of Control, for instance) to make your code easily testable. If you write tests afterwards, you can't be sure that you've covered all the behavior expected of your system in the tests. You also find that some things are hard to test because of the design -- since it was likely developed without testing in mind -- and are tempted to skimp on or omit tests.
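As an illustration of how testability pushes toward decoupling and Inversion of Control, here is a small Python sketch. `InvoiceService` and its `today` parameter are hypothetical; the idea is that the class receives its clock from outside rather than reaching for the system date itself, so a test can supply a fixed date.

```python
from datetime import date


class InvoiceService:
    """Hypothetical service that gets its clock injected instead of
    calling date.today() internally."""

    def __init__(self, today=date.today):
        # In production the default is the real clock; tests pass a stub.
        self._today = today

    def is_overdue(self, due_date):
        return self._today() > due_date


def test_invoice_is_overdue_the_day_after_due_date():
    service = InvoiceService(today=lambda: date(2024, 3, 2))
    assert service.is_overdue(date(2024, 3, 1))


def test_invoice_is_not_overdue_on_due_date():
    service = InvoiceService(today=lambda: date(2024, 3, 1))
    assert not service.is_overdue(date(2024, 3, 1))
```

Had `is_overdue` called `date.today()` directly, the result of the test would depend on the day it was run, which is the kind of design problem that writing the test first tends to surface early.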
I generally work solo and mostly do TDD -- the cases where I don't are simply where I fail to live up to my practices or haven't yet found a good way that works for me to do TDD, for example with web interfaces.
TDD is not about testing; it's about writing code. As such, it provides a lot of benefits to even a single developer. For many developers it is a mindshift to write more robust code. For example, how often do you think "Now how can this code fail?" after writing code without TDD? For many developers, the answer to that question is never. For TDD practitioners it shifts the mindset to doing things like checking if objects or strings are null before doing something with them because you are writing tests to specifically do that (break the code).
Another major reason is change. Anytime you deal with a customer, they can never seem to make up their minds. The only constant is change. TDD helps as a "safety net" to find all the other areas that could break. Even on small projects this can keep you from burning up precious time in the debugger.
I could go on and on, but I think saying that TDD is more about writing code than anything should be enough to justify its use as a sole developer.
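A tiny sketch of the "how can this code fail?" habit, in Python. The `initials` function is a made-up example; the tests deliberately try to break it with missing input before checking the normal case.

```python
def initials(full_name):
    # Guard added because a test below feeds in None and blank strings.
    if full_name is None or not full_name.strip():
        return ""
    return "".join(part[0].upper() for part in full_name.split())


def test_initials_handles_missing_input():
    assert initials(None) == ""
    assert initials("") == ""
    assert initials("   ") == ""


def test_initials_happy_path():
    assert initials("ada lovelace") == "AL"
```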
I tend to agree with the validity of your point about the overhead of TDD for 'one developer' or 'hobby' projects not justifying the expenses.
You have to consider however that most best practices are relevant and useful if they are consistently applied for a long period of time.
For example, TDD saves you testing/bugfixing time in the long run, not within 5 minutes after you've created the first unit test.
You're a contract programmer, which means that you will leave your current project when it is finished and will switch to something else, most likely in another company. Your current client will have to maintain and support your application. If you do not leave the support team a good framework to work with they will be stuck. TDD will help the project to be sustainable. It will increase the stability of the code base so other people with less experience will not be able to do too much damage trying to change it.
The same applies for the hobby projects. You may be tired of it and will want to pass it to someone. You might become commercially successful (think Craigslist) and will have 5 more people working besides you.
Investment in proper process always pays off, even if it is just gained experience. But most of the time you will be grateful that when you started a new project you decided to do it properly.
You have to consider OTHER people when doing something. You have to think ahead, plan for growth, plan for sustainability.
If you don't want to do that - stick to the cowboy coding, it's much simpler this way.
P.S. The same thing applies to other practices:
If you don't comment your code and you have ideal memory you'll be fine but someone else reading your code will not.
If you don't document your discussions with the customer somebody else will not know anything about a crucial decision you made
etc ad infinitum
I no longer refactor anything without a reasonable set of unit tests.
I don't do full-on TDD with unit tests first and code second. I do CALTAL -- Code A Little, Test A Little -- development. Generally, code goes first, but not always.
When I find that I've got to refactor, I make sure I've got enough tests and then I hack away at the structure with complete confidence that I don't have to keep the entire old-architecture-becomes-new-architecture plan in my head. I just have to get the tests to pass again.
I refactor the important bits. Get the existing suite of tests to pass.
Then I realize I forgot something, and I'm back to CALTAL development on the new stuff.
Then I see things I forgot to delete -- but are they really unused everywhere? Delete 'em and see what fails in the testing.
Just yesterday -- part way through a big refactoring -- I realized that I still didn't have the exact right design. But the tests still had to pass, so I was free to refactor my refactoring before I was even done with the first refactoring. (whew!) And it all worked nicely because I had a set of tests to validate the changes against.
For flying solo TDD is my copilot.
TDD lets me more clearly define the problem in my head. That helps me focus on implementing just the functionality that is required, and nothing more. It also helps me create a better API, because I'm writing a "client" before I write the code itself. I can also refactor without having to worry about breaking anything.
I'm going to answer this question quite quickly, and hopefully you will start to see some of the reasoning, even if you still disagree. :)
If you are lucky enough to be on a long-running project, then there will be times when you want to, for example, write your data tier first, then maybe the business tier, before moving on up the stack. If your client then makes a requirement change that requires re-work on your data layer, a set of unit tests on the data layer will ensure that your methods don't fail in undesirable ways (assuming you update the tests to reflect the new requirements). However, you are likely to be calling the data layer method from the business layer as well, and possibly in several places.
Let's assume you have 3 calls to a method in the business layer, but you only modify 2. In the third method, you may still be getting data back from your data layer that appears to be valid, but may break some of the assumptions you coded months before. Unit tests at this level (and above) should have been designed to spot broken assumptions, and in failing they should highlight to you that there is a section of code that needs to be revisited.
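The scenario above can be sketched in Python as follows. Everything here is hypothetical: `fetch_orders_v1` stands for the data layer before the requirement change, `fetch_orders_v2` for the data layer after it, and `outstanding_balance` for the third business-layer caller that nobody remembered to update.

```python
def fetch_orders_v1(customer_id):
    # Data layer before the change: only active orders are returned.
    return [{"id": 1, "amount": 500, "status": "active"}]


def fetch_orders_v2(customer_id):
    # Data layer after the change: cancelled orders are now returned too.
    return [
        {"id": 1, "amount": 500, "status": "active"},
        {"id": 2, "amount": 300, "status": "cancelled"},
    ]


def outstanding_balance(customer_id, fetch_orders):
    # The un-updated caller: silently assumes every order is active.
    return sum(order["amount"] for order in fetch_orders(customer_id))


def check_outstanding_balance(fetch_orders):
    # Business-layer test that states the old assumption explicitly.
    assert outstanding_balance(42, fetch_orders) == 500


if __name__ == "__main__":
    check_outstanding_balance(fetch_orders_v1)  # passes against the old layer
    try:
        check_outstanding_balance(fetch_orders_v2)
    except AssertionError:
        print("business-layer test caught the stale assumption")
```

The data still "appears to be valid" with the new layer, since no exception is raised inside `outstanding_balance`; only the business-layer test notices that the answer is no longer the one the code was written to produce.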
I'm hoping that this very simplistic example will be enough to get you thinking about TDD a little more, and that it might create a spark that makes you consider using it. Of course, if you still don't see the point, and you are confident in your own abilities to keep track of many thousands of lines of code, then I have no place to tell you you should start TDD.
The point about writing the tests first is that it enforces the requirements and design decisions you are making. When I mod the code, I want to make sure those are still enforced and it is easy enough to "break" something without getting a compiler or run-time error.
I have a test-first approach because I want to have a high degree of confidence in my code. Granted, the tests need to be good tests or they don't enforce anything.
I've got some pretty large code bases that I work on and there is a lot of non-trivial stuff going on. It is easy enough to make changes that ripple and suddenly X happens when X should never happen. My tests have saved me on several occasions from making a critical (but subtle) error that might have gone unnoticed by human testers.
When the tests do fail, they are opportunities to look at them and the production code and make sure that it is correct. Sometimes the design changes and the tests will need to be modified. Sometimes I'll write something that passes 99 out of 100 tests. That 1 test that didn't pass is like a co-worker reviewing my code (in a sense) to make sure I'm still building what I'm supposed to be building.
I feel that as a solo developer on a project, especially a larger one, you tend to be spread pretty thin.
You are in the middle of a large refactoring when all of a sudden a couple of critical bugs are detected that for some reason did not show up during pre-release testing. In this case you have to drop everything and fix them and after having spent two weeks tearing your hair out you can finally get back to whatever you were doing before.
A week later one of your largest customers realizes that they absolutely must have this cool new shiny feature or otherwise they won't place the order for those 1M units they should have already ordered a month ago.
Now, three months later you don't even remember why you started refactoring in the first place let alone what the code you are refactoring was supposed to do. Thank god you did a good job writing those unit tests because at least they tell you that your refactored code is still doing what it was supposed to do.
Lather, rinse, repeat.
..story of my life for the past 6 months. :-/
A sole developer should use TDD on his project (track record does not matter), since eventually this project could be passed to some other developer. Or more developers could be brought in.
New people will have an extremely hard time working with the code without the tests. They will break things.
Does your client own the source code when you deliver the product? If you can convince them that delivering the product with unit tests adds value, then you are up-selling your services and delivering a better product. From the client's perspective, test coverage not only ensures quality, it allows future maintainers to understand the code much more readily since the tests isolate functionality from the UI.
I think TDD as a methodology is not just about "having tests when making changes", thus it does not depend on team size nor on project size. It's about noting one's expectations about what a piece of code/an application does BEFORE one starts to really think about HOW the noted behaviour is implemented. The main focus of TDD is not only having tests in place for written code but writing less code because you just do what makes the test green (and refactor later).
If you're like me and find it quite hard to think about what a part/the whole application does WITHOUT thinking about how to implement it, I think it's fine to write your test after your code and thus let the code "drive" the tests.
If your question isn't so much about test-first (TDD) or test-after (good coding?) I think testing should be standard practise for any developer, whether alone or in a big team, who creates code which stays in production longer than three months. In my experience that's the time-span after which even the original author has to think hard about what these twenty lines of complex, super-optimized, but sparsely documented code really do. If you've got tests (which cover all paths through the code), there's less to think about - and less to ERR about, even years later...
Here are a few memes and my responses:
"TDD made me think about how it would fail, which made me a better programmer"
Given enough experience, being highly concerned with failure modes should naturally become part of your process anyway.
"Applications need to work properly"
This assumes you are able to test absolutely everything. You're not going to be any better at covering all possible tests correctly than you were at writing the functional code correctly in the first place. "Applications need to work better" is a much better argument. I agree with that, but it's idealistic and not quite tangible enough to motivate as much as I wish it would. Metrics/anecdotes would be great here.
"Worked great for my <library component X>"
I said in the question I saw value in these cases, but thanks for the anecdote.
"Think of the next developer"
This is probably one of the best arguments to me. However, it is quite likely that the next developer wouldn't practice TDD either, and it would therefore be a waste or possibly even a burden in that case. Back-door evangelism is what it amounts to there. I'm quite sure a TDD developer would really appreciate it, though.
How much are you going to appreciate projects done in deprecated must-do methodologies when you inherit one? RUP, anyone? Think of what TDD means to the next developer if TDD isn't as great as everyone thinks it is.
"Refactoring is a lot easier"
Refactoring is a skill like any other, and iterative development certainly requires this skill. I tend to throw away considerable amounts of code if I think the new design will save time in the long run, and it feels like there would be an awful number of tests thrown away too. Which is more efficient? I don't know.
...
I would probably recommend some level of TDD to anyone new... but I'm still having trouble with the benefits for anyone who's been around the block a few times already. I will probably start adding automated tests to libraries. It's possible that after doing that, I'll see more value in doing it generally.
Motivated self interest.
In my case, sole developer translates to small business owner. I've written a reasonable amount of library code to (ostensibly) make my life easier. A lot of these routines and classes aren't rocket science, so I can be pretty sure they work properly (at least in most cases) by reviewing the code, doing some spot testing and debugging into the methods to make sure they behave the way I think they do. Brute force, if you will. Life is good.
Over time, this library grows and gets used in more projects for different customers. Testing gets more time consuming. Especially cases where I'm (hopefully) fixing bugs and (even more hopefully) not breaking something else. And this isn't just for bugs in my code. I have to be careful adding functionality (customers keep asking for more "stuff") or making sure code still works when moved to a new version of my compiler (Delphi!), third party code, runtime environment or operating system.
Taken to the extreme, I could spend more time reviewing old code than working on new (read: billable) projects. Think of it as the angle of repose of software (how high can you stack untested software before it falls over :).
Techniques like TDD give me methods and classes that are more thoughtfully designed, more thoroughly tested (before the customer gets them) and need less maintenance going forward.
Ultimately, it translates to less time doing maintenance and more time to spend doing things that are more profitable, more interesting (almost anything) and more important (like family).
We are all developers with a good track record. After all, we are all reading Stackoverflow. And many of us use TDD and perhaps those people have a great track record. I get hired because people want someone who writes great test automation and can teach that to others. When working alone, I do TDD on my coding projects at home because I found that if I don’t, I spend time doing manual testing or even debugging, and who needs that. (Perhaps those people have only good track records. I don’t know.)
When it comes to being a good automobile driver, everyone believes they are a “good driver.” This is a cognitive bias all drivers have. Programmers have their own biases. The reasons developers such as the OP don’t do TDD are covered in this Agile Thoughts podcast series. The podcast archive also has content on test automation concepts such as the test pyramid, and an intro about what is TDD and why write tests first starting with episode 9 in the podcast archive.

At what point does refactoring become not worth it?

Say you have a program that currently functions the way it is supposed to. The application has very poor code behind it, eats up a lot of memory, is unscalable and would take major rewriting to implement any changes in functionality.
At what point does refactoring become less logical than a total rebuild?
Joel wrote a nice essay about this very topic:
Things You Should Never Do, Part 1
The key lesson I got from this is that although the old code is horrible, hurts your eyes and your aesthetic sense, there's a pretty good chance that a lot of that code is patching undocumented errors and problems. I.e., it has a lot of domain knowledge embedded in it and it will be difficult or impossible for you to replicate it. You'll constantly be hitting against bugs-of-omission.
A book I found immensely useful is Working Effectively With Legacy Code by Michael C. Feathers. It offers strategies and methods for approaching even truly ugly legacy code.
One benefit of refactoring over rebuilding is that IF you can do refactoring step by step, i.e. in increments, you can test the increments in the context of the whole system, making development and debugging faster.
Old and deployed code, even when ugly and slow, has the benefit of having been tested thoroughly, and this benefit is lost if you start from scratch.
An incremental refactoring approach also helps to ensure that there is always a product available which can be shipped (and it's improving constantly).
There is a nice article on the web about how Netscape 6 was written from scratch and it was business-wise a bad idea.
Robert L. Glass suggests that
Modification of reused code is particularly error-prone. If more than 20 to 25 percent of a component is to be revised, it is more efficient and effective to write it from scratch.
Well, the simplest answer is if it will take longer to refactor than it will to rebuild, then you should just rebuild.
If it's a personal project then you might want to rebuild it anyway as you will probably learn more from building from scratch than you would from refactoring, and that's one big objective of personal projects.
However, in a professional time-limited environment, you should always go with whatever costs the company the least amount of money (for the same payoff) in the long run, which means choosing whichever takes less time.
Of course, it can be a little more complicated than that. If other people can be working on features while the refactoring is being done, then that might be a better choice over having everyone wait for a completely new version to be built. In that case rebuilding might take less time than just the refactoring would have taken, but you need to take the entire project and all contributors of the project into account.
When you spend more time refactoring than actually writing code.
At the point where the software doesn't do what it's supposed to do. Refactoring (changing the code without changing the functionality) makes sense if and only if the functionality is "as intended".
If you can afford the time to completely rebuild the app, don't need to improve functionality incrementally, and don't wish to retain any of the existing code then rewriting is certainly a viable alternative. You can, on the other hand, use refactoring to do an incremental rewrite by slowly replacing the existing functions with equivalent functions that are better written and more efficient.
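One way to picture the incremental rewrite described above is to keep the old function in place, write its replacement next to it, and check that the two agree before switching callers over. This is only a sketch in Python; `legacy_order_total` and `order_total` are invented names, and the sample inputs are assumptions about what callers pass.

```python
def legacy_order_total(items):
    # Existing implementation (hypothetical): correct, but clumsy.
    total = 0
    i = 0
    while i < len(items):
        price, qty = items[i]
        if qty > 0:
            total = total + price * qty
        i = i + 1
    return total


def order_total(items):
    # Replacement: same behaviour, including ignoring non-positive quantities.
    return sum(price * qty for price, qty in items if qty > 0)


# Sample inputs, each a list of (price, quantity) pairs.
SAMPLES = [
    [],
    [(100, 1)],
    [(100, 2), (50, 0)],
    [(250, 3), (10, -1), (5, 4)],
]


def test_replacement_matches_legacy():
    for items in SAMPLES:
        assert order_total(items) == legacy_order_total(items)
```

Once the comparison passes, callers can be moved to the new function one at a time, and the legacy version deleted when nothing refers to it any more.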
If the application is very small, then you can rewrite it from scratch. If the application is big, never do it. Rewrite it progressively, one step at a time validating you didn't break anything.
The application is the specification. If you rewrite it from scratch you will most likely run into a lot of insidious bugs because "no one knew that the call to this function was supposed to return 3 in that very specific case" (undocumented behaviour...).
It's always more fun to rewrite from scratch so your brain might trick you into thinking it's the right choice. Be careful, it's most likely not.
I've worked with such applications in the past. The best approach I've found is a gradual one: When you are working on the code, find things that are done multiple times, group them together in functions. Keep a notebook (you know, a real one, with paper, and a pencil or pen) so that you can mark your progress. Use that in combination with your VCS, not instead of it. The notebook can be used to provide an overview of the new functions you've created as part of the refactoring, and the VCS of course fills in the blanks for the details.
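The "find things that are done multiple times, group them together in functions" step might look like this in Python. The email-handling functions are hypothetical; the `_before` versions stand for the existing code and are kept only so the sketch can compare old and new behaviour.

```python
# Before: the same normalization is written out in two places.
def register_before(email, users):
    cleaned = email.strip().lower()
    if cleaned in users:
        return False
    users.add(cleaned)
    return True


def is_registered_before(email, users):
    return email.strip().lower() in users


# After: the repeated step lives in one named function.
def normalize_email(email):
    return email.strip().lower()


def register(email, users):
    cleaned = normalize_email(email)
    if cleaned in users:
        return False
    users.add(cleaned)
    return True


def is_registered(email, users):
    return normalize_email(email) in users


def test_grouped_version_behaves_like_the_original():
    old_users, new_users = set(), set()
    for email in ["  Ann@Example.com ", "ann@example.com", "bob@example.com"]:
        assert register(email, new_users) == register_before(email, old_users)
    assert new_users == old_users
    assert is_registered("ANN@example.com", new_users)
```

A new function like `normalize_email` is the kind of thing the notebook overview would list, with the VCS history holding the details of each call site that was changed.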
Over time, you will have consolidated a lot of code into more appropriate places. Avoiding code duplication entirely during this period of time is going to be next to impossible, so just do it as best as you can until you've reached a point where you can really start the refactoring process, auditing the entire code base and working on it as a whole.
If you've not enough time for that process (which will take a very long time), then rewriting from scratch using a test-first approach is probably better.
One option would be to write unit tests to cover the existing application and then start to refactor it bit by bit, using the unit tests to make sure everything works as before.
In an ideal world you'd already have unit tests for the program, but given your comments about the quality of the app I'm guessing you don't...
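When no tests exist yet, one common way to start is to record what the code currently does and pin that down before touching it. The sketch below assumes a hypothetical `legacy_shipping_cost` function with an undocumented special case; the expected values in the test were obtained by running the existing code, not taken from any specification.

```python
def legacy_shipping_cost(weight_kg):
    # Hypothetical existing code with an undocumented quirk.
    if weight_kg <= 0:
        return 3  # nobody remembers why, but callers may rely on it
    if weight_kg < 5:
        return 5
    return 5 + (weight_kg - 5) * 2


def test_pin_current_shipping_cost_behaviour():
    # Values observed from the current implementation, quirks included.
    observed = {0: 3, -1: 3, 1: 5, 4.9: 5, 5: 5, 7: 9}
    for weight, cost in observed.items():
        assert legacy_shipping_cost(weight) == cost
```

With a test like this in place, each small refactoring step can be followed by a test run, and any change in behaviour, intended or not, shows up immediately.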
No document, no original writer, no test case, and a bunch of remaining bugs.
Uncle Bob weighs in with the following:
When is a redesign the right strategy?
I’m glad you asked that question. Here’s the answer. Never.
Look, you made the mess, now clean it up.
I’ve not had much luck with small incremental changes when the code I inherit is really bad. In theory the small incremental approach sounds good, but in practice all it ends up with is a better, but still poorly designed application that everyone thinks is now YOUR design. When things break, people no longer think it is because of the previous code, it now becomes YOUR fault. So, I would not use the word redesign, refactor or anything else that implies to a manager type that you are changing things to your way unless I was really going to do it my way. Otherwise, even though you may have fixed dozens of problems, any problems that still existed (but weren’t discovered) are now going to be attributed to your rework. And be assured that if the code is bad then your fixes will uncover a lot more bugs that were simply ignored before because the code was so bad to begin with.
If you truly know how to develop software systems then I would do a redesign of the whole system. If you don’t TRULY know how to design GOOD software then I’d say stick with the small incremental changes as you may otherwise end up with a code base that is just as bad as the original.
One mistake that is often made when redesigning is that people ignore the original code base. However, redesign does not have to mean totally ignore the old code. The old code still had to do what your new code has to do, so in many cases the steps you need are already in the old code. Copy and Paste then tweak works wonders when redesigning systems. I have found that in many cases, redesigning and rewriting an application and stealing snippets from the original code is far quicker and much more reliable than small incremental changes.

Resources