For a number of years now, I have been interested in TDD, but one or two things just didn't click. I am pretty sure these are the usual thoughts most people have when trying it: "The examples in the book are wonderful, but my code is a lot more complicated than that. I never have a procedure that does one thing; it will call three others, and they will call three others, and that will get data from the DB... bla bla bla".
A little while ago, I found some videos on SOLID (anyone who is stuck thinking "TDD would be awesome, but..." should find a few videos on SOLID, trust me). Each point became slightly more confusing until, at the end, everything just fell into place, including how I thought about testing code, and TDD.
I, of course, have a lot of old code that isn't written like this, but I am okay with that, because I now have a better idea of how it should be. And whenever I work on anything, I can take it out and do it properly (even when that means cutting out the small part of a method that needs updating, giving it its own class, and calling that).
I do have a few more questions. I would like to know where I might be able to find answers to them, or whether there is a standard.
How much should be tested?
My assumption is all of it. A lot of my functions will take input parameters and run a stored procedure. My guess on how to test that would be: with a given set of input parameters, is the stored procedure being called the correct one, and are the parameters being passed in correctly? Often this will be obvious (sometimes there will be an array of numbers as input that will be transformed to a comma-separated string). If nothing else, such a test, while it might not be as valuable, will serve as documentation.
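The kind of test described above might look like this minimal sketch, written in Python for brevity. The names `FakeDb`, `update_employee_roles` and `usp_UpdateEmployeeRoles` are hypothetical, made up for illustration; the only assumption is that the function receives its database dependency, so a test can record which procedure was called and with which parameters.

```python
class FakeDb:
    """Test double that records stored-procedure calls instead of running them."""

    def __init__(self):
        self.calls = []

    def execute_procedure(self, name, params):
        self.calls.append((name, params))


def update_employee_roles(db, employee_id, role_ids):
    """Hypothetical data-access function: joins the ids and calls one procedure."""
    db.execute_procedure(
        "usp_UpdateEmployeeRoles",
        {"EmployeeId": employee_id, "RoleIds": ",".join(str(r) for r in role_ids)},
    )


def test_update_employee_roles_calls_procedure_with_joined_ids():
    db = FakeDb()
    update_employee_roles(db, 42, [1, 2, 3])
    # Exactly one call, to the expected procedure, with the ids joined by commas.
    assert db.calls == [
        ("usp_UpdateEmployeeRoles", {"EmployeeId": 42, "RoleIds": "1,2,3"})
    ]


test_update_employee_roles_calls_procedure_with_joined_ids()
```

The test also documents the one non-obvious step, the array-to-string conversion.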
How do I name things?
This is the old problem with development. Should the class be named like the method would be, UpdateEmployee, or should there be a whole lot of "-er" classes (EmployeeUpdater, EmployeeGetter, etc.)?
How is IOC generally handled?
This is still fine for now: I am creating interfaces, implementations, setting up IOC, etc.
I can see, though, that pretty soon I am going to have pages and pages of interface/class mappings in my IOC initialization method, or I would imagine it splitting into sections, with one method that calls a few other methods, each registering classes (by namespace, or something). Is this how it generally works, or are there smarter ways of managing this?
I recommend reading Clean Code by Robert C. Martin.
In my view...
How much should be tested?
There is a big difference between how much and how well.
Ultimately it's a judgment call and/or a simple cost/benefit analysis.
Critical apps/code should be tested more thoroughly.
Working pure TDD means your code will be highly tested - easily > 90% coverage, but remember there is a difference between test quality and coverage. You may decide to test more edge cases.
You can get 100% coverage with one test case, but it's pragmatic to test a range of values, e.g. 0, 1, many, and boundaries.
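As a rough illustration of "0, 1, many, and boundaries", here is a small Python sketch. The function `shipping_band` and its cut-off at 10 items are invented for the example; the point is only which inputs get picked.

```python
def shipping_band(item_count):
    """Hypothetical function with several branches and a boundary at 10 items."""
    if item_count < 0:
        raise ValueError("item_count must not be negative")
    if item_count == 0:
        return "none"
    if item_count == 1:
        return "single"
    return "small" if item_count <= 10 else "bulk"


# Zero, one, many, and the values on either side of the boundary.
cases = {0: "none", 1: "single", 5: "small", 10: "small", 11: "bulk"}
for count, expected in cases.items():
    assert shipping_band(count) == expected, count
```

An off-by-one mistake in the `<= 10` comparison would slip past a test that only used 5, which is why the two boundary values earn their place.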
How do I name things?
For Java, as an example, look at the standard Java API documentation and see how they do it.
Referring to Clean Code: naming is, and should be, difficult; rename (refactor) if the name no longer fits.
Example classes from Java's APIs
Names should make it obvious what the class/method/variable does.
Refer to Kent Beck's Four Rules of Simple Design (Express intent)
How is IOC generally handled?
Maybe someone else can expand on this point more, but referring to Extreme Programming, don't use interfaces for the sake of it, but when you need them. If you only have one concrete instance, you probably don't need an interface. Refactor to add interfaces to follow known design patterns when you have a real need for them.
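For what it's worth, the layout described in the question (one method that calls a few other methods, each registering one area) might look like the following Python sketch. The `Container` class here is a deliberately tiny stand-in written for this example, not the API of any real IOC library, and all class names are hypothetical.

```python
class Container:
    """Tiny stand-in for an IOC container (an assumption, not a real library)."""

    def __init__(self):
        self._factories = {}

    def register(self, interface, factory):
        self._factories[interface] = factory

    def resolve(self, interface):
        # Each factory receives the container so it can resolve its own dependencies.
        return self._factories[interface](self)


class EmployeeRepository:
    pass


class SqlEmployeeRepository(EmployeeRepository):
    pass


class EmployeeUpdater:
    def __init__(self, repository):
        self.repository = repository


def register_data_access(container):
    container.register(EmployeeRepository, lambda c: SqlEmployeeRepository())


def register_services(container):
    container.register(
        EmployeeUpdater, lambda c: EmployeeUpdater(c.resolve(EmployeeRepository))
    )


def build_container():
    """The single entry point: one short call per area instead of pages of mappings."""
    container = Container()
    for register in (register_data_access, register_services):
        register(container)
    return container


updater = build_container().resolve(EmployeeUpdater)
```

Many real containers also support some form of convention-based registration, which can shrink the list further; check the documentation of the one you use.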
Do you use any metrics to make a decision which parts of the code (classes, modules, libraries) shall be consolidated or refactored next?
I don't use any metrics which can be calculated automatically.
I use code smells and similar heuristics to detect bad code, and then I'll fix it as soon as I have noticed it. I don't have any checklist for looking for problems - mostly it's a gut feeling that "this code looks messy", followed by reasoning about why it is messy and figuring out a solution. Simple refactorings like giving a more descriptive name to a variable or extracting a method take only a few seconds. More intensive refactorings, such as extracting a class, might take up to an hour or two (in which case I might leave a TODO comment and refactor it later).
One important heuristic that I use is Single Responsibility Principle. It makes the classes nicely cohesive. In some cases I use the size of the class in lines of code as a heuristic for looking more carefully, whether a class has multiple responsibilities. In my current project I've noticed that when writing Java, most of the classes will be less than 100 lines long, and often when the size approaches 200 lines, the class does many unrelated things and it is possible to split it up, so as to get more focused cohesive classes.
Each time I need to add new functionality I search for already existing code that does something similar. Once I find such code I think of refactoring it to solve both the original task and the new one. Surely I don't decide to refactor each time - most often I reuse the code as it is.
I generally only refactor "on-demand", i.e. if I see a concrete, immediate problem with the code.
Often when I need to implement a new feature or fix a bug, I find that the current structure of the code makes this difficult, such as:
too many places to change because of copy&paste
unsuitable data structures
things hardcoded that need to change
methods/classes too big to understand
Then I will refactor.
I sometimes see code that seems problematic and which I'd like to change, but I resist the urge if the area is not currently being worked on.
I see refactoring as a balance between future-proofing the code, and doing things which do not really generate any immediate value. Therefore I would not normally refactor unless I see a concrete need.
I'd like to hear about experiences from people who refactor as a matter of routine. How do you stop yourself from polishing so much you lose time for important features?
We use cyclomatic complexity to identify the code that needs to be refactored next.
I use Source Monitor and routinely refactor methods when the complexity metric goes above around 8.0.
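To make the metric concrete, here is a rough Python sketch that approximates cyclomatic complexity as one plus the number of decision points in a function, using the standard `ast` module. This is a simplification assumed for illustration; it is not how Source Monitor computes its figure, and the choice of which node types count is my own.

```python
import ast

# Node types treated as decision points. A simplification, assumed for illustration.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approximate_complexity(func_node):
    """Return 1 + the number of decision points found inside a function node."""
    complexity = 1
    for node in ast.walk(func_node):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds one extra path per additional operand.
            complexity += len(node.values) - 1
    return complexity


def functions_over_threshold(source, threshold=8):
    """List (name, complexity) for functions whose score exceeds the threshold."""
    tree = ast.parse(source)
    return [
        (node.name, approximate_complexity(node))
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
        and approximate_complexity(node) > threshold
    ]
```

Calling `functions_over_threshold(open("some_module.py").read())` on a file of your own would give a candidate list for the next refactoring; nested functions are counted into their parent here, which inflates the parent's score slightly.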
I have some research code that's a real rat's nest, with code duplication everywhere, and clearly needs to be refactored. However, the code base is evolving as I come up with new variations on the theme and fit them into the codebase. The reason I've put off refactoring so long is because I feel like the minute I spend a few days coming up with good abstractions, seeing what design patterns fit where, etc., I'll want to try out some new unforeseen idea that makes my abstractions completely inadequate. In other words, because of the rate at which the code is evolving, I really have no idea where abstraction lines belong, even though there is no shortage of (approximate) duplication and the general messiness of the code makes adding stuff to it a real pain. What are some general best practices for coping with this kind of situation?
Don't spend so long refactoring!
When you're about to make a change in a piece of code, consider refactoring it to make the change easier.
After making the change, refactor again to clean up the damage done by that change.
In both cases, make the refactorings small and do them quickly, and move on.
You don't have to keep your code pristine at all times, but remember that it's easier to go fast if you have well-factored code to work in (and if you have good unit tests, of course).
Test Driven Development:
Red, Green, Refactor. Rinse, repeat.
Since it's one of the steps in every single cycle, you'll notice that's a LOT of usually minor refactoring taking place. That's the way it should be.
Your situation is pretty familiar to me. While doing investigative coding, often you have no idea what the "right" abstraction will be, and as you say it can change with every new idea. Other posters have suggested:
Continuous small refactoring, which helps to avoid getting into the rats-nest situation
Test-Driven Development, which helps to find good, re-usable abstractions. It's important to note that TDD is less about testing than about doing good designs!
However, for investigative research code there is another strategy: the prototype. This seems to be what you are currently doing: coding as quickly as possible to prove a concept. There's nothing wrong with that, but a prototype should always be throw-away. Tweak it until you have all the necessary input and knowledge, then throw away the code and start over with TDD and continuous refactoring, and all your other "doing the things right" strategies.
Don't keep any of the code. Don't copy-paste anything. Don't refer back to it. Just start over with your new knowledge.
Clean up the code a little bit at a time. Whenever you touch a class, try to leave it cleaner than it was before you touched it ("the boy scout rule"). Refactoring is best done in very small steps, but very often.
Things like renaming some variable, splitting a method etc. take only some seconds or minutes. Large refactorings such as splitting or joining classes, may take an hour or two (and you make it in small steps, so that all tests pass at least every five minutes - otherwise you have entered Refactoring Hell and you should revert to the last known working state). If it takes days or weeks for you to refactor something, then it's not anymore "refactoring" - it's more like rewriting.
An article about this topic:
Put it in a distributed SCM such as Git at least. That way, when you break something while refactoring, you can step back through the history to find the commit prior to the change, and you can work on changes and commit them in branches without interfering with others' work.
Git's branch merging is great for things like this, and you'll easily know if two people made incompatible changes in parallel, without having to worry about the rest of the code.
For the above reasons, I would also create a separate branch in the repository just for refactoring, and keep it updated regularly. This way, not only will others not interfere with your progress, but they can keep an eye on it and see changes that will eventually hit the main branch, so they can pre-emptively code around those changes.
If you already know where there is duplication, you don't need several days to refactor it away.
Sometimes a rewrite is the only choice. This seems to be the case.
The CloneDR finds duplicate code, both exact copies and near-misses, across large source systems, parameterized by language syntax. It supports Java, C#, COBOL, C++, PHP and many other languages.
When it shows a parameterized abstraction of a set of found clones, it is essentially proposing that you refactor the code with that abstraction implemented (as a method, a function, a class, ...).
So running the CloneDR gets a list of potential abstractions to be added to your code, and replacing the clone instances by calls on the abstraction refactors your code thus cleaning it up (somewhat).
Even more remarkably, when it shows the parameter bindings used at each clone site to invoke the abstraction, it often exposes a bungled clone instance, easily recognized when the bound parameters are conceptually inconsistent. If a parameter is bound to variables named YYYY-MM-DD, and one of them is YY-MM-DD, the "it's a 4-digit year" parameter type looks violated, and in this case there's a broken Y2K remediation. So examining the clone bindings often finds bugs.
This is a very common problem in scientific computing. Some of the most effective ideas for reducing the size and complexity of code require leveraging assumptions, and science demands that you constantly change those assumptions.
All you can do is try to refactor your code as you go, and try not to write yourself into any corners. Also work with good people who understand the value of not making a mess.
I work in a medium sized team and I run into these painfully large class files on a regular basis. My first tendency is to go at them with a knife, but that usually just makes matters worse and puts me into a bad state of mind.
For example, imagine you were just given a windows service to work on. Now there is a bug in this service and you need to figure out what the service does before you can have any hope of fixing it. You open the service up and see that someone decided to just use one file for everything. Start method is in there, Stop method, Timers, all the handling and functionality. I am talking thousands of lines of code. Methods under a hundred lines of code are rare.
Now assuming you cannot rewrite the entire class and these god classes are just going to keep popping up, what is the best way to deal with them? Where do you start? What do you try to accomplish first? How do you deal with this kind of thing and not just want to get all stabby.
If you have some strategy just to keep your temper in check, that is welcome as well.
Tips Thus Far:
Establish test coverage
Code folding
Reorganize existing methods
Document behavior as discovered
Aim for incremental improvement
Charles Conway recommended a podcast, which turned out to be very helpful.
Michael Feathers (the guy in the podcast) begins with the premise that we are too afraid to simply take a project out of source control, play with it directly, and then throw away the changes. I can say that I am guilty of this.
He essentially said to take the item you want to learn more about and just start pulling it apart. Discover its dependencies and then break them. Follow it through everywhere it goes.
Great Tip
Take the large class that is used elsewhere and have it implement an empty interface. Then take the code using the class and have it refer to the class through the interface instead. The compiler errors that result will give you a complete list of all the dependencies on that large class in your code.
Ouch! Sounds like the place I used to work.
Take a look at Working Effectively with Legacy Code. It has some gems on how to deal with atrocious code.
DotNetRocks recently did a show on working with legacy code. There is no magic pill that is going to make it work.
The best advice I've heard is start incrementally wrapping the code in tests.
That reminds me of my current job and when I first joined. They didn't let me re-write anything because I had the same argument, "These classes are so big and poorly written! no one could possibly understand them let alone add new functionality to them."
So the first thing I would do is to make sure there is comprehensive testing behind the areas that you're looking to change. Then at least you will have a chance of changing the code and not having (too many) arguments (hopefully). And by tests, I mean testing the components functionally with integration or acceptance tests and making sure they are 100% covered. If the tests are good, then you should be able to confidently change the code by splitting up the big class into smaller ones, getting rid of duplication, etc.
Even if you cannot refactor the file, try to reorganize it. Move methods/functions so that they are at least organized within the file logically. Then put in lots of comments explaining each section. No, you haven't rewritten the program, but at least now you can read it properly, and the next time you have to work on the file, you'll have lots of comments, written by you (which hopefully means that you will be able to understand them) which will help you deal with the program.
Code Folding can help.
If you can move stuff around within the giant class and organize it in a somewhat logical way, then you can put folds around various blocks.
Hide everything, and you're back to a C paradigm, except with folds rather than separate files.
I've come across this situation as well.
Personally I print out the code first (yeah, it can be a lot of pages). Then I draw a box around sections of code that are not part of any "main loop" or are just helper functions, and make sure I understand these things first. The reason is that they are probably referred to many times within the main body of the class, and it's good to know what they do.
Second, I identify the main algorithm(s) and decompose them into their parts using a numbering system that alternates between numbers and letters (it's ugly but works well for me). For example you could be looking at part of an algorithm 4 "levels" deep and the numbering would be 1.b.3.e or some other god awful thing. Note that when I say levels, I am not referring directly to control blocks or scope necessarily, but where I have identified steps and sub-steps of an algorithm.
Then it's a matter of just reading and re-reading the algorithm. When you start out it sounds like a lot of time, but I find that doing this develops a natural ability to comprehend a great deal of logic all at once. Also, if you discover an error attributed to this code, having visually broken it down on paper ahead of time helps you "navigate" the code later, since you have a sort of map of it in your head already.
If your bosses don't think you understand something until you have some form of UML describing it, a UML sequence diagram could help here if you pretend the sub-step levels are different "classes" represented horizontally, and start-to-finish is represented vertically from top-to-bottom.
I feel your pain. I tackled something like this once for a hobby project involving processing digital TV data on my computer. A fellow on a hardware forum had written an amazing tool for recording shows, seeing everything that was on, and more. Plus, he had done incredibly vital work of working around bugs in real broadcast signals that were in violation of the standard. He'd done amazing work with thread scheduling to be sure that no matter what, you wouldn't lose those real-time packets: on an old Pentium, he could record four streams simultaneously while also playing Doom and never lose a packet. In short, this code incorporated a ton of great knowledge. I was hoping to take some pieces and incorporate them into my own project.
I got the source code. One file, 22,000 lines of C, no abstraction. I spent hours reading it; there was all this great work, but it was all done badly. I was not able to reuse a single line or even a single idea.
I'm not sure what the moral of the story is, but if I had been forced to use this stuff at work, I would have begged permission to chip pieces off it one at a time, build unit tests for each piece, and eventually grow a new, sensible thing out of the pieces. This approach is a bit different than trying to refactor and maintain a large brick in place, but I would rather have left the legacy code untouched and tried to bring up a new system in parallel.
The first thing I would do is write some unit tests to box the current behavior, assuming that there are none already. Then I'd start in the area where I need to make the change and try to get that method cleaned up -- i.e. refactor working code before introducing changes. Use common refactoring techniques to extract and reuse methods from existing long methods to make them more understandable. When you extract a method, look for other places in the code where similar code exists, box that area, and reuse the method you've just extracted.
Look for groups of methods that "hang together" that can be broken out into their own classes. Write some tests for how those classes should work, build the classes using the existing code as a template if need be, then substitute the new classes into the existing code, removing the methods that they replace. Again, using your tests to make sure that you're not breaking anything.
Make enough improvement to the existing code so that you feel you can implement your new feature/fix in a clean way. Then write the tests for the new feature/fix and implement to pass the tests. Don't feel that you have to fix everything the first time. Aim for gradual improvement, but always leave the code better than you found it.
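A minimal sketch of what "boxing the current behavior" can look like, in Python. The function `legacy_format_name` is a made-up stand-in for real legacy code; the expected values in the test are simply whatever the code returns today, quirks included.

```python
def legacy_format_name(first, last, title=None):
    """Stand-in for untested legacy code whose quirks callers may rely on."""
    name = last.upper() + ", " + first
    if title:
        name = title + " " + name
    return name.strip()


def test_characterize_legacy_format_name():
    # Expected values were copied from what the code does today,
    # not from what anyone thinks it should do.
    assert legacy_format_name("Ada", "Lovelace") == "LOVELACE, Ada"
    assert legacy_format_name("Ada", "Lovelace", "Dr") == "Dr LOVELACE, Ada"
    # Quirk: an empty first name leaves a trailing comma. Pin it, don't fix it yet.
    assert legacy_format_name("", "Lovelace") == "LOVELACE,"


test_characterize_legacy_format_name()
```

With tests like this in place, an extracted method or a new class can be substituted in and the same assertions rerun to confirm that nothing changed.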
A friend of mine was explaining how they do ping-pong pairing with TDD at his workplace, and he said that they take an "adversarial" approach. That is, when the person writing the test hands the keyboard over to the implementer, the implementer tries to do the simplest (and sometimes wrong) thing that makes the test pass.
For example, if they're testing a GetName() method and the test checks for "Sally", the implementation of the GetName method would simply be:
public string GetName() {
    return "Sally";
}
Which would, of course, pass the test (naively).
He explains that this helps eliminate naive tests that check for specific canned values rather than testing the actual behavior or expected state of components. It also helps drive the creation of more tests and ultimately better design and fewer bugs.
It sounded good, but in a short session with him, it seemed like it took a lot longer to get through a single round of tests than otherwise and I didn't feel that a lot of extra value was gained.
Do you use this approach, and if so, have you seen it pay off?
It can be very effective.
It forces you to think more about what test you have to write to get the other programmer to write the correct functionality you require.
You build up the code piece by piece, passing the keyboard frequently.
It can be quite tiring and time consuming, but I have found that it's rare I have had to come back and fix a bug in any code that has been written like this.
I've used this approach. It doesn't work with all pairs; some people are just naturally resistant and won't give it an honest chance. However, it helps you do TDD and XP properly. You want to try and add features to your codebase slowly. You don't want to write a huge monolithic test that will take lots of code to satisfy. You want a bunch of simple tests. You also want to make sure you're passing the keyboard back and forth regularly so that both partners are engaged. With adversarial pairing, you're doing both. Simple tests lead to simple implementations, the code is built slowly, and both people are involved throughout the whole process.
I like it some of the time - but don't use that style the entire time. Acts as a nice change of pace at times. I don't think I'd like to use the style all of the time.
I've found it a useful tool with beginners to introduce how the tests can drive the implementation though.
(First off, adversarial TDD should be fun. It should be an opportunity for teaching. It shouldn't be an opportunity for human dominance rituals. If there isn't the space for a bit of humor, then leave the team. Sorry. Life is too short to waste in a negative environment.)
The problem here is badly named tests. If the test looked like this:
Thing foo = new Thing("Sally");
assertEquals("Sally", foo.getName());
Then I bet it was named "testGetNameReturnsNameField". This is a bad name, but not immediately obviously so. The proper name for this test is "testGetNameReturnsSally". That is what it does. Any other name is lulling you into a false sense of security. So the test is badly named. The problem is not the code. The problem is not even the test. The problem is the name of the test.
If, instead, the tester had named the test "testGetNameReturnsSally", then it would have been immediately obvious that this is probably not testing what we want.
It is therefore the duty of the implementor to demonstrate the poor choice of the tester. It is also the duty of the implementor to write only what the tests demand of them.
So many bugs in production occur not because the code did less than expected, but because it did more. Yes, there were unit tests for all the expected cases, but there were no tests for all the special edge cases that the code handled because the programmer thought "I better just do this too, we'll probably need that" and then forgot about it. That is why TDD works better than test-after. That is why we throw code away after a spike. The code might do all the things you want, but it probably does some things you thought you needed, and then forgot about.
Force the test writer to test what they really want. Only write code to make tests pass and no more.
RandomStringUtils is your friend.
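A sketch of why random values help, in Python rather than Java. The `Person` and `HardCodedPerson` classes are hypothetical, and `random.choices` stands in for a random-string helper: the hard-coded implementation satisfies a test that uses a canned value, but not one that uses a fresh value on every run.

```python
import random
import string


class Person:
    def __init__(self, name):
        self._name = name

    def get_name(self):
        return self._name


class HardCodedPerson(Person):
    """The 'adversarial' implementation: does the least a canned test demands."""

    def get_name(self):
        return "Sally"


def canned_test(person_class):
    return person_class("Sally").get_name() == "Sally"


def random_value_test(person_class):
    # A fresh 12-letter lowercase name each run, so it can never equal "Sally".
    name = "".join(random.choices(string.ascii_lowercase, k=12))
    return person_class(name).get_name() == name


assert canned_test(HardCodedPerson)            # the canned test is fooled
assert not random_value_test(HardCodedPerson)  # the random one is not
assert random_value_test(Person)               # the real implementation passes both
```

The trade-off is that failures from random inputs are harder to reproduce unless the test reports the value it used.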
It is based on the team's personality. Every team has a personality that is the sum of its members. You have to be careful not to practice passive-aggressive implementations done with an air of superiority. Some developers are frustrated by implementations like
return "Sally";
This frustration will lead to an unsuccessful team. I was among the frustrated and did not see it pay off. I think a better approach is more oral communication making suggestions about how a test might be better implemented.
I've inherited a project where the class diagrams closely resemble a spider web on a plate of spaghetti. I've written about 300 unit tests in the past two months to give myself a safety net covering the main executable.
I have my library of agile development books within reach at any given moment:
Working Effectively with Legacy Code
Code Complete
Agile Principles Patterns and Practices in C#
The problem is everything I touch seems to break something else.
The UI classes have business logic and database code mixed in. There are mutual dependencies between a number of classes. There's a couple of god classes that break every time I change any of the other classes. There's also a mutant singleton/utility class with about half instance methods and half static methods (though ironically the static methods rely on the instance and the instance methods don't).
My predecessors even thought it would be clever to use all the datasets backwards. Every database update is sent directly to the db server as parameters in a stored procedure, then the datasets are manually refreshed so the UI will display the most recent changes.
I'm sometimes tempted to think they used some form of weak obfuscation for either job security or as a last farewell before handing the code over.
Are there any good resources for detangling this mess? The books I have are helpful but only seem to cover half the scenarios I'm running into.
It sounds like you're tackling it in the right way.
Test again
Unfortunately, this can be a slow and tedious process. There's really no substitute for digging in and understanding what the code is trying to accomplish.
One book that I can recommend (if you don't already have it filed under "etc.") is Refactoring to Patterns. It's geared towards people who are in your exact situation.
I'm working in a similar situation.
If it is not a small utility but a big enterprise project then it is:
a) too late to fix it
b) beyond the capabilities of a single person to attempt a)
c) can only be fixed by a complete rewriting of the stuff which is out of the question
Refactoring can in many cases only be attempted in your private time at your personal risk. If you don't get an explicit mandate to do it as part of your daily job, then you're likely not even to get any credit for it. You may even be criticized for "pointlessly wasting time on something that has worked perfectly for a long time already".
Just continue hacking it the way it has been hacked before, receive your paycheck and so on. When you get completely frustrated or the system reaches the point of being non-hackable any further, find another job.
EDIT: Whenever I attempt to address the question of the true architecture and doing things the right way, I usually get a LOL in my face directly from the responsible managers, who say something like "I don't give a damn about good architecture" (attempted translation from German). I have personally brought one very bad component to the point of non-hackability, while of course having given warnings months in advance. They then had to cancel some features promised to customers because it was not doable any longer. No one touches it anymore...
I've worked this job before. I spent just over two years on a legacy beast that is very similar. It took two of us over a year just to stabilize everything (it's still broke, but it's better).
First thing -- get exception logging into the app if it doesn't exist already. We used FogBugz, and it took us about a month to get reporting integrated into our app; it wasn't perfect right away, but it was reporting errors automatically. It's usually pretty safe to implement try-catch blocks in all your events, and that will cover most of your errors.
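A small Python sketch of the "try-catch in all your events" idea, using a decorator so that each handler needs only one extra line. The handler `on_save_clicked` and the logger name `legacy_app` are hypothetical, and the sketch logs locally with the standard `logging` module rather than reporting to FogBugz or any other tracker.

```python
import functools
import logging

logger = logging.getLogger("legacy_app")


def log_exceptions(handler):
    """Wrap an event handler so any exception is logged before it propagates."""

    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        try:
            return handler(*args, **kwargs)
        except Exception:
            # logger.exception records the message together with the traceback.
            logger.exception("Unhandled error in %s", handler.__name__)
            raise

    return wrapper


@log_exceptions
def on_save_clicked(record):
    # Hypothetical event handler standing in for real UI code.
    return 100 / record["quantity"]
```

Re-raising after logging is a design choice made here so that behavior does not change; whether to swallow the error and show a message instead depends on the application.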
From there fix the bugs that come in first. Then fight the small battles, especially those based on the bugs. If you fix a bug that unexpectedly affects something else, refactor that block so that it is decoupled from the rest of the code.
It will take some extreme measures to rewrite a big, critical-to-company-success application, no matter how bad it is. Even if you get permission to do so, you'll be spending too much time supporting the legacy application to make any progress on the rewrite anyway. If you do many small refactorings, eventually either the big ones won't be that big or you'll have really good foundation classes for your rewrite.
One thing to take away from this is that it is a great experience. It will be frustrating, but you will learn a lot.
I have (once) come across code that was so insanely tangled that I couldn't fix it with a functional duplicate in a reasonable amount of time. That was sort of a special case though, as it was a parser and I had no idea how many clients might be "using" some of the bugs it had. Rendering hundreds of "working" source files erroneous was not a good option.
Most of the time it is eminently doable, just daunting. Read through that refactoring book.
I generally start fixing bad code by moving things around a bit (without actually changing implementation code more than required) so that modules and classes are at least somewhat coherent.
When that is done, you can take your more coherent class and rewrite its guts to perform the exact same way, but this time with sensible code. This is the tricky part with management, as they generally don't like to hear that you are going to take weeks to code and debug something that will behave exactly the same (if all goes well).
During this process I guarantee you will discover tons of bugs, and outright design stupidities. It's OK to fix trivial bugs while recoding, but otherwise leave such things for later.
Once this is done with a couple of classes, you will start to see where things can be modularized better, designed better, etc. Plus it will be easier to make such changes without impacting unrelated things because the code is now more modular, and you probably know it thoroughly.
Mostly, that sounds pretty bad. But I don't understand this part:
"My predecessors even thought it would be clever to use all the datasets backwards. Every database update is sent directly to the db server as parameters in a stored procedure, then the datasets are manually refreshed so the UI will display the most recent changes."
That sounds pretty close to a way I frequently write things. What's wrong with this? What's the correct way?
If your refactorings are breaking code, particularly code that seems to be unrelated, then you're trying to do too much at a time.
I recommend a first-pass refactoring where all you do is ExtractMethod: the goal is simply to name each step in the code, without any attempts at consolidation whatsoever.
After that, think about breaking dependencies, replacing singletons, consolidation.
If your refactorings are breaking things, then it means you don't have adequate unit test coverage - as the unit tests should have broken first. I recommend you get better unit test coverage second, after getting exception logging into place.
I then recommend you do small refactorings first - Extract Method to break large methods into understandable pieces; Introduce Variable to remove some duplication within a method; maybe Introduce Parameter if you find duplication between the variables used by your callers and the callee.
And run the unit test suite after each refactoring or set of refactorings. I'd say run them all until you gain confidence about which tests will need to be rerun every time.
No book will be able to cover all possible scenarios. It also depends on what you'll be expected to do with the project and whether there is any kind of external specification.
If you'll only have to do occasional small changes, just do those and don't bother starting to refactor.
If there is a specification (or you can get someone to write it), consider a complete rewrite if it can be justified by the foreseeable amount of changes to the project.
If "the implementation is the specification" and there are a lot of changes planned, then you're pretty much hosed. Write LOTS of unit tests and start refactoring in small steps.
Actually, unit tests are going to be invaluable no matter what you do (if you can write them to an interface that's not going to change much with refactorings or a rewrite, that is).
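As a rough illustration of tests written to an interface that survives a rewrite, here is a Python sketch. The EmployeeStore interface and both implementations are hypothetical stand-ins. The point is that the same contract check runs unchanged against the old code and the new code.

```python
from typing import Protocol


class EmployeeStore(Protocol):
    """The stable interface the tests are written against (hypothetical)."""

    def add(self, name: str) -> int: ...

    def get(self, employee_id: int) -> str: ...


class LegacyEmployeeStore:
    """Stand-in for the old implementation: a list with 1-based ids."""

    def __init__(self):
        self._rows = []

    def add(self, name):
        self._rows.append(name)
        return len(self._rows)

    def get(self, employee_id):
        return self._rows[employee_id - 1]


class RewrittenEmployeeStore:
    """Stand-in for the rewrite: a dictionary keyed by id."""

    def __init__(self):
        self._by_id = {}
        self._next_id = 1

    def add(self, name):
        employee_id = self._next_id
        self._by_id[employee_id] = name
        self._next_id += 1
        return employee_id

    def get(self, employee_id):
        return self._by_id[employee_id]


def check_store_contract(store: EmployeeStore) -> None:
    # Only uses the interface, so it does not care which class it is given.
    first = store.add("Ada")
    second = store.add("Grace")
    assert first != second
    assert store.get(first) == "Ada"
    assert store.get(second) == "Grace"


for implementation in (LegacyEmployeeStore, RewrittenEmployeeStore):
    check_store_contract(implementation())
```

Tests that reach into private details (the list or the dictionary here) would have to be thrown away along with the old implementation.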
See the blog posts Anatomy of an Anti-Corruption Layer, Part 1 and Anatomy of an Anti-Corruption Layer, Part 2.
It cites Eric Evans, Domain-Driven Design: Tackling Complexity in the Heart of Software:
Access the crap behind a facade
You could extract and then refactor some part of it, to break the dependencies and isolate layers into different modules, libraries, assemblies, directories. Then you re-inject the cleaned parts into the application with a strangler application strategy. Lather, rinse, repeat.
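Here is a small Python sketch of how the facade and the strangler idea can fit together. LegacyBilling, BillingFacade and new_price are all invented names, and the status-code convention is an assumption made up for the example. Callers only ever talk to the facade, which routes each operation either to the legacy code or to a cleaned replacement once one exists.

```python
class LegacyBilling:
    """Stand-in for the tangled legacy module (hypothetical)."""

    def calc(self, cust, amt, flag):
        # Cryptic positional arguments and a [status, value] return value.
        if flag == 1:
            return [0, amt * 0.9]
        return [0, amt]


class BillingFacade:
    """Clean interface in front of the legacy code."""

    def __init__(self, legacy, migrated=None):
        self._legacy = legacy
        # Maps an operation name to its new implementation, once migrated.
        self._migrated = migrated or {}

    def price(self, customer_id, amount, discounted=False):
        new_impl = self._migrated.get("price")
        if new_impl is not None:
            return new_impl(customer_id, amount, discounted)
        # Not migrated yet: translate to and from the legacy conventions.
        status, value = self._legacy.calc(customer_id, amount, 1 if discounted else 0)
        if status != 0:
            raise RuntimeError("legacy billing failed with status %d" % status)
        return value


def new_price(customer_id, amount, discounted):
    # Cleaned replacement for the legacy calculation.
    return amount * 0.9 if discounted else amount


facade_old = BillingFacade(LegacyBilling())
facade_new = BillingFacade(LegacyBilling(), migrated={"price": new_price})
print(facade_old.price(1, 200.0, True) == facade_new.price(1, 200.0, True))
```

Each time another operation is cleaned up it gets registered in the facade, and the legacy class can be deleted once nothing routes to it any more.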
Good luck, that is the tough part of being a developer.
I think your approach is good, but you need to focus on delivering business value (number of unit tests is not a measure of business value, but it may give you an indication if you are on or off track). It's important to have identified the behaviors that need to be changed, prioritize, and focus on the top ones.
The other piece of advice is to remain humble. Realize that if you wrote something so large under real deadlines and someone else saw your code, they would probably have problems understanding it as well. There is a skill in writing clean code, and there is a more important skill in dealing with other people's code.
The last piece of advice is to try to leverage the rest of your team. Past members may know things about the system that you can learn from. They may also be able to help test behaviors. I know the ideal is to have automated tests, but if someone can help by verifying things for you manually, consider getting their help.
I particularly like the diagram in Code Complete, in which you start with just legacy code, a rectangle of fuzzy grey texture. Then when you replace some of it, you have fuzzy grey at the bottom, solid white at the top, and a jagged line representing the interface between the two.
That is, everything is either 'nasty old stuff' or 'nice new stuff'. One side of the line or the other.
The line is jagged, because you're migrating different parts of the system at different rates.
As you work, the jagged line gradually descends, until you have more white than grey, and eventually just white.
Of course, that doesn't make the specifics any easier for you. But it does give you a model you can use to monitor your progress. At any one time you should have a clear understanding of where the line is: which bits are new, which are old, and how the two sides communicate.
You might find the following post useful:
As the post says, don't dismiss a complete rewrite too easily. Also, if at all possible, try to replace whole layers or tiers, either with a third-party solution (an ORM for persistence, for example) or with new code. But most important of all, try to understand the logic (the problem domain) behind the code.