The best or fastest way to understand an uncommented and complex project [closed] - project-management

I have a complex project without comments. The project is programmed in Java but has more than one main class, uses several .txt files as templates, and uses several .bat files. I don't know where or how to start exploring the project, because I need to make some changes to it.

As others have said, this is a slow process.
However, having done this many times in the past, this is my methodology:
Identify as many requirements as you can that the code fulfils. This may give you some of the reasons why certain things are the way they are when you look deeper. A common way of finding these is to look for any tests that may be available. The automated ones are best, but usually they're as missing as the comments.
Find the entry points to the code. These will give you places where you can poke the code to see how different inputs affect the flow. Common entry points are code-loading 'main'-type functions, service interfaces, web page postbacks, etc.
Diagram the code. Look for tools that can build black/white box pictures of the code. For me this is invaluable. I have on occasion printed out large listings and attacked them with markers and rulers. Your aim is to create your own flow chart (mental or otherwise) of the code flow.
Using the above (iteratively), build a set of outputs of the code that you think should occur, and add to these the outputs you may already know about, such as logs, data files, database writes, etc.
Finally, if you have time, create some manual tests, though preferably in automated test harnesses, to verify the above. This is where I start to involve the debugger to see detail in the code.
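For that last step, a minimal sketch of an automated harness that pokes a main-style entry point (JUnit 4 assumed; the class name, arguments and file paths are placeholders for whatever your project actually uses):

import static org.junit.Assert.assertTrue;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.junit.Test;

public class EntryPointSmokeTest
{
    @Test
    public void mainProducesTheExpectedOutputFile() throws Exception
    {
        // Poke the entry point with a known input and check a known output.
        // ReportGenerator and both paths are placeholders for your project.
        ReportGenerator.main(new String[] { "templates/report.txt" });
        assertTrue(Files.exists(Paths.get("out/report.txt")));
    }
}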
This methodology usually gives me confidence to make changes.
Note this is an iterative process and can be done with portions of the code or overall, as you see fit. I usually favour a top-down approach to start with, and then as I gain some insight I drill down until details become overwhelming, and then I repeat. However, this is just because my mind works this way - you may be different. Good luck.

Find the main Main class. The starting point.
Start drawing a picture of the classes and the objects they own and the external entities they reference.
Follow all the branches until you can find a logical ending.
I've used UML reverse engineering tools in the past and while a visual picture is good, stepping through the code has always been the hardest and yet best methodology for me.
And, as you step through each piece you can add in your own comments.

I usually start with doxygen, turning on every extraction option (especially EXTRACT_ALL and EXTRACT_PRIVATE), and enable the SOURCE_BROWSER, HAVE_DOT, CALL_GRAPH and CALLER_GRAPH options (you also need to have dot installed). This gives a good view of the software. For every function the calls are displayed and linked in a graph, and the sources are linked from there.
While doxygen is intended for C and C++, it also works with Java sources (set the OPTIMIZE_OUTPUT_JAVA option).
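For reference, a Doxyfile with those options switched on would contain something like this (dot ships with Graphviz):

EXTRACT_ALL          = YES
EXTRACT_PRIVATE      = YES
SOURCE_BROWSER       = YES
OPTIMIZE_OUTPUT_JAVA = YES
HAVE_DOT             = YES
CALL_GRAPH           = YES
CALLER_GRAPH         = YES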

Ouch. I'm afraid there is no speedy way to do this. Comment out a line (or two) -> test -> see what breaks. You could also set breakpoints here and there and run the debugger. That should give you some indication of how you got there (i.e. what the hierarchy between the classes is).
Hopefully the original developers used some patterns that you can recognize and make notes. Make lots of notes of everything. Start by trying to understand the high level structure and work down from there.
Be prepared to spend endless hours not understanding what the thing is doing.
Speak to the client and try to understand what the project is for, and what are all the things that it does. Someone somewhere had to put in some requirements for the stuff that's in there, if only in an email.

I would try to find the first entry point in the code closest to where you suspect you'll need to start making your changes, set a breakpoint, and start debugging. Check out the contents of local variables and work your way deeper as you become familiar with what's going on. Then, when you have a basic understanding of the area of code you're going to be working with, start fiddling with some small changes. Test your understanding of it. Try diagramming what you see happening. If you can do that confidently, you'll be able to decide whether you need to go back and continue learning more about the code, or whether you know enough to get done what you need to get done.

A start is to use an automated UML modeling tool (if you use Eclipse you can use a plug-in), and start creating UML diagrams of the various classes to see how they are related at a high level and visualize the code. This has helped me many times.

If there are log files being generated, have a look at them to understand the flow from the starting point (the main class). Otherwise, put in debug statements to understand the flow.
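A minimal sketch of such debug statements using java.util.logging (the class name and messages are invented; any logging library works the same way):

import java.util.Arrays;
import java.util.logging.Logger;

public class Main
{
    private static final Logger LOG = Logger.getLogger(Main.class.getName());

    public static void main(String[] args)
    {
        // One line at each entry point and major branch is enough
        // to reconstruct the flow from the log output afterwards.
        LOG.info("main started with args: " + Arrays.toString(args));
        // ... existing code ...
        LOG.info("main finished");
    }
}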

Yeah, that sounds like a pretty bad spot to be in.
I would say that the best way is to just walk through the program line by line. Try to grasp the big picture in the code, and write a lot of notes, both on paper and in comments in the code.

I would say a good approach would be to generate documentation using javadoc or doxygen's class diagram feature; then, as you run the code, traverse the class diagrams generated by doxygen and see who calls what. This works wonderfully for me every time I am working on such a project.

I completely agree with most of the answers posted.
I can add: use a development tool that reverse-engineers the code and creates a class diagram, to get an overall picture of what is involved.
Then you need patience. But you will be a stronger and smarter developer when you get through...
Good luck!

One of the best and first things to do is to try to build and run the code. It might sound a bit simplistic, but the problem when you take over undocumented code is that often you can't even build and run it, and you have no clue where to start.

Related

How to tackle a new project

I have a question about best practice on how to tackle a new project, any project. When starting a new project, how do you go about tackling it: do you split it into sections, start writing code, or draw up flow diagrams?
I'm asking this question because I'm looking for advice on how I can start new projects so I can get going on them quicker - having them planned and designed, and starting to code with everything worked out.
Any advice?
Thanks
Stephen
It all really depends. Is the project for controlling a space shuttle with 200+ people working on it, or is it a hobby project with one person?
I'm guessing this is a small project. In that case, do whatever works for you. Write a list of things you think are required. If there are parts you know you need to learn more about or research, get reading on the web, and try some stuff out with prototype code to see whether it works or not. Don't turn prototype code into real code, though; start again with production code and make sure you get all the appropriate error handling etc. in.
When you think you've got a good feel for what's needed, get coding. If you hit a point where you think it's not working, go back to the design and rethink it and sketch some more diagrams, and then go back to the code again.
It is extremely doubtful that you can work everything out in your plan and that's how things will actually work out. So, there's little point in trying to plan too far ahead because you'll be wasting time. Just plan out far enough ahead to keep yourself focused on working on the right things and so that you've given yourself a reasonable chance that the code you're working on will fit the big picture and solve the problem you're trying to solve.
Start by writing a simple functional spec, a few paragraphs from the user's perspective: what they see, how they perform actions, what they expect to happen if they click widget X. This will glue the logic together in your head, and on paper.
From there you can work on the technical spec, which details the gritty things like database structure, special controls and components you need, SDKs if any, and all other developer-type details that you need to implement.

When do you refactor code? [closed]

Do you do it when you’re in the code doing something else?
When your manager approves it? (Seems this never happens)
I guess some of this depends on the impact of the changes. If I change the code and it affects nothing outside of the class, to me that is low impact.
When does it become a design change? When it affects X objects or X projects?
I’m just curious how others teams tackle this...
As part of original development (red/green/refactor)
When suggested by a code reviewer
When we've noticed a design pain-point
When making another change, if the refactoring is low impact, i.e. typically not affecting any other files.
If it affects the public API, I generally like to make the refactoring a single source code commit which doesn't change behaviour (and then build new behaviour into another commit). If it affects other projects too, there needs to be consensus over it and I would want to get permission to change their code to go in the same refactoring commit.
I find I refactor when revisiting code (presumably to add/extend functionality) more than 3 months after it was written.
If it takes me more than 2 minutes to discern what a chunk of code is doing, I'll break it apart to make it more immediately understandable (or just add some more comments.)
As soon as all of the tests run.
I work in a large system, so I only change things I have to. It is easy to have bad side effects to changes.
I will refactor sections of code that are performing poorly, not working properly, or needs new functionality.
I never just decide to fix things; I would never be done. If it works, and no one is asking for changes or complaining about problems, move on. Life is too short to fix everything.
I often refactor my code when there is a user requirement change or bug fixes. Then there will be a chance for people to review your changes.
Otherwise, I normally don't touch workable code even if it smells.
We found small refactorings are best done while we were working on a bit of code - do what's required, preferably paired.
For bigger things, we had a Technical Debt section on the wall - if you spotted something and didn't have the time to address it, or it was going to take some discussion to solve, you'd add it to the wall and they would be scheduled for future iterations (or when free time cropped up).
Refactoring while you're already in the code is sometimes easiest, especially if your manager does not support the initiative, but if you only change a small part it will break consistency with the surrounding parts. In these cases it's better to be selective and, as you suggested, do things that are low-impact. It may also be helpful to refactor long select/switch statements into functions and put off refactoring the inner code until later, as sketched below.
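A hedged sketch of that switch refactoring in Java (the command names and method bodies are invented for illustration):

public class CommandDispatcher
{
    public void dispatch(String command)
    {
        // Each case now delegates to a named method, so the inner code
        // can be refactored later without touching the switch itself.
        switch (command)
        {
            case "import": importTemplates(); break;
            case "export": exportReport(); break;
            default:       printUsage(); break;
        }
    }

    private void importTemplates() { /* formerly inline in the case */ }
    private void exportReport()    { /* formerly inline in the case */ }
    private void printUsage()      { /* formerly inline in the case */ }
}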
At a previous job, I was the manager, so I refactored whenever I wanted. At my current job, I'm an analyst so most of the code is not directly my responsibility. When I do write code, I avoid impacting anything that I'm not writing. I have one project which is entirely under my own control and I refactor any time I learn a better way to do something.
We refactor as often as we can. Having unit tests to ensure that everything works pre- and post- refactoring really helps.
Code review processes often help with this. If I touch some code, it gets reviewed, reviewer asks, "why did you do it this way?", I say, "I had to because of (insert ugliness here)". This is a sign that the code should be refactored right after the review is done.
At our company, for example, we decided that our upcoming application release would be mostly dedicated to performance optimizations rather than new functionality. This was something we felt was needed, and it was also requested by some clients.
Therefore we have spent a lot of time identifying performance bottlenecks in our app and reviewing code and refactoring it to make things run faster.
So in our case we did it because management approved us doing it for this new release, because we showed to them how much performance improvement could be gained.
Refactor when needed:
when you need a better understanding of the code you are working on (pairing often helps here); examples: renaming, method extraction, etc.
when the current design doesn't allow for a 'clean' change: at this point you can actually argue with your manager on a value basis (e.g. what is this new feature worth to the project)
I am always making small refactorings in my code. As long as I have my unit tests to verify that everything is still functioning properly afterward, I see no harm in doing it as I go. That way you don't get that vague "needs refactoring" feeling every time you work on it.
Now if it requires a large refactoring, it's best to plan for that and set aside some time.
Seems most other posters are resistant to refactoring mercilessly. Of course this isn't possible if the system you're working on doesn't support it through extensive unit tests. But in general, if I can see an opportunity to make the code tighter without spending more than a few minutes, or hours at most, I go for it. If I'm not sure what I should be working on, I look for something to refactor.
I refactor when I'm fixing a bug or adding a feature and the process of refactoring makes the code easier to read and easier to maintain.
Following DRY principles vehemently will often be a trigger for me to refactor.
Insufficiently often, thus building up technical debt.
Sad, but so.
Do as I say, not as the team I work on does.

What helps you improve your ability to find a bug?

I want to know if there are methods to quickly find bugs in a program.
It seems that the more you master the architecture of your software, the more quickly you can locate the bugs.
How do programmers improve their ability to find a bug?
Logging, and unit tests. The more information you have about what happened, the easier it is to reproduce it. The more modular you can make your code, the easier it is to check that it really is misbehaving where you think it is, and then check that your fix solves the problem.
Divide and conquer. Whenever you are debugging, you should be thinking about cutting down the possible locations of the problem. Every time you run the app, you should be trying to eliminate a possible source and zero in on the actual location. This can be done with logging, with a debugger, assertions, etc.
Here's a prophylactic method for after you have found a bug: I find it really helpful to take a minute and think about it.
What was the bug, exactly, in essence?
Why did it occur?
Could you have found it earlier, or more easily?
What else did you learn from the bug?
I find taking a minute to think about these things will make it far less likely that you will produce the same bug in the future.
I will assume you mean logic bugs. The best way I have found to capture logic bugs is to implement some sort of testing scheme. Check out jUnit as the standard. Pretty much you define a set of accepted outputs of your methods. Every time you compile your system it checks all of your test cases. If you have introduced new logic that breaks your tests, you will know about it instantly and know exactly what you have to fix.
Test driven design is a pretty big movement in programming right now. You will be hard pressed to find a language that doesn't support some kind of testing. Even JavaScript has a multitude of test suites.
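A minimal sketch of the jUnit approach described above (JUnit 4; the method under test is a toy stand-in for your real code, and the "accepted output" is invented):

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class PriceCalculatorTest
{
    // A toy method under test (stands in for your real code).
    static double totalWithDiscount(double amount)
    {
        return amount >= 100.0 ? amount * 0.9 : amount;
    }

    @Test
    public void discountIsAppliedAtOneHundred()
    {
        // The accepted output for this input; if new logic breaks this
        // expectation, the next test run reports it immediately.
        assertEquals(90.0, totalWithDiscount(100.0), 0.001);
    }
}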
Experience makes you a better debugger. Pay close attention to the bugs that you AND others commonly make. Try to figure out if/how these bugs apply to ALL code that affects you, not the single instance of where the bug was seen.
Raymond Chen is famous for his powers of psychic debugging.
Most of what looks like psychic debugging is really just knowing what people tend to get wrong.
That means that you don't necessarily have to be intimately familiar with the architecture / system. You just need enough knowledge to understand the types of bugs that apply and are easy to make.
I personally take the approach of thinking about where the bug may be in the code before actually opening up the code and taking a look. When you first start with this approach, it may not actually work very well, especially if you are pretty unfamiliar with the code base. However, over time someone will be able to tell you the behavior they are experiencing and you'll have a good idea where the problem is located or you may even know what to fix in the code to remedy the problem before even looking at the code.
I was on a project for several years that was maintained by a vendor. They were not very good debuggers, and most of the time it was up to us to point them to the area of the code that had the problem. What made our problem worse was that we didn't have a nice way to view the source code, so a lot of our "debugging" was done by feel.
Error checking and reporting. The #1 newbie coder debugging mistake is to turn off error reporting, avoid checking whether what's going on makes sense, etc. In general, people feel that if they can't see anything going wrong then nothing is going wrong. Which of course could not be further from the truth.
Instead, your code should be chock full of error conditions that will make lots of noise, with detailed reporting, someplace you will see it. (This doesn't mean inside a production web page.) Then, instead of having to trace an error all over the place because it got passed through sixteen layers of execution before it finally got someplace that broke, your errors start happening proximately to the actual issue.
It seems that the more you master the architecture of your software, the more quickly you can locate the bugs.
After understanding the architecture, one's ability to find bugs in the application increases with their ability to identify and write extensive tests.
Know your tools.
Make sure that you know how to use conditional breakpoints and watches in your debugger.
Use static analysis tools as well - they can point out the more obvious issues.
Sleep and rest.
Use programming methods that produce fewer bugs in the first place.
If to implement a single stand-alone functional requirement it takes N separate point-edits to source code, the number of bugs put into the code is roughly proportional to N, so find programming methods that minimize N. Ways to do this: DRY (don't repeat yourself), code generation, and DSL (domain-specific-language).
Where bugs are likely, have unit tests.
Obviously. IMHO, the best unit tests are Monte Carlo.
Make intermediate results visible.
For example, compilers have intermediate representations, in the form of 4-tuples. If there is a bug, the intermediate code can be examined. That tells if the bug is in the first or second half of the compiler.
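The same idea in miniature, outside a compiler (all names invented): dump the intermediate form, so that a wrong final result can be attributed to one stage or the other.

public class Pipeline
{
    public static void main(String[] args)
    {
        String intermediate = parse("  Hello World  ");
        // Making the intermediate representation visible splits the
        // search space: a bad value here means the bug is in parse().
        System.err.println("intermediate = '" + intermediate + "'");
        System.out.println(render(intermediate));
    }

    private static String parse(String s)  { return s.trim().toLowerCase(); }
    private static String render(String s) { return "<p>" + s + "</p>"; }
}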
P.S. Most programmers are not aware that they have a choice of how much data structure to use. The less data structure you use, the less are the chances for bugs (and performance issues) caused by it.
I find tracepoints to be an invaluable debugging tool. They are a bit like logging, except you create them during a debugging session to solve a particular issue, like breakpoints.
Printing the stacktrace in a tracepoint can be especially useful. For example, you can print the hash code and stacktrace in the constructor of an object, and then later on when the object is used again you can search for its hashcode to see which client code created it. Same for seeing who disposed it or called a certain method etc.
They are also great for debugging issues related to window focus changes etc, where the debugger would interfere if you drop in break mode.
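The constructor trick could look like this in plain Java (a tracepoint can print the same information without editing the source; the class is invented):

public class Widget
{
    public Widget()
    {
        // Print an identity hash you can search the log for later, plus
        // a stack trace showing which client code created this instance.
        System.err.println("Widget created, id=" + System.identityHashCode(this));
        new Throwable("creation site").printStackTrace();
    }
}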
Static code tools like FindBugs
Assertions, assertions, and assertions.
Some areas of our code have 4 or 5 assertions for each line of real code. When we get a bug report, the first thing that happens is that the customer data is processed in our debug build; 99 times out of a hundred an assert will fire near the cause of the bug.
Additionally, our debug build performs redundant calculations to ensure that an optimized algorithm is returning the correct result, and debug functions are used to examine the sanity of data structures.
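A sketch of such a redundant check with Java's assert, which only runs when assertions are enabled (java -ea); both sum implementations here are placeholders:

public class CheckedSum
{
    public static long sum(int[] values)
    {
        long result = fastSum(values);
        // Only evaluated with -ea: recompute with the simple, obviously
        // correct algorithm and insist the optimized one agrees.
        assert result == naiveSum(values) : "fastSum diverged from naiveSum";
        return result;
    }

    private static long fastSum(int[] v)
    {
        long s = 0;
        for (int x : v) s += x;   // stands in for a cleverer algorithm
        return s;
    }

    private static long naiveSum(int[] v)
    {
        long s = 0;
        for (int x : v) s += x;
        return s;
    }
}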
The hardest thing new developers have to contend with is getting their code to survive the assertions of the code they are calling.
Additionally, we do not allow any code to be put back to top level that causes any integration or unit test to fail.
Stepping through the code, examining flow/state where unexpected behavior is occurring. (Then develop a test for it, of course).
Writing Debug.Write(message) in your code and using DebugView is another option. Then run your application and find out what is going on.
"Architecture" in software means something like:
Several components
The components interact across clearly-defined interfaces
Each component has a well-defined responsibility
The responsibility of one component is unlike the responsibilities of other components
So, as you said, the better the architecture the easier it is to find bugs.
First: knowing the bug, you can decide which functionality is broken, and therefore know which component implements that functionality. For example, if the bug is that something isn't being logged properly, then the bug should be in one of three places:
In the component that's responsible for logging (your logging library)
Or, above that in the application code which is using this library
Or, below that in the system code which this library is using
Second: examine the data transfered across the interfaces between components. To continue the previous example above:
Set a debugger breakpoint on the application code which invokes the logger API, to verify whether the logger API is being used correctly (e.g. whether it's being invoked at all, whether parameters are as-expected, etc.).
Doing this tells you whether the bug is in the component above this interface, or in the component that's below this interface.
Repeat (perhaps using binary search if the call stack is very deep) until you've found which component is at fault.
When you come to the point that you think there must be a bug in the OS, check your assertions -- and put them into the code with "assert" statements.
Conversely, as you are writing the code, think of the range of valid inputs for your algorithms and put in assertions to make sure you have what you think you have. Same goes for output: Check that you produced what you think you produced.
E.g. if you expect a non-empty list:
l = getList(input)
# fail fast with a descriptive message if the list is unexpectedly empty
assert l, "List was empty for input: %s" % str(input)
I'm part of the QA team @ work, and knowing the product and how it is developed helps a lot in finding bugs. Also, when I make new QA tools I pass them to our dev team to test; finding bugs in your own code is just plain hard!
Some people say programmers are tainted, so we cannot see bugs in our own products; and we are not talking about code here, we are beyond that - usability and functionality itself.
Meanwhile, unit testing seems to be a nice solution for finding bugs in your own code, but it's totally pointless if you're wrong even before writing the unit test. How are you going to find the bugs then? You don't! Let your co-worker find them; hire a QA guy.
Scientific debugging is what I always used, and it greatly helps.
Basically, if you can replicate a bug, you can track down its origin. You should then run some experiments, observe the results, and infer hypotheses on why the bug happens.
Writing about all your hypotheses, attempts, expected results and observed results can help you track down the bugs, particularly if they're nasty.
There are automated tools that can help you with that process, particularly git-bisect (and similar bisection tools on other revision systems) to quickly find which change introduced the bug, unit testing to reproduce a bug and prevent regressions in your code (can be used in combination with bisect), and delta debugging to find the culprit in your code (similar to git-bisect but whereas git-bisect works on the code history, delta debugging works on the code directly).
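For example, a typical git-bisect session looks like this (the revision name and test script are placeholders):

git bisect start
git bisect bad                 # the current commit exhibits the bug
git bisect good v1.2           # the last revision known to be bug-free
git bisect run ./reproduce.sh  # git tests each midpoint; the script exits non-zero when the bug is present
git bisect reset               # return to where you started when done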
But whatever the tools you are using, the most important benefit is in the scientific methodology, as this is the formalization of what most experienced debuggers do.

Project Transference [closed]

I would like to know your experience when you need to take over somebody else's software project - more so when the original software developer has already resigned.
The most success that we've had with that is to "wiki" everything. During the notice period, ask the leaving developer to help you document everything in the team/company wiki, and see if you can do code reviews with him/her and add comments to the code while doing the reviews that explain sections. It is best for the developer taking over to write the comments in the code under the supervision of the leaver.
Cases where the original devs left before handing over the project are always the most interesting: you're stuck with a codebase in an unknown state. What I always find intriguing is how the new devs often do their utmost to comment on how badly designed the code is: they forget about the constraints the old devs might have been under, and the shortcuts they might have been forced to make. The saying is always: old dev == bad dev. What do you think?
I would even call this out as an official bad practice: bad-mouthing those who came before us.
I try to take as pragmatic an approach as possible: learn the codebase, wander around a bit. Try to understand the relation between requirements and code, even if there is no clear initial relationship at all. There will always be the "aha moment" when you realise why something was done this way or that. If you're still convinced something is implemented the wrong way, do your refactorings if possible. And isolate the pieces of code you cannot change: unit test them by using a mocking framework.
Hail to the maintenance developer.
I once joined a team which had been handed a steaming pile of crap from outsourcing. The original project - a multimedia content manager based on Java, Struts, Hibernate|Oracle - was well structured (it seems it was the work of a couple of people: pair programming, wise use of design patterns, some unit testing). Then someone else inherited the project and endlessly copy-pasted features, loosened the business rules, patched, and branched until it became a huge spaghetti monster with finely crafted pieces of code like:
List<Stuff> stuff = null;
if (LOG.isDebugEnabled())
{
    stuff = findStuff();
    LOG.debug("Yeah, I'm a smart guy!");
    for (Stuff stu : stuff)
    {
        LOG.debug("I've got this stuff: " + stu);
    }
}
// note: stuff is only ever assigned when debug logging is enabled,
// so the call below receives null in production
methodThatUsesStuff(stuff);
hidden amongst the other brilliant ingenuity.
I tamed the beast via patient refactoring (extracting methods and classes most of the time), commenting the code from time to time, and reorganizing everything until the codebase shrank by 30%, getting more and more manageable over time.
I have had to take over someone else's code of varying degrees of quality on several occasions. Hence the tips:
Make an effort to take structured notes of every piece of significant information from minute one: names of stakeholders, business rules, code and document locations, etc. It is best to dedicate a fresh spiral notebook, so you can tear pages out if you have to.
Make use of one of the better free indexing and desktop search tools available on the market (Google Desktop Search or MS Windows Search will do). Add all document, e-mail and code locations to it.
Before developing anything, do document analysis: find everything you can get your hands on electronically, on the network and in printed-out docs, and make the effort to simply read it. There is an amazing amount of useful information even in unfinished drafts.
Mind map the code, architecture etc as you go.
With less documented and maintained systems you will inevitably have moments of despair that are likely to push you into procrastination mode, especially during your first days or week, when the amount of new information your mind has to digest is overwhelming. At these times it is nice to have someone remind you (or just do it yourself) to take it easy, concentrate on important things first, and revert to making smaller steps in trying to gain understanding instead of trying to leap forward.
Keep taking notes, making diagrams, drawing rich pictures, and mind mapping. It really helps to digest the copious amounts of new, mostly disorganised information.
Hey, good luck!
We actually have a specified set of "Deliverables" that has to be present for us to take over a project.
If we have the chance, we try to push one of our folks into the group developing the project at first. That way we get some firsthand knowledge before our group takes over the code (in line with what #Guy wrote).
That being said, the most important part for me would be:
Some kind of high-level overview (a drawing?) of what the code does.
Easy access to ask questions of the people who actually wrote the code
This, for me, is the alpha and omega when taking over code and projects.

What can you do to a legacy codebase that will have the greatest impact on improving the quality?

As you work in a legacy codebase what will have the greatest impact over time that will improve the quality of the codebase?
Remove unused code
Remove duplicated code
Add unit tests to improve test coverage where coverage is low
Create consistent formatting across files
Update 3rd party software
Reduce warnings generated by static analysis tools (e.g. FindBugs)
The codebase has been written by many developers with varying levels of expertise over many years, with a lot of areas untested and some untestable without spending a significant time on writing tests.
Read Michael Feathers' book "Working Effectively with Legacy Code"
This is a GREAT book.
If you don't like that answer, then the best advice I can give would be:
First, stop making new legacy code[1]
[1]: Legacy code = code without unit tests and therefore an unknown
Changing legacy code without an automated test suite in place is dangerous and irresponsible. Without good unit test coverage, you can't possibly know what effect those changes will have. Feathers recommends a "stranglehold" approach where you isolate areas of code you need to change, write some basic tests to verify basic assumptions, make small changes backed by unit tests, and work out from there.
NOTE: I'm not saying you need to stop everything and spend weeks writing tests for everything. Quite the contrary, just test around the areas you need to test and work out from there.
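One of those basic tests might be a characterization test in the style Feathers describes: it pins down what the code does today, not what it should do (JUnit 4; the class under test is hypothetical, and the expected value would be captured by running the legacy code once before changing anything):

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class InvoiceFormatterCharacterizationTest
{
    @Test
    public void pinsDownCurrentBehaviour()
    {
        // Not a statement of correctness: the expected string was
        // recorded from the legacy code before making any changes.
        assertEquals("INV-0042", InvoiceFormatter.format(42));
    }
}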
Jimmy Bogard and Ray Houston did an interesting screen cast on a subject very similar to this:
http://www.lostechies.com/blogs/jimmy_bogard/archive/2008/05/06/pablotv-eliminating-static-dependencies-screencast.aspx
I work with a legacy 1M LOC application written and modified by about 50 programmers.
* Remove unused code
Almost useless... just ignore it. You won't get a big return on investment (ROI) from that one.
* Remove duplicated code
Actually, when I fix something I always search for duplicates. If I find some, I extract a generic function, or comment all occurrences of the duplication (sometimes the effort of extracting a generic function isn't worth it). The main idea is that I hate doing the same action more than once. Another reason is that there's always someone (could be me) who forgets to check for other occurrences...
* Add unit tests to improve test coverage where coverage is low
Automated unit tests are wonderful... but if you have a big backlog, the task itself is hard to promote unless you have stability issues. Go with the part you are working on and hope that in a few years you have decent coverage.
* Create consistent formatting across files
IMO the differences in formatting are part of the legacy. They give you a hint about who wrote the code and when, and that can give you some clues about how to behave in that part of the code. Reformatting isn't fun, and it doesn't give any value to your customer.
* Update 3rd party software
Do it only if there are really nice new features, or if the version you have is not supported by the new operating system.
* Reduce warnings generated by static analysis tools
It can be worth it. Sometimes a warning can hide a potential bug.
I'd say 'remove duplicated code' pretty much means you have to pull code out and abstract it so it can be used in multiple places - this, in theory, makes bugs easier to fix because you only have to fix one piece of code, as opposed to many, to fix a bug; see the sketch below.
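A minimal before/after sketch of that kind of extraction (all names invented):

public class Billing
{
    // This rounding rule used to be copy-pasted into both methods below;
    // after extraction, a bug in it is fixed in exactly one place.
    private static double roundToCents(double amount)
    {
        return Math.round(amount * 100.0) / 100.0;
    }

    public static double invoiceTotal(double net, double tax)
    {
        return roundToCents(net + tax);
    }

    public static double refundTotal(double gross, double fee)
    {
        return roundToCents(gross - fee);
    }
}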
Add unit tests to improve test coverage. Having good test coverage will allow you to refactor and improve functionality without fear.
There is a good book on this written by the author of CppUnit, Working Effectively with Legacy Code.
Adding tests to legacy code is certainly more challenging than creating them from scratch. The most useful concept I've taken away from the book is the notion of "seams", which Feathers defines as
"a place where you can alter behavior in your program without editing in that place."
Sometimes it's worth refactoring to create seams that will make future testing easier (or possible in the first place); a sketch follows. The Google testing blog has several interesting posts on the subject, mostly revolving around the process of dependency injection.
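A sketch of a seam created through constructor injection (all types invented): the class used to build its data source internally, so its behaviour could not be altered without editing it; injecting the dependency gives tests a place to substitute a fake.

public class ReportService
{
    public interface Store
    {
        String load(String key);
    }

    private final Store store;

    // The constructor parameter is the seam: production wires in the
    // real database-backed Store, tests pass an in-memory fake.
    public ReportService(Store store)
    {
        this.store = store;
    }

    public String render(String key)
    {
        return "<report>" + store.load(key) + "</report>";
    }
}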
I can relate to this question, as I currently have in my lap one of 'those' old-school codebases. It's not really legacy, but it certainly hasn't followed the trends of the years.
I'll tell you the things I would love to fix in it as they bug me every day:
Document the input and output variables
Refactor the variable names so they actually mean something, other than some Hungarian-notation prefix followed by a three-letter acronym with some obscure meaning. CamelCase is the way to go.
I'm scared to death of changing any code as it will affect hundreds of clients that use the software and someone WILL notice even the most obscure side effect. Any repeatable regression tests would be a blessing since there are zero now.
The rest is really peanuts. These are the main problems with a legacy codebase, they really eat up tons of time.
I'd say it largely depends on what you want to do with the legacy code...
If it will indefinitely remain in maintenance mode and it's working fine, doing nothing at all is your best bet. "If it ain't broke, don't fix it."
If it's not working fine, removing the unused code and refactoring the duplicate code will make debugging a lot easier. However, I would only make these changes on the erring code.
If you plan on version 2.0, add unit tests and clean up the code you will bring forward
Good documentation. As someone who has to maintain and extend legacy code, that is the number one problem. It's difficult, if not downright dangerous to change code you don't understand. Even if you're lucky enough to be handed documented code, how sure are you that the documentation is right? That it covers all of the implicit knowledge of the original author? That it speaks to all of the "tricks" and edge cases?
Good documentation is what allows those other than the original author to understand, fix, and extend even bad code. I'll take hacked yet well-documented code that I can understand over perfect yet inscrutable code any day of the week.
The single biggest thing that I've done to the legacy code that I have to work with is to build a real API around it. It's a 1970's style COBOL API that I've built a .NET object model around, so that all the unsafe code is in one place, all of the translation between the API's native data types and .NET data types is in one place, the primary methods return and accept DataSets, and so on.
This was immensely difficult to do right, and there are still some defects in it that I know about. It's not terrifically efficient either, with all the marshalling that goes on. But on the other hand, I can build a DataGridView that round-trips data to a 15-year-old application which persists its data in Btrieve (!) in about half an hour, and it works. When customers come to me with projects, my estimates are in days and weeks rather than months and years.
As a parallel to what Josh Segall said, I would say comment the hell out of it. I've worked on several very large legacy systems that got dumped in my lap, and I found the biggest problem was keeping track of what I already learned about a particular section of code. Once I started placing notes as I go, including "To Do" notes, I stopped re-figuring out what I already figured out. Then I could focus on how those code segments flow and interact.
I would say just leave it alone for the most part. If it's not broken then don't fix it. If it is broken then go ahead and fix and improve the portion of the code that is broken and its immediately surrounding code. You can use the pain of the bug or sorely missing feature to justify the effort and expense of improving that part.
I would not recommend any wholesale kind of rewrite, refactor, reformat, or putting in of unit tests that is not guided by actual business or end-user need.
If you do get the opportunity to fix something, then do it right (the chance of doing it right the first time might have already passed, but since you are touching that part again, you might as well do it right this time around), and this includes all the items you mentioned.
So in summary, there's no single or just a few things that you should do. You should do it all but in small portions and in an opportunistic manner.
Late to the party, but the following may be worth doing where a function/method is used or referenced often:
Local variables often tend to be poorly named in legacy code (often owing to their scope expanding when a method is modified, and not being updated to reflect this). Renaming these in line with their actual purpose can help clarify legacy code.
Even just laying out the method slightly differently can work wonders - for instance, putting all the clauses of an if on one line.
There might be stale/confusing code comments there already. Remove them if they're not needed, or amend them if you absolutely have to. (Of course, I'm not advocating removal of useful comments, just those that are a hindrance.)
These might not have the massive headline impact you're looking for, but they are low risk, particularly if the code can't be unit tested.
