how to tackle a new project - coding-style

I have a question about best practice on how to tackle a new project, any project. When starting a new project how do you go about tackling the project, do you split it into sections, start writing code, draw up flow diagrams.
I'm asking this question because I'm looking for advice on how I can start new projects so I can get going on them quicker. I can have it planned, designed and starting coding with everything worked out.
Any advice?
Thanks
Stephen

It all really depends. Is the project for controlling a space shuttle with 200+ people working on it, or is it a hobby project with 1 person.
I'm guessing this is a small project. In that case, do whatever works for you. Write a list of things you think are required. If there are parts you know you need to learn more about or research, get reading the web, try some stuff out with prototype code to see whether it works or not. Don't turn prototype code into real code though, start again with production code and make sure you get all the appropriate error handling etc in.
When you think you've got a good feel for what's needed, get coding. If you hit a point where you think it's not working, go back to the design and rethink it and sketch some more diagrams, and then go back to the code again.
It is extremely doubtful that you can work everything out in your plan and that's how things will actually work out. So, there's little point in trying to plan too far ahead because you'll be wasting time. Just plan out far enough ahead to keep yourself focused on working on the right things and so that you've given yourself a reasonable chance that the code you're working on will fit the big picture and solve the problem you're trying to solve.

Start by writing a simple functional spec, a few paragraphs from the user's perspective: what they see, how they perform actions, what they expect to happen if they click widget X. This will glue the logic together in your head, and on paper.
From there you can work on the technical spec, which details the gritty things like database structure, special controls and components you need, SDK's if any, and all other developer-type details that you need to implement.

Related

Advice on working with legacy code

I need some advice on how to work with legacy code.
A while ago, I was given the task to add a few reports to a reporting app. written in Struts 1, back in 2005. No big deal, but the code is quite messy. No usage of Action forms, and basically the code is one huge action, and a lot of if-else statements inside. Also, no one here has functional knowledge on this. We just happened to have it in our contract.
I'm quite unhappy about this, and not sure how to proceed. This application is invisible: Few people (but all very important) use it, so they don't care whether my eyes bleed while reading the code, standards, etc.
However, I feel that a technical debt is to be paid. How should I proceed on this? Continue down the if-else road, or try to do this requirement the right way, ignoring the rest of the project? Starting a huge refactor, risking my deadline?
Legacy code is a big issue, and I'm sure people will not agree!
I would say that starting a big re-factor could be a mistake.
A big re-factor means doing a lot of work to make it function exactly the way that it does now. If you choose to take this on on your own, there won't be a lot of visibility of what you are doing. If it works, no one will know the hours of work you put it. If it does NOT work, and you end up with tidy code, but add some bugs (and who has ever written code without adding some bugs) then you will get 'why did this change' type questions.
I have currently nearly completed a project working on a 10 year old code base. We have done quite a few bits of re-factoring along the way. But for each re-factor we have made we can justify 'this specific change will make the actual task we are doing now easier'. Rather than 'this is now cleaner for future work'. We have found that as we worked on the code, fixing the issues that we actually come up against one at a time, we have cleaned up a lot of it, without breaking it (much).
And I would say before you can re-factor much, you will need automated tests, so you can be fairly happy that you have put it back together right!
Most re-factoring is done to 'make maintenance and future development easier'. Your project sounds like there is not a lot of future development coming. That limits the advantage a re-factor will give the company.
Rule #1: If it ain't broke, don't fix it.
Rule #2: When in doubt, reread rule #1.
Unfortunately, legacy code can very rarely be described as "it ain't broke." Therefor we must tweak the existing code to correct a newly found bug, tweak the existing code to modify behavior that was previously acceptable, or tweak the existing code to add new functionality.
My experience has taught me that any refactoring must be done in 'infinitesimally' small increments. If you must break rule #2, I suggest that you start your search with the inner-most nested loop or IF structure and expand outward until you find a clean, logical separation point and create a new function/method/subroutine that contains only the guts of that loop or structure. This won't make anything more efficient but it should give you a clearer view of the underlying logic and structure. Once you have several new, smaller functions/methods/subroutines you can refactor and consolidate those into something more manageable.
Rule #3: Ignore my previous paragraph and reread the first two rules.
I agree with other comments. If you don't have to, then don't do it. It usually cost far more then it's worth if the code base is more or less dead any way.
On the other hand, if you feel that you cannot get your head around the code then a refactor is probably unavoidable. If this is the case then, since it's a web application, can you create a solid suite of functional tests using selenium? If so this is the fastest and most rewarding test approach for such code and will catch most bugs for the buck.
Second, start with the extract method refactoring to create compose methods of the big difficult methods. Every time you think to your self "This should have a comment to explain what it does" you should extract it to a method with a name that replaces the comment.
Once this has been done, if you still can't add the functionality required, you can go for more advanced refactorings, and perhaps even adding some unit tests. But I usually find that I can add what is required/fix the bug in legacy code by just creating self documenting code.
In a few words: before make any modifications to legacy code its good idea to start from automated unit tests.
This will give developer understanding about key things: dependencies this piece of code has, input data, output results, boundary conditions so on.
When it’s done most likely you will better understand what this code does and how it works.
After that its make sense (but not must) clean code a bit giving more accurate names to local variables, moving some functionality (repetitive code, if any) in to functions with clear human friendly names.
A simple clean up could make code more readable and at the same time save developer from regression issues with unit tests help.
Refactoring – make small changes, step by step, when you have a time and understanding of the requirements and functionality, regularly unit testing the code.
But do not start from refactoring

the best or speedest way to understand uncommented and complex project [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
I have complex project without comments. The project is programed in Java but have more than one main class, use several .txt files like a template and use several .bat files. I don't know where to start and how to start discovering the project, because I need to make some changes in that project.
As with others I say this is a slow process.
However having done this in the past many times, this is my methodology:
Identify as many requirements that the code fulfils. This may give you the some reasons as to why certain things are the way they are when you look deeper. A common way of finding these is look for any tests that be available. The automated ones are best, but usually they're as missing as the comments.
Find the entry points to the code. These will give you places where you can poke the code to see how different inputs affect the flow. Common entry points are Code Loading 'Main' type functions, service interfaces, web page post backs etc..
Diagram the code. Look for tools that can build black/white box pictures of the code. For me this invaluable. I have on occasion printed out large listings and attacked then with markers and rulers. You're aim to create your own flow chart (mental or other wise) of the code flow.
Using the above (iteratively) build a set of outputs to the code that you think should occur, and add to these the outputs you may already know about such as logs, data files, database writes etc..
Finally if you have time, create some manual tests though preferably in automated test harnesses to verify the above. This where I start to involve the debugger to see detail in the code.
This methodology usually gives me confidence to make changes.
Note this is iterative process and can be done with portions of the code or overall as you see fit. I usually favour a top down approach to start with and then as I gain some insight I drill down till details become overwhelming and then I repeat. However this is just because my mind works in this way - you may be different. Good luck.
Find the main Main class. The starting point.
Start drawing a picture of the classes and the objects they own and the external entities they reference.
Follow all the branches until you can find a logical ending.
I've used UML reverse engineering tools in the past and while a visual picture is good, stepping through the code has always been the hardest and yet best methodology for me.
And, as you step through each piece you can add in your own comments..
I usually start with doxygen, turning on every extracting option (especially EXTRACT_ALL and EXTRACT_PRIVATE), and enable the SOURCE_BROWSER, HAVE_DOT, CALL_GRAPH and CALLER_GRAPH options (you also need to have dot installed). This gives good view of the software. For every function the calls are displayed and linked in a graph, also the sources are linked from there.
While doxygen is intended for C and C++, it also works with Java sources (set the OPTIMIZE_OUTPUT_JAVA option).
Auch. I'm afraid there is no speedy way to do this. Comment out a line (or two) -> test -> see what breaks. You could also put break statements here and there and run the debugger. That should give you some indication how you got there (ie. what the hierarchy between the classes is).
Hopefully the original developers used some patterns that you can recognize and make notes. Make lots of notes of everything. Start by trying to understand the high level structure and work down from there.
Be prepared to spend endless hours not understanding what the thing is doing.
Speak to the client and try to understand what the project is for, and what are all the things that it does. Someone somewhere had to put in some requirements for the stuff that's in there, if only in an email.
I would try to find the first entry point in the code closest to where you suspect you'll need to start making your changes, set a breakpoint, and start debugging. Check out the contents of local variables and work your way deeper as you get to become familiar with whats going on. Then, when you have a basic understanding of the area of code you're going to be working with, start fiddling with some small changes. Test your understanding of it. Try diagramming what you see happening. If you can do that confidently, you'll be able to decide if you need to go back and continue learning more about the code, or if you know enough to get done what you need to get done.
A start is to use an automated uml modeling tool (if you use eclipse you can use a plug-gin), and start creating UML diagrams of the various classes to see how they are related in a high level and visualize the code. This has helped many times
If there are log files being generated, have a look at it to understand the flow from the starting point (main class). Otherwise, put debug statements to understand the flow.
Ya, that sounds like a pretty bad spot to be in.
I would say that the best way is to just walk through the program line for line. Try to grasp the big picture in the code, and write alot of notes, both on paper and in comments in the code.
I would say, a good approach would be to generate documentation using javadoc or doxygen's class diagram feature, then as you run the code traverse through the class diagrams generated using doxygen and see who calls what. This works wonderfully for me everytime i am working on such a project.
I completely agree to most of the answers posted.
I can add to use a development tool that reverse engineering the code and create a class diagram, to have an overall picture of what is involved.
Then you need patience. But you will be a stronger and smarter developer when you'll get through...
Good luck!
One of the best and first things to do is to try to build and run the code. It might sound a bit simplistic but the problem when you take over undocumented code is that you can't even build it and run it. When have no clue were to start.

How to make your more experienced and authoritative teammates not to create 'fast temporary solutions'?

I'm currently working on a small short-lived project. But despite the size it's complicated enough with very unclear logic. That's why it was started by more experienced developers. They work on it from time to time because it's not their main project.
They made some code drafts with numerous places which 'would be rewritten in the nearest future'. After that they added several another 'temporary pieces'. And then again..
So, now the project is a mess of 'half-working' pieces of code with some hardcoded values, like file names or some constants which 'will be replaced latter with working parts'. The API is awful (nobody thinks about it actually).
And it's really, really hard to do development now (for me it's the main and only project). I caught myself thinking that I spent about an hour every day just to understand again all that tricky 'temporary' things and API weaknesses. And after that hour my brain melts.
I can't just say "guys, your code smells like a trash dump". What's the correct way?
It seems like the ultimate problem is they are writing code and not taking responsibility for its quality.
If this goes against the culture of the organization, it's a matter of making the situation known others. If the developers don't know, and have a modicum of empathy, I would take the "I don't quite understand this. Could you spend a few minutes walking me through it?" with them. They should soon realize what they are doing to you, and good programmers will adjust their practices. This may also have to be done via the management hierarchy-- "In order to progress on project X, I need Y hours of the programmers' time to work with their code effectively." It should either happen or bring up a "Why" conversation that should lead to changes.
If this is the culture of the organization, that's unfortunate. It may mean that the programmers producing the code don't care, and nor do any of the management. This is a bit of a political question-- who is most capable and/or interested in seeing this change? Find allies and proceed best you can. A candid conversation with the developers may be the best choice, as they are the people capable of change and no one else is going to induce them to-- so just ask outright.
Hope this helps.
Push for implementation of a formal code review process. Then they won't dare write code like that in the first place. I recommend using a collaborative tool like SmartBear's Code Collaborator or the free ReviewBoard.
Just like people drive slower when they know the cops are watching, they write better code if they know someone is going to be looking at it.
Are these 'other developers' no longer working on the project? And if so, are you the main person working on it? If the answer to both of these is "yes" the the project is yours. Start to make incremental improvements to make it more readable.
You might also like to show the code to a more experienced developer who didn't work on the project. See if they agree that the code is hard to maintain. Suggest to your boss that you set some time aside to 'finish off' the temporary work and bring it to a point where it is maintainable.
Implementing a formal code review process is also a good idea if you want to prevent this happening again.
And remember it may not have been the other developers fault. Sometimes people are told to spend the minimum amount of time, or are told that the code will be thrown away.

Removing tightly coupled code

Forgive me if this is a dupe but I couldn't find anything that hit this exact question.
I'm working with a legacy application that is very tightly coupled. We are pulling out some of the major functionality because we will be getting that functionality from an external service.
What is the best way to begin removing the now unused code? Should I just start at the extreme base, remove, and refactor my way up the stack? During lunch I'm going to go take a look at Working Effectively with Legacy Code.
If you can, and it makes sense in your problem domain, I would try to, during development, try and keep the legacy code functioning in parallel with the new API. And use the results from the legacy API to cross check that the new API is working as expected.
I think the most important thing you can do is to refactor/remove/test in very small chunks. It's tedious and time consuming but it will help limit risks and errors later on.
I would also start with code that is "low risk" to change.
My advice is to use findbugs and PMD/CPD (copy-paste-detector) to remove dead code (code that can not or will not be called) unused variables and duplicated code. Getting rid of this junk will make re-factoring easier.
Then learn the key mappings for the common re-factoring in your IDE. Extract method and introduce variable should be committed to muscle memory after an hour.
Use the primary disadvantage of tightly coupled code to... your advantage!
Step 1: Identify the area which provides the redundant functionality which you want to replace. Break it...do a quick smoke test of some of the critical parts of the application. Get the feel.
Step 2: Depending on what language it is find the relevant static-code analysis tools and get the needed refactoring info.
Step 3: Repeat Step 1 in incremental levels of narrowing down to the exact pattern.
All this of course, in a sandbox environment. This may seem a bit haphazard but if you limit yourself to critical functionality testing ... you may get many leads in the process. You will definitely identify the pattern of the legacy code, if nothing else.
You absolutely cannot do with with a live development version [new features being added]. You must start with a feature freeze.
I tend to look at all of the components of the system in an overview and see the biggest places of reuse. From there I would implement the appropriate design pattern to solve it, and make the new component reusable. Write test cases to ensure the new code works as expected, then refactor your code around the new change. Then repeat [overview, etc] till you are satisfied.
I would suggest this for many reasons:
Everyone working with you on refactoring will learn something
People learn how to avoid design mistakes down the road
Everyone working on it will get a better understanding of the code base

What can you do to a legacy codebase that will have the greatest impact on improving the quality?

As you work in a legacy codebase what will have the greatest impact over time that will improve the quality of the codebase?
Remove unused code
Remove duplicated code
Add unit tests to improve test coverage where coverage is low
Create consistent formatting across files
Update 3rd party software
Reduce warnings generated by static analysis tools (i.e.Findbugs)
The codebase has been written by many developers with varying levels of expertise over many years, with a lot of areas untested and some untestable without spending a significant time on writing tests.
Read Michael Feather's book "Working effectively with Legacy Code"
This is a GREAT book.
If you don't like that answer, then the best advice I can give would be:
First, stop making new legacy code[1]
[1]: Legacy code = code without unit tests and therefore an unknown
Changing legacy code without an automated test suite in place is dangerous and irresponsible. Without good unit test coverage, you can't possibly know what affect those changes will have. Feathers recommends a "stranglehold" approach where you isolate areas of code you need to change, write some basic tests to verify basic assumptions, make small changes backed by unit tests, and work out from there.
NOTE: I'm not saying you need to stop everything and spend weeks writing tests for everything. Quite the contrary, just test around the areas you need to test and work out from there.
Jimmy Bogard and Ray Houston did an interesting screen cast on a subject very similar to this:
http://www.lostechies.com/blogs/jimmy_bogard/archive/2008/05/06/pablotv-eliminating-static-dependencies-screencast.aspx
I work with a legacy 1M LOC application written and modified by about 50 programmers.
* Remove unused code
Almost useless... just ignore it. You wont get a big Return On Investment (ROI) from that one.
* Remove duplicated code
Actually, when I fix something I always search for duplicate. If I found some I put a generic function or comment all code occurrence for duplication (sometime, the effort for putting a generic function doesn't worth it). The main idea, is that I hate doing the same action more than once. Another reason is because there's always someone (could be me) that forget to check for other occurrence...
* Add unit tests to improve test coverage where coverage is low
Automated unit tests is wonderful... but if you have a big backlog, the task itself is hard to promote unless you have stability issue. Go with the part you are working on and hope that in a few year you have decent coverage.
* Create consistent formatting across files
IMO the difference in formatting is part of the legacy. It give you an hint about who or when the code was written. This can gave you some clue about how to behave in that part of the code. Doing the job of reformatting, isn't fun and it doesn't give any value for your customer.
* Update 3rd party software
Do it only if there's new really nice feature's or the version you have is not supported by the new operating system.
* Reduce warnings generated by static analysis tools
It can worth it. Sometime warning can hide a potential bug.
I'd say 'remove duplicated code' pretty much means you have to pull code out and abstract it so it can be used in multiple places - this, in theory, makes bugs easier to fix because you only have to fix one piece of code, as opposed to many pieces of code, to fix a bug in it.
Add unit tests to improve test coverage. Having good test coverage will allow you to refactor and improve functionality without fear.
There is a good book on this written by the author of CPPUnit, Working Effectively with Legacy Code.
Adding tests to legacy code is certianly more challenging than creating them from scratch. The most useful concept I've taken away from the book is the notion of "seams", which Feathers defines as
"a place where you can alter behavior in your program without editing in that place."
Sometimes its worth refactoring to create seams that will make future testing easier (or possible in the first place.) The google testing blog has several interesting posts on the subject, mostly revolving around the process of Dependency Injection.
I can relate to this question as I currently have in my lap one of 'those' old school codebase. Its not really legacy but its certainly not followed the trend of the years.
I'll tell you the things I would love to fix in it as they bug me every day:
Document the input and output variables
Refactor the variable names so they actually mean something other and some hungarian notation prefix followed by an acronym of three letters with some obscure meaning. CammelCase is the way to go.
I'm scared to death of changing any code as it will affect hundreds of clients that use the software and someone WILL notice even the most obscure side effect. Any repeatable regression tests would be a blessing since there are zero now.
The rest is really peanuts. These are the main problems with a legacy codebase, they really eat up tons of time.
I'd say it largely depends on what you want to do with the legacy code...
If it will indefinitely remain in maintenance mode and it's working fine, doing nothing at all is your best bet. "If it ain't broke, don't fix it."
If it's not working fine, removing the unused code and refactoring the duplicate code will make debugging a lot easier. However, I would only make these changes on the erring code.
If you plan on version 2.0, add unit tests and clean up the code you will bring forward
Good documentation. As someone who has to maintain and extend legacy code, that is the number one problem. It's difficult, if not downright dangerous to change code you don't understand. Even if you're lucky enough to be handed documented code, how sure are you that the documentation is right? That it covers all of the implicit knowledge of the original author? That it speaks to all of the "tricks" and edge cases?
Good documentation is what allows those other than the original author to understand, fix, and extend even bad code. I'll take hacked yet well-documented code that I can understand over perfect yet inscrutable code any day of the week.
The single biggest thing that I've done to the legacy code that I have to work with is to build a real API around it. It's a 1970's style COBOL API that I've built a .NET object model around, so that all the unsafe code is in one place, all of the translation between the API's native data types and .NET data types is in one place, the primary methods return and accept DataSets, and so on.
This was immensely difficult to do right, and there are still some defects in it that I know about. It's not terrifically efficient either, with all the marshalling that goes on. But on the other hand, I can build a DataGridView that round-trips data to a 15-year-old application which persists its data in Btrieve (!) in about half an hour, and it works. When customers come to me with projects, my estimates are in days and weeks rather than months and years.
As a parallel to what Josh Segall said, I would say comment the hell out of it. I've worked on several very large legacy systems that got dumped in my lap, and I found the biggest problem was keeping track of what I already learned about a particular section of code. Once I started placing notes as I go, including "To Do" notes, I stopped re-figuring out what I already figured out. Then I could focus on how those code segments flow and interact.
I would say just leave it alone for the most part. If it's not broken then don't fix it. If it is broken then go ahead and fix and improve the portion of the code that is broken and its immediately surrounding code. You can use the pain of the bug or sorely missing feature to justify the effort and expense of improving that part.
I would not recommend any wholesale kind of rewrite, refactor, reformat, or putting in of unit tests that is not guided by actual business or end-user need.
If you do get the opportunity to fix something, then do it right (the chance of doing it right the first time might have already passed, but since you are touching that part again might as well do it right time around) and this includes all the items you mentioned.
So in summary, there's no single or just a few things that you should do. You should do it all but in small portions and in an opportunistic manner.
Late to the party, but the following may be worth doing where a function/method is used or referenced often:
Local variables often tend to be poorly named in legacy code (often owing to their scope expanding when a method is modified, and not being updated to reflect this). Renaming these in line with their actual purpose can help clarify legacy code.
Even just laying out the method slightly differently can work wonders - for instance, putting all the clauses of an if on one line.
There might be stale/confusing code comments there already. Remove them if they're not needed, or amend them if you absolutely have to. (Of course, I'm not advocating removal of useful comments, just those that are a hindrance.)
These might not have the massive headline impact you're looking for, but they are low risk, particularly if the code can't be unit tested.

Resources