How much duplicated code do you tolerate? [closed] - refactoring

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
In a recent code review I spotted a few lines of duplicated logic in a class (less than 15 lines). When I suggested that the author refactor the code, he argued that the code is simpler to understand that way. After reading the code again, I have to agree extracting the duplicated logic would hurt readability a little.
I know DRY is guideline, not an absolute rule. But in general, are you willing to hurt readability in the name of DRY?

Refactoring: Improving the Design of Existing Code
The Rule of Three
The first time you do something, you
just do it. The second time you do
something similar, you wince at the duplication, but you do the duplicate
thing anyway. The third time you do something similar, you refactor.
Three strikes and you refactor.
Coders at Work
Seibel: So for each of these XII calls you're writing an
implementation.
Did you ever find that you were accumulating lots of
bits of very similar code?
Zawinski: Oh, yeah, definitely. Usually by the second or third time
you've cut and pasted
that piece of code it's like, alright, time to stop
cutting and pasting and put it in a
subroutine.

I tolerate none. I may end up having some due to time constraints or whatnot. But I still haven't found a case where duplicated code is really warranted.
Saying that it'll hurt readability only suggests that you are bad at picking names :-)

Personally, I prefer keeping code understandable, first and foremost.
DRY is about easing the maintenance in code. Making your code less understandable in order to remove repeated code hurts the maintainability more, in many cases, than having some repeated lines of code.
That being said, I do agree that DRY is a good goal to follow, when practical.

If the code in question has a clear business or technology-support purpose P, you should generally refactor it. Otherwise you'll have the classic problem with cloned code: eventually you'll discover a need to modify code supporting P, and you won't find all the clones that implement it.
Some folks suggest 3 or more copies is the threshold for refactoring. I believe that if you have two, you should do so; finding the other clone(s) [or even knowing they might exist] in a big system is hard, whether you have two or three or more.
Now this answer is provided in the context of not having any tools for finding the clones. If you can reliably find clones, then the original reason to refactor (avoiding maintenance errors) is less persausive (the utility of having a named abstraction is still real). What you really want is a way to find and track clones; abstracting them is one way to ensure you can "find" them (by making finding trivial).
A tool that can find clones reliably can at least prevent you from making failure-to-update-clone maintenance errors. One such tool (I'm the author) is the CloneDR. CloneDR finds clones using the targeted langauge structure as guidance, and thus finds clones regardless of whitespace layout, changes in comments, renamed variables, etc. (It is implemented for a number a languages including C, C++, Java, C#, COBOL and PHP). CloneDR will find clones across large systems, without being given any guidance. Detected clones are shown, as well as the antiunifier, which is essentially the abstraction you might have written instead. Versions of it (for COBOL) now integrate with Eclipse, and show you when you are editing inside a clone in a buffer, as well as where the other clones are, so that you may inspect/revise the others while you are there. (One thing you might do is refactor them :).
I used to think cloning was just outright wrong, but people do it because they don't know how the clone will vary from the original and so the final abstraction isn't clear at the moment the cloning act is occurring. Now I believe that cloning is good, if you can track the clones and you attempt to refactor after the abstraction becomes clear.

As soon as you repeat anything you're creating multiple places to have make edits if you find that you've made a mistake, need to extend it, edit, delete or any other of the dozens of other reasons you might come up against that force a change.
In most languages, extracting a block to a suitably named method can rarely hurt your readability.
It is your code, with your standards, but my basic answer to your "how much?" is none ...

you didn't say what language but in most IDEs it is a simple Refactor -> Extract Method. How much easier is that, and a single method with some arguments is much more maintainable than 2 blocks of duplicate code.

Very difficult to say in abstract. But my own belief is that even one line of duplicated code should be made into a function. Of course, I don't always achieve this high standard myself.

Refactoring can be difficult, and this depends on the language. All languages have limitations, and sometimes a refactored version of duplicated logic can be linguistically more complex than the repeated code.
Often duplications of code LOGIC occur when two objects, with different base classes, have similarities in the way they operate. For example 2 GUI components that both display values, but don't implement a common interface for accessing that value. Refactoring this kind of system either requires methods taking more generic objects than needed, followed by typechecking and casting, or else the class hierarchy needs to be rethought & restructured.
This situation is different than if the code was exactly duplicated. I would not necessarily create a new interface class if I only intended it to be used twice, and both times within the same function.

The point of DRY is maintainability. If code is harder to understand it's harder to maintain, so if refactoring hurts readability you may actually be failing to meet DRY's goal. For less than 15 lines of code, I'd be inclined to agree with your classmate.

In general, no. Not for readability anyway. There is always some way to refactor the duplicated code into an intention revealing common method that reads like a book, IMO.
If you want to make an argument for violating DRY in order to avoid introducing dependencies, that might carry more weight, and you can get Ayende's opinionated opinion along with code to illustrate the point here.
Unless your dev is actually Ayende though I would hold tight to DRY and get the readability through intention revealing methods.
BH

I accept NO duplicate code. If something is used in more than one place, it will be part of the framework or at least a utility library.
The best line of code is a line of code not written.

It really depends on many factors, how much the code is used, readability, etc. In this case, if there is just one copy of the code and it is easier to read this way then maybe it is fine. But if you need to use the same code in a third place I would seriously consider refactoring it into a common function.

Readability is one of the most important things code can have, and I'm unwilling to compromise on it. Duplicated code is a bad smell, not a mortal sin.
That being said, there are issues here.
If this code is supposed to be the same, rather than is coincidentally the same, there's a maintainability risk. I'd have comments in each place pointing to the other, and if it needed to be in a third place I'd refactor it out. (I actually do have code like this, in two different programs that don't share appropriate code files, so comments in each program point to the other.)
You haven't said if the lines make a coherent whole, performing some function you can easily describe. If they do, refactor them out. This is unlikely to be the case, since you agree that the code is more readable embedded in two places. However, you could look for a larger or smaller similarity, and perhaps factor out a function to simplify the code. Just because a dozen lines of code are repeated doesn't mean a function should consist of that dozen lines and no more.

Related

How do you decide which parts of the code shall be consolidated/refactored next?

Do you use any metrics to make a decision which parts of the code (classes, modules, libraries) shall be consolidated or refactored next?
I don't use any metrics which can be calculated automatically.
I use code smells and similar heuristics to detect bad code, and then I'll fix it as soon as I have noticed it. I don't have any checklist for looking problems - mostly it's a gut feeling that "this code looks messy" and then reasoning that why it is messy and figuring out a solution. Simple refactorings like giving a more descriptive name to a variable or extracting a method take only a few seconds. More intensive refactorings, such as extracting a class, might take up to a an hour or two (in which case I might leave a TODO comment and refactor it later).
One important heuristic that I use is Single Responsibility Principle. It makes the classes nicely cohesive. In some cases I use the size of the class in lines of code as a heuristic for looking more carefully, whether a class has multiple responsibilities. In my current project I've noticed that when writing Java, most of the classes will be less than 100 lines long, and often when the size approaches 200 lines, the class does many unrelated things and it is possible to split it up, so as to get more focused cohesive classes.
Each time I need to add new functionality I search for already existing code that does something similar. Once I find such code I think of refactoring it to solve both the original task and the new one. Surely I don't decide to refactor each time - most often I reuse the code as it is.
I generally only refactor "on-demand", i.e. if I see a concrete, immediate problem with the code.
Often when I need to implement a new feature or fix a bug, I find that the current structure of the code makes this difficult, such as:
too many places to change because of copy&paste
unsuitable data structures
things hardcoded that need to change
methods/classes too big to understand
Then I will refactor.
I sometimes see code that seems problematic and which I'd like to change, but I resist the urge if the area is not currently being worked on.
I see refactoring as a balance between future-proofing the code, and doing things which do not really generate any immediate value. Therefore I would not normally refactor unless I see a concrete need.
I'd like to hear about experiences from people who refactor as a matter of routine. How do you stop yourself from polishing so much you lose time for important features?
We use Cyclomatic_complexity to identify the code that needs to be refactored next.
I use Source Monitor and routinely refactor methods when the complexity metric goes aboove around 8.0.

Code reuse and refactoring [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
What's best practice for reuse of code versus copy/paste?
The problem with reuse can be that changing the reused code will affect many other pieces of functionality.
This is good & bad : good if the change is a bugfix or useful enhancement. Bad if other reusing code unexpectedly becomes broken because it relied on the old version (or the new version has a bug).
In some cases it would seem that copy/paste is better - each user of the pasted code has a private copy which it can customize without consequences.
Is there a best practice for this problem; does reuse require watertight unit tests?
Every line of code has a cost.
Studies show that the cost is not linear with the number of lines of code, it's exponential.
Copy/paste programming is the most expensive way to reuse software.
"does reuse require watertight unit tests?"
No.
All code requires adequate unit tests. All code is a candidate for reuse.
It seems to me that a piece of code that is used in multiple places that has the potential to change for one place and not for another place isn't following proper rules of scope. If the "same" method/class is needed by two different things to do two different functions, then that method/class should be split up.
Don't copy/paste. If it does turn out that you need to modify the code for one place, then you can extend it, possibly through inheritance, overloading, or if you must, copying and pasting. But don't start out by copy-pasting similar segments.
Using copy and paste is almost always a bad idea. As you said, you can have tests to check in case you break something.
The point is, when you call a method, you shouldn't really care about how it works, but about what it does. If you change the method, changing what it does, then it should be a new method, or you should check wherever this method is called.
On the other side, if the change doesn't modify WHAT the method does (only how), then you shouldn't have a problem elsewhere. If you do, you've done something wrong...
One very appropriate use of copy and paste is Triangulation. Write code for one case, see a second application that has some variation, copy & paste into the new context - but you're not done. It's if you stop at that point that you get into trouble. Having this code duplicated, perhaps with minor variation, exposes some common functionality that your code needs. Once it's in both places, tested, and working in both places, you should extract that commonality into a single place, call it from the two original places, and (of course) re-test.
If you have concerns that code which is called from multiple places is introducing risk of fragility, your functions are probably not fine-grained enough. Excessively coarse-grained functions, functions that do too much, are hard to reuse, hard to name, hard to debug. Find the atomic bits of functionality, name them, and reuse them.
So the consumer (reuser) code is dependent on the reused code, that's right.
You have to manage this dependency.
It is true for binary reuse (eg. a dll) and code reuse (eg. a script library) as well.
Consumer should depend on a certain (known) version of the reused code/binary.
Consumer should keep a copy of the reused code/binary, but never directly modify it, only update to a newer version when it is safe.
Think carefully when you modify resused codebase. Branch for breaking changes.
If a Consumer wants to update the reused code/binary then it first has to test to see if it's safe. If tests fail then Consumer can alway fall back to the last known (and kept) good version.
So you can benefit from reuse (eg. you have to fix a bug in one place), and still you're in control of changes. But nothing saves you from testing whenever you update the reused code/binary.
Is there a best practice for this
problem; does reuse require watertight
unit tests?
Yes and sort of yes. Rewriting code you have already did right once is never a good idea. If you never reuse code and just rewrite it you are doubling you bug surface. As with many best practice type questions Code Complete changed the way I do my work. Yes unit test to the best of your ability, yes reuse code and get a copy of Code Complete and you will be all set.
Copy and pasting is never good practice. Sometimes it might seem better as a short-term fix in a pretty poor codebase, but in a well designed codebase you will have the following affording easy re-use:
encapsulation
well defined interfaces
loose-coupling between objects (few dependencies)
If your codebase exhibits these properties, copy and pasting will never look like the better option. And as S Lott says, there is a huge cost to unnecessarily increasing the size of your codebase.
Copy/Paste leads to divergent functionality. The code may start out the same but over time, changes in one copy don't get reflected in all the other copies where it should.
Also, copy/paste may seem "OK" in very simple cases but it also starts putting programmers into a mindset where copy/paste is fine. That's the "slippery slope". Programmers start using copy/paste when refactoring should be the right approach. You always have to be careful about setting precedent and what signals that sends to future developers.
There's even a quote about this from someone with more experience than I,
"If you use copy and paste while you're coding, you're probably committing a design error."-- David Parnas
You should be writing unit tests, and while yes, having cloned code can in some sense give you the sense of security that your change isn't effecting a large number of other routines, it is probably a false sense of security. Basically, your sense of security comes from an ignorance of knowing how the code is used. (ignorance here isn't a pejorative, just comes from as a result of not being able to know everything about the codebase) Get used to using your IDE to learn where the code is being use, and get used to reading code to know how it is being used.
Where you write:
The problem with reuse can be that
changing the reused code will affect
many other pieces of functionality.
... In some cases it would seem that
copy/paste is better - each user of
the pasted code has a private copy
which it can customize without
consequences.
I think you've reversed the concerns related to copy-paste. If you copy code to 10 places and then need to make a slight modification to behavior, will you remember to change it in all 10 places?
I've worked on an unfortunately large number of big, sloppy codebases and generally what you'll see is the results of this - 20 versions of the same 4 lines of code. Some (usually small) subset of them have 1 minor change, some other small (and only partially intersecting subset) have some other minor change, not because the variations are correct but because the code was copied and pasted 20 times and changes were applied almost, but not quite consistently.
When it gets to that point it's nearly impossible to tell which of those variations are there for a reason and which are there because of a mistake (and since it's more often a mistake of omission - forgetting to apply a patch rather than altering something - there's not likely to be any evidence or comments).
If you need different functionality call a different function. If you need the same functionality, please avoid copy paste for the sanity of those who will follow you.
There are metrics that can be used to measure your code, and it's up to yo (or your development team) to decide on an adequate threshold. Ruby on Rails has the "Metric-Fu" Gem, which incorporates many tools that can help you refactor your code and keep it in tip top shape.
I'm not sure what tools are available for other laguages, but I believe there is one for .NET.
In general, copy and paste is a bad idea. However, like any rule, this has exceptions. Since the exceptions are less well-known than the rule I'll highlight what IMHO are some important ones:
You have a very simple design for something that you do not want to make more complicated with design patterns and OO stuff. You have two or three cases that vary in about a zillion subtle ways, i.e. a line here, a line there. You know from the nature of the problem that you won't likely ever have more than 2 or 3 cases. Sometimes it can be the lesser of two evils to just cut and paste than to engineer the hell out of the thing to solve a relatively simple problem like this. Code volume has its costs, but so does conceptual complexity.
You have some code that's very similar for now, but the project is rapidly evolving and you anticipate that the two instances will diverge significantly over time, to the point where trying to even identify reasonably large, factorable chunks of functionality that will stay common, let alone refactor these into reusable components, would be more trouble than it's worth. This applies when you believe that the probability of a divergent change to one instance is much greater than that of a change to common functionality.

Are you concerned about the aesthetics of your code? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
I don't know if I'm the only person in the world which gets a bad feeling in my stomach if my code isn't "pretty". For example if I get a assignment that another person has been doing before me. I can't help it to clean the code and make it look "pretty". I don't know if it's some kind of OCD.
It's like I see the code as some kind of art that has be perfect in my own code convention to look good. I don't know if you understand what I'm trying to explain here.
But are you like me, trying always to make my code look good in a aesthetical point of view even though it won't make the code better?
Yes, I care about code aesthetics.. Code that is aestheticly pleasing is easy to read and therefore easy to understand.
No, I stopped trying anymore. You can't defeat an army of code monkeys.
Only with my personal project I'm aspired to make it perfect.
I think Robert Martin described it best in his Book Clean Code:A Handbook of Agile
Software Craftsmanship
It’s not enough to write the code
well. The code has to be kept clean
over time. We’ve all seen code rot and
degrade as time passes. So we must
take an active role in preventing this
degradation.
The Boy Scouts of America have a
simple rule that we can apply to our
profession.
Leave the campground cleaner than you
found it.
If we all checked-in our code a little
cleaner than when we checked it out,
the code simply could not rot. The
cleanup doesn’t have to be something
big. Change one variable name for the
better, break up one function that’s a
little too large, eliminate one small
bit of duplication, clean up one
composite if statement.
Can you
imagine working on a project where the
code simply got better as time passed?
Do you believe that any other option
is professional? Indeed, isn’t
continuous improvement an intrinsic
part of professionalism?
If you mean identation, I think it is essential.
If you mean readable (which for me is different from aesthetically pretty), it is also essential.
If you want what's written to look like flowers and birds flying, then no. I'm not concerned. :P
I hate that my collegues always write one letter variables, short named methods that start with underscores and generally ugly code. It seems to be the standard practice around these parts.
I always make my code look good. It's a visual representation of who I am, so I have to maintain it nice and neat, and properly indented.
I'm not so much concerned with whether or not it looks nice as much as with how readable it is. It just so happens that "prettier" code is usually easier to read and maintain.
Formatting code is one way (and possibly the most bang for your buck way at that) to make your code readable. Being confronted with readable code makes stepping through your program easier (whether in a debugger or code review). The same goes for sensible variable names and thinking about variable scope.
If, however, you're spending all of your time changing some perfectly acceptable notation for fields, locals, pointers etc. into some very personal Ancide-notation, then I'd be inclined to say that isn't really necessary.
I too find myself in such a position. Since clean code is easy to read and maintain, I always try to clean up and style my code.
I do that as well. I find that making the code look good makes it easier to read and understand.
Yes, I like to make the code look better, because it makes easier to maintain and it looks like people are concerned on making a good system.
When the code looks ugly, you don't feel yourself motivated to keep it cool.
And I feel i'm so concerned that i think my co-workers hate me =P
I make very good use of the build-in code formatter within Visual Studio. In Delphi, I even use an add-in that allows me to format my Delphi code. I also try to keep each source file below the 1000 lines of code, although I'm not worried if some files are becoming longer. I use descriptive variable names and occasionally add some additional comments when I suspect that the code (and names for fields, classes and parameters) isn't clear enough for the next one reading my code.
The result is very rewarding since I once had to maintain a piece of code that I wrote 5 years earlier. It's readability made my own pieces of code in the project still very readable. Others have been more careless, though. It gave me an easy trick to recognize my own code from the garbage that was added by some inexperienced semi-programmer/manager who was only capable of writing macro's in Word and Excel...
"Pretty" and "code aesthetics" are sort of proxy words - those terms sound trivial, but (at least to me) really mean "clearly and logically expressed ideas". Clearly and Logically expressed ideas matter.
Tidy code is more maintainable. Your brain is able to do amazing automated pattern matching on code, so you will often find that you spot bugs and problems in code just because it is the wrong "shape". I find tidiness so important I wrote a VS addin (AtomineerUtils) for adding and formatting doc comments to minimise the work I need to go to in order to keep my code tidy.
Of course, that's no reason to reformat someone else's code - you'll only upset other programmers if you change their code to your style for aesthetic reasons, not to mention you're spending a lot of time that could be put into new code, and every line of code you change is another potential bug that needs to be re-tested. So try to stop yourself going "too far".
I wouldn't go so far as to make things look aesthetically good purely for the aesthetic value, but I do think it's really important to write code that's readable and easily understood at a glance. Especially when writing things like XML/HTML, things like proper nesting and indentation can really make it easy to quickly get a sense of the structure and allow you to spend your time zeroing in on the areas that you care about. A short, well-organized method that's easy to read visually will save time and energy vs. something that takes ten minutes to understand.
Yeah, I have to have the code indented with spaces and tab 4 spaces wide and if it is C/C++/Java code put curly brace in its own line, Emacs macros do the rest :-)
Yes, I do. And because "you can't [indeed] fight an army of monkey" (if I may borrow this from one answer), I tend to try making this less painful and to automate what can be automated, e.g. performing cosmetic checks during the build (that will break if necessary). Another option would be to format code automatically on commit but I prefer the first one.
PS: I'm using Jalopy and Maven for this when doing Java.
Define "aesthetics." I think it means different things to different people.
The absolute most important thing to me about any code that I write (despite hasty code samples posted here) is that it works as intended. Once it works as intended, then, and only then, do I worry about the aesthetics.
Aesthetics are subjective. I may spend labor to make my code a work of art in my eyes, and someone else may come behind me and labor to change it to conform to their sense of what constitutes "beautiful code." After all, do you include design patterns, coding standards, naming conventions, and who-knows-what-else in that? Or is it a simple matter of indentation, curly brace alignment, type-alignment in variable declaration, and so forth?
No two developers will completely agree on what constitutes aesthetically pleasing code. That's not to say that you shouldn't strive to create it; but it should not be your number one priority. Writing working, maintainable code should be your number one priority. If it happens to be aesthetically pleasing as a result of that, so be it.
So your the guy making merging a complete nightmare? Undoing all the formatting that is aesthetically pleasing to me, the writer and primary maintainer of that code you just checked in?
Yes, I am shamelessly trying to acquire StackOverflow karma with silly questions.

More comments in code or just simple, readable, maintainable code suffices? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Sometimes its really difficult to decide on when exactly you have written enough comments for someone to understand your intentions.
I think one needs to just focus more on writing readable, easy to understand code than on including a large number of lines of comments explaining every detail of whats happening.
What are your views about this?
Comments aren't there to explain what you're doing. They're there to explain why you're doing it.
The argument is based on a false dilemma: Either your code is a horrible abomination and you write tons of comments to explain every statement and expression, or your code is beautiful poetry that can be understood by your grandmother with no documentation at all.
In reality, you should strive for the latter (well, maybe not your grandmother but other developers), but realize that there are times when a couple of comments will clear up an ambiguity or make the next ten lines of code so much more plain. People who advocate no comments at all are extremists.
Of course, gratuitous comments should be avoided. No amount of comments will help bad code be more understandable. They probably just make it worse. But unless you're only coding trivial systems, there will be times when comments will clarify the design decisions being made.
This can be helpful when catching bugs. Literate code can look perfectly legitimate while being completely wrong. Without the comments, others (or you six months later) have to guess about your intent: Did you mean to do that, or was it an accident? Is this the bug, or is it somewhere else? Maybe I should refer to the design documentation... Comments are inline documentation, visible right where you need it.
Properly deciding when the need for comments actually exists is the key.
Try to make the code self-explaining. One of the most important things is to use meaningful names for classes, functions, variables etc.
Comment the sections that aren't self-explaining. Trivial commenting (e.g. i++; // Add 1 to i) makes the code harder to read.
By the way - the closer to pseudocode you can work, the more self-explaining your code can become. This is a privilege of high-level languages; it's hard to make self-explaining assembly code.
Not all code is self-documenting.
I'm in the process of troubleshooting a performance issue now. The developer thought he discovered the source of the bottleneck; a block of code that was going to sleep for some reason. There were no comments around this code, no context as to why it was there. We removed the block and re-tested. Now, the app is failing under load where it wasn't before.
My guess is someone had previously run into a performance issue and put this code in to mitigate the problem. Whether or not that was the right solution is one thing, but a few comments about why this code is there would now be saving us a world of pain and a whole lot of time...
Why you need comments. The name of the method should be clear enough that you don't need comments.
Ex:
// This method is used to retrieve information about contact
public getContact()
{
}
In this case getContact doesn't need the comments
Aim for code that needs no comments, but don't beat yourself up too much if you miss.
I think commenting enough so that you could understand it if you had to review your code later in life should be sufficient.
I think there would a lot of time wasted if you commented for everyone; and going this route could make your code even harder to understand.
I agree that writing readable code is probably the most important part, but don't leave out comments. Take the extra time.
Readable code should be the number 1 priority. Comments are, as Paul Tomblin already wrote, to focus on the why part.
I try to avoid commenting as much as possible. Code should be self explanatory. Name variables and methods properly. Break large code blocks in methods which have a good name. Write methods that do one thing, the thing you named them for.
If you need to write a comment. Make it short. I often have the feeling that if you need to elaborate long on why this code block does this and that you already have a problem with the design.
Only comment when it adds something.
Something like this is useless and definitely decreases readability:
/// <summary>Handles the "event" event</summary>
/// <param name="sender">Event sender</param>
/// <param name="e">Event arguments</param>
protected void Event_Handler (object sender, EventArgs e)
{
}
Basically, putting aside a good but possibly brief comment at the beginning of a class/method/function declaration, and - if necessary - an introductory comment at the beginning of the file, a comment would be useful when a not-so-common or not-so-clearly-transparent operation is coded.
So, for example, you should avoid commenting what's obvious (i++; on a previous example), but what you know is less obvious and/or more tricky should deserve some clear, unconfusing, brilliant, complete line of comment, which naturally comes along with a Nobel prize for the clearest code in history ;).
And don't underestimate the fact that a comment should be also funny; programmers read much more gladly if you can intellectually tease them.
So, as a general principle tend to not be overwhelming with comments, but when you have to write one, be sure about it to be the clearest comment you could write down.
And personally I'm not a big fan of self-documenting code (a.k.a. code w/o a single damn slashstar): after months you've written it (it's just days for my personal scale) it's very likely you couldn't tell the true reason for choosing such design to represent that piece of your intelligence, so how could others?
Comments are not just that green stuff among code lines; they are the part of code which your brain is better willing to compile. Qualifying as braincode (laughing) I couldn't affirm comments are not part of the program you're writing. They're just the part of it which is not directed to the CPU.
Normally, I'm a fan of documentation comments that clearly spell out the intent of the code you're writing. Spiffy tools like NDoc and Sandcastle provide a nice, consistent way in which to write that documentation.
However, I've noticed a few things over the years.
Most documentation comments don't really tell me anything I can't really glean from the code. That assumes, of course, that I can make heads or tails out of the source code to begin with.
Comments are supposed to be used to document intent, not behavior. Unfortunately, in the vast majority of cases, this isn't how they're used. Tools like NDoc and Sandcastle only propagate the incorrect use of comments by providing a plethora of tags that encourage you to provide comments that tell the reader things that he should be able to discern from the code itself.
Over time, the comments tend to fall out of synch with the code. This tends to be true regardless of whether or not we're using documentation software, which purports to make documentation easier because it puts the documentation closer to the code it describes. Even though the documentation is right there next to the method, property, event, class, or other type, developers still have a hard time remembering to update it if and when the intrinsic behavior changes. Consequently, the documentation loses its value.
It's worth noting that these problems are, by and large, due to the misuse of comments. If comments are used solely as a means of conveying intent, these issues go the way of the dodo, since the intent of any given type or its members is unlikely to change over time. (If it does, a better plan is to write a new member and deprecate the old one with a reference to the new one.)
Comments can have immense value if they are used properly. But that means knowing what they are best used for, and constraining their use to that scope. If you fail to do that, what you end up with is a plethora of comments that are incorrect, misleading, and a source of busywork (at increased cost) since you now have to either remove them or somehow get them corrected.
It's worth it to have a strategy for using comments in a meaningful way that prevents them from becoming a time, energy, and money sink.
Studies have stated that optimal readability happens when you have about 1 line of comments for 10 lines of code. Of course, that's not to say that you need to keep your ration at 1/10 and panic if you go over. But it's a good way to give you an idea of how much you should be commenting.
Also remember that comments are a code smell. That is to say that they may be indicative of bad code but aren't necessarily so. The reason for this is that code that is more difficult to understand is commented more.

When does a code base become large and unwieldy? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
When do you start to consider a code base to be getting too large and unwieldy?
-when a significant amount of your coding time is devoted to "where do I put this code?"
-when reasoning about side-effects starts to become really hard.
-when there's a significant amount of code that's just "in there", and nobody knows what it does or if it's still running but it's too scary to remove
-when lots of team members spend significant chunks of their time chasing down intermittent bugs caused by some empty string somewhere in the data where it wasn't expected, or something that you think would usually be caught in a well-written application, in some edge case
-when, in considering how to implement a new feature, "complete rewrite" starts to seem like a good answer
-when you dread looking at the mess of code you need to maintain and wish you could find work building something clean and logical instead of dumpster diving through the detritus of someone else's poorly organized thinking
When it's over 100 lines. Joke. This is probably the hardest question to answer, because it's very individual.
But if you structure the application well and use different layers for i.e. interfaces, data, services and front-end you will automaticly get a nice "base"-structure. Then you can dividie each layer into different classes and then inside the classes you point out the appropriet methods for the class.
However, there's not an "x amount of lines per method is bad" but think of it more like this, if there is possibility of replication, split it from the current peice and make it re-usable.
Re-using code is the basics of all good structure.
And splitting up into different layers will help the base to become more and more flexible and modular.
There exist some calculable metrics if that's what you're searching for. Static code analysis tools can help with that:
Here's one list: http://checkstyle.sourceforge.net/config_metrics.html
Other factors can be the time it takes to change/add something.
Other non-calculable factors can be
the risk associated to changes
the level intermingling of features.
if the documentation can keep up with the features / code
if the documentation represent the application.
the level of training needed.
the quantity of repeat instead of reuse.
Ah, the god-program anti-pattern.
When you can't remember at least the
outline of sections of it.
When you have to think about how
changes will affect itself or
dependencies.
When you can't remember all the
things it's dependant on or depend
on it.
When it takes more than a few
minutes(?) to download the source or
compile.
When you have to worry about how to
deploy new versions.
When you encounter classes which are
functionally identical to other
classes elsewhere in the app.
So many possible signs.
I think there are many thoughts to why some code base is too large.
It is hard to remain in a constant naming convention. If classes/methods/atributes can't be named consistently or can't be found consistently, then it's time to reorganize.
When your programmers are surfing the web and going to lunch in order to compile. Keeping compiling/linking time to a minimum is important for management. The last thing you want is a programmer to get distracted by twiddling their thumbs for too long.
When small changes start to affect many MANY other places of code. There is a benefit to consolidation of code, but there is also a cost. If a small change to fix one bug causes a dozen more, and this is commonly happens, then your code base needs to be spread out (versioned libraries) or possibly unconsolidated (yes, duplicate code).
If the learning curve of new programmers to the project is obviously longer than acceptable (usually 90 days), then your code base/training isn't set up right.
..There are many many more, I'm sure. If you think about it from these three perspectives:
Is it hard to support?
Is it hard to change?
Is it hard to learn?
...Then you will have an idea if your code fits the "large and unwieldy" category
For me, code becomes unwieldy when there's been a lot of changes made to the codebase that weren't planned for when the program was initially written or last refactored significantly. At this point, stuff starts to get fitted into the existing codebase in odd places for expediency and you start to get a lot of design artifacts that only make sense if you know the history of the implementation.
Short answer: it depends on the project.
Long answer:
A codebase doesn't have to be large to be unwieldy - spaghetti code can be written from line 1. So, there's not really a magic tripping point from good to bad - it's more of a spectrum of great <---> awful, and it takes daily effort to keep your codebase from heading in the wrong direction. What you generally need is a lead developer that has the ability to review others' code objectively, and keep an eye on the architecture and design of the code as a whole - no one line developer can do that.
When I can't remember what a class does or what other classes it uses off the top of my head. It's really more a function of my cognitive capacity coupled with the code complexity.
I was trying to think of a way of deciding based on how your collegues perceive it to be.
During my first week at a gig a few years ago, I said during a stand-up that I had been tracking a white rabbit around the ContainerManagerBean, the ContainerManagementBean and the ContextManagerBean (it makes me shudder just recalling these words!). At least two of the developers looked at their shoes and I could see them keeping in a snigger.
Right then and there, I knew that this was not a problem with my lack of familiarity with the codebase - all the developers perceived a problem with it.
If over years of development different people code change requests and bug fixes you will sooner or later get parts of code with duplicated functionality, very similar classes, some spaghetti etc.
This is mostly due to the fact that a fix is needed fast and the "new guy" doesn't know the code base. So he happily codes away something which is already there.
But if you have automatic checks in place checking the style, unit test code coverage and similar you can avoid some of it.
A lot of the things that people have identified as indicating problems don't really have to do with the raw size of the codebase, but rather its comprehensibility. How does size relate to comprehensibility? If at all...
I've seen very short programs that are just a mess -- easier to throw away and redo from scratch. I've also seen very large programs whose structure is transparent enough that it is comprehensible even at progressively more detailed views of it. And everything in between...
I think look at this question from the standpoint of an entire codebase is a good one, but it probably pays to work up from the bottom and look first at the comprehensibility of individual classes, to multi-class components, to subsystems, and finally up to an entire system. I would expect the answers at each level of detail to build on each other.
For my money, the simplest benchmark is this: Can you explain the essence of what X does in one sentence? Where X is some granularity of component, and you can assume an understanding of the levels immediately above and below the component.
When you come to need a utility method or class, and have no idea whether someone else has already implemented it or have any idea where to look for one.
Related: when several slightly different implementations of the same functionality exist, because each author was unaware of other authors' work.

Resources