Should private and protected variables, methods, and classes be commented? - coding-style

When creating a private or protected variable, method, class, etc., should it be given a documentation comment?

Yes! The comments are there to help any developer, yourself included, when reviewing, maintaining or extending the code in the future. Whether something is public or private shouldn't be a factor; quite simply, if you think something isn't clear enough without a comment, add one.
(Of course, the best documentation is clear, self-documenting code in the first place.)

Some people will no doubt tell you that nothing needs to be commented (and technically they are right, in that comments have no effect on output). However, it's a matter of coding style, as you tagged it. I personally always comment all variables in addition to giving them descriptive names. Remember that other people may want to work with your source, or you might want to in a year's time, in which case it's worth the few seconds to document it while you still know what it does.

Definitely yes. For example, when you find a bug in your code three months later, comments make it easier to recall what the code was supposed to do.

Commenting individual variables is occasionally helpful, but more often than not variables will have logical groupings that will be expected to uphold certain invariants. A comment describing how the group as a whole is supposed to behave will often be more useful than comments describing individual variables.
For example, an EditablePolygon class in Java might contain four essential fields:
int[] xCoords;
int[] yCoords;
int numCoords;
int sharedPortion;
and be expected to uphold the invariants that both arrays will always be the same length, that this length will be >= numCoords, and that all coordinates of interest will be in array slots below numCoords. It may further specify that multiple EditablePolygon objects may share the same arrays, provided that all but one such object has a sharedPortion greater than numCoords or equal to the array length, and that the remaining object's sharedPortion is no less than the numCoords value of any of the others. That makes it possible to clone a shape without copying its arrays, with a defensive copy required only when a change is requested to the part of the original that is shared with the clone, or to any part of the clone (which is entirely shared with the original).
Note that the most important things for the comments to document are (1) the array lengths may exceed the number of points, and (2) certain portions of the array may be shared. The first may be somewhat obvious from the code, but the second will likely be far less obvious. The field sharedPortion does have some meaning in isolation, but its meaning and purpose can really only be understood in relation to the other variables.
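To make the group-invariant idea concrete, here is a minimal Go sketch of the same four fields (Go rather than Java, to keep the examples here in one language). The type, its methods, and the reading of sharedPortion as "slots below this index may be shared, so write to them only after a defensive copy" are assumptions made for illustration; the point is that the invariants are documented once, on the group, rather than field by field.

```go
package main

import "fmt"

// EditablePolygon is a hypothetical Go port of the four Java fields.
//
// Invariants, documented once for the whole group:
//   - len(xCoords) == len(yCoords), and that length is >= numCoords.
//   - Only slots [0, numCoords) hold coordinates of interest; the rest is
//     spare capacity.
//   - Slots [0, sharedPortion) may be shared with other EditablePolygon
//     values, so they are never written in place: a write there first makes
//     private (defensive) copies of both slices.
type EditablePolygon struct {
	xCoords, yCoords []int
	numCoords        int
	sharedPortion    int
}

// invariantsHold checks the per-object invariants; the sharing rule spans
// several objects and is not checked here.
func (p *EditablePolygon) invariantsHold() bool {
	return len(p.xCoords) == len(p.yCoords) && len(p.xCoords) >= p.numCoords
}

// Clone returns a polygon that shares p's slices instead of copying them.
func (p *EditablePolygon) Clone() *EditablePolygon {
	// Every slot the clone can see is shared, so it must copy before any write.
	c := &EditablePolygon{xCoords: p.xCoords, yCoords: p.yCoords,
		numCoords: p.numCoords, sharedPortion: len(p.xCoords)}
	// The original must now treat the slots the clone uses as shared too.
	if p.sharedPortion < p.numCoords {
		p.sharedPortion = p.numCoords
	}
	return c
}

// unshare replaces both slices with private copies of the same length.
func (p *EditablePolygon) unshare() {
	p.xCoords = append([]int(nil), p.xCoords...)
	p.yCoords = append([]int(nil), p.yCoords...)
	p.sharedPortion = 0
}

// SetPoint overwrites point i, where 0 <= i < numCoords.
func (p *EditablePolygon) SetPoint(i, x, y int) {
	if i < p.sharedPortion {
		p.unshare() // defensive copy before touching shared storage
	}
	p.xCoords[i], p.yCoords[i] = x, y
}

func main() {
	// Four slots, three in use: the arrays may be longer than numCoords.
	p := &EditablePolygon{xCoords: []int{0, 4, 4, 0}, yCoords: []int{0, 0, 3, 0}, numCoords: 3}
	c := p.Clone()
	c.SetPoint(0, 9, 9) // copies first, so p is unaffected
	fmt.Println(p.xCoords[0], c.xCoords[0], p.invariantsHold(), c.invariantsHold())
	// prints: 0 9 true true
}
```

Cloning costs one small struct; the arrays are copied only on the first write into a shared region, which is exactly the behavior a reader could not guess from the field declarations alone.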

It's good practice to document methods and classes. Moreover, Javadoc for public methods deserves extra emphasis, since it acts as the reference manual for external callers. Javadoc can be similarly beneficial for public variables, though I personally am not in favor of commenting variables.


Any documentation/article about the `&MyType{}` pattern in golang?

In most Go codebases I look at, people use types by reference:
type Foo struct {}
myFoo := &Foo{}
I usually take the opposite approach: I pass everything by value and only pass by reference when I want to perform something destructive on the value, which lets me easily spot destructive functions (which are fairly rare).
But seeing how common references are, I guess it's not just a matter of taste. I get that there's a cost in duplicating values, but is it that much of a game changer? Or are there other reasons why references are preferred?
It would be great if someone could point me to an article or documentation about why references are preferred.
Go is pass-by-value. I use pointers, as in your example, as much as possible so that I don't have to think about whether I'm making duplicate copies of objects. Go is mostly aimed at networking and scaling, which makes performance a priority. The obvious downside, as you say, is that a function receiving the pointer can modify (or destroy) the object it points to.
Otherwise, there is no rule about which you should use. Both are fine.
Also, somewhat related to the question, from the Go docs: Pointers vs. Values
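A minimal sketch of the difference being discussed; the Counter type and both functions are hypothetical. Passing the struct by value hands the callee a copy, so only the pointer version can change the caller's data:

```go
package main

import "fmt"

// Counter is a hypothetical struct used only for this illustration.
type Counter struct{ n int }

// IncValue receives a copy of the Counter; the caller's value is untouched.
func IncValue(c Counter) Counter {
	c.n++
	return c
}

// IncPointer receives a pointer, so it modifies the caller's Counter in place.
func IncPointer(c *Counter) {
	c.n++
}

func main() {
	a := Counter{}        // a value
	b := &Counter{}       // a pointer: the &Foo{} pattern from the question
	_ = IncValue(a)       // increments a copy; a.n is still 0
	IncPointer(b)         // b.n is now 1
	fmt.Println(a.n, b.n) // prints: 0 1
}
```

The same choice applies to methods: a method with a pointer receiver, func (c *Counter) ..., can modify the value it is called on, while one with a value receiver works on a copy.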

When would you swap two numbers without using a third variable?

I have read several sources that discuss how to swap two numbers without using a third variable. These are a few of the most relevant:
How do you swap two integer variables without using any if conditions, casting, or additional variables?
Potential Problem in "Swapping values of two variables without using a third variable"
Swap two integers without using a third variable
I understand why it doesn't make sense to use the described methods in most cases: the code becomes cluttered and difficult to read, and it will usually execute more slowly than a solution using a third "temp" variable. However, none of the questions I have found discuss any practical benefits of the two-variable methods. Do they have any redeeming qualities or benefits (historical or contemporary), or are they only useful as obscure programming trivia?
At this point it's just a neat trick. Speed-wise, your compiler will recognize a normal swap and optimize it appropriately where that makes sense (but there's no guarantee that it will recognize weird XOR-ing and optimize that as well).
Another strike against XOR is that if one variable aliases the other, XOR-ing them will zero both out. Since you'll have to check for and handle this condition, you'll have extra code involved, probably using the third-variable method.
You could also try adding and subtracting values… except that you’d have to check for and handle overflow, which would involve more code (probably the third variable method). Multiplication and division have the same flaw, but more importantly, there’s the exquisite delight of representing fractions in binary (so this wouldn’t work in the first place).
Edit: D’oh, sorry for the thread necromancy… got so caught up in following links that I forgot to check the dates.
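A minimal Go sketch of the aliasing problem described in this answer; the function names are hypothetical. The XOR version works for two distinct variables but zeroes the value when both pointers refer to the same variable, while the temporary-variable version is unaffected:

```go
package main

import "fmt"

// xorSwap swaps *a and *b with XOR and no temporary.
// If a and b point to the same variable, the first XOR sets it to 0,
// and every later XOR keeps it at 0.
func xorSwap(a, b *int) {
	*a ^= *b
	*b ^= *a
	*a ^= *b
}

// tempSwap is the conventional third-variable swap; aliasing is harmless.
func tempSwap(a, b *int) {
	t := *a
	*a = *b
	*b = t
}

func main() {
	x, y := 3, 5
	xorSwap(&x, &y)
	fmt.Println(x, y) // prints: 5 3

	z := 7
	xorSwap(&z, &z) // aliasing: both "variables" are z
	fmt.Println(z)  // prints: 0

	w := 7
	tempSwap(&w, &w)
	fmt.Println(w) // prints: 7

	// Idiomatic Go: tuple assignment, with no visible temporary.
	x, y = y, x
	fmt.Println(x, y) // prints: 3 5
}
```

In Go, x, y = y, x is the usual way to swap. Go also defines signed integer overflow as wrapping around, so the add/subtract variant needs no overflow check there (unlike C, where signed overflow is undefined behavior), but it still zeroes the value when both operands alias.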

Should class members be sorted?

On a new project with a new team, should we enforce automatic sorting of class members in a specific order (e.g. by modifier and then alphabetically) before check-in?
The alternative is to let each developer group the members as they see fit. And since everyone has a different opinion of what is related and how the grouping should be, this pretty much comes down to random order.
So what are the pros and cons of having them sorted automatically? Is this bound to a specific IDE/development-process/build-process/language? What else do we have to consider?
Edit to foster more answers:
I was once on a project where we had to maintain several branches. Because our RCS (SVN at the time) couldn't support this properly, we had to manually move classes and methods from one branch to another and then merge back again (most RCSs can maintain a subset-superset relation in only one direction). Because the methods could appear anywhere in the class in any order, merging was a nightmare. Enforcing automatic sorting of members from the very beginning would have avoided much of the pain.
On the other hand, when working on a long-established project without an automatic sort order, it can be a bad idea to enforce one. Moving all the members around is basically the same as throwing away the version history up to that point, because diffing files against older versions will no longer be useful, for the same reason that merging in the other project was a pain.
The same goes when refactoring is due. When methods are renamed, they will also be moved, making a diff of two versions practically pointless. With different names AND different places, it is difficult to recognize methods again.
Given that your IDE can sort your members the way you prefer, I'd personally avoid a global company policy on the matter.
I think rules for rules' sake are an important factor in demotivating a team. As programmers we have a certain mindset, a certain way of seeing the world. Many programmers value practicality and pragmatism more highly than policy.
If it's a quick click of a couple of menu items to have the code look the way you want it to when it's your turn to look at it, I'd stick with those few clicks. (and make this into a quick keyboard shortcut for your convenience)
I like to have a consistent code layout, but I have learned the hard way that anything which only touches the topic of "coding style" always leads to endless discussions and can waste a lot of time. It is not worth it.
Far more important is to make decisions on other topics (architecture and design, tests, how to communicate).
Usually I tend to assume that related members will be grouped together over time. I see no advantage in using an alphabetical sort order, because that is what the IDE can do for me.
Renaming, moving code, deleting green code and adding comments are things I don't like to see mixed with other changes. That is why I usually split the work into two changes: one that updates the code layout/style and another that changes the behavior of the program.
In my case, I find it useful to order by access level. I follow the StyleCop rules (.NET, but valid in any other language):
Public
Internal
Protected Internal
Protected
Private
Within these groups there's some randomness, but I always put things like IDs or unique identifiers first.
I'm not saying this is the best practice in the world, but at least people know where to look for things.
Depending on the language and the IDE you choose, you may be lucky enough to find a tool that rearranges the code for you based on your own preferences. (In my case, ReSharper is a good help.)
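To make "sort by modifier, then alphabetically" concrete, here is a minimal Go sketch of the comparison key such a tool might apply. The Member type and the access ranking (assumed to be public, internal, protected internal, protected, private, following StyleCop's access ordering) are hypothetical:

```go
package main

import (
	"fmt"
	"sort"
)

// Member is a hypothetical description of one class member.
type Member struct {
	Access string // e.g. "public", "private"
	Name   string
}

// accessRank is the assumed StyleCop-style order; lower sorts first.
// Unknown access strings get rank 0 (the map's zero value).
var accessRank = map[string]int{
	"public": 0, "internal": 1, "protected internal": 2, "protected": 3, "private": 4,
}

// SortMembers orders members by access level, then alphabetically by name.
func SortMembers(ms []Member) {
	sort.SliceStable(ms, func(i, j int) bool {
		ri, rj := accessRank[ms[i].Access], accessRank[ms[j].Access]
		if ri != rj {
			return ri < rj
		}
		return ms[i].Name < ms[j].Name
	})
}

func main() {
	ms := []Member{{"private", "id"}, {"public", "Save"}, {"private", "cache"}, {"public", "Load"}}
	SortMembers(ms)
	for _, m := range ms {
		fmt.Println(m.Access, m.Name) // public Load, public Save, private cache, private id
	}
}
```

Because the key is purely mechanical, related members (a public method and the private field it uses) can end up far apart, which is the readability concern raised in other answers.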
I consider sorting of class members useful if it results in better readability of code. A sorting scheme should not be too strict but strict enough to add to better code readability. I prefer this sorting scheme:
static fields
instance fields
For each method that calls another (mostly private) method, the called method should be below the calling method.
As pointed out above, the only reason to order class members should be better readability: you write code once but read it a hundred times, so having an ordering system accepted by the team can boost productivity.
Ordering code to work around limitations of the RCS will not in itself lead to better readability, and thus will not boost productivity. In most cases such an ordering method will fail. I doubt that alphabetical ordering could lead to better readability.
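A minimal Go sketch of the layout described in the list in this answer: data first, then each calling method placed above the methods it calls, so the file reads top-down. The Report type and its methods are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// Report is hypothetical; its fields come first.
type Report struct {
	title string
	lines []string
}

// Render is the entry point, so it comes first among the methods.
func (r *Report) Render() string {
	return r.header() + r.body()
}

// header is called by Render, so it sits below Render.
func (r *Report) header() string {
	return strings.ToUpper(r.title) + "\n"
}

// body is also called by Render; it calls bullet, which comes after it.
func (r *Report) body() string {
	var b strings.Builder
	for _, l := range r.lines {
		b.WriteString(bullet(l))
	}
	return b.String()
}

// bullet is the lowest-level helper, so it is last.
func bullet(s string) string {
	return "- " + s + "\n"
}

func main() {
	r := &Report{title: "status", lines: []string{"build ok", "tests ok"}}
	fmt.Print(r.Render())
}
```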

How much duplicated code do you tolerate? [closed]

In a recent code review I spotted a few lines of duplicated logic in a class (fewer than 15 lines). When I suggested that the author refactor the code, he argued that the code is simpler to understand that way. After reading the code again, I have to agree that extracting the duplicated logic would hurt readability a little.
I know DRY is a guideline, not an absolute rule. But in general, are you willing to hurt readability in the name of DRY?
Refactoring: Improving the Design of Existing Code
The Rule of Three
The first time you do something, you just do it. The second time you do something similar, you wince at the duplication, but you do the duplicate thing anyway. The third time you do something similar, you refactor.
Three strikes and you refactor.
Coders at Work
Seibel: So for each of these XII calls you're writing an […] Did you ever find that you were accumulating lots of bits of very similar code?
Zawinski: Oh, yeah, definitely. Usually by the second or third time you've cut and pasted that piece of code it's like, alright, time to stop cutting and pasting and put it in a […]
I tolerate none. I may end up having some due to time constraints or whatnot. But I still haven't found a case where duplicated code is really warranted.
Saying that it'll hurt readability only suggests that you are bad at picking names :-)
Personally, I prefer keeping code understandable, first and foremost.
DRY is about easing the maintenance in code. Making your code less understandable in order to remove repeated code hurts the maintainability more, in many cases, than having some repeated lines of code.
That being said, I do agree that DRY is a good goal to follow, when practical.
If the code in question has a clear business or technology-support purpose P, you should generally refactor it. Otherwise you'll have the classic problem with cloned code: eventually you'll discover a need to modify code supporting P, and you won't find all the clones that implement it.
Some folks suggest 3 or more copies is the threshold for refactoring. I believe that if you have two, you should do so; finding the other clone(s) [or even knowing they might exist] in a big system is hard, whether you have two or three or more.
Now, this answer is provided in the context of not having any tools for finding the clones. If you can reliably find clones, then the original reason to refactor (avoiding maintenance errors) is less persuasive (though the utility of having a named abstraction is still real). What you really want is a way to find and track clones; abstracting them is one way to ensure you can "find" them (by making finding trivial).
A tool that can find clones reliably can at least prevent you from making failure-to-update-clone maintenance errors. One such tool (I'm the author) is the CloneDR. CloneDR finds clones using the target language's structure as guidance, and thus finds clones regardless of whitespace layout, changes in comments, renamed variables, etc. (It is implemented for a number of languages, including C, C++, Java, C#, COBOL and PHP.) CloneDR will find clones across large systems without being given any guidance. Detected clones are shown, as well as the antiunifier, which is essentially the abstraction you might have written instead. Versions of it (for COBOL) now integrate with Eclipse, and show you when you are editing inside a clone in a buffer, as well as where the other clones are, so that you may inspect/revise the others while you are there. (One thing you might do is refactor them :).)
I used to think cloning was just outright wrong, but people do it because they don't know how the clone will vary from the original and so the final abstraction isn't clear at the moment the cloning act is occurring. Now I believe that cloning is good, if you can track the clones and you attempt to refactor after the abstraction becomes clear.
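For a feel of what automated clone finding involves, here is a deliberately naive Go sketch. It is NOT how CloneDR works: CloneDR is guided by the language's syntax, whereas this compares windows of k consecutive non-blank lines after collapsing whitespace, so it ignores layout-only differences but misses renamed variables or changed comments. FindClones, Clone and the sample input are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// Clone records two windows with identical normalized text, identified by
// the 1-based source line on which each window starts.
type Clone struct{ First, Second int }

type srcLine struct {
	text string // whitespace-normalized text
	no   int    // 1-based line number in the original source
}

// normalize collapses runs of whitespace so layout differences are ignored.
func normalize(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

// FindClones reports each window of k (k >= 1) consecutive non-blank lines
// whose normalized text matches an earlier window.
func FindClones(src string, k int) []Clone {
	var lines []srcLine
	for i, l := range strings.Split(src, "\n") {
		if n := normalize(l); n != "" {
			lines = append(lines, srcLine{n, i + 1})
		}
	}
	firstSeen := map[string]int{} // window text -> start line of first occurrence
	var clones []Clone
	for i := 0; i+k <= len(lines); i++ {
		parts := make([]string, k)
		for j := 0; j < k; j++ {
			parts[j] = lines[i+j].text
		}
		key := strings.Join(parts, "\n")
		if first, ok := firstSeen[key]; ok {
			clones = append(clones, Clone{first, lines[i].no})
		} else {
			firstSeen[key] = lines[i].no
		}
	}
	return clones
}

func main() {
	src := "a := 1\nb := 2\nc := a + b\n\nx := 0\na := 1\nb   :=   2\nc := a + b\n"
	fmt.Println(FindClones(src, 3)) // prints: [{1 6}]
}
```

A clone longer than k lines is reported once per overlapping window; merging adjacent pairs into larger regions is left out for brevity.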
As soon as you repeat anything, you're creating multiple places where you have to make edits if you find that you've made a mistake, or need to extend, edit or delete it, or for any of the dozens of other reasons that might force a change.
In most languages, extracting a block to a suitably named method can rarely hurt your readability.
It is your code, with your standards, but my basic answer to your "how much?" is none ...
You didn't say what language, but in most IDEs it is a simple Refactor -> Extract Method. It hardly gets easier than that, and a single method with some arguments is much more maintainable than two blocks of duplicated code.
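A minimal Go sketch of what Extract Method does to two duplicated blocks; the functions and the "trim, default to guest" rule are hypothetical. The duplicated lines become one named function, and both callers shrink to a single line:

```go
package main

import (
	"fmt"
	"strings"
)

// Before: both functions repeat the same clean-up of a user-supplied name.
func greetingDup(name string) string {
	n := strings.TrimSpace(name)
	if n == "" {
		n = "guest"
	}
	return "Hello, " + n + "!"
}

func farewellDup(name string) string {
	n := strings.TrimSpace(name)
	if n == "" {
		n = "guest"
	}
	return "Goodbye, " + n + "."
}

// After Extract Method: the shared block has one name and one home.
func displayName(name string) string {
	n := strings.TrimSpace(name)
	if n == "" {
		return "guest"
	}
	return n
}

func greeting(name string) string { return "Hello, " + displayName(name) + "!" }

func farewell(name string) string { return "Goodbye, " + displayName(name) + "." }

func main() {
	fmt.Println(greeting("  ada "), farewell("")) // prints: Hello, ada! Goodbye, guest.
}
```

If the two blocks later need to differ, that divergence has to be made explicit (for example with a parameter), which is exactly the trade-off debated in this thread.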
Very difficult to say in abstract. But my own belief is that even one line of duplicated code should be made into a function. Of course, I don't always achieve this high standard myself.
Refactoring can be difficult, and this depends on the language. All languages have limitations, and sometimes a refactored version of duplicated logic can be linguistically more complex than the repeated code.
Often duplications of code LOGIC occur when two objects, with different base classes, have similarities in the way they operate. For example, two GUI components that both display values but don't implement a common interface for accessing that value. Refactoring this kind of system requires either methods taking more generic objects than needed, followed by type-checking and casting, or rethinking and restructuring the class hierarchy.
This situation is different than if the code was exactly duplicated. I would not necessarily create a new interface class if I only intended it to be used twice, and both times within the same function.
The point of DRY is maintainability. If code is harder to understand, it's harder to maintain, so if refactoring hurts readability you may actually be failing to meet DRY's goal. For fewer than 15 lines of code, I'd be inclined to agree with the code's author.
In general, no. Not for readability anyway. There is always some way to refactor the duplicated code into an intention revealing common method that reads like a book, IMO.
If you want to make an argument for violating DRY in order to avoid introducing dependencies, that might carry more weight, and you can get Ayende's opinionated opinion along with code to illustrate the point here.
Unless your dev is actually Ayende, though, I would hold tight to DRY and get the readability through intention-revealing methods.
I accept NO duplicate code. If something is used in more than one place, it will be part of the framework or at least a utility library.
The best line of code is a line of code not written.
It really depends on many factors, how much the code is used, readability, etc. In this case, if there is just one copy of the code and it is easier to read this way then maybe it is fine. But if you need to use the same code in a third place I would seriously consider refactoring it into a common function.
Readability is one of the most important things code can have, and I'm unwilling to compromise on it. Duplicated code is a bad smell, not a mortal sin.
That being said, there are issues here.
If this code is supposed to be the same, rather than is coincidentally the same, there's a maintainability risk. I'd have comments in each place pointing to the other, and if it needed to be in a third place I'd refactor it out. (I actually do have code like this, in two different programs that don't share appropriate code files, so comments in each program point to the other.)
You haven't said if the lines make a coherent whole, performing some function you can easily describe. If they do, refactor them out. This is unlikely to be the case, since you agree that the code is more readable embedded in two places. However, you could look for a larger or smaller similarity, and perhaps factor out a function to simplify the code. Just because a dozen lines of code are repeated doesn't mean a function should consist of that dozen lines and no more.

When should I break a function?

It's prudent to break a long function into a chief function and helper functions.
I know that only the chief function will be called from outside the module, but its length may prove intimidating.
Textbooks put a limit on the number of lines, but I feel that this is too rigid.
P.S. I am programming in Python and need to process incoming messages. The function returns a tuple containing the message, converted to Python's internal data types.
So you can see somewhat independent code for each message type.
Possible duplicate: When is a function too long?
I think you need to go about this from the other end of the problem. Think bottom-up. Identify small units of work, as small as possible, and start composing your code that way. You will only run into spaghetti-code issues when you code top-down and don't keep a structured approach.
If you already have spaghetti code and need to refactor, you pretty much have to start over. It is probably more work to break up existing spaghetti code than to rewrite it, and the result may not be as good.
I don't think there should be a hard number for the lines of code in a method either, but, to give you some kind of metric, well-written code does not have methods with more than 5 to 10 lines in the lower layers, and 20 to 30 lines in the business logic.
I'm not a big fan of breaking a function into multiple functions unnecessarily. It's not a hard and fast thing - if there are things that seem like distinct logical units, then by all means, break those out and think about them separately. But don't just break things out for the sake of some guideline like "one page per function" or "N lines per function".
One good rule of thumb is that if it doesn't fit on a single screen it is worth thinking about splitting it up. But only if it makes sense to split it up, some long functions are perfectly readable and it doesn't make any sense to slavishly split them into multiple functions just for the sake of it.
Never write a function that, when printed on fanfold paper, is taller than you are.
I like the rule of thumb that you should break out the subfunction if you can think of a good domain-relevant name for it.
When someone can understand the top-level function without necessarily having to look up the definition of the sub-function, you've likely made a net gain. (But when you break it down too far, your names start referring to your implementation artifacts rather than the domain)
I was recently discussing this with a friend. He suggested refactoring to separate concerns and I must say I have to agree. That is, one function should do one thing, if it does more than one thing, split it up. If not, let it be together, it makes no sense to split up a function, only to have it obfuscate the meaning. After all, a function is a block of code that does one thing!
A limit in terms of number of lines is often impractical because it doesn't account for readability well. It's better to try to separate groups of lines of code that have just a few inputs and just a few outputs, and make each such group a separate function. That's not always possible; then it's often wise to leave the code as it is rather than refactor for the sake of refactoring.
Since I am coding in Python, I have the liberty to write functions inside functions, unlike in C, C++ or Java. I feel this is a better choice.
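Go also allows a helper to live inside its chief function, as a function literal assigned to a local variable, which is roughly what a nested def gives you in Python. A minimal sketch with hypothetical names:

```go
package main

import "fmt"

// summarize is a hypothetical chief function. Its helper, clamp, is a
// function literal declared inside it, so nothing else in the package can
// call it, roughly like a nested def in Python.
func summarize(scores []int) string {
	if len(scores) == 0 {
		return "no scores"
	}
	// clamp limits a score to the range 0..100.
	clamp := func(v int) int {
		if v < 0 {
			return 0
		}
		if v > 100 {
			return 100
		}
		return v
	}
	total := 0
	for _, s := range scores {
		total += clamp(s)
	}
	return fmt.Sprintf("average %d", total/len(scores)) // integer average
}

func main() {
	fmt.Println(summarize([]int{50, 150, -10})) // prints: average 50
}
```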
There's no fixed limit, but the line count should be as low as possible. You may follow the Rule of 30; I follow it in my PHP scripts when needed.
Rule of 30:
The “Rule of 30” from Refactoring in Large Software Projects by Martin Lippert and Stefan Roock:
Methods should not have more than an average of 30 code lines.
A class should contain an average of less than 30 methods.
A package/library shouldn’t contain more than 30 classes.
Subsystems should avoid more than 30 packages.
A system with more than 30 subsystems may create problems.
If an element consists of more than 30 subelements, it is highly probable that there is a serious problem.
Personally, I break a function up if doing so saves either total lines or total processing time.
If I only run the helper once per chief function, I don't bother.
The point is that, in principle, it's better to have specialized functions. But where one sets the limit depends very much on:
1) the "usual" programming style in certain languages (one can observe that object-oriented languages tend to have shorter procedures than, say, C or the like);
2) your way of programming. Every hard limit must be questioned, IMHO. Overall, there will probably be some "natural" distribution across programs;
3) the task itself. A function should do a certain task: for example, a parsing function is usually much longer than a function that just sets some field in a structure. Or consider what an event loop in the Windows API may look like. All of this suggests that there may be good reasons for long methods...
If there is independent code (in your case specifics for each message type) those areas should be broken out.
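A minimal Go sketch of that advice applied to the question's message processing: the chief function only splits off the message kind and dispatches, and each kind gets its own helper. The wire format ("kind:field1,field2"), the kinds, and all names are invented for illustration, and the Python tuple becomes a Message struct here. It needs Go 1.18+ for strings.Cut and any:

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// Message is a hypothetical decoded message.
type Message struct {
	Kind   string
	Fields []any
}

// ParseMessage is the chief function: it only splits off the kind and
// dispatches, so it stays short no matter how many kinds exist.
func ParseMessage(raw string) (Message, error) {
	kind, body, ok := strings.Cut(raw, ":")
	if !ok {
		return Message{}, errors.New("missing ':' separator")
	}
	parse, known := parsers[kind]
	if !known {
		return Message{}, fmt.Errorf("unknown message kind %q", kind)
	}
	fields, err := parse(body)
	if err != nil {
		return Message{}, fmt.Errorf("%s: %w", kind, err)
	}
	return Message{Kind: kind, Fields: fields}, nil
}

// parsers maps each kind to its independent helper.
var parsers = map[string]func(string) ([]any, error){
	"chat": parseChat,
	"move": parseMove,
}

// parseChat turns "alice,hello" into ["alice", "hello"].
func parseChat(body string) ([]any, error) {
	user, text, ok := strings.Cut(body, ",")
	if !ok {
		return nil, errors.New("want user,text")
	}
	return []any{user, text}, nil
}

// parseMove turns "3,4" into the ints [3, 4].
func parseMove(body string) ([]any, error) {
	parts := strings.Split(body, ",")
	if len(parts) != 2 {
		return nil, errors.New("want x,y")
	}
	x, err := strconv.Atoi(parts[0])
	if err != nil {
		return nil, err
	}
	y, err := strconv.Atoi(parts[1])
	if err != nil {
		return nil, err
	}
	return []any{x, y}, nil
}

func main() {
	m, err := ParseMessage("move:3,4")
	fmt.Println(m.Kind, m.Fields, err) // prints: move [3 4] <nil>
}
```

Adding a message type means adding one helper and one map entry; the chief function does not grow.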
Size matters not. Judge me by my size, do you? - Yoda
Your main concerns are readability, simplicity and maintainability. A good indicator is if you need to write comments to explain a section of a function then that section is a good candidate for a separate function.
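A minimal Go sketch of that indicator; the median functions are hypothetical. In the first version, comments label the sections; in the second, each commented section has become a function whose name says the same thing. Both versions assume a non-empty slice:

```go
package main

import (
	"fmt"
	"sort"
)

// Before: comments label the sections of one function.
func medianBefore(xs []float64) float64 {
	// copy so the caller's slice is not reordered
	c := append([]float64(nil), xs...)
	sort.Float64s(c)
	// pick the middle element, averaging the two middles for even lengths
	n := len(c)
	if n%2 == 1 {
		return c[n/2]
	}
	return (c[n/2-1] + c[n/2]) / 2
}

// After: each commented section is now a function named after its comment.
func median(xs []float64) float64 {
	return middleOf(sortedCopy(xs))
}

// sortedCopy returns a sorted copy, leaving xs untouched.
func sortedCopy(xs []float64) []float64 {
	c := append([]float64(nil), xs...)
	sort.Float64s(c)
	return c
}

// middleOf expects a sorted, non-empty slice.
func middleOf(c []float64) float64 {
	n := len(c)
	if n%2 == 1 {
		return c[n/2]
	}
	return (c[n/2-1] + c[n/2]) / 2
}

func main() {
	fmt.Println(medianBefore([]float64{3, 1, 2}), median([]float64{4, 1, 3, 2})) // prints: 2 2.5
}
```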
There are many reasons to break a long function into its constituent pieces. Most important is:
code clarity/intent
Some functions simply cannot be broken into smaller pieces without negatively impacting the listed goals, so there is no hard-and-fast rule.
If you didn't write it and it's already in production: NEVER!!! If you break it up, you're likely to break it, it's that simple.
If you are writing it and you're not sure, the one-screen rule applies, as others have said.
