Related
Like lots of you guys on SO, I often write in several languages. And when it comes to planning stuff, (or even answering some SO questions), I actually think and write in some unspecified hybrid language. Although I used to be taught to do this using flow diagrams or UML-like diagrams, in retrospect, I find "my" pseudocode language has components of C, Python, Java, bash, Matlab, perl, Basic. I seem to unconsciously select the idiom best suited to expressing the concept/algorithm.
Common idioms might include Java-like braces for scope, pythonic list comprehensions or indentation, C++like inheritance, C#-style lambdas, matlab-like slices and matrix operations.
I noticed that it's actually quite easy for people to recognise exactly what I'm triying to do, and quite easy for people to intelligently translate into other languages. Of course, that step involves considering the corner cases, and the moments where each language behaves idiosyncratically.
But in reality, most of these languages share a subset of keywords and library functions which generally behave identically - maths functions, type names, while/for/if etc. Clearly I'd have to exclude many 'odd' languages like lisp, APL derivatives, but...
So my questions are,
Does code already exist that recognises the programming language of a text file? (Surely this must be a less complicated task than eclipse's syntax trees or than google translate's language guessing feature, right?) In fact, does the SO syntax highlighter do anything like this?
Is it theoretically possible to create a single interpreter or compiler that recognises what language idiom you're using at any moment and (maybe "intelligently") executes or translates to a runnable form. And flags the corner cases where my syntax is ambiguous with regards to behaviour. Immediate difficulties I see include: knowing when to switch between indentation-dependent and brace-dependent modes, recognising funny operators (like *pointer vs *kwargs) and knowing when to use list vs array-like representations.
Is there any language or interpreter in existence, that can manage this kind of flexible interpreting?
Have I missed an obvious obstacle to this being possible?
edit
Thanks all for your answers and ideas. I am planning to write a constraint-based heuristic translator that could, potentially, "solve" code for the intended meaning and translate into real python code. It will notice keywords from many common languages, and will use syntactic clues to disambiguate the human's intentions - like spacing, brackets, optional helper words like let or then, context of how variables are previously used etc, plus knowledge of common conventions (like capital names, i for iteration, and some simplistic limited understanding of naming of variables/methods e.g containing the word get, asynchronous, count, last, previous, my etc). In real pseudocode, variable naming is as informative as the operations themselves!
Using these clues it will create assumptions as to the implementation of each operation (like 0/1 based indexing, when should exceptions be caught or ignored, what variables ought to be const/global/local, where to start and end execution, and what bits should be in separate threads, notice when numerical units match / need converting). Each assumption will have a given certainty - and the program will list the assumptions on each statement, as it coaxes what you write into something executable!
For each assumption, you can 'clarify' your code if you don't like the initial interpretation. The libraries issue is very interesting. My translator, like some IDE's, will read all definitions available from all modules, use some statistics about which classes/methods are used most frequently and in what contexts, and just guess! (adding a note to the program to say why it guessed as such...) I guess it should attempt to execute everything, and warn you about what it doesn't like. It should allow anything, but let you know what the several alternative interpretations are, if you're being ambiguous.
It will certainly be some time before it can manage such unusual examples like #Albin Sunnanbo's ImportantCustomer example. But I'll let you know how I get on!
I think that is quite useless for everything but toy examples and strict mathematical algorithms. For everything else the language is not just the language. There are lots of standard libraries and whole environments around the languages. I think I write almost as many lines of library calls as I write "actual code".
In C# you have .NET Framework, in C++ you have STL, in Java you have some Java libraries, etc.
The difference between those libraries are too big to be just syntactic nuances.
<subjective>
There has been attempts at unifying language constructs of different languages to a "unified syntax". That was called 4GL language and never really took of.
</subjective>
As a side note I have seen a code example about a page long that was valid as c#, Java and Java script code. That can serve as an example of where it is impossible to determine the actual language used.
Edit:
Besides, the whole purpose of pseudocode is that it does not need to compile in any way. The reason you write pseudocode is to create a "sketch", however sloppy you like.
foreach c in ImportantCustomers{== OrderValue >=$1M}
SendMailInviteToSpecialEvent(c)
Now tell me what language it is and write an interpreter for that.
To detect what programming language is used: Detecting programming language from a snippet
I think it should be possible. The approach in 1. could be leveraged to do this, I think. I would try to do it iteratively: detect the syntax used in the first line/clause of code, "compile" it to intermediate form based on that detection, along with any important syntax (e.g. begin/end wrappers). Then the next line/clause etc. Basically write a parser that attempts to recognize each "chunk". Ambiguity could be flagged by the same algorithm.
I doubt that this has been done ... seems like the cognitive load of learning to write e.g. python-compatible pseudocode would be much easier than trying to debug the cases where your interpreter fails.
a. I think the biggest problem is that most pseudocode is invalid in any language. For example, I might completely skip object initialization in a block of pseudocode because for a human reader it is almost always straightforward to infer. But for your case it might be completely invalid in the language syntax of choice, and it might be impossible to automatically determine e.g. the class of the object (it might not even exist). Etc.
b. I think the best you can hope for is an interpreter that "works" (subject to 4a) for your pseudocode only, no-one else's.
Note that I don't think that 4a,4b are necessarily obstacles to it being possible. I just think it won't be useful for any practical purpose.
Recognizing what language a program is in is really not that big a deal. Recognizing the language of a snippet is more difficult, and recognizing snippets that aren't clearly delimited (what do you do if four lines are Python and the next one is C or Java?) is going to be really difficult.
Assuming you got the lines assigned to the right language, doing any sort of compilation would require specialized compilers for all languages that would cooperate. This is a tremendous job in itself.
Moreover, when you write pseudo-code you aren't worrying about the syntax. (If you are, you're doing it wrong.) You'll wind up with code that simply can't be compiled because it's incomplete or even contradictory.
And, assuming you overcame all these obstacles, how certain would you be that the pseudo-code was being interpreted the way you were thinking?
What you would have would be a new computer language, that you would have to write correct programs in. It would be a sprawling and ambiguous language, very difficult to work with properly. It would require great care in its use. It would be almost exactly what you don't want in pseudo-code. The value of pseudo-code is that you can quickly sketch out your algorithms, without worrying about the details. That would be completely lost.
If you want an easy-to-write language, learn one. Python is a good choice. Use pseudo-code for sketching out how processing is supposed to occur, not as a compilable language.
An interesting approach would be a "type-as-you-go" pseudocode interpreter. That is, you would set the language to be used up front, and then it would attempt to convert the pseudo code to real code, in real time, as you typed. An interactive facility could be used to clarify ambiguous stuff and allow corrections. Part of the mechanism could be a library of code which the converter tried to match. Over time, it could learn and adapt its translation based on the habits of a particular user.
People who program all the time will probably prefer to just use the language in most cases. However, I could see the above being a great boon to learners, "non-programmer programmers" such as scientists, and for use in brainstorming sessions with programmers of various languages and skill levels.
-Neil
Programs interpreting human input need to be given the option of saying "I don't know." The language PL/I is a famous example of a system designed to find a reasonable interpretation of anything resembling a computer program that could cause havoc when it guessed wrong: see http://horningtales.blogspot.com/2006/10/my-first-pli-program.html
Note that in the later language C++, when it resolves possible ambiguities it limits the scope of the type coercions it tries, and that it will flag an error if there is not a unique best interpretation.
I have a feeling that the answer to 2. is NO. All I need to prove it false is a code snippet that can be interpreted in more than one way by a competent programmer.
Does code already exist that
recognises the programming language
of a text file?
Yes, the Unix file command.
(Surely this must be a less
complicated task than eclipse's syntax
trees or than google translate's
language guessing feature, right?) In
fact, does the SO syntax highlighter
do anything like this?
As far as I can tell, SO has a one-size-fits-all syntax highlighter that tries to combine the keywords and comment syntax of every major language. Sometimes it gets it wrong:
def median(seq):
"""Returns the median of a list."""
seq_sorted = sorted(seq)
if len(seq) & 1:
# For an odd-length list, return the middle item
return seq_sorted[len(seq) // 2]
else:
# For an even-length list, return the mean of the 2 middle items
return (seq_sorted[len(seq) // 2 - 1] + seq_sorted[len(seq) // 2]) / 2
Note that SO's highlighter assumes that // starts a C++-style comment, but in Python it's the integer division operator.
This is going to be a major problem if you try to combine multiple languages into one. What do you do if the same token has different meanings in different languages? Similar situations are:
Is ^ exponentiation like in BASIC, or bitwise XOR like in C?
Is || logical OR like in C, or string concatenation like in SQL?
What is 1 + "2"? Is the number converted to a string (giving "12"), or is the string converted to a number (giving 3)?
Is there any language or interpreter
in existence, that can manage this
kind of flexible interpreting?
On another forum, I heard a story of a compiler (IIRC, for FORTRAN) that would compile any program regardless of syntax errors. If you had the line
= Y + Z
The compiler would recognize that a variable was missing and automatically convert the statement to X = Y + Z, regardless of whether you had an X in your program or not.
This programmer had a convention of starting comment blocks with a line of hyphens, like this:
C ----------------------------------------
But one day, they forgot the leading C, and the compiler choked trying to add dozens of variables between what it thought was subtraction operators.
"Flexible parsing" is not always a good thing.
To create a "pseudocode interpreter," it might be necessary to design a programming language that allows user-defined extensions to its syntax. There already are several programming languages with this feature, such as Coq, Seed7, Agda, and Lever. A particularly interesting example is the Inform programming language, since its syntax is essentially "structured English."
The Coq programming language allows "syntax extensions", so the language can be extended to parse new operators:
Notation "A /\ B" := (and A B).
Similarly, the Seed7 programming language can be extended to parse "pseudocode" using "structured syntax definitions." The while loop in Seed7 is defined in this way:
syntax expr: .while.().do.().end.while is -> 25;
Alternatively, it might be possible to "train" a statistical machine translation system to translate pseudocode into a real programming language, though this would require a large corpus of parallel texts.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
In a recent code review I spotted a few lines of duplicated logic in a class (less than 15 lines). When I suggested that the author refactor the code, he argued that the code is simpler to understand that way. After reading the code again, I have to agree extracting the duplicated logic would hurt readability a little.
I know DRY is guideline, not an absolute rule. But in general, are you willing to hurt readability in the name of DRY?
Refactoring: Improving the Design of Existing Code
The Rule of Three
The first time you do something, you
just do it. The second time you do
something similar, you wince at the duplication, but you do the duplicate
thing anyway. The third time you do something similar, you refactor.
Three strikes and you refactor.
Coders at Work
Seibel: So for each of these XII calls you're writing an
implementation.
Did you ever find that you were accumulating lots of
bits of very similar code?
Zawinski: Oh, yeah, definitely. Usually by the second or third time
you've cut and pasted
that piece of code it's like, alright, time to stop
cutting and pasting and put it in a
subroutine.
I tolerate none. I may end up having some due to time constraints or whatnot. But I still haven't found a case where duplicated code is really warranted.
Saying that it'll hurt readability only suggests that you are bad at picking names :-)
Personally, I prefer keeping code understandable, first and foremost.
DRY is about easing the maintenance in code. Making your code less understandable in order to remove repeated code hurts the maintainability more, in many cases, than having some repeated lines of code.
That being said, I do agree that DRY is a good goal to follow, when practical.
If the code in question has a clear business or technology-support purpose P, you should generally refactor it. Otherwise you'll have the classic problem with cloned code: eventually you'll discover a need to modify code supporting P, and you won't find all the clones that implement it.
Some folks suggest 3 or more copies is the threshold for refactoring. I believe that if you have two, you should do so; finding the other clone(s) [or even knowing they might exist] in a big system is hard, whether you have two or three or more.
Now this answer is provided in the context of not having any tools for finding the clones. If you can reliably find clones, then the original reason to refactor (avoiding maintenance errors) is less persausive (the utility of having a named abstraction is still real). What you really want is a way to find and track clones; abstracting them is one way to ensure you can "find" them (by making finding trivial).
A tool that can find clones reliably can at least prevent you from making failure-to-update-clone maintenance errors. One such tool (I'm the author) is the CloneDR. CloneDR finds clones using the targeted langauge structure as guidance, and thus finds clones regardless of whitespace layout, changes in comments, renamed variables, etc. (It is implemented for a number a languages including C, C++, Java, C#, COBOL and PHP). CloneDR will find clones across large systems, without being given any guidance. Detected clones are shown, as well as the antiunifier, which is essentially the abstraction you might have written instead. Versions of it (for COBOL) now integrate with Eclipse, and show you when you are editing inside a clone in a buffer, as well as where the other clones are, so that you may inspect/revise the others while you are there. (One thing you might do is refactor them :).
I used to think cloning was just outright wrong, but people do it because they don't know how the clone will vary from the original and so the final abstraction isn't clear at the moment the cloning act is occurring. Now I believe that cloning is good, if you can track the clones and you attempt to refactor after the abstraction becomes clear.
As soon as you repeat anything you're creating multiple places to have make edits if you find that you've made a mistake, need to extend it, edit, delete or any other of the dozens of other reasons you might come up against that force a change.
In most languages, extracting a block to a suitably named method can rarely hurt your readability.
It is your code, with your standards, but my basic answer to your "how much?" is none ...
you didn't say what language but in most IDEs it is a simple Refactor -> Extract Method. How much easier is that, and a single method with some arguments is much more maintainable than 2 blocks of duplicate code.
Very difficult to say in abstract. But my own belief is that even one line of duplicated code should be made into a function. Of course, I don't always achieve this high standard myself.
Refactoring can be difficult, and this depends on the language. All languages have limitations, and sometimes a refactored version of duplicated logic can be linguistically more complex than the repeated code.
Often duplications of code LOGIC occur when two objects, with different base classes, have similarities in the way they operate. For example 2 GUI components that both display values, but don't implement a common interface for accessing that value. Refactoring this kind of system either requires methods taking more generic objects than needed, followed by typechecking and casting, or else the class hierarchy needs to be rethought & restructured.
This situation is different than if the code was exactly duplicated. I would not necessarily create a new interface class if I only intended it to be used twice, and both times within the same function.
The point of DRY is maintainability. If code is harder to understand it's harder to maintain, so if refactoring hurts readability you may actually be failing to meet DRY's goal. For less than 15 lines of code, I'd be inclined to agree with your classmate.
In general, no. Not for readability anyway. There is always some way to refactor the duplicated code into an intention revealing common method that reads like a book, IMO.
If you want to make an argument for violating DRY in order to avoid introducing dependencies, that might carry more weight, and you can get Ayende's opinionated opinion along with code to illustrate the point here.
Unless your dev is actually Ayende though I would hold tight to DRY and get the readability through intention revealing methods.
BH
I accept NO duplicate code. If something is used in more than one place, it will be part of the framework or at least a utility library.
The best line of code is a line of code not written.
It really depends on many factors, how much the code is used, readability, etc. In this case, if there is just one copy of the code and it is easier to read this way then maybe it is fine. But if you need to use the same code in a third place I would seriously consider refactoring it into a common function.
Readability is one of the most important things code can have, and I'm unwilling to compromise on it. Duplicated code is a bad smell, not a mortal sin.
That being said, there are issues here.
If this code is supposed to be the same, rather than is coincidentally the same, there's a maintainability risk. I'd have comments in each place pointing to the other, and if it needed to be in a third place I'd refactor it out. (I actually do have code like this, in two different programs that don't share appropriate code files, so comments in each program point to the other.)
You haven't said if the lines make a coherent whole, performing some function you can easily describe. If they do, refactor them out. This is unlikely to be the case, since you agree that the code is more readable embedded in two places. However, you could look for a larger or smaller similarity, and perhaps factor out a function to simplify the code. Just because a dozen lines of code are repeated doesn't mean a function should consist of that dozen lines and no more.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
I heard people say they can understand their python code a year later but not their XYZ code. Why? I dont know what is good about python syntax or what is bad about another. I like C# but i have a feeling VB.NET code is easier to read. I am doing language design so what do you find makes code/syntax/language readable or not readable?
Experience.
IMO, one of the big things is significant white space. Block indention goes a long ways and languages like Python and F# that provide a level of significant white space can help with readability.
Code like Java and C# tend to be structured and the readability becomes a focal point of how it was coded to begin with and not of the language itself.
Code is readable when its written in a style of explicit "stating what you want to do".
This only depends on the language in sofar
it allows you to express what you want (functional-programming!)
it doesn't emphasize on cryptical statements
The rest depends on the style you use to write code (even Perl can be understandable!), but certain languages make it easier to hacky statements.
Clear:
expr = if not (ConditionA and ConditionB) then ... else ...
Unclear:
expr = (!(conditionA && conditionB)) ? ... : ...
Clear:
foreach line in lines:
if (line =~ /regex/):
// code
Unclear:
... if /regex/ foreach(#lines);
Clear:
x = length [ x | x <- [1..10], even x ]
Unclear:
int x = 0;
for (int i = 1; i <= 10; ++i)
if ((i&&1)==0) ++x;
Generally what makes Python considered readable is that it forces a standardized indentation. This means that you'll never be forced to wonder whether you're in an if block or a function, it is clear as day. Even poorly written code therefore becomes obvious.
One language which I generally consider difficult to read is PHP for the same reason (or rather, its opposite). Since programmers are allowed to indent at will, and store variables anywhere, it can get convoluted very quickly. Further, since PHP historically did not have case sensitive function names (PHP < 4.4.7 I believe), this means that there really isn't a consistency in the implementation of the core language either... (Don't get me wrong, I like the language, but a bad coder can REALLY make a mess).
JavaScript also has a lot of problems with undisciplined developers. You'll find yourself wondering where variables have been defined and what scope you're in. Code will not be in one consolidated place, but rather spread across multiple files, and often lurking where unexpected.
ActionScript 3 is a bit better. Generally, there has been a move to have everyone use similar syntax's, and Adobe has gone so far as to define its standards and make them accessible and common. It does not take much to see how the ECMAScript implementation which is supported by a for-profit company is superior to the generalized one.
Readability is a function that takes a lot of inputs. I don't think it's really possible to compile a full list of things that can affect a language's readability. The most general way to describe it is "minimizing cognitive load." A few major factors:
Subtleties of meaning. If two code snippets look very similar at a glance but do different things, it hurts readability because the reader has to stop and deduce what's actually happening.
Meaningless code — aka boilerplate. This doesn't necessarily mean code that does nothing, but code that doesn't tell me anything about what we're actually doing. Every bit of code that doesn't express the actual intent of a function or object reduces readability by that much.
Cramming meaning — aka golf. This is the opposite of the boilerplate problem. It's possible to compress code so far that the reader is forced to stop and examine it pretty much character by character. The exact line where this occurs is somewhat subjective (which is part of why some people love Perl and some people hate it), but it's definitely a real phenomenon.
The programmer makes code readable or unreadable, not the language. Thinking otherwise is just fooling yourself. This is because the only people who are qualified to judge readability are those who know the language. To the non-programmer, all languages are equally unreadable.
I heard people say they can understand their python code a year later but not their XYZ code. Why?
Firstly, I don't think that people say that based solely on syntax. There are a lot of other factors to take into consideration, to name just a few:
The fact that some languages tend to promote only one right way to do something (like Python), and others promote many different ways (Ruby for example, from what I hear [disclaimer: I am not a Ruby programmer])
The libraries the language has. The better designed ones tend to be incredibly easy to understand without needing documentation, and this also tends to help remember. A language with good libraries will therefore make things easier.
Having said that, my personal take on Python is the fact that many people call it "executable pseud-code". It supports a wide variety of things that tend to appear in pseudo-code, and as an extension, are the standard way to think about things.
Also, Python's un-C-like syntax, one of the features that make it so disliked by so many people, also makes Python look more like pseudocode.
Well, that's my take on Python's readability.
To be honest when it comes to what makes a language readable it is really seems to boil down to a combination of simplicity and personal preference. (Of course - it is always possible to write unreadable code in any language if you try hard enough). Since personal preference can't really be controlled, it comes down to ease of expression - the more complicated it is in a language to use simple features, the more difficult that language is likely to be in general from a readability standpoint.
A word required when one character will suffice - a stone in the garden of Pascal and VB.
Compare:
Block ()
Begin
// Content
End
vs.
Block
{
// Content
}
It requires extra brain processing to read a word and mentally associate it with a concept, while a single symbol is immediately recognized by its image.
It is the same thing as the difference with natural languages, usual textual languages vs. symbol languages with hieroglyphs (Asian group). The processing of the first group is slower because basically a text is parsed to a set of concepts while hieroglyphs represent concepts themselves. Compare it with what you already know - will a serialization/deserialization from an XML be faster than a custom search over a binary format?
IMHO, the more a computer language resembles a spoken language, the more readable it is. For extreme examples, take languages like J or Whitespace or Brainfuck... completely unreadable to the untrained eye.
But a language that resembles English can be more easily understood. Not that this makes it the best language, as COBOL can attest.
I think it has more to do with the person writing the code rather than the actual language itself. You can write very readable code in any language, and unreadable code in any language. Even a complex Regular expression can be formatted and commented so as to make it easy to read.
a coworker of mine use to have a saying: "You can write crap code in any language." I liked it and wanted to share today. What makes code readable? Here are my thoughts
The ability to read the syntax of the language.
Well formatted code.
Meaningfully named variables and functions
Comments to explain complex processing. Beware, too much commentes can make the code hard to read
Short functions are easier to read than long ones.
None of these have anything to do with the language, it's all about the coder, and the quality of their work.
I will try saying that a code is readable by its simplicity.
You got to get at first sight what it does and what its purpose is. Why write a thousand lines of code when only a few does what is required?
This is the spirit of a functional language like F#, for instance.
For me its mainly the question of wether the language allows you to develop more readable abstractions which do prevent getting lost in details.
This is where OOP comes in very handy with the hiding of details. If i can hide the details of a task behind an interface that has the behavior of a common concept (i.e. iterators in C++) i usually don't have to read the implementation details.
I think, language design (for normal languages, not brainfuck :) ) not matters. To make code readable you should follow standards, code conventions, and don't forget about refactoring.
It's all about clean code.
Keep it small, simple, well named, and formatted.
class ShoppingCart {
def add(item) {
println "you added some $item"
}
def remove(item) {
println "you just took out the $item"
}
}
def myCart = new ShoppingCart()
myCart.with {
add "juice"
add "milk"
add "cookies"
add "eggs"
remove "cookies"
}
The literacy level of the reader.
Two distinct aspects, really. First is syntax and whitespace. Python enforces a whitespace standard, dropping unnecessary {, } and ; characters. This makes it easy on the eyes. Second, and most importantly, clarity of expression- i.e. how easy is it to map code back to the way you think. There are several features (and non-features) in programming languages that contribute to the latter point:
Disallowing jumps. The goto statement in C is a typical example. Code that doesn't keep running out of structured blocks is easier to read.
Minimizing side-effects. Global variables are evil, remember?
Using more tailored functions. How can your head track a for loop with 5 iteration variables? The Common Lisp loop is much easier to read (although VERY difficult to write, but that's a different story)
Lexical closures. You can figure out a variable's value by just looking at it, as opposed to running the code in your head, and then figuring out which statement is shadowing which.
A couple of examples:
(loop
do for tweet = (master-response-parser (twitter-show-status tweet-id))
for tweet-id = tweet-id then (gethash in-reply-to tweet)
while tweet-id
collecting tweet)
and
listOfFacs = [x | x <- [1 ..], x == sumOfFacDigits x]
where sumOfFacDigits x = sum [factorial (x `div` y) | y <- [1 .. 10]]
Concerning the syntax, I think it it imperative that it be fairly descriptive. For instance, in many languages, you have the foreach statement, and each one handles it a bit differently.
// PHP
foreach ($array as $variable) ...
// Ruby
array.each{ |variable| ... }
// Python
for variable in array ...
// Java
for (String variable : array)
Honestly, I feel that PHP and Python have the clearest means of understanding, but, it all boils down to how smart and clear the programmer wants to be. For instance, a bad programmer could write the following:
// PHP
foreach ($user as $_user) ...
My guess is that you would have almost no idea what the heck the code is doing unless you tracked back and attempted to figure out what $user was and why you were iterating over it. Being clear and concise is all about making small chunks of code make sense without having to trace back through the program to figure out what variables/function names are.
Also, I would have to completely agree with whitespace. Tabs, newlines and spacing in-between operators really make a huge difference!
Edit 1: I might also interject that some languages have the syntax and tools readily available to make things more clear. Take Ruby for example:
if [1,2,3].include? variable then ... end
verses, say Java:
if (variable != 1 || variable != 2 || variable != 3) { ... }
One of these (IMHO) is certainly more clear and readable than the other.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Sometimes its really difficult to decide on when exactly you have written enough comments for someone to understand your intentions.
I think one needs to just focus more on writing readable, easy to understand code than on including a large number of lines of comments explaining every detail of whats happening.
What are your views about this?
Comments aren't there to explain what you're doing. They're there to explain why you're doing it.
The argument is based on a false dilemma: Either your code is a horrible abomination and you write tons of comments to explain every statement and expression, or your code is beautiful poetry that can be understood by your grandmother with no documentation at all.
In reality, you should strive for the latter (well, maybe not your grandmother but other developers), but realize that there are times when a couple of comments will clear up an ambiguity or make the next ten lines of code so much more plain. People who advocate no comments at all are extremists.
Of course, gratuitous comments should be avoided. No amount of comments will help bad code be more understandable. They probably just make it worse. But unless you're only coding trivial systems, there will be times when comments will clarify the design decisions being made.
This can be helpful when catching bugs. Literate code can look perfectly legitimate while being completely wrong. Without the comments, others (or you six months later) have to guess about your intent: Did you mean to do that, or was it an accident? Is this the bug, or is it somewhere else? Maybe I should refer to the design documentation... Comments are inline documentation, visible right where you need it.
Properly deciding when the need for comments actually exists is the key.
Try to make the code self-explaining. One of the most important things is to use meaningful names for classes, functions, variables etc.
Comment the sections that aren't self-explaining. Trivial commenting (e.g. i++; // Add 1 to i) makes the code harder to read.
By the way - the closer to pseudocode you can work, the more self-explaining your code can become. This is a privilege of high-level languages; it's hard to make self-explaining assembly code.
Not all code is self-documenting.
I'm in the process of troubleshooting a performance issue now. The developer thought he discovered the source of the bottleneck; a block of code that was going to sleep for some reason. There were no comments around this code, no context as to why it was there. We removed the block and re-tested. Now, the app is failing under load where it wasn't before.
My guess is someone had previously run into a performance issue and put this code in to mitigate the problem. Whether or not that was the right solution is one thing, but a few comments about why this code is there would now be saving us a world of pain and a whole lot of time...
Why you need comments. The name of the method should be clear enough that you don't need comments.
Ex:
// This method is used to retrieve information about contact
public getContact()
{
}
In this case getContact doesn't need the comments
Aim for code that needs no comments, but don't beat yourself up too much if you miss.
I think commenting enough so that you could understand it if you had to review your code later in life should be sufficient.
I think there would a lot of time wasted if you commented for everyone; and going this route could make your code even harder to understand.
I agree that writing readable code is probably the most important part, but don't leave out comments. Take the extra time.
Readable code should be the number 1 priority. Comments are, as Paul Tomblin already wrote, to focus on the why part.
I try to avoid commenting as much as possible. Code should be self explanatory. Name variables and methods properly. Break large code blocks in methods which have a good name. Write methods that do one thing, the thing you named them for.
If you need to write a comment. Make it short. I often have the feeling that if you need to elaborate long on why this code block does this and that you already have a problem with the design.
Only comment when it adds something.
Something like this is useless and definitely decreases readability:
/// <summary>Handles the "event" event</summary>
/// <param name="sender">Event sender</param>
/// <param name="e">Event arguments</param>
protected void Event_Handler (object sender, EventArgs e)
{
}
Basically, putting aside a good but possibly brief comment at the beginning of a class/method/function declaration, and - if necessary - an introductory comment at the beginning of the file, a comment would be useful when a not-so-common or not-so-clearly-transparent operation is coded.
So, for example, you should avoid commenting what's obvious (i++; on a previous example), but what you know is less obvious and/or more tricky should deserve some clear, unconfusing, brilliant, complete line of comment, which naturally comes along with a Nobel prize for the clearest code in history ;).
And don't underestimate the fact that a comment should be also funny; programmers read much more gladly if you can intellectually tease them.
So, as a general principle tend to not be overwhelming with comments, but when you have to write one, be sure about it to be the clearest comment you could write down.
And personally I'm not a big fan of self-documenting code (a.k.a. code w/o a single damn slashstar): after months you've written it (it's just days for my personal scale) it's very likely you couldn't tell the true reason for choosing such design to represent that piece of your intelligence, so how could others?
Comments are not just that green stuff among code lines; they are the part of code which your brain is better willing to compile. Qualifying as braincode (laughing) I couldn't affirm comments are not part of the program you're writing. They're just the part of it which is not directed to the CPU.
Normally, I'm a fan of documentation comments that clearly spell out the intent of the code you're writing. Spiffy tools like NDoc and Sandcastle provide a nice, consistent way in which to write that documentation.
However, I've noticed a few things over the years.
Most documentation comments don't really tell me anything I can't really glean from the code. That assumes, of course, that I can make heads or tails out of the source code to begin with.
Comments are supposed to be used to document intent, not behavior. Unfortunately, in the vast majority of cases, this isn't how they're used. Tools like NDoc and Sandcastle only propagate the incorrect use of comments by providing a plethora of tags that encourage you to provide comments that tell the reader things that he should be able to discern from the code itself.
Over time, the comments tend to fall out of synch with the code. This tends to be true regardless of whether or not we're using documentation software, which purports to make documentation easier because it puts the documentation closer to the code it describes. Even though the documentation is right there next to the method, property, event, class, or other type, developers still have a hard time remembering to update it if and when the intrinsic behavior changes. Consequently, the documentation loses its value.
It's worth noting that these problems are, by and large, due to the misuse of comments. If comments are used solely as a means of conveying intent, these issues go the way of the dodo, since the intent of any given type or its members is unlikely to change over time. (If it does, a better plan is to write a new member and deprecate the old one with a reference to the new one.)
Comments can have immense value if they are used properly. But that means knowing what they are best used for, and constraining their use to that scope. If you fail to do that, what you end up with is a plethora of comments that are incorrect, misleading, and a source of busywork (at increased cost) since you now have to either remove them or somehow get them corrected.
It's worth it to have a strategy for using comments in a meaningful way that prevents them from becoming a time, energy, and money sink.
Studies have stated that optimal readability happens when you have about 1 line of comments for 10 lines of code. Of course, that's not to say that you need to keep your ration at 1/10 and panic if you go over. But it's a good way to give you an idea of how much you should be commenting.
Also remember that comments are a code smell. That is to say that they may be indicative of bad code but aren't necessarily so. The reason for this is that code that is more difficult to understand is commented more.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
If you think it shouldn't, explain why.
If yes, how deep should the guidelines be in your opinion? For example, indentation of code should be included?
I think a team (rather than a company) need to agree on a set of guidelines for reasonably consistent style. It makes it more straightforward for maintenance.
How deep? As shallow as you can agree on. The shorter and clearer it is the more likely it is that all the team members can agree to it and will abide by it.
You want everybody reading and writing code in a standard way. There are two ways you can achieve this:
Clone a single developer several times and make sure they all go through the same training. Hopefully they should all be able to write the same codebase.
Give your existing developers explicit instruction on what you require. Tabs or spaces for indentation. Where braces sit. How to comment. Version-control commit guidelines.
The more you leave undefined, the higher the probability one of the developers will clash on style.
The company should impose that some style should be followed. What style that is and how deep the guidelines are should be decided collectively by the developer community in the company.
I'd definitely lay down guidelines on braces, indentation, naming etc...
You write code for readability and maintainability. Always assume someone else is going to read your code.
There are tools that will auto magically format your code , and you can mandate that everyone uses the tool.
If you are on .Net look at stylecop, fxcop and Resharper
Do you think a software company should impose developers a coding-style?
Not in a top-down manner. Developers in a software company should agree on a common coding style.
If yes, how deep should the guidelines be in your opinion?
They should only describe the differences from well-known conventions, trying to keep the deviation minimal. This is easy for languages like Python or Java, somewhat blurry for C/C++, and almost impossible for Perl and Ruby.
For example, indentation of code should be included?
Yes, it makes code much more readable. Keep indentation consistent in terms of spaces vs tabs and (if you opt for spaces) number of space characters. Also, agree on a margin (e.g. 76 chars or 120 chars) for long lines.
Yes, but within reason.
All modern IDEs offer one-keystroke code pretty-print, so the "indentation" point is quite irrelevant, in my opinion.
What is more important is to establish best practices: for example, use as little "out" or "ref" parameters as possible... In this example, you have 2 advantages: improves readability and also fixes a lot of mistakes (a lot of out parameters is a code smell and should probably be refactored).
Going beyond that is, in my honest opinion, a bit "anal" and unnecessarily annoying for the devs.
Good point by Hamish Smith:
Style is quite different from best
practices. It's a shame that 'coding
standards' tend to roll the two
together. If people could keep the
style part to a minimum and
concentrate on best practices that
would probably add more value.
I don't believe a dev team should have style guidelines they must follow as a general rule. There are exceptions, for example the use of <> vs. "" in #include statements, but these exceptions should come from necessity.
The most common reason I hear people use to explain why style guidelines are necessary is that code written in a common style is easier to maintain that code written in individual styles. I disagree. A professional programmer isn't going to be bogged down when they see this:
for( int n = 0; n < 42; ++42 ) {
// blah
}
...when they are used to seeing this:
for(int n = 0; n < 42; ++42 )
{
// blah
}
Moreover, I have found it's actually easier to maintain code in some cases if you can identify the programmer who wrote the original code by simply recognizing their style. Go ask them why they implemented the gizmo in such a convoluted way in 10 minutes instead of spending the better part of a day figuring out the very technical reason why they did something unexpected. True, the programmer should have commented the code to explain their reasoning, but in the real world programmers often don't.
Finally, if it takes Joe 10 minutes backspacing & moving his curly braces so that Bill can spend 3 fewer seconds looking at the code, did it really save any time to make Bill do something that doesn't come natural to him?
I believe having a consistent codebase is important. It increases the maintainability of ur code. If everyone expects the same kind of code, they can easily read and understand it.
Besides it is not much of a hassle given today's IDEs and their autoformatting capabilities.
P.S:
I have this annoying habit of putting my braces on the next line :). No one else seems to like it
I think that programmers should be able to adapt to the style of other programmers. If a new programmer is unable to adapt, that usually means that the new programmer is too stubborn to use the style of the company. It would be nice if we could all do our own thing; however, if we all code along some bast guideline, it makes debugging and maintenance easier. This is only true if the standard is well thought out and not too restrictive.
While I don't agree with everything, this book contains an excellent starting point for standards
The best solution would be for IDEs to regard such formatting as meta data. For example, the opening curly brace position (current line or next line), indentation and white space around operators should be configurable without changing the source file.
In my opinion I think it's highly necessary with standards and style guides. Because when your code-base grows you will want to have it consistent.
As a side note, that is why I love Python; because it already imposes quite a lot of rules on how to structure your applications and such. Compare that with Perl, Ruby or whatever where you have an extreme freedom(which isn't that good in this case).
There are plenty of good reasons for the standards to define the way the applications are developed and the way the code should look like. For example when everyone use the same standard an automatic style-checker could be used as a part of the project CI.
Using the same standards improve code readability and helps to reduce the tension between team members about re-factoring the same code in different ways.
Therefore:
All the code developed by the particular team should follow precisely the same standard.
All the code developed for a particular project should follow precisely the same standard.
It is desirable that teams belonging to the same company use the same standard.
In an outsourcing company an exception could be made for a team working for a customer if the customer wants to enforce a standard of their own. In this case the team adopts the customer's standard which could be incompatible with the one used by their company.
Like others have mentioned, I think it needs to be by engineering or by the team--the company (i.e. business units) should not be involved in that sort of decision.
But one other thing I'd add is any rules that are implemented should be enforced by tools and not by people. Worst case scenario, IMO, is some over-zealous grammar snob (yes, we exist; I know because we can smell our own) writes some documentation outlining a set of coding guidelines which absolutely nobody actually reads or follows. They become obsolete over time, and as new people are added to the team and old people leave, they simply become stale.
Then, some conflict arises, and someone is put in the uncomfortable position of having to confront someone else about coding style--this sort of confrontation should be done by tools and not by people. In short, this method of enforcement is the least desirable, in my opinion, because it is far too easy to ignore and simply begs programmers to argue about stupid things.
A better option (again, IMO) is to have warnings thrown at compile time (or something similar), so long as your build environment supports this. It's not hard to configure this in VS.NET, but I'm unaware of other development environments that have similar features.
Style guidelines are extremely important, whether they're for design or development, because they speed the communication and performance of people who work collaboratively (or even alone, sequentially, as when picking up the pieces of an old project). Not having a system of convention within a company is just asking people to be as unproductive as they can. Most projects require collaboration, and even those that don't can be vulnerable to our natural desire to exercise our programming chops and keep current. Our desire to learn gets in the way of our consistency - which is a good thing in and of itself, but can drive a new employee crazy trying to learn the systems they're jumping in on.
Like any other system that's meant for good and not evil, the real power of the guide lies in the hands of its people. The developers themselves will determine what the essential and useful parts are and then, hopefully, use them.
Like the law. Or the English language.
Style guides should be as deep as they want to be - if it comes up in the brainstorm session, it should be included. It's odd how you worded the question because at the end of the day there is no way to "impose" a style guide because it's only a GUIDE.
RTFM, then glean the good stuff and get on with it.
Yes, I think companies should. Developer may need to get used to the coding-style but in my opinion a good programmer should be able to work with any coding style. As Midhat said: It is important to have a consistent codebase.
I think this is also important for opensource projects, there is no supervisor to tell you how to write your code but many languages have specifications on how naming and organisation of your code should be. This helps a lot when integrating opensource components into your project.
Sure, guidelines are good, and unless it's badly-used Hungarian notation (ha!), it'll probably improve consistency and make reading other people's code easier. The guidelines should just be guidelines though, not strict rules enforced on programmers. You could tell me where to put my braces or not to use names like temp, but what you can't do is force me to have spaces around index values in array brackets (they tried once...)
Yes.
Coding standards are a common way of ensuring that code within a certain organization will follow the Principle of Least Surprise: consistency in standards starting from variable naming to indentation to curly brace use.
Coders having their own styles and their own standards will only produce a code-base that is inconsistent, confusing, and frustrating to read, especially on larger projects.
These are the coding standards for a company I used to work for. They're well defined, and, while it took me a while to get used to them, meant that the code was readable by all of us, and uniform all the way through.
I do think coding standards are important within a company, if none are set, there are going to be clashes between developers, and issues with readability.
Having the code uniform all the way through presents a better code to the end user (so it looks as if it's written by one person - which, from an End Users point of view, it should - that person being "the company" and it also helps with readability within the team...
A common coding style promotes consistency and makes it easy for different people to easily understand, maintain and expand the whole code base, not only their own pieces. It also makes it easier for new people to learn the code faster. Thus, any team should have a guidelines on how the code is expected to be written.
Important guidelines include (in no particular order):
whitespace and indentation
standard comments - file, class or method headers
naming convention - classes, interfaces, variables, namespaces, files
code annotations
project organization - folder structures, binaries
standard libraries - what templates, generics, containers and so on to use
error handling - exceptions, hresults, error codes
threading and synchronization
Also, be wary of programmers that can't or won't adapt to the style of the team, no matter how bright they might be. If they don't play by one of the team rules, they probably won't play by other team rules as well.
I would agree that consistency is key. You can't rely on IDE pretty-printing to save the day, because some of your developers may not like using an IDE, and because when you're trawling through a code base of thousands of source files, it's simply not feasible to pretty print all the files when you start working on them, and perform a roll-back afterwards so your VCS doesn't try to commit back all the changes (clogging the repository with needless updates that burden everyone).
I would suggest standardizing at least the following (in decreasing order of importance):
Whitespace (it's easiest if you choose a style that conforms to the automatic pretty-printing of some shared tool)
Naming (files and folders, classes, functions, variables, ...)
Commenting (using a format that allows automatic documentation generation)
My opinion:
Some basic rules are good as it helps everyone to read and maintain the code
Too many rules are bad as it stops developers innovating with clearer ways of laying out code
Individual style can be useful to determine the history of a code file. Diff/blame tools can be used but the hint is still useful
Modern IDEs let you define a formatting template. If there is a corporate standard, then develop a configuration file that defines all the formatting values you care about and make sure everyone runs the formatter before they check in their code. If you want to be even more rigorous about it you could add a commit hook for your version control system to indent the code before it is accepted.
Yes in terms of using a common naming standard as well as a common layout of classes and code behind files. Everything else is open.
Every company should. Consistent coding style ensures higher readibility and maintainability of the codebase across whole your team.
The shop I work at does not have a unified coding standard, and I can say we (as a team) vastly suffer from that. When there is no will from the individuals (like in case of some of my colleagues), the team leader has to bang his fist on the table and impose some form of standardised coding guidelines.
Ever language has general standards that are used by the community. You should follow those as well as possible so that your code can be maintained by other people used to the language, but there's no need to be dictatorial about it.
The creation of an official standard is wrong because a company coding standard is usually too rigid, and unable to flow with the general community using the language.
If you're having a problem with a team member really be out there in coding style, that's an excellent thing for the group to gently suggest is not a good idea at a code review.
Coding standards: YES. For reasons already covered in this thread.
Styling standards: NO. Your readable, is my bewildering junk, and vice versa. Good commenting and code factoring have a far greater benefit. Also gnu indent.
I like Ilya's answer because it incorporates the importance of readability, and the use of continuous integration as the enforcement mechanism. Hibri mentioned FxCop, and I think its use in the build process as one of the criteria for determining whether a build passes or fails would be more flexible and effective than merely documenting a standard.
I entirely agree that coding standards should be applied, and that it should almost always be at the team level. However there are a couple of exceptions.
If the team is writing code that is to be used by other teams (and here I mean that other teams will have to look at the source, not just use it as a library) then there are benefits to making common standards across all the teams using it. Similarly if the policy of the company is to frequently move programmers from one team to another, or is in a position where one team frequently wants to reuse code from another team then it is probably best to impose standards across the company.
There are two types of conventions.
Type A conventions: "please do these, it is better"
and Type B: "please drive on the right hand side of the road", while it is okay to drive on the other side, as long as everyone does the same way.
There's no such thing as a separate team. All code in a good firm is connected somehow, and style should be consistent. It's easier to get yourself used to one new style than to twenty different styles.
Also, a new developer should be able to respect the practices of existing codebase and to follow them.