Using single characters for variable names in loops/exceptions - coding-style

I've had a couple of discussions with a co-worker about the use of single-letter variable names in certain circumstances inside our codebase, on which we disagree.
He favours a more verbose naming convention for these, and I do not.
There are three scenarios where, in my opinion, single-letter variable names are fine, and where I use them:
Loops - i for(int i = 0; i < 10; i++) { ... }
Lambda expressions in C# - x/y/z: .Where(x => x == 5)
Exceptions - e: try { ... } catch(ExceptionType e) { /* usage of 'e' */ }
These are the only scenarios where I would use it, and I obviously use more verbose naming conventions elsewhere.
My colleague put forward the following arguments for exceptions and loops:
i - it doesn't mean anything.
e - it's the most common letter in the English language. If you wanted to search the solution for exceptions, you'd find lots of undesired instances of e.
I accept these arguments, but my retort is that if someone does not know what i means in a for loop, they probably shouldn't be a programmer. The name i is a very common convention for loop indices, as is e for exceptions. I have also pointed out that anyone who wants to find exceptions could search for catch instead.
I realise that this is subjective, but then, one could argue that coding standards are just that: opinions, albeit opinions of academics.
I would be happy either way, and will forward the results to him, but would rather that we (our company) continue to use a single coding standard, rather than have two developers with different opinions on what to use.
Thanks in advance.

If the lexical scope of a variable is more than 20 or 25 lines, then the variable should probably not have a single letter name. If a large number of variables in your code base have a lexical scope larger than 25 lines (or so), then your code base has a much bigger problem than can be dealt with by using a verbose naming convention.
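For Java readers, here is a hypothetical sketch of the three conventions from the question (loop index i, lambda parameter x, exception e). The class and method names are made up for illustration; the point is that each single-letter name lives for only a line or two, well inside the 20-25 line limit suggested above.

```java
import java.util.List;

// Hypothetical examples of single-letter names with a very small lexical scope.
class ShortNames {
    // Loop index: i exists only for the duration of the loop.
    static int sumFirst(int n) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    // Lambda parameter: x exists only inside a one-expression lambda.
    static long countFives(List<Integer> values) {
        return values.stream().filter(x -> x == 5).count();
    }

    // Exception variable: e exists only inside the catch block.
    static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            System.err.println("Could not parse: " + e.getMessage());
            return fallback;
        }
    }
}
```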

i doesn't mean anything
Yes, it does. It's the index or counter in a for loop.
e is the most common letter in the English language. If you wanted to search the solution for exceptions, you'd find lots of undesired instances of e
This just doesn't even make any sense. Why would you search for e if you wanted to find instances of Exception?
Seriously, I'd just laugh at anyone who came out with these arguments. Everyone knows what i and e represent in these scenarios. They are universally accepted conventions. It sounds to me like your colleague is just trying to be a smart-ass.
Edit - This question reminded me of this wtf.

Another exception to the rule that I apply is naming of exception variables that need to be thrown. For instance, the code should read:
Exception yourToys = new Exception(...);
throw yourToys;
or
Exception up_in_a_bucket = new Exception(...);
throw up_in_a_bucket;

I recently had a conversation with somebody about this.
I've come to the opinion that, for operations that are a functional abstraction, the value of a "meaningful" name can be overstated.
For instance, in JavaScript:
myArrayOfNames.forEach ( function ( name ) { } );
myArrayOfNames.map ( function ( name ) { } );
myArrayOfNames.filter ( function ( name ) { } );
I generally use "each", "obj" or just "d" for these sorts of things, because I see them as coarse-grained abstractions. "name" really tells me nothing other than that it's a name from an array of names.
Why does this matter? Because I've seen developers go through review after review arguing about what is "meaningful". More than once.
So over the years, I gravitated towards settling it by saying, the operation is a functional abstraction (iteration) applied to a specific list of some kind. Reflect that language, and usage, in your code:
myUsefullyNamedArray.filter ( function ( d ) {
return ( 'someval' in d );
} );

Related

Anonymous Methods / Lambdas (Coding Standards)

In Jeffrey Richter's "CLR via C#" (the .NET 2.0 edition, page 353), he says that, as a self-discipline, he never makes anonymous functions longer than 3 lines of code. He cites mostly readability and understandability as his reasons. This suits me fine, because I already had a self-discipline of using no more than 5 lines for an anonymous method.
But how does that "coding standard" advice stack up against lambdas? At face value, I'd treat them the same, keeping a lambda equally short. But how do others feel about this? In particular, when lambdas are used where (arguably) they shine brightest, in LINQ statements, is there genuine cause to abandon that self-discipline / coding standard?
Bear in mind that things have changed a lot since 2.0. For example, consider .NET 4's Parallel Extensions, which use delegates heavily. You might have:
Parallel.For(0, 100, i =>
{
// Potentially significant amounts of code
});
To me it doesn't matter whether this is a lambda expression or an anonymous method - it's not really being used in the same way that delegates typically were in .NET 2.0.
Within normal LINQ, I don't typically find myself using large lambda expressions - certainly not in terms of the number of statements. Sometimes a particular single expression will be quite long in terms of lines because it's projecting a number of properties; the alternative is having huge lines!
In fact, LINQ tends to favour single-expression lambda expressions (which don't even have braces). I'd be fairly surprised to see a good use of LINQ which had a lambda expression with 5 statements in.
I don't know if having a guideline for short lambdas and delegates is really useful. However, do have a guideline for short functions. The methods I write are on average 6 or 7 lines long. Functions should hardly ever be 20 lines long. You should write the most readable code, and if you follow Robert Martin's or Steve McConnell's advice, they tell you to keep functions short and also to keep the inner part of loops as short as possible, preferably just a single method call.
So you shouldn't write a for loop as follows:
for (int i = 0; i < 100; i++)
{
// Potentially significant amounts of code
}
but simply with a single method call inside the loop:
for (int i = 0; i < 100; i++)
{
WellDescribedOperationOnElementI(i);
}
With this in mind, while I in general agree with Jon Skeet’s answer, I don't see any reason why you shouldn't want his example to be written as:
Parallel.For(0, 100, i =>
{
WellDescribedPartOfHeavyCalculation(i);
});
or
Parallel.For(0, 100, i => WellDescribedPartOfHeavyCalculation(i));
or even:
Parallel.For(0, 100, WellDescribedPartOfHeavyCalculation);
Always go for the most readable code, and often this means short anonymous methods and short lambdas, but most of all short, well-described methods.
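A rough Java analog of the final Parallel.For form, offered as a sketch rather than a translation of any particular codebase: IntStream.range(...).parallel().forEach(...) stands in for Parallel.For, and a method reference plays the role of the C# method group. The class name and the body of wellDescribedPartOfHeavyCalculation (it just stores i * i) are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.stream.IntStream;

// Hypothetical example: the parallel loop body is a single well-named method.
class HeavyCalculation {
    // Thread-safe storage; each index is written by exactly one task.
    private final AtomicLongArray results = new AtomicLongArray(100);

    // Stand-in for the real work; here it only computes i * i.
    void wellDescribedPartOfHeavyCalculation(int i) {
        results.set(i, (long) i * i);
    }

    void runAll() {
        // Method reference instead of a multi-line lambda.
        IntStream.range(0, 100).parallel().forEach(this::wellDescribedPartOfHeavyCalculation);
    }

    long result(int i) {
        return results.get(i);
    }
}
```

Keeping the lambda down to a method reference means the loop line reads as a sentence, and the work itself can be named, tested and debugged on its own.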

Using function arguments as local variables

Something like this (yes, this doesn't deal with some edge cases - that's not the point):
int CountDigits(int num) {
int count = 1;
while (num >= 10) {
count++;
num /= 10;
}
return count;
}
What's your opinion about this? That is, using function arguments as local variables.
Both are placed on the stack and are pretty much identical performance-wise, so I'm wondering about the best-practice aspects of this.
I feel like an idiot when I add an additional and quite redundant line consisting of int numCopy = num to that function, yet modifying the parameter does bug me.
What do you think? Should this be avoided?
As a general rule, I wouldn't use a function parameter as a local processing variable, i.e. I treat function parameters as read-only.
In my mind, intuitively understandable code is paramount for maintainability, and modifying a function parameter to use as a local processing variable tends to run counter to that goal. I have come to expect that a parameter will have the same value in the middle and bottom of a method as it does at the top. Plus, an aptly-named local processing variable may improve understandability.
Still, as @Stewart says, this rule is more or less important depending on the length and complexity of the function. For short simple functions like the one you show, simply using the parameter itself may be easier to understand than introducing a new local variable (very subjective).
Nevertheless, if I were to write something as simple as countDigits(), I'd tend to use a remainingBalance local processing variable in lieu of modifying the num parameter as part of local processing - just seems clearer to me.
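A minimal Java sketch of that read-only-parameter style: the loop consumes a local copy, so num keeps the caller's value for the whole method. The local is called remaining here, which is only an illustrative name. Like the question's version, it does not handle negative numbers (they return 1).

```java
// Sketch: the parameter is treated as read-only and a local variable does the work.
class Digits {
    static int countDigits(int num) {
        int remaining = num; // local processing variable (hypothetical name)
        int count = 1;
        while (remaining >= 10) {
            count++;
            remaining /= 10;
        }
        // num is unchanged here, so later code could still refer to the original value.
        return count;
    }
}
```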
Sometimes, I will modify a local parameter at the beginning of a method to normalize the parameter:
void saveName(String name) {
name = (name != null ? name.trim() : "");
...
}
I rationalize that this is okay because:
a. it is easy to see at the top of the method,
b. the parameter maintains its original conceptual intent, and
c. the parameter is stable for the rest of the method
Then again, half the time, I'm just as apt to use a local variable anyway, just to get a couple of extra finals in there (okay, that's a bad reason, but I like final):
void saveName(final String name) {
final String normalizedName = (name != null ? name.trim() : "");
...
}
If, 99% of the time, the code leaves function parameters unmodified (i.e. mutating parameters are unintuitive or unexpected in this code base), then, during that other 1% of the time, dropping a quick comment about a mutating parameter at the top of a long or complex function could be a big boon to understandability:
int CountDigits(int num) {
// num is consumed
int count = 1;
while (num >= 10) {
count++;
num /= 10;
}
return count;
}
P.S. :-)
parameters vs arguments
http://en.wikipedia.org/wiki/Parameter_(computer_science)#Parameters_and_arguments
These two terms are sometimes loosely used interchangeably; in particular, "argument" is sometimes used in place of "parameter". Nevertheless, there is a difference. Properly, parameters appear in procedure definitions; arguments appear in procedure calls.
So,
int foo(int bar)
bar is a parameter.
int x = 5;
int y = foo(x);
The value of x is the argument for the bar parameter.
It always feels a little funny to me when I do this, but that's not really a good reason to avoid it.
One reason you might potentially want to avoid it is for debugging purposes. Being able to tell the difference between "scratchpad" variables and the input to the function can be very useful when you're halfway through debugging.
I can't say it's something that comes up very often in my experience - and often you can find that it's worth introducing another variable just for the sake of having a different name, but if the code which is otherwise cleanest ends up changing the value of the variable, then so be it.
One situation where this can come up and be entirely reasonable is where you've got some value meaning "use the default" (typically a null reference in a language like Java or C#). In that case I think it's entirely reasonable to modify the value of the parameter to the "real" default value. This is particularly useful in C# 4 where you can have optional parameters, but the default value has to be a constant:
For example:
public static void WriteText(string file, string text, Encoding encoding = null)
{
// Null means "use the default" which we would document to be UTF-8
encoding = encoding ?? Encoding.UTF8;
// Rest of code here
}
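Java has no optional parameters, so the nearest hypothetical analog is a caller passing null to mean "use the default". This sketch assumes Java 11+ (Objects.requireNonNullElse needs 9+, Files.writeString needs 11+); the class name TextFiles is made up. As in the C# version, the parameter is reassigned once, at the top, and then left alone.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Objects;

// Hypothetical Java counterpart to WriteText: null means "use the default" (UTF-8).
class TextFiles {
    static void writeText(Path file, String text, Charset encoding) throws IOException {
        // Replace null with the real default once, at the top of the method.
        encoding = Objects.requireNonNullElse(encoding, StandardCharsets.UTF_8);
        // Rest of code here; encoding is stable from this point on.
        Files.writeString(file, text, encoding);
    }
}
```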
About C and C++:
My opinion is that using the parameter as a local variable of the function is fine because it is a local variable already. Why then not use it as such?
I feel silly too when copying the parameter into a new local variable just to have a modifiable variable to work with.
But I think this is pretty much a personal opinion. Do it as you like. If you feel silly copying the parameter just for this, that suggests the style doesn't suit you, and then you shouldn't do it.
If I don't need a copy of the original value, I don't declare a new variable.
IMO, mutating parameter values is not bad practice in general; it depends on how you're going to use them in your code.
My team's coding standard recommends against this because it can get out of hand. To my mind, for a function like the one you show, it doesn't hurt because everyone can see what is going on. The problem is that over time functions get longer, and they get bug fixes in them. As soon as a function is more than one screenful of code, this starts to get confusing, which is why our coding standard bans it.
The compiler ought to be able to get rid of the redundant variable quite easily, so it has no efficiency impact. It is probably just between you and your code reviewer whether this is OK or not.
I would generally not change the parameter value within the function. If at some point later in the function you need to refer to the original value, you still have it. In your simple case there is no problem, but if you add more code later, you may refer to 'num' without realizing it has been changed.
The code needs to be as self-sufficient as possible. What I mean by that is that you now have a dependency on what is being passed in as part of your algorithm. If another member of your team decides to change this to pass by reference, then you might have big problems.
The best practice is definitely to copy the inbound parameters if you expect them to be immutable.
I typically don't modify function parameters, unless they're pointers, in which case I might alter the value that's pointed to.
I think the best practice here varies by language. For example, in Perl you can localize a package variable, or even a single element of a data structure, so that any change made to it in that scope is undone when the scope exits (lexical my variables are already private copies and cannot be localized with local):
sub my_function
{
my ($arg1, $arg2) = @_; # copy the arguments into lexical variables
$arg1++; # $arg1 is a lexical copy, so this change is not visible to the caller
local $arg2->{key1}; # only the key1 portion of the hashref referenced by $arg2 is localized
$arg2->{key1}->{key2} = 'foo'; # this change is not visible outside the function
}
Occasionally I have been bitten by forgetting to localize a data structure that was passed by reference to a function, that I changed inside the function. Conversely, I have also returned a data structure as a function result that was shared among multiple systems and the caller then proceeded to change the data by mistake, affecting these other systems in a difficult-to-trace problem usually called action at a distance. The best thing to do here would be to make a clone of the data before returning it*, or make it read-only**.
* In Perl, see the function dclone() in the built-in Storable module.
** In Perl, see lock_hash() or lock_hash_ref() in the built-in Hash::Util module.
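The same two remedies have rough Java equivalents, sketched here with a hypothetical SharedRegistry class: return a copy, or return a read-only view. Note the copy is shallow (unlike Perl's dclone, which is deep); that is enough here only because the elements are immutable Strings. The read-only view still reflects later changes made by the registry itself, but callers cannot modify it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical shared data source that must not be changed by its callers.
class SharedRegistry {
    private final List<String> entries = new ArrayList<>();

    void add(String entry) {
        entries.add(entry);
    }

    // Option 1: hand out a copy; callers may modify it without affecting anyone else.
    List<String> entriesCopy() {
        return new ArrayList<>(entries);
    }

    // Option 2: hand out a read-only view; any attempt to modify it throws.
    List<String> entriesReadOnly() {
        return Collections.unmodifiableList(entries);
    }
}
```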

Help me refactor this loop

I am working on the redesign of an existing class. This class contains a roughly 400-line while loop that does most of the work. The body of the loop is a minefield of if statements and variable assignments, and there is a "continue" in the middle somewhere. The purpose of the loop is hard to understand.
In pseudocode, here's where I'm at with the redesign:
/* Some code here to create the objects based on config parameters */
/* Rather than having if statements scattered through the loop I */
/* create instances of the appropriate classes. The constructors */
/* take a database connection. */
FOR EACH row IN mySourceOfData
int p = batcher.FindOrCreateBatch( row );
int s = supplierBatchEntryCreator.CreateOrUpdate( row, p );
int b = buyerBatchEntryCreator.CreateOrUpdate( row, p );
mySourceOfData.UpdateAsIncludedInBatch( p, s, b);
NEXT
/* Allow things to complete their last item */
mySupplierBatchEntry.finish();
myBuyerBatchEntry.finish();
myBatcher.finish();
/* Some code here to dispose of things */
RETURN myBatch.listOfBatches();
FindOrCreateBatch() uses some rules to figure out whether a new batch needs to be created or an existing one can be used. The different implementations of this interface will have different rules for how they find batches, etc. The return value is the surrogate key (id) from the database of the payment batch that it found or created. This id is required by the subsequent processes that take p as a parameter.
This is an improvement over where I started, but I have an uneasy feeling about the class containing this loop.
It doesn't seem to be a domain object; it's more of a "Manager" or "Controller" type class.
It seems to be getting in between batcher and supplierBatchEntryCreator (and the other classes). At the moment only an int is passed, but if that changes, all three classes need to change. This seems like a Law of Demeter violation.
Any suggestions, or is this ok? The actual language is java.
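Since the actual language is Java, here is one hypothetical way the pseudocode loop might look with small interfaces for each collaborator. Every type and method name below (Row, Batcher, BatchEntryCreator, DataSource, BatchRun) is an assumption modelled on the pseudocode, not the real code; the ids stay plain ints as in the question.

```java
import java.util.List;

// Hypothetical collaborator types modelled on the pseudocode.
interface Row {}

interface Batcher {
    int findOrCreateBatch(Row row); // returns the surrogate key of the batch
    void finish();                  // lets the batcher complete its last item
    List<Integer> listOfBatches();
}

interface BatchEntryCreator {
    int createOrUpdate(Row row, int batchId); // returns the id of the entry
    void finish();
}

interface DataSource {
    Iterable<Row> rows();
    void updateAsIncludedInBatch(int batchId, int supplierEntryId, int buyerEntryId);
}

class BatchRun {
    static List<Integer> run(DataSource source, Batcher batcher,
                             BatchEntryCreator suppliers, BatchEntryCreator buyers) {
        for (Row row : source.rows()) {
            int batchId = batcher.findOrCreateBatch(row);
            int supplierEntryId = suppliers.createOrUpdate(row, batchId);
            int buyerEntryId = buyers.createOrUpdate(row, batchId);
            source.updateAsIncludedInBatch(batchId, supplierEntryId, buyerEntryId);
        }
        // Allow things to complete their last item.
        suppliers.finish();
        buyers.finish();
        batcher.finish();
        return batcher.listOfBatches();
    }
}
```

Because the loop only depends on these interfaces, each rule set (the different FindOrCreateBatch implementations) can be swapped in without touching the loop.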
I have a couple of questions to ask you:
Does it work?
Is it fast enough?
Is it readable/maintainable?
If the answer to all three is yes then, beyond that, further changes are really just wasted effort in my opinion. Don't refactor just for the sake of refactoring.
Far too often people change things in anticipation of what might be (your "changing int" for example). I prefer to subscribe to the YAGNI school of thought. The right time to worry about that is when you do it.
And the Law of Demeter is a design guideline, not a rule. In the real world, pragmatism usually beats dogmatism :-)
What is the relationship between each XXXEntryCreator and XXXEntry? I feel like I am missing something, since the "Creators" only return integers.
Beyond that, you took 400 lines of crud down to something that fits on a screen, and has a reasonably visible data flow between steps. Kudos. (I have experienced strong resistance in the past for trying to make such changes -- why do people write N-100/1000 line run-on else-if drivel?)
FindOrCreate and CreateOrUpdate suggest to me that maybe multiple passes would be simpler (and not knowing the rest of the code, I can't know if it would degrade performance, which is a common concern raised when multiple passes are suggested).
If you had one loop to create any missing batches, suppliers, and buyers (or three loops), then this loop could be reduced to
FOR EACH row IN mySourceOfData
int p = batcher.FindBatch( row );
int s = supplierBatchEntryCreator.Update( row, p );
int b = buyerBatchEntryCreator.Update( row, p );
mySourceOfData.UpdateAsIncludedInBatch( p, s, b);
NEXT
Now I see that the Creators are updating; is that right? Does splitting the creation and update responsibilities into two classes make sense, perhaps?
It's starting to look a little simpler to me. Does it help?

Is there a case where parameter validation may be considered redundant?

The first thing I do in a public method is to validate every single parameter before they get any chance to get used, passed around or referenced, and then throw an exception if any of them violate the contract. I've found this to be a very good practice as it lets you catch the offender the moment the infraction is committed but then, quite often I write a very simple getter/indexer such as this:
private List<Item> m_items = ...;
public Item GetItemByIdx( int idx )
{
if( (idx < 0) || (idx >= m_items.Count) )
{
throw new ArgumentOutOfRangeException( "idx", "Invalid index" );
}
return m_items[ idx ];
}
In this case the index parameter directly relates to the indexes in the list, and I know for a fact (e.g. from the documentation) that the list itself will do exactly the same check and will throw the same exception. Should I remove this verification, or had I better leave it alone?
I wanted to know what you guys think, as I'm now in the middle of refactoring a big project and I've found many cases like the above.
Thanks in advance.
It's not just a matter of taste, consider
if (!File.Exists(fileName)) throw new ArgumentException("...");
var s = File.OpenText(fileName);
This looks similar to your example but there are several reasons (concurrency, access rights) why the OpenText() method could still fail, even with a FileNotFound error. So the Exists-check is just giving a false feeling of security and control.
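A Java sketch of the alternative this implies: skip the existence pre-check and handle the failure where it actually happens, at open time. The class and method names are hypothetical; Files.newBufferedReader and NoSuchFileException are standard java.nio.file APIs. Other failures such as access-denied errors still propagate as IOException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Sketch: no Files.exists() check, because the file could disappear (or be
// unreadable) between the check and the open anyway.
class FirstLine {
    static String readFirstLineOrNull(Path path) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(path)) {
            return reader.readLine();
        } catch (NoSuchFileException e) {
            return null; // the check that matters happens when the file is opened
        }
    }
}
```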
It is a mind-set thing, when you are writing the GetItemByIdx method it probably looks quite sensible. But if you look around in a random piece of code there are usually lots of assumptions you could check before proceeding. It's just not practical to check them all, over and over. We have to be selective.
So in a simple pass-along method like GetItemByIdx I would argue against redundant checks. But as soon as the function adds more functionality or if there is a very explicit specification that says something about idx that argument turns around.
As a rule of thumb an exception should be thrown when a well defined condition is broken and that condition is relevant at the current level. If the condition belongs to a lower level, then let that level handle it.
I would only do parameter verification where it would lead to some improvement in code behavior. Since you know, in this case, that the check will be performed by the List itself, then your own check is redundant and provides no extra value, so I wouldn't bother.
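In Java the same reasoning applies to List.get, which already throws IndexOutOfBoundsException for a bad index. The hypothetical ItemStore below shows the plain pass-through, plus a variant using Objects.checkIndex (Java 9+) for cases where an explicit up-front check is still wanted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical Java version of GetItemByIdx.
class ItemStore {
    private final List<String> items = new ArrayList<>(List.of("a", "b", "c"));

    // Pass-through: List.get rejects idx < 0 or idx >= size by itself.
    String getItemByIdx(int idx) {
        return items.get(idx);
    }

    // Explicit variant: Objects.checkIndex performs the same range check in one call.
    String getItemByIdxChecked(int idx) {
        return items.get(Objects.checkIndex(idx, items.size()));
    }
}
```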
It's true that possibly you duplicated work that's already been done in the API, but it's there now. If your error handling framework works and is solid, and isn't causing performance issues (profiling IYF) then I reckon leave it, and gradually phase it out if you have time. It doesn't sound like a top priority!

What are the pros and cons of putting as much logic as possible into a minimal piece of code (one-liners)?

Is it cool?
IMO, one-liners reduce readability and make debugging and understanding more difficult.
Maximize understandability of the code.
Sometimes that means putting (simple, easily understood) expressions on one line in order to get more code in a given amount of screen real-estate (i.e. the source code editor).
Other times that means taking small steps to make it obvious what the code means.
One-liners should be a side-effect, not a goal (nor something to be avoided).
If there is a simple way of expressing something in a single line of code, that's great. If it's just a case of stuffing in lots of expressions into a single line, that's not so good.
To explain what I mean - LINQ allows you to express quite complicated transformations in relative simplicity. That's great - but I wouldn't try to fit a huge LINQ expression onto a single line. For instance:
var query = from person in employees
where person.Salary > 10000m
orderby person.Name
select new { person.Name, person.Department };
is more readable than:
var query = from person in employees where person.Salary > 10000m orderby person.Name select new { person.Name, person.Department };
It's also more readable than doing all the filtering, ordering and projection manually. It's a nice sweet spot.
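The same idea in Java streams, as a hypothetical sketch: one clause per line, in the same filter, order, project sequence. The Employee type is made up for the example, and since Java has no anonymous types the projection builds a String instead.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical Java counterpart to the LINQ query, formatted one clause per line.
class EmployeeQuery {
    static final class Employee {
        final String name;
        final String department;
        final double salary;

        Employee(String name, String department, double salary) {
            this.name = name;
            this.department = department;
            this.salary = salary;
        }
    }

    static List<String> wellPaid(List<Employee> employees) {
        return employees.stream()
                .filter(person -> person.salary > 10000)
                .sorted(Comparator.comparing((Employee person) -> person.name))
                .map(person -> person.name + " (" + person.department + ")")
                .collect(Collectors.toList());
    }
}
```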
Trying to be "clever" is rarely a good idea - but if you can express something simply and concisely, that's good.
One-liners, when used properly, transmit your intent clearly and make the structure of your code easier to grasp.
A python example is list comprehensions:
new_lst = [i for i in lst if some_condition]
instead of:
new_lst = []
for i in lst:
if some_condition:
new_lst.append(i)
This is a commonly used idiom that makes your code much more readable and compact. So, the best of both worlds can be achieved in certain cases.
This is by definition subjective, and due to the vagueness of the question, you'll likely get answers all over the map. Are you referring to a single physical line or a single logical line? For example, are you talking about:
int x = BigHonkinClassName.GetInstance().MyObjectProperty.PropertyX.IntValue.This.That.TheOther;
or
int x = BigHonkinClassName.GetInstance().
MyObjectProperty.PropertyX.IntValue.
This.That.TheOther;
One-liners, to me, are a matter of "what feels right." In the case above, I'd probably break that into separate physical and logical lines, getting the instance of BigHonkinClassName, then pulling the full path to .TheOther. But that's just me. Other people will disagree. (And there's room for that. Like I said, subjective.)
Regarding readability, bear in mind that, for many languages, even "one-liners" can be broken out into multiple lines. If you have a long set of conditions for the conditional ternary operator (? :), for example, it might behoove you to break it into multiple physical lines for readability:
int x = (/* some long condition */) ?
/* some long method/property name returning an int */ :
/* some long method/property name returning an int */ ;
At the end of the day, the answer is always: "It depends." Some frameworks (e.g. many DAL generators, such as SubSonic) almost require obscenely long one-liners to get any real work done. Other times, breaking the code into multiple lines is quite preferable.
Given concrete examples, the community can provide better, more practical advice.
In general, I definitely don't think you should ever "squeeze" a bunch of code onto a single physical line. That doesn't just hurt legibility, it smacks of someone who has outright disdain for the maintenance programmer. As I used to teach my students: always code for the maintenance programmer, because it will often be you.
:)
One-liners can be useful in some situations:
int value = condition ? 1 : 0;
But for the most part they make the code harder to follow. I think you only should put things on one line when it is easy to follow, the intent is clear, and it won't affect debugging.
One-liners should be treated on a case-by-case basis. Sometimes it can really hurt readability and a more verbose (read: easy-to-follow) version should be used.
There are times, however when a one-liner seems more natural. Take the following:
int Total = (Something ? 1 : 2)
+ (SomethingElse ? (AnotherThing ? x : y) : z);
Or the equivalent (slightly less readable?):
int Total = Something ? 1 : 2;
Total += SomethingElse ? (AnotherThing ? x : y) : z;
IMHO, I would prefer either of the above to the following:
int Total;
if (Something)
Total = 1;
else
Total = 2;
if (SomethingElse)
if (AnotherThing)
Total += x;
else
Total += y;
else
Total += z;
With the nested if-statements, I have a harder time figuring out the final result without tracing through it. The one-liner feels more like the math formula it was intended to be, and consequently easier to follow.
As far as the cool factor, there is a certain feeling of accomplishment / show-off factor in "Look Ma, I wrote a whole program in one line!". But I wouldn't use it in any context other than playing around; I certainly wouldn't want to have to go back and debug it!
Ultimately, with real (production) projects, whatever makes it easiest to understand is best. Because there will come a time that you or someone else will be looking at the code again. What they say is true: time is precious.
That's true in most cases, but in some cases where one-liners are common idioms, then it's acceptable. ? : might be an example. Closure might be another one.
No, it is annoying.
One liners can be more readable and they can be less readable. You'll have to judge from case to case.
And, of course, on the prompt one-liners rule.
VASTLY more important is developing and sticking to a consistent style.
You'll find bugs MUCH faster, be better able to share code with others, and even code faster if you merely develop and stick to a pattern.
One aspect of this is to make a decision on one-liners. Here's one example from my shop (I run a small coding department) - how we handle IFs:
Ifs shall never be all on one line if they overflow the visible line length, including any indentation.
Thou shalt never have else clauses on the same line as the if even if it comports with the line-length rule.
Develop your own style and STICK WITH IT (or, refactor all code in the same project if you change style).
The main drawback of "one liners" in my opinion is that it makes it hard to break on the code and debug. For example, pretend you have the following code:
a().b().c(d() + e())
If this isn't working, it's hard to inspect the intermediate values. However, it's trivial to break with gdb (or whatever other tool you may be using) in the following, and to check each individual variable to see precisely what is failing:
A = a();
B = A.b();
D = d();
E = e(); // here I can inspect A, B, D and E
B.c(D + E);
One rule of thumb: if you can express the concept of the line in plain language in a very short sentence, such as "If it's true, set it to this, otherwise set it to that", then one line is fine.
For a code construct whose whole objective is to decide what value to assign to a single variable, it is almost always clearer, with appropriate formatting, to put multiple conditionals into a single statement. With multiple nested if/else blocks, the overall objective (setting the variable) means that
" variableName = "
must be repeated in every nested clause, and the eye must read all of them to see this. With a single statement, the intent is much clearer, and with appropriate formatting the complexity is more easily managed as well:
decimal cost =
usePriority? PriorityRate * weight:
useAirFreight? AirRate * weight:
crossMultRegions? MultRegionRate:
SingleRegionRate;
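A hypothetical Java rendering of the same chained conditional, to show that the formatting carries over: each line is one "condition ? value" rule and the last line is the fallback. The rate constants are made-up values for illustration only.

```java
// Sketch of a chained conditional that assigns a single value.
class FreightCost {
    // Made-up rates, for illustration only.
    static final double PRIORITY_RATE = 5.0;
    static final double AIR_RATE = 3.0;
    static final double MULT_REGION_RATE = 40.0;
    static final double SINGLE_REGION_RATE = 20.0;

    static double cost(boolean usePriority, boolean useAirFreight,
                       boolean crossMultRegions, double weight) {
        return usePriority      ? PRIORITY_RATE * weight
             : useAirFreight    ? AIR_RATE * weight
             : crossMultRegions ? MULT_REGION_RATE
             :                    SINGLE_REGION_RATE;
    }
}
```

The first true condition wins, exactly as in an if/else-if chain, but the assignment target appears only once.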
The pro is an easily understood one-liner that works.
The con is the concatenation of obfuscated gibberish on one line.
Generally, I'd call it a bad idea (although I do it myself on occasion) -- it strikes me as something that's done more to impress on how clever someone is than it is to make good code. "Clever tricks" of that sort are generally very bad.
That said, I personally aim to have one "idea" per line of code; if this burst of logic is easily encapsulated in a single thought, then go ahead. If you have to stop and puzzle it out a bit, best to break it up.

Resources