In Jeffrey Richter's "CLR via C#" (the .NET 2.0 edition, page 353) he says that, as a self-discipline, he never makes anonymous functions longer than 3 lines of code. He cites mostly readability / understandability as his reasons. This suits me fine, because I already had a self-discipline of using no more than 5 lines for an anonymous method.
But how does that "coding standard" advice stack up against lambdas? At face value, I'd treat them the same, keeping a lambda equally short. But how do others feel about this? In particular, when lambdas are being used where (arguably) they shine brightest, in LINQ statements, is there genuine cause to abandon that self-discipline / coding standard?
Bear in mind that things have changed a lot since 2.0. For example, consider .NET 4's Parallel Extensions, which use delegates heavily. You might have:
Parallel.For(0, 100, i =>
{
// Potentially significant amounts of code
});
To me it doesn't matter whether this is a lambda expression or an anonymous method - it's not really being used in the same way that delegates typically were in .NET 2.0.
Within normal LINQ, I don't typically find myself using large lambda expressions - certainly not in terms of the number of statements. Sometimes a particular single expression will be quite long in terms of lines because it's projecting a number of properties; the alternative is having huge lines!
In fact, LINQ tends to favour single-expression lambda expressions (which don't even have braces). I'd be fairly surprised to see a good use of LINQ which had a lambda expression with 5 statements in.
I don't know if having a guideline for short lambdas and delegates is really useful. However, do have a guideline for short functions. The methods I write are on average 6 or 7 lines long. Functions should hardly ever be 20 lines long. You should create the most readable code, and if you follow Robert Martin's or Steve McConnell's advice, they tell you to keep functions short and also to keep the inner part of loops as short as possible, preferably just a single method call.
So you shouldn't write a for loop as follows:
for (int i = 0; i < 100; i++)
{
// Potentially significant amounts of code
}
but simply with a single method call inside the loop:
for (int i = 0; i < 100; i++)
{
WellDescribedOperationOnElementI(i);
}
With this in mind, while I in general agree with Jon Skeet’s answer, I don't see any reason why you shouldn't want his example to be written as:
Parallel.For(0, 100, i =>
{
WellDescribedPartOfHeavyCalculation(i);
});
or
Parallel.For(0, 100, i => WellDescribedPartOfHeavyCalculation(i));
or even:
Parallel.For(0, 100, WellDescribedPartOfHeavyCalculation);
Always go for the most readable code, and many times this means short anonymous methods and short lambdas, but most of all short (but well-described) methods.
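The same idea carries over to other languages. As a minimal sketch in Python rather than C#, the block below passes a named function straight to a parallel map instead of wrapping it in a lambda. The function name and its body are hypothetical stand-ins, and ThreadPoolExecutor is used here only as a rough analogue of Parallel.For, not as an equivalent.

```python
from concurrent.futures import ThreadPoolExecutor


def well_described_part_of_heavy_calculation(i: int) -> int:
    """Hypothetical stand-in for the per-element work done inside the loop."""
    return i * i


def run_all(count: int) -> list:
    """Apply the named function to 0..count-1 on a thread pool, keeping input order."""
    with ThreadPoolExecutor() as pool:
        # Passing the function itself reads like the method-group form:
        # no lambda and no braces, just the name of the operation.
        return list(pool.map(well_described_part_of_heavy_calculation, range(count)))
```

The call site then states what happens to each element, and the details live behind a descriptive name.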
I have been playing with Julia because it seems syntactically similar to Python (which I like) but claims to be faster. However, I tried writing a script similar to one I have in Python that tests which values in a text file are numeric, using this function:
function isFloat(s)
    try:
        float64(s)
        return true
    catch:
        return false
    end
end
For some reason, this takes a great deal of time for a text file with a reasonable amount of rows of text (~500000).
Why would this be? Is there a better way to do this? What general feature of the language can I understand from this to apply to other languages?
Here are the two exact scripts I ran with the times for reference:
python: ~0.5 seconds
import time
import numpy as np

def is_number(s):
    try:
        np.float64(s)
        return True
    except ValueError:
        return False

start = time.time()
file_data = open('SMW100.asc').readlines()
file_data = map(lambda line: line.rstrip('\n').replace(',',' ').split(), file_data)
bools = [(all(map(is_number, x)), x) for x in file_data]
print time.time() - start
julia: ~73.5 seconds
start = time()
function isFloat(s)
    try:
        float64(s)
        return true
    catch:
        return false
    end
end
x = map(x-> split(replace(x, ",", " ")), open(readlines, "SMW100.asc"))
u = [(all(map(isFloat, i)), i) for i in x]
print(start - time())
Note that you can use the float64_isvalid function in the standard library to (a) check whether a string is a valid floating-point value and (b) return the value.
Note also that the colons (:) after try and catch in your isFloat code are wrong in Julia (this is a Pythonism).
A much faster version of your code should be:
const isFloat2_out = [1.0]
isFloat2(s::String) = float64_isvalid(s, isFloat2_out)
function foo(L)
x = split(L, ",")
(all(isFloat2, x), x)
end
u = map(foo, open(readlines, "SMW100.asc"))
On my machine, for a sample file with 100,000 rows and 10 columns of data, 50% of which are valid numbers, your Python code takes 4.21 seconds and my Julia code takes 2.45 seconds.
This is an interesting performance problem that might be worth submitting to julia-users to get more focused feedback than SO will probably provide. At first glance, I think you're hitting problems because (1) try/catch is just slightly slow to begin with and then (2) you're using try/catch in a context where there's a very considerable amount of type uncertainty because of lots of function calls that don't return stable types. As a result, Julia spends its time trying to figure out the types of objects rather than doing your computation. It's a bit hard to tell exactly where the big bottlenecks are because you're doing a lot of things that are not very idiomatic in Julia. Also, you seem to be doing your computations in the global scope, where Julia's compiler can't perform many meaningful optimizations due to additional type uncertainty.
Python is oddly ambiguous on the subject of whether using exceptions for control flow is good or bad. See the question "Python using exceptions for control flow considered bad?". But even in Python, the consensus is that user code shouldn't use exceptions for control flow (although for some reason generators are allowed to do this). So basically, the simple answer is that you should not be doing that: exceptions are for exceptional situations, not for control flow. That is why almost zero effort has been put into making Julia's try/catch construct faster; you shouldn't be using it like that in the first place. Of course, we will probably get around to making it faster at some point.
That said, the onus is on us as the designers of Julia's standard library to make sure that we provide APIs that never force you to use exceptions for control flow. In this case, you need a function that allows you to try to parse something as a floating-point value and indicate whether that was possible or not, not by throwing an exception, but rather by returning normal values. We don't provide such an API, so this is ultimately a shortcoming of Julia's standard library as it exists right now. I've opened an issue to discuss this API design question: https://github.com/JuliaLang/julia/issues/5704. We'll see how it pans out.
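To make the shape of such an API concrete, here is a minimal sketch in Python of a "try parse" function that reports success through its return value. The names try_parse_float and all_numeric are hypothetical and are not part of any standard library. Note that this sketch still catches ValueError internally, since Python's float() offers no non-raising variant; the point is only that callers see a normal return value.

```python
from typing import Iterable, Tuple


def try_parse_float(text: str) -> Tuple[bool, float]:
    """Return (True, value) if text parses as a float, else (False, 0.0).

    Hypothetical helper: the exception is confined to this one function,
    so calling code never uses try/except for control flow.
    """
    try:
        return True, float(text)
    except ValueError:
        return False, 0.0


def all_numeric(fields: Iterable[str]) -> bool:
    """True when every field parses as a float."""
    return all(try_parse_float(field)[0] for field in fields)
```

A caller would then write something like ok, value = try_parse_float(field) and branch on ok.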
I'm running into serious performance issues with anonymous functions in MATLAB 2011a, where the overhead introduced by an anonymous container function is far greater than the time taken by the enclosed function itself.
I've read a couple of related questions in which users have helpfully explained that this is a problem that others experience, showing that I could increase performance dramatically by doing away with the anonymous containers. Unfortunately, my code is structured in such a way that I'm not sure how to do that without breaking a lot of things.
So, are there workarounds to improve performance of anonymous functions without doing away with them entirely, or design patterns that would allow me to do away with them without bloating my code and spending a lot of time refactoring?
Some details that might help:
Below is the collection of anonymous functions, which are stored as a class property. Using an int array which is in turn used by a switch statement could replace the array in principle, but the content of GPs is subject to change -- there are other functions with the same argument structure as traingps that could be used there -- and GPs' contents may in some cases be determined at runtime.
m3.GPs = {@(X,ytrain,xStar,noisevar,params)traingp(X,ytrain,xStar,noisevar,1,params,[1 0]');
          @(X,ytrain,xStar,noisevar,params)traingp(X,ytrain,xStar,noisevar,1,params,[-1 1]');
          @(X,ytrain,xStar,noisevar,params)traingp(X,ytrain,xStar,noisevar,2,params,0);
          @(X,ytrain,xStar,noisevar,params)traingp(X,ytrain,xStar,noisevar,3,params,0);
          @(X,ytrain,xStar,noisevar,params)traingp(X,ytrain,xStar,noisevar,4,params,[0 0 0]')};
Later, elements of GPs are called by a member function of the class, like so:
GPt = GPs{t(j)}(xj,yj,gridX(xi),thetaT(1),thetaT(2:end));
According to the profiler, the self-time for the anonymous wrapper takes 95% of the total time (1.7 seconds for 44 calls!), versus 5% for the contained function. I'm using a similar approach elsewhere, where the anonymous wrapper's cost is even greater, proportionally speaking.
Does anyone have any thoughts on how to reduce the overhead of the anonymous calls, or, absent that, how to replace the anonymous function while retaining the flexibility they provide (and not introducing a bunch of additional bookkeeping and argument passing)?
Thanks!
It all comes down to how much pain you are willing to endure to improve performance. Here's one trick that avoids anonymous functions. I don't know how it will profile for you. I believe you can put these "tiny" functions at the end of class files (I know you can put them at the end of regular function files).
function [output] = GP1(X,ytrain,xStar,noisevar,params)
    output = traingp(X,ytrain,xStar,noisevar,1,params,[1 0]');
end
...
m3.GPs = {@GP1, @GP2, ...};
Perhaps a function "factory" would help:
>> factory = @(a,b,c) @(x,y,z) a*x+b*y+c*z;
>> f1 = factory(1,2,3);
>> f2 = factory(0,1,2);
>> f1(1,2,3)
ans =
14
>> f1(4,5,6)
ans =
32
>> f2(1,2,3)
ans =
8
>> f2(4,5,6)
ans =
17
Here, factory is a function that returns a new function with different arguments. Another example could be:
factory = @(a,b,c) @(x,y,z) some_function(x,y,z,a,b,c)
which returns a function of x,y,z with a,b,c specified.
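For readers more at home outside MATLAB, the factory pattern above can be sketched in Python with a closure, or with functools.partial for the second form. This is only an illustration of the pattern: some_function here is a hypothetical stand-in with an assumed body, and nothing in it says anything about MATLAB's call overhead.

```python
from functools import partial


def factory(a, b, c):
    """Return a function of (x, y, z) with a, b, c fixed, like the nested handles above."""
    def weighted_sum(x, y, z):
        return a * x + b * y + c * z
    return weighted_sum


def some_function(x, y, z, a, b, c):
    """Hypothetical six-argument function standing in for the second example."""
    return a * x + b * y + c * z


f1 = factory(1, 2, 3)
f2 = factory(0, 1, 2)

# partial binds a, b, c by keyword and leaves x, y, z to be supplied at call time.
g1 = partial(some_function, a=1, b=2, c=3)
```

Calling f1(1, 2, 3) and f2(4, 5, 6) reproduces the 14 and 17 from the session above.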
I have had some ideas for a new programming language floating around in my head, so I thought I'd take a shot at implementing it. A friend suggested I try using Treetop (the Ruby gem) to create a parser. Treetop's documentation is sparse, and I've never done this sort of thing before.
My parser is acting like it has an infinite loop in it, but with no stack traces; it is proving difficult to track down. Can somebody point me in the direction of an entry-level parsing/AST guide? I really need something that lists rules, common usage, etc. for using tools like Treetop. My parser grammar is on GitHub, in case someone wishes to help me improve it.
class {
initialize = lambda (name) {
receiver.name = name
}
greet = lambda {
IO.puts("Hello, #{receiver.name}!")
}
}.new(:World).greet()
I asked treetop to compile your language into an .rb file. That gave me something to dig into:
$ tt -o /tmp/rip.rb /tmp/rip.treetop
Then I used this little stub to recreate the loop:
require 'treetop'
load '/tmp/rip.rb'
RipParser.new.parse('')
This hangs. Now, isn't that interesting! An empty string reproduces the behavior just as well as the dozen-or-so-line example in your question.
To find out where it's hanging, I used an Emacs keyboard macro to edit rip.rb, adding a debug statement to the entry of each method. For example:
def _nt_root
p [__LINE__, '_nt_root'] #DEBUG
start_index = index
Now we can see the scope of the loop:
[16, "root"]
[21, "_nt_root"]
[57, "_nt_statement"]
...
[3293, "_nt_eol"]
[3335, "_nt_semicolon"]
[3204, "_nt_comment"]
[57, "_nt_statement"]
[57, "_nt_statement"]
[57, "_nt_statement"]
...
Further debugging from there reveals that an integer is allowed to be an empty string:
rule integer
digit*
end
This indirectly allows a statement to be an empty string, and the top-level rule statement* to forever consume empty statements. Changing * to + fixes the loop, but reveals another problem:
/tmp/rip.rb:777:in `_nt_object': stack level too deep (SystemStackError)
from /tmp/rip.rb:757:in `_nt_compound_object'
from /tmp/rip.rb:1726:in `_nt_range'
from /tmp/rip.rb:1671:in `_nt_special_literals'
from /tmp/rip.rb:825:in `_nt_literal_object'
from /tmp/rip.rb:787:in `_nt_object'
from /tmp/rip.rb:757:in `_nt_compound_object'
from /tmp/rip.rb:1726:in `_nt_range'
from /tmp/rip.rb:1671:in `_nt_special_literals'
... 3283 levels...
Range is left-recursing, indirectly, via special_literals, literal_object, object, and compound_object. Treetop, when faced with left recursion, eats stack until it pukes. I don't have a quick fix for that problem, but at least you've got a stack trace to go from now.
Also, this is not your immediate problem, but the definition of digit is odd: It can be either one digit or multiple. This causes digit* or digit+ to allow the (presumably) illegal integer 1________2.
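The empty-match loop described above is not specific to Treetop. Here is a minimal sketch in Python of a hand-rolled "zero or more" repetition over a rule that can succeed without consuming input. All names are hypothetical, and a step cap is added purely so the runaway case can be observed instead of hanging.

```python
from typing import Callable, List, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[str, int]]]


def digits(minimum: int) -> Parser:
    """Match at least `minimum` ASCII digits; minimum=0 mirrors `digit*`, 1 mirrors `digit+`."""
    def parse(text: str, pos: int) -> Optional[Tuple[str, int]]:
        end = pos
        while end < len(text) and text[end] in "0123456789":
            end += 1
        if end - pos < minimum:
            return None
        return text[pos:end], end
    return parse


def zero_or_more(item: Parser, max_steps: int = 1000):
    """Repeat `item` until it fails; the third result says whether the step cap was hit."""
    def parse(text: str, pos: int) -> Tuple[List[str], int, bool]:
        values: List[str] = []
        for _ in range(max_steps):
            result = item(text, pos)
            if result is None:
                return values, pos, False
            value, pos = result
            values.append(value)
        return values, pos, True  # cap reached: without it this loop would never end
    return parse
```

With digits(0) as the item, even the empty string hits the cap, because every iteration succeeds without advancing; with digits(1) the repetition stops as soon as no digit is found.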
I really enjoyed Language Implementation Patterns by Parr; since Parr created the ANTLR parser generator, it's the tool he uses throughout the book, but it should be simple enough to learn from it all the same.
What I really liked about it was the way each example grew upon the previous one; he doesn't start out with a gigantic AST-capable parser, instead he slowly introduces problems that need more and more 'backend smarts' to do the job, so the book scales well along with the language that needs parsing.
What I wish it covered in a little more depth is the types of languages that one can write, along with advice on dos and don'ts when designing languages. I've seen some languages that are a huge pain to parse and I'd have liked to know more about the design decisions that could have been made differently.
I've had a couple of discussions with a co-worker about the use of single letter variable names in certain circumstances inside our codebase, on which we disagree.
He favours more verbose naming convention for these, and I do not.
There are three scenarios in my opinion where I use single letter variable names:
Loops - i: for(int i = 0; i < 10; i++) { ... }
Lambda expressions in C# - x/y/z: .Where(x => x == 5)
Exceptions - e: try { ... } catch(ExceptionType e) { /* usage of 'e' */ }
These are the only scenarios where I would use it, and I obviously use more verbose naming conventions elsewhere.
My colleague put forward the following arguments for exceptions and loops:
i - it doesn't mean anything.
e - it's the most common letter in the English language. If you wanted to search the solution for exceptions, you'd find lots of undesired instances of e.
I accept these arguments, but have retorts: if one does not know what i means in a for loop, then they probably shouldn't be a programmer. It's a very common name in loops, as e is for exceptions. I have also mentioned that, if one wanted, they could search for catch in the case of the exception.
I realise that this is subjective, but then, one could argue that coding standards are just that - opinions, albeit opinions by academics.
I would be happy either way, and will forward the results to him, but would rather that we (our company) continue to use a single coding standard, rather than have two developers with different opinions on what to use.
Thanks in advance.
If the lexical scope of a variable is more than 20 or 25 lines, then the variable should probably not have a single letter name. If a large number of variables in your code base have a lexical scope larger than 25 lines (or so), then your code base has a much bigger problem than can be dealt with by using a verbose naming convention.
i doesn't mean anything
Yes it does. It's the index in a for loop or counter.
e is the most common letter in the English language. If you wanted to search the solution for exceptions, you'd find lots of undesired instances of e
This just doesn't even make any sense. Why would you search for e if you wanted to find instances of Exception?
Seriously, I'd just laugh at anyone who came out with these arguments. Everyone knows what i and e represent in these scenarios. They are universally accepted conventions. It sounds to me like your colleague is just trying to be a smart-ass.
Edit - This question reminded me of this wtf.
Another exception to the rule that I apply is naming of exception variables that need to be thrown. For instance, the code should read:
Exception yourToys = new Exception(...);
throw yourToys;
or
Exception up_in_a_bucket = new Exception(...);
throw up_in_a_bucket;
I recently had a conversation with somebody about this.
I've come to the opinion that, for operations that are a functional abstraction, using a "meaningful" name can be overstated.
For instance, in JavaScript:
myArrayOfNames.forEach ( function ( name ) { } );
myArrayOfNames.map ( function ( name ) { } );
myArrayOfNames.filter ( function ( name ) { } );
I generally use "each", "obj" or just "d" for these sorts of things, because I see these as coarse-grained abstractions. "name" really tells me nothing other than it's a name from an array of names.
Who cares? Because I've seen developers iterate reviews arguing about what is "meaningful". More than once.
So over the years, I gravitated towards settling it by saying, the operation is a functional abstraction (iteration) applied to a specific list of some kind. Reflect that language, and usage, in your code:
myUsefullyNamedArray.filter ( function ( d ) {
return ( 'someval' in d );
} );
Is it cool?
IMO one-liners reduce readability and make debugging/understanding more difficult.
Maximize understandability of the code.
Sometimes that means putting (simple, easily understood) expressions on one line in order to get more code in a given amount of screen real-estate (i.e. the source code editor).
Other times that means taking small steps to make it obvious what the code means.
One-liners should be a side-effect, not a goal (nor something to be avoided).
If there is a simple way of expressing something in a single line of code, that's great. If it's just a case of stuffing in lots of expressions into a single line, that's not so good.
To explain what I mean - LINQ allows you to express quite complicated transformations in relative simplicity. That's great - but I wouldn't try to fit a huge LINQ expression onto a single line. For instance:
var query = from person in employees
            where person.Salary > 10000m
            orderby person.Name
            select new { person.Name, person.Department };
is more readable than:
var query = from person in employees where person.Salary > 10000m orderby person.Name select new { person.Name, person.Department };
It's also more readable than doing all the filtering, ordering and projection manually. It's a nice sweet-spot.
Trying to be "clever" is rarely a good idea - but if you can express something simply and concisely, that's good.
One-liners, when used properly, transmit your intent clearly and make the structure of your code easier to grasp.
A python example is list comprehensions:
new_lst = [i for i in lst if some_condition]
instead of:
new_lst = []
for i in lst:
if some_condition:
new_lst.append(i)
This is a commonly used idiom that makes your code much more readable and compact. So, the best of both worlds can be achieved in certain cases.
This is by definition subjective, and due to the vagueness of the question, you'll likely get answers all over the map. Are you referring to a single physical line or logical line? EG, are you talking about:
int x = BigHonkinClassName.GetInstance().MyObjectProperty.PropertyX.IntValue.This.That.TheOther;
or
int x = BigHonkinClassName.GetInstance().
MyObjectProperty.PropertyX.IntValue.
This.That.TheOther;
One-liners, to me, are a matter of "what feels right." In the case above, I'd probably break that into both physical and logical lines, getting the instance of BigHonkinClassName, then pulling the full path to .TheOther. But that's just me. Other people will disagree. (And there's room for that. Like I said, subjective.)
Regarding readability, bear in mind that, for many languages, even "one-liners" can be broken out into multiple lines. If you have a long set of conditions for the conditional ternary operator (? :), for example, it might behoove you to break it into multiple physical lines for readability:
int x = (/* some long condition */) ?
/* some long method/property name returning an int */ :
/* some long method/property name returning an int */ ;
At the end of the day, the answer is always: "It depends." Some frameworks (such as many DAL generators, EG SubSonic) almost require obscenely long one-liners to get any real work done. Other times, breaking that into multiple lines is quite preferable.
Given concrete examples, the community can provide better, more practical advice.
In general, I definitely don't think you should ever "squeeze" a bunch of code onto a single physical line. That doesn't just hurt legibility, it smacks of someone who has outright disdain for the maintenance programmer. As I used to teach my students: always code for the maintenance programmer, because it will often be you.
:)
Oneliners can be useful in some situations
int value = bool ? 1 : 0;
But for the most part they make the code harder to follow. I think you only should put things on one line when it is easy to follow, the intent is clear, and it won't affect debugging.
One-liners should be treated on a case-by-case basis. Sometimes it can really hurt readability and a more verbose (read: easy-to-follow) version should be used.
There are times, however when a one-liner seems more natural. Take the following:
int Total = (Something ? 1 : 2)
+ (SomethingElse ? (AnotherThing ? x : y) : z);
Or the equivalent (slightly less readable?):
int Total = Something ? 1 : 2;
Total += SomethingElse ? (AnotherThing ? x : y) : z;
IMHO, I would prefer either of the above to the following:
int Total;
if (Something)
    Total = 1;
else
    Total = 2;
if (SomethingElse)
    if (AnotherThing)
        Total += x;
    else
        Total += y;
else
    Total += z;
With the nested if-statements, I have a harder time figuring out the final result without tracing through it. The one-liner feels more like the math formula it was intended to be, and consequently easier to follow.
As far as the cool factor, there is a certain feeling of accomplishment / show-off factor in "Look Ma, I wrote a whole program in one line!". But I wouldn't use it in any context other than playing around; I certainly wouldn't want to have to go back and debug it!
Ultimately, with real (production) projects, whatever makes it easiest to understand is best. Because there will come a time that you or someone else will be looking at the code again. What they say is true: time is precious.
That's true in most cases, but in some cases where one-liners are common idioms, then it's acceptable. ? : might be an example. Closure might be another one.
No, it is annoying.
One liners can be more readable and they can be less readable. You'll have to judge from case to case.
And, of course, on the prompt one-liners rule.
VASTLY more important is developing and sticking to a consistent style.
You'll find bugs MUCH faster, be better able to share code with others, and even code faster if you merely develop and stick to a pattern.
One aspect of this is to make a decision on one-liners. Here's one example from my shop (I run a small coding department) - how we handle IFs:
Ifs shall never be all on one line if they overflow the visible line length, including any indentation.
Thou shalt never have else clauses on the same line as the if even if it comports with the line-length rule.
Develop your own style and STICK WITH IT (or, refactor all code in the same project if you change style).
The main drawback of "one liners" in my opinion is that it makes it hard to break on the code and debug. For example, pretend you have the following code:
a().b().c(d() + e())
If this isn't working, it's hard to inspect the intermediate values. However, it's trivial to break with gdb (or whatever other tool you may be using) in the following, and check each individual variable and see precisely what is failing:
A = a();
B = A.b();
D = d();
E = e(); // here I can query A, B, D and E
B.c(D + E);
One rule of thumb is whether you can express the concept of the one line in plain language in a very short sentence: "If it's true, set it to this, otherwise set it to that."
For a code construct where the ultimate objective of the entire structure is to decide what value to assign to a single variable, it is almost always clearer, with appropriate formatting, to put multiple conditionals into a single statement. With multiple nested ifs and elses, the overall objective, to set the variable...
" variableName = "
must be repeated in every nested clause, and the eye must read all of them to see this. With a single statement, it is much clearer, and with the appropriate formatting, the complexity is more easily managed as well...
decimal cost =
usePriority? PriorityRate * weight:
useAirFreight? AirRate * weight:
crossMultRegions? MultRegionRate:
SingleRegionRate;
The pro is an easily understood one-liner that works.
The con is the concatenation of obfuscated gibberish on one line.
Generally, I'd call it a bad idea (although I do it myself on occasion) -- it strikes me as something that's done more to impress on how clever someone is than it is to make good code. "Clever tricks" of that sort are generally very bad.
That said, I personally aim to have one "idea" per line of code; if this burst of logic is easily encapsulated in a single thought, then go ahead. If you have to stop and puzzle it out a bit, best to break it up.