Go switch vs if-else efficiency - go

In Go, switch is much more flexible than in C (and C++), since cases can be arbitrary boolean expressions; a tagless switch { ... } block can seemingly replace large else-if ladders entirely.
switch {
case x < 5 && y > 2:
    // ...
case y == 1 || x > 2:
    // ...
default:
}
Is there any efficiency advantage to using a switch over an else-if ladder in Go? It seems that any efficiency boost would be lost given the switch's flexibility. Is it just up to the compiler to figure it out and see whether it can build a jump table?
Is there any performance advantage to using switch over if and else?

Unless all your cases are integral constants, you lose the possibility of transforming the switch into a jump table.
So, at best, Go's switch might be equivalent to C++'s switch if you only use integral constants, but otherwise it will be no more efficient than if/else.

It's completely up to the compiler to figure it out and choose a good implementation strategy for your code. You can always find out what code the compiler is generating by requesting an assembly listing of the compiler output. See the -S option to the Go compiler.
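For concreteness, here is a minimal sketch (function names and cases invented here) contrasting the two forms; drop it into any package and compare what the compiler actually emits with go build -gcflags=-S (or go tool compile -S file.go).
// classify switches only on small integral constants, so the compiler
// is free to lower it to a jump table or a binary search.
func classify(n int) string {
    switch n {
    case 0:
        return "zero"
    case 1, 2, 3:
        return "small"
    case 4, 5, 6, 7:
        return "medium"
    default:
        return "large"
    }
}

// describe uses a tagless switch with boolean cases; semantically this
// is just an if/else-if chain, and it is compiled as one.
func describe(x, y int) string {
    switch {
    case x < 5 && y > 2:
        return "a"
    case y == 1 || x > 2:
        return "b"
    default:
        return "c"
    }
}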

It's almost surely irrelevant to your application's performance. There are probably other, more complex places where you can improve performance; saving a single SQL query is probably worth something like a million if/else/switch statements.
Don't worry much about details like that and focus on the higher-level stuff.

Related

OpenCL, substituting branches with arithmetic operations

The following question is more about design than about actual coding. I don't know if there's a technical term for this kind of problem, so I'll proceed with an example.
I have some OpenCL code that is not optimized at all, and in the kernel there's essentially a switch statement similar to the following:
switch (value) {
    case const_a: do_something_a(...); break;
    case const_b: do_something_b(...); break;
    // ... etc
}
I cannot write the actual statement since it is quite long. As a simple example, consider the following switch statement:
int a;
switch (input) {
    case 13: { a = 3; break; }
    case 1:  { a = 7; break; }
    case 23: { a = 1; break; }
    default: { ... }
}
The question is... would it be better to replace such a switch with an expression like
a = (input == 13)*3 + (input == 1)*7 + (input == 23)
?
If it's not, is it possible to make it more efficient anyway?
You can assume input only takes values in the set of cases of the switch statement.
You've discovered an interesting question that GPU compilers wrestle with. The general advice is: try not to branch. Tricks that make this possible are splitting kernels up (as suggested above) and using the preprocessor (program-time definitions). Research in GPU algorithm development basically works from this axiom.
Branching all over the place won't get great efficiency because of the inherent divergence (a channel here is a work item within the SIMD thread/warp). Remember that all these channels must execute together, so in a switch where they all take different paths, everyone else goes along for the ride, silently waiting for their "case" to execute. Now, if input is always the same value, it can still be a win.
Another popular option is a table indirection.
kernel void foo(global const int *tbl, ...)
{
    ...
    a = tbl[input];
}
This approach has a few problems too, depending on hardware, inputs, and problem size.
Without more specific context, I can conjure up a case where any of these can run well or poorly.
Switching (or big if-then-else chains).
PROS: If all work items generally take the same path (input is mostly the same value), it's going to be efficient. You could also write an if-then-else chain putting the most common cases first. (On GPUs a switch is not necessarily as easy as an indirect jump since there are multiple work items and they may take different paths.)
CONS: Might generate lots of program code and could blow out the instruction cache. Branching all over the place can get a little costly depending on how many cases need to be evaluated. It might just be better to grind through the compute with the predicated code.
Predicated Code (Your (input == 13)*3 ... code).
PROS: This will probably generate smaller programs and stress the I$ less. (Look up the OpenCL select function to see a more general approach for your case.)
CONS: We've basically punted and decided to evaluate every "case in the switch". If input is usually the same value, we're wasting time here.
Lookup-table based approaches (my example).
PROS: If the switch you are evaluating has a massive number of cases (branches), but can be indexed by an integer, you might be ahead to just use a lookup table. On some hardware this means a read from global memory (far, far away). Other architectures have a dedicated constant cache, but I understand that a vector lookup will serialize (K cycles for each channel), so it might be only marginally better than the global-memory table. However, the table-lookup code generated will be short (I$-friendly), and as the number of branches (case statements) grows, this will win in the limit. This approach also deals well with uniform or scattered distributions of input's value.
CONS: The read from global memory (or serialized access from the constant cache) has a big latency even compared to branching. In some cases, to eliminate the extra memory traffic, I've seen compilers convert lookup tables into if-then-else/switch chains. It's rare that we have 100-element case statements.
I am now inspired to go study this cutoff. :-)
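Just to make the three shapes concrete, here is a rough CPU-side Go sketch (not OpenCL; only the mapping {13 -> 3, 1 -> 7, 23 -> 1} comes from the question, and the function names and helper are made up). On a GPU the trade-offs are as described above; this only shows the structural difference between branching, predicated arithmetic, and table indirection.
// 1. Branching: a switch / if-else chain.
func bySwitch(input int) int {
    switch input {
    case 13:
        return 3
    case 1:
        return 7
    case 23:
        return 1
    }
    return 0
}

// 2. Predicated arithmetic: every "case" is evaluated unconditionally.
// Go will not multiply a bool, so an explicit conversion helper is needed.
func b2i(b bool) int {
    if b {
        return 1
    }
    return 0
}

func byArithmetic(input int) int {
    return b2i(input == 13)*3 + b2i(input == 1)*7 + b2i(input == 23)*1
}

// 3. Table indirection: one indexed load, no branching on the value.
var table = [24]int{1: 7, 13: 3, 23: 1}

func byTable(input int) int {
    return table[input]
}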

How to recognize variables that don't affect the output of a program?

Sometimes the value of a variable accessed within the control flow of a program cannot possibly have any effect on its output. For example:
global var_1
global var_2
start program hello(var_3, var_4)
    if (var_2 < 0) then
        save-log-to-disk (var_1, var_3, var_4)
    end-if
    return ("Hello " + var_3 + ", my name is " + var_1)
end program
Here only var_1 and var_3 have any influence on the output, while var_2 and var_4 are only used for side effects.
Do variables such as var_1 and var_3 have a name in dataflow-theory/compiler-theory?
Which static dataflow analysis techniques can be used to discover them?
References to academic literature on the subject would be particularly appreciated.
The problem that you stated is undecidable in general, even for the following very narrow special case:
Given a single routine P(x), where x is a parameter of type integer: is the output of P(x) independent of the value of x, i.e., does P(0) = P(1) = P(2) = ... hold?
We can reduce the following (still undecidable) version of the halting problem to the question above: given a Turing machine M(), does M() never stop on the empty input?
I assume that we use a (Turing-complete) language in which we can build a "Turing machine simulator":
Given the program M(), construct this routine:
P(x):
    if x == 0:
        return 0
    run M() for x steps
    if M() has terminated then:
        return 1
    else:
        return 0
Now:
    P(0) = P(1) = P(2) = ...  =>  M() does not terminate,
and conversely:
    M() does terminate  =>  P(x) = 1 for a sufficiently large x  =>  P(x) != P(0) = 0.
So it is very difficult for a compiler to decide whether a variable actually does not influence the return value of a routine; in your example, the "side effect routine" might manipulate one of its values (or even loop infinitely, which would most definitely change the return value of the routine ;-)).
Of course overapproximations are still possible. For example, one might conclude that a variable does not influence the return value if it does not appear in the routine body at all. You can also see some classical compiler analyses (like Expression Simplification, Constant propagation) having the side effect of eliminating appearances of such redundant variables.
Pachelbel has discussed the fact that you cannot do this perfectly. OK, I'm an engineer, I'm willing to accept some dirt in my answer.
The classic way to answer your question is to do dataflow tracing from program outputs back to program inputs. A dataflow is the connection from a program assignment (or side effect) that gives a variable its value, to a place in the application that consumes that value.
If there is a (transitive) dataflow from an input you supplied (var_2) to a program output that you care about (in your example, the printed text stream), then that input "affects" the output. An input that does not flow to your desired output is useless from your point of view.
If you focus your attention on only the computations involved in the dataflows, and display them, you get what is generally called a "program slice". There are (very few) commercial tools that can show this to you.
Grammatech has a good reputation here for C and C++.
There are standard compiler algorithms for constructing such dataflow graphs; see any competent compiler book.
They all suffer from some limitation due to Turing's impossibility proofs as pointed out by Pachelbel. When you implement such a dataflow algorithm, there will be places that it cannot know the right answer; simply pick one.
If your algorithm chooses to answer "there is no dataflow" in certain places where it is not sure, then it may miss a valid dataflow and it might report that a variable does not affect the answer incorrectly. (This is called a "false negative"). This occasional error may be satisfactory if
the algorithm has some other nice properties, e.g, it runs really fast on a millions of code. (The trivial algorithm simply says "no dataflow" in all places, and it is really fast :)
If your algorithm chooses to answer "yes there is a dataflow", then it may claim that some variable affects the answer when it does not. (This is called a "false positive").
You get to decide which is more important; many people prefer false positives when looking for a problem, because then you have to at least look at possibilities detected by the tool. A false negative means it didn't report something you might care about. YMMV.
Here's a starting reference: http://en.wikipedia.org/wiki/Data-flow_analysis
Any of the books on that page will be pretty good. I have Muchnick's book and like it a lot. See also this page: http://en.wikipedia.org/wiki/Program_slicing
You will discover that implementing this is a pretty big effort for any real language. You are probably better off finding a tool framework that does most or all of this for you already.
I use the following algorithm: a variable is used if it is a parameter or it occurs anywhere in an expression, excluding as the LHS of an assignment. First, count the number of uses of all variables. Delete unused variables and assignments to unused variables. Repeat until no variables are deleted.
This algorithm only implements a subset of the OP's requirement, and it is horribly inefficient because it requires multiple passes. A garbage-collection-style approach may be faster but is harder to write: my algorithm only requires a list of variables with usage counts. Each pass is linear in the size of the program. The algorithm effectively does a limited kind of dataflow analysis by eliminating the tail of a flow that ends in an assignment.
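A minimal Go sketch of this kind of iterative elimination, assuming a toy program representation (the Assign type, the outputs set, and all names here are invented for illustration; a real implementation would work on the language's actual AST):
// A toy "program" is a list of assignments: Target := f(Uses...).
type Assign struct {
    Target string
    Uses   []string // variables read on the right-hand side
}

// eliminate repeatedly deletes assignments whose targets are never used.
// Variables in outputs (e.g. the returned value) always count as used.
func eliminate(prog []Assign, outputs map[string]bool) []Assign {
    for {
        // Count every use of every variable in the remaining program.
        used := map[string]int{}
        for _, a := range prog {
            for _, v := range a.Uses {
                used[v]++
            }
        }
        // Keep only assignments whose target is used somewhere (or is an output).
        kept := prog[:0]
        removed := false
        for _, a := range prog {
            if outputs[a.Target] || used[a.Target] > 0 {
                kept = append(kept, a)
            } else {
                removed = true // dead assignment: drop it
            }
        }
        prog = kept
        if !removed {
            return prog // nothing was deleted this pass, so we're done
        }
    }
}

// Example: with prog = {t := f(x); u := g(t); out := h(x, y)} and
// outputs = {out}, the first pass removes u and the second removes t,
// illustrating why multiple passes are needed.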
For my language, the elimination of side effects in the RHS of an assignment to an unused variable is mandated by the language specification; it may not be suitable for other languages. Effectiveness is improved by running the pass before inlining, to reduce the cost of inlining unused function applications, and then running it again afterwards, which eliminates the parameters of inlined functions.
Just as an example of the utility of the language specification, the library constructs a thread pool and assigns a pointer to it to a global variable. If the thread pool is not used, the assignment is deleted, and hence the construction of the thread pool elided.
IMHO compiler optimisations are almost invariably heuristics whose speed matters more than their effectiveness at achieving a theoretical goal (like removing unused variables). Simple reductions are useful not only because they're fast and easy to write, but because a programmer who understands the basics of how the compiler operates can leverage this knowledge to help the compiler. The best-known example of this is probably the refactoring of recursive functions to place the recursion in tail position: a pointless exercise unless the programmer knows the compiler can do tail-recursion optimisation.

Is it worth it to rewrite an if statement to avoid branching?

Recently I realized I have been doing too much branching without caring about the negative impact it had on performance, so I have made up my mind to learn all about avoiding branches. Here is a more extreme case, an attempt to make the code have as few branches as possible.
Hence for the code
if (expression)
    A = C; // A and C have to be the same type here, obviously
expression can be A == B, or Q <= B; it could be anything that resolves to true or false, and I would like to think of the result as being 1 or 0 here.
I have come up with this non-branching version:
A += (expression)*(C-A); //Edited with thanks
So my question would be: is this a good solution that maximizes efficiency?
If yes, why, and if not, why not?
Depends on the compiler, instruction set, optimizer, etc. When you use a boolean expression as an int value, e.g., (A == B) * C, the compiler has to do the compare and then set some register to 0 or 1 based on the result. Some instruction sets might not have any way to do that other than branching. Generally speaking, it's better to write simple, straightforward code and let the optimizer figure it out, or to find a different algorithm that branches less.
Jeez, no, don't do that!
Anyone who "penalize[s] [you] a lot for branching" would hopefully send you packing for using something that awful.
How is it awful, let me count the ways:
There's no guarantee you can multiply a quantity (e.g., C) by a boolean value (e.g., (A==B) yields true or false). Some languages will, some won't.
Anyone casually reading it is going to observe a calculation, not an assignment statement.
You're replacing a comparison and a conditional branch with two comparisons, two multiplications, a subtraction, and an addition. Seriously non-optimal.
It only works for integral numeric quantities. Try this with a wide variety of floating point numbers, or with an object, and if you're really lucky it will be rejected by the compiler/interpreter/whatever.
You should only ever consider doing this if you have analyzed the runtime properties of the program and determined that there is a frequent branch misprediction here, and that it is causing an actual performance problem. The rewrite makes the code much less clear, and it's not obvious that it would be any faster in general (this is something you would also have to measure, under the circumstances you are interested in).
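If you do want to measure, here is a minimal benchmark sketch in Go (the language of the opening question); the data, the threshold, and the b2i helper are invented for illustration, and note that Go will not multiply a bool, so an explicit conversion is needed, which may itself compile to a branch or a conditional move.
// Save as bench_test.go and run: go test -bench=.
package bench

import "testing"

var data = make([]int, 1<<16)

func init() {
    for i := range data {
        data[i] = (i * 31) % 97 // arbitrary spread of values
    }
}

// b2i converts a bool to 0 or 1; Go has no implicit bool-to-int conversion.
func b2i(b bool) int {
    if b {
        return 1
    }
    return 0
}

func BenchmarkBranchy(b *testing.B) {
    a, c := 0, 3
    for n := 0; n < b.N; n++ {
        for _, v := range data {
            if v > 48 {
                a += c
            }
        }
    }
    sink = a
}

func BenchmarkBranchless(b *testing.B) {
    a, c := 0, 3
    for n := 0; n < b.N; n++ {
        for _, v := range data {
            a += b2i(v > 48) * c
        }
    }
    sink = a
}

var sink int // keeps the result observable so the loops aren't optimized away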
After doing some research, I came to the conclusion that when there is a bottleneck, it is worth using a timed profiler, as this kind of code is usually not portable and is mainly used for optimization.
A concrete example I tried after reading the question below:
Why is it faster to process a sorted array than an unsorted array?
I tested my code in C++ using that setup, and my implementation was actually slower due to the extra arithmetic.
HOWEVER!
For this case below
if (expression) // branched version
    A += C;
// OR
A += (expression) * (C); // non-branching version
The timings were as follows:
Branched, sorted list: approximately 2 seconds.
Branched, unsorted list: approximately 10 seconds.
My implementation (whether sorted or unsorted): approximately 3 seconds.
This goes to show that in an unsorted bottleneck area, when we have a trivial branch that can simply be replaced by a single multiplication, it is probably worthwhile to consider the implementation I have suggested.
** Once again, this is mainly for the areas that are deemed to be the bottleneck. **

Expression performance of overloaded operators?

(i++) and (i = i + 1)
(i += n) and (i = i + n)
Which is better for performance?
It doesn't matter.
The compiler will convert statements like that into (what it thinks is, and often is) their most efficient form.
I'd recommend you write statements like this in the same way as the rest of your code base in order to keep consistency.
If you are just doing your own thing on a personal project you can either do what you prefer or what is common for your particular language.
It does not matter; the performance is the same. In the 1970s, when C was invented, these would map to different PDP-11 instructions, making ++ and += faster. These days, however, the operations are optimized into the exact same sequences of instructions.

How to explain to a developer that adding extra if - else if conditions is not a good way to "improve" readability?

Recently I've bumped into the following C++ code:
if (a)
{
    f();
}
else if (b)
{
    f();
}
else if (c)
{
    f();
}
Where a, b and c are all different conditions, and they are not very short.
I tried to change the code to:
if (a || b || c)
{
    f();
}
But the author opposed saying that my change will decrease readability of the code. I had two arguments:
1) You should not increase readability by replacing one branching statement with three (though I really doubt that it's possible to make code more readable by using else if instead of ||).
2) It's not the fastest code, and no compiler will optimize this.
But my arguments did not convince him.
What would you tell a programmer writing such a code?
Do you think a complex condition is an excuse for using else if instead of OR?
This code is redundant. It is prone to error.
If you were to replace f(); with something else someday, there is a danger that you would miss one of them.
There may, though, be a motivation behind it: these three condition bodies could one day become different, and the author is sort of preparing for that situation. If there is a strong possibility it will happen, it may be okay to do something of the sort. But I'd advise following the YAGNI principle (You Ain't Gonna Need It). I can't say how much bloated code has been written not out of real need but just in anticipation of it becoming needed tomorrow. Practice shows this does not bring any value during the entire lifetime of an application but heavily increases maintenance overhead.
As to how to approach explaining it to your colleague, it has been discussed numerous times. Look here:
How do you tell someone they’re writing bad code?
How to justify to your colleagues that they produce crappy code?
How do you handle poor quality code from team members?
“Mentor” a senior programmer or colleague without insulting
Replace the three complex conditions with one function, making it obvious why f() should be executed.
bool ShouldExecute() { return a || b || c; }
...
if (ShouldExecute()) { f(); }
Since the conditions are long, have him do this:
if (   (aaaaaaaaaaaaaaaaaaaaaaaaaaaa)
    || (bbbbbbbbbbbbbbbbbbbbbbbbbbbb)
    || (cccccccccccccccccccccccccccc) )
{
    f();
}
A good compiler might turn all of these into the same code anyway, but the above is a common construct for this type of thing. 3 calls to the same function is ugly.
In general I think you are right in that if (a || b || c) { f(); } is easier to read. He could also make good use of whitespace to help separate the three blocks.
That said, I would be interested to see what a, b, c, and f look like. If f is just a single function call and each block is large, I can sort of see his point, although I cringe at violating the DRY principle by calling f three different times.
Performance is not an issue here.
Many people wrap themselves in the flag of "readability" when it's really just a matter of individual taste.
Sometimes I do it one way, sometimes the other. What I'm thinking about is -
"Which way will make it easier to edit the kinds of changes that might have to be made in the future?"
Don't sweat the small stuff.
I think that both of your arguments (as well as Developer Art's point about maintainability) are valid, but apparently your discussion partner is not open for a discussion.
I get the feeling that you are having this discussion with someone who is ranked as more senior. If that's the case, you have a war to fight and this is just one small battle, which is not important for you to win. Instead of spending time arguing about this, try to make your results (which will be far better than your discussion partner's if he's writing that kind of code) speak for themselves. Just make sure that you get credit for your work, not the whole team or someone else.
This is probably not the kind of answer you expected to the question, but I got a feeling that there's something more to it than just this small argument...
I very much doubt there will be any performance gain from this, except in one very specific scenario: a, b, and c change over time, so which of the three conditions triggers the code changes, but the code executes either way. In that scenario, reducing the code to one if-statement might help, since the CPU might still have the code in the branch cache when it gets to it the next time. If you triple the code, so that it occupies three times the space in the branch cache, there is a higher chance that one or more of the paths will be pushed out, and thus you won't get the most performant execution.
This is very low-level, so again, I doubt this will make much of an impact.
As for readability, which one is easier to read:
if something, do this
if something else, do this
if yet another something else, do this
"this" is the same in all three cases
or this:
if something, or something else, or yet another something else, then do this
Place some more code in there, other than just a simple function call, and it starts getting hard to identify that this is actually three identical pieces of code.
Maintainability goes down with the 3 if-statement organization because of this.
You also have duplicated code, almost always a sign of bad structure and design.
Basically, I would tell the programmer that if he has problems reading the 1 if-statement way of writing it, maybe C++ is not what he should be doing.
Now, let's assume that the "a", "b", and "c" parts are really big, so that the ORs in there get lost in lots of noise from parentheses and whatnot.
I would still reorganize the code so that it only called the function (or executed the code in there) in one place, so perhaps this is a compromise?
bool doExecute = false;
if (a) doExecute = true;
if (b) doExecute = true;
if (c) doExecute = true;
if (doExecute)
{
    f();
}
or, even better, this way, which takes advantage of boolean short-circuiting to avoid evaluating things unnecessarily:
bool doExecute = a;
doExecute = doExecute || b;
doExecute = doExecute || c;
if (doExecute)
{
    f();
}
Performance shouldn't really even come into question.
Maybe later he won't call f() in all 3 conditions.
Repeating code doesn't make things clearer; your (a || b || c) is much clearer, although maybe one could refactor it even more (since it's C++), e.g.
x.f()
