I have recently joined a team that has a very peculiar coding guideline: whenever they have an if block not followed by an "else", they put a semicolon after the closing brace. The rationale is that it signals to the reader that if there is an "else" below, it belongs to the outer-level "if" (in case the indentation is wrong). A small example:
if (condition1)
{
    // do something
    if (condition2)
    {
        // do something else
    };
}
else
{
    // do something as a negative response to the first if
}
I have not seen this before, so my question is: other than being an eyesore, is there any performance penalty for these empty statements at the end of a block, or are they simply ignored by the compiler? This is not an isolated case; I am seeing these empty statements all over the file I am supposed to modify, which is one of many files coded this way...
No, there is no runtime performance penalty for empty statements in C/C++ assuming optimizations are enabled or a mainstream compiler is used.
Basic optimizations are enough to remove any runtime overhead from the generated program. Mainstream compilers like Clang, GCC and MSVC remove such useless statements very early, in the front-end part of the compilation pipeline. For example, Clang creates a null statement node while building the Abstract Syntax Tree (AST), but it does not emit any Intermediate Representation (IR) instructions for it. The IR is what is then optimized and lowered to assembly. Note that this does introduce some (small) overhead during compilation, since the compiler generates useless temporary data that has to be stored and processed.
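As a quick illustration (a minimal sketch of my own, not code from the question), the two functions below differ only in the stray semicolon after the inner block; compiling them with Clang or GCC yields identical assembly for both, even without optimizations:
int with_empty(int x)
{
    if (x > 0)
    {
        x += 1;
    };              // empty statement: parsed, then discarded
    return x;
}
int without_empty(int x)
{
    if (x > 0)
    {
        x += 1;
    }
    return x;
}
You can confirm this on Compiler Explorer or by inspecting the output of clang -S.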
Related
Consider the following two programs:
unit module Comp;
say 'Hello, world!'
and
unit module Comp;
CHECK { if $*DISTRO.is-win { say 'compiling on Windows' }}
say 'Hello, world!'
Naively, I would have expected both programs to compile to exactly the same bytecode: the CHECK block specifies code to run at the end of compilation; checking a variable and then doing nothing has no effect on the run-time behavior of the program, and thus (I would have thought) shouldn't need to be included in the compiled bytecode.
However, compiling these two programs does not result in the same bytecode. Specifically, compiling the version without the CHECK block creates 24K of bytecode versus 60K for the version with it. Why is the bytecode different for these two versions? Does this difference in bytecode have (or potentially have) a runtime cost? (It seems like it must, but I want to be sure).
And one more related question: how do DOC CHECK blocks fit in with the above? My understanding is that the compiler skips DOC CHECK blocks entirely when it's not run with the --doc flag. Consistent with that, the bytecode for a hello-world program does not increase in size when given a DOC CHECK block like the one above. However, it does increase in size if the block includes a use statement. From that, I conclude that use is somehow special-cased and gets executed even in DOC CHECK blocks. Is that correct? If so, are there other similarly special-cased forms I should know about?
A CHECK or BEGIN block (or other BEGIN-time constructs) may contain code that escapes. For example:
BEGIN SomeClass.^add_method('foo', anon method foo() { 42 })
This adds a method to a class, and that method exists beyond the bounds of the BEGIN block, so its bytecode is required in the compiled output. Currently, Rakudo conservatively includes the bytecode of everything in a BEGIN or CHECK block. It may be possible to avoid that for some simple cases in the future.
So far as the runtime cost goes, the implementation goes to some lengths to minimize the cost of bytecode that is never run (not so much for this case, but because the standard library is huge but many programs use only a fraction of it). For example:
Bytecode is mmap'd, so some unused parts of it may not actually be paged into memory
Bytecode is only validated on the first call to that frame
Frame meta-data (what lexicals does it have) is only deserialized on the first call to the frame
Unless something references it, the code object will not be deserialized
So far as use goes, its action is performed as soon as it is parsed. Being inside a DOC CHECK block does not suppress that - and in general cannot, because the use might bring in things that need to be known in order to finish parsing the contents of that block.
I have a chunk of Lua code that I'd like to be able to (selectively) ignore. I don't have the option of not reading it in, and sometimes I'd like it to be processed, sometimes not, so I can't just comment it out (that is, there's a whole bunch of blocks of code and I have the option of reading either none of them or all of them). I came up with two ways to implement this (there may well be more; I'm very much a beginner): either enclose the code in a function and then call or not call the function (and once I'm sure I'm past the point where I would call it, I can set it to nil to free up the memory), or enclose the code in an if ... end block. The former has a slight advantage in that, with several of these blocks, it makes it easier for one block to load another even if the main program didn't request it, but the latter seems more efficient. However, not knowing much, I don't know whether the efficiency saving is worth it.
So how much more efficient is:
if false then
    -- a few hundred lines
end
than
throwaway = function ()
    -- a few hundred lines
end
throwaway = nil -- to ensure that both methods leave me in the same state after garbage collection
?
If it depends a lot on the Lua implementation, how big would the "few hundred lines" need to be to reliably spot the difference, and what sort of stuff should they include to best test it (the main use of the blocks is to define a load of possibly useful functions)?
Lua isn't smart enough to discard the compiled code for the function, so you're not going to save any memory.
In terms of speed, you're talking about a difference of nanoseconds, which happens once per program execution. It's harming your efficiency to worry about this; it has virtually no relevance to actual performance. Write the code that you feel expresses your intent most clearly, without trying to be clever. If you run into performance issues, they will be a million miles away from this decision.
If you want to save memory, which is understandable on a mobile platform, you could put your conditional code in its own module and never load it at all if not needed (if your framework supports that; e.g. MOAI does, Corona doesn't).
If there is really a lot of unused code, you can define it as a collection of strings and loadstring() it when needed. Storing functions as strings reduces the initial compile time; however, for most functions the string representation probably takes up more memory than the compiled form, and what you save on compilation probably isn't significant until you reach a few thousand lines... Just saying.
If you put this code in a table, you could compile it transparently through a metatable for minimal performance impact on repeated calls.
Example code:
local code_uncompiled = {
    f = [=[
        local x, y = ...;
        return x+y;
    ]=]
}
code = setmetatable({}, {
    __index = function(self, k)
        self[k] = assert(loadstring(code_uncompiled[k]));
        return self[k];
    end
});
local ff = code.f; -- code of f gets compiled here
ff = code.f;       -- no compilation here
for i=1, 1000 do
    print( ff(2*i, -i) );     -- no compilation here either
    print( code.f(2*i, -i) ); -- no compilation either, but a table access (slower)
end
The beauty of it is that this compiles the code as needed, and you don't really have to waste another thought on it; it's just like storing a function in a table, and it allows for a lot of flexibility.
Another advantage of this solution is that when the amount of dynamically loaded code gets out of hand, you could transparently change it to load code from external files on demand through the __index function of the metatable. Also, you can mix compiled and uncompiled code by populating the "code" table with "real" functions.
Try the one that makes the code more legible to you first. If it runs fast enough on your target machine, use that.
If it doesn't run fast enough, try the other one.
Lua can ignore multiple lines with a block comment:
function dostuff()
    blabla()
    faaaaa()
    --[[
    ignore this
    and this
    maybe this
    this as well
    --]]
end
We have a strange issue with our debuggers when debugging on a phone in release mode. Whether we are using gdb or lldb with Xcode 4.3.3, execution lands on breakpoints even though the program counter is not really pointing at that spot.
Example fake code:
if (true) {
    // set breakpoint-A here
} else {
    // set breakpoint-B here
}
// set another breakpoint-C here.
It will land in breakpoint-B and then jump to breakpoint-A.
Is this because we are building in "release" mode and the compiler is optimizing?
Thanks!
Yes. There are three things going on here. When you build in release mode, the compiler performs optimized code generation: it may change the order in which source lines are compiled into the program (as long as it doesn't change the meaning of the code), instructions from different source lines may be intermixed or rearranged, and finally there can be problems with the line table that the compiler emits.
Imagine two source lines, each of which turns into 8 assembly-language instructions. The compiler may rearrange those 16 instructions (as long as it doesn't change their results) to keep the CPU(s) running as efficiently as possible. But in that situation, which instruction should the compiler say is equivalent to line 1, and which instruction should it say is equivalent to line 2?
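As a concrete sketch of that situation (a made-up example of mine, not from the original question), take two adjacent, independent source lines; an optimizing compiler is free to interleave their instructions, so neither line has one obvious address for a breakpoint:
int combine(int x, int y)
{
    int a = x * 3 + 1;    // "line 1"
    int b = y * 5 + 2;    // "line 2": independent of line 1, so its instructions
                          // may be scheduled before, after, or in between those
                          // of line 1
    return a + b;         // the line table still has to nominate a single
                          // instruction to stand for each source line
}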
With optimized code debugging, if you're debugging at a source code level, you have to live with the reality that the "currently executing source line" is going to bounce around a lot while you step through your program. Variables that seem to be in scope will appear and disappear at nonobvious times. The compiler's ways are tricky and hard to understand. You need to debug with the assembly language code in front of you (or a source + assembly display) to really follow what's happening.
There are improvements that compilers and debuggers can make to improve optimized code source-level debugging but it will probably always be a little hard to follow.
Xcode also tends to jump from any return statement in a method to the first return statement in that method. (Xcode 4.3.3 still does it. I am not sure about 4.5 yet.)
Just ignore that last highlighted 'return' statement.
Of late I'd been hearing that applications written in different languages can call each other's functions/subroutines. Until recently I felt that was only natural, since all languages (yes, all of them; that's what I thought then, silly me!) are compiled into machine code, which should be the same for every language. Only some time back did I realise that even languages compiled to 'higher' machine code (IL, bytecode, etc.) can interact with each other, or rather the applications can. I have tried to find the answer many times but failed; no answer satisfied me: either it assumed I knew a lot about compilers, or it said something I totally didn't agree with, and so on... Please explain in an easy-to-understand way how this works. In particular, how languages compiled into 'pure' machine code can have different 'calling conventions' is what is making me clutch my hair.
This is actually a very broad topic. Languages compiled to machine code can often call each other's routines, though usually not without some effort; e.g., C++ code can call C routines when they are properly declared:
// declare the C function foo so it can be called by C++ code
extern "C" {
void foo(int, char *);
}
This is about as simple as it gets, because C++ was explicitly designed for compatibility with C (it includes support for calling C++ routines from C as well).
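The reverse direction looks much the same (a minimal sketch with made-up names): give a C++ function C linkage so that a C translation unit can declare and call it.
// exposed.cpp -- compiled as C++
extern "C" int add_numbers(int a, int b)    // C linkage: no C++ name mangling
{
    return a + b;                           // ordinary C++ body
}
// caller.c -- compiled as C; shown as comments to keep this a single snippet:
//   int add_numbers(int a, int b);         /* plain C declaration */
//   int result = add_numbers(2, 3);        /* call from C */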
Calling conventions indeed complicate the picture in that C routines compiled by one compiler might not be callable from C compiled by another compiler, unless they share a common calling convention. For example, one compiler might compile
foo(i, j);
to (pseudo-assembly)
PUSH the value of i on the stack
PUSH the value of j on the stack
JUMP into foo
while another might push the values of i and j in reverse order, or place them in registers. If foo was compiled by a compiler following another convention, it might try to fetch its arguments off the stack in the wrong order, leading to unpredictable behavior (consider yourself lucky if it crashes immediately).
Some compilers support various calling conventions for this purpose. The Wikipedia article introduces calling conventions; for more details, consult your compiler's documentation.
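Some toolchains also let you pin the convention down explicitly in the source. For example (a hedged sketch using MSVC's 32-bit x86 keywords; GCC spells these __attribute__((cdecl)) and __attribute__((stdcall))):
// Both sides must agree on how arguments are passed and who cleans up the
// stack afterwards; these annotations make that choice explicit.
int __cdecl   add_cdecl(int a, int b)   { return a + b; }   // caller cleans the stack
int __stdcall add_stdcall(int a, int b) { return a + b; }   // callee cleans the stack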
Finally, mixing bytecode-compiled or interpreted languages with lower-level ones in the same address space is more complicated still. High-level language implementations commonly come with their own conventions for extending them with lower-level (C or C++) code; e.g., Java has JNI and JNA.
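To make the JNI mention concrete (a hedged sketch; the package and class name com.example.Native are invented for illustration), the native side of a Java method declared as public static native int add(int a, int b) looks roughly like this in C++:
#include <jni.h>
// The JVM locates this routine by its mangled name and invokes it using the
// JNI calling convention (JNIEXPORT / JNICALL).
extern "C" JNIEXPORT jint JNICALL
Java_com_example_Native_add(JNIEnv *env, jclass cls, jint a, jint b)
{
    return a + b;   // jint corresponds to Java's 32-bit int
}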
I commonly place values into variables that are only used once after assignment. I do this to make debugging more convenient later, as I'm able to hover over the value on the one line where it's used.
For example, this code doesn't let you hover the value of GetFoo():
return GetFoo();
But this code does:
var foo = GetFoo();
return foo; // your hover-foo is great
This smells very YAGNI-esque, as the assignment to foo won't ever be used unless someone needs to debug its value, which may never happen. Were it not for the merely foreseen debugging session, the first code snippet above would keep the code simpler.
How would you write the code to best compromise between simplicity and ease of debugger use?
I don't know about other debuggers, but the integrated Visual Studio debugger will report what was returned from a function in the "Autos" window; once you step over the return statement, the return value shows up as "[function name] returned" with a value of whatever value was returned.
gdb supports the same functionality as well; the "finish" command executes the rest of the current function and prints the return value.
This being a very useful feature, I'd be surprised if most other debuggers didn't support this capability.
As for the more general "problem" of "debugger-only variables," are they really debugger-only? I tend to think that the use of well-named temporary variables can significantly improve code readability as well.
Another possibility is to learn enough assembly programming that you can read the code your compiler generates. With that skill, you can figure out where the value is being held (in a register, in memory) and see the value without having to store it in a variable.
This skill is very useful if you ever need to debug an optimized executable. The optimizer can generate code that is significantly different from how you wrote it, such that symbolic debugging is no longer helpful.
Another reason you don't need intermediate variables in the Visual Studio debugger is that you can evaluate the function call in the Watch window or the Immediate window. For the Watch window, simply highlight the statement you want evaluated and drag it into the window.
I'd argue that it's not worth worrying about. Given that there's no runtime overhead in the typical case, go nuts. I think that breaking down complex statements into multiple simple statements usually increases readability.
I would leave out the assignment until it is needed. If you never happen to be in that bit of code, wanting a look at that variable, you haven't cluttered up your code unnecessarily. When you run across the need, put it in (it should be a trivial Extract Variable refactoring). And when you're done with that debugging session, get rid of it (Inline Variable). If you find yourself debugging so much - and so much at that particular point - that you're weary of refactoring back and forth, then think about ways to avoid the need; maybe more unit tests would help.