Does defining variables within a loop matter? - performance

This is a piece of code I am writing.
var cList:XMLList = xml.defines.c;
var className:String;
var properties:XMLList;
var property:XML;
var i:int,l:int;
var c:XML;
for each(c in cList)
{
className = String(c.@name);
if(cDict[className])
{
throw new Error('class name has been defined: ' + className);
}
if(className)
{
cDict[className] = c;
}
properties = c.property;
i = 0;
l = properties.length();
if(l)
{
propertyDict[className] = new Dictionary();
for(;i<l;i++)
{
// ...
}
}
}
As you can see, I defined all variables outside of loops. I always worry that if I defined them inside the loop, it might slow down the process, though I don't have proof; it's just a feeling.
I also don't like that the AS3 grammar allows using a variable name before its definition. So I always define vars at the very beginning of my functions.
Now I am worried these habits might backfire on me someday. Or is it just a matter of personal taste?

No, it doesn't matter, because the compiler uses variable hoisting: it moves all variable declarations to the top of the function.
More explanation on variables:
http://help.adobe.com/en_US/ActionScript/3.0_ProgrammingAS3/WS5b3ccc516d4fbf351e63e3d118a9b90204-7f9d.html

AS3 IDEs allow you to use variable names before the declaration because they know that the compiler uses a mechanism called "hoisting" to move all variable declarations to the top of a function anyway. This happens without you noticing, so you can keep declarations next to the code that uses them and keep your code readable. Therefore, it makes no real difference if you manually move all the declarations to the top, unless you like your code structured that way.
For the same reason, declaring a variable inside a loop does not affect performance: the variable exists once per function call, not once per iteration. Only if you move the loop body into a separate function will each call of that function allocate its own variable.

A lot of AS3 programmers, myself included, would consider the way you have it now to be "the right" way, and putting variables inside any block to be "the wrong" way. Speed does not matter in this situation. I'll try to present both sides' arguments in as unbiased a way as I can.
Put variable definitions as close as possible to the code that uses them. The motivation is simple: when you see a variable used somewhere, you might want to know where it was declared. This is useful if you aren't sure of the variable's type or its modifiers. Normally, the declaration is also the place to put a comment.
Put variable declarations where they actually take effect, i.e. at the top of the function. Even a seasoned ActionScript programmer may eventually confuse him- or herself by declaring variables inside blocks, where a seemingly freshly declared variable turns out to still hold the value left over from a previous iteration. The common case looks like this:
for (var i:int; i < x; i++) {
    // j is hoisted to function scope and never reset to 0, so the inner
    // loop body only runs during the first iteration of the outer loop
    for (var j:int; j < y; j++) { /* ... */ }
}
There is also a long tradition going back to the C89 standard (aka ANSI C), which required declarations to appear at the start of a block and did not allow declaring a variable in a for loop's initializer. C99 and C++ allow declaring a variable where it is first needed, and the variable is scoped to the block in which it is declared. Many modern C-like languages, for example C#, treat variables like that. So, in the example above, C# would re-initialize j every time the inner loop was entered.
Programmers with a long background in other C-like languages are therefore led to believe, once they see a variable declared inside a block, that the variable is scoped to that block. They find "hoisting" counter-intuitive, and therefore error-prone.
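To make the scoping difference concrete, here is a minimal sketch in C99, chosen only because it has the block scoping described above; it is not the asker's code. Both function names are hypothetical. block_scoped_count declares j in the inner for initializer, so j is recreated and set to 0 each time the inner loop starts. hoisted_count imitates the AS3 behaviour by declaring j once, outside both loops, and never resetting it.

```c
/* Both functions count how many times the inner loop body runs. */

/* C99 block scope: j is created and set to 0 every time the inner
   loop starts, so the body runs x * y times. */
int block_scoped_count(int x, int y)
{
    int runs = 0;
    for (int i = 0; i < x; i++) {
        for (int j = 0; j < y; j++) {
            runs++;
        }
    }
    return runs;
}

/* Imitates AS3 hoisting: j lives for the whole function and is never
   reset, so after the first outer iteration the inner condition is
   already false and the body runs only y times in total. */
int hoisted_count(int x, int y)
{
    int runs = 0;
    int j = 0;
    for (int i = 0; i < x; i++) {
        for (; j < y; j++) {
            runs++;
        }
    }
    return runs;
}
```

With x = 3 and y = 4, block_scoped_count runs the inner body 12 times and hoisted_count only 4 times, which is exactly the AS3 pitfall. In AS3 the usual fix is to initialize explicitly, as in for (var j:int = 0; j < y; j++), because the assignment runs each time the for statement starts even though the declaration is hoisted.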

Related

Passing the same variable through intermediate methods which don't directly process it

More often than not I find myself passing a variable into a method where it is not used directly, but is passed on to another method for further processing. I don't know if there is a name for this practice. I can explain it better with sample code:
static void main() {
    int a = method1(var1, var2, var3);
}
int method1(int var1, int var2, int var3) {
    int var4 = some_function_of(var1, var2);
    return method2(var3, var4);
}
int method2(int var3, int var4) {
    return some_other_function_of(var3, var4);
}
This can be extended to cases where the same variable (var3) is passed through even longer chains of methods. I think this could be bad practice, since in my example method1 is an intermediary that does not touch var3. Is there a better practice for performance, design and/or readability?
At least for object-oriented languages, the answer would be:
You definitely want to avoid such code, as you should strive to reduce your parameter lists to the absolute minimum; the goal is zero.
If you find that your class offers various methods that all require the "same" parameter, then that parameter is a candidate to become a field of your class.
In non-OO languages, I think you have to pragmatically balance the number of functions against the length of parameter lists. In your example,
static void main() {
    int var4 = some_function_of(var1, var2);
    int a = method2(var3, var4);
}
avoiding method1 saves you from passing var3 to your first method, and you are still within the rules of the "single level of abstraction" principle.
This is not at all uncommon and not necessarily a bad practice. It can impact all three of the metrics you mentioned though:
Performance: Adding another parameter to a function call may result in a performance hit, but not always. It depends on the language, compiler/interpreter, and platform. For example, an optimizing C++ compiler will, where it can, avoid copying a variable even if it is passed by value (sometimes it will eliminate the function call completely by inlining it). But passing a value through multiple functions might interfere with the compiler's optimizations if it can't follow the path well. Still, I expect any performance hit from this to be minimal.
Design: Depending on your language's paradigm (object-oriented, functional, etc.), this might indicate that your design could be improved, perhaps by encapsulating the data in a structure or class so that only one parameter is passed (a class instance pointer) and each function accesses only the members it needs.
Readability: This shouldn't make the individual functions harder to read, since they shouldn't care where parameters come from and it is clear that the parameter is being passed to another function. It could make it harder to understand the whole program though because it can be hard to keep track of where values originate if they are passed through a long chain of calls before being touched.
In general, it is good to minimize the parameter list (for all of these reasons) and to keep data "closer" to code that needs it. If you do those things, this case shouldn't pop up much and when it does it will be less likely to be due to bad design.
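As a sketch of the "encapsulate the data in a structure" suggestion in a non-OO language, here is a C version of the question's example that groups var1, var2 and var3 in a struct. The struct inputs type and the arithmetic inside the two placeholder functions are assumptions made only so the code compiles; the point is that method1 no longer forwards var3 explicitly, and each function reads only the members it needs.

```c
/* Hypothetical stand-ins for the question's some_function_of() and
   some_other_function_of(); the arithmetic is arbitrary. */
static int some_function_of(int a, int b) { return a + b; }
static int some_other_function_of(int a, int b) { return a * b; }

/* The inputs travel together, so intermediate functions pass one
   pointer instead of forwarding individual values. */
struct inputs {
    int var1;
    int var2;
    int var3;
};

/* Reads only var3 from the struct. */
static int method2(const struct inputs *in, int var4)
{
    return some_other_function_of(in->var3, var4);
}

/* Reads only var1 and var2; var3 is reached through the same pointer. */
int method1(const struct inputs *in)
{
    int var4 = some_function_of(in->var1, in->var2);
    return method2(in, var4);
}
```

With struct inputs in = {1, 2, 3};, method1(&in) returns 9 under these placeholder definitions. Whether one struct parameter beats a couple of forwarded ints is the same judgment call described above.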

Mathematica - can I define a block of code using a single variable?

It has been a while since I've used Mathematica, and I looked all throughout the help menu. I think one problem I'm having is that I do not know what exactly to look up. I have a block of code, with things like appending lists and doing basic math, that I want to define as a single variable.
My goal is to loop through a sequence and, when needed, call a block of code that I will be using several times throughout the loop. I am guessing I could just put it all in the loop anyway, but I would like to be able to define it all as one function.
It seems like this should be an easy and straightforward procedure. Am I missing something simple?
This is the basic format for a function definition in Mathematica.
myFunc[par1_,par2_]:=Module[{localVar1,localVar2},
statement1; statement2; returnStatement ]
Your question is not entirely clear, but I take it you want something like this:
facRand[] :=
 ({b, x} = Last@FactorInteger[RandomInteger[1*^12]]; Print[b])
Now every time facRand[] is called a new random integer is factored, global variables b and x are assigned, and the value of b is printed. This could also be done with Function:
Clear[facRand]
facRand =
 ({b, x} = Last@FactorInteger[RandomInteger[1*^12]]; Print[b]) &
This is also called with facRand[]. This form is standard, and allows addressing or passing the symbol facRand without triggering evaluation.

Using Context as a scoping construct in Mathematica

Thinking about a solution to my previous question about switching between numerical and analytical "modes" in a large Mathematica project, I thought about the idea of using Context as a scoping construct.
The basic idea is to make all numerical value assignments in their own context, e.g.
Begin["Global`Numerical`"]
par1 = 1;
par2 = 2;
...
End[]
and have all the complicated analytical functions, matrices, etc. in the Global context.
Ideally I would be able to work in the Global context, switch to everything being numeric with a simple Begin["Global`Numerical`"], and switch back with End[].
Unfortunately this does not work, since e.g. f[par1_,par2_,...] := foo defined in the Global context will not use the par1, par2, etc. that have been defined in a subcontext of Global.
Is there a way to make subcontexts inherit definitions from their parent context? Is there some other way to use contexts to create a simple switchable scope?
Well, here's one way to get around what I think is your problem, by adjusting $ContextPath appropriately:
SetOptions[EvaluationNotebook[], CellContext -> "GlobalTestCtxt`"];
Remove[f, GlobalTestCtxt`Numerical`f, par1, par2];
f[par1_, par2_] := {par1, par2};
savedContextPath = $ContextPath;
Begin["GlobalTestCtxt`Numerical`"];
Print[{$ContextPath, $Context}];
$ContextPath = DeleteCases[$ContextPath, "GlobalTestCtxt`"];
par1 = 1;
par2 = 2;
End[];
$ContextPath = savedContextPath;
Now, this will evaluate analytically:
f[par1, par2]
And this numerically:
savedContextPath = $ContextPath;
Begin["GlobalTestCtxt`Numerical`"];
$ContextPath = Prepend[$ContextPath, $Context];
f[par1, par2]
End[];
$ContextPath = savedContextPath;
This approach is fragile: unless you are careful, it's easy to get a symbol into the wrong context. For instance, suppose you forgot to evaluate f in the global context before evaluating the "numerical" block. Then your numerical block will not work, simply because f will resolve to the (perfectly valid) symbol GlobalTestCtxt`Numerical`f, which you inadvertently entered into the symbol table when you first evaluated the numerical block. Because of potential bugs like this, I personally don't use this approach.
Edit: fixed a bug (it is necessary to hide the "Global" context when doing assignments in numerical context)

Using function arguments as local variables

Something like this (yes, this doesn't deal with some edge cases - that's not the point):
int CountDigits(int num) {
int count = 1;
while (num >= 10) {
count++;
num /= 10;
}
return count;
}
What's your opinion about this? That is, using function arguments as local variables.
Both are placed on the stack and perform pretty much identically, so I'm wondering about the best-practice aspects of this.
I feel like an idiot when I add an extra, quite redundant line such as int numCopy = num to that function; however, modifying the parameter does bug me.
What do you think? Should this be avoided?
As a general rule, I wouldn't use a function parameter as a local processing variable, i.e. I treat function parameters as read-only.
In my mind, intuitively understandable code is paramount for maintainability, and modifying a function parameter to use as a local processing variable tends to run counter to that goal. I have come to expect that a parameter will have the same value in the middle and bottom of a method as it does at the top. Plus, an aptly named local processing variable may improve understandability.
Still, as @Stewart says, this rule is more or less important depending on the length and complexity of the function. For short simple functions like the one you show, simply using the parameter itself may be easier to understand than introducing a new local variable (very subjective).
Nevertheless, if I were to write something as simple as countDigits(), I'd tend to use a remainingBalance local processing variable in lieu of modifying the num parameter as part of local processing - just seems clearer to me.
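A minimal C version of that suggestion keeps num read-only and does the work in a local copy. The local is named remaining here; the name is just an illustrative choice.

```c
/* Same algorithm as CountDigits, but the parameter is left untouched
   and a separately named local does the processing. */
int count_digits(int num)
{
    int remaining = num;   /* local processing variable */
    int count = 1;
    while (remaining >= 10) {
        count++;
        remaining /= 10;
    }
    return count;          /* num still holds the caller's value here */
}
```

Like the original, this does not handle negative numbers; it only changes which variable gets consumed.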
Sometimes, I will modify a local parameter at the beginning of a method to normalize the parameter:
void saveName(String name) {
name = (name != null ? name.trim() : "");
...
}
I rationalize that this is okay because:
a. it is easy to see at the top of the method,
b. the parameter maintains its original conceptual intent, and
c. the parameter is stable for the rest of the method
Then again, half the time, I'm just as apt to use a local variable anyway, just to get a couple of extra finals in there (okay, that's a bad reason, but I like final):
void saveName(final String name) {
final String normalizedName = (name != null ? name.trim() : "");
...
}
If, 99% of the time, the code leaves function parameters unmodified (i.e. mutating parameters are unintuitive or unexpected in this code base), then during that other 1% of the time, dropping a quick comment about a mutating parameter at the top of a long or complex function could be a big boon to understandability:
int CountDigits(int num) {
// num is consumed
int count = 1;
while (num >= 10) {
count++;
num /= 10;
}
return count;
}
P.S. :-)
parameters vs arguments
http://en.wikipedia.org/wiki/Parameter_(computer_science)#Parameters_and_arguments
These two terms are sometimes loosely used interchangeably; in particular, "argument" is sometimes used in place of "parameter". Nevertheless, there is a difference. Properly, parameters appear in procedure definitions; arguments appear in procedure calls.
So,
int foo(int bar)
bar is a parameter.
int x = 5;
int y = foo(x);
The value of x is the argument for the bar parameter.
It always feels a little funny to me when I do this, but that's not really a good reason to avoid it.
One reason you might potentially want to avoid it is for debugging purposes. Being able to tell the difference between "scratchpad" variables and the input to the function can be very useful when you're halfway through debugging.
I can't say it's something that comes up very often in my experience - and often you can find that it's worth introducing another variable just for the sake of having a different name, but if the code which is otherwise cleanest ends up changing the value of the variable, then so be it.
One situation where this can come up and be entirely reasonable is where you've got some value meaning "use the default" (typically a null reference in a language like Java or C#). In that case I think it's entirely reasonable to modify the value of the parameter to the "real" default value. This is particularly useful in C# 4 where you can have optional parameters, but the default value has to be a constant:
For example:
public static void WriteText(string file, string text, Encoding encoding = null)
{
// Null means "use the default" which we would document to be UTF-8
encoding = encoding ?? Encoding.UTF8;
// Rest of code here
}
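C has no default arguments, but the same "null means use the default" idea works by normalizing a NULL pointer parameter once, at the top of the function. The function join_words and its default separator ", " are hypothetical, chosen only to show the pattern.

```c
#include <stdio.h>

/* Writes "a<sep>b" into out (at most size bytes) and returns the length
   snprintf reports. A NULL sep means "use the default separator"; the
   parameter is normalized here and stays stable for the rest of the body. */
int join_words(char *out, size_t size, const char *a, const char *b,
               const char *sep)
{
    sep = (sep != NULL) ? sep : ", ";
    return snprintf(out, size, "%s%s%s", a, sep, b);
}
```

With a 32-byte buffer, join_words(buf, sizeof buf, "a", "b", NULL) writes "a, b", while passing "-" as the separator writes "a-b".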
About C and C++:
My opinion is that using the parameter as a local variable of the function is fine because it is a local variable already. Why then not use it as such?
I feel silly too when copying the parameter into a new local variable just to have a modifiable variable to work with.
But I think this is pretty much personal opinion. Do it as you like. If you feel silly copying the parameter just for this, that indicates you don't like doing it, and then you shouldn't do it.
If I don't need a copy of the original value, I don't declare a new variable.
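To illustrate the point that in C the parameter already is a local variable: an int argument is passed by value, so the original CountDigits can consume num without the caller noticing. The function name below is hypothetical.

```c
/* num is the function's own copy of the caller's value, so consuming
   it inside the loop cannot affect the caller's variable. */
int count_digits_consuming(int num)
{
    int count = 1;
    while (num >= 10) {
        count++;
        num /= 10;   /* changes only the local copy */
    }
    return count;
}
```

After int n = 12345; count_digits_consuming(n);, n is still 12345.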
In my opinion, mutating parameter values is not bad practice in general; it depends on how you're going to use them in your code.
My team's coding standard recommends against this because it can get out of hand. To my mind, for a function like the one you show, it doesn't hurt because everyone can see what is going on. The problem is that over time functions get longer, and they get bug fixes in them. As soon as a function is more than a screenful of code, this starts to get confusing, which is why our coding standard bans it.
The compiler ought to be able to get rid of the redundant variable quite easily, so it has no efficiency impact. It is probably just between you and your code reviewer whether this is OK or not.
I would generally not change the parameter value within the function. If at some point later in the function you need to refer to the original value, you still have it. In your simple case there is no problem, but if you add more code later, you may refer to 'num' without realizing it has been changed.
The code needs to be as self sufficient as possible. What I mean by that is you now have a dependency on what is being passed in as part of your algorithm. If another member of your team decides to change this to a pass by reference then you might have big problems.
The best practice is definitely to copy the inbound parameters if you expect them to be immutable.
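The pass-by-reference concern can be sketched in C with a pointer parameter. This is a hypothetical variant, not code from the question: the loop is unchanged, but because it now writes through the pointer, it consumes the caller's variable.

```c
/* Hypothetical variant after the parameter was switched to a pointer:
   the same loop now overwrites the caller's int. */
int count_digits_by_pointer(int *num)
{
    int count = 1;
    while (*num >= 10) {
        count++;
        *num /= 10;   /* modifies the caller's variable */
    }
    return count;
}
```

Starting from int n = 12345;, count_digits_by_pointer(&n) still returns 5 but leaves n equal to 1, which is the kind of surprise a local copy would have prevented.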
I typically don't modify function parameters, unless they're pointers, in which case I might alter the value that's pointed to.
I think the best practice here varies by language. For example, in Perl you can localize any package variable, or even a single element of a hash or array, so that a change made in that scope is undone when the scope exits:
sub my_function
{
    my ($arg1, $arg2) = @_;        # copy the arguments into lexical variables
    $arg1++;                       # $arg1 is a copy, so the caller does not see this change
    local $arg2->{key1};           # only the key1 element of the hashref in $arg2 is localized
    $arg2->{key1}->{key2} = 'foo'; # undone when the function returns, so not visible outside
}
Occasionally I have been bitten by forgetting to localize a data structure that was passed by reference to a function, that I changed inside the function. Conversely, I have also returned a data structure as a function result that was shared among multiple systems and the caller then proceeded to change the data by mistake, affecting these other systems in a difficult-to-trace problem usually called action at a distance. The best thing to do here would be to make a clone of the data before returning it*, or make it read-only**.
* In Perl, see the function dclone() in the built-in Storable module.
** In Perl, see lock_hash() or lock_hash_ref() in the built-in Hash::Util module.
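The same two remedies have rough C counterparts: return a copy, or return a pointer to const. The sketch below uses a hypothetical struct settings shared by several parts of a program. Note that copying a struct is shallow, so a struct containing pointers would need a deep copy, which is what dclone() provides in Perl.

```c
/* Hypothetical shared data that several parts of a program read. */
struct settings {
    int retries;
    int timeout_ms;
};

static struct settings shared_settings = { 3, 1000 };

/* Returns a copy: the caller may modify it freely without affecting
   the shared data (the C analogue of cloning before returning). */
struct settings settings_copy(void)
{
    return shared_settings;
}

/* Returns a read-only view: the compiler rejects writes made through
   the returned pointer. */
const struct settings *settings_view(void)
{
    return &shared_settings;
}
```

A caller that does struct settings s = settings_copy(); s.retries = 99; leaves settings_view()->retries at 3, and an attempt such as settings_view()->retries = 99; fails to compile.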

General programming - calling a non void method but not using value

This is general programming, but if it makes a difference, I'm using objective-c. Suppose there's a method that returns a value, and also performs some actions, but you don't care about the value it returns, only the stuff that it does. Would you just call the method as if it was void? Or place the result in a variable and then delete it or forget about it? State your opinion, what you would do if you had this situation.
A common example of this is printf, which returns an int... but you rarely see this:
int val = printf("Hello World");
Yeah just call the method as if it was void. You probably do it all the time without noticing it. The assignment operator '=' actually returns a value, but it's very rarely used.
It depends on the environment (the language, the tools, the coding standard, ...).
For example in C, it is perfectly possible to call a function without using its value. With some functions like printf, which returns an int, it is done all the time.
Sometimes not using a value will cause a warning, which is undesirable. Assigning the value to a variable and then not using it just causes another warning, about an unused variable. In this case the solution is to cast the result to void by prefixing the call with (void), e.g.
(void) my_function_returning_a_value_i_want_to_ignore();
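A small sketch of both styles in C: a hypothetical function record_call does some work and returns a value the caller does not need, and the call site discards it explicitly with a (void) cast. printf's return value (the number of characters written) is ignored in the usual way, without a cast.

```c
#include <stdio.h>

/* Hypothetical counter standing in for "some actions" the method performs. */
static int calls = 0;

/* Does its work (bumps the counter) and returns the new count,
   which many callers will not care about. */
int record_call(void)
{
    calls++;
    return calls;
}

void do_work(void)
{
    /* The (void) cast states that discarding the result is deliberate. */
    (void) record_call();

    /* printf returns the number of characters written; ignoring it is
       so common that no cast is normally used. */
    printf("Hello World\n");
}
```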
There are two separate issues here, actually:
Should you care about returned value?
Should you assign it to a variable you're not going to use?
The answer to #2 is a resounding "NO" - unless, of course, you're working with a language where that would be illegal (early Turbo Pascal comes to mind). There's absolutely no point in defining a variable only to throw it away.
The first part is not so easy. Generally, there is a reason a value is returned: for side-effect-free (pure) functions the result is the function's sole purpose; for functions with side effects it usually represents some sort of return code signifying whether the operation completed normally. There are exceptions, of course, like method chaining.
If this is common in .NET (for example), there's probably an issue with the code breaking CQS (command-query separation).
When I call a function that returns a value that I ignore, it's usually because I'm doing it in a test to verify behavior. Here's an example in C#:
[Fact]
public void StatService_should_call_StatValueRepository_for_GetPercentageValues()
{
var statValueRepository = new Mock<IStatValueRepository>();
new StatService(null, statValueRepository.Object).GetValuesOf<PercentageStatValue>();
statValueRepository.Verify(x => x.GetStatValues());
}
I don't really care about the return type, I just want to verify that a method was called on a fake object.
In C it is very common, but there are places where it is ok to do so and other places where it really isn't. Later versions of GCC have a function attribute so that you can get a warning when a function is used without checking the return value:
The warn_unused_result attribute causes a warning to be emitted if a caller of the function with this attribute does not use its return value. This is useful for functions where not checking the result is either a security problem or always a bug, such as realloc.
int fn () __attribute__ ((warn_unused_result));
int foo ()
{
if (fn () < 0) return -1;
fn ();
return 0;
}
results in a warning on line 5 (the bare fn (); call).
Last time I used this there was no way of turning off the generated warning, which causes problems when you're compiling 3rd-party code you don't want to modify. Also, there is of course no way to check if the user actually does something sensible with the returned value.
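The attribute can also be put on your own functions. Here is a sketch, assuming GCC or Clang, of a hypothetical wrapper grow_ints around realloc, the example the documentation names: ignoring the result would be a bug because the block may have moved, and on failure realloc returns NULL while leaving the old block allocated.

```c
#include <stdlib.h>

/* Declaration carries the attribute, in the style of the GCC documentation. */
int *grow_ints(int *buf, size_t new_count) __attribute__ ((warn_unused_result));

/* Resizes buf to hold new_count ints. The returned pointer replaces buf
   (the block may have moved); NULL means failure and buf is still valid. */
int *grow_ints(int *buf, size_t new_count)
{
    return realloc(buf, new_count * sizeof *buf);
}
```

A caller keeps the result in a temporary so that the original block is not leaked on failure, e.g. int *tmp = grow_ints(buf, 8); if (tmp != NULL) buf = tmp; whereas a bare grow_ints(buf, 8); triggers the warning.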

Resources