Caching of data in Mathematica - caching

there is a very time-consuming operation which generates a dataset in my package. I would like to save this dataset and let the package rebuild it only when I manually delete the cached file. Here is my approach as part of the package:
myDataset = Module[{fname, data},
fname = "cached-data.mx";
If[FileExistsQ[fname],
Get[fname],
data = Evaluate[timeConsumingOperation[]];
Put[data, fname];
data]
];
timeConsumingOperation[]:=Module[{},
(* lot of work here *)
{"data"}
];
However, instead of writing the long data set to the file, the Put command only writes one line: "timeConsumingOperation[]", even if I wrap it with Evaluate as above. (To be true, this behaviour is not consistent, sometimes the dataset is written, sometimes not.)
How do you cache your data?

Another caching technique I use very often, especially when you might not want to insert the precomputed form in e.g. a package, is to memoize the expensive evaluation(s), such that it is computed on first use but then cached for subsequent evaluations. This is readily accomplished with SetDelayed and Set in concert:
f[arg1_, arg2_] := f[arg1, arg2] = someExpensiveThing[arg1, arg2]
Note that SetDelayed (:=) binds higher than Set (=), so the implied order of evaluation is the following, but you don't actually need the parens:
f[arg1_, arg2_] := ( f[arg1, arg2] = someExpensiveThing[arg1, arg2])
Thus, the first time you evaluate f[1,2], the evaluation-delayed RHS is evaluated, causing resulting value is computed and stored as an OwnValue of f[1,2] with Set.
#rcollyer is also right in that you don't need to use empty brackets if you have no arguments, you could just as easily write:
g := g = someExpensiveThing[...]
There's no harm in using them, though.

In the past, whenever I've had trouble with things evaluating it is usually when I have not correctly matched the pattern required by the function. For instance,
f[x_Integers]:= x
which won't match anything. Instead, I meant
f[x_Integer]:=x
In your case, though, you have no pattern to match: timeConsumingOperation[].
You're problem is more likely related to when timeConsumingOperation is defined relative to myDataset. In the code you've posted above, timeConsumingOperation is defined after myDataset. So, on the first run (or immediately after you've cleared the global variables) you would get exactly the result you're describing because timeConsumingOperation is not defined when the code for myDataset is run.
Now, SetDelayed (:=) automatically causes the variable to be recalculated whenever it is used, and since you do not require any parameters to be passed, the square brackets are not necessary. The important point here is that timeConsumingOperation can be declared, as written, prior to myDataset because SetDelayed will cause it not to be executed until it is used.
All told, your caching methodology looks exactly how I would go about it.

Related

Local declaration of (built-in) Lua functions to reduce overhead

It is often said that one should re-declare (certain) Lua functions locally, as this reduces the overhead.
But what is the exact rule / principle behind this? How do I know for which functions this should be done and for which it is superfluous? Or should it be done for EVERY function, even your own?
Unfortunately I can't figure it out from the Lua manual.
The principle is that every time you write table.insert for example, the Lua interpreter looks up the "insert" entry in the table called table. Actually, it means _ENV.table.insert - _ENV is where the "global variables" are in Lua 5.2+. Lua 5.1 has something similar but it's not called _ENV. The interpreter looks up the string "table" in _ENV and then looks up the string "insert" in that table. Two table lookups every time you call table.insert, before the function actually gets called.
But if you put it in a local variable then the interpreter gets the function directly from the local variable, which is faster. It still has to look it up, to fill in the local variable.
It is superfluous if you only call the function once within the scope of the local variable, but that is pretty rare. There is no reason to do it for functions which are already declared as local. It also makes the code harder to read, so typically you won't do it except when it actually matters (in code that runs a lot of times).
My favorit tool for speed up things in Lua is to place all the useable stuff for a table in a metatable called: __index
A common example for this is the datatype: string
It has all string functions in his __index metatable as methods.
Therefore you can do things like that directly on a string...
print(('istaqsinaayok'):upper():reverse())
-- Output: KOYAANISQATSI
The Logic above...
The lookup for a method in a string fails directly and therefore the __index metamethod will be looked up for that method.
I like to implement same behaviour for the datatype number...
-- do debug.setmetatable() only once for all further defined/used numbers
math.pi = debug.setmetatable(math.pi, {__index = math})
-- From now numbers are objects ;-)
-- Lets output Pi but not using Pi this time
print((180):rad()) -- Pi calcing with method rad()
-- Output: 3.1415926535898
The Logic: If not exists then lookup __index
Is only one step behind: local
...imho.
Another Example, that works with this method...
-- koysenv.lua
_G = setmetatable(_G,
{ -- Metamethods
__index = {}, -- Table constructor
__name = 'Global Environment'
})
-- Reference whats in _G into __index
for key, value in pairs(_G) do
getmetatable(_G)['__index'][key] = value
end
-- Remove all whats in __index now from _G
for key, value in pairs(getmetatable(_G)['__index']) do
_G[key] = nil
end
return _G
When started as a last require it move all in _G into fresh created metatable method __index.
After that _G looks totally empty ;-P
...but the environment is working like nothing happen.
To add to what #user253751 already said:
Code Quality
Lua is a very flexible language. Other languages require you to import the parts of the standard library you use; Lua doesn't. Lua usually provides one global environment not to be polluted. If you play with the environment _ENV (setfenv/getfenv on Lua 5.1 / LuaJIT), you'll want to be able to still access Lua libraries. For that purpose you may to localize them before changing the environment; you can then use your "clean" environment for your module / API table / class / whatever. Another option here is to use metatables; metatable chains may quickly get hairy though and are likely to harm performance, as a failed table lookup is required each time to trigger indexing metamethods. localizing otherwise global variables can thus be seen as a way of importing them; to give a minimal & rough example:
local print = print -- localize ("import") everything we need first
_ENV = {} -- set environment to clean table for module
function hello() -- this writes to _ENV instead of _G
print("Hello World!")
end
hello() -- inside the environment, all variables set here are accessible
return _ENV -- "export" the API table
Performance
Very minor nitpick: Local variables aren't strictly always faster. In very extreme cases (i.e. lots of upvalues), indexing a table (which doesn't need an upvalue if it's the environment, the string metatable or the like) may actually be faster.
I imagine that localizing variables is required for many optimizations of optimizing compilers such as LuaJIT to be applicable though; otherwise Lua makes very little code. A global like print might as well be overwritten somewhere in a deep code path - thus the indexing operation has to be repeated every time; for a local on the other hand, the interpreter will has way more guarantees regarding its scope. It is thus able to detect constants that are only written to once, on initialization for instance; for globals very little code analysis is possible.

Multiple statements in mathematica function

I wanted to know, how to evaluate multiple statements in a function in Mathematica.
E.g.
f[x_]:=x=x+5 and then return x^2
I know this much can be modified as (x+5)^2 but originally I wanted to read data from the file in the function and print the result after doing some data manipulation.
If you want to group several commands and output the last use the semicolon (;) between them, like
f[y_]:=(x=y+5;x^2)
Just don't use a ; for the last statement.
If your set of commands grows bigger you might want to use scoping structures like Module or Block.
You are looking for CompoundExpression (short form ;):
f[x_]:= (thing = x+5 ; thing^2)
The parentheses are necessary due to the very low precedence of ;.
As Szabolcs called me on, you cannot write:
f[x_]:= (x = x+5 ; x^2)
See this answer for an explanation and alternatives.
Leonid, who you should listen to, says that thing should be localized. I didn't do this above because I wanted to emphasize CompoundExpression as a specific fit for your "and then" construct. As it is written, this will affect the global value of thing which may or may not be what you actually want to do. If it is not, see both the answer linked above, and also:
Mathematica Module versus With or Block - Guideline, rule of thumb for usage?
Several people have mentioned already that you can use CompoundExpression:
f[x_] := (y=x+5; y^2)
However, if you use the same variable x in the expression as in the argument,
f[x_] := (x=x+5; x^2)
then you'll get errors when evaluating the function with a number. This is because := essentially defines a replacement of the pattern variables from the lhs, i.e. f[1] evaluates to the (incorrect) (1 = 1+5; 1^2).
So, as Sjoerd said, use Module (or Block sometimes, but this one has caveats!) to localize a function-variable:
f[x_] := Module[{y}, y=x+5; y^2]
Finally, if you need a function that modified its arguments, then you can set the attribute HoldAll:
Clear[addFive]
SetAttributes[addFive, HoldAll]
addFive[x_] := (x=x+5)
Then use it as
a = 3;
addFive[a]
a

Unwanted evaluation in assignments in Mathematica: why it happens and how to debug it during the package-loading?

I am developing a (large) package which does not load properly anymore.
This happened after I changed a single line of code.
When I attempt to load the package (with Needs), the package starts loading and then one of the setdelayed definitions “comes alive” (ie. Is somehow evaluated), gets trapped in an error trapping routine loaded a few lines before and the package loading aborts.
The error trapping routine with abort is doing its job, except that it should not have been called in the first place, during the package loading phase.
The error message reveals that the wrong argument is in fact a pattern expression which I use on the lhs of a setdelayed definition a few lines later.
Something like this:
……Some code lines
Changed line of code
g[x_?NotGoodQ]:=(Message[g::nogood, x];Abort[])
……..some other code lines
g/: cccQ[g[x0_]]:=True
When I attempt to load the package, I get:
g::nogood: Argument x0_ is not good
As you see the passed argument is a pattern and it can only come from the code line above.
I tried to find the reason for this behavior, but I have been unsuccessful so far.
So I decided to use the powerful Workbench debugging tools .
I would like to see step by step (or with breakpoints) what happens when I load the package.
I am not yet too familiar with WB, but it seems that ,using Debug as…, the package is first loaded and then eventually debugged with breakpoints, ect.
My problem is that the package does not even load completely! And any breakpoint set before loading the package does not seem to be effective.
So…2 questions:
can anybody please explain why these code lines "come alive" during package loading? (there are no obvious syntax errors or code fragments left in the package as far as I can see)
can anybody please explain how (if) is possible to examine/debug
package code while being loaded in WB?
Thank you for any help.
Edit
In light of Leonid's answer and using his EvenQ example:
We can avoid using Holdpattern simply by definying upvalues for g BEFORE downvalues for g
notGoodQ[x_] := EvenQ[x];
Clear[g];
g /: cccQ[g[x0_]] := True
g[x_?notGoodQ] := (Message[g::nogood, x]; Abort[])
Now
?g
Global`g
cccQ[g[x0_]]^:=True
g[x_?notGoodQ]:=(Message[g::nogood,x];Abort[])
In[6]:= cccQ[g[1]]
Out[6]= True
while
In[7]:= cccQ[g[2]]
During evaluation of In[7]:= g::nogood: -- Message text not found -- (2)
Out[7]= $Aborted
So...general rule:
When writing a function g, first define upvalues for g, then define downvalues for g, otherwise use Holdpattern
Can you subscribe to this rule?
Leonid says that using Holdpattern might indicate improvable design. Besides the solution indicated above, how could one improve the design of the little code above or, better, in general when dealing with upvalues?
Thank you for your help
Leaving aside the WB (which is not really needed to answer your question) - the problem seems to have a straightforward answer based only on how expressions are evaluated during assignments. Here is an example:
In[1505]:=
notGoodQ[x_]:=True;
Clear[g];
g[x_?notGoodQ]:=(Message[g::nogood,x];Abort[])
In[1509]:= g/:cccQ[g[x0_]]:=True
During evaluation of In[1509]:= g::nogood: -- Message text not found -- (x0_)
Out[1509]= $Aborted
To make it work, I deliberately made a definition for notGoodQ to always return True. Now, why was g[x0_] evaluated during the assignment through TagSetDelayed? The answer is that, while TagSetDelayed (as well as SetDelayed) in an assignment h/:f[h[elem1,...,elemn]]:=... does not apply any rules that f may have, it will evaluate h[elem1,...,elem2], as well as f. Here is an example:
In[1513]:=
ClearAll[h,f];
h[___]:=Print["Evaluated"];
In[1515]:= h/:f[h[1,2]]:=3
During evaluation of In[1515]:= Evaluated
During evaluation of In[1515]:= TagSetDelayed::tagnf: Tag h not found in f[Null]. >>
Out[1515]= $Failed
The fact that TagSetDelayed is HoldAll does not mean that it does not evaluate its arguments - it only means that the arguments arrive to it unevaluated, and whether or not they will be evaluated depends on the semantics of TagSetDelayed (which I briefly described above). The same holds for SetDelayed, so the commonly used statement that it "does not evaluate its arguments" is not literally correct. A more correct statement is that it receives the arguments unevaluated and does evaluate them in a special way - not evaluate the r.h.s, while for l.h.s., evaluate head and elements but not apply rules for the head. To avoid that, you may wrap things in HoldPattern, like this:
Clear[g,notGoodQ];
notGoodQ[x_]:=EvenQ[x];
g[x_?notGoodQ]:=(Message[g::nogood,x];Abort[])
g/:cccQ[HoldPattern[g[x0_]]]:=True;
This goes through. Here is some usage:
In[1527]:= cccQ[g[1]]
Out[1527]= True
In[1528]:= cccQ[g[2]]
During evaluation of In[1528]:= g::nogood: -- Message text not found -- (2)
Out[1528]= $Aborted
Note however that the need for HoldPattern inside your left-hand side when making a definition is often a sign that the expression inside your head may also evaluate during the function call, which may break your code. Here is an example of what I mean:
In[1532]:=
ClearAll[f,h];
f[x_]:=x^2;
f/:h[HoldPattern[f[y_]]]:=y^4;
This code attempts to catch cases like h[f[something]], but it will obviously fail since f[something] will evaluate before the evaluation comes to h:
In[1535]:= h[f[5]]
Out[1535]= h[25]
For me, the need for HoldPattern on the l.h.s. is a sign that I need to reconsider my design.
EDIT
Regarding debugging during loading in WB, one thing you can do (IIRC, can not check right now) is to use good old print statements, the output of which will appear in the WB's console. Personally, I rarely feel a need for debugger for this purpose (debugging package when loading)
EDIT 2
In response to the edit in the question:
Regarding the order of definitions: yes, you can do this, and it solves this particular problem. But, generally, this isn't robust, and I would not consider it a good general method. It is hard to give a definite advice for a case at hand, since it is a bit out of its context, but it seems to me that the use of UpValues here is unjustified. If this is done for error - handling, there are other ways to do it without using UpValues.
Generally, UpValues are used most commonly to overload some function in a safe way, without adding any rule to the function being overloaded. One advice is to avoid associating UpValues with heads which also have DownValues and may evaluate -by doing this you start playing a game with evaluator, and will eventually lose. The safest is to attach UpValues to inert symbols (heads, containers), which often represent a "type" of objects on which you want to overload a given function.
Regarding my comment on the presence of HoldPattern indicating a bad design. There certainly are legitimate uses for HoldPattern, such as this (somewhat artificial) one:
In[25]:=
Clear[ff,a,b,c];
ff[HoldPattern[Plus[x__]]]:={x};
ff[a+b+c]
Out[27]= {a,b,c}
Here it is justified because in many cases Plus remains unevaluated, and is useful in its unevaluated form - since one can deduce that it represents a sum. We need HoldPattern here because of the way Plus is defined on a single argument, and because a pattern happens to be a single argument (even though it describes generally multiple arguments) during the definition. So, we use HoldPattern here to prevent treating the pattern as normal argument, but this is mostly different from the intended use cases for Plus. Whenever this is the case (we are sure that the definition will work all right for intended use cases), HoldPattern is fine. Note b.t.w., that this example is also fragile:
In[28]:= ff[Plus[a]]
Out[28]= ff[a]
The reason why it is still mostly OK is that normally we don't use Plus on a single argument.
But, there is a second group of cases, where the structure of usually supplied arguments is the same as the structure of patterns used for the definition. In this case, pattern evaluation during the assignment indicates that the same evaluation will happen with actual arguments during the function calls. Your usage falls into this category. My comment for a design flaw was for such cases - you can prevent the pattern from evaluating, but you will have to prevent the arguments from evaluating as well, to make this work. And pattern-matching against not completely evaluated expression is fragile. Also, the function should never assume some extra conditions (beyond what it can type-check) for the arguments.

Fixing Combinatorica redefinition of Element

My code relies on version of Element which works like MemberQ, but when I load Combinatorica, Element gets redefined to work like Part. What is the easiest way to fix this conflict? Specifically, what is the syntax to remove Combinatorica's definition from DownValues? Here's what I get for DownValues[Element]
{HoldPattern[
Combinatorica`Private`a_List \[Element] \
{Combinatorica`Private`index___}] :>
Combinatorica`Private`a[[Combinatorica`Private`index]],
HoldPattern[Private`x_ \[Element] Private`list_List] :>
MemberQ[Private`list, Private`x]}
If your goal is to prevent Combinatorica from installing the definition in the first place, you can achieve this result by loading the package for the first time thus:
Block[{Element}, Needs["Combinatorica`"]]
However, this will almost certainly make any Combinatorica features that depend upon the definition fail (which may or may not be of concern in your particular application).
You can do several things. Let us introduce a convenience function
ClearAll[redef];
SetAttributes[redef, HoldRest];
redef[f_, code_] := (Unprotect[f]; code; Protect[f])
If you are sure about the order of definitions, you can do something like
redef[Element, DownValues[Element] = Rest[DownValues[Element]]]
If you want to delete definitions based on the context, you can do something like this:
redef[Element, DownValues[Element] =
DeleteCases[DownValues[Element],
rule_ /; Cases[rule, x_Symbol /; (StringSplit[Context[x], "`"][[1]] ===
"Combinatorica"), Infinity, Heads -> True] =!= {}]]
You can also use a softer way - reorder definitions rather than delete:
redef[Element, DownValues[Element] = RotateRight[DownValues[Element]]]
There are many other ways of dealing with this problem. Another one (which I already recommended) is to use UpValues, if this is suitable. The last one I want to mention here is to make a kind of custom dynamic scoping construct based on Block, and wrap it around your code. I personally find it the safest variant, in case if you want strictly your definition to apply (because it does not care about the order in which various definitions could have been created - it removes all of them and adds just yours). It is also safer in that outside those places where you want your definitions to apply (by "places" I mean parts of the evaluation stack), other definitions will still apply, so this seems to be the least intrusive way. Here is how it may look:
elementDef[] := Element[x_, list_List] := MemberQ[list, x];
ClearAll[elemExec];
SetAttributes[elemExec, HoldAll];
elemExec[code_] := Block[{Element}, elementDef[]; code];
Example of use:
In[10]:= elemExec[Element[1,{1,2,3}]]
Out[10]= True
Edit:
If you need to automate the use of Block, here is an example package to show one way how this can be done:
BeginPackage["Test`"]
var;
f1;
f2;
Begin["`Private`"];
(* Implementations of your functions *)
var = 1;
f1[x_, y_List] := If[Element[x, y], x^2];
f2[x_, y_List] := If[Element[x, y], x^3];
elementDef[] := Element[x_, list_List] := MemberQ[list, x];
(* The following part of the package is defined at the start and you don't
touch it any more, when adding new functions to the package *)
mainContext = StringReplace[Context[], x__ ~~ "Private`" :> x];
SetAttributes[elemExec, HoldAll];
elemExec[code_] := Block[{Element}, elementDef[]; code];
postprocessDefs[context_String] :=
Map[
ToExpression[#, StandardForm,
Function[sym,DownValues[sym] =
DownValues[sym] /.
Verbatim[RuleDelayed][lhs_,rhs_] :> (lhs :> elemExec[rhs])]] &,
Select[Names[context <> "*"], ToExpression[#, StandardForm, DownValues] =!= {} &]];
postprocessDefs[mainContext];
End[]
EndPackage[]
You can load the package and look at the DownValues for f1 and f2, for example:
In[17]:= DownValues[f1]
Out[17]= {HoldPattern[f1[Test`Private`x_,Test`Private`y_List]]:>
Test`Private`elemExec[If[Test`Private`x\[Element]Test`Private`y,Test`Private`x^2]]}
The same scheme will also work for functions not in the same package. In fact, you could separate
the bottom part (code-processing package) to be a package on its own, import it into any other
package where you want to inject Block into your functions' definitions, and then just call something like postprocessDefs[mainContext], as above. You could make the function which makes definitions inside Block (elementDef here) to be an extra parameter to a generalized version of elemExec, which would make this approach more modular and reusable.
If you want to be more selective about the functions where you want to inject Block, this can also be done in various ways. In fact, the whole Block-injection scheme can be made cleaner then, but it will require slightly more care when implementing each function, while the above approach is completely automatic. I can post the code which will illustrate this, if needed.
One more thing: for the less intrusive nature of this method you pay a price - dynamic scope (Block) is usually harder to control than lexically-scoped constructs. So, you must know exactly the parts of evaluation stack where you want that to apply. For example, I would hesitate to inject Block into a definition of a higher order function, which takes some functions as parameters, since those functions may come from code that assumes other definitions (like for example Combinatorica` functions relying on overloaded Element). This is not a big problem, just requires care.
The bottom line of this seems to be: try to avoid overloading built-ins if at all possible. In this case you faced this definitions clash yourself, but it would be even worse if the one who faces this problem is a user of your package (may be yourself a few months later), who wants to combine your package with another one (which happens to overload same system functions as yours). Of course, it also depends on who will be the users of your package - only yourself or potentially others as well. But in terms of design, and in the long term, you may be better off assuming the latter scenario from the start.
To remove Combinatorica's definition, use Unset or the equivalent form =.. The pattern to unset you can grab from the Information output you show in the question:
Unprotect[Element];
Element[a_List, {index___}] =.
Protect[Element];
The worry would be, of course, that Combinatorica depends internally on this ill-conceived redefinition, but you have reason to believe this to not be the case as the Information output from the redefined Element says:
The use of the function
Element in Combinatorica is now
obsolete, though the function call
Element[a, p] still gives the pth
element of nested list a, where p is a
list of indices.
HTH
I propose an entirely different approach than removing Element from DownValues. Simply use the full name of the Element function.
So, if the original is
System`Element[]
the default is now
Combinatorica`Element[]
because of loading the Combinatorica Package.
Just explicitly use
System`Element[]
wherever you need it. Of course check that System is the correct Context using the Context function:
Context[Element]
This approach ensures several things:
The Combinatorica Package will still work in your notebook, even if the Combinatorica Package is updated in the future
You wont have to redefine the Element function, as some have suggested
You can use the Combinatorica`Element function when needed
The only downside is having to explicitly write it every time.

Using function arguments as local variables

Something like this (yes, this doesn't deal with some edge cases - that's not the point):
int CountDigits(int num) {
int count = 1;
while (num >= 10) {
count++;
num /= 10;
}
return count;
}
What's your opinion about this? That is, using function arguments as local variables.
Both are placed on the stack, and pretty much identical performance wise, I'm wondering about the best-practices aspects of this.
I feel like an idiot when I add an additional and quite redundant line to that function consisting of int numCopy = num, however it does bug me.
What do you think? Should this be avoided?
As a general rule, I wouldn't use a function parameter as a local processing variable, i.e. I treat function parameters as read-only.
In my mind, intuitively understandabie code is paramount for maintainability, and modifying a function parameter to use as a local processing variable tends to run counter to that goal. I have come to expect that a parameter will have the same value in the middle and bottom of a method as it does at the top. Plus, an aptly-named local processing variable may improve understandability.
Still, as #Stewart says, this rule is more or less important depending on the length and complexity of the function. For short simple functions like the one you show, simply using the parameter itself may be easier to understand than introducing a new local variable (very subjective).
Nevertheless, if I were to write something as simple as countDigits(), I'd tend to use a remainingBalance local processing variable in lieu of modifying the num parameter as part of local processing - just seems clearer to me.
Sometimes, I will modify a local parameter at the beginning of a method to normalize the parameter:
void saveName(String name) {
name = (name != null ? name.trim() : "");
...
}
I rationalize that this is okay because:
a. it is easy to see at the top of the method,
b. the parameter maintains its the original conceptual intent, and
c. the parameter is stable for the rest of the method
Then again, half the time, I'm just as apt to use a local variable anyway, just to get a couple of extra finals in there (okay, that's a bad reason, but I like final):
void saveName(final String name) {
final String normalizedName = (name != null ? name.trim() : "");
...
}
If, 99% of the time, the code leaves function parameters unmodified (i.e. mutating parameters are unintuitive or unexpected for this code base) , then, during that other 1% of the time, dropping a quick comment about a mutating parameter at the top of a long/complex function could be a big boon to understandability:
int CountDigits(int num) {
// num is consumed
int count = 1;
while (num >= 10) {
count++;
num /= 10;
}
return count;
}
P.S. :-)
parameters vs arguments
http://en.wikipedia.org/wiki/Parameter_(computer_science)#Parameters_and_arguments
These two terms are sometimes loosely used interchangeably; in particular, "argument" is sometimes used in place of "parameter". Nevertheless, there is a difference. Properly, parameters appear in procedure definitions; arguments appear in procedure calls.
So,
int foo(int bar)
bar is a parameter.
int x = 5
int y = foo(x)
The value of x is the argument for the bar parameter.
It always feels a little funny to me when I do this, but that's not really a good reason to avoid it.
One reason you might potentially want to avoid it is for debugging purposes. Being able to tell the difference between "scratchpad" variables and the input to the function can be very useful when you're halfway through debugging.
I can't say it's something that comes up very often in my experience - and often you can find that it's worth introducing another variable just for the sake of having a different name, but if the code which is otherwise cleanest ends up changing the value of the variable, then so be it.
One situation where this can come up and be entirely reasonable is where you've got some value meaning "use the default" (typically a null reference in a language like Java or C#). In that case I think it's entirely reasonable to modify the value of the parameter to the "real" default value. This is particularly useful in C# 4 where you can have optional parameters, but the default value has to be a constant:
For example:
public static void WriteText(string file, string text, Encoding encoding = null)
{
// Null means "use the default" which we would document to be UTF-8
encoding = encoding ?? Encoding.UTF8;
// Rest of code here
}
About C and C++:
My opinion is that using the parameter as a local variable of the function is fine because it is a local variable already. Why then not use it as such?
I feel silly too when copying the parameter into a new local variable just to have a modifiable variable to work with.
But I think this is pretty much a personal opinion. Do it as you like. If you feel sill copying the parameter just because of this, it indicates your personality doesn't like it and then you shouldn't do it.
If I don't need a copy of the original value, I don't declare a new variable.
IMO I don't think mutating the parameter values is a bad practice in general,
it depends on how you're going to use it in your code.
My team coding standard recommends against this because it can get out of hand. To my mind for a function like the one you show, it doesn't hurt because everyone can see what is going on. The problem is that with time functions get longer, and they get bug fixes in them. As soon as a function is more than one screen full of code, this starts to get confusing which is why our coding standard bans it.
The compiler ought to be able to get rid of the redundant variable quite easily, so it has no efficiency impact. It is probably just between you and your code reviewer whether this is OK or not.
I would generally not change the parameter value within the function. If at some point later in the function you need to refer to the original value, you still have it. in your simple case, there is no problem, but if you add more code later, you may refer to 'num' without realizing it has been changed.
The code needs to be as self sufficient as possible. What I mean by that is you now have a dependency on what is being passed in as part of your algorithm. If another member of your team decides to change this to a pass by reference then you might have big problems.
The best practice is definitely to copy the inbound parameters if you expect them to be immutable.
I typically don't modify function parameters, unless they're pointers, in which case I might alter the value that's pointed to.
I think the best-practices of this varies by language. For example, in Perl you can localize any variable or even part of a variable to a local scope, so that changing it in that scope will not have any affect outside of it:
sub my_function
{
my ($arg1, $arg2) = #_; # get the local variables off the stack
local $arg1; # changing $arg1 here will not be visible outside this scope
$arg1++;
local $arg2->{key1}; # only the key1 portion of the hashref referenced by $arg2 is localized
$arg2->{key1}->{key2} = 'foo'; # this change is not visible outside the function
}
Occasionally I have been bitten by forgetting to localize a data structure that was passed by reference to a function, that I changed inside the function. Conversely, I have also returned a data structure as a function result that was shared among multiple systems and the caller then proceeded to change the data by mistake, affecting these other systems in a difficult-to-trace problem usually called action at a distance. The best thing to do here would be to make a clone of the data before returning it*, or make it read-only**.
* In Perl, see the function dclone() in the built-in Storable module.
** In Perl, see lock_hash() or lock_hash_ref() in the built-in Hash::Util module).

Resources