Using Context as a scoping construct in Mathematica - wolfram-mathematica

Thinking about a solution to my previous question about switching between numerical and analytical "modes" in a large Mathematica project, I thought about the idea of using Context as a scoping construct.
The basic idea is to make all numerical value assignments in their own context, e.g.
Begin["Global`Numerical`"]
par1 = 1;
par2 = 2;
...
End[]
and have all the complicated analytical functions, matrices, etc. in the Global context.
Ideally I would be able to work in the Global context and switch to everything being numeric with a simple Begin["Global`Numerical`"] and switch back with End[].
Unfortunately this does not work, since e.g. f[par1_, par2_, ...] := foo defined in the Global context will not use par1, par2, etc. which have been defined in a subcontext of Global.
Is there a way to make sub contexts inherit definitions from their parent context? Is there some other way to use contexts to create a simple switchable scope?

Well, here's one way to get around what (I think) is your problem, by adjusting $ContextPath appropriately:
SetOptions[EvaluationNotebook[], CellContext -> "GlobalTestCtxt`"];
Remove[f, GlobalTestCtxt`Numerical`f, par1, par2];
f[par1_, par2_] := {par1, par2};
savedContextPath = $ContextPath;
Begin["GlobalTestCtxt`Numerical`"];
Print[{$ContextPath, $Context}];
$ContextPath = DeleteCases[$ContextPath, "GlobalTestCtxt`"];
par1 = 1;
par2 = 2;
End[];
$ContextPath = savedContextPath;
Now, this will evaluate analytically:
f[par1, par2]
And this numerically:
savedContextPath = $ContextPath;
Begin["GlobalTestCtxt`Numerical`"];
$ContextPath = Prepend[$ContextPath, $Context];
f[par1, par2]
End[];
$ContextPath = savedContextPath;
This approach is fragile: unless you are careful, it's easy to get a symbol into the wrong context. For instance, suppose you forgot to evaluate the definition of f in the global context before evaluating the "numerical" block. Now your numerical block will not work, simply because f will resolve to a (perfectly valid) symbol GlobalTestCtxt`Numerical`f, which you inadvertently entered into the symbol table when you first evaluated the numerical block. Because of potential bugs like this, I personally don't use this approach.
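If you want to check whether you have run into this failure mode, a couple of quick sanity checks (illustrative, not part of the recipe above) will tell you where f actually ended up and which symbol a bare f currently resolves to:
Names["GlobalTestCtxt`*"]             (* every symbol whose full name starts with the test context *)
Names["GlobalTestCtxt`Numerical`*"]   (* symbols that ended up in the numerical subcontext *)
Context[f]                            (* context of the symbol that a plain f resolves to right now *)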
Edit: fixed a bug (it is necessary to hide the "Global" context when doing assignments in the numerical context)

Related

What does "auto always deduces the basic type" mean here?

I was reading this article which states
that
widget w = get_gadget(); // a
auto w = get_gadget();   // b
In statement 'a' a temporary is created, then widget w is move-constructed from the temporary. I totally understand that statement. What I don't understand is how statement 'b', using auto, is better than statement 'a'. It says:
... we could write the following which guarantees there is no implicit conversion because auto always deduces the basic type exactly:
// better, if you don't need an explicit type
auto w = get_gadget();
Could anyone please explain why statement 'b' is better than 'a'?
It depends on what your criteria are for "better".
If the return type of get_gadget() is actually widget, and that will forever be true, then there is no difference whatsoever. However, real-world program designs change, and there may be a need to change the return type, or change properties of the type returned.
If the return type of get_gadget() is something (say gadget) that can be implicitly converted to widget, then the working of "a" is effectively
gadget temp = get_gadget();
widget w = widget(temp); // assuming widget has a constructor that accepts a gadget
or
gadget temp = get_gadget();
widget w = (widget)temp; // assuming gadget has an implicit conversion to widget
In either case, a temporary object is created and then some conversion is performed.
In comparison, the case "b" is deduced by the compiler as being
gadget w = get_gadget();
There is also potentially a maintenance advantage in using auto. Let's say that our return type from get_gadget() (gadget) is changed so it can no longer be converted to a widget. In that case, the case "a" will simply not compile. Whereas case "b" will compile, and the code which uses w will still work (assuming all operations on it are supported by the new return type).
While that last case could easily be fixed by changing to
gadget w = get_gadget();
even this can be broken again by changing the return type of get_gadget() to better_gadget.
The bottom line is that auto hands the work to the compiler of worrying about what type w needs to be.
Sutter explains it in the full paragraph:
This works, assuming that gadget is implicitly convertible to widget, but creates a temporary object. That’s a potential performance pitfall, as the creation of the temporary object is not at all obvious from reading the call site alone in a code review. If we can use a gadget just as well as a widget in this calling code and so don’t explicitly need to commit to the widget type, we could write the following which guarantees there is no implicit conversion because auto always deduces the basic type exactly
Statement b creates a gadget directly. That bypasses the creation of the temporary and the conversion, which could be expensive operations depending on the class implementation. auto picks the exact type needed and directly creates w as that type. His caveat is important though: the code can't care that w is a gadget and not a widget.
For example, suppose that widget inherits from gadget. In statement b w would be a plain gadget, thus lacking the extra stuff that you'd get with a widget. When using statement b your code can't care that you got a gadget. If you're using statement a then you get the widget at the added expense of creating a temporary gadget and converting it to a widget.
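To make the inheritance case concrete, here is a small illustrative sketch; widget and gadget are hypothetical types standing in for the article's, and the converting constructor is my assumption:
struct gadget {
    int id = 0;
};

struct widget : gadget {
    int extra = 0;                          // the "extra stuff" only a widget has
    widget() = default;
    widget(const gadget& g) : gadget(g) {}  // allows the implicit conversion gadget -> widget
};

gadget get_gadget() { return gadget{42}; }

int main() {
    widget w1 = get_gadget();  // statement 'a': temporary gadget, then a widget built from it
    auto   w2 = get_gadget();  // statement 'b': w2 is deduced as plain gadget; no conversion
    return w1.extra + w2.id;   // fine; w2.extra would not compile, because w2 is a gadget
}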

Does defining variables within a loop matter?

This is a piece of code I am writing.
var cList:XMLList = xml.defines.c;
var className:String;
var properties:XMLList;
var property:XML;
var i:int, l:int;
var c:XML;
for each (c in cList)
{
    className = String(c.@name);
    if (cDict[className])
    {
        throw new Error('class name has been defined: ' + className);
    }
    if (className)
    {
        cDict[className] = c;
    }
    properties = c.property;
    i = 0;
    l = properties.length();
    if (l)
    {
        propertyDict[className] = new Dictionary();
        for (; i < l; i++)
        {
            // ...
        }
    }
}
As you can see, I defined all variables outside of the loops. I am always worried that if I defined them inside the loop, it might slow things down, though I don't have proof - it's just a feeling.
I also don't like that the AS3 grammar allows using a variable name before its definition, so I always define vars at the very beginning of my functions.
Now I am worried these habits might backfire on me someday. Or is it just a matter of personal taste?
No, it doesn't matter, because the compiler uses variable hoisting, which means the compiler moves all variable declarations to the top of the function.
More explanation on variables:
http://help.adobe.com/en_US/ActionScript/3.0_ProgrammingAS3/WS5b3ccc516d4fbf351e63e3d118a9b90204-7f9d.html
AS3 IDEs allow you to use variable names before the declaration, because they know that the compiler uses a mechanism called "hoisting" to move all variable definitions to the top of a function, anyway. This happens without you noticing it, so that you can conveniently keep your code more readable. Therefore, it does not really make a difference if you manually move all the definitions to the top - unless you like your code to be structured in that way.
For the same reason, variable declaration within loops does not affect performance, unless you keep those loops in separate functions - only then will it result in actual allocation of a variable.
A lot of AS3 programmers, myself included, would consider the way you have it now to be the "right" way, and putting variables inside any block to be the "wrong" way. Speed does not matter in this situation; I'll try to present both sides' arguments in as unbiased a way as I can.
Put variable definitions as close as possible to the code where they are used. The motivation is simple: if a variable is used somewhere, you may want to know where it was declared. This is useful if you aren't sure of the type of the variable or its modifiers. Normally, the declaration is also the place to put commentary.
Put variables where they are actually declared (which, given hoisting, is the top of the function). Even a seasoned ActionScript programmer may eventually confuse him- or herself by declaring variables inside blocks, where a seemingly uninitialized variable can suddenly contain a stale value from an earlier iteration. The common case looks like this:
for (var i:int; i < x; i++) {
    // the inner loop body only runs on the first pass of the outer loop:
    // j is hoisted to function scope, so it still equals y on later passes
    for (var j:int; j < y; j++) { ... }
}
There is also a long tradition, originating with the C89 standard (a.k.a. ANSI C), which required declarations at the start of a block and did not allow a variable to be defined in a loop header. This was later changed so that such variables are scoped to the block of code where they are declared. Many modern C-like languages, C# for example, treat variables like that. So, in the example above, C# would give the inner loop a fresh j each time that loop is reached.
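To illustrate that block-scoped behaviour, here is a minimal sketch (written in C++ rather than C#, which behaves the same way on this point; x and y are assumed to be defined):
// j is a fresh, block-scoped variable on every pass of the outer loop,
// so the inner loop runs y times per outer iteration.
for (int i = 0; i < x; i++) {
    for (int j = 0; j < y; j++) { /* ... */ }
}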
Programmers with a longer background in other C-like languages are therefore led to believe, when they see a variable declared inside a block, that the variable is scoped to that block. Hoisting then strikes them as counter-intuitive, and hence error-prone.

Fixing Combinatorica redefinition of Element

My code relies on a version of Element which works like MemberQ, but when I load Combinatorica, Element gets redefined to work like Part. What is the easiest way to fix this conflict? Specifically, what is the syntax to remove Combinatorica's definition from DownValues? Here's what I get for DownValues[Element]:
{HoldPattern[Combinatorica`Private`a_List \[Element] {Combinatorica`Private`index___}] :>
   Combinatorica`Private`a[[Combinatorica`Private`index]],
 HoldPattern[Private`x_ \[Element] Private`list_List] :>
   MemberQ[Private`list, Private`x]}
If your goal is to prevent Combinatorica from installing the definition in the first place, you can achieve this result by loading the package for the first time thus:
Block[{Element}, Needs["Combinatorica`"]]
However, this will almost certainly make any Combinatorica features that depend upon the definition fail (which may or may not be of concern in your particular application).
You can do several things. Let us introduce a convenience function
ClearAll[redef];
SetAttributes[redef, HoldRest];
redef[f_, code_] := (Unprotect[f]; code; Protect[f])
If you are sure about the order of definitions, you can do something like
redef[Element, DownValues[Element] = Rest[DownValues[Element]]]
If you want to delete definitions based on the context, you can do something like this:
redef[Element,
  DownValues[Element] =
    DeleteCases[DownValues[Element],
      rule_ /; Cases[rule,
          x_Symbol /; (StringSplit[Context[x], "`"][[1]] === "Combinatorica"),
          Infinity, Heads -> True] =!= {}]]
You can also use a softer way - reorder definitions rather than delete:
redef[Element, DownValues[Element] = RotateRight[DownValues[Element]]]
There are many other ways of dealing with this problem. Another one (which I already recommended) is to use UpValues, if this is suitable. The last one I want to mention here is to make a kind of custom dynamic scoping construct based on Block, and wrap it around your code. I personally find it the safest variant in case you want strictly your definition to apply, because it does not care about the order in which the various definitions were created - it removes all of them and adds just yours. It is also safer in that, outside the places where you want your definitions to apply (by "places" I mean parts of the evaluation stack), other definitions will still apply, so this seems to be the least intrusive way. Here is how it may look:
elementDef[] := Element[x_, list_List] := MemberQ[list, x];
ClearAll[elemExec];
SetAttributes[elemExec, HoldAll];
elemExec[code_] := Block[{Element}, elementDef[]; code];
Example of use:
In[10]:= elemExec[Element[1,{1,2,3}]]
Out[10]= True
Edit:
If you need to automate the use of Block, here is an example package showing one way this can be done:
BeginPackage["Test`"]
var;
f1;
f2;
Begin["`Private`"];
(* Implementations of your functions *)
var = 1;
f1[x_, y_List] := If[Element[x, y], x^2];
f2[x_, y_List] := If[Element[x, y], x^3];
elementDef[] := Element[x_, list_List] := MemberQ[list, x];
(* The following part of the package is defined at the start and you don't
touch it any more, when adding new functions to the package *)
mainContext = StringReplace[Context[], x__ ~~ "Private`" :> x];
SetAttributes[elemExec, HoldAll];
elemExec[code_] := Block[{Element}, elementDef[]; code];
postprocessDefs[context_String] :=
Map[
ToExpression[#, StandardForm,
Function[sym,DownValues[sym] =
DownValues[sym] /.
Verbatim[RuleDelayed][lhs_,rhs_] :> (lhs :> elemExec[rhs])]] &,
Select[Names[context <> "*"], ToExpression[#, StandardForm, DownValues] =!= {} &]];
postprocessDefs[mainContext];
End[]
EndPackage[]
You can load the package and look at the DownValues for f1 and f2, for example:
In[17]:= DownValues[f1]
Out[17]= {HoldPattern[f1[Test`Private`x_,Test`Private`y_List]]:>
Test`Private`elemExec[If[Test`Private`x\[Element]Test`Private`y,Test`Private`x^2]]}
The same scheme will also work for functions not in the same package. In fact, you could separate the bottom part (the code-processing part) into a package of its own, import it into any other package where you want to inject Block into your functions' definitions, and then just call something like postprocessDefs[mainContext], as above. You could make the function which installs the definitions inside Block (elementDef here) an extra parameter to a generalized version of elemExec, which would make this approach more modular and reusable.
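A minimal sketch of that generalization (the name withLocalDefs is mine, not part of the package above):
ClearAll[withLocalDefs];
SetAttributes[withLocalDefs, HoldRest];
(* defFn installs the temporary definitions on the Block-ed Element *)
withLocalDefs[defFn_, code_] := Block[{Element}, defFn[]; code];

(* usage, reusing elementDef from above *)
withLocalDefs[elementDef, Element[1, {1, 2, 3}]]  (* returns True *)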
If you want to be more selective about the functions into which you inject Block, this can also be done in various ways. In fact, the whole Block-injection scheme can be made cleaner in that case, but it will require slightly more care when implementing each function, whereas the above approach is completely automatic. I can post code to illustrate this, if needed.
One more thing: for the less intrusive nature of this method you pay a price - dynamic scope (Block) is usually harder to control than lexically-scoped constructs. So, you must know exactly the parts of the evaluation stack where you want that to apply. For example, I would hesitate to inject Block into the definition of a higher-order function which takes other functions as parameters, since those functions may come from code that assumes other definitions (for example, Combinatorica` functions relying on the overloaded Element). This is not a big problem, it just requires care.
The bottom line of this seems to be: try to avoid overloading built-ins if at all possible. In this case you faced the definition clash yourself, but it would be even worse if the one who faces this problem is a user of your package (maybe yourself a few months later) who wants to combine your package with another one (which happens to overload the same system functions as yours). Of course, it also depends on who will be the users of your package - only yourself, or potentially others as well. But in terms of design, and in the long term, you may be better off assuming the latter scenario from the start.
To remove Combinatorica's definition, use Unset (or the equivalent operator form =.). You can grab the pattern to unset from the Information output you show in the question:
Unprotect[Element];
Element[a_List, {index___}] =.
Protect[Element];
The worry would be, of course, that Combinatorica depends internally on this ill-conceived redefinition, but you have reason to believe this to not be the case as the Information output from the redefined Element says:
The use of the function Element in Combinatorica is now obsolete, though the function call Element[a, p] still gives the pth element of nested list a, where p is a list of indices.
HTH
I propose an entirely different approach than removing Element from DownValues. Simply use the full name of the Element function.
So, if the original is
System`Element[]
the default is now
Combinatorica`Element[]
because of loading the Combinatorica Package.
Just explicitly use
System`Element[]
wherever you need it. Of course, check that System` is the correct context using the Context function:
Context[Element]
This approach ensures several things:
The Combinatorica Package will still work in your notebook, even if the Combinatorica Package is updated in the future
You won't have to redefine the Element function, as some have suggested
You can use the Combinatorica`Element function when needed
The only downside is having to explicitly write it every time.

Caching of data in Mathematica

There is a very time-consuming operation which generates a dataset in my package. I would like to save this dataset and let the package rebuild it only when I manually delete the cached file. Here is my approach, as part of the package:
myDataset = Module[{fname, data},
fname = "cached-data.mx";
If[FileExistsQ[fname],
Get[fname],
data = Evaluate[timeConsumingOperation[]];
Put[data, fname];
data]
];
timeConsumingOperation[]:=Module[{},
(* lot of work here *)
{"data"}
];
However, instead of writing the long dataset to the file, the Put command only writes one line: "timeConsumingOperation[]", even if I wrap it with Evaluate as above. (Truth be told, this behaviour is not consistent: sometimes the dataset is written, sometimes not.)
How do you cache your data?
Another caching technique I use very often, especially when you might not want to insert the precomputed form in e.g. a package, is to memoize the expensive evaluation(s), such that it is computed on first use but then cached for subsequent evaluations. This is readily accomplished with SetDelayed and Set in concert:
f[arg1_, arg2_] := f[arg1, arg2] = someExpensiveThing[arg1, arg2]
Note that := and = have the same precedence and group to the right, so the implied grouping is the following, but you don't actually need the parens:
f[arg1_, arg2_] := ( f[arg1, arg2] = someExpensiveThing[arg1, arg2])
Thus, the first time you evaluate f[1,2], the evaluation-delayed RHS is evaluated, causing the resulting value to be computed and stored as a DownValue for f[1, 2] via Set.
@rcollyer is also right in that you don't need to use empty brackets if you have no arguments; you could just as easily write:
g := g = someExpensiveThing[...]
There's no harm in using them, though.
In the past, whenever I've had trouble with things not evaluating, it has usually been because I did not correctly match the pattern required by the function. For instance,
f[x_Integers]:= x
which won't match anything. Instead, I meant
f[x_Integer]:=x
In your case, though, you have no pattern to match: timeConsumingOperation[].
Your problem is more likely related to when timeConsumingOperation is defined relative to myDataset. In the code you've posted above, timeConsumingOperation is defined after myDataset. So, on the first run (or immediately after you've cleared the global variables) you would get exactly the result you're describing, because timeConsumingOperation is not defined when the code for myDataset is run.
Now, SetDelayed (:=) causes the right-hand side to be re-evaluated whenever the symbol is used, and since you do not require any parameters to be passed, the square brackets are not necessary. The important point is that, defined this way, myDataset can be declared before timeConsumingOperation, as written, because SetDelayed prevents its body from being executed until myDataset is actually used.
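Putting those pieces together, here is a minimal sketch (using the names from the question and the memoization idiom above; adjust to taste):
(* the body runs only on first use; afterwards the computed value replaces the delayed rule *)
myDataset := myDataset =
   Module[{fname = "cached-data.mx", data},
     If[FileExistsQ[fname],
       Get[fname],
       data = timeConsumingOperation[];
       Put[data, fname];
       data]];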
All told, your caching methodology looks exactly how I would go about it.

Using function arguments as local variables

Something like this (yes, this doesn't deal with some edge cases - that's not the point):
int CountDigits(int num) {
    int count = 1;
    while (num >= 10) {
        count++;
        num /= 10;
    }
    return count;
}
What's your opinion about this? That is, using function arguments as local variables.
Both are placed on the stack and, performance-wise, pretty much identical; I'm wondering about the best-practices aspect of this.
I feel like an idiot when I add an additional and quite redundant line to that function consisting of int numCopy = num, however it does bug me.
What do you think? Should this be avoided?
As a general rule, I wouldn't use a function parameter as a local processing variable, i.e. I treat function parameters as read-only.
In my mind, intuitively understandable code is paramount for maintainability, and modifying a function parameter to use as a local processing variable tends to run counter to that goal. I have come to expect that a parameter will have the same value in the middle and bottom of a method as it does at the top. Plus, an aptly-named local processing variable may improve understandability.
Still, as @Stewart says, this rule is more or less important depending on the length and complexity of the function. For short simple functions like the one you show, simply using the parameter itself may be easier to understand than introducing a new local variable (very subjective).
Nevertheless, if I were to write something as simple as countDigits(), I'd tend to use a remainingBalance local processing variable in lieu of modifying the num parameter as part of local processing - just seems clearer to me.
Sometimes, I will modify a local parameter at the beginning of a method to normalize the parameter:
void saveName(String name) {
    name = (name != null ? name.trim() : "");
    ...
}
I rationalize that this is okay because:
a. it is easy to see at the top of the method,
b. the parameter maintains its original conceptual intent, and
c. the parameter is stable for the rest of the method
Then again, half the time, I'm just as apt to use a local variable anyway, just to get a couple of extra finals in there (okay, that's a bad reason, but I like final):
void saveName(final String name) {
    final String normalizedName = (name != null ? name.trim() : "");
    ...
}
If, 99% of the time, the code leaves function parameters unmodified (i.e. mutating parameters are unintuitive or unexpected for this code base), then, during that other 1% of the time, dropping a quick comment about a mutating parameter at the top of a long/complex function could be a big boon to understandability:
int CountDigits(int num) {
    // num is consumed
    int count = 1;
    while (num >= 10) {
        count++;
        num /= 10;
    }
    return count;
}
P.S. :-)
parameters vs arguments
http://en.wikipedia.org/wiki/Parameter_(computer_science)#Parameters_and_arguments
These two terms are sometimes loosely used interchangeably; in particular, "argument" is sometimes used in place of "parameter". Nevertheless, there is a difference. Properly, parameters appear in procedure definitions; arguments appear in procedure calls.
So,
int foo(int bar)
bar is a parameter.
int x = 5;
int y = foo(x);
The value of x is the argument for the bar parameter.
It always feels a little funny to me when I do this, but that's not really a good reason to avoid it.
One reason you might potentially want to avoid it is for debugging purposes. Being able to tell the difference between "scratchpad" variables and the input to the function can be very useful when you're halfway through debugging.
I can't say it's something that comes up very often in my experience - and often you can find that it's worth introducing another variable just for the sake of having a different name, but if the code which is otherwise cleanest ends up changing the value of the variable, then so be it.
One situation where this can come up and be entirely reasonable is where you've got some value meaning "use the default" (typically a null reference in a language like Java or C#). In that case I think it's entirely reasonable to modify the value of the parameter to the "real" default value. This is particularly useful in C# 4 where you can have optional parameters, but the default value has to be a constant:
For example:
public static void WriteText(string file, string text, Encoding encoding = null)
{
    // Null means "use the default", which we would document to be UTF-8
    encoding = encoding ?? Encoding.UTF8;
    // Rest of code here
}
About C and C++:
My opinion is that using the parameter as a local variable of the function is fine because it is a local variable already. Why then not use it as such?
I feel silly too when copying the parameter into a new local variable just to have a modifiable variable to work with.
But I think this is pretty much a personal opinion. Do it as you like. If you feel silly copying the parameter just because of this, it indicates you don't like the practice, and then you shouldn't do it.
If I don't need a copy of the original value, I don't declare a new variable.
IMO, mutating parameter values is not a bad practice in general; it depends on how you're going to use them in your code.
My team's coding standard recommends against this because it can get out of hand. To my mind, for a function like the one you show, it doesn't hurt, because everyone can see what is going on. The problem is that, with time, functions get longer and accumulate bug fixes. As soon as a function is more than one screenful of code, this starts to get confusing, which is why our coding standard bans it.
The compiler ought to be able to get rid of the redundant variable quite easily, so it has no efficiency impact. It is probably just between you and your code reviewer whether this is OK or not.
I would generally not change the parameter value within the function. If at some point later in the function you need to refer to the original value, you still have it. In your simple case there is no problem, but if you add more code later, you may refer to 'num' without realizing it has been changed.
The code needs to be as self-sufficient as possible. What I mean by that is that you now have a dependency on what is being passed in as part of your algorithm. If another member of your team decides to change this to pass-by-reference, then you might have big problems.
The best practice is definitely to copy the inbound parameters if you expect them to be immutable.
I typically don't modify function parameters, unless they're pointers, in which case I might alter the value that's pointed to.
I think the best practice here varies by language. For example, in Perl you can localize any variable, or even part of a variable, to a local scope, so that changing it in that scope will not have any effect outside of it:
sub my_function
{
    my ($arg1, $arg2) = @_;   # get the local variables off the stack
    local $arg1;              # changing $arg1 here will not be visible outside this scope
    $arg1++;
    local $arg2->{key1};      # only the key1 portion of the hashref referenced by $arg2 is localized
    $arg2->{key1}->{key2} = 'foo';  # this change is not visible outside the function
}
Occasionally I have been bitten by forgetting to localize a data structure that was passed by reference to a function and that I changed inside the function. Conversely, I have also returned a data structure as a function result that was shared among multiple systems, and the caller then proceeded to change the data by mistake, affecting those other systems in a difficult-to-trace problem usually called action at a distance. The best thing to do here would be to make a clone of the data before returning it*, or make it read-only**.
* In Perl, see the function dclone() in the built-in Storable module.
** In Perl, see lock_hash() or lock_hash_ref() in the built-in Hash::Util module.
