Using a struct to pass parameters by reference - wolfram-mathematica

This question is on the subject of passing by reference in M (a related question of mine is here simple question on passing data between functions)
While I was trying to find a way to pass things by reference without using Unevaluted[] or HoldFirst[] , I hit on this method by mistake and it looks really working well for me, even though I do not understand how it works and any hidden risks of using it. So I'd like to show it here to the experts and ask if they think it is safe to use (I have very large demo and needed to package parameters into number of different structs to help manage them, and this is how I found this method while I was trying things).
Here is the method: First we know that one can not write the following:
Remove[p]
foo[p_] := Module[{u},
u = Table[99, {10}];
p = u
];
p = 0;
foo[p];
One way to to update 'p' in the above is to change to call to become
foo[Unevaluated#p];
Or by defining foo[] with HoldFirst.
But here is the way I found, which does the pass by reference, without either of these:
I put all the parameters in a struct (I do this anyway now), and pass the struct, and then one can update the fields of the struct inside foo[] and the updates will be reflected in the way back from the function call:
Remove[parms]
foo[parms_] := Module[{u},
u = Table[99, {10}];
parms["p"] = u
];
parms["p"] = 0;
foo[parms];
Now, parms["p"] contained the new list {99, 99, 99, 99, 99, 99, 99, 99, 99, 99}
So, the parms was overwritten/updated inside foo[] without me having to tell M to pass parms by reference !
I tried this in my program, and I see no strange side effects. The CDF was updated ok with no error. I do not know how this works, may be M treats all the fields inside parms in this example as global?
But either case, I am happy with this so far as it provides me a way to package my many parameters into structs, and at the same time I am able to do the updated inside the functions to the struct.
But my question is: do you see a major problems with this method? How does it work internally? I mean how does M handle this passing without me doing HoldFirst or Unevaluated? I know that I lost the ability now to do parameters checking as before, but I can't have everything I want. As I said before, M needs a real build-in struct as part of the language and integrated into it. But this is for another time to talk about.
Btw, the best struct emulation I've see so far was by this one by Leonid Shifrin posted at the end of this thread here but unfortunately I could not use it in my demo as it uses symbols not allowed in a demo CDF.
thanks
Update:
Btw, this below is what I think how M handles this. This is just a guess of mine: I think param is some kind of a lookup table, and its fields are just used as an index into it (may be a hash table?) Then param["p1"] will contain in it the address (not the value) of a location in the heap of where the actual value of param["p1"] live.
Something like this:
So, when passing param, then inside the function foo[], when typing param["p1"]=u it will cause the current memory pointed to by "p1" to be freed up, and then a new memory allocated from the heap, and in it the value of u is copied to.
And so, on return back, this is why we see the content of param["p1"] has changed. But what actually changed, is the content of the memory being pointed to by the address which param["p1"] represent. (there is a name, which is "p1", and an address field, which points to the content that the name "p1" represents)
But then this means, that the address itself, which "p1" represents has changed, but the lookup name itself (which is "p1") did not change.
So, as the name itself did not change, there was no need to use HoldFirst, even though the data being pointed to by what this name represent has been modified?
Update:
There is one small glitch in this method, but it is not a big deal to work around it: when passing things using this indexed object method, it turns out that one can't modify part of the object. For example, this below does not work:
foo[param_] := Module[{},
param["u"][[3]] = 99 (*trying to update PART of u *)
];
param["u"] = Table[0, {5}];
foo[param];
The above gives the error
Set::setps: "param[u] in the part assignment is not a symbol"
But the workaround is easy. make a local copy of the whole field that one wants to update part of it, then update (part) of the local copy, then write the copy back to the field, like this
foo[param_] := Module[{u = param["u"]}, (* copy the whole field *)
u[[3]] = 99; (*update local copy *)
param["u"] = u (*now update the field, ok *)
];
param["u"] = Table[0, {5}];
foo[param];
Well. It would have been better if one can update part of the field, so no 'special' handling would be needed. But at least the work around is not that bad.
Update
For completeness, I thought I mention another tiny thing about using indexed objects and a work around.
I wrote
param[u] = {1, 2, 3}
param[u][[1 ;; -1]]
Which returns {1,2,3} as expected.
Then I found that I can use param#u instead of param[u], which really helped as too many solid brackets are starting to give me a headache.
But then when I typed
param#u[[1 ;; -1]]
expecting to get back the same answer as before, it did not. one gets an error, (I think I know why, operator precedence issue, but that is not the point now), just wanted to say that the workaround is easy, either one can use param[u][[1 ;; -1]] or use (param#u)[[1 ;; -1]]
I like the (param#u)[[1 ;; -1]] more, so that is what I am using now to access lists inside indexed objects.
The notation param#u is as close I can get to the standard record notation which is param.u so I am now happy now with indexed objects, and it seems to be working really well. Just couple of minor things to watch out for.

I am not going to respond to the question of low level data structures as I am simply not qualified to do so.
You are creating "indexed objects" and using immutable strings as indices. Generally speaking I believe that these are similar to hash tables.
Taking your code example, let us remove a possible ambiguity by using a unique name for the argument of foo:
Remove[parms, foo]
foo[thing_] :=
Module[{u},
u = Table[99, {10}];
thing["p"] = u
];
parms["p"] = 0;
foo[parms];
parms["p"]
DownValues[parms]
{HoldPattern[parms["p"]] :> {99, 99, 99, 99, 99, 99, 99, 99, 99, 99}}
This shows how the data is stored in terms of high level Mathematica structures.
parms is a global symbol. As you have observed, it can be safely "passed" without anything happening, because it is just a symbol without OwnValues and nothing is triggered when it is evaluated. This is exactly the same as writing/evaluating foo by itself.
The replacement that takes place in foo[thing_] := ... is analogous to With. If we write:
With[{thing = parms}, thing[x]]
thing in thing[x] is replaced with parms before thing[x] evaluates. Likewise, in the code above we get parms["p"] = u before thing["p"] or the Set evaluates. At that time, the HoldFirst attribute of Set takes over, and you get what you want.
Since you are using an immutable string as your index, it is not in danger of changing. As long as you are aware of how to handle non-localized symbols such as parms then there is also no new danger that I see in using this method.

Related

Accidental shadowing and `Removed[symbol]`

If you evaluate the following code twice, results will be different. Can anyone explain what's going on?
findHull[points_] := Module[{},
Needs["ComputationalGeometry`"];
ConvexHull[points]
];
findHull[RandomReal[1, {10, 2}]];
Remove["Global`ConvexHull"];
findHull[RandomReal[1, {10, 2}]]
The problem is that even though the module is not evaluated untill you call findHull, the symbols are resolved when you define findHull (i.e.: The new downvalue for findHull is stored in terms of symbols, not text).
This means that during the first round, ConvexHull resolves to Global`ConvexHull because the Needs is not evaluated.
During the second round, ComputationalGeometry is on $ContextPath and so ConvexHull resolves as you intended.
If you really cannot bear to load ComputationalGeometry beforehand, just refer to ConvexHull by its full name: ComputationalGeometry`ConvexHull. See also this related answer.
HTH
Not a direct answer to the question, but a bit too large for a comment. As another alternative, a general way to delay the symbol parsing until run-time is to use Symbol["your-symbol-name"]. In your case, you can replace ConvexHull on the r.h.s. of your definition by Symbol["ConvexHull"]:
findHull[points_] :=
Module[{},
Needs["ComputationalGeometry`"];
Symbol["ConvexHull"][points]];
This solution is not very elegant though, since Symbol["ConvexHull"] will be executed every time afresh. This can also be somewhat error-prone, if you do non-trivial manipulations with $ContextPath. Here is a modified version, combined with a generally useful trick with self-redefinition, that I use in similar cases:
Clear[findHull];
findHull[points_] :=
Module[{},
Needs["ComputationalGeometry`"];
With[{ch = Symbol["ConvexHull"]},
findHull[pts_] := ch[pts];
findHull[points]]];
For example,
findHull[RandomReal[1, {10, 2}]]
{4, 10, 9, 1, 6, 2, 5}
What happens is that the first time the function is called, the original definition with Module gets replaced by the inner one, and that happens already after the needed package is loaded and its context placed on the $ContextPath. Here we exploit the fact that Mathematica replaces an old definition with a new one if it can determine that the patterns are the same - as it can in such cases.
Other instances when self-redefinition trick is useful are cases when, for example, a function call results in some expensive computation, which we want to cache, but we are not sure whether or not the function will be called at all. Then, such construct allows to cache the computed (say, symbolically) result automatically upon the first call to the function.

Programmatically creating multivariate functions in Mathematica

This is a split from discussion on earlier question.
Suppose I need to define a function f which checks if given labeling of a graph is a proper coloring. In other words, we have an integer assigned to every node and no two adjacent nodes get the same answer. For instance, for {"Path",3}, f[{1,2,3}] returns True and f[{1,1,2}] returns False. How would I go about creating such a function for arbitrary graph?
The following does essentially what I need, but generates Part warnings.
g[edges_] := Function ## {{x}, And ## (x[[First[#]]] != x[[Last[#]]] & /# edges)}
f = g[GraphData[{"Path", 3}, "EdgeIndices"]];
f[{1, 2, 1}]==False
This is a toy instance problem I regularly come across -- I need to programmatically create a multivariate function f, and end up with either 1) part warning 2) deferring evaluation of g until evaluation of f
Here's something. When nothing else is working, Hold and rules can usually get the job done. I'm not sure it produces the correct results w.r.t. your graph-coloring question but hopefully gives you a starting place. I ended up using Slot instead of named variable because there were some scoping issues (also present in my previous suggestion, x$ vs. x) when I used a named variable that I didn't spend the time trying to work around.
g[edges_] :=
With[{ors = (Hold ## edges) /. {a_, b_} :> #[[a]] == #[[b]]},
Function[!ors] /. Hold -> Or
]
In[90]:= f = g[GraphData[{"Path", 3}, "EdgeIndices"]]
Out[90]= !(#1[[1]] == #1[[2]] || #1[[2]] == #1[[3]]) &
In[91]:= f[{1, 2, 3}]
Out[91]= True
In[92]:= f[{1, 1, 2}]
Out[92]= False
I feel like it lacks typical Mathematica elegance, but it works. I'll update if I'm inspired with something more beautiful.
There are a couple other solutions to this kind of problem which don't require you to use Hold or ReleaseHold, but instead rely on the fact that Function already has the HoldAll attribute. You first locally "erase" the definitions of Part using Block, so the expression you're interested in can be safely constructed, and then uses With to interpolate that into a Function which can then be safely returned outside of the Block, and also uses the fact that Slot doesn't really mean anything outside of Function.
Using your example:
coloringChecker[edges_List] :=
Block[{Part},
With[{body = And ## Table[#[[First#edge]] != #[[Last#edge]], {edge, edges}]},
body &]]
I don't know if this is all that much less cumbersome than using Hold, but it is different.
I'm still puzzled by the difficulties that you seem to be having. Here's a function which checks that no 2 consecutive elements in a list are identical:
f[l_List] := Length[Split[l]] == Length[l]
No trouble with Part, no error messages for the simple examples I've tried so far including OP's 'test' cases. I also contend that this is more concise and more readable than either of the other approaches seen so far.

Caching of data in Mathematica

there is a very time-consuming operation which generates a dataset in my package. I would like to save this dataset and let the package rebuild it only when I manually delete the cached file. Here is my approach as part of the package:
myDataset = Module[{fname, data},
fname = "cached-data.mx";
If[FileExistsQ[fname],
Get[fname],
data = Evaluate[timeConsumingOperation[]];
Put[data, fname];
data]
];
timeConsumingOperation[]:=Module[{},
(* lot of work here *)
{"data"}
];
However, instead of writing the long data set to the file, the Put command only writes one line: "timeConsumingOperation[]", even if I wrap it with Evaluate as above. (To be true, this behaviour is not consistent, sometimes the dataset is written, sometimes not.)
How do you cache your data?
Another caching technique I use very often, especially when you might not want to insert the precomputed form in e.g. a package, is to memoize the expensive evaluation(s), such that it is computed on first use but then cached for subsequent evaluations. This is readily accomplished with SetDelayed and Set in concert:
f[arg1_, arg2_] := f[arg1, arg2] = someExpensiveThing[arg1, arg2]
Note that SetDelayed (:=) binds higher than Set (=), so the implied order of evaluation is the following, but you don't actually need the parens:
f[arg1_, arg2_] := ( f[arg1, arg2] = someExpensiveThing[arg1, arg2])
Thus, the first time you evaluate f[1,2], the evaluation-delayed RHS is evaluated, causing resulting value is computed and stored as an OwnValue of f[1,2] with Set.
#rcollyer is also right in that you don't need to use empty brackets if you have no arguments, you could just as easily write:
g := g = someExpensiveThing[...]
There's no harm in using them, though.
In the past, whenever I've had trouble with things evaluating it is usually when I have not correctly matched the pattern required by the function. For instance,
f[x_Integers]:= x
which won't match anything. Instead, I meant
f[x_Integer]:=x
In your case, though, you have no pattern to match: timeConsumingOperation[].
You're problem is more likely related to when timeConsumingOperation is defined relative to myDataset. In the code you've posted above, timeConsumingOperation is defined after myDataset. So, on the first run (or immediately after you've cleared the global variables) you would get exactly the result you're describing because timeConsumingOperation is not defined when the code for myDataset is run.
Now, SetDelayed (:=) automatically causes the variable to be recalculated whenever it is used, and since you do not require any parameters to be passed, the square brackets are not necessary. The important point here is that timeConsumingOperation can be declared, as written, prior to myDataset because SetDelayed will cause it not to be executed until it is used.
All told, your caching methodology looks exactly how I would go about it.

Using function arguments as local variables

Something like this (yes, this doesn't deal with some edge cases - that's not the point):
int CountDigits(int num) {
int count = 1;
while (num >= 10) {
count++;
num /= 10;
}
return count;
}
What's your opinion about this? That is, using function arguments as local variables.
Both are placed on the stack, and pretty much identical performance wise, I'm wondering about the best-practices aspects of this.
I feel like an idiot when I add an additional and quite redundant line to that function consisting of int numCopy = num, however it does bug me.
What do you think? Should this be avoided?
As a general rule, I wouldn't use a function parameter as a local processing variable, i.e. I treat function parameters as read-only.
In my mind, intuitively understandabie code is paramount for maintainability, and modifying a function parameter to use as a local processing variable tends to run counter to that goal. I have come to expect that a parameter will have the same value in the middle and bottom of a method as it does at the top. Plus, an aptly-named local processing variable may improve understandability.
Still, as #Stewart says, this rule is more or less important depending on the length and complexity of the function. For short simple functions like the one you show, simply using the parameter itself may be easier to understand than introducing a new local variable (very subjective).
Nevertheless, if I were to write something as simple as countDigits(), I'd tend to use a remainingBalance local processing variable in lieu of modifying the num parameter as part of local processing - just seems clearer to me.
Sometimes, I will modify a local parameter at the beginning of a method to normalize the parameter:
void saveName(String name) {
name = (name != null ? name.trim() : "");
...
}
I rationalize that this is okay because:
a. it is easy to see at the top of the method,
b. the parameter maintains its the original conceptual intent, and
c. the parameter is stable for the rest of the method
Then again, half the time, I'm just as apt to use a local variable anyway, just to get a couple of extra finals in there (okay, that's a bad reason, but I like final):
void saveName(final String name) {
final String normalizedName = (name != null ? name.trim() : "");
...
}
If, 99% of the time, the code leaves function parameters unmodified (i.e. mutating parameters are unintuitive or unexpected for this code base) , then, during that other 1% of the time, dropping a quick comment about a mutating parameter at the top of a long/complex function could be a big boon to understandability:
int CountDigits(int num) {
// num is consumed
int count = 1;
while (num >= 10) {
count++;
num /= 10;
}
return count;
}
P.S. :-)
parameters vs arguments
http://en.wikipedia.org/wiki/Parameter_(computer_science)#Parameters_and_arguments
These two terms are sometimes loosely used interchangeably; in particular, "argument" is sometimes used in place of "parameter". Nevertheless, there is a difference. Properly, parameters appear in procedure definitions; arguments appear in procedure calls.
So,
int foo(int bar)
bar is a parameter.
int x = 5
int y = foo(x)
The value of x is the argument for the bar parameter.
It always feels a little funny to me when I do this, but that's not really a good reason to avoid it.
One reason you might potentially want to avoid it is for debugging purposes. Being able to tell the difference between "scratchpad" variables and the input to the function can be very useful when you're halfway through debugging.
I can't say it's something that comes up very often in my experience - and often you can find that it's worth introducing another variable just for the sake of having a different name, but if the code which is otherwise cleanest ends up changing the value of the variable, then so be it.
One situation where this can come up and be entirely reasonable is where you've got some value meaning "use the default" (typically a null reference in a language like Java or C#). In that case I think it's entirely reasonable to modify the value of the parameter to the "real" default value. This is particularly useful in C# 4 where you can have optional parameters, but the default value has to be a constant:
For example:
public static void WriteText(string file, string text, Encoding encoding = null)
{
// Null means "use the default" which we would document to be UTF-8
encoding = encoding ?? Encoding.UTF8;
// Rest of code here
}
About C and C++:
My opinion is that using the parameter as a local variable of the function is fine because it is a local variable already. Why then not use it as such?
I feel silly too when copying the parameter into a new local variable just to have a modifiable variable to work with.
But I think this is pretty much a personal opinion. Do it as you like. If you feel sill copying the parameter just because of this, it indicates your personality doesn't like it and then you shouldn't do it.
If I don't need a copy of the original value, I don't declare a new variable.
IMO I don't think mutating the parameter values is a bad practice in general,
it depends on how you're going to use it in your code.
My team coding standard recommends against this because it can get out of hand. To my mind for a function like the one you show, it doesn't hurt because everyone can see what is going on. The problem is that with time functions get longer, and they get bug fixes in them. As soon as a function is more than one screen full of code, this starts to get confusing which is why our coding standard bans it.
The compiler ought to be able to get rid of the redundant variable quite easily, so it has no efficiency impact. It is probably just between you and your code reviewer whether this is OK or not.
I would generally not change the parameter value within the function. If at some point later in the function you need to refer to the original value, you still have it. in your simple case, there is no problem, but if you add more code later, you may refer to 'num' without realizing it has been changed.
The code needs to be as self sufficient as possible. What I mean by that is you now have a dependency on what is being passed in as part of your algorithm. If another member of your team decides to change this to a pass by reference then you might have big problems.
The best practice is definitely to copy the inbound parameters if you expect them to be immutable.
I typically don't modify function parameters, unless they're pointers, in which case I might alter the value that's pointed to.
I think the best-practices of this varies by language. For example, in Perl you can localize any variable or even part of a variable to a local scope, so that changing it in that scope will not have any affect outside of it:
sub my_function
{
my ($arg1, $arg2) = #_; # get the local variables off the stack
local $arg1; # changing $arg1 here will not be visible outside this scope
$arg1++;
local $arg2->{key1}; # only the key1 portion of the hashref referenced by $arg2 is localized
$arg2->{key1}->{key2} = 'foo'; # this change is not visible outside the function
}
Occasionally I have been bitten by forgetting to localize a data structure that was passed by reference to a function, that I changed inside the function. Conversely, I have also returned a data structure as a function result that was shared among multiple systems and the caller then proceeded to change the data by mistake, affecting these other systems in a difficult-to-trace problem usually called action at a distance. The best thing to do here would be to make a clone of the data before returning it*, or make it read-only**.
* In Perl, see the function dclone() in the built-in Storable module.
** In Perl, see lock_hash() or lock_hash_ref() in the built-in Hash::Util module).

Introspection of messages generated in Mathematica

Is there any way to get at the actual messages generated during the evaluation of an expression in Mathematica? Say I'm numerically solving an ODE and it blows up, like so
In[1] := sol = NDSolve[{x'[t] == -15 x[t], x[0] == 1}, x, {t, 0, 1},
Method -> "ExplicitEuler"];
In this case, I'll get the NDSolve::mxst error, telling me the maximum number of 10000 steps was reached at t == 0.08671962566152185. Now, if I look at the $MessageList variable, I only receive the message name; in particular, the information about the value of t where NDSolve decided to quit has been lost.
Now, I can always get that information from sol using the InterpolatingFunctionDomain function from one of the standard add-on packages, but if I can somehow pull it out of the message, it would be quite helpful.
You might be able to use $MessagePrePrint to set up a function which would store away each of the messages for later retrieval.
I don't know if this will work, but if the only thing you want to know are the values of specific parameters at the point of error then a kludgy way of getting them would be to define those variables with dummy values globally. This works with loop counters, but I don't know if it works from within NDSolve. Another kludge would be to make t Dynamic and have an evaluated cell with t.
A more elegant (and probably the correct) approach would be to use Reap and Sow.

Resources