Julia function naming: When should I append a bang? - coding-style

The Julia style guide says that functions which "modify their arguments" should have their name end on a !.
However, what about:
functions that do modify their arguments, but return them to their original state before returning?
functions that return a Task which modifies the argument when executed?
functions that return such a Task, but when it is done, the arguments will have been restored to their original states?
Should their names end on a !?
As an example, consider this module for finding exact covers using Knuth's Dancing Links Algorithm. It implements a type CoverSet that can be populated with the subsets and then queried for the first exact cover:
set = CoverSet()
push!(set, [1, 2])
push!(set, [2, 3])
push!(set, [3, 4])
push!(set, [4, 1])
find_exact_cover(set) # returns [1, 3]
The find_exact_cover function temporarily modifies the data in set while searching for a solution, but by the time find_exact_cover returns, set will be in its original state. Should it be named find_exact_cover! instead?
Similarly, exact_cover_producer returns a Task that produces all exact covers, but when that Task is done, set will have been restored:
for cover in exact_cover_producer(set)
println(cover) # prints [1,3] and [2,4]
end
# By now, set is restored.
Should it be exact_cover_producer!?
I am aware that this might be considered subjective, so let me clarify what I am asking for: I would like to know whether there is a convention on this in the Julia community, and ideally also examples from the standard library or any packages that use either style.

See e.g. the discussion at Julia commit e7ce4cba44fa3b508fd50e0c3d03f6bc5a7a5032: the current convention is that a function is mutating and hence has a ! appended if it changes what one of its arguments would be == to.
(But there is some justification for slightly broader definitions as well; see the abovementioned discussion.)

Related

Exceptions to the "Command-Query Separation" Rule?

Command-Query Separation "states that every method should either be a command that performs an action, or a query that returns data to the caller, but not both. In other words, asking a question should not change the answer."
a = [1, 2, 3]
last = a.pop
Here, in Ruby, the pop command returns the item is popped off the Array.
This an example of a command and query within one method, and it seems necessary for it to be so.
If this is the case, is it really a code smell to have a method that is essentially a query as well as a command?
Stack popping is a well-known exception to CQS. Martin Fowler notes it as a good place to break the rule.
I would say in this case it's not a code smell, but in general it is a code smell.

May I write {x,a,b}//Do[...,#]& instead of Do[...,{x,a,b}]?

I'm in love with Ruby. In this language all core functions are actually methods. That's why I prefer postfix notation – when the data, which I want to process is placed left from the body of anonymous processing function, for example: array.map{...}. I believe, that it has advantages in how easy is this code to read.
But Mathetica, being functional (yeah, it can be procedural if you want) dictates a style, where Function name is placed left from the data. As we can see in its manuals, // is used only when it's some simple Function, without arguments, like list // MatrixForm. When Function needs a lot of arguments, people who wrote manuals, use syntax F[data].
It would be okay, but my problem is the case F[f,data], for example Do[function, {x, a, b}]. Most of Mathematica functions (if not all) have arguments in exactly this order – [function, data], not [data, function]. As I prefer to use pure functions to keep namespace clean instead of creating a lot of named functions in my notebook, the argument function can be too big – so big, that argument data would be placed on the 5-20th line of code after the line with Function call.
This is why sometimes, when evil Ruby nature takes me under control, I rewrite such functions in postfix way:
Because it's important for me, that pure function (potentially big code) is placed right from processing data. Yeah I do it and I'm happy. But there are two things:
this causes Mathematica's highlighting parser problem: the x in postfix notation is highlighted with blue color, not turquoise;
everytime when I look into Mathematica manuals, I see examples like this one: Do[x[[i]] = (v[[i]] - U[[i, i + 1 ;; n]].x[[i + 1 ;; n]])/ U[[i, i]], {i, n, 1, -1}];, which means... hell, they think it's easy to read/support/etc.?!
So these two things made me ask this question here: am I so bad boy, that use my Ruby-style, and should I write code like these guys do, or is it OK, and I don't have to worry, and should write as I like to?
The style you propose is frequently possible, but is inadvisable in the case of Do. The problem is that Do has the attribute HoldAll. This is important because the loop variable (x in the example) must remain unevaluated and be treated as a local variable. To see this, try evaluating these expressions:
x = 123;
Do[Print[x], {x, 1, 2}]
(* prints 1 and 2 *)
{x, 1, 2} // Do[Print[x], #]&
(* error: Do::itraw: Raw object 123 cannot be used as an iterator.
Do[Print[x], {123, 1, 2}]
*)
The error occurs because the pure function Do[Print[x], #]& lacks the HoldAll attribute, causing {x, 1, 2} to be evaluated. You could solve the problem by explicitly defining a pure function with the HoldAll attribute, thus:
{x, 1, 2} // Function[Null, Do[Print[x], #], HoldAll]
... but I suspect that the cure is worse than the disease :)
Thus, when one is using "binding" expressions like Do, Table, Module and so on, it is safest to conform with the herd.
I think you need to learn to use the styles that Mathematica most naturally supports. Certainly there is more than one way, and my code does not look like everyone else's. Nevertheless, if you continue to try to beat Mathematica syntax into your own preconceived style, based on a different language, I foresee nothing but continued frustration for you.
Whitespace is not evil, and you can easily add line breaks to separate long arguments:
Do[
x[[i]] = (v[[i]] - U[[i, i + 1 ;; n]].x[[i + 1 ;; n]]) / U[[i, i]]
, {i, n, 1, -1}
];
This said, I like to write using more prefix (f # x) and infix (x ~ f ~ y) notation that I usually see, and I find this valuable because it is easy to determine that such functions are receiving one and two arguments respectively. This is somewhat nonstandard, but I do not think it is kicking over the traces of Mathematica syntax. Rather, I see it as using the syntax to advantage. Sometimes this causes syntax highlighting to fail, but I can live with that:
f[x] ~Do~ {x, 2, 5}
When using anything besides the standard form of f[x, y, z] (with line breaks as needed), you must be more careful of evaluation order, and IMHO, readability can suffer. Consider this contrived example:
{x, y} // # + 1 & ## # &
I do not find this intuitive. Yes, for someone intimate with Mathematica's order of operations, it is readable, but I believe it does not improve clarity. I tend to reserve // postfix for named functions where reading is natural:
Do[f[x], {x, 10000}] //Timing //First
I'd say it is one of the biggest mistakes to try program in a language B in ways idiomatic for a language A, only because you happen to know the latter well and like it. There is nothing wrong in borrowing idioms, but you have to make sure to understand the second language well enough so that you know why other people use it the way they do.
In the particular case of your example, and generally, I want to draw attention to a few things others did not mention. First, Do is a scoping construct which uses dynamic scoping to localize its iterator symbols. Therefore, you have:
In[4]:=
x=1;
{x,1,5}//Do[f[x],#]&
During evaluation of In[4]:= Do::itraw: Raw object
1 cannot be used as an iterator. >>
Out[5]= Do[f[x],{1,1,5}]
What a surprise, isn't it. This won't happen when you use Do in a standard fashion.
Second, note that, while this fact is largely ignored, f[#]&[arg] is NOT always the same as f[arg]. Example:
ClearAll[f];
SetAttributes[f, HoldAll];
f[x_] := Print[Unevaluated[x]]
f[5^2]
5^2
f[#] &[5^2]
25
This does not affect your example, but your usage is close enough to those cases affected by this, since you manipulate the scopes.
Mathematica supports 4 ways of applying a function to its arguments:
standard function form: f[x]
prefix: f#x or g##{x,y}
postfix: x // f, and
infix: x~g~y which is equivalent to g[x,y].
What form you choose to use is up to you, and is often an aesthetic choice, more than anything else. Internally, f#x is interpreted as f[x]. Personally, I primarily use postfix, like you, because I view each function in the chain as a transformation, and it is easier to string multiple transformations together like that. That said, my code will be littered with both the standard form and prefix form mostly depending on whim, but I tend to use standard form more as it evokes a feeling of containment with regards to the functions parameters.
I took a little liberty with the prefix form, as I included the shorthand form of Apply (##) alongside Prefix (#). Of the built in commands, only the standard form, infix form, and Apply allow you easily pass more than one variable to your function without additional work. Apply (e.g. g ## {x,y}) works by replacing the Head of the expression ({x,y}) with the function, in effect evaluating the function with multiple variables (g##{x,y} == g[x,y]).
The method I use to pass multiple variables to my functions using the postfix form is via lists. This necessitates a little more work as I have to write
{x,y} // f[ #[[1]], #[[2]] ]&
to specify which element of the List corresponds to the appropriate parameter. I tend to do this, but you could combine this with Apply like
{x,y} // f ## #&
which involves less typing, but could be more difficult to interpret when you read it later.
Edit: I should point out that f and g above are just placeholders, they can, and often are, replaced with pure functions, e.g. #+1& # x is mostly equivalent to #+1&[x], see Leonid's answer.
To clarify, per Leonid's answer, the equivalence between f#expr and f[expr] is true if f does not posses an attribute that would prevent the expression, expr, from being evaluated before being passed to f. For instance, one of the Attributes of Do is HoldAll which allows it to act as a scoping construct which allows its parameters to be evaluated internally without undo outside influence. The point is expr will be evaluated prior to it being passed to f, so if you need it to remain unevaluated, extra care must be taken, like creating a pure function with a Hold style attribute.
You can certainly do it, as you evidently know. Personally, I would not worry about how the manuals write code, and just write it the way I find natural and memorable.
However, I have noticed that I usually fall into definite patterns. For instance, if I produce a list after some computation and incidentally plot it to make sure it's what I expected, I usually do
prodListAfterLongComputation[
args,
]//ListPlot[#,PlotRange->Full]&
If I have a list, say lst, and I am now focusing on producing a complicated plot, I'll do
ListPlot[
lst,
Option1->Setting1,
Option2->Setting2
]
So basically, anything that is incidental and perhaps not important to be readable (I don't need to be able to instantaneously parse the first ListPlot as it's not the point of that bit of code) ends up being postfix, to avoid disrupting the already-written complicated code it is applied to. Conversely, complicated code I tend to write in the way I find easiest to parse later, which, in my case, is something like
f[
g[
a,
b,
c
]
]
even though it takes more typing and, if one does not use the Workbench/Eclipse plugin, makes it more work to reorganize code.
So I suppose I'd answer your question with "do whatever is most convenient after taking into account the possible need for readability and the possible loss of convenience such as code highlighting, extra work to refactor code etc".
Of course all this applies if you're the only one working with some piece of code; if there are others, it is a different question alltogether.
But this is just an opinion. I doubt it's possible for anybody to offer more than this.
For one-argument functions (f#(arg)), ((arg)//f) and f[arg] are completely equivalent even in the sense of applying of attributes of f. In the case of multi-argument functions one may write f#Sequence[args] or Sequence[args]//f with the same effect:
In[1]:= SetAttributes[f,HoldAll];
In[2]:= arg1:=Print[];
In[3]:= f#arg1
Out[3]= f[arg1]
In[4]:= f#Sequence[arg1,arg1]
Out[4]= f[arg1,arg1]
So it seems that the solution for anyone who likes postfix notation is to use Sequence:
x=123;
Sequence[Print[x],{x,1,2}]//Do
(* prints 1 and 2 *)
Some difficulties can potentially appear with functions having attribute SequenceHold or HoldAllComplete:
In[18]:= Select[{#, ToExpression[#, InputForm, Attributes]} & /#
Names["System`*"],
MemberQ[#[[2]], SequenceHold | HoldAllComplete] &][[All, 1]]
Out[18]= {"AbsoluteTiming", "DebugTag", "EvaluationObject", \
"HoldComplete", "InterpretationBox", "MakeBoxes", "ParallelEvaluate", \
"ParallelSubmit", "Parenthesize", "PreemptProtect", "Rule", \
"RuleDelayed", "Set", "SetDelayed", "SystemException", "TagSet", \
"TagSetDelayed", "Timing", "Unevaluated", "UpSet", "UpSetDelayed"}

Mathematica Module versus With or Block - Guideline, rule of thumb for usage?

Leonid wrote in chapter iv of his book : "... Module, Block and With. These constructs are explained in detail in Mathematica Book and Mathematica Help, so I will say just a few words about them here. ..."
From what I have read ( been able to find ) I am still in the dark. For packaged functions I ( simply ) use Module, because it works and I know the construct. It may not be the best choice though. It is not entirely clear to me ( from the documentation ) when, where or why to use With ( or Block ).
Question. Is there a rule of thumb / guideline on when to use Module, With or Block ( for functions in packages )? Are there limitations compared to Module? The docs say that With is faster. I want to be able to defend my =choice= for Module ( or another construct ).
A more practical difference between Block and Module can be seen here:
Module[{x}, x]
Block[{x}, x]
(*
-> x$1979
x
*)
So if you wish to return eg x, you can use Block. For instance,
Plot[D[Sin[x], x], {x, 0, 10}]
does not work; to make it work, one could use
Plot[Block[{x}, D[Sin[x], x]], {x, 0, 10}]
(of course this is not ideal, it is simply an example).
Another use is something like Block[{$RecursionLimit = 1000},...], which temporarily changes $RecursionLimit (Module would not have worked as it renames $RecursionLimit).
One can also use Block to block evaluation of something, eg
Block[{Sin}, Sin[.5]] // Trace
(*
-> {Block[{Sin},Sin[0.5]],Sin[0.5],0.479426}
*)
ie, it returns Sin[0.5] which is only evaluated after the Block has finished executing. This is because Sin inside the Block is just a symbol, rather than the sine function. You could even do something like
Block[{Sin = Cos[#/4] &}, Sin[Pi]]
(*
-> 1/Sqrt[2]
*)
(use Trace to see how it works). So you can use Block to locally redefine built-in functions, too:
Block[{Plus = Times}, 3 + 2]
(*
-> 6
*)
As you mentioned there are many things to consider and a detailed discussion is possible. But here are some rules of thumb that I apply the majority of the time:
Module[{x}, ...] is the safest and may be needed if either
There are existing definitions for x that you want to avoid breaking during the evaluation of the Module, or
There is existing code that relies on x being undefined (for example code like Integrate[..., x]).
Module is also the only choice for creating and returning a new symbol. In particular, Module is sometimes needed in advanced Dynamic programming for this reason.
If you are confident there aren't important existing definitions for x or any code relying on it being undefined, then Block[{x}, ...] is often faster. (Note that, in a project entirely coded by you, being confident of these conditions is a reasonable "encapsulation" standard that you may wish to enforce anyway, and so Block is often a sound choice in these situations.)
With[{x = ...}, expr] is the only scoping construct that injects the value of x inside Hold[...]. This is useful and important. With can be either faster or slower than Block depending on expr and the particular evaluation path that is taken. With is less flexible, however, since you can't change the definition of x inside expr.
Andrew has already provided a very comprehensive answer. I would just summarize by noting that Module is for defining local variables that can be redefined within the scope of a function definition, while With is for defining local constants, which can't be. You also can't define a local constant based on the definition of another local constant you have set up in the same With statement, or have multiple symbols on the LHS of a definition. That is, the following does not work.
With[{{a,b}= OptionValue /# {opt1,opt2} }, ...]
I tend to set up complicated function definitions with Module enclosing a With. I set up all the local constants I can first inside the With, e.g. the Length of the data passed to the function, if I need that, then other local variables as needed. The reason is that With is a little faster of you genuinely do have constants not variables.
I'd like to mention the official documentation on the difference between Block and Module is available at http://reference.wolfram.com/mathematica/tutorial/BlocksComparedWithModules.html.

Accidental shadowing and `Removed[symbol]`

If you evaluate the following code twice, results will be different. Can anyone explain what's going on?
findHull[points_] := Module[{},
Needs["ComputationalGeometry`"];
ConvexHull[points]
];
findHull[RandomReal[1, {10, 2}]];
Remove["Global`ConvexHull"];
findHull[RandomReal[1, {10, 2}]]
The problem is that even though the module is not evaluated untill you call findHull, the symbols are resolved when you define findHull (i.e.: The new downvalue for findHull is stored in terms of symbols, not text).
This means that during the first round, ConvexHull resolves to Global`ConvexHull because the Needs is not evaluated.
During the second round, ComputationalGeometry is on $ContextPath and so ConvexHull resolves as you intended.
If you really cannot bear to load ComputationalGeometry beforehand, just refer to ConvexHull by its full name: ComputationalGeometry`ConvexHull. See also this related answer.
HTH
Not a direct answer to the question, but a bit too large for a comment. As another alternative, a general way to delay the symbol parsing until run-time is to use Symbol["your-symbol-name"]. In your case, you can replace ConvexHull on the r.h.s. of your definition by Symbol["ConvexHull"]:
findHull[points_] :=
Module[{},
Needs["ComputationalGeometry`"];
Symbol["ConvexHull"][points]];
This solution is not very elegant though, since Symbol["ConvexHull"] will be executed every time afresh. This can also be somewhat error-prone, if you do non-trivial manipulations with $ContextPath. Here is a modified version, combined with a generally useful trick with self-redefinition, that I use in similar cases:
Clear[findHull];
findHull[points_] :=
Module[{},
Needs["ComputationalGeometry`"];
With[{ch = Symbol["ConvexHull"]},
findHull[pts_] := ch[pts];
findHull[points]]];
For example,
findHull[RandomReal[1, {10, 2}]]
{4, 10, 9, 1, 6, 2, 5}
What happens is that the first time the function is called, the original definition with Module gets replaced by the inner one, and that happens already after the needed package is loaded and its context placed on the $ContextPath. Here we exploit the fact that Mathematica replaces an old definition with a new one if it can determine that the patterns are the same - as it can in such cases.
Other instances when self-redefinition trick is useful are cases when, for example, a function call results in some expensive computation, which we want to cache, but we are not sure whether or not the function will be called at all. Then, such construct allows to cache the computed (say, symbolically) result automatically upon the first call to the function.

Programmatically creating multivariate functions in Mathematica

This is a split from discussion on earlier question.
Suppose I need to define a function f which checks if given labeling of a graph is a proper coloring. In other words, we have an integer assigned to every node and no two adjacent nodes get the same answer. For instance, for {"Path",3}, f[{1,2,3}] returns True and f[{1,1,2}] returns False. How would I go about creating such a function for arbitrary graph?
The following does essentially what I need, but generates Part warnings.
g[edges_] := Function ## {{x}, And ## (x[[First[#]]] != x[[Last[#]]] & /# edges)}
f = g[GraphData[{"Path", 3}, "EdgeIndices"]];
f[{1, 2, 1}]==False
This is a toy instance problem I regularly come across -- I need to programmatically create a multivariate function f, and end up with either 1) part warning 2) deferring evaluation of g until evaluation of f
Here's something. When nothing else is working, Hold and rules can usually get the job done. I'm not sure it produces the correct results w.r.t. your graph-coloring question but hopefully gives you a starting place. I ended up using Slot instead of named variable because there were some scoping issues (also present in my previous suggestion, x$ vs. x) when I used a named variable that I didn't spend the time trying to work around.
g[edges_] :=
With[{ors = (Hold ## edges) /. {a_, b_} :> #[[a]] == #[[b]]},
Function[!ors] /. Hold -> Or
]
In[90]:= f = g[GraphData[{"Path", 3}, "EdgeIndices"]]
Out[90]= !(#1[[1]] == #1[[2]] || #1[[2]] == #1[[3]]) &
In[91]:= f[{1, 2, 3}]
Out[91]= True
In[92]:= f[{1, 1, 2}]
Out[92]= False
I feel like it lacks typical Mathematica elegance, but it works. I'll update if I'm inspired with something more beautiful.
There are a couple other solutions to this kind of problem which don't require you to use Hold or ReleaseHold, but instead rely on the fact that Function already has the HoldAll attribute. You first locally "erase" the definitions of Part using Block, so the expression you're interested in can be safely constructed, and then uses With to interpolate that into a Function which can then be safely returned outside of the Block, and also uses the fact that Slot doesn't really mean anything outside of Function.
Using your example:
coloringChecker[edges_List] :=
Block[{Part},
With[{body = And ## Table[#[[First#edge]] != #[[Last#edge]], {edge, edges}]},
body &]]
I don't know if this is all that much less cumbersome than using Hold, but it is different.
I'm still puzzled by the difficulties that you seem to be having. Here's a function which checks that no 2 consecutive elements in a list are identical:
f[l_List] := Length[Split[l]] == Length[l]
No trouble with Part, no error messages for the simple examples I've tried so far including OP's 'test' cases. I also contend that this is more concise and more readable than either of the other approaches seen so far.

Resources