How to debug Julia macros? - debugging

Note: This question refers to Julia v1.6. Of course, at any time the answers should ideally also answer the question for the most recent version.
There seem to be a lot of questions and confusion about macro hygiene in Julia. While I read the manual pages in question, I still really struggle to write macros while using things like interpolation ($name), quote and other quoting syntax, the differences in behavior between macros and functions acting on expressions, esc, etc.
What are the tools Julia provides for finding bugs in macros and how to use them effectively?
This is certainly a broad question, which I think very much deserves a dedicated manual page rather than the current afterthought in an overview of metaprogramming. Nevertheless, I think it can be answered effectively (i.e., in a way that teaches me and others a lot about the main, general question) by considering and debugging a concrete example. Hence, I will discuss a simple toy-example macro:
(Note that the macro Base.@locals "Construct[s] a dictionary of the names (as symbols) and values of all local variables defined as of the call site" [from the docstring].)
# Julia 1.5
module MyModule
foo = "MyModule's foo"
macro mac(print_local=true)
    println("Dump of argument:{")
    dump(print_local)
    println("}\n\n")
    local_inmacro = "local in the macro"
    return quote
        println(repeat("-", 30)) # better readability of output
        # intention: use variable local to the macro to make a temporary variable in the user's scope
        # (can you think of a reason why one might want to do this?)
        var_inquote = $local_inmacro * "_modified"
        # intention: evaluate `print_local` in user scope
        # (THIS CONTAINS AN ERROR ON PURPOSE!
        # One should write `if $(esc(print_local))` to achieve intention.)
        if $print_local
            # intention: get local variables in caller scope
            println("Local in caller scope: ", Base.@locals)
        else
            # intention: local to macro or module MyModule.
            println($foo)
            println($local_inmacro)
            println(var_inquote)
        end
    end
end
end # module MyModule
Some code to test this
function testmacro()
    foo = "caller's foo"
    MyModule.@mac # prints `Dict` containing "caller's foo"
    MyModule.@mac true # (Exactly the same)
    MyModule.@mac false # prints stuff local to `@mac` and `MyModule`
    # If a variable name is passed instead of `true` or `false`,
    # it doesn't work. This is because of macro hygiene,
    # which renames and rescopes interpolated variables.
    # (Intended behaviour is achieved by properly escaping the variable in the macro)
    var_false = false
    MyModule.@mac var_false # gives `UndefVarError`
end
testmacro()
Pretend that you don't understand why the error happens. How do we find out what's going on?
Debugging techniques (that I'm aware of) include:
@macroexpand (expr) : expand all macros inside (expr)
@macroexpand1 (expr) : expand only the outer-most macro in (expr), usually just the macro you are debugging. Useful, e.g., if the macro you're debugging returns expressions with @warn inside, which you don't want to see expanded.
macroexpand(m::Module, x; recursive=true) : combines the above two and allows specifying the "caller" module
dump(arg) : can be used inside a macro to inspect its argument arg.
eval(expr) : to evaluate expressions (should almost never be used inside a macro body).
Please help add useful things to this list.
Using dump reveals that the argument print_local during the problematic (i.e. last) macro call is a Symbol, to be exact, it has the value :var_false.
Let's look at the expression that the macro returns. This can be done, e.g., by replacing the last macro call (MyModule.@mac var_false) with return (@macroexpand1 MyModule.@mac var_false). Result:
quote
    #= <CENSORED PATH>.jl:14 =#
    Main.MyModule.println(Main.MyModule.repeat("-", 30))
    #= <CENSORED PATH>.jl:18 =#
    var"#5#var_inquote" = "local in the macro" * "_modified"
    #= <CENSORED PATH>.jl:23 =#
    if Main.MyModule.var_false
        #= <CENSORED PATH>.jl:25 =#
        Main.MyModule.println("Local in caller scope: ", #= <CENSORED PATH>.jl:25 =# Base.@locals())
    else
        #= <CENSORED PATH>.jl:28 =#
        Main.MyModule.println("MyModule's foo")
        #= <CENSORED PATH>.jl:29 =#
        Main.MyModule.println("local in the macro")
        #= <CENSORED PATH>.jl:30 =#
        Main.MyModule.println(var"#5#var_inquote")
    end
end
We could manually remove the annoying comments (surely there is a built-in way to do that?).
In this simplistic example, the debugging tools listed here are enough to see the problem. We notice that the if statement in the macro's return expression "rescopes" the interpolated symbol to the macro's parent module: it looks at Main.MyModule.var_false. We intended for it to be Main.var_false in the caller scope.
One can solve this problem by replacing if $print_local by if $(esc(print_local)). In that case, macro hygiene will leave the contents of the print_local variable alone. I am still a bit confused as to the order and placement of esc and $ for interpolation into expressions.
Suppose that we mess up and write if $esc(print_local) instead, thus interpolating the esc function into the expression rather than escaping anything (similar mistakes have cost me quite a bit of headache). The returned expression (obtained via @macroexpand1) then cannot be executed via eval, since the esc function behaves strangely outside of a macro, resulting in things like :($(Expr(:escape, <something>))). In fact, I am generally confused as to when expressions obtained via @macroexpand are actually executable (to the same effect as the macro call) and how to execute them (eval doesn't always do the trick). Any thoughts on this?

Related

What is @ in Julia?

Recently I started learning Julia and have studied a lot of examples. I noticed the @ sign/syntax a couple of times. Here is an example:
using DataFrames
using Statistics
df = DataFrame(x = rand(10), y = rand(10))
@df df scatter(:x, :y)
This will simply create a scatterplot. You could also use scatter(df[!, :x], df[!, :y]) without the @ and get the same result. I can't find any documentation about this syntax. So I was wondering what this syntax is and when you should use it in Julia?
When you do not know how something works, try typing ? followed by what you want to know in the Julia REPL.
For example, typing ?@ and pressing ENTER yields:
The at sign followed by a macro name marks a macro call. Macros provide the ability to include generated code in the
final body of a program. A macro maps a tuple of arguments, expressed as space-separated expressions or a
function-call-like argument list, to a returned expression. The resulting expression is compiled directly into the
surrounding code. See Metaprogramming for more details and examples.
Macros are a very advanced language concept. They generally take code as an argument and generate new code that gets compiled.
Consider this macro:
macro myshow(expr)
    es = string(expr)
    quote
        println($es," = ",$expr)
    end
end
Which can be used as:
julia> @myshow 2+2
2 + 2 = 4
To understand what is really going on, try @macroexpand:
julia> @macroexpand @myshow 2+2
quote
    Main.println("2 + 2", " = ", 2 + 2)
end
You can see that one Julia expression (2+2) has been wrapped in additional Julia code. You can try @macroexpand with other macros that you are using.
For more information see the Metaprogramming section of Julia manual.
What is @ in Julia?
Macros have a dedicated character in Julia's syntax: the @ (at-sign), followed by the unique name declared in a macro NAME ... end block.
So in the example you noted, @df is a macro, and df is its name.
Read here about macros. This concept belongs to the metaprogramming feature of Julia. I guess you used the StatsPlots.jl package, since @df is one of its prominent tools; using @macroexpand, you can investigate the functionality of the given macro:
julia> using StatsPlots
julia> @macroexpand @df df scatter(:x, :y)
:(((var"##312"->begin
((var"##x#313", var"##y#314"), var"##315") = (StatsPlots).extract_columns_and_names(var"##312", :x, :y)
(StatsPlots).add_label(["x", "y"], scatter, var"##x#313", var"##y#314")
end))(df))

Overriding Ruby's & and | methods doesn't require . operator? [duplicate]

I'm wondering why calls to operator methods don't require a dot? Or rather, why can't normal methods be called without a dot?
Example
class Foo
  def +(object)
    puts "this will work"
  end
  def plus(object)
    puts "this won't"
  end
end
f = Foo.new
f + "anything" # "this will work"
f plus "anything" # NoMethodError: undefined method `plus' for main:Object
The answer to this question, as to pretty much every language design question is: "Just because". Language design is a series of mostly subjective trade-offs. And for most of those subjective trade-offs, the only correct answer to the question why something is the way it is, is simply "because Matz said so".
There are certainly other choices:
Lisp doesn't have operators at all. +, -, ::, >, = and so on are simply normal legal function names (variable names, actually), just like foo or bar?
(plus 1 2)
(+ 1 2)
Smalltalk almost doesn't have operators. The only special casing Smalltalk has is that methods which consist only of operator characters do not have to end with a colon. In particular, since there are no operators, all method calls have the same precedence and are evaluated strictly left-to-right: 2 + 3 * 4 is 20, not 14.
1 plus: 2
1 + 2
Scala almost doesn't have operators. Just like Lisp and Smalltalk, *, -, #::: and so on are simply legal method names. (Actually, they are also legal class, trait, type and field names.) Any method can be called either with or without a dot. If you use the form without the dot and the method takes only a single argument, then you can leave off the brackets as well. Scala does have precedence, though, although it is not user-definable; it is simply determined by the first character of the name. As an added twist, operator method names that end with a colon are inverted or right-associative, i.e. a :: b is equivalent to b.::(a) and not a.::(b).
1.plus(2)
1 plus(2)
1 plus 2
1.+(2)
1 +(2)
1 + 2
In Haskell, any function whose name consists of operator symbols is considered an operator. Any function can be treated as an operator by enclosing it in backticks and any operator can be treated as a function by enclosing it in brackets. In addition, the programmer can freely define associativity, fixity and precedence for user-defined operators.
plus 1 2
1 `plus` 2
(+) 1 2
1 + 2
There is no particular reason why Ruby couldn't support user-defined operators in a style similar to Scala. There is a reason why Ruby can't support arbitrary methods in operator position, simply because
foo plus bar
is already legal, and thus this would be a backwards-incompatible change.
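To make the backwards-compatibility point concrete, here is a minimal sketch of how Ruby reads such a line today. The method names foo and plus are only placeholders taken from the snippet above, and the string return values are an assumption added so the nesting becomes visible:

```ruby
# Hypothetical helper methods, named after the snippet above.
def plus(value)
  "plus(#{value})"
end

def foo(value)
  "foo(#{value})"
end

# `foo plus bar` is parsed as nested method calls without parentheses,
# i.e. exactly like foo(plus(bar)). It is not an infix call on foo.
def nested_call
  bar = 42
  foo plus bar
end

puts nested_call # prints foo(plus(42))
```

Because this form already has a meaning, reinterpreting it as "send plus to foo with argument bar" would silently change existing programs.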
Another thing to consider is that Ruby wasn't actually fully designed in advance. It was designed through its implementation. Which means that in a lot of places, the implementation is leaking through. For example, there is absolutely no logical reason why
puts(!true)
is legal but
puts(not true)
isn't. The only reason why this is so, is because Matz used an LALR(1) parser to parse a non-LALR(1) language. If he had designed the language first, he would have never picked an LALR(1) parser in the first place, and the expression would be legal.
The Refinement feature currently being discussed on ruby-core is another example. The way it is currently specified, will make it impossible to optimize method calls and inline methods, even if the program in question doesn't actually use Refinements at all. With just a simple tweak, it can be just as expressive and powerful, and ensure that the pessimization cost is only incurred for scopes that actually use Refinements. Apparently, the sole reason why it was specified this way, is that a) it was easier to prototype this way, and b) YARV doesn't have an optimizer, so nobody even bothered to think about the implications (well, nobody except Charles Oliver Nutter).
So, for basically any question you have about Ruby's design, the answer will almost always be either "because Matz said so" or "because in 1993 it was easier to implement that way".
The implementation doesn't have the additional complexity that would be needed to allow generic definition of new operators.
Instead, Ruby has a Yacc parser that uses a statically defined grammar. You get the built-in operators and that's it. Symbols occur in a fixed set of sentences in the grammar. As you have noted, the operators can be overloaded, which is more than most languages offer.
Certainly it's not because Matz was lazy.
Ruby actually has a fiendishly complex grammar that is roughly at the limit of what can be accomplished in Yacc. To get more complex would require using a less portable compiler generator or it would have required writing the parser by hand in C, and doing that would have limited future implementation portability in its own way as well as not providing the world with the Yacc input. That would be a problem because Ruby's Yacc source code is the only Ruby grammar documentation and is therefore "the standard".
Because Ruby has "syntax sugar" that allows for a variety of convenient syntax for preset situations. For example:
class Foo
  def bar=( o ); end
end
# This is actually calling the bar= method with a parameter, not assigning a value
Foo.new.bar = 42
Here's a list of the operator expressions that may be implemented as methods in Ruby.
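As a rough illustration of that sugar, the sketch below shows that an operator expression and a setter assignment are ordinary method calls underneath. The Temperature class and its attribute are made up for this example:

```ruby
# Hypothetical class, for illustration only.
class Temperature
  attr_reader :degrees

  def initialize(degrees)
    @degrees = degrees
  end

  # Operator method: invoked by `a + b`.
  def +(other)
    Temperature.new(degrees + other.degrees)
  end

  # Setter method: invoked by `t.degrees = value`.
  def degrees=(value)
    @degrees = Float(value)
  end
end

a = Temperature.new(20.0)
b = Temperature.new(1.5)

# Three spellings of the same method call.
puts (a + b).degrees                # sugar
puts a.+(b).degrees                 # explicit dot call
puts a.public_send(:+, b).degrees   # dynamic dispatch by name

# The assignment form calls degrees=, which converts its argument here.
a.degrees = "25"
puts a.degrees
```

The dot-less form is available only for the fixed set of operator and setter names the parser knows about, which is why a method called plus still needs the dot.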
Because Ruby's syntax was designed to look roughly like popular OO languages, and those use the dot operator to call methods. The language it borrowed its object model from, Smalltalk, didn't use dots for messages, and in fact had a fairly "weird" syntax that many people found off-putting. Ruby has been called "Smalltalk with an Algol syntax," where Algol is the language that gave us the conventions you're talking about here. (Of course, there are actually more differences than just the Algol syntax.)
Omitting parentheses was something of an "advantage" in Ruby 1.8, but in Ruby 1.9 you can't even write method_0 method_1 some param; it will be rejected. So the language is moving toward the stricter form rather than free-form syntax.

Provide alias for Ruby's built-in keyword

For example, I want to make Object#rescue another name so I can use in my code like:
def dangerous
  something_dangerous!
dont_worry # instead of rescue here
  false
end
I tried
class ::Object
  alias :dont_worry :rescue
end
But cannot find the rescue method on Object:
`<class:Object>': undefined method `rescue' for class `Object' (NameError)
Another example is I would like to have when in the language to replace:
if cond
  # eval when cond is truthy
end
to
when cond
  # eval when cond is truthy
end
Is it possible to give a Ruby keyword alias done in Ruby?
Or I need to hack on Ruby C source code?
Thanks!
This is not possible without some deep changes to the Ruby language itself. The things you describe are not methods but keywords of the language, i.e. the actual core of what is Ruby. As such, these things are not user-changeable at all.
If you still want to change the names of the keywords, you would at least have to adapt the language parser. If you don't change semantics at all, this might do it as is. But if you want to change what these keywords represent, things get messy really quick.
Also note that Ruby itself is sometimes quite ambiguous (e.g. with regard to parentheses, dots, and spacing) and goes to great lengths to resolve this in a mostly consistent way. If you change keywords, you would have to ensure that things don't get any more ambiguous. This could happen, for example, with your change of if to when: when is already used as a keyword in case statements and could thus be a source of ambiguity when used as an if.
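If the goal is only a friendlier name, one possible workaround that stays inside plain Ruby is a block-taking method that wraps the keyword. This is a sketch under that assumption, not an alias: the helper dont_worry and its fallback parameter are hypothetical, and the call site has to use a block instead of a bare clause.

```ruby
# Keywords are not methods, so there is nothing for `alias` to look up.
puts Object.private_method_defined?(:rescue) # prints false
puts Object.private_method_defined?(:puts)   # prints true (Kernel#puts is a real method)

# Hypothetical helper: runs the block and returns `fallback` if it raises.
def dont_worry(fallback = nil)
  yield
rescue StandardError
  fallback
end

def dangerous
  dont_worry(false) do
    raise "something dangerous happened"
  end
end

puts dangerous.inspect # prints false
```

The same idea does not carry over to if/when, because a method cannot decide whether the code following it is parsed as a conditional body.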

Mathematica Module versus With or Block - Guideline, rule of thumb for usage?

Leonid wrote in chapter iv of his book : "... Module, Block and With. These constructs are explained in detail in Mathematica Book and Mathematica Help, so I will say just a few words about them here. ..."
From what I have read ( been able to find ) I am still in the dark. For packaged functions I ( simply ) use Module, because it works and I know the construct. It may not be the best choice though. It is not entirely clear to me ( from the documentation ) when, where or why to use With ( or Block ).
Question. Is there a rule of thumb / guideline on when to use Module, With or Block ( for functions in packages )? Are there limitations compared to Module? The docs say that With is faster. I want to be able to defend my choice for Module ( or another construct ).
A more practical difference between Block and Module can be seen here:
Module[{x}, x]
Block[{x}, x]
(*
-> x$1979
x
*)
So if you wish to return eg x, you can use Block. For instance,
Plot[D[Sin[x], x], {x, 0, 10}]
does not work; to make it work, one could use
Plot[Block[{x}, D[Sin[x], x]], {x, 0, 10}]
(of course this is not ideal, it is simply an example).
Another use is something like Block[{$RecursionLimit = 1000},...], which temporarily changes $RecursionLimit (Module would not have worked as it renames $RecursionLimit).
One can also use Block to block evaluation of something, eg
Block[{Sin}, Sin[.5]] // Trace
(*
-> {Block[{Sin},Sin[0.5]],Sin[0.5],0.479426}
*)
ie, it returns Sin[0.5] which is only evaluated after the Block has finished executing. This is because Sin inside the Block is just a symbol, rather than the sine function. You could even do something like
Block[{Sin = Cos[#/4] &}, Sin[Pi]]
(*
-> 1/Sqrt[2]
*)
(use Trace to see how it works). So you can use Block to locally redefine built-in functions, too:
Block[{Plus = Times}, 3 + 2]
(*
-> 6
*)
As you mentioned there are many things to consider and a detailed discussion is possible. But here are some rules of thumb that I apply the majority of the time:
Module[{x}, ...] is the safest and may be needed if either
There are existing definitions for x that you want to avoid breaking during the evaluation of the Module, or
There is existing code that relies on x being undefined (for example code like Integrate[..., x]).
Module is also the only choice for creating and returning a new symbol. In particular, Module is sometimes needed in advanced Dynamic programming for this reason.
If you are confident there aren't important existing definitions for x or any code relying on it being undefined, then Block[{x}, ...] is often faster. (Note that, in a project entirely coded by you, being confident of these conditions is a reasonable "encapsulation" standard that you may wish to enforce anyway, and so Block is often a sound choice in these situations.)
With[{x = ...}, expr] is the only scoping construct that injects the value of x inside Hold[...]. This is useful and important. With can be either faster or slower than Block depending on expr and the particular evaluation path that is taken. With is less flexible, however, since you can't change the definition of x inside expr.
Andrew has already provided a very comprehensive answer. I would just summarize by noting that Module is for defining local variables that can be redefined within the scope of a function definition, while With is for defining local constants, which can't be. You also can't define a local constant based on the definition of another local constant you have set up in the same With statement, or have multiple symbols on the LHS of a definition. That is, the following does not work.
With[{{a,b}= OptionValue /@ {opt1,opt2} }, ...]
I tend to set up complicated function definitions with Module enclosing a With. I set up all the local constants I can first inside the With, e.g. the Length of the data passed to the function, if I need that, then other local variables as needed. The reason is that With is a little faster if you genuinely do have constants rather than variables.
I'd like to mention the official documentation on the difference between Block and Module is available at http://reference.wolfram.com/mathematica/tutorial/BlocksComparedWithModules.html.

Overloading Set[a, b] (a = b)

I would like to overload Mathematica's Set function (=), which turns out to be too tricky for me (see following code example). I successfully overloaded other functions (e.g. Reverse in the code example). Any suggestions?
In[17]:= ClearAll[struct];
In[18]:= var1=struct[{1,2}]
Out[18]= struct[{1,2}]
In[19]:= Reverse@var1
Out[19]= struct[{1,2}]
In[20]:= Head[var1]
Out[20]= struct
In[21]:= struct/:Reverse[stuff_struct]:=struct[Reverse@stuff[[1]]]
In[22]:= Reverse@var1
Out[22]= struct[{2,1}]
In[23]:= struct/:Set[stuff_struct,rhs_]:=Set[struct[[1]],rhs]
In[24]:= var1="Success!"
Out[24]= Success!
In[25]:= var1
Out[25]= Success!
In[26]:= Head[var1]
Out[26]= String
In[27]:= ??struct
Global`struct
Reverse[stuff_struct]^:=struct[Reverse[stuff[[1]]]]
(stuff_struct=rhs_)^:=struct[[1]]=rhs
I don't think that what you want can be done with UpValues (alas), since the symbol (tag) must be not deeper than level one for definition to work. Also, the semantics you want is somewhat unusual in Mathematica, since most Mathematica expressions are immutable (not L-values), and their parts can not be assigned values. I believe that this code will do something similar to what you want:
Unprotect[Set];
Set[var_Symbol, rhs_] /;
MatchQ[Hold[var] /. OwnValues[var], Hold[_struct]] := Set[var[[1]], rhs];
Protect[Set];
For example:
In[33]:= var1 = struct[{1, 2}]
Out[33]= struct[{1, 2}]
In[34]:= var1 = "Success!"
Out[34]= "Success!"
In[35]:= var1
Out[35]= struct["Success!"]
But generally, adding DownValues to such important commands as Set is not recommended since this may corrupt the system in subtle ways.
EDIT
Expanding a bit on why your attempt failed: Mathematica implements flow control and assignment operators using the mechanism of argument holding (Hold* - attributes, described here). This mechanism allows it to, in particular, imitate pass-by-reference semantics needed for assignments. But then, at the moment when you assign to var1, Set does not know what is stored in var1 already, since it only has the symbol var1, not its value. The pattern _struct does not match because, even if the variable already stores some struct, Set only has the variable name. For the match to be successful, the variable inside Set would have to evaluate to its value. But then, the value is immutable and you can not assign to it. The code I suggested tests whether the variable has an assigned value that is of the form struct[something], and if so, modifies the first part (the Part command is an exception, it can modify parts of an L-value expression provided that those parts already exist).
You can read more on the topics of Hold* - attributes and related issues in many places, for example here and here
I also do not believe that this can be done with TagSet, because the first argument of Set must be held.
It seems to me that if modifying Set, it can be done with:
Unprotect[Set]
Set[s_, x_] /; Head[s] === struct := s[[1]] = x
However, Leonid knows Mathematica better than I, and he probably has a good reason for the longer definition.
