Here's a short snippet that demonstrates the problem:
(defmulti test-dummy type)
(defmacro silly [t]
  `(defmethod test-dummy ~(resolve t) [some-arg] "FOO!"))
(silly String)
Evaluating this results in "Can't use qualified name as parameter: user/some-arg", but running macroexpand gives a perfectly good result:
(defmethod test-dummy java.lang.String [some-arg] "FOO!")
Typing ~' before the argument name to make it evaluate into a symbol works, but what is going on?
Okay. So the issue here is that Clojure attempts to enforce macro hygiene by ensuring that no symbols in a macro's expansion are unqualified locals which could capture from the macro's expansion environment.
Traditionally, Lisp dialects have allowed macro expansions to contain arbitrary symbols. This creates a problem when the code surrounding a macro call defines a symbol some-arg and the macro's expansion also uses some-arg: one silently "captures" the other, which is rarely the desired behavior. To guard against this, Clojure's syntax-quote (`) namespace-qualifies every plain symbol in the template, so some-arg becomes user/some-arg. A namespace-qualified symbol cannot be used as a parameter name, so the compiler rejects the expanded defmethod with the error you saw.
There are two canonical solutions to this problem. The first is to use a gensym for some-arg, which the macro expansion system knows denotes a local and will not attempt to resolve.
(defmacro silly [t]
  `(defmethod test-dummy ~(resolve t) [some-arg#] "FOO!"))
The other method is to use the unquote operator ~ to insert the value of a quoted symbol.
(defmacro silly [t]
  `(defmethod test-dummy ~(resolve t) [~'some-arg] "FOO!"))
In both cases you have to use the same expression (either the gensym or the ~' form) at every use of the symbol. The gensym will, as the name suggests, generate a fresh symbol and thus will not produce repeatable naming. This is a feature for avoiding symbol collisions. The ~' form, however, lets you always generate a specified symbol in case you need a real human-usable name for something (say a def) or you actually do want to capture something from the environment explicitly.
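To make the capture problem concrete, here is a minimal sketch in Python rather than Clojure. It treats generated code as text, so it is only an analogy for a macro expander, and the names gensym, swap_unhygienic, swap_gensym and run are made up for this illustration.

```python
import itertools

_counter = itertools.count()


def gensym(prefix="g"):
    """Return a name that no earlier call to gensym has produced."""
    return f"{prefix}__{next(_counter)}"


def swap_unhygienic(a, b):
    # Uses a fixed temporary name, which collides if the caller also uses `tmp`.
    return f"tmp = {a}\n{a} = {b}\n{b} = tmp\n"


def swap_gensym(a, b):
    # Uses a freshly generated temporary name, so no collision is possible
    # unless the caller deliberately picks a name of the same shape.
    t = gensym("tmp")
    return f"{t} = {a}\n{a} = {b}\n{b} = {t}\n"


def run(code, env):
    """Execute generated code with `env` as its local variables."""
    exec(code, {}, env)
    return env


# The caller happens to have a variable called `tmp`.
broken = run(swap_unhygienic("tmp", "other"), {"tmp": 1, "other": 2})
fixed = run(swap_gensym("tmp", "other"), {"tmp": 1, "other": 2})

print(broken["tmp"], broken["other"])  # the swap is lost: both hold 2
print(fixed["tmp"], fixed["other"])    # the swap works: 2 and 1
```

The fixed-name version fails only when the caller's names happen to clash with the template's, which is what makes this class of bug hard to spot and why generated names are the default choice.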
Note: This question refers to Julia v1.6. Of course, at any time the answers should ideally also answer the question for the most recent version.
There seem to be a lot of questions and confusion about macro hygiene in Julia. While I read the manual pages in question, I still really struggle to write macros while using things like interpolation ($name), quote and other quoting syntax, the differences in behavior between macros and functions acting on expressions, esc, etc.
What are the tools Julia provides for finding bugs in macros and how to use them effectively?
This is certainly a broad question, which I think very much deserves a dedicated manual page, rather than the current afterthought in an overview of metaprogramming. Nevertheless, I think it can be answered effectively (i.e., in a way that teaches me and others a lot about the main, general question) by considering and debugging a concrete example. Hence, I will discuss a simple toy-example macro:
(Note that the macro Base.@locals "Construct[s] a dictionary of the names (as symbols) and values of all local variables defined as of the call site" [from the docstring].)
# Julia 1.5
module MyModule
foo = "MyModule's foo"
macro mac(print_local=true)
    println("Dump of argument:{")
    dump(print_local)
    println("}\n\n")
    local_inmacro = "local in the macro"
    return quote
        println(repeat("-", 30)) # better readability of output
        # intention: use variable local to the macro to make a temporary variable in the user's scope
        # (can you think of a reason why one might want to do this?)
        var_inquote = $local_inmacro * "_modified"
        # intention: evaluate `print_local` in user scope
        # (THIS CONTAINS AN ERROR ON PURPOSE!
        # One should write `if $(esc(print_local))` to achieve intention.)
        if $print_local
            # intention: get local variables in caller scope
            println("Local in caller scope: ", Base.@locals)
        else
            # intention: local to macro or module AA.
            println($foo)
            println($local_inmacro)
            println(var_inquote)
        end
    end
end
end # module MyModule
Some code to test this
function testmacro()
    foo = "caller's foo"
    MyModule.@mac # prints `Dict` containing "caller's foo"
    MyModule.@mac true # (Exactly the same)
    MyModule.@mac false # prints stuff local to `@mac` and `MyModule`
    # If a variable name is passed instead of `true` or `false`,
    # it doesn't work. This is because of macro hygiene,
    # which renames and rescopes interpolated variables.
    # (Intended behaviour is achieved by properly escaping the variable in the macro)
    var_false = false
    MyModule.@mac var_false # gives `UndefVarError`
end
testmacro()
Pretend that you don't understand why the error happens. How do we find out what's going on?
Debugging techniques (that I'm aware of) include:
@macroexpand (expr) : expand all macros inside (expr)
@macroexpand1 (expr) : expand only the outer-most macro in (expr), usually just the macro you are debugging. Useful, e.g., if the macro you're debugging returns expressions with @warn inside, which you don't want to see expanded.
macroexpand(m::Module, x; recursive=true) : combines the above two and allows you to specify the "caller" module
dump(arg) : can be used inside a macro to inspect its argument arg.
eval(expr) : to evaluate expressions (should almost never be used inside a macro body).
Please help add useful things to this list.
Using dump reveals that the argument print_local during the problematic (i.e. last) macro call is a Symbol; to be exact, it has the value :var_false.
Let's look at the expression that the macro returns. This can be done, e.g., by replacing the last macro call (MyModule.@mac var_false) by return (@macroexpand1 MyModule.@mac var_false). Result:
quote
    #= <CENSORED PATH>.jl:14 =#
    Main.MyModule.println(Main.MyModule.repeat("-", 30))
    #= <CENSORED PATH>.jl:18 =#
    var"#5#var_inquote" = "local in the macro" * "_modified"
    #= <CENSORED PATH>.jl:23 =#
    if Main.MyModule.var_false
        #= <CENSORED PATH>.jl:25 =#
        Main.MyModule.println("Local in caller scope: ", #= <CENSORED PATH>.jl:25 =# Base.@locals())
    else
        #= <CENSORED PATH>.jl:28 =#
        Main.MyModule.println("MyModule's foo")
        #= <CENSORED PATH>.jl:29 =#
        Main.MyModule.println("local in the macro")
        #= <CENSORED PATH>.jl:30 =#
        Main.MyModule.println(var"#5#var_inquote")
    end
end
We could manually remove the annoying comments (surely there is a built-in way to do that?).
In this simplistic example, the debugging tools listed here are enough to see the problem. We notice that the if statement in the macro's return expression "rescopes" the interpolated symbol to the macro's parent module: it looks at Main.MyModule.var_false. We intended for it to be Main.var_false in the caller scope.
One can solve this problem by replacing if $print_local by if $(esc(print_local)). In that case, macro hygiene will leave the contents of the print_local variable alone. I am still a bit confused as to the order and placement of esc and $ for interpolation into expressions.
Suppose that we mess up and write if $esc(print_local) instead, thus interpolating the esc function into the expression, rather than escaping anything (similar mistakes have cost me quite a bit of headache). This results in the returned expression (obtained via @macroexpand1) being impossible to execute via eval, since the esc function is weird outside of a macro, resulting in things like :($(Expr(:escape, <something>))). In fact, I am generally confused as to when expressions obtained via @macroexpand are actually executable (to the same effect as the macro call) and how to execute them (eval doesn't always do the trick). Any thoughts on this?
I am learning OCaml and I'm a complete beginner at this point. I'm trying to get used to the syntax and I just spent 15 minutes debugging a stupid syntax error.
let foo a b = "bar";;
let biz = foo 2. -1.;;
I was getting an error This expression has type 'a -> string but an expression was expected of type int. I resolved the error, but it prompted me to learn what is the best way to handle this syntax peculiarity.
Basically OCaml treats what I intended as the numeric constant -1. as two separate tokens: - and 1. and I end up passing just 1 argument to foo. In other languages I'm familiar with this doesn't happen because arguments are separated with a comma (or in Scheme there are parentheses).
What is the usual way to handle this syntax peculiarity in OCaml? Is it surrounding the number with parentheses (foo 2. (-1.)) or is there some other way?
There is a unary minus operator ~-. that can be used to avoid this issue: foo 2. ~-.1. (and its integer counterpart ~-), but it is generally simpler to add parentheses around the problematic expression.
I am trying to understand how the Scheme meta-circular evaluator handles quoted expressions differently than symbolic data.
The accepted answer to the Stack Overflow question "What exactly is a symbol in lisp/scheme?" defines the "symbol" data object in Scheme:
In Scheme and Racket, a symbol is like an immutable string that happens to be interned
The accepted answer writes that in Scheme, there is a built-in correspondence between identifiers and symbols:
To call a method, you look up the symbol that corresponds to the method name. Lisp/Scheme/Racket makes that really easy, because the language already has a built-in correspondence between identifiers (part of the language's syntax) and symbols (values in the language).
To understand the correspondence, I read the page "A Note on Identifiers" in An Introduction to Scheme and Its Implementation, which says
Scheme identifiers (variable names and special form names and keywords) have almost the same restrictions as Scheme symbol object character sequences, and it's no coincidence. Most implementations of Scheme happen to be written in Scheme, and symbol objects are used in the interpreter or compiler to represent variable names.
Based on the above, I'm wondering if my understanding of what is happening in the following session is correct:
user@host:/home/user $ scheme
MIT/GNU Scheme running under GNU/Linux
Type `^C' (control-C) followed by `H' to obtain information about interrupts.
Copyright (C) 2011 Massachusetts Institute of Technology
This is free software; see the source for copying conditions. There is NO warranty; not even for
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Image saved on Sunday February 7, 2016 at 10:35:34 AM
Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41 || LIAR/x86-64 4.118 || Edwin 3.116
1 ]=> (define a (lambda (i) (+ i 1)))
;Value: a
1 ]=> a
;Value 13: #[compound-procedure 13 a]
1 ]=> (quote a)
;Value: a
1 ]=> (eval a (the-environment))
;Value 13: #[compound-procedure 13 a]
1 ]=> (eval (quote a) (the-environment))
;Value 13: #[compound-procedure 13 a]
1 ]=>
The first define statement is a special form captured by the evaluator, which creates a binding for the symbol a to a compound procedure object in the global environment.
Writing a in the top-level causes the evaluator to receive the symbol object 'a, which evaluates to the compound-procedure object that 'a points to in the global environment.
Writing (quote a) in the top-level causes the evaluator to receive a list of symbols ('quote 'a); this expression is a special form captured by the evaluator, which evaluates to the quoted expression, namely the symbol object 'a.
Writing (eval a (the-environment)) causes the evaluator to receive a list of symbols ('eval 'a ...) (ignoring the environment). The evaluator performs a lookup for 'eval, which yields the eval compiled procedure object, and a lookup for 'a, which yields the compound-procedure. Finally, the top-level evaluator applies the eval procedure to its arguments; since a compound-procedure is self-evaluating (not true in Scheme48), the final value of the expression is the compound-procedure itself.
Writing (eval (quote a) (the-environment)) causes the evaluator to receive a list of symbols ('eval ('quote 'a) ...). The evaluator performs a lookup for 'eval, which yields the eval compiled procedure object. It evaluates the expression ('quote 'a) which yields the symbol object 'a. Finally, the top-level evaluator applies the eval procedure to 'a, which is a symbol object and therefore invokes an environment lookup that yields the compound procedure.
Does this explanation correctly describe (at a high level) how a Scheme interpreter might differentiate between symbol objects and identifiers in the language? Are there fundamental misunderstandings in these descriptions?
The R6RS Scheme report, in 4.2 Lexical Syntax, uses the term identifier to refer to the character-level syntax. That is to say, roughly, identifier means something like the lexical token from which a symbol is constructed when the expression becomes an object. However, elsewhere in the text, identifier seems to be freely used as a synonym for symbol. E.g. "Scheme allows identifiers to stand for locations containing values. These identifiers are called variables." (1.3 Variables and Binding). Basically, the spec seems to be loose with regard to this terminology. Depending on context, an identifier is either the same thing as a symbol (an object), or else <identifier>: the grammar category from the lexical syntax.
In a sentence which says something like that a certain character may or may not appear in an identifier, the context is clearly lexical syntax, because a symbol object is an atom and not a character string; it doesn't contain anything. But when we talk about an identifier denoting a memory location (being a variable), that's the symbol; we're past the issue of what kinds of tokens can produce the symbol in the textual source code.
The An Introduction to Scheme and Its Implementation tutorial linked to in the question is using its own peculiar definition of identifier which is at odds with the Scheme language. It implies that identifiers are "variable names, and special form names and keywords" (so that symbols which are not variable names are not identifiers, which is not supported by the specification).
ObPreface: Apologies in advance for telling you things you already know!
Your very first sentence is raising big XY question issues for me. You write "I am trying to understand how the Scheme meta-circular evaluator handles quoted expressions differently than symbolic data." What do you mean by "the Scheme meta-circular evaluator"? Also, what do you mean by "symbolic data"? Both of these terms suggest to me that you want to ask some more high-level questions.
Regardless, your title suggests a question about the difference between identifiers and symbols. The difference is this:
"Identifiers" are a syntactic category. That is, suppose we take a text file and break it up into tokens. Some of those tokens will be left-parens. Some will be right-parens. Some will be numbers. Some will be identifiers. Every language has its own set of syntactic categories, but many of them use the name "identifier" for "word-like thing that can usually be a function name or a variable name or whatever."
"Symbols", on the other hand, are a particular kind of value in Scheme and Lisp systems. Scheme has lots of different kinds of values: Numbers, Booleans, Strings, Pairs, Symbols, and others.
In Scheme, when developing a parser/interpreter/compiler/whatever, it turns out to be very convenient to use symbols (the values) to represent identifiers (the syntactic entities). Specifically, "quote" has a special ability to turn certain host language token sequences into lists of symbols, numbers, strings, and booleans. You don't need to take advantage of this, but it eliminates a lot of code.
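As a rough illustration of that last point, here is a minimal sketch of an evaluator, written in Python rather than Scheme. It is not how MIT Scheme or any real implementation works: the Symbol class, the evaluate function, and the use of Python lists for Scheme lists are all assumptions made for this example. Symbols are ordinary interned values, and the evaluator gives them meaning as identifiers only when it meets one in expression position.

```python
class Symbol:
    """An interned name: two symbols with the same name are the same object."""
    _table = {}

    def __new__(cls, name):
        if name not in cls._table:
            obj = super().__new__(cls)
            obj.name = name
            cls._table[name] = obj
        return cls._table[name]

    def __repr__(self):
        return self.name


def evaluate(expr, env):
    if isinstance(expr, Symbol):      # a symbol used as an identifier: look it up
        return env[expr]
    if not isinstance(expr, list):    # numbers, procedures, ...: self-evaluating
        return expr
    head = expr[0]
    if head is Symbol("quote"):       # special form: return the operand unevaluated
        return expr[1]
    proc = evaluate(head, env)        # ordinary application
    args = [evaluate(arg, env) for arg in expr[1:]]
    return proc(*args)


a, quote, ev = Symbol("a"), Symbol("quote"), Symbol("eval")
env = {a: lambda i: i + 1}
env[ev] = lambda e: evaluate(e, env)  # a one-argument stand-in for eval

r1 = evaluate(a, env)                 # lookup: the procedure bound to a
r2 = evaluate([quote, a], env)        # the symbol a itself, as a value
r3 = evaluate([ev, a], env)           # eval receives the procedure, returns it
r4 = evaluate([ev, [quote, a]], env)  # eval receives the symbol, looks it up
```

Here r1, r3 and r4 are all the same procedure object, while r2 is the symbol, which mirrors the results of the REPL session in the question.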
Is there a way in Chicken Scheme to determine at run-time if a variable is currently defined?
(let ((var 1))
  (print (is-defined? var))) ; #t
(print (is-defined? var)) ; #f
EDIT: XY problem.
I'm writing a macro that generates code. This generated code must call the macro in mutual recursion - having the macro simply call itself won't work. When the macro is recursively called, I need it to behave differently than when it is called initially. I would use a nested function, but uh....it's a macro.
Rough example:
(defmacro m (nested)
(if nested
BACKQUOTE(print "is nested")
BACKQUOTE(m #t)
(yes, I know scheme doesn't use defmacro, but I'm coming from Common Lisp. Also I can't seem to put backquotes in here without it all going to hell.)
I don't want the INITIAL call of the macro to take an extra argument that only has meaning when called recursively. I want it to know by some other means.
Can I get the generated code to call a macro that is nested within the first macro and doesn't exist at the call site, maybe? For example, generating code that calls (,other-macro) instead of (macro)?
But that shouldn't work, because a macro isn't a first-class object like a function is...
When you write recursive macros I get the impression that you have a macro expansion (m a b ...) that turns into a (m-helper a (b ...)) that might turn into (let (a ...) (m b ...)). That is not directly recursive since you are turning code into code that just happens to contain a macro.
With destructuring-bind you really only need to keep track of two variables, one for car and one for cdr, and with an implicit renaming macro the stuff not coming from the form is renamed and thus hygienic:
(define-syntax destructuring-bind
(ir-macro-transformer
(lambda (form inject compare?)
(define (parse-structure structure expression optional? body)
;;actual magic happens here. Returns list structure with a mix of parts from structure as well as introduced variables and globals
)
(match form
[(structure expression) . body ]
`(let ((tmp ,expression))
,(parse-structure structure 'tmp #f body))))))
To check if something from the input is the same symbol, you use the supplied compare? procedure, e.g. (compare? expression '&optional).
There's no way to do that in general, because Scheme is lexically scoped. It doesn't make much sense to ask if a variable is defined if referencing an undefined variable is an error.
For toplevel/global variables, you can use the symbol-utils egg but it is probably not going to work as you expect, considering that global variables inside modules are also rewritten to be something else.
Perhaps if you can say what you're really trying to do, I can help you with an alternate solution.
I am cleaning up some (Chicken) scheme code and I want to identify all lists/procedures not used in a given program. Is there a specific option to pass either to the Chicken compiler or to csi -s I can use to do so without listing out each define and grep-ing for the identifiers in the *.scm scripts?
You could use the repl function from the eval unit and pass it an evaluator function that records the symbol, if it names a list or a lambda, before calling eval on the argument.
It is not possible to decide which top-level entries will be used, because it is possible to dynamically craft expressions:
(eval (list (string->symbol "+") 1 2)) → 3
It would be necessary to evaluate all possible permutations of your program.
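The same obstacle can be sketched in Python rather than Scheme. In this hypothetical example (call_by_name is a made-up helper), the name of the function being called is assembled at run time, so a textual search of the source for that name finds nothing even though the function is used.

```python
import operator


def call_by_name(parts, *args):
    # The function name exists only after the pieces are joined at run time.
    name = "".join(parts)
    return getattr(operator, name)(*args)


# Calls operator.add without the full name ever appearing in this call.
result = call_by_name(["a", "dd"], 1, 2)
print(result)  # 3
```

A tool that reports unused definitions therefore has to either be conservative or accept false positives for code that builds names dynamically.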
If you put your code in a module, it will show a warning about unused, unexported identifiers when compiling it (you might need to use csc -v to show them).