What are the continuation passing style conversion rules? - scheme

I am trying to understand continuation passing style conversion.
I am trying to build a Scheme to C compiler and I want to use continuation passing style. Whether or not continuation passing style is the right way to do this, can you explain the conversion rules to me?
For example, here is a presentation by Marc Feeley, the creator of Gambit Scheme.
I will summarize the continuation passing style rules he gives, but note: I don't understand them. In particular I don't understand what C is.
Here is the notation:
[E]_C (with C written as a subscript)
which denotes the continuation passing style conversion of the expression E in the continuation C.
So, for example, one conversion rule is this one:
[c]_C
==>
(C c) ;; application of C to c
which denotes the CPS conversion of the constant c in the continuation C.
What is C? I am having trouble understanding what C is. It is like magic.
Another rule is:
[(if E1 E2 E3)]_C
==>
[E1]_(lambda (r1)
       (if r1 [E2]_C [E3]_C))
where the value of E1 gets passed to r1.
But what is C?
Can you guys please explain?
Thanks.

If you scroll back to page 7 of the presentation, you will find a definition of what a continuation is, which you need in order to understand the rules for converting to continuation-passing style. An example given is
> (sqrt (+ (read) 1))
and it notes that the continuation for (read) is "a computation that takes a value, adds 1 to it, computes its square-root, prints the result and goes to the next REPL interaction."
So the continuation C for an expression E is "whatever happens to the value of this expression". This is reiterated on page 20, with the example
(let ((square (lambda (x) (* x x))))
  (write (+ (square 10) 1)))
where the continuation of (square 10) is
(lambda (r) (write (+ r 1)))
So as you recursively translate the program to CPS, the continuation C grows as you go deeper into the expression. Note that each translation rule [E]_C results in a smaller untranslated E (perhaps empty, if E was simple enough) and a larger translated C.
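The two rules quoted in the question can be turned into a small converter. Writing it out may make it clearer that C is just more code. This is a minimal sketch under stated assumptions:

- Only constants, variables and if are handled.
- The continuation is represented as a piece of code, either a variable or a lambda expression.
- counter and fresh-name are hypothetical stand-ins for gensym.

```scheme
;; A simple gensym: produces r1, r2, ... (hypothetical helper).
(define counter 0)
(define (fresh-name prefix)
  (set! counter (+ counter 1))
  (string->symbol (string-append prefix (number->string counter))))

;; (cps-convert e c) returns the code for [e]_c. The continuation c is
;; itself code: a variable or a (lambda (r) ...) expression.
(define (cps-convert e c)
  (if (and (pair? e) (eq? (car e) 'if))
      ;; [(if E1 E2 E3)]_C ==> [E1]_(lambda (r) (if r [E2]_C [E3]_C))
      (let ((r (fresh-name "r")))
        (cps-convert (list-ref e 1)
                     `(lambda (,r)
                        (if ,r
                            ,(cps-convert (list-ref e 2) c)
                            ,(cps-convert (list-ref e 3) c)))))
      ;; [c]_C ==> (C c), used here for constants and variables alike
      (list c e)))
```

Starting from counter = 0, (cps-convert '(if #t 1 2) 'k) produces ((lambda (r1) (if r1 (k 1) (k 2))) #t). The if rule wrapped the original continuation k into a bigger continuation, and then the constant rule applied that bigger continuation to #t. Because C is copied into both branches, nested ifs can make the output grow quickly. Practical converters usually bind the continuation to a variable first.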

Converting code to CPS effectively imposes a strict evaluation discipline.
When you write (+ x y z), the order in which +, x, y and z are evaluated is unclear. If the language you write in defines an order explicitly, you know what happens. But if the language does not impose an order, you can choose the order you want by writing the code in (or converting it to) CPS. For the example above, you would write:
(eval + (lambda (+)
  (eval x (lambda (x)
    (eval y (lambda (y)
      (eval z (lambda (z)
        (+ x y z)))))))))
if you want left-to-right evaluation (here eval stands for an evaluator that takes a continuation as its second argument).
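To see the ordering concretely, here is a runnable variant of the code above. It is a sketch with two simplifications:

- eval-k is a hypothetical stand-in for the eval-with-continuation used in the answer.
- Each "expression" is modeled as a name plus a thunk, so that the evaluation order can be recorded.

```scheme
;; Records the order in which the pieces of (+ x y z) get evaluated.
(define eval-order '())

;; Hypothetical stand-in for (eval e k): "evaluate" the piece called
;; name by calling its thunk, record it, then pass the value to k.
(define (eval-k name thunk k)
  (set! eval-order (cons name eval-order))
  (k (thunk)))

;; (+ x y z) converted by hand to CPS with left-to-right order.
(define (sum-left-to-right x y z)
  (eval-k '+ (lambda () +)
    (lambda (plus)
      (eval-k 'x (lambda () x)
        (lambda (xv)
          (eval-k 'y (lambda () y)
            (lambda (yv)
              (eval-k 'z (lambda () z)
                (lambda (zv) (plus xv yv zv))))))))))
```

(sum-left-to-right 1 2 3) returns 6, and afterwards (reverse eval-order) is (+ x y z). Each piece is evaluated only when the continuation of the previous piece runs, so the nesting alone fixes the order.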
Writing code in CPS is like writing in assembler: each piece of code corresponds to an instruction in a very low-level language. When you convert code to CPS, you need to rename variables uniquely to avoid collisions. I think the CPS transform was not yet clearly defined when the C language was created; this is why the inline specifier cannot inline recursive calls. It is possible to convert a recursive function into a goto loop by rewriting the C code using the CPS transform, but standard C does not do this for you.
There are many ways to convert code into CPS. In MIT Scheme, for example, the input code is not explicitly rewritten in CPS form; instead, the evaluator uses a combination of goto statements and trampoline calls to simulate it (this is a technique you won't learn about in school, but it is used in practice).
Recursive CPS code can be converted directly into iterative loops, which solves the tail-recursion problem (this is why Scheme-to-C translators do the conversion first). The first edition of Dan Friedman's EoPL (Essentials of Programming Languages) covers this in detail. There is also an article by Friedman on this; if you cannot find it, I will try to find it for you.
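Here is a minimal sketch of the trampoline idea mentioned above. It is written in Scheme for readability, even though Scheme itself already guarantees proper tail calls.

- Each CPS step returns a description of the next step instead of making the call.
- A driver loop runs the steps. In C, that driver would be a while loop or a goto.

The names done, bounce and trampoline are made up for this sketch. This is not how MIT Scheme is actually implemented.

```scheme
;; A step is either (done . value) or (bounce . thunk).
(define (done v) (cons 'done v))
(define (bounce thunk) (cons 'bounce thunk))

;; The driver loop: keep running thunks until a final value appears.
;; In C this would be a while loop, so the C stack never grows.
(define (trampoline step)
  (if (eq? (car step) 'done)
      (cdr step)
      (trampoline ((cdr step)))))

;; Sum of 1..n in CPS; every call is returned as a bounce instead of
;; being made directly. The continuation k also returns steps.
(define (sum& n k)
  (if (= n 0)
      (k 0)
      (bounce (lambda ()
                (sum& (- n 1)
                      (lambda (r)
                        (bounce (lambda () (k (+ n r))))))))))
```

(trampoline (sum& 100 done)) returns 5050. The chain of pending additions lives in heap-allocated closures rather than on a stack. In a C translation, the C stack would therefore stay flat however large n gets.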

Related

How does call/cc work with the CPS transformation from "Lisp in Small Pieces"?

The book Lisp in Small Pieces demonstrates a transformation from Scheme into continuation passing style (chapter 5.9.1, for those who have access to the book). The transformation represents continuations by lambda forms and call/cc is supposed to become equivalent to a simple (lambda (k f) (f k k)).
I do not understand how this can work because there is no distinction between application of functions and continuations.
Here is a version of the transformation stripped from everything except application (the full version can be found in this gist):
(define (cps e)
  (if (pair? e)
      (case (car e)
        ; ...
        (else (cps-application e)))
      (lambda (k) (k `,e))))

(define (cps-application e)
  (lambda (k)
    ((cps-terms e)
     (lambda (t*)
       (let ((d (gensym)))
         `(,(car t*) (lambda (,d) ,(k d))
           . ,(cdr t*)))))))

(define (cps-terms e*)
  (if (pair? e*)
      (lambda (k)
        ((cps (car e*))
         (lambda (a)
           ((cps-terms (cdr e*))
            (lambda (a*)
              (k (cons a a*)))))))
      (lambda (k) (k '()))))
Now consider the CPS example from Wikipedia:
(define (f return)
  (return 2)
  3)
The above transformation would convert the application in the function body, (return 2), to something like (return (lambda (g13) ...) 2). A continuation is passed as the first argument and the value 2 as the second argument. This would be fine if return were an ordinary function. However, return is supposed to be a continuation, which only takes a single argument.
I don't see how the pieces fit together. How can the transformation represent continuations as lambda forms but not give special consideration to their application?
> I do not understand how this can work because there is no distinction between application of functions and continuations.
Implementing continuations without CPS requires approaches at the virtual machine level, such as using "spaghetti stacks": allocating lexical variables in heap-allocated frames that are subject to garbage collection. Capturing a continuation then just means obtaining an environment pointer which refers to a lexical frame in the spaghetti stack.
CPS builds a de facto spaghetti stack out of closures. A closure captures lexical bindings into an object with an indefinite lifetime. Under CPS, all closures capture the hidden variable k. That k serves the role of the parent frame pointer in the spaghetti stack; it chains the closures together.
Because the whole program is consistently CPS-transformed, there is a k parameter everywhere which points to a dynamically linked chain of closed-over environments that amounts to a de facto stack where execution can be restored.
The one missing piece of the puzzle is that CPS depends on tail calls. Tail calls ensure that we are not using the real stack; everything interesting is in the closed-over environments.
(However, even tail calls are not strictly required, as Henry Baker's approach, embodied in Chicken Scheme, teaches us. Our CPS-transformed code can use real calls that consume stack, but never return. Every once in a while we can move the reachable environment frames (and all contingent objects) from the stack into the heap, and rewind the stack pointer.)
> Now consider the CPS example from Wikipedia:
Ah, but that's not a CPS example; that's an example of application code that uses continuations that are available somehow via call/cc.
It becomes CPS if either we transform it to CPS by hand, or use a compiler which does that mechanically.
> However, return is supposed to be a continuation, which only takes a single argument.
Thus, return only takes a single argument because we're looking at application source code that hasn't been CPS-transformed.
The application-level continuations take one argument.
The CPS-implementation-level continuations will have the hidden k argument, like all functions.
The k parameter is analogous to a piece of machine context, like a stack or frame pointer. When you use a conventional language and call print("hello"), you don't ask how come there is only one argument: doesn't print have to receive the stack pointer so it knows where the parameters are? Of course, when print is compiled, the compiled code has a way of conveying that context from one function to another, invisible to the high-level language.
In the case of CPS in Scheme, it's easy to get confused because the source and target languages are both Scheme.
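The point about the hidden k can be sketched by CPS-converting the Wikipedia example by hand. This sketch makes two assumptions:

- Every CPS procedure takes its continuation as the first argument, as in the cps-application code in the question.
- call/cc& wraps the reified continuation so that it also accepts (and ignores) a k.

That wrapper is my own choice for illustration. The simpler (lambda (k f) (f k k)) in the question passes k itself, which would then have to accept the extra argument.

```scheme
;; Convention assumed here: every CPS procedure takes its continuation
;; first, as in the question's cps-application output (f k arg ...).

;; call/cc in CPS. The reified continuation is wrapped so it accepts the
;; hidden k like any other procedure, and then ignores it.
(define (call/cc& k f)
  (f k (lambda (ignored-k v) (k v))))

;; Hand conversion of (define (f return) (return 2) 3):
;; (return 2) gets a continuation that goes on to produce 3.
(define (f& k return)
  (return (lambda (ignored) (k 3)) 2))

;; (lambda (x) x) in CPS.
(define (identity& k x) (k x))

;; Top-level continuation: just hand back the final value.
(define (halt v) v)
```

(f& halt identity&) returns 3, like (f (lambda (x) x)) in direct style. (call/cc& halt f&) returns 2, like (call/cc f): calling return jumps to the caller's continuation and discards the continuation that would have produced 3.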

Bad Let in Form Scheme

(define (prime max)
  (let ((a 2)))
  (if not(= modulo max 2) 0)
  ((+ a 1)
   prime(max))
  )
It tells me "bad let in form (let ((a 2)))", but as far as I'm aware, the syntax and code are right.
No, it is not right. The let form has this syntax: (let bindings body). Your bindings are ((a 2)). Where is your body? You put it outside the let form. This raises two problems: the let is malformed because it has only one argument instead of two, and a is undeclared where it appears. (That's without going into the logic of the code, which is also incorrect, assuming you are trying to write a primality test.)
The let form is:
(let ((<var1> <value1>)
      (<var2> <value2>)
      ...
      (<varN> <valueN>))
  <expr1>
  <expr2>
  ...
  <exprN>)
Also the general form for calling a function is
(<function> <arg1> <arg2> ... <argN>)
so your not call is wrong: it should be (not ...), and the call to prime should have the form (prime max).
You got the addition "operator" (+ a 1) correct, and indeed one big difference between Lisp dialects and other languages is that you don't have operators with special syntax, just functions. (+ a 1) is just like (add a 1): you are calling a function named +, with no special unary prefix/postfix cases and no precedence or associativity rules. Just functions: not is a function, + is a function.
Lisp "syntax" may feel weird at first (if and because you've been exposed to other programming languages before), but the problem doesn't last long: after a while the parentheses just disappear and you begin to "see" the simple tree structure of the code.
At the other end of the syntax-complexity spectrum you have, for example, C++, which is so complex that even expert programmers and compiler authors can debate at length about how to interpret a given syntax construct and what it means. Not kidding: there are C++ rules that go more or less like "if a syntax is ambiguous and could be considered both as a declaration and as an expression, then it's a declaration" (https://en.wikipedia.org/wiki/Most_vexing_parse). Go figure.
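Putting these rules together, here is one way the function might look. This assumes the goal is a trial-division primality test; the question doesn't say, so that is an assumption. The code uses a named let, a variant of let that also gives the body a name (loop here) so it can be called again with a new value for a.

```scheme
;; Trial division: n is prime if no a with a*a <= n divides it.
(define (prime? n)
  (if (< n 2)
      #f
      (let loop ((a 2))               ; named let: a starts at 2
        (cond ((> (* a a) n) #t)      ; no divisor found up to sqrt(n)
              ((= (modulo n a) 0) #f) ; a divides n
              (else (loop (+ a 1)))))))
```

Note that every call is written (function arg ...), including (modulo n a), and the let body stays inside the let form. With this definition, (prime? 13) is #t and (prime? 9) is #f.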

Scheme - converting to continuation-passing style

I kinda understand how to convert elementary functions such as arithmetics to continuation-passing style in Scheme.
But what if the function involves recursion?
For example,
(define funname
  (lambda (arg0 arg1)
    (and (some procedure)
         (funname (- arg0 1) arg1))))
Please give me some advice.
Thank you in advance.
One place with a good explanation of continuations and CPS is Krishnamurthi's PLAI book. The relevant part (VII) doesn't depend on other parts of the book, so you can jump right in there. There is specifically an extended example of converting code to CPS manually, including recursive functions (the first part of chapter 17).
I also wrote an extended version of that text for my class, which has more examples and more details on the subject; you might find that useful too. Besides the PLAI material, I cover some common uses of continuations, such as implementing generators, the ambiguous operator and more. (But note that PLAI continues with a discussion of implementation strategies, which my text doesn't cover.)
(define (func x y k)
  (some-procedure
    (lambda (ret)
      (if ret
          (- x 1
             (lambda (ret)
               (func ret y k)))
          (k #f)))))
You are lacking a base case, which is why the only explicit call to the continuation is (k #f). If you have a base case, then you'd pass the base case return value to the continuation, also. For example:
(define (func x y k)
  (zero? x
    (lambda (ret)
      (if ret
          (k y)
          (some-procedure
            (lambda (ret)
              (if ret
                  (- x 1
                     (lambda (ret)
                       (func ret y k)))
                  (k #f))))))))
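The code above assumes CPS versions of zero?, - and some procedure, each taking its continuation as an extra last argument. To run it, those primitives have to be defined. The sketch below does that with made-up names: zero?&, minus&, and a placeholder some-procedure& that always succeeds.

```scheme
;; Hypothetical CPS primitives: compute the result, then pass it to k.
(define (zero?& n k) (k (zero? n)))
(define (minus& a b k) (k (- a b)))
;; Placeholder for (some procedure): here it always reports success.
(define (some-procedure& k) (k #t))

(define (func x y k)
  (zero?& x
    (lambda (ret)
      (if ret
          (k y)                          ; base case: hand y to k
          (some-procedure&
            (lambda (ret)
              (if ret
                  (minus& x 1
                          (lambda (ret)
                            (func ret y k))) ; recursive call, same k
                  (k #f))))))))
```

(func 3 'base (lambda (v) v)) counts x down to 0 and returns base. Changing some-procedure& to pass #f makes it return #f instead.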
This partly duplicates Chris Jester-Young's answer, but well, I hope I can explain it better :-).
In CPS, the difference you're seeking is between these two things (roughly):
You can invoke a procedure, and pass it the continuation you were passed. That's the equivalent of a direct-style optimized tail call.
Or, you can invoke a procedure, and pass in as its continuation a new procedure that does something with the "return value," passing in your original continuation. This is the equivalent of a direct-style stack call.
The latter is what the lambdas in Chris's example are doing. Basically, evaluating a lambda creates a closure, and these closures do the same job that stack frames do in the execution of a direct-style program. In place of the return address in a stack frame, the closure contains a binding for a continuation function, and the code for the closure invokes it.
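The two kinds of calls can be seen side by side in a CPS factorial. This sketch uses ordinary Scheme arithmetic rather than CPS primitives.

- fact& builds a new continuation at each step, which corresponds to the direct-style stack call.
- fact-acc& passes its continuation along unchanged, which corresponds to the tail call.

```scheme
;; Stack-call style: each step wraps k in a new closure (a "frame").
(define (fact& n k)
  (if (= n 0)
      (k 1)
      (fact& (- n 1) (lambda (r) (k (* n r))))))

;; Tail-call style: the same k is passed along; work goes in acc.
(define (fact-acc& n acc k)
  (if (= n 0)
      (k acc)
      (fact-acc& (- n 1) (* n acc) k)))
```

Both (fact& 5 (lambda (x) x)) and (fact-acc& 5 1 (lambda (x) x)) return 120. Only fact& allocates a chain of closures proportional to n.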

Common lisp macro syntax keywords: what do I even call this?

I've looked through On Lisp, Practical Common Lisp and the SO archives in order to answer this on my own, but those attempts were frustrated by my inability to name the concept I'm interested in. I would be grateful if anyone could just tell me the canonical term for this sort of thing.
This question is probably best explained by an example. Let's say I want to implement Python-style list comprehensions in Common Lisp. In Python I would write:
[x*2 for x in range(1,10) if x > 3]
So I begin by writing down:
(listc (* 2 x) x (range 1 10) (> x 3))
and then defining a macro that transforms the above into the correct comprehension. So far so good.
The interpretation of that expression, however, would be opaque to a reader not already familiar with Python list comprehensions. What I'd really like to be able to write is the following:
(listc (* 2 x) for x in (range 1 10) if (> x 3))
but I haven't been able to track down the Common Lisp terminology for this. It seems that the loop macro does exactly this sort of thing. What is it called, and how can I implement it? I tried macro-expanding a sample loop expression to see how it's put together, but the resulting code was unintelligible. Could anyone guide me in the right direction?
Thanks in advance.
Well, essentially, such a macro parses the forms supplied as its body. For example:
(defmacro listc (expr &rest forms)
  ;;
  ;;
  ;; (listc EXP for VAR in GENERATOR [if CONDITION])
  ;;
  ;;
  (labels ((keyword-p (thing name)
             (and (symbolp thing)
                  (string= name thing))))
    (destructuring-bind (for* variable in* generator &rest tail) forms
      (unless (and (keyword-p for* "FOR") (keyword-p in* "IN"))
        (error "malformed comprehension"))
      (let ((guard (if (null tail) 't
                       (destructuring-bind (if* condition) tail
                         (unless (keyword-p if* "IF") (error "malformed comprehension"))
                         condition))))
        `(loop
           :for ,variable :in ,generator
           :when ,guard
           :collecting ,expr)))))
(defun range (start end &optional (by 1))
  (loop
    :for k :upfrom start :below end :by by
    :collecting k))
Apart from the hackish "parser" I used, this solution has a disadvantage that is not easily solved in Common Lisp, namely the construction of intermediate lists if you want to chain your comprehensions:
(listc x for x in (listc ...) if (evenp x))
Since there is no moral equivalent of yield in Common Lisp, it is hard to create a facility that does not require intermediate results to be fully materialized. One way out of this might be to encode knowledge of possible "generator" forms in the expander of listc, so the expander can optimize/inline the generation of the base sequence without constructing the entire intermediate list at run time.
Another way might be to introduce "lazy lists" (the link points to Scheme, since there is no equivalent facility in Common Lisp; you would have to build that first, though it's not particularly hard).
Also, you can always look at other people's code, in particular if they try to solve the same or a similar problem, for example:
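For reference, here is a minimal lazy-list sketch in Scheme using the standard delay and force. The names lazy-cons, lazy-range, lazy-map, lazy-filter and lazy-take are made up here; SRFI 41 offers a full-featured stream library. Elements are computed only when lazy-take asks for them, so chained stages never materialize an intermediate list.

```scheme
;; A lazy list is '() or a pair whose cdr is a promise.
(define-syntax lazy-cons
  (syntax-rules ()
    ((_ a b) (cons a (delay b)))))

(define (lazy-cdr s) (force (cdr s)))

(define (lazy-range start end)          ; start .. end-1, on demand
  (if (>= start end)
      '()
      (lazy-cons start (lazy-range (+ start 1) end))))

(define (lazy-map f s)
  (if (null? s)
      '()
      (lazy-cons (f (car s)) (lazy-map f (lazy-cdr s)))))

(define (lazy-filter keep? s)
  (cond ((null? s) '())
        ((keep? (car s)) (lazy-cons (car s) (lazy-filter keep? (lazy-cdr s))))
        (else (lazy-filter keep? (lazy-cdr s)))))

;; Force at most n elements into an ordinary list.
(define (lazy-take n s)
  (if (or (= n 0) (null? s))
      '()
      (cons (car s) (lazy-take (- n 1) (lazy-cdr s)))))
```

(lazy-take 3 (lazy-filter even? (lazy-map (lambda (x) (* x x)) (lazy-range 1 1000000)))) returns (4 16 36). Only a handful of squares are computed, not a million of them.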
Iterate
Loop in SBCL
Pipes (which does the lazy list thing)
Macros are code transformers.
There are several ways of implementing the syntax of a macro:
destructuring
Common Lisp provides a macro argument list which also provides a form of destructuring. When a macro is used, the source form is destructured according to the argument list.
This limits what the macro syntax can look like, but for many uses of macros it provides enough machinery.
See Macro Lambda Lists in Common Lisp.
parsing
Common Lisp also gives the macro access to the whole macro call form. The macro is then responsible for parsing the form; the parser must be provided by the macro author as part of the macro implementation.
An example would be an INFIX macro:
(infix (2 + x) * (3 + sin (y)))
The macro implementation needs to implement an infix parser and return a prefix expression:
(* (+ 2 x) (+ 3 (sin y)))
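Here is a sketch of such a parser, written in Scheme as an ordinary procedure on list data; a macro would call something like it at expansion time. Its assumptions:

- Only the binary operators + - * / are supported.
- * and / bind tighter than + and -, and all operators are left-associative.
- A name followed by a parenthesized group is treated as a one-argument function call, as in sin (y).

```scheme
;; Binding power of the supported binary operators; #f otherwise.
(define (precedence op)
  (case op
    ((+ -) 1)
    ((* /) 2)
    (else #f)))

;; Parse one operand; returns two values: the prefix form and the rest.
(define (parse-operand tokens)
  (let ((t (car tokens))
        (rest (cdr tokens)))
    (cond ((pair? t)                                ; parenthesized group
           (values (infix->prefix t) rest))
          ((and (symbol? t) (pair? rest) (pair? (car rest)))
           ;; name followed by a group: function call, e.g. sin (y)
           (values (list t (infix->prefix (car rest))) (cdr rest)))
          (else (values t rest)))))                 ; number or variable

;; Precedence climbing: consume operators binding at least as tightly
;; as min-prec, building left-associative prefix forms.
(define (parse-expr tokens min-prec)
  (let-values (((lhs rest) (parse-operand tokens)))
    (let loop ((lhs lhs) (rest rest))
      (let ((prec (and (pair? rest) (precedence (car rest)))))
        (if (and prec (>= prec min-prec))
            (let-values (((rhs rest2) (parse-expr (cdr rest) (+ prec 1))))
              (loop (list (car rest) lhs rhs) rest2))
            (values lhs rest))))))

(define (infix->prefix tokens)
  (let-values (((expr rest) (parse-expr tokens 1)))
    expr))
```

(infix->prefix '((2 + x) * (3 + sin (y)))) returns (* (+ 2 x) (+ 3 (sin y))), matching the prefix expression above.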
rule-based
Some Lisps provide syntax rules, which are matched against the macro call form. For a matching syntax rule, the corresponding transformer is used to create the new source form. One can easily implement this in Common Lisp, but it is not a mechanism Common Lisp provides by default.
See syntax-case in Scheme.
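As a sketch of the rule-based approach, here is a Scheme syntax-rules version of the listc comprehension from the question. The literals for, in and if are matched against the call form, and the shorter rule without if delegates to the longer one. The range helper is hypothetical, included only so the example runs.

```scheme
;; for, in and if are literals; expr, var, lst and test are pattern
;; variables bound from the call form.
(define-syntax listc
  (syntax-rules (for in if)
    ((_ expr for var in lst if test)
     (let loop ((xs lst) (acc '()))
       (if (null? xs)
           (reverse acc)
           (let ((var (car xs)))
             (loop (cdr xs) (if test (cons expr acc) acc))))))
    ((_ expr for var in lst)
     (listc expr for var in lst if #t))))

;; Hypothetical helper: the list start, start+1, ..., end-1.
(define (range start end)
  (if (>= start end)
      '()
      (cons start (range (+ start 1) end))))
```

(listc (* 2 x) for x in (range 1 10) if (> x 3)) returns (8 10 12 14 16 18), the same elements as the Python comprehension.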
LOOP
For the implementation of a LOOP-like syntax one needs to write a parser which is called in the macro to parse the source expression. Note that the parser does not work on text, but on interned Lisp data.
In the past (1970s) this was used in Interlisp in the so-called 'Conversational Lisp', a Lisp syntax with a more natural-language-like surface. Iteration was part of this, and the iteration idea was then brought to other Lisps (like Maclisp's LOOP, from where it was then brought to Common Lisp).
See the PDF on 'Conversational Lisp' by Warren Teitelman from the 1970s.
The syntax for the LOOP macro is a bit complicated and it is not easy to see the boundaries between individual sub-statements.
See the extended syntax for LOOP in Common Lisp.
(loop for i from 0 when (oddp i) collect i)
same as:
(loop
  for i from 0
  when (oddp i)
  collect i)
One problem with the LOOP macro is that symbols like FOR, FROM, WHEN and COLLECT are not necessarily from the "COMMON-LISP" package (a namespace). When I use LOOP in source code that is in a different package (namespace), this leads to new symbols being interned in that package. For that reason some people like to write:
(loop
  :for i :from 0
  :when (oddp i)
  :collect i)
In above code the identifiers for the LOOP relevant symbols are in the KEYWORD namespace.
To make both parsing and reading easier it has been proposed to bring parentheses back.
An example for such a macro usage might look like this:
(iter (for i from 0) (when (oddp i) (collect i)))
same as:
(iter
  (for i from 0)
  (when (oddp i)
    (collect i)))
In the above version it is easier to find the sub-expressions and to traverse them.
The ITERATE macro for Common Lisp uses this approach.
But in both examples, one needs to traverse the source code with custom code.
To complement Dirk's answer a little:
Writing your own macros for this is entirely doable, and perhaps a nice exercise.
However there are several facilities for this kind of thing (albeit in a lisp-idiomatic way) out there of high quality, such as
Loop
Iterate
Series
Loop is very expressive, but has a syntax that doesn't resemble the rest of Common Lisp. Some editors don't like it and will indent it poorly. However, loop is defined in the standard. Usually it's not possible to write extensions to loop.
Iterate is even more expressive, and has a familiar Lispy syntax. It doesn't require any special indentation rules, so all editors that indent Lisp properly will also indent iterate nicely. Iterate isn't in the standard, so you'll have to get it yourself (use Quicklisp).
Series is a framework for working on sequences. In most cases series will make it possible not to store intermediate values.

Seeking contrived example code: continuations!

So I believe I understand continuations now, at least on some level, thanks to the community scheme wiki and Learn Scheme in Fixnum Days.
But I'd like more practice -- that is, more example code I can work through in my head (preferably contrived, so there's not extraneous stuff to distract from the concept).
Specifically, I'd like to work through more problems with continuations that resume and/or coroutines, as opposed to just using them to exit a loop or whatever (which is fairly straightforward).
Anyway, if you know of good tutorials besides the ones I linked above, or if you'd care to post something you've written that would be a good exercise, I'd be very appreciative!
Yeah, continuations can be pretty mind-bending. Here's a good puzzle I found a while back - try to figure out what's printed and why:
(define (mondo-bizarro)
  (let ((k (call/cc (lambda (c) c)))) ; A
    (write 1)
    (call/cc (lambda (c) (k c)))      ; B
    (write 2)
    (call/cc (lambda (c) (k c)))      ; C
    (write 3)))
(mondo-bizarro)
Explanation of how this works (contains spoilers!):
The first call/cc returns its own continuation, which is stored in k.
The number 1 is written to the screen.
The current continuation, which is to continue at point B, is passed to k, which returns to A.
This time, k is bound to the continuation we got at B.
The number 1 is written to the screen again.
The current continuation, which is to continue at point B, is passed to k, which is the earlier continuation to point B, so we return to point B in the original run.
Once we're back in the original continuation, it's important to note that here k is still bound to A.
The number 2 is written to the screen.
The current continuation, which is to continue at point C, is passed to k, which returns to A.
This time, k is bound to the continuation we got at C.
The number 1 is written to the screen again.
The current continuation, which is to continue at point B, is passed to k, which returns to C.
The number 3 is written to the screen.
And you're done.
Therefore, the correct output is 11213. The most common sticking point is the step where we're back in the original continuation and k is still bound to A: when you use continuations to 'reset' the value of k, that doesn't affect the value of k back in the original continuation. Once you know that, it becomes easier to understand.
Brown University's programming languages course has a problem set on continuations publicly available.
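For the coroutine-style practice the question asks for, here is a classic exercise: a generator built from call/cc that pauses in the middle of a loop and resumes there on the next call. Try to predict the results before running it. The names make-generator, return and resume are made up for this sketch.

```scheme
;; Returns a procedure that yields the elements of lst one per call,
;; then the symbol done forever after.
(define (make-generator lst)
  (define return #f)                 ; continuation of the current caller
  (define (resume)                   ; first call: start the walk
    (let loop ((xs lst))
      (when (pair? xs)
        (call/cc
          (lambda (next)
            (set! resume (lambda () (next #f))) ; remember where we paused
            (return (car xs))))                 ; hand the element out
        (loop (cdr xs))))
    (return 'done))
  (lambda ()
    (call/cc
      (lambda (r)
        (set! return r)
        (resume)))))
```

With (define gen (make-generator '(1 2 3))), four successive calls to (gen) return 1, 2, 3 and done. Two continuations are at work. return goes back to whoever called gen this time, and resume goes back into the middle of the walk over the list.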
