Clojure comp doesn't tail-call-optimize (can create StackOverflow exception) - performance

I got stuck on a Clojure program handling a very large amount of data (image data). When the image was larger than 128x128, the program would crash with a StackOverflowError. Because it worked for smaller images, I knew it wasn't an infinite loop.
There were lots of possible causes of high memory usage, so I spent time digging around: making sure I was using lazy sequences correctly, using recur where appropriate, and so on. The turning point came when I realized that these stack frames:
at clojure.core$comp$fn__5792.invoke(core.clj:2569)
at clojure.core$comp$fn__5792.invoke(core.clj:2569)
at clojure.core$comp$fn__5792.invoke(core.clj:2569)
referred to the comp function.
So I looked at where I was using it:
(defn pipe [val funcs]
  ((apply comp funcs) val))
(pipe the-image-vec
      (map
        (fn [index] (fn [image-vec] ( ... )))
        (range image-size)))
I was doing per-pixel operations, mapping a function to each pixel to process it. Interestingly, comp doesn't appear to benefit from tail-call optimization, or to apply the functions one after another. It seems to just compose things in the basic way, nesting each function inside the next, which with 65k functions understandably overflows the stack. Here's the fixed version:
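(defn pipe [val funcs]
  ;; empty? checks the remaining seq in constant time; count would walk it on every step
  (if (empty? funcs)
    val
    (recur ((first funcs) val) (rest funcs))))
recur turns the recursive call into a loop (Clojure's explicit form of tail-call optimization), preventing a stack buildup.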
If anybody can explain why comp works this way (or rather, doesn't work this way), I'd love to be enlightened.

First, a more straightforward MCVE:
(def fs (repeat 1e6 identity))
((apply comp fs) 99)
#<StackOverflowError...
But why does this happen? If you look at the (abridged) comp source:
(defn comp
  ([f g]
   (fn
     ([x] (f (g x)))))
  ([f g & fs]
   (reduce1 comp (list* f g fs))))
You can see that the whole thing is basically just 2 parts:
The first arity, which does the main work of wrapping each composed function call in another function.
Reducing over the functions using comp.
I believe the first point is the problem. comp works by taking the list of functions and repeatedly wrapping each pair of calls in another function. Building that chain is cheap, but calling the result has to descend through every layer, one stack frame per composed function, so if you compose too many functions the call exhausts the stack space.
So why can't TCO help here? Because unlike most StackOverflowErrors, explicit recursion is not the problem: the recursive call to comp in the variadic case at the bottom only ever goes one frame deep, since reduce1 does the iterating. The problem is the chain of nested functions that gets built. Each layer calls the next one, and because the JVM doesn't eliminate tail calls (Clojure's recur only covers a function or loop calling itself), invoking the composed function costs a stack frame per layer, which can't simply be optimized away.
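Python doesn't eliminate tail calls either, so the same failure mode is easy to reproduce outside Clojure. This is only an analogy, not Clojure's implementation: `compose2` and `naive_comp` are hypothetical names standing in for comp's two-argument arity and the reduce1 step. Building the chain is an ordinary loop; calling the result is what recurses once per layer.

```python
import functools
import sys

def compose2(f, g):
    """Two-argument composition, like comp's [f g] arity: f(g(x))."""
    return lambda x: f(g(x))

def naive_comp(*fs):
    """Fold compose2 over the functions, like (reduce1 comp fs).

    Building the chain is iterative and cheap; the nesting only shows up
    when the resulting function is called.
    """
    return functools.reduce(compose2, fs)

identity = lambda x: x

# A few layers compose and run fine.
small = naive_comp(*[identity] * 10)
print(small(99))

# Many more layers than the recursion limit: calling the result overflows.
big = naive_comp(*[identity] * (sys.getrecursionlimit() * 10))
try:
    big(99)
except RecursionError:
    print("RecursionError, the Python counterpart of StackOverflowError")
```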
Why were you able to "fix" it though? Because you have access to val, so you're able to evaluate the functions as you go instead of building up one function to call later. comp was written using a simple implementation that works fine for most cases, but fails for extreme cases like the one you presented. It's fairly trivial to write a specialized version that handles massive collections though:
(defn safe-comp [& fs]
  (fn [value]
    (reduce (fn [acc f]
              (f acc))
            value
            (reverse fs))))
Note, though, that this only supports single-argument composed functions, unlike the core version, which handles multiple arities.
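The same evaluate-as-you-go idea carries over to other languages without TCO. Here is a minimal Python sketch (`safe_comp` is a hypothetical name mirroring safe-comp): the composed function folds over the reversed list in a loop, so its stack depth stays constant however many functions are composed.

```python
import functools

def safe_comp(*fs):
    """Compose fs right-to-left like comp, without nesting calls.

    The returned function applies the last function first and folds
    through the rest in a loop. Single-argument only, like safe-comp.
    """
    def composed(value):
        return functools.reduce(lambda acc, f: f(acc), reversed(fs), value)
    return composed

inc = lambda x: x + 1
double = lambda x: x * 2

# Same order as comp: double runs first, then inc.
print(safe_comp(inc, double)(5))  # 11

# A million functions: no RecursionError, just a long loop.
print(safe_comp(*[inc] * 1_000_000)(0))  # 1000000
```

With no functions at all, `reduce` just returns the initial value, so `safe_comp()` behaves like identity.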
Honestly though, in 3 and a bit years of using Clojure, I've never once written (apply comp ...). While it's certainly possible you have experienced a case I've never needed to deal with, it's more likely that you're using the wrong tool for the job here. When this code is complete, post it on Code Review and we may be able to suggest better ways of accomplishing what you're trying to do.

Related

I can't seem to wrap my mind around call/cc in Scheme

Does anyone have a good guide as to how it works? Something with visual aids would be nice; every guide I've come across seems to say the same thing, and I need a fresh take on it.
Here's the diagram that was left on our CS lab's whiteboard. So you're going to fetch some apples, and you grab a continuation before you begin. You wander through the forest, collecting apples, when at the end you apply your continuation on your apples. Suddenly, you find yourself where you were before you went into the forest, except with all of your apples.
(display
 (call/cc (lambda (k)
            (begin
              (call-with-forest
               (lambda (f)
                 (k (collect-apples f))))
              (get-eaten-by-a-bear)))))
=> some apples (and you're not eaten by a bear)
I think a Bar Mitzvah and buried gold might have been involved.
Have a look at the continuation part of PLAI. It's very "practically oriented", and it uses a "black-hole" visualization for continuations that can help you understand it.
There is no shortcut in learning call/cc. Read the chapters in The Scheme Programming Language or Teach Yourself Scheme in Fixnum Days.
I found that it helps to visualize the call stack. When evaluating an expression, keep track of the call stack at every step. (See for example http://4.flowsnake.org/archives/602) This may be non-intuitive at first, because in most languages the call stack is implicit; you don't get to manipulate it directly.
Now think of a continuation as a function that saves the call stack. When that function is called (with a value X), it restores the saved call stack, then passes X to it.
I've never liked visual representations of call/cc, since I can't map them back to the code (yes, poor imagination) ;)
Anyway, I think it's easier to start not with call/cc but with call/ec (escape continuation) if you're already familiar with exceptions in other languages.
Here is some code that should evaluate to a value:
(lambda (x) (/ 1 x))
What if x is 0? In other languages we could throw an exception; what about Scheme?
We can throw it too!
(lambda (x)
  (call/ec (lambda (cont)
             (if (= x 0)
                 (cont "Oh noes!")
                 (/ 1 x)))))
call/ec (like call/cc) works like "try" here. In imperative languages you can easily jump out of a function simply by returning a value or throwing an exception.
In functional code you can't just jump out; every expression has to evaluate to something. That's where call/* comes to the rescue.
What it does is package up the point the call/ec expression returns to as a function of one argument (named cont here). When this function is called, the WHOLE call/* expression is replaced by its argument.
So calling (cont "Oh noes!") makes the whole (call/ec ...) expression evaluate to the string "Oh noes!".
call/cc and call/ec are almost the same, except that call/ec is simpler to implement: an escape continuation only lets you jump up (out of the call/ec while it is still running), while a continuation captured by call/cc can also be jumped back into from outside, after the call/cc has already returned.
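Python has no first-class continuations, so full call/cc can't be shown directly, but the upward-only escape continuation can be emulated with exceptions, which is exactly the "try" analogy above. A rough sketch; `call_ec`, `_Escape` and `safe_reciprocal` are hypothetical names, and this models only call/ec, not re-entrant call/cc:

```python
class _Escape(Exception):
    """Carries the value passed to one particular escape function."""
    def __init__(self, tag, value):
        super().__init__()
        self.tag = tag
        self.value = value

def call_ec(proc):
    """Call proc with a one-shot, upward-only escape function.

    Calling the escape function abandons the rest of proc and makes
    call_ec itself return the given value. The tag ensures only the
    matching call_ec catches it, so nested call_ec calls don't interfere.
    """
    tag = object()

    def escape(value):
        raise _Escape(tag, value)

    try:
        return proc(escape)
    except _Escape as e:
        if e.tag is tag:
            return e.value
        raise  # belongs to an outer call_ec

def safe_reciprocal(x):
    # Mirrors (lambda (x) (call/ec (lambda (cont) (if (= x 0) (cont "Oh noes!") (/ 1 x)))))
    return call_ec(lambda cont: cont("Oh noes!") if x == 0 else 1 / x)

print(safe_reciprocal(4))  # 0.25
print(safe_reciprocal(0))  # Oh noes!
```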

Common Lisp: What is the downside to using this filter function on very large lists?

I want to filter out all elements of list 'a from list 'b and return the filtered 'b. This is my function:
(defun filter (a b)
  "Filters out all items in a from b"
  (if (= 0 (length a))
      b
      (filter (remove (first a) a) (remove (first a) b))))
I'm new to Lisp and don't know how 'remove does its thing. What kind of time will this filter run in?
There are two ways to find out:
you could test it with data
you could analyze your source code
Let's look at the source code.
lists are built of linked cons cells
length needs to walk once through a list
for EVERY recursive call of FILTER you compute the length of a. BAD!
(Use ENDP instead.)
REMOVE needs to walk once through a list
for every recursive call you compute REMOVE twice: BAD!
(Instead of using REMOVE on a, recurse with the REST.)
the call to FILTER will not necessarily be an optimized tail call. In some implementations it might be, in some you need to tell the compiler that you want to optimize for tail calls, and in some implementations no tail call optimization is available. If not, then you get a stack overflow on long enough lists.
(Use looping constructs like DO, DOLIST, DOTIMES, LOOP, REDUCE, MAPC, MAPL, MAPCAR, MAPLIST, MAPCAN, or MAPCON instead of recursion, when applicable.)
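Summary: that's very naive code with poor performance. Each call walks a for LENGTH and walks both lists for the two REMOVEs, and there can be as many calls as a has elements, so the running time is roughly O(|a| * (|a| + |b|)) in the worst case.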
Common Lisp provides this built in: SET-DIFFERENCE should do what you want.
http://www.lispworks.com/documentation/HyperSpec/Body/f_set_di.htm#set-difference
The Common Lisp standard does not require tail-call optimization, so (depending on the implementation) you might just run out of stack on a deep enough call chain.
I would not write this function, because, as Rainer Joswig says, the standard already provides SET-DIFFERENCE. Nonetheless, if I had to provide an implementation of the function, this is the one I would use:
(defun filter (a b)
  (let ((table (make-hash-table)))
    (map 'nil (lambda (e) (setf (gethash e table) t)) a)
    (remove-if (lambda (e) (gethash e table)) b)))
Doing it this way provides a couple of advantages, the most important one being that it only traverses b once; using a hash table to keep track of what elements are in a is likely to perform much better if a is long.
Also, using the generic sequence functions like MAP and REMOVE-IF means that this function can be used with strings and vectors as well as lists, which is an advantage even over the standard SET-DIFFERENCE function. The main downside of this approach shows up if you want to extend the function with a :TEST argument that allows the user to provide an equality predicate other than the default EQL, since CL hash tables only work with a small number of predefined equality predicates (EQ, EQL, EQUAL and EQUALP, to be precise).
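The hash-table idea translates directly to other languages. A minimal Python sketch (`filter_out` is a hypothetical name, chosen to avoid shadowing Python's built-in `filter`): put the elements of a into a set once, then make a single pass over b. Set membership is expected O(1), so the whole thing is roughly O(|a| + |b|). It assumes the elements are hashable, which parallels the restriction on equality predicates described above.

```python
def filter_out(a, b):
    """Return the items of b that do not appear in a, keeping b's order."""
    table = set(a)                            # one pass over a
    return [e for e in b if e not in table]   # one pass over b

print(filter_out([1, 2], [1, 2, 3, 2, 4]))  # [3, 4]
print(filter_out("ab", "abcab"))            # ['c']
```

Like the generic-sequence version, it accepts any iterable for a and b, including strings.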
(defun filter (a b)
  "Filters out all items in a from b"
  (if (not (consp a))
      b
      (filter (rest a) (remove (first a) b))))

reduce, or explicit recursion?

I recently started reading through Paul Graham's On Lisp with a friend, and we realized that we have very different opinions of reduce: I think it expresses a certain kind of recursive form very clearly and concisely; he prefers to write out the recursion very explicitly.
I suspect we're each right in some context and wrong in another, but we don't know where the line is. When do you choose one form over the other, and what do you think about when making that choice?
To be clear about what I mean by reduce vs. explicit recursion, here's the same function implemented twice:
(defun my-remove-if (pred lst)
  (fold (lambda (left right)
          (if (funcall pred left)
              right
              (cons left right)))
        lst :from-end t))
(defun my-remove-if (pred lst)
  (if lst
      (if (funcall pred (car lst))
          (my-remove-if pred (cdr lst))
          (cons (car lst) (my-remove-if pred (cdr lst))))
      '()))
I'm afraid I started out a Schemer (now we're Racketeers?) so please let me know if I've botched the Common Lisp syntax. Hopefully the point will be clear even if the code is incorrect.
If you have a choice, you should always express your computational intent in the most abstract terms possible. This makes it easier for a reader to figure out your intentions, and it makes it easier for the compiler to optimize your code. In your example, when the compiler trivially knows you are doing a fold operation by virtue of you naming it, it also trivially knows that it could possibly parallelize the leaf operations. It would be much harder for a compiler to figure that out when you write extremely low level operations.
I'm going to take a slightly-subjective question and give a highly-subjective answer, since Ira already gave a perfectly pragmatic and logical one. :-)
I know writing things out explicitly is highly valued in some circles (the Python guys make it part of their "zen"), but even when I was writing Python I never understood it. I want to write at the highest level possible, all the time. When I want to write things out explicitly, I use assembly language. The point of using a computer (and a HLL) is to get it to do these things for me!
For your my-remove-if example, the reduce one looks fine to me (apart from the Scheme-isms like fold and lst :-)). I'm familiar with the concept of reduce, so all I need to do to understand it is figure out your f(x,y) -> z. For the explicit variant, I had to think about it for a second: I have to figure out the loop myself. Recursion isn't the hardest concept out there, but I think it is harder than "a function of two arguments".
I also don't care for a whole line being repeated -- (my-remove-if pred (cdr lst)). I think I like Lisp in part because I'm absolutely ruthless at DRY, and Lisp allows me to be DRY on axes that other languages don't. (You could put in another LET at the top to avoid this, but then it's longer and more complex, which I think is another reason to prefer the reduction, though at this point I might just be rationalizing.)
I think maybe the contexts in which the Python guys, at least, dislike implicit functionality would be:
when no-one could be expected to guess the behavior (like frobnicate("hello, world", True) -- what does True mean?), or:
cases when it's reasonable for implicit behavior to change (like when the True argument gets moved, or removed, or replaced with something else, since there's no compile-time error in most dynamic languages)
But reduce in Lisp fails both of these criteria: it's a well-understood abstraction that everybody knows, and that isn't going to change, at least not on any timescale I care about.
Now, I absolutely believe there are some cases where it'd be easier for me to read an explicit function call, but I think you'd have to be pretty creative to come up with them. I can't think of any offhand, because reduce and mapcar and friends are really good abstractions.
In Common Lisp, higher-order functions are preferred over recursion for data structure traversal, filtering, and other related operations. You can also see that from the many built-in functions like REDUCE, REMOVE-IF, MAP and others.
Tail recursion a) is not guaranteed to be optimized by the standard, b) may be handled differently by different CL compilers, and c) may have side effects on the generated machine code for the surrounding code.
Often, for certain data structures, many of these operations are implemented with LOOP or ITERATE and provided as higher-order functions. There is a tendency to prefer newer language extensions (like LOOP and ITERATE) for iterative code over using recursion for iteration.
(defun my-remove-if (pred list)
  (loop for item in list
        unless (funcall pred item)
          collect item))
Here is also a version that uses the Common Lisp function REDUCE:
(defun my-remove-if (pred list)
  (reduce (lambda (left right)
            (if (funcall pred left)
                right
                (cons left right)))
          list
          :from-end t
          :initial-value nil))
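The stack point above is easy to see in a language that, like portable Common Lisp, doesn't guarantee tail-call elimination. A rough Python analogy (function names are hypothetical): the comprehension plays the role of the LOOP version, while the explicit version, which isn't even tail-recursive, needs one frame per element and fails on long inputs. The recursive version builds its result with list concatenation for brevity, which is quadratic; it is there only to show the depth problem.

```python
import sys

def remove_if_loop(pred, items):
    """Iterative, like the LOOP version: constant stack depth."""
    return [item for item in items if not pred(item)]

def remove_if_recursive(pred, items, start=0):
    """Explicit recursion, like the second my-remove-if: one frame per element."""
    if start == len(items):
        return []
    rest = remove_if_recursive(pred, items, start + 1)
    return rest if pred(items[start]) else [items[start]] + rest

is_even = lambda n: n % 2 == 0
print(remove_if_loop(is_even, [1, 2, 3, 4, 5]))       # [1, 3, 5]
print(remove_if_recursive(is_even, [1, 2, 3, 4, 5]))  # [1, 3, 5]

# Twice as many elements as the recursion limit allows frames.
long_input = list(range(sys.getrecursionlimit() * 2))
print(len(remove_if_loop(is_even, long_input)))
try:
    remove_if_recursive(is_even, long_input)
except RecursionError:
    print("explicit recursion ran out of stack")
```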

Scheme Infix to Postfix

Let me establish that this is part of a class assignment, so I'm definitely not looking for a complete code answer. Essentially we need to write a converter in Scheme that takes a list representing a mathematical equation in infix format and then output a list with the equation in postfix format.
We've been provided with the algorithm to do so, simple enough. The issue is that there is a restriction against using any of the available imperative language features. I can't figure out how to do this in a purely functional manner. This is our first introduction to functional programming in my program.
I know I'm going to be using recursion to iterate over the list of items in the infix expression, like so:
(define (itp ifExpr)
  (cond ((null? ifExpr) '())
        ; do some processing using cond clauses here
        (else (itp (cdr ifExpr)))))
I have all of the processing implemented (at least as best I can without knowing how to do the rest) but the algorithm I'm using to implement this requires that operators be pushed onto a stack and used later. My question is how do I implement a stack in this function that is available to all of the recursive calls as well?
(Updated in response to the OP's comment; see the new section below the original answer.)
Use a list for the stack and make it one of the loop variables. E.g.
(let loop ((stack (list))
           ... ; other loop variables here,
               ; like e.g. what remains of the infix expression
           )
  ... ; loop body
  )
Then whenever you want to change what's on the stack at the next iteration, well, basically just do so.
(loop (cons 'foo stack) ...)
Also note that if you need to make a bunch of "updates" in sequence, you can often model that with a let* form. This doesn't really work with vectors in Scheme (though it does work with Clojure's persistent vectors, if you care to look into them), but it does with scalar values and lists, as well as SRFI 40/41 streams.
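To illustrate threading a stack through recursive calls without solving the assignment itself, here is a Python sketch of a related but different task: evaluating an expression that is already in postfix form. The stack is an immutable tuple passed to each call, the way a named let would carry it as a loop variable; "pushing" and "popping" just build the next tuple. `eval_postfix` and the `OPS` table are assumptions for the demo.

```python
import operator

# Assumed operator set for the demo.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def eval_postfix(tokens, stack=()):
    """Evaluate a postfix token list with no mutation.

    The stack is a tuple with its top at the end. Each step returns a
    recursive call with the rest of the input and the next stack.
    """
    if not tokens:
        (result,) = stack  # exactly one value should remain
        return result
    token, rest = tokens[0], tokens[1:]
    if token in OPS:
        *below, left, right = stack  # pop two operands
        return eval_postfix(rest, (*below, OPS[token](left, right)))
    return eval_postfix(rest, stack + (token,))  # push an operand

print(eval_postfix([3, 4, 2, "*", "+"]))  # 11
```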
In response to your comment about loops being ruled out as an "imperative" feature:
(let loop ((foo foo-val)
           (bar bar-val))
  (do-stuff))
is syntactic sugar for
(letrec ((loop (lambda (foo bar) (do-stuff))))
  (loop foo-val bar-val))
letrec then expands to a form of let which is likely to use something equivalent to a set! or local define internally, but is considered perfectly functional. You are free to use some other symbol in place of loop, by the way. Also, this kind of let is called 'named let' (or sometimes 'tagged').
You will likely remember that the basic form of let:
(let ((foo foo-val)
      (bar bar-val))
  (do-stuff))
is also syntactic sugar over a clever use of lambda:
((lambda (foo bar) (do-stuff)) foo-val bar-val)
so it all boils down to procedure application, as is usual in Scheme.
Named let makes self-recursion prettier, that's all; and as I'm sure you already know, (self-) recursion with tail calls is the way to go when modelling iterative computational processes in a functional way.
Clearly this particular "loopy" construct lends itself pretty well to imperative programming too -- just use set! or data structure mutators in the loop's body if that's what you want to do -- but if you stay away from destructive function calls, there's nothing inherently imperative about looping through recursion or the tagged let itself at all. In fact, looping through recursion is one of the most basic techniques in functional programming and the whole point of this kind of homework would have to be teaching precisely that... :-)
If you really feel uncertain about whether it's ok to use it (or whether it will be clear enough that you understand the pattern involved if you just use a named let), then you could just desugar it as explained above (possibly using a local define rather than letrec).
I'm not sure I understand this all correctly, but what's wrong with this simpler solution:
First:
You test if your argument is indeed a list:
If yes: append the MAP of the function over the tail, (map postfixer (cdr lst)), to a list containing only the head. The map just applies postfixer again to each element of the tail.
If not, just return the argument unchanged.
My implementation is three lines of Scheme, and it translates:
(postfixer '(= 7 (/ (+ 10 4) 2)))
To:
(7 ((10 4 +) 2 /) =)
The recursion via map needs no looping, not even tail looping, no mutation and shows the functional style by applying map. Unless I'm totally misunderstanding your point here, I don't see the need for all that complexity above.
Edit: Oh, now I see it's infix, not prefix, to postfix. Well, the same general idea applies, except taking the second element instead of the first.
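The three-line Scheme version isn't shown, so here is a reconstruction of the described approach in Python, as a sketch (`infix_postfixer` is a hypothetical name). `infix_postfixer` follows the edit's "take the second element" idea and assumes every sub-expression is exactly `[left, op, right]`, so it does not handle operator precedence in flat lists such as `[1, "+", 2, "*", 3]`.

```python
def postfixer(expr):
    """Prefix to postfix on nested lists: recurse into the tail via map,
    then put the head (the operator) last. Atoms come back unchanged."""
    if not isinstance(expr, list):
        return expr
    head, *tail = expr
    return list(map(postfixer, tail)) + [head]

def infix_postfixer(expr):
    """Same idea for fully parenthesized binary infix: [left, op, right]."""
    if not isinstance(expr, list):
        return expr
    left, op, right = expr
    return [infix_postfixer(left), infix_postfixer(right), op]

print(postfixer(["=", 7, ["/", ["+", 10, 4], 2]]))
# [7, [[10, 4, '+'], 2, '/'], '=']
print(infix_postfixer([7, "=", [[10, "+", 4], "/", 2]]))
# [7, [[10, 4, '+'], 2, '/'], '=']
```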

Append! in Scheme?

I'm learning R5RS Scheme at the moment (from PocketScheme) and I find that I could use a function that is built into some variants of Scheme but not all: Append!
In other words, destructively changing a list.
I am not so much interested in the actual code as an answer as in understanding the process by which one could pass a list (or a vector or string) to a function and then mutate it.
example:
(define (append! lst var)
  ; builds and returns a new list; lst itself is left unchanged
  (append lst (list var)))
When I use the approach above, I have to do something like (define list (append! foo (bar))), and I would like something more generic.
Mutation, though allowed, is strongly discouraged in Scheme. PLT even went so far as to remove set-car! and set-cdr! (though they "replaced" them with set-mcar! and set-mcdr!). However, a spec for append! appeared in SRFI-1. This append! is a little different from yours: in the SRFI, the implementation may, but is not required to, modify the cons cells to append the lists.
If you want to have an append! that is guaranteed to change the structure of the list that's being appended to, you'll probably have to write it yourself. It's not hard:
(define (my-append! a b)
  (if (null? (cdr a))
      (set-cdr! a b)
      (my-append! (cdr a) b)))
To keep the definition simple, there is no error checking here, but it's clear that you will need to pass in a list of length at least 1 as a, and (preferably) a list (of any length) as b. The reason a must be at least length 1 is because you can't set-cdr! on an empty list.
Since you're interested in how this works, I'll see if I can explain. Basically, what we want to do is go down the list a until we get to the last cons pair, which is (<last element> . null). So we first see if a is already the last element in the list by checking for null in the cdr. If it is, we use set-cdr! to set it to the list we're appending, and we're done. If not, we have to call my-append! on the cdr of a. Each time we do this we get closer to the end of a. Since this is a mutation operation, we're not going to return anything, so we don't need to worry about forming our modified list as the return value.
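The same walk-to-the-last-pair idea can be sketched in Python with a small mutable cons-cell class. `Pair`, `from_items`, `to_items` and `my_append_bang` are hypothetical helpers for the illustration, with None standing in for the empty list. The walk is written as a loop rather than recursion because Python does not eliminate tail calls.

```python
class Pair:
    """A mutable cons cell; None plays the role of the empty list."""
    __slots__ = ("car", "cdr")

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr

def from_items(*items):
    """Build a chain of Pair cells from Python values."""
    head = None
    for item in reversed(items):
        head = Pair(item, head)
    return head

def to_items(cell):
    """Read a chain of Pair cells back into a Python list."""
    out = []
    while cell is not None:
        out.append(cell.car)
        cell = cell.cdr
    return out

def my_append_bang(a, b):
    """Destructively attach b to the end of a, like my-append!.

    a must be non-empty: walk to its last pair and overwrite that cdr.
    Returns nothing; the change is visible through every reference to a.
    """
    while a.cdr is not None:
        a = a.cdr
    a.cdr = b

xs = from_items(1, 2, 3)
alias = xs  # a second reference to the same first pair
my_append_bang(xs, from_items(4, 5))
print(to_items(alias))  # [1, 2, 3, 4, 5]
```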
Better late than never: here are my 2-3 cents on this topic...
(1) There's nothing wrong with using the destructive procedures in Scheme while there is a single reference to the structure being modified. So for example, building a large list efficiently, piecemeal via a single reference - and when complete, making that (now presumably not-to-be-modified) list known and referred to from various referents.
(2) I think APPEND! should behave like APPEND, only (potentially) destructively. And so APPEND! should expect any number of lists as arguments. Each list but the last would presumably be SET-CDR!'d to the next.
(3) The above definition of APPEND! is essentially NCONC from Maclisp and Common Lisp (and other Lisps).
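Point (2) can be sketched in Python as a variadic destructive append in the spirit of NCONC. This is an illustration under stated assumptions, not the behavior of any particular Scheme: `Pair`, `append_bang` and `to_items` are hypothetical, None stands for the empty list, and empty lists are simply skipped.

```python
class Pair:
    """A mutable cons cell; None stands for the empty list."""
    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr

def append_bang(*lists):
    """Destructively join any number of lists, in the spirit of NCONC.

    Each non-empty list except the last gets its final cdr overwritten
    with the next non-empty list. Returns the head of the joined list,
    or None if every list was empty.
    """
    result = None
    prev = None  # head of the most recently attached non-empty list
    for lst in lists:
        if lst is None:
            continue
        if prev is None:
            result = lst
        else:
            last = prev
            while last.cdr is not None:  # walk to the previous list's last pair
                last = last.cdr
            last.cdr = lst
        prev = lst
    return result

def to_items(cell):
    """Read a chain of Pair cells back into a Python list."""
    out = []
    while cell is not None:
        out.append(cell.car)
        cell = cell.cdr
    return out

joined = append_bang(Pair(1, Pair(2)), None, Pair(3), Pair(4, Pair(5)))
print(to_items(joined))  # [1, 2, 3, 4, 5]
```

The last list is never walked, because nothing gets attached after it.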

Resources