Improve performance of a ClojureScript program

I have a ClojureScript program that mainly performs math calculations on collections. It was developed in idiomatic, host-independent Clojure, so it's easy to benchmark. To my surprise (and contrary to what the answers to Which is faster, Clojure or ClojureScript (and why)? would suggest), the same code runs 5-10 times slower in ClojureScript than in its Clojure equivalent.
Here is what I did. I opened a lein repl and a browser repl at http://clojurescript.net/. Then I tried these snippets in both REPLs.
(time (dotimes [x 1000000] (+ 2 8)))
(let [coll (list 1 2 3)] (time (dotimes [x 1000000] (first coll))))
Then I opened a JavaScript console at the browser REPL and wrote a minimalist benchmark function:
function benchmark(count, fun) {
    var t0 = new Date();
    for (var i = 0; i < count; i++) {  // declare i to avoid an implicit global
        fun();
    }
    var t1 = new Date();
    return t1.getTime() - t0.getTime();
}
Back to the browser REPL:
(defn multiply [] (* 42 1.2))
Then I tried both native JavaScript multiplication and its ClojureScript variant in the JavaScript console:
benchmark(1000000, cljs.user.multiply);
benchmark(1000000, function(){ 42 * 1.2 });
What I found:
Native JavaScript math is comparable to math in Clojure.
ClojureScript is 5-10 times slower than either of them.
Now my question is, how can I improve the performance of my ClojureScript program?
There are some approaches I've considered so far:
Fall back to using mutable JavaScript arrays and objects behind the scenes. (Is this possible at all?)
Fall back to using native JavaScript math operators. (Is this possible at all?)
Use JavaScript arrays explicitly with (aget js/v 0).
Use a less ambitious Clojure-for-JavaScript implementation, like https://github.com/chlorinejs/chlorine or https://github.com/gozala/wisp. They generate more idiomatic JavaScript, but they don't support namespaces, which I'm using a lot.

JavaScript has explicit return, so
function () { 42 * 1.2 }
does nothing; you'll want to benchmark
function () { return 42 * 1.2 }
instead. This happens to be exactly what the ClojureScript version compiles to, so there won't be any difference (in ClojureScript, basic arithmetic functions in non-higher-order usage get inlined as regular operator-based JavaScript expressions).
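For reference, the compiled JavaScript for multiply looks roughly like this (a hand-written approximation, not actual compiler output):

cljs.user.multiply = function multiply() {
    return (42 * 1.2);  // the arithmetic is inlined as a plain JS expression
};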
Now, Clojure is definitely faster than ClojureScript at this point. Part of the reason is that Clojure is still more carefully tuned than ClojureScript, although ClojureScript is improving at a pretty great pace in this department. Another part is that Clojure has a more mature JIT to take advantage of (the modern JS engines, V8 in particular, are pretty great, but not quite HotSpot-grade just yet).
The magnitude of the difference is somewhat tricky to measure, though; the fact that JITs are involved means that a loop with a body free of any side effects, such as the one in the question, will likely be optimized away, possibly even on the first run through it (through the use of on-stack replacement, used by HotSpot and I think also V8 -- I'd have to check to be sure though). So, better to benchmark something like
(def arr (long-array 1))

;;; benchmark this
(dotimes [_ 1000000]
  (aset (longs arr) 0 (inc (aget (longs arr) 0))))
(The longs call avoids reflection in Clojure; a ^longs type hint would work as well.)
Finally, it certainly is the case, in both Clojure and ClojureScript, that for certain kinds of particularly performance-sensitive code it's best to use native arrays and such. Happily, there's no problem with doing so: on the ClojureScript side, you've got array, js-obj, aget, aset, make-array, you can use :mutable metadata on fields in deftype to be able to set! them in method bodies etc.
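For example, a minimal sketch (mine, not from the original answer) of the equivalent counter loop on the ClojureScript side, using a native JavaScript array:

(def arr (array 0))

;; same benchmark shape as the long-array version above
(dotimes [_ 1000000]
  (aset arr 0 (inc (aget arr 0))))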

ClojureScript math is JavaScript math. Yes, if performance is critical, use JavaScript arrays and the provided low-level operators; these are guaranteed to produce optimal code where possible (i.e. no higher-order usage). The ClojureScript persistent data structures are written this way: array mutation, arithmetic, bit twiddling.
I have a small example of efficient ClojureScript at http://github.com/swannodette/cljs-stl/blob/master/src/cljs_stl/spectral/demo.cljs that you might find useful as a guide.

Related

What is the performance cost of converting between seqs and vectors?

Many core Clojure functions return lazy sequences, even when vectors are passed into them. For example, if I had a vector of numbers, and wanted to filter them based on some predicate but get another vector back, I'd have to do something like this:
(into [] (filter my-pred my-vec))
Or:
(vec (filter my-pred my-vec))
Though I'm not sure if there's any meaningful difference between the two.
Is this operation expensive, or do you get it effectively for free, as when converting to/from a transient?
I understand that the seq is lazy, so nothing will actually get calculated until you plop it into the output vector, but is there overhead to converting between a seq and a concrete collection? Can it be characterized in terms of big-O, or does big-O not make sense here? What about the other way, converting from a vector to a seq?
There's an FAQ on the Clojure site describing good use cases for transducers, which can be handy for more complex transformations (more than just filtering, or when the predicate is fairly complex). Otherwise you can use filterv, which is in the core library and which you can assume does any reasonable optimization for you.
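For the simple case in the question, filterv returns a vector directly, with no separate conversion step:

(filterv my-pred my-vec)       ; like (into [] (filter my-pred my-vec))
(filterv even? [1 2 3 4 5])    ;=> [2 4]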
TL;DR Don't worry about it
Longer version:
The main cost is memory allocation/GC. Usually this is trivial. If you have too much data to fit simultaneously in RAM, the lazy version can save you.
If you want to measure toy problems, you can experiment with the Criterium library. Try powers of 10 from 10^2 up to 10^9.
;; assumes (require '[criterium.core :as crit])
(crit/quick-bench (println :sum (reduce + 0 (into [] (range (Math/pow 10 N))))))
for N = 2..9, with and without the (into [] ...) part.

Clojure comp doesn't tail-call-optimize (can throw StackOverflowError)

I got stuck on a Clojure program handling a very large amount of data (image data). When the image was larger than 128x128, the program would crash with a StackOverflowError. Because it worked for smaller images, I knew it wasn't an infinite loop.
There were lots of possible causes of high memory usage, so I spent time digging around: making sure I was using lazy sequences correctly, making sure to use recur where appropriate, and so on. The turning point came when I realized that this:
at clojure.core$comp$fn__5792.invoke(core.clj:2569)
at clojure.core$comp$fn__5792.invoke(core.clj:2569)
at clojure.core$comp$fn__5792.invoke(core.clj:2569)
referred to the comp function.
So I looked at where I was using it:
(defn pipe [val funcs]
  ((apply comp funcs) val))

(pipe the-image-vec
      (map
       (fn [index] (fn [image-vec] ( ... )))
       (range image-size)))
I was doing per-pixel operations, mapping a function over each pixel to process it. Interestingly, comp doesn't appear to benefit from tail-call optimization, or to apply the functions sequentially. It seems it was just composing things in the basic way, which, when there are 65k functions, understandably overflows the stack. Here's the fixed version:
(defn pipe [val funcs]
  (cond
    (= (count funcs) 0) val
    :else (recur ((first funcs) val) (rest funcs))))
recur ensures the recursion gets tail-call optimized, preventing a stack buildup.
If anybody can explain why comp works this way (or rather, doesn't work this way), I'd love to be enlightened.
First, a more straightforward MCVE:
(def fs (repeat 1e6 identity))
((apply comp fs) 99)
#<StackOverflowError...
But why does this happen? If you look at the (abridged) comp source:
(defn comp
  ([f g]
   (fn
     ([x] (f (g x)))))
  ([f g & fs]
   (reduce1 comp (list* f g fs))))
You can see that the whole thing is basically just two parts:
The first argument overload that does the main work of wrapping each composed function call in another function.
Reducing over the functions using comp.
I believe the first point is the problem. comp works by taking the list of functions and continually wrapping each set of calls in functions. Eventually, this will exhaust the stack space if you try to compose too many functions, as it ends up creating a massive function that's wrapping many other functions.
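Concretely (my illustration, not from the comp source), reducing comp over a sequence of functions builds nested closures:

;; (apply comp [f1 f2 f3 f4]) builds, in effect:
(fn [x]
  ((fn [x]
     ((fn [x] (f1 (f2 x)))  ; this innermost fn is (comp f1 f2)
      (f3 x)))
   (f4 x)))
;; each layer adds a stack frame when the composed function is finally called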
So, why can't TCO help here? Because unlike most StackOverflowErrors, recursion is not the problem. The recursive calls only ever reach one frame deep in the variadic case at the bottom. The problem is the building up of a massive function, which can't simply be optimized away.
Why were you able to "fix" it though? Because you have access to val, so you're able to evaluate the functions as you go instead of building up one function to call later. comp was written using a simple implementation that works fine for most cases, but fails for extreme cases like the one you presented. It's fairly trivial to write a specialized version that handles massive collections though:
(defn safe-comp [& fs]
  (fn [value]
    (reduce (fn [acc f]
              (f acc))
            value
            (reverse fs))))
Note, of course, that this doesn't handle multiple arities the way the core version does.
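As a quick sanity check (my addition, not part of the original answer), safe-comp preserves comp's right-to-left order and survives the MCVE from above:

((safe-comp inc (partial * 2)) 1)
;;=> 3, same as ((comp inc (partial * 2)) 1)

((apply safe-comp (repeat 1e6 identity)) 99)
;;=> 99, no StackOverflowError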
Honestly though, in three and a bit years of using Clojure, I've never once written (apply comp ...). While it's certainly possible you've hit a case I've never needed to deal with, it's more likely that you're using the wrong tool for the job here. When this code is complete, post it on Code Review and we may be able to suggest better ways of accomplishing what you're trying to do.

How can I speed up compilation of Common Lisp `IF` statements?

I have a system that generates decision trees and converts them into nested Common Lisp if statements with predicates that check if a variable value is >= or <= a given integer e.g.
(LAMBDA (V1 V2)
  (IF (>= V1 2)
      (IF (<= V1 3)
          (IF (<= V2 3)
              (IF (>= V2 2) 16 (IF (>= V2 1) 6 0))
              (IF (<= V2 4) 10 0))
          (IF (<= V1 4)
              (IF (>= V2 1) (IF (<= V2 3) 6 0) 0)
              0))
      (IF (>= V1 1)
          (IF (>= V2 2) (IF (<= V2 4) 10 0) 0)
          0)))
I then use eval to compile the Lisp code, producing functions that run much faster than interpreting the original decision tree. This compilation step takes surprisingly long, though: a function with 5000 nested IFs takes over a minute to compile (in Clozure Common Lisp on a PowerBook), even though generating the IF statement took about 100 milliseconds. Why does such a simple structure take so long? Is there anything I can do to substantially speed it up, some declaration maybe? I'd greatly appreciate any pointers you can offer.
The actual portable function to compile functions is called COMPILE.
You can tell the Common Lisp compiler to invest less work via low optimize qualities for speed, space, debug and compilation-speed - whether this has any influence depends on the implementation.
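A minimal sketch of combining COMPILE with low optimize qualities (the helper name is illustrative, and the effect of these qualities is implementation-dependent):

(defun compile-tree (vars body)
  ;; Splice an OPTIMIZE declaration into the generated lambda so the
  ;; compiler spends minimal effort on optimization, then compile it.
  (compile nil `(lambda ,vars
                  (declare (optimize (compilation-speed 3)
                                     (speed 0) (safety 0) (debug 0)))
                  ,body)))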
The Clozure CL compiler is usually not the brightest one, but it is relatively fast. Generally I think the compiler maintainers might be able to give you more hints on how to speed up compilation. I would look at three things:
tell the compiler to do less work: no type inference, no code optimization, no generation of debug information, no space-saving effort, ...
if necessary, tell the compiler things it would otherwise have to infer - for example, instead of relying on the compiler's type inference, declare all the types during code generation. But that only pays off if you actually get some advantage from the type declarations, like increased runtime safety or code optimizations.
the compiler itself may have speed penalties that depend on the size of the source code. For example, if compilation time is quadratic in code size, doubling the code would quadruple the compile time. Only the compiler maintainers know what to do in those cases - maybe they would need to implement more efficient data structures or similar.
The next option is to use a Lisp interpreter. Interpreters usually have very little definition-time overhead, but the code usually runs much slower at runtime. In some problem domains a mixed approach may work: compile code that changes rarely and interpret code that changes often.
You could certainly (declare (optimize (compilation-speed 3))), and maybe reduce other qualities (see http://clhs.lisp.se/Body/d_optimi.htm#optimize).
However, I'd guess that the slow compilation is caused by the optimizations the compiler makes, so turning them down is likely to make the result slower at execution time. But maybe not - you'd have to experiment.
I'd also think about what optimizations you could make yourself using your domain knowledge. Hints for that might also come from analyzing the output of disassemble on your generated functions.
Finally, maybe you can translate your decision trees into lookup tables, if the number of distinct values is not too big.
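For instance, a rough sketch of the lookup-table approach, assuming the inputs are small non-negative integers (the bounds and names are illustrative):

(defun tree->table (fn max-v1 max-v2)
  ;; Tabulate the decision function's output for every (v1, v2) pair up
  ;; to the given bounds; each lookup then becomes a single AREF.
  (let ((table (make-array (list (1+ max-v1) (1+ max-v2)))))
    (dotimes (v1 (1+ max-v1) table)
      (dotimes (v2 (1+ max-v2))
        (setf (aref table v1 v2) (funcall fn v1 v2))))))

The tree itself can then be evaluated interpreted, once per cell, with no compilation step at all.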

How can I tell if my tail-recursive Scheme function is being optimized correctly

I have a Scheme function whose basic form looks like this:
(define (foo param var)
  (cond ((end-condition) (return-something))
        ((other-end-condit) (return-something-else))
        (else
         (let ((newvar (if some-condition
                           (make-some-updated var)
                           (destructive-update! var))))
           (foo param newvar)))))
I feel like this is pretty clearly something that should be optimized into iteration at compile time, but when I compile it (with Chicken) it still runs incredibly slowly. (If I understand the R5RS spec - http://groups.csail.mit.edu/mac/ftpdir/scheme-reports/r5rs-html.old/r5rs_22.html - this looks like it should work.)
I wrote the exact same algorithm with a while loop in Python, and the interpreted program terminated in seconds. My compiled Scheme takes about 15 minutes, and I am positive the algorithm is the same.
I think this is a case of tail recursion not getting optimized, as I can't think what else it could possibly be, but I can't figure it out. Any ideas? For what it's worth, the var is a hash and the destructive update merely adds an element, although it also returns the updated hash to be passed in as newvar.
That function is indeed tail-recursive, so you're good there. However, tail recursion just means that stack space won't grow, not that your program is guaranteed to run fast. If you want to see if your program is really running tail-recursively, run it while watching the total memory taken by Chicken (and make sure you aren't allocating memory in make-some-updated, which you might be). If the memory grows, then Chicken isn't compiling your program correctly according to the standard.

Scheme Infix to Postfix

Let me establish that this is part of a class assignment, so I'm definitely not looking for a complete code answer. Essentially we need to write a converter in Scheme that takes a list representing a mathematical equation in infix format and outputs a list with the equation in postfix format.
We've been provided with the algorithm to do so, simple enough. The issue is that there is a restriction against using any of the available imperative language features. I can't figure out how to do this in a purely functional manner. This is our first introduction to functional programming in my program.
I know I'm going to be using recursion to iterate over the list of items in the infix expression, like so:
(define (itp ifExpr)
  ;; do some processing using a cond statement, then recurse
  (itp (cdr ifExpr)))
I have all of the processing implemented (at least as best I can without knowing how to do the rest) but the algorithm I'm using to implement this requires that operators be pushed onto a stack and used later. My question is how do I implement a stack in this function that is available to all of the recursive calls as well?
(Updated in response to the OP's comment; see the new section below the original answer.)
Use a list for the stack and make it one of the loop variables. E.g.
(let loop ((stack (list))
... ; other loop variables here,
; like e.g. what remains of the infix expression
)
... ; loop body
)
Then whenever you want to change what's on the stack at the next iteration, well, basically just do so.
(loop (cons 'foo stack) ...)
Also note that if you need to make a bunch of "updates" in sequence, you can often model that with a let* form. This doesn't really work with vectors in Scheme (though it does work with Clojure's persistent vectors, if you care to look into them), but it does with scalar values and lists, as well as SRFI 40/41 streams.
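As a small illustration (mine, not the original answer's) of such sequential "updates" with let*, rebinding the stack rather than mutating it:

(define (push-peek-pop stack)
  (let* ((stack (cons '+ stack))  ; "push" by rebinding
         (top   (car stack))      ; peek at the new top
         (stack (cdr stack)))     ; "pop" by rebinding again
    (list top stack)))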
In response to your comment about loops being ruled out as an "imperative" feature:
(let loop ((foo foo-val)
           (bar bar-val))
  (do-stuff))
is syntactic sugar for
(letrec ((loop (lambda (foo bar) (do-stuff))))
  (loop foo-val bar-val))
letrec then expands to a form of let which is likely to use something equivalent to a set! or local define internally, but is considered perfectly functional. You are free to use some other symbol in place of loop, by the way. Also, this kind of let is called 'named let' (or sometimes 'tagged').
You will likely remember that the basic form of let:
(let ((foo foo-val)
      (bar bar-val))
  (do-stuff))
is also syntactic sugar over a clever use of lambda:
((lambda (foo bar) (do-stuff)) foo-val bar-val)
so it all boils down to procedure application, as is usual in Scheme.
Named let makes self-recursion prettier, that's all; and as I'm sure you already know, (self-) recursion with tail calls is the way to go when modelling iterative computational processes in a functional way.
Clearly this particular "loopy" construct lends itself pretty well to imperative programming too -- just use set! or data structure mutators in the loop's body if that's what you want to do -- but if you stay away from destructive function calls, there's nothing inherently imperative about looping through recursion or the tagged let itself at all. In fact, looping through recursion is one of the most basic techniques in functional programming and the whole point of this kind of homework would have to be teaching precisely that... :-)
If you really feel uncertain about whether it's ok to use it (or whether it will be clear enough that you understand the pattern involved if you just use a named let), then you could just desugar it as explained above (possibly using a local define rather than letrec).
I'm not sure I understand this all correctly, but what's wrong with this simpler solution:
First:
Test whether your argument is indeed a list:
If yes: append the MAP of the function over the tail, (map postfixer (cdr lst)), and a list containing only the head, so the operator ends up last. The map just applies the postfixer again to each element of the tail.
If not, just return the argument unchanged.
My implementation is three lines of Scheme, and it translates:
(postfixer '(= 7 (/ (+ 10 4) 2)))
To:
(7 ((10 4 +) 2 /) =)
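Those three lines aren't shown here, but a sketch matching the description above might look like this (the name and details are my reconstruction, handling the prefix case as in the example; per the edit below, the infix variant would take the second element instead):

(define (postfixer lst)
  (if (pair? lst)
      ;; recurse into the arguments, then move the operator to the end
      (append (map postfixer (cdr lst)) (list (car lst)))
      lst))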
The recursion via map needs no looping, not even tail looping, and no mutation, and it shows the functional style by applying map. Unless I'm totally misunderstanding your point here, I don't see the need for all the complexity above.
Edit: Oh, now I read it again - infix, not prefix, to postfix. Well, the same general idea applies, except taking the second element rather than the first.
