optimizing simple Common Lisp gibbs sampler program - performance

As an exercise, I rewrote the example program in the blog post Gibbs sampler in various languages (revisited) by Darren Wilkinson.
The code appears below. This code runs on my (5 year old) machine in around 53 seconds, using SBCL 1.0.56, creating a core image using buildapp, and then running it with
time ./gibbs > gibbs.dat
Since this was how the timings were calculated for the other languages in the post, I thought I would do something comparable
The C code in the post runs in around 25 seconds. I'd like to try and speed up the Lisp code if possible.
##############################
gibbs.lisp
##############################
(eval-when (:compile-toplevel :load-toplevel :execute)
(require :cl-rmath) (setf *read-default-float-format* 'double-float))
(defun gibbs (N thin)
(declare (fixnum N thin))
(declare (optimize (speed 3) (safety 1)))
(let ((x 0.0) (y 0.0))
(declare (double-float x y))
(print "Iter x y")
(dotimes (i N)
(dotimes (j thin)
(declare (fixnum i j))
(setf x (cl-rmath::rgamma 3.0 (/ 1.0 (+ (* y y) 4))))
(setf y (cl-rmath::rnorm (/ 1.0 (+ x 1.0)) (/ 1.0 (sqrt (+ (* 2 x) 2))))))
(format t "~a ~a ~a~%" i x y))))
(defun main (argv)
(declare (ignore argv))
(gibbs 50000 1000))
Then I built the executable gibbs with calling sh gibbs.sh with gibbs.sh as
##################
gibbs.sh
##################
buildapp --output gibbs --asdf-tree /usr/share/common-lisp/source/ --asdf-tree /usr/local/share/common-lisp/source/ --load-system cl-rmath --load gibbs.lisp --entry main
I get 6 compiler notes when compiling with SBCL 1.0.56, which reproduce below. I'm not sure what to do about them, but would be grateful for any hints.
; compiling file "/home/faheem/lisp/gibbs.lisp" (written 30 MAY 2012 02:00:55 PM):
; file: /home/faheem/lisp/gibbs.lisp
; in: DEFUN GIBBS
; (SQRT (+ (* 2 X) 2))
;
; note: unable to
; optimize
; due to type uncertainty:
; The result is a (VALUES (OR (DOUBLE-FLOAT 0.0) (COMPLEX DOUBLE-FLOAT))
; &OPTIONAL), not a (VALUES FLOAT &REST T).
; (/ 1.0d0 (SQRT (+ (* 2 X) 2)))
;
; note: unable to
; optimize
; due to type uncertainty:
; The second argument is a (OR (DOUBLE-FLOAT 0.0)
; (COMPLEX DOUBLE-FLOAT)), not a (COMPLEX
; DOUBLE-FLOAT).
;
; note: forced to do static-fun Two-arg-/ (cost 53)
; unable to do inline float arithmetic (cost 12) because:
; The second argument is a (OR (DOUBLE-FLOAT 0.0) (COMPLEX DOUBLE-FLOAT)), not a DOUBLE-FLOAT.
; The result is a (VALUES (OR (COMPLEX DOUBLE-FLOAT) (DOUBLE-FLOAT 0.0))
; &OPTIONAL), not a (VALUES DOUBLE-FLOAT &REST T).
; (CL-RMATH:RGAMMA 3.0d0 (/ 1.0d0 (+ (* Y Y) 4)))
;
; note: doing float to pointer coercion (cost 13)
; (SQRT (+ (* 2 X) 2))
;
; note: doing float to pointer coercion (cost 13)
; (CL-RMATH:RNORM (/ 1.0d0 (+ X 1.0d0)) (/ 1.0d0 (SQRT (+ (* 2 X) 2))))
;
; note: doing float to pointer coercion (cost 13)
;
; compilation unit finished
; printed 6 notes
; /home/faheem/lisp/gibbs.fasl written
; compilation finished in 0:00:00.073
UPDATE 1: Rainer Joswig's answer pointed out that the argument of the SQRT might be negative, which
was the source of the obscure compiler notes I was seeing, namely
The result is a (VALUES (OR (DOUBLE-FLOAT 0.0) (COMPLEX DOUBLE-FLOAT))
; &OPTIONAL), not a (VALUES FLOAT &REST T).
The compiler was complaining that since it didn't know whether the value of the argument was positive,
the result could be a complex number. Since in the example, the value x is a sample variate from the gamma distribution,
it is always greater than 0. It was helpfully pointed out by Stephan in the SBCL user mailing list,
(see second message in the thread "Optimizing a simple Common Lisp Gibbs sampler program", that this could be resolved by declaring x to be greater than or zero as follows,
(declare (type (double-float 0.0 *) x))
See the Common Lisp Hyperspec for the relevant documentation about
FLOAT types
and
Interval Designators.
This seems to speed up the code a bit. It is now reliably below 52 seconds, but still, not much of a gain.
This also leaves the notes about
note: doing float to pointer coercion (cost 13)
If this is not fixable for some reason, I'd like to know why. Also, an explanation of what the note means would be interesting, regardless.
Specifically, what does the word pointer mean here? Is this related to the fact that C functions are being called? Also, cost 13 doesn't seem
very useful. What is being measured?
Also, Rainer suggested that it might be possible to reduce consing, which might reduce runtime.
I don't know whether either consing reduction is possible, or whether it would reduce runtime,
but I'd be interested in opinions and approaches. Overall, it seems not much can be done to improve the performance of this function.
Maybe it is too small and simple.

Note that Common Lisp has a THE special operator. It allows you to declare types for expression results. This for example allows you to narrow down types if possible.
For example what is the result of (SQRT somefloat) ? It can be a float, but it could be a complex number if somefloat is negative. If you know that somefloat is always positive (and only then), then you could write (the double-float (sqrt somefloat)). The compiler then might be able to generate more efficient code.
Also note that Common Lisp has OPTIMIZE declarations. If you want fastest code you need to make sure that you set them accordingly. Possibly only for individual functions. Usually it is better than changing optimization globally to be very aggressive.
Common Lisp has a function DISASSEMBLE which lets you look at the compiled code.
Then there is the macro TIME. Interesting information you get from it includes how much consing it does. With double-float arithmetic there is probably a large amount of consing. It would be useful to ask on the SBCL mailing list for help. Maybe someone can tell you how to avoid that consing.

This works for me:
(sqrt (the (double-float 0d0) (+ (* 2d0 x) 2d0)))

Related

Difference between usage of set! and define

In the following code:
(define x 14)
(display x) ; x = 14
(set! x 13)
(display x) ; x = 13
(define x 14)
(display x) ; x = 14
(set! y 13) ; SchemeError: y not found!
(display y)
What we a use case where someone would want to use set! over just define, if define can be used for everything that set! can be used for + the actual definition itself?
define creates a new binding between a name and a value (a variable), set! mutates an existing binding. These are not the same operation, languages like Python which confuse the operations notwithstanding.
In particular something like
(define x 1)
...
(define x 2)
is illegal: you can only create the variable once. Implementations may not check this, but that doesn't make it legal. Once you've created the binding, if you want to modify it you need to do that with set!.
A particular case where implementations (including Racket) are intentionally sloppy about this is when they are being used interactively. Quite often if you're interacting with the system you may want to say, for instance:
> (define square (λ (x) (+ x x)))
... ooops, that's not right, is it?
... Better fix it using the command-line editing
> (define square (λ (x) (* x x)))
In cases like that it's clearly better for the implementation just to allow this repeated definition of things, because it's going to make the life of users enormously easier.
But in programs such repeated definitions in the same scope are (almost?) always bugs, and they really ought to be caught: if you want to mutate a binding, use set!. Racket in particular will certainly puke on these.
Finally note that define is simply not legal in all the places set! is: even in Racket (which allows define in many more places than Scheme does) this is not even slightly legal code:
(define (foo x)
(define v x)
(if (odd? x)
(define v (* v 2))
(define v (/ v 2)))
v)
While this is
(define (foo x)
(define v x)
(if (odd? x)
(set! v (* v 2))
(set! v (/ v 2)))
v)
(It's still terrible code, but it is legal.).

SICP exercise 1.17 issue

I'm doing exercise 1.18 in SICP and I face some trouble. The goal is to make a procedure based on 2 previous exercises. This procedure implements so-called Russian peasant method (or Ancient Egyptian multiplication). I wrote a code, but one procedure just doesn't want to execute. Here's my code:
#lang sicp
(define (double a) (+ a a))
(define (halve a) (/ a 2))
(define (r_m a b)
(iter a b 0))
(define (iter a b n)
(cond ((= b 0) 0)
((even? a) (iter (halve a) (double b) (+ n b)))
(else (iter (halve a) (double b) n))))
So, when I call my procedure (r_m) with such arguments (r_m 13 19) it stops after 1st iteration.
(iter (halve a) (double b) (+ n b) (with arguments 13 and 19) gives this result: iter (13/2) 38 19
After that, program tries to check if 13/2 is odd. But it can't check such number (13/2), because odd? expects an integer, not this undone division.
For some reason, the halve procedure doesn't work when called. I don't really understand why, because other procedures (double and simple + n b) work fine.
Thank you in advance and I hope my grammar doesn't hurt you too much.
There are several things wrong with your program. Apart from anything else, even if halve worked the way you want it to work, how would b become zero? This is not the only problem!
However the particular case of halve happens because you are assuming programming languages to do what they normally do: incorrect arithmetic which is convenient for the machine, rather than correct arithmetic which is convenient for humans. Scheme tries hard to do correct arithmetic. What, mathematically, is 13/2? It's not 6, or 7, or 3, it's 13/2, or 6 + 1/2: it's a rational number, not an integer.
If you want the next integer below 13/2, the way you want to get it is by subtracting 1 before you divide: (halve (- 13 1)) is 6, exactly. So if you change that cond clause to have (halve (- a 1)) your program will be closer to working (but, in fact it will then fail to terminate, so in a sense it will be further from working...).

Scheme : recursive process much faster than iterative

I am studying SICP and wrote two procedures to compute the sum of 1/n^2, the first generating a recursive process and the second generating an iterative process :
(define (sum-rec a b)
(if (> a b)
0
(exact->inexact (+ (/ 1 (* a a)) (sum-rec (1+ a) b)))))
(define (sum-it a b)
(define (sum_iter a tot)
(if (> a b)
tot
(sum_iter (1+ a) (+ (/ 1 (* a a)) tot))))
(exact->inexact (sum_iter a 0)))
I tested that both procedures give exactly the same results when called with small values of b, and that the result is approaching $pi^2/6$ as b gets larger, as expected.
But surprisingly, calling (sum-rec 1 250000) is almost instantaneous whereas calling (sum-it 1 250000) takes forever.
Is there an explanation for that?
As was mentioned in the comments, sum-it in its present form is adding numbers using exact arithmetic, which is slower than the inexact arithmetic being used in sum-rec. To do an equivalent comparison, this is how you should implement it:
(define (sum-it a b)
(define (sum_iter a tot)
(if (> a b)
tot
(sum_iter (1+ a) (+ (/ 1.0 (* a a)) tot))))
(sum_iter a 0))
Notice that replacing the 1 with a 1.0 forces the interpreter to use inexact arithmetic. Now this will return immediately:
(sum-it 1 250000)
=> 1.6449300668562465
You can reframe both of these versions so that they do exact or inexact arithmetic appropriately, simply by controlling what value they use for zero and relying on the contagion rules. These two are in Racket, which doesn't have 1+ by default but does have a nice syntax for optional arguments with defaults:
(define (sum-rec low high (zero 0.0))
(let recurse ([i low])
(if (> i high)
zero
(+ (/ 1 (* i i)) (recurse (+ i 1))))))
(define (sum-iter low high (zero 0.0))
(let iterate ([i low] [accum zero])
(if (> i high)
accum
(iterate (+ i 1) (+ (/ 1 (* i i)) accum)))))
The advantage of this is you can see the performance difference easily for both versions. The disadvantage is that you'd need a really smart compiler to be able to optimize the numerical operations here (I think, even if it knew low and high were machine integers, it would have to infer that zero is going to be some numerical type and generate copies of the function body for all the possible types).

Differences between two similar definitions

Is there any difference between
(define make-point cons)
and
(define (make-point x y)
(cons x y))
?
Is one more efficient than the other, or are they totally equivalent?
There are a few different issues here.
As Oscar Lopez points out, one is an indirection, and one is a wrapper. Christophe De Troyer did some timing and noted that without optimization, the indirection can take twice as much time as the indirection. That's because the alias makes the value of the two variables be the same function. When the system evaluates (cons …) and (make-point …) it evaluates the variables cons and make-point and gets the same function back. In the indirection version, make-point and cons are not the same function. make-point is a new function that makes another call to cons. That's two function calls instead of one. So speed can be an issue, but a good optimizing compiler might be able to make the difference negligible.
However, there's a very important difference if you have the ability to change the value of either of these variables later. When you evaluate (define make-point kons), you evaluate the variable kons once and set the value of make-point to that one value that you get at that evaluation time. When you evaluate (define (make-point x y) (kons x y)), you're setting the value of make-point to a new function. Each time that function is called, the variable kons is evaluated, so any change to the variable kons is reflected. Let's look at an example:
(define (kons x y)
(cons x y))
(display (kons 1 2))
;=> (1 . 2)
Now, let's write an indirection and an alias:
(define (kons-indirection x y)
(kons x y))
(define kons-alias kons)
These produce the same output now:
(display (kons-indirection 1 2))
;=> (1 . 2)
(display (kons-alias 1 2))
;=> (1 . 2)
Now let's redefine the kons function:
(set! kons (lambda (x y) (cons y x))) ; "backwards" cons
The function that was a wrapper around kons, that is, the indirection, sees the new value of kons, but the alias does not:
(display (kons-indirection 1 2))
;=> (2 . 1) ; NEW value of kons
(display (kons-alias 1 2))
;=> (1 . 2) ; OLD value of kons
Semantically they're equivalent: make-point will cons two elements. But the first one is creating an alias of the cons function, whereas the second one is defining a new function that simply calls cons, hence it'll be slightly slower, but the extra overhead will be negligible, even inexistent if the compiler is good.
For cons, there is no difference between your two versions.
For variadic procedures like +, the difference between + and (lambda (x y) (+ x y)) is that the latter constrains the procedure to being called with two arguments only.
Out of curiosity I did a quick and dirty experiment. It seems to be the case that just aliasing cons is almost twice as fast than wrapping it in a new function.
(define mk-point cons)
(define (make-point x y)
(cons x y))
(let ((start (current-inexact-milliseconds)))
(let loop ((n 100000000))
(mk-point 10 10)
(if (> n 0)
(loop (- n 1))
(- (current-inexact-milliseconds) start))))
(let ((start (current-inexact-milliseconds)))
(let loop ((n 100000000))
(make-point 10 10)
(if (> n 0)
(loop (- n 1))
(- (current-inexact-milliseconds) start))))
;;; Result
4141.373046875
6241.93212890625
>
Ran in DrRacket 5.3.6 on Xubuntu.

Why doesn't this Racket code terminate?

I'm reading about lazy evaluation and having trouble understanding a basic example they gave.
#lang racket
(define (bad-if x y z)
(if x y z))
(define (factorial-wrong x)
(bad-if (= x 0)
1
(* x (factorial-wrong (- x 1)))))
(factorial-wrong 4)
I'm a little confused as to why this program never terminates. I know the following code works just fine:
(define (factorial x)
(if (= x 0)
1
(* x (factorial (- x 1)))))
(factorial 4)
So I'm assuming it has something to do with scope. I tried step by step debugging and factorial-wrong executes the recursion function even when x is mapped to 0.
The standard if
(if test-expr then-expr else-expr)
will only evaluate either then-expr or else-expr, depending on test-expr, because this if is either a special form or a syntactic extension based on a special form, which means it doesn't follow the normal evaluation rules.
bad-if, on the other hand, is a standard procedure. In that case, Scheme first evaluates both expressions since they are parameters to the procedure bad-if before actually executing bad-if. So, even for x = 0, (* x (factorial -1)) will be evaluated, which will in turn evaluate (* x (factorial -2)) and so on, in an endless loop.
Use the stepper!
To be more specific:
Snip the #lang racket off the top of your program
Change the language level to "Intermediate Student"
Click on the Step button. Watch carefully to see where things go off the rails.

Resources