Scheme: why is Internal Definition faster than External Definition? - performance

I tried running the program below
(define (odd-internal x)
  (define (even x)
    (if (zero? x)
        #t
        (odd-internal (sub1 x))))
  (if (zero? x)
      #f
      (even (sub1 x))))
(define (odd-external x)
  (if (zero? x)
      #f
      (even (sub1 x))))
(define (even x)
  (if (zero? x)
      #t
      (odd-external (sub1 x))))
(begin (display "Using internal definition\n")
       (time (odd-internal 40000000)))
(begin (display "Using external definition\n")
       (time (odd-external 40000000)))
This is the result in Racket
Using internal definition
cpu time: 166 real time: 165 gc time: 0
#f
Using external definition
cpu time: 196 real time: 196 gc time: 0
#f
As you can see, the version using an internal definition is noticeably faster. I tried running it on Chez Scheme and the result is similar. Why is that?

I was surprised that there was a difference, so following the comments on Lexi's answer I split the two versions into separate files, internal.rkt and external.rkt, then compiled and decompiled them like this:
raco make external.rkt
raco decompile compiled/external_rkt.zo
This goes one step further than looking at the fully expanded program in the macro stepper. The output is not very human readable, so I have prettified it while keeping the most important parts intact:
(define (odd-external x1)
  (if (zero? x1)
      '#f
      (let ((x2 (sub1 x1)))
        (if (zero? x2)
            '#t
            (let ((x3 (sub1 x2)))
              (if (zero? x3)
                  '#f
                  (let ((x4 (sub1 x3)))
                    (if (zero? x4)
                        '#t
                        (let ((x5 (sub1 x4)))
                          (if (zero? x5) '#f (even (sub1 x5))))))))))))
(define (even x1)
  (if (zero? x1)
      '#t
      (let ((x2 (sub1 x1)))
        (if (zero? x2)
            '#f
            (let ((x3 (sub1 x2)))
              (if (zero? x3)
                  '#t
                  (let ((x4 (sub1 x3)))
                    (if (zero? x4)
                        '#f
                        (let ((x5 (sub1 x4)))
                          (if (zero? x5)
                              '#t
                              (let ((x6 (sub1 x5)))
                                (if (zero? x6)
                                    '#f
                                    (let ((x7 (sub1 x6)))
                                      (if (zero? x7)
                                          '#t
                                          (odd-external (sub1 x7))))))))))))))))
Nothing special here. The compiler unrolls the loop a certain number of times and constant folds. Notice that we still have mutual recursion and that the two procedures are unrolled 5 and 7 times. Even the argument was constant folded: the compiler had replaced my call with (even 399999995), so it had run the code for 5 steps before giving up. The interesting thing is the internal version:
(define (odd-internal x1)
  (if (zero? x1)
      '#f
      (let ((x2 (sub1 x1)))
        (if (zero? x2)
            '#t
            (let ((x3 (sub1 x2)))
              (if (zero? x3)
                  '#f
                  (let ((x4 (sub1 x3)))
                    (if (zero? x4)
                        '#t
                        (let ((x5 (sub1 x4)))
                          (if (zero? x5)
                              '#f
                              (let ((x6 (sub1 x5)))
                                (if (zero? x6)
                                    '#t
                                    (let ((x7 (sub1 x6)))
                                      (if (zero? x7)
                                          '#f
                                          (let ((x8 (sub1 x7)))
                                            (if (zero? x8)
                                                '#t
                                                (odd-internal
                                                 (sub1 x8))))))))))))))))))
It is no longer mutually recursive, since it calls itself after 8 steps. Each call does 8 steps, while the other version did 7, then 5. In two calls the internal one has done 16 steps while the other has done 12. The initial call in the internal version is (odd-internal '399999992), so the compiler ran 8 steps before giving up.
I guess the code inside the functions is open coded at the decompiler level and each step is very cheap, which makes the number of calls the reason for the 25% speed increase. After all, the internal version covers 16 steps in two calls where the external one covers 12, so it needs 25% fewer calls for the same work, which coincides with the difference in computing time. This is speculation based on observation, so it would be interesting to have a comment from Lexi on this.

Your numbers are too small to be meaningful. The difference between 166 ms and 196 ms is, in absolute terms, tiny. Who knows what other factors could be influencing that? VM warmup time, differences in memory allocation, or any host of other things could easily cause a discrepancy of that size. To be sure, you should make the numbers much bigger.
On my machine, running Racket v7.0, I increased the arguments from 40000000 to 1000000000 and ran the program. The results were 2.361 s for the internal definition case and 2.212 s for the external definition case. Given the sorts of factors listed above, that difference is too small to be meaningful.
Benchmarking is hard, and benchmarking languages that run on VMs and are JIT compiled is harder. Even if you account for warmup and GC, run lots of iterations and take the averages, and generally try to do things right, the results you get could still be nearly meaningless, as the 2017 OOPSLA paper Virtual Machine Warmup Blows Hot and Cold explains:
Virtual Machines (VMs) with Just-In-Time (JIT) compilers are traditionally thought to execute programs in two phases: the initial warmup phase determines which parts of a program would most benefit from dynamic compilation, before JIT compiling those parts into machine code; subsequently the program is said to be at a steady state of peak performance. Measurement methodologies almost always discard data collected during the warmup phase such that reported measurements focus entirely on peak performance. We introduce a fully automated statistical approach, based on changepoint analysis, which allows us to determine if a program has reached a steady state and, if so, whether that represents peak performance or not. Using this, we show that even when run in the most controlled of circumstances, small, deterministic, widely studied microbenchmarks often fail to reach a steady state of peak performance on a variety of common VMs. Repeating our experiment on 3 different machines, we found that at most 43.5% of ⟨VM, benchmark⟩ pairs consistently reach a steady state of peak performance.
Emphasis mine. Make sure you’re measuring what you think you’re measuring.
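As a rough illustration of the advice above, here is a minimal Racket sketch that times each variant several times instead of once. The helper name cpu-times, the run count of 5, and the input size n are assumptions made for this example, not something from the question. It only reduces noise a little and does not address the warmup issues the paper describes.

```racket
#lang racket
;; Hypothetical helper: call `thunk` `runs` times and collect the CPU time
;; (in milliseconds) reported by time-apply for each run.
(define (cpu-times thunk runs)
  (for/list ([i (in-range runs)])
    (collect-garbage) ; try to start every run from a similar heap state
    (let-values ([(results cpu real gc) (time-apply thunk '())])
      cpu)))

;; The two variants from the question.
(define (odd-internal x)
  (define (even x)
    (if (zero? x) #t (odd-internal (sub1 x))))
  (if (zero? x) #f (even (sub1 x))))

(define (odd-external x)
  (if (zero? x) #f (even (sub1 x))))
(define (even x)
  (if (zero? x) #t (odd-external (sub1 x))))

;; Assumed input size; raise it until a single run takes several seconds.
(define n 10000000)

(define internal-times (cpu-times (lambda () (odd-internal n)) 5))
(define external-times (cpu-times (lambda () (odd-external n)) 5))

(printf "internal (ms, sorted): ~a\n" (sort internal-times <))
(printf "external (ms, sorted): ~a\n" (sort external-times <))
```

If the sorted lists overlap heavily, the single-run difference was probably noise.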

First off this will depend on your implementation since nested defines can be implemented in more than one way. On my Chez Scheme 9.5 setup I get a rather consistent 25% faster run time when I use odd-internal.
Now, for the why. This happens because nested defines (i.e. internal defines) are wildly different than actual defines.
When you use define at the top level, you are adding a new record to the free-variables table. Whenever you evaluate a variable that is not bound by any lambda, it is looked up in the free-variables (hash) table. This lookup is very efficient, but it's slower than fetching a bound variable. So when you calculate (odd-external 40000000), you fetch even and odd-external from that table about 40 million times. Even with caching and other cool stuff, this is still work to be done.
In contrast, nested defines create a bound variable. One way they are implemented is as nested lambda/let/letrec expressions. That way the odd-internal function would be transformed into[1]:
(define (odd-internal x)
  (let ((even (lambda (x)
                (if (zero? x)
                    #t
                    (odd-internal (sub1 x))))))
    (if (zero? x)
        #f
        (even (sub1 x)))))
(Which is a simplification of what Chez Scheme does).
Now every time you apply odd-internal it's still a free-variable, so you hash it and find it in the free-variables table. However, when you apply even, you just grab it from the environment (which can cost as little as a single memory dereference even without cool tricks).
A fun experiment would be to define both odd and even as bound variables, so all 40mil variable fetches would benefit from quick bound-variable fetch times. I saw a 16% improvement on top of the original 25%. Here's the code:
(define (odd-quick x)
  (define (odd x) (if (zero? x) #f (even (sub1 x))))
  (define (even x) (if (zero? x) #t (odd (sub1 x))))
  (odd x))
[1] let is syntactic sugar for a lambda application, so you can read that code as:
(define (odd-internal x)
  ((lambda (even)
     (if (zero? x)
         #f
         (even (sub1 x))))
   (lambda (x)
     (if (zero? x)
         #t
         (odd-internal (sub1 x))))))

Related

Does a non-null Scheme list contain at least one atom?

In The Little Schemer (4th Ed.) it is claimed that a list for which null? is false contains at least one atom, or so I understand from my reading of the text.
This doesn't make sense to me, since (atom? '()) is false, and we can put empty lists into a list to make it non-null:
> (null? '(()))
#f
So my question is, is this a mistake in my reading, or a matter of definitions? Since it's not in the errata I assume such a well-studied book wouldn't have a mistake like this.
If we considered (()) to be the same as (() . ()) or even (cons '() '()) and then considered cons an atom then I could see how you can get there, but I don't think that's what's going on.
(this was tested in Racket 7.0, with the definition of atom? given in the book, i.e.
(define atom?
  (lambda (x)
    (and (not (pair? x)) (not (null? x)))))
I know this doesn't cover funny Racket features, but should be sufficient here.)
lat is assumed to be a list of atoms at that point in the book.
If it's not empty, by definition it contains some atoms in it.
It's not about Lisp, it's about the book's presentation.
I think lat indicates list of atoms. Thus if lat is not null?, then it needs to contain at least one atom.
There is a procedure called lat? defined as such:
(define lat?
  (lambda (l)
    (cond
      ((null? l) #t)
      ((atom? (car l))
       (lat? (cdr l)))
      (else #f))))
(lat? '(())) ; ==> #f
So by definition '(()) is not a lat, and thus the statement does not apply to that list.
A list can contain any type of element, including the empty list and other lists, neither of which is an atom. A lat is restricted to a flat list with only atomic elements.
As a concept an “atom” is something that can not be broken into smaller parts. A number 42 is an atom, a list (42 43) is not an atom since it contains two smaller parts (namely the numbers 42 and 43). Since an empty list does not contain any smaller parts, it is by this logic an atom.
Now let’s attempt to implement an atom? predicate that determines whether its input is an atom.
(define (atom? x)
  (cond
    [(number? x) #t]
    [(symbol? x) #t]
    [(char? x) #t]
    ...
    [else #f]))
Here the ... needs to be replaced with a test for every atomic data type supported by the implementation. This can potentially be a long list. In order to avoid this, we can try to be clever:
(define (atom? x)
  (not (list? x)))
This will correctly return false for non-empty lists, and true for numbers, characters etc. However it will return false for the empty list.
Since it is up to the authors of the book to define the term “atom” (the word does not appear in the language standard) they might have opted for the above simple definition.
Note that the definition as non-list is misleading when the language contains other compound data structures such as vectors and structures. If I recall correctly the only compound data structure discussed in the book is lists.
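To make the difference between the two definitions concrete, here is a small Racket sketch that puts them side by side. The name non-list-atom? is made up for this example so that both predicates can coexist. Both agree on the empty list and on vectors, but they disagree on a dotted pair, because list? only accepts proper lists.

```racket
#lang racket
;; The book's definition, as quoted in the question.
(define (atom? x)
  (and (not (pair? x)) (not (null? x))))

;; The shortcut definition from this answer, under a hypothetical name.
(define (non-list-atom? x)
  (not (list? x)))

(atom? 'a)                    ; #t
(non-list-atom? 'a)           ; #t
(atom? '())                   ; #f
(non-list-atom? '())          ; #f
(atom? (vector 1 2))          ; #t, although a vector has parts
(non-list-atom? (vector 1 2)) ; #t
(atom? '(1 . 2))              ; #f, a pair is never an atom here
(non-list-atom? '(1 . 2))     ; #t, a dotted pair is not a proper list
```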

scheme - why does this function take much longer to run

I have these two functions, foldl and foldr. After writing a chain of function definitions, I tested two alternatives. The only difference I could find between the two chains of function calls and definitions was which of these two functions they use, and for some reason the chain that calls foldr takes exceptionally longer (tested with large input).
Here is foldl:
(define (foldl op z ls)
  (if (null? ls)
      z
      (foldl op (op z (car ls)) (cdr ls))))
and here is foldr:
(define (foldr op z ls)
  (if (null? ls)
      z
      (op (car ls) (foldr op z (cdr ls)))))
My question is: why does the chain that calls foldr take so much longer to run than the chain that calls foldl?
Your implementation of foldl is tail recursive because the foldl is the last function called each time through. Your implementation of foldr is not tail recursive because op is the last thing called each time through.
Ok, so what does that mean?
When foldl calls itself each time through, op has already been applied and returned a value. The compiler can optimize this into an equivalent loop. In contrast, when foldr calls itself, op still needs to be applied and so the program must remember to apply op after the recursive call to foldr returns a value. Unfortunately, the recursive call to foldr cannot return a value until op is applied to the next recursive call to foldr and so on until the end of the list. Then at the end of the list, each of the pending applications of op must be applied one by one.
Remembering all the applications of op that are pending takes time and memory space.
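One common way around the pending applications, which is not something the question's code does, is to reverse the list first and then fold from the left with the argument order of op flipped. The sketch below is written in Racket under the hypothetical names my-foldl, my-foldr and foldr-via-reverse. It trades the pending calls for one extra pass and a reversed copy of the list, so whether it is actually faster depends on the implementation and the input.

```racket
#lang racket
;; The two folds from the question, renamed to avoid clashing with built-ins.
(define (my-foldl op z ls)
  (if (null? ls)
      z
      (my-foldl op (op z (car ls)) (cdr ls))))

(define (my-foldr op z ls)
  (if (null? ls)
      z
      (op (car ls) (my-foldr op z (cdr ls)))))

;; Hypothetical alternative: same result as my-foldr, but every recursive
;; call is a tail call. The question's foldl passes (accumulator element)
;; to op, so the arguments are flipped to match foldr's (element accumulator).
(define (foldr-via-reverse op z ls)
  (my-foldl (lambda (acc x) (op x acc)) z (reverse ls)))

(my-foldr cons '() '(1 2 3))          ; '(1 2 3)
(foldr-via-reverse cons '() '(1 2 3)) ; '(1 2 3)
(my-foldr - 0 '(1 2 3))               ; 2
(foldr-via-reverse - 0 '(1 2 3))      ; 2
```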

Is there an elisp function like "cl-every" that returns nil on lists of unequal length?

I want to compare two lists in elisp using (cl-every #'eq list1 list2). However, this can return t if one of the lists is longer than the other, and I don't want that. I could call length on the lists, but then I'm traversing each list twice, which is needlessly inefficient. Is there a function like cl-every that also checks for equal lengths as well?
OOTB
I don't think there is such a function OOTB.
Roll your own
I don't think it is hard to implement one, just modify cl-every.
Length
Note that length is implemented in C and, unless your lists are huge, should not noticeably affect performance:
(require 'cl-lib) ; provides cl-every
(defun list-elements-eq (l1 l2)
  (and (= (length l1)
          (length l2))
       (cl-every #'eq l1 l2)))
Equal
You can also use equal for your lists: since equal starts with testing for eq, it will be a weaker relationship than what you want.
Beware that in Common Lisp equal may not terminate for circular structures.
Emacs Lisp is "smarter": it detects circularity:
(setq x (list 1))
(setcdr x x)
(setq y (list 1))
(setcdr y y)
(eq x y)
(equal x y)
Debugger entered--Lisp error: (circular-list #1=(1 . #1#))
equal(#1=(1 . #1#) #2=(1 . #2#))
(arguably, it should return t instead).
Common Lisp
Generally speaking, Common Lisp is a better language for "heavy lifting".
If your lists are huge, you might want to use it instead of Emacs Lisp:
(defun list-elements-eq (l1 l2)
  (if (endp l1)
      (endp l2)
      (and (not (endp l2))
           (eq (pop l1) (pop l2))
           (list-elements-eq l1 l2))))
This is tail-recursive, and any decent CL will optimize recursion away (possibly depending on the optimize setting).
You don't want to use deep recursion in ELisp (see Why is there no tail recursion optimization in Emacs lisp, not but like other scheme?).

What is an atom in scheme?

I thought that 'a is supposed to be an atom in Scheme. But when I use an online interpreter and evaluate the following, I get back #f.
(atom? 'a)
The Scheme standard does not define atom?; the usual definition is
(define (atom? x)
  (and (not (pair? x))
       (not (null? x))))
With that definition,
> (atom? 'a)
#t
so I think you are right and the SISC online REPL is wrong.

How to write sort-nums or a better version

I'm trying to write a procedure called sort-nums that keeps only the numbers in a list and sorts them, like this:
(define (sort-nums lst)
(if (null? lst) null
(if (number? (car lst)
i want this part to keep the number and then delete anything that isnt a number
(sort (cons (car lst) (sort-nums (cdr lst))))))
Would this work, or would I need to write it in a different way? An example showing that it works would be:
(sort-nums (list 'a 'c 24 'f 'g 16))
(16 24)
You can make your life easier by sorting and stripping numbers separately. Try
(sort (list-transform-positive '(a 2 b 1)
        number?)
      <)
First we select only those things that are numbers (using list-transform-positive), then we sort them ascending (using sort).
As a general tip, you will find Lisp much easier to work with if you indent intelligently.
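The question's code uses null, which suggests Racket, where list-transform-positive is not available by default. Assuming Racket, the same select-then-sort idea can be sketched with the built-in filter and sort. Note that < only accepts real numbers, so this assumes the list contains no complex numbers.

```racket
#lang racket
;; Keep only the numbers, then sort them in ascending order.
(define (sort-nums lst)
  (sort (filter number? lst) <))

(sort-nums (list 'a 'c 24 'f 'g 16)) ; '(16 24)
```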
