I am going through some list functions and programming them in Scheme. I am doing this for fun. This is not a school/college assignment. It is humbling to realize that I am stumbling with very basic functions and statements!
To my surprise, this member? function consistently raises an error, with more than one Scheme REPL reporting that I am trying to apply the non-function #t.
(define member?
  (lambda (atm lst)
    (cond
      ((null? lst) #f)
      ((equal? atm (car lst)) #t)
      (else
       ((member? atm (cdr lst)))))))
What am I doing wrong?
((member? atm (cdr lst)))
Assuming (member? atm (cdr lst)) evaluates to #t, the above is equivalent to (#t), so it tries to apply #t as a function, which is what the error message is telling you.
Remove the outer parentheses and you'll get the result you want.
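For reference, here is a minimal sketch of the function with the extra parentheses removed, so the recursive call's result is returned rather than applied. It uses only standard Scheme and the names from the question.

```scheme
(define member?
  (lambda (atm lst)
    (cond
      ((null? lst) #f)
      ((equal? atm (car lst)) #t)
      ;; No extra parentheses here: return the recursive result directly
      ;; instead of trying to call it as a procedure.
      (else (member? atm (cdr lst))))))

(member? 'c '(a b c)) ; => #t
(member? 'z '(a b c)) ; => #f
```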
In The Little Schemer (4th Ed.) it is claimed that a list for which null? is false contains at least one atom, or so I understand from my reading of the text.
This doesn't make sense to me, since (atom? '()) is false, and we can stick those into a list to make it non-null:
> (null? '(()))
#f
So my question is, is this a mistake in my reading, or a matter of definitions? Since it's not in the errata I assume such a well-studied book wouldn't have a mistake like this.
If we considered (()) to be the same as (() . ()) or even (cons '() '()) and then considered cons an atom then I could see how you can get there, but I don't think that's what's going on.
(this was tested in Racket 7.0, with the definition of atom? given in the book, i.e.
(define atom?
  (lambda (x)
    (and (not (pair? x)) (not (null? x)))))
I know this doesn't cover funny Racket features, but should be sufficient here.)
lat is assumed to be a list of atoms at that point in the book.
If it's not empty, by definition it contains some atoms in it.
It's not about Lisp, it's about the book's presentation.
I think lat indicates list of atoms. Thus if lat is not null?, then it needs to contain at least one atom.
There is a procedure called lat? defined as such:
(define lat?
  (lambda (l)
    (cond
      ((null? l) #t)
      ((atom? (car l))
       (lat? (cdr l)))
      (else #f))))
(lat? '(())) ; ==> #f, so by definition '(()) is not a lat and thus the statement does not apply to that list.
A list can contain elements of any type, including the empty list and other lists, neither of which is an atom. A lat is restricted to a flat list with only atomic elements.
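To make the distinction concrete, here is a small sketch that repeats the book-style definitions of atom? and lat? quoted above and tries them on a few lists. The sample lists are only illustrations.

```scheme
(define atom?
  (lambda (x)
    (and (not (pair? x)) (not (null? x)))))

(define lat?
  (lambda (l)
    (cond
      ((null? l) #t)
      ((atom? (car l)) (lat? (cdr l)))
      (else #f))))

(lat? '(a b c))   ; => #t, a flat list of atoms
(lat? '(()))      ; => #f, the single element '() is not an atom
(lat? '(a (b) c)) ; => #f, contains a nested list
```

So '(()) is a non-null list, but it is not a lat, which is why the book's statement about non-null lats does not cover it.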
As a concept an “atom” is something that cannot be broken into smaller parts. A number 42 is an atom, a list (42 43) is not an atom since it contains two smaller parts (namely the numbers 42 and 43). Since an empty list does not contain any smaller parts, it is by this logic an atom.
Now let’s attempt to implement an atom? predicate that determines whether its input is an atom.
(define (atom? x)
  (cond
    [(number? x) #t]
    [(symbol? x) #t]
    [(char? x) #t]
    ...
    [else #f]))
Here the ... needs to be replaced with a test for every atomic data type supported by the implementation. This can potentially be a long list. In order to avoid this, we can try to be clever:
(define (atom? x)
  (not (list? x)))
This will correctly return false for non-empty lists, and true for numbers, characters etc. However it will return false for the empty list.
Since it is up to the authors of the book to define the term “atom” (the word does not appear in the language standard) they might have opted for the above simple definition.
Note that the definition as non-list is misleading when the language contains other compound data structures such as vectors and structures. If I recall correctly the only compound data structure discussed in the book is lists.
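As a rough illustration of that caveat, the sketch below compares the pair-based definition with the non-list shortcut. The names atom-book? and atom-not-list? are hypothetical, chosen only so both versions can coexist in one file.

```scheme
;; Pair-based definition: neither a pair nor the empty list.
(define (atom-book? x)
  (and (not (pair? x)) (not (null? x))))

;; Shortcut definition: anything that is not a proper list.
(define (atom-not-list? x)
  (not (list? x)))

(atom-book? '())              ; => #f
(atom-not-list? '())          ; => #f
(atom-book? (vector 1 2))     ; => #t, a compound value counted as an atom
(atom-not-list? (vector 1 2)) ; => #t
(atom-book? (cons 1 2))       ; => #f
(atom-not-list? (cons 1 2))   ; => #t, an improper pair is not a list
```

Both versions treat a vector as an atom, and they disagree on improper pairs, so which definition is appropriate depends on the data the program actually handles.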
I tried running the program below
(define (odd-internal x)
  (define (even x)
    (if (zero? x)
        #t
        (odd-internal (sub1 x))))
  (if (zero? x)
      #f
      (even (sub1 x))))
(define (odd-external x)
  (if (zero? x)
      #f
      (even (sub1 x))))
(define (even x)
  (if (zero? x)
      #t
      (odd-external (sub1 x))))
(begin (display "Using internal definition\n")
       (time (odd-internal 40000000)))
(begin (display "Using external definition\n")
       (time (odd-external 40000000)))
This is the result in Racket
Using internal definition
cpu time: 166 real time: 165 gc time: 0
#f
Using external definition
cpu time: 196 real time: 196 gc time: 0
#f
There you can see that using the internal definition is quite a bit faster. I've tried running it on Chez Scheme and the result is similar. Why is that?
I was surprised that there was a difference, so, following the comments on Lexi's answer, I split the two versions into separate files, internal.rkt and external.rkt, then compiled and decompiled them like this:
raco make external.rkt
raco decompile compiled/external_rkt.zo
This goes one step further than looking at the fully expanded program in the macro stepper. The output is not very human-readable, so I have prettified it, keeping the most important parts intact:
(define (odd-external x1)
(if (zero? x1)
'#f
(let ((x2 (sub1 x1)))
(if (zero? x2)
'#t
(let ((x3 (sub1 x2)))
(if (zero? x3)
'#f
(let ((x4 (sub1 x3)))
(if (zero? x4)
'#t
(let ((x5 (sub1 x4)))
(if (zero? x5) '#f (even (sub1 x5))))))))))))
(define (even x1)
(if (zero? x1)
'#t
(let ((x2 (sub1 x1)))
(if (zero? x2)
'#f
(let ((x3 (sub1 x2)))
(if (zero? x3)
'#t
(let ((x4 (sub1 x3)))
(if (zero? x4)
'#f
(let ((x5 (sub1 x4)))
(if (zero? x5)
'#t
(let ((x6 (sub1 x5)))
(if (zero? x6)
'#f
(let ((x7 (sub1 x6)))
(if (zero? x7)
'#t
(odd-external (sub1 x7))))))))))))))))
Nothing special here. The compiler unrolls the loop a certain number of times and constant-folds. Notice that we still have mutual recursion and that the unrolling is 5 and 7 times. Even the constant was folded: my call had been replaced with (even 39999995), so the compiler had also run the code for 5 turns before giving up. The interesting thing is the internal version:
(define (odd-internal x1)
(if (zero? x1)
'#f
(let ((x2 (sub1 x1)))
(if (zero? x2)
'#t
(let ((x3 (sub1 x2)))
(if (zero? x3)
'#f
(let ((x4 (sub1 x3)))
(if (zero? x4)
'#t
(let ((x5 (sub1 x4)))
(if (zero? x5)
'#f
(let ((x6 (sub1 x5)))
(if (zero? x6)
'#t
(let ((x7 (sub1 x6)))
(if (zero? x7)
'#f
(let ((x8 (sub1 x7)))
(if (zero? x8)
'#t
(odd-internal
(sub1 x8))))))))))))))))))
It is no longer mutual recursion, since it calls itself after 8 steps. Each round does 8 turns, while the other version does 7, then 5. In two rounds the internal one has done 16 turns while the other has done 12. The initial call to the internal one is (odd-internal '39999992), so the compiler did 8 rounds before giving up.
I guess the code inside the functions at the decompiler level is open-coded and each step is very cheap, making the number of calls the reason for the 25% speed increase. After all, 4 more steps is about 25% more per recursion, which roughly coincides with the difference in computing time. This is speculation based on observation, so it would be interesting to have a comment from Lexi on this.
Your numbers are too small to be meaningful. The difference between 166 ms and 196 ms is, in absolute terms, tiny. Who knows what other factors could be influencing that? VM warmup time, differences in memory allocation, or any host of other things could easily cause a discrepancy of that size. To be sure, you should make the numbers much bigger.
On my machine, running Racket v7.0, I increased the arguments from 40000000 to 1000000000 and ran the program. The results were 2.361 s for the internal definition case and 2.212 s for the external definition case. Given the sorts of factors listed above, that difference is too small to be meaningful.
Benchmarking is hard, and benchmarking languages that run on VMs and are JIT compiled is harder. Even if you account for warmup and GC, run lots of iterations and take the averages, and generally try to do things right, the results you get could still be nearly meaningless, as the 2017 OOPSLA paper Virtual Machine Warmup Blows Hot and Cold explains:
Virtual Machines (VMs) with Just-In-Time (JIT) compilers are traditionally thought to execute programs in two phases: the initial warmup phase determines which parts of a program would most benefit from dynamic compilation, before JIT compiling those parts into machine code; subsequently the program is said to be at a steady state of peak performance. Measurement methodologies almost always discard data collected during the warmup phase such that reported measurements focus entirely on peak performance. We introduce a fully automated statistical approach, based on changepoint analysis, which allows us to determine if a program has reached a steady state and, if so, whether that represents peak performance or not. Using this, we show that even when run in the most controlled of circumstances, small, deterministic, widely studied microbenchmarks often fail to reach a steady state of peak performance on a variety of common VMs. Repeating our experiment on 3 different machines, we found that at most 43.5% of ⟨VM, benchmark⟩ pairs consistently reach a steady state of peak performance.
Emphasis mine. Make sure you’re measuring what you think you’re measuring.
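As a small step toward that, one can at least time several runs and look at the spread instead of trusting a single measurement. The sketch below assumes Racket, whose time-apply returns four values (a list of results, CPU ms, real ms, GC ms). The helper real-times and the workload count-down are hypothetical names introduced only for this illustration, and repeating runs like this does not address the warmup issues the paper describes.

```scheme
;; Hypothetical workload, used only to have something to time.
(define (count-down n)
  (if (zero? n) 'done (count-down (- n 1))))

;; Run thunk `runs` times and collect the real (wall-clock) time of each run.
;; time-apply is Racket-specific.
(define (real-times thunk runs)
  (let loop ((i 0) (acc '()))
    (if (= i runs)
        (reverse acc)
        (let-values (((results cpu real gc) (time-apply thunk '())))
          (loop (+ i 1) (cons real acc))))))

;; Returns a list of five timings in milliseconds; the values vary by machine.
(real-times (lambda () (count-down 1000000)) 5)
```

If the timings for the two variants overlap from run to run, a difference like 166 ms versus 196 ms is probably within the noise.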
First off this will depend on your implementation since nested defines can be implemented in more than one way. On my Chez Scheme 9.5 setup I get a rather consistent 25% faster run time when I use odd-internal.
Now, for the why. This happens because nested defines (i.e. internal defines) are very different from top-level defines.
When you use define at the top level, you are adding a new record to the free-variables table. Whenever you evaluate a variable that is not bound by any lambda, it is looked up in the free-variables (hash) table. This lookup is very efficient, but it is slower than fetching a bound variable. So when you calculate (odd-external 40000000), you fetch even and odd-external from that table about 40 million times. Even with caching and other cool stuff, this is still work to be done.
In contrast, nested defines create a bound variable. One way they are implemented is as nested lambda/let/letrec expressions. That way the odd-internal function would be transformed into[1]:
(define (odd-internal x)
  (let ((even (lambda (x)
                (if (zero? x)
                    #t
                    (odd-internal (sub1 x))))))
    (if (zero? x)
        #f
        (even (sub1 x)))))
(Which is a simplification of what Chez Scheme does).
Now every time you apply odd-internal it's still a free-variable, so you hash it and find it in the free-variables table. However, when you apply even, you just grab it from the environment (which can cost as little as a single memory dereference even without cool tricks).
A fun experiment would be to define both odd and even as bound variables, so all 40 million variable fetches would benefit from quick bound-variable fetch times. I saw a 16% improvement on top of the original 25%. Here's the code:
(define (odd-quick x)
  (define (odd x) (if (zero? x) #f (even (sub1 x))))
  (define (even x) (if (zero? x) #t (odd (sub1 x))))
  (odd x))
[1] let is syntactic sugar for a lambda application, so you can read that code as:
(define (odd-internal x)
  ((lambda (even)
     (if (zero? x)
         #f
         (even (sub1 x))))
   (lambda (x)
     (if (zero? x)
         #t
         (odd-internal (sub1 x))))))
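The odd-quick experiment above has two internal defines that refer to each other, so a plain let is not enough to express it. In standard Scheme, a body with internal definitions corresponds to a letrec-style binding form. The sketch below shows that reading. The name odd-quick-letrec is hypothetical, (- n 1) stands in for sub1 to stay portable, and this is not a claim about what Chez Scheme or Racket generate internally.

```scheme
(define (odd-quick-letrec x)
  ;; Both procedures are local bound variables that can see each other.
  (letrec ((odd  (lambda (n) (if (zero? n) #f (even (- n 1)))))
           (even (lambda (n) (if (zero? n) #t (odd (- n 1))))))
    (odd x)))

(odd-quick-letrec 7)  ; => #t
(odd-quick-letrec 10) ; => #f
```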
I thought that 'a is supposed to be an atom in Scheme. But when I use an online interpreter and evaluate the following, I get back #f.
(atom? 'a)
The Scheme standard does not define atom?; the usual definition is
(define (atom? x)
  (and (not (pair? x))
       (not (null? x))))
With that definition,
> (atom? 'a)
#t
so I think you are right and the SISC online REPL is wrong.
This is probably a simple thing I'm missing, but I'm trying to get the cdr of a pair and every call to, say, (cdr (cons 'a '5)) comes back as (5). I sort of get why that is, but how can I get it to return without the parens?
I don't want to use flatten because what I'm trying to get (i.e. the cdr) might itself be another procedure expression already wrapped in parens, so I don't want to flatten the list.
(If it matters, I'm working on transforming a let expression into a lambda expression, and this is one of the steps I'm taking, trying to break apart the lambda bindings so I can move them around).
When applied to a proper list, cdr will always return another list (including '(), the empty list).
By proper list I mean a list that ends with the empty list. For instance, when you do (define lst '(4 5)), under the hood this is what gets assigned to lst: (cons 4 (cons 5 '())). So when you evaluate (cdr lst) you get the second element of the first cons, which happens to be (cons 5 '()), which in turn gets printed as (5).
For extracting only the second element in the list (not the second element of the first cons, which is what cdr does) you could:
As has been pointed out in the comments, use (car (cdr lst)) or just (cadr lst) for short
Even simpler: use (second lst)
Another possibility - if the list only has two elements and it's ok to replace it with an improper list, use (define cell (cons 4 5)) or (define cell '(4 . 5)) to build a cons cell and then you can use (car cell) to extract the first element and (cdr cell) to extract the second element.
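The options above can be compared side by side in a short sketch. It uses only standard Scheme and reuses the names lst and cell from the answer.

```scheme
;; A proper two-element list: (cons 4 (cons 5 '()))
(define lst (list 4 5))
(cdr lst)  ; => (5), the rest of the list
(cadr lst) ; => 5, the second element itself

;; A single cons cell (an improper list): (4 . 5)
(define cell (cons 4 5))
(car cell) ; => 4
(cdr cell) ; => 5, no surrounding parentheses
```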
I'm trying to write a procedure called sort-nums so that I can get all the numbers from a list and sort them, like this:
(define (sort-nums lst)
(if (null? lst) null
(if (number? (car lst)
I want this part to keep the number and then delete anything that isn't a number
(sort (cons (car lst) (sort-nums (cdr lst))))))
If possible, would this work, or would I need to write it in a different way? An example to show that it works would be:
(sort-nums (list 'a 'c 24 'f 'g 16))
(16 24)
You can make your life easier by sorting and stripping numbers separately. Try
(sort (list-transform-positive '(a 2 b 1)
                               number?)
      <)
First we select only those things that are numbers (using list-transform-positive), then we sort them ascending (using sort).
As a general tip, you will find lisp much easier to work with if you indent intelligently.
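The same filter-then-sort idea can be wrapped up as the sort-nums procedure from the question. This sketch assumes Racket, which the question's use of null suggests, where filter and a two-argument sort are built in. list-transform-positive is not available there as far as I know, so filter is swapped in for it.

```scheme
;; Keep only the numbers, then sort them in ascending order.
;; Assumes Racket's (sort lst less-than?) argument order.
(define (sort-nums lst)
  (sort (filter number? lst) <))

(sort-nums (list 'a 'c 24 'f 'g 16)) ; => (16 24)
```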