Can't cons cells be implemented efficiently at the library level in Clojure? - data-structures

Clojure has its own collections, and has no need of the traditional lispy cons cells. But I find the concept interesting, and it is used in some teaching materials (e.g., SICP). I have been wondering if there is any reasons that this cons primitive needs to be a primitive. Can't we just implement it (and the traditional functions that operate on it) in a library? I searched, but I found no such library already written.

Cons cells are an important building block in Lisp for s-expressions. See for example the various publications by McCarthy about Lisp and Symbolic Expressions from 1958 onwards (for example Recursive Functions of Symbolic Expressions). Every list in Lisp is made of cons cells.
It's definitely possible to implement linked lists (and trees, ...) with cons cells as a library. But for Lisp they are so central, that it needs them early on and with a very efficient implementation.
In a Lisp system typically there are many cons cells and a high rate of allocating new cons cells (called consing). Thus the implementors of a Lisp may want to optimize their Lisp implementation for:
small size of cons cells -> not more than two machine words, one word for the car and one word for the cdr
fast allocation of new cons cells
efficient garbage collection of cons cells (find no-longer used cons cells very quickly)
storing primitive data (numbers, characters, ...) directly in cons cells -> no pointer overhead
optimize locality of Lisp data like cons cell structures (lists, assoc lists, trees, ...) for example by using a generational/copying garbage collector and/or memory regions for cons cells
Thus Lisp systems use all kinds of tricks to achieve that. For example pointers may encode if they point to a cons cell - thus the cons cell itself does not need a type tag. Fixnums have very few tag bits and fit into the CAR or CDR of a cons cell. On the MIT Lisp Machine the system also had the feature to omit the CDR part of a cons cell, when it was a part of a linear list.
To achieve all these optimization goals one usually needs a hand-tuned implementation of a Lisp runtime in assembler and/or C. A Lisp processor or a Lisp VM usually will provide CAR, CDR, CONS, CONSP, ... as machine instructions.
It's like TFB said: similarly one can implement floating point numbers in a library, but it will not be efficient compared to native floating point numbers and operations supported by a CPU. Lisp implementations provide cons cells at a very very low level.
But outside of such a Lisp implementation, its clearly possible to implement cons cells as a library - with somewhat worse space and time efficiency.
Side note
Maclisp had cons cells with more than two slots called Hunks

You could implement it yourself. Here is an attempt:
(defprotocol cons-cell
(car [this])
(cdr [this])
(rplaca [this v])
(rplacd [this v]))
(deftype Cons [^:volatile-mutable car
^:volatile-mutable cdr]
cons-cell
(car [this] (.car this))
(cdr [this] (.cdr this))
(rplaca [this value] (set! car value))
(rplacd [this value] (set! cdr value)))
(defn cons [car cdr]
(Cons. car cdr))
Circular list:
(let [head (cons 0 nil)]
(rplacd head head)
head)

Of course, you can implement cons cells with no tools other than lambda (called fn in Clojure).
(defn cons' [a d]
(fn [f] (f a d)))
(defn car' [c]
(c (fn [a d] a)))
(defn cdr' [c]
(c (fn [a d] d)))
user> (car' (cdr' (cons' 1 (cons' 2 nil))))
2
This is as space-efficient as you can get in Clojure (a lambda closing over two bindings is just an object with two fields). car and cdr could obviously be more time-efficient if you used a record or something instead; the point I'm making is that yes, of course you can make cons cells, even if you have next to no tools available.
Why isn't it done, though? We already have better tools available. Clojure's sequence abstraction makes a better list than cons cells do, and vectors are a perfectly fine tuple. There's just no great need for cons cells. Combine that with the fact that anyone who does want them will find it trivially easy to implement anew, and there are no customers for a prospective library solution.

Related

Common Lisp: What is the downside to using this filter function on very large lists?

I want to filter out all elements of list 'a from list 'b and return the filtered 'b. This is my function:
(defun filter (a b)
"Filters out all items in a from b"
(if (= 0 (length a)) b
(filter (remove (first a) a) (remove (first a) b))))
I'm new to lisp and don't know how 'remove does its thing, what kind of time will this filter run in?
There are two ways to find out:
you could test it with data
you could analyze your source code
Let's look at the source code.
lists are built of linked cons cells
length needs to walk once through a list
for EVERY recursive call of FILTER you compute the length of a. BAD!
(Use ENDP instead.)
REMOVE needs to walk once through a list
for every recursive call you compute REMOVE twice: BAD!
(Instead of using REMOVE on a, recurse with the REST.)
the call to FILTER will not necessarily be an optimized tail call.
In some implementations it might, in some you need to tell the compiler
that you want to optimize for tail calls, in some implementations
no tail call optimization is available. If not, then you get a stack
overflow on long enough lists.
(Use looping constructs like DO, DOLIST, DOTIMES, LOOP, REDUCE, MAPC, MAPL, MAPCAR, MAPLIST, MAPCAN, or MAPCON instead of recursion, when applicable.)
Summary: that's very naive code with poor performance.
Common Lisp provides this built in: SET-DIFFERENCE should do what you want.
http://www.lispworks.com/documentation/HyperSpec/Body/f_set_di.htm#set-difference
Common Lisp does not support tail-call optimization (as per the standard) and you might just run out of memory with an abysmal call-stack (depending on the implementation).
I would not write this function, becuase, as Rainer Joswig says, the standard already provides SET-DIFFERENCE. Nonetheless, if I had to provide an implementation of the function, this is the one I would use:
(defun filter (a b)
(let ((table (make-hash-table)))
(map 'nil (lambda (e) (setf (gethash e table) t)) a)
(remove-if (lambda (e) (gethash e table)) b)))
Doing it this way provides a couple of advantages, the most important one being that it only traverses b once; using a hash table to keep track of what elements are in a is likely to perform much better if a is long.
Also, using the generic sequence functions like MAP and REMOVE-IF mean that this function can be used with strings and vectors as well as lists, which is an advantage even over the standard SET-DIFFERENCE function. The main downside of this approach is if you want extend the function with a :TEST argument that allows the user to provide an equality predicate other than the default EQL, since CL hash-tables only work with a small number of pre-defined equality predicates (EQ, EQL, EQUAL and EQUALP to be precise).
(defun filter (a b)
"Filters out all items in a from b"
(if (not (consp a)) b
(filter (rest a) (rest b))))

reduce, or explicit recursion?

I recently started reading through Paul Graham's On Lisp with a friend, and we realized that we have very different opinions of reduce: I think it expresses a certain kind of recursive form very clearly and concisely; he prefers to write out the recursion very explicitly.
I suspect we're each right in some context and wrong in another, but we don't know where the line is. When do you choose one form over the other, and what do you think about when making that choice?
To be clear about what I mean by reduce vs. explicit recursion, here's the same function implemented twice:
(defun my-remove-if (pred lst)
(fold (lambda (left right)
(if (funcall pred left)
right
(cons left right)))
lst :from-end t))
(defun my-remove-if (pred lst)
(if lst
(if (funcall pred (car lst))
(my-remove-if pred (cdr lst))
(cons (car lst) (my-remove-if pred (cdr lst))))
'()))
I'm afraid I started out a Schemer (now we're Racketeers?) so please let me know if I've botched the Common Lisp syntax. Hopefully the point will be clear even if the code is incorrect.
If you have a choice, you should always express your computational intent in the most abstract terms possible. This makes it easier for a reader to figure out your intentions, and it makes it easier for the compiler to optimize your code. In your example, when the compiler trivially knows you are doing a fold operation by virtue of you naming it, it also trivially knows that it could possibly parallelize the leaf operations. It would be much harder for a compiler to figure that out when you write extremely low level operations.
I'm going to take a slightly-subjective question and give a highly-subjective answer, since Ira already gave a perfectly pragmatic and logical one. :-)
I know writing things out explicitly is highly valued in some circles (the Python guys make it part of their "zen"), but even when I was writing Python I never understood it. I want to write at the highest level possible, all the time. When I want to write things out explicitly, I use assembly language. The point of using a computer (and a HLL) is to get it to do these things for me!
For your my-remove-if example, the reduce one looks fine to me (apart from the Scheme-isms like fold and lst :-)). I'm familiar with the concept of reduce, so all I need to understand it is figure out your f(x,y) -> z. For the explicit variant, I had to think it for a second: I have to figure out the loop myself. Recursion isn't the hardest concept out there, but I think it is harder than "a function of two arguments".
I also don't care for a whole line being repeated -- (my-remove-if pred (cdr lst)). I think I like Lisp in part because I'm absolutely ruthless at DRY, and Lisp allows me to be DRY on axes that other languages don't. (You could put in another LET at the top to avoid this, but then it's longer and more complex, which I think is another reason to prefer the reduction, though at this point I might just be rationalizing.)
I think maybe the contexts in which the Python guys, at least, dislike implicit functionality would be:
when no-one could be expected to guess the behavior (like frobnicate("hello, world", True) -- what does True mean?), or:
cases when it's reasonable for implicit behavior to change (like when the True argument gets moved, or removed, or replaced with something else, since there's no compile-time error in most dynamic languages)
But reduce in Lisp fails both of these criteria: it's a well-understood abstraction that everybody knows, and that isn't going to change, at least not on any timescale I care about.
Now, I absolutely believe there are some cases where it'd be easier for me to read an explicit function call, but I think you'd have to be pretty creative to come up with them. I can't think of any offhand, because reduce and mapcar and friends are really good abstractions.
In Common Lisp one prefers the higher-order functions for data structure traversal, filtering, and other related operations over recursion. That's also to see from many provided functions like REDUCE, REMOVE-IF, MAP and others.
Tail recursion is a) not supported by the standard, b) maybe invoked differently with different CL compilers and c) using tail recursion may have side effects on the generated machine code for surrounding code.
Often, for certain data structures, many of these above operations are implemented with LOOP or ITERATE and provided as higher-order function. There is a tendency to prefer new language extensions (like LOOP and ITERATE) for iterative code over using recursion for iteration.
(defun my-remove-if (pred list)
(loop for item in list
unless (funcall pred item)
collect item))
Here is also a version that uses the Common Lisp function REDUCE:
(defun my-remove-if (pred list)
(reduce (lambda (left right)
(if (funcall pred left)
right
(cons left right)))
list
:from-end t
:initial-value nil))

Append! in Scheme?

I'm learning R5RS Scheme at the moment (from PocketScheme) and I find that I could use a function that is built into some variants of Scheme but not all: Append!
In other words - destructively changing a list.
I am not so much interested in the actual code as an answer as much as understanding the process by which one could pass a list as a function (or a vector or string) and then mutate it.
example:
(define (append! lst var)
(cons (lst var))
)
When I use the approach as above, I have to do something like (define list (append! foo (bar)) which I would like something more generic.
Mutation, though allowed, is strongly discouraged in Scheme. PLT even went so far as to remove set-car! and set-cdr! (though they "replaced" them with set-mcar! and set-mcdr!). However, a spec for append! appeared in SRFI-1. This append! is a little different than yours. In the SRFI, the implementation may, but is not required to modify the cons cells to append the lists.
If you want to have an append! that is guaranteed to change the structure of the list that's being appended to, you'll probably have to write it yourself. It's not hard:
(define (my-append! a b)
(if (null? (cdr a))
(set-cdr! a b)
(my-append! (cdr a) b)))
To keep the definition simple, there is no error checking here, but it's clear that you will need to pass in a list of length at least 1 as a, and (preferably) a list (of any length) as b. The reason a must be at least length 1 is because you can't set-cdr! on an empty list.
Since you're interested in how this works, I'll see if I can explain. Basically, what we want to do is go down the list a until we get to the last cons pair, which is (<last element> . null). So we first see if a is already the last element in the list by checking for null in the cdr. If it is, we use set-cdr! to set it to the list we're appending, and we're done. If not, we have to call my-append! on the cdr of a. Each time we do this we get closer to the end of a. Since this is a mutation operation, we're not going to return anything, so we don't need to worry about forming our modified list as the return value.
Better late than never for putting in a couple 2-3 cents on this topic...
(1) There's nothing wrong with using the destructive procedures in Scheme while there is a single reference to the stucture being modified. So for example, building a large list efficiently, piecemeal via a single reference - and when complete, making that (now presumably not-to-be-modified) list known and referred to from various referents.
(2) I think APPEND! should behave like APPEND, only (potentially) destructively. And so APPEND! should expect any number of lists as arguments. Each list but the last would presumably be SET-CDR!'d to the next.
(3) The above definition of APPEND! is essentially NCONC from Mac Lisp and Common Lisp. (And other lisps).

What does it mean to 'hash cons'?

When to use it and why?
My question comes from the sentence: "hash cons with some classes and compare their instances with reference equality"
From Odersky, Spoon and Venners (2007), Programming in Scala, Artima Press, p. 243:
You hash cons instances of a class by caching all instances you have created in a weak collection. Then, any time you want a new instance of the class, you first check the cache. If the cache already has an element equal to the one you are about to create, you can reuse the existing instance. As a result of this arrangement, any two instances that are equal with equals() are also equal with reference equality.
Putting everyone's answers together:
ACL2 (A Computational Logic for Applicative Common Lisp) is a software system consisting of a programming language, an extensible theory in a first-order logic, and a mechanical theorem prover.
-- Wiki ACL2
In computer programming, cons (pronounced /ˈkɒnz/ or /ˈkɒns/) is a fundamental function in most dialects of the Lisp programming language. cons constructs (hence the name) memory objects which hold two values or pointers to values. These objects are referred to as (cons) cells, conses, or (cons) pairs. In Lisp jargon, the expression "to cons x onto y" means to construct a new object with (cons x y). The resulting pair has a left half, referred to as the car (the first element), and a right half (the second element), referred to as the cdr.
-- Wiki Cons
Logically, hons is merely another name for cons, i.e., the following is an ACL2 theorem:
(equal (hons x y) (cons x y))
Hons generally runs slower than cons because in creating a hons, an attempt is made to see whether a hons already exists with the same car and cdr. This involves search and the use of hash-tables.
-- http://www.cs.utexas.edu/~moore/acl2/current/HONS.html
Given your question:
hash cons with some classes and compare their instances with reference equality
It appears that hash cons is the process of hashing a LISP constructor to determine if an object already exists via equality comparison.
http://en.wikipedia.org/wiki/Hash_cons now redirects.
It is cons with hashing to allow eq (reference) comparison instead of a deep one. This is more efficient for memory (because identical objects are stored as references), and is of course faster if comparison is a common operation.
http://www.cs.utexas.edu/~moore/acl2/current/HONS.html describes an implementation for Lisp.

Self-referential data structures in Lisp/Scheme

Is there a way to construct a self-referential data structure (say a graph with cycles) in lisp or scheme? I'd never thought about it before, but playing around I can find no straightforward way to make one due to the lack of a way to make destructive modification. Is this just an essential flaw of functional languages, and if so, what about lazy functional languages like haskell?
In Common Lisp you can modify list contents, array contents, slots of CLOS instances, etc.
Common Lisp also allows to read and write circular data structures. Use
? (setf *print-circle* t)
T
; a list of two symbols: (foo bar)
? (defvar *ex1* (list 'foo 'bar))
*EX1*
; now let the first list element point to the list,
; Common Lisp prints the circular list
? (setf (first *ex1*) *ex1*)
#1=(#1# BAR)
; one can also read such a list
? '#1=(#1# BAR)
#1=(#1# BAR)
; What is the first element? The list itself
? (first '#1=(#1# BAR))
#1=(#1# BAR)
?
So-called pure Functional Programming Languages don't allow side-effects. Most Lisp dialects are not pure. They allow side-effects and they allow to modify data-structures.
See Lisp introduction books for more on that.
In Scheme, you can do it easily with set!, set-car!, and set-cdr! (and anything else ending in a bang ('!'), which indicates modification):
(let ((x '(1 2 3)))
(set-car! x x)
; x is now the list (x 2 3), with the first element referring to itself
)
Common Lisp supports modification of data structures with setf.
You can build a circular data structure in Haskell by tying the knot.
You don't need `destructive modification' to construct self-referential data structures; e.g., in Common Lisp, '#1=(#1#) is a cons-cell that contains itself.
Scheme and Lisp are capable of making destructive modifications: you can construct the circular cons above alternatively like this:
(let ((x (cons nil nil)))
(rplaca x x) x)
Can you let us know what material you're using while learning Lisp/Scheme? I'm compiling a target list for our black helicopters; this spreading of misinformation about Lisp and Scheme has to be stopped.
Yes, and they can be useful. One of my college professors created a Scheme type he called Medusa Numbers. They were arbitrary precision floating point numbers that could include repeating decimals. He had a function:
(create-medusa numerator denominator) ; or some such
which created the Medusa Number that represented the rational. As a result:
(define one-third (create-medusa 1 3))
one-third => ; scheme hangs - when you look at a medusa number you turn to stone
(add-medusa one-third (add-medusa one-third one-third)) => 1
as said before, this is done with judicious application of set-car! and set-cdr!
Not only is it possible, it's pretty central to the Common Lisp Object System: standard-class is an instance of itself!
I upvoted the obvious Scheme techniques; this answer addresses only Haskell.
In Haskell you can do this purely functionally using let, which is considered good style. One nice example is regexp-to-NFA conversion. You can also do it imperatively using IORefs, which is considered poor style as it forces all your code into the IO monad.
In general Haskell's lazy evaluation lends itself to lovely functional implementations of both cyclic and infinite data structures. In any complex let binding, all things bound may be used in all definitions. For example translating a particular finite-state machine into Haskell is a snap, no matter how many cycles it may have.
CLOS example:
(defclass node ()
((child :accessor node-child :initarg :child)))
(defun make-node-cycle ()
(let* ((node1 (make-instance 'node))
(node2 (make-instance 'node :child node1)))
(setf (node-child node1) node2)))
Tying the Knot (circular data structures in Haskell) on StackOverflow
See also the Haskell Wiki page: Tying the Knot
Hmm, self referential data structures in Lisp/Scheme, and SICP streams are not mentioned? Well, to summarize, streams == lazily evaluated list. It might be exactly the kind of self reference you've intended, but it's a kind of self reference.
So, cons-stream in SICP is a syntax that delays evaluating its arguments. (cons-stream a b) will return immediately without evaluating a or b, and only evaluates a or b when you invoke car-stream or cdr-stream
From SICP, http://mitpress.mit.edu/sicp/full-text/sicp/book/node71.html:
>
(define fibs
(cons-stream 0
(cons-stream 1
(add-streams (stream-cdr fibs)
fibs))))
This definition says that fibs is a
stream beginning with 0 and 1, such
that the rest of the stream can be
generated by adding fibs to itself
shifted by one place:
In this case, 'fibs' is assigned an object whose value is defined lazily in terms of 'fibs'
Almost forgot to mention, lazy streams live on in the commonly available libraries SRFI-40 or SRFI-41. One of these two should be available in most popular Schemes, I think
I stumbled upon this question while searching for "CIRCULAR LISTS LISP SCHEME".
This is how I can make one (in STk Scheme):
First, make a list
(define a '(1 2 3))
At this point, STk thinks a is a list.
(list? a)
> #t
Next, go to the last element (the 3 in this case) and replace the cdr which currently contains nil with a pointer to itself.
(set-cdr! (cdr ( cdr a)) a)
Now, STk thinks a is not a list.
(list? a)
> #f
(How does it work this out?)
Now if you print a you will find an infinitely long list of (1 2 3 1 2 3 1 2 ... and you will need to kill the program. In Stk you can control-z or control-\ to quit.
But what are circular-lists good for?
I can think of obscure examples to do with modulo arithmetic such as a circular list of the days of the week (M T W T F S S M T W ...), or a circular list of integers represented by 3 bits (0 1 2 3 4 5 6 7 0 1 2 3 4 5 ..).
Are there any real-world examples?

Resources