Duplicate lists when defining a list of lists in Common Lisp - compilation

I am defining a variable as a list containing lists of symbols:
(defparameter *var* '((a a a) (a x a) (a a a)))
When I try to change one of the elements with setf...
(setf (caar *var*) 'c)
... both the first and the last lists are updated.
> *var*
; ((c a a) (a x a) (c a a))
I noticed that when I evaluate the defparameter in the REPL, the setf command works as expected. This makes me think that the unexpected behaviour is related to the compilation process.
Questions:
What is happening and why?
What would be the canonical way of defining a list of fresh lists containing the same symbols in defparameter?
I am using SBCL.
Edit: My question is not a duplicate of this question: I am not asking how to copy lists so that they do not share structure, but why, when a defparameter form is compiled, lists with similar elements appear to share structure, and how to define them so that they do not.

This is a frequently asked question.
Why does SBCL show this implementation dependent behavior?
Try after loading the compiled file (use COMPILE-FILE to compile it):
* (eq (first *var*) (third *var*))
T
* *var*
((A A A) (A X A) (A A A))
* (setf *print-circle* t)
T
* *var*
(#1=(A A A) (A X A) #1#)
*
The above shows that the first and third sublists are the same object.
Running it in the REPL is left as an exercise. It may show different results.
In ANSI Common Lisp the File Compiler is allowed to coalesce similar list data. (a a a) and (a a a) are similar. This may save memory space in a compiled program. Not every Common Lisp implementation does that, but SBCL does it. Remember, when the book Common Lisp the Language, 1st edition was published in 1984, the first Apple Macintosh had just 128KB RAM.
Solution: use COPY-TREE:
To create a fresh & non coalesced copy of the nested list use the function COPY-TREE. Fresh means that it is not literal data and thus one is allowed to modify it. COPY-TREE copies all levels of a cons tree, COPY-LIST only copies the top list. Since your data is nested, we need COPY-TREE:
(defparameter *var*
  (copy-tree '((a a a) (a x a) (a a a))))
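As a minimal sketch of the difference (the variable names *fresh-var* and *built-var* are hypothetical, not from the question), the following should behave the same whether it is compiled with COMPILE-FILE or evaluated in the REPL, because every cons cell that gets modified was created at run time:

```lisp
;; COPY-TREE conses new cells at every level, so no sublist is shared
;; with the literal or with another sublist.
(defparameter *fresh-var*
  (copy-tree '((a a a) (a x a) (a a a))))

(setf (caar *fresh-var*) 'c)
;; *fresh-var* is now ((C A A) (A X A) (A A A))

;; Alternative: avoid literal list structure altogether and build
;; the nested list with LIST, which always returns fresh conses.
(defparameter *built-var*
  (list (list 'a 'a 'a) (list 'a 'x 'a) (list 'a 'a 'a)))
```

Either form gives three distinct sublists, so modifying the first one cannot affect the third.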

When you execute:
(setf (caar *var*) 'c)
You are relying on undefined behavior, because you are mutating a literal, constant value. This is explained in the specification of QUOTE but the same applies to all literal values:
The consequences are undefined if literal objects (including quoted objects) are destructively modified.
More generally, you have to think about ownership when deciding whether you have the right to mutate a list (or any object). If you know that a list was just freshly consed, you can mutate it, but sometimes the list is directly given to you and you should refrain from touching it. This is the case, for example, for &rest lists (see APPLY):
conforming programs must neither rely on the list structure of a rest list to be freshly consed, nor modify that list structure.
You can often avoid mutation by consing new elements onto the front of an existing list, when the list is a property list (plist) or an association list (alist), since lookups return the first match. For example, suppose your function accepts a keyword argument named a:
(defun foo (&key a) (list a))
Then you can call it with as many :a as you want, only the first is taken into account:
(foo :a 3 :a 2 :a 1)
=> (3)
(apply 'foo (list :a 3 :a 2 :a 1))
=> (3)
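But in cases where you do want to change the list significantly, you must work on a fresh copy, made for example with copy-list or copy-tree. Note that non-destructive functions such as remove leave their argument intact, but their result is allowed to share structure with it.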
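To illustrate the non-mutating style, here is a small sketch; WITH-OPTION, *DEFAULTS* and *OVERRIDDEN* are hypothetical names introduced only for this example. It overrides a plist entry by consing a new key and value onto the front, leaving the original list untouched:

```lisp
;; Hypothetical helper: returns a new plist in which KEY maps to VALUE.
;; The original PLIST is shared as the tail and is never modified.
(defun with-option (plist key value)
  (list* key value plist))

(defparameter *defaults* '(:a 1 :b 2))   ; literal data: must not be mutated

(defparameter *overridden* (with-option *defaults* :a 3))

(getf *overridden* :a)   ; => 3, GETF returns the first match
(getf *overridden* :b)   ; => 2, found in the shared tail
(getf *defaults* :a)     ; => 1, the original is unchanged
```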

Related

Is it legal to modify a list created using quasiquote?

From my understanding, it is not legal to modify a list created using quote:
(let ((numbers '(3 2 1)))
  (set-car! numbers 99) ; Illegal.
  numbers)
What about lists created using quasiquote? Is it legal to modify lists created using quasiquote?
(let ((numbers `(3 2 1)))
  (set-car! numbers 99) ; Legal?
  numbers)
(let ((numbers `(,(+ 1 2) 2 1)))
  (set-car! numbers 99) ; Legal?
  numbers)
The short answer is no, this isn't "legal", and certainly this should never be done in a program that aims to be portable. R6RS and R7RS have almost identical language around this, so I'll just quote from R6RS, Section 11.17 Quasiquotation:
A quasiquote expression may return either fresh, mutable objects or literal structure for any structure that is constructed at run time during the evaluation of the expression. Portions that do not need to be rebuilt are always literal.
Section 4.2.8 of R7RS has the same language, except that it says "newly allocated" instead of "fresh".
Since it is an error to attempt to modify literals in Scheme, it is an error to modify the result of a quasiquote form. This is something that you may seem to get away with sometimes, but it will bite you sooner or later. The real catch here is that "portions that do not need to be rebuilt are always literal". Other portions may or may not be literal.
More specifically, for the OP's posted code, `(3 2 1) is guaranteed to evaluate to a list literal by the semantics of quasiquote described in Section 11.17 of R6RS:
Semantics: If no unquote or unquote-splicing forms appear within the <qq template>, the result of evaluating (quasiquote <qq template>) is equivalent to the result of evaluating (quote <qq template>).
R7RS contains similar language in Section 4.2.8. Since (quote (3 2 1)) creates a list literal, the expression `(3 2 1) must also evaluate to a list literal.
On the other hand, the OP's code `(,(+ 1 2) 2 1) must evaluate (+ 1 2) and insert that result into the resulting structure. In this case, unquote is used via the , operator, so the resulting list structure may or may not be a list literal.
To take one more example, consider the quasiquoted expression `(,(+ 1 2) (2 1)). Here the main result is a list structure which may or may not be a list literal, but the element (2 1) of the resulting structure is guaranteed to be a list literal since it does not need to be rebuilt in the construction of the final result.

How to use symbols and lists in scheme to process data?

I am a newbie in Scheme, and I am in the process of writing a function that checks pairwise disjointness of rules (for the time being it is incomplete). I used symbols and lists to represent the rules of the grammar. An uppercase symbol is a non-terminal in the grammar, and a lowercase symbol is a terminal. I am trying to check if a rule passes the pairwise disjointness test.
I will basically check if a rule has only one unique terminal in it. If that is the case, the rule passes the pairwise disjointness test. In Scheme, I am thinking of realizing that by representing the terminal symbol in lower case. An example of that rule would be:
'(A <= (a b c))
I will then check the case of a rule that contains an or. like:
'(A <= (a (OR (a b) (a c))))
Finally, I will check recursively for non terminals. A rule for that case would be
'(A <= (B b c))
However, what is keeping me stuck is how to use those symbols as data in order to process them and recurse on them. I thought about converting the symbols to strings, but that did not work in the case of a list such as '(a b c). How can I do it?
Here is what I reached so far:
#lang racket
(define grammar
  '(A <= (a A b)))
(define (pairwise-disjoint lst)
  (print (symbol->string (car lst)))
  (print (cddr lst)))
Pairwise Disjoint
As far as I know, the only way to check if a set is pairwise disjoint is to enumerate every possible pair and check for matches. Note that this does not follow the racket syntax, but the meaning should still be pretty clear.
(define (contains-match? x lst)
  (cond ((null? x) #f)                    ; Nothing to do
        ((null? lst) #f)                  ; Finished walking full list
        ((eq? x (car lst)) #t)            ; Found a match, no need to go further
        (else
         (contains-match? x (cdr lst))))) ; recursive call to keep walking
(define (pairwise-disjoint? lst)
  (cond ((null? lst) #t)                  ; No elements left, so no pair matched
        ;; check the first element against all later elements in the list
        ((contains-match? (car lst) (cdr lst)) #f)
        ;; then repeat the check for each of the remaining elements
        (else (pairwise-disjoint? (cdr lst)))))
It's not clear to me what else you're trying to do, but this is going to be the general method. Depending on your data, you may need to use a different (or even custom-made) check for equality, but this works as is for normal symbols:
]=> (pairwise-disjoint? '(a b c d e))
;Value: #t
]=> (pairwise-disjoint? '(a b c d e a))
;Value: #f
Symbols & Data
This section is based on what I perceive to be a pretty fundamental misunderstanding of scheme basics by OP, and some speculation about what their actual goal is. Please clarify the question if this next bit doesn't help you!
However, What is keeping me stuck is how to use those symbols as data...
In scheme, you can associate a symbol with whatever you want. In fact, the define keyword really just tells the interpreter "Whenever I say contains-match? (which is a symbol) I'm actually referring to this big set of instructions over there, so remember that." The interpreter remembers this by storing the symbol and the thing it refers to in a big table so that it can be found later.
Whenever the interpreter runs into a symbol, it will look in its table to see if it knows what it actually means and substitute the real value, in this case a function.
]=> pairwise-disjoint?
;Value 2: #[compound-procedure 2 pairwise-disjoint?]
We tell the interpreter to keep the symbol in place rather than substituting by using the quote operator, ' or (quote ...):
]=> 'pairwise-disjoint?
;Value: pairwise-disjoint?
All that said, using define for your purposes is probably a really poor decision for all of the same reasons that global variables are generally bad.
To hold the definitions of all your particular symbols important to the grammar, you're probably looking for something like a hash table where each symbol you know about is a key and its particulars are the associated value.
And, if you want to pass around symbols, you really need to understand the quote and quasiquote.
Once you have your definitions somewhere that you can find them, the only work that's left to you is writing something like I did above that is maybe a little more tailored to your particular situation.
Data Types
If you have Terminals and Non-Terminals, why not make data types for each? In #lang racket the way to introduce a new data type is with struct.
;; A Terminal just has a name.
(struct Terminal (name))
;; A Non-terminal has a name and a list of terms
;; The list of terms may contain Terminals, Non-Terminals, or both.
(struct Non-terminal (name terms))
Processing Non-terminals
Now we can find the Terminals in a Non-Terminal's list of terms using the predicate Terminal? which is provided automatically when we define the Terminal as a struct.
(define (find-terminals non-terminal)
  (filter Terminal? (Non-terminal-terms non-terminal)))
Pairwise Disjoint Terminals
Once we have filtered the list of terms we can determine properties:
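;; List(Terminal) -> Boolean
;; Note: list->set compares elements with equal?, so Terminal needs to be
;; declared #:transparent if two terminals with the same name should count
;; as duplicates.
(define (pairwise-disjoint? terminals)
  (define (roundtrip terms)
    (set->list (list->set terms)))
  (= (length (roundtrip terminals))
     (length terminals)))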
The round trip list->set->list isn't necessarily optimized for speed, of course and profiling actual working implementations may justify refactoring, but at least it's been black-boxed.
Notes
Defining data types with struct provides all sorts of options for validating data as the type is instantiated. If you look at the Racket code base, you will see struct used frequently in the more recent portions.
Since grammar has a list within a list, I think you'll have to either test via list? before calling symbol->string (since, as you discovered, symbol->string won't work on a list), or else you could do something like this:
(map symbol->string (flatten grammar))
> '("A" "<=" "a" "A" "b")
Edit: For what you're doing, I guess the flatten route might not be that helpful. So yes, test via list? each time when parsing and handle accordingly.

Common Lisp: What is the downside to using this filter function on very large lists?

I want to filter out all elements of list 'a from list 'b and return the filtered 'b. This is my function:
(defun filter (a b)
  "Filters out all items in a from b"
  (if (= 0 (length a)) b
      (filter (remove (first a) a) (remove (first a) b))))
I'm new to lisp and don't know how 'remove does its thing, what kind of time will this filter run in?
There are two ways to find out:
you could test it with data
you could analyze your source code
Let's look at the source code.
lists are built of linked cons cells
length needs to walk once through a list
for EVERY recursive call of FILTER you compute the length of a. BAD!
(Use ENDP instead.)
REMOVE needs to walk once through a list
for every recursive call you compute REMOVE twice: BAD!
(Instead of using REMOVE on a, recurse with the REST.)
the call to FILTER will not necessarily be an optimized tail call.
In some implementations it might, in some you need to tell the compiler
that you want to optimize for tail calls, in some implementations
no tail call optimization is available. If not, then you get a stack
overflow on long enough lists.
(Use looping constructs like DO, DOLIST, DOTIMES, LOOP, REDUCE, MAPC, MAPL, MAPCAR, MAPLIST, MAPCAN, or MAPCON instead of recursion, when applicable.)
Summary: that's very naive code with poor performance.
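Putting those suggestions together, one possible rewrite (a sketch, not the only way to do it) walks a with DOLIST instead of recursing, so it never calls LENGTH and does not depend on tail-call optimization. The name filter-iterative is made up for this example:

```lisp
(defun filter-iterative (a b)
  "Return B without any element that is EQL to an element of A."
  ;; REMOVE is non-destructive, so the caller's list is not modified;
  ;; only the local variable B is rebound on each iteration.
  (dolist (item a b)
    (setf b (remove item b))))

(filter-iterative '(1 2) '(1 2 3 1 4))   ; => (3 4)
```

This still walks b once per element of a, so the running time grows with the product of the two lengths.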
Common Lisp provides this built in: SET-DIFFERENCE should do what you want.
http://www.lispworks.com/documentation/HyperSpec/Body/f_set_di.htm#set-difference
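A short usage sketch: note that the list to keep comes first, and that the standard does not guarantee the order of the result, which is why the example sorts a copy before comparing. The wrapper name filter-via-set-difference is hypothetical:

```lisp
(defun filter-via-set-difference (a b)
  "Return the elements of B that are not in A (compared with EQL by default)."
  (set-difference b a))

;; The result may come back in any order and may share structure with B,
;; so copy it before handing it to the destructive SORT.
(sort (copy-list (filter-via-set-difference '(2 4) '(1 2 3 4 5))) #'<)
;; => (1 3 5)
```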
The Common Lisp standard does not require tail-call optimization, so, depending on the implementation, you might exhaust the stack with a very deep chain of recursive calls.
I would not write this function, because, as Rainer Joswig says, the standard already provides SET-DIFFERENCE. Nonetheless, if I had to provide an implementation of the function, this is the one I would use:
(defun filter (a b)
  (let ((table (make-hash-table)))
    (map 'nil (lambda (e) (setf (gethash e table) t)) a)
    (remove-if (lambda (e) (gethash e table)) b)))
Doing it this way provides a couple of advantages, the most important one being that it only traverses b once; using a hash table to keep track of what elements are in a is likely to perform much better if a is long.
Also, using the generic sequence functions like MAP and REMOVE-IF means that this function can be used with strings and vectors as well as lists, which is an advantage even over the standard SET-DIFFERENCE function. The main downside of this approach shows up if you want to extend the function with a :TEST argument that lets the user provide an equality predicate other than the default EQL, since CL hash tables only work with a small number of predefined equality predicates (EQ, EQL, EQUAL and EQUALP, to be precise).
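Within that limitation, switching to one of the four supported predicates is straightforward. The sketch below uses a hypothetical function name and a hypothetical :test keyword (neither is part of the answer above); the keyword is passed directly to MAKE-HASH-TABLE, so it only accepts EQ, EQL, EQUAL, or EQUALP:

```lisp
(defun filter-with-test (a b &key (test #'eql))
  "Remove from B every element found in A. TEST must be a valid hash-table test."
  (let ((table (make-hash-table :test test)))
    ;; Record every element of A as a key in the table.
    (map nil (lambda (e) (setf (gethash e table) t)) a)
    ;; Keep only the elements of B that were not recorded.
    (remove-if (lambda (e) (gethash e table)) b)))

;; Strings with the same contents are EQUAL but not EQL.
(filter-with-test '("x" "y") '("x" "a" "y" "b") :test #'equal)
;; => ("a" "b")

;; Both arguments can be any sequence, here two strings.
(filter-with-test "aeiou" "common lisp")
;; => "cmmn lsp"
```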
(defun filter (a b)
  "Filters out all items in a from b"
  (if (not (consp a)) b
      ;; remove the first item of a from b, then continue with the rest of a
      (filter (rest a) (remove (first a) b))))

Append! in Scheme?

I'm learning R5RS Scheme at the moment (from PocketScheme) and I find that I could use a function that is built into some variants of Scheme but not all: Append!
In other words - destructively changing a list.
I am not so much interested in the actual code as an answer as in understanding the process by which one could pass a list (or a vector or string) to a function and then mutate it.
example:
(define (append! lst var)
(cons (lst var))
)
When I use the approach above, I have to write something like (define list (append! foo (bar))), whereas I would like something more generic.
Mutation, though allowed, is strongly discouraged in Scheme. PLT even went so far as to remove set-car! and set-cdr! (though they "replaced" them with set-mcar! and set-mcdr!). However, a spec for append! appeared in SRFI-1. This append! is a little different than yours. In the SRFI, the implementation may, but is not required to modify the cons cells to append the lists.
If you want to have an append! that is guaranteed to change the structure of the list that's being appended to, you'll probably have to write it yourself. It's not hard:
(define (my-append! a b)
  (if (null? (cdr a))
      (set-cdr! a b)
      (my-append! (cdr a) b)))
To keep the definition simple, there is no error checking here, but it's clear that you will need to pass in a list of length at least 1 as a, and (preferably) a list (of any length) as b. The reason a must be at least length 1 is because you can't set-cdr! on an empty list.
Since you're interested in how this works, I'll see if I can explain. Basically, what we want to do is go down the list a until we get to the last cons pair, which is (<last element> . null). So we first see if a is already the last element in the list by checking for null in the cdr. If it is, we use set-cdr! to set it to the list we're appending, and we're done. If not, we have to call my-append! on the cdr of a. Each time we do this we get closer to the end of a. Since this is a mutation operation, we're not going to return anything, so we don't need to worry about forming our modified list as the return value.
Better late than never for putting in a couple 2-3 cents on this topic...
(1) There's nothing wrong with using the destructive procedures in Scheme while there is a single reference to the structure being modified. So, for example, you can build a large list efficiently, piecemeal via a single reference, and when it is complete, make that (now presumably not-to-be-modified) list known and referred to from various referents.
(2) I think APPEND! should behave like APPEND, only (potentially) destructively. And so APPEND! should expect any number of lists as arguments. Each list but the last would presumably be SET-CDR!'d to the next.
(3) The above definition of APPEND! is essentially NCONC from Mac Lisp and Common Lisp. (And other lisps).
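For comparison, a minimal Common Lisp sketch of NCONC, using freshly consed lists so that destructive modification is permitted:

```lisp
(let ((a (list 1 2 3))
      (b (list 4 5)))
  (nconc a b)                   ; the last cdr of A now points at B
  (list a (eq (cdddr a) b)))
;; => ((1 2 3 4 5) T)

;; When the first list is empty there is no cons cell to modify, so NCONC
;; just returns the second list; always use the return value.
(let ((a '())
      (b (list 4 5)))
  (setf a (nconc a b))
  a)
;; => (4 5)
```

The second form shows why a destructive append cannot simply return nothing: with an empty first list, the only way to get the result is through the return value.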

Self-referential data structures in Lisp/Scheme

Is there a way to construct a self-referential data structure (say a graph with cycles) in lisp or scheme? I'd never thought about it before, but playing around I can find no straightforward way to make one due to the lack of a way to make destructive modification. Is this just an essential flaw of functional languages, and if so, what about lazy functional languages like haskell?
In Common Lisp you can modify list contents, array contents, slots of CLOS instances, etc.
Common Lisp also allows reading and printing circular data structures. Set *print-circle* to T first:
? (setf *print-circle* t)
T
; a list of two symbols: (foo bar)
? (defvar *ex1* (list 'foo 'bar))
*EX1*
; now let the first list element point to the list,
; Common Lisp prints the circular list
? (setf (first *ex1*) *ex1*)
#1=(#1# BAR)
; one can also read such a list
? '#1=(#1# BAR)
#1=(#1# BAR)
; What is the first element? The list itself
? (first '#1=(#1# BAR))
#1=(#1# BAR)
?
So-called pure functional programming languages don't allow side effects. Most Lisp dialects are not pure: they allow side effects and they allow data structures to be modified.
See Lisp introduction books for more on that.
In Scheme, you can do it easily with set!, set-car!, and set-cdr! (and anything else ending in a bang ('!'), which indicates modification):
(let ((x (list 1 2 3))) ; a fresh list: a literal such as '(1 2 3) must not be mutated
  (set-car! x x)
  ; x is now the list (x 2 3), with the first element referring to itself
  )
Common Lisp supports modification of data structures with setf.
You can build a circular data structure in Haskell by tying the knot.
You don't need `destructive modification' to construct self-referential data structures; e.g., in Common Lisp, '#1=(#1#) is a cons-cell that contains itself.
Scheme and Lisp are capable of making destructive modifications: you can construct the circular cons above alternatively like this:
(let ((x (cons nil nil)))
  (rplaca x x) x)
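A similar sketch for a circular list rather than a single circular cons. MAKE-RING and *DAYS* are hypothetical names; the helper copies its argument first so that no literal is modified. LIST-LENGTH is the standard function that returns NIL instead of looping forever on a circular list:

```lisp
(defun make-ring (items)
  "Return a fresh circular list of the elements of ITEMS, which must be non-empty."
  (let ((ring (copy-list items)))
    ;; Point the cdr of the last cons back at the first cons.
    (setf (cdr (last ring)) ring)
    ring))

(defparameter *days* (make-ring '(mon tue wed)))

(list-length *days*)            ; => NIL, the list is circular
(eq (nthcdr 3 *days*) *days*)   ; => T, three steps lead back to the start

;; Bind *PRINT-CIRCLE* before printing, or the printer may never finish.
(let ((*print-circle* t))
  (prin1-to-string *days*))
```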
Can you let us know what material you're using while learning Lisp/Scheme? I'm compiling a target list for our black helicopters; this spreading of misinformation about Lisp and Scheme has to be stopped.
Yes, and they can be useful. One of my college professors created a Scheme type he called Medusa Numbers. They were arbitrary precision floating point numbers that could include repeating decimals. He had a function:
(create-medusa numerator denominator) ; or some such
which created the Medusa Number that represented the rational. As a result:
(define one-third (create-medusa 1 3))
one-third => ; scheme hangs - when you look at a medusa number you turn to stone
(add-medusa one-third (add-medusa one-third one-third)) => 1
as said before, this is done with judicious application of set-car! and set-cdr!
Not only is it possible, it's pretty central to the Common Lisp Object System: standard-class is an instance of itself!
I upvoted the obvious Scheme techniques; this answer addresses only Haskell.
In Haskell you can do this purely functionally using let, which is considered good style. One nice example is regexp-to-NFA conversion. You can also do it imperatively using IORefs, which is considered poor style as it forces all your code into the IO monad.
In general Haskell's lazy evaluation lends itself to lovely functional implementations of both cyclic and infinite data structures. In any complex let binding, all things bound may be used in all definitions. For example translating a particular finite-state machine into Haskell is a snap, no matter how many cycles it may have.
CLOS example:
(defclass node ()
  ((child :accessor node-child :initarg :child)))
(defun make-node-cycle ()
  (let* ((node1 (make-instance 'node))
         (node2 (make-instance 'node :child node1)))
    (setf (node-child node1) node2)))
Tying the Knot (circular data structures in Haskell) on StackOverflow
See also the Haskell Wiki page: Tying the Knot
Hmm, self-referential data structures in Lisp/Scheme, and SICP streams are not mentioned? Well, to summarize, streams == lazily evaluated lists. It might not be exactly the kind of self reference you intended, but it's a kind of self reference.
So, cons-stream in SICP is a special form that delays evaluating its second argument. (cons-stream a b) evaluates a but returns without evaluating b; b is only evaluated when you invoke stream-cdr on the result.
From SICP, http://mitpress.mit.edu/sicp/full-text/sicp/book/node71.html:
(define fibs
  (cons-stream 0
               (cons-stream 1
                            (add-streams (stream-cdr fibs)
                                         fibs))))
This definition says that fibs is a stream beginning with 0 and 1, such that the rest of the stream can be generated by adding fibs to itself shifted by one place.
In this case, 'fibs' is assigned an object whose value is defined lazily in terms of 'fibs'
Almost forgot to mention, lazy streams live on in the commonly available libraries SRFI-40 or SRFI-41. One of these two should be available in most popular Schemes, I think
I stumbled upon this question while searching for "CIRCULAR LISTS LISP SCHEME".
This is how I can make one (in STk Scheme):
First, make a list (built with list rather than a quoted literal, since mutating a literal is an error):
(define a (list 1 2 3))
At this point, STk thinks a is a list.
(list? a)
> #t
Next, go to the last pair (the one holding 3 in this case) and replace its cdr, which currently contains nil, with a pointer to the head of the list.
(set-cdr! (cdr (cdr a)) a)
Now, STk thinks a is not a list.
(list? a)
> #f
(How does it work this out?)
Now if you print a you will get an infinitely long list (1 2 3 1 2 3 1 2 ... and you will need to kill the program. In STk you can use control-z or control-\ to quit.
But what are circular-lists good for?
I can think of obscure examples to do with modulo arithmetic such as a circular list of the days of the week (M T W T F S S M T W ...), or a circular list of integers represented by 3 bits (0 1 2 3 4 5 6 7 0 1 2 3 4 5 ..).
Are there any real-world examples?

Resources