Representing A Tree in Clojure - data-structures

What would be an idiomatic way to represent a tree in Clojure? E.g.:
A
/ \
B C
/\ \
D E F
Performance is not important and the trees won't grow past 1000 elements.

'(A (B (D) (E)) (C (F)))

There's a scary way of doing it using just cons:
(defn mktree
([label l r] (cons label (cons l r)))
([leaf] (cons leaf (cons nil nil))))
(defn getlabel [t] (first t))
(defn getchildren [t] (rest t))
(defn getleft [t] (first (getchildren t)))
(defn getright [t] (rest (getchildren t)))
Note that children isn't a list; it's a pair. If your trees aren't just binary, you could make it a list. use nil when there's no left or right child, of course.
Otherwise, see this answer.
The tree in your picture:
(mktree 'A (mktree 'B (mktree 'D) (mktree 'E)) (mktree 'C nil (mktree 'F)))

Trees underly just about everything in Clojure because they lend themselves so nicely to structural sharing in persistent data structure. Maps and Vectors are actually trees with a high branching factor to give them bounded lookup and insert time. So the shortest answer I can give (though it's not really that useful) is that I really recommend Purely functional data structures by Chris Okasaki for a real answer to this question. Also Rich Hickey's video on Clojure data structures on blip.tv
(set 'A 'B 'C)

Related

How does one make symbols that refer to structure slots in lisp?

I am self learning lisp and thought a nice non-trivial program would be to write a set of standard tree insertion and manipulation routines. I figured it could be done with CONS but wanted to try it with a structure.
I put together one version that worked:
(defstruct treenode data left right)
(defun tree-insert ( value tree )
"Insert data into tree"
(if tree
(if (< value (treenode-data tree))
(setf (treenode-left tree) (tree-insert value (treenode-left tree)))
(setf (treenode-right tree) (tree-insert value (treenode-right tree))))
(setf tree (make-treenode :data value)))
tree)
which rebuilt the tree at every step which seemed computationally inefficient. By inefficient, I mean that I have to use setf every time I do another level of recursion. So I wanted to try a scheme that passed the tree by reference rather than by value so I could make assignments in the subroutine that inserts into a tree.
I cobbled the following together, which does not work (but give me credit for having comments):
(defstruct treenode data left right)
(defun tree-insert ( value tree )
"Insert data value into tree, using pass by reference.
value A datum to insert, in this version has to be a number.
tree The tree passed as a symbol."
(setq tval (symbol-value tree))
(if (eq tval nil)
(set tree (make-treenode :data value)) ; Empty tree. Place data here.
(if (< value (treenode-data tval)) ; Non-empty node. Decide which subtree for insert.
(tree-insert value (treenode-left tval)) ; Left side
(tree-insert value (treenode-right tval)))) ; Right side. This is a stable sort.
nil)
? (setf tr nil)
NIL
? (tree-insert 10 'tr)
NIL
? tr
#S(TREENODE :DATA 10 :LEFT NIL :RIGHT NIL)
?
The initial insert works fine. Passing a symbol the (set tree ...) correctly inserts the structure with left and right porinters nil.
Of course, the problem that follows is that on the recursive call to tree-insert I am not passing a symbol.
That is the hangup. I haven't found a way to refer to a structure slot as a symbol that I can then pass to tree-insert.
I've been looking around for a couple of days and found this interesting comment about the defstruct macro: "defstruct not only defines an access function for each slot, but also arranges for setf to work properly on such access functions, defines a predicate named name-p, defines a constructor function named make-name, and defines a copier function named copy-name. All names of automatically created functions are interned in whatever package is current at the time the defstruct form is processed (see package). Also, all such functions may be declared inline at the discretion of the implementation to improve efficiency; if you do not want some function declared inline, follow the defstruct form with a notinline declaration to override any automatic inline declaration."
So, what could I do to do the magic that setf does? I know I can do assignments to slots with setf, but I haven't gotten setf to work in a function due to lexical scope rules. Maybe like adding automatic functions to allow symbol generating, like (treenode-data-symbol tr)?
Of course, lisp programmers have dealt with binary trees since before my first PDP-8/L. What's the lispy way to do this?
This is an edited question. User Rainer Joswig gave a very fast and concise response. I learned a lot from the example he gave. I was interested in the issue of modifying the tree directly rather than using a return value from a function.
From the comments I've seen here, and the one answer by Rainer Joswig, should I draw the conclusion that the pointer manipulation is computationally low in cost, and that the best lisp approach is to use a function that returns a tree rather than relying on an approach of modifying the argument?
simple version for your inspiration:
(defstruct node a b v)
(defun insert-tree (tree value)
(cond ((null tree)
(setf tree (make-node :v value)))
((> (node-v tree)
value)
(setf (node-a tree)
(insert-tree (node-a tree) value)))
(t
(setf (node-b tree)
(insert-tree (node-b tree) value))))
tree)
using it:
CL-USER 171 > (let ((tree nil))
(loop for i in '(4 7 3 5 9 10 11 8)
do (setf tree (insert-tree tree i)))
(pprint tree)
(values))
#S(NODE :A #S(NODE :A NIL :B NIL :V 3)
:B #S(NODE :A #S(NODE :A NIL :B NIL :V 5)
:B #S(NODE :A #S(NODE :A NIL :B NIL :V 8)
:B #S(NODE :A NIL
:B #S(NODE :A NIL
:B NIL
:V 11)
:V 10)
:V 9)
:V 7)
:V 4)
Now, if wanted to do less setf operations, we could check whether the returned subtree is the same which we passed. This will only not be the case when we create a new node.
(defun insert-tree (tree value)
(cond ((null tree)
(setf tree (make-node :v value)))
((> (node-v tree)
value)
(let ((new-tree (insert-tree (node-a tree) value)))
(unless (eql new-tree (node-a tree))
(setf (node-a tree) new-tree))))
(t
(setf (node-b tree)
(insert-tree (node-b tree) value))))
tree)
or with a local macro hiding part of the code:
(defun insert-tree (tree value)
(macrolet ((insert (place call &aux (new-value-sym (gensym "new-value")))
`(let ((,new-value-sym ,call))
(unless (eql ,place ,new-value-sym)
(setf ,place ,new-value-sym)))))
(cond ((null tree)
(setf tree (make-node :v value)))
((> (node-v tree)
value)
(insert (node-a tree) (insert-tree (node-a tree) value)))
(t
(insert (node-b tree) (insert-tree (node-b tree) value))))
tree))
Trying to add an answer from another angle.
In standard Common Lisp structures have a bunch of limitations to make them low-level and efficient to use. Among those limitations:
access to structure slots via slot names is undefined. some implementations do it, others not.
redefining a structure definition has undefined consequences. This means that in some cases, one best restarts Lisp to do that...
The idea behind that: all operations to structures should be able to be inlined and an executing program should not need any further information about structure slots (their names, their memory locations, ...). There would be no dynamic lookup at runtime.
Then Common Lisp in general has this further limitation: it has no first class pointers. There is no mechanism to provide a pointer only pointing directly to the slot of a structure. In some older Lisp dialects this might be possible via a concept of locatives - pointers in those languages. Common Lisp does not support that.
This means practically: to update a slot of a structure, one needs access to the structure and a setter operation.
How do update the slot of a structure?
I can think of two simple ways:
pass the structure, a new value and an indicator what to update -> then dispatch on the indicator and call the right updater
Example
(defun update (s indicator value)
(case indicator
(:a (setf (node-a s) value))
(:b (setf (node-b s) value))))
(update tree :a (make-node :v 100))
Pass a closure, which does the update
Example:
(let ((tree ...))
(flet ((do-something (updater)
(funcall updater (make-node :v 100))))
(do-something (lambda (value) (setf (node-a tree) value) ...)))
With much thanks to Rainer and Will, I understand Common Lisp better, now. The point about not having first class pointers is huge. I don't have to keep looking for that anymore, though I did see a package that implemented refs in my searches.
The key problem in my approach was that I defined an empty tree as nil. Since passing nil doesn't allow any manipulation of the argument, nil being immutable, the algorithm was bound to fail.
Redefining the empty tree as '(nil) allows the program to work.
;; Make list of 5 random numbers.
(setf r5 (loop for i from 1 to 5 collect (random 100)))
;; Initialize tr to empty tree.
;; Empty tree is '(nil). Tree with data is '(data left right),
;; where left and right are either empty tree or tree with data.
(setf tr '(nil))
(defun tree-insert ( value tree )
"Insert data into tree. tree is modified with an insertion."
(if (equal tree '(nil))
(progn ; Empty (sub)tree. Insert value.
(setf (car tree) value)
(setf (cdr tree) (list (list nil)(list nil))))
(progn ; Non-empty subtree.
(if (< value (car tree))
(tree-insert value (second tree)) ; Insert on left.
(tree-insert value (third tree))))) ; Insert on right.
nil)
;; Load tree with the list of random numbers defined above.
(mapc (lambda (val) (tree-insert val tr)) r5)
(defun tree-walk (tree)
"Retrieve keys in sorted order."
(if (car tree)
(progn
(tree-walk (second tree)) ; Left subtree.
(format t " ~d" (car tree))
(tree-walk (third tree))))) ; Right subtree.
;; Walk the tree.
(tree-walk tr)
Example in use:
? (setf r5 (loop for i from 1 to 5 collect (random 100)))
(22 50 76 20 49)
? (setf tr '(nil))
(NIL)
? (mapc (lambda (val) (tree-insert val tr)) r5)
;Compiler warnings :
; In an anonymous lambda form at position 37: Undeclared free variable TR
(22 50 76 20 49)
? tr
(22 (20 (NIL) (NIL)) (50 (49 (NIL) (NIL)) (76 (NIL) (NIL))))
? (tree-walk tr)
20 22 49 50 76
NIL
?
So, several things to make this work. A mutable object has to be passed to the procedure. In this case I redesigned the structure to be a list, either '(nil) for empty, or '(data left right), where left and right are either '(nil) or '(data left right). A list containing nil can be manipulated. However, I had to use car and cdr to access the structure so as to preserve Lisp's pointer that was passed to the procedure.
Another thing I had to do was not use a list constant in the functon definitions. I'm sure knowledgeable people will be in the know about this, and chucle a bit about the opaque error that follow until the issue is understood, but if I had used '((nil)(nil)) rather than (list (list nil)(list nil)) in tree-insert it would not work. Looks like Lisp compiles list shorthand notation to a pointer to an abject in memory which is used on all subsequent calls of the function.
Oh, there is a leftover progn function call in tree-insert. That was from when I wrapped everything with progn to let me add print statements during debugging.
Running timing on the functions was interesting. It's fast! I'll run some timing comparisons to compare the functional reassignment approach vs the search and insert algorithms.
Thanks, again, to the expert comments. Since last contributing I've learned a little about map, loop/collect, and that variables leak out of functions into global space when let isn't used in function definitions. Also wrapping a funciton with a lot of output with (progn ... nil) saves screen space after large data structures are used. I have learned a lot with just this exercise.

Is there any way to create a copy of a binary tree using a recursive breadth first implementation?

Essentially what I am trying to do is take a binary tree with data definition
binary_tree: number | (symbol binary_tree binary_tree)
and create a new version of the tree where each leaf (a number) is replaced with a counter number. I am trying to do this left-to-right and then top-down, so using a breadth first search seems like the obvious choice to visit every node in order. However, my problem is this. I need to accumulate a new binary tree to return it. Is there any possible way to do this since we are visiting each node in order?
So in short if I have a tree defined like this:
(define bt '(foo (bar 26 12) (baz 11 (quux 117 14))))
i need to process and accumulate a new list such that
(define bt '(foo (bar 0 1) (baz 2 (quux 3 4))))
Here is my code:
(define (number-leaves bst)
(define (helper queue counter)
(cond[(non-empty-queue? queue)
(define x (dequeue! queue))
(cond [(number? x)(cons counter (helper queue (+ 1 counter)))]
[(symbol? (car x))(begin (enqueue! queue (car(cdr x)))
(enqueue! queue (car(cdr(cdr x))))
(cons (list(car x)) (helper queue counter)))])]
['()]))
(begin (define q (make-queue))
(enqueue! q bst)
(helper q 0)))
as of now this function returns
(foo bar baz 0 1 2 quux 3 4)
It seems to me that it is impossible to accumulate into a recursive data definition while processing the tree breadth first. What can I do? (NB: car = first and cdr = rest in the EOPL racket dialect)
I think you will need to do two passes.
breath first while making a lookup based on the pairs that hold the leaf and their incremental new value. It's important to use the pairs since the numbers by themselves are not guaranteed unique, thus (eq? 2 2) can be #t even though they are different places.. Comparing the pairs that holds the twos are guaranteed to be only eq? with the very same value.
A standard post order tree traversal where you fetch the new values from the lookup.
The lookup should be a hash for efficiency, but it can be a assoc list for small trees. If you expect the hash to do O(1) when iterating over all the element twice will still only make it O(n).

Another way of writing a mimum value procedure in Scheme?

So if i have the following, which returns the smallest value out of a set of four numbers:
(define (minimum2 a b c d)
(cond ((and (< a b) (< a c) (< a d)) a)
((and (< b c) (< b d)) b)
((< c d) c)
(else d)))
But, I want to write it so that I compare a to b and find the smallest value between those two, then compare c and d, and find the smallest value between those, and then compare those two smallest values together to find the actual minimum. If what I wrote was tough to understand, think of it like a tournament bracket, where a "plays" b, and the winner plays the other winner between c and d. Thank you in advance for the help!
Here's one way to do it:
(define (min4 a b c d)
(define (min2 x y)
(if (< x y) x y))
(min2 (min2 a b) (min2 c d)))
Another way to do it, if you don't want to use an internal function:
(define (min4 a b c d)
(let ((min-ab (if (< a b) a b))
(min-cd (if (< c d) c d)))
(if (< min-ab min-cd) min-ab min-cd)))
Here are two ways to do this. I think that the first, using reduce, is much more idiomatic, but it's not doing the tournament style structure, though it uses the same number of comparisons. The second, which does a tournament style structure, is actually just a special case of a generalized merge-sort. The reason that the number of comparisons is the same is that in the tournament style comparison,
min(a,b,c,d) = min(min(a,b),min(c,d))
and in the reduce formulation,
min(a,b,c,d) = min(min(min(a,b),c),d)
Both require three calls the lowest level min procedure.
A reduce based approach
This solution uses a fold (more commonly called reduce in Lisp languages, in my experience). Scheme (R5RS) doesn't include reduce or fold, but it's easy to implement:
(define (reduce function initial-value list)
(if (null? list)
initial-value
(reduce function (function initial-value (car list))
(cdr list))))
A left-associative fold is tail recursive and efficient. Given a binary function f, an initial value i, and a list [x1,…,xn], it returns f(f(…f(f(i, x1), x2)…), xn-1), xn).
In this case, the binary function is min2. R5R5 actually already includes an n-ary (well, it actually requires at least one arguments, it's at-least-one-ary) min, which means min would already work as a binary function, but then again, if you wanted to use the built in min, you'd just do (min a b c d) in the first place. So, for the sake of completeness, here's a min2 that accepts exactly two arguments.
(define (min2 a b)
(if (< a b)
a
b))
Then our n-ary min* is simply a reduction of min2 over an initial value and a list. We can use the . notation in the argument list to make this a variadic function that requires at least one argument. This means that we can do (min* x) => x, in addition to the more typical many-argument calls.
(define (min* a . rest)
(reduce min2 a rest))
For example:
(min* 4 2 1 3)
;=> 1
A true tournament-style solution based on merge sort
A proper tournament style min is actually isomorphic to merge sort. Merge sort recursively splits a list in half (this can be done in place using the indices of the original list, as opposed to actually splitting the list into new lists), until lists of length one are produced. Then adjacent lists are merged to produce lists of length two. Then, adjacent lists of length two are merged to produce lists of length four, and so on, until there is just one sorted list. (The numbers here don't always work out perfectly if the length of the input list isn't a power of two, but the same principle applies.) If you write an implementation of merge sort that takes the merge function as a parameter, then you can have it return the one element list that contains the smaller value.
First, we need a function to split a list into left and right sides:
(define (split lst)
(let loop ((left '())
(right lst)
(len (/ (length lst) 2)))
(if (< len 1)
(list (reverse left) right)
(loop (cons (car right) left)
(cdr right)
(- len 1)))))
> (split '(1 2 3 4))
((1 2) (3 4))
> (split '(1))
(() (1))
> (split '(1 2 3))
((1) (2 3))
Merge sort is now pretty easy to implement:
(define (merge-sort list merge)
(if (or (null? list) (null? (cdr list)))
list
(let* ((sides (split list))
(left (car sides))
(right (cadr sides)))
(merge (merge-sort left merge)
(merge-sort right merge)))))
We still need the merge procedure. Rather than the standard one that takes two lists and returns a list of their sorted elements, we need one that can take two lists, where each has at most one element, and at most one of the lists may be empty. If either list is empty, the non-empty list is returned. If both lists are non-empty, then the one with the smaller element is returned. I've called it min-list.
(define (min-list l1 l2)
(cond
((null? l1) l2)
((null? l2) l1)
(else (if (< (car l1) (car l2))
l1
l2))))
In this case, you can define min* to make a call to merge-sort, where the merge procedure is min-list. Merge-sort will return a list containing one element, so we need car to take that element from the list.
(define (min* a . rest)
(car (merge-sort (cons a rest) min-list)))
(min* 7 2 3 6)
;=> 2

How would I go about improving the efficiency (Scheme)?

I have this code:
(define tree `(A (B (C)) (D (E)) (C (E))))
(define (prog1 graph)
(let ([seen `()])
(define (sub g)
(cond
[(member (car g) seen) `()]
[else
(set! seen (cons (car g) seen))
(cond
[(null? (cdr g)) (list (car g))]
[else
(cons (car g) (map sub (cdr g)))])]))
(delete `() (sub graph))))
(define delete
(lambda (x y)
(if (null? y )
`()
(if (eqv? (car y) x)
(delete x (cdr y))
(cons (car y) (delete x (cdr y)))))))
It prints a connected graph where all the nodes appear once.
Running (prog1 tree)
prints: (A (B (C)) (D (E)))
I have looked at various depth-first searches in lisp (something which is similar to what I'm trying to do) and they appear to be much more elegant to this, some using iterative approaches. I am aware that the program isn't very efficient (on huge trees it runs pretty slow) so how would I go about improving the efficiency of this code?
Thanks, James
The member procedure performs an O(n) lookup on lists every time it's invoked. That's not what we want for quickly testing set membership, for that you should use a data structure providing an O(1) complexity for both adding elements and testing element membership in a collection, ideally a Set data structure or in its place a Hash Table. For example, in Racket try replacing these lines (or use the default hash table implementation in your Scheme interpreter):
(let ([seen `()]) => (let ([seen (make-hash)])
[(member (car g) seen) `()] => [(hash-has-key? seen (car g)) '()]
(set! seen (cons (car g) seen)) => (hash-set! seen (car g) 'ok)
Also, in general you want to use quotes in your code: '() instead of quasiquotes: `(), see the links to understand the difference and when it's appropriate to use quasiquoting.
Finally, you can use the built-in remove procedure, there's no need to implement your own delete.
In most cases the bottleneck in this code will not be the tree traversal, but the member lookup. The complexity of your function seems to be roughly O(M*N), where M is the number of distinct nodes and N is the number of total nodes. The reason why M goes into this as a factor is because you're looking up nodes in a linear list, which takes time proportional to the length of the list (which in your case is proportional to the number of distinct nodes).
The way to get rid of the M is to use a more efficient data structure for the lookup. R6RS defines hash tables, for instance.

How do I generate all permutations of certain size with repetitions in Scheme?

I am learning Scheme and I am trying to generate permutations with repetitions of certain size.
For example, given n=4 and set S = {a, b, c, d, e, f}, I'd like to generate all possible permutations: {a,a,a,a},{a,a,a,b},...,{a,a,a,f},{a,a,b,a},{a,a,b,b},...,{a,a,b,f},...{f,a,a,a},{f,a,a,b}...,{f,a,a,f},...{f,f,f,f}.
The trouble is that I can't understand how to pick 'a' 4 times, and remember that i had picked it 4 times, then pick 'a' 3 times, and 'b' one time, and remember all this, so I don't pick it again.
I know that these kinds of problems are best solved with recursive algorithms, but it just makes everything more complicated, like, how do I remember in the recursion, what elements have I picked.
I don't know how to approach this problem at all. I would be very glad if someone wrote out the thought process of solving this problem. I'd appreciate it very much!
Please help me.
Thanks, Boda Cydo.
It's good to start from the procedure's interface and expected results. Your procedure is going to be called (permutations size elements) and is expected to return a list of permutations of the items in ELEMENTS, each permutation being SIZE items long. Figure you're going to represent a "permutation" as a list. So if you called (permutations 1 '(a b c)) you'd expect an output of ((a) (b) (c)).
So the trick about recursive procedures, is you have to figure out what the base condition is that you can answer easily, and the recursive step which you can answer by modifying the solution of a simpler problem. For PERMUTATIONS, figure the recursive step is going to involve decreasing SIZE, so the base step is going to be when SIZE is 0, and the answer is a list of a zero-length permutation, i. e. (()).
To answer the recursive step, you have to figure out what to do to the result for size N - 1 to get a result for size N. To do this, it can help to write out some expected results for small N and see if you can discern a pattern:
ELEMENTS = (a b)
SIZE (PERMUTATIONS SIZE ELEMENTS)
0 ( () )
1 ( (a) (b) )
2 ( (a a) (a b) (b a) (b b) )
3 ( (a a a) (a a b) (a b a) (a b b) (b a a) ... )
So basically what you want to do is, given R = (permutations n elements), you can get (permutations (+ n 1) elements) by taking each permutation P in R, and then for each element E in ELEMENTS, adjoin E to P to create a new permutation, and collect a list of them. And we can do this with nested MAPs:
(define (permutations size elements)
(if (zero? size)
'(())
(flatmap (lambda (p) ; For each permutation we already have:
(map (lambda (e) ; For each element in the set:
(cons e p)) ; Add the element to the perm'n.
elements))
(permutations (- size 1) elements))))
I'm using FLATMAP for the outer mapping, because the inner MAP creates lists of new permutations, and we have to append those lists together to create the one big flat list of permutations that we want.
Of course, this is all assuming you know about and have a good handle on sequence operations like MAP. If you don't it'd be real difficult to come up with an elegant solution like I just did here.
Here is another version: I used reduce, not flatmap. I wrote it in MIT-scheme.
(define (per s)
(define (ins c before after)
(if (null? after)
(list (append before (list c)))
(append (list (append before (list c) after))
(ins c
(append before (list (car after)))
(cdr after)))))
(define (iter l)
(cond ((null? l)
'(()))
(else
(let ((rest (iter (cdr l))))
(reduce-left append
()
(map (lambda (x) (ins (car l) () x) )
rest))))))
(iter s))
(per '(1 3 2 4))
Hint: You can use parameters to a recursive call to "remember" what other recursive calls have done. ;)

Resources