Idiomatic way to recurse through collections in Clojure - data-structures

I'm trying to understand what is the idiomatic way in Clojure to recurse through a tree or a list represented by a Clojure list (or another collection type).
I could write the following to count the elements in a flat collection (ignore the fact that it's not tail-recursive):
(defn length
([xs]
(if (nil? (seq xs))
0
(+ 1 (length (rest xs))))))
Now in Scheme or CL all the examples only ever do this over lists, so the idiomatic base case test in those languages would be (nil? xs). In Clojure we'd like this function to work on all collection types, so is the idiomatic test (nil? (seq xs)), or maybe (empty? xs), or something completely different?
The other case I'd like to consider is tree traversal, i.e. traversing through a list or vector that represents a tree, e.g. [1 2 [3 4].
For example, counting the nodes in a tree:
(defn node-count [tree]
(cond (not (coll? tree)) 1
(nil? (seq tree)) 0
:else (+ (node-count (first tree)) (node-count (rest tree)))))
Here we use (not (coll? tree)) to check for atoms, whereas in Scheme/CL we'd use atom?. We also use (nil? (seq tree)) to check for an empty collection. And finally we use first and rest to destructure the current tree to the left branch and the rest of the tree.
So to summarise, are the following forms idiomatic in Clojure:
(nil? (seq xs)) to test for the empty collection
(first xs) and (rest xs) to dig into the collection
(not (coll? xs)) to check for atoms

The idiomatic test for a non-empty seqable is (seq coll):
(if (seq coll)
...
)
The nil? is unnecessary, since a non-nil return value from seq is guaranteed to be a seq and thus neither nil nor false and therefore truthy.
If you want to deal with the nil case first, you can change the if to if-not or seq to empty?; the latter is implemented as a composition of seq with not (which is why it is not idiomatic to write (not (empty? xs)), cf. the docstring of empty?).
As for first / rest -- it's useful to remember about the strict variant of rest, next, the use of which is more idiomatic than wrapping rest in a seq.
Finally, coll? checks if its argument is a Clojure persistent collection (an instance of clojure.lang.IPersistentCollection). Whether this is an appropriate check for "non-atoms" depends on whether the code needs to handle Java data structures as non-atoms (via interop): e.g. (coll? (java.util.HashSet.)) is false, as is (coll? (into-array [])), but you can call seq on both. There is a function called seqable? in core.incubator in the new modular contrib which promises to determine whether (seq x) would succeed for a given x.

I personally like the following approach to recurse through a collection:
(defn length
"Calculate the length of a collection or sequence"
([coll]
(if-let [[x & xs] (seq coll)]
(+ 1 (length xs))
0)))
Features:
(seq coll) is idiomatic for testing whether a collection is empty (as per Michal's great answer)
if-let with (seq coll) automatically handles both the nil and empty collection case
You can use destructuring to name the first and next elements as you like for use in your function body
Note that in general it is better to write recursive functions using recur if possible, so that you get the benefits of tail recursion and don't risk blowing up the stack. So with this in mind, I'd actually probably write this specific function as follows:
(defn length
"Calculate the length of a collection or sequence"
([coll]
(length coll 0))
([coll accumulator]
(if-let [[x & xs] (seq coll)]
(recur xs (inc accumulator))
accumulator)))
(length (range 1000000))
=> 1000000

Related

Why is the empty list produced here in this iteration?

Let's take the following function to get a pair of numbers:
; (range 1 3) --> '(1 2 3)
(define (range a b)
(if (> a b) nil
(cons a (range (+ 1 a) b))))
; generate pair of two numbers with 1 <= i < j <= N
(define (get-pairs n)
(map (lambda (i)
(map (lambda (j) (list i j))
(range 1 (- i 1))))
(range 1 n)))
(get-pairs 2)
; (() ((2 1)))
(get-pairs 3)
(() ((2 1)) ((3 1) (3 2)))
Why does the above produce '() as the first element of the output? Comparing this with python, I would expect it to just give the three pairs, something like:
>>> for i in range(1,3+1): # +1 because the range is n-1 in python
... for j in range(1,i-1+1):
... print (i,j)
...
(2, 1)
(3, 1)
(3, 2)
I suppose maybe it has to do with when i is 1?
(map (lambda (j) (list 1 j)) '())
; ()
Is that just an identity in Scheme that a map with an empty list is always an empty list?
When i is 1, the inner map is over (range 1 0), which is () by your own definition. Since map takes a procedure and a list (or lists) of values, applies the procedure to each value in the list in turn, and returns a list containing the results, mapping any procedure over a list containing no values will return a list containing no values.
It might help to create a simple definition for map to see how this might work. Note that this definition is not fully featured; it only takes a single list argument:
(define (my-map proc xs)
(if (null? xs)
'()
(cons (proc (car xs))
(my-map proc (cdr xs)))))
Here, when the input list is empty, there are no values to map over, so an empty list is returned. Otherwise the procedure proc is applied to the first value in the input list, and the result is consed onto the result of mapping over the rest of the list.
A couple of observations:
First, the empty list is not represented by nil in either standard Scheme or vanilla Racket, and you should not be using it. In the early days of Scheme nil was allowed as a crutch for programmers coming from other lisps, but this has not been the case for a long time. I don't think that it was ever in any of the RnRS standards, but nil may have survived in some specific implementations until maybe R4RS (1991). SICP was from that era. Today you should use '() to represent empty list literals in Scheme so that your code can run on any Scheme implementation. Racket's #lang sicp allows code directly from the book to be run, but that should not keep you from using the common notation. Note that Common Lisp does use nil as a self-evaluating symbol to represent both the empty list, and boolean false. Seeing this in Scheme just doesn't look right today.
Second, you will probably be led astray more often than to wisdom by thinking in terms of Python when trying to understand Scheme code. In this particular case, map is an iteration construct, but it is not the same thing as a for loop. A for loop is usually used for side-effects, but map is used to transform a list. Scheme has a for-each form which is meant to be used for its side-effects, and in that sense is more like a for loop. The Python version that is posted above is not at all like the Scheme version, though. Instead of returning the results in a list, the results are printed. In the Scheme code, when i is 1, the inner mapping is over (range 1 0) --> (). But, in the Python code, when i is 1, the inner loop is over range(1, 1), so the body of this for loop is not executed and nothing is printed.
Better to think carefully about the Scheme code you want to understand, falling back on basic definitions, than to cobble together a model based on Python that has possibly unconsidered corner cases.

How to use the built-in function filter with lambda in Scheme programming?

"Implement unique, which takes in a list s and returns a new list containing the same elements as s with duplicates removed."
scm> (unique '(1 2 1 3 2 3 1))
(1 2 3)
scm> (unique '(a b c a a b b c))
(a b c)
What I've tried so far is:
(define (unique s)
(cond
((null? s) nil)
(else (cons (car s)(filter ?)
This question required to use the built-in filter function. The general format of filter function is (filter predicate lst), and I was stuck on the predicate part. I am thinking it should be a lambda function. Also, what should I do to solve this question recursively?
(filter predicate list) returns a new list obtained by eliminating all the elements of the list that does not satisfy the predicate. So if you get the first element of the list, to eliminate its duplicates, if they exists, you could simply eliminate from the rest of the list all the elements equal to it, something like:
(filter
(lambda (x) (not (eqv? x (first lst)))) ; what to maintain: all the elements different from (first lst)
(rest lst)) ; the list from which to eleminate it
for instance:
(filter (lambda (x) (not (eqv? x 1))) '(2 1 3 2 1 4))
produces (2 3 2 1 4), eliminating all the occurrences of 1.
Then if you cons the first element with the list resulting from the filter, you are sure that there is only a “copy” of that element in the resulting list.
The last step needed to write your function is to repeat recursively this process. In general, when you have to apply a recursive process, you have to find a terminal case, in which the result of the function can be immediately given (as the empty list for lists), and the general case, in which you express the solution assuming that you have already available the function for a “smaller” input (for instance a list with a lesser number of elements).
Consider this definition:
define (unique s)
(if (null? s)
'()
(cons (first s)
(filter
(lambda (x) (not (eq? x (first s))))
(unique (rest s))))))
(rest s) is a list which has shorter than s. So you can apply unique to it and find a list without duplicates. If, from this list, you remove the duplicates of the first element with filter, and then cons this element at the beginning of the result, you have a list without any duplicate.
And this is a possibile solution to your problem.

Scheme - Return list of N Repetitions of said list

First off, this is a homework question so just looking for guidance and not an answer.
Write a function named (cycle ALIST N) that accepts a list of elements ALIST and an integer N. This function returns a list containing N repetitions of the elements of ALIST. If N is non-positive, this function returns the empty list.
I will be honest in that I'm not sure how to begin solving this problem. I've been thinking of writing a helper function then using cons calling this n times but just looking if I'm on the correct track here.
One of the more common ways to tackle recursive problems is to begin thinking about it at the end. In other words, under what conditions should you stop? —When are you done? If you can write this base case down, then you only need to ask, what do I do when I am one step away from stopping? This is the recursive step, and for relatively simple recursive problems you are done as the whole problem is either "continue" to do the same thing or "stop."
Knowing the base case usually tells you what kind of extra information you may need to carry around, if any.
In the case of scheme and racket, which support tail call optimization, you may end up with different kinds of recursion. For example:
(define (normal-factorial n)
(if (zero? n)
1
(* n (normal-factorial (- n 1)))))
(define (tail-factorial n)
(letrec ((tf (lambda (product index)
(if (zero? index)
product
(tf (* product index) (- index 1))))))
(tf n (- n 1))))
In the first case, we build up a product without ever multiplying until the very end, while in the second we multiply as soon as possible and carry around this temporary product the whole time.
Not all problems easily lend themselves to one kind of recursion or the other.
You have different strategies you can make. The simplest is probably not the most effiecent but the one that produces less code:
(require srfi/26) ; cut
(define (cycle lst n)
(define dup-lst (map (cut make-list n <>) lst))
(foldr append '() dup-lst))
So what this does is that the map creates a list of lists where each is n elements of each. The foldr flattens it by using append.
With more hands on you can make it more efficient. I'm thinking roll your own recursion consing the elements from end to beginning in an accumulator:
(define (cycle lst n)
(let helper ((lst (reverse lst)) (c n) (acc '()))
(cond ((null? lst) acc)
((<= c 0) (helper ...))
(else (helper ...)))))
I've left out the recursive parts. What this does is a base case on the empty list, a reset recur with a c reset to n and the cdr when c is zero and the default case keeping lst while reducing c and cons-ing the first element of lst to acc. This is a O(n) solution.

Frequency List in Scheme [duplicate]

This is extremely easy if I can use an array in imperative language or map (tree-structure) in C++ for example. In scheme, I have no idea how to start this idea? Can anyone help me on this?
Thanks,
Your question wasn't very specific about what's being counted. I will presume you want to create some sort of frequency table of the elements. There are several ways to go about this. (If you're using Racket, scroll down to the bottom for my preferred solution.)
Portable, pure-functional, but verbose and slow
This approach uses an association list (alist) to hold the elements and their counts. For each item in the incoming list, it looks up the item in the alist, and increments the value of it exists, or initialises it to 1 if it doesn't.
(define (bagify lst)
(define (exclude alist key)
(fold (lambda (ass result)
(if (equal? (car ass) key)
result
(cons ass result)))
'() alist))
(fold (lambda (key bag)
(cond ((assoc key bag)
=> (lambda (old)
(let ((new (cons key (+ (cdr old) 1))))
(cons new (exclude bag key)))))
(else (let ((new (cons key 1)))
(cons new bag)))))
'() lst))
The incrementing is the interesting part. In order to be pure-functional, we can't actually change any element of the alist, but instead have to exclude the association being changed, then add that association (with the new value) to the result. For example, if you had the following alist:
((foo . 1) (bar . 2) (baz . 2))
and wanted to add 1 to baz's value, you create a new alist that excludes baz:
((foo . 1) (bar . 2))
then add baz's new value back on:
((baz . 3) (foo . 1) (bar . 2))
The second step is what the exclude function does, and is probably the most complicated part of the function.
Portable, succinct, fast, but non-functional
A much more straightforward way is to use a hash table (from SRFI 69), then update it piecemeal for each element of the list. Since we're updating the hash table directly, it's not pure-functional.
(define (bagify lst)
(let ((ht (make-hash-table)))
(define (process key)
(hash-table-update/default! ht key (lambda (x) (+ x 1)) 0))
(for-each process lst)
(hash-table->alist ht)))
Pure-functional, succinct, fast, but non-portable
This approach uses Racket-specific hash tables (which are different from SRFI 69's ones), which do support a pure-functional workflow. As another benefit, this version is also the most succinct of the three.
(define (bagify lst)
(foldl (lambda (key ht)
(hash-update ht key add1 0))
#hash() lst))
You can even use a for comprehension for this:
(define (bagify lst)
(for/fold ((ht #hash()))
((key (in-list lst)))
(hash-update ht key add1 0)))
This is more a sign of the shortcomings of the portable SRFI 69 hashing library, than any particular failing of Scheme for doing pure-functional tasks. With the right library, this task can be implemented easily and functionally.
In Racket, you could do
(count even? '(1 2 3 4))
But more seriously, doing this with lists in Scheme is much easier that what you mention. A list is either empty, or a pair holding the first item and the rest. Follow that definition in code and you'll get it to "write itself out".
Here's a hint for a start, based on HtDP (which is a good book to go through to learn about these things). Start with just the function "header" -- it should receive a predicate and a list:
(define (count what list)
...)
Add the types for the inputs -- what is some value, and list is a list of stuff:
;; count : Any List -> Int
(define (count what list)
...)
Now, given the type of list, and the definition of list as either an empty list or a pair of two things, we need to check which kind of list it is:
;; count : Any List -> Int
(define (count what list)
(cond [(null? list) ...]
[else ...]))
The first case should be obvious: how many what items are in the empty list?
For the second case, you know that it's a non-empty list, therefore you have two pieces of information: its head (which you get using first or car) and its tail (which you get with rest or cdr):
;; count : Any List -> Int
(define (count what list)
(cond [(null? list) ...]
[else ... (first list) ...
... (rest list) ...]))
All you need now is to figure out how to combine these two pieces of information to get the code. One last bit of information that makes it very straightforward is: since the tail of a (non-empty) list is itself a list, then you can use count to count stuff in it. Therefore, you can further conclude that you should use (count what (rest list)) in there.
In functional programming languages like Scheme you have to think a bit differently and exploit the way lists are being constructed. Instead of iterating over a list by incrementing an index, you go through the list recursively. You can remove the head of the list with car (single element), you can get the tail with cdr (a list itself) and you can glue together a head and its tail with cons. The outline of your function would be like this:
You have to "hand-down" the element you're searching for and the current count to each call of the function
If you hit the empty list, you're done with the list an you can output the result
If the car of the list equals the element you're looking for, call the function recursively with the cdr of the list and the counter + 1
If not, call the function recursively with the cdr of the list and the same counter value as before
In Scheme you generally use association lists as an O(n) poor-man's hashtable/dictionary. The only remaining issue for you would be how to update the associated element.

Count occurrence of element in a list in Scheme?

This is extremely easy if I can use an array in imperative language or map (tree-structure) in C++ for example. In scheme, I have no idea how to start this idea? Can anyone help me on this?
Thanks,
Your question wasn't very specific about what's being counted. I will presume you want to create some sort of frequency table of the elements. There are several ways to go about this. (If you're using Racket, scroll down to the bottom for my preferred solution.)
Portable, pure-functional, but verbose and slow
This approach uses an association list (alist) to hold the elements and their counts. For each item in the incoming list, it looks up the item in the alist, and increments the value of it exists, or initialises it to 1 if it doesn't.
(define (bagify lst)
(define (exclude alist key)
(fold (lambda (ass result)
(if (equal? (car ass) key)
result
(cons ass result)))
'() alist))
(fold (lambda (key bag)
(cond ((assoc key bag)
=> (lambda (old)
(let ((new (cons key (+ (cdr old) 1))))
(cons new (exclude bag key)))))
(else (let ((new (cons key 1)))
(cons new bag)))))
'() lst))
The incrementing is the interesting part. In order to be pure-functional, we can't actually change any element of the alist, but instead have to exclude the association being changed, then add that association (with the new value) to the result. For example, if you had the following alist:
((foo . 1) (bar . 2) (baz . 2))
and wanted to add 1 to baz's value, you create a new alist that excludes baz:
((foo . 1) (bar . 2))
then add baz's new value back on:
((baz . 3) (foo . 1) (bar . 2))
The second step is what the exclude function does, and is probably the most complicated part of the function.
Portable, succinct, fast, but non-functional
A much more straightforward way is to use a hash table (from SRFI 69), then update it piecemeal for each element of the list. Since we're updating the hash table directly, it's not pure-functional.
(define (bagify lst)
(let ((ht (make-hash-table)))
(define (process key)
(hash-table-update/default! ht key (lambda (x) (+ x 1)) 0))
(for-each process lst)
(hash-table->alist ht)))
Pure-functional, succinct, fast, but non-portable
This approach uses Racket-specific hash tables (which are different from SRFI 69's ones), which do support a pure-functional workflow. As another benefit, this version is also the most succinct of the three.
(define (bagify lst)
(foldl (lambda (key ht)
(hash-update ht key add1 0))
#hash() lst))
You can even use a for comprehension for this:
(define (bagify lst)
(for/fold ((ht #hash()))
((key (in-list lst)))
(hash-update ht key add1 0)))
This is more a sign of the shortcomings of the portable SRFI 69 hashing library, than any particular failing of Scheme for doing pure-functional tasks. With the right library, this task can be implemented easily and functionally.
In Racket, you could do
(count even? '(1 2 3 4))
But more seriously, doing this with lists in Scheme is much easier that what you mention. A list is either empty, or a pair holding the first item and the rest. Follow that definition in code and you'll get it to "write itself out".
Here's a hint for a start, based on HtDP (which is a good book to go through to learn about these things). Start with just the function "header" -- it should receive a predicate and a list:
(define (count what list)
...)
Add the types for the inputs -- what is some value, and list is a list of stuff:
;; count : Any List -> Int
(define (count what list)
...)
Now, given the type of list, and the definition of list as either an empty list or a pair of two things, we need to check which kind of list it is:
;; count : Any List -> Int
(define (count what list)
(cond [(null? list) ...]
[else ...]))
The first case should be obvious: how many what items are in the empty list?
For the second case, you know that it's a non-empty list, therefore you have two pieces of information: its head (which you get using first or car) and its tail (which you get with rest or cdr):
;; count : Any List -> Int
(define (count what list)
(cond [(null? list) ...]
[else ... (first list) ...
... (rest list) ...]))
All you need now is to figure out how to combine these two pieces of information to get the code. One last bit of information that makes it very straightforward is: since the tail of a (non-empty) list is itself a list, then you can use count to count stuff in it. Therefore, you can further conclude that you should use (count what (rest list)) in there.
In functional programming languages like Scheme you have to think a bit differently and exploit the way lists are being constructed. Instead of iterating over a list by incrementing an index, you go through the list recursively. You can remove the head of the list with car (single element), you can get the tail with cdr (a list itself) and you can glue together a head and its tail with cons. The outline of your function would be like this:
You have to "hand-down" the element you're searching for and the current count to each call of the function
If you hit the empty list, you're done with the list an you can output the result
If the car of the list equals the element you're looking for, call the function recursively with the cdr of the list and the counter + 1
If not, call the function recursively with the cdr of the list and the same counter value as before
In Scheme you generally use association lists as an O(n) poor-man's hashtable/dictionary. The only remaining issue for you would be how to update the associated element.

Resources