Immutable queue in Clojure - algorithm

What is the best way to obtain a simple, efficient immutable queue data type in Clojure?
It only needs two operations, enqueue and dequeue with the usual semantics.
I considered lists and vectors of course, but I understand that they have comparatively poor performance (i.e. O(n) or worse) for modifications at the end and beginning respectively - so not ideal for queues!
Ideally I'd like a proper persistent data structure with O(log n) for both enqueue and dequeue operations.

Problem solved - solution for others who may find it helpful.
I've found that Clojure has the clojure.lang.PersistentQueue class that does what is needed.
You can create an instance like this:
(def x (atom clojure.lang.PersistentQueue/EMPTY))
As far as I can see, you currently need to use Java interop to create the instance, but as Michal helpfully pointed out, you can use peek, pop and conj on it subsequently.

I use the following function queue to create a PersistentQueue. Optionally, you might want to have a print-method and a data-reader if you're going to be printing and reading the queues.
The usual Clojure functions are already implemented for PersistentQueue.
peek -- get the head
pop -- returns a new PersistentQueue without the head
conj -- add item to the tail
empty? -- true if empty
seq -- contents as a sequence (list)
(defn queue
  ([] clojure.lang.PersistentQueue/EMPTY)
  ([coll] (reduce conj clojure.lang.PersistentQueue/EMPTY coll)))

(defmethod print-method clojure.lang.PersistentQueue
  [q ^java.io.Writer w]
  (.write w "#queue ")
  (print-method (sequence q) w))

(comment
  ;; *data-readers* is a dynamic var, so it needs binding (not let)
  ;; for read-string to see the 'queue reader.
  (binding [*data-readers* {'queue #'queue}]
    (read-string (pr-str (queue [1 2 3])))))
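For example, a quick REPL session with the queue function and print-method above (results shown in comments):

(def q (queue [1 2 3]))
(peek q)         ; => 1
(pop q)          ; => #queue (2 3)
(conj q 4)       ; => #queue (1 2 3 4)
(empty? (queue)) ; => true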

Clojure could really benefit from a queue literal. This would be cleaner (and more portable) than relying on Java interop.
However, it's not that hard to roll your own portable persistent queue, just using ordinary household items like lists.
Consider the queue as two lists, one holding the front (head) portion of the queue and the other holding the rear (tail) in reverse order. enqueue conses onto the rear list; dequeue takes from the front list. Most ISeq functions are trivially implemented.
Probably the only tricky part is what happens when the front list is empty and you want to dequeue. In that case the rear list is reversed and becomes the new front, and the empty list becomes the new rear. Even with the overhead of the occasional reverse, enqueue and dequeue remain O(1) amortized, though the constant factor is of course going to be higher than for a vanilla vector.
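A minimal sketch of that two-list idea in Clojure (the names empty-queue, enqueue and dequeue are purely illustrative, not an existing API; dequeue returns a [value new-queue] pair):

(def empty-queue {:front '() :rear '()})

(defn enqueue [{:keys [front rear]} x]
  {:front front :rear (cons x rear)})

(defn dequeue [{:keys [front rear] :as q}]
  (cond
    (seq front) [(first front) {:front (rest front) :rear rear}]
    (seq rear)  (let [front (reverse rear)]        ; occasional O(n) reverse
                  [(first front) {:front (rest front) :rear '()}])
    :else       [nil q]))                          ; dequeue on an empty queue

;; (-> empty-queue (enqueue 1) (enqueue 2) (enqueue 3) dequeue)
;; => [1 {:front (2 3), :rear ()}]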
Happy queueing!

Related

When and why to use hash tables in CL instead of a-lists?

I believe Common Lisp is the only language I have worked with that has a variety of extremely useful data structures.
The a-list is the most important one to me; I use it all the time.
When and why do you (or should you) use hash tables?
My reluctance to use them is that, unlike the other data structures, hash tables in CL are not visible as lists, which honestly I find weird, considering almost everything is a list.
Maybe I am missing something in my inexperience?
The hash table is very useful when you have to access a large set of values through a key, since the complexity of this operation with a hash table is O(1), while the complexity of the operation using an a-list is O(n), where n is the length of the list.
So, I use it when I need to access a set of values multiple times and the set has more than a few elements.
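For instance, a minimal sketch comparing the two lookups (the variable names here are purely illustrative):

(defparameter *alist* '((:a . 1) (:b . 2) (:c . 3)))
(cdr (assoc :b *alist*))      ; => 2, linear scan, O(n)

(defparameter *table* (make-hash-table))
(setf (gethash :b *table*) 2)
(gethash :b *table*)          ; => 2, average O(1) lookup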
There are a lot of assumptions to address in your question:
I believe Common Lisp is the only language I have worked with that has a variety of extremely useful data structures.
I don't think this is particularly true; the standard libraries of popular languages are filled with a lot of data structures too (C++, Java, Rust, Python).
When and why do you (or should you) use hash tables?
Data structures come with costs in terms of memory and processor usage: a list must be searched linearly to find an element, whereas a hash table has a constant lookup cost; for small lists, however, the linear search might be faster than the constant-time lookup. Moreover, there are other criteria, such as: do I want to access the data concurrently? A list can be manipulated in a purely functional way, making data sharing across threads easier than with a hash table (though a hash table can be associated with a mutex, etc.).
My reluctance to use them is that, unlike the other data structures, hash tables in CL are not visible as lists, which honestly I find weird, considering almost everything is a list.
The source code of Lisp programs is made mostly of lists and symbols, even though there is no such restriction. But at runtime, CL has a lot of different types that are not at all related to lists: bignums, floating-point numbers, rational numbers, complex numbers, vectors, arrays, packages, symbols, strings, classes and structures, hash tables, readtables, functions, etc. You can model a lot of data at runtime by putting it in lists, which works well for a lot of cases, but lists are by far not the only types available.
Just to emphasize a little bit, when you write:
(vector 0 1 2)
This might look like a list in your code, but at runtime the value really is a different kind of object, a vector. Do not be confused by how things are expressed in code and how they are represented during code execution.
If you don't use it already, I suggest installing and using the Alexandria Lisp library (see https://alexandria.common-lisp.dev/). There are useful functions to convert to and from hash tables, alists and plists.
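For instance, a small sketch assuming Alexandria is loaded:

(defparameter *ht* (alexandria:alist-hash-table '((:a . 1) (:b . 2))))
(alexandria:hash-table-alist *ht*)   ; => ((:B . 2) (:A . 1)), order unspecified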
More generally, I think it is important to architect your libraries and programs in a way that hides implementation details: you define a function make-person and accessors person-age, person-name, etc., as well as other user-facing functions. The actual implementation can use hash tables, lists, etc., but this is not really a concern that should be exposed, because exposing it is a risk: you won't be able to easily change your mind later if you find out that the performance is bad, or if you want to add a cache, use a database, etc.
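A quick sketch of that idea; the hash-table representation here is just one possible choice, not a recommendation:

(defun make-person (name age)
  (let ((person (make-hash-table)))
    (setf (gethash :name person) name
          (gethash :age person) age)
    person))

(defun person-name (person) (gethash :name person))
(defun person-age  (person) (gethash :age person))

;; Callers only ever see make-person / person-name / person-age, so the
;; representation can later become a struct, an alist or a database row
;; without touching any calling code.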
I find however that CL is good at making nice interfaces that do not come with too much accidental complexity.
My reluctance to use them is that, unlike the other data structures, hash tables in CL are not visible as lists.
They are definitely not lists, but indeed they are not visible either:
#<HASH-TABLE :TEST EQL :COUNT 1 {100F4BA883}>
this doesn't show what's inside the hash-table. During development it will require more steps to inspect what's inside (inspect, describe, alexandria:hash-table-alist, defining a non-portable print-object method…).
serapeum:dict
I very much like serapeum:dict, coupled with (serapeum:toggle-pretty-print-hash-table) (see also the Cookbook).
CL-USER> (serapeum:dict :a 1 :b 2 :c 3)
;; => #<HASH-TABLE :TEST EQUAL :COUNT 3 {100F6012D3}>
CL-USER> (serapeum:toggle-pretty-print-hash-table)
;; print the above HT again:
CL-USER> **
(SERAPEUM:DICT
  :A 1
  :B 2
  :C 3
)
Not only is it printed readably, but it also lets you create the hash table with initial elements at the same time (unlike make-hash-table), and you can read it back in. It's even easy to save such a structure to a file.
Serapeum is a solid library.
Now, use hash-tables more easily.
When to use a hash-table: You need to do fast (approximately "constant time") look-ups of data.
When to use an a-list: You have a need to dynamically shadow data you pass on to functions.
If neither of these obviously applies, you have to make a choice. And then, possibly, benchmark your choice. And then evaluate whether rewriting it using the other choice would be better. In some experimentation that someone else did, well over a decade ago, the break-even point between an a-list and a hash table in most Common Lisp implementations was somewhere in the region of 5 to 20 keys.
However, if you have a need to "shadow" bindings, for functions you call, an a-list does provide that "for free", and a hash-map does not. So if that is something that your code does a lot of, an a-list MAY be the better choice.
* (defun lookup (alist key) (assoc key alist))
LOOKUP
* (lookup '((key1 . value1) (key2 . value2)) 'key1)
(KEY1 . VALUE1)
* (lookup '((key1 . value1) (key2 . value2)) 'key2)
(KEY2 . VALUE2)
* (lookup '((key2 . value3) (key1 . value1) (key2 . value2)) 'key2)
(KEY2 . VALUE3)

Why does list ++ require scanning all the elements of the list on its left?

The Haskell tutorial says to be cautious that when we use "Hello" ++ " World", constructing the new list has to visit every single element (here, every character of "Hello"), so if the list on the left of "++" is long, using "++" will bring down performance.
Perhaps I'm not understanding this correctly: did Haskell's developers never tune the performance of list operations? Why does this operation remain slow? Is it to keep some kind of syntactic consistency with lambda functions and currying?
Any hints? Thanks.
In some languages, a "list" is a general-purpose sequence type intended to offer good performance for concatenation, splitting, etc. In Haskell, and most traditional functional languages, a list is a very specific data structure, namely a singly-linked list. If you want a general-purpose sequence type, you should use Data.Sequence from the containers package (which is already installed on your system and offers very good big-O asymptotics for a wide variety of operations), or perhaps some other one more heavily optimized for common usage patterns.
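For instance, a small sketch using Data.Sequence, whose (><) operator concatenates two sequences in O(log(min(n1,n2))) time:

import qualified Data.Sequence as Seq
import Data.Sequence ((><))
import Data.Foldable (toList)

main :: IO ()
main = putStrLn (toList (Seq.fromList "Hello" >< Seq.fromList " World"))
-- prints "Hello World"; the concatenation does not walk the whole left sequence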
If you have an immutable list which has a head and a reference to the tail, you cannot change its tail. If you want to add something to the 'end' of the list, you have to reach the end and then put all the items one by one onto the head of your right-hand list. It is a fundamental property of immutable lists: concatenation is expensive.
Haskell lists are like singly-linked lists: they are either empty or they consist of a head and a (possibly empty) tail. Hence, when appending something to a list, you'll first have to walk the entire list to get to the end. So you end up traversing the entire list (the list to which you append, that is), which needs O(n) runtime.
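To see why, here is a sketch of how (++) can be defined (written as appendList so it doesn't clash with the Prelude); it has to rebuild a cons cell for every element of the left list, while the right list is shared untouched:

appendList :: [a] -> [a] -> [a]
appendList []     ys = ys
appendList (x:xs) ys = x : appendList xs ys

-- appendList "Hello" " World" copies the five cells of "Hello",
-- so the cost is O(length of the left list).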

Scheme Infix to Postfix

Let me establish that this is part of a class assignment, so I'm definitely not looking for a complete code answer. Essentially we need to write a converter in Scheme that takes a list representing a mathematical equation in infix format and then output a list with the equation in postfix format.
We've been provided with the algorithm to do so, simple enough. The issue is that there is a restriction against using any of the available imperative language features. I can't figure out how to do this in a purely functional manner. This is our first introduction to functional programming in my program.
I know I'm going to be using recursion to iterate over the list of items in the infix expression, like so:
(define (itp ifExpr)
  ; do some processing using a cond statement
  (itp (cdr ifExpr)))
I have all of the processing implemented (at least as best I can without knowing how to do the rest) but the algorithm I'm using to implement this requires that operators be pushed onto a stack and used later. My question is how do I implement a stack in this function that is available to all of the recursive calls as well?
(Updated in response to the OP's comment; see the new section below the original answer.)
Use a list for the stack and make it one of the loop variables. E.g.
(let loop ((stack (list))
           ...  ; other loop variables here,
                ; like e.g. what remains of the infix expression
           )
  ...  ; loop body
  )
Then whenever you want to change what's on the stack at the next iteration, well, basically just do so.
(loop (cons 'foo stack) ...)
Also note that if you need to make a bunch of "updates" in sequence, you can often model that with a let* form. This doesn't really work with vectors in Scheme (though it does work with Clojure's persistent vectors, if you care to look into them), but it does with scalar values and lists, as well as SRFI 40/41 streams.
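For example, a minimal sketch of modelling a sequence of stack "updates" with let*, where each binding simply shadows the previous one instead of mutating anything:

(let* ((stack '())
       (stack (cons 'a stack))   ; "push" a
       (stack (cons 'b stack)))  ; "push" b
  stack)                         ; => (b a)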
In response to your comment about loops being ruled out as an "imperative" feature:
(let loop ((foo foo-val)
           (bar bar-val))
  (do-stuff))
is syntactic sugar for
(letrec ((loop (lambda (foo bar) (do-stuff))))
  (loop foo-val bar-val))
letrec then expands to a form of let which is likely to use something equivalent to a set! or local define internally, but is considered perfectly functional. You are free to use some other symbol in place of loop, by the way. Also, this kind of let is called 'named let' (or sometimes 'tagged').
You will likely remember that the basic form of let:
(let ((foo foo-val)
      (bar bar-val))
  (do-stuff))
is also syntactic sugar over a clever use of lambda:
((lambda (foo bar) (do-stuff)) foo-val bar-val)
so it all boils down to procedure application, as is usual in Scheme.
Named let makes self-recursion prettier, that's all; and as I'm sure you already know, (self-) recursion with tail calls is the way to go when modelling iterative computational processes in a functional way.
Clearly this particular "loopy" construct lends itself pretty well to imperative programming too -- just use set! or data structure mutators in the loop's body if that's what you want to do -- but if you stay away from destructive function calls, there's nothing inherently imperative about looping through recursion or the tagged let itself at all. In fact, looping through recursion is one of the most basic techniques in functional programming and the whole point of this kind of homework would have to be teaching precisely that... :-)
If you really feel uncertain about whether it's ok to use it (or whether it will be clear enough that you understand the pattern involved if you just use a named let), then you could just desugar it as explained above (possibly using a local define rather than letrec).
I'm not sure I understand this all correctly, but what's wrong with this simpler solution:
First:
You test if your argument is indeed a list:
If yes: Append the MAP of the function over the tail (map postfixer (cdr lst)) to a list containing only the head. The map just applies the postfixer again to each element of the tail in turn.
If not, just return the argument unchanged.
Three lines of Scheme in my implementation translate:
(postfixer '(= 7 (/ (+ 10 4) 2)))
To:
(7 ((10 4 +) 2 /) =)
The recursion via map needs no looping, not even tail looping, no mutation and shows the functional style by applying map. Unless I'm totally misunderstanding your point here, I don't see the need for all that complexity above.
Edit: Oh, I now see it's infix, not prefix, to postfix. Well, the same general idea applies, except taking the second element and not the first.
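The actual three-line definition wasn't posted, but a sketch of the prefix-to-postfix version described above might look like this (postfixer here is only illustrative, not the answerer's code):

(define (postfixer expr)
  (if (pair? expr)
      (append (map postfixer (cdr expr)) (list (car expr)))
      expr))

; (postfixer '(= 7 (/ (+ 10 4) 2)))  ; => (7 ((10 4 +) 2 /) =)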

Append! in Scheme?

I'm learning R5RS Scheme at the moment (from PocketScheme) and I find that I could use a function that is built into some variants of Scheme but not all: Append!
In other words - destructively changing a list.
I am not so much interested in the actual code as an answer as much as understanding the process by which one could pass a list as a function (or a vector or string) and then mutate it.
example:
(define (append! lst var)
  (cons (lst var)))
When I use the approach as above, I have to do something like (define list (append! foo (bar))), whereas I would like something more generic.
Mutation, though allowed, is strongly discouraged in Scheme. PLT even went so far as to remove set-car! and set-cdr! (though they "replaced" them with set-mcar! and set-mcdr!). However, a spec for append! appeared in SRFI-1. This append! is a little different than yours. In the SRFI, the implementation may, but is not required to modify the cons cells to append the lists.
If you want to have an append! that is guaranteed to change the structure of the list that's being appended to, you'll probably have to write it yourself. It's not hard:
(define (my-append! a b)
  (if (null? (cdr a))
      (set-cdr! a b)
      (my-append! (cdr a) b)))
To keep the definition simple, there is no error checking here, but it's clear that you will need to pass in a list of length at least 1 as a, and (preferably) a list (of any length) as b. The reason a must be at least length 1 is because you can't set-cdr! on an empty list.
Since you're interested in how this works, I'll see if I can explain. Basically, what we want to do is go down the list a until we get to the last cons pair, which is (<last element> . null). So we first see if a is already the last element in the list by checking for null in the cdr. If it is, we use set-cdr! to set it to the list we're appending, and we're done. If not, we have to call my-append! on the cdr of a. Each time we do this we get closer to the end of a. Since this is a mutation operation, we're not going to return anything, so we don't need to worry about forming our modified list as the return value.
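A quick usage sketch, assuming the my-append! above (the lists are built with list rather than quote, since mutating literal constants is an error):

(define xs (list 1 2 3))
(my-append! xs (list 4 5))
xs   ; => (1 2 3 4 5)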
Better late than never for putting in a couple 2-3 cents on this topic...
(1) There's nothing wrong with using the destructive procedures in Scheme while there is a single reference to the structure being modified. So, for example, building a large list efficiently, piecemeal, via a single reference, and, when complete, making that (now presumably not-to-be-modified) list known and referred to from various referents.
(2) I think APPEND! should behave like APPEND, only (potentially) destructively. And so APPEND! should expect any number of lists as arguments. Each list but the last would presumably be SET-CDR!'d to the next.
(3) The above definition of APPEND! is essentially NCONC from Mac Lisp and Common Lisp. (And other lisps).

What does it mean to 'hash cons'?

When to use it and why?
My question comes from the sentence: "hash cons with some classes and compare their instances with reference equality"
From Odersky, Spoon and Venners (2007), Programming in Scala, Artima Press, p. 243:
You hash cons instances of a class by caching all instances you have created in a weak collection. Then, any time you want a new instance of the class, you first check the cache. If the cache already has an element equal to the one you are about to create, you can reuse the existing instance. As a result of this arrangement, any two instances that are equal with equals() are also equal with reference equality.
Putting everyone's answers together:
ACL2 (A Computational Logic for Applicative Common Lisp) is a software system consisting of a programming language, an extensible theory in a first-order logic, and a mechanical theorem prover.
-- Wiki ACL2
In computer programming, cons (pronounced /ˈkɒnz/ or /ˈkɒns/) is a fundamental function in most dialects of the Lisp programming language. cons constructs (hence the name) memory objects which hold two values or pointers to values. These objects are referred to as (cons) cells, conses, or (cons) pairs. In Lisp jargon, the expression "to cons x onto y" means to construct a new object with (cons x y). The resulting pair has a left half, referred to as the car (the first element), and a right half (the second element), referred to as the cdr.
-- Wiki Cons
Logically, hons is merely another name for cons, i.e., the following is an ACL2 theorem:
(equal (hons x y) (cons x y))
Hons generally runs slower than cons because in creating a hons, an attempt is made to see whether a hons already exists with the same car and cdr. This involves search and the use of hash-tables.
-- http://www.cs.utexas.edu/~moore/acl2/current/HONS.html
Given your question:
hash cons with some classes and compare their instances with reference equality
It appears that hash cons is the process of hashing a LISP constructor to determine if an object already exists via equality comparison.
http://en.wikipedia.org/wiki/Hash_cons now redirects.
It is cons with hashing to allow eq (reference) comparison instead of a deep one. This is more efficient for memory (because identical objects are stored as references), and is of course faster if comparison is a common operation.
http://www.cs.utexas.edu/~moore/acl2/current/HONS.html describes an implementation for Lisp.
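As an illustration only (not ACL2's actual hons), a hash-consing constructor might be sketched in Common Lisp like this:

(defvar *hcons-table* (make-hash-table :test #'equal))

(defun hcons (x y)
  "Return the one canonical cons for this (x . y) pair."
  (let ((key (cons x y)))
    (or (gethash key *hcons-table*)
        (setf (gethash key *hcons-table*) key))))

; (eq (hcons 1 2) (hcons 1 2))  ; => T, so cheap reference equality suffices
; A real implementation would also use a weak table so unused pairs can be GC'd.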
