Many functional programming languages, such as Haskell and Scala, support and recommend the Cons data constructor (for lists like (1, (2, (3)))).
But what are its advantages? Such lists can neither be randomly accessed nor appended to in O(1).
Cons (short for "construct") is not a data structure; it's the name of the operation for creating a cons cell. By linking several cells together you can build a data structure, in particular a linked list. The rest of this discussion assumes that a linked list is being created with cons operations.
Although it's possible to prepend at the head in O(1), accessing an element by index is a costly operation that requires traversing all the elements before the one being accessed.
The advantages of a linked list? It's a functional data structure: cheap to create, or to recreate when modified; it allows sharing of nodes between several lists; and it makes garbage collection easy. It's also very flexible: with the right abstractions it can represent other, more complex data structures such as stacks, queues, trees, and graphs. And there are many, many procedures written specifically for manipulating lists, for instance map, filter, fold, and so on, that make working with lists a joy. Finally, a list is a recursive data structure, and recursion (especially tail recursion) is the preferred way to solve problems in functional programming languages, so it's natural for these languages to have a recursive data structure as their main data structure.
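To make that last point concrete, here is a small Haskell sketch (using only standard Prelude functions; the sample data is made up) of how map, filter and a fold compose over a cons list:

    -- Build a cons list explicitly; 1 : 2 : 3 : [] is the same as [1, 2, 3].
    xs :: [Int]
    xs = 1 : 2 : 3 : 4 : 5 : []

    -- map, filter and foldr each walk the list one cons cell at a time.
    sumOfDoubledEvens :: Int
    sumOfDoubledEvens = foldr (+) 0 (map (* 2) (filter even xs))
    -- filter even xs  == [2, 4]
    -- map (* 2) ...   == [4, 8]
    -- foldr (+) 0 ... == 12

    main :: IO ()
    main = print sumOfDoubledEvens   -- prints 12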
First of all, let's distinguish between "cons" as a nickname for the ML-style list data constructor usually called :: and what the nickname comes from, the original Lisp-style cons function.
In Lisps, cons cells are an all-purpose data structure not restricted to lists of homogeneous element type. The equivalent in ML-style languages would be nested pairs or 2-tuples, with the empty list represented by the "unit" type often written (). Óscar López gives a good overview of the utility of the Lisp cons, so I'll leave it at that.
In most ML-style languages the advantages of immutable cons lists are not too different from their use for lists in Lisps, trading off the flexibility of dynamic typing for the guarantees of static typing and the syntax of ML-style pattern matching.
In Haskell, however, the situation is rather different due to lazy evaluation. Constructors are lazy and pattern matching on them is one of the few ways to force evaluation, so in contrast to strictly-evaluated languages it is often the case that you should avoid tail recursion. Instead, by placing the recursive call in the tail of a list it becomes possible to compute each recursive call only when needed. If a lazily-generated list is processed with appropriately lazy functions like map or foldr, it becomes possible to construct and consume a large list in constant memory, with the tails being forced at the same rate the heads are abandoned for the GC to clean up.
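As a hedged sketch of that behaviour (plain Prelude functions plus Data.List's foldl'; nothing below comes from the original answer), an infinite cons list can be generated lazily and consumed in roughly constant space:

    import Data.List (foldl')

    -- An infinite, lazily built cons list: each tail is a thunk until demanded.
    naturals :: [Integer]
    naturals = go 0
      where
        go n = n : go (n + 1)   -- the recursive call sits in the tail of the cons

    main :: IO ()
    main = do
      -- Only the first five cells of the infinite list are ever constructed.
      print (take 5 (map (^ 2) naturals))          -- [0,1,4,9,16]
      -- Consuming a long prefix with a strict fold stays in constant space:
      -- cells are forced, used, and left for the GC one at a time.
      print (foldl' (+) 0 (take 1000000 naturals))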
A common perspective in Haskell is that a lazy cons list is not so much a data structure as it is a control structure--a reified loop that composes efficiently with other such loops.
That said, there are many cases where a cons list is not appropriate--such as when repeated random access is needed--and in those situations using lists is certainly not recommended.
Why do we need stacks when we already have so many data structures that can also handle managing data in reverse directions [linked list & vectors]?
What's so useful/distinctive about stacks that other data structures don't have?
Also, why are only arrays and linked lists compatible with stacks? Why aren't vectors compatible with stacks?
This sounds like a homework question. I'd argue the final question is based on a flawed premise.
First, stacks are an age-old concept. Every computer I've ever programmed at the assembly code level has used them. So partly they exist for historical reasons.
But it's a useful concept. Stacks have three basic operations:
Push
Peek
Pop
They're an easy, first-in, last-out structure. They're GREAT for certain types of languages. There are programming languages built on the concept of stacks. They're exceedingly easy to write. (The language itself.) And they're kind of fun.
For instance:
8 7 mul
That's legal code in PostScript and a variety of other stack-based languages. The interpreter breaks that into tokens; it sees the 8 and pushes it onto the stack, sees the 7 and pushes that, then sees mul and knows that mul takes two operands, so it pops the 7, pops the 8, multiplies them, and pushes 56. When that line is done, the stack has just a 56 on it (plus whatever was on it before this).
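To make the push/pop mechanics concrete, here is a hedged Haskell sketch of a toy postfix evaluator in the same spirit (the evalToken and run names are made up for illustration; this is not PostScript's actual machinery):

    -- A stack of integers, represented as a list with the top at the head.
    type Stack = [Int]

    -- Process one token: numbers are pushed; "mul" pops two operands and
    -- pushes their product.
    evalToken :: Stack -> String -> Stack
    evalToken (a : b : rest) "mul" = (a * b) : rest
    evalToken _              "mul" = error "mul needs two operands"
    evalToken stack          tok   = read tok : stack

    -- Run a whole program left to right, starting from an empty stack.
    run :: String -> Stack
    run program = foldl evalToken [] (words program)

    main :: IO ()
    main = print (run "8 7 mul")   -- [56]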
Stacks are also very useful in other forms of parsing. I use them when parsing certain types of input (like XML): when I reach an "end of object" marker, I pop the top item off the stack, and the next thing I encounter in my input applies to the new top-of-stack item.
But stacks are a CONCEPT, not an IMPLEMENTATION. Arrays, linked lists, and vectors are all reasonable ways to implement a stack.
As for the last question -- it's a flawed question. What's the difference between an array and a vector? Well, that might be language-specific. In C++, arrays are fixed-length (and I almost never use them). I would use a vector to implement a stack, with the top of the stack being the last item in the vector.
That's a lighter-weight approach than using linked lists, depending on how big your stack is going to grow. (Every time you grow a vector beyond its allocated capacity it can be expensive, so if you don't allocate enough space ahead of time, linked lists might be cheaper.)
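The answer above is about C++ vectors, but the same "top of the stack is the last element of a growable sequence" idea can be sketched in Haskell using Data.Sequence from the containers package. This is a hedged illustration of the layout, not a claim about how any particular library implements its stack:

    import           Data.Sequence (Seq, ViewR (..), (|>))
    import qualified Data.Sequence as Seq

    -- Keep the top of the stack at the right end of the sequence,
    -- mirroring "the top of the stack is the last item in the vector".
    push :: a -> Seq a -> Seq a
    push x s = s |> x

    pop :: Seq a -> Maybe (a, Seq a)
    pop s = case Seq.viewr s of
      EmptyR    -> Nothing
      rest :> x -> Just (x, rest)

    main :: IO ()
    main = do
      let s = push 3 (push 2 (push 1 Seq.empty))
      print (pop s)   -- Just (3, fromList [1,2])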
After doing some reading + research, here's what I think:
Why do we need stacks when we already have so many data structures that can also handle managing data in reverse directions [linked list & vectors]?
We need stacks because they store data and hand it back in LIFO order, and because of their simplicity. Other data structures store data in sequential order, the order in which it was originally provided, while a stack returns it last-in, first-out, which is exactly the sequence many applications need. And since stacks are much simpler to use than linked lists and vectors, they are the ideal candidate, with their three powers of push, peek, and pop.
Think of it like this: "Don't use a bomb to kill flies when you have a net"
What's so useful/distinctive about stacks that other data structures don't have?
Other data structures are not as simple as a stack, which has just three operations: push, peek, and pop. Even though we can store data in reverse in vectors and linked lists, it is much more complicated. So think of a stack as a simple tool for getting the job done.
Think of it like this: we use a for-loop even though a while loop can do the same job, because a for-loop is short, simple, and built for that specific purpose, whereas a while loop is bigger and a bit unnecessary for a task a for-loop can accomplish.
Also, why are only arrays and linked lists compatible with stacks, why aren't vectors compatible with stacks?
Actually, vectors are compatible with stacks: a vector already provides push and pop operations at its back, and a stack can be (and often is) implemented on top of one, just as it can be implemented on top of an array or a linked list. So the premise that only arrays and linked lists work with stacks is flawed; there is nothing about a vector that the compiler would reject as an underlying container for a stack.
A stack is an ADT (abstract data type), consisting of a mathematical model (a sequence of elements) and some operations performed on that model (push, pop, isEmpty, makenull, and top). In many applications, these are the only operations you would want to do on a list: every time you delete an element, it’s the element that was most recently inserted into the list. An important application of stacks is call stacks, which are used to implement procedure calls. When a procedure A calls a procedure B in your program, a new stack frame for procedure B is pushed onto the call stack.
ADTs are called interfaces in Java; they are the specs, while data structures are about how those specs are implemented. A stack can be implemented using an array data structure or a linked list data structure in such a manner that all five stack ADT operations can be performed in O(1) time.
So, I wouldn’t call a stack a data structure; I would call it an ADT. Examples of data structures include arrays, sorted arrays, linked lists, doubly linked lists, binary search trees, hash tables, max heaps, etc. Each data structure is well suited to a subset of operations that it can perform efficiently. Depending on which subset of operations (which ADT) arises in your application or algorithm, you can choose a data structure that performs that subset of operations efficiently.
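A hedged Haskell sketch of that separation between the ADT and a data structure implementing it (the Stack class and ListStack type are invented names for illustration): the class states the spec, and a singly linked list is one representation on which every operation is O(1).

    -- The ADT: just the operations, with no commitment to a representation.
    class Stack s where
      makenull :: s a
      isEmpty  :: s a -> Bool
      push     :: a -> s a -> s a
      top      :: s a -> Maybe a
      pop      :: s a -> Maybe (s a)

    -- One data structure behind the spec: a singly linked list.
    -- Every operation below runs in O(1).
    newtype ListStack a = ListStack [a]

    instance Stack ListStack where
      makenull                   = ListStack []
      isEmpty (ListStack xs)     = null xs
      push x (ListStack xs)      = ListStack (x : xs)
      top (ListStack [])         = Nothing
      top (ListStack (x : _))    = Just x
      pop (ListStack [])         = Nothing
      pop (ListStack (_ : rest)) = Just (ListStack rest)

An array-backed instance could satisfy the same class, which is exactly the ADT-versus-data-structure distinction made above.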
I'm aware that in lazy functional languages, linked lists take on generator-esque semantics, and that under an optimizing compiler, their overhead can be completely removed when they're not actually being used for storage.
But in eager functional languages, their use seems no less heavy, while optimizing them out seems more difficult. Is there a good performance reason that languages like Scheme use them over flat arrays as their primary sequence representation?
I'm aware of the obvious time complexity implications of using singly-linked lists; I'm more thinking about the in-practice performance consequences of using eager singly-linked lists as a primary sequence representation, compiler optimizations considered.
TL;DR: No performance advantage!
The first Lisp had cons (the linked-list cell) as its only data structure and used it for everything. Both Common Lisp and Scheme have vectors today, but for a functional style a vector is not a good match. With a linked list, one recursive step can add zero or more elements in front of an accumulator, and at the end you have a list built with sharing between the iterations. An operation might recurse more than once, producing several versions that all share the same tail. I would say sharing is the most important aspect of the linked list: if you write a minimax algorithm and store the state in a linked list, you can change the state without having to copy its unchanged parts.
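A hedged Haskell sketch of that sharing (the names are made up): several "versions" of a list coexist cheaply because they all point at the same unchanged tail.

    -- One shared tail...
    baseState :: [Int]
    baseState = [3, 4, 5]

    -- ...and two modified versions. Neither copies baseState: each new list
    -- is just one or two fresh cons cells pointing at the existing tail.
    moveA :: [Int]
    moveA = 1 : baseState        -- [1,3,4,5]

    moveB :: [Int]
    moveB = 2 : 1 : baseState    -- [2,1,3,4,5]

    main :: IO ()
    main = mapM_ print [baseState, moveA, moveB]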
Bjarne Stroustrup, the creator of C++, mentions in a talk that a linked list's penalties -- the data scattered around memory and roughly double the size per element -- mean that a vector easily outperforms it, even when you insert in sorted order and have to move half the elements of the vector on each insert. Keep in mind these were doubly linked lists and mutating inserts, but he notes that most of the time went into following the pointers linearly, taking CPU cache misses all the way, just to find the correct spot; so for every O(n) search through a sorted list, a vector would be better.
If you have a program where you do many inserts into a list at positions other than the front, then perhaps a tree is a better choice, and you can build one in CL and Scheme with cons. In fact, all of Chris Okasaki's purely functional data structures can be implemented with cons. Most "mutable" structures in Haskell are implemented similarly.
If you are suffering from performance problems in Scheme and, after profiling, you find you should replace a linked-list operation with an array-based one, there is nothing standing in the way of that. In the end, all algorithm choices have pros and cons. Hard computations are hard in any language.
From my limited knowledge of Haskell, it seems that Maps (from Data.Map) are supposed to be used much like a dictionary or hashtable in other languages, and yet are implemented as self-balancing binary search trees.
Why is this? Using a binary tree reduces lookup time to O(log(n)) as opposed to O(1) and requires that the elements be in Ord. Certainly there is a good reason, so what are the advantages of using a binary tree?
Also:
In what applications would a binary tree be much worse than a hashtable? What about the other way around? Are there many cases in which one would be vastly preferable to the other? Is there a traditional hashtable in Haskell?
Hash tables can't be implemented efficiently without mutable state, because they're based on array lookup. The key is hashed and the hash determines the index into an array of buckets. Without mutable state, inserting elements into the hashtable becomes O(n) because the entire array must be copied (alternative non-copying implementations, like DiffArray, introduce a significant performance penalty). Binary-tree implementations can share most of their structure so only a couple pointers need to be copied on inserts.
Haskell certainly can support traditional hash tables, provided that the updates are in a suitable monad. The hashtables package is probably the most widely used implementation.
One advantage of binary trees and other non-mutating structures is that they're persistent: it's possible to keep older copies of data around with no extra book-keeping. This might be useful in some sort of transaction algorithm for example. They're also automatically thread-safe (although updates won't be visible in other threads).
Traditional hashtables rely on memory mutation in their implementation. Mutable memory and referential transparency are at odds, so that relegates hashtable implementations to either the IO or ST monads. Trees can be implemented persistently and efficiently by leaving the old leaves in memory and returning new root nodes which point to the updated trees. This lets us have pure Maps.
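As a hedged sketch of that idea (a toy unbalanced search tree, not Data.Map's actual implementation), inserting a key rebuilds only the path from the root down to the insertion point and shares every untouched subtree with the old version:

    -- A toy persistent binary search tree (unbalanced, unlike Data.Map).
    data Tree a = Leaf | Node (Tree a) a (Tree a)
      deriving Show

    insert :: Ord a => a -> Tree a -> Tree a
    insert x Leaf = Node Leaf x Leaf
    insert x t@(Node l y r)
      | x < y     = Node (insert x l) y r   -- new spine node, right subtree shared
      | x > y     = Node l y (insert x r)   -- new spine node, left subtree shared
      | otherwise = t                       -- key already present: reuse the tree

    main :: IO ()
    main = do
      let t1 = foldr insert Leaf [5, 3, 8]
          t2 = insert 4 t1
      -- t1 is still intact: both versions coexist and share most of their nodes.
      print t1
      print t2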
The quintessential reference is Chris Okasaki's Purely Functional Data Structures.
Why is this? Using a binary tree reduces lookup time to O(log(n)) as opposed to O(1)
Lookup is only one of the operations; insertion/modification may be more important in many cases; there are also memory considerations. The main reason the tree representation was chosen is probably that it is more suited for a pure functional language. As "Real World Haskell" puts it:
Maps give us the same capabilities as hash tables do in other languages. Internally, a map is implemented as a balanced binary tree. Compared to a hash table, this is a much more efficient representation in a language with immutable data. This is the most visible example of how deeply pure functional programming affects how we write code: we choose data structures and algorithms that we can express cleanly and that perform efficiently, but our choices for specific tasks are often different from their counterparts in imperative languages.
This:
and requires that the elements be in Ord.
does not seem like a big disadvantage. After all, with a hash map you need keys to be Hashable, which seems to be more restrictive.
In what applications would a binary tree be much worse than a hashtable? What about the other way around? Are there many cases in which one would be vastly preferable to the other? Is there a traditional hashtable in Haskell?
Unfortunately, I cannot provide an extensive comparative analysis, but there is a hash map package, and you can check out its implementation details and performance figures in this blog post and decide for yourself.
My answer to what the advantage of using binary trees is would be: range queries. They require, semantically, a total preorder, and profit algorithmically from a balanced search-tree organization. For simple lookup, I'm afraid there may only be good Haskell-specific answers, not good answers per se: lookup (and indeed hashing) requires only a setoid (equality/equivalence on its key type), which would support efficient hashing on pointers (which, for good reasons, are not ordered in Haskell). Like various forms of tries (e.g. ternary tries for elementwise update, others for bulk updates), hashing into arrays (open or closed) is typically considerably more efficient than elementwise searching in binary trees, both space- and time-wise. Hashing and tries can be defined generically, though that has to be done by hand -- GHC doesn't derive it (yet?). Data structures such as Data.Map tend to be fine for prototyping and for code outside of hotspots, but where they are hot they easily become a performance bottleneck. Luckily, Haskell programmers need not be concerned about performance, only their managers. (For some reason I presently can't find a way to access the key redeeming feature of search trees among the 80+ Data.Map functions: a range query interface. Am I looking in the wrong place?)
I've been working on a toy database in Clojure and wanted to implement a B+ tree. When I started thinking about it, I realised there may not be a way to have something like a pointer/reference to other nodes in Clojure. It doesn't matter for something like a BST or a lot of other tree structures, since all you need is to store a node's children. But what do I do in something like a B+ tree, where I need to be able to refer to a node's sibling?
When looking for solutions, I came across a post in Google Groups about how you don't implement a Doubly linked list in Clojure because there are other ways of doing things in Clojure.
What do I do for a B+ Tree though?
It's not that it's difficult to have references to objects in Clojure; but generally, these references are immutable. It's immutability that makes the doubly linked list impossible, because unlike a singly linked list, you can't change any part of it without introducing a mutation somewhere.
To see this, suppose I have a singly linked list,
a -> b -> c
and suppose I want to change the head of it. I can do so without changing the entirety of the list: I create a new list with a new head value and reuse the tail:
a'-> b -> c
But a doubly linked list can't be updated this way: every node is reachable from every other node, so changing any one of them would mean rebuilding the entire list with no sharing. So in Clojure, and other functional languages, we sometimes use a zipper in such situations.
Now, suppose you really need mutable references in Clojure -- how do you do it? Well, depending on what concurrency semantics you need, Clojure has vars, refs, atoms, etc.
Also, with deftype you can create objects that have mutable fields, and these mutable fields can hold references to other things. You can also use raw Java arrays in Clojure for the same purpose.
Is your database going to be an in-memory database, or a disk-backed database? If on disk, I think that the issue of pointer swizzling is trickier than that of having mutable references.
Getting back to the issue of functional data structures, I believe it is possible to create B-trees that have purely functional semantics. The first clue is that it's a tree, and trees are the bread and butter of functional data structures. Secondly, note that there are databases which work in an append-only fashion -- CouchDB, for instance. This has the benefit that the database is its own log, in a sense. To get more of an idea of the costs and benefits of this approach, you might want to watch Slava Akhmechet's presentation. His company, RethinkDB, eventually took a sort of hybrid approach, IIRC.
You may wish to look at Chouser's finger trees in Clojure to see how the functionality of a doubly-linked list may be implemented using functional style.
Alternatively, you may simply want to step back and ask yourself why you believe that B+ is a good choice of data structure for a functional language.
If you are unfamiliar with the alternatives, you may want to look at Chris Okasaki's book "Purely Functional Data Structures."
Do linked lists have any practical uses at all? Many computer science books compare them to arrays and say the main advantage is that they are mutable. However, most languages provide mutable versions of arrays. So do linked lists have any actual uses in the real world, or are they just part of computer science theory?
They're absolutely precious (in both the popular doubly-linked version and the less-popular, but simpler and faster when applicable!, single-linked version). For example, inserting (or removing) a new item in a specified "random" spot in a "mutable version of an array" (e.g. a std::vector in C++) is O(N) where N is the number of items in the array, because all that follow (on average half of them) must be shifted over, and that's an O(N) operation; in a list, it's O(1), i.e., constant-time, if you already have e.g. the pointer to the "previous" item. Big-O differences like this are absolutely huge -- the difference between a real-world usable and scalable program, and a toy, "homework"-level one!-)
Linked lists have many uses. For example, implementing data structures that appear to the end user to be mutable arrays.
If you are using a programming language that provides implementations of various collections, many of those collections will be implemented using linked lists. When programming in those languages, you won't often be implementing a linked list yourself but it might be wise to understand them so you can understand what tradeoffs the libraries you use are making. In other words, the set "just part of computer science theory" contains elements that you just need to know if you are going to write programs that just work.
The main applications of linked lists are:
- Representing polynomials, i.e. addition/subtraction/multiplication of two polynomials (a sketch of polynomial addition follows after this list).
  E.g.: p1 = 2x^2 + 3x + 7 and p2 = 3x^3 + 5x + 2, so p1 + p2 = 3x^3 + 2x^2 + 8x + 9.
- Dynamic memory management: allocating and releasing memory at runtime.
- Symbol tables.
- Balancing parentheses.
- Representing sparse matrices.
Ref: http://www.cs.ucf.edu/courses/cop3502h.02/linklist3.pdf
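The polynomial item above translates directly into code. Here is a hedged Haskell sketch (the representation and names are made up for illustration) in which a polynomial is a linked list of (coefficient, exponent) pairs kept in descending exponent order:

    -- A polynomial as a cons list of (coefficient, exponent) pairs, sorted by
    -- descending exponent; 2x^2 + 3x + 7 becomes [(2,2), (3,1), (7,0)].
    type Poly = [(Int, Int)]

    addPoly :: Poly -> Poly -> Poly
    addPoly [] q = q
    addPoly p [] = p
    addPoly p@((c1, e1) : ps) q@((c2, e2) : qs)
      | e1 > e2   = (c1, e1) : addPoly ps q
      | e1 < e2   = (c2, e2) : addPoly p qs
      | otherwise = (c1 + c2, e1) : addPoly ps qs

    main :: IO ()
    main = print (addPoly [(2, 2), (3, 1), (7, 0)] [(3, 3), (5, 1), (2, 0)])
    -- [(3,3),(2,2),(8,1),(9,0)], i.e. 3x^3 + 2x^2 + 8x + 9, matching the example above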
So do linked lists have any actual uses in the real world,
A real-world analogy for a (doubly) linked list is a lift in a building.
- A person has to go through all the intermediate floors to reach the top (the tail, in linked-list terms).
- A person can never jump to some random floor directly (they have to pass through the intermediate floors/nodes).
- A person can never go beyond the top floor (the tail node's next pointer is null).
- A person can never go below the ground floor (the head node's previous pointer is null).
Yes, of course they're useful, for many reasons.
Any time, for example, that you want efficient insertion into and deletion from a list: finding the place to insert is an O(N) search, but performing the insertion once you already have the correct position is O(1).
Also, the concepts you learn from working with linked lists help you build tree-based data structures and many other data structures.
A primary advantage of a linked list, as opposed to a vector, is that random insertion is as simple as decoupling a pair of pointers and recoupling them around the new object (it is, of course, slightly more work for a doubly linked list). A vector, on the other hand, generally reorganizes memory on insertions, which makes them significantly slower. A list is not as efficient, however, at things like appending to the end of the container, because of the need to walk all the way through the list.
An Immutable Linked List is the most trivial example of a Persistent Data Structure, which is why it is the standard (and sometimes even only) data structure in many functional languages. Lisp, Scheme, ML, Haskell, Scala, you name it.
Linked lists are very useful for dynamic memory allocation, and they are used in operating systems. Insertion and deletion in linked lists are cheap. Complex data structures like trees and graphs can be implemented using linked lists.
Arrays that grow as needed are always just an illusion, because of the way computer memory works. Under the hood it's just a contiguous block of memory that has to be reallocated when enough new elements have been added. Likewise, if you remove elements from the array, you'll have to allocate a new block of memory, copy the array, and release the previous block to reclaim the unused memory. A linked list allows you to grow and shrink a list of elements without having to reallocate the rest of the list.
Linked lists are useful because elements can be efficiently spliced in and removed in the middle, as others noted. A downside of linked lists, however, is poor locality of reference. I prefer not to use lists for this reason unless I have an explicit need for their capabilities.