Why is appending to a list bad? - performance

I've recently started learning Scala, and I've come across the :: (cons) function, which prepends to a list.
In the book "Programming in Scala" it states that there is no append function because appending to a list has O(n) performance, whereas prepending has O(1) performance.
Something just strikes me as wrong about that statement.
Isn't performance dependent on implementation? Isn't it possible to simply implement the list with both forward and backward links and store the first and last element in the container?
My second question, I suppose, is what I'm supposed to do when I have a list, say 1, 2, 3, and I want to add 4 to the end of it?

The key is that x :: somelist does not mutate somelist, but instead creates a new list, which contains x followed by all elements of somelist. This can be done in O(1) time because you only need to set somelist as the successor of x in the newly created, singly linked list.
If doubly linked lists were used instead, x would also have to be set as the predecessor of somelist's head, which would modify somelist. So if we want to be able to do :: in O(1) without modifying the original list, we can only use singly linked lists.
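To make the sharing concrete, here is a small sketch using plain Scala lists (nothing here is hypothetical; this is standard behavior of ::):

val somelist = List(2, 3)
val xs = 1 :: somelist        // builds List(1, 2, 3) in O(1)
// somelist is untouched, and xs physically shares its nodes:
println(xs.tail eq somelist)  // true: the tail of xs IS somelist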
Regarding the second question: You can use ::: to concatenate a single-element list to the end of your list. This is an O(n) operation.
List(1,2,3) ::: List(4)

Other answers have given good explanations for this phenomenon. If you are appending many items to a list in a subroutine, or if you are creating a list by appending elements, a functional idiom is to build up the list in reverse order, cons'ing the items on the front of the list, then reverse it at the end. This gives you O(n) performance instead of O(n²).
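For example, a minimal Scala sketch of this idiom (squares is just an illustrative name):

def squares(n: Int): List[Int] = {
  var acc = List.empty[Int]
  for (i <- 1 to n) acc = (i * i) :: acc  // each cons is O(1)
  acc.reverse                             // one O(n) pass at the end
}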

Since the question was just updated, it's worth noting that things have changed here.
In today's Scala, you can simply use xs :+ x to append an item at the end of any sequential collection. (There is also x +: xs to prepend. The mnemonic for many of Scala's 2.8+ collection operations is that the colon goes next to the collection.)
This will be O(n) with the default linked implementation of List or Seq, but if you use Vector or IndexedSeq, it will be effectively constant time. Vector is probably Scala's most useful list-like collection, unlike Java's Vector, which is mostly useless these days.
If you are working in Scala 2.8 or higher, the collections introduction is an absolute must-read.
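For example (these are standard Scala collection operations; the complexity notes follow from the linked docs):

val xs = List(1, 2, 3)
xs :+ 4          // List(1, 2, 3, 4) -- O(n) on a List
0 +: xs          // List(0, 1, 2, 3) -- O(1) on a List

val v = Vector(1, 2, 3)
v :+ 4           // Vector(1, 2, 3, 4) -- effectively constant time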

Prepending is faster because it only requires two operations:
Create the new list node
Have that new node point to the existing list
Appending requires more operations because you have to traverse to the end of the list since you only have a pointer to the head.
I've never programmed in Scala before, but you could try a ListBuffer.
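That suggestion does work; here is a short sketch with Scala's ListBuffer, which appends in constant time and converts to a List without copying:

import scala.collection.mutable.ListBuffer

val buf = ListBuffer(1, 2, 3)
buf += 4                  // constant-time append
val result = buf.toList   // List(1, 2, 3, 4)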

Most functional languages prominently feature a singly-linked-list data structure, as it's a handy immutable collection type. When you say "list" in a functional language, that's typically what you mean: a singly linked list, usually immutable. For such a type, append is O(n) whereas cons is O(1).

Related

Is it faster/traditional to use append vs cons and reverse in Scheme?

When building a list recursively in Scheme, I see two types of examples scattered about the internet. In one, a new value is appended with append every iteration. In the other, a new value is prepended every iteration with cons; then, after the list is complete, reverse is called once.
My gut instinct is that the latter is faster, but if the Scheme interpreter cached an end-of-list pointer or is doing some other optimization, then the former would be just as fast and more readable. If the interpreter is doing this optimization, is it guaranteed to be available in all interpreters?
Using cons is always preferred. Using append is terribly inefficient, as it will always traverse the list to the end just to add a new element there, whereas cons adds the element at the beginning. There is no such thing as a pointer to the end of the list, so the optimization you suggested isn't performed at all.
When building large lists element by element this matters a lot, as cons is an O(1) operation, whereas append is O(n) for each new element added, degrading to O(n²) complexity! (For a great analogy, see: Schlemiel the Painter's algorithm.) In the end, it is much cheaper to simply add elements at the beginning and, if necessary, reverse the list at the end, achieving O(n) complexity.
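The same contrast rendered in Scala (Scheme's append and cons correspond roughly to :+ and :: here; the function names are illustrative):

// Appending traverses and copies the list on every step: O(n²) overall.
def buildWithAppend(n: Int): List[Int] =
  (1 to n).foldLeft(List.empty[Int])((acc, i) => acc :+ i)

// Prepending is O(1) per step; one final reverse keeps it O(n) overall.
def buildWithConsReverse(n: Int): List[Int] =
  (1 to n).foldLeft(List.empty[Int])((acc, i) => i :: acc).reverse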

Merge sort on a double linked list

Reading up on the good ways to sort a linked list (besides assigning an array and quick-sorting), it looks like mergesort is one of the better methods.
See: Merge Sort a Linked List
The current questions on this topic are non-specific as to whether the list is singly or doubly linked.
My question is:
Are there improved methods of merge-sorting that take advantage of doubly linked lists?
(Or is it just as good to use the same method as for a singly linked list and assign the previous link just to ensure the list remains valid?)
To the best of my knowledge, there are no special time-saving tricks that arise when doing mergesort on a doubly-linked list that don't also work on a singly-linked list.
Mergesort was originally developed for data stored on reels of magnetic tape that could only be read in a forwards direction, and most standard implementations of mergesort only do forwards passes over the elements. The two major operations required by mergesort are splitting the input in half (hard in both singly- and doubly-linked lists) and merging elements back together (basically the same in both cases). As a result, without doing something substantially more clever than a standard mergesort, I would not expect there to be any major speedups.
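For reference, a minimal sketch of mergesort on an immutable singly linked list in Scala. Note that both splitting and merging only ever walk forwards, which is why a doubly linked list adds nothing (merge here is not tail-recursive, so this is illustrative rather than production-ready):

def mergeSort(xs: List[Int]): List[Int] = {
  def merge(a: List[Int], b: List[Int]): List[Int] = (a, b) match {
    case (Nil, _) => b
    case (_, Nil) => a
    case (x :: at, y :: bt) =>
      if (x <= y) x :: merge(at, b) else y :: merge(a, bt)
  }
  val half = xs.length / 2
  if (half == 0) xs                        // 0 or 1 elements: already sorted
  else {
    val (left, right) = xs.splitAt(half)   // forward traversal only
    merge(mergeSort(left), mergeSort(right))
  }
}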

Most efficient way to implement stack and queue together?

What would be the most appropriate way to implement a stack and a queue together efficiently, in a single data structure? The number of elements is unbounded. Retrieval and insertion should both happen in constant time.
A doubly linked list has all the computational complexity attributes you desire, but poor cache locality.
A ring buffer (array) that allows for appending and removing at head and tail has the same complexity characteristics. It uses a dynamic array and requires reallocation once the number of elements grows beyond its capacity.
But, just as an array list / vector is generally faster in practice than a linked list for sequential access, in most cases it will be faster and more memory efficient than a doubly linked list implementation.
It is one of the possible implementations of the deque abstract data type; see e.g. the ArrayDeque<E> implementation in Java.
A doubly linked list can solve this problem with all operations taking constant time:
It allows push() or enqueue() by appending the element to the list in constant time.
It allows pop() by removing the last element in constant time.
It allows dequeue() by removing the first element, also in constant time.
A two-way linked list is going to be best for this. Each node in the list has two references: one to the item before it and one to the item after it. The main list object maintains a reference to the item at the front of the list and one at the back of the list.
Any time it inserts an item, the list:
creates a new node, giving it a reference to the previous first or last node in the list (depending on whether you're adding to the front or back).
connects the previous first or last node to point at the newly-created node.
updates its own reference to the first or last node, to point at the new node.
Removing an item from the front or back of the list effectively reverses this process.
Inserting to the front or back of the structure will always be an O(1) operation.
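A compact sketch of such a structure in Scala (the class and method names are illustrative; in recent Scala versions, scala.collection.mutable.ArrayDeque is the ring-buffer alternative mentioned above):

class Deque[A] {
  private class Node(val value: A, var prev: Node, var next: Node)
  private var head: Node = null
  private var tail: Node = null

  def pushFront(x: A): Unit = {          // O(1)
    val n = new Node(x, null, head)
    if (head != null) head.prev = n else tail = n
    head = n
  }

  def pushBack(x: A): Unit = {           // O(1)
    val n = new Node(x, tail, null)
    if (tail != null) tail.next = n else head = n
    tail = n
  }

  def popFront(): A = {                  // O(1); assumes non-empty
    val n = head
    head = n.next
    if (head != null) head.prev = null else tail = null
    n.value
  }

  def popBack(): A = {                   // O(1); assumes non-empty
    val n = tail
    tail = n.prev
    if (tail != null) tail.next = null else head = null
    n.value
  }
}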

Deletion in Hash Table

I was reading the book Introduction To Algorithms and I came across this:
We can delete an element in O(1) time if the lists are doubly linked. (Note that CHAINED-HASH-DELETE takes as input an element x and not its key k, so that we don't have to search for x first. If the hash table supports deletion, then its linked lists should be doubly linked so that we can delete an item quickly. If the lists were only singly linked, then to delete element x, we would first have to find x in the list T[h(x.key)] so that we could update the next attribute of x's predecessor. With singly linked lists, both deletion and searching would have the same asymptotic running times.)
How can we delete an element in O(1) time if the lists are doubly linked? First we will need to find the element, and then we can delete it in O(1). But to find it we need O(length of the list) time. Maybe deleting is faster in a doubly linked list (because we can search from both ends of the list at the same time, but that is only a constant-factor improvement); I don't see how it can be done in O(1) time.
Thank you in advance.
The answer is in the text:
Note that CHAINED-HASH-DELETE takes as input an element x and not its key k, so that we don’t have to search for x first.
You already have the item so you only have to remove it from the chain and do a delete.
To remove item x you need to get the previous and next nodes in the list and link them together before you delete x, so the list remains unbroken. In a doubly linked list you already have links to previous and next, so this is constant time. In a singly linked list you would only have a link to next, and so you would need to scan through the list to find the previous node.
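A sketch of that unlink operation in Scala (Node is an illustrative class; note that if x is the head of its chain, the hash bucket itself would also have to be updated to point at x.next):

class Node[A](var value: A) {
  var prev: Node[A] = null
  var next: Node[A] = null
}

def unlink[A](x: Node[A]): Unit = {      // O(1): no searching involved
  if (x.prev != null) x.prev.next = x.next
  if (x.next != null) x.next.prev = x.prev
  x.prev = null                          // clear x's own links
  x.next = null
}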
I think the confusion here is because of an implicit assumption in CLRS. In this book, objects are often treated as property bags where required properties can be added at runtime, much like in JavaScript but unlike the Java/C# world. So if you want to put x in a linked list, you don't necessarily need to create a Node object first and then add properties for Previous, Next and Value. Instead, you just add those properties to x itself. Many of us who have grown up with statically typed languages would be shocked at this design, but for algorithm design with pseudocode, it removes unnecessary clutter. I think the authors should have clarified this. In any case, without the ability to add Previous and Next properties to an object dynamically, you're right: it would not be O(1) even with doubly linked lists.

implement linked list using array - advantages & disadvantages

I know how to implement a linked list using an array. For example,
we define a struct as follows:
struct Node {
    int data;
    int link;
};
"data" stores the info and "link" stores the index in the array of next node.
Can anybody tell me what is the advantage and disadvantage of implementing a linked list using array compared to "ordinary" linked list? Any suggestion will be appreciated.
If you back a linked list with an array, you'll end up with the disadvantages of both. Consequently, this is probably not a very good way to implement it.
Some immediate disadvantages:
You'll have dead space in the array (entries which aren't currently used for items) taking up memory
You'll have to keep track of the free entries - after a few insertions and deletions, these free entries could be anywhere.
Using an array will impose an upper limit on the size of the linked list.
I suppose some advantages are:
If you're on a 64-bit system, your "pointers" (array indices) will take up less space than real pointers (though the extra space required by free entries probably outweighs this advantage)
You could serialise the array to disk and read it back in with an mmap() call easily. Though, you'd be better off using some sort of protocol buffer for portability.
You could make some guarantees about elements in the array being close to each other in memory.
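To make the tradeoffs concrete, here is a minimal Scala sketch of an array-backed singly linked list with a free list (the names and the fixed capacity are illustrative assumptions):

class ArrayLinkedList(capacity: Int) {   // assumes capacity > 0
  private val data = new Array[Int](capacity)
  private val link = new Array[Int](capacity)  // index of next node; -1 = end
  private var head = -1
  private var free = 0                         // head of the free list
  for (i <- 0 until capacity - 1) link(i) = i + 1
  link(capacity - 1) = -1                      // initially every slot is free

  def prepend(value: Int): Unit = {
    require(free != -1, "list is full")        // the array imposes a size cap
    val slot = free
    free = link(slot)                          // pop a slot off the free list
    data(slot) = value
    link(slot) = head                          // new node points at old head
    head = slot
  }

  def toList: List[Int] = {
    var out = List.empty[Int]
    var i = head
    while (i != -1) { out = data(i) :: out; i = link(i) }
    out.reverse                                // restore head-first order
  }
}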
Can anybody tell me what the advantages and disadvantages are of implementing a linked list using an array, compared to an "ordinary" linked list?
Linked lists have the following complexity:
cons x xs : O(1)
append n m : O(n)
index i xs : O(n)
If your representation uses a strict, contiguous array, you will have different complexity:
cons will require copying the old array: O(n)
append will require copying both arrays into a new contiguous space: O(n + m)
index can be implemented as array access: O(1)
That is, a linked list API implemented in terms of arrays will behave like an array.
You can mitigate this somewhat by using a linked list or tree of strict arrays, leading to ropes or finger trees or lazy sequences.
A stack can be implemented in two ways: using an array, or using a linked list.
Arrays have some disadvantages here, which is why most programmers implement stacks with a linked list. First, with a linked list you don't have to declare the stack size up front, and the amount of data the stack can hold isn't limited. Second, the pointers in a linked list are easy to declare and use; only one pointer is needed, called the top pointer.
A stack uses the LIFO (last in, first out) method. The linked-list implementation has some disadvantages of its own, but most programmers still implement stacks using linked lists.
With an array implementation you get sequential and faster access to the nodes of the list, since they sit in one contiguous block of memory; on the other hand, if you implement the linked list using pointers, the nodes end up at arbitrary locations in memory.
An array implementation is helpful when you are dealing with a fixed number of elements, because resizing an array is expensive as far as performance is concerned, and if you are required to insert or delete nodes in the middle of the list you have to shift every node after them.
Contrary to this, you should use the pointer implementation when you don't know the number of nodes you will need, as such a list can grow and shrink efficiently and you don't need to shift any nodes; it can all be done by simply updating pointers.
