Efficient implementation of immutable (double) LinkedList - data-structures

Having read the question Immutable or not immutable? and the answers to my previous questions on immutability, I am still a bit puzzled about how to efficiently implement a simple LinkedList that is immutable. In terms of an array, that seems easy: copy the array and return a new structure based on that copy.
Suppose we have a general class of Node:
class Node {
    private Object value;
    private Node next;
}
And a class LinkedList based on the above, allowing the user to add, remove, etc. Now, how would we ensure immutability? Should we recursively copy all the references to the list when we insert an element?
I am also curious about the answers in Immutable or not immutable? that mention a certain optimization leading to log(n) time and space with the help of a binary tree. Also, I read somewhere that adding an element to the front is O(1) as well. This puzzles me greatly: if we don't provide a copy of the references, then in reality we are modifying the same data structure from two different sources, which breaks immutability...
Would any of your answers also work on doubly-linked lists? I look forward to any replies/pointers to other questions/solutions. Thanks in advance for your help.

Suppose we have a general class of Node and a class LinkedList based on the above, allowing the user to add, remove, etc. Now, how would we ensure immutability?
You ensure immutability by making every field of the object readonly, and ensuring that every object referred to by one of those readonly fields is also an immutable object. If the fields are all readonly and only refer to other immutable data, then clearly the object will be immutable!
Should we recursively copy all the references to the list when we insert an element?
You could. The distinction you are getting at here is the difference between immutable and persistent. An immutable data structure cannot be changed. A persistent data structure takes advantage of the fact that a data structure is immutable in order to re-use its parts.
A persistent immutable linked list is particularly easy:
abstract class ImmutableList
{
    public static readonly ImmutableList Empty = new EmptyList();
    private ImmutableList() {}
    public abstract int Head { get; }
    public abstract ImmutableList Tail { get; }
    public abstract bool IsEmpty { get; }
    public abstract ImmutableList Add(int head);

    private sealed class EmptyList : ImmutableList
    {
        public override int Head { get { throw new Exception(); } }
        public override ImmutableList Tail { get { throw new Exception(); } }
        public override bool IsEmpty { get { return true; } }
        public override ImmutableList Add(int head)
        {
            return new List(head, this);
        }
    }

    private sealed class List : ImmutableList
    {
        private readonly int head;
        private readonly ImmutableList tail;
        public List(int head, ImmutableList tail)
        {
            this.head = head;
            this.tail = tail;
        }
        public override int Head { get { return head; } }
        public override ImmutableList Tail { get { return tail; } }
        public override bool IsEmpty { get { return false; } }
        public override ImmutableList Add(int head)
        {
            return new List(head, this);
        }
    }
}
...
ImmutableList list1 = ImmutableList.Empty;
ImmutableList list2 = list1.Add(100);
ImmutableList list3 = list2.Add(400);
And there you go. Of course you would want to add better exception handling and more methods, like IEnumerable<int> methods. But there is a persistent immutable list. Every time you make a new list, you re-use the contents of an existing immutable list; list3 re-uses the contents of list2, which it can do safely because list2 is never going to change.
Would any of your answers also work on doubly-linked lists?
You can of course easily make a doubly-linked list that does a full copy of the entire data structure every time, but that would be dumb; you might as well just use an array and copy the entire array.
Making a persistent doubly-linked list is quite difficult but there are ways to do it. What I would do is approach the problem from the other direction. Rather than saying "can I make a persistent doubly-linked list?" ask yourself "what are the properties of a doubly-linked list that I find attractive?" List those properties and then see if you can come up with a persistent data structure that has those properties.
For example, if the property you like is that doubly-linked lists can be cheaply extended from either end, cheaply broken in half into two lists, and two lists can be cheaply concatenated together, then the persistent structure you want is an immutable catenable deque, not a doubly-linked list. I give an example of an immutable non-catenable deque here:
http://blogs.msdn.com/b/ericlippert/archive/2008/02/12/immutability-in-c-part-eleven-a-working-double-ended-queue.aspx
Extending it to be a catenable deque is left as an exercise; the paper I link to on finger trees is a good one to read.
UPDATE:
According to the above, we need to copy the prefix up to the insertion point. By the logic of immutability, if we delete anything from the prefix, we get a new list, as well as in the suffix... Why copy only the prefix then, and not the suffix?
Well consider an example. What if we have the list (10, 20, 30, 40), and we want to insert 25 at position 2? So we want (10, 20, 25, 30, 40).
What parts can we reuse? The tails we have in hand are (20, 30, 40), (30, 40) and (40). Clearly we can re-use (30, 40).
Drawing a diagram might help. We have:
10 ----> 20 ----> 30 -----> 40 -----> Empty
and we want
10 ----> 20 ----> 25 -----> 30 -----> 40 -----> Empty
so let's make
10 ----> 20 ------------------> 30 -----> 40 -----> Empty
                               /
10 ----> 20 ----> 25 ---------/
We can re-use (30, 40) because that part is in common to both lists.
UPDATE:
Would it be possible to provide the code for random insertion and deletion as well?
Here's a recursive solution:
ImmutableList InsertAt(int value, int position)
{
    if (position < 0)
        throw new Exception();
    else if (position == 0)
        return this.Add(value);
    else
        return Tail.InsertAt(value, position - 1).Add(Head);
}
Do you see why this works?
Now as an exercise, write a recursive DeleteAt.
Now, as an exercise, write a non-recursive InsertAt and DeleteAt. Remember, you have an immutable linked list at your disposal, so you can use one in your iterative solution!
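For reference, here is one possible shape of the iterative InsertAt, sketched in Java against a hypothetical Java port of the list above (EMPTY, head(), tail(), isEmpty() and add() mirroring the C# members). The trick is that the copied prefix is itself held on an immutable list, which naturally comes back off in reverse order:
// Sketch only: assumes a Java port of the ImmutableList above.
ImmutableList insertAt(int value, int position) {
    if (position < 0)
        throw new IllegalArgumentException("position");
    // Walk to the insertion point, stacking the visited prefix
    // on another immutable list (it ends up reversed).
    ImmutableList prefix = ImmutableList.EMPTY;
    ImmutableList current = this;
    for (int i = 0; i < position; i++) {
        prefix = prefix.add(current.head());
        current = current.tail();
    }
    // Add the new value, then push the prefix back on, restoring its order.
    ImmutableList result = current.add(value);
    while (!prefix.isEmpty()) {
        result = result.add(prefix.head());
        prefix = prefix.tail();
    }
    return result;
}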

Should we recursively copy all the references to the list when we insert an element?
You should recursively copy the prefix of the list up until the insertion point, yes.
That means that insertion into an immutable linked list is O(n) (as is inserting, rather than overwriting, an element in an array).
For this reason insertion is usually frowned upon (along with appending and concatenation).
The usual operation on immutable linked lists is "cons", i.e. appending an element at the start, which is O(1).
You can see the complexity clearly in e.g. a Haskell implementation. Given a linked list defined as a recursive type:
data List a = Empty | Node a (List a)
we can define "cons" (inserting an element at the front) directly as:
cons a xs = Node a xs
Clearly an O(1) operation. Insertion, on the other hand, must be defined recursively -- by finding the insertion point, breaking the list into a prefix (which is copied) and a suffix, and attaching the copied prefix to the new node, which in turn holds a reference to the (immutable) tail.
The important thing to remember about linked lists is:
linear access
For immutable lists this means:
copying the prefix of a list
sharing the tail.
If you are frequently inserting new elements, a log-based structure, such as a tree, is preferred.

There is a way to emulate "mutation": using immutable maps.
For a linked list of Strings (in Scala style pseudocode):
case class ListItem(s: String, id: UUID, nextID: UUID)
then the ListItems can be stored in a map where the key is UUID:
type MyList = Map[UUID, ListItem]
If I want to insert a new ListItem into val list: MyList:
def insertAfter(l: MyList, e: ListItem): MyList = {
  val beforeE = l.getElementBefore(e)             // the element that should precede e
  val eToInsert = e.copy(nextID = beforeE.nextID) // e points at beforeE's old successor
  val beforeE_new = beforeE.copy(nextID = e.id)   // beforeE now points at e
  l.updated(beforeE_new.id, beforeE_new) + (eToInsert.id -> eToInsert)
}
where add (+), updated, and get take effectively constant time on an immutable Map: http://docs.scala-lang.org/overviews/collections/performance-characteristics
Implementing a doubly-linked list goes similarly.
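As a sketch of that doubly-linked variant, here is a hypothetical Java version (the names are made up, and a plain HashMap is copied for clarity; a persistent map would share structure instead of copying). Inserting e after a given node touches exactly three map entries:
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

final class ListItem {
    final String s;
    final UUID id;
    final UUID prevID;
    final UUID nextID;
    ListItem(String s, UUID id, UUID prevID, UUID nextID) {
        this.s = s; this.id = id; this.prevID = prevID; this.nextID = nextID;
    }
    ListItem withNext(UUID n) { return new ListItem(s, id, prevID, n); }
    ListItem withPrev(UUID p) { return new ListItem(s, id, p, nextID); }
}

class DoublyLinked {
    static Map<UUID, ListItem> insertAfter(Map<UUID, ListItem> list, ListItem before, ListItem e) {
        Map<UUID, ListItem> result = new HashMap<>(list); // a persistent map would share structure here
        ListItem after = list.get(before.nextID);         // may be null at the end of the list
        result.put(before.id, before.withNext(e.id));
        result.put(e.id, new ListItem(e.s, e.id, before.id, before.nextID));
        if (after != null) result.put(after.id, after.withPrev(e.id));
        return result;
    }
}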

Related

Java ArrayDeque instantiation from existing Collection vs Iteration

I am trying to evaluate two ways of creating an ArrayDeque from an existing Collection. I see two options:
Using ArrayDeque's constructor, which accepts an existing Collection
Iterating over the Collection and calling deque.offer(element) for each element
From my benchmarks I see the first option running faster than the second. Is there any reason for the first option to be better than the second?
Regarding the first option:
public ArrayDeque(Collection<? extends E> c) {
    allocateElements(c.size());
    addAll(c);
}
It allocates all the elements in one call (that is, it allocates an array with enough elements) and then just adds them one by one (without any additional allocations).
But when you use option 2, several allocations happen during this process when ArrayDeque grows (doubling itself); plus on each reallocation the elements are copied to the new array.
Just one allocation (for option 1) versus several allocations plus copying (for option 2) gives the difference.
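For illustration, the two approaches being compared look roughly like this (a hypothetical setup; the variable names are made up):
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;

List<Integer> source = Arrays.asList(1, 2, 3, 4, 5);

// Option 1: one allocation sized to source.size(), then addAll
ArrayDeque<Integer> d1 = new ArrayDeque<>(source);

// Option 2: starts at the default capacity and doubles as it grows
ArrayDeque<Integer> d2 = new ArrayDeque<>();
for (Integer e : source) {
    d2.offer(e);
}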
For reference:
public boolean offerLast(E e) {
    addLast(e);
    return true;
}

public void addLast(E e) {
    if (e == null)
        throw new NullPointerException();
    elements[tail] = e;
    if ( (tail = (tail + 1) & (elements.length - 1)) == head)
        doubleCapacity();
}
All code is from Java 8.
Creating a new ArrayDeque with the constructor and iterating through the Collection to add each element one by one are similar, because the constructor of this class itself iterates through the whole collection.

Collections: navigate and update (no new collections), how to do it with Java 8

I have an aList and a bList; both have one field in common, which is my reference for matching the two lists.
Once the reference fields match, I want to update the bList objects with the aList objects.
The conventional approach is below. How can I achieve the same in Java 8?
// How to avoid the two nested iterations below (along with compare* and update*)
// using Java 8?
// A stream filter would return a new Collection rather than update the same one (bList)
for (A a : aList)
{
    for (B b : bList)
    {
        // compare*
        if (a.getStrObj().equalsIgnoreCase(b.getStrObj()))
        {
            // update*
            // assume aObjs is initialized
            b.getAObjs().add(a);
        }
    }
}
// Reference for the object declarations
List<A> aList;
class A {
    String strObj;
    public String getStrObj()
    { return strObj; }
}

List<B> bList;
class B {
    String strObj;
    List<A> aObjs;
    public String getStrObj()
    { return strObj; }
    public void setAObjs(List<A> aObjs)
    { this.aObjs = aObjs; }
    public List<A> getAObjs()
    { return this.aObjs; }
}
Your nested loop is not the best way to do it, even before Java 8 (unless you can prove that the lists will always be rather small). You should use a temporary Map with fast lookup for one of the lists to avoid performing m×n operations (string comparisons).
One way to do that with Java 8 is
Map<String, List<A>> m = aList.stream().collect(Collectors.groupingBy(A::getStrObj));
bList.forEach(b -> b.getAObjs()
    .addAll(m.getOrDefault(b.getStrObj(), Collections.emptyList())));
Here we are performing m+n operations rather than m×n operations which scales much better with growing list sizes.
You can create an equivalent implementation with pre Java 8 constructs, i.e. two independent loops rather than two nested loops, and the resulting code isn't necessarily worse than the above Java 8 code (see the sketch at the end of this answer).
Still, the above code may introduce you to some of the most important new features (a method reference, a lambda expression, a stream collect operation, and one of the new default methods of the Map interface), so you know where to start next time you solve a similar problem.
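For comparison, here is a sketch of that pre-Java 8 equivalent, using the A and B classes from the question. (One caveat that applies to both versions: the map lookup matches keys case-sensitively, unlike your equalsIgnoreCase; normalize the keys, e.g. with toLowerCase(), if that matters.)
Map<String, List<A>> m = new HashMap<>();
for (A a : aList) {
    List<A> group = m.get(a.getStrObj());
    if (group == null) {
        group = new ArrayList<>();
        m.put(a.getStrObj(), group);
    }
    group.add(a);
}
for (B b : bList) {
    List<A> matches = m.get(b.getStrObj());
    if (matches != null) {
        b.getAObjs().addAll(matches);
    }
}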

Indirect Enum or Class, Which One Should I Use for Building Basic Data Structures

When I tried to practice some basic data structures such as Linked / Doubly Linked / Recycling Linked / Recycling Doubly Linked List, AVL Tree, Red-Black Tree, B-Tree and Treap by implementing them in Swift 2, I decided to do so by taking advantage of Swift 2's new feature, indirect enum, because an enum makes an empty node and a filled node more semantic than a class does.
But I soon found that for non-recycling linked lists, returning the inserted node after inserting an element makes no sense, because the returned value is a value type, not a reference type. That is to say, you cannot accelerate the next insertion by writing information directly to the returned value, because it is a copy of the inserted node, not a reference to it.
And what's worse, mutating an indirect-enum-based node means rewriting the whole associated value, which introduces unnecessary resource consumption. The associated value in each enum case is in essence a tuple, a contiguous chunk of memory much like a struct, but without per-property accessors that would allow writing just a small piece of data.
So which one should I use for building such basic data structures? Indirect enum or class?
Well, it doesn't matter whether it's Swift 1 or Swift 2, because at the moment enums and structures are value types while classes have reference semantics. Since you want your data structures to behave like references, as you noted yourself, passing them around by value is no good: you will have to use a class in order for your code to do what you want it to do. Here is an example of a linked list using a class:
class LLNode<T>
{
    var key: T?
    var next: LLNode?
    var previous: LLNode?
}

public class LinkedList<T: Equatable>
{
    // the head node of the list
    private var head: LLNode<T> = LLNode<T>()

    // print all keys
    func printAllKeys()
    {
        var current: LLNode! = head // start at the head and follow the links
        while current != nil
        {
            print("link item is: \(current.key)")
            current = current.next
        }
    }

    // append a new item to the linked list
    func addLink(key: T)
    {
        // establish the head node
        if head.key == nil
        {
            head.key = key
            return
        }
        // establish the iteration variables
        var current: LLNode? = head
        while current != nil
        {
            if current?.next == nil
            {
                let childToUse = LLNode<T>()
                childToUse.key = key
                childToUse.previous = current
                current!.next = childToUse
                break
            }
            current = current?.next
        } // end while
    } // end function
}
For more examples of data structures in Swift, please visit:
data structures swift
Conclusion: use a class if you want reference semantics; otherwise use an enum or struct.

Can we sort an IList partially?

IList<A_Desc,A_premium,B_Desc,B_Premium>
Can I sort two columns, A_Desc and A_premium, based on A_Desc,
and let B_Desc and B_Premium remain in the same order as before sorting?
First off, a list can only be of one type, and only has one "column" of data, so you actually want two lists and a data type that holds "desc" and "premium". "desc" sounds like a String to me; I don't know what Premium is, but I'll pretend it's a double for lack of better ideas. I don't know what this data is supposed to represent, so to me, it's just some thingie.
public class Thingie {
    public String desc;
    public double premium;
}
That is, of course, a terrible way to define the class; I should instead have desc and premium be private, with Desc and Premium as public properties with get and set accessors. But this is the fastest way for me to get the point across.
It's more canonical to make Thingie implement IComparable, and compare itself to other Thingie objects. But I'm editing an answer I wrote before I knew you needed to write a custom type, and had the freedom to just make it implement IComparable. So here's the IComparer approach, which lets you sort objects that don't sort themselves by telling C# how to sort them.
Implement an IComparer that operates over your custom type.
public class ThingieSorter : IComparer<Thingie>
{
    public int Compare(Thingie t1, Thingie t2)
    {
        int r = t1.desc.CompareTo(t2.desc);
        if (r != 0) { return r; }
        return t1.premium.CompareTo(t2.premium);
    }
}
C# doesn't require IList to implement Sort; it might be inefficient if it's a LinkedList. So let's make a new list, based on arrays, which does sort efficiently, and sort it:
public List<Thingie> sortedOf(IList<Thingie> list)
{
    List<Thingie> ret = new List<Thingie>(list);
    ret.Sort(new ThingieSorter());
    return ret;
}
List<Thingie> implements the interface IList<Thingie>, so replacing your original list with this one shouldn't break anything, as long as you have nothing holding onto the original list and magically expecting it to be sorted. If that's happening, refactor your code so it doesn't grab the reference until after your list has been sorted, since it can't be sorted in place.

Sorted hash table (map, dictionary) data structure design

Here's a description of the data structure:
It operates like a regular map with get, put, and remove methods, but has a sort method that can be called to sort the map. However, the map remembers its sorted structure, so subsequent calls to sort can be much quicker (if the structure doesn't change too much between calls to sort).
For example:
I call the put method 1,000,000 times.
I call the sort method.
I call the put method 100 more times.
I call the sort method.
The second call to the sort method should be a much quicker operation, as the map's structure hasn't changed much. Note that the map doesn't have to maintain sorted order between calls to sort.
I understand that it might not be possible, but I'm hoping for O(1) get, put, and remove operations. Something like TreeMap provides guaranteed O(log(n)) time cost for these operations, but always maintains a sorted order (no sort method).
So what's the design of this data structure?
Edit 1 - returning the top-K entries
Although I'd enjoy hearing the answer to the general case above, my use case has gotten more specific: I don't need the whole thing sorted, just the top K elements.
Data structure for efficiently returning the top-K entries of a hash table (map, dictionary)
Thanks!
For "O(1) get, put, and remove operations" you essentially need O(1) lookup, which implies a hash function (as you know), but the requirements of a good hash function often break the requirement to be easily sorted. (If you had a hash table where adjacent values mapped to the same bucket, it would degenerate to O(N) on lots of common data, which is a worse case you typically want a hash function to avoid.)
I can think of how to get you 90% of the way there. Set up a hashtable alongside a parallel index that is sorted. The index has a clean part (ordered) and a dirty part (unordered). The index would map keys to the values (or references to the values stored in the hashtable - whichever suits you in terms of performance or memory use). When you add to the hashtable, the new entry is pushed onto the back of the dirty list. When you remove from the hashtable, the entry is nulled/removed from the clean and dirty parts of the index. You can sort the index, which sorts the dirty entries only, then merges them into the already sorted 'clean' part of the index. And obviously you can iterate over the index.
As far as I can see, this gives you the O(1) everywhere except on the remove operation and is still fairly simple to implement with standard containers (at least as provided by C++, Java, or Python). It also gives you the "second sort is cheaper" condition by only needing to sort the dirty index entries and then letting you do an O(N) merge. The cost of all this is obviously extra memory for the index and extra indirection when using it.
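A minimal Java sketch of that index idea (hypothetical and simplified: only keys are indexed, and remove is omitted since it is the non-O(1) part):
import java.util.*;

public class SortableMap<K extends Comparable<K>, V> {
    private final Map<K, V> table = new HashMap<>();
    private List<K> clean = new ArrayList<>();        // sorted keys
    private final List<K> dirty = new ArrayList<>();  // unsorted, recently added keys

    public V get(K key) { return table.get(key); }    // O(1)

    public V put(K key, V value) {
        if (!table.containsKey(key)) dirty.add(key);  // new key goes on the dirty tail
        return table.put(key, value);
    }

    public void sort() {
        Collections.sort(dirty);                      // only the new keys get sorted
        List<K> merged = new ArrayList<>(clean.size() + dirty.size());
        int i = 0, j = 0;                             // O(N) merge of two sorted runs
        while (i < clean.size() && j < dirty.size())
            merged.add(clean.get(i).compareTo(dirty.get(j)) <= 0
                    ? clean.get(i++) : dirty.get(j++));
        while (i < clean.size()) merged.add(clean.get(i++));
        while (j < dirty.size()) merged.add(dirty.get(j++));
        clean = merged;
        dirty.clear();
    }

    public List<K> sortedKeys() {                     // valid after sort()
        return Collections.unmodifiableList(clean);
    }
}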
Why exactly do you need a sort() function?
What you perhaps want and need is a red-black tree.
http://en.wikipedia.org/wiki/Red-black_tree
These trees automatically sort your input by a comparator you give them. They are complex, but have excellent O(log n) characteristics. Couple the tree entries as keys with a hash map as dictionary and you get your data structure.
In Java it is implemented as TreeMap, an instance of SortedMap.
What you're looking at is a hashtable with pointers in the entries to the next entry in sorted order. It's a lot like the LinkedHashMap in Java, except that the links track a sort order rather than the insertion order. You can actually implement this entirely by wrapping a LinkedHashMap and having the implementation of sort transfer the entries from the LinkedHashMap into a TreeMap and then back into a LinkedHashMap.
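That wrapping approach is only a few lines. A hedged sketch of what sort could look like under that design, reusing the innerMap field of the implementation below:
public void sort() {
    // TreeMap's copy constructor arranges the entries in key order; copying them
    // back into a LinkedHashMap freezes that order as the new iteration order.
    innerMap = new LinkedHashMap<K,V>(new TreeMap<K,V>(innerMap));
}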
Here's an implementation that sorts the entries in an array list rather than transferring to a tree map. I think the sort algorithm used by Collections.sort will do a good job of merging the new entries into the already sorted portion.
public class SortaSortedMap<K extends Comparable<K>, V> implements Map<K,V> {
    private LinkedHashMap<K,V> innerMap;

    public SortaSortedMap() {
        this.innerMap = new LinkedHashMap<K,V>();
    }

    public SortaSortedMap(Map<K,V> map) {
        this.innerMap = new LinkedHashMap<K,V>(map);
    }

    public Collection<V> values() { return innerMap.values(); }
    public int size() { return innerMap.size(); }
    public V remove(Object key) { return innerMap.remove(key); }
    public V put(K key, V value) { return innerMap.put(key, value); }
    public Set<K> keySet() { return innerMap.keySet(); }
    public boolean isEmpty() { return innerMap.isEmpty(); }
    public Set<Entry<K, V>> entrySet() { return innerMap.entrySet(); }
    public boolean containsKey(Object key) { return innerMap.containsKey(key); }
    public V get(Object key) { return innerMap.get(key); }
    public boolean containsValue(Object value) { return innerMap.containsValue(value); }
    public void clear() { innerMap.clear(); }
    public void putAll(Map<? extends K, ? extends V> m) { innerMap.putAll(m); }

    public void sort() {
        List<Map.Entry<K,V>> entries = new ArrayList<Map.Entry<K,V>>(innerMap.entrySet());
        Collections.sort(entries, new KeyComparator());
        LinkedHashMap<K,V> newMap = new LinkedHashMap<K,V>();
        for (Map.Entry<K,V> e : entries) {
            newMap.put(e.getKey(), e.getValue());
        }
        innerMap = newMap;
    }

    private class KeyComparator implements Comparator<Map.Entry<K,V>> {
        public int compare(Entry<K, V> o1, Entry<K, V> o2) {
            return o1.getKey().compareTo(o2.getKey());
        }
    }
}
I don't know if there's a name for it, but you could store the current index of each item in the hash.
That is, you have a HashMap<Object, Pair<Integer, Object>>
and a List<Object> objects.
When you put, add to the tail or head of the list and insert into the hashmap with your data and the index of insertion. This is O(1).
When you get, pull from the hashmap and ignore the index. This is O(1).
When you remove, you pull from the map. Take the index and remove from the list as well; to keep this O(1) with an array-backed list, swap the last element into the removed slot and update its stored index (order doesn't need to be preserved between sorts).
When you sort, just sort the list. Either update the indexes in the map during the sort, or update them after the sort is complete. This does not affect the O(n log n) sort, as it's a linear step: O(n log n + n) == O(n log n).
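A hypothetical Java sketch of that design (the names are made up); remove stays O(1) with an array-backed list via the swap-with-last trick, and sort re-indexes in one linear pass:
import java.util.*;

class IndexedStore<K extends Comparable<K>, V> {
    private static final class Entry<V> {
        int index;
        final V value;
        Entry(int index, V value) { this.index = index; this.value = value; }
    }
    private final Map<K, Entry<V>> map = new HashMap<>();
    private final List<K> keys = new ArrayList<>();

    public V get(K key) {                                // O(1)
        Entry<V> e = map.get(key);
        return e == null ? null : e.value;
    }

    public void put(K key, V value) {                    // O(1) amortized
        Entry<V> old = map.get(key);
        int index = (old == null) ? keys.size() : old.index;
        map.put(key, new Entry<>(index, value));
        if (old == null) keys.add(key);
    }

    public V remove(K key) {                             // O(1): swap-with-last
        Entry<V> e = map.remove(key);
        if (e == null) return null;
        K last = keys.remove(keys.size() - 1);
        if (e.index < keys.size()) {                     // move the old tail into the hole
            keys.set(e.index, last);
            map.get(last).index = e.index;
        }
        return e.value;
    }

    public void sort() {                                 // O(n log n), then a linear re-index
        Collections.sort(keys);
        for (int i = 0; i < keys.size(); i++)
            map.get(keys.get(i)).index = i;
    }
}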
Ordered Dictionary
Recent versions of Python (2.7, 3.1) have "ordered dictionaries" which sound like what you're describing.
The official Python "ordered dictionary" implementation is inspired by previous 3rd-party implementations, as described in the PEP 372.
References:
collections.OrderedDict documentation for Python 2.7
collections.OrderedDict documentation for Python 3.1
PEP 372
ActiveState Ordered Dictionary recipe for Python ≥ 2.4
I'm not aware of a data structure with that exact behavior, at least not in the Java Collections framework (or from a nonlinear data structures class). Perhaps you can implement it, and it will henceforth be known as the RudigerMap.
