Removing duplicates from an unsorted linked list - data-structures

How do I remove duplicates from an unsorted linked list without using a temporary buffer?
I can't think of a method.

There is the O(n^2) method, of course, which works without buffers: for each node, scan the remainder of the list and unlink any later node with the same value. I could elaborate if you like.
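For illustration, here's a rough sketch of that quadratic approach on a hypothetical mutable Node class (not from the question):

```scala
// A minimal sketch: for each node, scan the rest of the list and
// unlink later nodes carrying the same value. No auxiliary buffer.
final class Node(var value: Int, var next: Node = null)

def removeDuplicates(head: Node): Unit = {
  var current = head
  while (current != null) {
    var runner = current                   // scans ahead of `current`
    while (runner.next != null) {
      if (runner.next.value == current.value)
        runner.next = runner.next.next     // unlink the duplicate
      else
        runner = runner.next
    }
    current = current.next
  }
}
```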

Related

Iterating over classes in a disjoint set data structure

I've implemented a disjoint set data structure for my program and I realized I need to iterate over all equivalence classes.
Searching the web, I didn't find any useful information on the best way to implement that or how it influences complexity. I'm quite surprised since it seems like something that would be needed quite often.
Is there a standard way of doing this? I'm thinking about using a linked list (I use C so I plan to store some pointers in the top element of each equivalence class) and updating it on each union operation. Is there a better way?
You can store pointers to the top elements in a hash-based set or in any balanced binary search tree. You only need to add and delete elements, and those operations run in expected O(1) in a hash set and O(log N) in a balanced BST. In a plain linked list they take O(N), since you first have to find the element.
Your proposal seems very reasonable. If you thread a doubly-linked list through the representatives, you can splice an element out of the representatives list in O(1) time and then walk that list whenever you need to enumerate the representatives.
@ardenit has mentioned that you can also use an external hash table or BST to store the representatives. That's certainly simpler to code up, though I suspect it won't be as fast as just threading a linked list through the items.
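If you go the hash-set route, a minimal sketch might look like this (the DisjointSet class, its field names, and the integer universe 0..n-1 are all invented for illustration):

```scala
import scala.collection.mutable

// A standard union-find plus a mutable.HashSet of the current class
// representatives, updated on each union.
final class DisjointSet(n: Int) {
  private val parent = Array.tabulate(n)(identity)
  private val rank = Array.fill(n)(0)
  val representatives: mutable.Set[Int] = mutable.HashSet.from(0 until n)

  def find(x: Int): Int = {
    if (parent(x) != x) parent(x) = find(parent(x)) // path compression
    parent(x)
  }

  def union(a: Int, b: Int): Unit = {
    val ra = find(a)
    val rb = find(b)
    if (ra == rb) return
    val (root, child) = if (rank(ra) >= rank(rb)) (ra, rb) else (rb, ra)
    parent(child) = root
    if (rank(ra) == rank(rb)) rank(root) += 1
    representatives -= child // the absorbed root no longer heads a class
  }
}
```

Enumerating the equivalence classes is then just an iteration over representatives.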

Search multiple lists by indirect ids

There are x (x=3 in this example) unsorted lists with identifiers:
list1 list2 list3
array1[id3], array2[id4,id4a], array3[id1a,id1b]
array1[id4], array2[id3,id3a], array3[id4a,id4b]
array1[id1], array2[id2,id2a], array3[id3a,id3b]
array1[id2], array2[id1,id1a], array3[id2a,id2b]
...
array1[idn], array2[idn,idna], array3[idn,idnb]
I want to make pairs: {id1,id1b}, {id2,id2b}, and so on. Sadly, I cannot do it directly. Here is how it works: take id3 from list1, then find id3 in list2, then take id3a from list2, then find id3a in list3, and finally we get id3b.
It could be done with nested loops, but what if there were more lists? That seems inefficient. Is there a better solution?
The only better solutions algorithmically would require a different representation. For example, if the lists can be sorted, then searches to get from key1->key2->key3->value could all be binary searches. That's probably the easiest and least intrusive solution to implement if you can just slightly change the data representation to be sorted.
If you use a different data structure outright like multiple hash tables, then each search could be constant-time (assuming no collisions). You could even consolidate this all to a single hash table with a 3-part key that maps to a single hash index storing the value.
You could also use BSTs, possibly tries, etc., but all of these algorithmic improvements will hinge on a different data representation.
Any search through an unsorted list is generally going to be O(N), since we cannot make any assumptions and may have to scan the entire list. With three lists and three nested searches, we end up with a cubic O(N^3) algorithm, which doesn't scale well.
Without changing the data representation, I think linear-time searches for each unsorted list is as good as you can get (and yes, that could be quite horrible), and you're probably looking at micro-optimizations like multithreading or SIMD.
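For concreteness, here's a minimal sketch of the hash-table idea (the hop1/hop2 names and the sample data are invented, mirroring the example above):

```scala
// Each list becomes a Map from key to the key it points at, built once
// in O(N); each chained lookup is then O(1) per hop instead of a scan
// over an unsorted list.
val list2 = Seq("id3" -> "id3a", "id4" -> "id4a", "id1" -> "id1a", "id2" -> "id2a")
val list3 = Seq("id3a" -> "id3b", "id4a" -> "id4b", "id1a" -> "id1b", "id2a" -> "id2b")

val hop1 = list2.toMap // id  -> idA
val hop2 = list3.toMap // idA -> idB

// Follow id -> idA -> idB, e.g. resolve("id3") == Some("id3b").
def resolve(id: String): Option[String] =
  hop1.get(id).flatMap(hop2.get)
```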
I forgot to mention that after each iteration I'll get a new set of lists.
For example, in the first iteration:
array1[id1], array2[id2,id2a], array3[id3a,id3b]
array1[id2], array2[id1,id1a], array3[id2a,id2b]
In the second one:
array1[id3], array2[id4,id4a], array3[id1a,id1b]
array1[id4], array2[id3,id3a], array3[id4a,id4b]
etc. So if I link the keys together in one iteration, I will have to do the same in the next one with the new set. It looks like each auxiliary structure would have to be rebuilt. Is it worthwhile then? No doubt it depends, but more or less?

Merge sort on a double linked list

Reading up on good ways to sort a linked list (besides copying it into an array and quicksorting), it looks like mergesort is one of the better methods.
See: Merge Sort a Linked List
The current questions on this topic are non-specific as to whether the list is singly or doubly linked.
My question is:
Are there improved methods of merge-sorting that take advantage of doubly linked lists?
(Or is it just as good to use the same method as for a singly linked list and reassign the previous links afterwards, just to ensure the list remains valid?)
To the best of my knowledge, there are no special time-saving tricks that arise when doing mergesort on a doubly-linked list that don't also work on a singly-linked list.
Mergesort was originally developed for data stored on reels of magnetic tape that could only be read in a forwards direction, and most standard implementations of mergesort only do forwards passes over the elements. The two major operations required by mergesort are splitting the input in half (a linear scan to find the middle in both singly- and doubly-linked lists) and merging elements back together (basically the same in both cases). As a result, without doing something substantially more clever than a standard mergesort, I would not expect there to be any major speedups.
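To see why the prev pointers don't help, here's a sketch of the split step on a minimal singly linked node type (the Node class and names are invented): it's a single forwards pass either way.

```scala
final class Node(var value: Int, var next: Node = null)

// Assumes head != null. A prev pointer buys nothing here: finding the
// middle is one forwards pass with slow/fast pointers regardless.
def splitInHalf(head: Node): Node = {
  var slow = head
  var fast = head.next
  while (fast != null && fast.next != null) {
    slow = slow.next         // one step
    fast = fast.next.next    // two steps
  }
  val secondHalf = slow.next // head of the back half
  slow.next = null           // terminate the front half
  secondHalf
}
```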

Using mergesort for list

Mergesort can be done in-place for a list, unlike for an array.
However, I have not yet found a reference that explains how this is achieved.
Any pointers are appreciated.
It actually is possible, though not straightforward, to implement an in-place merge sort for arrays. With linked lists the problem becomes quite simple. Each node in a linked list just has a value and a pointer to the next node, so it is easy to break a linked list in half: traverse to the middle node, take its successor as the head of your second list, and then set the middle node's successor to null.
The merge step works just as you would expect: don't allocate any new nodes, just relink the nodes from your two lists.
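Putting both steps together, here's a rough sketch on a hypothetical mutable node type (invented names, not from any particular reference); apart from a throwaway dummy head per merge, no nodes are allocated, only next pointers are rewritten:

```scala
final class Node(var value: Int, var next: Node = null)

// Merge two sorted lists by relinking their existing nodes.
def merge(first: Node, second: Node): Node = {
  val dummy = new Node(0) // throwaway head to simplify relinking
  var tail = dummy
  var a = first
  var b = second
  while (a != null && b != null) {
    if (a.value <= b.value) { tail.next = a; a = a.next }
    else { tail.next = b; b = b.next }
    tail = tail.next
  }
  tail.next = if (a != null) a else b
  dummy.next
}

def mergeSort(head: Node): Node = {
  if (head == null || head.next == null) return head

  // Split: walk to the middle with slow/fast pointers, then cut the link.
  var slow = head
  var fast = head.next
  while (fast != null && fast.next != null) {
    slow = slow.next
    fast = fast.next.next
  }
  val second = slow.next
  slow.next = null

  merge(mergeSort(head), mergeSort(second))
}
```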

Why is appending to a list bad?

I've recently started learning scala, and I've come across the :: (cons) function, which prepends to a list.
The book "Programming in Scala" states that there is no append function because appending to a list has O(n) performance whereas prepending is O(1).
Something just strikes me as wrong about that statement.
Isn't performance dependent on implementation? Isn't it possible to simply implement the list with both forward and backward links and store the first and last element in the container?
My second question, I suppose, is: what am I supposed to do when I have a list, say 1,2,3, and I want to add 4 to the end of it?
The key is that x :: somelist does not mutate somelist, but instead creates a new list, which contains x followed by all elements of somelist. This can be done in O(1) time because you only need to set somelist as the successor of x in the newly created, singly linked list.
If doubly linked lists were used instead, x would also have to be set as the predecessor of somelist's head, which would modify somelist. So if we want to be able to do :: in O(1) without modifying the original list, we can only use singly linked lists.
Regarding the second question: You can use ::: to concatenate a single-element list to the end of your list. This is an O(n) operation.
List(1,2,3) ::: List(4)
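To make the structural sharing concrete (values invented):

```scala
val xs = List(2, 3)
val ys = 1 :: xs        // O(1): one new cell whose tail *is* xs (shared)
// xs is still List(2, 3); ys is List(1, 2, 3)
val zs = xs ::: List(4) // O(n): copies xs's cells to yield List(2, 3, 4)
```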
Other answers have given good explanations for this phenomenon. If you are appending many items to a list in a subroutine, or if you are creating a list by appending elements, a functional idiom is to build up the list in reverse order, cons'ing the items on the front of the list, then reverse it at the end. This gives you O(n) performance instead of O(n²).
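A minimal sketch of that idiom (function and values invented):

```scala
// Cons each item onto the front (O(1) apiece), then reverse once at
// the end: O(n) total instead of O(n^2) from repeated appends.
def squares(n: Int): List[Int] =
  (1 to n).foldLeft(List.empty[Int])((acc, i) => (i * i) :: acc).reverse

// squares(4) == List(1, 4, 9, 16)
```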
Since the question was just updated, it's worth noting that things have changed here.
In today's Scala, you can simply use xs :+ x to append an item at the end of any sequential collection. (There is also x +: xs to prepend. The mnemonic for many of Scala's 2.8+ collection operations is that the colon goes next to the collection.)
This will be O(n) with the default linked implementation of List or Seq, but if you use Vector or IndexedSeq, it will be effectively constant time. Vector is probably Scala's most useful list-like collection (unlike Java's Vector, which is mostly useless these days).
If you are working in Scala 2.8 or higher, the collections introduction is an absolute must read.
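A quick illustration of both operators (values invented):

```scala
val xs = List(1, 2, 3)
val appended = xs :+ 4  // List(1, 2, 3, 4): O(n) on a linked List
val prepended = 0 +: xs // List(0, 1, 2, 3): O(1); the colon faces the collection

val v = Vector(1, 2, 3)
val grown = v :+ 4      // Vector(1, 2, 3, 4): effectively constant time
```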
Prepending is faster because it only requires two operations:
Create the new list node
Have that new node point to the existing list
Appending requires more operations because you have to traverse to the end of the list since you only have a pointer to the head.
I've never programmed in Scala before, but you could try a ListBuffer.
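For example (a minimal sketch, values invented):

```scala
import scala.collection.mutable.ListBuffer

// ListBuffer keeps a tail pointer, so appends are constant time;
// convert to an immutable List once you're done building.
val buf = ListBuffer(1, 2, 3)
buf += 4                           // cheap append
val result: List[Int] = buf.toList // List(1, 2, 3, 4)
```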
Most functional languages prominently feature a singly-linked-list data structure, as it's a handy immutable collection type. When you say "list" in a functional language, that's typically what you mean: a singly linked, usually immutable, list. For such a type, append is O(n) whereas cons is O(1).
