Deletion in Hash Table - algorithm

I was reading the book Introduction To Algorithms and I came across this:
We can delete an element in O(1) time if the lists are doubly linked. (Note that CHAINED-HASH-DELETE takes as input an element x
and not its key k, so that we don’t have to search for x first. If the
hash table supports deletion, then its linked lists should be doubly
linked so that we can delete an item quickly. If the lists were only
singly linked, then to delete element x, we would first have to find
x in the list T[h(x.key)] so that we could update the next
attribute of x’s predecessor. With singly linked lists, both deletion
and searching would have the same asymptotic running times.)
How can we delete an element in O(1) time if the lists are doubly linked? First we need to find the element, and only then can we delete it in O(1). But finding it takes O(length of the list) time. Maybe deletion is faster in a doubly linked list (because we can search from both ends of the list at the same time, but that is only a constant-factor improvement), but I don't see how it can be done in O(1) time.
Thank you in advance.

The answer is in the text:
Note that CHAINED-HASH-DELETE takes as input an element x and not its key k, so that we don’t have to search for x first.
You already have the item so you only have to remove it from the chain and do a delete.
To remove item X you need to get the previous and next nodes in the list and link them together before you delete X, so the list remains unbroken. In a doubly linked list X already has links to both its previous and next nodes, so this takes constant time. In a singly linked list X only has a link to the next node, so you need to scan through the list to find the previous node.
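The point above can be sketched in Python. This is only an illustration under assumptions: the names Element, ChainedHashTable, insert, search and delete are hypothetical, and the links are stored on the element itself so that delete receives the element rather than its key.

```python
class Element:
    """An item that carries its own list links (hypothetical name)."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


class ChainedHashTable:
    """Hash table with chaining; each slot holds the head of a doubly linked list."""

    def __init__(self, slots=8):
        self.table = [None] * slots

    def _slot(self, key):
        return hash(key) % len(self.table)

    def insert(self, x):
        """Push element x onto the front of its chain: O(1)."""
        i = self._slot(x.key)
        head = self.table[i]
        x.prev = None
        x.next = head
        if head is not None:
            head.prev = x
        self.table[i] = x

    def search(self, key):
        """Walk the chain looking for key: O(length of the chain)."""
        x = self.table[self._slot(key)]
        while x is not None and x.key != key:
            x = x.next
        return x

    def delete(self, x):
        """Unlink element x without searching for it: O(1)."""
        if x.prev is not None:
            x.prev.next = x.next
        else:
            # x was the head of its chain, so the slot must point past it.
            self.table[self._slot(x.key)] = x.next
        if x.next is not None:
            x.next.prev = x.prev
        x.prev = x.next = None
```

If the caller only has a key, it still has to call search first, and the combined cost is then O(length of the chain), which is exactly the case the question describes.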

I think the confusion here comes from an implicit assumption in CLRS. In this book, objects are often treated as property bags to which properties can be added at runtime, much like JavaScript but unlike the Java/C# world. So if you want to put x in a linked list, you don't necessarily need to create a Node object first and then add properties for Previous, Next and Value. Instead, you just add those properties to x itself. Many of us who grew up with statically typed languages would be shocked at this design, but for algorithm design in pseudocode it removes unnecessary clutter. I think the authors should have clarified this. In any case, without the ability to add Previous and Next properties to the object dynamically (or some other way to get from x to its list node), it would not be O(1) even with doubly linked lists.

Related

Binary Search-accessing the middle element drawback

I am studying from my course book on Data Structures by Seymour Lipschutz and I have come across a point I don't fully understand.
Binary Search Algorithm assumes that one has direct access to middle element in the list. This means that the list must be stored in some type of linear array.
I read this and also recognised that in Python you can have access to the middle element at all times. Then the book goes on to say:
Unfortunately, inserting an element in an array requires elements to be moved down the list, and deleting an element from an array requires elements to be moved up the list.
How is this a drawback?
Won't we still be able to access the middle element by dividing the length of the array by 2?
In the case where the array will not be modified, the cost of insertion and deletion are not relevant.
However, if an array is to be used to maintain a sorted set of non-fixed items, then insertion and deletion costs are relevant. In this case, binary search can be used to find items (possibly for deletion) and/or find where new items should be inserted. The drawback is that insertion and deletion require movement of other elements.
Python's bisect module provides binary search functionality that can be used for locating insertion points for maintaining sorted order. The drawback mentioned applies.
In some cases, a binary search tree may be a preferable alternative to a sorted array for maintaining a sorted set of non-fixed items.
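As a small illustration of the trade-off described above, here is a sketch using Python's standard bisect module. The list contents and the helper name contains are made up for the example.

```python
import bisect

sorted_items = [10, 20, 30, 40, 50]

# Binary search for the insertion point: O(log n) comparisons.
pos = bisect.bisect_left(sorted_items, 35)

# The insertion itself shifts every later element by one place: O(n) moves.
sorted_items.insert(pos, 35)


def contains(items, target):
    """Membership test on a sorted list via binary search (hypothetical helper)."""
    i = bisect.bisect_left(items, target)
    return i < len(items) and items[i] == target
```

Finding the position is cheap; it is the call to insert that pays the cost of moving elements, which is the drawback the book refers to.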
It seems that the author is comparing array-like structures with linked lists.
The first (array, Python and Java list, C++ vector) allows fast and simple access to any element by index, but inserting or deleting requires shifting elements, and appending may trigger a memory reallocation.
For the second, we cannot address the i-th element directly and must traverse the list from the beginning, but once we have an element we can insert or delete next to it quickly.

How to implement linked list with 1 million nodes?

I recently attended a Microsoft interview.
I was asked to implement a linked list with 1 million nodes. How would you access the 999999th node?
What is the optimal design strategy and implementation for such a question?
A linked list has fairly few variations, because much variation means it would be something other than a linked list.
You can vary it by having single or double linking. Single linking is where you have a pointer to the head (the first node, A say) which points to B which points to C, etc. To turn that into a double linked list you would also add a link from C to B and B to A.
If you have a double linked list then it's meaningful to retain a pointer to the list tail (the last node) as well as the head, which means accessing the last element is cheap, and elements near the end are cheaper, because you can work backwards or forwards... BUT... you would need to know what you want is at the end of the list... AND at the end of the day a linked list is still just that, and if it is going to get very large and that is a problem because of the nature of its use case, then a storage structure other than a linked list should probably be chosen.
You could hybridise your linked list of course, so you could index it or something for example, and there's nothing wrong with that in theory, but if you index ALL the nodes then the linked list nature is no longer of much value, and if you index only some, then the nodes in between indexed nodes have to be sorted or something so you can find a close node and work towards a target node... probably this would never be optimal and a better data structure should be chosen.
Really a linked list should be used when you don't want to do things like get a specific node, but want to iterate nodes regardless.
I have no idea about what I'm going to say, but, here goes:
You could conceptually split the list in sqrt(1000000) blocks, in such a way that you would have "reference pointers" every 1000 elements.
Think of it as having 1000 linked lists each with 1000 elements representing your list with 1000000 elements.
This is what comes to mind!
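A rough sketch of that idea in Python, assuming the list is built once and not modified afterwards (insertions or deletions would require the reference pointers to be repaired). The names BlockIndexedList, checkpoints and get are hypothetical.

```python
import math


class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


class BlockIndexedList:
    """Singly linked list plus a reference pointer to every block-th node."""

    def __init__(self, values):
        values = list(values)
        self.size = len(values)
        self.block = max(1, math.isqrt(self.size))  # about sqrt(n) nodes per block
        self.checkpoints = []  # checkpoints[j] is the node at index j * block
        self.head = None
        prev = None
        for i, v in enumerate(values):
            node = Node(v)
            if prev is None:
                self.head = node
            else:
                prev.next = node
            if i % self.block == 0:
                self.checkpoints.append(node)
            prev = node

    def get(self, index):
        """Jump to the nearest earlier checkpoint, then walk fewer than block steps."""
        if not 0 <= index < self.size:
            raise IndexError(index)
        node = self.checkpoints[index // self.block]
        for _ in range(index % self.block):
            node = node.next
        return node.value
```

With 1,000,000 nodes the block size comes out as 1000, so reaching any node takes one array lookup plus fewer than 1000 pointer hops instead of up to 999,999.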
As Michael said, you should first present the two classic variations of the linked list. The next thing you should do is ask about insertion, search and deletion patterns.
These patterns will guide you towards a better-fitting data structure, because nobody wants a singly or doubly linked list with a million nodes.
A circular doubly linked list with a stored node count could be quite helpful in this case.
What I am suggesting is a circular doubly linked list in which each node has a counter variable holding its index, plus a static variable holding the overall number of nodes in the list.
Now, when you search for an index greater than 50% of the total node count, i.e. an element in the second half of the list, you can traverse the list in the reverse direction.
Say you have 10 nodes in your circular linked list and you want the 8th node: you can start from the end and traverse backwards just 2 times.
This approach reduces the iterations needed to find items near either end, but in the worst case you still have to traverse halfway through the list for items in the middle.
The only downside of this approach is the extra memory, which I am assuming is not a design concern here.
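A minimal Python sketch of this approach, under the assumption that only the total size is stored (per-node index counters are left out for brevity). CircularList and its method names are hypothetical, and get also returns the number of hops so the saving is visible.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = self
        self.next = self


class CircularList:
    """Circular doubly linked list that remembers its size."""

    def __init__(self):
        self.head = None
        self.size = 0

    def append(self, value):
        node = Node(value)
        if self.head is None:
            self.head = node
        else:
            tail = self.head.prev
            tail.next = node
            node.prev = tail
            node.next = self.head
            self.head.prev = node
        self.size += 1

    def get(self, index):
        """Return (value, hops), walking from the head in the shorter direction."""
        if not 0 <= index < self.size:
            raise IndexError(index)
        node, hops = self.head, 0
        if index <= self.size // 2:
            for _ in range(index):
                node = node.next
                hops += 1
        else:
            # Going backwards from the head reaches the tail in one hop.
            for _ in range(self.size - index):
                node = node.prev
                hops += 1
        return node.value, hops
```

The hop count here is measured from the head, so it is one more than a count that starts at the tail. The worst case is still about half the list, as noted above.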

When is doubly linked list more efficient than singly linked list?

I got asked this question in an interview today.
Apart from my answers of reversing the list and traversing both forward and backward, there was something "fundamental" that the interviewer kept stressing. I gave up, and of course did a bit of research after the interview. It seems that insertion and deletion are more efficient in a doubly linked list than in a singly linked list. I am not quite sure how they can be more efficient for a doubly linked list, since it obviously requires more references to be changed.
Can anybody explain the secret behind this? I honestly did quite a bit of research and failed to understand it, my main trouble being the fact that an O(n) search is still needed for the doubly linked list.
Insertion is clearly less work in a singly-linked list, as long as you are content to always insert at the head or after some known element. (That is, you cannot insert before a known element, but see below.)
Deletion, on the other hand, is trickier because you need to know the element before the element to be deleted.
One way of doing this is to make the delete API work with the predecessor of the element to be deleted. This mirrors the insert API, which takes the element which will be the predecessor of the new element, but it's not very convenient and it's hard to document. It's usually possible, though. Generally speaking, you arrive at an element in a list by traversing the list.
Of course, you could just search the list from the beginning to find the element to be deleted, so that you know what its predecessor was. That assumes that the delete API includes the head of the list, which is also inconvenient. Also, the search is stupidly slow.
The way that hardly anyone uses, but which is actually pretty effective, is to define a singly-linked list iterator to be the pointer to the element preceding the current target of the iterator. This is simple, only one indirection slower than using a pointer directly to the element, and makes both insertion and deletion fast. The downside is that deleting an element may invalidate other iterators to list elements, which is annoying. (It doesn't invalidate the iterator to the element being deleted, which is nice for traversals which delete some elements, but that's not much compensation.)
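The predecessor-as-iterator idea can be sketched as follows. This is an illustration only: the dummy sentinel node (so that the first element also has a predecessor) and all function names are assumptions made for the example.

```python
class Node:
    def __init__(self, value=None, next=None):
        self.value = value
        self.next = next


class SinglyLinkedList:
    def __init__(self, values=()):
        self.sentinel = Node()  # dummy node placed before the first element
        tail = self.sentinel
        for v in values:
            tail.next = Node(v)
            tail = tail.next

    def begin(self):
        # An "iterator" is the node BEFORE the element it currently targets.
        return self.sentinel

    def to_list(self):
        out, node = [], self.sentinel.next
        while node is not None:
            out.append(node.value)
            node = node.next
        return out


def at_end(it):
    return it.next is None


def current(it):
    return it.next.value  # one extra indirection compared with a direct pointer


def advance(it):
    return it.next


def insert_before_current(it, value):
    it.next = Node(value, it.next)  # O(1), no search needed


def delete_current(it):
    it.next = it.next.next  # O(1); the iterator now targets the following element


def remove_if(lst, predicate):
    """Delete every element matching predicate in a single traversal."""
    it = lst.begin()
    while not at_end(it):
        if predicate(current(it)):
            delete_current(it)
        else:
            it = advance(it)
```

Note that an iterator targeting the element after a deleted one was pointing at the deleted node itself, which is the invalidation problem mentioned above.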
If deletion is not important, perhaps because the datastructures are immutable, singly-linked lists offer another really useful property: they allow structure-sharing. A singly-linked list can happily be the tail of multiple heads, something which is impossible for a doubly-linked list. For this reason, singly-linked lists have traditionally been the simple datastructure of choice for functional languages.
Here is some code that made it clearer to me... Having:
class Node {
    Node next;
    Node prev;
}
DELETE a node in a SINGLY LINKED LIST -O(n)-
You don't know which node is the preceding one, so you have to traverse the list from the head until you find it:
// Assumes node is in the list and is not the head; the caller handles the head case.
void deleteNode(Node head, Node node) {
    Node prevNode = head;
    Node tmpNode = head.next;
    while (tmpNode != null) {
        if (tmpNode == node) {
            prevNode.next = tmpNode.next; // bypass the node
            return;
        }
        prevNode = tmpNode;
        tmpNode = tmpNode.next;
    }
}
DELETE a node in a DOUBLY LINKED LIST -O(1)-
You can simply update the links like this:
// Assumes node has both neighbours (it is neither the first nor the last node).
void deleteNode(Node node) {
    node.prev.next = node.next;
    node.next.prev = node.prev;
}
Here are my thoughts on the doubly linked list:
You have ready access and insertion at both ends.
It can work as a queue and a stack at the same time.
Node deletion requires no additional pointers.
You can apply hill-climb traversal, since you already have access to both ends.
If you are storing numerical values and your list is sorted, you can keep a pointer/variable for the median, which lets a search start from the middle and move in either direction.
If you are going to delete an element in a linked list, you will need to link the previous element to the next element. With a doubly linked list you have ready access to both elements because you have links to both of them.
This assumes that you already have a pointer to the element you need to delete and there is no searching involved.
'Apart from answering reversing the list and both forward and backward traversal there was something "fundamental"'.
Nobody seems to have mentioned: in a doubly linked list it is possible to reinsert a deleted element just by having a pointer to the deleted element. See Knuth's Dancing Links paper. I think that's pretty fundamental.
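A small Python sketch of that remove/restore pair, assuming a circular list with a header node. The helper names link, remove, restore and to_list are made up for the example.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


def link(values):
    """Build a circular doubly linked list behind a header node."""
    header = Node(None)
    header.prev = header.next = header
    nodes = []
    for v in values:
        node = Node(v)
        node.prev, node.next = header.prev, header
        header.prev.next = node
        header.prev = node
        nodes.append(node)
    return header, nodes


def remove(x):
    # x is unlinked from its neighbours but keeps its own prev/next pointers.
    x.prev.next = x.next
    x.next.prev = x.prev


def restore(x):
    # Those surviving pointers are enough to put x back where it was.
    x.prev.next = x
    x.next.prev = x


def to_list(header):
    out, node = [], header.next
    while node is not header:
        out.append(node.value)
        node = node.next
    return out
```

This only works if restores happen in the reverse order of the removals, which is how a backtracking search naturally undoes its steps.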
Because doubly linked lists have immediate access to both the front and end of the list, they can insert data on either side in O(1) as well as delete data on either side in O(1). Because doubly linked lists can insert data at the end in O(1) time and delete data from the front in O(1) time, they make the perfect underlying data structure for a queue. Queues are lists of items in which data can only be inserted at the end and removed from the beginning.
Queues are an example of an abstract data type, and we are able to use an array to implement them under the hood.
Now, since queues insert at the end and delete from the beginning, arrays are only so good as the underlying data structure. While arrays are O(1) for insertions at the end, they're O(N) for deleting from the beginning.
A doubly linked list, on the other hand, is O(1) for both inserting at the end and for deleting from the beginning. That's what makes it a perfect fit for serving as the queue's underlying data structure.
The doubly linked list is used in LRU cache design, since we need to remove the least recently used items frequently and deletion is fast. To evict the least recently used item, we just delete it from the end of the list; to add a new item to the cache, we just insert a new node at the beginning of the list.
A doubly linked list is used in navigation systems where front and back navigation is required. It is also used by the browser to implement backward and forward navigation of visited web pages, i.e. the back and forward buttons.
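One common way to realise the LRU design described above is to pair the doubly linked list with a dictionary that maps keys to nodes. The sketch below assumes a capacity of at least 1 and uses two sentinel nodes; the class and method names are hypothetical.

```python
class _Node:
    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None


class LRUCache:
    """Most recently used item at the front, least recently used at the end."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed to be >= 1
        self.map = {}  # key -> node, so no list search is ever needed
        self.head = _Node()  # sentinel before the most recently used node
        self.tail = _Node()  # sentinel after the least recently used node
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _push_front(self, node):
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        node = self.map.get(key)
        if node is None:
            return None
        self._unlink(node)  # O(1) because the node knows both neighbours
        self._push_front(node)
        return node.value

    def put(self, key, value):
        node = self.map.get(key)
        if node is not None:
            node.value = value
            self._unlink(node)
            self._push_front(node)
            return
        if len(self.map) >= self.capacity:
            lru = self.tail.prev  # least recently used item sits at the end
            self._unlink(lru)
            del self.map[lru.key]
        node = _Node(key, value)
        self.map[key] = node
        self._push_front(node)
```

Every operation is a dictionary lookup plus a constant number of pointer updates, which is why the doubly linked list fits here.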
Singly Linked List vs Doubly Linked List vs Dynamic Arrays:
When comparing these three data structures by time complexity, doubly linked lists do well on most major operations. Given a reference to the node involved, insert and remove are constant time, as are first, last and concatenation (and count, if the list stores its size); only access by index is linear time, O(n), as it needs to iterate through the nodes to reach the required index. Dynamic arrays are the opposite: access by index, first, last and count are constant time and appending at the end is amortized constant time, but inserting or removing anywhere else, and concatenation, are linear time, O(n).
In terms of space, all three are linear, O(n), and differ only in per-element overhead: dynamic arrays store only the elements (plus any unused reserved capacity), singly linked lists also store the successor of each element (n extra references), and doubly linked lists store both the predecessor and the successor of each element (2*n extra references).
If you have extremely limited resources or space, then dynamic arrays or singly linked lists may be the better choice. Otherwise, space is increasingly abundant nowadays, so doubly linked lists are often the better choice, at the cost of more space.
A doubly linked list is more efficient than a singly linked list when the location of the element to be deleted is given, because it only has to operate on 4 pointers (the element's own two links and one link in each neighbour), or on 2 when the element to be deleted is the first or last node.
struct Node {
    int Value;
    struct Node *Fwd;
    struct Node *Bwd;
};
The line of code below is enough to delete the element X, provided X is neither the first nor the last node.
X->Bwd->Fwd = X->Fwd; X->Fwd->Bwd = X->Bwd;

Why is appending to a list bad?

I've recently started learning Scala, and I've come across the :: (cons) function, which prepends to a list.
The book "Programming in Scala" states that there is no append function because appending to a list has performance O(n), whereas prepending has a performance of O(1).
Something just strikes me as wrong about that statement.
Isn't performance dependent on implementation? Isn't it possible to simply implement the list with both forward and backward links and store the first and last element in the container?
The second question I suppose is what I'm supposed to do when I have a list, say 1,2,3 and I want to add 4 to the end of it?
The key is that x :: somelist does not mutate somelist, but instead creates a new list, which contains x followed by all elements of somelist. This can be done in O(1) time because you only need to set somelist as the successor of x in the newly created, singly linked list.
If doubly linked lists were used instead, x would also have to be set as the predecessor of somelist's head, which would modify somelist. So if we want to be able to do :: in O(1) without modifying the original list, we can only use singly linked lists.
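The same idea can be shown outside Scala. The following Python sketch models an immutable singly linked list with a hypothetical Cons cell; it is an illustration of the principle, not how Scala's List is implemented.

```python
from typing import Any, NamedTuple, Optional


class Cons(NamedTuple):
    """Immutable list cell; an empty list is represented by None."""
    head: Any
    tail: Optional["Cons"]


def prepend(x, lst):
    # O(1): one new cell whose tail is the existing list, which is left untouched.
    return Cons(x, lst)


def append(lst, x):
    # O(n): the last cell cannot be modified, so every cell has to be copied.
    if lst is None:
        return Cons(x, None)
    return Cons(lst.head, append(lst.tail, x))


def to_pylist(lst):
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out
```

Two different lists built with prepend can share the same tail object, which would not be possible if each cell also had to point back to a single predecessor.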
Regarding the second question: You can use ::: to concatenate a single-element list to the end of your list. This is an O(n) operation.
List(1,2,3) ::: List(4)
Other answers have given good explanations for this phenomenon. If you are appending many items to a list in a subroutine, or if you are creating a list by appending elements, a functional idiom is to build up the list in reverse order, cons'ing the items on the front of the list, then reverse it at the end. This gives you O(n) performance instead of O(n²).
Since the question was just updated, it's worth noting that things have changed here.
In today's Scala, you can simply use xs :+ x to append an item at the end of any sequential collection. (There is also x +: xs to prepend. The mnemonic for many of Scala's 2.8+ collection operations is that the colon goes next to the collection.)
This will be O(n) with the default linked implementation of List or Seq, but if you use Vector or IndexedSeq, this will be effectively constant time. Scala's Vector is probably Scala's most useful list-like collection—unlike Java's Vector which is mostly useless these days.
If you are working in Scala 2.8 or higher, the collections introduction is an absolute must read.
Prepending is faster because it only requires two operations:
Create the new list node
Have that new node point to the existing list
Appending requires more operations because you have to traverse to the end of the list since you only have a pointer to the head.
I've never programmed in Scala before, but you could try a ListBuffer.
Most functional languages prominently figure a singly-linked-list data structure, as it's a handy immutable collection type. When you say "list" in a functional language, that's typically what you mean (a singly-linked list, usually immutable). For such a type, append is O(n) whereas cons is O(1).

Plain, linked and double linked lists: When and Why?

In what situations should I use each kind of list? What are the advantages of each one?
Plain list:
Stores each item sequentially, so random lookup is extremely fast (i.e. I can instantly say "I want the 657415671567th element" and go straight to it, because we know its memory address is at a fixed offset from the first item: the index multiplied by the item size). This has little or no memory overhead in storage. However, it has no way of automatically resizing: you have to create a new array, copy across all the values, and then delete the old one. Plain lists are useful when you need to look up data from anywhere in the list, and you know that your list will not be longer than a certain size.
Linked List:
Each item has a reference to the next item. This means that there is some overhead (to store the reference to the next item). Also, because they're not stored sequentially, you can't immediately go to the 657415671567th element - you have to start at the head (1st element), and then get its reference to go to the 2nd, and then get its reference, to get to the third, ... and then get its reference to get to the 657415671566th, and then get its reference to get to the 657415671567th. In this way, it is very inefficient for random lookup. However, it allows you to modify the length of the list. If your task is to go through each item sequentially, then it's about the same value as a plain list. If you need to change the length of the list, it could be better than a plain list. If you know the 566th element, and you're looking for the 567th, then all you need to do is follow the reference to the next one. However, if you know the 567th and you're looking for the 566th, the only way to find it is to start searching from the 1st element again. This is where Double Linked Lists come in handy...
Double Linked List:
Doubly linked lists also store a reference to the previous element. This means you can traverse the list backwards as well as forwards. This could be very useful in some situations (such as the example given in the Linked List section). Other than that, they have most of the same advantages and disadvantages as a Linked List.
Answer from comments section:
For use as a queue:
You'd have to take all of those advantages and disadvantages into account: Can you say with confidence that your queue will have a maximum size? If your queue could be anywhere from 1 to 10000000000 elements long, then a plain list will just waste memory (and then may not even be big enough). In that case, I'd go with a Linked List. However, rather than storing the index of the front and rear, you should actually store the node.
Recap: A linked list is made up of "nodes", and each node stores the item as well as the reference to the next node
So you should store a reference to the first node, and the last node. Thus, when you enqueue, you stick a new node onto the rear (by linking the old rear one to the new rear one), and remember this new rear node. And, when you dequeue, you remove the front node, and remember the second one as the new "front node". That way, you don't have to worry about any of the middle elements. You can thus ignore the length of the queue (although you can store that too if you really want)
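The enqueue and dequeue steps described above might look like this in Python. LinkedQueue and its attribute names are hypothetical, and storing the length is optional, as the answer says.

```python
class Node:
    def __init__(self, item):
        self.item = item
        self.next = None


class LinkedQueue:
    """Queue on a singly linked list that remembers its front and rear nodes."""

    def __init__(self):
        self.front = None
        self.rear = None
        self.length = 0

    def enqueue(self, item):
        node = Node(item)
        if self.rear is None:
            self.front = node  # first item: it is both front and rear
        else:
            self.rear.next = node  # link the old rear to the new rear
        self.rear = node
        self.length += 1

    def dequeue(self):
        if self.front is None:
            raise IndexError("dequeue from empty queue")
        node = self.front
        self.front = node.next  # the second node becomes the new front
        if self.front is None:
            self.rear = None  # the queue is empty again
        self.length -= 1
        return node.item
```

Both operations touch only the front or the rear node, so neither depends on how many elements sit in between.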
Nobody mentioned my favorite linked list: circularly linked list with a pointer to the last element. You get constant-time insertion and deletion at either end, plus constant-time destructive append. The only cost is that empty lists are a bit tricky. It's a sweet data structure: list, queue, and stack all in one.
One advantage of a doubly-linked list is that removal of a node whose pointer is specified is O(1).
With singly linked lists you can only traverse forwards. With doubly linked lists you can traverse backwards as well as forwards through the list. In general if you are going to use a linked list, there is really no good reason not to use a doubly linked list. I have only used single linked in school.
Doubly-linked list provides several advantages over a singly linked list:
Easier traversal: With a doubly linked list, each node has a pointer to both the previous and next node, allowing for easy traversal in both directions. This is useful for certain types of algorithms that need to move both forwards and backwards through the list.
Faster deletion: In a singly linked list, when you want to delete a node, you need to traverse the list to find the node before it, so that you can update the next pointer. In a doubly linked list, the node you want to delete already has a pointer to the previous node, so you can update the previous node's next pointer directly, making deletion faster.
Easier insertion: Similar to deletion, in a singly linked list you need to traverse the list to find the preceding node if you want to insert before a given node. With a doubly linked list, you can insert a new node directly before or after a given node, without the need to traverse the list.
Easier to implement in-place modification: With a doubly linked list, it is easy to move elements around within the list without creating new list elements or destroying old ones.
Easier to implement Queue and Stack : A doubly linked list makes it easy to implement queue and stack data structures.

Resources