Where can one use a (doubly-linked) Positional List ADT? Is it when the developer wants O(n) memory and O(1) (non-amortized) operations at an arbitrary position in the list? I would like to see an example of using a positional list. What would be the advantage of using a positional list over an array-based sequence?
If your program often needs to add new elements to, or delete elements from, your data collection, a list is likely to be a better choice than an array.
Deleting the element at position N of an array requires a copy operation on all elements after element N. In principle:
Arr[N] = Arr[N+1]
Arr[N+1] = Arr[N+2]
...
A similar copy is needed when inserting a new element, i.e. to make room for the new element.
If your program frequently adds/deletes elements, the many copy operations may hurt performance.
As part of these operations the positions of existing elements change, i.e. an element at position 1000 will be at either position 999 or 1001 after an element is deleted/added at position 50.
This can be a problem if some part of your program has searched for a specific element and saved its position (e.g. position 1000). After an element delete/add operation, the saved position is no longer valid.
A (doubly-linked) list "solves" the 3 problems described. With a list you can add/delete elements without copying existing elements to new positions. Consequently, the position of a specific element (e.g. a pointer to an element) will still be valid after an add/delete operation.
To summarize: if your program (frequently) adds or deletes randomly located elements, and if your program requires that position information isn't affected by add/delete operations, a list may be a better choice than an array.
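For illustration, here is a minimal Java sketch (the Node class and unlink helper are made-up names, not from any particular library) showing why deleting an element you already hold a reference to is O(1) in a doubly-linked list: only the two neighbours are touched, and every other saved reference stays valid.

class Node<E> {
    E element;
    Node<E> prev, next;
}

class DoublyLinkedOps {
    // Remove the given node in O(1): no other element is copied or moved.
    static <E> void unlink(Node<E> node) {
        if (node.prev != null) node.prev.next = node.next; // bypass node going forward
        if (node.next != null) node.next.prev = node.prev; // bypass node going backward
        node.prev = node.next = null;                      // detach the removed node
    }
}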
Positional Lists
So when working with arrays, indexes are great for locating positions for insertion and deletion. However, indexes are not great for linked structures (like a linked list), mainly because even if we have the index, we still have to traverse all the previous nodes in the linked structure. That means an index-based deletion or insertion in a linked list would run in O(N) time (no bueno). As a general rule of thumb, we want data structure operations to run in either O(1) or O(log n) time. Also, indexes are not good at describing positions relative to other nodes.
What do Positions allow us to do?
Positions allow us to achieve constant-time insertions and deletions at arbitrary locations within our linked structure (kinda cool, right?).
They also allow us to describe the element relative to other elements.
Essentially, a Position gives us a reference to a memory address, which we can then use for constant-time insertions and deletions. Your textbook image doesn't show a validate method; however, I would assume that the implementation has one. So just be aware that you will need a utility method validate to verify that a position actually belongs to the linked structure.
What does a Position look like?
In reality a Position is an ADT (abstract data type), and in Java we formalize ADTs with interfaces, like so:
public interface Position<E> {
    E getElement() throws IllegalStateException;
}
A Position is just an abstraction that gets implemented on a Node within a linked structure. Why do this? Well, it gives our code greater flexibility and better abstraction.
So to implement a Position for a linked list node, it would look something like this:
private static class Node<E> implements Position<E> {
    // ALL THE OTHER FIELDS AND METHODS WOULD BE HERE
}
The Node class is private and static because it is assumed to be nested inside the linked-list class. Then, ultimately, we use this Position as a data type in all the other methods inside your positional linked list.
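For a fuller picture, here is a minimal, hedged sketch of how such a positional linked list could be put together (method names like addFirst, addAfter, remove and validate are illustrative and may differ from your textbook's exact API; it assumes the Position interface shown above):

public class LinkedPositionalList<E> {

    private static class Node<E> implements Position<E> {
        E element;
        Node<E> prev, next;
        Node(E e, Node<E> p, Node<E> n) { element = e; prev = p; next = n; }
        public E getElement() throws IllegalStateException {
            if (next == null) throw new IllegalStateException("Position no longer valid");
            return element;
        }
    }

    // Sentinel nodes: real elements live strictly between them.
    private final Node<E> header = new Node<>(null, null, null);
    private final Node<E> trailer = new Node<>(null, header, null);
    { header.next = trailer; }

    // Utility method: check that p is a live node of this list.
    private Node<E> validate(Position<E> p) {
        if (!(p instanceof Node)) throw new IllegalArgumentException("Invalid position");
        Node<E> node = (Node<E>) p;
        if (node.next == null) throw new IllegalArgumentException("Position no longer in the list");
        return node;
    }

    // O(1): add at the front and hand back the new element's Position.
    public Position<E> addFirst(E e) {
        Node<E> node = new Node<>(e, header, header.next);
        header.next.prev = node;
        header.next = node;
        return node;
    }

    // O(1): splice a new node right after a known position and return its Position.
    public Position<E> addAfter(Position<E> p, E e) {
        Node<E> pred = validate(p);
        Node<E> node = new Node<>(e, pred, pred.next);
        pred.next.prev = node;
        pred.next = node;
        return node;
    }

    // O(1): unlink the node the position refers to; other positions stay valid.
    public E remove(Position<E> p) {
        Node<E> node = validate(p);
        node.prev.next = node.next;
        node.next.prev = node.prev;
        E answer = node.element;
        node.element = null;
        node.prev = null;
        node.next = null;   // marks this position as defunct
        return answer;
    }
}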
Real world example
Well, you could use this approach to create a parse tree for a parser: have a binary tree that implements and uses the Position interface for all of its methods, and then use that tree for parsing.
Summary
I have a problem that seems likely to have many good known solutions with different performance characteristics. Assuming this is the case, a great answer could include code (or pseudocode), but ideally it would reference the literature and give the name of this problem so I can explore it in greater detail and understand the space of solutions.
The problem
I have an initially empty array that can hold identifiers. Any identifiers in the array are unique (they're only ever in the array once).
var identifiers: [Identifier] = []
Over time a process will insert identifiers into the array. It won't just append them to the end; they can be inserted anywhere in the array.
Identifiers will never be removed from the array – only inserted.
Insertion needs to be quick, so the structure probably won't literally be an array, but something supporting better-than-linear-time insertion (such as a B-tree).
After some identifiers have been added to identifiers, I want to be able to query the current position of any given identifier. So I need a function to look this up.
A linear time solution to this is simply to scan through identifiers from the start until an identifier is found, and the answer is the index that was reached.
func find(identifier: Identifier) -> Int? {
    for index in identifiers.indices {
        if identifiers[index] == identifier {
            return index
        }
    }
    return nil
}
But this linear scaling with the size of the array is problematic if the array is very large (perhaps 100s of millions of elements).
A hash map doesn't work
We can't put the positions of the identifiers into a hash map. This doesn't work because identifier positions are not fixed after insertion into the array. If other identifiers are inserted before them, they will drift to higher indexes.
However, a possible acceleration of the linear-time algorithm would be to cache the initial insertion position of an identifier and begin the linear scan from there. Because identifiers are only ever inserted, the identifier must be at that index or at a later one (or not in identifiers at all). Once the identifier is found, the cache can be updated.
Another option could be to update the hash map after any insertion to correct the stored positions. However, this would slow insertion down to a potentially linear-time operation (as previously mentioned, identifiers is probably not literally an array but some other structure allowing better-than-linear-time insertion).
Summary
There's a linear time solution, and there's an optimisation using a hash map (at the cost of roughly doubling storage). Is there a much better solution for looking up the current index of an identifier, perhaps in log time?
You can use an order-statistic tree for this, based on a red-black tree or other self-balancing binary tree. Essentially, each node will have an extra integer field, storing the number of descendants it currently has. (Insertion and deletion operations, and their resultant rotations, will only result in updating O(log n) nodes so this can be done efficiently). To query the position of an identifier, you examine the descendant count of its left subtree and the descendant counts of the siblings of each of its right-side ancestors.
Note that while the classic use of an order-statistic tree is for a sorted list, nothing in the logic requires this; "node A is before node B" is all you need for insertion, tree rotations don't require re-evaluating the ordering, and having a pointer to the node (and having nodes store parent pointers) is all you need to query the index. All operations are in O(log n).
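As a minimal sketch of the position query described above (assuming each node keeps a parent pointer and a subtree-size field; balancing and the size bookkeeping on insert/rotation are omitted), in Java:

class OSTNode<E> {
    E value;
    OSTNode<E> left, right, parent;
    int size = 1; // number of nodes in this subtree, maintained on insert/rotation

    static <E> int size(OSTNode<E> n) {
        return n == null ? 0 : n.size;
    }

    // Zero-based index of this node in the in-order sequence; O(log n) when balanced.
    int rank() {
        int r = size(left);
        OSTNode<E> child = this;
        for (OSTNode<E> p = parent; p != null; child = p, p = p.parent) {
            if (child == p.right) {
                r += size(p.left) + 1; // everything in p's left subtree, plus p itself, comes first
            }
        }
        return r;
    }
}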
I am studying from my course book on Data Structures by Seymour Lipschutz and I have come across a point I don't fully understand.
Binary Search Algorithm assumes that one has direct access to the middle element in the list. This means that the list must be stored in some type of linear array.
I read this and also recognised that in Python you can have access to the middle element at all times. Then the book goes on to say:
Unfortunately, inserting an element in an array requires elements to be moved down the list, and deleting an element from an array requires elements to be moved up the list.
How is this a drawback?
Won’t we still be able to access the middle element by dividing the length of array by 2?
In the case where the array will not be modified, the cost of insertion and deletion are not relevant.
However, if an array is to be used to maintain a sorted set of non-fixed items, then insertion and deletion costs are relevant. In this case, binary search can be used to find items (possibly for deletion) and/or find where new items should be inserted. The drawback is that insertion and deletion require movement of other elements.
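To make the drawback concrete, here is a small, hedged Java sketch (array size and values are made up): the search for the insertion point is O(log n), but making room still shifts every later element, which is O(n).

import java.util.Arrays;

public class SortedInsert {
    // Insert value into the first `count` slots of a sorted array, shifting later elements right.
    static int insertSorted(int[] arr, int count, int value) {
        int pos = Arrays.binarySearch(arr, 0, count, value);    // O(log n) search
        if (pos < 0) pos = -(pos + 1);                          // not found: convert to insertion point
        System.arraycopy(arr, pos, arr, pos + 1, count - pos);  // O(n) shift to make room
        arr[pos] = value;
        return count + 1;
    }

    public static void main(String[] args) {
        int[] arr = new int[8];
        int n = 0;
        for (int v : new int[]{5, 1, 4, 2}) n = insertSorted(arr, n, v);
        System.out.println(Arrays.toString(Arrays.copyOf(arr, n))); // [1, 2, 4, 5]
    }
}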
Python's bisect module provides binary search functionality that can be used to locate insertion points when maintaining sorted order. The drawback mentioned above still applies.
In some cases, a binary search tree may be a preferable alternative to a sorted array for maintaining a sorted set of non-fixed items.
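As one concrete example of that alternative, Java's java.util.TreeSet (backed by a red-black tree) keeps its elements sorted with O(log n) insertion and no shifting; a tiny usage sketch (values made up):

import java.util.TreeSet;

public class SortedSetDemo {
    public static void main(String[] args) {
        TreeSet<Integer> sorted = new TreeSet<>();       // self-balancing search tree
        sorted.add(5); sorted.add(1); sorted.add(4);     // each insert is O(log n), nothing is moved
        sorted.add(2);
        System.out.println(sorted);                      // [1, 2, 4, 5]
        System.out.println(sorted.ceiling(3));           // 4 -- smallest element >= 3
    }
}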
It seems that the author is comparing array-like structures and linked lists.
The first (array, Python/Java list, C++ vector) allows fast and simple access to any element by index, but appending, inserting or deleting might cause memory reallocation and copying.
For the second, we cannot address the i-th element directly; we need to traverse the list from the beginning. But once we have the element, we can insert or delete quickly.
You might have come across somewhere that it is faster to find elements in a hashmap/dictionary/table than in a list/array. My question is: WHY?
(The inference I have made so far: why should it be faster? As far as I can see, in both data structures it has to travel through the elements until it reaches the required one.)
Let’s reason by analogy. Suppose you want to find a specific shirt to put on in the morning. I assume that, in doing so, you don’t have to look at literally every item of clothing you have. Rather, you probably do something like checking a specific drawer in your dresser or a specific section of your closet and only look there. After all, you’re not (I hope) going to find your shirt in your sock drawer.
Hash tables are faster to search than lists because they employ a similar strategy - they organize data according to the principle that every item has a place it "should" be, then search for the item by just looking in that place. Contrast this with a list, where items are organized based on the order in which they were added and where there isn't a particular pattern as to why each item is where it is.
More specifically: one common way to implement a hash table is with a strategy called chained hashing. The idea goes something like this: we maintain an array of buckets. We then come up with a rule that assigns each object a bucket number. When we add something to the table, we determine which bucket number it should go to, then jump to that bucket and then put the item there. To search for an item, we determine the bucket number, then jump there and only look at the items in that bucket. Assuming that the strategy we use to distribute items ends up distributing the items more or less evenly across the buckets, this means that we won’t have to look at most of the items in the hash table when doing a search, which is why the hash table tends to be much faster to search than a list.
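To make that concrete, here is a minimal Java sketch of chained hashing (the class and method names are illustrative, not a real library API):

import java.util.LinkedList;

class ChainedHashSet<E> {
    private final LinkedList<E>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashSet(int bucketCount) {
        buckets = new LinkedList[bucketCount];
        for (int i = 0; i < bucketCount; i++) buckets[i] = new LinkedList<>();
    }

    // The "rule" that assigns every item a bucket number.
    private int bucketFor(Object item) {
        return Math.floorMod(item.hashCode(), buckets.length);
    }

    void add(E item) {
        LinkedList<E> bucket = buckets[bucketFor(item)];   // jump straight to its bucket
        if (!bucket.contains(item)) bucket.add(item);
    }

    boolean contains(E item) {
        return buckets[bucketFor(item)].contains(item);    // only this bucket is scanned
    }
}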
For more details on this, check out these lecture slides on hash tables, which fill in more of the details about how this is done.
Hope this helps!
To understand this, you can think of how the elements are stored in these data structures.
A HashMap/Dictionary, as you know, is a key-value data structure. To store an element, you first compute the hash value of its key (a hash function maps a key to a slot; it is not guaranteed to be unique per key, and a simple hash function can be made by doing a modulo operation). Then you basically put the value against this hashed key.
In List, you basically keep appending the element to the end. The order of the element insertion would matter in this data structure. The memory allocated to this data structure is not contiguous.
An Array you can think of as similar to a List, but in this case the memory allocated is contiguous. So, if you know the address of the first index, you can find the address of the nth element.
Now think of the retrieval of the element from these Data structures:
From HashMap/Dictionary: when you are searching for an element, the first thing you do is compute the hash value of the key. Once you have that, you go to the slot for the hashed value and obtain the value. In this approach, the amount of work performed is constant. In asymptotic notation, this is O(1).
From List: you literally need to iterate through each element and check whether it is the one you are looking for. In the worst case, your desired element might be at the end of the list, so the amount of work varies, and in the worst case you might have to iterate over the whole list. In asymptotic notation, this is O(n), where n is the number of elements in the list.
From Array: to find an element in the array, all you need to know is the address of the first element. For any other element, you can do the math of how far it sits from the first index.
For example, let's say the address of the first element is 100 and each element takes 4 bytes of memory. The element you are looking for is at the 3rd position. Then the address of this element would be 108. The math used is:
address of the first element + (position of the element - 1) * memory used for each element
That is 100 + (3 - 1)*4 = 108.
In this case also, as you can observe, the work performed to find an element is constant. In asymptotic notation, this is O(1).
Now to compare: O(1) is always faster than O(n) for sufficiently large n. Hence, retrieving an element from a HashMap/Dictionary, or from an array when you know its index, is faster than searching through a List.
I hope this helps.
What would be the most appropriate way to implement a stack and a queue together efficiently, in a single data structure? The number of elements is unbounded. Retrieval and insertion should both happen in constant time.
A doubly linked list has all the computational-complexity attributes you desire, but poor cache locality.
A ring buffer (array) that allows appending and removing at both head and tail has the same complexity characteristics. It uses a dynamic array and requires reallocation once the number of elements grows beyond its capacity.
But, just as an array list / vector is generally faster in practice for sequential access than a linked list, in most cases a ring buffer will be faster and more memory-efficient than a doubly linked list implementation.
It is one of the possible implementations of the deque abstract data type; see e.g. the ArrayDeque<E> implementation in Java.
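For illustration, a short usage sketch of java.util.ArrayDeque acting as both a stack and a queue (the values are made up):

import java.util.ArrayDeque;

public class StackAndQueueDemo {
    public static void main(String[] args) {
        ArrayDeque<String> deque = new ArrayDeque<>();

        // Used as a stack (LIFO): push and pop both work at the head.
        deque.push("a");
        deque.push("b");
        System.out.println(deque.pop());  // b

        // Used as a queue (FIFO): offer at the tail, poll from the head.
        deque.offer("x");
        deque.offer("y");
        System.out.println(deque.poll()); // a -- pushed earlier, still at the head
    }
}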
A doubly linked list can solve this problem with all operations taking constant time:
It allows push() or enqueue() by appending the element to the list in constant time.
It allows pop() by removing the last element in constant time.
It allows dequeue() by removing the first element, also in constant time.
A two-way linked list is going to be best for this. Each node in the list has two references: one to the item before it and one to the item after it. The main list object maintains a reference to the item at the front of the list and one at the back of the list.
Any time it inserts an item, the list:
creates a new node, giving it a reference to the previous first or last node in the list (depending on whether you're adding to the front or back).
connects the previous first or last node to point at the newly-created node.
updates its own reference to the first or last node, to point at the new node.
Removing an item from the front or back of the list effectively reverses this process.
Inserting to the front or back of the structure will always be an O(1) operation.
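A bare-bones Java sketch of those three steps for the back of the list (the class and field names are made up for illustration):

class TwoWayList<E> {
    private static class Node<E> {
        E item;
        Node<E> prev, next;
        Node(E item) { this.item = item; }
    }

    private Node<E> first, last;

    void addLast(E item) {
        Node<E> node = new Node<>(item);
        node.prev = last;                    // 1. new node references the previous last node
        if (last != null) last.next = node;  // 2. previous last node points at the new node
        else first = node;                   //    (empty list: it is also the first node)
        last = node;                         // 3. the list's own "last" reference is updated
    }

    E removeFirst() {                        // removing reverses the process
        if (first == null) throw new java.util.NoSuchElementException();
        E item = first.item;
        first = first.next;
        if (first != null) first.prev = null;
        else last = null;                    // the list became empty
        return item;
    }
}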
So I need to find a data structure for this situation that I'll describe:
This is not my problem, but it explains the data structure aspect I need more succinctly:
I have an army made up of platoons. Every platoon has a number of men and a rank number (higher is better). If an enemy were to attack my army, they would kill some POWER of my army, starting from the weakest platoon and working up, where it takes (platoon rank) amount of power to kill every soldier from a platoon.
I could easily simulate enemies attacking me by peeking and popping elements from my priority queue of platoons, ordered by rank number, but that is not what I need to do. What I need is to allow enemies to view all the soldiers they would kill if they attacked me, without actually attacking, so without actually deleting elements from my priority queue (if I implemented it as a PQ).
Side note: Java's PriorityQueue.iterator() traverses elements in an arbitrary order; I know an iterator is all I need, just FYI.
The problem is, if I implemented this as a PQ, I can only see the top element, so I would have to pop platoons off as if they were dying and then push them back on once the hypothetical attack has been calculated. I could also implement this as a linked list or array, but insertion takes too long. Ultimately I would love to use a priority queue; I just need the ability either to view the (pick an index)'th element of the PQ, or to have every object in the PQ hold a pointer to the next object in the PQ, like a linked list.
Is this idea of maintaining pointers within a PQ, like a linked list, possible with Java's PriorityQueue? Is it implemented for me somewhere in PriorityQueue that I don't know about? Is the index idea implemented? Is there another data structure I can use that would better serve my purpose? Is it realistic for me to take the source code of Java's PriorityQueue and rewrite it on my machine to maintain these pointers like a linked list?
Any ideas are very welcome, not really sure which path I want to take on this one.
One thing you could do is use an augmented binary search tree. That would allow efficient access to the nth smallest element while still keeping the elements ordered. You could also use a threaded binary search tree, which would let you step from one element to the next larger one in constant time, faster than in a normal binary tree. Both of these data structures are slower than a heap, though.
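A minimal sketch of the "nth smallest" query on such a size-augmented tree (the node fields are illustrative; keeping the size fields correct during inserts and rotations is omitted):

class RankedNode<E> {
    E value;
    RankedNode<E> left, right;
    int size = 1; // number of nodes in this subtree

    static <E> int size(RankedNode<E> n) { return n == null ? 0 : n.size; }

    // Zero-based: select(root, 0) is the smallest element; O(log n) when the tree is balanced.
    static <E> E select(RankedNode<E> root, int k) {
        RankedNode<E> n = root;
        while (n != null) {
            int leftSize = size(n.left);
            if (k < leftSize) n = n.left;            // the k-th element is in the left subtree
            else if (k == leftSize) return n.value;  // this node is the k-th element
            else { k -= leftSize + 1; n = n.right; } // skip the left subtree and this node
        }
        throw new IndexOutOfBoundsException("k out of range");
    }
}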