Iterable O(1) insert and random delete collection - data-structures

I am looking to implement my own collection class. The characteristics I want are:
Iterable - order is not important
Insertion - either at end or at iterator location, it does not matter
Random Deletion - this is the tricky one. I want to be able to have a reference to a piece of data which is guaranteed to be within the list, and remove it from the list in O(1) time.
I plan on the container only holding custom classes, so I was thinking a doubly linked list that required the components to implement a simple interface (or abstract class).
Here is where I am getting stuck. I am wondering whether it would be better practice to simply have the items in the list hold a reference to their node, or to build the node right into them. I feel like both would be fairly simple, but I am worried about coupling these nodes into a bunch of classes.
I am wondering if anyone has an idea as to how to minimize the coupling, or possibly know of another data structure that has the characteristics I want.

It'd be hard to beat a hash map.

Take a look at tries.
Apparently they can beat hashtables:
Unlike most other algorithms, tries have the peculiar feature that the time to insert, or to delete or to find is almost identical because the code paths followed for each are almost identical. As a result, for situations where code is inserting, deleting and finding in equal measure tries can handily beat binary search trees or even hash tables, as well as being better for the CPU's instruction and branch caches.
It may or may not fit your usage, but if it does, it's likely one of the best options possible.

In C++, this sounds like the perfect fit for std::unordered_set (that's std::tr1::unordered_set or boost::unordered_set to you if you have an older compiler). It's implemented as a hash set, which has the characteristics you describe.
Here's the interface documentation. Note that the hash containers actually offer two sets of iterators, the usual ones and local ones which only go through one bucket.
Many other languages have "hash sets" as well, certainly Java and C#.

Related

How to deal with duplicates in red-black trees?

So I've been(so far unsuccessfully) trying to make my red-black tree implementation work consistently with duplicates, but it seems to always be missing that small something, so here I am.
I tried make the tree lean to one side, but It didn't seem to balance it properly(from the color perspective).I'd like to ask how should one go about adding duplicates to a red-black tree?(apart obviously making the node fat, holding or pointing to duplicate key values).
Not really looking for a code review, more interested in suggestions. So basically the methods(taken from Introduction to Algorithms, Third Edition) I use for insert and balancing are these(while rotations are pretty obvious):
If you look at the pseudo-code you wrote here, it is completely agnostic to the question of whether keys are duplicate or not. The code here only looks at the result of comparing keys, and doesn't care if they are identical or not. In fact, unique-key implementations need to go out of their way to make RB-Insert detect duplicate keys. The data structure doesn't care about this naturally, and the algorithms and proofs hold whether there are duplicate keys or not. If you implemented these functions correctly, it should work as is.
I also disagree with the comments advising you to hold what you call "fat nodes". Holding multiple keys is the common implementation of C++'s std::multimap, for example. Not that from a computational complexity point of view, say that you have altogether n keys, but each k are a multiple. Using the "efficient" fat node version, the complexity of the basic find operation will be Θ(log(n / k)) = Θ(log(n) - log(k)); using the multiple key version, the complexity will be Θ(log(n)). In real life cases, probably k << n, which means that the relative difference is negligible.

A data structure with certain properties

I want to implement a data structure myself in C++11. What I'm planning to do is having a data structure with the following properties:
search. O(log(n))
insert. O(log(n))
delete. O(log(n))
iterate. O(n)
What I have been thinking about after research was implementing a balanced binary search tree. Are there other structures that would fulfill my needs? I am completely new to this topic and thought a question here would give me a good jumpstart.
First of all, using the existing standard library data types is definitely the way to go for production code. But since you are asking how to implement such data structures yourself, I assume this is mainly an educational exercise for you.
Binary search trees of some form (https://en.wikipedia.org/wiki/Self-balancing_binary_search_tree#Implementations) or B-trees (https://en.wikipedia.org/wiki/B-tree) and hash tables (https://en.wikipedia.org/wiki/Hash_table) are definitely the data structures that are usually used to accomplish efficient insertion and lookup. If you want to go wild you can combine the two by using a tree instead of a linked list to handle hash collisions (although this has a good potential to actually make your implementation slower if you don't make massive mistakes in sizing your hash table or in choosing an adequate hash function).
Since I'm assuming you want to learn something, you might want to have a look at minimal perfect hashing in the context of hash tables (https://en.wikipedia.org/wiki/Perfect_hash_function) although this only has uses in special applications (I had the opportunity to use a perfect minimal hash function exactly once). But it sure is fascinating. As you can see from the link above, the botany of search trees is virtually limitless in scope so you can also go wild on that front.

What are appropriate applications for a linked (doubly as well) list?

I have a question about fundamentals in data structures.
I understand that array's access time is faster than a linked list. O(1)- array vs O(N) -linked list
But a linked list beats an array in removing an element since there is no shifting needing O(N)- array vs O(1) -linked list
So my understanding is that if the majority of operations on the data is delete then using a linked list is preferable.
But if the use case is:
delete elements but not too frequently
access ALL elements
Is there a clear winner? In a general case I understand that the downside of using the list is that I access each node which could be on a separate page while an array has better locality.
But is this a theoretical or an actual concern that I should have?
And is the mixed-type i.e. create a linked list from an array (using extra fields) good idea?
Also does my question depend on the language? I assume that shifting elements in array has the same cost in all languages (at least asymptotically)
Singly-linked lists are very useful and can be better performance-wise relative to arrays if you are doing a lot of insertions/deletions, as opposed to pure referencing.
I haven't seen a good use for doubly-linked lists for decades.
I suppose there are some.
In terms of performance, never make decisions without understanding relative performance of your particular situation.
It's fairly common to see people asking about things that, comparatively speaking, are like getting a haircut to lose weight.
Before writing an app, I first ask if it should be compute-bound or IO-bound.
If IO-bound I try to make sure it actually is, by avoiding inefficiencies in IO, and keeping the processing straightforward.
If it should be compute-bound then I look at what its inner loop is likely to be, and try to make that swift.
Regardless, no matter how much I try, there will be (sometimes big) opportunities to make it go faster, and to find them I use this technique.
Whatever you do, don't just try to think it out or go back to your class notes.
Your problem is different from anyone else's, and so is the solution.
The problem with a list is not just the fragmentation, but mostly the data dependency. If you access every Nth element in array you don't have locality, but the accesses may still go to memory in parallel since you know the address. In a list it depends on the data being retrieved, and therefore traversing a list effectively serializes your memory accesses, causing it to be much slower in practice. This of course is orthogonal to asymptotic complexities, and would harm you regardless of the size.

What are some uses for linked lists?

Do linked lists have any practical uses at all. Many computer science books compare them to arrays and say the main advantage is that they are mutable. However, most languages provide mutable versions of arrays. So do linked lists have any actual uses in the real world, or are they just part of computer science theory?
They're absolutely precious (in both the popular doubly-linked version and the less-popular, but simpler and faster when applicable!, single-linked version). For example, inserting (or removing) a new item in a specified "random" spot in a "mutable version of an array" (e.g. a std::vector in C++) is O(N) where N is the number of items in the array, because all that follow (on average half of them) must be shifted over, and that's an O(N) operation; in a list, it's O(1), i.e., constant-time, if you already have e.g. the pointer to the "previous" item. Big-O differences like this are absolutely huge -- the difference between a real-world usable and scalable program, and a toy, "homework"-level one!-)
Linked lists have many uses. For example, implementing data structures that appear to the end user to be mutable arrays.
If you are using a programming language that provides implementations of various collections, many of those collections will be implemented using linked lists. When programming in those languages, you won't often be implementing a linked list yourself but it might be wise to understand them so you can understand what tradeoffs the libraries you use are making. In other words, the set "just part of computer science theory" contains elements that you just need to know if you are going to write programs that just work.
The main Applications of Linked Lists are
For representing Polynomials
It means in addition/subtraction /multipication.. of two polynomials.
Eg:p1=2x^2+3x+7 and p2=3x^3+5x+2
p1+p2=3x^3+2x^2+8x+9
In Dynamic Memory Management
In allocation and releasing memory at runtime.
*In Symbol Tables
in Balancing paranthesis
Representing Sparse Matrix
Ref:-
http://www.cs.ucf.edu/courses/cop3502h.02/linklist3.pdf
So do linked lists have any actual uses in the real world,
A Use/Example of Linked List (Doubly) can be Lift in the Building.
- A person have to go through all the floor to reach top (tail in terms of linked list).
- A person can never go to some random floor directly (have to go through intermediate floors/Nodes).
- A person can never go beyond the top floor (next to the tail node is assigned null).
- A person can never go beyond the ground/last floor (previous to the head node is assigned null in linked list).
Yes of course it's useful for many reasons.
Anytime for example that you want efficient insertion and deletion from the list. To find a place of insertion you have an O(N) search, but to do an insertion if you already have the correct position it is O(1).
Also the concepts you learn from working with linked lists help you learn how to make tree based data structures and many other data structures.
A primary advantage to a linked list as opposed to a vector is that random-insertion time is as simple as decoupling a pair of pointers and recoupling them to the new object (this is of course, slightly more work for a doubly-linked list). A vector, on the other hand generally reorganizes memory space on insertions, causing it to be significantly slower. A list is not as efficient, however, at doing things like adding on the end of the container, due to the necessity to progress all the way through the list.
An Immutable Linked List is the most trivial example of a Persistent Data Structure, which is why it is the standard (and sometimes even only) data structure in many functional languages. Lisp, Scheme, ML, Haskell, Scala, you name it.
Linked Lists are very useful in dynamic memory allocation. These lists are used in operating systems. insertion and deletion in linked lists are very useful. Complex data structures like tree and graphs are implemented using linked lists.
Arrays that grow as needed are always just an illusion, because of the way computer memory works. Under the hood, it's just a continous block of memory that has to be reallocated when enough new elements have been added. Likewise if you remove elements from the array, you'll have to allocate a new block of memory, copy the array and release the previous block to reclaim the unused memory. A linked list allows you to grow and shrink a list of elements without having to reallocate the rest of the list.
Linked lists are useful because elements can be efficiently spliced and removed in the middle as others noted. However a downside to linked lists are poor locality of reference. I prefer not using lists for this reason unless I have an explicit need for the capabilities.

Is there any practical usage of Doubly Linked List, Queues and Stacks?

I've been coding for quite sometime now. And my work pertains to solving real-world business scenarios. However, I have not really come across any practical usage of some of the data structures like the Linked List, Queues and Stacks etc.
Not even at the business framework level. Of course, there is the ubiquitous HashTable, ArrayList and of late the List...but is there any practical usage of some of the other basic data structures?
It would be great if someone gave a real-world solution where a Doubly Linked List "performs" better than the obvious easily usable counterpart.
Of course it’s possible to get by with only a Map (aka HashTable) and a List. A Queue is only a glorified List but if you use a Queue everywhere you really need a queue then your code gets a lot more readable because nobody has to guess what you are using that List for.
And then there are algorithms that work a lot better when the underlying data structure is not a plain List but a DoublyLinkedList due to the way they have to navigate the list. The same is valid for all other data structures: there’s always a use for them. :)
Stacks can be used for pairing (parseing) such as matching open brackets to closing brackets.
Queues can be used for messaging, or activity processing.
Linked list, or double linked lists can be used for circular navigation.
Most of these algorithms are usually at a lower level than your usual "business" application. For example indices on the database is a variation of a multiply linked list. Implementation of function calling mechanism(or a parse tree) is a stack. Queues and FIFOs are used for servicing network request etc.
These are just examples of collection structures that are optimized for speed in various scenarios.
LIFO-Stack and FIFO-Queue are reasonably abstract (behavioral spec-level) data structures, so of course there are plenty of practical uses for them. For example, LIFO-Stack is a great way to help remove recursion (stack up the current state and loop, instead of making a recursive call); FIFO-Queue helps "buffer up" and "peel away" work nuggets in a coroutine arrangement; etc, etc.
Doubly-linked-List is more of an implementation issue than a behavioral spec-level one, mostly... can be a good way to implement a FIFO-Queue, for example. If you need a sequence with fast splicing and removal give a pointer to one sequence iten, you'll find plenty of other real-world uses, too.
I use queues, linked lists etc. in business solutions all the time.
Except they are implemented by Oracle, IBM, JMS etc.
These constructs are generally at a much lower level of abstaction than you would want while implementing a business solution. Where a business problem would benifit from
such low level constructs (e.g. delivery route planning, production line scheduling etc.) there is usually a package available to do it or you.
I don't use them very often, but they do come up. For example, I'm using a queue in a current project to process asynchronous character equipment changes that must happen in the order the user makes them.
A linked list is useful if you have a subset of "selected" items out of a larger set of items, where you must perform one type of operation on a "selected" item and a default operation or no operation at all on a normal item and the set of "selected" items can change at will (possibly due to user input). Because linked list removal can be done nearly instantaneously (vs. the traversal time it would take for an array search), if the subsets are large enough then it's faster to maintain a linked list than to either maintain an array or regenerate the whole subset by scanning through the whole larger set every time you need the subset.
With a hash table or binary tree, you could search for a single "selected" item, but you couldn't search for all "selected" items without checking every item (or having a separate dictionary for every permutation of selected items, which is obviously impractical).
A queue can be useful if you are in a scenario where you have a lot of requests coming in and you want to make sure to handle them fairly, in order.
I use stacks whenever I have a recursive algorithm, which usually means it's operating on some hierarchical data structure, and I want to print an error message if I run out of memory instead of simply letting the software crash if the program stack runs out of space. Instead of calling the function recursively, I store its local variables in an object, run a loop, and maintain a stack of those objects.

Resources