Beneftis of Hybrid Data Structures on Efficiency - algorithm

I have this homework assignment in my Computer Science class that involves combining different data structures for apparent increased efficiency
TL;DR --- Scroll Down
""""Build a data structure which behaves like a linked list with a binary tree as an indexing structure. It should be able to be used as a linked list and inherited from to construct indexed queues and indexed stacks. You may assume that all things that will be put into this data structure are Comparable, so that the indexing tree will function as a binary search tree. You should build a class of iterators to facilitate interaction with this data structure. Insertion into the list can be done 'after' a location specified by a list iterator (which could sometimes be returned by a find method). Naturally, in an inherited indexed queue, insertion will only be at the back of the queue, however the indexing via the tree will need to preserve the binary search tree ordering, and similarly for an inherited indexed stack. You should have methods to insert and delete, and methods to find (returning an iterator) and sort (any sorting technique will suffice for this question, though you might well want to take advantage of the inherent ordering information derived from the tree!!). Test this structure using a main method which plays with people (perhaps compared via height?).""""
TL;DR --- What are the benefits of having Binary Search Tree nodes containing the same Objects as doubly linked list nodes?
Also, how would inheritance work with such a list?

What are the benefits of having Binary Search Tree nodes containing the same Objects as doubly linked list nodes?
Perhaps a better way of asking the same question would be "what are the benefits of connecting the nodes of a Binary Search Tree (BST) with additional links to construct a linked list out of the same nodes?"
The benefit of adding an extra link is an ability to iterate over the entire tree using O(1) memory. Without this additional link you would need O(Log(N)) memory to iterate the tree, because you would need to keep the position at each level.
The "payment" for this is the use of additional O(N) blocks of memory for the links, and a somewhat more complex algorithm for maintaining the data structure. This may be a fair deal when you iterate the same tree a significant number of times, while insertions and modifications are generally rare.
How would inheritance work with such a list?
Rather than inheriting from a list and also from a tree, you would implement interfaces for the list and for the tree.

Related

Iterating over classes in a disjoint set data structure

I've implemented a disjoint set data structure for my program and I realized I need to iterate over all equivalence classes.
Searching the web, I didn't find any useful information on the best way to implement that or how it influences complexity. I'm quite surprised since it seems like something that would be needed quite often.
Is there a standard way of doing this? I'm thinking about using a linked list (I use C so I plan to store some pointers in the top element of each equivalence class) and updating it on each union operation. Is there a better way?
You can store pointers to top elements in hash-based set or in any balanced binary search tree. You only need to delete and add elements - both operations in these structures run in O(1)* and in O(logN) respectively. In linked list they run in O(N).
Your proposal seems very reasonable. If you thread a doubly-linked list through the representatives, you can splice out an element from the representatives list in time O(1) and then walk the list each time you need to list representatives.
#ardenit has mentioned that you can also use an external hash table or BST to store the representatives. That's certainly simpler to code up, though I suspect it won't be as fast as just threading a linked list through the items.

Why convert BST to DLL?

conversion of Binary Search Tree to Doubly Linked List.
What is real world application or significance of this conversion?
Probably, the interviewer was trying to see if you understood what a BST and a linked list are, and also if you understood that if you do an inorder traversal of the BST, you'll get a list of the items in order. This exercise is a pretty good test of your understanding of BST traversal and linked list construction.
Building a linked list from a BST does have real-world use. For example, if you have a BST and you want to store it to disk or send it across the wire, it's easier to store or send as an ordered list of nodes. So you have to create that ordered list from the BST. In a language like C that doesn't have a dynamic list structure (like a C++ vector or a C# List<T> or a Java ArrayList), then the linked list is the tool of choice for building an arbitrarily-sized list.
Come to think of it, you could do this in-place. That is, you wouldn't need any extra memory except for the recursion stack. That would be very useful if you're working with a very large tree in a memory constrained environment such as an embedded system.
So, yes, there are real world uses for that technique.

Easy tree traversal and fast random node access

Edited after Alex Taggart's remark below.
I am using a zipper to easily traverse and edit a tree which can grow to many thousands of nodes. Each node is incomplete when it is first created. Data is going to be added/removed all the time in random positions, leaf nodes are going to be replaced by branches, etc.
The tree can be very unbalanced.
Fast random access to a node is also important.
An implementation would be to traverse the tree using a zipper and create a hash table of the nodes indexed by key. Needless to say the above would be very inefficient as:
2 copies of each node need to be created
any changes need to be consistently mirrored between the 2 data structures (tree and hashmap).
In short, is there a time/space efficient way to combine the easiness of traversing/updating with a zipper and the fast access of a hash table in clojure?
Clojure's data structures are persistent and use structural sharing. This means that operations like adding, removing or accumulating are not as inefficient as you describe. The memory cost will be minimal since you are not duplicating what's already there.
By default Clojure's data structures are immutable. The nodes in your tree like structure will thus not update themselves unless you use some sort of reference type (like a Var). I don't know enough about your specific use case to advice on the best way to access nodes. One way to access nodes in a nested structure is the get-in function where you supply the path to the node to return its value.
Hope this helps solving your problem.

When to choose RB tree, B-Tree or AVL tree?

As a programmer when should I consider using a RB tree, B- tree or an AVL tree?
What are the key points that needs to be considered before deciding on the choice?
Can someone please explain with a scenario for each tree structure why it is chosen over others with reference to the key points?
Take this with a pinch of salt:
B-tree when you're managing more than thousands of items and you're paging them from a disk or some slow storage medium.
RB tree when you're doing fairly frequent inserts, deletes and retrievals on the tree.
AVL tree when your inserts and deletes are infrequent relative to your retrievals.
I think B+ trees are a good general-purpose ordered container data structure, even in main memory. Even when virtual memory isn't an issue, cache-friendliness often is, and B+ trees are particularly good for sequential access - the same asymptotic performance as a linked list, but with cache-friendliness close to a simple array. All this and O(log n) search, insert and delete.
B+ trees do have problems, though - such as the items moving around within nodes when you do inserts/deletes, invalidating pointers to those items. I have a container library that does "cursor maintenance" - cursors attach themselves to the leaf node they currently reference in a linked list, so they can be fixed or invalidated automatically. Since there's rarely more than one or two cursors, it works well - but it's an extra bit of work all the same.
Another thing is that the B+ tree is essentially just that. I guess you can strip off or recreate the non-leaf nodes depending on whether you need them or not, but with binary tree nodes you get a lot more flexibility. A binary tree can be converted to a linked list and back without copying nodes - you just change the pointers then remember that you're treating it as a different data structure now. Among other things, this means you get fairly easy O(n) merging of trees - convert both trees to lists, merge them, then convert back to a tree.
Yet another thing is memory allocation and freeing. In a binary tree, this can be separated out from the algorithms - the user can create a node then call the insert algorithm, and deletes can extract nodes (detach them from the tree, but dont free the memory). In a B-tree or B+-tree, that obviously doesn't work - the data will live in a multi-item node. Writing insert methods that "plan" the operation without modifying nodes until they know how many new nodes are needed and that they can be allocated is a challenge.
Red black vs. AVL? I'm not sure it makes any big difference. My own library has a policy-based "tool" class to manipulate nodes, with methods for double-linked lists, simple binary trees, splay trees, red-black trees and treaps, including various conversions. Some of those methods were only implemented because I was bored at one time or another. I'm not sure I've even tested the treap methods. The reason I chose red-black trees rather than AVL is because I personally understand the algorithms better - which doesn't mean they're simpler, it's just a fluke of history that I'm more familiar with them.
One last thing - I only originally developed my B+ tree containers as an experiment. It's one of those experiments that never ended really, but it's not something I'd encourage others to repeat. If all you need is an ordered container, the best answer is to use the one that your existing library provides - e.g. std::map etc in C++. My library evolved over years, it took quite a while to get it stable, and I just relatively recently discovered it's technically non-portable (dependent on a bit of undefined behaviour WRT offsetof).
In memory B-Tree has the advantage when the number of items is more than 32000... Look at speedtest.pdf from stx-btree.
When choosing data structures you are trading off factors such as
speed of retrieval v speed of update
how well the structure copes with worst case operations, for example insertion of records that arrive in a sorted order
space wasted
I would start by reading the Wikipedia articles referenced by Robert Harvey.
Pragmatically, when working in languages such as Java the average programmer tends to use the collection classes provided. If in a performance tuning activity one discovers that the collection performance is problematic then one can seek alternative implementations. It's rarely the first thing a business-led development has to consider. It's extremely rare that one needs to implement such data structures by hand, there are usually libraries that can be used.

Self-sorted data structure with random access

I need to implement self-sorted data structure with random access. Any ideas?
A self sorted data structure can be binary search trees. If you want a self sorted data structure and a self balanced one. AVL tree is the way to go. Retrieval time will be O(lgn) for random access.
Maintaining a sorted list and accessing it arbitrarily requires at least O(lgN) / operation. So, look for AVL, red-black trees, treaps or any other similar data structure and enrich them to support random indexing. I suggest treaps since they are the easiest to understand/implement.
One way to enrich the treap tree is to keep in each node the count of nodes in the subtree rooted at that node. You'll have to update the count when you modify the tree (eg: insertion/deletion).
I'm not too much involved lately with data structures implementation. Probably this answer is not an answer at all... you should see "Introduction to algorithms" written by Thomas Cormen. That book has many "recipes" with explanations about the inner workings of many data structures.
On the other hand you have to take into account how much time do you want to spend writing an algorithm, the size of the input and the if there is an actual necessity of an special kind of datastructure.
I see one thing missing from the answers here, the Skiplist
https://en.wikipedia.org/wiki/Skip_list
You get order automatically, there is a probabilistic element to search and creation.
Fits the question no worse than binary trees.
Self sorting is a little bit to ambigious. First of all
What kind of data structure?
There are a lot of different data structures out there, such as:
Linked list
Double linked list
Binary tree
Hash set / map
Stack
Heap
And many more and each of them behave differently than others and have their benefits of course.
Now, not all of them could or should be self-sorting, such as the Stack, it would be weird if that one were self-sorting.
However, the Linked List and the Binary Tree could be self sorting, and for this you could sort it in different ways and on different times.
For Linked Lists
I would preffere Insertion sort for this, you can read various good articles about this on both wikis and other places. I like the pasted link though. Look at it and try to understand the concept.
If you want to sort after it is inserted, i.e. on random times, well then you can just implement a sorting algororithm different than insertion sort maybe, bubblesort or maybe quicksort, I would avoid bubblesort though, it's a lot slower! But easier to gasp the mind around.
Random Access
Random is always something thats being discusses around so have a read about how to perform good randomization and you will be on your way, if you have a linked list and have a "getAt"-method, you could just randomize an index between 0 and n and get the item at that index.

Resources