Useful data structure for the following case - algorithm

Which can be the beste data structures for the following case.
1.Should have operations like search, insert and delete. Mostly searching activities will be there.Around 90% of the operations will be search and rest are delete and insert.
2 Insertion,deletion and searching will be based on the key of the objects. Each key will point to a object. The keys will be sorted.
Any suggestion for optimal data structure will be highly appreciated.

AVL tree, or at least BST.
If you want to acces often the same elements you might want to consider splay trees too.
(Should I explain why?)

Not sure by what you mean with "data structures"
I would suggest MySQL.
Read more here: WikiPedia

Self-balancing tree of sorts (AVL, RB), or a hash table.

My guess is that you want to optimize time. Overall, a red-black tree will have logarithmic-time performance in all three operations. It will probably be your best overall bet on execution time; however, red-black trees are complex to implement and require a node structure meaning they will be stored using more memory than the contained data itself requires.

You want a tree-backed Map; basically you just want a tree where the nodes are dynamically sorted ("self-balanced") by key, with your objects hanging off of each node with corresponding key.
If you would like an "optimal" data structure, that completely depends on the distribution of patterns of inputs you expect. The nice thing about a self-balancing tree is you don't really need to care too much about the pattern of inputs. If you really want the best-guess as-close-to-optimal as possible we know of, and you don't know much about the specific sequences of queries, you can use a http://en.wikipedia.org/wiki/Tango_tree which is O(log(log(N))-competitive. This grows so slowly that, for all practical purposes, you have something which performs no worse than effectively a constant factor from the best possible data structure you could have chosen.
However it's somewhat grungy to implement, you may just be better using a library for a self-balancing tree.
Python:
https://github.com/pgrafov/python-avl-tree/
Java:
If you're just Java, just use a TreeMap (red-black tree based) and ignore the implementation details. Most languages have similar data structures in their standard libraries.

Related

Merkle tree for finding data inconsistencies - optimizing number of queries

I understand the idea behind using Merkle tree to identify inconsistencies in data, as suggested by articles like
Key Concepts: Using Merkle trees to detect inconsistencies in data
Merkle Tree | Brilliant Math & Science Wiki
Essentially, we use a recursive algorithm to traverse down from root we want to verify, and follow the nodes where stored hash values are different from server (with trusted hash values), all the way to the inconsistent leaf/datablock.
If there's only one such block (leaf) that's corrupted, this means we following a single path down to leaf, which is log(n) queries.
However, in the case of multiple inconsistent data blocks/leaves, we need up to O(n) queries. In the extreme case, all data blocks are corrupted, and our algorithm will need to send every single node to server (authenticator). In the real world this becomes costly due to the network.
So my question is, is there any known improvement to the basic traverse-from-root algorithm? A possible improvement I could think of is to query the level of nodes in the middle. For example, in the tree below, we send the server the two nodes in the second level ('64' and '192'), and for any node that returns inconsistency, we recursively go to the middle level of that sub-tree - something like a binary search based on height.
This increases our best case time from O(1) to O(sqrt(n)), and probably reduces our worst case time to some extent (I have not calculated how much).
I wonder if there's any better approach than this? I've tried to search for relevant articles on Google Scholar, but looks like most of the algorithm-focused papers are concerned with the merkle-tree traversal problem, which is different from the problem above.
Thanks in advance!

When to use a treap

Can anyone provide real examples of when is the best way to store your data is treap?
I want to understand in which situations treap will be better than heaps and tree structures.
If it's possible, please provide some examples from real situations.
I've tried to search cases of using treaps here and by googling, but did not find anything.
Thank you.
If hash values are used as priorities, treaps provide unique representation of the content.
Consider an order set of items implemented as an AVL-tree or rb-tree. Inserting items in different orders will typically end up in trees with different shapes (although all of them are balanced). For a given content a treap will always be of the same shape regardless of history.
I have seen two reasons for why unique representation could be useful:
Security reasons. A treap can not contain information on history.
Efficient sub tree sharing. The fastest algorithms for set operations I have seen use treaps.
I can not provide you any real-world examples. But I do use treaps to solve some problems in programming contests:
http://poj.org/problem?id=2761
http://poj.org/problem?id=3481
These are not actually real problems, but they make sense.
You can use it as a tree-based map implementation. Depending on the application, it could be faster. A couple of years ago I implemented a Treap and a Skip list myself (in Java) just for fun and did some basic benchmarking comparing them to TreeMap, and the Treap was the fastest. You can see the results here.
One of its greatest advantages is that it's very easy to implement, compared to Red-Black trees, for example. However, as far as I remember, it doesn't have a guaranteed cost in its operations (search is O(log n) with high probability), in comparison to Red-Black trees, which means that you wouldn't be able to use it in safety-critical applications where a specific time bound is a requirement.
Treaps are awesome variant of balanced binary search tree. There do exist many algorithms to balance binary trees, but most of them are horrible things with tons of special cases to handle. On the other hand , it is very easy to code Treaps.By making some use of randomness, we have a BBT that is expected to be of logarithmic height.
Some good problems to solve using treaps are --
http://www.spoj.com/problems/QMAX3VN/ ( Easy level )
http://www.spoj.com/problems/GSS6/ ( Moderate level )
Let's say you have a company and you want to create an inventory tool:
Be able to (efficiently) search products by name so you can update the stock.
Get, at any time, the product with the lowest items in stock, so that you are able to plan your next order.
One way to handle these requirements could be by using two different
data structures: one for efficient search by name, for instance, a
hash table, and a priority queue to get the item that most urgently
needs to be resupplied. You have to manage to coordinate those two
data structures and you will need more than twice memory. if we sort
the list of entries according to name, we need to scan the whole list
to find a given value for the other criterion, in this case, the
quantity in stock. Also, if we use a min-heap with the scarcer
products at its top, then we will need linear time to scan the whole
heap looking for a product to update.
Treap
Treap is the blend of tree and heap. The idea is to enforce BST’s
constraints on the names, and heap’s constraints on the quantities.
Product names are treated as the keys of a binary search tree.
The inventory quantities, instead, are treated as priorities of a
heap, so they define a partial ordering from top to bottom. For
priorities, like all heaps, we have a partial ordering, meaning that
only nodes on the same path from the root to leaves are ordered with
respect to their priority. In the above image, you can see that
children nodes always have a higher stock count than their parents,
but there is no ordering between siblings.
Reference
Any subtree in Treap is also a Treap (i.e. satisfies BST rule as well as min- or max- heap rule too). Due to this property, an ordered list can be easily split, or multiple ordered lists can be easily merged using Treaps than using an RB Tree. The implementation is easier. Design is also easier.

real world examples for binary tree structure

Can any one tell me some real world examples of binary tree structure ?
Binary trees are used all over the place in the real world. Pretty much every major implementation of a sorted data-structure uses one (usually a balanced variant, like red-black).
In C++, map and set are built on it.
Represent a uni-dimensional space.
I don't know world examples other than this one.
But it's large used for logical purposes and indexers.
Databases indexes. When you index a field, it is put in a binary tree for fast retrieval.
General Searching/sorting. A binary search tree will let you sort and search for data quickly
I believe Huffman Algorithm also use Binary tree!!!

When to choose RB tree, B-Tree or AVL tree?

As a programmer when should I consider using a RB tree, B- tree or an AVL tree?
What are the key points that needs to be considered before deciding on the choice?
Can someone please explain with a scenario for each tree structure why it is chosen over others with reference to the key points?
Take this with a pinch of salt:
B-tree when you're managing more than thousands of items and you're paging them from a disk or some slow storage medium.
RB tree when you're doing fairly frequent inserts, deletes and retrievals on the tree.
AVL tree when your inserts and deletes are infrequent relative to your retrievals.
I think B+ trees are a good general-purpose ordered container data structure, even in main memory. Even when virtual memory isn't an issue, cache-friendliness often is, and B+ trees are particularly good for sequential access - the same asymptotic performance as a linked list, but with cache-friendliness close to a simple array. All this and O(log n) search, insert and delete.
B+ trees do have problems, though - such as the items moving around within nodes when you do inserts/deletes, invalidating pointers to those items. I have a container library that does "cursor maintenance" - cursors attach themselves to the leaf node they currently reference in a linked list, so they can be fixed or invalidated automatically. Since there's rarely more than one or two cursors, it works well - but it's an extra bit of work all the same.
Another thing is that the B+ tree is essentially just that. I guess you can strip off or recreate the non-leaf nodes depending on whether you need them or not, but with binary tree nodes you get a lot more flexibility. A binary tree can be converted to a linked list and back without copying nodes - you just change the pointers then remember that you're treating it as a different data structure now. Among other things, this means you get fairly easy O(n) merging of trees - convert both trees to lists, merge them, then convert back to a tree.
Yet another thing is memory allocation and freeing. In a binary tree, this can be separated out from the algorithms - the user can create a node then call the insert algorithm, and deletes can extract nodes (detach them from the tree, but dont free the memory). In a B-tree or B+-tree, that obviously doesn't work - the data will live in a multi-item node. Writing insert methods that "plan" the operation without modifying nodes until they know how many new nodes are needed and that they can be allocated is a challenge.
Red black vs. AVL? I'm not sure it makes any big difference. My own library has a policy-based "tool" class to manipulate nodes, with methods for double-linked lists, simple binary trees, splay trees, red-black trees and treaps, including various conversions. Some of those methods were only implemented because I was bored at one time or another. I'm not sure I've even tested the treap methods. The reason I chose red-black trees rather than AVL is because I personally understand the algorithms better - which doesn't mean they're simpler, it's just a fluke of history that I'm more familiar with them.
One last thing - I only originally developed my B+ tree containers as an experiment. It's one of those experiments that never ended really, but it's not something I'd encourage others to repeat. If all you need is an ordered container, the best answer is to use the one that your existing library provides - e.g. std::map etc in C++. My library evolved over years, it took quite a while to get it stable, and I just relatively recently discovered it's technically non-portable (dependent on a bit of undefined behaviour WRT offsetof).
In memory B-Tree has the advantage when the number of items is more than 32000... Look at speedtest.pdf from stx-btree.
When choosing data structures you are trading off factors such as
speed of retrieval v speed of update
how well the structure copes with worst case operations, for example insertion of records that arrive in a sorted order
space wasted
I would start by reading the Wikipedia articles referenced by Robert Harvey.
Pragmatically, when working in languages such as Java the average programmer tends to use the collection classes provided. If in a performance tuning activity one discovers that the collection performance is problematic then one can seek alternative implementations. It's rarely the first thing a business-led development has to consider. It's extremely rare that one needs to implement such data structures by hand, there are usually libraries that can be used.

Self-sorted data structure with random access

I need to implement self-sorted data structure with random access. Any ideas?
A self sorted data structure can be binary search trees. If you want a self sorted data structure and a self balanced one. AVL tree is the way to go. Retrieval time will be O(lgn) for random access.
Maintaining a sorted list and accessing it arbitrarily requires at least O(lgN) / operation. So, look for AVL, red-black trees, treaps or any other similar data structure and enrich them to support random indexing. I suggest treaps since they are the easiest to understand/implement.
One way to enrich the treap tree is to keep in each node the count of nodes in the subtree rooted at that node. You'll have to update the count when you modify the tree (eg: insertion/deletion).
I'm not too much involved lately with data structures implementation. Probably this answer is not an answer at all... you should see "Introduction to algorithms" written by Thomas Cormen. That book has many "recipes" with explanations about the inner workings of many data structures.
On the other hand you have to take into account how much time do you want to spend writing an algorithm, the size of the input and the if there is an actual necessity of an special kind of datastructure.
I see one thing missing from the answers here, the Skiplist
https://en.wikipedia.org/wiki/Skip_list
You get order automatically, there is a probabilistic element to search and creation.
Fits the question no worse than binary trees.
Self sorting is a little bit to ambigious. First of all
What kind of data structure?
There are a lot of different data structures out there, such as:
Linked list
Double linked list
Binary tree
Hash set / map
Stack
Heap
And many more and each of them behave differently than others and have their benefits of course.
Now, not all of them could or should be self-sorting, such as the Stack, it would be weird if that one were self-sorting.
However, the Linked List and the Binary Tree could be self sorting, and for this you could sort it in different ways and on different times.
For Linked Lists
I would preffere Insertion sort for this, you can read various good articles about this on both wikis and other places. I like the pasted link though. Look at it and try to understand the concept.
If you want to sort after it is inserted, i.e. on random times, well then you can just implement a sorting algororithm different than insertion sort maybe, bubblesort or maybe quicksort, I would avoid bubblesort though, it's a lot slower! But easier to gasp the mind around.
Random Access
Random is always something thats being discusses around so have a read about how to perform good randomization and you will be on your way, if you have a linked list and have a "getAt"-method, you could just randomize an index between 0 and n and get the item at that index.

Resources