I am looking for a data structure that is like an AVL tree but supports multiple keys.
I want a balanced tree so that average lookup time is not skewed. A node can have multiple keys, so I want to specify one of the keys for lookup and wildcard the others.
Before building a composite key from the other keys, I want to check whether there are other ways to do it.
Any papers or suggestions?
Thanks in advance
A k-d tree is useful when the number of dimensions is small (roughly 5 or fewer). As the number of dimensions increases, it becomes less efficient.
Update: In most cases a k-d tree or an R-tree should do the job.
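To make the "specify one key, wildcard the others" lookup concrete, here is a minimal sketch of a partial-match query on a k-d tree. The names `KDNode`, `build` and `partial_match` are made up for this illustration, and `None` is assumed to mean "wildcard"; it is a static tree built once from a list of points, not a balanced insert/delete structure.

```python
class KDNode:
    def __init__(self, point, left=None, right=None):
        self.point = point
        self.left = left
        self.right = right


def build(points, depth=0):
    """Build a balanced k-d tree by splitting on the median of one axis per level."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid],
                  build(pts[:mid], depth + 1),
                  build(pts[mid + 1:], depth + 1))


def partial_match(node, query, depth=0, out=None):
    """Collect all points matching `query`; None in a position is a wildcard."""
    if out is None:
        out = []
    if node is None:
        return out
    axis = depth % len(query)
    if all(q is None or q == c for q, c in zip(query, node.point)):
        out.append(node.point)
    q = query[axis]
    # Left subtree holds values <= the split value, right subtree values >=,
    # so a wildcard on this axis (or an equal value) must visit both sides.
    if q is None or q <= node.point[axis]:
        partial_match(node.left, query, depth + 1, out)
    if q is None or q >= node.point[axis]:
        partial_match(node.right, query, depth + 1, out)
    return out


root = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2), (5, 9)])
matches = partial_match(root, (5, None))   # first key fixed, second wildcarded
```

On levels that split on a wildcarded key the search has to descend into both children, which is why the query gets slower as more dimensions are wildcarded.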
Cheers
I understand the idea behind using a Merkle tree to identify inconsistencies in data, as suggested by articles like
Key Concepts: Using Merkle trees to detect inconsistencies in data
Merkle Tree | Brilliant Math & Science Wiki
Essentially, we use a recursive algorithm that traverses down from the root we want to verify, following the nodes whose stored hash values differ from the server's trusted hash values, all the way to the inconsistent leaf (data block).
If only one such block (leaf) is corrupted, we follow a single path down to that leaf, which takes O(log n) queries.
However, with multiple inconsistent data blocks (leaves), we need up to O(n) queries. In the extreme case, all data blocks are corrupted, and the algorithm has to send every single node to the server (the authenticator). In the real world this becomes costly because of network overhead.
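The basic traverse-from-root algorithm described above can be sketched as follows, counting one "query" per node hash compared. This is only an illustration under some assumptions that are not in the question: SHA-256 as the hash, a power-of-two number of blocks, and the trusted tree held in memory instead of behind a network call. The names `build_levels` and `find_bad_leaves` are made up.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks):
    """Return the tree as a list of levels: levels[0] = leaf hashes, levels[-1] = [root]."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def find_bad_leaves(local, trusted):
    """Walk down from the root, descending only into nodes whose hashes differ.

    Returns (sorted indices of inconsistent leaves, number of hash comparisons).
    """
    queries = 0
    bad = []
    stack = [(len(local) - 1, 0)]          # (level, index), starting at the root
    while stack:
        lvl, idx = stack.pop()
        queries += 1                       # one comparison against the trusted hash
        if local[lvl][idx] == trusted[lvl][idx]:
            continue                       # whole subtree is consistent
        if lvl == 0:
            bad.append(idx)
        else:
            stack.append((lvl - 1, 2 * idx + 1))
            stack.append((lvl - 1, 2 * idx))
    return sorted(bad), queries


blocks = [bytes([i]) for i in range(8)]
trusted = build_levels(blocks)

one_bad = list(blocks)
one_bad[5] = b"x"
result_one = find_bad_leaves(build_levels(one_bad), trusted)

all_bad = [b"x" + b for b in blocks]
result_all = find_bad_leaves(build_levels(all_bad), trusted)
```

With 8 blocks, a single corrupted block costs the root plus two children per level, while corrupting every block forces a comparison of all 15 nodes, which matches the O(log n) and O(n) cases in the question.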
So my question is: is there any known improvement to the basic traverse-from-root algorithm? One possible improvement I could think of is to query the level of nodes in the middle of the tree. For example, in the tree below, we send the server the two nodes in the second level ('64' and '192'), and for any node that comes back inconsistent, we recursively go to the middle level of that subtree, something like a binary search on height.
This increases our best-case cost from O(1) to O(sqrt(n)) queries, and probably reduces our worst-case cost to some extent (I have not calculated by how much).
I wonder if there is a better approach than this. I've searched for relevant articles on Google Scholar, but it looks like most of the algorithm-focused papers deal with the Merkle tree traversal problem, which is different from the problem above.
Thanks in advance!
Can anyone provide real examples of when a treap is the best way to store your data?
I want to understand in which situations a treap is better than heaps and other tree structures.
If possible, please provide some examples from real situations.
I've searched for cases of treap usage here and on Google, but did not find anything.
Thank you.
If hash values are used as priorities, treaps provide a unique representation of the content.
Consider an ordered set of items implemented as an AVL tree or a red-black tree. Inserting items in different orders will typically produce trees with different shapes (although all of them are balanced). For a given content, a treap will always have the same shape regardless of history.
I have seen two reasons for why unique representation could be useful:
Security reasons. A treap cannot contain information about its history.
Efficient subtree sharing. The fastest algorithms for set operations I have seen use treaps.
I cannot give you any real-world examples, but I do use treaps to solve some problems in programming contests:
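The unique-representation property can be checked with a small sketch. It assumes SHA-256 of `repr(key)` as the priority (any hash with no collisions among the keys would do) and a max-heap on priorities; `Node`, `insert` and `shape` are names made up for this illustration.

```python
import hashlib


class Node:
    def __init__(self, key):
        self.key = key
        # Priority is derived from the key itself, not from a random source.
        self.prio = hashlib.sha256(repr(key).encode()).digest()
        self.left = None
        self.right = None


def rotate_right(n):
    l = n.left
    n.left, l.right = l.right, n
    return l


def rotate_left(n):
    r = n.right
    n.right, r.left = r.left, n
    return r


def insert(root, key):
    """Standard treap insert: BST insert, then rotate up while the heap rule is violated."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
        if root.left.prio > root.prio:
            root = rotate_right(root)
    elif key > root.key:
        root.right = insert(root.right, key)
        if root.right.prio > root.prio:
            root = rotate_left(root)
    return root


def shape(n):
    """Nested-tuple view of the tree, used to compare shapes."""
    return None if n is None else (n.key, shape(n.left), shape(n.right))


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


ascending = shape(build(range(20)))
descending = shape(build(reversed(range(20))))
scrambled = shape(build((i * 7) % 20 for i in range(20)))
```

All three insertion orders produce the same nested tuple, so the final tree reveals nothing about the order in which the items arrived.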
http://poj.org/problem?id=2761
http://poj.org/problem?id=3481
These are not actually real problems, but they make sense.
You can use it as a tree-based map implementation. Depending on the application, it could be faster. A couple of years ago I implemented a Treap and a Skip list myself (in Java) just for fun and did some basic benchmarking comparing them to TreeMap, and the Treap was the fastest. You can see the results here.
One of its greatest advantages is that it's very easy to implement, compared to Red-Black trees, for example. However, as far as I remember, it doesn't have a guaranteed cost in its operations (search is O(log n) with high probability), in comparison to Red-Black trees, which means that you wouldn't be able to use it in safety-critical applications where a specific time bound is a requirement.
Treaps are an awesome variant of the balanced binary search tree. There are many algorithms for balancing binary trees, but most of them are horrible things with tons of special cases to handle. Treaps, on the other hand, are very easy to code. By making some use of randomness, we get a balanced binary tree whose expected height is logarithmic.
Some good problems to solve using treaps are:
http://www.spoj.com/problems/QMAX3VN/ ( Easy level )
http://www.spoj.com/problems/GSS6/ ( Moderate level )
Let's say you have a company and you want to create an inventory tool:
Be able to (efficiently) search products by name so you can update the stock.
Get, at any time, the product with the fewest items in stock, so that you are able to plan your next order.
One way to handle these requirements could be by using two different
data structures: one for efficient search by name, for instance, a
hash table, and a priority queue to get the item that most urgently
needs to be resupplied. You have to coordinate those two
data structures, and you will need more than twice the memory. If we sort
the list of entries according to name, we need to scan the whole list
to find a given value for the other criterion, in this case, the
quantity in stock. Also, if we use a min-heap with the scarcer
products at its top, then we will need linear time to scan the whole
heap looking for a product to update.
Treap
A treap is a blend of a tree and a heap. The idea is to enforce a BST’s
constraints on the names, and a heap’s constraints on the quantities.
Product names are treated as the keys of a binary search tree.
The inventory quantities, instead, are treated as priorities of a
heap, so they define a partial ordering from top to bottom. For
priorities, like all heaps, we have a partial ordering, meaning that
only nodes on the same path from the root to leaves are ordered with
respect to their priority. In the above image, you can see that
children nodes always have a higher stock count than their parents,
but there is no ordering between siblings.
Reference
Any subtree of a treap is also a treap (i.e. it satisfies the BST rule as well as the min- or max-heap rule). Because of this property, an ordered list can be split, or multiple ordered lists merged, more easily with treaps than with a red-black tree. The implementation is easier, and so is the design.
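As an illustration of how short split and merge are on a treap, here is a minimal sketch. It assumes random priorities with a max-heap rule; the function names are made up, and `merge` requires every key in its first argument to be smaller than every key in its second.

```python
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.prio = random.random()
        self.left = None
        self.right = None


def split(t, key):
    """Split t into (treap with keys < key, treap with keys >= key)."""
    if t is None:
        return None, None
    if t.key < key:
        a, b = split(t.right, key)
        t.right = a
        return t, b
    a, b = split(t.left, key)
    t.left = b
    return a, t


def merge(a, b):
    """Merge two treaps where all keys of a are smaller than all keys of b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.prio > b.prio:
        a.right = merge(a.right, b)
        return a
    b.left = merge(a, b.left)
    return b


def insert(t, key):
    """Insert expressed as one split followed by two merges."""
    a, b = split(t, key)
    return merge(merge(a, Node(key)), b)


def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)


root = None
for k in [5, 1, 9, 3, 7]:
    root = insert(root, k)

low, high = split(root, 5)      # low holds 1 and 3, high holds 5, 7 and 9
low_keys, high_keys = inorder(low), inorder(high)
rejoined = inorder(merge(low, high))
```

Each operation only walks one root-to-leaf path, so both run in expected O(log n) time, and neither needs the recolouring cases a red-black tree would.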
What would be the best data structure for the following case?
1. It should support search, insert and delete. Searching will dominate: around 90% of the operations will be searches, and the rest deletes and inserts.
2. Insertion, deletion and searching will be based on the key of the objects. Each key will point to an object. The keys will be sorted.
Any suggestions for an optimal data structure will be highly appreciated.
An AVL tree, or at least a BST.
If you often access the same elements, you might want to consider splay trees too.
(Should I explain why?)
Not sure what you mean by "data structures".
I would suggest MySQL.
Read more here: Wikipedia
Self-balancing tree of sorts (AVL, RB), or a hash table.
My guess is that you want to optimize time. Overall, a red-black tree will have logarithmic-time performance in all three operations. It will probably be your best overall bet on execution time; however, red-black trees are complex to implement and require a node structure meaning they will be stored using more memory than the contained data itself requires.
You want a tree-backed Map; basically you just want a tree where the nodes are dynamically sorted ("self-balanced") by key, with your objects hanging off of each node with corresponding key.
If you would like an "optimal" data structure, that depends entirely on the distribution of input patterns you expect. The nice thing about a self-balancing tree is that you don't really need to care too much about the pattern of inputs. If you really want the closest to optimal that we know of, and you don't know much about the specific sequences of queries, you can use a http://en.wikipedia.org/wiki/Tango_tree which is O(log log N)-competitive. This grows so slowly that, for all practical purposes, you have something which performs no worse than effectively a constant factor from the best possible data structure you could have chosen.
However, it's somewhat grungy to implement, so you may be better off using a library for a self-balancing tree.
Python:
https://github.com/pgrafov/python-avl-tree/
Java:
If you're using Java, just use a TreeMap (red-black tree based) and ignore the implementation details. Most languages have similar data structures in their standard libraries.
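Python's standard library has no tree-backed map, so for a search-heavy workload like the one in the question a minimal stand-in could look like the sketch below. Note that this is not a balanced tree: it is a sorted key list maintained with `bisect` plus a dict, so inserts and deletes cost O(n) while searches stay fast. The class name `SortedMap` and its methods are made up for this illustration.

```python
import bisect


class SortedMap:
    """Sorted keys in a list (for ordered queries) plus a dict (for lookups)."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def insert(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)      # O(n) shift, O(log n) search
        self._values[key] = value

    def delete(self, key):
        del self._values[key]                   # raises KeyError if absent
        del self._keys[bisect.bisect_left(self._keys, key)]

    def search(self, key):
        return self._values.get(key)

    def keys_in_range(self, lo, hi):
        """All keys k with lo <= k <= hi, in sorted order."""
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_right(self._keys, hi)
        return self._keys[i:j]


m = SortedMap()
for k in [30, 10, 20, 40]:
    m.insert(k, "object-%d" % k)
m.delete(20)
```

If inserts and deletes become frequent, a real self-balancing tree (or a library that provides one) is the safer choice.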
I'm reading Advanced Data Structures by Peter Brass.
At the beginning of the chapter on search trees, he states that there are two models of search trees: one where nodes contain the actual object (the value, if the tree is used as a dictionary), and another where all objects are stored in leaves and internal nodes are only used for comparisons.
What are the advantages of the second model over the first one?
One of the big advantages of a binary tree where data is only in the leaf nodes is that you can partition based on elements that are not in your dataset.
For example, if I have a possible dataset of 0 to 1 million, but the vast majority of items are at either the high end or the low end and not in the middle, I may still want my first comparison to be against 500,000, even though that number is not in my data set. If every node had data, I could not do this. While this is not normally needed in theory, I've often found that partitioning on a value outside my data simplified the implementation.
B+ trees are an example of a case where all keys and values are stored in leaf nodes. The primary advantage here is that, since all items are in the leaf nodes, the leaf nodes can be linked together to form a linked list, which allows rapid in-order traversal. If you access a particular element, you can always find the next element in the sequence without visiting any parents, because the leaf nodes are linked together. Filesystems and database storage systems can take advantage of this structure for range searches and similar operations.
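The leaf-linking idea can be sketched with a deliberately simplified model: a single level of separator keys over a chain of linked leaves, rather than a full B+ tree. The names `Leaf`, `build_leaves` and `range_scan` are made up, and the sketch assumes unique keys and at least one item.

```python
import bisect


class Leaf:
    def __init__(self, items):
        self.items = items      # sorted list of (key, value) pairs
        self.next = None        # link to the next leaf in key order


def build_leaves(sorted_items, capacity=3):
    """Pack sorted items into linked leaves and return (leaves, separator keys)."""
    leaves = [Leaf(sorted_items[i:i + capacity])
              for i in range(0, len(sorted_items), capacity)]
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    # Separators live only in the index: the first key of every leaf but the first.
    separators = [leaf.items[0][0] for leaf in leaves[1:]]
    return leaves, separators


def range_scan(leaves, separators, lo, hi):
    """Find the starting leaf via the index, then follow leaf links only."""
    leaf = leaves[bisect.bisect_right(separators, lo)]
    out = []
    while leaf is not None:
        for key, value in leaf.items:
            if key > hi:
                return out
            if key >= lo:
                out.append((key, value))
        leaf = leaf.next
    return out


items = [(k, str(k)) for k in range(0, 20, 2)]      # keys 0, 2, ..., 18
leaves, separators = build_leaves(items)
found = range_scan(leaves, separators, 5, 13)
```

After the first leaf is located, the scan never goes back up to the index, which is the property that makes range queries cheap in this layout.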
Let's say you are building a tree over some objects based on a complex criterion, for example one calculated from multiple properties. Sometimes you can't change the object to store the calculated value, and calculating the criterion is expensive. So you calculate the criterion only once, and store the objects in leaves based on the result. Then, when your tree is complete, you can find the required object much faster, because you don't have to calculate the criterion for each tree node on your path.
Storing the information objects in the nodes (in this case we are talking about a trie) is useful for fast retrieval of information: it is faster than storing things in an array or hash table, where the worst case for access is O(n), whereas in a trie it is O(m) [m is the length of the key].
look here:
https://en.wikipedia.org/wiki/Trie
In a search tree these operations can be much more complicated (see AVL trees, O(log n)), so they can be slower and more complex to implement.
Which data structure should you choose?
Well, that depends on what you want to do.
I need to implement self-sorted data structure with random access. Any ideas?
A self-sorted data structure can be a binary search tree. If you want one that is both self-sorted and self-balanced, an AVL tree is the way to go. Retrieval time will be O(lg n) for random access.
Maintaining a sorted list and accessing it arbitrarily requires at least O(lg N) per operation. So, look at AVL trees, red-black trees, treaps or any other similar data structure and enrich them to support random indexing. I suggest treaps, since they are the easiest to understand and implement.
One way to enrich the treap is to keep in each node the count of nodes in the subtree rooted at that node. You'll have to update the count whenever you modify the tree (e.g. on insertion or deletion).
I haven't been much involved with data structure implementation lately, so this answer is probably not an answer at all... you should look at "Introduction to Algorithms" by Thomas Cormen. That book has many "recipes" with explanations of the inner workings of many data structures.
On the other hand, you have to take into account how much time you want to spend writing an algorithm, the size of the input, and whether there is an actual need for a special kind of data structure.
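A minimal sketch of that enrichment: each node carries a `size` field that is refreshed after every structural change, and `kth` uses it to index into the sorted order. Random priorities with a max-heap rule are assumed, deletion is left out for brevity, and the names are made up for this illustration.

```python
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.prio = random.random()
        self.size = 1               # number of nodes in the subtree rooted here
        self.left = None
        self.right = None


def size(n):
    return n.size if n else 0


def update(n):
    n.size = 1 + size(n.left) + size(n.right)


def rotate_right(n):
    l = n.left
    n.left, l.right = l.right, n
    update(n)                       # n is now the child, so refresh it first
    update(l)
    return l


def rotate_left(n):
    r = n.right
    n.right, r.left = r.left, n
    update(n)
    update(r)
    return r


def insert(n, key):
    if n is None:
        return Node(key)
    if key < n.key:
        n.left = insert(n.left, key)
        if n.left.prio > n.prio:
            n = rotate_right(n)
    else:
        n.right = insert(n.right, key)
        if n.right.prio > n.prio:
            n = rotate_left(n)
    update(n)
    return n


def kth(n, k):
    """Return the key at 0-based position k in sorted order."""
    while n is not None:
        left_size = size(n.left)
        if k < left_size:
            n = n.left
        elif k == left_size:
            return n.key
        else:
            k -= left_size + 1      # skip the left subtree and this node
            n = n.right
    raise IndexError("index out of range")


root = None
for key in [50, 10, 40, 20, 30]:
    root = insert(root, key)
in_order = [kth(root, i) for i in range(root.size)]
```

Both `insert` and `kth` follow a single root-to-leaf path, so random access by index costs the same expected O(lg N) as a lookup by key.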
I see one thing missing from the answers here: the skip list.
https://en.wikipedia.org/wiki/Skip_list
You get order automatically, there is a probabilistic element to search and creation.
Fits the question no worse than binary trees.
"Self-sorting" is a little too ambiguous. First of all:
What kind of data structure?
There are a lot of different data structures out there, such as:
Linked list
Doubly linked list
Binary tree
Hash set / map
Stack
Heap
And many more and each of them behave differently than others and have their benefits of course.
Now, not all of them could or should be self-sorting, such as the Stack, it would be weird if that one were self-sorting.
However, the linked list and the binary tree could be self-sorting, and you could sort them in different ways and at different times.
For Linked Lists
I would prefer insertion sort for this; you can read various good articles about it on wikis and elsewhere. I like the pasted link though. Look at it and try to understand the concept.
If you want to sort after inserting, i.e. at arbitrary times, then you can implement a sorting algorithm other than insertion sort, maybe bubble sort or quicksort. I would avoid bubble sort though: it is a lot slower, but easier to wrap your head around.
Random Access
Randomness is always a topic of discussion, so have a read about how to perform good randomization and you will be on your way. If you have a linked list with a "getAt" method, you could just pick a random index between 0 and n - 1 and get the item at that index.