Advantages of B+ trees over B-trees [duplicate] - data-structures

This question already has answers here:
Closed 13 years ago.
Possible Duplicate:
B- trees, B+ trees difference
What are the advantages/disadvantages of B+ trees over B-trees? When should I prefer one over the other? I'm also interested in knowing of any real-world examples where one has been preferred over the other.

According to the Wikipedia article about B+ trees, this kind of data structure is frequently used for indexing block-oriented storage. In a B+ tree, only keys (and not values) are stored in the internal nodes. This means you need fewer internal-node blocks, which increases the likelihood of a cache hit.
Real world examples include various file systems; see the linked article.
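To make the difference concrete, here is a minimal sketch (with hypothetical field names) contrasting the two node layouts described above:

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

# In a B-tree, every node carries values alongside keys.
@dataclass
class BTreeNode:
    keys: List[Any] = field(default_factory=list)
    values: List[Any] = field(default_factory=list)      # values live in every node
    children: List["BTreeNode"] = field(default_factory=list)

# In a B+ tree, internal nodes hold only keys (routing information);
# values appear only in the leaves, which are often linked for range scans.
@dataclass
class BPlusInternal:
    keys: List[Any] = field(default_factory=list)
    children: List[Any] = field(default_factory=list)    # internal or leaf nodes

@dataclass
class BPlusLeaf:
    keys: List[Any] = field(default_factory=list)
    values: List[Any] = field(default_factory=list)
    next_leaf: Optional["BPlusLeaf"] = None              # sibling link for scans
```

Because internal B+ nodes omit values, more keys fit per disk block, so the tree is shallower and the internal blocks cache better.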

Mahout Clustering with one dim K-means [duplicate]

This question already has answers here:
1D Number Array Clustering
(6 answers)
Closed 8 years ago.
Can I cluster data with one variable instead of many (which I have already tested) using the Mahout K-means algorithm? If yes (I hope so :)), could you give me an example of such clustering? Thanks.
How big is your data? If it is not exabytes, you would be better off without Mahout.
If it is exabytes, use sampling, and then process it on a single machine.
See also:
Cluster one-dimensional data optimally?
1D Number Array Clustering
Which clustering algorithm is suitable for one-dimensional Lists without knowing k?
and many more.
Mahout is not your general go-to place for data analysis. It only shines when you have Google-scale data. Otherwise, the overhead is too large.
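For one-dimensional data that fits in memory, no Hadoop/Mahout machinery is needed at all. A rough sketch of plain Lloyd-style k-means specialized to 1D (note that for the truly optimal 1D result you would use the dynamic-programming approach from the linked questions, e.g. Ckmeans.1d.dp):

```python
# Minimal sketch: Lloyd's k-means on one-dimensional data.
# In 1D, sorting the points makes each cluster a contiguous range of values.

def kmeans_1d(points, k, iters=100):
    xs = sorted(points)
    # Initialize centers at evenly spaced positions in the sorted data.
    if k > 1:
        centers = [xs[(i * (len(xs) - 1)) // (k - 1)] for i in range(k)]
    else:
        centers = [xs[len(xs) // 2]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            # Assign each point to its nearest center.
            j = min(range(k), key=lambda i: abs(x - centers[i]))
            clusters[j].append(x)
        # Recompute each center as the mean of its cluster.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

centers, clusters = kmeans_1d([1.0, 1.1, 0.9, 10.0, 10.2, 9.8], k=2)
```

On this toy input the two centers converge to the means of the two obvious groups (1.0 and 10.0).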

Optimal vector data structure? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
A data structure supporting O(1) random access and worst-case O(1) append?
I saw an answer a while ago on StackOverflow regarding a provably optimal vector ("array list") data structure, which, if I remember correctly, lazily copied elements onto a larger vector so that it wouldn't cause a huge pause every time the vector reallocated.
I remember it needed O(sqrt(n)) extra space for bookkeeping, and that the answer linked to a published paper, but that's about it... I'm having a really hard time searching for it (you can imagine that searches like "optimal vector" are getting me nowhere).
Where can I find the paper?
I think the paper you are referring to is "Resizable Arrays in Optimal Time and Space" by Brodnik et al. Their structure uses the lazy-copying dynamic array you mentioned as a building block. There is an older question on Stack Overflow describing the lazy-copying data structure, which might be useful for getting a better feel for how it works.
Hope this helps!
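For intuition, here is a rough sketch of the lazy-copying (de-amortized doubling) idea the question describes. This is illustrative only: it gives worst-case O(1) append by spreading the copy across appends, while the Brodnik et al. structure additionally reduces the wasted space to O(sqrt(n)) using blocks of increasing size.

```python
# Instead of copying the whole buffer at once when it fills, each append
# also migrates two elements from the old buffer into the new one, so no
# single append ever triggers a full O(n) copy.

class LazyVector:
    def __init__(self):
        self.old = []            # previous, smaller buffer being drained
        self.new = [None]        # current buffer
        self.migrated = 0        # prefix of `old` already copied into `new`
        self.n = 0               # logical number of elements

    def append(self, x):
        if self.n == len(self.new):
            # Current buffer is full: demote it and allocate double the space.
            self.old, self.new = self.new, [None] * (2 * len(self.new))
            self.migrated = 0
        self.new[self.n] = x
        self.n += 1
        # Migrate a constant number (2) of elements per append; the old
        # buffer is fully drained well before the new one fills up.
        for _ in range(2):
            if self.migrated < len(self.old):
                self.new[self.migrated] = self.old[self.migrated]
                self.migrated += 1

    def get(self, i):
        assert 0 <= i < self.n
        if i < self.migrated or i >= len(self.old):
            return self.new[i]
        return self.old[i]       # not yet migrated: still in the old buffer
```

A capacity-C buffer leaves C/2 appends before the next resize, and draining the old C/2-element buffer at two copies per append takes only C/4 appends, so the migration always finishes in time.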

binary search tree use in real world programs? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Where is binary search used in practice?
What are the applications of binary trees?
I have done various exercises involving adding, deleting, ordering and so on.
However, I am having a hard time visualizing the use of binary search trees in real-world programs. I mean, sure, they are way faster than some of the other approaches to searching. But is that their only use?
Could you give me some examples of this data structure's use in real-world software?
Whenever you use an ordered map (or dictionary), you are quite possibly using a binary search tree. That is, when you need to store arrays that look like
myArray["not_an_integer"] = 42;
you may well be using a binary search tree under the hood.
In C++, for instance, you have the std::map and std::unordered_map (formerly the non-standard std::hash_map) types. The first is implemented as a balanced binary search tree (typically a red-black tree) with O(log n) insertion and lookup, whereas the second is a hash table with average O(1) lookup time.
EDIT: I just found this answer. You should take a look at it.
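To make the ordered-map use case concrete, here is a tiny illustrative BST map, a toy, deliberately unbalanced version of what std::map achieves with a red-black tree. It shows the operations a hash table cannot do cheaply: sorted iteration and range queries.

```python
# Toy binary search tree map. Real implementations (red-black trees,
# AVL trees) rebalance to keep operations O(log n) in the worst case.

class Node:
    __slots__ = ("key", "value", "left", "right")
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None

class BSTMap:
    def __init__(self):
        self.root = None

    def put(self, key, value):
        def insert(node):
            if node is None:
                return Node(key, value)
            if key < node.key:
                node.left = insert(node.left)
            elif key > node.key:
                node.right = insert(node.right)
            else:
                node.value = value           # overwrite existing key
            return node
        self.root = insert(self.root)

    def items(self):
        # In-order traversal yields keys in sorted order "for free".
        def walk(node):
            if node:
                yield from walk(node.left)
                yield (node.key, node.value)
                yield from walk(node.right)
        return walk(self.root)

    def range(self, lo, hi):
        # All entries with lo <= key <= hi, in sorted order.
        return [(k, v) for k, v in self.items() if lo <= k <= hi]
```

This sorted-iteration/range-query property is why databases, file systems, and ordered containers reach for trees even though hash tables have faster point lookups.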
Binary space partitioning is used in computer graphics; a BSP tree is a binary tree that recursively subdivides space. More details are available at:
http://en.wikipedia.org/wiki/Binary_space_partitioning

Is there any summary which describes "real-life" applications of various data structures? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Practical uses of different data structures
Could anyone please point me to a brief summary which describes real-life applications of various data structures? I am looking for a ready-to-use summary, not a reference to Cormen's book :)
For example, almost every article says what a binary tree is, but they don't provide examples of when it should really be used in real life; the same goes for other data structures.
Thank you,
Data structures are so widely used that such a summary would be enormous. The simplest cases come up almost every day:
Hash maps -- fast lookup of a particular item.
Linked lists -- easy insertion and removal of elements (for example, you can describe an object's properties with a linked list and add or remove properties cheaply).
Priority queues -- used in many algorithms (Dijkstra's shortest-path algorithm, Prim's algorithm for minimum spanning trees, Huffman coding).
Tries -- representing a dictionary of words.
Bloom filters -- fast, memory-cheap approximate membership tests (your email spam filter may use one).
Data structures are all around us -- you really should study and understand them, and then you will find applications for them everywhere.
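One concrete example of the pairings mentioned above: Dijkstra's shortest-path algorithm driven by a priority queue, sketched here with Python's heapq.

```python
import heapq

def dijkstra(graph, source):
    # graph: {node: [(neighbor, weight), ...]}
    dist = {source: 0}
    heap = [(0, source)]                       # priority queue of (distance, node)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                           # stale entry, already improved
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd                   # found a shorter path to v
                heapq.heappush(heap, (nd, v))
    return dist

g = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
distances = dijkstra(g, "a")                   # {"a": 0, "b": 1, "c": 3}
```

The heap is exactly what makes "always expand the closest unfinished node" cheap; without a priority queue each step would need a linear scan.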

Skip Lists -- ever used them? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I'm wondering whether anyone here has ever used a skip list. It seems to have roughly the same advantages as a balanced binary tree but is simpler to implement. If you have, did you write your own, or use a pre-written library (and if so, what was its name)?
My understanding is that they're not so much a useful alternative to binary trees (e.g. red-black trees) as they are to B-trees for database use, so that you can keep the number of levels down to a feasible minimum and deal with base-K logs rather than base-2 logs for performance characteristics. The algorithms for probabilistic skip lists are (IMHO) easier to get right than the corresponding B-tree algorithms. Plus there's some literature on lock-free skip lists. I looked at using them a few months ago but then abandoned the effort on discovering the HDF5 library.
Literature on the subject:
Papers by Bill Pugh:
A skip list cookbook
Skip lists: A probabilistic alternative to balanced trees
Concurrent Maintenance of Skip Lists
Non-academic papers/tutorials:
Eternally Confuzzled (has some discussion on several data structures)
"Skip Lists" by Thomas A. Anastasio
Actually, for one of my projects, I am implementing my own full STL. And I used a skiplist to implement my std::map. The reason I went with it is that it is a simple algorithm which is very close to the performance of a balanced tree but has much simpler iteration capabilities.
Also, Qt4's QMap was implemented as a skip list as well, which was the original inspiration for using one in my std::map.
Years ago I implemented my own for a probabilistic algorithms class. I'm not aware of any library implementations, but it's been a long time. It is pretty simple to implement. As I recall, they had some really nice properties for large data sets and avoided some of the problems of rebalancing. I think the implementation is also simpler than binary trees in general. There is a nice discussion and some sample C++ code here:
http://www.ddj.us/cpp/184403579?pgno=1
There's also an applet with a running demonstration. Cute 90's Java shininess here:
http://www.geocities.com/siliconvalley/network/1854/skiplist.html
Java 1.6 (Java SE 6) introduced ConcurrentSkipListSet and ConcurrentSkipListMap to the collections framework. So, I'd speculate that someone out there is really using them.
Skiplists tend to offer far less contention for locks in a multithreaded situation, and (probabilistically) have performance characteristics similar to trees.
See the original paper [pdf] by William Pugh.
I implemented a variant that I termed a Reverse Skip List for a rules engine a few years ago. Much the same, but the reference links run backward from the last element.
This is because it was faster for inserting sorted items that were most likely towards the back-end of the collection.
It was written in C# and took a few iterations to get working successfully.
The skip list has the same logarithmic time bounds for searching as the binary search algorithm, yet it extends that performance to update operations such as insertion and deletion. Note, however, that the bounds are expected for the skip list, while binary search of a sorted table has a worst-case bound.
Skip lists are easy to implement, but you have to be careful when adjusting the pointers during insertion and deletion. I have not used one in a real program, but I have done some runtime profiling. Skip lists are different from search trees; the similarity is that they give an average O(log n) per dictionary operation, much like a splay tree. A skip list is better than an unbalanced search tree, but not better than a balanced one.
Every skip list node has forward pointers representing its next-node links at the different levels of the skip list. Typically the number of levels is bounded at about ln(N), so for N = 1 million the level is about 13. There will be that many pointers, and in Java this means twice the number of pointers for implementing reference data types, whereas a balanced search tree has fewer and gives the same runtime!
Skip list vs. splay tree vs. hash: as profiled for dictionary lookup operations, a lock-striped hash table returned results in under 0.010 ms, whereas a splay tree gave ~1 ms and a skip list ~720 ms.
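To make the pointer structure discussed above concrete, here is a minimal skip list sketch (search and insert only). Level heights are chosen by coin flips, giving expected O(log n) search and insert; the MAX_LEVEL cap and p = 0.5 are conventional choices, not requirements.

```python
import random

MAX_LEVEL = 16          # enough for ~2^16 elements at p = 0.5

class SkipNode:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level   # one "next" pointer per level

class SkipList:
    def __init__(self):
        self.head = SkipNode(None, MAX_LEVEL)  # sentinel head node
        self.level = 1                          # highest level currently in use

    def _random_level(self):
        # Flip coins: each extra level happens with probability 1/2.
        lvl = 1
        while random.random() < 0.5 and lvl < MAX_LEVEL:
            lvl += 1
        return lvl

    def search(self, key):
        node = self.head
        # At each level, walk right while the next key is still too small,
        # then drop down a level.
        for i in reversed(range(self.level)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def insert(self, key):
        update = [self.head] * MAX_LEVEL
        node = self.head
        for i in reversed(range(self.level)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node               # rightmost node before key, per level
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = SkipNode(key, lvl)
        for i in range(lvl):               # splice the new node in at each level
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new
```

This is the part the answers above call out as fiddly: insertion must record, for every level, the last node before the insertion point (the `update` array) and re-link all of them.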
