How did the heap come about? - data-structures

I know the heap data structure very well and have used it in various applications. But I wonder where the basic idea came from. How would a person figure out a non-trivial structure with such specific properties? And can we invent new structures through similar ideas for our own problems?

The first published use of the heap structure appears to be Robert W. Floyd's Algorithm 113, Treesort, from 1962: https://dl.acm.org/citation.cfm?doid=368637.368654. His heap was a traditional tree, implemented with node pointers.
J. W. J. Williams published Heapsort (Algorithm 232) in the June 1964 issue of Communications of the ACM. It converted Floyd's node-based heap into the array-based binary heap that we all know and love. I wasn't able to find a link to the reference on the ACM site.
Neither paper really explains how its author came to discover the structure.
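For concreteness, here is a minimal sketch of that array-based layout (illustrative code, not taken from either paper): the children of node i live at indices 2i+1 and 2i+2, so the complete tree needs no pointers at all, and a bottom-up pass of sift-down operations builds the heap in place.

```cpp
// Minimal sketch of an array-based binary max-heap (illustrative, not from
// the cited papers). Node i's children live at 2*i+1 and 2*i+2, so the
// complete tree is stored with no pointers.
#include <cstddef>
#include <utility>
#include <vector>

void sift_down(std::vector<int>& a, std::size_t i, std::size_t n) {
    while (true) {
        std::size_t largest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && a[l] > a[largest]) largest = l;
        if (r < n && a[r] > a[largest]) largest = r;
        if (largest == i) return;          // heap property holds here
        std::swap(a[i], a[largest]);
        i = largest;                       // keep sifting the displaced key down
    }
}

// Bottom-up construction: sift down every internal node, last one first.
void build_heap(std::vector<int>& a) {
    for (std::size_t i = a.size() / 2; i-- > 0; )
        sift_down(a, i, a.size());
}
```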

Related

Efficiently finding all points dominated by a point in 2D

In the OCW Advanced Data Structures course, Prof. E. Demaine mentions a data structure that is able to find all the points dominated by a query point (b2, b3) using O(n) space and O(k) time, provided that a search for point b3 has already been completed, where k is the size of the output.
The solution works by transforming the above problem into a ray-stabbing problem and using a technique similar to fractional cascading, as shown in an image from the lecture notes.
While the concept itself is intuitive, implementing the actual data structure is not straightforward at all.
Chazelle describes this in a paper on filtering search (p. 712).
I would like to find additional literature or answers that describe and explain this data structure and algorithm (perhaps with pseudocode and more images, with a focus on implementation).
Additionally, I would like to know whether this structure can be implemented in a way that is not "static"; that is, I would like to be able to insert and delete points from the structure as efficiently as possible.
The book "Computational Geometry: Algorithms and Applications" covers data structures for questions like these. Each chapter has a nice section describing where to learn more, including more complex structures for answering the same problems that are not covered in the book. There are enough diagrams, but not much pseudocode.
Many structures like this can be dynamized using techniques discussed in the book "The Design of Dynamic Data Structures". Jeff Erickson has some nice notes on the topic. Using fractional cascading with them is discussed in "Cache-Oblivious Streaming B-trees"; see the section about cache-oblivious lookahead arrays.
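To give a flavor of the bridging idea behind fractional cascading, here is a toy two-list version (identifiers are mine, not from the lecture notes or Chazelle's paper): one binary search over a merged index locates the lower bound of the query in both lists at once. Fully merging like this would cost O(nk) space over k lists; fractional cascading's contribution is to cascade only every other element from list to list, keeping total space O(n) while each lookup after the first stays O(1).

```cpp
// Toy two-list version of the bridging idea behind fractional cascading
// (illustrative only). A single binary search over a merged index yields the
// lower-bound position of the query in BOTH underlying sorted lists.
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct TwoListBridge {
    std::vector<int> merged;  // all keys of a and b, in sorted order
    // pos[m]: for merged slot m, the first index in a and in b whose key
    // has not yet been emitted, i.e. the lower bound of merged[m] in each.
    std::vector<std::pair<std::size_t, std::size_t>> pos;

    TwoListBridge(const std::vector<int>& a, const std::vector<int>& b) {
        std::size_t i = 0, j = 0;
        while (i < a.size() || j < b.size()) {
            pos.push_back({i, j});
            if (j == b.size() || (i < a.size() && a[i] <= b[j]))
                merged.push_back(a[i++]);
            else
                merged.push_back(b[j++]);
        }
        pos.push_back({a.size(), b.size()});  // sentinel for queries past the end
    }

    // One O(log n) binary search; the bridge table answers for both lists.
    std::pair<std::size_t, std::size_t> lower_bounds(int x) const {
        std::size_t m = std::lower_bound(merged.begin(), merged.end(), x)
                        - merged.begin();
        return pos[m];
    }
};
```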

Which type of Tree Data Structure is suitable for efficient frequent pattern mining?

I am currently working on frequent pattern mining (FPM). I have been googling data structures that can be used for FPM. My main concern is the space-compactness of the data structure, as I am planning to run a distributed algorithm over it (handling synchronization over a structure that fits in my main memory). The data structures I have come across are:
Prefix-Tree
Compact Prefix-Tree or Radix Tree
Prefix Hash Tree (PHT)
Burst Tree (currently reading how it works)
I don't know the order in which these data structures evolved. Can anyone tell me which data structure (not limited to those mentioned above) best fits my requirements?
P.S.: I currently consider the burst tree to be the best-known space-efficient data structure for FPM.
I agree that the question is broad. However, if you're looking for a space-efficient prefix tree, then I would strongly recommend a burst trie. I wrote an implementation and was able to squeeze a lot of space efficiency out of it for Stripe's latest Capture the Flag. (They had a problem which used 4 nodes at less than 500 MB each that "required" a suffix tree.)
If you're looking for an implementation of an efficient burst trie then check mine out.
https://github.com/nbauernfeind/scala-burst-trie
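For context, the plain prefix tree at the top of that list can be as small as the sketch below (illustrative, not from the linked repository). A radix tree then collapses single-child chains into one edge, and a burst trie keeps cold suffix sets in small flat buffers, "bursting" them into real trie nodes only when they fill up.

```cpp
// Minimal prefix-tree (trie) sketch: the baseline that radix trees and burst
// tries compress. The per-node count is one plausible hook for FPM-style
// support counting (an assumption, not a fixed part of the structure).
#include <memory>
#include <string>
#include <unordered_map>

struct TrieNode {
    std::unordered_map<char, std::unique_ptr<TrieNode>> child;
    int count = 0;  // how many inserted strings end exactly here
};

struct Trie {
    TrieNode root;

    void insert(const std::string& s) {
        TrieNode* n = &root;
        for (char c : s) {
            auto& slot = n->child[c];
            if (!slot) slot = std::make_unique<TrieNode>();
            n = slot.get();
        }
        ++n->count;  // one more occurrence of this pattern
    }

    int count(const std::string& s) const {
        const TrieNode* n = &root;
        for (char c : s) {
            auto it = n->child.find(c);
            if (it == n->child.end()) return 0;  // prefix absent
            n = it->second.get();
        }
        return n->count;
    }
};
```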

Details of all MPI algorithms?

Is there any document about how MPI functions such as MPI_Allgather, MPI_Alltoall, MPI_Allreduce, etc. are implemented?
I would like to learn about their algorithms and compute their complexity in terms of uni-directional or bi-directional bandwidth and total data transfer size for a given number of nodes and a fixed data size.
I think the exact implementation of those algorithms varies depending on the communication mechanism: for example, a network will have tree-based reduction algorithms, while shared-memory models will have different ones.
I'm not exactly sure where to find answers to such questions, but I think a search for papers on Google Scholar or a look at this paper list at open-mpi.org should be useful.
http://www.amazon.com/Parallel-Programming-MPI-Peter-Pacheco/dp/1558603395/ref=sr_1_10?s=books&ie=UTF8&qid=1314807638&sr=1-10
Shown above is a great link that explains all the basic MPI algorithms and lets you implement a simple version yourself. However, when comparing the algorithms you have implemented against the MPI ones, you will see that the libraries apply many optimizations depending on the size of the message and the number of nodes you are running on. Hopefully this helps.
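To make the tree-based reduction mentioned above concrete, here is a minimal sketch of a binomial-tree sum built from point-to-point messages, the textbook pattern a library might use internally for MPI_Reduce (illustrative only, not the actual code of any MPI implementation):

```cpp
// Binomial-tree reduction (sum to rank 0) over MPI point-to-point calls.
// Illustrative sketch of the pattern, not code from any MPI library.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int value = rank + 1;  // each rank contributes one value
    // Round k: ranks whose k-th bit is set send their partial sum to
    // (rank - 2^k) and drop out; the others receive and accumulate.
    for (int step = 1; step < size; step <<= 1) {
        if (rank & step) {
            MPI_Send(&value, 1, MPI_INT, rank - step, 0, MPI_COMM_WORLD);
            break;  // this rank's contribution has been handed off
        } else if (rank + step < size) {
            int incoming;
            MPI_Recv(&incoming, 1, MPI_INT, rank + step, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            value += incoming;
        }
    }
    if (rank == 0) std::printf("sum = %d\n", value);  // 1 + 2 + ... + size
    MPI_Finalize();
    return 0;
}
```

The reduction finishes in ceil(log2(size)) rounds, which is where the logarithmic terms in the usual bandwidth/latency cost models come from.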

How can we classify tree data structures?

There are various types of trees that I know of. For example, binary trees can be classified as binary search trees, two trees, etc.
Can anyone give me a complete classification of all the trees in computer science?
Please provide me with reliable references or web links.
It's virtually impossible to answer this question since there are essentially arbitrarily many different ways of using trees. The issue is that a tree is a structure - it's a way of showing how various pieces of data are linked to one another - and what you're asking for is every possible way of interpreting the meaning of that structure. This would be similar, for example, to asking for all uses of calculus in engineering; calculus is a tool with which you can solve an enormous class of problems, but there's no concise way to explain all possible uses of the integral because in each application it is used a different way.
In the case of trees, I've found that there are thousands of research papers describing different tree structures and ways of using trees to solve problems. They arise in string processing, genomics, computational geometry, theory of computation, artificial intelligence, optimization, operating systems, networking, compilers, and a whole host of other areas. In each of these domains they're used to encode specific structures that are domain-specific and difficult to understand without specialized knowledge of the field. No one reference can cover all these areas in any reasonable depth.
In short, you seem to already know the structure of a tree, and this general notion is transferable to any of the above domains. But to try to learn every possible way of using this structure or all its applications would be a Herculean undertaking that no one, not even the legendary Don Knuth, could ever hope to achieve in a lifetime.
Wikipedia has a nice compilation of the various trees at the bottom of the page
Dictionary of Algorithms and Data Structures has more information
What specifics are you looking for?

Skip Lists -- ever used them? [closed]

I'm wondering whether anyone here has ever used a skip list. It looks to have roughly the same advantages as a balanced binary tree but is simpler to implement. If you have, did you write your own, or use a pre-written library (and if so, what was its name)?
My understanding is that they're not so much a useful alternative to binary trees (e.g. red-black trees) as they are to B-trees for database use, so that you can keep the number of levels down to a feasible minimum and deal with base-K logs rather than base-2 logs for performance characteristics. The algorithms for probabilistic skip lists are (IMHO) easier to get right than the corresponding B-tree algorithms. Plus there's some literature on lock-free skip lists. I looked at using them a few months ago but then abandoned the effort on discovering the HDF5 library.
Literature on the subject:
Papers by Bill Pugh:
A skip list cookbook
Skip lists: A probabilistic alternative to balanced trees
Concurrent Maintenance of Skip Lists
Non-academic papers/tutorials:
Eternally Confuzzled (has some discussion on several data structures)
"Skip Lists" by Thomas A. Anastasio
Actually, for one of my projects, I am implementing my own full STL, and I used a skip list to implement my std::map. The reason I went with it is that it is a simple algorithm that gets very close to the performance of a balanced tree but has much simpler iteration capabilities.
Also, Qt4's QMap was a skip list as well, which was the original inspiration for using one in my std::map.
Years ago I implemented my own for a probabilistic algorithms class. I'm not aware of any library implementations, but it's been a long time. It is pretty simple to implement. As I recall they had some really nice properties for large data sets and avoided some of the problems of rebalancing. I think the implementation is also simpler than binary trees in general. There is a nice discussion and some sample C++ code here:
http://www.ddj.us/cpp/184403579?pgno=1
There's also an applet with a running demonstration. Cute 90's Java shininess here:
http://www.geocities.com/siliconvalley/network/1854/skiplist.html
Java 1.6 (Java SE 6) introduced ConcurrentSkipListSet and ConcurrentSkipListMap to the collections framework. So, I'd speculate that someone out there is really using them.
Skiplists tend to offer far less contention for locks in a multithreaded situation, and (probabilistically) have performance characteristics similar to trees.
See the original paper [pdf] by William Pugh.
I implemented a variant that I termed a Reverse Skip List for a rules engine a few years ago. Much the same, but the reference links run backward from the last element.
This is because it was faster for inserting sorted items that were most likely towards the back-end of the collection.
It was written in C# and took a few iterations to get working successfully.
The skip list has the same logarithmic time bounds for searching as is achieved by the binary search algorithm, yet it extends that performance to update methods when inserting or deleting entries. Nevertheless, the bounds are expected for the skip list, while binary search of a sorted table has a worst-case bound.
Skip lists are easy to implement, but you have to be careful when adjusting the pointers during insertion and deletion. I have not used one in a real program, but I have done some runtime profiling. Skip lists are different from search trees. The similarity is that they give average O(log n) over a sequence of dictionary operations, just like a splay tree. A skip list is better than an unbalanced search tree, but it is not better than a balanced tree.
Every skip-list node has forward pointers, which represent the node-to-next-node connections at the different levels of the skip list. Typically the number of levels is bounded at a maximum of about ln(N), so if N = 1 million the maximum level is 13. A node can carry that many pointers, and in Java this means twice the number of pointers when implementing reference data types, whereas a balanced search tree has fewer pointers and gives the same runtime!
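For reference, here is a minimal sketch of the structure just described, with one forward-pointer vector per node and coin-flip level selection (illustrative only: no deletion, no concurrency, and the names are mine):

```cpp
// Minimal skip-list sketch: each node carries `level` forward pointers;
// levels are chosen by fair coin flips, giving expected O(log n) search.
#include <cstdlib>
#include <vector>

struct Node {
    int key;
    std::vector<Node*> next;  // forward pointers, one per level
    Node(int k, int level) : key(k), next(level, nullptr) {}
};

struct SkipList {
    static const int MAX_LEVEL = 20;  // ~log2(N) levels for N up to ~1 million
    Node head{0, MAX_LEVEL};          // sentinel; its key is never read

    static int random_level() {
        int lvl = 1;
        while (lvl < MAX_LEVEL && (std::rand() & 1)) ++lvl;  // p = 1/2 promotion
        return lvl;
    }

    bool contains(int key) const {
        const Node* n = &head;
        for (int lvl = MAX_LEVEL - 1; lvl >= 0; --lvl)
            while (n->next[lvl] && n->next[lvl]->key < key)
                n = n->next[lvl];         // skip ahead on the highest useful level
        n = n->next[0];
        return n && n->key == key;
    }

    void insert(int key) {
        Node* update[MAX_LEVEL];          // last node before `key` on each level
        Node* n = &head;
        for (int lvl = MAX_LEVEL - 1; lvl >= 0; --lvl) {
            while (n->next[lvl] && n->next[lvl]->key < key)
                n = n->next[lvl];
            update[lvl] = n;
        }
        Node* fresh = new Node(key, random_level());
        for (int lvl = 0; lvl < (int)fresh->next.size(); ++lvl) {
            fresh->next[lvl] = update[lvl]->next[lvl];  // splice in at each level
            update[lvl]->next[lvl] = fresh;
        }
    }
};
```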
Skip list vs. splay tree vs. hash table: as profiled for dictionary lookup operations, a lock-striped hash table gives results in under 0.010 ms, whereas a splay tree takes ~1 ms and a skip list ~720 ms.
