Skip List vs. Binary Search Tree - algorithm

I recently came across the data structure known as a skip list. It seems to have very similar behavior to a binary search tree.
Why would you ever want to use a skip list over a binary search tree?

Skip lists are more amenable to concurrent access/modification. Herb Sutter wrote an article about data structure in concurrent environments. It has more indepth information.
The most frequently used implementation of a binary search tree is a red-black tree. The concurrent problems come in when the tree is modified it often needs to rebalance. The rebalance operation can affect large portions of the tree, which would require a mutex lock on many of the tree nodes. Inserting a node into a skip list is far more localized, only nodes directly linked to the affected node need to be locked.
Update from Jon Harrops comments
I read Fraser and Harris's latest paper Concurrent programming without locks. Really good stuff if you're interested in lock-free data structures. The paper focuses on Transactional Memory and a theoretical operation multiword-compare-and-swap MCAS. Both of these are simulated in software as no hardware supports them yet. I'm fairly impressed that they were able to build MCAS in software at all.
I didn't find the transactional memory stuff particularly compelling as it requires a garbage collector. Also software transactional memory is plagued with performance issues. However, I'd be very excited if hardware transactional memory ever becomes common. In the end it's still research and won't be of use for production code for another decade or so.
In section 8.2 they compare the performance of several concurrent tree implementations. I'll summarize their findings. It's worth it to download the pdf as it has some very informative graphs on pages 50, 53, and 54.
Locking skip lists is insanely fast. They scale incredibly well with the number of concurrent accesses. This is what makes skip lists special, other lock based data structures tend to croak under pressure.
Lock-free skip lists are consistently faster than locking skip lists but only barely.
transactional skip lists are consistently 2-3 times slower than the locking and non-locking versions.
locking red-black trees croak under concurrent access. Their performance degrades linearly with each new concurrent user. Of the two known locking red-black tree implementations, one essentially has a global lock during tree rebalancing. The other uses fancy (and complicated) lock escalation but still doesn't significantly outperform the global lock version.
lock-free red-black trees don't exist (no longer true, see Update).
transactional red-black trees are comparable with transactional skip-lists. That was very surprising and very promising. Transactional memory, though slower if far easier to write. It can be as easy as quick search and replace on the non-concurrent version.
Update
Here is paper about lock-free trees: Lock-Free Red-Black Trees Using CAS.
I haven't looked into it deeply, but on the surface it seems solid.

First, you cannot fairly compare a randomized data structure with one that gives you worst-case guarantees.
A skip list is equivalent to a randomly balanced binary search tree (RBST) in the way that is explained in more detail in Dean and Jones' "Exploring the Duality Between Skip Lists and Binary Search Trees".
The other way around, you can also have deterministic skip lists which guarantee worst case performance, cf. Munro et al.
Contra to what some claim above, you can have implementations of binary search trees (BST) that work well in concurrent programming. A potential problem with the concurrency-focused BSTs is that you can't easily get the same had guarantees about balancing as you would from a red-black (RB) tree. (But "standard", i.e. randomzided, skip lists don't give you these guarantees either.) There's a trade-off between maintaining balancing at all times and good (and easy to program) concurrent access, so relaxed RB trees are usually used when good concurrency is desired. The relaxation consists in not re-balancing the tree right away. For a somewhat dated (1998) survey see Hanke's ''The Performance of Concurrent Red-Black Tree Algorithms'' [ps.gz].
One of the more recent improvements on these is the so-called chromatic tree (basically you have some weight such that black would be 1 and red would be zero, but you also allow values in between). And how does a chromatic tree fare against skip list? Let's see what Brown et al. "A General Technique for Non-blocking Trees" (2014) have to say:
with 128 threads, our algorithm outperforms Java’s non-blocking skiplist
by 13% to 156%, the lock-based AVL tree of Bronson et al. by 63% to 224%, and a RBT that uses software transactional memory (STM) by 13 to 134 times
EDIT to add: Pugh's lock-based skip list, which was benchmarked in Fraser and Harris (2007) "Concurrent Programming Without Lock" as coming close to their own lock-free version (a point amply insisted upon in the top answer here), is also tweaked for good concurrent operation, cf. Pugh's "Concurrent Maintenance of Skip Lists", although in a rather mild way. Nevertheless one newer/2009 paper "A Simple Optimistic skip-list Algorithm" by Herlihy et al., which proposes a supposedly simpler (than Pugh's) lock-based implementation of concurrent skip lists, criticized Pugh for not providing a proof of correctness convincing enough for them. Leaving aside this (maybe too pedantic) qualm, Herlihy et al. show that their simpler lock-based implementation of a skip list actually fails to scale as well as the JDK's lock-free implementation thereof, but only for high contention (50% inserts, 50% deletes and 0% lookups)... which Fraser and Harris didn't test at all; Fraser and Harris only tested 75% lookups, 12.5% inserts and 12.5% deletes (on skip list with ~500K elements). The simpler implementation of Herlihy et al. also comes close to the lock-free solution from the JDK in the case of low contention that they tested (70% lookups, 20% inserts, 10% deletes); they actually beat the lock-free solution for this scenario when they made their skip list big enough, i.e. going from 200K to 2M elements, so that the probability of contention on any lock became negligible. It would have been nice if Herlihy et al. had gotten over their hangup over Pugh's proof and tested his implementation too, but alas they didn't do that.
EDIT2: I found a (2015 published) motherlode of all benchmarks: Gramoli's "More Than You Ever Wanted to Know about Synchronization. Synchrobench, Measuring the Impact of the Synchronization on Concurrent Algorithms": Here's a an excerpted image relevant to this question.
"Algo.4" is a precursor (older, 2011 version) of Brown et al.'s mentioned above. (I don't know how much better or worse the 2014 version is). "Algo.26" is Herlihy's mentioned above; as you can see it gets trashed on updates, and much worse on the Intel CPUs used here than on the Sun CPUs from the original paper. "Algo.28" is ConcurrentSkipListMap from the JDK; it doesn't do as well as one might have hoped compared to other CAS-based skip list implementations. The winners under high-contention are "Algo.2" a lock-based algorithm (!!) described by Crain et al. in "A Contention-Friendly Binary Search Tree" and "Algo.30" is the "rotating skiplist" from "Logarithmic data structures for
multicores". "Algo.29" is the "No hot spot non-blocking skip
list". Be advised that Gramoli is a co-author to all three of these winner-algorithm papers. "Algo.27" is the C++ implementation of Fraser's skip list.
Gramoli's conclusion is that's much easier to screw-up a CAS-based concurrent tree implementation than it is to screw up a similar skip list. And based on the figures, it's hard to disagree. His explanation for this fact is:
The difficulty in designing a tree that is lock-free stems from
the difficulty of modifying multiple references atomically. Skip lists
consist of towers linked to each other through successor pointers and
in which each node points to the node immediately below it. They are
often considered similar to trees because each node has a successor
in the successor tower and below it, however, a major distinction is
that the downward pointer is generally immutable hence simplifying
the atomic modification of a node. This distinction is probably
the reason why skip lists outperform trees under heavy contention
as observed in Figure [above].
Overriding this difficulty was a key concern in Brown et al.'s recent work.
They have a whole separate (2013) paper "Pragmatic Primitives for Non-blocking Data Structures"
on building multi-record LL/SC compound "primitives", which they call LLX/SCX, themselves implemented using (machine-level) CAS. Brown et al. used this LLX/SCX building block in their 2014 (but not in their 2011) concurrent tree implementation.
I think it's perhaps also worth summarizing here the fundamental ideas
of the "no hot spot"/contention-friendly (CF) skip list. It addapts an essential idea from the relaxed RB trees (and similar concrrency friedly data structures): the towers are no longer built up immediately upon insertion, but delayed until there's less contention. Conversely, the deletion of a tall tower can create many contentions;
this was observed as far back as Pugh's 1990 concurrent skip-list paper, which is why Pugh introduced pointer reversal on deletion (a tidbit that Wikipedia's page on skip lists still doesn't mention to this day, alas). The CF skip list takes this a step further and delays deleting the upper levels of a tall tower. Both kinds of delayed operations in CF skip lists are carried out by a (CAS based) separate garbage-collector-like thread, which its authors call the "adapting thread".
The Synchrobench code (including all algorithms tested) is available at: https://github.com/gramoli/synchrobench.
The latest Brown et al. implementation (not included in the above) is available at http://www.cs.toronto.edu/~tabrown/chromatic/ConcurrentChromaticTreeMap.java Does anyone have a 32+ core machine available? J/K My point is that you can run these yourselves.

Also, in addition to the answers given (ease of implementation combined with comparable performance to a balanced tree). I find that implementing in-order traversal (forwards and backwards) is far simpler because a skip-list effectively has a linked list inside its implementation.

In practice I've found that B-tree performance on my projects has worked out to be better than skip-lists. Skip lists do seem easier to understand but implementing a B-tree is not that hard.
The one advantage that I know of is that some clever people have worked out how to implement a lock-free concurrent skip list that only uses atomic operations. For example, Java 6 contains the ConcurrentSkipListMap class, and you can read the source code to it if you are crazy.
But it's not too hard to write a concurrent B-tree variant either - I've seen it done by someone else - if you preemptively split and merge nodes "just in case" as you walk down the tree then you won't have to worry about deadlocks and only ever need to hold a lock on two levels of the tree at a time. The synchronization overhead will be a bit higher but the B-tree is probably faster.

From the Wikipedia article you quoted:
Θ(n) operations, which force us to visit every node in ascending order (such as printing the entire list) provide the opportunity to perform a behind-the-scenes derandomization of the level structure of the skip-list in an optimal way, bringing the skip list to O(log n) search time. [...]
A skip list, upon which we have not
recently performed [any such] Θ(n) operations, does not
provide the same absolute worst-case
performance guarantees as more
traditional balanced tree data
structures, because it is always
possible (though with very low
probability) that the coin-flips used
to build the skip list will produce a
badly balanced structure
EDIT: so it's a trade-off: Skip Lists use less memory at the risk that they might degenerate into an unbalanced tree.

Skip lists are implemented using lists.
Lock free solutions exist for singly and doubly linked lists - but there are no lock free solutions which directly using only CAS for any O(logn) data structure.
You can however use CAS based lists to create skip lists.
(Note that MCAS, which is created using CAS, permits arbitrary data structures and a proof of concept red-black tree had been created using MCAS).
So, odd as they are, they turn out to be very useful :-)

Skip Lists do have the advantage of lock stripping. But, the runt time depends on how the level of a new node is decided. Usually this is done using Random(). On a dictionary of 56000 words, skip list took more time than a splay tree and the tree took more time than a hash table. The first two could not match hash table's runtime. Also, the array of the hash table can be lock stripped in a concurrent way too.
Skip List and similar ordered lists are used when locality of reference is needed. For ex: finding flights next and before a date in an application.
An inmemory binary search splay tree is great and more frequently used.
Skip List Vs Splay Tree Vs Hash Table Runtime on dictionary find op

Related

Do linked lists have a practical performance advantage in eager functional languages?

I'm aware that in lazy functional languages, linked lists take on generator-esque semantics, and that under an optimizing compiler, their overhead can be completely removed when they're not actually being used for storage.
But in eager functional languages, their use seems no less heavy, while optimizing them out seems more difficult. Is there a good performance reason that languages like Scheme use them over flat arrays as their primary sequence representation?
I'm aware of the obvious time complexity implications of using singly-linked lists; I'm more thinking about the in-practice performance consequences of using eager singly-linked lists as a primary sequence representation, compiler optimizations considered.
TL; DR: No performance advantage!
The first Lisp had cons (linked list cell) as only data structure and it used it for everything. Both Common Lisp and Scheme have vectors today, but for functional style it's not a good match. The linked list can have one recursive step add zero or more elements in front of an accumulator and in the end you have a list that was made with sharing between the iterations. The operation might do more than one recursion making several versions all sharing the tail. I would say sharing is the most important aspect of the linked list. If you make a minimax algorithm and store the state in a linked list you can change the state without having to copy the unchanged parts of the state.
Creator of C++ Bjarne Stroustrup mentions in a talk that the penalty of having the data structure scrambled and double the size like in a linked list is easily outperformed even if you insert in order and need to move half the elements in the data structure. Keep in mind these were doubly linked lists and mutation inserts, but he mentions that most of the time was following the pointers linearly, getting all the CPU cache misses, to find the correct spot and thus for every O(n) search in a sorted list a vector would be better.
If you have a program where you do many inserts on a list which is not in front then perhaps a tree is a better choice, which you can do in CL and Scheme with cons. In fact all of Chris Okasaki purly functional data structures can be implemented with cons. Most "mutable" structures in Haskell are implemented similar to it.
If you are suffering from performance problems in Scheme and after profiling you find out you should try to replace a linked list operation with an array one there is nothing standing in the way of that. In the end all algorithm choices have pros and cons. Hard computations are hard in any language.

T-Tree or B-Tree

T-tree algorithm is described in this paper
And T*-Tree is an improvement from T-tree for better use of query operations, including range queries and which contains all other good features of T-tree.
This algorithm is described in this paper "T*-tree: A Main Memory Database Index Structure for Real-Time Applications".
According to this research paper, T-Tree is faster than B-tree/B+tree when datasets fit in the memory.
I implemented T-Tree/T*Tree as they described in these papers and compared the performance with B-tree/B+tree, but B-tree/B+tree perform better than T-Tree/T*Tree in all test cases (insertion, deletion, searching).
I read that T-Tree is an efficient index structure for in-memory database, and it used by Oracle TimesTen. But my results did not show that.
If anyone may know the reason or have any comment about that, it will be great to hear from her (or him).
T-Trees are not a fundamental data structure in the same sense that AVL trees or B-trees are. They are just a hacked version of balanced binary trees and as such there may or may not be niche applications where they offer decent performance.
In this day and age they are bound to suffer horribly because of their poor locality, both in the sense of expected block/page transfer counts and in the sense of cache locality. The latter is evident since in all node accesses of a search except for the very last one, only the boundary values will be checked against the search key - all the rest is paged in or cached for nought.
Compare this to the excellent access locality of B-trees in general and B+trees in particular (not to mention cache-oblivious and cache-conscious versions that were designed explicitly with memory performance charactistics in mind).
Similar problems exist with the rebalancing. In the B-tree world many variations - starting with B+ and Blink - have been developed and perfected in order to achieve desired amortised performance characteristics, including aspects like concurrency (locking/latching) or the absence thereof. So most of the time you can simply go out and find a B-tree variation that fits your performance profile - or use the simple classic B+tree and be certain of decent results.
T-trees are more complicated than comparable B-trees and it seems that they have nothing to offer in the way of performance in general, given that the times of commodity hardware with a single-level memory 'hierarchy' have been gone for decades. Not only is the hard disk the new memory, the converse is also true and main memory is the new hard disk now. I.e. even without NUMA the cost of bringing data from main memory into the cache hierarchy is so high that it pays to minimise page transfers - which is precisely what B-trees and their variations do and the T-tree doesn't. Closer to the processor core it's the number of cache line accesses/transfers that matters but the picture remains the same.
In fact, if you take the idea of binary search - which is provably optimal - and think about ways of arranging the search keys in a manner that plays well with memory hierarchies (caches) then you invariably end up with something that looks uncannily like a B-tree...
If you program for performance then you'll find that winners are almost always located somewhere in the triangle between sorted arrays, B-trees and hashing. Even balanced binary trees are only competitive if their comparatively poor performance takes the back seat in the face of other considerations and key counts are fairly small, i.e. not more than a couple million.

Why are Haskell Maps implemented as balanced binary trees instead of traditional hashtables?

From my limited knowledge of Haskell, it seems that Maps (from Data.Map) are supposed to be used much like a dictionary or hashtable in other languages, and yet are implemented as self-balancing binary search trees.
Why is this? Using a binary tree reduces lookup time to O(log(n)) as opposed to O(1) and requires that the elements be in Ord. Certainly there is a good reason, so what are the advantages of using a binary tree?
Also:
In what applications would a binary tree be much worse than a hashtable? What about the other way around? Are there many cases in which one would be vastly preferable to the other? Is there a traditional hashtable in Haskell?
Hash tables can't be implemented efficiently without mutable state, because they're based on array lookup. The key is hashed and the hash determines the index into an array of buckets. Without mutable state, inserting elements into the hashtable becomes O(n) because the entire array must be copied (alternative non-copying implementations, like DiffArray, introduce a significant performance penalty). Binary-tree implementations can share most of their structure so only a couple pointers need to be copied on inserts.
Haskell certainly can support traditional hash tables, provided that the updates are in a suitable monad. The hashtables package is probably the most widely used implementation.
One advantage of binary trees and other non-mutating structures is that they're persistent: it's possible to keep older copies of data around with no extra book-keeping. This might be useful in some sort of transaction algorithm for example. They're also automatically thread-safe (although updates won't be visible in other threads).
Traditional hashtables rely on memory mutation in their implementation. Mutable memory and referential transparency are at ends, so that relegates hashtable implementations to either the IO or ST monads. Trees can be implemented persistently and efficiently by leaving old leaves in memory and returning new root nodes which point to the updated trees. This lets us have pure Maps.
The quintessential reference is Chris Okasaki's Purely Functional Data Structures.
Why is this? Using a binary tree reduces lookup time to O(log(n)) as opposed to O(1)
Lookup is only one of the operations; insertion/modification may be more important in many cases; there are also memory considerations. The main reason the tree representation was chosen is probably that it is more suited for a pure functional language. As "Real World Haskell" puts it:
Maps give us the same capabilities as hash tables do in other languages. Internally, a map is implemented as a balanced binary tree. Compared to a hash table, this is a much more efficient representation in a language with immutable data. This is the most visible example of how deeply pure functional programming affects how we write code: we choose data structures and algorithms that we can express cleanly and that perform efficiently, but our choices for specific tasks are often different their counterparts in imperative languages.
This:
and requires that the elements be in Ord.
does not seem like a big disadvantage. After all, with a hash map you need keys to be Hashable, which seems to be more restrictive.
In what applications would a binary tree be much worse than a hashtable? What about the other way around? Are there many cases in which one would be vastly preferable to the other? Is there a traditional hashtable in Haskell?
Unfortunately, I cannot provide an extensive comparative analysis, but there is a hash map package, and you can check out its implementation details and performance figures in this blog post and decide for yourself.
My answer to what the advantage of using binary trees is, would be: range queries. They require, semantically, a total preorder, and profit from a balanced search tree organization algorithmically. For simple lookup, I'm afraid there may only be good Haskell-specific answers, but not good answers per se: Lookup (and indeed hashing) requires only a setoid (equality/equivalence on its key type), which supports efficient hashing on pointers (which, for good reasons, are not ordered in Haskell). Like various forms of tries (e.g. ternary tries for elementwise update, others for bulk updates) hashing into arrays (open or closed) is typically considerably more efficient than elementwise searching in binary trees, both space and timewise. Hashing and Tries can be defined generically, though that has to be done by hand -- GHC doesn't derive it (yet?). Data structures such as Data.Map tend to be fine for prototyping and for code outside of hotspots, but where they are hot they easily become a performance bottleneck. Luckily, Haskell programmers need not be concerned about performance, only their managers. (For some reason I presently can't find a way to access the key redeeming feature of search trees amongst the 80+ Data.Map functions: a range query interface. Am I looking the wrong place?)

How well do zippers perform in practice, and when should they be used?

I think that the zipper is a beautiful idea; it elegantly provides a way to walk a list or tree and make what appear to be local updates in a functional way.
Asymptotically, the costs appear to be reasonable. But traversing the data structure requires memory allocation at each iteration, where a normal list or tree traversal is just pointer chasing. This seems expensive (please correct me if I am wrong).
Are the costs prohibitive? And what under what circumstances would it be reasonable to use a zipper?
I can provide one solid data point: John Dias and I had a paper in the 2005 ML Workshop where we compared the cost of using a zipper to represent control-flow graphs with the cost of using mutable record fields in Objective Caml. We were very pleasantly surprised to find that the performance of our compiler with the zipper-based control-flow graphs was actually slightly better than the performance using a traditional data structure with mutable records linked by pointers. We couldn't find serious analysis tools to tell us exactly why the zipper was faster, but I suspect the reason was that there were fewer invariants to maintain and so relatively fewer pointer assignments. It's also possible that the optimizer was smart enough to amortize some of the allocation costs incurred by the zipper. In any case, the zipper can be used in an optimizing compiler, and there is actually a slight performance benefit.

Skip Lists -- ever used them? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
I'm wondering whether anyone here has ever used a skip list. It looks to have roughly the same advantages as a balanced binary tree but is simpler to implement. If you have, did you write your own, or use a pre-written library (and if so, what was its name)?
My understanding is that they're not so much a useful alternative to binary trees (e.g. red-black trees) as they are to B-trees for database use, so that you can keep the # of levels down to a feasible minimum and deal w/ base-K logs rather than base-2 logs for performance characteristics. The algorithms for probabilistic skip-lists are (IMHO) easier to get right than the corresponding B-tree algorithms. Plus there's some literature on lock-free skip lists. I looked at using them a few months ago but then abandoned the effort on discovering the HDF5 library.
literature on the subject:
Papers by Bill Pugh:
A skip list cookbook
Skip lists: A probabilistic alternative to balanced trees
Concurrent Maintenance of Skip Lists
non-academic papers/tutorials:
Eternally Confuzzled (has some discussion on several data structures)
"Skip Lists" by Thomas A. Anastasio
Actually, for one of my projects, I am implementing my own full STL. And I used a skiplist to implement my std::map. The reason I went with it is that it is a simple algorithm which is very close to the performance of a balanced tree but has much simpler iteration capabilities.
Also, Qt4's QMap was a skiplist as well which was the original inspiration for my using it in my std::map.
Years ago I implemented my own for a probabilistic algorithms class. I'm not aware of any library implementations, but it's been a long time. It is pretty simple to implement. As I recall they had some really nice properties for large data sets and avoided some of the problems of rebalancing. I think the implementation is also simpler than binary tries in general. There is a nice discussion and some sample c++ code here:
http://www.ddj.us/cpp/184403579?pgno=1
There's also an applet with a running demonstration. Cute 90's Java shininess here:
http://www.geocities.com/siliconvalley/network/1854/skiplist.html
Java 1.6 (Java SE 6) introduced ConcurrentSkipListSet and ConcurrentSkipListMap to the collections framework. So, I'd speculate that someone out there is really using them.
Skiplists tend to offer far less contention for locks in a multithreaded situation, and (probabilistically) have performance characteristics similar to trees.
See the original paper [pdf] by William Pugh.
I implemented a variant that I termed a Reverse Skip List for a rules engine a few years ago. Much the same, but the reference links run backward from the last element.
This is because it was faster for inserting sorted items that were most likely towards the back-end of the collection.
It was written in C# and took a few iterations to get working successfully.
The skip list has the same logarithmic time bounds for searching as is achieved by the binary search algorithm, yet it extends that performance to update methods when inserting or deleting entries. Nevertheless, the bounds are expected for the skip list, while binary search of a sorted table has a worst-case bound.
Skip Lists are easy to implement. But, adjusting the pointers on a skip list in case of insertion and deletion you have to be careful. Have not used this in a real program but, have doen some runtime profiling. Skip lists are different from search trees. The similarity is that, it gives average log(n) for a period of dictionary operations just like the splay tree. It is better than an unbalanced search tree but is not better than a balanced tree.
Every skip list node has forward pointers which represent the current->next() connections to the different levels of the skip list. Typically this level is bounded at a maximum of ln(N). So if N = 1million the level is 13. There will be that much pointers and in Java this means twie the number of pointers for implementing reference data types. where as a balanced search tree has less and it gives same runtime!!.
SkipList Vs Splay Tree Vs Hash As profiled for dictionary look up ops a lock stripped hashtable will give result in under 0.010 ms where as a splay tree gives ~ 1 ms and skip list ~720ms.

Resources