LMDB variant offering finger B-Tree? - key-value-store

A finger B-Tree is a B-Tree that tracks a user-specified associative "summarizing" operation on the leaves. When nodes are merged, the operation is used to combine summaries; when nodes are split, the summary is recalculated from the node's grandchildren (but no deeper nodes).
By updating the summary data with each split/merge, a finger B-Tree can answer a query for the summary over any arbitrary range of keys in at most O(log n) page lookups (i.e. along the paths from the root down to the floor key and the ceiling key of the range).
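(To make the idea concrete, here is a minimal, hypothetical sketch. It uses a plain in-memory binary search tree and integer addition as the summarizing operation rather than a real on-disk B-Tree, but the summary bookkeeping is the same idea; all names are made up for illustration.)

    # Each node caches the sum over its whole subtree, so a range query
    # touches only the nodes along the paths to the range's two endpoints.
    class Node:
        def __init__(self, key, value, left=None, right=None):
            self.key, self.value, self.left, self.right = key, value, left, right
            self.summary = value            # summary of this subtree
            if left:  self.summary += left.summary
            if right: self.summary += right.summary

    def build(items):                       # items sorted by key -> balanced tree
        if not items:
            return None
        mid = len(items) // 2
        key, value = items[mid]
        return Node(key, value, build(items[:mid]), build(items[mid + 1:]))

    def sum_ge(node, lo):                   # summary of entries with key >= lo
        if node is None:
            return 0
        if node.key < lo:
            return sum_ge(node.right, lo)
        right = node.right.summary if node.right else 0
        return sum_ge(node.left, lo) + node.value + right

    def sum_le(node, hi):                   # summary of entries with key <= hi
        if node is None:
            return 0
        if node.key > hi:
            return sum_le(node.left, hi)
        left = node.left.summary if node.left else 0
        return left + node.value + sum_le(node.right, hi)

    def range_sum(node, lo, hi):            # visits O(log n) nodes when balanced
        if node is None:
            return 0
        if node.key < lo:
            return range_sum(node.right, lo, hi)
        if node.key > hi:
            return range_sum(node.left, lo, hi)
        return sum_ge(node.left, lo) + node.value + sum_le(node.right, hi)

    tree = build([(i, i) for i in range(1, 101)])
    assert range_sum(tree, 10, 20) == sum(range(10, 21))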
I don't think LMDB supports this out of the box, but I'd be happy to be wrong. Is anybody aware of an LMDB fork or variant which adds it? If not, is there another lightweight persistent (not necessarily transactional) on-disk B-Tree library that does?

RocksDB offers custom compaction filters and merge operators, which could, I think, be used to implement such summaries fairly efficiently. Of course, its architecture is very different from LMDB's.
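For example (a hedged sketch, assuming the third-party python-rocksdb bindings; an associative merge operator maintains a per-key running aggregate without read-modify-write, though range summaries would still need something on top of it, e.g. bucketed summary keys):

    import rocksdb

    class SumMerger(rocksdb.interfaces.AssociativeMergeOperator):
        def merge(self, key, existing_value, value):
            # Combine the stored aggregate with the incoming operand.
            if existing_value:
                total = int(existing_value) + int(value)
                return (True, str(total).encode('ascii'))
            return (True, value)

        def name(self):
            return b'SumMerger'

    opts = rocksdb.Options()
    opts.create_if_missing = True
    opts.merge_operator = SumMerger()
    db = rocksdb.DB('summaries.db', opts)

    db.merge(b'bucket:0042', b'7')   # cheap "add 7 to this bucket's summary"
    db.merge(b'bucket:0042', b'5')
    assert db.get(b'bucket:0042') == b'12'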

Related

Why does Redis SortedSet use a Skip List instead of a Balanced Tree?

The Redis documentation says:
ZSETs are ordered sets using two data structures to hold the same elements
in order to get O(log(N)) INSERT and REMOVE operations into a sorted
data structure.
The elements are added to a hash table mapping Redis objects to
scores. At the same time the elements are added to a skip list
mapping scores to Redis objects (so objects are sorted by scores in
this "view").
I don't quite understand this. Could someone give me a detailed explanation?
Antirez himself answered this; see https://news.ycombinator.com/item?id=1171423:
There are a few reasons:
They are not very memory intensive. It's up to you, basically. Changing parameters about the probability of a node having a given number of levels will make them less memory intensive than btrees.
A sorted set is often the target of many ZRANGE or ZREVRANGE operations, that is, traversing the skip list as a linked list. With this operation the cache locality of skip lists is at least as good as with other kinds of balanced trees.
They are simpler to implement, debug, and so forth. For instance thanks to the skip list simplicity I received a patch (already in Redis master) with augmented skip lists implementing ZRANK in O(log(N)). It required little changes to the code.
About the Append Only durability & speed, I don't think it is a good idea to optimize Redis at the cost of more code and more complexity for a use case that IMHO should be rare for the Redis target (fsync() at every command). Almost no one is using this feature even with ACID SQL databases, as the performance hit is big anyway.
About threads: our experience shows that Redis is mostly I/O bound. I'm using threads to serve things from Virtual Memory. The long term solution to exploit all the cores, assuming your link is so fast that you can saturate a single core, is running multiple instances of Redis (no locks, almost fully scalable linearly with number of cores), and using the "Redis Cluster" solution that I plan to develop in the future.
First of all, I think I've got the idea of what the Redis documentation says. A Redis ordered set maintains the order of elements by each element's user-specified score. But when a user calls some Redis ZSET APIs, they pass only the member argument. For example:
ZREM key member [member ...]
ZINCRBY key increment member
...
Redis needs to know the score associated with this member (element), so it uses a hash table to maintain that mapping, just as the documentation says:
The elements are added to a hash table mapping Redis objects to
scores.
When it receives a member, it finds the member's score through the hash table and then performs the operation on the skip list to maintain the ordering of the set. Redis uses two data structures to maintain a double mapping that satisfies the needs of the different APIs.
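(A toy sketch of that double mapping, with a sorted Python list standing in for the skip list; the names here are made up for illustration:)

    import bisect

    class MiniZSet:
        def __init__(self):
            self.score_of = {}   # member -> score        (the hash table)
            self.by_score = []   # sorted (score, member) (skip-list stand-in)

        def zadd(self, member, score):
            old = self.score_of.get(member)
            if old is not None:
                # Knowing the old score is what lets a real skip list find and
                # delete the entry in O(log N); this list stand-in is O(N).
                self.by_score.remove((old, member))
            self.score_of[member] = score
            bisect.insort(self.by_score, (score, member))

        def zincrby(self, member, increment):
            # ZINCRBY receives only the member; the dict supplies the score.
            self.zadd(member, self.score_of.get(member, 0) + increment)

        def zrange(self, start, stop):
            return [member for _, member in self.by_score[start:stop + 1]]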
I read William Pugh's paper, Skip Lists: A Probabilistic Alternative to Balanced Trees, and found the skip list very elegant and easier to implement than rotation-based trees.
Also, I think a general balanced binary tree would be able to do this work at the same time cost. In case I've missed something, please point it out.

Is a linked list in a B-tree node superior to an array?

I want to implement a B-tree index for my database.
I have read many data structure and algorithm books to learn how to do it. All the implementations use an array to store the data and child indexes.
Now I want to know: is a linked list in a B-tree node superior to an array?
There are some ideas I've thought about:
when splitting a node, the copy operation will be quicker than with an array.
when inserting data into the middle or at the head of an array, it is slower than inserting into a linked list.
The linked list is not better; in fact, a simple array is not better either (except for its simplicity, which is a good argument for it, and its search speed if kept sorted).
You have to realize that the "array" implementation is more a "reference" implementation than a true full power implementation. For example, the implementation of the data/key pairs inside a B-Tree node in commercial implementations uses many strategies to solve two problems: storage efficiency and efficient search of keys in the node.
With regard to efficient search, an array of key/value pairs with an internal balanced tree structure on top of it allows insertion/deletion/search to be done in O(log N); for large B-tree nodes this makes sense.
With regard to memory efficiency, the nature of the data in the keys and values is very important. For example, lexicographical keys can be shortened by factoring out a common prefix (e.g. "good" and "great" have "g" in common), and the data might be compressed as well using any scheme relevant to its nature. Compressing the keys is more complex, as you will want to preserve the lexicographical property. Remember that the more data and keys you stuff into a node, the faster the disk accesses are.
The time to split a node is only partially relevant, as it will be less than the time to read or write a node on typical media by several orders of magnitude. On SSDs and extremely fast disks (within 10 to 20 years, disks are expected to be as fast as RAM), much research is being conducted to find a successor to B-Trees; stratified B-Trees are an example.
If the B-tree is itself stored on disk, then a linked list will make it very complicated to maintain.
Keep the B-Tree structure compact. This allows more nodes per page, better locality of data, caching of more nodes, and fewer disk reads/cache misses.
Use an array.
The perceived in-memory computational benefits are inconsequential.
So, in short: no, a linked list is not superior.
B-trees are typically used in DBs where the data is stored on disk and you want to minimize the number of blocks you read. I do not think your proposal would be efficient in that case (although it might be beneficial if you can load all the data into RAM).
If you want to perform those two operations efficiently, you should use a Skip List (http://en.wikipedia.org/wiki/Skip_list). Performance-wise it will be similar to what you have outlined.
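To illustrate the accepted view, here is a rough sketch of a B-tree leaf that keeps its keys in a sorted array (a Python list); ORDER and the class name are made up for illustration. The split is two contiguous slice copies, and the sorted array supports binary search, both of which a linked list gives up:

    import bisect

    ORDER = 4  # illustrative maximum number of keys per node

    class LeafNode:
        def __init__(self, keys=None, values=None):
            self.keys = keys if keys is not None else []
            self.values = values if values is not None else []

        def insert(self, key, value):
            # Binary search works because the array is kept sorted; a linked
            # list would force a linear scan to find the insertion point.
            i = bisect.bisect_left(self.keys, key)
            self.keys.insert(i, key)    # the mid-array shift the question worries about
            self.values.insert(i, value)

        def is_full(self):
            return len(self.keys) > ORDER

        def split(self):
            # Splitting is just slicing: contiguous copies that map naturally
            # onto a fixed-size disk page, unlike scattered list nodes.
            mid = len(self.keys) // 2
            right = LeafNode(self.keys[mid:], self.values[mid:])
            self.keys, self.values = self.keys[:mid], self.values[:mid]
            return right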

Why are Haskell Maps implemented as balanced binary trees instead of traditional hashtables?

From my limited knowledge of Haskell, it seems that Maps (from Data.Map) are supposed to be used much like a dictionary or hashtable in other languages, and yet are implemented as self-balancing binary search trees.
Why is this? Using a binary tree reduces lookup time to O(log(n)) as opposed to O(1) and requires that the elements be in Ord. Certainly there is a good reason, so what are the advantages of using a binary tree?
Also:
In what applications would a binary tree be much worse than a hashtable? What about the other way around? Are there many cases in which one would be vastly preferable to the other? Is there a traditional hashtable in Haskell?
Hash tables can't be implemented efficiently without mutable state, because they're based on array lookup. The key is hashed and the hash determines the index into an array of buckets. Without mutable state, inserting elements into the hashtable becomes O(n) because the entire array must be copied (alternative non-copying implementations, like DiffArray, introduce a significant performance penalty). Binary-tree implementations can share most of their structure so only a couple pointers need to be copied on inserts.
Haskell certainly can support traditional hash tables, provided that the updates are in a suitable monad. The hashtables package is probably the most widely used implementation.
One advantage of binary trees and other non-mutating structures is that they're persistent: it's possible to keep older copies of data around with no extra book-keeping. This might be useful in some sort of transaction algorithm for example. They're also automatically thread-safe (although updates won't be visible in other threads).
Traditional hashtables rely on memory mutation in their implementation. Mutable memory and referential transparency are at odds, so that relegates hashtable implementations to either the IO or ST monads. Trees can be implemented persistently and efficiently by leaving old leaves in memory and returning new root nodes which point to the updated trees. This lets us have pure Maps.
The quintessential reference is Chris Okasaki's Purely Functional Data Structures.
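(A language-agnostic sketch, here in Python rather than Haskell, of the structural sharing both answers describe: insertion copies only the O(log n) nodes on the path to the new leaf and shares every other subtree with the old version.)

    from collections import namedtuple

    Tree = namedtuple('Tree', 'key value left right')   # an immutable node

    def insert(node, key, value):
        # Path copying: rebuild only the nodes on the way down.
        if node is None:
            return Tree(key, value, None, None)
        if key < node.key:
            return node._replace(left=insert(node.left, key, value))
        if key > node.key:
            return node._replace(right=insert(node.right, key, value))
        return node._replace(value=value)

    v1 = None
    for k in (5, 2, 8):
        v1 = insert(v1, k, str(k))
    v2 = insert(v1, 9, 'nine')   # v1 is untouched: both versions coexist
    assert v2.left is v1.left    # the unchanged subtree is shared, not copied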
Why is this? Using a binary tree reduces lookup time to O(log(n)) as opposed to O(1)
Lookup is only one of the operations; insertion/modification may be more important in many cases; there are also memory considerations. The main reason the tree representation was chosen is probably that it is more suited for a pure functional language. As "Real World Haskell" puts it:
Maps give us the same capabilities as hash tables do in other languages. Internally, a map is implemented as a balanced binary tree. Compared to a hash table, this is a much more efficient representation in a language with immutable data. This is the most visible example of how deeply pure functional programming affects how we write code: we choose data structures and algorithms that we can express cleanly and that perform efficiently, but our choices for specific tasks are often different from their counterparts in imperative languages.
This:
and requires that the elements be in Ord.
does not seem like a big disadvantage. After all, with a hash map you need keys to be Hashable, which seems to be more restrictive.
In what applications would a binary tree be much worse than a hashtable? What about the other way around? Are there many cases in which one would be vastly preferable to the other? Is there a traditional hashtable in Haskell?
Unfortunately, I cannot provide an extensive comparative analysis, but there is a hash map package, and you can check out its implementation details and performance figures in this blog post and decide for yourself.
My answer to what the advantage of using binary trees is would be: range queries. They require, semantically, a total preorder, and profit algorithmically from a balanced search tree organization.
For simple lookup, I'm afraid there may only be good Haskell-specific answers, but not good answers per se: lookup (and indeed hashing) requires only a setoid (equality/equivalence on its key type), which supports efficient hashing on pointers (which, for good reasons, are not ordered in Haskell). Like various forms of tries (e.g. ternary tries for elementwise update, others for bulk updates), hashing into arrays (open or closed) is typically considerably more efficient than elementwise searching in binary trees, both space- and time-wise. Hashing and tries can be defined generically, though that has to be done by hand -- GHC doesn't derive it (yet?).
Data structures such as Data.Map tend to be fine for prototyping and for code outside of hotspots, but where they are hot they easily become a performance bottleneck. Luckily, Haskell programmers need not be concerned about performance, only their managers.
(For some reason I presently can't find a way to access the key redeeming feature of search trees amongst the 80+ Data.Map functions: a range query interface. Am I looking in the wrong place?)

How to implement an efficient excel-like app

I need to implement an efficient excel-like app.
I'm looking for a data structure that will:
Store the data in an efficient manner (for example, I don't want to pre-allocate memory for unused cells).
Allow efficient updates when the user changes a formula in one of the cells.
Any ideas?
Thanks,
Li
In this case, you're looking for an online dictionary structure. This is a category of structures which allow you to associate one morsel of data (in this case, the coordinates that represent the cell) with another (in this case, the cell contents or formula). The "online" adjective means dictionary entries can be added, removed, or changed in real time.
There are many such structures. To name some of the more common ones: hash tables, binary trees, skip lists, linked lists, and even lists in arrays.
Of course, some of these are more efficient than others (depending on implementation and the number of entries). Typically I use hash tables for this sort of problem.
However, if you need to do range queries ("modify all of the cells in this range"), you may be better off with a binary tree or a more complicated spatial structure -- but that seems unlikely given the simple requirements of the problem.
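As a rough sketch of the hash-table approach (all names here are illustrative, and formula evaluation is elided): only non-empty cells get an entry, and a reverse-dependency map confines recalculation to the affected cells.

    class Sheet:
        def __init__(self):
            self.cells = {}       # (row, col) -> value or formula; sparse by construction
            self.dependents = {}  # (row, col) -> set of cells whose formulas read it

        def set_cell(self, coord, content):
            self.cells[coord] = content
            self._recalculate(coord)

        def _recalculate(self, coord):
            # Only cells downstream of the edit are revisited; a real
            # implementation would evaluate in topological order and
            # detect circular references.
            for dep in self.dependents.get(coord, ()):
                # ... re-evaluate dep's formula against self.cells ...
                self._recalculate(dep)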

Pragmatic guy's confusion about the application of tree data-structure

Having been learning data structures and algorithms for a long time, I'm still uncertain about the practical applications of famous data structures such as the red-black tree and the splay tree.
I know that B-tree has been widely used in database stuff.
With respect to other tree data structures like the red-black tree, splay tree, etc.:
Have they been widely used in practice? If so, please give some examples.
Unlike the B-tree, whose structure can be retained and saved to disk, red-black and splay trees cannot achieve that; they are just in-memory structures, right? So how can they be as popular as the B-tree?
I know that B-tree has been widely used in database stuff.
That isn’t very specific, is it?
In fact, B-trees and red-black trees serve the exact same purpose: both are index data structures, more precisely search trees, i.e. data structures that allow you to efficiently search for an item in a collection.
The only relevant difference between red-black trees and B-trees is that the latter incorporate some additional factors that improve their caching behaviour, which is required when access to memory is particularly slow due to high latency (simply put, an average access to a B-tree requires less jumping around in memory than a red-black tree does, and more reading of adjacent memory locations, which is often much faster).
Historically, this has been used to store the index on a disk (secondary storage) which is very slow compared to main storage (RAM). Red-black trees, on the other hand, are often used when the index is retained in RAM (for example, the C++ std::map structure is usually implemented as a red-black tree).
This is going to change, though. Modern CPUs use caches to improve access to main memory further, and since access to RAM is much slower than access to the cache, B-trees (and their variants) once again become better suited than red-black trees.
Probably the most widely-used implementations of the red-black tree are the Java TreeMap and TreeSet library classes, used to implement sorted maps and sets of objects in a tree-like structure. Cadging a bit from this Wikipedia article, red-black trees require less reshuffling to be done on inserts and deletes, because they don't impose as stringent requirements on the fullness of the structure.
Many applications of sorted trees do not require the data structure to be written to disk. Often, data is received or generated in arbitrary order and sorted solely for the use of another part of the same program. At other times, data must be sorted before being output, but is then simply written as a flat file without conveying the tree structure. In any case, relatively few on-disk file formats are derived from simply writing the contents of memory to disk; storing data this way requires annoying pointer adjustments and, more importantly, makes the on-disk format depend on such details as the processor word size, system byte order, and word alignment. Data is far more commonly either written out as (perhaps compressed) text, or written to disk in a carefully-defined binary format. The only cases I can think of where any sorted tree is written to disk are databases and file systems, where the structure is loaded from disk into memory and used as is; in those cases, B-trees are indeed the preferred data structure.
My favourite example of practical usage is in CPU scheduling: the Completely Fair Scheduler, a task scheduler built around a red-black tree, shipped with the Linux 2.6.23 kernel. Of course there are plenty more examples, as has already been pointed out; this is just my personal favourite.
