I am trying to find out whether anyone has implemented binary search in the following way.
Suppose we have an array of elements placed in contiguous memory.
When you compare the middle element, the next few elements should already be in the CPU cache, so comparing against them should be essentially free, right?
Yet I cannot find anyone doing this.
If no one does it, what could be the reason?
Classic binary search can be thought of as defining a binary tree structure on the elements. For example, if your array has 15 elements numbered 1 through 15, you would start out by looking at the middle element "8", and from there go either left or right to element "4" or "12":
(Figure: the implicit binary tree over a 15-element array, from Brodal et al., "Cache oblivious search trees via binary trees of small height", SODA '02, PDF preprint.)
What you are proposing is essentially adding a few more elements into each node, so that the "8" element would also contain some adjacent elements e.g. "9", "10", "11". I don't think this has been studied extensively before, but there is another very related idea that has been studied, namely going from a binary tree (with two children on each node) to a B-ary tree ("B-tree", with B children on each node). This is where you have multiple elements in a node that split up the resulting data into many different ranges.
By comparing a search key to the B-1 different elements in a node, you can determine which of the B children to recurse into.
It's possible to rearrange the elements in an array so that this search structure can be realized without using pointers and multiple memory allocations, so that it becomes just as space-efficient as doing binary search in a regular sorted array, but with fewer "jumps" when doing the search. In a regular sorted array, binary search does roughly log_2(N) jumps, whereas search in a B-tree does only log_2(N) / log_2(B) = log_B(N) jumps, which is a factor log_2(B) fewer!
Sergey Slotin has written a blog post with a complete example of how to implement a static B-tree that is stored implicitly in an array: https://algorithmica.org/en/b-tree
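Slotin's post builds up to a tuned SIMD implementation; below is a much-simplified, unvetted sketch of the same layout idea (the class name, padding scheme, and scalar scan are my own choices). Node k holds B sorted keys in a flat array, and its B+1 children are found by index arithmetic instead of pointers:

```cpp
#include <climits>
#include <cstdio>
#include <vector>

constexpr int B = 16; // keys per node; 16 ints = 64 bytes = one cache line

struct StaticBTree {
    int nblocks;
    std::vector<int> t; // nblocks * B keys, padded with INT_MAX

    static int child(int k, int i) { return k * (B + 1) + i + 1; }

    explicit StaticBTree(const std::vector<int>& sorted)
        : nblocks(((int)sorted.size() + B - 1) / B), t(nblocks * B, INT_MAX) {
        size_t pos = 0;
        build(sorted, 0, pos);
    }

    // Fill the nodes by an in-order walk of the implicit tree, so that an
    // in-order traversal visits the keys in sorted order (padding keys,
    // being INT_MAX, all land at the logical end).
    void build(const std::vector<int>& a, int k, size_t& pos) {
        if (k >= nblocks) return;
        for (int i = 0; i < B; i++) {
            build(a, child(k, i), pos);
            if (pos < a.size()) t[k * B + i] = a[pos++];
        }
        build(a, child(k, B), pos);
    }

    // lower_bound: smallest key >= x, or INT_MAX if there is none.
    int lower_bound(int x) const {
        int k = 0, res = INT_MAX;
        while (k < nblocks) {
            int i = 0;
            while (i < B && t[k * B + i] < x) i++; // scan one cache line
            if (i < B) res = t[k * B + i];         // best candidate so far
            k = child(k, i);                       // descend
        }
        return res;
    }
};

int main() {
    std::vector<int> a;
    for (int i = 0; i < 1000; i++) a.push_back(2 * i); // even numbers
    StaticBTree tree(a);
    printf("%d %d\n", tree.lower_bound(777), tree.lower_bound(42)); // 778 42
}
```

Each node occupies exactly one cache line, so a search touches roughly log_17(N) lines instead of the log_2(N) a plain binary search can touch.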
I think your proposal is decent, although it is slightly more complicated than you might think at first. It will only work for situations where the data under comparison is in the contiguous array itself (rather than having the array hold pointers). This is less common in higher-level languages.
From there it gets more technical. Are you going to go for the L1 cache or the L2 cache? It's probably only "free" if the data is in the L1 cache. Are you going to query the CPU for the cache line size, or are you going to use the specs of a common architecture? Do your OS and architecture let you determine your position within a cache line from the address modulus? (I assume they do, but I don't know that for sure.)
You're trying to go faster than the plain O(log n) binary search. How many jumps can you actually save with this approach? If your next jump lands in the same cache line, you've done extra work by iterating. It only helps when your target is in your cache line but your next jump is not. An L1 cache line might hold 8 (8-byte) or maybe 16 (4-byte) values, only half of which are usable if we land in the middle of the cache line on average.
I think at this point it becomes data-dependent. Can you predict that your target is "close" to your current spot? And by close, I mean within 16 values? To be that close means you're within log2(16)=4 close jumps, and the odds are that some of those jumps are in the cache. All of them are probably in the L2 cache. After all this, I think it's not worth it; you won't be any faster.
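For what it's worth, here is a minimal sketch of what the question proposes, under the assumptions called out above (a 64-byte line, 4-byte elements, and no attempt to query the hardware): do the classic halving while the range is large, then finish with a linear scan once the remaining range fits in roughly one cache line.

```cpp
#include <vector>

constexpr size_t LINE_ELEMS = 64 / sizeof(int); // 16 ints per 64-byte line

// Returns the index of the first element >= x (lower bound).
size_t cache_line_search(const std::vector<int>& a, int x) {
    size_t lo = 0, hi = a.size();
    while (hi - lo > LINE_ELEMS) {         // classic halving while "far"
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < x) lo = mid + 1; else hi = mid;
    }
    while (lo < hi && a[lo] < x) lo++;     // scan the (hopefully cached) tail
    return lo;
}
```

As argued above, the scan only wins when those tail elements actually share a line with something already fetched, which depends on alignment; measuring is the only way to settle it.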
Related
I recently heard about ternary search, in which we divide an array into 3 parts and compare. There will be two comparisons per step, but it reduces the array to n/3. Why don't people use this much?
Actually, people do use k-ary trees for arbitrary k.
This is, however, a tradeoff.
To find an element in a k-ary tree, you need around (k-1)*ln(N)/ln(k) comparisons in the worst case: log_k(N) levels, with up to k-1 comparisons per node (remember the change-of-base formula). For k >= 2 this grows with k: the larger your k is, the more overall comparisons you need.
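To put rough numbers on that (my own arithmetic, using the worst case of k-1 comparisons per node):

$$
(k-1)\log_k N \quad \text{for } N = 10^6: \qquad
k=2:\; 1 \cdot 19.9 \approx 20, \qquad
k=3:\; 2 \cdot 12.6 \approx 25, \qquad
k=10:\; 9 \cdot 6 = 54
$$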
The logical extension of what you are saying is "why don't people use an N-ary tree for N data elements?". Which, of course, would be an array.
A ternary search still gives you the same asymptotic O(log N) search time, and it adds complexity to the implementation.
The same argument can be said for why you would not want a quad search or any other higher order.
Searching 1 billion (a US billion - 1,000,000,000) sorted items would take about 30 compares with binary search and about 19 levels with a ternary search - and since each 'ternary compare' might involve 2 actual comparisons, that's no advantage at all.
Wow. The top voted answers miss the boat on this one, I think.
Your CPU doesn't support ternary logic as a single operation; it breaks ternary logic into several steps of binary logic. The most optimal code for the CPU is binary logic. If chips were common that supported ternary logic as a single operation, you'd be right.
B-Trees can have multiple branches at each node; an order-3 B-tree is ternary logic. Each step down the tree will take two comparisons instead of one, and this will probably cause it to be slower in CPU time.
B-Trees, however, are pretty common. If you assume that every node in the tree will be stored somewhere separately on disk, you're going to spend most of your time reading from disk... the CPU won't be the bottleneck, but the disk will be. So you take a B-tree with 100,000 children per node, or whatever will barely fit into one block of memory. B-trees with that kind of branching factor would rarely be more than three nodes high (100,000^3 = 10^15 items), and you'd only have three disk reads - three stops at a bottleneck - to search an enormous, enormous dataset.
Reviewing:
Ternary trees aren't supported by hardware, so they run less quickly.
B-trees with orders much, much higher than 3 are common for disk-optimization of large datasets; once you've decided to go past 2, go much higher than 3.
The only way a ternary search can be faster than a binary search is if a 3-way partition determination can be done for less than about 1.58 times (log2 3) the cost of a 2-way comparison. If the items are stored in a sorted array, the 3-way determination will on average be about 1.67 times (5/3) as expensive as a 2-way determination. If information is stored in a tree, however, where the cost of fetching information is high relative to the cost of actually comparing, and where cache locality means that randomly fetching a pair of related data is not much more expensive than fetching a single datum, then a ternary or n-way tree may improve efficiency greatly.
What makes you think ternary search should be faster?
Average number of comparisons:
in ternary search = ((1/3)*1 + (2/3)*2) * ln(n)/ln(3) ~ 1.517*ln(n)
in binary search = 1 * ln(n)/ln(2) ~ 1.443*ln(n).
Worst number of comparisons:
in ternary search = 2 * ln(n)/ln(3) ~ 1.820*ln(n)
in binary search = 1 * ln(n)/ln(2) ~ 1.443*ln(n).
So it looks like ternary search is worse.
Also, note that this sequence generalizes to linear search if we keep going:

Binary search
Ternary search
...
n-ary search ≡ linear search

So in an n-ary search, we do "only one COMPARE", but that compare might take up to n actual comparisons.
"Terinary" (ternary?) search is more efficient in the best case, which would involve searching for the first element (or perhaps the last, depending on which comparison you do first). For elements farther from the end you're checking first, while two comparisons would narrow the array by 2/3 each time, the same two comparisons with binary search would narrow the search space by 3/4.
Add to that, binary search is simpler: you just compare and take one half or the other, rather than compare, then if less take the first third, else compare again, then if less take the second third, else take the last third.
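To see that concretely, here is an unpolished sketch of both loops (not benchmarked; note the extra comparison and branch per iteration in the ternary version):

```cpp
#include <vector>

int binary_search(const std::vector<int>& a, int x) {
    int lo = 0, hi = (int)a.size() - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] == x) return mid;
        if (a[mid] < x) lo = mid + 1; else hi = mid - 1;
    }
    return -1; // not found
}

int ternary_search(const std::vector<int>& a, int x) {
    int lo = 0, hi = (int)a.size() - 1;
    while (lo <= hi) {
        int m1 = lo + (hi - lo) / 3;         // first cut point
        int m2 = hi - (hi - lo) / 3;         // second cut point
        if (a[m1] == x) return m1;           // comparison 1
        if (a[m2] == x) return m2;           // comparison 2
        if (x < a[m1])      hi = m1 - 1;     // first third
        else if (x > a[m2]) lo = m2 + 1;     // last third
        else { lo = m1 + 1; hi = m2 - 1; }   // middle third
    }
    return -1; // not found
}
```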
Ternary search can be used effectively on parallel architectures - FPGAs and ASICs. For example, if the internal FPGA memory required for the search is less than half of the FPGA's resources, you can make a duplicate memory block. This allows two different memory addresses to be accessed simultaneously, so all the comparisons can be done in a single clock cycle. This is one of the reasons why a 100 MHz FPGA can sometimes outperform a 4 GHz CPU :)
Here's some random experimental evidence that I haven't vetted at all showing that it's slower than binary search.
Almost all textbooks and websites on binary search trees do not really talk about binary trees! They show you ternary search trees! True binary trees store data in their leaves, not in internal nodes (except for keys used to navigate). Some call these leaf trees and distinguish them from the node trees shown in textbooks:

J. Nievergelt, C.-K. Wong: Upper Bounds for the Total Path Length of Binary Trees, Journal ACM 20 (1973) 1-6.
The following about this is from Peter Brass's book on data structures.
2.1 Two Models of Search Trees
In the outline just given, we suppressed an important point that at first seems trivial, but indeed it leads to two different models of search trees, either of which can be combined with much of the following material, but one of which is strongly preferable.

If we compare in each node the query key with the key contained in the node and follow the left branch if the query key is smaller and the right branch if the query key is larger, then what happens if they are equal? The two models of search trees are as follows:
1. Take the left branch if the query key is smaller than the node key; otherwise take the right branch, until you reach a leaf of the tree. The keys in the interior nodes of the tree are only for comparison; all the objects are in the leaves.

2. Take the left branch if the query key is smaller than the node key; take the right branch if the query key is larger than the node key; and take the object contained in the node if they are equal.
This minor point has a number of consequences:
- In model 1, the underlying tree is a binary tree, whereas in model 2, each tree node is really a ternary node with a special middle neighbor.

- In model 1, each interior node has a left and a right subtree (each possibly a leaf node of the tree), whereas in model 2, we have to allow incomplete nodes, where the left or right subtree might be missing, and only the comparison object and key are guaranteed to exist. So the structure of a search tree of model 1 is more regular than that of a tree of model 2; this is, at least for the implementation, a clear advantage.

- In model 1, traversing an interior node requires only one comparison, whereas in model 2, we need two comparisons to check the three possibilities.
Indeed, trees of the same height in models 1 and 2 contain at most approximately the same number of objects, but one needs twice as many comparisons in model 2 to reach the deepest objects of the tree. Of course, in model 2, there are also some objects that are reached much earlier; the object in the root is found with only two comparisons, but almost all objects are on or near the deepest level.

Theorem. A tree of height h and model 1 contains at most 2^h objects. A tree of height h and model 2 contains at most 2^(h+1) - 1 objects.

This is easily seen because the tree of height h has as left and right subtrees a tree of height at most h - 1 each, and in model 2 one additional object between them.
- In model 1, keys in interior nodes serve only for comparisons and may reappear in the leaves for the identification of the objects. In model 2, each key appears only once, together with its object.

It is even possible in model 1 that there are keys used for comparison that do not belong to any object, for example, if the object has been deleted. By conceptually separating these functions of comparison and identification, this is not surprising, and in later structures we might even need to define artificial tests not corresponding to any object, just to get a good division of the search space. All keys used for comparison are necessarily distinct because in a model 1 tree, each interior node has nonempty left and right subtrees. So each key occurs at most twice, once as comparison key and once as identification key in the leaf.
Model 2 became the preferred textbook version because in most textbooks the distinction between object and its key is not made: the key is the object. Then it becomes unnatural to duplicate the key in the tree structure. But in all real applications, the distinction between key and object is quite important. One almost never wishes to keep track of just a set of numbers; the numbers are normally associated with some further information, which is often much larger than the key itself.
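To make the excerpt's comparison-count point concrete, here is a rough toy version of the two search loops (minimal types of my own, not from the book):

```cpp
#include <optional>

// Model 1 ("leaf tree"): interior nodes hold comparison keys only and
// always have both children; the objects (here just keys) are in leaves.
struct Node1 {
    int key;                                  // comparison key, or leaf key
    Node1 *left = nullptr, *right = nullptr;  // both null => leaf
};

std::optional<int> find_model1(const Node1* n, int k) {
    while (n->left)                            // one comparison per level
        n = (k < n->key) ? n->left : n->right;
    return (n->key == k) ? std::optional<int>(n->key) : std::nullopt;
}

// Model 2 (the usual textbook BST): an object in every node, and two
// comparisons per level to distinguish the three possibilities.
struct Node2 {
    int key;
    Node2 *left = nullptr, *right = nullptr;
};

std::optional<int> find_model2(const Node2* n, int k) {
    while (n) {
        if (k == n->key) return n->key;          // comparison 1
        n = (k < n->key) ? n->left : n->right;   // comparison 2
    }
    return std::nullopt;
}
```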
You may have heard of ternary search in those riddles that involve weighing things on balance scales. A scale can return 3 answers: the left side is lighter, both are the same, or the left side is heavier. So with such a scale, each step of a ternary search really does take only 1 comparison.
However, computers use boolean logic, which only has 2 answers. To do the ternary search, you'd actually have to do 2 comparisons instead of 1.
I guess there are some cases where this is still faster as earlier posters mentioned, but you can see that ternary search isn't always better, and it's more confusing and less natural to implement on a computer.
Theoretically the minimum of k/ln(k) is achieved at k = e, and since 3 is closer to e than 2 is, 3 requires fewer comparisons: you can check that 3/ln(3) ≈ 2.73 and 2/ln(2) ≈ 2.89. The reason binary search could still be faster is that its code has fewer branches and runs faster on modern CPUs.
I have just posted a blog about ternary search showing some results, and I have provided some initial implementations in my git repo. I totally agree with everyone about the theory behind ternary search, but why not give it a try? The implementation is easy enough.
I found that if you have a huge data set and you need to search it many times, ternary search can have an advantage.
If you think you can do better with a ternary search, go for it.
Although you get the same big-O complexity O(log n) in both search trees, the difference is in the constants: you have to do more comparisons per level in a ternary search tree. The difference boils down to (k-1)/ln(k) for a k-ary search tree, which grows with k, so k = 2 provides the optimal result.
I have multiple robots which explore an occupancy grid using some algorithm, and I am trying to save the order of explored nodes. But I am not sure which data structure can store it efficiently.

I first thought of a tree, but the order can repeat, like 1, 2, 5, 1, so storing such an order in tree form seems too complex. Then I thought of an array, but it can be too expensive in memory for large grids.

I am a bit confused now. What data structure would be better (suppose the grid has 10,000 nodes)? Note that the sequence of explored nodes will be longer than 10,000 in this case, since there will be overlap.

Thanks!
A tree makes little sense here, given the need to preserve insertion order and to allow duplicates. Basically, as I understand it, we want to store the path the robot has traveled in the tightest form we can.
A compact, contiguous kind of sequence ends up making the most sense here (array, e.g.). It's cheaper than any linked structure (tree included) since there are no links to store.
There's little we can do to compact memory usage any further.
However, an unrolled list might be helpful here. Since it's not one giant contiguous block and instead a series of smaller blocks (ex: 4 kilobytes each) linked together, you can start, say, off-loading blocks at the front of the list to disk if you want to reduce memory use. The link overhead is trivial since we're only storing a link every N elements, where N could be some large number.
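A minimal sketch of that idea (the block size and all names are my own choices, and the disk off-loading itself is not shown):

```cpp
#include <array>
#include <memory>

constexpr size_t N = 1024; // entries per block; 1024 ints ~ 4 KB

struct Block {
    std::array<int, N> ids;      // visited node ids, in visit order
    size_t count = 0;
    std::unique_ptr<Block> next; // link to the next block, if any
};

struct VisitLog {
    std::unique_ptr<Block> head = std::make_unique<Block>();
    Block* tail = head.get();

    void push(int node_id) {
        if (tail->count == N) {                 // current block is full:
            tail->next = std::make_unique<Block>();
            tail = tail->next.get();            // chain a fresh one
        }
        tail->ids[tail->count++] = node_id;     // duplicates are fine
    }
};
```

One pointer per 1024 entries makes the link overhead about 0.1%, while blocks at the front can still be serialized to disk independently.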
I know this is a common question and I saw a few threads on Stack Overflow, but I still couldn't get it.
Here is an accepted answer from Stack overflow:
" Disk seeks are expensive. B-Tree structure is designed specifically to
avoid disk seeks as much as possible. Therefore B-Tree packs much more
keys/pointers into a single node than a binary tree. This property
makes the tree very flat. Usually most B-Trees are only 3 or 4 levels
deep and the root node can be easily cached. This requires only 2-3
seeks to find anything in the tree. Leaves are also "packed" this way,
so iterating a tree (e.g. full scan or range scan) is very efficient,
because you read hundreds/thousands data-rows per single block (seek).
In binary tree of the same capacity, you'd have several tens of levels
and sequential visiting every single value would require at least one
seek. "
I understand that a B-tree has more keys per node (higher order) than a BST, so it's definitely flatter and shallower than a BST.

But these nodes are again stored as linked lists, right?

I don't understand when they say that the keys are read as a block, thereby minimizing the number of I/Os.

Doesn't the same argument hold for BSTs too, except that the links point downwards?

Could someone please explain this to me?
I understand that a B-tree has more keys per node (higher order) than a BST, so it's definitely flatter and shallower than a BST. I don't understand when they say that the keys are read as a block, thereby minimizing the number of I/Os.

Doesn't the same argument hold for BSTs too, except that the links point downwards?
Basically, the idea behind using a B+tree in file systems is to reduce the number of disk reads. Imagine that all the blocks in a drive are stored as a sequentially allocated array. In order to search for a specific block you would have to do a linear scan and it would take O(n) every time to find a block. Right?
Now, imagine that you got smart and decided to use a BST, great! You would store all your blocks in a BST, and that would take roughly O(log(n)) to find a block. Remember that every branch is a disk access, which is highly expensive!

But we can do better! The problem now is that a BST is really "tall". Because every node has a fanout (number of children) of only 2, if we had to store N objects, our tree would be on the order of log(N) tall. So we would have to perform up to log(N) accesses to find our leaves.

The idea behind the B+tree structure is to increase the fanout, reducing the height of the tree and thus reducing the number of disk accesses we have to make in order to find a leaf. Remember that every branch is a disk access. For instance, if you pack X keys into a node of a B+tree, every node will point to at most X+1 children.

Also, remember that a B+tree is structured in a way that only the leaves store the actual data. That way, you can pack more keys into the internal nodes in order to fill up one disk block, which, for instance, stores one node of the B+tree. The more keys you pack into a node, the more children it will point to and the shorter your tree will be, thus reducing the number of disk accesses needed to find one leaf.
But these nodes are again stored as linked lists, right?
Also, in a B+tree structure, sometimes the leaves are stored in a linked list fashion. Remember that only the leaves store the actual data. That way, with the linked list idea, when you have to perform a sequential access after finding one block you would do it faster than having to traverse the tree again in order to find the next block, right? The problem is that you still have to find the first block! And for that, the B+tree is way better than the linked list.
If all the accesses were sequential and started at the first block of the disk, an array would be better than the linked list, because with a linked list you still have to deal with the pointers.
But, the majority of disk accesses, according to Tanenbaum, are not sequential and are accesses to files of small sizes (like 4KB or less). Imagine the time it would take if you had to traverse a linked list every time to access one block of 4KB...
This article explains it way better than me and uses pictures as well:
https://loveforprogramming.quora.com/Memory-locality-the-magic-of-B-Trees
A B-tree node is essentially an array of {key, link} pairs, of a fixed size, which is read in one chunk, typically some number of disk blocks. The links all point downwards. At the bottom layer the links point to the associated records (assuming a B+-tree, as in any practical implementation).
I don't know where you got the linked list idea from.
Each node in a B-tree implemented in disk storage consists of a disk block (normally a handful of kilobytes) full of keys and "pointers" that are accessed as an array and not - as you said - a linked list. The block size is normally file-system dependent and chosen to use the file system's read and write operations efficiently. The pointers are not normal memory pointers, but rather disk addresses, again chosen to be easily used by the supporting file system.
The main reason for the B-tree is how it behaves under changes. If your structure is static, a BST is OK - but in that case a hash table is even better. For file systems, you want a structure that changes as little as possible as a whole on inserts or deletes, and where a find operation needs as few reads as possible - and B-trees have exactly these properties.
I'm studying B+trees for indexing, and I'm trying to understand more than just memorizing the structure. As far as I understand, the inner nodes of a B+tree form an index on the leaves, and the leaves contain pointers to where the data is stored on disk. Correct? Then how are lookups made? If a B+tree is so much better than a binary tree, why don't we use B+trees instead of binary trees everywhere?

I read the Wikipedia article on B+trees and I understand the structure, but not how an actual lookup is performed. Could you guide me, perhaps with some link to reading material?

What are some other uses of B+trees besides database indexing?
I'm studying B+trees for indexing, and I'm trying to understand more than just memorizing the structure. As far as I understand, the inner nodes of a B+tree form an index on the leaves, and the leaves contain pointers to where the data is stored on disk. Correct?
Not always: the index is indeed formed by the inner nodes (non-leaves), but depending on the implementation the leaves may contain either key/value pairs or key/pointer-to-value pairs. For example, a database index uses the latter, unless it is an IOT (Index Organized Table), in which case the values are inlined in the leaves. This depends mainly on whether the value is insanely large relative to the key.
Then how are lookups made?
In the general case where the root node is not a leaf (it does happen, at first), the root node contains a sorted array of N keys and N+1 pointers. You binary search for the two keys S0 and S1 such that S0 <= K < S1 (where K is what you are looking for) and this gives you the pointer to the next node.
You repeat the process until you (finally) hit a leaf node, which contains a sorted list of key/value pairs, and make a last binary-search pass over those.
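Here is a loose in-memory sketch of that descent (all types and names are invented for illustration; a real implementation would fetch disk blocks rather than follow heap pointers):

```cpp
#include <algorithm>
#include <vector>

struct Node {
    bool is_leaf = false;
    std::vector<int> keys;        // sorted
    std::vector<Node*> children;  // inner node: keys.size() + 1 entries
    std::vector<long> values;     // leaf node: parallel to keys
};

long lookup(const Node* root, int K) {
    const Node* n = root;
    while (!n->is_leaf) {
        // The index of the first key > K is exactly the child whose
        // half-open key range contains K.
        size_t i = std::upper_bound(n->keys.begin(), n->keys.end(), K)
                   - n->keys.begin();
        n = n->children[i];
    }
    // The last binary-search pass, over the leaf's sorted keys.
    auto it = std::lower_bound(n->keys.begin(), n->keys.end(), K);
    if (it != n->keys.end() && *it == K)
        return n->values[it - n->keys.begin()];
    return -1; // not found
}
```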
If a B+tree is so much better than a binary tree, why don't we use B+trees instead of binary trees everywhere?
Binary trees are simpler to implement. One tough cookie with B+trees is sizing the number of keys/pointers in inner nodes and the number of key/value pairs in leaf nodes. Another tough cookie is deciding on the low and high watermarks that trigger merging two nodes or splitting one.
Binary trees also offer memory stability: an inserted element is not moved in memory at all. On the other hand, inserting an element into a B+tree, or removing one, is likely to shuffle elements around.
B+trees are tailored for small-key/large-value cases. They also require that keys can be duplicated (hopefully cheaply).
Could you guide me perhaps with some link to reading material?
I hope the rough algorithm I explained helped out, otherwise feel free to ask in the comments.
What are some other uses of B+ trees besides database indexing?
In the same vein: file-system indexing also benefits.
The idea is always the same: a B+Tree is really great with small keys/large values and caching. The idea is to have all the keys (inner nodes) in your fast memory (CPU Cache >> RAM >> Disk), and the B+Tree achieves that for large collections by pushing keys to the bottom. With all inner nodes in the fast memory, you only have one slow memory access at each search (to fetch the value).
B+trees are better than binary trees; all the DBMSs use them.
A lookup in a B+tree is O(log_F N), where the base F of the logarithm is the fanout. The lookup is performed exactly as in a binary tree, but with a bigger fanout and lower height - that's why it is way better.
B+trees are usually known for keeping the data in the leaves (though if the index is unclustered, probably not); this means you don't have to make another jump to the disk to get the data, you just take it from the leaf.
B+trees are used almost everywhere: operating systems use them, data warehouses (not so much there, but still), lots of applications.
B+trees are perfect for range queries, and are used whenever you have unique values, like a primary key, or any field with low cardinality.
If you can get this book http://www.amazon.com/Database-Management-Systems-Raghu-Ramakrishnan/dp/0072465638 it's one of the best. It's basically the bible for any database guy.
I read this on wikipedia:
In B-trees, internal (non-leaf) nodes can have a variable number of child nodes within some pre-defined range. When data is inserted or removed from a node, its number of child nodes changes. In order to maintain the pre-defined range, internal nodes may be joined or split. Because a range of child nodes is permitted, B-trees do not need re-balancing as frequently as other self-balancing search trees, but may waste some space, since nodes are not entirely full.
We have to specify this range for B-trees. Even when I looked it up in CLRS (Introduction to Algorithms), it seemed to use arrays for the keys and children. My question is: is there any way to reduce this wasted space by defining the keys and children as lists instead of preallocated arrays? Is this too much of a hassle?
Also, for the life of me, I'm not able to find decent pseudocode for B-tree node deletion. Any help here is appreciated too.
When you say "lists", do you mean linked lists?
An array of some kind of element takes up one element's worth of memory per slot, whether that slot is filled or not. A linked list only takes up memory for elements it actually contains, but for each one, it takes up one element's worth of memory, plus the size of one pointer (two if it's a doubly-linked list, unless you can use the xor trick to overlap them).
If you are storing pointers, and using a singly-linked list, then each list link is twice the size of each array slot. That means that unless the list is less than half full, a linked list will use more memory, not less.
If you're using a language whose runtime has per-object overhead (like Java, and like C unless you are handling memory allocation yourself), then you will also have to pay for that overhead on each list link, but only once on an array, and the ratio is even worse.
I would suggest that your balancing algorithm should keep tree nodes at least half full. If you split a node when it is full, you will create two half-full nodes. You then need to merge adjacent nodes when they are less than half full. You can then use an array, safe in the knowledge that it is more efficient than a linked list.
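As a toy illustration of that invariant (this is not the exact CLRS procedure, just the split step in isolation):

```cpp
#include <vector>

struct BNode {
    std::vector<int> keys; // a "full" node holds 2t-1 keys
};

// Split a full node around its median key. The median moves up to the
// parent (returned via median_out); both halves end up half full, so an
// array-backed node never drops below half use as a result of a split.
BNode split(BNode& full, int& median_out) {
    size_t mid = full.keys.size() / 2;
    median_out = full.keys[mid];
    BNode right;
    right.keys.assign(full.keys.begin() + mid + 1, full.keys.end());
    full.keys.resize(mid); // left half stays in place
    return right;
}
```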
No idea about the details of deletion, sorry!
A B-tree node has an important characteristic: all keys in the node are sorted. When looking for a specific key, binary search is used to find the right position, and it is this binary search within the nodes that keeps the complexity of the search algorithm in a B-tree at O(log n).
If you replace the preallocated array with some kind of linked list, you lose random access, so binary search within a node no longer works. Unless you use some more complex data structure, like a skip list, you cannot keep the search at O(log n) - but that's totally unnecessary, since at that point a skip list by itself would be better.