Why does double hashing have bad cache performance?

The Wikipedia article on Open Addressing says:
linear probing has the best cache performance but is most sensitive to clustering, while double hashing has poor cache performance but exhibits virtually no clustering; quadratic probing falls in-between in both areas
From what I understand, chaining isn't cache friendly because it uses linked lists, which are themselves bad for caches. Linear probing, on the other hand, uses open addressing (closed hashing) and works well with caches.
But why does double hashing have bad cache performance? There are no linked lists involved, and it's still closed hashing.
My understanding of the topic is limited; what am I overlooking?
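To make the contrast concrete, here is a minimal sketch (the table size, hash values, and stride are invented purely for illustration) of the probe sequences the two schemes generate. Linear probing visits consecutive slots, while double hashing's key-dependent stride scatters the probes across the table:

    TABLE_SIZE = 16   # made-up size, purely for illustration

    def linear_probes(h, n):
        # i-th probe is the slot right after the previous one
        return [(h + i) % TABLE_SIZE for i in range(n)]

    def double_hash_probes(h1, h2, n):
        # i-th probe jumps forward by a key-dependent stride h2
        return [(h1 + i * h2) % TABLE_SIZE for i in range(n)]

    print(linear_probes(3, 5))          # [3, 4, 5, 6, 7] -- adjacent slots,
                                        # often on the same cache line
    print(double_hash_probes(3, 7, 5))  # [3, 10, 1, 8, 15] -- scattered slots,
                                        # each probe a likely cache miss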

Related

Performance comparison between in-place vs out-of-place matrix transposition

Wikipedia has an extensive summary of methods for in-place matrix transposition.
These methods look hard to implement. Before committing to implementing them, I'm wondering if there are any benchmarks or other evidence showing that these techniques outperform out-of-place matrix transposition in terms of wall time (on CPU, GPU, or any architecture)?
Do they all run slower compared to using an out-of-place transpose where the data is copied to a different destination?
In short, I don't think I have ever seen an in-place exchange being faster than an out-of-place one. Just think about when you last chose an in-place sort instead of the out-of-place version.
But the reason to use in-place is usually memory or allocation considerations, in which case you are bound to use it.
For an out-of-place exchange, you also have to think about cache, cache, and cache: subdivide your problem into smaller parts until both target and source can comfortably be in the cache at the same time.
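To illustrate that subdivision, here is a minimal sketch of a tiled out-of-place transpose (the block size and the Python rendering are my own; in a compiled language the same loop structure is where the cache win actually shows up, since Python's interpreter overhead swamps it):

    def transpose_blocked(src, n, block=64):
        """Out-of-place transpose of an n x n row-major matrix, done in
        block x block tiles so that the tile being read from src and the
        tile being written to dst can both sit in cache at the same time."""
        dst = [[0] * n for _ in range(n)]
        for bi in range(0, n, block):            # walk the matrix tile by tile
            for bj in range(0, n, block):
                for i in range(bi, min(bi + block, n)):
                    for j in range(bj, min(bj + block, n)):
                        dst[j][i] = src[i][j]
        return dst

    a = [[i * 4 + j for j in range(4)] for i in range(4)]
    print(transpose_blocked(a, 4, block=2))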

T-Tree or B-Tree

The T-tree algorithm is described in this paper.
The T*-tree is an improvement on the T-tree that better supports query operations, including range queries, while retaining all the other good features of the T-tree. This algorithm is described in the paper "T*-tree: A Main Memory Database Index Structure for Real-Time Applications".
According to this research, the T-tree is faster than the B-tree/B+tree when the dataset fits in memory.
I implemented the T-tree/T*-tree as described in these papers and compared their performance with the B-tree/B+tree, but the B-tree/B+tree performed better in all test cases (insertion, deletion, searching).
I have read that the T-tree is an efficient index structure for in-memory databases and that it is used by Oracle TimesTen, but my results did not show that.
If anyone knows the reason or has any comment about this, I would be glad to hear it.
T-Trees are not a fundamental data structure in the same sense that AVL trees or B-trees are. They are just a hacked version of balanced binary trees and as such there may or may not be niche applications where they offer decent performance.
In this day and age they are bound to suffer horribly because of their poor locality, both in the sense of expected block/page transfer counts and in the sense of cache locality. The latter is evident since in all node accesses of a search except for the very last one, only the boundary values will be checked against the search key - all the rest is paged in or cached for nought.
Compare this to the excellent access locality of B-trees in general and B+trees in particular (not to mention cache-oblivious and cache-conscious versions that were designed explicitly with memory performance characteristics in mind).
Similar problems exist with the rebalancing. In the B-tree world many variations - starting with B+ and Blink - have been developed and perfected in order to achieve desired amortised performance characteristics, including aspects like concurrency (locking/latching) or the absence thereof. So most of the time you can simply go out and find a B-tree variation that fits your performance profile - or use the simple classic B+tree and be certain of decent results.
T-trees are more complicated than comparable B-trees and it seems that they have nothing to offer in the way of performance in general, given that the times of commodity hardware with a single-level memory 'hierarchy' have been gone for decades. Not only is the hard disk the new memory, the converse is also true and main memory is the new hard disk now. I.e. even without NUMA the cost of bringing data from main memory into the cache hierarchy is so high that it pays to minimise page transfers - which is precisely what B-trees and their variations do and the T-tree doesn't. Closer to the processor core it's the number of cache line accesses/transfers that matters but the picture remains the same.
In fact, if you take the idea of binary search - which is provably optimal - and think about ways of arranging the search keys in a manner that plays well with memory hierarchies (caches) then you invariably end up with something that looks uncannily like a B-tree...
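To make that concrete, here is a minimal sketch (my own, not from any paper) of the first step on that road: binary search over a sorted array re-arranged into BFS (Eytzinger) order, so that the first few comparisons of every search touch the same few cache lines. Widen each node from one key to a cache line's worth of keys and you have, in essence, a B-tree:

    def to_eytzinger(sorted_keys):
        """Re-arrange a sorted array into BFS (level) order: the hot top
        levels of the search tree end up packed together in memory."""
        out = [None] * len(sorted_keys)
        it = iter(sorted_keys)
        def fill(i):
            if i < len(out):
                fill(2 * i + 1)        # left subtree = smaller keys
                out[i] = next(it)
                fill(2 * i + 2)        # right subtree = larger keys
        fill(0)
        return out

    def eytzinger_search(a, key):
        i = 0
        while i < len(a):
            if a[i] == key:
                return True
            i = 2 * i + 1 if key < a[i] else 2 * i + 2
        return False

    tree = to_eytzinger(list(range(1, 16)))
    # tree == [8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]
    print(eytzinger_search(tree, 11))   # True
    print(eytzinger_search(tree, 42))   # False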
If you program for performance then you'll find that winners are almost always located somewhere in the triangle between sorted arrays, B-trees and hashing. Even balanced binary trees are only competitive if their comparatively poor performance takes the back seat in the face of other considerations and key counts are fairly small, i.e. not more than a couple million.

Nearest Neighbour - Locality Sensitive Hashing Disadvantage

Locality sensitive hashing seems like a great technique for KNNs without any disadvantages. However, what would be a disadvantage of locality sensitive hashing if someone is using it in industry for practical applications? Under what situations will the LSH fail or do somewhat badly? Or does it take long time to code/tune?
This is a rather broad question, but since you are new here, I will attempt to answer.
LSH is not as perfect as you describe; do search for papers about it. Maybe this question can help: How to understand Locality Sensitive Hashing?
There are many LSH libraries that provide automatic parameter configuration, but not for the most important parameter, R, which is used in solving a randomized version of the R-near neighbor problem. This is a major drawback, since the user has to identify R manually for every input. That, in my opinion, is a very important aspect to take into account when it comes to practical applications.
As for performance, it all depends on your input! For example, in my kd-GeRaF project I tested LSH thoroughly and saw that it can have significant issues with accuracy and search speed. The datasets in question lived in high-dimensional spaces, where ANNS (approximate nearest neighbor search) was performed.
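For readers unfamiliar with the technique, here is a minimal sketch of one classic LSH family (random-hyperplane hashing for cosine similarity; the names and parameter values are mine, chosen for illustration). Note that n_bits and the number of hash tables are exactly the kind of parameters that need per-dataset tuning, which is the drawback described above:

    import random

    def make_lsh(dim, n_bits, seed=42):
        """Random-hyperplane LSH: each bit records which side of a random
        hyperplane the vector falls on, so vectors pointing in similar
        directions tend to land in the same bucket."""
        rng = random.Random(seed)
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(n_bits)]
        def bucket(v):
            return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0)
                         for plane in planes)
        return bucket

    bucket = make_lsh(dim=3, n_bits=8)
    print(bucket([1.0, 0.9, 0.1]))
    print(bucket([1.0, 1.0, 0.0]))   # similar direction: likely same bucket
    print(bucket([-1.0, 0.2, 5.0]))  # different direction: likely another bucket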

Why do we use linear probing in hash tables when there is separate chaining linked with lists?

I recently learned about different methods to deal with collisions in hash tables and saw it claimed that separate chaining with linked lists is always more time efficient than linear probing. For space efficiency, we allocate memory up front for linear probing, which we might end up not using, whereas for separate chaining we allocate memory dynamically.
Is separate chaining with linked lists more efficient than linear probing? If so, why do we use linear probing at all?
I'm surprised that you saw chained hashing to be faster than linear probing - in practice, linear probing is typically significantly faster than chaining. This is primarily due to locality of reference, since the accesses performed in linear probing tend to be closer in memory than the accesses performed in chained hashing.
There are other wins in linear probing. For example, insertions into a linear probing hash table don't require any new allocations (unless you're rehashing the table), so in applications like network routers where memory is scarce, it's nice to know that once the table is set up, the elements can be placed into it with no risk of a malloc fail.
One weakness of linear probing is that, with a bad choice of hash function, primary clustering can cause the performance of the table to degrade significantly. While chained hashing can still suffer from bad hash functions, it's less sensitive to elements with nearby hash codes, which don't adversely impact the runtime. Theoretically, linear probing only gives expected O(1) lookups if the hash functions are 5-independent or if there's sufficient entropy in the keys. There are many ways to address this, such as using the Robin Hood hashing technique or hopscotch hashing, both of which have significantly better worst cases than vanilla linear probing.
The other weakness of linear probing is that its performance significantly degrades as the load factor approaches 1. You can address this either by rehashing periodically or by using the Robin Hood hashing technique described above.
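To make the no-allocation point concrete, here is a hypothetical minimal linear-probing map (a sketch of my own: it omits deletion/tombstones and resizing, and assumes the table never fills up). After construction, put never allocates a node; it just writes into the flat array:

    class LinearProbingMap:
        """Hypothetical minimal open-addressing map. Omits deletion
        (tombstones) and resizing; assumes the table never fills up."""
        _EMPTY = object()

        def __init__(self, capacity=16):
            # One flat array of slots -- the only allocation the table makes.
            self._slots = [LinearProbingMap._EMPTY] * capacity

        def _probe(self, key):
            # Walk forward from the home slot until we hit the key or a hole.
            i = hash(key) % len(self._slots)
            while (self._slots[i] is not LinearProbingMap._EMPTY
                   and self._slots[i][0] != key):
                i = (i + 1) % len(self._slots)  # adjacent slot: cache friendly
            return i

        def put(self, key, value):
            self._slots[self._probe(key)] = (key, value)

        def get(self, key, default=None):
            slot = self._slots[self._probe(key)]
            return default if slot is LinearProbingMap._EMPTY else slot[1]

    m = LinearProbingMap()
    m.put("a", 1)
    m.put("b", 2)
    print(m.get("a"), m.get("missing"))   # 1 None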
Hope this helps!
Linear probing is actually more memory efficient when the hash table is close to full.
Historically, one had very, very little memory, so every byte mattered (and there are still some cases where memory is very limited).
Why does it use less memory?
Consider what the tables look like: (separate chaining variations as per Wikipedia - there are other variations too, but they typically use more memory)
Linear        Separate chaining #1     Separate chaining #2
probing       List head in table       Pointer in table

|------|      |------|---|             |---|    |------|---|
|Object|      |Object|Ptr|             |Ptr| -> |Object|Ptr|
|------|      |------|---|             |---|    |------|---|
|Object|      |Object|Ptr|             |Ptr| -> |Object|Ptr|
|------|      |------|---|             |---|    |------|---|
| NULL |      | NULL |Ptr|             |Ptr|
|------|      |------|---|             |---|
   .             .                        .
   .             .                        .
   .             .                        .
(Ptr stands for "pointer" - any pointer not pointing to something can be considered NULL)
Separate chaining #1 clearly uses more memory than linear probing (always), as every element in the table is bigger by the size of the pointer.
Separate chaining #2 might have an advantage when there isn't much in the table, but when it gets full, it's going to have roughly an additional 2 pointers floating around for every element.
templatetypedef is probably right about linear probing typically being faster (he's rarely wrong), but it's typically taught that separate chaining is faster, and you see it in major APIs (Java's implementations, for example), perhaps because of this belief, to avoid cases when linear probing is much slower (with a few well-selected values, you can quickly get to O(n) performance with linear probing while separate chaining would've still been O(1)), or perhaps for some other reason.
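To see the "well-selected values" failure mode, here is a small self-contained demonstration (CPython-specific, since hash(i) == i for small ints; all sizes are invented). Consecutive integer keys fill adjacent slots and form one long run, so an absent key that hashes to the start of the run gets probed across the whole run:

    def probes_for_absent_key(table_size, occupied, key):
        """Count the slots linear probing inspects before finding a hole."""
        i = hash(key) % table_size
        n = 1
        while i in occupied:             # keep stepping through the run
            i = (i + 1) % table_size
            n += 1
        return n

    # CPython hashes small ints to themselves, so keys 0..999 occupy
    # slots 0..999 -- one long contiguous run ("primary clustering").
    occupied = {hash(k) % 2048 for k in range(1000)}
    print(probes_for_absent_key(2048, occupied, 2048))  # slot 0: ~1001 probes
    print(probes_for_absent_key(2048, occupied, 1500))  # past the run: 1 probe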

Is hash the best for application requesting high lookup speed?

I have it in mind that a hash would be the first thing I should resort to if I want to write an application that requires high lookup speed, and that no other data structure would guarantee that.
But I got confused when I saw many posts saying otherwise, suggesting suffix trees or tries, to name a few.
So I wonder: is a hash always the best thing for high-speed lookup? What if I want both high lookup speed and low space cost?
Is there any material (books or papers) covering data structures or algorithms for high-speed lookup and space efficiency? Anything of this kind would be highly appreciated.
So I wonder: is a hash always the best thing for high-speed lookup?
No. As stated in comments:
There is no such thing as the best data structure for [some generic issue]. Everything is case dependent. Tries and radix trees might be great for strings, since you need to read the string anyway. Arrays allow simplicity and great cache efficiency, and are usually the best for small-scale static information.
I once answered a related question about cases where a tree might be better than a hash table: Hash Table v/s Trees
What if I want both high lookup speed and low space cost?
The two might contradict each other. Consider even the simple example of a hash table of size X versus one of size 2*X: the bigger hash table is less likely to encounter collisions, and is thus expected to be faster than the smaller one.
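A quick simulation (mine, with invented sizes and a uniform-hash assumption) makes the trade-off visible: doubling the table buys speed with space by cutting the number of keys that land on an already-occupied slot:

    import random

    def avg_collisions(n_keys, table_size, trials=100):
        """Average number of keys landing in an already-occupied slot,
        assuming a uniform hash (sizes invented for illustration)."""
        rng = random.Random(1)
        total = 0
        for _ in range(trials):
            seen = set()
            for _ in range(n_keys):
                slot = rng.randrange(table_size)
                if slot in seen:
                    total += 1
                seen.add(slot)
        return total / trials

    print(avg_collisions(1000, 1024))   # ~360 collisions in the smaller table
    print(avg_collisions(1000, 2048))   # ~210 in the doubled table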
Is there any material (books or papers) covering data structures or algorithms for high-speed lookup and space efficiency?
Introduction to Algorithms provides a good walkthrough of the main data structures in use. Every algorithm tries to provide good space and time efficiency, but as said, there is a trade-off, and some algorithms may be better for specific cases than others.
Choosing the right algorithm/data structure/design for the specific problem is what engineering is about, isn't it?
I assume you are talking about strings here, and the answer is "no": hashes are not the fastest or most space-efficient way to look up strings; tries are. Of course, writing a hashing algorithm is much, much easier than writing a trie.
One thing you won't find in Wikipedia or books about tries is that if you naively implement them with one node per letter, you end up with large numbers of inefficient one-child nodes. To make a trie that really burns up the CPU you have to implement nodes so that they can have a variable number of characters (see the sketch at the end of this answer). This, of course, is even harder than writing a plain trie.
I have written trie implementations that handle over a billion entries and I can tell you that if done properly it is insanely fast, nothing else compares.
One other issue with tries is that you have to write a custom heap, because if you just use some kind of generic memory management it will be slow. So in addition to implementing the trie, you have to implement the heap that the trie runs on. Pretty freakin complicated, but if you do it, you get batshit crazy speed.
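As a concrete illustration of the variable-length-node idea, here is a minimal path-compressed (radix) trie of my own, not the answerer's implementation; the custom-allocator part, which matters greatly for real speed, is deliberately omitted:

    class RadixTrie:
        """Minimal path-compressed trie: each edge stores a multi-character
        label instead of one node per letter, avoiding chains of
        one-child nodes."""
        def __init__(self):
            self.children = {}   # first char -> (edge_label, child node)
            self.terminal = False

        def insert(self, word):
            node = self
            while True:
                if not word:
                    node.terminal = True
                    return
                key = word[0]
                if key not in node.children:
                    leaf = RadixTrie()
                    leaf.terminal = True
                    node.children[key] = (word, leaf)
                    return
                label, child = node.children[key]
                # length of the common prefix of label and word
                n = 0
                while n < len(label) and n < len(word) and label[n] == word[n]:
                    n += 1
                if n < len(label):
                    # split the edge: keep the shared prefix, push the rest down
                    mid = RadixTrie()
                    mid.children[label[n]] = (label[n:], child)
                    node.children[key] = (label[:n], mid)
                    child = mid
                node, word = child, word[n:]

        def contains(self, word):
            node = self
            while word:
                entry = node.children.get(word[0])
                if entry is None:
                    return False
                label, child = entry
                if not word.startswith(label):
                    return False
                node, word = child, word[len(label):]
            return node.terminal

    t = RadixTrie()
    for w in ["romane", "romanus", "romulus"]:
        t.insert(w)
    print(t.contains("romanus"))   # True
    print(t.contains("roman"))     # False (only a prefix was inserted)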
Only a good implementation of hashing will give you good performance, and you cannot compare a hash with a trie for all situations. Where a trie is applicable it is fast, but it can be costly in terms of memory (again depending on the implementation).
But have you measured performance, or is this unnecessary optimization you are looking for? Did the map fail you?
That might also depend on the actual number of elements.
In complexity theory a hash is not bad, but complexity theory only applies once the actual number of elements is bigger than some threshold.
I.e. if you have only 2 elements, there is a faster method than a hash ;-)
Hash tables are a good general purpose structure but they can fail spectacularly if the hash function doesn't suit the input data. Worst case lookup is O(n). They also waste some space as you mentioned. Other general-purpose structures like balanced binary search trees have worse average case but better worst case performance than a hash table. This is important for real-time applications. A trie is a more special-purpose structure tailored to string lookup.
