Which is better to handle collisions, Open addressing or Chaining? - data-structures

Imagine that we have a set S containing n keys. We want to use a uniform hashing function with a table T[0, ..., m-1], where m = n + 2.
Is it better to handle the collisions using open addressing or chaining?
I would appreciate it if anyone could explain this to me. Thanks.

Related

Ruby efficient queries

So far in my new workplace I've been dealing with querying databases and finding the most efficient ways of getting the desired data.
I've found out about using pluck to fetch only the desired attributes instead of loading the whole result into memory, and other tricks such as using inject (reduce), map, reject and the like, which have made my life a whole lot easier.
However, I haven't found any theoretical explanation of why inject/map/reject should be used to gain higher efficiency, only some empirical conclusions from my own attempts. For example, why should I use map instead of iterating over an array with each?
Could someone please shed some light?

Millions of searches in unordered_map, runtime hogger

I have around 5000 strings (mostly of length in the range 50-80). Currently I create an unordered_map, push these keys into it, and during execution I access them (using the map's find function) 10-100 million times. I did some profiling around this search, and it seems to be the runtime hogger.
I searched for other, better and faster search options, but somehow did not find anything substantial.
Does anyone have an idea of how to make it faster? I'm open to a custom-made container as well. I did try std::map, but it did not help. Do share a link if you have one.
One more point to add: I also modify the values of some keys at runtime, but not that many times. Mostly it's search.
Having considered a similar question to yours, C++ ~ 1M look-ups in unordered_map with string key works much slower than .NET code, I would guess you have run into an issue caused by the hash function used by std::unordered_map. For strings of length 50-80 that could lead to a lot of collisions, and this would significantly degrade look-up performance.
I would suggest using a custom hash function with std::unordered_map. Or you could give A fast, memory efficient hash map for C++ a try.

Is Universal family of hash functions only to prevent enemy attack?

If my intention is only to have a good hash function that spreads data evenly into all of the buckets, then I need not come up with a family of hash functions, I could just do with one good hash function, is that correct?
The purpose of having a family of hash functions is only to make it harder for the enemy to build a pathological data set as when we pick a hash function randomly, he/she has no information about which hash function is employed. Is my understanding right?
EDIT:
Since someone is trying to close this as unclear: this question asks about the real purpose of employing a universal family of hash functions.
I could just do with one good hash function, is that correct?
As you note later in your question, an "enemy" who knows which hash function you're using could prepare a pathological data set.
Further, hashing is just the first stage in storing data into your table's buckets. If you're implementing open addressing / closed hashing, you also need to select alternative buckets to probe after collisions. Simple approaches like linear and quadratic probing generally provide adequate collision avoidance, and are mathematically simpler and therefore faster than rehashing, but they don't maintain the probability of the next probe finding an unused bucket at the level implied by the load factor. Rehashing with another good hash function (including another from a family of such functions) does, so if that matters to you, you may prefer to use a family of hash functions.
Note too that an in-memory hash table is sometimes used to record at which offsets/sectors on disk data is stored, so extra rehashing calculations on already-in-memory data may be far more appealing than a higher probability (with linear/quadratic probing) of waiting on disk I/O only to find another collision.

Fast and (practically) collision free hash

I have an object which calculates a (long) path. Two objects are equal if they calculate the same path. I previously tested whether two objects were equal by just doing something like:
obj1.calculatePath() == obj2.calculatePath()
However, now this has become a performance bottleneck. I tried storing the path inside the object but since I have a lot of objects this became a memory issue instead.
I have estimated that a 64-bit hash should be enough to avoid collisions, assuming the hash is good (i.e. behaves like a uniformly distributed random function).
So, since the usual fast hashes (Murmur etc.) do have collisions, I would like to avoid them; handling collisions sounds like a headache when you can just use a hash like SHA-2. (It's much nicer if I can simply trust the hash instead of doing additional checks in case the hashes of two objects match.)
However, SHA is also "slow" compared to older hash functions (like the MD family), so I wonder if it would be better to use something like MD5 or maybe even MD4.
So my question is: assuming there is no evil hacker crafting input specifically to create collisions, only benign (random) inputs, which hash function should I choose for a performance-critical part of my code where I would like to avoid the added complexity of using an "insecure" hash like Murmur?
It's difficult to help without more information.
As it stands, all anyone can recommend is a generic hash function.
There's an element of "give a few a go"!
FNV-1a (http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function) is usually a not-too-shabby starting point.
It's (a) easy to implement, (b) not usually "bad", and (c) computationally cheap, and so applicable to your "long path" issue.
However what I want to know is:
What space are these paths in? Are they in (x,y,z,t) "real" space-time (i.e. trajectories)? Are they paths through some graph? Are they file paths? Something else?
It's difficult to say more without more context.

ADT key-concepts for implementing a hash table

I need some advice for implementing a good hash table structure. I've been researching a bit, but I would like some external opinions. Thanks!
Whatever hash function you choose, you have to fulfil the following requirements:
provide a uniform distribution of hash values: a non-uniform distribution will increase the number of collisions between mapped values.
a good scheme for collision resolution: collisions are almost impossible to avoid, so you will have to implement a strategy such as "separate chaining" or "open addressing". A good starting point is http://task3.cc/44/hash-maps-with-linear-probing-and-separate-chaining/.
