Hash Table vs Dictionary - data-structures

As far as I know, a hash table uses a hash key to store an item, whereas a dictionary uses a simple key-value pair to store an item. I think this means that a dictionary is a lot faster than a hash table (please correct me if I am wrong).
Does this mean I should never use hash table?

The answer is "it depends".
A Dictionary is merely a way to map a key to a value. You can either use a library or implement one yourself.
A hash table is a specific way to implement a dictionary, where the location of each key is computed by a hash function. This function is usually based on modulo arithmetic. This means that two distinct keys may end up with the same hash key, and therefore there will be a collision between the keys. It is then up to you (or whoever implements the hash table) to determine how to resolve the collision. You could chain the values at the same key, re-hash and use a sub-hash table, or you may even want to start over with a new hash function (which would be expensive).
The underlying implementation of the dictionary (for example, a hash table) will affect your lookup performance.
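To make the chaining option concrete, here is a minimal sketch in Python. The class name `ChainedHashTable` and its `put`/`get` methods are made up for illustration, and it relies on Python's built-in `hash()` plus modulo; it is not how any particular library does it.

```python
class ChainedHashTable:
    """A dictionary backed by an array of buckets; collisions are chained."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        # Modulo maps the hash value onto a valid bucket index.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # same key: overwrite the value
                return
        bucket.append((key, value))  # new key (maybe a collision): chain it

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = ChainedHashTable(size=8)
table.put(1, "one")
table.put(9, "nine")  # 1 % 8 == 9 % 8, so both keys share a bucket
```

Both keys stay retrievable even though they collide; the lookup cost grows with the length of the chain, which is why the collision strategy affects performance.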

Related

Inserting an element to a full hash table with a constant number of buckets

I am studying hash tables at the moment, and have a question about an implementation with a fixed number of buckets.
Suppose we have a hash table with 23 buckets (for example). Let's use the simplest hash function (hash_value = key % table_size), with the keys being integers only. If we say that one bucket can hold at most 1 element (no separate chaining), does it mean that when all buckets are full we will no longer be able to insert any element into the table at all? Or will we have to actually replace the element that has the same hash value with the new element?
I do understand that I am imposing a lot of constraints, and a real implementation might never look like that, but I want to be sure I understand this particular case.
A real implementation usually allows for a hash table to be able to resize, but this usually takes a long time and is undesired. Considering a fixed-size hash table, it would probably return an error code or throw an exception for the user to treat that error or not.
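As a rough sketch of the fixed-size case, assuming linear probing as the collision strategy (the question does not name one) and using a made-up class name `FixedOpenTable`:

```python
class FixedOpenTable:
    """Fixed-size table, one entry per slot, integer keys only."""

    def __init__(self, size=23):
        self.slots = [None] * size

    def insert(self, key, value):
        size = len(self.slots)
        start = key % size  # hash_value = key % table_size
        for step in range(size):
            i = (start + step) % size  # linear probing (an assumption)
            slot = self.slots[i]
            if slot is None or slot[0] == key:
                # Empty slot, or the very same key: store or overwrite.
                self.slots[i] = (key, value)
                return
        # Every slot holds a different key: nothing gets replaced.
        raise OverflowError("hash table is full")
```

With `size=23`, the 24th distinct key raises `OverflowError` instead of evicting an existing entry.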
"Or will we have to actually replace element that has the same hash value with a new element?"
In Java's HashMap, if you add a key that equals another key already present in the hash table, only the value associated with that key is replaced by the new one. An entry is never replaced merely because two different keys hash to the same value.
Yes. An "open" hash table - which you are describing - has a fixed size, so it can fill up.
However implementations will usually respond by copying all contents into a new, bigger table. In fact, normally they won't wait to fill entirely, but use some criterion - for example a fraction of all space used (sometimes called the "load factor") - to decide when it's time to expand.
Some implementations will also "shrink" themselves to a smaller table if the load factor becomes too small due to deletions.
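Here is a small sketch of the load-factor idea. The 0.75 threshold, the doubling policy, and the name `GrowingTable` are assumptions for illustration; it stores non-negative integer keys only and leaves out shrinking.

```python
class GrowingTable:
    """Open-addressing set of integer keys that grows by load factor."""

    MAX_LOAD = 0.75  # assumed threshold

    def __init__(self, size=4):
        self.slots = [None] * size
        self.count = 0

    @staticmethod
    def _place(slots, key):
        # Linear probing; returns True if the key was not present before.
        i = key % len(slots)
        while slots[i] is not None and slots[i] != key:
            i = (i + 1) % len(slots)
        is_new = slots[i] is None
        slots[i] = key
        return is_new

    def add(self, key):
        if self._place(self.slots, key):
            self.count += 1
        # Grow before the table can fill up completely.
        if self.count / len(self.slots) >= self.MAX_LOAD:
            self._resize(2 * len(self.slots))

    def _resize(self, new_size):
        new_slots = [None] * new_size
        for key in self.slots:
            if key is not None:
                self._place(new_slots, key)  # index depends on the new size
        self.slots = new_slots

    def __contains__(self, key):
        i = key % len(self.slots)
        while self.slots[i] is not None:
            if self.slots[i] == key:
                return True
            i = (i + 1) % len(self.slots)
        return False
```

Because the table grows as soon as it is three-quarters full, there is always at least one empty slot, so the probing loops terminate.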
You'd probably find reading Google's hash table implementation, which includes some documentation of its internals, to be a good learning experience.

hashing vs hash function, don't know the difference

For example, take "Consistent hashing" and "Perfect hash function": on Wikipedia, when I click "hashing", the link directs to "hash function", so it seems that they have the same meaning. But then why do both terms exist? Is there any difference between using "hashing" and "hash function"? And is it OK to refer to "consistent hashing" as a "consistent hash function"? Thanks!
A hash function takes some input data (typically a bunch of binary bytes, but could be anything - whatever you make it to) and calculates a hash value, which is typically an integer number (but, again, can be anything). The process of doing this is called hashing.
The hash value is always the same size, no matter what the input looks like. Well, I suppose you could make a hash function that has a variable-size output, but I haven't seen one in the wild yet. It wouldn't be very practical. Thus, by its very nature, hashing is usually a one-way calculation. You can't normally get the original data back from the hash value, because there are many more possible input data combinations than there are possible hash values.
The main advantages are:
The hash value is always the same size
The same input will always generate the same output.
If it's a good hash function, different inputs will usually generate different outputs, but it's still possible that two different inputs generate the same output (this is called a hash collision).
If you have a cryptographic hash function you also get one more advantage:
From having only the hash value, it's infeasible to come up with input data that would hash to this value. Never mind that it's not the original input data; any kind of input data that would hash to the given output value is impossible to find in a useful timeframe.
The results of a hash function can be used in various ways. As mentioned in other answers, hash tables are one common use-case. Verifying data integrity is another case - for example, you download a file, then hash it, then check the hash value against the value that was specified in the webpage where you downloaded the file from. If they don't match, the file was not downloaded correctly. If you combine hash values with public-key cryptography you can get digital signatures. And I'm sure there are other uses to which the principle can be put.
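For illustration, the fixed-size and same-input-same-output properties, plus the download check, can be sketched with Python's standard `hashlib` module. SHA-256 is just one example of a cryptographic hash function, and the helper names `sha256_hex` and `is_intact` are made up here.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(downloaded: bytes, published_hash: str) -> bool:
    """Compare the hash of the downloaded bytes with the published one."""
    return sha256_hex(downloaded) == published_hash.lower()


short_hash = sha256_hex(b"a")
long_hash = sha256_hex(b"a" * 1_000_000)

# Both hashes are 64 hex characters, regardless of the input length.
print(len(short_hash), len(long_hash))
```

Changing even a single byte of the downloaded data would make `is_intact` return `False`.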
You can write a hash function, and what it does is hash keys to bins.
In other words, the hash function is what does the hashing.
I hope that clarifies it.
A hash table is a data structure in which a given value is mapped to a particular key for faster access to elements. The process of populating this data structure is known as hashing.
To do hashing, you need a function which will provide the logic for mapping values to keys. This function is the hash function.
I hope this clarifies your doubt.

How to find the value for a given key if a hash function is chosen randomly from Universal family of hash functions?

I am taking a course on data structures on Coursera, and I recently read about universal families of hash functions. If I choose a hash function randomly from a universal family of hash functions, how exactly will I map a key back to its slot to look up a value? If I have to remember the function chosen for each key, then I would have to maintain a list for that. And finding the correct hash function for a key would itself take linear time, violating the constant-time lookup of hash tables. How should I proceed with implementing it?
When making one hash map, you use one function from the family. When you rehash the entire map (typically because of lack of capacity or too many collisions) or create a separate map, you can then choose a different hashing function from the family. You wouldn't use two different functions to attempt to create the same hash map.

Using an array list with a hash table

I'm attempting to build a simple hash table from scratch. The hash table I have currently uses an array of linked lists. The hashing function takes the hash value of a key-value pair object modulo the size of the array for indexing. This is all well and good, but I'm wondering if I could dynamically expand my array by using an array-list once it starts to fill up (tell me why this is not a good idea if you think so). Obviously the hash function would be compromised, since we're finding indexes using the array length. What would be a good hash function to use that would allow my array of linked lists to expand while not compromising the integrity of the hash function?
If I am understanding your question correctly, you will have to re-hash all elements after expanding the bucket array. It can be done by iterating over the contents of the old hash table, and inserting them into the newly expanded hash table.
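A minimal sketch of that re-hashing step, using Python lists in place of linked lists; the function name `rehash` and the `(key, value)` tuple layout are assumptions for illustration.

```python
def rehash(buckets, new_size):
    """Build a larger bucket array and re-insert every (key, value) pair."""
    new_buckets = [[] for _ in range(new_size)]
    for bucket in buckets:
        for key, value in bucket:
            # The index must be recomputed with the new array length.
            new_buckets[hash(key) % new_size].append((key, value))
    return new_buckets


old = [[] for _ in range(4)]
for key in (1, 5, 9):
    old[hash(key) % 4].append((key, str(key)))  # all three land in bucket 1

new = rehash(old, 8)  # key 5 moves to bucket 5; keys 1 and 9 stay in bucket 1
```

The hash function itself does not need to change; only the modulo step depends on the array length, which is why every element has to be placed again.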

Basics in Universal Hashing, how to ensure accessibility

To my current understanding, universal hashing is a method whereby the hash function is chosen randomly at runtime in order to guarantee reasonable performance for any kind of input.
I understand we may do this in order to prevent manipulation by somebody deliberately choosing malicious input (which is possible if a deterministic hash function is known).
My question is the following: is it not true that we still need to guarantee that a key will be mapped to the same address every time we hash it? For instance, if we want to retrieve information, but the hash function is chosen at random, how do we guarantee we can get back to our data?
A universal hash function is really a family of different hash functions with the property that, for any two distinct elements from the universe, a function chosen at random from the family makes them collide only with low probability (at most 1/m for a table with m buckets). Typically, this is implemented by having the implementation pick a random hash function from a family of hash functions to use inside the implementation. Once this hash function is chosen, the hash table works as usual: you use this hash function to compute a hash code for an object, then put the object into the appropriate location. The hash table has to remember the choice of hash function it made and has to use it consistently throughout the program, since otherwise (as you've noted) it would forget where it mapped each element.
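As a sketch of "remembering the choice", the table below picks its function once in the constructor and stores the parameters. It assumes the well-known family h(x) = ((a*x + b) mod p) mod m for non-negative integer keys smaller than the prime p; the class name and the choice of p are illustrative.

```python
import random


class UniversalHashTable:
    P = 2_147_483_647  # the prime 2**31 - 1; keys are assumed to be in [0, P)

    def __init__(self, size=16, rng=None):
        if rng is None:
            rng = random.Random()
        # The random choice happens once per table and is then kept.
        self.a = rng.randrange(1, self.P)
        self.b = rng.randrange(0, self.P)
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        # Deterministic once a and b are fixed.
        return ((self.a * key + self.b) % self.P) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)
```

Two tables created separately will most likely use different `a` and `b`, but each table always hashes a given key to the same bucket, so lookups stay constant time on average.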
Hope this helps!

Resources