Using an array list with a hash table

I'm attempting to build a simple hash table from scratch. It currently uses an array of linked lists, and it finds the index for a key-value pair by taking the hash value of its key modulo the size of the array. This is all well and good, but I'm wondering whether I could expand the array dynamically, by using an array list, once it starts to fill up (tell me why this is not a good idea if you think so). Obviously the indexing would be compromised, since indexes are computed from the array length. What would be a good hash function that would allow my array of linked lists to expand without compromising the integrity of the hash function?

If I am understanding your question correctly, you will have to re-hash all elements after expanding the bucket array. It can be done by iterating over the contents of the old hash table, and inserting them into the newly expanded hash table.
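A minimal sketch of that re-hash, assuming separate chaining with Ruby arrays as buckets, Ruby's built-in Object#hash as the hash value, and doubling once the load factor exceeds 0.75 (all illustrative choices, not the asker's code). The hash function itself never changes; only the bucket count used in the modulo does, which is why every element has to be re-inserted after the array grows.

```ruby
# Minimal separate-chaining hash table that grows by rehashing.
# Assumptions (illustrative): buckets are arrays of [key, value] pairs,
# Object#hash supplies the hash value, and the table doubles past MAX_LOAD.
class ChainedHashTable
  MAX_LOAD = 0.75

  attr_reader :size

  def initialize(capacity = 8)
    @buckets = Array.new(capacity) { [] }
    @size = 0
  end

  def capacity
    @buckets.length
  end

  def []=(key, value)
    bucket = @buckets[index_for(key, capacity)]
    pair = bucket.find { |k, _| k.eql?(key) }
    if pair
      pair[1] = value # existing key: overwrite its value
    else
      bucket << [key, value]
      @size += 1
      resize(capacity * 2) if @size > capacity * MAX_LOAD
    end
  end

  def [](key)
    pair = @buckets[index_for(key, capacity)].find { |k, _| k.eql?(key) }
    pair && pair[1]
  end

  private

  # The index depends on the bucket count, so it changes after a resize.
  def index_for(key, n)
    key.hash % n
  end

  # Rehash: walk every bucket of the old array and re-insert each pair
  # into a new, larger array using the new bucket count.
  def resize(new_capacity)
    old_buckets = @buckets
    @buckets = Array.new(new_capacity) { [] }
    old_buckets.each do |bucket|
      bucket.each { |key, value| @buckets[index_for(key, new_capacity)] << [key, value] }
    end
  end
end

table = ChainedHashTable.new
100.times { |i| table[i] = i * i }
puts "size=#{table.size} capacity=#{table.capacity} table[7]=#{table[7]}"
```

Starting from 8 buckets, the 100 inserts trigger several doublings, and every lookup still finds its key because indexes are always computed with the current bucket count.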

Related

Inserting an element into a full hash table with a constant number of buckets

I am studying hash tables at the moment, and I have a question about their implementation with a fixed number of buckets.
Suppose we have a hash table with 23 buckets (for example). Let's use the simplest hash function (hash_value = key % table_size), with integer keys only. If we say that one bucket can hold at most 1 element (no separate chaining), does that mean that once all buckets are full we can no longer insert any element into the table at all? Or will we have to actually replace the element that has the same hash value with a new element?
I do understand that I am imposing a lot of constraints, and a real implementation might never look like that, but I want to be sure I understand this particular case.
A real implementation usually allows the hash table to resize, but resizing means re-inserting every element, which takes a long time and is best avoided. A fixed-size hash table, on the other hand, would probably return an error code or throw an exception, leaving it to the caller to handle that error (or not).
Or will we have to actually replace the element that has the same hash value with a new element?
In Java's HashMap, if you add a key that is equal to a key already present in the table, only the value associated with that key is replaced by the new one; this never happens merely because two different keys have the same hash.
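To make both points concrete, here is a sketch of the table from the question: 23 slots, hash = key % table_size, one entry per slot, no resizing. The class names and the use of linear probing to look for another free slot are illustrative assumptions (the question doesn't name a probing scheme). Inserting into a full table raises an error, an equal key only has its value replaced, and a different key with the same hash is never overwritten.

```ruby
# Fixed-size open-addressing table: one entry per slot, no chaining,
# no resizing. Collisions are resolved with linear probing (an assumption).
class TableFullError < StandardError; end

class FixedOpenTable
  def initialize(size)
    @keys = Array.new(size)
    @values = Array.new(size)
    @used = Array.new(size, false)
  end

  def put(key, value)
    n = @keys.length
    start = key % n # the question's hash: key % table_size
    n.times do |i|
      slot = (start + i) % n
      if !@used[slot]
        @used[slot] = true
        @keys[slot] = key
        @values[slot] = value
        return
      elsif @keys[slot] == key
        @values[slot] = value # equal key: replace the value only
        return
      end
      # different key with the same hash: keep probing, never overwrite it
    end
    raise TableFullError, "no free slot for key #{key}"
  end

  def get(key)
    n = @keys.length
    start = key % n
    n.times do |i|
      slot = (start + i) % n
      return nil unless @used[slot] # no deletions, so an empty slot ends the search
      return @values[slot] if @keys[slot] == key
    end
    nil
  end
end

table = FixedOpenTable.new(23)
23.times { |k| table.put(k, "v#{k}") }
table.put(5, "new") # same key: value replaced
begin
  table.put(23, "x") # 23 % 23 == 0, and every slot is taken
rescue TableFullError => e
  puts "insert failed: #{e.message}"
end
```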
Yes. An open-addressing hash table (one entry per slot, no chaining), which is what you are describing, has a fixed number of slots, so it can fill up.
However, implementations will usually respond by copying all contents into a new, bigger table. In fact, they normally won't wait for the table to fill entirely, but use some criterion, for example the fraction of slots in use (called the "load factor"), to decide when it's time to expand.
Some implementations will also "shrink" themselves to a smaller table if the load factor becomes too small due to deletions.
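A load-factor policy that both grows and shrinks might look like the sketch below. The thresholds (0.75 and 0.25), the doubling/halving factors, and the minimum capacity of 8 are illustrative assumptions, not taken from any particular library.

```ruby
# Hypothetical resize policy: grow past GROW_AT, shrink below SHRINK_AT.
GROW_AT = 0.75
SHRINK_AT = 0.25
MIN_CAPACITY = 8

# Returns the capacity the table should have for `size` entries.
def next_capacity(size, capacity)
  load = size.to_f / capacity
  if load > GROW_AT
    capacity * 2
  elsif load < SHRINK_AT && capacity > MIN_CAPACITY
    capacity / 2
  else
    capacity
  end
end

puts next_capacity(13, 16) # load 0.81: grow
puts next_capacity(3, 32)  # load 0.09: shrink
puts next_capacity(8, 16)  # load 0.50: keep
```

Keeping the two thresholds well apart matters: after halving, the load factor is still below 0.5, so a table whose size hovers around one threshold doesn't keep growing and shrinking on alternate operations.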
You'd probably find reading Google's hash table implementation, which includes some documentation of its internals, to be a good learning experience.

hashing vs hash function, don't know the difference

For example, take the Wikipedia articles "Consistent hashing" and "Perfect hash function". When I click "hashing" on Wikipedia, the link redirects to "hash function", so the two seem to have the same meaning. Why do both terms exist? Is there any difference between saying "hashing" and "hash function"? And is it OK to call "consistent hashing" a "consistent hash function"? Thanks!
A hash function takes some input data (typically a sequence of bytes, but it could be anything you design it to accept) and calculates a hash value, which is typically an integer (but, again, can be anything). The process of doing this is called hashing.
The hash value is always the same size, no matter what the input looks like. Well, I suppose you could make a hash function with a variable-size output, but I haven't seen one in the wild yet; it wouldn't be very practical. Thus, by its very nature, hashing is usually a one-way calculation. You can't normally get the original data back from the hash value, because there are many more possible input data combinations than there are possible hash values.
The main advantages are:
The hash value is always the same size
The same input will always generate the same output.
If it's a good hash function, different inputs will usually generate different outputs, but it's still possible that two different inputs generate the same output (this is called a hash collision).
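As an illustration of these properties, here is the 32-bit FNV-1a hash, a simple non-cryptographic hash function (chosen for brevity; nothing above refers to it specifically). Whatever the input length, the result is a 32-bit integer, the same input always gives the same result, and since there are far more possible inputs than 2^32 outputs, collisions must exist.

```ruby
# FNV-1a, 32-bit variant: XOR in each byte, then multiply by the FNV prime.
FNV_OFFSET_BASIS = 0x811C9DC5
FNV_PRIME = 0x01000193

def fnv1a32(data)
  data.each_byte.reduce(FNV_OFFSET_BASIS) do |h, byte|
    ((h ^ byte) * FNV_PRIME) & 0xFFFFFFFF # keep only the low 32 bits
  end
end

puts fnv1a32("a").to_s(16)              # prints e40c292c
puts fnv1a32("a") == fnv1a32("a")       # same input, same output: true
puts fnv1a32("x" * 10_000) < 2**32      # output size is fixed: true
```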
If you have a cryptographic hash function, you also get one more advantage:
From the hash value alone, it's computationally infeasible to come up with input data that would hash to this value. This isn't limited to the original input data: any input data that hashes to the given output value is infeasible to find in a useful timeframe.
The results of a hash function can be used in various ways. As mentioned in other answers, hash tables are one common use case. Verifying data integrity is another: for example, you download a file, hash it, and check the hash value against the value given on the web page you downloaded the file from. If they don't match, the file was not downloaded correctly. If you combine hash values with public-key cryptography, you get digital signatures. And I'm sure there are other uses to which the principle can be put.
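The download check described above can be sketched with Ruby's standard Digest library. Here a string stands in for the file's contents and the published checksum is a hypothetical value (it happens to be the SHA-256 of "abc"); for a file on disk, Digest::SHA256.file(path).hexdigest computes the same kind of digest.

```ruby
require 'digest'

# Compare the SHA-256 of the downloaded data with the published hex checksum.
def checksum_matches?(data, published_hex)
  Digest::SHA256.hexdigest(data) == published_hex.downcase
end

published = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
puts checksum_matches?("abc", published) # prints true
puts checksum_matches?("abd", published) # prints false (one byte differs)
```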
You can write a hash function, and what it does is hash keys to bins.
In other words the hash function is doing the hashing.
I hope that clarifies it.
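A small sketch of "hashing keys to bins": the hash function turns a key into an integer, and taking it modulo the number of bins turns that integer into a bin index. The polynomial string hash (multiplier 31, truncated to 32 bits) and the 8 bins are illustrative assumptions; the hash is similar in shape to Java's String.hashCode.

```ruby
# Hash function: bytes -> 32-bit integer.
def string_hash(key)
  key.each_byte.reduce(0) { |h, byte| (h * 31 + byte) & 0xFFFFFFFF }
end

# Hashing a key into one of num_bins bins.
def bin_for(key, num_bins)
  string_hash(key) % num_bins
end

bins = Array.new(8) { [] }
%w[apple banana cherry date elderberry fig grape].each do |word|
  bins[bin_for(word, bins.length)] << word
end
bins.each_with_index { |contents, i| puts "bin #{i}: #{contents.inspect}" }
```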
A hash table is a data structure in which each value is mapped to a particular key for faster access to elements. The process of populating this data structure is known as hashing.
To do hashing, you need a function that provides the logic for mapping keys to positions in the table. This function is the hash function.
I hope this clarifies your doubt.

Hash Table vs Dictionary

As far as I know, a hash table uses a hash key to store each item, whereas a dictionary uses a simple key-value pair to store items. I think this means a dictionary is a lot faster than a hash table (please correct me if I am wrong).
Does this mean I should never use hash table?
The answer is "it depends".
A Dictionary is merely a way to map a key to a value. You can either use a library or implement one yourself.
A hash table is a specific way to implement a dictionary, in which the position where each key is stored is computed by a hash function. This function is usually based on modulo arithmetic, which means that two distinct keys may end up with the same hash value, causing a collision between the keys. It is then up to you (or whoever implements the hash table) to determine how to resolve the collision. You could chain the values that share a slot, re-hash and use a sub-hash table, or even start over with a new hash function (which would be expensive).
The underlying implementation of the dictionary (for example, a hash table) will affect your lookup performance.
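To illustrate that "dictionary" names the interface while "hash table" names one implementation, here is a sketch of the same key-to-value interface implemented twice: a hypothetical AssocListDictionary that does no hashing at all (every lookup scans all pairs, O(n)), and Ruby's built-in Hash, which is implemented as a hash table (O(1) average lookup). Both behave identically from the caller's point of view; only performance differs.

```ruby
# Dictionary without hashing: a plain list of [key, value] pairs.
class AssocListDictionary
  def initialize
    @pairs = []
  end

  def []=(key, value)
    pair = @pairs.find { |k, _| k == key } # linear scan
    if pair
      pair[1] = value
    else
      @pairs << [key, value]
    end
  end

  def [](key)
    pair = @pairs.find { |k, _| k == key } # linear scan
    pair && pair[1]
  end
end

[AssocListDictionary.new, Hash.new].each do |dict|
  dict['one'] = 1
  dict['two'] = 2
  dict['one'] = 10
  puts "#{dict.class}: one=#{dict['one']} two=#{dict['two']}"
end
```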

How does hash table get(key) works when multiple keys are stored with linked nodes?

I am aware of how a hash table works, but I am not sure how get(key) could be implemented when multiple values are stored at the same index with the help of a linked list.
For example:
set(1,'Val1') gets stored at index 7.
set(2,'Val2') also gets stored at index 7. (The internal implementation creates a linked list and stores a pointer to it at index 7. That's understandable.)
But suppose I now call get(2). How does the hash table know which value to return? My hash function will resolve the key to index 7, but there are two values at index 7.
One possible way is to store both the key and the value in the linked node.
Is any other implementation possible?
Go through the linked list and do a linear search for the key 2. A good hash function, together with a table size kept proportional to the number of entries, should keep the average length of these lists O(1).
I think you missed the fact that hash tables have to store their keys. The hash function only speeds up insertion and lookup; comparing the stored keys is what identifies the right entry.
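A sketch of the approach described in the answers: each linked node stores the key alongside the value, and get walks the chain at the computed index comparing keys. The Node struct, the 16 slots, and integer keys hashed with key % size are illustrative assumptions; with them, keys 7 and 23 both land at index 7, mirroring the collision in the question.

```ruby
# Each node keeps its key so entries at the same index can be told apart.
Node = Struct.new(:key, :value, :next_node)

class LinkedChainTable
  def initialize(size = 16)
    @heads = Array.new(size) # each slot: head of a linked list, or nil
  end

  def set(key, value)
    index = key % @heads.length
    node = @heads[index]
    while node
      if node.key == key
        node.value = value # key already present: overwrite its value
        return
      end
      node = node.next_node
    end
    @heads[index] = Node.new(key, value, @heads[index]) # prepend a new node
  end

  def get(key)
    node = @heads[key % @heads.length]
    while node # linear search along the chain at this index
      return node.value if node.key == key
      node = node.next_node
    end
    nil
  end
end

table = LinkedChainTable.new(16)
table.set(7, 'Val1')  # 7 % 16 == 7
table.set(23, 'Val2') # 23 % 16 == 7, same index
puts table.get(23)    # prints Val2
```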

How to implement a dynamic-size hash table?

I know the basic principle of the hash table data structure. If I have a hash table of size N, I have to distribute my data into these N buckets as evenly as possible.
But in reality, most languages have their built-in hash table types. When I use them, I don't need to know the size of hash table beforehand. I just put anything I want into it. For example, in Ruby:
h = {}
10000000.times{ |i| h[i]=rand(10000) }
How can it do this?
See the Dynamic resizing section of the Hash table article on Wikipedia.
The usual approach is to use the same logic as a dynamic array: start with some number of buckets, and when there are too many items in the hash table, create a new hash table with a larger size and move all the items into it.
Also, depending on the type of hash table, this resizing might not be necessary for correctness (i.e. a separate-chaining table would still work without resizing, just with ever-longer chains), but it is certainly necessary for performance.
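To see why "the same logic as a dynamic array" keeps insertion cheap on average, here is a small simulation that counts how many element moves all the resizes cost in total. It is an illustrative sketch assuming a start of 8 buckets, doubling, and a 0.75 load-factor threshold. Because the table doubles, the moves form a geometric series (6 + 12 + 24 + ...) that stays below 2n, so the rehashing cost per insert is constant on average.

```ruby
# Simulate n inserts into a table that doubles whenever the load factor
# would exceed max_load; return the total number of rehashed elements.
def total_rehash_moves(n, initial_buckets: 8, max_load: 0.75)
  buckets = initial_buckets
  size = 0
  moves = 0
  n.times do
    if size + 1 > buckets * max_load
      moves += size # every existing item is re-inserted into the new array
      buckets *= 2
    end
    size += 1
  end
  moves
end

[1_000, 10_000, 100_000].each do |n|
  moves = total_rehash_moves(n)
  puts format("n=%-7d moves=%-7d moves/n=%.2f", n, moves, moves.to_f / n)
end
```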

Resources