I am searching for how to implement an ordered hash table but not finding anything.
I would like to create a hash table which you can iterate over and that gives you the elements based on the order in which you defined or inserted the keys. How do you do this generally, in a performant way? How is this implemented? If a language must be chosen for an example, I am thinking about doing this in JavaScript. I know for example JavaScript objects ("hash maps") are ordered hash maps, but I have no idea how they are implemented. I would like to learn how to implement the same thing from scratch for a custom programming language.
The example is, say you are listing the native-script version of a language name, like "עִברִית" for "Hebrew", as the key, and you want to create a map of the native name to the English name, but you want the entries to stay in the order defined. How do you implement that?
The general, performant solution to this problem is combining a linked list with a hash table. You will have a doubly linked list of Nodes, and each Node is indexed via the hash table. You can look things up either by key or in insertion order in constant time per step. Broadly, the operations are implemented as follows:
Insert - O(1)* - Append a Node to the end of the linked list, and map its key to that Node in the hash table.
Retrieve by key - O(1)* - Using the hash table, find the corresponding Node and return its value.
Delete by key - O(1)* - Using the hash table, find the corresponding Node and unlink it by updating its neighbours' references to skip over it.
Traverse - O(n) - Traverse the linked list. n in this case is the number of Nodes, not the full capacity of the hash table.
* The actual insert/retrieve/delete times are subject to the worst-case of the hash table's implementation. Typically this is O(n) worst case, O(1) average.
(An alternative to this is to store a "next key" in each Node instead of a direct pointer to the next Node. The runtimes are the same, but traversal must then go through the hash table for every step, whereas with direct pointers the hash table can be bypassed.)
So if you want to implement this on your own, you'll need a hash table and some Nodes to use within it.
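The scheme above can be sketched in JavaScript. This is a minimal illustration, not a production implementation: the built-in `Map` stands in for the hash-table part (used purely as a key-to-node index), and the node fields are plain object properties.

```javascript
// Sketch of an insertion-ordered map: a hash table (here the built-in
// Map, used only as key -> node lookup) plus a doubly linked list.
class OrderedMap {
  constructor() {
    this.index = new Map(); // key -> node (the hash-table part)
    this.head = null;       // first-inserted node
    this.tail = null;       // last-inserted node
  }

  // Insert or update - O(1) average: append a node, index it by key.
  // Updating an existing key keeps its original position.
  set(key, value) {
    const existing = this.index.get(key);
    if (existing) { existing.value = value; return this; }
    const node = { key, value, prev: this.tail, next: null };
    if (this.tail) this.tail.next = node; else this.head = node;
    this.tail = node;
    this.index.set(key, node);
    return this;
  }

  // Retrieve by key - O(1) average via the hash table.
  get(key) {
    const node = this.index.get(key);
    return node ? node.value : undefined;
  }

  // Delete by key - O(1) average: unlink the node from its neighbours.
  delete(key) {
    const node = this.index.get(key);
    if (!node) return false;
    if (node.prev) node.prev.next = node.next; else this.head = node.next;
    if (node.next) node.next.prev = node.prev; else this.tail = node.prev;
    this.index.delete(key);
    return true;
  }

  // Traverse - O(n): walk the linked list in insertion order.
  *entries() {
    for (let node = this.head; node; node = node.next) {
      yield [node.key, node.value];
    }
  }
}
```

For the language-name example, `new OrderedMap().set("עִברִית", "Hebrew")` would keep that entry in its inserted position no matter how many entries follow.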
Related
You might have come across someplace where it is mentioned that it is faster to find elements in hashmap/dictionary/table than list/array. My question is WHY?
(My inference so far: why should it be faster? As far as I can see, in both data structures you have to traverse through the elements until you reach the required one.)
Let’s reason by analogy. Suppose you want to find a specific shirt to put on in the morning. I assume that, in doing so, you don’t have to look at literally every item of clothing you have. Rather, you probably do something like checking a specific drawer in your dresser or a specific section of your closet and only look there. After all, you’re not (I hope) going to find your shirt in your sock drawer.
Hash tables are faster to search than lists because they employ a similar strategy - they organize data according to the principle that every item has a place it “should” be, then search for the item by just looking in that place. Contrast this with a list, where items are organized based on the order in which they were added and where there isn’t a particular pattern as to why each item is where it is.
More specifically: one common way to implement a hash table is with a strategy called chained hashing. The idea goes something like this: we maintain an array of buckets. We then come up with a rule that assigns each object a bucket number. When we add something to the table, we determine which bucket number it should go to, then jump to that bucket and then put the item there. To search for an item, we determine the bucket number, then jump there and only look at the items in that bucket. Assuming that the strategy we use to distribute items ends up distributing the items more or less evenly across the buckets, this means that we won’t have to look at most of the items in the hash table when doing a search, which is why the hash table tends to be much faster to search than a list.
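A minimal sketch of chained hashing might look like the following. The hash function here (a simple multiplicative hash over the key's characters, mod the bucket count) is illustrative only; the point is that `get` examines just one bucket's items, not the whole table.

```javascript
// Minimal chained hash table sketch: an array of buckets, where each
// bucket holds the items whose keys hash to that bucket number.
class ChainedHashTable {
  constructor(numBuckets = 8) {
    this.buckets = Array.from({ length: numBuckets }, () => []);
  }

  // The rule assigning each key a bucket number (illustrative hash).
  bucketFor(key) {
    let h = 0;
    for (const ch of String(key)) h = (h * 31 + ch.codePointAt(0)) >>> 0;
    return h % this.buckets.length;
  }

  // Add: jump to the item's bucket and put it there (or update in place).
  set(key, value) {
    const bucket = this.buckets[this.bucketFor(key)];
    const entry = bucket.find(e => e.key === key);
    if (entry) entry.value = value;
    else bucket.push({ key, value });
  }

  // Search: only the items in one bucket are examined.
  get(key) {
    const bucket = this.buckets[this.bucketFor(key)];
    const entry = bucket.find(e => e.key === key);
    return entry ? entry.value : undefined;
  }
}
```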
For more details on this, check out these lecture slides on hash tables, which fill in more of the details about how this is done.
Hope this helps!
To understand this you can think of how the elements are stored in these Data structures.
HashMap/Dictionary, as you know, is a key-value data structure. To store an element, you first compute the hash value of its key (a hash function maps a key to a slot index - for example, a simple hash function can be built with the modulo operation; note that it is not guaranteed to be unique per key, so collisions are possible). Then you basically put the value in the slot for that hashed key.
In a List (here, a linked list), you basically keep appending elements to the end. The order of insertion matters in this data structure, and the memory allocated to it is not contiguous.
An Array is similar to a List, but in this case the memory allocated is contiguous. So, if you know the address of the first index, you can compute the address of the nth element.
Now think of the retrieval of the element from these Data structures:
From HashMap/Dictionary: When you are searching for an element, the first thing you do is compute the hash value for the key. Once you have that, you go directly to the slot for that hashed value and obtain the element. The amount of work performed is constant on average. In asymptotic notation, this is O(1).
From List: You literally need to iterate through each element and check whether it is the one you are looking for. In the worst case, your desired element might be at the end of the list, so the amount of work varies and you might have to iterate the whole list. In asymptotic notation, this is O(n), where n is the number of elements in the list.
From array: To find an element in the array, all you need to know is the address of the first element. For any other element, you can do the math of how far that element is from the first index.
For example, let's say the address of the first element is 100 and each element takes 4 bytes of memory. The element you are looking for is at the 3rd position. Then its address is 108. The math used is:
address of first element + (position of element − 1) × memory used per element
That is 100 + (3 − 1) × 4 = 108.
In this case too, as you can observe, the work performed to find an element is constant. In asymptotic notation, this is O(1).
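The address arithmetic above can be written as a tiny function (the base address 100 and element size 4 are just the example's values, not real memory addresses):

```javascript
// Address of the n-th element (1-based) in a contiguous array:
// base + (position - 1) * elementSize. Constant work regardless of n.
function elementAddress(base, position, elementSize) {
  return base + (position - 1) * elementSize;
}

elementAddress(100, 3, 4); // → 108, matching the worked example
```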
Now to compare: O(1) is faster than O(n) as n grows. Hence retrieval of elements from a HashMap/Dictionary or array is generally faster than from a list.
I hope this helps.
Recently, I was asked in an interview about the usage of data structure.
The question was: what data structure would I use to create an English dictionary? The dictionary will contain a number of words under each letter of the alphabet, and each word will have one meaning. Also, how would I implement the data structure to update, search, and select different words?
What do you suggest guys? And what is the reason for your suggestion?
A hash table would be the preferred data structure to implement a dictionary with update, search and selection capabilities.
A hash table is a data structure that stores key-value pairs. It is backed by an array, and a hash function h() computes an index into that array at which an element can be inserted or searched. So when insertion is required, the hash function is used to find the location where the element should go.
Insertion under reasonable assumptions is O(1). Each time we insert data, it takes O(1) time to insert it (assuming the hash function is O(1)).
Looking up data is similar. If we need to find the meaning of the word x, we compute h(x), which tells us where x is located in the hash table. So we can look up words in O(1) as well.
However, O(1) insertion and search do not always hold true. Nothing guarantees that the hash function won't produce the same output for two different inputs, in which case there is a collision. Various strategies can be employed to handle this scenario, namely separate chaining and open addressing, but in the worst case the search/insertion is then no longer O(1).
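As a sketch of the dictionary use case on top of a hash map (JavaScript's built-in `Map` stands in for the hash table here, and the words and meanings are made-up examples):

```javascript
// English-dictionary sketch: word -> meaning, backed by a hash map so
// that insert, update, and search are O(1) on average.
const dictionary = new Map();

// Insert: h(word) locates the slot, then the meaning is stored there.
dictionary.set("ephemeral", "lasting for a very short time");
dictionary.set("laconic", "using very few words");

// Search: h(word) locates the slot directly, no scan over other words.
const meaning = dictionary.get("laconic"); // "using very few words"

// Update is just another set on the same key.
dictionary.set("ephemeral", "short-lived");
```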
I'm confused by the HashMap/Hashtable concept, specifically when people say a HashMap is faster than a List. I'm clear on the hashing concept, in which the value is stored under a hash code computed from the given key.
But when I want to retrieve the data, how does it work?
For example, I'm storing n number of strings with n different keys in a HashMap.
If I want to retrieve the value associated with a specific key, how will it be returned in O(1) time? Won't the hashed key be compared with all the other keys?
Let's go on a word journey: say you have a bunch of weird M&M's covering all the letters.
Now your job is to vend people M&M's in the letter-color combo of their choosing.
You have some choices about how to organize your shop. (This act of organization is, metaphorically, our hash function.)
You can sort your M&M's into buckets by color, by letter, or by both. The question follows: what provides the fastest retrieval time for a specific request?
The answer is rather intuitive: the sorting that leaves the fewest different M&M's in each bucket facilitates the most efficient querying.
Let's say someone asks if you have any green Q's. If all your M&M's are in a single bin or list or bucket or otherwise unstructured container, the answer will be far from accessible in O(1), compared to keeping an organized shop.
This analogy relies on the concept of Separate chaining where each hash-Key corresponds to a container of multiple elements.
Without this concept, the idea of hashing more generally is to map keys uniformly throughout an array such that the amortized performance is constant. Collisions can be resolved through a variety of methods, and the Wikipedia article will tell you all about it.
http://en.wikipedia.org/wiki/Hash_table
"If the set of key-value pairs is fixed and known ahead of time (so insertions and deletions are not allowed), one may reduce the average lookup cost by a careful choice of the hash function, bucket table size, and internal data structures. In particular, one may be able to devise a hash function that is collision-free, or even perfect."
I am implementing a table in which each entry consists of two integers. The entries must be ordered in ascending order by key (according to the first integer of each set). All elements will be added to the table as the program is running and must be put in the appropriate slot. Time complexity is of utmost importance and I will only use the insert, remove, and iterate functions.
Which Java data structure is ideal for this implementation?
I was thinking LinkedHashMap, as it maps keys to values (each entry in my table is two values). It also provides O(1) insert/remove functionality. However, it is not sorted. If entries could be efficiently inserted in the appropriate order as they come in, this would not be a bad idea, as the data structure would end up sorted. But I have not read of or thought of an efficient way to do this. (Maybe with a comparator?)
TreeMap has a time complexity of O(log n) for both add and remove. It maintains sorted order and has an iterator. But can we do better than log(n)?
LinkedList has O(1) add/remove. I could insert with a loop, but this seems inefficient as well.
It seems like TreeMap is the way to go. But I am not sure.
Any thoughts on the ideal data structure for this program are much appreciated. If I have missed an obvious answer, please let me know.
(It can be a data structure with a Set interface, as there will not be duplicates.)
A key-value pair suggests a Map. As you need key-based ordering, that narrows it down to a SortedMap, in your case a TreeMap. As far as keeping elements sorted in a data structure goes, you can't do better than O(log n) per operation. Look no further.
The basic idea is that you need to insert each key at its proper place, and for that your code needs to search for that "proper place". For comparison-based searching you cannot do better than a binary search, which is O(log n), which is why I don't think you can perform an insert better than O(log n).
Hence, again, a TreeMap would be that I would advise you to use.
Moreover, if the keys (especially since there are no duplicates) can be enumerated (as integer numbers, serial numbers, or so), you could try using a statically allocated array indexed by key. Then you might get O(1) complexity!
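Since TreeMap is Java-specific, here is a language-neutral sketch of the same idea in JavaScript: binary search finds the insertion point in O(log n), though shifting elements in a flat array makes the insert itself O(n) (a balanced tree, as in TreeMap, avoids that shift). The class and method names are illustrative.

```javascript
// Sketch of a sorted table of (key, value) integer entries, kept in
// ascending key order. Binary search locates positions in O(log n).
class SortedTable {
  constructor() {
    this.entries = []; // [{ key, value }], sorted ascending by key
  }

  // Binary search: index of the first entry whose key is >= target.
  lowerBound(key) {
    let lo = 0, hi = this.entries.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.entries[mid].key < key) lo = mid + 1; else hi = mid;
    }
    return lo;
  }

  // Insert at the proper place; duplicates update in place (set-like).
  insert(key, value) {
    const i = this.lowerBound(key);
    if (this.entries[i] && this.entries[i].key === key) {
      this.entries[i].value = value;
    } else {
      this.entries.splice(i, 0, { key, value }); // O(n) shift
    }
  }

  remove(key) {
    const i = this.lowerBound(key);
    if (this.entries[i] && this.entries[i].key === key) {
      this.entries.splice(i, 1);
      return true;
    }
    return false;
  }

  // Iterate in ascending key order.
  *[Symbol.iterator]() { yield* this.entries; }
}
```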
I know that in chained hashing, the average number of table entries
examined in a successful search is approximately:
1+(load factor/2)
Would it be the same formula for the number table entries examined when adding elements to the hash table? I'm thinking it would be. Just want to make sure I'm not thinking about this wrong.
Yes. "Insert" is effectively a lookup operation with an additional modification.
However, if your hashing scheme involves any kind of rebalancing or resizing operation, then there may be a higher amortized operation count for inserts than lookups.
No. In a successful search, of the N elements linked from the hash bucket, you'll on average visit about half of them before finding the element you want. When adding an element that isn't already in the table, if the insert function isn't allowed to assume it's absent, you have to compare against all N elements linked from the relevant bucket before you've confirmed the element isn't already there: twice as many comparisons. (If the hash table implementation provided an insert_known_new(N) function, it could just append to the linked list at that hash bucket without any key comparisons against existing elements, but I've never seen a hash table provide such a function. It would hand control of the hash table's class invariants over to the user, which breaks encapsulation, though in this case justifiably, in my opinion.)
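The asymmetry described above can be seen in a sketch of one bucket's chain. The comparison counters are added purely for illustration: a successful search stops at the match, while a duplicate-checking insert must compare against every existing entry before appending.

```javascript
// One hash bucket's chain of { key, value } entries, with comparison
// counters showing search-vs-insert cost.

// Successful search: stops as soon as the key matches.
function searchChain(chain, key) {
  let comparisons = 0;
  for (const entry of chain) {
    comparisons++;
    if (entry.key === key) return { entry, comparisons };
  }
  return { entry: null, comparisons };
}

// Insert of a new key: must check every existing entry before it can
// confirm the key is absent and append.
function insertChain(chain, key, value) {
  let comparisons = 0;
  for (const entry of chain) {
    comparisons++;
    if (entry.key === key) return { inserted: false, comparisons };
  }
  chain.push({ key, value });
  return { inserted: true, comparisons };
}
```

With a 4-entry chain, searching for the second key costs 2 comparisons, while inserting a genuinely new key costs all 4.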