How are hash collisions resolved for unique keys? - data-structures

If I want to implement a hash table, and two elements collide, I understand that I can either rehash via open addressing or chain at the index via a linked list.
How does the same idea apply to a set() data structure? Is the linked list at the collision index traversed to see if the key already exists before insertion, which would imply O(N) time complexity?
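To make the question concrete, here is a minimal sketch in Java (illustrative only, not any particular library's implementation) of the chaining approach for a set, where add() walks the bucket's chain to check whether the key already exists before appending:

```java
import java.util.LinkedList;

// Minimal sketch of a separate-chaining hash set (illustrative, not production code).
class ChainedHashSet<K> {
    private final LinkedList<K>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashSet(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    private int indexFor(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Returns false if the key was already present; otherwise appends it to its bucket.
    boolean add(K key) {
        LinkedList<K> bucket = buckets[indexFor(key)];
        for (K existing : bucket) {   // traverse only this bucket's chain
            if (existing.equals(key)) {
                return false;         // duplicate found, nothing inserted
            }
        }
        bucket.add(key);
        return true;
    }

    boolean contains(K key) {
        return buckets[indexFor(key)].contains(key);
    }
}
```

The traversal is proportional to the length of that one chain, not to the total number of elements: with a decent hash function and a resizing policy the expected chain length stays around the load factor, so insertion is O(1) on average and degrades to O(N) only when everything collides into the same bucket.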

Related

Is QMap a hash table?

I have used QMap many times but perhaps never QHash. Now I'm reading about hash tables.
Is QMap a hash table?
I presume that under the hood a QHash uses the ideas of hash maps. Should I say QHash is an implementation of a hash map (or hash table) data structure? Is QMap also an implementation of a hash table?
Can I use the terms map and table interchangeably?
No, QMap is not a hash table.
Per the documentation:
The QMap class is a template class that provides a red-black-tree-based dictionary.
In other words, it is a binary search tree that uses the red-black tree algorithm to maintain balance, which means that searches take O(log N) rather than the O(1) average of QHash.
It also means that QMap keeps the data sorted by key.
From the documentation you quoted:
QMap and QHash provide very similar functionality. The differences are:
QHash provides average faster lookups than QMap. (See Algorithmic Complexity for details.)
When iterating over a QHash, the items are arbitrarily ordered. With QMap, the items are always sorted by key.
The key type of a QHash must provide operator==() and a global qHash(Key) function. The key type of a QMap must provide operator<() specifying a total order. Since Qt 5.8.1 it is also safe to use a pointer type as key, even if the underlying operator<() does not provide a total order.
QHash is a hash table. QMap is a binary search tree using the red-black tree algorithm.
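For what it's worth, the same split exists in the Java standard library, if that analogy helps: HashMap plays the role of QHash (hashing, arbitrary iteration order) and TreeMap, also a red-black tree, plays the role of QMap (comparison-based, sorted iteration). A minimal sketch:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapVsHashDemo {
    public static void main(String[] args) {
        // Hash table: relies on hashCode()/equals(); iteration order is arbitrary; O(1) average lookup.
        Map<String, Integer> hash = new HashMap<>();
        // Red-black tree: relies on compareTo()/Comparator; iteration order is sorted by key; O(log n) lookup.
        Map<String, Integer> tree = new TreeMap<>();

        for (String key : new String[] {"pear", "apple", "orange"}) {
            hash.put(key, key.length());
            tree.put(key, key.length());
        }

        System.out.println(hash.keySet()); // arbitrary order, e.g. [orange, apple, pear]
        System.out.println(tree.keySet()); // always sorted: [apple, orange, pear]
    }
}
```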

How do you implement an ordered hashtable?

I am searching for how to implement an ordered hash table but not finding anything.
I would like to create a hash table which you can iterate over and that gives you the elements based on the order in which you defined or inserted the keys. How do you do this generally, in a performant way? How is this implemented? If a language must be chosen for an example, I am thinking about doing this in JavaScript. I know for example JavaScript objects ("hash maps") are ordered hash maps, but I have no idea how they are implemented. I would like to learn how to implement the same thing from scratch for a custom programming language.
The example is: say you are listing the native-script version of a language name, like "עִברִית" for "Hebrew", as the key, and you want to create a map from the native language name to the English name, but you want the entries to stay in the order they were defined. How do you implement that?
The general, performant solution to this problem is to combine a linked list with a hash table. You will have a doubly linked list of Nodes, each of which is also indexed via the hash table. You can look things up in either direction in constant time. Broadly, the operations are implemented as follows:
Insert - O(1)* - Append a Node to the end of the linked list, and reference that Node by its key in the hash map.
Retrieve by key - O(1)* - Using the hash table, find the corresponding Node and return its value.
Delete by key - O(1)* - Using the hash table, find the corresponding Node and remove it by removing its neighbours' references to it.
Traverse - O(n) - Traverse the linked list. n in this case is the number of Nodes, not the full capacity of the hash table.
* The actual insert/retrieve/delete times are subject to the worst-case of the hash table's implementation. Typically this is O(n) worst case, O(1) average.
(An alternative to this is to store a "next key" in each Node instead of a direct pointer to the next node. The runtimes are the same, but traversal needs to involve the hash table whereas in the direct pointer implementation it can be bypassed.)
So if you want to implement this on your own, you'll need a hash table and some Nodes to use within it.
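Here is a minimal Java sketch of that design (names are illustrative); Java's LinkedHashMap is essentially the standard-library version of this combination:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Sketch of an insertion-ordered map: a hash table of nodes that also form a doubly linked list.
class OrderedHashMap<K, V> {
    private static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> prev, next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final Map<K, Node<K, V>> index = new HashMap<>();
    private Node<K, V> head, tail;

    // O(1) average: append a node to the list and register it in the hash table.
    public void put(K key, V value) {
        Node<K, V> existing = index.get(key);
        if (existing != null) { existing.value = value; return; } // keep original position
        Node<K, V> node = new Node<>(key, value);
        if (tail == null) { head = node; } else { tail.next = node; node.prev = tail; }
        tail = node;
        index.put(key, node);
    }

    // O(1) average: hash lookup straight to the node.
    public V get(K key) {
        Node<K, V> node = index.get(key);
        return node == null ? null : node.value;
    }

    // O(1) average: unlink the node from its neighbours and drop it from the table.
    public void remove(K key) {
        Node<K, V> node = index.remove(key);
        if (node == null) return;
        if (node.prev != null) node.prev.next = node.next; else head = node.next;
        if (node.next != null) node.next.prev = node.prev; else tail = node.prev;
    }

    // O(n) traversal in insertion order, where n is the number of entries, not the table capacity.
    public void forEach(BiConsumer<K, V> action) {
        for (Node<K, V> n = head; n != null; n = n.next) action.accept(n.key, n.value);
    }
}
```

With this, putting "עִברִית" -> "Hebrew" and then the other languages lets you iterate the entries back in exactly the order they were defined.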

Binary Search-accessing the middle element drawback

I am studying from my course book on Data Structures by Seymour Lipschutz and I have come across a point I don't fully understand:
Binary Search Algorithm assumes that one has direct access to the middle element in the list. This means that the list must be stored in some type of linear array.
I read this and also recognised that in Python you can have access to the middle element at all times. Then the book goes onto say:
Unfortunately, inserting an element into an array requires elements to be moved down the list, and deleting an element from an array requires elements to be moved up the list.
How is this a drawback?
Won’t we still be able to access the middle element by dividing the length of array by 2?
In the case where the array will not be modified, the costs of insertion and deletion are not relevant.
However, if an array is to be used to maintain a sorted set of non-fixed items, then insertion and deletion costs are relevant. In this case, binary search can be used to find items (possibly for deletion) and/or find where new items should be inserted. The drawback is that insertion and deletion require movement of other elements.
Python's bisect module provides binary search functionality that can be used for locating insertion points for maintaining sorted order. The drawback mentioned applies.
In some cases, a binary search tree may be a preferable alternative to a sorted array for maintaining a sorted set of non-fixed items.
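To make the drawback concrete, here is a rough Java sketch: the binary search that locates the insertion point is O(log n), but keeping the array sorted then forces an O(n) shift of every element after that point:

```java
import java.util.Arrays;

public class SortedInsertDemo {
    public static void main(String[] args) {
        int[] sorted = {2, 5, 9, 14, 21};
        int newValue = 11;

        // Binary search finds where the value belongs: O(log n).
        int pos = Arrays.binarySearch(sorted, newValue);
        int insertionPoint = pos >= 0 ? pos : -(pos + 1); // a miss is encoded as -(insertionPoint) - 1

        // But inserting means shifting every later element by one slot: O(n).
        int[] bigger = new int[sorted.length + 1];
        System.arraycopy(sorted, 0, bigger, 0, insertionPoint);
        bigger[insertionPoint] = newValue;
        System.arraycopy(sorted, insertionPoint, bigger, insertionPoint + 1, sorted.length - insertionPoint);

        System.out.println(Arrays.toString(bigger)); // [2, 5, 9, 11, 14, 21]
    }
}
```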
It seems that the author is comparing array-like structures with linked lists.
The first (arrays, Python and Java lists, C++ vectors) allow fast and simple access to any element by index, but appending, inserting, or deleting might require shifting elements and reallocating memory.
For the second, we cannot address the i-th element directly; we need to traverse the list from the beginning, but once we have reached the element, we can insert or delete quickly.

Data Structure for Ascending Order Key Value Pairs with Further Insertion

I am implementing a table in which each entry consists of two integers. The entries must be ordered in ascending order by key (according to the first integer of each set). All elements will be added to the table as the program is running and must be put in the appropriate slot. Time complexity is of utmost importance and I will only use the insert, remove, and iterate functions.
Which Java data structure is ideal for this implementation?
I was thinking LinkedHashMap, as it maps keys to values (each entry in my table is two values). It also provides O(1) insert/remove functionality. However, it is not sorted. If entries could be efficiently inserted in the appropriate position as they come in, it would not be a bad choice, since the data structure would then stay sorted. But I have not read of or thought of an efficient way to do this. (Maybe with a comparator?)
TreeMap has a time complexity of O(log n) for both add and remove. It maintains sorted order and has an iterator. But can we do better than O(log n)?
LinkedList has O(1) add/remove. I could insert with a loop, but this seems inefficient as well.
It seems like TreeMap is the way to go. But I am not sure.
Any thoughts on the ideal data structure for this program are much appreciated. If I have missed an obvious answer, please let me know.
(It can be a data structure with a Set interface, as there will not be duplicates.)
Key-value pairs suggest a Map. Since you need key-based ordering, that narrows it down to a SortedMap, in your case a TreeMap. As far as keeping elements sorted in a data structure while inserting goes, you can't do better than O(log n). Look no further.
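A minimal sketch of that in Java, assuming the two integers of each entry are simply key and value:

```java
import java.util.Map;
import java.util.TreeMap;

public class AscendingPairsDemo {
    public static void main(String[] args) {
        // Red-black tree under the hood: put/remove are O(log n), iteration is in ascending key order.
        TreeMap<Integer, Integer> table = new TreeMap<>();

        table.put(42, 7);
        table.put(3, 99);
        table.put(17, 0);
        table.remove(17);

        for (Map.Entry<Integer, Integer> entry : table.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue()); // 3 -> 99, then 42 -> 7
        }
    }
}
```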
The basic idea is that you need to insert each key at its proper place. For that, your code needs to search for that "proper place". For such a search, you cannot do better than binary search, which is O(log n), which is why I don't think you can perform an insert in better than O(log n).
Hence, again, a TreeMap is what I would advise you to use.
Moreover, if the key values you mention (especially because there are no duplicates) can be enumerated (as integers, serial numbers, or so), you could try using a statically allocated array instead. Then you might even get O(1) complexity.

Chaining Hash Tables - Average number of table entries examined when adding to hash table

I know that in chained hashing, the average number of table entries examined in a successful search is approximately:
1 + (load factor / 2)
Would it be the same formula for the number of table entries examined when adding elements to the hash table? I'm thinking it would be. Just want to make sure I'm not thinking about this wrong.
Yes. "Insert" is effectively a lookup operation with an additional modification.
However, if your hashing scheme involves any kind of rebalancing or resizing operation, then there may be a higher amortized operation count for inserts than lookups.
No. If you're doing a successful search, then of the N elements linked from the hash bucket, you'll on average visit half of them before finding the element you want. When adding an element that isn't already in the table, but where the hash table's insert function isn't allowed to assume it's absent, you have to compare against all N elements linked from the relevant bucket before you've confirmed the element isn't already there. That's twice as many comparisons.
(If the hash table implementation provided an insert_known_new(N) function, it could just append to the linked list at that hash bucket without any key comparisons against existing elements, but I've never seen a hash table provide such a function - it would hand control of the hash table's class invariants over to the user, which breaks encapsulation, though in this case justifiably, in my opinion.)
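A small counting sketch of that argument in Java, assuming a single bucket holding N entries:

```java
import java.util.LinkedList;
import java.util.List;

public class BucketComparisonCount {
    public static void main(String[] args) {
        int n = 8;
        List<Integer> bucket = new LinkedList<>();
        for (int i = 0; i < n; i++) bucket.add(i);

        // Successful search: stop as soon as the key is found.
        long searchComparisons = 0;
        for (int target = 0; target < n; target++) {
            for (int entry : bucket) {
                searchComparisons++;
                if (entry == target) break;
            }
        }
        // Averages (n + 1) / 2 comparisons per successful search.
        System.out.println("avg comparisons, successful search: " + (double) searchComparisons / n);

        // Insert of a key that is not present: must compare against the whole chain.
        long insertComparisons = 0;
        int newKey = n + 1;
        for (int entry : bucket) {
            insertComparisons++;
            if (entry == newKey) break; // never triggers for a new key
        }
        System.out.println("comparisons before inserting a new key: " + insertComparisons); // n
    }
}
```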

Resources